usainboltz.generator¶
Boltzmann generator for Context-free grammars.
This module provides functions for generating combinatorial objects (i.e.
objects described by a combinatorial specification, see
usainboltz.grammar
) according to the Boltzmann
distribution.
Given an unlabelled combinatorial class A, the Boltzmann distribution of parameter \(x\) is such that an object of size n is drawn with the probability \(\frac{x^n}{A(x)}\) where \(A(x)\) denotes the ordinary generating function of A. For labelled classes, this probability is set to \(\frac{x^n}{n!A(x)}\) where \(A(x)\) denotes the exponential generating function of A. See [DuFlLoSc04] for details.
By default, the objects produced by the generator are nested tuples of strings
(the atoms). For instance ('z', ('z', 'epsilon', 'epsilon'), ('z', 'epsilon', 'epsilon'))
is a
balanced binary tree with 3 internal nodes (z) and 4 leaves (e). To alter this
behaviour and generate other types of objects, you can specify a builder
function for each type of symbol in the grammar. The behaviour of the builders
is that they are applied in bottom up order to the structure “on the fly” during
generation. For instance, in order to generate Dyck words using the grammar for
binary trees, one can use a builder that return ""
for each leaf "(" +
left child + ")" + right child
for each node. The builders will receive a
tuple for each product, a string for each atom and builders for unions should be
computed using the union_builder()
helper. See the example below for the
case of Dyck words.
Examples
>>> from usainboltz import *
>>> from usainboltz.generator import rng_seed
>>> rng_seed(0xDEADBEEFCA7B0172)
Both binary trees and Dyck words can be generated from the following grammar
>>> epsilon, z, B = Epsilon(), Atom(), RuleName("B")
>>> grammar = Grammar({B: epsilon + z * B * B})
>>> generator = Generator(grammar, B, singular=True)
In order to build Dyck words, one can interpret epsilon as the empty word
and nodes as the derivation: D -> ( D ) D
>>> def leaf_builder(_):
... return ""
>>> def node_builder(tuple):
... _, left, right = tuple
... return "(" + left + ")" + right
>>> generator.set_builder(B, union_builder(leaf_builder, node_builder))
>>> res = generator.sample((10, 20))
>>> dyck_word = res.obj
>>> dyck_word
'(())(((((()(((()())))))))())()'
>>> len(dyck_word) in range(20, 41)
True
If on the contrary we want to see trees as S-expressions, we can interpret
epsilon as the leaf "leaf"
and nodes as (node left_child
right_child)
>>> def leaf_builder(_):
... return "leaf"
>>> def node_builder(tuple):
... _, left, right = tuple
... return "(node {} {})".format(left, right)
>>> generator.set_builder(B, union_builder(leaf_builder, node_builder))
>>> res = generator.sample((10, 20))
>>> print(res.obj)
(node
(node
leaf
(node
leaf
(node
(node
(node
(node leaf leaf)
leaf)
(node
(node
leaf
(node
(node
(node leaf leaf)
(node
(node
leaf
(node
(node leaf leaf)
leaf))
leaf))
leaf))
leaf))
(node
(node leaf leaf)
leaf))))
leaf)
Note that the builders mechanism can also be used to compute some statistics on the fly without explicitly building the whole structure. For instance the following example illustrates how to generate the height of a tree following the Boltzmann distribution without building the tree.
>>> def leaf_height(_):
... return 0
>>> def node_height(tuple):
... _, left, right = tuple
... return 1 + max(left, right)
>>> generator.set_builder(B, union_builder(leaf_height, node_height))
>>> res = generator.sample((10, 20))
>>> res.obj
13
>>> res.sizes[Atom()]
16
For labelled grammar, the procedure is the same. In this case, the default builders also output the labelling, they do not only generate the structure.
Example
>>> z = Atom()
>>> e, M = Epsilon(), RuleName()
>>> g = Grammar({M: e + z * M + z * M * M}, labelled=True)
>>> generator = Generator(g, M, singular=True)
>>> res = generator.sample((5,10)) # random
>>> tree = res.obj
>>> tree
(4, (5, (0, (6, (3, (7, (1, 'epsilon', 'epsilon'), 'epsilon'), 'epsilon')), (2, 'epsilon', 'epsilon')), 'epsilon'))
Functions
rng_seed |
|
union_builder |
Factory for generating builders for union rules. |
Classes
Generator |
High level interface for Boltzmann samplers. |
Mapping |
Exceptions
GeneratorConfigError |
-
class
usainboltz.generator.
Generator
¶ High level interface for Boltzmann samplers.
-
__init__
¶ Make a Generator out of a grammar.
Parameters: - grammar (Grammar) – a combinatorial grammar
- rule_name (RuleName) – the rule name of the symbol to generate.
- singular (bool) – if set, do singular sampling (use the singularity of the generating functions as parameter).
- expectations (Dict[Symbol,number]) – this is passed to the oracle to
skew the distribution in order to target some specific values
for the expected number of the specified symbols. See
OracleFromPaganini.tuning()
for more details. - oracle (Oracle) – an oracle for computing the values of some generating functions. If not supplied, a default generic oracle that should work for most use-cases is automatically generated.
Examples:
>>> from usainboltz import *
Some examples using the grammar for binary trees >>> z, B = Atom(), RuleName() >>> grammar = Grammar({B: Epsilon() + z * B * B})
Typical use: >>> generator = Generator(grammar, B, singular=True)
Since their is only one symbol in the grammar, the second argument can be omitted: >>> generator = Generator(grammar, singular=True)
-
get_builder
¶ Retrieve the current builder for a non-terminal symbol.
Parameters: non_terminal (RuleName) – the name of the non-terminal symbol. Returns: - the current builder currently bound to the rule name
- passed as argument.
Return type: function
-
sample
¶ Generate a term of the grammar given the prescribed sizes.
Parameters: sizes_windows (Tuple[int,int]] | Dict[(Atom|Marker),Tuple[int,int]]) – for atoms (resp. markers) for which a size window is specified, the generator will only generate objects whose number of those atoms (resp. markers) are in the specified size windows. For instance a generator.sample({z: (100, 200)})
will only generate objects with a number ofz
belonging to the interval [100, 200].
-
-
exception
usainboltz.generator.
GeneratorConfigError
¶ -
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
usainboltz.generator.
rng_seed
()¶
-
usainboltz.generator.
union_builder
()¶ Factory for generating builders for union rules.
The recommended way to write a builder for a union rule is to use this helper: write a auxilliary builder for each component of the union and compose them with
union_builder
.Parameters: builders (List[function]) – builder functions, one for each component of the union. Returns: - a builder function for a disjoint union. The resulting
- function applies one of the functions passed as arguments to its input depending of the component of the union its argument belongs to.
Return type: function Examples
Assume the symbol
D
is defined byUnion(A, B, C)
, then defining a builder for D looks like:>>> def build_A(args): ... # Do something ... pass
>>> def build_B(args): ... # Do something ... pass
>>> def build_C(args): ... # Do something ... pass
>>> build_D = union_builder(build_A, build_B, build_C)
For instance, for binary trees defined by
B = Union(leaf, Product(z, B, B))
, this could be:>>> def build_leaf(_): ... return BinaryTree()
>>> def build_node(args): ... z, left, right = args ... return BinaryTree([left, right])
>>> build_binarytree = union_builder(build_leaf, build_node)