Usain Boltz

Usain Boltz is a Python/Cython library meant to automate the random generation of tree-like structures.

The library was primarily designed to be used with the Sagemath mathematics software systems but Sagemath is no longer a dependency and the library is now perfectly usable both within and without a Sagemath environment.

Principle (non-technical)

The use of Usain Boltz starts with the specification of the structures we want to sample using grammar module. Then, optionally, we can tell generator (Generator) how to build the objects, thanks to the builders mechanism.

For example, imagine that we want to sample tree-like structures

Tree := Leaf | Tree * Tree

These structures may be represented as XML strings:

<tree>
  <tree>
    <leaf/>
    <leaf/>
  </tree>
  <tree>
    <leaf/>
  </tree>
</tree>

but also as JSON objects:

{ 'tree' : [
   { 'tree' : [ 'leaf', 'leaf' ] },
   { 'tree' : [ 'leaf' ]}
   ]
}

However the inherent tree structure is the same. Thus the same generator may generate both objects using different builders.

Principle (technical)

Usain Boltz is based on the principles of the Boltzmann method introduced in the article [DuFlLoSc04], whose goal is to provide fast and uniform random sampler for a large variety of combinatorial structures.

The starting point is to describe the structures we want to sample by a Grammar. Each rule of that grammar defines a combinatorial class: a set where objects have a finite size given by their number of Atom. Then the idea is to sample objects \(\gamma\) from a class \(\mathcal{A}\) with respect to the Boltzmann model of parameter \(x\):

\[\mathbb{P}(\gamma) = \frac{x^{|\gamma|}}{A(x)}\]

where \(|\gamma|\) is the size of \(\gamma\) and \(A\) is the generating series of \(\mathcal{A}\) defined by \(A(z)=\sum_{\gamma \in \mathcal{A}} z^{|\gamma|}\).

Given that model, one can compute that the expected size of a sampled object is given by:

\[\mathbb{E}[|\gamma|] = x \cdot \frac{A'(x)}{A(x)}\]

So a good choice of \(x\) allows to target a specific size for the sampled objects. This is the work of the oracle.

References

[DuFlLoSc04]Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles Schaeffer. 2004. Boltzmann Samplers for the Random Generation of Combinatorial Structures. Comb. Probab. Comput. 13, 4-5 (July 2004), 577-625. DOI=http://dx.doi.org/10.1017/S0963548304006315

Supported features

Usainboltz is still a work in progress and not all the combinators from the grammar module are currently fully supported. This table describes what should be expected to work and what is still under development.

List of currently supported features
  Grammar Oracle Generator
z, ε, +, ×
Seq
Set partial (see 1 below)
Cycle partial (see 1 below)
MSet partial (see 1 below) partial (see 3 below)
UCycle partial (see 2 below)

Limitations:

  1. Paganini (our default oracle) does not support constrained sets, multisets and cycles (that is iterated rules with leq=n and/or geq=m)) unless n = m. This can be worked around by providing your own oracle, e.g. by passing a dictionary of float values to build_oracle().
  2. Paganini (our default oracle) only implements unlabelled cycles (UCycle) of fixed size (that is cycles of the form UCycle(<expr), leq=n, geq=n)). This can be worked around by providing your own oracle, e.g. by passing a dictionary of float values to build_oracle().
  3. At the moment, the kinds of multisets we support are exactly those that are supported by paganini (see point 1 above).

Modules

usainboltz.grammar Context-free grammars for Boltzmann generation.
usainboltz.generator Boltzmann generator for Context-free grammars.
usainboltz.oracle Various oracle implementations for Boltzmann sampling.

Examples

usainboltz.examples.expnums A Boltzmann sampler for Set partitions (Bell numbers).
usainboltz.examples.knr19
usainboltz.examples.mondrian Mondrian paintings generation.
usainboltz.examples.polya_trees A Boltzmann sampler for Pólya n-ary trees.
usainboltz.examples.alcohols A Boltzmann sampler for alcohols

Indices and tables