Usain Boltz
===========
Usain Boltz is a Python/Cython library meant to automate the random generation
of tree-like structures.
The library was primarily designed to be used with the
`Sagemath `_ mathematics software systems but Sagemath is no
longer a dependency and the library is now perfectly usable both within and without
a Sagemath environment.
Principle (non-technical)
-------------------------
The use of Usain Boltz starts with the specification of the structures we want
to sample using :py:mod:`~usainboltz.grammar` module. Then, optionally, we can
tell generator (:py:class:`~usainboltz.generator.Generator`) how to build the
objects, thanks to the builders mechanism.
For example, imagine that we want to sample tree-like structures
::
Tree := Leaf | Tree * Tree
These structures may be represented as XML strings:
::
but also as JSON objects:
::
{ 'tree' : [
{ 'tree' : [ 'leaf', 'leaf' ] },
{ 'tree' : [ 'leaf' ]}
]
}
However the inherent tree structure is the same. Thus the *same*
generator may generate both objects using different builders.
Principle (technical)
---------------------
Usain Boltz is based on the principles of the *Boltzmann method* introduced
in the article [DuFlLoSc04]_, whose goal is to provide fast and
uniform random sampler for a large variety of combinatorial structures.
The starting point is to describe the structures we want to sample by a
:py:class:`~usainboltz.grammar.Grammar`. Each rule of that grammar
defines a *combinatorial class*: a set where objects have a finite
size given by their number of
:py:class:`~usainboltz.grammar.Atom`. Then the idea is to sample
objects :math:`\gamma` from a class :math:`\mathcal{A}` with respect
to the *Boltzmann model of parameter* :math:`x`:
.. math::
\mathbb{P}(\gamma) = \frac{x^{|\gamma|}}{A(x)}
where :math:`|\gamma|` is the size of :math:`\gamma` and :math:`A` is
the generating series of :math:`\mathcal{A}` defined by
:math:`A(z)=\sum_{\gamma \in \mathcal{A}} z^{|\gamma|}`.
Given that model, one can compute that the expected size of a sampled
object is given by:
.. math::
\mathbb{E}[|\gamma|] = x \cdot \frac{A'(x)}{A(x)}
So a good choice of :math:`x` allows to target a specific size for the
sampled objects. This is the work of the :py:mod:`~usainboltz.oracle`.
.. rubric:: References
.. [DuFlLoSc04] Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles
Schaeffer. 2004. Boltzmann Samplers for the Random Generation of
Combinatorial Structures. Comb. Probab. Comput. 13, 4-5 (July 2004), 577-625.
DOI=http://dx.doi.org/10.1017/S0963548304006315
Supported features
------------------
Usainboltz is still a work in progress and not all the combinators from the
:py:mod:`~usainboltz.grammar` module are currently fully supported.
This table describes what should be expected to work and what is still under
development.
.. csv-table:: List of currently supported features
:header: "", "Grammar", "Oracle", "Generator"
"z, ε, +, ×", "✅", "✅", "✅"
"Seq", "✅", "✅", "✅"
"Set", "✅", "partial (see 1 below)", "✅"
"Cycle", "✅", "partial (see 1 below)", "❌"
"MSet", "✅", "partial (see 1 below)", "partial (see 3 below)"
"UCycle", "✅", "partial (see 2 below)", "❌"
Limitations:
1. Paganini (our default oracle) does not support constrained sets, multisets
and cycles (that is iterated rules with :code:`leq=n` and/or :code:`geq=m)`)
unless :code:`n = m`.
This can be worked around by providing your own oracle, e.g. by passing a
dictionary of float values to :py:func:`~usainboltz.oracle.build_oracle`.
2. Paganini (our default oracle) only implements unlabelled cycles
(:code:`UCycle`) of fixed size (that is cycles of the form
:code:`UCycle(