usainboltz.grammar

Context-free grammars for Boltzmann generation.

Grammars use the basic operators of the symbolic method of analytic combinatorics to specify labelled or unlabelled combinatorial classes. For instance, binary trees can be specified by B = leaf + Z * B * B which, using the syntax implemented in this module, looks like:

Examples

>>> z, leaf = Atom(), Marker("leaf")
>>> B = RuleName("B")
>>> # Using the explicit syntax
>>> Grammar({B: Union(leaf, Product(z, B, B))})
{
  B : Union(leaf, Product(z, B, B))
}
>>> # Using the syntactic sugar + and * for unions and products
>>> Grammar({B: leaf + z * B * B})
{
  B : Union(leaf, Product(z, B, B))
}

Note that we make the difference between

  • terminal symbols of the grammar: Atom (leaf and z in the above example).
  • non-terminal symbols: RuleName (B in the above example).

The most basic constructions of the symbolic method are the Cartesian product Product and the disjoint union Union. In the example of binary trees, disjoint union is used to decompose the class between leaves and internal nodes. The Cartesian product is used to define internal nodes as a pair of children, the z appearing at each internal node means that each node counts as \(1\) in the size of a tree.

The exhaustive list of supported constructions is given below. Detailed explanations about each of these operators can be found on the Wikipedia page of the symbolic method and in [FS2009].

Here are two other examples of specifications, for general plane trees, illustrating the use of the Seq (sequence) operator and the use of multi-rules grammars:

Examples

>>> z = Atom()
>>> T, S = RuleName("T"), RuleName("S")
>>> # With Seq
>>> Grammar({T: z * Seq(T)})
{
  T : Product(z, Seq(T))
}
>>> # Equivalent definition without Seq
>>> nil = Epsilon()
>>> Grammar({
...     T: z * S,
...     S: nil + T * S,
... })
{
  S : Union(epsilon, Product(T, S)),
  T : Product(z, S)
}

Todo

Write about labeling

AUTHORS:

  • Matthieu Dien (2019): initial version
  • Martin Pépin (2019): initial version

References

[FS2009]Philippe Flajolet, Robert Sedgewick, 2009, Analytic combinatorics, Cambridge University Press.

Classes

Atom() Terminal symbol of a grammar, accounts for the size.
Cycle(arg, str], leq, geq, eq) Labelled cycles.
Epsilon() The epsilon symbol.
Grammar(rules, …) Context free grammars.
IteratedRule(arg, str], leq, geq, eq) The base class for iterated constructions Seq, Set, Cycle, MSet, etc.
MSet(arg, str], leq, geq, eq) Unlabelled multi-sets.
Marker(name) Marker symbols.
Product(*factors) Cartesian product of two or more rules.
Rule The super class of all grammar rules.
RuleName(name) Non terminal symbols of a grammar.
Seq(arg, str], leq, geq, eq) Sequences.
Set(arg, str], leq, geq, eq) Labelled sets.
Singleton
Symbol(name) Base class for all grammar symbols (Atom, RuleName, Epsilon).
UCycle(arg, str], leq, geq, eq) Unlabelled cycles.
Union(*terms) Disjoint union of two or more rules.
class usainboltz.grammar.Atom

Terminal symbol of a grammar, accounts for the size.

Atom is the class containing only one element of size 1.

Note

There is only one instance of the Atom class.

>>> Atom()
z
>>> Atom() is Atom()
True
__init__()

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.Cycle(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Labelled cycles.

A cycle is like a sequence whose components can be cyclically shifted. For instance: [a, b, c], [b, c, a] and [c, a, b] represent the same cycle. In the following example, Cycle is used to represent the class of permutations as a set of cycles:

Example

>>> z, P = Atom(), RuleName("P")
>>> Grammar({P: Set(Cycle(z))})
{
  P : Set(Cycle(z))
}

The number of elements of the cycle can be constrained to be greater or smaller that some integer constants:

Examples

>>> z = Atom()
>>> Cycle(z, geq=5, leq=10)
Cycle(z, geq = 5, leq = 10)
>>> Cycle(z, geq=5)
Cycle(z, geq = 5)
>>> Cycle(z, leq=10)
Cycle(z, leq = 10)
__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
construction_name = 'Cycle'
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.Epsilon

The epsilon symbol.

Epsilon is the class containing only one element of size 0.

Note

There is only one instance of the Epsilon class.

Examples

>>> eps1 = Epsilon()
>>> eps2 = Epsilon()
>>> eps1 is eps2
True
__init__()

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.Grammar(rules: Optional[Dict[usainboltz.grammar.RuleName, usainboltz.grammar.Rule]] = None, labelled: bool = False)

Context free grammars.

__init__(rules: Optional[Dict[usainboltz.grammar.RuleName, usainboltz.grammar.Rule]] = None, labelled: bool = False)

Create a grammar.

The rules of the grammar can be either specified at grammar creation by feeding them to the constructor or specified later using the add_rule() method.

Parameters:rules – dictionary mapping RuleNames to Rules

Examples

>>> z = Atom()

The grammar of sequences

>>> S = RuleName("S")
>>> Grammar({S: Epsilon() + z * S})
{
  S : Union(epsilon, Product(z, S))
}

The grammar of binary trees

>>> B = RuleName("B")
>>> Grammar({B: Epsilon() + z * B * B})
{
  B : Union(epsilon, Product(z, B, B))
}

The grammar of unary-binary trees (or Motzkin trees)

>>> g = Grammar()
>>> T, U, B = RuleName("T"), RuleName("U"), RuleName("B")
>>> g.add_rule(T, z + U + B)
>>> g.add_rule(U, z * T)
>>> g.add_rule(B, z * T * T)
>>> g
{
  B : Product(z, T, T),
  T : Union(z, U, B),
  U : Product(z, T)
}
add_rule(rule_name: Union[usainboltz.grammar.RuleName, str], rule: Union[usainboltz.grammar.Rule, str]) → None

Add a rule to the grammar.

Parameters:
  • rule_name – a non-terminal symbol. If it was already defined in the grammar, it is replaced.
  • rule – the rule defining rule_name.

Examples

>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C")
>>> g = Grammar()
>>> g.add_rule(A, Union(B, C))
>>> g.rules[A]
Union(B, C)
markers() → Set[usainboltz.grammar.Marker]

Return all the markers appearing in the grammar.

Examples

>>> z, u = Atom(), Marker("u")
>>> B = RuleName("B")
>>> g = Grammar({B: Union(u, Product(z, B, B))})
>>> g.markers() == {u}
True
class usainboltz.grammar.IteratedRule(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

The base class for iterated constructions Seq, Set, Cycle, MSet, etc.

Should not be instantiated directly.

__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.MSet(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Unlabelled multi-sets.

A multi-set is like a set where elements can occur multiple times. In the following example, MSet is used to represent the class of non-plane general trees:

Examples

>>> z, T = Atom(), RuleName("T")
>>> Grammar({T: z * MSet(T)})
{
  T : Product(z, MSet(T))
}

The number of elements of the multi-set can be constrained to be greater or smaller that some integer constants:

Examples

>>> z = Atom()
>>> MSet(z, geq=5, leq=10)
MSet(z, geq = 5, leq = 10)
>>> MSet(z, geq=5)
MSet(z, geq = 5)
>>> MSet(z, leq=10)
MSet(z, leq = 10)
__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
construction_name = 'MSet'
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.Marker(name: Optional[str] = None)

Marker symbols.

Examples

>>> Marker("u")
u
>>> Marker()
_symbol_0

Markers are similar to atoms but have size 0. They are usually used to mark some special places in the grammar. For instance in the following grammar for Motzkin tree, the marker u is used to mark unary nodes.

Examples

>>> z, u, M = Atom(), Marker(), RuleName()
>>> grammar = Grammar({M: z + u * z * M + z * M * M})
__init__(name: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.Product(*factors)

Cartesian product of two or more rules.

Product(A, B, C) corresponds to the following grammar in BNF syntax: _ ::= A × B × C

Examples

>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C")
>>> Product(A, B, C)
Product(A, B, C)
>>> z = Atom()
>>> Product(z, A)
Product(z, A)
__init__(*factors)

Build the product of two or more rules.

Parameters:factors – list of factors of the product.
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.Rule

The super class of all grammar rules.

Should not be instantiated directly.

__init__

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]
class usainboltz.grammar.RuleName(name: Optional[str] = None)

Non terminal symbols of a grammar.

Instances of this class represent recursive references to non-terminal symbols inside grammar rules. For instance, sequences of atoms z could be defined using the following grammar where the RuleName S refers to itself in its definition:

Examples

>>> epsilon, z, S = Epsilon(), Atom(), RuleName("S")
>>> Grammar({S: epsilon + z * S})
{
    S : Union(epsilon, Product(z, S))
}
__init__(name: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.Seq(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Sequences.

The Seq construction of the symbolic method models sequences of elements. In the following example, Seq is used to represent binary words as sequences of bits.

Example

>>> z, one, zero, S = Atom(), Marker("1"), Marker("0"), RuleName("S")
>>> Grammar({S: Seq(z * (one + zero))})
{
  S : Seq(Product(z, Union(1, 0)))
}

The number of terms of a sequence can be constrained to be greater or smaller that some integer constants:

Examples

>>> z, one, zero = Atom(), Marker("1"), Marker("0")
>>> Seq(z * (one + zero), leq=10, geq=5)
Seq(Product(z, Union(1, 0)), geq = 5, leq = 10)
>>> Seq(z * (one + zero), leq=10)
Seq(Product(z, Union(1, 0)), leq = 10)
>>> Seq(z * (one + zero), geq=5)
Seq(Product(z, Union(1, 0)), geq = 5)
__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
construction_name = 'Seq'
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.Set(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Labelled sets.

The labelled Set construction of the symbolic method models sets of elements. In the following example, Set is used to model labelled general trees.

Example

>>> z, T = Atom(), RuleName("T")
>>> Grammar({T: z * Set(T)})
{
  T : Product(z, Set(T))
}

The number of elements of the set can be constrained to be greater or smaller that some integer constants:

Examples

>>> z = Atom()
>>> # sets of 5 to 10 elements
>>> Set(z, geq=5, leq=10)
Set(z,  geq = 5, leq = 10)
>>> # sets of at least 5 elements
>>> Set(z, geq = 5)
Set(z, geq = 5)
>>> # sets of at most 10 elements
>>> Set(z, leq=10)
Set(z, leq = 10)
__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
construction_name = 'Set'
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.Singleton
__init__

Initialize self. See help(type(self)) for accurate signature.

class usainboltz.grammar.Symbol(name: Optional[str] = None)

Base class for all grammar symbols (Atom, RuleName, Epsilon).

Should not be instantiated directly.

__init__(name: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers contained in the expression.

class usainboltz.grammar.UCycle(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Unlabelled cycles.

A cycle is like a sequence whose components can be cyclically shifted. For instance: [a, b, c], [b, c, a] and [c, a, b] represent the same cycle.

The number of elements of the cycle can be constrained to be greater or smaller that some integer constants:

Examples

>>> z = Atom()
>>> UCycle(z, geq=5, leq=10)
UCycle(z, geq = 5, leq = 10)
>>> UCycle(z, geq=5)
UCycle(z, geq = 5)
>>> UCycle(z, leq=10)
UCycle(z, leq = 10)
__init__(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)

Rule constructor.

Parameters:
  • arg – a rule describing the elements of the collection.
  • leq – constrains the collection to have size at most leq.
  • geq – constrains the collection to have size at least geq.
  • eq – constrains the collection to have size exactly eq.
construction_name = 'UCycle'
markers() → Set[usainboltz.grammar.Marker]

Return the set of all markers in the rule.

class usainboltz.grammar.Union(*terms)

Disjoint union of two or more rules.

D = Union(A, B, C) corresponds to the following grammar in BNF syntax: D ::= A | B | C

Examples

>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C")
>>> Union(A, B, C)
Union(A, B, C)
>>> z = Atom()
>>> Union(z, A)
Union(z, A)
__init__(*terms)

Build a union of two or more rules.

Parameters:terms – list of terms of the union.
markers() → Set[usainboltz.grammar.Marker]

Return the set of markers contained in the expression.