usainboltz.grammar¶
Context-free grammars for Boltzmann generation.¶
Grammars use the basic operators of the symbolic method of analytic
combinatorics to specify labelled or unlabelled combinatorial classes. For
instance, binary trees can be specified by B = leaf + Z * B * B
which,
using the syntax implemented in this module, looks like:
Examples
>>> z, leaf = Atom(), Marker("leaf")
>>> B = RuleName("B")
>>> # Using the explicit syntax
>>> Grammar({B: Union(leaf, Product(z, B, B))})
{
B : Union(leaf, Product(z, B, B))
}
>>> # Using the syntactic sugar + and * for unions and products
>>> Grammar({B: leaf + z * B * B})
{
B : Union(leaf, Product(z, B, B))
}
Note that we make the difference between
- terminal symbols of the grammar:
Atom
(leaf
andz
in the above example). - non-terminal symbols:
RuleName
(B
in the above example).
The most basic constructions of the symbolic method are the Cartesian product
Product
and the disjoint union Union
. In the example of
binary trees, disjoint union is used to decompose the class between leaves and
internal nodes. The Cartesian product is used to define internal nodes as a
pair of children, the z
appearing at each internal node means that each
node counts as \(1\) in the size of a tree.
The exhaustive list of supported constructions is given below. Detailed explanations about each of these operators can be found on the Wikipedia page of the symbolic method and in [FS2009].
Here are two other examples of specifications, for general plane trees,
illustrating the use of the Seq
(sequence) operator and the use of
multi-rules grammars:
Examples
>>> z = Atom()
>>> T, S = RuleName("T"), RuleName("S")
>>> # With Seq
>>> Grammar({T: z * Seq(T)})
{
T : Product(z, Seq(T))
}
>>> # Equivalent definition without Seq
>>> nil = Epsilon()
>>> Grammar({
... T: z * S,
... S: nil + T * S,
... })
{
S : Union(epsilon, Product(T, S)),
T : Product(z, S)
}
Todo
Write about labeling
AUTHORS:
- Matthieu Dien (2019): initial version
- Martin Pépin (2019): initial version
References
[FS2009] | Philippe Flajolet, Robert Sedgewick, 2009, Analytic combinatorics, Cambridge University Press. |
Classes
Atom () |
Terminal symbol of a grammar, accounts for the size. |
Cycle (arg, str], leq, geq, eq) |
Labelled cycles. |
Epsilon () |
The epsilon symbol. |
Grammar (rules, …) |
Context free grammars. |
IteratedRule (arg, str], leq, geq, eq) |
The base class for iterated constructions Seq, Set, Cycle, MSet, etc. |
MSet (arg, str], leq, geq, eq) |
Unlabelled multi-sets. |
Marker (name) |
Marker symbols. |
Product (*factors) |
Cartesian product of two or more rules. |
Rule |
The super class of all grammar rules. |
RuleName (name) |
Non terminal symbols of a grammar. |
Seq (arg, str], leq, geq, eq) |
Sequences. |
Set (arg, str], leq, geq, eq) |
Labelled sets. |
Singleton |
|
Symbol (name) |
Base class for all grammar symbols (Atom, RuleName, Epsilon). |
UCycle (arg, str], leq, geq, eq) |
Unlabelled cycles. |
Union (*terms) |
Disjoint union of two or more rules. |
-
class
usainboltz.grammar.
Atom
¶ Terminal symbol of a grammar, accounts for the size.
Atom is the class containing only one element of size 1.
Note
There is only one instance of the Atom class.
>>> Atom() z
>>> Atom() is Atom() True
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
Cycle
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Labelled cycles.
A cycle is like a sequence whose components can be cyclically shifted. For instance: [a, b, c], [b, c, a] and [c, a, b] represent the same cycle. In the following example, Cycle is used to represent the class of permutations as a set of cycles:
Example
>>> z, P = Atom(), RuleName("P") >>> Grammar({P: Set(Cycle(z))}) { P : Set(Cycle(z)) }
The number of elements of the cycle can be constrained to be greater or smaller that some integer constants:
Examples
>>> z = Atom()
>>> Cycle(z, geq=5, leq=10) Cycle(z, geq = 5, leq = 10)
>>> Cycle(z, geq=5) Cycle(z, geq = 5)
>>> Cycle(z, leq=10) Cycle(z, leq = 10)
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
construction_name
= 'Cycle'¶
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
Epsilon
¶ The epsilon symbol.
Epsilon is the class containing only one element of size 0.
Note
There is only one instance of the Epsilon class.
Examples
>>> eps1 = Epsilon() >>> eps2 = Epsilon() >>> eps1 is eps2 True
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
Grammar
(rules: Optional[Dict[usainboltz.grammar.RuleName, usainboltz.grammar.Rule]] = None, labelled: bool = False)¶ Context free grammars.
-
__init__
(rules: Optional[Dict[usainboltz.grammar.RuleName, usainboltz.grammar.Rule]] = None, labelled: bool = False)¶ Create a grammar.
The rules of the grammar can be either specified at grammar creation by feeding them to the constructor or specified later using the
add_rule()
method.Parameters: rules – dictionary mapping RuleNames to Rules Examples
>>> z = Atom()
The grammar of sequences
>>> S = RuleName("S") >>> Grammar({S: Epsilon() + z * S}) { S : Union(epsilon, Product(z, S)) }
The grammar of binary trees
>>> B = RuleName("B") >>> Grammar({B: Epsilon() + z * B * B}) { B : Union(epsilon, Product(z, B, B)) }
The grammar of unary-binary trees (or Motzkin trees)
>>> g = Grammar() >>> T, U, B = RuleName("T"), RuleName("U"), RuleName("B") >>> g.add_rule(T, z + U + B) >>> g.add_rule(U, z * T) >>> g.add_rule(B, z * T * T) >>> g { B : Product(z, T, T), T : Union(z, U, B), U : Product(z, T) }
-
add_rule
(rule_name: Union[usainboltz.grammar.RuleName, str], rule: Union[usainboltz.grammar.Rule, str]) → None¶ Add a rule to the grammar.
Parameters: - rule_name – a non-terminal symbol. If it was already defined in the grammar, it is replaced.
- rule – the rule defining
rule_name
.
Examples
>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C") >>> g = Grammar() >>> g.add_rule(A, Union(B, C)) >>> g.rules[A] Union(B, C)
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return all the markers appearing in the grammar.
Examples
>>> z, u = Atom(), Marker("u") >>> B = RuleName("B") >>> g = Grammar({B: Union(u, Product(z, B, B))}) >>> g.markers() == {u} True
-
-
class
usainboltz.grammar.
IteratedRule
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ The base class for iterated constructions Seq, Set, Cycle, MSet, etc.
Should not be instantiated directly.
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
MSet
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Unlabelled multi-sets.
A multi-set is like a set where elements can occur multiple times. In the following example, MSet is used to represent the class of non-plane general trees:
Examples
>>> z, T = Atom(), RuleName("T") >>> Grammar({T: z * MSet(T)}) { T : Product(z, MSet(T)) }
The number of elements of the multi-set can be constrained to be greater or smaller that some integer constants:
Examples
>>> z = Atom()
>>> MSet(z, geq=5, leq=10) MSet(z, geq = 5, leq = 10)
>>> MSet(z, geq=5) MSet(z, geq = 5)
>>> MSet(z, leq=10) MSet(z, leq = 10)
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
construction_name
= 'MSet'¶
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
Marker
(name: Optional[str] = None)¶ Marker symbols.
Examples
>>> Marker("u") u
>>> Marker() _symbol_0
Markers are similar to atoms but have size 0. They are usually used to mark some special places in the grammar. For instance in the following grammar for Motzkin tree, the marker u is used to mark unary nodes.
Examples
>>> z, u, M = Atom(), Marker(), RuleName() >>> grammar = Grammar({M: z + u * z * M + z * M * M})
-
__init__
(name: Optional[str] = None)¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
Product
(*factors)¶ Cartesian product of two or more rules.
Product(A, B, C)
corresponds to the following grammar in BNF syntax:_ ::= A × B × C
Examples
>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C")
>>> Product(A, B, C) Product(A, B, C)
>>> z = Atom() >>> Product(z, A) Product(z, A)
-
__init__
(*factors)¶ Build the product of two or more rules.
Parameters: factors – list of factors of the product.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
Rule
¶ The super class of all grammar rules.
Should not be instantiated directly.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶
-
-
class
usainboltz.grammar.
RuleName
(name: Optional[str] = None)¶ Non terminal symbols of a grammar.
Instances of this class represent recursive references to non-terminal symbols inside grammar rules. For instance, sequences of atoms
z
could be defined using the following grammar where theRuleName
S
refers to itself in its definition:Examples
>>> epsilon, z, S = Epsilon(), Atom(), RuleName("S") >>> Grammar({S: epsilon + z * S}) { S : Union(epsilon, Product(z, S)) }
-
__init__
(name: Optional[str] = None)¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
Seq
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Sequences.
The Seq construction of the symbolic method models sequences of elements. In the following example, Seq is used to represent binary words as sequences of bits.
Example
>>> z, one, zero, S = Atom(), Marker("1"), Marker("0"), RuleName("S") >>> Grammar({S: Seq(z * (one + zero))}) { S : Seq(Product(z, Union(1, 0))) }
The number of terms of a sequence can be constrained to be greater or smaller that some integer constants:
Examples
>>> z, one, zero = Atom(), Marker("1"), Marker("0")
>>> Seq(z * (one + zero), leq=10, geq=5) Seq(Product(z, Union(1, 0)), geq = 5, leq = 10)
>>> Seq(z * (one + zero), leq=10) Seq(Product(z, Union(1, 0)), leq = 10)
>>> Seq(z * (one + zero), geq=5) Seq(Product(z, Union(1, 0)), geq = 5)
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
construction_name
= 'Seq'¶
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
Set
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Labelled sets.
The labelled Set construction of the symbolic method models sets of elements. In the following example, Set is used to model labelled general trees.
Example
>>> z, T = Atom(), RuleName("T") >>> Grammar({T: z * Set(T)}) { T : Product(z, Set(T)) }
The number of elements of the set can be constrained to be greater or smaller that some integer constants:
Examples
>>> z = Atom()
>>> # sets of 5 to 10 elements >>> Set(z, geq=5, leq=10) Set(z, geq = 5, leq = 10)
>>> # sets of at least 5 elements >>> Set(z, geq = 5) Set(z, geq = 5)
>>> # sets of at most 10 elements >>> Set(z, leq=10) Set(z, leq = 10)
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
construction_name
= 'Set'¶
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
Singleton
¶ -
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
usainboltz.grammar.
Symbol
(name: Optional[str] = None)¶ Base class for all grammar symbols (Atom, RuleName, Epsilon).
Should not be instantiated directly.
-
__init__
(name: Optional[str] = None)¶ Initialize self. See help(type(self)) for accurate signature.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers contained in the expression.
-
-
class
usainboltz.grammar.
UCycle
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Unlabelled cycles.
A cycle is like a sequence whose components can be cyclically shifted. For instance: [a, b, c], [b, c, a] and [c, a, b] represent the same cycle.
The number of elements of the cycle can be constrained to be greater or smaller that some integer constants:
Examples
>>> z = Atom()
>>> UCycle(z, geq=5, leq=10) UCycle(z, geq = 5, leq = 10)
>>> UCycle(z, geq=5) UCycle(z, geq = 5)
>>> UCycle(z, leq=10) UCycle(z, leq = 10)
-
__init__
(arg: Union[usainboltz.grammar.Rule, str], leq: Optional[int] = None, geq: Optional[int] = None, eq: Optional[int] = None)¶ Rule constructor.
Parameters: - arg – a rule describing the elements of the collection.
- leq – constrains the collection to have size at most
leq
. - geq – constrains the collection to have size at least
geq
. - eq – constrains the collection to have size exactly
eq
.
-
construction_name
= 'UCycle'¶
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of all markers in the rule.
-
-
class
usainboltz.grammar.
Union
(*terms)¶ Disjoint union of two or more rules.
D = Union(A, B, C)
corresponds to the following grammar in BNF syntax:D ::= A | B | C
Examples
>>> A, B, C = RuleName("A"), RuleName("B"), RuleName("C")
>>> Union(A, B, C) Union(A, B, C)
>>> z = Atom() >>> Union(z, A) Union(z, A)
-
__init__
(*terms)¶ Build a union of two or more rules.
Parameters: terms – list of terms of the union.
-
markers
() → Set[usainboltz.grammar.Marker]¶ Return the set of markers contained in the expression.
-