A A SYNTAX FOR MATHEMATICAL EXPRESSIONS
We represent mathematical expressions as trees with operators as internal nodes, and numbers,
constants or variables, as leaves. By enumerating nodes in prefix order, we transform trees into
sequences suitable for seq2seq architectures.
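As an illustration of the prefix enumeration described above, here is a minimal sketch in Python. The `Node` class and the `to_prefix` function are hypothetical names introduced for this example only; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: an operator (with children) or a leaf (no children)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_prefix(node: Node) -> List[str]:
    """Enumerate nodes in prefix order: the node itself, then each subtree left to right."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens


# The expression 2 + x * 3, written as the tree +(2, *(x, 3)).
tree = Node("+", [Node("2"), Node("*", [Node("x"), Node("3")])])
print(to_prefix(tree))  # ['+', '2', '*', 'x', '3']
```

Because each operator has a fixed number of operands, the sequence can be read back into a tree without parentheses.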
For this representation to be efficient, we want expressions, trees and sequences to be in a one-to-one
correspondence. Different expressions will always result in different trees and sequences, but for the
reverse to hold, we need to take care of a few special cases.
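First, expressions like sums and products may correspond to several trees. For instance, the expression
$2 + 3 + 5$ can be represented as any one of the following trees (listed from left to right, each written
as an operator followed by its children):
$+(2, 3, 5)$        $+(+(2, 3), 5)$        $+(2, +(3, 5))$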
We will assume that all operators have at most two operands, and that, in case of doubt, they are
associative to the right. $2 + 3 + 5$ would then correspond to the rightmost tree, that is, $2 + (3 + 5)$.
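The right-associativity convention can be sketched as follows. This is an illustrative example only: the function name `binarize_right` and the nested-tuple encoding of trees are assumptions made here, not part of the paper.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (a string) or a tuple (operator, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def binarize_right(op: str, operands: List[Tree]) -> Tree:
    """Nest a flat list of operands into binary applications of `op`,
    associating to the right: a op b op c becomes op(a, op(b, c))."""
    if len(operands) < 2:
        raise ValueError("a binary operator needs at least two operands")
    if len(operands) == 2:
        return (op, operands[0], operands[1])
    return (op, operands[0], binarize_right(op, operands[1:]))


print(binarize_right("+", ["2", "3", "5"]))  # ('+', '2', ('+', '3', '5'))
```

With this convention, a flat sum always maps to a single tree, which is what makes the expression-to-tree correspondence unambiguous.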
Second, the distinction between internal nodes (operators) and leaves (mathematical primitive objects)
is somewhat arbitrary. For instance, the number $-2$ could be represented as a basic object, or as a
unary minus operator applied to the number $2$. Similarly, there are several ways to represent
$\sqrt{5}$, $42x^5$, or the function $\log_{10}$. For simplicity, we only consider numbers, constants and
variables as possible leaves, and avoid using a unary minus. In particular, expressions like $-x$ are
represented as $-1 \times x$. Here are the trees for $-2$, $\sqrt{5}$, $42x^5$ and $-x$ (each written as an
operator followed by its children):
$-2$        $\mathrm{sqrt}(5)$        $\times(42, \mathrm{pow}(x, 5))$        $\times(-1, x)$
Integers are represented in positional notation, as a sign followed by a sequence of digits (from $0$ to
$9$ in base $10$). For instance, $2354$ and $-34$ are represented as + 2 3 5 4 and - 3 4. For zero, a unique
representation is chosen (+ 0 or - 0).
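The integer encoding above can be sketched in a few lines of Python. The function names `encode_int` and `decode_int` are hypothetical, and the choice of + 0 as the unique representation of zero is an assumption made for this example (the text only requires that one of the two be fixed).

```python
from typing import List


def encode_int(value: int) -> List[str]:
    """Encode an integer as a sign token followed by its base-10 digits."""
    # Assumption: zero takes the '+' sign, so its unique encoding is ['+', '0'].
    sign = "-" if value < 0 else "+"
    return [sign] + list(str(abs(value)))


def decode_int(tokens: List[str]) -> int:
    """Invert encode_int: read the sign token, then the digit tokens."""
    sign = -1 if tokens[0] == "-" else 1
    return sign * int("".join(tokens[1:]))


print(encode_int(2354))  # ['+', '2', '3', '5', '4']
print(encode_int(-34))   # ['-', '3', '4']
print(encode_int(0))     # ['+', '0']
```

Splitting numbers into digit tokens keeps the vocabulary small and fixed, whatever the magnitude of the integers involved.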
B MATHEMATICAL DERIVATIONS OF THE PROBLEM SPACE SIZE
In this section, we investigate the size of the problem space by computing the number of expressions
with $n$ internal nodes. We first deal with the simpler case where we only have binary operators
($p_1 = 0$), then consider trees and expressions composed of unary and binary operators. In each case,
we calculate a generating function (Flajolet & Sedgewick, 2009; Wilf, 2005) from which we derive a
closed formula or recurrence on the number of expressions, and an asymptotic expansion.
B.1 BINARY TREES AND EXPRESSIONS
The main part of this derivation follows Knuth (1997), pages 388-389.
Generating function
Let $b_n$ be the number of binary trees with $n$ internal nodes. We have $b_0 = 1$ and $b_1 = 1$. Any
binary tree with $n$ internal nodes can be generated by concatenating a left and a right subtree with
$k$ and $n - 1 - k$ internal nodes respectively. By summing over all possible values of $k$, we have that:
$$b_n = b_0 b_{n-1} + b_1 b_{n-2} + \cdots + b_{n-2} b_1 + b_{n-1} b_0$$
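The recurrence can be checked numerically with a short sketch. The function name `binary_tree_counts` is hypothetical; the code simply evaluates the sum above for $k = 0, \dots, n - 1$, starting from $b_0 = 1$.

```python
from typing import List


def binary_tree_counts(n_max: int) -> List[int]:
    """Return [b_0, ..., b_n_max], where b_n counts binary trees with n internal nodes."""
    b = [1]  # b_0 = 1: the tree reduced to a single leaf
    for n in range(1, n_max + 1):
        # The root uses one internal node; the left subtree gets k of the
        # remaining n - 1 internal nodes and the right subtree gets n - 1 - k.
        b.append(sum(b[k] * b[n - 1 - k] for k in range(n)))
    return b


print(binary_tree_counts(6))  # [1, 1, 2, 5, 14, 42, 132]
```

The values produced are the Catalan numbers, which is the sequence this recurrence is known to define.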
Let $B(z)$ be the generating function of $b_n$,
$$B(z) = b_0 + b_1 z + b_2 z^2 + b_3 z^3 + \dots$$