leverage integration by parts: given two randomly generated functions $F$ and $G$, we compute their respective derivatives $f$ and $g$. If $fG$ already belongs to the training set, we know its integral, and we can compute the integral of $Fg$ as:
$$\int Fg = FG - \int fG$$
Similarly, if $Fg$ is in the training set, we can infer the integral of $fG$. Whenever we discover the integral of a new function, we add it to the training set. If neither $fG$ nor $Fg$ is in the training set, we simply generate new functions $F$ and $G$. With this approach, we can generate the integrals of functions like $x^{10}\sin(x)$ without resorting to an external symbolic integration system.
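The procedure above can be sketched in Python with SymPy. This is a minimal illustration under our own assumptions, not the authors' generator: the training set is modelled as a dict `known` mapping integrands to antiderivatives, the helper names `ibp_step` and `generate_ibp` are hypothetical, and $F$ and $G$ are drawn from a small hand-written pool rather than from the random expression generator used in the paper. `sp.expand` serves as a rough canonical form so that equal products map to the same dictionary key.

```python
import random

import sympy as sp

x = sp.Symbol("x")


def ibp_step(F, G, known):
    """Try one integration-by-parts step (hypothetical helper).

    `known` maps integrands to antiderivatives (the training set so far).
    Since (FG)' = fG + Fg, an antiderivative of one product gives one of
    the other: int Fg = FG - int fG, and symmetrically.
    Returns the newly integrated function, or None if nothing new was learned.
    """
    f, g = sp.diff(F, x), sp.diff(G, x)
    fG, Fg = sp.expand(f * G), sp.expand(F * g)
    if fG in known and Fg not in known:
        known[Fg] = sp.expand(F * G - known[fG])
        return Fg
    if Fg in known and fG not in known:
        known[fG] = sp.expand(F * G - known[Fg])
        return fG
    return None  # neither product is usable: the caller draws a new (F, G)


def generate_ibp(pool, known, steps, seed=0):
    """Draw random (F, G) pairs from a hypothetical `pool`, keeping each new integral."""
    rng = random.Random(seed)
    new = []
    for _ in range(steps):
        F, G = rng.choice(pool), rng.choice(pool)
        found = ibp_step(F, G, known)
        if found is not None:
            new.append(found)
    return new
```

For instance, seeding `known` with `{sp.sin(x): -sp.cos(x)}` and calling `ibp_step(x, sp.sin(x), known)` adds $\int x\cos(x) = x\sin(x) + \cos(x)$: here $f = 1$, $g = \cos(x)$, and $fG = \sin(x)$ is already known. SymPy is only used for differentiation and expansion; no symbolic integration routine is called.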
Comparing different generation methods.
Table 1 in Section 4.1 summarizes the differences between the three generation methods. The FWD method tends to generate short problems with long solutions (which computer algebra systems can solve). The BWD approach, on the other hand, generates long problems with short solutions. IBP generates datasets comparable to FWD (short problems and long solutions), without an external computer algebra system. A mixture of BWD- and IBP-generated data should therefore provide a better representation of the problem space, without resorting to external tools. Examples of functions and integrals for the three approaches are given in Table 9 of the Appendix.
3.2 FIRST ORDER DIFFERENTIAL EQUATION (ODE 1)
We now present a method to generate first order differential equations with their solutions. We start from a bivariate function $F(x, y)$ such that the equation $F(x, y) = c$ (where $c$ is a constant) can be analytically solved in $y$. In other words, there exists a bivariate function $f$ that satisfies $\forall (x, c),\ F\big(x, f(x, c)\big) = c$. By differentiation with respect to $x$, we have that $\forall x, c$:

$$\frac{\partial F(x, f_c(x))}{\partial x} + f_c'(x)\,\frac{\partial F(x, f_c(x))}{\partial y} = 0$$

where $f_c = x \mapsto f(x, c)$. As a result, for any constant $c$, $f_c$ is a solution of the first order differential equation:

$$\frac{\partial F(x, y)}{\partial x} + y'\,\frac{\partial F(x, y)}{\partial y} = 0 \tag{3}$$
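As an illustrative instance (our own choice of $F$, not an example taken from the generated dataset), take $F(x, y) = x + \ln y$. Solving $F(x, y) = c$ in $y$ gives $f(x, c) = e^{c - x}$, and equation (3) specializes as follows:

$$
\begin{aligned}
\frac{\partial F}{\partial x} = 1, \quad \frac{\partial F}{\partial y} = \frac{1}{y}
&\;\Longrightarrow\; 1 + \frac{y'}{y} = 0, \\
f_c(x) = e^{c - x}, \quad f_c'(x) = -e^{c - x}
&\;\Longrightarrow\; 1 + \frac{-e^{c - x}}{e^{c - x}} = 0.
\end{aligned}
$$

The constant $c$ does not appear in the resulting equation (equivalently $y' + y = 0$ for $y \neq 0$), so every member of the family $f_c$ solves it.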
With this approach, we can use the method described in Section C of the appendix to generate arbitrary functions $F(x, y)$ analytically solvable in $y$, and create a dataset of differential equations with their solutions.
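The construction of equation (3) can likewise be sketched with SymPy. This is an illustration under stated assumptions rather than the paper's pipeline: the symbol `yp` is our stand-in for $y'$, the helper names are hypothetical, and a hand-picked list of functions $F$ replaces the generator of Section C of the appendix. Each $F$ is solved for $y$ with `sp.solve`, and every resulting branch $f_c$ is checked against the generated equation by substitution.

```python
import sympy as sp

x, y, c = sp.symbols("x y c")
yp = sp.Symbol("yp")  # assumption: a plain symbol standing for y'(x)


def ode_from_invariant(F):
    """Left-hand side of equation (3): dF/dx + y' * dF/dy (the ODE is lhs = 0)."""
    return sp.diff(F, x) + yp * sp.diff(F, y)


def solve_invariant(F):
    """Solve F(x, y) = c for y, giving f_c(x) (possibly several branches)."""
    return sp.solve(sp.Eq(F, c), y)


def is_solution(lhs, sol):
    """Substitute y = sol(x) and y' = sol'(x); check the residual vanishes."""
    residual = lhs.subs({yp: sp.diff(sol, x), y: sol})
    return sp.simplify(residual) == 0


def make_dataset(candidates):
    """Pair each generated ODE with its solution.

    `candidates` is a hypothetical hand-picked list of F(x, y) standing in
    for the Appendix C generator; branches that fail the check are skipped.
    """
    data = []
    for F in candidates:
        lhs = ode_from_invariant(F)
        for sol in solve_invariant(F):
            if is_solution(lhs, sol):
                data.append((sp.Eq(lhs, 0), sol))
    return data
```

For example, `make_dataset([x * y])` should pair the equation $y + x\,y' = 0$ with the solution $c/x$. Keeping only branches that pass `is_solution` is a defensive design choice: when $F(x, y) = c$ has several solutions in $y$, each one is verified independently before being added to the dataset.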