From improper to proper learning
Our techniques offer a general scheme for converting improper learning algorithms to proper algorithms.
In particular, our approach applies to any parametric family of distributions that is well approximated by a piecewise polynomial in which the parameters appear polynomially and the breakpoints depend polynomially (or rationally) on the parameters.
As a result, we can convert purely approximation-theoretic results into proper learning algorithms for other classes of distributions, such as mixtures of Laplace or exponential distributions.
Conceptually, we show how to
approach proper learning as a purely deterministic optimization problem once a good density estimate is
available.
Hence our approach differs from essentially all previous proper learning algorithms, which use
probabilistic arguments in order to learn a mixture of Gaussians.
1.4 Techniques
At its core, our algorithm fits a mixture of Gaussians to a density estimate. In order to obtain an accurate and agnostic density estimate, we invoke recent work that has a time and sample complexity of $\widetilde{O}(k^2)$ [ADLS15]. The density estimate produced by their algorithm has the form of a piecewise polynomial with $O(k)$ pieces, each of which has degree $O(\log(1/\epsilon))$. It is important to note that our algorithm does not draw any further samples after obtaining this density estimate: the process of fitting a mixture of Gaussians is entirely deterministic.
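As an illustration, a density estimate of this form can be represented as a list of interval/coefficient pairs and evaluated pointwise. The following sketch is purely illustrative; the intervals and coefficients are hypothetical stand-ins, not the actual [ADLS15] data structure:

```python
# Hypothetical piecewise polynomial density estimate: O(k) pieces,
# each a low-degree polynomial given by its coefficient list
# (coeffs[i] is the coefficient of x**i). Values are made up.
pieces = [
    ((float("-inf"), -1.0), [0.0]),      # zero tail
    ((-1.0, 1.0), [0.35, 0.0, -0.15]),   # 0.35 - 0.15 x^2 on [-1, 1)
    ((1.0, float("inf")), [0.0]),        # zero tail
]

def eval_density(x):
    """Evaluate the piecewise polynomial estimate at a single point."""
    for (lo, hi), coeffs in pieces:
        if lo <= x < hi:
            return sum(c * x**i for i, c in enumerate(coeffs))
    return 0.0
```

For example, `eval_density(0.5)` returns 0.35 - 0.15 * 0.25 = 0.3125 (up to floating point), and both tails evaluate to zero.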
Once we have obtained a good density estimate, the task of proper learning reduces to fitting a mixture of $k$ Gaussians to the density estimate. We achieve this via a further reduction from fitting a GMM to solving a carefully designed system of polynomial inequalities. We then solve the resulting system with Renegar's algorithm [Ren92a, Ren92b]. This reduction to a system of polynomial inequalities is our main technical contribution and relies on the following techniques.
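To make the "deterministic optimization" viewpoint concrete, here is a toy analogue of the fitting step: a brute-force search that fits a single Gaussian to a fixed density estimate by minimizing a discrete $L_1$ distance. This is only an illustrative sketch under made-up parameters; the actual algorithm encodes the problem as a system of polynomial inequalities and solves it with Renegar's algorithm, not grid search.

```python
import math

def gauss_pdf(x, mu, sigma):
    # Gaussian density with mean mu and standard deviation sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Stand-in for the density estimate: a Gaussian with hypothetical
# parameters mu = 0.5, sigma = 1.2, sampled on a fixed grid.
xs = [-4 + 0.01 * i for i in range(801)]
target = [gauss_pdf(x, 0.5, 1.2) for x in xs]

def l1_error(mu, sigma):
    # Discrete proxy for the L1 distance to the density estimate
    return sum(abs(gauss_pdf(x, mu, sigma) - t) for x, t in zip(xs, target))

# Deterministic search over a parameter grid -- no further samples are drawn.
candidates = [(-2 + 0.1 * i, 0.5 + 0.05 * j) for i in range(41) for j in range(31)]
best_mu, best_sigma = min(candidates, key=lambda p: l1_error(*p))
```

Since the target's parameters lie on the search grid, the search recovers them exactly; the point is only that, once the density estimate is fixed, no randomness is involved.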
Shape-restricted polynomials
Ideally, one could directly fit a mixture of Gaussian pdfs to the density estimate. However, this is a challenging task because the Gaussian pdf $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ is not convex in the parameters $\mu$ and $\sigma$. Thus fitting a mixture of Gaussians is a nonconvex problem.
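The non-convexity is easy to verify numerically: for a fixed point $x$, the map $(\mu, \sigma) \mapsto \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ violates the midpoint inequality that convex functions must satisfy. A minimal check, where the specific parameter choices are just one illustrative counterexample:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Gaussian density with mean mu and standard deviation sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Two parameter points (mu, sigma) and their midpoint
a, b = (-1.0, 1.0), (1.0, 1.0)
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # (0.0, 1.0)

x = 0.0
f_mid = gauss_pdf(x, *mid)                         # about 0.399
f_avg = (gauss_pdf(x, *a) + gauss_pdf(x, *b)) / 2  # about 0.242

# Convexity in (mu, sigma) would require f_mid <= f_avg; here f_mid > f_avg.
```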
Instead of fitting mixtures of Gaussians directly, we use the notion of a shape-restricted polynomial. We say that a polynomial is shape-restricted if its coefficients lie in a given semialgebraic set, i.e., a set defined by a finite number of polynomial equalities and inequalities. It is well-known in approximation theory that a single Gaussian can be approximated by a piecewise polynomial consisting of three pieces with degree at most $O(\log(1/\epsilon))$ [Tim63]. So instead of fitting a mixture of $k$ Gaussians directly, we fit a mixture of $k$ shape-restricted piecewise polynomials. By encoding that the shape-restricted polynomials must have the shape of Gaussian pdfs, we ensure that the mixture of shape-restricted piecewise polynomials found by the system of polynomial inequalities is close to a true mixture of $k$ Gaussians. After we have solved the system