
From improper to proper learning Our techniques offer a general scheme for converting improper learning algorithms into proper ones. In particular, our approach applies to any parametric family of distributions that is well approximated by a piecewise polynomial in which the parameters appear polynomially and the breakpoints depend polynomially (or rationally) on the parameters. As a result, we can convert purely approximation-theoretic results into proper learning algorithms for other classes of distributions, such as mixtures of Laplace or exponential distributions. Conceptually, we show how to
approach proper learning as a purely deterministic optimization problem once a good density estimate is available. Hence our approach differs from essentially all previous proper learning algorithms, which use probabilistic arguments in order to learn a mixture of Gaussians.

1.4 Techniques

At its core, our algorithm fits a mixture of Gaussians to a density estimate. In order to obtain an ε-accurate and agnostic density estimate, we invoke recent work that has a time and sample complexity of Õ(k/ε²) [ADLS15]. The density estimate produced by their algorithm has the form of a piecewise polynomial with O(k) pieces, each of which has degree O(log(1/ε)). It is important to note that our algorithm does not draw any further samples after obtaining this density estimate; the process of fitting a mixture of Gaussians is entirely deterministic.

Once we have obtained a good density estimate, the task of proper learning reduces to fitting a mixture of k Gaussians to the density estimate. We achieve this via a further reduction from fitting a GMM to solving a carefully designed system of polynomial inequalities. We then solve the resulting system with Renegar's algorithm [Ren92a, Ren92b]. This reduction to a system of polynomial inequalities is our main technical contribution and relies on the following techniques.

Shape-restricted polynomials Ideally, one could directly fit a mixture of Gaussian pdfs to the density estimate. However, this is a challenging task because the Gaussian pdf

    1/(σ√(2π)) · exp(−(x − μ)² / (2σ²))

is not convex in the parameters μ and σ. Thus fitting a mixture of Gaussians is a non-convex problem. Instead of fitting mixtures of Gaussians directly, we use the notion of a shape-restricted polynomial. We say that a polynomial is shape restricted if its coefficients are in a given semialgebraic set, i.e., a set defined by a finite number of polynomial equalities and inequalities.
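The non-convexity of the Gaussian pdf in its parameters is easy to witness numerically. The sketch below (an illustration only, not part of the algorithm) fixes x = 0 and σ = 1, views the pdf as a function of μ alone, and exhibits a violation of midpoint convexity:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at the point x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Fix x = 0 and sigma = 1, and view the pdf as a function of the parameter mu.
f = lambda mu: gaussian_pdf(0.0, mu, 1.0)

mu0, mu1 = 0.0, 2.0
midpoint_value = f((mu0 + mu1) / 2)   # value of f at the midpoint mu = 1
chord_value = (f(mu0) + f(mu1)) / 2   # average of f at the endpoints

# A convex function satisfies f(midpoint) <= chord value; the Gaussian pdf
# violates this as a function of mu, so it is not convex in its parameters.
assert midpoint_value > chord_value
```

The same kind of violation occurs as a function of σ, which is why fitting even a single Gaussian by directly optimizing over (μ, σ) is a non-convex problem.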
It is well known in approximation theory that a single Gaussian can be approximated by a piecewise polynomial consisting of three pieces, each with degree at most O(log(1/ε)) [Tim63]. So instead of fitting a mixture of k Gaussians directly, we fit a mixture of k shape-restricted piecewise polynomials. By encoding that the shape-restricted polynomials must have the shape of Gaussian pdfs, we ensure that the mixture of shape-restricted piecewise polynomials found by the system of polynomial inequalities is close to a true mixture of k Gaussians. After we have solved the system
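As a crude illustration of such a three-piece approximation, the sketch below approximates the standard Gaussian pdf by zero on the two tail pieces and by a truncated Taylor polynomial of exp(−x²/2) on the middle piece. The specific degree and cutoff here are illustrative choices, not the construction of [Tim63], but the degree-versus-accuracy trade-off has the same logarithmic flavor:

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def gaussian_pdf(x):
    """Standard normal density at x."""
    return math.exp(-x * x / 2) / SQRT_2PI

def three_piece_approx(x, degree=60, cutoff=4.0):
    """Three-piece approximation of the standard normal density:
    zero on the tails (-inf, -cutoff) and (cutoff, inf), and on
    [-cutoff, cutoff] the truncated Taylor series of exp(-x^2/2),
    which is a polynomial in x of the given degree."""
    if abs(x) > cutoff:
        return 0.0
    u = x * x / 2
    total, term = 0.0, 1.0
    # sum_{j=0}^{degree/2} (-u)^j / j!; the j-th term has degree 2j in x.
    for j in range(degree // 2 + 1):
        total += term
        term *= -u / (j + 1)
    return total / SQRT_2PI

# The tails contribute at most gaussian_pdf(cutoff) ~ 1.3e-4 in sup norm,
# and the Taylor remainder on the middle piece is far smaller.
grid = [i / 100 for i in range(-500, 501)]
max_err = max(abs(gaussian_pdf(x) - three_piece_approx(x)) for x in grid)
assert max_err < 1e-3
```

Raising the cutoff and the degree drives the error down geometrically, which is the mechanism behind the O(log(1/ε)) degree bound quoted above.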