Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)



Sparse Perceptron Decision Tree for Millions of Dimensions

Weiwei Liu and Ivor W. Tsang
Centre for Quantum Computation and Intelligent Systems
University of Technology Sydney, Australia

Corresponding author. Copyright © 2016, Association for the Advancement of Artificial Intelligence. All rights reserved.

Abstract

Due to the nonlinear but highly interpretable representations, decision tree (DT) models have significantly attracted a lot of attention of researchers. However, DT models usually suffer from the curse of dimensionality and achieve degenerated performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of sparse perceptron decision node (SPDN) with a budget constraint on the weight coefficients, and propose a sparse perceptron decision tree (SPDT) algorithm to achieve nonlinear prediction performance. To avoid generating an unstable and complicated decision tree and improve the generalization of the SPDT, we present a pruning strategy by learning classifiers to minimize cross-validation errors on each SPDN. Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features.

Introduction

Due to the nonlinear, but highly interpretable representations (Murthy, Kasif, and Salzberg 1994), decision tree (DT) (Moret 1982; Safavian and Landgrebe 1991) has been a particularly powerful tool in various machine learning and data mining applications, ranging from Agriculture (McQueen et al. 1995), Astronomy (Salzberg et al. 1995) and Molecular Biology (Shimozono et al. 1994) to Financial analysis (Mezrich 1994). Many early works focus on the DT, in which each node uses the value of a single feature (attribute) (Breiman et al. 1984; Quinlan 1993) for decision. Since the decision of those methods at each node is equivalent to an axis-parallel hyperplane in the feature space, they are called axis-parallel DT, which suffer from the curse of dimensionality. To improve the performance of axis-parallel DT, researchers developed perceptron decision tree (PDT) (Bennett et al. 2000) in which the test at a node uses the linear combinations of features. Clearly, PDT is the general form of axis-parallel DT.

Many real-world problems, like documents and Microarray data (Guyon and Elisseeff 2003), are represented as high dimensional vectors. Many features in high dimensional space are usually non-informative or noisy. Those noisy features will decrease the generalization performance of DT and derogate from their promising results for dealing with
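
The relationship between axis-parallel and perceptron node tests described in the Introduction can be illustrated with a minimal Python sketch. All function names here are hypothetical and do not come from the paper. The sketch shows that a single-feature threshold test is a linear test whose weight vector has exactly one nonzero entry. The `keep_top_k` helper is only an assumed stand-in for the idea of limiting the number of nonzero weights; it is simple hard-thresholding, not the budget-constrained optimization the authors propose.

```python
from typing import List, Sequence, Tuple


def linear_node_test(x: Sequence[float], w: Sequence[float], b: float) -> bool:
    """Perceptron-style node test: go right when w . x + b > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def axis_parallel_as_linear(
    n_features: int, feature: int, threshold: float
) -> Tuple[List[float], float]:
    """Encode the axis-parallel test 'x[feature] > threshold' as (w, b).

    The weight vector has a single nonzero entry, which is why a linear
    node test generalizes the single-feature test.
    """
    w = [0.0] * n_features
    w[feature] = 1.0
    return w, -threshold


def keep_top_k(w: Sequence[float], k: int) -> List[float]:
    """Hypothetical sparsification: keep the k largest-magnitude weights.

    Assumption for illustration only; this is NOT the paper's method.
    """
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


if __name__ == "__main__":
    x = [0.5, 2.0, -1.0]
    w, b = axis_parallel_as_linear(n_features=3, feature=1, threshold=1.5)
    # Both forms of the test agree on this sample.
    print(linear_node_test(x, w, b), x[1] > 1.5)
    # A dense weight vector reduced to two nonzero coefficients.
    print(keep_top_k([0.1, -2.0, 0.7, 0.05], k=2))
```

With a sparse weight vector, each node test depends on only a few features, which is the property the abstract associates with resilience to noisy features.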
