Stat Comput
DOI 10.1007/s11222-015-9589-y

Extensions of stability selection using subsamples of observations and covariates

Andre Beinrucker¹ · Ürün Dogan² · Gilles Blanchard¹

Received: 8 July 2014 / Accepted: 23 June 2015
© Springer Science+Business Media New York 2015

¹ University of Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany
² Microsoft/Skype Labs, 2 Waterhouse Square, 140 Holborn, London EC1N2ST, UK

Abstract

We introduce extensions of stability selection, a method to stabilise variable selection methods introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010). We propose to apply a base selection method repeatedly to random subsamples of observations and subsets of covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally we validate these extensions on numerical experiments on both synthetic and real datasets, and compare the obtained results in detail to the original stability selection method.

Electronic supplementary material: The online version of this article (doi: 10.1007/s11222-015-9589-y) contains supplementary material, which is available to authorized users.

A preliminary version of this work was presented at the conference DAGM 2012 (Beinrucker et al. 2012b).

Keywords: Variable selection · Stability selection · Subsampling

1 Introduction

1.1 Motivation

In many applications a very large number of covariates are observed, of which only a few carry information about an outcome of interest. Variable selection techniques aim at identifying such relevant covariates (for a review see Guyon 2006). Usually, variable selection aims at one of two goals: to identify informative covariates in order to get scientific insight into the data and the process that generated the outcome; or to use the covariates identified as relevant in order to predict the outcome. In this work we primarily focus on the identification of informative covariates but also consider prediction results using real data. We consider variable selection (also called feature selection in computer science-related communities) as a part of the broader field of dimensionality reduction.

Many variable selection methods share the common drawback of being unstable with respect to small changes of the data: if one estimates the set of relevant covariates on different sets of observations coming from the same source, the result can vary significantly. While this is not necessarily of concern if prediction is the goal, it makes the identification of relevant covariates very difficult. One approach to overcome this problem is stability selection (Meinshausen and
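
The procedure outlined in the abstract (apply a base selection method repeatedly to random subsamples of observations and random subsets of covariates, then rank covariates by selection frequency) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the paper: the function names, the base selector (keep the k covariates with largest absolute correlation to the outcome), the toy data, and the choice to normalise each covariate's count by the number of rounds in which it was drawn.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation of two equal-length lists (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_by_correlation(X, y, rows, cols, k):
    """Hypothetical base selector: among `cols`, keep the k covariates most
    correlated with y, using only the observations in `rows`."""
    ys = [y[i] for i in rows]
    ranked = sorted(
        cols,
        key=lambda j: abs_correlation([X[i][j] for i in rows], ys),
        reverse=True,
    )
    return ranked[:k]


def selection_frequencies(X, y, n_rounds=200, n_sub=None, p_sub=None, k=2, seed=0):
    """Run the base selector on random observation subsamples and covariate
    subsets; return, per covariate, (times selected) / (times drawn).
    The normalisation by times drawn is an assumption of this sketch."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = n_sub or n // 2   # half-samples by default
    p_sub = p_sub or p        # all covariates by default
    selected = [0] * p
    drawn = [0] * p
    for _ in range(n_rounds):
        rows = rng.sample(range(n), n_sub)  # subsample of observations
        cols = rng.sample(range(p), p_sub)  # subset of covariates
        for j in cols:
            drawn[j] += 1
        for j in top_k_by_correlation(X, y, rows, cols, k):
            selected[j] += 1
    return [s / d if d else 0.0 for s, d in zip(selected, drawn)]


def make_toy_data(n=100, p=10, seed=1):
    """Toy data (illustrative only): y depends on covariates 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [row[0] - row[1] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    freq = selection_frequencies(X, y, n_sub=50, p_sub=5, k=2)
    for j, f in enumerate(freq):
        print(f"covariate {j}: selection frequency {f:.2f}")
```

In this toy setting one would keep the covariates whose frequency exceeds a chosen threshold. Setting `p_sub` equal to the number of covariates and `n_sub` to half the sample size corresponds to the half-sample setting that the abstract attributes to the original method.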