9
Additive Models, Trees, and Related Methods
In this chapter we begin our discussion of some specific methods for supervised learning. These techniques each assume a (different) structured form for the unknown regression function, and by doing so they finesse
2
Overview of Supervised Learning
2.1 Introduction
The first three examples described in Chapter 1 have several components in common. For each there is a set of variables that might be denoted as inputs, which are measured or preset. These have some influence
3
Linear Methods for Regression
3.1 Introduction
A linear regression model assumes that the regression function E(Y|X) is linear in the inputs X1, . . . , Xp. Linear models were largely developed in the precomputer age of statistics, but even in today
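As a minimal sketch of the linear model idea (not from the text), the coefficients can be estimated by least squares; here on synthetic data with a known generating model, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = 4.0 + X @ beta_true + 0.1 * rng.normal(size=N)

# augment with an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(N), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

With N = 100 observations and little noise, the estimate recovers the intercept 4.0 and the slopes closely.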
5
Basis Expansions and Regularization
5.1 Introduction
We have already made use of models linear in the input features, both for regression and classification. Linear regression, linear discriminant analysis, logistic regression and separating hyperplanes a
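A basis expansion keeps the machinery of linear fitting while gaining flexibility: replace the input x by transformed features h(x) and fit a model linear in those. A small illustrative sketch (synthetic data, cubic polynomial basis; an assumption of mine, not an example from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = 1.0 + 2.0 * x - 3.0 * x**3 + 0.05 * rng.normal(size=200)

# expand the single input into a cubic basis h(x) = (1, x, x^2, x^3),
# then fit by ordinary least squares in the expanded space
H = np.column_stack([np.ones_like(x), x, x**2, x**3])
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
```

The fit is nonlinear in x but still linear in the basis functions, so the usual least-squares theory applies unchanged.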
4
Linear Methods for Classification
4.1 Introduction
In this chapter we revisit the classification problem and focus on linear methods for classification. Since our predictor G(x) takes values in a discrete set G, we can always divide the input space into a co
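One simple way to get a linear division of the input space, sketched here on synthetic two-class data (my own illustration, not from the text): regress a 0/1 class indicator on the inputs and classify by thresholding the fit at 1/2, so the decision boundary is the line where the fitted value equals 1/2.

```python
import numpy as np

rng = np.random.default_rng(2)
# two Gaussian classes in the plane, coded as 0/1
Xa = rng.normal(loc=[-1, -1], size=(100, 2))
Xb = rng.normal(loc=[1, 1], size=(100, 2))
X = np.vstack([Xa, Xb])
y = np.r_[np.zeros(100), np.ones(100)]

# linear regression of the indicator; the set {x : b0 + b.x = 1/2}
# is a straight line dividing the plane into two labelled regions
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
Ghat = (A @ b > 0.5).astype(int)
acc = np.mean(Ghat == y)
```

For well-separated Gaussian classes this crude rule already classifies most training points correctly.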
HOMEWORK #2 DUE WEDNESDAY, JULY 14
STATISTICS 132, SUMMER 2010
Question 1: HTF 3.3 (b). You may use without proof the result of part (a) of HTF 3.3.
Question 2: HTF 3.6
Question 3: HTF 3.7
Question 4: HTF 3.9
Question 5: HTF 3.12
Question 6: HTF 3.17. Set
18
High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N
In this chapter we discuss prediction problems in which the number of features p is much larger than the number of observations N, often written p ≫ N. Such problems have become of increa
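When p ≫ N, ordinary least squares is ill-posed because X′X is singular; regularization restores a unique solution. A minimal ridge-regression sketch on synthetic sparse data (my own illustration, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 500                       # many more features than observations
X = rng.normal(size=(N, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # only a few features carry signal
y = X @ beta + 0.1 * rng.normal(size=N)

# ridge regression: adding lam * I makes the normal equations solvable
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
train_mse = np.mean((X @ beta_ridge - y) ** 2)
```

The shrunken estimate fits the training data well, and the five true signal coefficients stand out from the noise coordinates on average.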
17
Undirected Graphical Models
17.1 Introduction
A graph consists of a set of vertices (nodes), along with a set of edges joining some pairs of the vertices. In graphical models, each vertex represents a random variable, and the graph gives a visual way o
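A concrete sketch of this vertex/edge view, with the separation idea that makes it useful for conditional independence (my own illustration on a toy chain graph, not from the text): in a Markov graph, two variables are conditionally independent given a set S that blocks every path between them.

```python
from collections import defaultdict

# a graph as a vertex set and an edge set; each vertex is a random variable
vertices = {"X1", "X2", "X3", "X4"}
edges = {("X1", "X2"), ("X2", "X3"), ("X3", "X4")}

# adjacency map: the neighbours of each vertex
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def separates(S, u, v):
    # breadth-first search that never enters S: True if u cannot reach v,
    # i.e. S blocks every path between u and v
    frontier, seen = [u], {u} | set(S)
    while frontier:
        node = frontier.pop()
        for nb in adj[node] - seen:
            if nb == v:
                return False
            seen.add(nb)
            frontier.append(nb)
    return True
```

On the chain X1 – X2 – X3 – X4, the single vertex X2 separates X1 from X4, while the empty set does not.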
16
Ensemble Learning
16.1 Introduction
The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models. We have already seen a number of examples that fall into this category. Bagging in Secti
15
Random Forests
15.1 Introduction
Bagging or bootstrap aggregation (Section 8.7) is a technique for reducing the variance of an estimated prediction function. Bagging seems to work especially well for high-variance, low-bias procedures, such as trees. For
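A minimal bagging sketch (my own illustration, not from the text): fit a one-split regression stump, a high-variance procedure, on B bootstrap samples and average the predictions. The average is much smoother than any single stump.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=100)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=100)

def fit_stump(x, y):
    # choose the single split point minimizing the residual sum of squares
    best = (np.inf, 0.5, 0.0, 0.0)
    for s in np.unique(x):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        rss = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if rss < best[0]:
            best = (rss, s, left.mean(), right.mean())
    return best[1:]

def predict_stump(params, x):
    s, lo, hi = params
    return np.where(x <= s, lo, hi)

# bagging: refit on B bootstrap samples, then average the B predictions
B = 50
stumps = []
for _ in range(B):
    idx = rng.integers(0, len(x), len(x))
    stumps.append(fit_stump(x[idx], y[idx]))

xgrid = np.linspace(0, 1, 200)
bagged = np.mean([predict_stump(p, xgrid) for p in stumps], axis=0)
single = predict_stump(fit_stump(x, y), xgrid)
```

A single stump takes only two values; the bagged average takes many, tracking the curve far more closely.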
13
Prototype Methods and Nearest-Neighbors
13.1 Introduction
In this chapter we discuss some simple and essentially model-free methods for classification and pattern recognition. Because they are highly unstructured, they typically are not useful for unders
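The archetypal model-free method here is k-nearest-neighbors: classify a query point by majority vote among the k closest training points. A compact sketch on synthetic data (my own illustration, assuming NumPy and Euclidean distance):

```python
import numpy as np

rng = np.random.default_rng(5)
Xa = rng.normal(loc=[-1, 0], size=(100, 2))
Xb = rng.normal(loc=[1, 0], size=(100, 2))
X = np.vstack([Xa, Xb])
y = np.r_[np.zeros(100, int), np.ones(100, int)]

def knn_predict(Xtr, ytr, Xte, k=15):
    # majority vote among the k nearest training points
    d2 = ((Xte[:, None, :] - Xtr[None, :, :])**2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

pred = knn_predict(X, y, np.array([[-1.5, 0.0], [1.5, 0.0]]))
```

No model is fit at all; the training data themselves are the "prototype" set, which is exactly why the method is hard to interpret but often accurate.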
12
Support Vector Machines and Flexible Discriminants
12.1 Introduction
In this chapter we describe generalizations of linear decision boundaries for classification. Optimal separating hyperplanes are introduced in Chapter 4 for the case when two classes ar
11
Neural Networks
11.1 Introduction
In this chapter we describe a class of learning methods that was developed separately in different fields, statistics and artificial intelligence, based on essentially identical models. The central idea is to extract linear comb
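The central idea can be sketched as a single-hidden-layer network: derived features Z_m are sigmoid-transformed linear combinations of the inputs, and the output is then modeled as a function of the Z_m. A forward pass with arbitrary weights (my own illustration; the weight names alpha, beta follow the usual single-layer notation):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(6)
p, M = 4, 3                          # p inputs, M hidden units
alpha = rng.normal(size=(M, p))      # weights defining the derived features
alpha0 = rng.normal(size=M)
beta = rng.normal(size=M)            # output-layer weights
beta0 = rng.normal()

def forward(x):
    # derived features Z_m = sigma(alpha0_m + alpha_m . x),
    # then a linear model in the Z_m for the output
    Z = sigmoid(alpha0 + alpha @ x)
    return beta0 + beta @ Z

out = forward(np.ones(p))
```

Training would fit alpha and beta by gradient descent (back-propagation); only the forward computation is shown here.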
10
Boosting and Additive Trees
10.1 Boosting Methods
Boosting is one of the most powerful learning ideas introduced in the last twenty years. It was originally designed for classification problems, but as will be seen in this chapter, it can profitably be ext
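As a rough sketch of the original classification form, AdaBoost with decision stumps on one-dimensional two-class data (my own illustration, following the usual reweighting recipe: upweight misclassified points, then combine stumps by weighted vote):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=200)
y = np.where(X + 0.1 * rng.normal(size=200) > 0, 1, -1)   # labels in {-1, +1}

def best_stump(X, y, w):
    # exhaustive search over thresholds and signs for the weighted error
    best = (np.inf, 0.0, 1)
    for thr in np.unique(X):
        for sign in (1, -1):
            pred = sign * np.where(X > thr, 1, -1)
            err = np.sum(w * (pred != y))
            if err < best[0]:
                best = (err, thr, sign)
    return best

w = np.full(len(X), 1.0 / len(X))
stumps, alphas = [], []
for _ in range(10):                                       # 10 boosting rounds
    err, thr, sign = best_stump(X, y, w)
    err = np.clip(err, 1e-12, 1 - 1e-12)
    alpha = np.log((1 - err) / err)
    pred = sign * np.where(X > thr, 1, -1)
    w = w * np.exp(alpha * (pred != y))                   # upweight mistakes
    w = w / w.sum()
    stumps.append((thr, sign))
    alphas.append(alpha)

def predict(X):
    agg = sum(a * s * np.where(X > t, 1, -1)
              for a, (t, s) in zip(alphas, stumps))
    return np.sign(agg)

train_acc = np.mean(predict(X) == y)
```

Each weak stump is only slightly better than chance away from the boundary, yet the weighted committee classifies the bulk of the training points correctly.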
8
Model Inference and Averaging
8.1 Introduction
For most of this book, the fitting (learning) of models has been achieved by minimizing a sum of squares for regression, or by minimizing cross-entropy for classification. In fact, both of these minimizations a
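The two criteria side by side, on tiny hand-made fitted values (my own numbers, for illustration only): the sum of squares for a regression fit, and the cross-entropy (Bernoulli negative log-likelihood) for fitted class probabilities.

```python
import numpy as np

# regression: observed responses y and fitted values f
y = np.array([1.0, 2.0, 0.5])
f = np.array([0.8, 2.2, 0.4])
rss = np.sum((y - f) ** 2)                  # sum-of-squares criterion

# classification: 0/1 labels and fitted probabilities p
yc = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])
cross_entropy = -np.sum(yc * np.log(p) + (1 - yc) * np.log(1 - p))
```

Both are negative log-likelihoods up to constants: the first under a Gaussian error model, the second under a Bernoulli model, which is the sense in which the two fitting procedures are instances of one principle.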
7
Model Assessment and Selection
7.1 Introduction
The generalization performance of a learning method relates to its prediction capability on independent test data. Assessment of this performance is extremely important in practice, since it guides the cho
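The standard tool for estimating this generalization performance from the training data alone is K-fold cross-validation: hold out each fold in turn, fit on the rest, and average the held-out errors. A minimal sketch for a straight-line fit (my own illustration, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100
x = rng.uniform(-2, 2, size=N)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=N)

def cv_mse(x, y, K=5):
    # K-fold cross-validation estimate of prediction error for a line fit
    folds = np.array_split(rng.permutation(N), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        A = np.column_stack([np.ones(len(train)), x[train]])
        b, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = b[0] + b[1] * x[test]
        errs.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errs)

err = cv_mse(x, y)
```

Since the noise variance here is 0.25, the cross-validated mean squared error lands close to that floor; comparing such estimates across candidate models is how cross-validation guides model selection.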
6
Kernel Smoothing Methods
In this chapter we describe a class of regression techniques that achieve flexibility in estimating the regression function f(X) over the domain IR^p by fitting a different but simple model separately at each query point x0. This is
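The simplest instance of fitting separately at each query point is the Nadaraya-Watson kernel average: weight each observation by a kernel of its distance to x0 and take the weighted mean. A sketch on synthetic data (my own illustration, with a Gaussian kernel and a bandwidth I chose by hand):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 1, size=200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=200)

def kernel_smooth(x0, x, y, bandwidth=0.05):
    # Nadaraya-Watson: a weighted average of the y_i, with weights that
    # die off smoothly with distance from the query point x0
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

fhat = kernel_smooth(0.25, x, y)          # estimate f at x0 = 0.25
```

At x0 = 0.25 the true function value is sin(pi/2) = 1, and the local average lands near it; the bandwidth controls the bias-variance trade-off of this local fit.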