
# 2 Overview of Supervised Learning

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition, DOI: 10.1007/b94608_2

## 2.1 Introduction

The first three examples described in Chapter 1 have several components in common. For each there is a set of variables that might be denoted as *inputs*, which are measured or preset. These have some influence on one or more *outputs*. For each example the goal is to use the inputs to predict the values of the outputs. This exercise is called *supervised learning*.

We have used the more modern language of machine learning. In the statistical literature the inputs are often called the *predictors*, a term we will use interchangeably with inputs, and more classically the *independent variables*. In the pattern recognition literature the term *features* is preferred, which we use as well. The outputs are called the *responses*, or classically the *dependent variables*.

## 2.2 Variable Types and Terminology

The outputs vary in nature among the examples. In the glucose prediction example, the output is a *quantitative* measurement, where some measurements are bigger than others, and measurements close in value are close in nature. In the famous Iris discrimination example due to R. A. Fisher, the output is *qualitative* (species of Iris) and assumes values in a finite set $\mathcal{G} = \{\text{Virginica}, \text{Setosa}, \text{Versicolor}\}$. In the handwritten digit example the output is one of 10 different digit *classes*: $\mathcal{G} = \{0, 1, \dots, 9\}$. In both of these there is no explicit ordering in the classes, and in fact often descriptive labels rather than numbers are used to denote the classes. Qualitative variables are also referred to as *categorical* or *discrete* variables as well as *factors*.

For both types of outputs it makes sense to think of using the inputs to predict the output. Given some specific atmospheric measurements today and yesterday, we want to predict the ozone level tomorrow. Given the grayscale values for the pixels of the digitized image of the handwritten digit, we want to predict its class label.

This distinction in output type has led to a naming convention for the prediction tasks: *regression* when we predict quantitative outputs, and *classification* when we predict qualitative outputs. We will see that these two tasks have a lot in common, and in particular both can be viewed as a task in function approximation.

Inputs also vary in measurement type; we can have some of each of qualitative and quantitative input variables. These have also led to distinctions in the types of methods that are used for prediction: some methods are defined most naturally for quantitative inputs, some most naturally for qualitative and some for both.

A third variable type is *ordered categorical*, such as *small*, *medium* and *large*, where there is an ordering between the values, but no metric notion is appropriate (the difference between medium and small need not be the same as that between large and medium). These are discussed further in Chapter 4.