Introduction to Machine Learning with R
Rigorous Mathematical Analysis

Scott V. Burger

Beijing • Boston • Farnham • Sebastopol • Tokyo

Introduction to Machine Learning with R
by Scott V. Burger
Copyright © 2018 Scott Burger. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles. For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editors: Rachel Roumeliotis and Heather Scherer
Production Editor: Kristen Brown
Copyeditor: Bob Russell, Octal Publishing, Inc.
Proofreader: Jasmine Kwityn
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

March 2018: First Edition

Revision History for the First Edition
2018-03-08: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Introduction to Machine Learning with
R, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

978-1-491-97644-9

Table of Contents

Preface

1. What Is a Model?
    Algorithms Versus Models: What’s the Difference?
    A Note on Terminology
    Modeling Limitations
    Statistics and Computation in Modeling
    Data Training
    Cross-Validation
    Why Use R?
    The Good
    R and Machine Learning
    The Bad
    Summary

2. Supervised and Unsupervised Machine Learning
    Supervised Models
    Regression
    Training and Testing of Data
    Classification
    Logistic Regression
    Supervised Clustering Methods
    Mixed Methods
    Tree-Based Models
    Random Forests
    Neural Networks
    Support Vector Machines
    Unsupervised Learning
    Unsupervised Clustering Methods
    Summary

3. Sampling Statistics and Model Training in R
    Bias
    Sampling in R
    Training and Testing
    Roles of Training and Test Sets
    Why Make a Test Set?
    Training and Test Sets: Regression Modeling
    Training and Test Sets: Classification Modeling
    Cross-Validation
    k-Fold Cross-Validation
    Summary

4. Regression in a Nutshell
    Linear Regression
    Multivariate Regression
    Regularization
    Polynomial Regression
    Goodness of Fit with Data—The Perils of Overfitting
    Root-Mean-Square Error
    Model Simplicity and Goodness of Fit
    Logistic Regression
    The Motivation for Classification
    The Decision Boundary
    The Sigmoid Function
    Binary Classification
    Multiclass Classification
    Logistic Regression with Caret
    Summary
    Linear Regression
    Logistic Regression

5. Neural Networks in a Nutshell
    Single-Layer Neural Networks
    Building a Simple Neural Network by Using R
    Multiple Compute Outputs
    Hidden Compute Nodes
    Multilayer Neural Networks
    Neural Networks for Regression
    Neural Networks for Classification
    Neural Networks with caret
    Regression
    Classification
    Summary

6. Tree-Based Methods
    A Simple Tree Model
    Deciding How to Split Trees
    Tree Entropy and Information Gain
    Pros and Cons of Decision Trees
    Tree Overfitting
    Pruning Trees
    Decision Trees for Regression
    Decision Trees for Classification
    Conditional Inference Trees
    Conditional Inference Tree Regression
    Conditional Inference Tree Classification
    Random Forests
    Random Forest Regression
    Random Forest Classification
    Summary

7. Other Advanced Methods
    Naive Bayes Classification
    Bayesian Statistics in a Nutshell
    Application of Naive Bayes
    Principal Component Analysis
    Linear Discriminant Analysis
    Support Vector Machines
    k-Nearest Neighbors
    Regression Using kNN
    Classification Using kNN
    Summary

8. Machine Learning with the caret Package
    The Titanic Dataset
    Data Wrangling
    caret Unleashed
    Imputation
    Data Splitting
    caret Under the Hood
    Model Training
    Comparing Multiple caret Models
    Summary

A. Encyclopedia of Machine Learning Models in caret

Index

Preface

In this short introduction, I tackle a few key points.

Who Should Read This Book?
This book is ideally suited for people who have some working knowledge of the R
programming language. If you don’t have any knowledge of R, it’s an easy enough
language to pick up, and the code is readable enough that you can pretty much get
the gist of the code examples herein.

Scope of the Book
This book is an introductory text, so we don’t dive deeply into the mathematical
underpinnings of every algorithm covered. Presented here are enough of the details
for you to discern the difference between a neural network and, say, a random forest
at a high level.

Conventions Used in This Book
The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

O’Reilly Safari
Safari (formerly Safari Books Online) is a membership-based
training and reference platform for enterprise, government,
educators, and individuals.
Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.

How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information.

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website.

Acknowledgments
It’s always been a dream of mine to write a book. When I was in third or fourth grade, my ideal book to write would have been a talk show hosted by my stuffed-animal collection. I never thought at the time that I would develop the skills to one day be shedding light on the complex world of machine learning. Between then and now, so many things have happened that I need to take a moment to thank some people who have made this book possible in more ways than one: Allison Randal, Amanda Harris, Cristiano Sabiu, Dorothy Duffy, Elayne Britain, Filipe Abdalla, Heather Scherer, Ian Furniss, Kristen Brown, Kristen Larson, Marie Beaugureau, Max Winderbaum, Myrna Fant, Richard Fant, Robert Lippens, Will Wright, and Woody Ciskowski.

CHAPTER 1
What Is a Model?

There was a time in my undergraduate physics studies when I was excited to learn what
a model was. I remember the scene pretty well. We were in a Stars and Galaxies class,
getting ready to learn about atmospheric models that could be applied not only to the
Earth, but to other planets in the solar system as well. I knew enough about climate
models to know they were complicated, so I braced myself for an onslaught of math
that would take me weeks to parse. When we finally got to the meat of the subject, I
was kind of let down: I had already dealt with data models in the past and hadn’t even realized it!
Because models are a fundamental aspect of machine learning, perhaps it’s not sur‐
prising that this story mirrors how I learned to understand the field of machine
learning. During my graduate studies, I was on the fence about going into the finan‐
cial industry. I had heard that machine learning was being used extensively in that
world, and, as a lowly physics major, I felt like I would need to be more of a computa‐
tional engineer to compete. I came to a similar realization that not only was machine
learning not as scary of a subject as I originally thought, but I had indeed been using
it before. Since before high school, even!
Models are helpful because unlike dashboards, which offer a static picture of what the
data shows currently (or at a particular slice in time), models can go further and help
you understand the future. For example, someone who is working on a sales team
might only be familiar with reports that show a static picture. Maybe their screen is
always up to date with what the daily sales are. There have been countless dashboards
that I’ve seen and built that simply say “this is how many assets are in right now.” Or,
“this is what our key performance indicator is for today.” A report is a static entity
that doesn’t offer any intuition as to how the data evolves over time.
Figure 1-1 shows what a report might look like:

op <- par(mar = c(10, 4, 4, 2) + 0.1)  # margin formatting
barplot(mtcars$mpg, names.arg = row.names(mtcars), las = 2,
        ylab = "Fuel Efficiency in Miles per Gallon")

Figure 1-1. A distribution of vehicle fuel efficiency based on the built-in mtcars dataset found in R
Figure 1-1 depicts a plot of the mtcars dataset that comes prebuilt with R. The figure
shows a number of cars plotted by their fuel efficiency in miles per gallon. This report
isn’t very interesting. It doesn’t give us any predictive power. Seeing how the efficiency
of the cars is distributed is nice, but how can we relate that to other things in the data
and, moreover, make predictions from it?
A model is any sort of function that has predictive power.
So how do we turn this boring report into something more useful? How do we bridge
the gap between reporting and machine learning? Oftentimes the correct answer to
this is “more data!” That can come in the form of more observations of the same data
or by collecting new types of data that we can then use for comparison.
Let’s take a look at the built-in mtcars dataset that comes with R in more detail:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

By just calling the built-in object of mtcars within R, we can see all sorts of columns
in the data from which to choose to build a machine learning model. In the machine
learning world, columns of data are sometimes also called features. Now that we
know what we have to work with, we could try seeing if there’s a relationship between
the car’s fuel efficiency and any one of these features, as depicted in Figure 1-2:
pairs(mtcars[1:7], lower.panel = NULL)

Figure 1-2. A pairs plot of the mtcars dataset, focusing on the first seven columns
Each box is its own separate plot, for which the dependent variable is the text box at
the bottom of the column, and the independent variable is the text box at the begin‐
ning of the row. Some of these plots are more interesting for trending purposes than
others. None of the plots in the cyl row, for example, look like they lend themselves
easily to simple regression modeling.

In this example, we are plotting some of those features against others. The columns,
or features, of this data are defined as follows:
mpg
    Miles per US gallon
cyl
    Number of cylinders in the car’s engine
disp
    The engine’s displacement (or volume) in cubic inches
hp
    The engine’s horsepower
drat
    The vehicle’s rear axle ratio
wt
    The vehicle’s weight in thousands of pounds
qsec
    The vehicle’s quarter-mile race time
vs
    The vehicle’s engine cylinder configuration, where “V” is for a v-shaped engine and “S” is for a straight, inline design
am
    The transmission of the vehicle, where 0 is an automatic transmission and 1 is a manual transmission
gear
    The number of gears in the vehicle’s transmission
carb
    The number of carburetors used by the vehicle’s engine

You can read the upper-right plot as “mpg as a function of quarter-mile time,” for
example. Here we are mostly interested in something that looks like it might have
some kind of quantifiable relationship. This is up to the investigator to pick out what
patterns look interesting. Note that “mpg as a function of cyl” looks very different
from “mpg as a function of wt.” In this case, we focus on the latter, as shown in
Figure 1-3:
plot(y = mtcars$mpg, x = mtcars$wt, xlab = "Vehicle Weight",
     ylab = "Vehicle Fuel Efficiency in Miles per Gallon")

Figure 1-3. This plot is the basis for drawing a regression line through the data
Now we have a more interesting kind of dataset. We still have our fuel efficiency, but
now it is plotted against the weight of the respective cars in thousands of pounds. From this kind of
format of the data, we can extract a best fit to all the data points and turn this plot
into an equation. We’ll cover this in more detail in later chapters, but we use a func‐
tion in R to model the value we’re interested in, called a response, against other fea‐
tures in our dataset:
mt.model <- lm(formula = mpg ~ wt, data = mtcars)
coef(mt.model)[2]
##        wt
## -5.344472
coef(mt.model)[1]
## (Intercept)
##    37.28513

In this code chunk, we modeled the vehicle’s fuel efficiency (mpg) as a function of the
vehicle’s weight (wt) and extracted values from that model object to use in an equa‐
tion that we can write as follows:
Fuel Efficiency = −5.344 × Vehicle Weight + 37.285

Now if we wanted to know what the fuel efficiency was for any car, not just those in
the dataset, all we would need to input is its weight, and we would get a result. This is the benefit of a model. We have predictive power: given some kind of input (e.g., a weight), the model returns a value for any number we put in.
The model might have its limitations, but this is one way in which we can help to
expand the data beyond a static report into something more flexible and more
insightful. A given vehicle’s weight might not actually be predictive of the fuel effi‐
ciency as given by the preceding equation. There might be some error in the data or
the observation.
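To make the prediction step concrete, here is a short sketch (not one of the book’s own listings) that reuses the fitted model object; the 3,000-pound car is a made-up input:

```r
# Reuse the fitted linear model from above to predict the fuel
# efficiency of a hypothetical car weighing 3,000 pounds
# (wt is expressed in thousands of pounds, so wt = 3).
mt.model <- lm(formula = mpg ~ wt, data = mtcars)
predict(mt.model, newdata = data.frame(wt = 3))
##        1
## 21.25171
```

The predict() call simply evaluates the fitted equation, 37.285 − 5.344 × 3 ≈ 21.25 miles per gallon.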
You might have come across this kind of modeling procedure before in dealing with
the world of data. If you have, congratulations—you have been doing machine learn‐
ing without even knowing it! This particular type of machine learning model is called
linear regression. It’s much simpler than some other machine learning models like
neural networks, but the algorithms that make it work are certainly using machine
learning principles. Algorithms Versus Models: What’s the Difference?
Machine learning and algorithms can hardly be separated. Algorithms are another
subject that can seem impenetrably daunting at first, but they are actually quite sim‐
ple at their core, and you have probably been using them for a long time without real‐
izing it.
An algorithm is a set of steps performed in order.
That’s all an algorithm is. The algorithm for putting on your shoes might be some‐
thing like putting your toes in the open part of the shoe, and then pressing your foot
forward and your heel downward. The set of steps necessary to produce a machine
learning algorithm are more complicated than designing an algorithm for putting on
your shoes, of course, but one of the goals of this book is to explain the inner work‐
ings of the most widely used machine learning models in R by helping to simplify
their algorithmic processes.
The simplest algorithm for linear regression involves putting two points on a plot and
then drawing a line between them. You get the important parts of the equation (slope
and intercept) by taking the difference in the coordinates of those points with respect
to some origin. The algorithm becomes more complicated when you try to do the
same procedure for more than two points, however. That process involves more
equations that can be tedious to compute by hand for a human but very easy for a
processor in a computer to handle in microseconds.
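That two-point procedure can be written out in a few lines of R (the two points here are invented for illustration; this is a sketch of mine, not a listing from the book):

```r
# Two arbitrary points on a plane
x <- c(1, 3)
y <- c(2, 8)

# Slope is rise over run between the two points
slope <- (y[2] - y[1]) / (x[2] - x[1])

# Intercept follows from plugging either point back into y = slope * x + b
intercept <- y[1] - slope * x[1]

c(intercept = intercept, slope = slope)
## intercept     slope
##        -1         3
```

For more than two points, lm() carries out the least-squares generalization of this same calculation, which is what it did for the mtcars model earlier.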
A machine learning model like regression or clustering or a neural network relies on the workings of algorithms to run in the first place. Algorithms are the engine that underlies the simple R code that we run. They do all the heavy lifting of
multiplying matrices, optimizing results, and outputting a number for us to use.
There are many types of models in R, which span an entire ecosystem of machine
learning more generally. There are three major types of models: regression models,
classification models, and mixed models that are a combination of both. We’ve
already encountered a regression model. A classification model is different in that we
would be trying to take input data and arrange it according to a type, class, group, or
other discrete output. Mixed models might start with a regression model and then
use the output from that to help it classify other types of data. The reverse could be
true for other mixed models.
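As a rough sketch of how the two families differ in code (this example is mine, not the book’s; the am column, automatic versus manual transmission, serves as a stand-in discrete output):

```r
# Regression: model a continuous response (fuel efficiency)
reg.model <- lm(mpg ~ wt, data = mtcars)

# Classification: model a discrete response (transmission type, 0 or 1)
# with logistic regression, which Chapter 4 covers in detail
class.model <- glm(am ~ wt, data = mtcars, family = binomial)

# Both calls return model objects; their coefficients define the fitted function
coef(reg.model)
coef(class.model)
```

The two calls look nearly identical; the choice of response variable and model family is what makes one a regression and the other a classification.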
The function call for a simple linear regression in R can be written in a single line, as we saw earlier with lm(mpg ~ wt, data = mtcars).
