INTRODUCTION TO MACHINE LEARNING
AN EARLY DRAFT OF A PROPOSED TEXTBOOK

Nils J. Nilsson
Robotics Laboratory
Department of Computer Science
Stanford University
Stanford, CA 94305
e-mail: nilsson@cs.stanford.edu

November 3, 1998

Copyright © 2005 Nils J. Nilsson
This material may not be copied, reproduced, or distributed without the written permission of the copyright holder.
Contents

1 Preliminaries  1
  1.1 Introduction  1
    1.1.1 What is Machine Learning?  1
    1.1.2 Wellsprings of Machine Learning  3
    1.1.3 Varieties of Machine Learning  4
  1.2 Learning Input-Output Functions  5
    1.2.1 Types of Learning  5
    1.2.2 Input Vectors  7
    1.2.3 Outputs  8
    1.2.4 Training Regimes  8
    1.2.5 Noise  9
    1.2.6 Performance Evaluation  9
  1.3 Learning Requires Bias  9
  1.4 Sample Applications  11
  1.5 Sources  13
  1.6 Bibliographical and Historical Remarks  13

2 Boolean Functions  15
  2.1 Representation  15
    2.1.1 Boolean Algebra  15
    2.1.2 Diagrammatic Representations  16
  2.2 Classes of Boolean Functions  17
    2.2.1 Terms and Clauses  17
    2.2.2 DNF Functions  18
    2.2.3 CNF Functions  21
    2.2.4 Decision Lists  22
    2.2.5 Symmetric and Voting Functions  23
    2.2.6 Linearly Separable Functions  23
  2.3 Summary  24
  2.4 Bibliographical and Historical Remarks  25
3 Using Version Spaces for Learning  27
  3.1 Version Spaces and Mistake Bounds  27
  3.2 Version Graphs  29
  3.3 Learning as Search of a Version Space  32
  3.4 The Candidate Elimination Method  32
  3.5 Bibliographical and Historical Remarks  34

4 Neural Networks  35
  4.1 Threshold Logic Units  35
    4.1.1 Definitions and Geometry  35
    4.1.2 Special Cases of Linearly Separable Functions  37
    4.1.3 Error-Correction Training of a TLU  38
    4.1.4 Weight Space  40
    4.1.5 The Widrow-Hoff Procedure  42
    4.1.6 Training a TLU on Non-Linearly-Separable Training Sets  44
  4.2 Linear Machines  44
  4.3 Networks of TLUs  46
    4.3.1 Motivation and Examples  46
    4.3.2 Madalines  49
    4.3.3 Piecewise Linear Machines  50
    4.3.4 Cascade Networks  51
  4.4 Training Feedforward Networks by Backpropagation  52
    4.4.1 Notation  52
    4.4.2 The Backpropagation Method  53
    4.4.3 Computing Weight Changes in the Final Layer  56
    4.4.4 Computing Changes to the Weights in Intermediate Layers  58
    4.4.5 Variations on Backprop  59
    4.4.6 An Application: Steering a Van  60
  4.5 Synergies Between Neural Network and Knowledge-Based Methods  61
  4.6 Bibliographical and Historical Remarks  61

5 Statistical Learning  63
  5.1 Using Statistical Decision Theory  63
    5.1.1 Background and General Method  63
    5.1.2 Gaussian (or Normal) Distributions  65
    5.1.3 Conditionally Independent Binary Components  68
  5.2 Learning Belief Networks  70
  5.3 Nearest-Neighbor Methods  70
  5.4 Bibliographical and Historical Remarks  72
6 Decision Trees  73
  6.1 Definitions  73
  6.2 Supervised Learning of Univariate Decision Trees  74
    6.2.1 Selecting the Type of Test  75
    6.2.2 Using Uncertainty Reduction to Select Tests  75
    6.2.3 Non-Binary Attributes  79
  6.3 Networks Equivalent to Decision Trees  79
  6.4 Overfitting and Evaluation  80
    6.4.1 Overfitting  80
    6.4.2 Validation Methods  81
    6.4.3 Avoiding Overfitting in Decision Trees  82
    6.4.4 Minimum-Description Length Methods  83
    6.4.5 Noise in Data  84
  6.5 The Problem of Replicated Subtrees  84
  6.6 The Problem of Missing Attributes  86
  6.7 Comparisons  86
  6.8 Bibliographical and Historical Remarks