MachineLearning-6

What is Machine Learning?
Many different forms of “Machine Learning”
We focus on the problem of prediction
Want to make a prediction based on observations
Vector X of m observed variables: <X_1, X_2, …, X_m>
  o X_1, X_2, …, X_m are called “input features/variables”
  o Also called “independent variables,” but this can be misleading!
      X_1, X_2, …, X_m need not be (and usually are not) independent
Based on observed X, want to predict unseen variable Y
  o Y called “output feature/variable” (or the “dependent variable”)
Seek to “learn” a function g(X) to predict Y: Ŷ = g(X)
  o When Y is discrete, prediction of Y is called “classification”
  o When Y is continuous, prediction of Y is called “regression”

A (Very Short) List of Applications
Machine learning widely used in many contexts
Stock price prediction
  o Using economic indicators, predict if stock will go up/down
Computational biology and medical diagnosis
  o Predicting gene expression based on DNA
  o Determine likelihood for cancer using clinical/demographic data
Predict people likely to purchase product or click on ad
  o “Based on past purchases, you might want to buy…”
Credit card fraud and telephone fraud detection
  o Based on past purchases/phone calls, is a new one fraudulent?
      Saves companies billions(!) of dollars annually
Spam e-mail detection (Gmail, Hotmail, many others)

What is Bayes Doing in My Mail Server?
This is spam:
Who was crazy enough to think of that?
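
The classification/regression distinction from the first slide can be made concrete with a small sketch. It assumes a 1-nearest-neighbor rule as the learned function g(X); that rule is swapped in here purely for illustration and is not stated in the slides. The helper name `learn_nearest_neighbor` and all of the toy data are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def learn_nearest_neighbor(training_pairs):
    """Return a function g(X) that predicts Y by copying the Y of the
    closest training X. `training_pairs` is a list of (x, y) tuples."""
    def g(x):
        _, nearest_y = min(training_pairs, key=lambda pair: dist(pair[0], x))
        return nearest_y
    return g


# Classification: Y is discrete (made-up toy data, two input features).
labeled_examples = [((0.0, 1.0), "ham"), ((5.0, 4.0), "spam")]
classify = learn_nearest_neighbor(labeled_examples)
label = classify((4.5, 3.0))

# Regression: Y is continuous (made-up toy data, one input feature).
numeric_examples = [((1.0,), 10.0), ((2.0,), 20.5), ((3.0,), 29.0)]
estimate_y = learn_nearest_neighbor(numeric_examples)
estimate = estimate_y((2.2,))

print(label, estimate)
```

The same learning procedure serves both tasks; only the type of Y changes, which is the point the slide makes.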
Let’s get Bayesian on your spam:
Content analysis details: (49.5 hits, 7.0 required)
  0.9 RCVD_IN_PBL     RBL: Received via a relay in Spamhaus PBL
                      [93.40.189.29 listed in zen.spamhaus.org]
  1.5 URIBL_WS_SURBL  Contains an URL listed in the WS SURBL blocklist
                      [URIs: recragas.cn]
  5.0 URIBL_JP_SURBL  Contains an URL listed in the JP SURBL blocklist
                      [URIs: recragas.cn]
  5.0 URIBL_OB_SURBL  Contains an URL listed in the OB SURBL blocklist
                      [URIs: recragas.cn]
  5.0 URIBL_SC_SURBL  Contains an URL listed in the SC SURBL blocklist
                      [URIs: recragas.cn]
  2.0 URIBL_BLACK     Contains an URL listed in the URIBL blacklist
                      [URIs: recragas.cn]
  8.0 BAYES_99        BODY: Bayesian spam probability is 99 to 100%
                      [score: 1.0000]

Spam, Spam… Go Away!
The constant battle with spam
Source: http://www.google.com/mail/help/fightspam/spamexplained.html
“And machine-learning algorithms developed to merge and rank large sets of Google search results allow us to combine hundreds of factors to classify spam.”

Training a Learning Machine
We consider statistical learning paradigm here
We are given set of N “training” instances
  o Each training instance is pair: (<x
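
The content-analysis report shown earlier adds up per-rule scores and compares the total with a required threshold. Here is a minimal sketch of that decision, assuming a message is flagged when the total meets or exceeds the threshold. The function names are hypothetical; the rule names and scores are copied from the report. The seven listed rules sum to about 27.4, less than the reported 49.5 hits, so the excerpt presumably omits other rules that fired.

```python
REQUIRED_SCORE = 7.0  # "7.0 required" in the report

# (rule name, score) pairs exactly as listed in the report excerpt.
rule_hits = [
    ("RCVD_IN_PBL", 0.9),
    ("URIBL_WS_SURBL", 1.5),
    ("URIBL_JP_SURBL", 5.0),
    ("URIBL_OB_SURBL", 5.0),
    ("URIBL_SC_SURBL", 5.0),
    ("URIBL_BLACK", 2.0),
    ("BAYES_99", 8.0),
]


def total_score(hits):
    """Sum the scores of every rule that fired."""
    return sum(score for _, score in hits)


def is_spam(hits, required=REQUIRED_SCORE):
    """Assumed decision rule: flag when the total reaches the threshold."""
    return total_score(hits) >= required


listed_total = total_score(rule_hits)
flagged = is_spam(rule_hits)
print(f"listed total = {listed_total:.1f}, flagged = {flagged}")
```

Note that the Bayesian rule (BAYES_99, 8.0 points) would exceed the 7.0 threshold on its own under this assumed rule, which is why the learned classifier carries so much weight in the report.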