Machine learning – Clustering – K-means

[Figure: K-means iterations on two-dimensional data; axes: Feature #1, Feature #2. Source: Naftali Harris]

1. Consider data in $\mathbb{R}^2$ spread over three different clusters.
2. Randomly pick K = 3 data points as the initial cluster centroids.
3. Assign each data point to the class with the closest centroid.
4. Update the centroids by taking the mean of the points within each cluster.
5. Go back to step 3 until the assignments no longer change.
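The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeans`, the `seed` and `max_iter` parameters, and the choice to keep the old centroid when a cluster becomes empty are assumptions made for the example.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two points given as tuples
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Hypothetical helper: basic K-means on a list of tuples."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points at random as initial centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to the class of its closest centroid
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        # Step 5: stop as soon as no assignment changes
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points in its cluster
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if the cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Calling `kmeans(points, 3, seed=s)` with different seeds on the same data can return different partitions, which illustrates how strongly the result depends on the initial centroids.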
Machine learning – Clustering – K-means

- Optimizes the intra- and inter-class variability: it minimizes the within-class sum of squared distances to the centroids (the loss), which for a fixed data set is equivalent to maximizing the between-class variability. It only converges to a local minimum of this loss.
- In practice, it usually requires many more iterations.
- Solutions strongly depend on the initialization; good initializations can be obtained with the K-means++ strategy.
- The number of classes K is often unknown: it is usually found by trial and error, or by cross-validation, AIC, BIC, ... A K that is too small or too large leads to underfitting or overfitting. (we will come back to this)
- The data dimension d is often much larger than 2, and K-means is then subject to the curse of dimensionality. (we will also come back to this)
- Vector quantization (VQ): each vector is replaced by the centroid of its class.
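The K-means++ strategy mentioned above replaces the uniform choice of initial centroids with a distance-weighted one: the first centroid is a random data point, and each following centroid is drawn with probability proportional to its squared distance to the nearest centroid already chosen. The sketch below assumes tuples of numbers as points; `kmeans_pp_init` and its `seed` parameter are hypothetical names, and the uniform fallback when all distances are zero is an added safeguard.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two points given as tuples
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_pp_init(points, k, seed=0):
    """Hypothetical helper: choose k initial centroids with K-means++ seeding."""
    rng = random.Random(seed)
    # First centroid: a data point drawn uniformly at random
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Assumed safeguard: every point coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            # Far-away points are proportionally more likely to be picked
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

The returned centroids can then replace the random initialization in step 2 of K-means. Because already-chosen points get weight zero, a data set of distinct points never yields duplicate seeds.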
Machine learning – Classification

Classification: predict a class d from an observation x. (Source: Philip Martin)

- Example: classify a document into a predefined category. Documents can be text, images, videos, ...
- Popular methods include Support Vector Machines and Artificial Neural Networks.
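Support Vector Machines and neural networks are too large for a short sketch, so the example below uses a much simpler method swapped in for illustration, a nearest-centroid classifier, only to show the shape of the task: a vector x goes in, a discrete class d comes out. The function names and the labels "A" and "B" are hypothetical.

```python
from collections import defaultdict

def fit_class_centroids(xs, ds):
    """Hypothetical helper: one mean vector per class label d."""
    groups = defaultdict(list)
    for x, d in zip(xs, ds):
        groups[d].append(x)
    # Average the training observations of each class, coordinate by coordinate
    return {d: tuple(sum(c) / len(g) for c in zip(*g)) for d, g in groups.items()}

def predict_class(centroids, x):
    # The output is discrete: the label whose centroid is closest to x
    return min(centroids,
               key=lambda d: sum((a - b) ** 2 for a, b in zip(x, centroids[d])))

centroids = fit_class_centroids([(0, 0), (1, 0), (5, 5), (6, 5)], ["A", "A", "B", "B"])
print(predict_class(centroids, (0.5, 0.2)))  # prints "A"
```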

Machine learning – Regression

Regression (prediction): predict one or several values from an observation.

- A statistical process for estimating the relationships among variables.
- Regression predicts the output value of a new observation with a model learned from training data.
- Related to interpolation and extrapolation.
- Popular methods include linear least squares and Artificial Neural Networks.
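Linear least squares in its simplest form fits a line y = a x + b to one input variable by minimizing the sum of squared errors, which has the closed-form solution below. The helper name `fit_line` is hypothetical, and the example is restricted to a single input variable for brevity.

```python
def fit_line(xs, ys):
    """Hypothetical helper: least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); needs at least two distinct x values
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    # The least-squares line passes through the mean point (mx, my)
    b = my - a * mx
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # a = 2.0, b = 1.0
```

Evaluating `a * 1.5 + b` stays inside the training range [0, 3] (interpolation), while `a * 4 + b`, which gives 9.0, goes beyond it (extrapolation).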
Machine learning – Classification vs Regression

           Classification                        Regression
Goal       Assign to a class                     Predict one or several output values
Example    Is a type of tumor harmful or not?    What will the house price be?
Output     Discrete / categorical                Real number / continuous

(Source: Ali Reza Kohani)

Quiz: which one is which? Denoising, identification, verification, approximation.

Machine learning – Polynomial curve fitting

Consider N individuals answering a survey asking for their
- wealth: $x_i$,
- level of happiness: $d_i$.

We want to learn how to predict $d_i$ (the desired output) from $x_i$ as

$$d_i \approx y_i = f(x_i; \theta)$$

where f is the predictor and $y_i$ denotes the predicted output.
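As a sketch of where this setup leads, assume f is a polynomial of degree M, $f(x; \theta) = \theta_0 + \theta_1 x + \dots + \theta_M x^M$, and that θ is chosen to minimize the squared error $\sum_i (d_i - y_i)^2$. Both the squared-error loss and the degree M are assumptions at this point. The code solves the resulting normal equations with Gaussian elimination in plain Python; `fit_polynomial` and `evaluate_polynomial` are hypothetical names.

```python
def fit_polynomial(xs, ds, degree):
    """Hypothetical helper: least-squares polynomial fit, theta[k] multiplies x**k.
    Needs at least degree + 1 distinct x values, otherwise the system is singular."""
    n = degree + 1
    # Design matrix row for x: [1, x, x^2, ..., x^degree]
    rows = [[x ** k for k in range(n)] for x in xs]
    # Normal equations: (X^T X) theta = X^T d
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * d for r, d in zip(rows, ds)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system
    theta = [0.0] * n
    for i in reversed(range(n)):
        theta[i] = (b[i] - sum(a[i][j] * theta[j] for j in range(i + 1, n))) / a[i][i]
    return theta

def evaluate_polynomial(theta, x):
    # Predicted output y = f(x; theta) = sum_k theta_k * x**k
    return sum(t * x ** k for k, t in enumerate(theta))
```

In the survey example, `xs` would hold the wealth values $x_i$ and `ds` the happiness levels $d_i$; `evaluate_polynomial(theta, x)` then gives the predicted output y for a new wealth value x.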
