Data Analytics: Introduction, Classification Introduction to Computational Thinking and Data Science Lecture 6
Today’s Topics
1. Data analysis tasks
2. Classification tasks
3. Building a classifier
4. Evaluating a classifier
1. Data Analysis Tasks
Different Data Analysis Tasks
• Classification
  - Assign a category (i.e., a class) to a new instance
• Clustering
  - Form clusters (i.e., groups) from a set of instances
• Pattern detection
  - Identify regularities (i.e., patterns) in temporal or spatial data
• Simulation
  - Define mathematical formulas that can generate data similar to the observations collected
Different Data Analysis Tasks
• Classification
• Clustering
• Pattern detection
• Causal discovery
• Simulation
• …
• Each type of task is characterized by the kind of data it requires and the kind of output it generates
• Each type of task uses different algorithms
2. Classification Tasks
Classifying Mushrooms
• Which mushrooms are edible, i.e., not poisonous?
• A book lists many kinds of mushrooms, each identified as edible, poisonous, or of unknown edibility
• Given a new kind of mushroom not listed in the book, is it edible?
Classifying Iris Plants
• Iris flowers have different sepal and petal shapes:
  - Iris Setosa
  - Iris Versicolour
  - Iris Virginica
• Suppose you are shown lots of examples of each type.
• Given a new iris flower, what type is it?
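One simple way to answer the last question is to compare the new flower with the examples already seen and give it the type of the closest one. This is a 1-nearest-neighbor rule, used here only as an illustration. The sketch assumes each flower is described by four measurements (sepal length, sepal width, petal length, petal width, in cm); the three example rows and the names `TRAINING` and `classify` are illustrative, not a real training set.

```python
import math

# Illustrative labeled examples: (sepal length, sepal width, petal length, petal width) in cm.
# One example per type is far too few in practice; these rows only show the idea.
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "Iris Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris Versicolour"),
    ((6.3, 3.3, 6.0, 2.5), "Iris Virginica"),
]


def classify(flower, examples=TRAINING):
    """Return the type of the labeled example closest to `flower` (1-nearest neighbor)."""
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    _, label = min(examples, key=lambda ex: math.dist(flower, ex[0]))
    return label


print(classify((5.0, 3.4, 1.5, 0.3)))  # prints Iris Setosa
```

The new flower is only 0.1 cm away from the Setosa example in every measurement, so that example is the nearest one.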
Classification Tasks
• Given:
  - A set of classes
  - Instances (examples) of each class
• Generate: a method (aka a model) that, given a new instance, determines its class
• Instances are described by a set of features (or attributes) and their values
• The class that an instance belongs to is also called its “label”
• The input is thus a set of “labeled instances”
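The given/generate framing can be sketched as a function that takes labeled instances and returns a model, i.e., another function that maps a new instance to a class. This is a minimal sketch, assuming instances are dictionaries from feature names to categorical values; the names `train`, `mismatches`, and `LABELED`, and the feature values, are hypothetical. The model here simply remembers the examples and predicts the label of the one with the fewest differing features, which is only one possible choice of method.

```python
# Hypothetical labeled instances: (features, label). Feature names follow the
# mushroom attribute list; the values here are made up for illustration.
LABELED = [
    ({"cap-shape": "convex", "odor": "almond", "gill-size": "broad"}, "edible"),
    ({"cap-shape": "convex", "odor": "foul", "gill-size": "narrow"}, "poisonous"),
    ({"cap-shape": "bell", "odor": "anise", "gill-size": "broad"}, "edible"),
]


def mismatches(instance, example):
    """Count the instance's features whose value differs in the example."""
    return sum(1 for name, value in instance.items() if example.get(name) != value)


def train(labeled_instances):
    """Generate a model from labeled instances.

    This sketch just stores the examples; the returned model predicts
    the label of the stored example with the fewest mismatching features.
    """
    examples = list(labeled_instances)

    def model(instance):
        _, label = min(examples, key=lambda ex: mismatches(instance, ex[0]))
        return label

    return model


model = train(LABELED)
print(model({"cap-shape": "flat", "odor": "foul", "gill-size": "narrow"}))  # prints poisonous
```

The new instance differs from the second example in only one feature (cap-shape) and from the other two in all three, so it gets the label `poisonous`. Ties go to the example listed first, because `min` returns the first minimum it finds.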
Possible Features
1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4. bruises?: bruises=t, no=f
5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
6. gill-attachment: attached=a, descending=d, free=f, notched=n
7. gill-spacing: close=c, crowded=w, distant=d
8. gill-size: broad=b, narrow=n
9. gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
10. stalk-shape: enlarging=e, tapering=t
11. stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
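Because each attribute value is recorded as a one-letter code, an instance can be stored compactly, and a program needs the code tables to turn it back into readable values. The sketch below assumes a hypothetical layout in which a record is a comma-separated row of codes for the first five attributes above, in the order listed; `CODES`, `FEATURE_ORDER`, and `decode` are illustrative names. The same letter means different things for different attributes (for example, `f` is `flat` for cap-shape but `fibrous` for cap-surface), so each code must be looked up in its own attribute's table.

```python
# Code tables copied from the attribute list above (first five attributes only).
CODES = {
    "cap-shape": {"b": "bell", "c": "conical", "x": "convex", "f": "flat",
                  "k": "knobbed", "s": "sunken"},
    "cap-surface": {"f": "fibrous", "g": "grooves", "y": "scaly", "s": "smooth"},
    "cap-color": {"n": "brown", "b": "buff", "c": "cinnamon", "g": "gray", "r": "green",
                  "p": "pink", "u": "purple", "e": "red", "w": "white", "y": "yellow"},
    "bruises?": {"t": "bruises", "f": "no"},
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy", "f": "foul",
             "m": "musty", "n": "none", "p": "pungent", "s": "spicy"},
}

# Hypothetical record layout: one code per attribute, in this order.
FEATURE_ORDER = ["cap-shape", "cap-surface", "cap-color", "bruises?", "odor"]


def decode(record):
    """Turn a row like 'x,s,n,t,a' into {feature name: readable value}.

    Raises ValueError if the row has the wrong number of codes and
    KeyError if a code is not in that attribute's table.
    """
    codes = record.strip().split(",")
    if len(codes) != len(FEATURE_ORDER):
        raise ValueError(f"expected {len(FEATURE_ORDER)} codes, got {len(codes)}")
    # Look up each code in the table of the attribute at the same position.
    return {name: CODES[name][code] for name, code in zip(FEATURE_ORDER, codes)}


print(decode("x,s,n,t,a"))
# {'cap-shape': 'convex', 'cap-surface': 'smooth', 'cap-color': 'brown', 'bruises?': 'bruises', 'odor': 'almond'}
```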
- Fall '17