Pattern Discovery, Clustering, Simulation
Introduction to Computational Thinking and Data Science
Lecture 7
Anna Farzindar
Today’s Topics
1. Pattern detection
2. Pattern learning
3. Pattern discovery
4. Clustering
5. Simulation
6. Practical examples of data analysis
Pattern Detection
Learning Approaches
Supervised Learning
- The training data is annotated with information to help the learning system
- E.g., the class for each instance
Unsupervised Learning
- The training data is not annotated with any extra information to help the learning system
- E.g., clustering of data
Semi-Supervised Learning
Clustering vs. Classification
- Classification is supervised
  - Class labels are provided
  - Learn a classifier to predict class labels of novel/unseen data
- Clustering is unsupervised or semi-supervised
  - No class labels are given
  - Understand the structure underlying your data
[Diagram: Supervised Learning, Semi-Supervised Learning, Unsupervised Learning; Predictive Modeling Tasks]
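The contrast above can be sketched on a toy one-dimensional dataset. This is only an illustration, not the method used in the lecture: the data values, the labels "small" and "large", the nearest-mean classifier, and the two-means clustering routine are all assumptions chosen to keep the example short.

```python
def nearest(value, centers):
    """Return the index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))

def train_classifier(values, labels):
    """Supervised: class labels are provided; learn one mean per class."""
    classes = sorted(set(labels))
    means = [
        sum(v for v, y in zip(values, labels) if y == c) / labels.count(c)
        for c in classes
    ]
    return classes, means

def classify(value, model):
    """Predict the class label of a novel/unseen value."""
    classes, means = model
    return classes[nearest(value, means)]

def cluster(values, iterations=10):
    """Unsupervised: no labels; split the values into two groups."""
    centers = [min(values), max(values)]  # simple deterministic start
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical data, invented for this sketch.
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["small", "small", "small", "large", "large", "large"]

model = train_classifier(values, labels)
print(classify(2.5, model))   # uses the labels learned from training data
print(cluster(values))        # finds groups without seeing any labels
```

The classifier returns a named class because it was given names to learn from; the clustering routine can only return unnamed groups, which is the structure it found in the data.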
Different Data Analysis Tasks
- Classification
  - Assign a category (i.e., a class) to a new instance
- Clustering
  - Form clusters (i.e., groups) from a set of instances
- Pattern discovery
  - Identify regularities (i.e., patterns) in temporal or spatial data
- Simulation
  - Define mathematical formulas that can generate data similar to the observations collected
Network Patterns
- Central entities
- Strength of ties
- Subgroups
- Patterns of activity over time
Graphs are visual representations of networks, displaying actors (entities) as nodes and the relational ties connecting actors as lines.
Spatial Patterns
Patterns may be recognized because of their arrangement.
Temporal Patterns
[Figure: Pattern Detector; Patterns P1, P2; events marked along a timeline]
Detecting Patterns in a Text String
- ababababab
- abcabcabcabc
- abcccccccabcccabccccccccccabcabccc
A Pattern Language
- ababababab
  - (ab)*
- abcabcabcabc
  - (abc)*
- abcccccccabcccabccccccccccabcabccc
  - ((ab)(c)*)*
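The pattern notation on this slide happens to be valid regular-expression syntax, so the three pairs can be checked directly. The sketch below assumes Python's standard `re` module as the pattern language; the helper name `matches` is made up for this example.

```python
import re

# Strings and patterns copied from the slide.
examples = [
    ("ababababab", r"(ab)*"),
    ("abcabcabcabc", r"(abc)*"),
    ("abcccccccabcccabccccccccccabcabccc", r"((ab)(c)*)*"),
]

def matches(text, pattern):
    """True if the whole string fits the pattern, not just a prefix of it."""
    return re.fullmatch(pattern, text) is not None

for text, pattern in examples:
    print(pattern, matches(text, pattern))
```

`re.fullmatch` is used rather than `re.match` because `(ab)*` also matches the empty string, so a prefix-only check would accept any input.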
Detecting Patterns in Streaming Data
- (ab)*x*
  - Ababab thsrthw abab yertueyrtyerthe ab sgd
- abcabcabcabc
  - abcabc rgkskhgsnrhn abcabcabcabc rjgjsrn
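In a stream, the data arrives one item at a time and the pattern is surrounded by unrelated content. A minimal sketch of a detector for repeated `ab` is shown below. It assumes that the `x*` on the slide stands for arbitrary noise between matches, it uses a lowercased copy of the slide's first stream, and the function name `detect_ab_runs` is hypothetical.

```python
def detect_ab_runs(stream):
    """Yield (start_index, matched_text) for each maximal run of 'ab'
    repetitions, reading the stream one character at a time."""
    run_start = None
    run_len = 0  # characters of the current run matched so far
    for i, ch in enumerate(stream):
        expected = "ab"[run_len % 2]  # 'a' at even positions, 'b' at odd
        if ch == expected:
            if run_len == 0:
                run_start = i
            run_len += 1
            continue
        # The run is broken: report the complete 'ab' pairs seen so far.
        complete = run_len - run_len % 2
        if complete:
            yield run_start, "ab" * (complete // 2)
        # The breaking character may itself start a new run.
        if ch == "a":
            run_start, run_len = i, 1
        else:
            run_len = 0
    # Report a run that is still open when the stream ends.
    complete = run_len - run_len % 2
    if complete:
        yield run_start, "ab" * (complete // 2)

stream = "ababab thsrthw abab yertueyrtyerthe ab sgd"
for start, text in detect_ab_runs(stream):
    print(start, text)
```

Because the detector keeps only a counter and a start index, it never needs to hold the whole stream in memory, which is the practical difference from matching a stored text string.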
Concept Drift
- Over time, the data source changes, and the concepts that were learned in the past no longer hold
- The model built on old data is inconsistent with the new data
- The predictions become less accurate as time passes
- Regular updating of the model is necessary
- E.g., weather prediction
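The effect of drift on accuracy can be illustrated with a toy threshold model. Everything in this sketch is an assumption made for illustration: the data values are invented, the drift is a simple shift of +5, and the "model" is just the midpoint between the two class means.

```python
def learn_threshold(values, labels):
    """Fit a one-number model: the midpoint between the two class means."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, values, labels):
    """Fraction of values whose predicted class (value > threshold) is correct."""
    hits = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
    return hits / len(values)

# Hypothetical observations before the data source changed.
old_values = [1, 2, 3, 7, 8, 9]
old_labels = [0, 0, 0, 1, 1, 1]

# Hypothetical observations after the change: every value shifted by +5.
new_values = [6, 7, 8, 12, 13, 14]
new_labels = [0, 0, 0, 1, 1, 1]

old_model = learn_threshold(old_values, old_labels)
print(accuracy(old_model, old_values, old_labels))  # old model on old data
print(accuracy(old_model, new_values, new_labels))  # old model on new data

updated_model = learn_threshold(new_values, new_labels)
print(accuracy(updated_model, new_values, new_labels))  # after updating
```

The old threshold still separates the old data perfectly but misclassifies half of the shifted data; refitting on recent observations restores the accuracy, which is why regular updating is needed.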
Pattern Learning and Pattern Discovery
Pattern Detection vs Pattern Learning
Pattern Detection
- Inputs:
  - Data
  - A set of patterns
- Output:
  - Matches of the patterns to the data
Pattern Learning
- Inputs:
  - Data annotated with a set of patterns
- Output:
  - A set of patterns that appear in the data with some frequency
Pattern Learning vs Pattern Discovery
Pattern Learning
- Inputs:
  - Data annotated with a set of patterns
- Output:
  - A set of patterns that appear in the data with some frequency
Pattern Discovery
- Inputs:
  - Data
- Output:
  - A set of patterns that appear in the data with some frequency
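Pattern discovery takes only raw data as input and returns patterns that occur with some frequency. One very simple way to sketch this is to count every short substring and keep the frequent ones. This is an assumption made for illustration, not the algorithm from the lecture; the function name `discover_patterns` and its parameters are hypothetical.

```python
from collections import Counter

def discover_patterns(data, min_len=2, max_len=3, min_count=3):
    """Return substrings of length min_len..max_len that occur at least
    min_count times in data (overlapping occurrences are counted)."""
    counts = Counter(
        data[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(data) - n + 1)
    )
    return {pattern: c for pattern, c in counts.items() if c >= min_count}

# No annotations and no pattern set are supplied, only the data itself.
print(discover_patterns("abcabcabcabc", min_count=4))
```

Pattern detection would instead start from a given pattern such as `(abc)*` and report where it matches; here the candidate `abc` emerges from the counts alone.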
Event Pattern Detection vs.
- Fall '17