Jeff Howbert, Introduction to Machine Learning, Winter 2014

Anomaly Detection

Some slides taken or adapted from: "Anomaly Detection: A Tutorial" by Arindam Banerjee, Varun Chandola, Vipin Kumar, and Jaideep Srivastava (University of Minnesota) and Aleksandar Lazarevic (United Technology Research Center)

Anomaly detection
Anomalies and outliers are essentially the same thing: objects that are different from most other objects. The techniques used for detection are the same.

Anomaly detection
• Historically, the field of statistics tried to find and remove outliers as a way to improve analyses.
• There are now many fields where the outliers / anomalies are the objects of greatest interest.
    - The rare events may be the ones with the greatest impact, and often in a negative way.

Causes of anomalies
• Data from different class of object or underlying mechanism
    - disease vs. non-disease
    - fraud vs. not fraud
• Natural variation
    - tails on a Gaussian distribution
• Data measurement and collection errors
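
To give a sense of how often natural variation alone produces extreme values, the two-sided tail probability of a Gaussian can be computed directly. This is a minimal sketch using only the Python standard library; the function name is illustrative and does not come from the original slides.

```python
import math


def two_sided_tail_probability(z):
    """Probability that a Gaussian value lies more than z standard
    deviations from its mean, in either direction."""
    return math.erfc(z / math.sqrt(2))


# Fraction of perfectly "normal" data expected beyond 2, 3 and 4 sigma.
for z in (2, 3, 4):
    print(f"beyond {z} sigma: {two_sided_tail_probability(z):.6f}")
```

At 3 standard deviations the probability is about 0.0027, so in a million records drawn from a single Gaussian roughly 2,700 would land in the tails with no separate mechanism or measurement error involved.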

Structure of anomalies
• Point anomalies
• Contextual anomalies
• Collective anomalies

Point anomalies
• An individual data instance is anomalous with respect to the data
[Figure: X-Y scatter plot with labels N1, N2, o1, o2, and O3]
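
One simple way to score point anomalies is by distance to the k-th nearest neighbor: points far from all others receive high scores. The sketch below is an illustration under assumed data, not a method prescribed by the slides. The two clusters and the isolated point are made up, and the function name and the choice k=2 are hypothetical.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores mean the point is more isolated from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Made-up data: two tight clusters plus one isolated point.
points = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.0, 5.2),
    (3.0, 9.0),  # isolated point
]
scores = knn_distance_scores(points, k=2)
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
print(points[most_anomalous], round(scores[most_anomalous], 2))
```

Using k greater than 1 keeps a pair of nearby outliers from hiding each other, at the cost of needing k chosen smaller than the smallest normal cluster.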

Contextual anomalies
• An individual data instance is anomalous within a context
• Requires a notion of context
• Also referred to as conditional anomalies*
* Song, et al, "Conditional Anomaly Detection", IEEE Transactions on Data and Knowledge Engineering, 2006.
[Figure labels: Normal, Anomaly]
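
A contextual anomaly can be sketched by scoring each value only against other values that share its context. The example below assumes the context is a season label and the value is a temperature. Both the data and the z-score threshold of 2.0 are hypothetical choices for illustration, and this is not the method of the cited paper.

```python
from collections import defaultdict
from statistics import mean, stdev


def contextual_anomalies(records, threshold=2.0):
    """Return (context, value) pairs whose value is far from the mean
    of the values observed in the same context."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        values = by_context[context]
        if len(values) < 3:
            continue  # too few observations to estimate a spread
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical temperatures: 10.0 is ordinary in winter, unusual in summer.
summer = [29.0, 30.0, 31.0, 30.0, 29.0, 31.0, 30.0, 10.0]
winter = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0]
records = [("summer", t) for t in summer] + [("winter", t) for t in winter]

flagged = contextual_anomalies(records)
print(flagged)
```

The same reading of 10.0 appears in both seasons, but only the summer occurrence is flagged, which is the defining property of a contextual anomaly.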

Collective anomalies
• A collection of related data instances is anomalous
• Requires a relationship among data instances
    - Sequential data
    - Spatial data
    - Graph data
• The individual instances within a collective anomaly are not anomalous by themselves
[Figure label: anomalous subsequence]
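
For sequential data, one way to illustrate a collective anomaly is a sliding window that looks at how much the values vary within each window. In the sketch below the signal is synthetic, and the window width and the cutoff rule (half the median spread) are hypothetical choices. Every value in the flat stretch also occurs in the normal pattern, so no single value is anomalous on its own.

```python
from statistics import median


def low_variation_windows(series, width):
    """Return start indices of windows whose spread (max - min) is
    less than half the median spread over all windows."""
    spreads = [
        max(series[i:i + width]) - min(series[i:i + width])
        for i in range(len(series) - width + 1)
    ]
    cutoff = 0.5 * median(spreads)
    return [i for i, s in enumerate(spreads) if s < cutoff]


# Synthetic periodic signal with a flat stretch inserted in the middle.
pattern = [0.0, 1.0, 0.0, -1.0]
series = pattern * 5 + [0.0] * 8 + pattern * 5

flagged_starts = low_variation_windows(series, width=4)
print(flagged_starts)
```

Only windows lying entirely inside the flat stretch are flagged. Windows that straddle its edges still contain a peak or trough, so their spread stays at or above the cutoff.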

Applications of anomaly detection
• Network intrusion
• Insurance / credit card fraud
• Healthcare informatics / medical diagnostics
• Industrial damage detection
• Image processing / video surveillance
• Novel topic detection in text mining

Intrusion detection
• Intrusion detection
    - Monitor events occurring in a computer system or network and analyze them for intrusions
    - Intrusions defined as attempts to bypass the security mechanisms of a computer or network
• Challenges
    - Traditional intrusion detection systems are based on signatures of known attacks and
