Name: —————————————

CS573 Midterm: Fall 2010

This is a closed-book, closed-notes exam. Non-programmable calculators are allowed for probability calculations. There are 11 pages including the cover page. The total number of points for the exam is 50 and you have 2 hours to complete the exam. Note the point value of each question and allocate your time accordingly. Read each question carefully and show your work.

Question:  1   2   3   4   5   6   7   8   Total
Score:

1  Data mining components (8 pts)

Read the excerpts from Kolter, J. Z. and Maloof, M. A., "Learning to detect malicious executables in the wild," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, on pages 9-11.

1. Describe the data mining task.

   Classification: detecting malicious executables.

2. Describe the data representation.

   IID, tabular data, where each instance is a tuple of features. The features are n-grams.

3. Describe the knowledge representation.

   A nearest-neighbor formulation for lazy prediction, using the 500 n-grams with the highest information gain.

4. Describe the learning algorithm (both search and scoring function).

   A similarity coefficient is used internally for scoring; search is implicit in the selection of the nearest neighbors.

2  Models and patterns (6 pts)

1. Describe the difference between a model and a pattern. Give an example of each.

   A pattern describes a local property of the data; a model describes a global property of the data. Pattern example: an association rule. Model example: a decision tree.

2. Discuss the difference between parametric and nonparametric models. Give an example model of each type and describe a situation in which it may outperform the other type of model.

   Parametric models assume a particular parametric form (e.g., Gaussian) and learning consists of optimizing those parameters.
   Nonparametric models make fewer assumptions about the functional form of the model and are data-driven (e.g., the size of the model can grow without bound as the size of the training data increases). Parametric example: Naive Bayes. Nonparametric example: a decision tree. Situation: the parametric model may do better with less data, because its parametric assumptions reduce the variance of the model.

3  Sufficient statistics (4 pts)

Suppose s is a statistic for which p(θ | s, D) = p(θ | s). Assume p(θ | s) ≠ 0 and prove that p(D | s, θ) is independent of θ. Create an example to show that the condition p(θ | s) ≠ 0 is required for your proof.

By Bayes' theorem we have:

    P(D | s, θ) = P(θ | s, D) P(s, D) / P(s, θ)

Since P(θ | s, D) = P(θ | s):

    P(D | s, θ) = P(θ | s) P(s, D) / P(s, θ)
                = P(θ | s) P(s) P(D | s) / P(s, θ)
                = P(θ, s) P(D | s) / P(s, θ)
                = P(D | s)

Thus P(D | s, θ) is independent of θ. Note that the final step cancels P(θ, s) against P(s, θ), which requires P(s, θ) = p(θ | s) P(s) ≠ 0; this is exactly where the assumption p(θ | s) ≠ 0 is used.
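The answer above omits the example that Question 3 asks for. One possible example, sketched here as an illustration and not taken from the original answer key (the two-coin setup is an assumption made for this sketch):

```latex
% Illustrative example: why $p(\theta \mid s) \neq 0$ is required.
% Let $\theta \in \{\theta_H, \theta_T\}$, where $\theta_H$ always produces
% heads and $\theta_T$ always produces tails, and let $s(D)$ be the number
% of heads in the data $D$.
If $D$ contains at least one head, then $s \geq 1$ and
$p(\theta_T \mid s) = 0$, so
\[
  P(s, \theta_T) \;=\; p(\theta_T \mid s)\, P(s) \;=\; 0 .
\]
The final step of the proof,
\[
  P(D \mid s, \theta) \;=\; \frac{P(\theta, s)\, P(D \mid s)}{P(s, \theta)},
\]
then divides $0$ by $0$: $P(D \mid s, \theta_T)$ is undefined rather than
equal to $P(D \mid s)$. Hence the conclusion fails precisely when
$p(\theta \mid s) = 0$, which is why the assumption is required.
```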
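The feature-selection step summarized in Question 1 (scoring byte n-grams by information gain and keeping the highest-scoring ones) can be sketched in Python. This is a minimal illustration under assumptions made here, not the paper's implementation: the toy documents, the labels, the n-gram length n=4, the top-k cutoff of 2 (the paper keeps 500), and the "MZ\x90\x00" marker byte sequence are all invented for the example.

```python
# Sketch of information-gain ranking for binary n-gram presence features.
# Assumptions (not from the paper): toy byte-string corpus, n=4, k=2.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_present, labels):
    """Information gain H(Y) - H(Y | n-gram present?) for one binary feature."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, labels) if f == value]
        if subset:
            gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def top_ngrams(docs, labels, n=4, k=2):
    """Return the k byte n-grams with the highest information gain."""
    vocab = {doc[i:i + n] for doc in docs for i in range(len(doc) - n + 1)}
    scored = [(info_gain([g in doc for doc in docs], labels), g) for g in vocab]
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]

# Toy corpus: the shared prefix b"MZ\x90\x00" marks the positive class here
# purely for illustration, so it is the only n-gram with gain 1.0.
docs = [b"MZ\x90\x00AAAA", b"MZ\x90\x00BBBB", b"CCCCDDDD", b"EEEEFFFF"]
labels = [1, 1, 0, 0]
print(top_ngrams(docs, labels)[0])  # the shared prefix ranks first
```

Prediction in the paper's scheme would then represent each executable by which of the selected n-grams it contains and score new instances against labeled neighbors with a similarity coefficient; the sketch above covers only the feature-selection stage.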
This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue University.