Name: —————————————

CS573 Midterm: Fall 2010

This is a closed-book, closed-notes exam. Non-programmable calculators are allowed for probability calculations. There are 11 pages including the cover page. The total number of points for the exam is 50 and you have 2 hours to complete the exam. Note the point value of each question and allocate your time accordingly. Read each question carefully and show your work.

Question:   1   2   3   4   5   6   7   8   Total
Score:

1 Data mining components (8 pts)

Read the excerpts from Kolter, J. Z. and Maloof, M. A., "Learning to detect malicious executables in the wild," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pages 9-11.

1. Describe the data mining task.

   Classification: detecting malicious executables.

2. Describe the data representation.

   IID, tabular data, where each instance is a tuple of features; the features are n-grams.

3. Describe the knowledge representation.

   A nearest-neighbor formulation for lazy prediction, using the 500 n-grams with the highest information gain.

4. Describe the learning algorithm (both search and scoring function).

   A similarity coefficient is used internally for scoring; search is implicit in the selection of the nearest neighbors.

2 Models and patterns (6 pts)

1. Describe the difference between a model and a pattern. Give an example of each.

   A pattern describes a local property of the data (e.g., an association rule), whereas a model describes a global property of the data (e.g., a decision tree).

2. Discuss the difference between parametric and non-parametric models. Give an example model of each type and describe a situation in which it may outperform the other type of model.

   Parametric models assume a particular parametric form (e.g., Gaussian) and learning consists of optimizing those parameters.
   Non-parametric models make fewer assumptions about the functional form of the model and are data-driven (e.g., the size of the model can grow without bound as the size of the training data increases). Parametric example: Naive Bayes. Non-parametric example: decision trees. Situation: a parametric model may do better with less data, because the parametric assumptions reduce the variance of the model.

3 Sufficient Statistics (4 pts)

Suppose s is a statistic for which p(θ | x, D) = p(θ | s). Assume p(θ | s) ≠ 0 and prove that p(D | s, θ) is independent of θ. Create an example to show that the inequality p(θ | s) ≠ 0 is required for your proof.

By Bayes' theorem we have:

    P(D | s, θ) = P(θ | s, D) P(s, D) / P(s, θ)

Since P(θ | s, D) = P(θ | s):

    P(D | s, θ) = P(θ | s) P(s, D) / P(s, θ)
                = P(θ | s) P(s) P(D | s) / P(s, θ)
                = P(θ, s) P(D | s) / P(s, θ)
                = P(D | s)

Thus P(D | s, θ) is independent of θ. Note that the last step cancels P(θ, s) against P(s, θ); if p(θ | s) = 0 then P(s, θ) = 0 and that division is undefined, which is why the condition p(θ | s) ≠ 0 is required.
This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue University.
