CS573 Midterm: Fall 2010

Name: —————————————

This is a closed-book, closed-notes exam. Non-programmable calculators are allowed for probability calculations. There are 11 pages including the cover page. The total number of points for the exam is 50, and you have 2 hours to complete the exam. Note the point value of each question and allocate your time accordingly. Read each question carefully and show your work.

Question | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
Score    |   |   |   |   |   |   |   |   |
1 Data mining components (8 pts)

Read the excerpts from Kolter, J. Z. and Maloof, M. A., "Learning to Detect Malicious Executables in the Wild," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, on pages 9-11.

1. Describe the data mining task.

   Classification: detecting malicious executables.

2. Describe the data representation.

   IID tabular data, where each instance is a tuple of features; the features are n-grams.

3. Describe the knowledge representation.

   A nearest-neighbor formulation for lazy prediction, over the 500 n-grams with the highest information gain.

4. Describe the learning algorithm (both search and scoring function).

   A similarity coefficient is used internally for scoring; search is implicit in the selection of the nearest neighbors. (A minimal sketch of this classifier appears after Question 2.)

2 Models and patterns (6 pts)

1. Describe the difference between a model and a pattern. Give an example of each.

   A pattern describes a local property of the data, while a model describes a global property of the data. Pattern example: an association rule. Model example: a decision tree.

2. Discuss the difference between parametric and non-parametric models. Give an example model of each type and describe a situation in which it may outperform the other type of model.

   Parametric models assume a particular parametric form (e.g., Gaussian), and learning consists of optimizing those parameters. Non-parametric models make fewer assumptions about the functional form of the model and are data-driven (e.g., the size of the model can grow without bound as the size of the training data increases). Parametric example: Naive Bayes. Non-parametric example: a decision tree. Situation: a parametric model may do better with less data, because its parametric assumptions reduce the variance of the model (see the comparison sketch below).
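A minimal sketch of the kind of classifier described in Question 1, assuming boolean n-gram presence features and cosine similarity as the similarity coefficient (both are illustrative assumptions; the solution above does not pin down the exact coefficient):

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity coefficient between two n-gram feature vectors.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b) / denom

def knn_predict(X_train, y_train, x, k=3):
    # Lazy prediction: "search" is implicit -- select the k most similar
    # training instances and take a majority vote over their labels.
    sims = np.array([cosine_similarity(row, x) for row in X_train])
    top_k = np.argsort(sims)[-k:]
    return int(y_train[top_k].sum() * 2 >= k)  # majority vote, 0/1 labels

# Toy usage: 4 executables over 6 selected n-gram features (1 = malicious).
X_train = np.array([[1, 1, 0, 0, 1, 0],
                    [1, 0, 1, 0, 1, 0],
                    [0, 0, 1, 1, 0, 1],
                    [0, 1, 0, 1, 0, 1]], dtype=float)
y_train = np.array([1, 1, 0, 0])
print(knn_predict(X_train, y_train, np.array([1.0, 1, 1, 0, 1, 0])))  # -> 1
```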
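A hedged illustration of the "situation" in Question 2: on a deliberately small training set, the parametric model's assumptions reduce variance enough for it to often beat the non-parametric one. The dataset, training-set size, and models are illustrative choices, not part of the original solution:

```python
# With only 30 training examples, Gaussian Naive Bayes (parametric) will
# typically outscore a decision tree (non-parametric) on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=30, random_state=0)

for model in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    print(type(model).__name__, model.fit(X_tr, y_tr).score(X_te, y_te))
```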
3 Sufficient Statistics (4 pts)

Suppose s is a statistic for which p(θ | s, D) = p(θ | s). Assume p(θ | s) ≠ 0 and prove that p(D | s, θ) is independent of θ. Create an example to show that the inequality p(θ | s) ≠ 0 is required for your proof.

By Bayes' theorem we have:

P(D | s, θ) = P(θ | s, D) P(s, D) / P(s, θ)

Since P(θ | s, D) = P(θ | s):

P(D | s, θ) = P(θ | s) P(s, D) / P(s, θ)
            = P(θ | s) P(s) P(D | s) / P(s, θ)
            = P(θ, s) P(D | s) / P(s, θ)
            = P(D | s)

Thus P(D | s, θ) is independent of θ.

The inequality p(θ | s) ≠ 0 is required: otherwise P(θ, s) = 0 and the division operation is undefined.
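One concrete way to instantiate the requested example (this construction is my own illustration; the solution above stops at the division argument):

```latex
% Assumed example: Bernoulli data with a two-point prior on theta.
Let $D = (x_1, \dots, x_n)$ be i.i.d.\ Bernoulli($\theta$) with
$s = \sum_i x_i$ (a sufficient statistic), and put prior mass only on
$\theta \in \{0, \tfrac{1}{2}\}$. If we observe $s \ge 1$, then
$p(\theta = 0 \mid s) = 0$, because $\theta = 0$ cannot produce a success.
Hence $P(\theta = 0, s) = 0$, and
\[
  P(D \mid s, \theta = 0)
  = \frac{P(\theta = 0 \mid s)\, P(s, D)}{P(s, \theta = 0)}
  = \frac{0}{0}
\]
is undefined: the proof's division step cannot be carried out when
$p(\theta \mid s) = 0$.
```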
4 Central Limit Theorem (6 pts)

In a given city, it is assumed that the average number of automobile accidents in each year is 15, with variance 3.
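The preview cuts off before the actual question, but the setup suggests a normal approximation to a sum of yearly counts. A hedged sketch of that kind of calculation, with an invented event (the 25-year horizon and the threshold of 400 accidents are my assumptions, not the exam's):

```python
# By the CLT, the total accident count over n independent years is
# approximately Normal(n * 15, n * 3), given the stated mean and variance.
from math import sqrt
from scipy.stats import norm

n, mu, var = 25, 15, 3
total_mean, total_sd = n * mu, sqrt(n * var)  # 375 and ~8.66
p = 1 - norm.cdf(400, loc=total_mean, scale=total_sd)
print(f"P(total accidents over {n} years > 400) ~= {p:.4f}")  # ~= 0.0019
```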
[End of preview; the rest of Question 4 and Questions 5-8 are not shown.]