CS 478 Machine Learning: Homework 5 Suggested Solutions

1 The Viterbi Algorithm

(a) The most likely sequence of weather is (rain, rain, snow).

            running   programming   skiing
    sunny   0.12      0.0108        0.00144
    rain    0.18      0.0108        0.000432
    cloudy  0.02      0.0018        0.000288
    snow    0.04      0.0144        0.00216

(b) The most likely sequence of weather is (snow, cloudy, rain, rain, snow).

            skiing   studying   running   running    programming
    sunny   0.06     0.005      0.0006    0.000144   0.0000216
    rain    0.03     0.0024     0.0018    0.000324   0.00001944
    cloudy  0.04     0.006      0.00012   0.000018   0.00000324
    snow    0.1      0.004      0.0002    0.000144   0.00002592

(c) The Viterbi algorithm fills in a table of dimension k x l, where k is the number of states and l is the length of the observation sequence. Filling in each entry requires O(k) time, since it involves taking the maximum of k numbers. Therefore the overall complexity of the Viterbi algorithm is O(k^2 l).

(d) The most likely state of weather on the last day is sunny.

            skiing   studying   running    running      programming
    sunny   0.06     0.0082     0.003824   0.00148408   0.000449278
    rain    0.03     0.0073     0.008718   0.00281646   0.00033946
    cloudy  0.04     0.0162     0.000527   0.00015024   0.0000551908
    snow    0.1      0.0048     0.001428   0.00091806   0.000324407

To compute P(X_1 = x_1, X_2 = x_2, ..., X_l = x_l), we just need to sum over all states Y at the last position:

    P(X_1 = x_1, X_2 = x_2, ..., X_l = x_l)
        = 0.000449278 + 0.00033946 + 0.0000551908 + 0.000324407
        = 0.001168337                                              (1)

2 Statistical Learning Theory

(a) When the target class is linearly separable, there is always a hypothesis h in H_1 that achieves zero training error. By counting, we can obtain that the number of distinct hypotheses in H_1 is 28. Therefore, setting delta = 0.05, we obtain the generalization bound for the case when training error is zero:

    (1/n)(ln |H_1| - ln delta) = (1/100)(ln 28 - ln 0.05) ~= 0.063279     (2)

Therefore the generalization bound holds with 95% confidence when epsilon is at least 0.063279, so we can guarantee that the generalization error of 0...
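The dynamic-programming tables above can be filled in mechanically. Below is a minimal Python/NumPy sketch of both the Viterbi recurrence used in parts (a)-(c) and the forward (sum-product) recurrence used in part (d). The two-state HMM parameters here are hypothetical placeholders, not the homework's actual transition and emission probabilities (which are not reproduced in this excerpt), so the numbers will not match the tables.

```python
import numpy as np

# Hypothetical 2-state HMM; these parameters are illustrative only,
# NOT the ones from the homework problem.
states = ["sunny", "rain"]
obs_symbols = ["running", "programming"]

start = np.array([0.6, 0.4])          # P(Y_1)
trans = np.array([[0.7, 0.3],         # P(Y_t | Y_{t-1}), rows = previous state
                  [0.4, 0.6]])
emit = np.array([[0.8, 0.2],          # P(X_t | Y_t), rows = state
                 [0.3, 0.7]])

def viterbi(obs):
    """Return the most likely state sequence for a list of observation indices."""
    l, k = len(obs), len(states)
    delta = np.zeros((l, k))          # delta[t, s] = best path prob ending in s at t
    back = np.zeros((l, k), dtype=int)  # back-pointers for path recovery
    delta[0] = start * emit[:, obs[0]]
    for t in range(1, l):
        # Each of the k entries maximizes over k predecessors: O(k) per cell,
        # O(k^2 l) overall, matching part (c).
        scores = delta[t - 1][:, None] * trans   # (k, k): prev state i -> state j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emit[:, obs[t]]
    # Trace back from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(l - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]

def forward(obs):
    """Return P(X_1 = x_1, ..., X_l = x_l) by summing over states, as in part (d)."""
    alpha = start * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha = (alpha @ trans) * emit[:, obs[t]]
    return float(alpha.sum())         # sum over states at the last position

print(viterbi([0, 0, 1]))  # observations: running, running, programming
print(forward([0, 0, 1]))
```

The only structural difference between the two recurrences is that Viterbi takes a max over predecessor states where the forward algorithm takes a sum; both cost O(k^2 l).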
This note was uploaded on 10/02/2008 for the course CS 478 taught by Professor Joachims during the Spring '08 term at Cornell University (Engineering School).