Q1 Probability and MLE [20 pts]

1. (a) Suppose we wish to calculate P(H | E1, E2) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation?
   i. P(E1, E2), P(H), P(E1 | H), P(E2 | H)
   ii. P(E1, E2), P(H), P(E1, E2 | H)
   iii. P(H), P(E1 | H), P(E2 | H)
(b) Suppose we know that P(E1 | H, E2) = P(E1 | H) for all values of H, E1, E2. Now which of the above three sets are sufficient?
   [Handwritten answer, only partly legible:] P(E1, E2 | H) = P(E1 | H) P(E2 | H)
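As a small illustration of why the numbers in set (ii) are enough, the sketch below applies Bayes' rule to a made-up joint distribution over binary H, E1, E2. The table values are hypothetical and chosen so that E1 and E2 are not conditionally independent given H; the function names are also just for this example.

```python
from fractions import Fraction as F

# Hypothetical joint P(H, E1, E2); keys are (h, e1, e2). Not from the exam.
joint = {
    (0, 0, 0): F(1, 10), (0, 0, 1): F(2, 10), (0, 1, 0): F(1, 10), (0, 1, 1): F(1, 10),
    (1, 0, 0): F(2, 10), (1, 0, 1): F(1, 10), (1, 1, 0): F(1, 10), (1, 1, 1): F(1, 10),
}

def prob(pred):
    """Total probability of the outcomes (h, e1, e2) that satisfy pred."""
    return sum(p for k, p in joint.items() if pred(*k))

def posterior_exact(h, e1, e2):
    """P(H=h | E1=e1, E2=e2) read directly off the joint table."""
    return joint[(h, e1, e2)] / prob(lambda H, A, B: (A, B) == (e1, e2))

def posterior_from_set_ii(h, e1, e2):
    """Bayes' rule using only P(E1,E2), P(H) and P(E1,E2 | H)."""
    p_h = prob(lambda H, A, B: H == h)
    p_e = prob(lambda H, A, B: (A, B) == (e1, e2))
    p_e_given_h = joint[(h, e1, e2)] / p_h
    return p_e_given_h * p_h / p_e

def posterior_naive(h, e1, e2):
    """Uses P(E1|H) P(E2|H) in place of P(E1,E2|H); valid only under
    conditional independence of E1 and E2 given H."""
    p_h = prob(lambda H, A, B: H == h)
    p_e = prob(lambda H, A, B: (A, B) == (e1, e2))
    p_e1_h = prob(lambda H, A, B: H == h and A == e1) / p_h
    p_e2_h = prob(lambda H, A, B: H == h and B == e2) / p_h
    return p_e1_h * p_e2_h * p_h / p_e
```

On this particular table the factorized version disagrees with the exact posterior, which is the reason the per-evidence conditionals alone do not suffice without an independence assumption.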
2. Which of the following statements are true? If none of them are true, write NONE.
(a) If X and Y are independent then E[2XY] = 2E[X]E[Y] and Var[X + 2Y] = Var[X] + Var[Y].
(b) If X and Y are independent and X > 1 then Var[X + 2Y^2] = Var[X] + 4Var[Y^2] and E[X^2 - X] ≥ Var[X].
(c) If X and Y are not independent then Var[X + Y] = Var[X] + Var[Y].
(d) If X and Y are independent then E[XY^2] = E[X]E[Y]^2 and Var[X + Y] = Var[X] + Var[Y].
(e) If X and Y are not independent and f(X) = X^2 then E[f(X)Y] = E[f(X)]E[Y] and Var[X + 2Y] = Var[X] + 4Var[Y].
3. You are playing a game with two coins. Coin 1 has a θ probability of heads. Coin 2 has a 2θ probability of heads. You flip these coins several times and record your results:
   [table of recorded flips not legible in the scan]
(a) What is the log-likelihood of the data given θ?
   L(θ) = P(Coin 1 = Head) · P(Coin 2 = Head) · P(Coin 2 = Tail)^3 = θ · 2θ · (1 - 2θ)^3
   ℓ(θ) = log θ + log 2θ + 3 log(1 - 2θ)
(b) What is the maximum likelihood estimate for θ?
   Since log is monotonic, argmax L(θ) = argmax ℓ(θ). Setting dℓ/dθ = 2/θ - 6/(1 - 2θ) = 0 gives 2(1 - 2θ) = 6θ, so θ = 1/5.
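The maximizer of the log-likelihood log θ + log 2θ + 3 log(1 - 2θ) can be sanity-checked numerically. The sketch below is a plain grid search over 0 < θ < 1/2; the helper name `grid_argmax` and the grid resolution are arbitrary choices for this example.

```python
import math

def log_likelihood(theta):
    """log(theta) + log(2*theta) + 3*log(1 - 2*theta), defined for 0 < theta < 1/2."""
    return math.log(theta) + math.log(2 * theta) + 3 * math.log(1 - 2 * theta)

def grid_argmax(f, lo, hi, steps=10000):
    """Return the interior grid point in (lo, hi) where f is largest."""
    best_x, best_val = None, -math.inf
    for i in range(1, steps):
        x = lo + (hi - lo) * i / steps
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

theta_hat = grid_argmax(log_likelihood, 0.0, 0.5)
```

The grid maximum should land next to the stationary point of 2/θ - 6/(1 - 2θ), that is, θ = 1/5.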
[Handwritten working for question 2, partly legible]
Relevant properties: E[aX] = aE[X] and Var[aX] = a^2 Var[X]. The following hold if X and Y are independent:
   E[XY] = E[X]E[Y]
   Var[X + Y] = Var[X] + Var[Y]
   [one further identity not legible]
If X and Y are not independent, these are not guaranteed: in general
   E[XY] ≠ E[X]E[Y]
   Var[X + Y] ≠ Var[X] + Var[Y]
These properties suffice to eliminate (a), (c), (d), and (e). For (b):
   Var[X] = E[(X - E[X])^2]
          = E[X^2 - 2E[X]X + E[X]^2]
          = E[X^2] - 2E[X]E[X] + E[X]^2
          = E[X^2] - E[X]^2
But since X > 1, E[X]^2 > E[X], and so
   E[X^2] - E[X] ≥ E[X^2] - E[X]^2 = Var[X].
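The expectation and variance identities used in this working can be checked exactly on a small example. The sketch below enumerates two independent discrete variables with made-up distributions (X is kept above 1, as statement (b) requires) and uses exact fractions so the comparisons are not affected by rounding.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical independent discrete variables: value -> probability.
X = {2: F(1, 2), 3: F(1, 2)}
Y = {-1: F(1, 4), 0: F(1, 4), 2: F(1, 2)}

def expect(g):
    """E[g(X, Y)] under the product (independent) distribution."""
    return sum(px * py * g(x, y)
               for (x, px), (y, py) in product(X.items(), Y.items()))

def var(g):
    """Var[g(X, Y)] = E[g^2] - E[g]^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)

# Statement (b): both halves hold for this example.
b_first = var(lambda x, y: x + 2 * y * y) == var_x + 4 * var(lambda x, y: y * y)
b_second = expect(lambda x, y: x * x - x) >= var_x

# Statement (a), second half: the factor of 4 on Var[Y] is missing.
a_second = var(lambda x, y: x + 2 * y) == var_x + var_y

# Statement (d), first half: E[XY^2] equals E[X]E[Y^2], not E[X]E[Y]^2.
d_first = expect(lambda x, y: x * y * y) == expect(lambda x, y: x) * expect(lambda x, y: y) ** 2
```

A single example cannot prove (b) in general, but it is enough to show that the second half of (a) and the first half of (d) fail.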
Q2 Decision Trees [20 pts]

1. The figure below shows a dataset with two inputs X1 and X2 and one output Y, which can take on the values positive (+) or negative (-). There are 16 datapoints: 12 are positive and 4 are negative.
   [figure not reproduced in this preview]
Assume we are testing two extreme decision tree learning algorithms. Algorithm OVERFIT builds a decision tree in the standard fashion, but never prunes. Algorithm UNDERFIT refuses to risk splitting at all, and so the entire decision tree is just one leaf node.
(a) Exactly how many leaf nodes will be in the decision tree learned by OVERFIT on this data?
   [handwritten answer not legible; it refers to the picture above]
(b) What is the leave-one-out classification error of using OVERFIT on our dataset? Report the total number of misclassifications.
   [handwritten answer only partly legible: a held-out point is misclassified when the region around it belongs to the opposing class; the count is not recoverable from the scan]
(c) What is the leave-one-out classification error of using UNDERFIT on our dataset? Report the total number of misclassifications.
   Every held-out positive is classified correctly, and every held-out negative is assigned to the positive majority and misclassified. Ans: 4
(d) Now, suppose we are learning a decision tree from a dataset with M binary-valued inputs and R training points. What is the maximum possible number of leaves in the decision tree? Circle one of the following answers:
   R, log2(R), R^2, 2^R, M, log2(M), M^2, 2^M,
   min(R, M), min(R, log2(M)), min(R, M^2), min(R, 2^M),
   min(log2(R), M), min(log2(R), log2(M)), min(log2(R), M^2), min(log2(R), 2^M),
   min(R^2, M), min(R^2, log2(M)), min(R^2, M^2), min(R^2, 2^M),
   min(2^R, M), min(2^R, log2(M)), min(2^R, M^2), min(2^R, 2^M),
   max(R, M), max(R, log2(M)), max(R, M^2), max(R, 2^M),
   max(log2(R), M), max(log2(R), log2(M)), max(log2(R), M^2), max(log2(R), 2^M),
   max(R^2, M), max(R^2, log2(M)), max(R^2, M^2), max(R^2, 2^M),
   max(2^R, M), max(2^R, log2(M)), max(2^R, M^2), max(2^R, 2^M)
   If R < 2^M, then each training point can sit at its own leaf, i.e. R leaves. If R ≥ 2^M, then a tree that splits on all M attributes has at most 2^M leaves.
   Answer: min(R, 2^M)
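Two of the counts in this question do not depend on the missing figure and can be written down directly. The sketch below encodes the min(R, 2^M) bound on the number of leaves and the leave-one-out error of a single-leaf majority classifier. The function names are made up for this example, and treating a tied vote as an error is an assumption.

```python
def max_leaves(r, m):
    """Upper bound on leaves: at most one per training point (r) and at most
    one per distinct assignment of m binary inputs (2**m)."""
    return min(r, 2 ** m)

def underfit_loo_errors(n_pos, n_neg):
    """Leave-one-out errors of a one-leaf tree that predicts the majority
    class of the remaining points. A tie is counted as an error (assumption)."""
    errors = 0
    for held_out, count in (("+", n_pos), ("-", n_neg)):
        pos = n_pos - (1 if held_out == "+" else 0)
        neg = n_neg - (1 if held_out == "-" else 0)
        if pos == neg:
            predicted = None
        else:
            predicted = "+" if pos > neg else "-"
        if predicted != held_out:
            errors += count
    return errors
```

With 12 positives and 4 negatives, holding out any one point leaves a positive majority, so only the 4 negatives are misclassified.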
Q3 Linear Regression

Consider fitting the linear regression model for these data:

   X   -1    0    2
   y    1   -1    1

(a) Fit Y_i = β0 + ε_i (degenerate linear regression), find β0.
   β̂0 = argmin_β0 Σ_i (Y_i - β0)^2 = 1/3
(b) Fit Y_i = β1 X_i + ε_i (linear regression without the constant term), find β1.
   β̂1 = argmin_β1 Σ_i (Y_i - β1 X_i)^2 = Σ_i X_i Y_i / Σ_i X_i^2 = 1/5
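Both least-squares fits have closed forms that are easy to evaluate on the three data points. The sketch below computes the intercept-only estimate (the sample mean of y) and the no-intercept slope Σ x_i y_i / Σ x_i^2 with exact fractions; the variable names are chosen for this example.

```python
from fractions import Fraction as F

xs = [F(-1), F(0), F(2)]
ys = [F(1), F(-1), F(1)]

# Intercept-only model: minimizing sum (y_i - b0)^2 gives the mean of y.
beta0_hat = sum(ys) / len(ys)

# No-intercept model: minimizing sum (y_i - b1 * x_i)^2 gives sum(x*y) / sum(x^2).
beta1_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sse(predict):
    """Sum of squared errors of a prediction function over the data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
```

Perturbing either estimate slightly should increase the corresponding sum of squared errors, which gives a quick check that the closed forms are minimizers.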
Q4 Conditional Independence [5 pts]

1. Consider the following joint distribution over the random variables A, B, and C.

   A  B  C  P(A,B,C)
   0  0  0
   0  1  0
   0  0  1
   0  1  1
   1  0  0
   1  1  0
   1  0  1
   1  1  1
   [the P(A,B,C) column is not legible in the scan]

(a) True or False: A is conditionally independent of B given C.
   True, because for all i, j, k: P(A = i | B = j, C = k) = P(A = i | C = k).
(b) If you answered part (a) with TRUE, make a change to the top two rows of this table to create a joint distribution in which the answer to (a) is FALSE. If you answered part (a) with FALSE, make a change to the top two rows of this table to create a joint distribution in which the answer to (a) is TRUE.
   One possible change:

   A  B  C  P(A,B,C)
   0  0  0  0
   0  1  0  [value not legible]

   Note: any change made to the top two rows must still result in the table representing a joint distribution, i.e. the probabilities must still sum to 1.
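A conditional independence claim such as "A is independent of B given C" can be tested mechanically against a joint table. Because the exam's probability column is not legible, the sketch below uses two hypothetical tables: a uniform joint, and the same joint with the probability mass of the top two rows moved so they still sum to the same total.

```python
from fractions import Fraction as F
from itertools import product

def cond_indep_a_b_given_c(joint):
    """True iff P(a, b | c) = P(a | c) P(b | c) for all a, b and every c with
    P(c) > 0. Keys of joint are (a, b, c); the identity is checked in the
    cross-multiplied form P(a,b,c) * P(c) == P(a,c) * P(b,c)."""
    for c in (0, 1):
        p_c = sum(p for (_, _, cc), p in joint.items() if cc == c)
        if p_c == 0:
            continue
        for a, b in product((0, 1), repeat=2):
            p_ac = sum(joint[(a, bb, c)] for bb in (0, 1))
            p_bc = sum(joint[(aa, b, c)] for aa in (0, 1))
            if joint[(a, b, c)] * p_c != p_ac * p_bc:
                return False
    return True

# Hypothetical table 1: uniform joint, so A and B are independent given C.
uniform = {k: F(1, 8) for k in product((0, 1), repeat=3)}

# Hypothetical table 2: mass moved between rows (0,0,0) and (0,1,0) only.
modified = dict(uniform)
modified[(0, 0, 0)] = F(0)
modified[(0, 1, 0)] = F(1, 4)
```

In the modified table P(A = 0 | B = 0, C = 0) is 0 while P(A = 0 | C = 0) is 1/2, so the check fails even though the table is still a valid distribution.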
Q5 Generative vs Discriminative Classifiers [15 pts]

1. You wish to train a classifier to predict the gender (a boolean variable, G) of a person based on that person's weight (a continuous variable, W) and whether or not they are a graduate student (a boolean variable, S). Assume that W and S are conditionally independent given G. Also, assume that the variance of the probability distribution P(Weight | Gender = female) equals the variance for P(Weight | Gender = male).
(a) Is it reasonable to train a Naive Bayes classifier for this task?
   Yes. W and S are conditionally independent given G.
(b) If not, explain why not, and describe how you might reformulate this problem to allow training a naive Bayes classifier. If so, list every probability distribution your classifier must learn, what form of distribution you would use for each, and give the total number of parameters your classifier must estimate from the training data.
   We must estimate 6 parameters:
   P(G): Bernoulli, 1 parameter (P(G = 0) and P(G = 1) sum to 1, so only one of them needs to be estimated)
   P(S | G): Bernoulli, 2 parameters: θ0 = P(S = 1 | G = 0) and θ1 = P(S = 1 | G = 1)
   P(W | G): Normal, 3 parameters: the mean μ0 of P(W | G = 0), the mean μ1 of P(W | G = 1), and the single variance σ^2 shared by both Normals
(c) Note one difference between the above P(Gender | Weight, Student) problem and the problems we discussed in class is that the above problem involves training a classifier over a combination of boolean and continuous inputs. Now suppose you would like to train a discriminative classifier for this problem, to directly fit the parameters of P(G | W, S), under the conditional independence assumption. Assuming that W and S are conditionally independent given G, is it correct to assume that P(G = 1 | W, S) can be expressed as a conventional logistic function:
   P(G = 1 | W, S) = 1 / (1 + exp(w0 + w1 W + w2 S))
If not, explain why not. If so, prove this.
   Yes. This can be shown by combining the chapter derivation (which covers the case of Normal variables) with the solution to a homework question (which covers Boolean variables). Using our variables G, W, and S, we have
   P(G = 1 | W, S) = 1 / (1 + exp(w0 + w1 W + w2 S))
   where
   w0 = [not legible in the scan]
   w1 = (μ0 - μ1) / σ^2
   w2 = ln [ θ0 (1 - θ1) / (θ1 (1 - θ0)) ]
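The claim that the Naive Bayes posterior takes the logistic form 1 / (1 + exp(w0 + w1 W + w2 S)) can be checked numerically. The sketch below uses made-up parameter values and assumes π denotes P(G = 1), θ_g = P(S = 1 | G = g), and μ_g is the mean of W given G = g with a shared standard deviation σ. The expression for w0 is derived here under those assumptions and is not read from the handwritten answer.

```python
import math

# Hypothetical parameter values, for illustration only.
pi = 0.4             # assumed to mean P(G = 1)
theta = (0.3, 0.6)   # theta[g] = P(S = 1 | G = g)
mu = (80.0, 65.0)    # mu[g] = mean of W given G = g
sigma = 12.0         # shared standard deviation of W given G

def normal_pdf(w, mean, sd):
    return math.exp(-((w - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def nb_posterior(w, s):
    """P(G = 1 | W = w, S = s) from Bayes' rule with the Naive Bayes factorization."""
    def joint(g):
        prior = pi if g == 1 else 1 - pi
        p_s = theta[g] if s == 1 else 1 - theta[g]
        return prior * p_s * normal_pdf(w, mu[g], sigma)
    return joint(1) / (joint(0) + joint(1))

# Logistic weights obtained by expanding ln[P(G=0, W, S) / P(G=1, W, S)].
w0 = (math.log((1 - pi) / pi)
      + (mu[1] ** 2 - mu[0] ** 2) / (2 * sigma ** 2)
      + math.log((1 - theta[0]) / (1 - theta[1])))
w1 = (mu[0] - mu[1]) / sigma ** 2
w2 = math.log(theta[0] * (1 - theta[1]) / (theta[1] * (1 - theta[0])))

def logistic_posterior(w, s):
    return 1 / (1 + math.exp(w0 + w1 * w + w2 * s))
```

The shared variance matters here: it makes the quadratic terms in W cancel, so the exponent stays linear in W.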
Q6 Neural Networks [20 pts]

1. For this question, suppose we have a Neural Network (shown below) with linear activation units. In other words, the output of each unit is a constant C multiplied by the weighted sum of inputs.
   [network figure not reproduced in this preview]
(a) Can any function that is represented by the above network also be represented by a single unit ANN (or perceptron)? If so, draw the equivalent perceptron, detailing the weights and the activation function. Otherwise, explain why not.
   [handwritten answer not legible in the scan]
(b) Can the space of functions that is represented by the above ANN also be represented by linear regression? (Yes/No)
   Yes. [The handwritten justification expands the network output into a linear function of the inputs with coefficients β1 and β2; the individual weight products are not legible.]
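Stacking units whose output is C times a weighted sum of inputs still yields a linear function of the inputs. Since the network figure is not available, the sketch below assumes a hypothetical layout with two inputs, two hidden units, and one output unit, and made-up weights; it compares the network output with a single linear map whose coefficients are products of the layer weights.

```python
C = 0.5  # hypothetical activation constant

def unit(weights, inputs):
    """A linear unit: C times the weighted sum of its inputs."""
    return C * sum(w * x for w, x in zip(weights, inputs))

def network(x1, x2, hidden_w, out_w):
    """Assumed layout: two hidden linear units feeding one linear output unit."""
    hidden = [unit(w, (x1, x2)) for w in hidden_w]
    return unit(out_w, hidden)

def collapsed_coefficients(hidden_w, out_w):
    """Coefficients (beta1, beta2) with network(x1, x2) == beta1*x1 + beta2*x2."""
    return tuple(
        C * C * sum(o * hw[j] for o, hw in zip(out_w, hidden_w))
        for j in range(2)
    )

# Made-up weights for the example.
hidden_w = [(1.0, -2.0), (3.0, 4.0)]
out_w = (2.0, -1.0)
beta1, beta2 = collapsed_coefficients(hidden_w, out_w)
```

The same collapsed coefficients could serve as a linear regression without an intercept, or as the weights of one linear unit after dividing by C.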
2. Consider the XOR function: Y = (X1 ∧ ¬X2) ∨ (¬X1 ∧ X2). We can also express this as:
   Y > 1/2   if X1 ≠ X2
   Y < 1/2   otherwise
It is well known that XOR cannot be implemented by a single perceptron. Draw a fully connected three unit ANN that has binary inputs X1, X2, 1 and output Y. Select weights that implement Y = (X1 XOR X2). For this question, assume the sigmoid activation function:
   y = 1 / (1 + exp(-(w0 + w1 x1 + w2 x2)))
   [Handwritten answer, partly legible:] We use the representation Y = (X1 ∧ ¬X2) ∨ (¬X1 ∧ X2). [The drawing and the chosen weights are not legible in the scan.]
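One way to realize XOR with three sigmoid units follows the decomposition Y = (X1 ∧ ¬X2) ∨ (¬X1 ∧ X2): two hidden units approximate the two conjunctions and the output unit approximates their OR. The weights below are hypothetical values picked for this sketch, not the ones from the handwritten answer, and any direct input-to-output connections are taken to have weight 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights (w0, w1, w2) for each unit; w0 multiplies the constant input 1.
HIDDEN_A = (-10.0, 20.0, -20.0)  # roughly X1 AND NOT X2
HIDDEN_B = (-10.0, -20.0, 20.0)  # roughly NOT X1 AND X2
OUTPUT = (-10.0, 20.0, 20.0)     # roughly A OR B

def unit(weights, in1, in2):
    """Sigmoid unit applied to w0 + w1*in1 + w2*in2."""
    w0, w1, w2 = weights
    return sigmoid(w0 + w1 * in1 + w2 * in2)

def xor_net(x1, x2):
    a = unit(HIDDEN_A, x1, x2)
    b = unit(HIDDEN_B, x1, x2)
    return unit(OUTPUT, a, b)
```

Large weight magnitudes push each sigmoid close to 0 or 1 on binary inputs, which is what lets the units behave like logic gates.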
 Spring '09
 Lanzi
