Data Mining CS57300 Purdue University November 9, 2010
Bayes net learning
Unknown structure
• Set of nodes (variables) is specified
• Inducer needs to estimate edges and parameters

[Figure: the inducer is given data sampled from the true network and must recover both its edges and its CPTs (shown as question marks). The true network's CPT for the alarm example:]

E    B    P(a|E,B)   P(¬a|E,B)
e    b    0.9        0.1
e    ¬b   0.7        0.3
¬e   b    0.8        0.2
¬e   ¬b   1          0

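As a toy illustration of the parameter-estimation half of the inducer's job, a maximum-likelihood CPT can be read off complete data by counting. This sketch is illustrative only; the function name and the data for the E/B/A alarm example are invented, not taken from the lecture.

```python
from collections import Counter

def estimate_cpt(samples, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data.
    samples: list of dicts mapping variable name -> observed value.
    Returns {parent_values_tuple: {child_value: probability}}."""
    joint, marg = Counter(), Counter()
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(pa, s[child])] += 1
        marg[pa] += 1
    cpt = {}
    for (pa, v), n in joint.items():
        cpt.setdefault(pa, {})[v] = n / marg[pa]
    return cpt

# Hypothetical complete data over E (earthquake), B (burglary), A (alarm)
data = [
    {"E": 1, "B": 1, "A": 1}, {"E": 1, "B": 1, "A": 1},
    {"E": 1, "B": 0, "A": 1}, {"E": 1, "B": 0, "A": 0},
    {"E": 0, "B": 0, "A": 0}, {"E": 0, "B": 0, "A": 0},
]
cpt = estimate_cpt(data, "A", ["E", "B"])
# cpt[(1, 1)][1] == 1.0 (both E=1, B=1 samples have A=1)
# cpt[(1, 0)][1] == 0.5
```

Estimating the edges, by contrast, is the structure-learning problem the rest of the lecture addresses.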
Learning structure
• Adding an arc
  • Increases the number of parameters to be fitted
  • Wrong assumptions about causality and domain structure
• Missing an arc
  • Cannot be compensated by accurate fitting of parameters
  • Also misses causality and domain structure

[Figure: example networks (Earthquake, Burglary, Alarm) illustrating an added and a missing arc]

Approaches to learning structure
• Constraint-based
  • Performs tests of conditional independence
  • Searches for a network that is consistent with the observed dependencies and independencies
• Strengths
  • Intuitive; separates structure learning from the form of the independence tests
• Weaknesses
  • Sensitive to errors in individual tests

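One common choice of conditional-independence test is the likelihood-ratio (G) test. The sketch below is a stdlib-only illustration, not from the lecture: the function name and data are invented, and a real implementation would also compute degrees of freedom and a p-value before accepting or rejecting independence.

```python
import math
from collections import Counter

def g_statistic(samples, x, y, z):
    """Likelihood-ratio (G) statistic for the hypothesis X independent
    of Y given Z. Large values are evidence against conditional
    independence; in practice the value is compared with a chi-square
    quantile whose degrees of freedom depend on the variables' arities."""
    nxyz, nxz, nyz, nz = Counter(), Counter(), Counter(), Counter()
    for s in samples:
        zz = tuple(s[v] for v in z)
        nxyz[(s[x], s[y], zz)] += 1
        nxz[(s[x], zz)] += 1
        nyz[(s[y], zz)] += 1
        nz[zz] += 1
    g = 0.0
    for (xv, yv, zz), n in nxyz.items():
        expected = nxz[(xv, zz)] * nyz[(yv, zz)] / nz[zz]
        g += 2.0 * n * math.log(n / expected)
    return g

# X and Y perfectly dependent vs. perfectly independent (empty Z):
dep = [{"X": 0, "Y": 0}, {"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 1}]
ind = [{"X": a, "Y": b} for a in (0, 1) for b in (0, 1)]
# g_statistic(dep, "X", "Y", []) is large; g_statistic(ind, "X", "Y", []) is 0
```

Errors in individual tests like this one (e.g. from small counts in some Z configuration) are exactly the weakness the slide mentions.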
Approaches to learning structure
• Score-based
  • Define a score that evaluates how well the (in)dependencies of a given structure match the observations
  • Search for a structure that maximizes the score
• Strengths
  • Statistically motivated; can make compromises between fitting the data and complexity
  • Takes the structure of conditional probabilities into account
• Weaknesses
  • Computationally difficult

Score function
• Bayesian score: log P(G | D) = log P(D | G) + log P(G) + c
  • P(D | G): marginal likelihood
  • P(G): prior over structures
  • c = -log P(D): by Bayes rule, P(D) (the probability of the data) is the same for all structures, so it can be ignored when comparing structures

Optimization problem
• Input
  • Training data
  • Scoring function (including priors, if needed)
  • Set of possible structures (including prior knowledge about structures)
• Output
  • A network that maximizes the score
• Key property
  • The score of a network is a sum of terms, due to the decomposable nature of the model

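To make the key property concrete, here is a hedged sketch using a BIC-style penalized likelihood as the score (one plausible instance of the kind of score the slides have in mind; function names and toy data are invented). Note how the network score is literally a sum of per-family terms.

```python
import math
from collections import Counter

def family_score(samples, child, parents):
    """BIC term for one family: maximized log-likelihood of the CPT
    minus a complexity penalty of (1/2) * (#free params) * log N."""
    joint, marg, vals = Counter(), Counter(), set()
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(pa, s[child])] += 1
        marg[pa] += 1
        vals.add(s[child])
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = len(marg) * (len(vals) - 1)
    return loglik - 0.5 * n_params * math.log(len(samples))

def bic_score(samples, structure):
    """structure: {variable: list of parents}. The network score is
    a plain sum of family terms, which is what makes local search cheap."""
    return sum(family_score(samples, v, ps) for v, ps in structure.items())

# Toy data where A is an exact copy of B.
data = [{"A": b, "B": b} for b in (0, 1) for _ in range(5)]
empty = {"A": [], "B": []}
with_edge = {"A": ["B"], "B": []}
# bic_score(data, with_edge) > bic_score(data, empty): the edge pays for itself
```

Because each variable contributes one independent term, changing one variable's parent set changes exactly one term of the sum.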
Learning structure
• Theorem: finding a maximal scoring network structure with at most k parents for each variable is NP-hard for k > 1
• This problem is addressed by using heuristic search:
  • Model space: all nodes, set of possible edges
  • Search operators: add edge, delete edge, reverse edge
  • Evaluation: penalized likelihood
  • Search techniques: greedy hill-climbing, best-first search, simulated annealing

Search operations
• Typical operations: add an edge, delete an edge, reverse an edge

[Figure: example network illustrating the three local moves]

Exploiting decomposability
• To update the score after a local change, we only need to re-score the families that were changed in the last move

[Figure: example network with the changed family highlighted]

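Because the score decomposes over families, a single-edge move can be evaluated by recomputing only the affected family's term. A minimal sketch, again assuming a BIC-style score (all names and data are illustrative, not from the lecture):

```python
import math
from collections import Counter

def family_score(samples, child, parents):
    """BIC term for a single family; the network score is the sum of these."""
    joint, marg, vals = Counter(), Counter(), set()
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(pa, s[child])] += 1
        marg[pa] += 1
        vals.add(s[child])
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    return loglik - 0.5 * len(marg) * (len(vals) - 1) * math.log(len(samples))

def score_delta_add_edge(samples, structure, u, v):
    """Score change from adding u -> v: only v's family term changes,
    so every other family's score can be reused from the last move."""
    return (family_score(samples, v, structure[v] + [u])
            - family_score(samples, v, structure[v]))

# A copies B; C is independent noise.
data = [{"A": b, "B": b, "C": c} for b in (0, 1) for c in (0, 1) for _ in range(3)]
empty = {"A": [], "B": [], "C": []}
# score_delta_add_edge(data, empty, "B", "A") > 0 (the edge pays off)
# score_delta_add_edge(data, empty, "C", "A") < 0 (it only adds penalty)
```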
Greedy hill-climbing
• Start with a given network
  • Empty network, best tree, or random network
• At each iteration
  • Evaluate all possible changes
  • Apply the change that leads to the best improvement in score
• Iterate; stop when no modification improves the score
• Each step requires evaluating O(m^2) new changes

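The loop above can be sketched as follows, assuming a BIC score and only add/delete moves (edge reversal is omitted for brevity, and all names are illustrative rather than from the lecture):

```python
import math
from collections import Counter

def bic(samples, structure):
    """Decomposable BIC score: one penalized-likelihood term per family."""
    total, n = 0.0, len(samples)
    for child, parents in structure.items():
        joint, marg, vals = Counter(), Counter(), set()
        for s in samples:
            pa = tuple(s[p] for p in parents)
            joint[(pa, s[child])] += 1
            marg[pa] += 1
            vals.add(s[child])
        total += sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
        total -= 0.5 * len(marg) * (len(vals) - 1) * math.log(n)
    return total

def has_path(structure, src, dst):
    """Directed path src -> ... -> dst? (Used to keep the graph acyclic.)"""
    stack, seen = [src], set()
    while stack:
        w = stack.pop()
        if w == dst:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(x for x, ps in structure.items() if w in ps)
    return False

def neighbors(structure):
    """All structures one edge-addition or edge-deletion away."""
    for u in structure:
        for v in structure:
            if u == v:
                continue
            cand = {k: list(ps) for k, ps in structure.items()}
            if u in structure[v]:
                cand[v].remove(u)                    # delete u -> v
                yield cand
            elif not has_path(structure, v, u):
                cand[v].append(u)                    # add u -> v (stays acyclic)
                yield cand

def hill_climb(samples, variables):
    current = {v: [] for v in variables}             # start from the empty network
    best = bic(samples, current)
    while True:
        best_cand, best_cand_score = None, float("-inf")
        for cand in neighbors(current):              # evaluate all possible changes
            s = bic(samples, cand)
            if s > best_cand_score:
                best_cand, best_cand_score = cand, s
        if best_cand_score <= best:                  # no modification improves the score
            return current
        current, best = best_cand, best_cand_score   # apply the best change

# A copies B; C is independent, so the search should stop after one edge.
data = [{"A": b, "B": b, "C": c} for b in (0, 1) for c in (0, 1) for _ in range(3)]
learned = hill_climb(data, ["A", "B", "C"])
# learned contains a single edge linking A and B (direction depends on tie-breaking)
```

A production implementation would cache family scores between iterations instead of re-scoring every candidate from scratch; this sketch favors clarity over speed.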
Possible pitfalls
• Greedy hill-climbing can get stuck in:
  • Local maxima: all one-edge changes reduce the score
  • Plateaus: some one-edge changes leave the score unchanged
    • Equivalent networks are neighbors in the search space and receive the same score
• Standard heuristics can escape both
  • Random restarts
  • TABU search

This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue University-West Lafayette.
