CS221 Lecture notes #11
Bayesian networks

1  Probability Review

This material on Bayesian networks (Bayes nets) relies heavily on several concepts from probability theory, and here we give a very brief review of these concepts. For more complete coverage, see Chapter 13 of the class textbook.

Although the Bayes net framework can accommodate continuous random variables, we will limit ourselves to discrete random variables X which can take on a finite set of values x_1, ..., x_d. In general, we will use uppercase letters to denote random variables and lowercase letters to denote the values those variables may take on. The probability that X takes the value x will be denoted P(X = x), or, when there is no risk of ambiguity, P(x).

The joint distribution over n random variables X_1, ..., X_n encodes the probability of a particular assignment to all of the variables, i.e. P(X_1 = x_1, ..., X_n = x_n), or simply P(x_1, ..., x_n).

The conditional probability that a random variable X takes on the value x given that some other random variable Y takes on the value y is written P(x | y), and is defined as:

    P(x | y) = P(x, y) / P(y).

More generally, for a set of random variables X_1, ..., X_m and Y_1, ..., Y_n, we can write:

    P(x_1, ..., x_m | y_1, ..., y_n) = P(x_1, ..., x_m, y_1, ..., y_n) / P(y_1, ..., y_n).

We will use bold letters to denote sets of random variables and the values they might take. If X = {X_1, ..., X_m} and Y = {Y_1, ..., Y_n}, we can rewrite this definition as:

    P(x | y) = P(x, y) / P(y).

We refer to the quantity P(Y = y) as the marginal probability of Y when we want to emphasize that we are ignoring X. If we are given the joint distribution over two sets of random variables X and Y, and x and y denote joint assignments to X and Y, we can retrieve the marginal probability of Y by marginalizing over all of the possible assignments to X, i.e.:

    P(y) = Σ_x P(x, y).
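The definitions above can be made concrete with a small numerical sketch (not part of the original notes): we store a joint distribution P(X, Y) over two binary variables as a table, then compute a marginal by summing out X and a conditional by dividing the joint by the marginal. The variable names and the particular probabilities are illustrative assumptions.

```python
# Joint distribution P(X, Y) over X in {0, 1} and Y in {0, 1},
# stored as a table mapping (x, y) -> probability. Values are illustrative.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

def marginal_y(joint, y):
    """Marginal probability P(y) = sum_x P(x, y), summing X out of the joint."""
    return sum(p for (x_, y_), p in joint.items() if y_ == y)

def conditional(joint, x, y):
    """Conditional probability P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(joint, y)

print(marginal_y(joint, 0))      # P(Y=0) = 0.30 + 0.10 = 0.40
print(conditional(joint, 0, 0))  # P(X=0 | Y=0) = 0.30 / 0.40 = 0.75
```

The same table representation extends directly to joint assignments over sets of variables: the keys simply become longer tuples, and marginalizing sums over every position being ignored.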
By plugging this equation into our definition of conditional probability, we get Bayes' Rule:

    P(x | y) = P(x, y) / Σ_{x'} P(x', y).

Two sets of random variables X and Y are independent if

    P(x, y) = P(x) P(y).

By dividing both sides through by P(y), we see that this definition is equivalent to

    P(x | y) = P(x).

More generally, we say two sets of random variables X and Y are conditionally independent given a third set of random variables Z if

    P(x, y | z) = P(x | z) P(y | z),

or equivalently,

    P(x | y, z) = P(x | z).

Finally, it is worth noting the chain rule for joint probabilities. If X and Y are two sets of random variables, we can simply multiply both sides of the definition of conditional probability by P(y) to find that

    P(x, y) = P(x | y) P(y).
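These identities can be checked numerically on a small joint table, a minimal sketch under illustrative assumptions (the table and variable names are not from the notes): Bayes' Rule in the form given above, the chain rule, and the independence criterion P(x, y) = P(x) P(y).

```python
# Joint distribution P(X, Y) over two binary variables; values illustrative.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

def p_x(x):
    """Marginal P(x) = sum_y P(x, y)."""
    return sum(p for (x_, y), p in joint.items() if x_ == x)

def p_y(y):
    """Marginal P(y) = sum_x P(x, y)."""
    return sum(p for (x, y_), p in joint.items() if y_ == y)

def p_x_given_y(x, y):
    """Bayes' Rule in the form above: P(x | y) = P(x, y) / sum_x' P(x', y)."""
    return joint[(x, y)] / p_y(y)

# Chain rule: P(x, y) = P(x | y) P(y) must hold for every assignment.
assert all(abs(joint[(x, y)] - p_x_given_y(x, y) * p_y(y)) < 1e-12
           for x in (0, 1) for y in (0, 1))

# Independence: X and Y are independent iff P(x, y) = P(x) P(y) everywhere.
independent = all(abs(joint[(x, y)] - p_x(x) * p_y(y)) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(independent)  # False here: P(X=0, Y=0) = 0.30, but P(X=0) P(Y=0) = 0.20
```

Checking independence requires the factorization to hold for every joint assignment; a single cell where it fails, as here, makes the variables dependent.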
This note was uploaded on 12/15/2009 for the course CS 221 taught by Professor Koller,ng during the Fall '09 term at Stanford.