CMSC 828
Graphical Models: An Introduction
Lise Getoor, Computer Science Dept., University of Maryland, http://www.cs.umd.edu/~getoor
Reading List for Next Lecture
Learning Probabilistic Relational Models, L. Getoor, N. Friedman, D. Koller, A. Pfeffer. Invited contribution to the book Relational Data Mining, S. Dzeroski and N. Lavrac, Eds., Springer-Verlag, 2001. http://www.cs.umd.edu/~getoor/Publications/lprmch.ps http://www.cs.umd.edu/class/spring2005/cmsc828g/Readings/lprmch.pdf
Probabilistic Models for Relational Data, David Heckerman, Christopher Meek and Daphne Koller http://www.cs.umd.edu/projects/srl2004/Papers/heckerman.pdf ftp://ftp.research.microsoft.com/pub/tr/TR200430.pdf
Graphical Models
e.g. Bayesian networks, Bayes nets, belief nets, Markov networks, HMMs, dynamic Bayes nets, etc.
Themes: representation, reasoning, learning
Materials based on upcoming book by Nir Friedman and Daphne Koller. Slides based on material from Nir Friedman.
Probability Distributions
Let X1,...,Xp be discrete random variables
Let P be a joint distribution over X1,...,Xp
If the variables are binary, then we need O(2^p) parameters to describe P
Can we do better?
Key idea: use properties of independence
Independent Random Variables
Two variables X and Y are independent if
P(X = x | Y = y) = P(X = x) for all values x, y
That is, learning the value of Y does not change the prediction of X
If X and Y are independent then
P(X,Y) = P(X|Y)P(Y) = P(X)P(Y)
In general, if X1,...,Xp are independent, then
P(X1,...,Xp) = P(X1)...P(Xp)
Requires O(p) parameters
Conditional Independence
Unfortunately, most random variables of interest are not independent of each other
A more suitable notion is that of conditional independence
Two variables X and Y are conditionally independent given Z if
P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all values x, y, z
That is, learning the value of Y does not change the prediction of X once we know the value of Z
Notation: I(X, Y | Z)
Example: Naive Bayesian Model
A common model in early diagnosis:
Symptoms are conditionally independent given the disease (or fault)
Thus, if X1,...,Xp denote the symptoms exhibited by the patient (headache, high fever, etc.) and H denotes the hypothesis about the patient's health,
then P(X1,...,Xp,H) = P(H)P(X1|H)...P(Xp|H)
This naive Bayesian model allows a compact representation
It does embody strong independence assumptions
Graphical Models
Graph is language for representing independencies
Directed Acyclic Graph -> Bayesian Network
Undirected Graph -> Markov Network
DAGS: Markov Assumption
We now make this independence assumption more precise for directed acyclic graphs (DAGs)
Each random variable X is independent of its non-descendants, given its parents Pa(X)
Formally, I(X, NonDesc(X) | Pa(X))
[Figure: a DAG containing X, Y1 and Y2, with nodes labelled Ancestor, Parent, Non-descendant and Descendant relative to X]
Markov Assumption Example
[Figure: DAG over Earthquake (E), Burglary (B), Radio (R), Alarm (A), Call (C)]
In this example:
I(E, B)
I(B, {E, R})
I(R, {A, B, C} | E)
I(A, R | B, E)
I(C, {B, E, R} | A)
I-Maps
A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P
(Assuming G and P both use the same set of random variables)
Examples:
[Figure: two example DAGs over X and Y]
x  y  P(x,y)
0  0  0.25
0  1  0.25
1  0  0.25
1  1  0.25

x  y  P(x,y)
0  0  0.2
0  1  0.3
1  0  0.4
1  1  0.1
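Whether a two-node DAG with no arc is an I-Map of one of the two tables above comes down to checking I(X, Y) numerically. The following is a minimal sketch of that check; the function name `is_independent` and the tolerance are illustrative assumptions, not part of the slides.

```python
from itertools import product

def is_independent(joint, tol=1e-9):
    """Return True if P(x, y) equals P(x) * P(y) for every pair (x, y)."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    # Marginals obtained by summing out the other variable.
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in product(xs, ys))

# The two joint tables from the slide.
uniform = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
skewed = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.1}

print(is_independent(uniform))  # True: the arc-free DAG is an I-Map
print(is_independent(skewed))   # False: P(0,0) = 0.2 but P(x=0) * P(y=0) = 0.3
```

A DAG with an arc between X and Y implies no independencies, so it is an I-Map of both tables.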
Factorization
Given that G is an I-Map of P, can we simplify the representation of P?
Example:
[Figure: DAG over X and Y with no arc]
Since I(X,Y), we have that P(X|Y) = P(X)
Applying the chain rule, P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
Thus, we have a simpler representation of P(X,Y)
Factorization Theorem
Thm: if G is an I-Map of P, then
$P(X_1,\ldots,X_p) = \prod_i P(X_i \mid Pa(X_i))$
Proof:
By the chain rule: $P(X_1,\ldots,X_p) = \prod_i P(X_i \mid X_1,\ldots,X_{i-1})$
wlog. $X_1,\ldots,X_p$ is an ordering consistent with G
From this assumption: $Pa(X_i) \subseteq \{X_1,\ldots,X_{i-1}\}$ and $\{X_1,\ldots,X_{i-1}\} - Pa(X_i) \subseteq NonDesc(X_i)$
Since G is an I-Map, $I(X_i, NonDesc(X_i) \mid Pa(X_i))$
Hence, $I(X_i, \{X_1,\ldots,X_{i-1}\} - Pa(X_i) \mid Pa(X_i))$
We conclude, $P(X_i \mid X_1,\ldots,X_{i-1}) = P(X_i \mid Pa(X_i))$
Factorization Example
[Figure: DAG over Earthquake (E), Burglary (B), Radio (R), Alarm (A), Call (C)]
P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
versus
P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
Consequences
We can write P in terms of "local" conditional probabilities
If G is sparse, that is, |Pa(Xi)| < k, each conditional probability can be specified compactly
e.g. for binary variables, these require O(2^k) params.
The representation of P is compact: linear in the number of variables
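To make the saving concrete, the sketch below counts free parameters for binary variables, once for an unrestricted joint table and once for the factorized form. The parent sets are read off the factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A); the helper names are illustrative assumptions.

```python
def full_joint_params(n_vars):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n_vars - 1

def bn_params(parents):
    """Free parameters of a binary Bayesian network.

    Each variable needs one number, P(X = 1 | pa), per parent configuration.
    """
    return sum(2 ** len(pa) for pa in parents.values())

# Parent sets taken from P(B) P(E) P(R|E) P(A|B,E) P(C|A).
alarm_parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

print(full_joint_params(len(alarm_parents)))  # 31
print(bn_params(alarm_parents))               # 1 + 1 + 2 + 4 + 2 = 10
```

With a bounded number of parents per node, the second count grows linearly in the number of variables while the first doubles with every added variable.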
DAGS: Summary
The Markov independencies of a DAG G
$I(X_i, NonDesc(X_i) \mid Pa_i)$
G is an I-Map of a distribution P
if P satisfies the Markov independencies implied by G
If G is an I-Map of P, then
$P(X_1,\ldots,X_n) = \prod_i P(X_i \mid Pa_i)$
Conditional Independencies
Let Markov(G) be the set of Markov independencies implied by G
The factorization theorem shows
G is an I-Map of P $\Rightarrow$ $P(X_1,\ldots,X_n) = \prod_i P(X_i \mid Pa_i)$
We can also show the opposite:
Thm: $P(X_1,\ldots,X_n) = \prod_i P(X_i \mid Pa_i)$ $\Rightarrow$ G is an I-Map of P
Implied Independencies
Does a graph G imply additional independencies as a consequence of Markov(G)?
We can define a logic of independence statements
Some axioms:
$I(X; Y \mid Z) \Rightarrow I(Y; X \mid Z)$
$I(X; Y_1, Y_2 \mid Z) \Rightarrow I(X; Y_1 \mid Z)$
d-separation
A procedure dsep(X; Y | Z, G) that, given a DAG G and sets X, Y, and Z, returns either yes or no
Goal: dsep(X; Y | Z, G) = yes iff I(X; Y | Z) follows from Markov(G)
Paths
Intuition: dependency must "flow" along paths in the graph
A path is a sequence of neighboring variables
Examples:
R <- E -> A <- B
C <- A <- E -> R
[Figure: DAG over Earthquake, Burglary, Radio, Alarm, Call]
Paths
We want to know when a path is
active: creates dependency between end nodes
blocked: cannot create dependency between end nodes
We want to classify situations in which paths are active.
Path Blockage
Three cases:
Common cause: R <- E -> A
  Blocked when E is observed; unblocked (active) otherwise
Intermediate cause: E -> A -> C
  Blocked when A is observed; unblocked (active) otherwise
Common effect: E -> A <- B (with C a descendant of A)
  Blocked when neither A nor its descendant C is observed; unblocked (active) when A or C is observed
Path Blockage: General Case
A path is active, given evidence Z, if
whenever we have the configuration A -> B <- C on the path, B or one of its descendants is in Z
no other nodes in the path are in Z
A path is blocked, given evidence Z, if it is not active.
Example
[Figure: DAG with E -> R, E -> A, B -> A, A -> C]
dsep(R,B) = yes
dsep(R,B|A) = no
dsep(R,B|E,A)?
d-Separation
X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z.
Checking d-separation can be done efficiently (linear time in the number of edges)
Bottom-up phase: mark all nodes whose descendants are in Z
X to Y phase: traverse (BFS) all edges on paths from X to Y and check if they are blocked
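A d-separation test can be sketched in a few lines. Note that the code below does not implement the two-phase marking procedure described above. It swaps in an equivalent textbook criterion: X and Y are d-separated given Z exactly when Z separates them in the moralized graph of the ancestors of X, Y and Z. The function names and the parent-list encoding of the alarm network are illustrative assumptions.

```python
from collections import deque

def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen

def dsep(xs, ys, zs, parents):
    """True iff every path from xs to ys is blocked given zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(xs | ys | zs, parents)
    # Moralize the ancestral subgraph: connect each child to its parents
    # and "marry" every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        pas = list(parents[child])
        for p in pas:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Breadth-first search from xs that never steps onto an observed node.
    frontier = deque(xs - zs)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in ys:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in zs:
                seen.add(w)
                frontier.append(w)
    return True

# Alarm network: E -> R, E -> A, B -> A, A -> C.
alarm = {"E": [], "B": [], "R": ["E"], "A": ["E", "B"], "C": ["A"]}

print(dsep({"R"}, {"B"}, set(), alarm))       # True
print(dsep({"R"}, {"B"}, {"A"}, alarm))       # False
print(dsep({"R"}, {"B"}, {"E", "A"}, alarm))  # True
```

Observing A activates the common-effect path R <- E -> A <- B, and additionally observing E blocks it again at the common cause.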
Soundness
Thm: If G is an I-Map of P and dsep(X; Y | Z, G) = yes, then P satisfies I(X; Y | Z)
Informally: any independence reported by d-separation is satisfied by the underlying distribution
Completeness
Thm: If dsep(X; Y | Z, G) = no, then there is a distribution P such that
G is an I-Map of P
P does not satisfy I(X; Y | Z)
Informally: any independence not reported by d-separation might be violated by the underlying distribution
We cannot determine this by examining the graph structure alone
I-Maps revisited
The fact that G is an I-Map of P might not be that useful
For example, complete DAGs
A DAG G is complete if we cannot add an arc without creating a cycle
[Figure: two complete DAGs over X1, X2, X3, X4]
These DAGs do not imply any independencies
Thus, they are I-Maps of any distribution
Minimal I-Maps
A DAG G is a minimal I-Map of P if
G is an I-Map of P
If G' $\subset$ G, then G' is not an I-Map of P
Removing any arc from G introduces (conditional) independencies that do not hold in P
Minimal I-Map Example
[Figure: a DAG over X1, X2, X3, X4]
If this is a minimal I-Map, then these are not I-Maps:
[Figure: four other DAGs over X1, X2, X3, X4]
Constructing minimal I-Maps
The factorization theorem suggests an algorithm
Fix an ordering $X_1,\ldots,X_n$
For each i, select $Pa_i$ to be a minimal subset of $\{X_1,\ldots,X_{i-1}\}$ such that $I(X_i; \{X_1,\ldots,X_{i-1}\} - Pa_i \mid Pa_i)$
Clearly, the resulting graph is a minimal I-Map.
Non-uniqueness of minimal I-Map
Unfortunately, there may be several minimal I-Maps for the same distribution
Applying the I-Map construction procedure with different orders can lead to different structures
[Figure: the original I-Map over E, B, R, A, C, next to the minimal I-Map obtained with the order C, R, A, E, B]
Choosing Ordering & Causality
The choice of order can have a drastic impact on the complexity of the minimal I-Map
Heuristic argument: construct the I-Map using a causal ordering among variables
Justification?
It is often reasonable to assume that graphs of causal influence should satisfy the Markov properties.
P-Maps
A DAG G is a P-Map (perfect map) of a distribution P if
I(X; Y | Z) if and only if dsep(X; Y | Z, G) = yes
Notes:
A P-Map captures all the independencies in the distribution
P-Maps are unique, up to DAG equivalence
Unfortunately, some distributions do not have a P-Map
Bayesian Networks
A Bayesian network specifies a probability distribution via two components:
A DAG G
A collection of conditional probability distributions $P(X_i \mid Pa_i)$
The joint distribution P is defined by the factorization
$P(X_1,\ldots,X_n) = \prod_i P(X_i \mid Pa_i)$
Additional requirement: G is a minimal I-Map of P
DAGs and BNs
DAGs as a representation of conditional independencies:
Markov independencies of a DAG
Tight correspondence between Markov(G) and the factorization defined by G
d-separation, a sound & complete procedure for computing the consequences of the independencies
Notion of minimal I-Map
P-Maps
This theory is the basis for defining Bayesian networks
Undirected Graphs: Markov Networks
Alternative representation of conditional independencies
Let U be an undirected graph
Let $N_i$ be the set of neighbors of $X_i$
Define Markov(U) to be the set of independencies $I(X_i; \{X_1,\ldots,X_n\} - N_i - \{X_i\} \mid N_i)$
U is an I-Map of P if P satisfies Markov(U)
Example
[Figure: undirected cycle over A, B, C, D]
This graph implies that
I(A; C | B, D)
I(B; D | A, C)
Note: this example does not have a directed P-Map
Markov Network Factorization
Thm: if P is strictly positive, that is $P(x_1,\ldots,x_n) > 0$ for all assignments, then U is an I-Map of P if and only if there is a factorization
$P(X_1,\ldots,X_n) = \frac{1}{Z} \prod_i f_i(C_i)$
where $C_1,\ldots,C_k$ are the maximal cliques in U
Alternative form:
$P(X_1,\ldots,X_n) = \frac{1}{Z} e^{\sum_i g_i(C_i)}$
Relationship between Directed & Undirected Models
Chain Graphs
Directed Graphs
Undirected Graphs
CPDs
So far, we focused on how to represent independencies using DAGs
The "other" component of a Bayesian network is the specification of the conditional probability distributions (CPDs)
Here, we'll just discuss the simplest representation of CPDs
Tabular CPDs
When the variables of interest are all discrete, the common representation is as a table
For example P(C | A, B) can be represented by
A  B  P(C=0 | A,B)  P(C=1 | A,B)
0  0  0.25          0.75
0  1  0.50          0.50
1  0  0.12          0.88
1  1  0.33          0.67
Tabular CPDs
Pros:
Very flexible, can capture any CPD of discrete variables
Can be easily stored and manipulated
Cons:
Representation size grows exponentially with the number of parents!
Unwieldy to assess probabilities for more than a few parents
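As a small illustration of both points, the table for P(C | A, B) can be stored as a dictionary keyed by the parent assignment, and the number of entries can be computed for any number of parents. This is only a sketch; the names `cpd_c`, `is_valid_cpd` and `table_size` are illustrative assumptions.

```python
# P(C | A, B) from the slide: key = (a, b), value = [P(C=0 | a, b), P(C=1 | a, b)].
cpd_c = {
    (0, 0): [0.25, 0.75],
    (0, 1): [0.50, 0.50],
    (1, 0): [0.12, 0.88],
    (1, 1): [0.33, 0.67],
}

def is_valid_cpd(cpd, tol=1e-9):
    """Every row must be a distribution over the child's values."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol
               for row in cpd.values())

def table_size(n_parents, child_card=2, parent_card=2):
    """Number of entries in a tabular CPD with n_parents parents."""
    return child_card * parent_card ** n_parents

print(is_valid_cpd(cpd_c))  # True
print(table_size(2))        # 8 entries, as in the table above
print(table_size(10))       # 2048 entries for ten binary parents
```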
Continuous CPDs
When X is a continuous variable, we need to represent the density of X, given any value of its parents
Gaussian
Conditional Gaussian
CPDs: Summary
Many choices for representing CPDs Any "statistical" model of conditional distribution can be used
e.g., any regression model
Representing structure in CPDs can have implications on independencies among variables
Inference in Bayesian Networks
Inference
We now have compact representations of probability distributions:
Bayesian Networks
Markov Networks
A network describes a unique probability distribution P
How do we answer queries about P?
Inference is the name for the process of computing answers to such queries
Queries: Likelihood
There are many types of queries we might ask. Most of these involve evidence
Evidence e is an assignment of values to a set E of variables in the domain
Without loss of generality $E = \{X_{k+1},\ldots,X_n\}$
Simplest query: compute probability of evidence
$P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1,\ldots,x_k, e)$
This is often referred to as computing the likelihood of the evidence
Queries: A posteriori belief
Often we are interested in the conditional probability of a variable given the evidence
$P(X \mid e) = \frac{P(X, e)}{P(e)}$
This is the a posteriori belief in X, given evidence e
A related task is computing the term P(X, e)
i.e., the likelihood of e and X = x for values of X
we can recover the a posteriori belief by
$P(X = x \mid e) = \frac{P(X = x, e)}{\sum_{x'} P(X = x', e)}$
A posteriori belief
This query is useful in many cases:
Prediction: what is the probability of an outcome given the starting condition
  Target is a descendant of the evidence
Diagnosis: what is the probability of disease/fault given symptoms
  Target is an ancestor of the evidence
Note: the direction between variables does not restrict the directions of the queries
Probabilistic inference can combine evidence from all parts of the network
Queries: MAP
In this query we want to find the maximum a posteriori assignment for some variables of interest (say $X_1,\ldots,X_l$)
That is, $x_1,\ldots,x_l$ maximize the probability $P(x_1,\ldots,x_l \mid e)$
Note that this is equivalent to maximizing $P(x_1,\ldots,x_l, e)$
We can use MAP for:
Classification: find the most likely label, given the evidence
Explanation: what is the most likely scenario, given the evidence
Queries: MAP
Cautionary note: the MAP depends on the set of variables
Example: compare the MAP of X with the MAP of (X, Y)
x  y  P(x,y)
0  0  0.35
0  1  0.05
1  0  0.3
1  1  0.3
Complexity of Inference
Thm: Computing P(X = x) in a Bayesian network is NP-hard
Not surprising, since we can simulate Boolean gates.
Hardness
Hardness does not mean we cannot solve inference
It implies that we cannot find a general procedure that works efficiently for all networks For particular families of networks, we can have provably efficient procedures
Approaches to inference
Exact inference
  Inference in Simple Chains
  Variable elimination
  Clustering / join tree algorithms
Approximate inference
  Stochastic simulation / sampling methods
  Markov chain Monte Carlo methods
  Mean field theory
Inference in Simple Chains
[Figure: chain X1 -> X2]
How do we compute P(X2)?
$P(x_2) = \sum_{x_1} P(x_1, x_2) = \sum_{x_1} P(x_1) P(x_2 \mid x_1)$
[Figure: chain X1 -> X2 -> X3]
How do we compute P(X3)?
$P(x_3) = \sum_{x_2} P(x_2, x_3) = \sum_{x_2} P(x_2) P(x_3 \mid x_2)$
We already know how to compute P(X2).
[Figure: chain X1 -> X2 -> X3 -> ... -> Xn]
How do we compute P(Xn)?
Compute P(X1), P(X2), P(X3), ...
We compute each term by using the previous one
$P(x_{i+1}) = \sum_{x_i} P(x_i) P(x_{i+1} \mid x_i)$
Complexity: each step costs O(|Val(Xi)| * |Val(Xi+1)|) operations
Compare to naive evaluation, which requires summing over joint values of n-1 variables
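The forward iteration can be sketched as a loop that pushes the current marginal through one CPD at a time. The numbers below are made up for illustration, and `forward_marginals` is a hypothetical helper name; CPDs are stored as nested lists with `cpd[a][b] = P(next = b | current = a)`.

```python
def forward_marginals(prior, transitions):
    """Return [P(X1), P(X2), ..., P(Xn)] for a chain X1 -> X2 -> ... -> Xn.

    prior[a] = P(X1 = a); transitions[i][a][b] = P(X_{i+2} = b | X_{i+1} = a).
    """
    marginals = [list(prior)]
    for cpd in transitions:
        prev = marginals[-1]
        # P(x_{i+1}) = sum over x_i of P(x_i) * P(x_{i+1} | x_i)
        nxt = [sum(prev[a] * cpd[a][b] for a in range(len(prev)))
               for b in range(len(cpd[0]))]
        marginals.append(nxt)
    return marginals

# Hypothetical binary chain X1 -> X2 -> X3 with the same CPD at each step.
prior = [0.6, 0.4]
step = [[0.7, 0.3],
        [0.2, 0.8]]
marginals = forward_marginals(prior, [step, step])
print(marginals[1])  # P(X2), approximately [0.5, 0.5]
print(marginals[2])  # P(X3), approximately [0.45, 0.55]
```

Each step touches every (current, next) value pair once, which matches the per-step cost stated above.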
Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2]
Suppose that we observe the value of X2 = x2
How do we compute P(X1 | x2)?
Recall that it suffices to compute P(X1, x2)
$P(x_1, x_2) = P(x_2 \mid x_1) P(x_1)$
[Figure: chain X1 -> X2 -> X3]
Suppose that we observe the value of X3 = x3
How do we compute P(X1, x3)?
$P(x_1, x_3) = P(x_1) P(x_3 \mid x_1)$
How do we compute P(x3 | x1)?
$P(x_3 \mid x_1) = \sum_{x_2} P(x_2, x_3 \mid x_1) = \sum_{x_2} P(x_2 \mid x_1) P(x_3 \mid x_1, x_2) = \sum_{x_2} P(x_2 \mid x_1) P(x_3 \mid x_2)$
[Figure: chain X1 -> X2 -> X3 -> ... -> Xn]
Suppose that we observe the value of Xn = xn
How do we compute P(X1, xn)?
$P(x_1, x_n) = P(x_1) P(x_n \mid x_1)$
We compute P(xn | xn-1), P(xn | xn-2), ... iteratively
$P(x_n \mid x_i) = \sum_{x_{i+1}} P(x_{i+1}, x_n \mid x_i) = \sum_{x_{i+1}} P(x_{i+1} \mid x_i) P(x_n \mid x_{i+1})$
[Figure: chain X1 -> X2 -> ... -> Xk -> ... -> Xn]
Suppose that we observe the value of Xn = xn
We want to find P(Xk | xn)
How do we compute P(Xk, xn)?
$P(x_k, x_n) = P(x_k) P(x_n \mid x_k)$
We compute P(Xk) by forward iterations
We compute P(xn | Xk) by backward iterations
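Combining the two passes gives the posterior of any Xk given an observation of Xn. The sketch below assumes a chain with at least one transition, 0-based indices, and made-up numbers; `chain_posterior` is a hypothetical helper name.

```python
def chain_posterior(prior, transitions, k, x_n):
    """Return (P(X_k | x_n), P(x_n)) for a chain; k is a 0-based index.

    prior[a] = P(X_0 = a); transitions[i][a][b] = P(X_{i+1} = b | X_i = a).
    """
    # Forward iterations: P(X_k).
    fwd = list(prior)
    for cpd in transitions[:k]:
        fwd = [sum(fwd[a] * cpd[a][b] for a in range(len(fwd)))
               for b in range(len(cpd[0]))]
    # Backward iterations: P(x_n | X_k), starting from an indicator on x_n.
    bwd = [1.0 if b == x_n else 0.0 for b in range(len(transitions[-1][0]))]
    for cpd in reversed(transitions[k:]):
        bwd = [sum(cpd[a][b] * bwd[b] for b in range(len(bwd)))
               for a in range(len(cpd))]
    # P(x_k, x_n) = P(x_k) * P(x_n | x_k), then normalize by P(x_n).
    joint = [f * b for f, b in zip(fwd, bwd)]
    evidence_prob = sum(joint)
    return [p / evidence_prob for p in joint], evidence_prob

# Hypothetical binary chain X0 -> X1 -> X2, observing X2 = 1.
prior = [0.6, 0.4]
step = [[0.7, 0.3],
        [0.2, 0.8]]
posterior, p_evidence = chain_posterior(prior, [step, step], k=0, x_n=1)
print(p_evidence)  # approximately 0.55
print(posterior)   # approximately [0.4909, 0.5091]
```

The normalizer returned alongside the posterior is the likelihood of the evidence, so the same routine also answers the simplest query type.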
Elimination in Chains
We now try to understand the simple chain example using first-order principles
[Figure: chain A -> B -> C -> D -> E]
Using the definition of probability, we have
$P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$
By chain decomposition, we get
$P(e) = \sum_d \sum_c \sum_b \sum_a P(a) P(b \mid a) P(c \mid b) P(d \mid c) P(e \mid d)$
Rearranging terms ...
$P(e) = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(e \mid d) \sum_a P(a) P(b \mid a)$
Now we can perform the innermost summation (eliminating A)
$P(e) = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(e \mid d)\, p(b)$
This summation is exactly the first step in the forward iteration we described before
Rearranging and then summing again (eliminating B), we get
$P(e) = \sum_d \sum_c P(d \mid c) P(e \mid d) \sum_b P(c \mid b)\, p(b) = \sum_d \sum_c P(d \mid c) P(e \mid d)\, p(c)$
Elimination in Chains with Evidence
Similarly, we understand the backward pass
[Figure: chain A -> B -> C -> D -> E]
We write the query in explicit form
$P(a, e) = \sum_b \sum_c \sum_d P(a, b, c, d, e) = \sum_b \sum_c \sum_d P(a) P(b \mid a) P(c \mid b) P(d \mid c) P(e \mid d)$
Eliminating d, we get
$P(a, e) = \sum_b \sum_c P(a) P(b \mid a) P(c \mid b) \sum_d P(d \mid c) P(e \mid d) = \sum_b \sum_c P(a) P(b \mid a) P(c \mid b) P(e \mid c)$
Eliminating c, we get
$P(a, e) = \sum_b P(a) P(b \mid a) \sum_c P(c \mid b) P(e \mid c) = \sum_b P(a) P(b \mid a) P(e \mid b)$
Finally, we eliminate b
$P(a, e) = P(a) \sum_b P(b \mid a) P(e \mid b) = P(a) P(e \mid a)$
Variable Elimination
General idea:
Write the query in the form
$P(X_n, e) = \sum_{x_k} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid pa_i)$
Iteratively
Move all irrelevant terms outside of the innermost sum
Perform the innermost sum, getting a new term
Insert the new term into the product
A More Complex Example
"Asia" network:
[Figure: DAG over Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)]
We want to compute P(d)
Need to eliminate: v, s, x, t, l, a, b
Initial factors
$P(v) P(s) P(t \mid v) P(l \mid s) P(b \mid s) P(a \mid t, l) P(x \mid a) P(d \mid a, b)$
Eliminate: v
Compute: $f_v(t) = \sum_v P(v) P(t \mid v)$
$\Rightarrow f_v(t) P(s) P(l \mid s) P(b \mid s) P(a \mid t, l) P(x \mid a) P(d \mid a, b)$
Note: $f_v(t) = P(t)$
In general, the result of elimination is not necessarily a probability term
Eliminate: s
Compute: $f_s(b, l) = \sum_s P(s) P(b \mid s) P(l \mid s)$
$\Rightarrow f_v(t) f_s(b, l) P(a \mid t, l) P(x \mid a) P(d \mid a, b)$
Summing on s results in a factor with two arguments, $f_s(b, l)$
In general, the result of elimination may be a function of several variables
Eliminate: x
Compute: $f_x(a) = \sum_x P(x \mid a)$
$\Rightarrow f_v(t) f_s(b, l) f_x(a) P(a \mid t, l) P(d \mid a, b)$
Note: $f_x(a) = 1$ for all values of a
Eliminate: t
Compute: $f_t(a, l) = \sum_t f_v(t) P(a \mid t, l)$
$\Rightarrow f_s(b, l) f_x(a) f_t(a, l) P(d \mid a, b)$
Eliminate: l
Compute: $f_l(a, b) = \sum_l f_s(b, l) f_t(a, l)$
$\Rightarrow f_l(a, b) f_x(a) P(d \mid a, b)$
Eliminate: a, b
Compute: $f_a(b, d) = \sum_a f_l(a, b) f_x(a) P(d \mid a, b)$
$f_b(d) = \sum_b f_a(b, d)$
$\Rightarrow f_b(d)$
Variable Elimination
We now understand variable elimination as a sequence of rewriting operations
Actual computation is done in the elimination steps
Exactly the same computation procedure applies to Markov networks
Computation depends on the order of elimination
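The rewriting view translates fairly directly into code: a factor is a table over some variables, and eliminating a variable means multiplying the factors that mention it and summing it out. The following is a minimal sketch for binary variables only. The helper names (`cpd`, `multiply`, `sum_out`, `eliminate`) and all CPD numbers are illustrative assumptions; the network structure is the earthquake/burglary example.

```python
from itertools import product

def cpd(child, parents, p_one):
    """Build a factor (vars, table) from P(child = 1 | parents) entries."""
    table = {}
    for pa, p in p_one.items():
        table[pa + (1,)] = p
        table[pa + (0,)] = 1.0 - p
    return tuple(parents) + (child,), table

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    vars_ = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product((0, 1), repeat=len(vars_)):
        a = dict(zip(vars_, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vars_, table

def sum_out(f, var):
    """Sum a variable out of a factor."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table

def eliminate(factors, order):
    """Eliminate the variables in `order`; return the product of what is left."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical CPDs for P(B) P(E) P(R|E) P(A|B,E) P(C|A).
factors = [
    cpd("B", (), {(): 0.01}),
    cpd("E", (), {(): 0.02}),
    cpd("R", ("E",), {(0,): 0.01, (1,): 0.9}),
    cpd("A", ("B", "E"), {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}),
    cpd("C", ("A",), {(0,): 0.05, (1,): 0.7}),
]
p_c = eliminate(factors, ["B", "E", "R", "A"])
print(p_c[0])           # ('C',)
print(p_c[1][(1,)])     # P(C = 1) under the made-up numbers
```

Changing the order passed to `eliminate` leaves the answer unchanged but alters the size of the intermediate tables, which is the point made in the last bullet above.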
Dealing with Evidence
How do we deal with evidence?
Suppose we get evidence V = t, S = f, D = t
We want to compute P(L, V = t, S = f, D = t)
[Figure: the Asia network over V, S, T, L, A, B, X, D]
We start by writing the factors:
$P(v) P(s) P(t \mid v) P(l \mid s) P(b \mid s) P(a \mid t, l) P(x \mid a) P(d \mid a, b)$
Since we know that V = t, we don't need to eliminate V
Instead, we can replace the factors P(V) and P(T | V) with
$f_{P(V)} = P(V = t)$
$f_{P(T \mid V)}(T) = P(T \mid V = t)$
These "select" the appropriate parts of the original factors given the evidence
Note that $f_{P(V)}$ is a constant, and thus does not appear in the elimination of other variables
Initial factors, after setting evidence:
$f_{P(v)} f_{P(s)} f_{P(t \mid v)}(t) f_{P(l \mid s)}(l) f_{P(b \mid s)}(b) P(a \mid t, l) P(x \mid a) f_{P(d \mid a, b)}(a, b)$
Eliminating x, we get
$f_{P(v)} f_{P(s)} f_{P(t \mid v)}(t) f_{P(l \mid s)}(l) f_{P(b \mid s)}(b) P(a \mid t, l) f_x(a) f_{P(d \mid a, b)}(a, b)$
Eliminating t, we get
$f_{P(v)} f_{P(s)} f_{P(l \mid s)}(l) f_{P(b \mid s)}(b) f_t(a, l) f_x(a) f_{P(d \mid a, b)}(a, b)$
Eliminating a, we get
$f_{P(v)} f_{P(s)} f_{P(l \mid s)}(l) f_{P(b \mid s)}(b) f_a(b, l)$
Eliminating b, we get
$f_{P(v)} f_{P(s)} f_{P(l \mid s)}(l) f_b(l)$
Complexity of variable elimination
Suppose in one elimination step we compute
$f_x(y_1,\ldots,y_k) = \sum_x f'_x(x, y_1,\ldots,y_k)$
$f'_x(x, y_1,\ldots,y_k) = \prod_{i=1}^{m} f_i(x, y_{i,1},\ldots,y_{i,l_i})$
This requires
$m \cdot |Val(X)| \cdot \prod_i |Val(Y_i)|$ multiplications
For each value of x, y1, ..., yk, we do m multiplications
$|Val(X)| \cdot \prod_i |Val(Y_i)|$ additions
For each value of y1, ..., yk, we do |Val(X)| additions
Complexity is exponential in the number of variables in the intermediate factor!
Understanding Variable Elimination
We want to select "good" elimination orderings that reduce complexity
We start by attempting to understand variable elimination via the graph we are working with
This will reduce the problem of finding a good ordering to a graph-theoretic operation that is well understood
Undirected graph representation
At each stage of the procedure, we have an algebraic term that we need to evaluate
In general this term is of the form:
$P(x_1,\ldots,x_k) = \sum_{y_1} \cdots \sum_{y_n} \prod_i f_i(Z_i)$
where $Z_i$ are sets of variables
We now plot a graph where there is an undirected edge X - Y if X, Y are arguments of some factor
that is, if X, Y are in some $Z_i$
Note: this is the Markov network that describes the probability on the variables we did not eliminate yet
Chordal Graphs
elimination ordering -> undirected chordal graph
[Figure: the Asia network over V, S, T, L, A, B, X, D and the chordal graph induced by an elimination ordering]
Graph:
Maximal cliques are factors in elimination
Factors in elimination are cliques in the graph
Complexity is exponential in the size of the largest clique in the graph
Induced Width
The size of the largest clique in the induced graph is thus an indicator for the complexity of variable elimination This quantity is called the induced width of a graph according to the specified ordering Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
General Networks
From graph theory:
Thm: Finding an ordering that minimizes the induced width is NP-hard
However,
There are reasonable heuristics for finding a "relatively" good ordering
There are provable approximations to the best induced width
If the graph has a small induced width, there are algorithms that find it in polynomial time
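The induced width of a given ordering can be computed by simulating elimination on the undirected graph: removing a variable connects all of its remaining neighbors. The sketch below uses the common convention of counting the neighbors of each variable at the moment it is eliminated (one less than the size of the clique it forms, so a tree gets width 1). The edge list is the undirected graph read off the Asia factors; `induced_width` is a hypothetical helper name.

```python
def induced_width(edges, order):
    """Largest number of neighbors any variable has when it is eliminated."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    width = 0
    for v in order:
        ns = nbrs.pop(v)
        width = max(width, len(ns))
        # Eliminating v creates a factor over ns: connect them pairwise.
        for a in ns:
            nbrs[a].discard(v)
            nbrs[a] |= ns - {a}
    return width

# One edge per pair of variables sharing a factor in
# P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b).
asia_edges = [("v", "t"), ("s", "l"), ("s", "b"),
              ("a", "t"), ("a", "l"), ("t", "l"),
              ("x", "a"), ("d", "a"), ("d", "b"), ("a", "b")]

print(induced_width(asia_edges, ["v", "s", "x", "t", "l", "a", "b", "d"]))  # 2
print(induced_width(asia_edges, ["a", "v", "s", "x", "t", "l", "b", "d"]))  # 5
```

Eliminating a first joins all five of its neighbors into one factor, which is why the second ordering is so much worse than the first.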
Elimination on Trees
Formally, for any tree, there is an elimination ordering with induced width = 1
Thm: Inference on trees is linear in the number of variables
PolyTrees
A polytree is a network where there is at most one path from one variable to another
[Figure: a polytree over A, B, C, D, E, F, G, H]
Thm: Inference in a polytree is linear in the representation size of the network
This assumes tabular CPT representation
Learning Bayesian Networks
Learning Bayesian networks
[Figure: data and prior information are fed to an inducer, which outputs a network over E, B, R, A, C together with its CPDs, e.g. a filled-in table for P(A | E,B)]
Known Structure / Complete Data
[Figure: complete data over E, B, A (<Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>) and a network over E, B, A whose table P(A | E,B) is all "?" go into the inducer, which outputs the same network with the table filled in]
Network structure is specified
Inducer needs to estimate parameters
Data does not contain missing values
Unknown Structure / Complete Data
[Figure: complete data over E, B, A (<Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>) goes into the inducer, which outputs a network over E, B, A with the table P(A | E,B) filled in]
Network structure is not specified
Inducer needs to select arcs & estimate parameters
Data does not contain missing values
Known Structure / Incomplete Data
[Figure: incomplete data over E, B, A (<Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>) and a network over E, B, A whose table P(A | E,B) is all "?" go into the inducer, which outputs the same network with the table filled in]
Network structure is specified
Data contains missing values
We consider assignments to missing values
Known Structure / Complete Data
Given a network structure G
and a choice of parametric family for P(Xi | Pai)
Learn parameters for the network
Goal: construct a network that is "closest" to the probability distribution that generated the data
Learning Parameters for a Bayesian Network
[Figure: network with E -> A, B -> A, A -> C]
Training data has the form:
$D = \begin{bmatrix} E[1] & B[1] & A[1] & C[1] \\ \vdots & \vdots & \vdots & \vdots \\ E[M] & B[M] & A[M] & C[M] \end{bmatrix}$
Since we assume i.i.d. samples, the likelihood function is
$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$
By definition of the network, we get
$L(\Theta : D) = \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)\, P(C[m] \mid A[m] : \Theta)$
Rewriting terms, we get
$L(\Theta : D) = \prod_m P(E[m] : \Theta) \cdot \prod_m P(B[m] : \Theta) \cdot \prod_m P(A[m] \mid B[m], E[m] : \Theta) \cdot \prod_m P(C[m] \mid A[m] : \Theta)$
General Bayesian Networks
Generalizing for any Bayesian network:
$L(\Theta : D) = \prod_m P(x_1[m],\ldots,x_n[m] : \Theta)$   (i.i.d. samples)
$= \prod_m \prod_i P(x_i[m] \mid Pa_i[m] : \Theta_i)$   (network factorization)
$= \prod_i \prod_m P(x_i[m] \mid Pa_i[m] : \Theta_i)$
$= \prod_i L_i(\Theta_i : D)$
The likelihood decomposes according to the structure of the network.
General Bayesian Networks (Cont.)
Decomposition => Independent Estimation Problems
If the parameters for each family are not related, then they can be estimated independently of each other.
From Binomial to Multinomial
For example, suppose X can have the values 1,2,...,K
We want to learn the parameters $\theta_1, \theta_2, \ldots, \theta_K$
Sufficient statistics: $N_1, N_2, \ldots, N_K$, the number of times each outcome is observed
Likelihood function: $L(\Theta : D) = \prod_{k=1}^{K} \theta_k^{N_k}$
MLE: $\hat{\theta}_k = \frac{N_k}{\sum_{\ell} N_\ell}$
Likelihood for Multinomial Networks
When we assume that P(Xi | Pai) is multinomial, we get further decomposition:
$L_i(\Theta_i : D) = \prod_m P(x_i[m] \mid Pa_i[m] : \Theta_i)$
$= \prod_{pa_i} \prod_{m : Pa_i[m] = pa_i} P(x_i[m] \mid pa_i : \Theta_i)$
$= \prod_{pa_i} \prod_{x_i} P(x_i \mid pa_i : \Theta_i)^{N(x_i, pa_i)}$
$= \prod_{pa_i} \prod_{x_i} \theta_{x_i \mid pa_i}^{N(x_i, pa_i)}$
For each value $pa_i$ of the parents of $X_i$ we get an independent multinomial problem
The MLE is
$\hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)}$
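In code, the MLE amounts to counting. The sketch below estimates P(A | E, B) from a toy data set made of the five E, B, A tuples that are visible on the earlier data slides (the elided rows are simply left out); `mle_cpd` is a hypothetical helper name.

```python
from collections import Counter

def mle_cpd(data, child, parents):
    """Return {(x, pa): N(x, pa) / N(pa)} for every observed (x, pa) pair."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint_counts[(row[child], pa)] += 1
        parent_counts[pa] += 1
    return {(x, pa): n / parent_counts[pa]
            for (x, pa), n in joint_counts.items()}

# Toy complete data over E, B, A.
data = [
    {"E": "Y", "B": "N", "A": "N"},
    {"E": "Y", "B": "Y", "A": "Y"},
    {"E": "N", "B": "N", "A": "Y"},
    {"E": "N", "B": "Y", "A": "Y"},
    {"E": "N", "B": "Y", "A": "Y"},
]
theta = mle_cpd(data, "A", ["E", "B"])
print(theta[("Y", ("N", "Y"))])  # 1.0, both rows with E=N, B=Y have A=Y
print(theta[("N", ("Y", "N"))])  # 1.0, the single row with E=Y, B=N has A=N
```

Value combinations that never occur in the data get no entry at all, and parent configurations seen only once yield estimates of exactly 0 or 1. This is one motivation for the Bayesian treatment with priors.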
Bayesian Approach: Dirichlet Priors
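Recall that the likelihood function is
$L(\Theta : D) = \prod_{k=1}^{K} \theta_k^{N_k}$
A Dirichlet prior with hyperparameters $\alpha_1,\ldots,\alpha_K$ is defined as
$P(\Theta) \propto \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$ for legal $\theta_1,\ldots,\theta_K$
Then the posterior has the same form, with hyperparameters $\alpha_1 + N_1,\ldots,\alpha_K + N_K$
$P(\Theta \mid D) \propto P(\Theta) P(D \mid \Theta) \propto \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} \prod_{k=1}^{K} \theta_k^{N_k} = \prod_{k=1}^{K} \theta_k^{\alpha_k + N_k - 1}$
Dirichlet Priors (cont.)
We can compute the prediction on a new event in closed form:
If $P(\Theta)$ is Dirichlet with hyperparameters $\alpha_1,\ldots,\alpha_K$ then
$P(X[1] = k) = \int \theta_k P(\Theta)\, d\Theta = \frac{\alpha_k}{\sum_{\ell} \alpha_\ell}$
Since the posterior is also Dirichlet, we get
$P(X[M+1] = k \mid D) = \int \theta_k P(\Theta \mid D)\, d\Theta = \frac{\alpha_k + N_k}{\sum_{\ell} (\alpha_\ell + N_\ell)}$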
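The closed-form prediction is a one-liner once the counts are available. The sketch below compares it with the MLE on made-up counts; the function name and the numbers are illustrative assumptions.

```python
def dirichlet_predictive(alpha, counts):
    """P(X[M+1] = k | D) = (alpha_k + N_k) / sum over l of (alpha_l + N_l)."""
    total = sum(alpha) + sum(counts)
    return [(a + n) / total for a, n in zip(alpha, counts)]

alpha = [1, 1, 1]    # hypothetical uniform prior, equivalent sample size 3
counts = [3, 0, 1]   # hypothetical observed counts N_1, N_2, N_3

bayes = dirichlet_predictive(alpha, counts)
mle = [n / sum(counts) for n in counts]
print(bayes)  # approximately [0.571, 0.143, 0.286]
print(mle)    # [0.75, 0.0, 0.25]
```

The outcome that was never observed keeps a non-zero predictive probability under the prior, whereas the MLE assigns it zero.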
Prior Knowledge
The hyperparameters $\alpha_1,\ldots,\alpha_K$ can be thought of as "imaginary" counts from our prior experience
Equivalent sample size = $\alpha_1 + \cdots + \alpha_K$
The larger the equivalent sample size, the more confident we are in our prior
Conjugate Families
The property that the posterior distribution follows the same parametric form as the prior distribution is called conjugacy
  Dirichlet prior is a conjugate family for the multinomial likelihood
Conjugate families are useful since:
  For many distributions we can represent them with hyperparameters
  They allow for sequential update within the same representation
  In many cases we have closed-form solution for prediction
Bayesian Prediction (cont.)
Given these observations, we can compute the posterior for each multinomial $\theta_{X_i \mid pa_i}$ independently
  The posterior is Dirichlet with parameters
  $\alpha(X_i = 1 \mid pa_i) + N(X_i = 1 \mid pa_i), \ldots, \alpha(X_i = k \mid pa_i) + N(X_i = k \mid pa_i)$
The predictive distribution is then represented by the parameters
$$\tilde{\theta}_{x_i \mid pa_i} = \frac{\alpha(x_i, pa_i) + N(x_i, pa_i)}{\alpha(pa_i) + N(pa_i)}$$
Learning Parameters: Summary
Estimation relies on sufficient statistics
  For multinomial these are of the form $N(x_i, pa_i)$
Parameter estimation
  MLE: $\hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)}$
  Bayesian (Dirichlet): $\tilde{\theta}_{x_i \mid pa_i} = \frac{\alpha(x_i, pa_i) + N(x_i, pa_i)}{\alpha(pa_i) + N(pa_i)}$
Bayesian methods also require choice of priors
Both MLE and Bayesian are asymptotically equivalent and consistent
Both can be implemented in an online manner by accumulating sufficient statistics
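A rough sketch of the online point: both estimators can be read off the same running counts. The class `FamilyEstimator` is a hypothetical helper, and it assumes a single constant hyperparameter $\alpha(x_i, pa_i)$ for every cell.

```python
from collections import defaultdict

class FamilyEstimator:
    """Accumulates N(x_i, pa_i) one example at a time (hypothetical helper)."""

    def __init__(self, child_values, alpha=1.0):
        self.child_values = list(child_values)
        self.alpha = alpha  # assumed: same alpha(x_i, pa_i) for every cell
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, x, pa):
        self.counts[pa][x] += 1

    def mle(self, x, pa):
        n_pa = sum(self.counts[pa].values())
        return self.counts[pa][x] / n_pa  # ZeroDivisionError if pa was never seen

    def bayes(self, x, pa):
        n_pa = sum(self.counts[pa].values())
        alpha_pa = self.alpha * len(self.child_values)  # alpha(pa_i)
        return (self.alpha + self.counts[pa][x]) / (alpha_pa + n_pa)

est = FamilyEstimator(child_values=[0, 1], alpha=1.0)
for x, pa in [(1, ("b1",)), (1, ("b1",)), (0, ("b1",))]:
    est.update(x, pa)

mle_value = est.mle(1, ("b1",))
bayes_value = est.bayes(1, ("b1",))
unseen_value = est.bayes(1, ("b0",))  # falls back to the prior prediction
```

The Bayesian estimate stays defined for parent configurations that have not been observed, whereas the MLE does not.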
Learning Structure from Complete Data
Benefits of Learning Structure
Efficient learning: more accurate models with less data
  Compare: P(A) and P(B) vs. joint P(A,B)
Discover structural properties of the domain
  Ordering of events
  Relevance
Identifying independencies ⇒ faster inference
Predict effect of actions
  Involves learning causal relationship among variables
Why Struggle for Accurate Structure?
[Figure: the true network over Earthquake, Alarm Set, Burglary, and Sound, next to a variant with an added arc and a variant with a missing arc]
Adding an arc
  Increases the number of parameters to be fitted
  Wrong assumptions about causality and domain structure
Missing an arc
  Cannot be compensated by accurate fitting of parameters
  Also misses causality and domain structure
Approaches to Learning Structure
Constraint based
  Perform tests of conditional independence
  Search for a network that is consistent with the observed dependencies and independencies
Pros & Cons
  + Intuitive, follows closely the construction of BNs
  + Separates structure learning from the form of the independence tests
  - Sensitive to errors in individual tests
Approaches to Learning Structure
Score based
  Define a score that evaluates how well the (in)dependencies in a structure match the observations
  Search for a structure that maximizes the score
Pros & Cons
  + Statistically motivated
  + Can make compromises
  + Takes the structure of conditional probabilities into account
  - Computationally hard
Likelihood Score for Structures
First cut approach:
  Use likelihood function
Recall, the likelihood score for a network structure and parameters is
$$L(G, \theta_G : D) = \prod_m P(x_1[m], \ldots, x_n[m] : G, \theta_G) = \prod_m \prod_i P(x_i[m] \mid Pa_i^G[m] : G, \theta_{G,i})$$
Since we know how to maximize parameters, from now on we assume
$$L(G : D) = \max_{\theta_G} L(G, \theta_G : D)$$
Likelihood Score for Structure (cont.)
Rearranging terms:
$$l(G : D) = \log L(G : D) = M \sum_i \left( I(X_i ; Pa_i^G) - H(X_i) \right)$$
where
  H(X) is the entropy of X
  I(X;Y) is the mutual information between X and Y
    I(X;Y) measures how much "information" each variable provides about the other
    I(X;Y) ≥ 0
    I(X;Y) = 0 iff X and Y are independent
    I(X;Y) = H(X) iff X is totally predictable given Y
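A small sketch of the two quantities, computed from the empirical distribution of a sample and measured in nats. The function names and toy samples are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical entropy H(X) in nats."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) in nats."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical samples
x = [0, 0, 1, 1]
copy_of_x = [0, 0, 1, 1]   # totally predictable from x
unrelated = [0, 1, 0, 1]   # empirically independent of x

mi_copy = mutual_information(x, copy_of_x)
mi_unrelated = mutual_information(x, unrelated)
```

On these samples the two extreme cases listed above show up directly: the copy attains I(X;Y) = H(X), and the unrelated variable gives zero up to floating-point error.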
Likelihood Score for Structure (cont.)
$$l(G : D) = M \sum_i \left( I(X_i ; Pa_i^G) - H(X_i) \right)$$
Good news: Intuitive explanation of likelihood score:
  The larger the dependency of each variable on its parents, the higher the score
  Likelihood as a compromise among dependencies, based on their strength
Likelihood Score for Structure (cont.)
$$l(G : D) = M \sum_i \left( I(X_i ; Pa_i^G) - H(X_i) \right)$$
Bad news: Adding arcs always helps
  I(X;Y) ≤ I(X;Y,Z)
  Maximal score attained by fully connected networks
  Such networks can overfit the data: parameters capture the noise in the data
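The "bad news" can be checked numerically with a sketch of the maximized log-likelihood score. The helper names, the dict-of-parents structure encoding, and the data set are hypothetical; the data are chosen so that X and Y are nearly independent.

```python
import math
from collections import Counter

def family_loglik(records, child, parents):
    """Maximized log-likelihood of one family: sum of N(x,pa) * log(N(x,pa) / N(pa))."""
    joint, marginal = Counter(), Counter()
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint[(r[child], pa)] += 1
        marginal[pa] += 1
    return sum(n * math.log(n / marginal[pa]) for (_, pa), n in joint.items())

def loglik_score(records, structure):
    """structure maps each variable to a tuple of its parents (assumed acyclic)."""
    return sum(family_loglik(records, v, ps) for v, ps in structure.items())

# Hypothetical sample in which X and Y are nearly independent
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0)]
records = [{"X": x, "Y": y} for x, y in pairs]

empty = {"X": (), "Y": ()}
with_arc = {"X": (), "Y": ("X",)}
gain = loglik_score(records, with_arc) - loglik_score(records, empty)
```

The gain equals M times the empirical mutual information between X and Y, so it is never negative even when the dependence is just sampling noise.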
Avoiding Overfitting
"Classic" issue in learning. Approaches:
Restricting the hypotheses space
  Limits the overfitting capability of the learner
  Example: restrict # of parents or # of parameters
Minimum description length
  Description length measures complexity
  Prefer models that compactly describe the training data
Bayesian methods
  Average over all possible parameter values
  Use prior knowledge
Bayesian Inference
Bayesian reasoning: compute expectation over unknown G
$$P(x[M+1] \mid D) = \sum_G P(x[M+1] \mid D, G) P(G \mid D)$$
Assumption: Gs are mutually exclusive and exhaustive
We know how to compute P(x[M+1] | G, D)
  Same as prediction with fixed structure
How do we compute P(G | D)?
Posterior Score
Using Bayes rule:
$$P(G \mid D) = \frac{P(D \mid G) P(G)}{P(D)}$$
  P(D | G): marginal likelihood
  P(G): prior over structures
  P(D): probability of data
P(D) is the same for all structures G
  Can be ignored when comparing structures
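A sketch of how log scores could be turned into a posterior over a small candidate set, assuming the candidates are treated as exhaustive. The scores below are made-up numbers, and the log-sum-exp shift is an implementation detail added here to avoid underflow.

```python
import math

def structure_posterior(log_marginal_likelihood, log_prior):
    """Normalize log P(D|G) + log P(G) over a finite candidate set."""
    log_joint = {g: log_marginal_likelihood[g] + log_prior[g]
                 for g in log_marginal_likelihood}
    top = max(log_joint.values())
    # log P(D) restricted to the candidates, computed stably
    log_evidence = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {g: math.exp(v - log_evidence) for g, v in log_joint.items()}

# Hypothetical log marginal likelihoods for three candidate structures
log_ml = {"G1": -1050.0, "G2": -1052.0, "G3": -1060.0}
log_prior = {g: math.log(1 / 3) for g in log_ml}   # uniform prior over candidates
posterior = structure_posterior(log_ml, log_prior)
```

Because P(D) is a shared constant, the ranking of structures is already determined by `log_joint`; normalization only matters when the posterior is used for averaging.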
Marginal Likelihood
By introduction of variables, we have that
$$P(D \mid G) = \int P(D \mid G, \theta) P(\theta \mid G) \, d\theta$$
  P(D | G, θ): likelihood
  P(θ | G): prior over parameters
This integral measures sensitivity to choice of parameters
Marginal Likelihood for General Network
The marginal likelihood has the form:
$$P(D \mid G) = \prod_i \prod_{pa_i^G} \frac{\Gamma(\alpha(pa_i^G))}{\Gamma(\alpha(pa_i^G) + N(pa_i^G))} \prod_{x_i} \frac{\Gamma(\alpha(x_i, pa_i^G) + N(x_i, pa_i^G))}{\Gamma(\alpha(x_i, pa_i^G))}$$
Each factor inside the product over $pa_i^G$ is the Dirichlet marginal likelihood for the sequence of values of $X_i$ when $X_i$'s parents have a particular value
where
  N(..) are the counts from the data
  α(..) are the hyperparameters for each family given G
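A minimal sketch of one family's contribution, computed in log space with `math.lgamma`. It assumes a single constant hyperparameter `alpha_cell` for every $\alpha(x_i, pa_i)$, so that $\alpha(pa_i)$ is `alpha_cell` times the number of child values. The function name and data are hypothetical.

```python
import math
from collections import Counter

def family_log_marginal(records, child, parents, child_values, alpha_cell):
    """Log Dirichlet marginal likelihood of one family under a constant prior count."""
    joint, marginal = Counter(), Counter()
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint[(r[child], pa)] += 1
        marginal[pa] += 1
    alpha_pa = alpha_cell * len(child_values)   # alpha(pa_i)
    total = 0.0
    # Parent values that never occur contribute a factor of 1, so they are skipped.
    for pa, n_pa in marginal.items():
        total += math.lgamma(alpha_pa) - math.lgamma(alpha_pa + n_pa)
        for x in child_values:
            total += math.lgamma(alpha_cell + joint[(x, pa)]) - math.lgamma(alpha_cell)
    return total

# Hypothetical data: one binary variable with no parents, observed as 1, 1, 0
records = [{"X": 1}, {"X": 1}, {"X": 0}]
log_ml = family_log_marginal(records, "X", (), [0, 1], alpha_cell=1.0)
```

For this toy case the result can be checked by hand: with a uniform prior, the probability of the sequence 1, 1, 0 is the integral of $\theta^2 (1 - \theta)$ over [0, 1], which is 1/12.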
Priors
We need: prior counts α(..) for each network structure G
This can be a formidable task
  There are exponentially many structures...
BDe Score
Possible solution: The BDe prior
Represent prior using two elements M0, B0
  M0: equivalent sample size
  B0: network representing the prior probability of events
BDe Score
Intuition: M0 prior examples distributed by B0
Set $\alpha(x_i, pa_i^G) = M_0 \, P(x_i, pa_i^G \mid B_0)$
  Note that $pa_i^G$ are not the same as the parents of $X_i$ in B0
  Compute $P(x_i, pa_i^G \mid B_0)$ using standard inference procedures
Such priors have desirable theoretical properties
  Equivalent networks are assigned the same score
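A sketch of how the prior counts could be filled in. Inference in B0 is replaced here by a caller-supplied function `joint_prob`, and the example assumes B0 is the uniform distribution (the special case often called BDeu). Both choices are illustrative and not taken from the slides.

```python
from itertools import product

def bde_prior_counts(m0, child_values, parent_values, joint_prob):
    """alpha(x_i, pa_i) = M0 * P(x_i, pa_i | B0) for every joint assignment.

    joint_prob(x, pa) stands in for inference in the prior network B0.
    """
    return {(x, pa): m0 * joint_prob(x, pa)
            for x in child_values
            for pa in product(*parent_values)}

# Assumption for illustration: B0 is uniform over all joint assignments
child_values = [0, 1]
parent_values = [[0, 1], [0, 1, 2]]   # two parents with 2 and 3 values
n_cells = len(child_values) * 2 * 3
alpha = bde_prior_counts(10.0, child_values, parent_values,
                         lambda x, pa: 1.0 / n_cells)
```

Whatever B0 is, the counts of a family sum to M0, which is what makes M0 behave as an equivalent sample size.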
Bayesian Score: Asymptotic Behavior
Theorem: If the prior P(θ | G) is "well-behaved", then
$$\log P(D \mid G) = l(G : D) - \frac{\log M}{2} \dim(G) + O(1)$$
Asymptotic Behavior: Consequences
$$\log P(D \mid G) = l(G : D) - \frac{\log M}{2} \dim(G) + O(1)$$
Bayesian score is consistent
  As M → ∞ the "true" structure G* maximizes the score (almost surely)
  For sufficiently large M, the maximal scoring structures are equivalent to G*
Observed data eventually overrides prior information
  Assuming that the prior assigns positive probability to all cases
Asymptotic Behavior
$$\mathrm{Score}(G : D) = l(G : D) - \frac{\log M}{2} \dim(G)$$
This score can also be justified by the Minimal Description Length (MDL) principle
This equation explicitly shows the tradeoff between
  Fitness to data: likelihood term
  Penalty for complexity: regularization term
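The tradeoff can be sketched for table-based (multinomial) families, taking dim(G) as the number of free parameters: for each family, (number of child values minus 1) times the number of parent configurations. That reading of dim(G), the helper names, and the data are assumptions.

```python
import math
from collections import Counter

def mdl_family_score(records, child, parents, cardinality):
    """Family log-likelihood minus (log M / 2) times the family's free parameters."""
    joint, marginal = Counter(), Counter()
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint[(r[child], pa)] += 1
        marginal[pa] += 1
    loglik = sum(n * math.log(n / marginal[pa]) for (_, pa), n in joint.items())
    dim = (cardinality[child] - 1) * math.prod(cardinality[p] for p in parents)
    return loglik - 0.5 * math.log(len(records)) * dim

def mdl_score(records, structure, cardinality):
    """Sum of family scores; structure maps each variable to a tuple of parents."""
    return sum(mdl_family_score(records, v, ps, cardinality)
               for v, ps in structure.items())

# Hypothetical sample in which X and Y are nearly independent
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0)]
records = [{"X": x, "Y": y} for x, y in pairs]
cardinality = {"X": 2, "Y": 2}

score_empty = mdl_score(records, {"X": (), "Y": ()}, cardinality)
score_arc = mdl_score(records, {"X": (), "Y": ("X",)}, cardinality)
```

On this sample the extra arc raises the likelihood slightly but costs one more parameter, so the penalized score prefers the empty network.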
Scores Summary
Likelihood, MDL, (log) BDe have the form
$$\mathrm{Score}(G : D) = \sum_i \mathrm{Score}(X_i \mid Pa_i^G : N(X_i, Pa_i))$$
BDe requires assessing prior network. It can naturally incorporate prior knowledge and previous experience
BDe is consistent and asymptotically equivalent (up to a constant) to MDL
All are score-equivalent
  G equivalent to G' ⇒ Score(G) = Score(G')
Optimization Problem
Input:
  Training data
  Scoring function (including priors, if needed)
  Set of possible structures
    Including prior knowledge about structure
Output:
A network (or networks) that maximize the score
Key Property:
Decomposability: the score of a network is a sum of terms.
Heuristic Search
We address the problem by using heuristic search
Define a search space:
  nodes are possible structures
  edges denote adjacency of structures
Traverse this space looking for high-scoring structures
Search techniques:
  Greedy hill-climbing
  Best first search
  Simulated Annealing
  ...
Heuristic Search (cont.)
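Typical operations:
  Add an arc (in the figure: add C → D)
  Delete an arc (in the figure: delete C → E)
  Reverse an arc (in the figure: reverse C → E)
[Figure: a network over S, C, E, D and the three networks obtained by applying each operation]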
Exploiting Decomposability in Local Search
[Figure: four neighboring networks over S, C, E, D]
Caching: To update the score after a local change, we only need to rescore the families that were changed in the last move
Greedy Hill-Climbing
Simplest heuristic local search
  Start with a given network
    empty network
    best tree
    a random network
  At each iteration
    Evaluate all possible changes
    Apply change that leads to best improvement in score
    Reiterate
  Stop when no modification improves score
Each step requires evaluating approximately n new changes
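The loop above can be sketched as follows, starting from the empty network and using add, delete, and reverse moves. The MDL-style family score, the dict-of-parent-sets encoding, and all function names are assumptions made for this sketch. `lru_cache` plays the role of the family cache, so a move only triggers fresh counting for families it changed.

```python
import math
from collections import Counter
from functools import lru_cache

def make_family_score(records, cardinality):
    """Build a cached family score; the MDL form used here is an assumed choice."""
    m = len(records)

    @lru_cache(maxsize=None)            # caching: each family is counted only once
    def family_score(child, parents):   # parents is a sorted tuple of variable names
        joint, marginal = Counter(), Counter()
        for r in records:
            pa = tuple(r[p] for p in parents)
            joint[(r[child], pa)] += 1
            marginal[pa] += 1
        loglik = sum(n * math.log(n / marginal[pa]) for (_, pa), n in joint.items())
        dim = (cardinality[child] - 1) * math.prod(cardinality[p] for p in parents)
        return loglik - 0.5 * math.log(m) * dim

    return family_score

def is_acyclic(parents):
    """Depth-first search over parent links; False if a directed cycle exists."""
    state = {}                          # 1 = on the current path, 2 = finished
    def visit(v):
        if state.get(v) == 1:
            return False
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in parents[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in parents)

def neighbors(parents):
    """Yield every structure one arc addition, deletion, or reversal away."""
    for x in parents:
        for y in parents:
            if x == y:
                continue
            if x in parents[y]:                       # arc x -> y exists
                deleted = {**parents, y: parents[y] - {x}}
                yield deleted
                yield {**deleted, x: parents[x] | {y}}
            elif y not in parents[x]:                 # no arc in either direction
                yield {**parents, y: parents[y] | {x}}

def hill_climb(variables, family_score):
    """Greedy hill-climbing from the empty network."""
    def total(g):
        return sum(family_score(v, tuple(sorted(ps))) for v, ps in g.items())

    parents = {v: frozenset() for v in variables}
    best = total(parents)
    while True:
        scored = [(total(g), g) for g in neighbors(parents) if is_acyclic(g)]
        top_score, top_graph = max(scored, key=lambda pair: pair[0],
                                   default=(best, parents))
        if top_score <= best + 1e-9:                  # no change improves the score
            return parents, best
        parents, best = top_graph, top_score

# Hypothetical data: Y copies X, Z is independent of both
records = [{"X": x, "Y": x, "Z": z} for x in (0, 1) for z in (0, 1)] * 2
score = make_family_score(records, {"X": 2, "Y": 2, "Z": 2})
learned, learned_score = hill_climb(["X", "Y", "Z"], score)
```

On this toy data the search adds a single arc between X and Y and then stops. Which direction it picks is arbitrary, since the two orientations are equivalent and score the same.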
Greedy Hill-Climbing: Possible Pitfalls
Greedy hill-climbing can get stuck in:
  Local maxima:
    All one-edge changes reduce the score
  Plateaus:
    Some one-edge changes leave the score unchanged
    Happens because equivalent networks receive the same score and are neighbors in the search space
Both occur during structure search
Standard heuristics can escape both
  Random restarts
  TABU search
Search: Summary
Discrete optimization problem
In general, NP-hard
  Need to resort to heuristic search
  In practice, search is relatively fast (~100 vars in ~10 min):
    Decomposability
    Sufficient statistics
In some cases, we can reduce the search problem to an easy optimization problem
  Example: learning trees
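For the tree example, one well-known reduction (commonly attributed to Chow and Liu, and not named on the slide) is to weight every pair of variables by empirical mutual information and take a maximum-weight spanning tree. The sketch below uses Kruskal's algorithm; the function names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(records, a, b):
    """Empirical I(A;B) in nats from a list of dict records."""
    n = len(records)
    count_a = Counter(r[a] for r in records)
    count_b = Counter(r[b] for r in records)
    count_ab = Counter((r[a], r[b]) for r in records)
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())

def best_tree(records, variables):
    """Kruskal's algorithm on edges sorted by decreasing mutual information."""
    edges = sorted(combinations(variables, 2),
                   key=lambda e: mutual_information(records, *e), reverse=True)
    root = {v: v for v in variables}    # simple union-find without path compression

    def find(v):
        while root[v] != v:
            v = root[v]
        return v

    tree = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                     # the edge joins two separate components
            root[ra] = rb
            tree.append((a, b))
    return tree

# Hypothetical chain-like data: Y mostly follows X, Z mostly follows Y
xs = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
zs = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
records = [{"X": x, "Y": y, "Z": z} for x, y, z in zip(xs, ys, zs)]
tree = best_tree(records, ["X", "Y", "Z"])
```

The result is an undirected skeleton. Choosing any root and directing arcs away from it gives a tree-structured network, and all such orientations have the same likelihood score.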
Graphical Models Intro Summary
Representations
Graphs are a cool way to put constraints on distributions, so that you can say lots of stuff without even looking at the numbers!
Inference
GMs let you compute all kinds of different probabilities efficiently
Learning
You can even learn them automagically!