Uncertainty
CPS 170
Ron Parr

Why do we need uncertainty?
• Reason: Sh*t happens
• Actions don't have deterministic outcomes
• Can logic be the "language" of AI???
• Problem: General logical statements are almost always false
• Truthful and accurate statements about the world would seem to require an endless list of qualifications
• How do you start a car?
• Call this "The Qualification Problem"

The Qualification Problem
• Is this a real concern?
• YES!
• Systems that try to avoid dealing with uncertainty tend to be brittle.
• Plans fail
• Finding the shortest path to the goal isn't that great if the path doesn't really get you to the goal

Probabilities
• Natural way to represent uncertainty
• People have intuitive notions about probabilities
• Many of these are wrong or inconsistent
• Most people don't get what probabilities mean
• Finer details of this question are still debated

Relative Frequencies
• Probabilities are defined over events
• The space of all possible events is the "event space" (picture a Venn diagram with regions A and Not A)
• Think: playing blindfolded darts with the Venn diagram
• P(A) ~ percentage of dart throws that hit A

Understanding Probabilities
• Initially, probabilities are "relative frequencies"
• This works well for dice and coin flips
• For more complicated events, this is problematic
• What is the probability that the Democrats will control Congress in 2012?
  – This event only happens once
  – We can't count frequencies
  – Still seems like a meaningful question
• In general, all events are unique
• The "Reference Class" problem

Probabilities and Beliefs
• Suppose I have flipped a coin and hidden the outcome
• What is P(Heads)?
• Note that this is a statement about a belief, not a statement about the world
• The world is in exactly one state, and it is in that state with probability 1.
• Assigning truth values to probability statements is very tricky business
• Must reference the speaker's state of knowledge

Frequentism and Subjectivism
• Frequentists hold that probabilities must come from relative frequencies
• This is a purist viewpoint
• It is undercut by the fact that relative frequencies are often unobtainable
• Often requires complicated and convoluted assumptions to come up with probabilities
• Subjectivists: probabilities are degrees of belief
  – Taints the purity of probabilities
  – Often more practical

The Middle Ground
• No two events are ever identical, but
• No two events are ever totally unique either
• Probability that Obama will be elected in 2012?
  – He won once before
  – Conditions in the next election will be similar, but not identical
  – The opponent will most likely be different
• In reality, we use probabilities as beliefs, but we allow data (relative frequencies) to influence these beliefs
• More precisely: we can use Bayes rule to combine our prior beliefs with new data

Why probabilities are good
• Subjectivists: probabilities are degrees of belief
• Are all degrees of belief probabilities?
  – AI has used many notions of belief:
    • Certainty Factors
    • Fuzzy Logic
• Can prove that a person who holds a system of beliefs inconsistent with probability theory can be tricked into accepting a sequence of bets that is guaranteed to lose (Dutch book)

So, what are probabilities really?
• Probabilities are defined over random variables
• Random variables are usually represented with capitals: X, Y, Z
• Random variables take on values from a finite domain: d(X), d(Y), d(Z)
• We use lowercase letters for values from domains
• X = x asserts: RV X has taken on value x
• P(x) is shorthand for P(X = x)

Event spaces for binary, discrete RVs
• 2-variable case
• Important: the event space grows exponentially in the number of random variables
• Components of the event space = atomic events

Domains
• In the simplest case, domains are Boolean
• In general, they may include many different values
• Most general case: domains may be continuous
• This introduces some special complications

Kolmogorov's axioms of probability
• 0 ≤ P(a) ≤ 1
• P(true) = 1; P(false) = 0
• P(a or b) = P(a) + P(b) − P(a and b)
  – Subtract to correct for double counting
• This is sufficient to completely specify probability theory for discrete variables
• Continuous variables need density functions

Atomic Events
• When several variables are involved, it is useful to think about atomic events
• An atomic event is a complete assignment to the variables in the domain (compare with states in search)
• Atomic events are mutually exclusive
• Atomic events exhaust the space of all possible events
• For n binary variables, how many unique atomic events are there?

Joint Distributions
• A joint distribution is an assignment of probabilities to every possible atomic event
• We can define all other probabilities in terms of the joint probabilities by marginalization:
  P(a) = P(a ∧ b) + P(a ∧ ¬b)
  More generally: P(a) = ∑_{eᵢ ∈ e(a)} P(eᵢ), where e(a) is the set of atomic events in which a holds
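The marginalization rule above is easy to sketch in code. Here is a minimal Python example, using a small joint over two Boolean variables (the cold/headache numbers that appear in the example slides):

```python
# Marginalization: P(a) = sum of the probabilities of all atomic events
# in which a holds. Joint over (cold, headache), numbers from the slides.
joint = {
    (True, True): 0.4,    # P(cold and headache)
    (False, True): 0.2,   # P(not cold and headache)
    (True, False): 0.3,   # P(cold and not headache)
    (False, False): 0.1,  # P(not cold and not headache)
}

def marginal(joint, var_index, value):
    """Sum out every other variable from the joint distribution."""
    return sum(p for event, p in joint.items() if event[var_index] == value)

p_cold = marginal(joint, 0, True)      # sums 0.4 + 0.3
p_headache = marginal(joint, 1, True)  # sums 0.4 + 0.2
print(p_cold, p_headache)
```

The joint is the single source of truth here: every other probability is derived from it by summation.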
Example
• P(cold ∧ headache) = 0.4
• P(¬cold ∧ headache) = 0.2
• P(cold ∧ ¬headache) = 0.3
• P(¬cold ∧ ¬headache) = 0.1
• What are P(cold) and P(headache)?

Independence
• If A and B are independent: P(A ∧ B) = P(A)P(B)
• P(cold ∧ headache) = 0.4
• P(¬cold ∧ headache) = 0.2
• P(cold ∧ ¬headache) = 0.3
• P(¬cold ∧ ¬headache) = 0.1
• Are cold and headache independent?

Independence
• If A and B are mutually exclusive: P(A ∨ B) = P(A) + P(B) (Why?)
• Examples of independent events:
  – Duke winning the NCAA, a Democrat winning the White House
  – Two successive, fair coin flips
  – My car starting and my iPod working
  – etc.
• Can independent events be mutually exclusive?

Independence
• Convenient when it occurs, but don't count on it
• When you have it:
  – P(A and B) = P(A)·P(B)
  – P(A or B) = P(A) + P(B) − P(A)P(B)
• Special cases: disjoint events, perfectly correlated events

Why Probabilities Are Messy
• Probabilities are not truth functional
• To compute P(a and b) we need to consult the joint distribution
  – Sum out all of the other variables from the distribution
  – It is not a function of P(a) and P(b)
  – It is not a function of P(a) and P(b)
  – It is not a function of P(a) and P(b)
• This fact led to many approximation methods such as certainty factors and fuzzy logic (Why?)
• Neat vs. Scruffy…

The Scruffy Trap
• Reasoning about probabilities correctly requires knowledge of the joint distribution
• This is exponentially large
• Very convenient to assume independence
• Assuming independence when there is no independence leads to incorrect answers
• Examples:
  – ANDing symptoms
  – ORing symptoms

Conditional Probabilities
• Ordinary probabilities for random variables are called
unconditional or prior probabilities
• P(a|b) = P(a ∧ b) / P(b)
• This tells us the probability of a given that we know only b
• If we know c and d, we can't use P(a|b) directly (without additional assumptions)
• Annoying, but solves the qualification problem…

Probability Solves the Qualification Problem
• P(disease|symptom1)
• This defines the probability of a disease given that we have observed only symptom1
• The conditioning bar indicates that the probability is defined with respect to a particular state of knowledge, not as an absolute thing

Condition with Bayes's Rule
  P(A ∧ B) = P(B ∧ A)
  P(A|B)P(B) = P(B|A)P(A)
  P(A|B) = P(B|A)P(A) / P(B)

Note that we will usually call Bayes's rule "Bayes Rule"
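Bayes's rule is easy to check numerically. A minimal Python sketch, using the same numbers as the cold/headache doctor example in these slides:

```python
# Bayes's rule: P(A|B) = P(B|A) * P(A) / P(B).
# Numbers match the cold/headache doctor example in these slides.
p_a = 0.7           # P(A), e.g. P(cold)
p_b = 0.6           # P(B), e.g. P(headache)
p_b_given_a = 0.57  # P(B|A), e.g. P(headache|cold)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.665
```

Note the asymmetry: P(A|B) and P(B|A) differ whenever P(A) and P(B) differ.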
Conditioning and Belief Update
• Suppose we know P(ABCDE)
• Observe B = b, update our beliefs:
  P(ACDE|b) = P(AbCDE) / P(b) = P(AbCDE) / ∑_{ACDE} P(AbCDE)
  (the numerator is the joint with B fixed to b)
• Notation comment: this is a very condensed notation. P(ACDE|b) is not a number; it's a distribution
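The update above can be sketched concretely: keep the atomic events consistent with the evidence, then renormalize. A Python sketch with a hypothetical three-variable joint (smaller than the five-variable P(ABCDE) on the slide, but the same operation):

```python
# Belief update by conditioning: restrict the joint to atomic events
# consistent with the evidence, then renormalize so it sums to 1.
# Joint over (A, B, C) with hypothetical numbers.
joint = {
    (True, True, True): 0.10,   (True, True, False): 0.15,
    (True, False, True): 0.05,  (True, False, False): 0.20,
    (False, True, True): 0.25,  (False, True, False): 0.10,
    (False, False, True): 0.05, (False, False, False): 0.10,
}

def condition_on_b(joint, b_value):
    """Return P(A, C | B=b_value) as a dictionary over (a, c) pairs."""
    consistent = {(a, c): p for (a, b, c), p in joint.items() if b == b_value}
    p_b = sum(consistent.values())  # the normalizer P(B = b_value)
    return {event: p / p_b for event, p in consistent.items()}

posterior = condition_on_b(joint, True)
print(sum(posterior.values()))  # sums to 1, up to floating-point rounding
```

The result is a full distribution over the remaining variables, matching the notation comment above: P(ACDE|b) is not a number.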
Example Revisited
• P(cold ∧ headache) = 0.4
• P(¬cold ∧ headache) = 0.2
• P(cold ∧ ¬headache) = 0.3
• P(¬cold ∧ ¬headache) = 0.1
• What is P(cold|headache)?

Let's Play Doctor
• Suppose P(cold) = 0.7, P(headache) = 0.6
• P(headache|cold) = 0.57
• What is P(cold|headache)?
  P(c|h) = P(h|c)P(c) / P(h) = (0.57 × 0.7) / 0.6 = 0.665
• IMPORTANT: Not always symmetric

Expectation
• Most of us use expectation in some form when we compute averages
• What is the average value of a die roll?
• (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5

Bias
• What if not all events are equally likely?
• Suppose a weighted die makes 6 twice as likely as anything else. What is the average value of the outcome?
• (1 + 2 + 3 + 4 + 5 + 6 + 6)/7 = 3.86
• Probs: 1/7 for 1…5, and 2/7 for 6
• (1 + 2 + 3 + 4 + 5) × 1/7 + 6 × 2/7 = 3.86

Expectation in General
• Suppose we have some RV X
• Suppose we have some function f(X)
• What is the expected value of f(X)?
  E[f(X)] = ∑_x P(X = x) f(x)
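The general formula above can be sketched directly. A minimal Python example using the weighted-die numbers from the Bias slide:

```python
# E[f(X)] = sum over x of P(X = x) * f(x).
# Weighted die from the slides: 6 is twice as likely as each other face.
probs = {1: 1/7, 2: 1/7, 3: 1/7, 4: 1/7, 5: 1/7, 6: 2/7}

def expectation(probs, f=lambda x: x):
    """Expected value of f(X) under a distribution given as a dict."""
    return sum(p * f(x) for x, p in probs.items())

print(round(expectation(probs), 2))                  # 3.86, as on the slide
print(round(expectation(probs, lambda x: x * x), 2)) # E[X^2] for the same die
```

With f as the identity, this reproduces the 3.86 average; any other f plugs into the same one-line sum.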
Sums of Expectations
• Suppose we have f(X) and g(Y).
• What is the expected value of f(X) + g(Y)?
  E[f(X) + g(Y)] = ∑_{X,Y} P(X ∧ Y)(f(X) + g(Y))
                 = ∑_{X,Y} P(X ∧ Y) f(X) + ∑_{X,Y} P(X ∧ Y) g(Y)
                 = ∑_X f(X) ∑_Y P(X ∧ Y) + ∑_Y g(Y) ∑_X P(X ∧ Y)
                 = ∑_X f(X) P(X) + ∑_Y g(Y) ∑_X P(X ∧ Y)
                 = E[f(X)] + E[g(Y)]
  (This is linearity of expectation.)
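The derivation can be verified numerically. A Python sketch with a small hypothetical joint; note that X and Y need not be independent for linearity to hold:

```python
# Linearity of expectation: E[f(X) + g(Y)] = E[f(X)] + E[g(Y)],
# even when X and Y are dependent. Joint and functions are hypothetical.
joint = {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
         (1, 0): 0.3, (1, 1): 0.1, (1, 2): 0.2}

f = lambda x: 2 * x + 1
g = lambda y: y * y

# Left-hand side: expectation of the sum, computed from the full joint.
lhs = sum(p * (f(x) + g(y)) for (x, y), p in joint.items())

# Right-hand side: sum of expectations, computed from the marginals.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p
rhs = sum(p * f(x) for x, p in p_x.items()) + \
      sum(p * g(y) for y, p in p_y.items())

print(abs(lhs - rhs) < 1e-9)  # True
```

This is what makes linearity of expectation so useful: it requires no independence assumption at all, unlike the product rule for conjunctions.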
Continuous Random Variables
• Domain is some interval, region, or union of regions
• Uniform case: simplest to visualize (event probability is proportional to area)
• Non-uniform case visualized with an extra dimension
• Gaussian (normal/bell) distribution:
  [Figure: bell-shaped curve, plotted as (1/(2*pi)) * exp(-(x**2)/2) for x from about −10 to 10]

Updating Kolmogorov's Axioms
• Use lower case for probability density
• Use the end of the alphabet for continuous vars
• For discrete events: 0 ≤ P(a) ≤ 1
• For densities: 0 ≤ p(x)
• Is p(x) > 1 possible???

Requirements on Continuous Distributions
• p(x) > 1 is possible so long as the density still integrates to 1: ∫ p(x) dx = 1
• Don't confuse p(x) and P(X = x)
• P(X = x) for any single x is 0!

Cumulative Distributions
• When the distribution is over numbers, we can ask:
  – P(X ≥ c) for some c
  – P(X < c) for some c
  – P(a ≤ X ≤ b) for some a and b
• Solve by
  – Summation (discrete case)
  – Integration (continuous case)
• The cumulative is sometimes called
  – CDF
  – Distribution function

Sloppy Comment about Continuous Distributions
• In many, many cases, you can generalize what you know about discrete distributions to continuous distributions, replacing "P" with "p" and "∑" with "∫"
• A proper treatment of this topic requires measure theory and is beyond the scope of the text and class

Probability Conclusions
• Probabilistic reasoning has many advantages:
  – Solves the qualification problem
  – Is better than any other system of beliefs (Dutch book argument)
• Probabilistic reasoning is tricky
  – Some things decompose nicely: linearity of expectation, conjunctions of independent events, disjunctions of disjoint events
  – Some things can be counterintuitive at first: conjunctions of arbitrary events, conditional probability
• Reasoning efficiently with probabilities poses significant data structure and algorithmic challenges for AI
• (Roughly speaking, the AI community realized some time around 1990 that probabilities were the right thing and has spent the last 20 years grappling with this realization.)
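The "decomposes nicely" cases listed above (other than linearity of expectation) can be checked in a few lines; the events and numbers below are hypothetical:

```python
# Two decompositions that DO work, checked on hypothetical numbers.

# Conjunction of independent events: P(A and B) = P(A) * P(B).
p_flip1_heads, p_flip2_heads = 0.5, 0.5       # two fair coin flips
p_both_heads = p_flip1_heads * p_flip2_heads  # valid only under independence
print(p_both_heads)  # 0.25

# Disjunction of disjoint events: P(A or B) = P(A) + P(B).
p_roll_1, p_roll_2 = 1/6, 1/6   # one die can't show 1 and 2 at once
p_1_or_2 = p_roll_1 + p_roll_2  # valid only because the events are disjoint
print(abs(p_1_or_2 - 1/3) < 1e-9)  # True
```

For arbitrary events, neither shortcut is valid, which is exactly why the joint distribution must be consulted.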
Spring '11, Parr, Artificial Intelligence