Decision making
Samuel McClure
May 5, 2011

Decision neuroscience
• Since the brain is the basis for thought...
• Hope: with a mechanistic understanding of how the brain values and selects among alternatives, we would have an accurate description of decision-making

Decision-making
• We need to understand how value is handled by the brain:
  • how value is triggered by sensory events
  • how this leads to the choice of actions
• A lot is known about this!

Outline
• A choice problem (in game-theoretic terms) involves: other players, available strategies, pay-offs, information, estimated values of possible actions, a (complex) choice procedure, and repeated play
• How does the brain support these processes?
• What are the consequences of relying on these particular mechanisms?

Neural valuation system (Olds and Milner, 1954)
• Aimed for the reticular formation
• Missed by 4 mm
• Discovered a neural substrate that could substitute for rewards

Brain stimulation reward (BSR)
• The value of brain stimulation scales with the amount of current and the rate of stimulation
• Mediated by dopamine:
  • amphetamine: DA agonist
  • pimozide: DA antagonist

Midbrain dopamine system
(Figure: anatomy of the midbrain dopamine system.)

Common neural currency (Shizgal and Conover, 1996)
• BSR substitutes for natural rewards
• BSR interacts with natural rewards

Computational function
(Figure: dopamine neuron firing to an unpredictable reward, to a predictable reward, and to the omission of a predicted reward.)
• Theory: the responses indicate errors in reward prediction

Reinforcement learning
• How valuable is it to take some action a in some state of the world s?
• Reinforcement learning methods try to learn this through a value function Q(s, a):

  Q(s_t, a_t) = r(s_t, a_t) + \gamma r(s_{t+1}, a_{t+1}) + \gamma^2 r(s_{t+2}, a_{t+2}) + \gamma^3 r(s_{t+3}, a_{t+3}) + \dots
              = \sum_{\tau=0}^{\infty} \gamma^{\tau} r(s_{t+\tau}, a_{t+\tau}),   with \gamma \in [0, 1]

Temporal difference (TD) learning
• Q(s, a) is nicely recursive:

  Q(s_t, a_t) = r(s_t, a_t) + \gamma r(s_{t+1}, a_{t+1}) + \gamma^2 r(s_{t+2}, a_{t+2}) + \gamma^3 r(s_{t+3}, a_{t+3}) + \dots
  Q(s_{t+1}, a_{t+1}) = r(s_{t+1}, a_{t+1}) + \gamma r(s_{t+2}, a_{t+2}) + \gamma^2 r(s_{t+3}, a_{t+3}) + \gamma^3 r(s_{t+4}, a_{t+4}) + \dots
  \gamma Q(s_{t+1}, a_{t+1}) = \gamma r(s_{t+1}, a_{t+1}) + \gamma^2 r(s_{t+2}, a_{t+2}) + \gamma^3 r(s_{t+3}, a_{t+3}) + \gamma^4 r(s_{t+4}, a_{t+4}) + \dots

  so,

  Q(s_t, a_t) = r(s_t, a_t) + \gamma Q(s_{t+1}, a_{t+1})

TD learning
• Can update estimates on a moment-by-moment basis:

  \delta(t) = r(s_t, a_t) + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
  \Delta Q(s_t, a_t) = \eta \, \delta(t)

• Accounts for the responses of dopamine neurons: neuron responses look like a reward prediction error
(Figure: dopamine recordings in the same three conditions: unpredictable reward, predictable reward, omission of a predicted reward.)
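To make this concrete, here is a minimal TD(0) sketch in Python that reproduces the three cases in the figure. It applies the update rule above to state values V(t) rather than Q(s, a), which is the same rule without actions; the task layout (a cue at t = 0 predicting reward at t = 2) and all parameter values are illustrative assumptions, not from the lecture.

```python
import numpy as np

# Minimal TD(0) sketch of the prediction-error account. The task layout
# (cue at t=0, reward at t=2) and the parameters are assumptions made
# for illustration only.

gamma, eta, T = 1.0, 0.2, 3
V = np.zeros(T + 1)                    # V[t]: predicted future reward at step t
reward = np.zeros(T); reward[2] = 1.0  # reward delivered at t = 2

def prediction_errors(V, r):
    """delta(t) = r(t) + gamma*V(t+1) - V(t), plus the cue-onset error
    (trial onset is unpredictable, so the pre-cue prediction is 0)."""
    d = [gamma * V[0] - 0.0]           # surprise at cue onset
    d += [r[t] + gamma * V[t + 1] - V[t] for t in range(T)]
    return np.array(d)

# Learning: Delta V = eta * delta, applied trial after trial.
for _ in range(300):
    d = prediction_errors(V, reward)
    for t in range(T):
        V[t] += eta * d[t + 1]

print(np.round(prediction_errors(V, reward), 2))
# -> burst at the unpredicted cue, ~0 at the fully predicted reward
print(np.round(prediction_errors(V, np.zeros(T)), 2))
# -> negative error (a dip) at the moment the predicted reward is omitted
```

After learning, the prediction error has migrated from the reward to the unpredicted cue, and omitting the reward produces a negative error, matching the dopamine recordings.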
Reward learning (oversimplified)
(Figure: schematic; actions yield rewards r, which train a value signal V via dopamine (DA).)

Action selection
• Thought to occur via winner-take-all competition in the striatum (Ivry & Spencer, 2004)
• What is the evidence that dopamine has these effects in people?

Addiction
• All drugs of abuse activate the dopamine system
• They simulate prediction errors and thereby increase the value of associated stimuli and actions on each use
(Figure: value over time for chocolate versus cocaine; chocolate's learned value levels off, while cocaine's keeps climbing with each use.)

Parkinson's disease
• Symptoms become apparent once 90-95% of dopamine neurons have died
• Flat affect
• Inability to voluntarily initiate actions
• Loss of the value signal?

The value signal is measurable in people with fMRI.

Linking things together
• Game theory / behavioral analyses do not (generally) discuss brain processes like these
• The neuroeconomics paper did not discuss game-theoretic analyses of choice problems

Multiple valuation systems
• Pavlovian: see food, approach food
• Habitual: dopamine system
• Goal-directed: prefrontal cortex

Pavlovian values
From Hershberger, W. A. (1986). An approach through the looking-glass. Animal Learning & Behavior, 14(4), 443-451:

  "Forty food-deprived cockerel chicks were tested individually in a straight runway containing a familiar food cup that moved when the chicks moved. The food cup always moved in the same direction as the chick: for 20 experimental chicks it moved twice as far as the chick; for 20 control chicks it moved half as far. In Lewis Carroll's (1898/1926) picturesque terminology, the experimental chicks were tested in Alice's 'room through the looking-glass,' in which, in order to approach the food cup, they had to 'walk the other way.' Although the control chicks performed well, the experimental chicks evinced the runway behavior that characterizes positive feedback: they persistently chased the food cup away. This means that the spatial polarity of visual feedback is critical and implies that an ordinary approach response is but an automatic (closed-loop) realization of an intended visual perception."

(The study tested Powers's closed-loop thesis (Powers, 1973a, 1973b, 1978; Powers, Clark, & McFarland, 1960a, 1960b) that organisms realize their intentions by means of negative feedback, which predicts that reversing the spatial polarity of an animal's feedback should disable it.)

Reasoning
• Depends on the prefrontal cortex, which:
  • receives input from higher-order sensory areas of the brain
  • projects back to these same areas
  • enables the goal-directed activation of brain areas needed to accomplish abstract goals and to simulate future consequences

Prefrontal cortex
• Required for flexible responding
• Lesions can produce environmental dependency syndrome (Lhermitte, 1986)

Multiple systems
• Different ways to value things:
  • direct learning based on past experience
  • estimation based on a diversity of cues and on predictions about desires
• Clear example: temporal discounting

Temporal discounting
• The longer you have to wait for something, the less it is worth to you now
• Would you rather have $10 today or $11 tomorrow?
• Would you rather have $10 in a year or $11 in a year and a day?

Impulsivity
• People tend to be impulsive when things are available immediately
• Time-dependent preference reversals:
  • patient when both options are far out in time
  • much less so as the sooner option approaches

Why?
• Dual-system answer:
  • When everything is off in the future, we rely on our prefrontal cortex alone (how valuable is $10 in a year?)
  • Immediate rewards can activate the dopamine system (we have a lot of experience that tells us how valuable having $10 is right now)
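The lecture gives no formula here, but a standard way to formalize this dual-system answer is quasi-hyperbolic ("beta-delta") discounting, in which only immediate rewards escape an extra penalty. A minimal sketch, with the functional form and parameter values chosen purely for illustration:

```python
# Quasi-hyperbolic ("beta-delta") discounting sketch. The model and the
# parameter values (beta = 0.7, delta = 0.999 per day) are illustrative
# assumptions; the lecture itself gives no formula.

BETA = 0.7     # extra penalty on anything delayed (the immediacy "kick")
DELTA = 0.999  # per-day exponential discount factor (the patient system)

def value(amount, delay_days):
    """Present value; only truly immediate rewards escape the beta penalty."""
    if delay_days == 0:
        return amount
    return BETA * (DELTA ** delay_days) * amount

def choose(option_a, option_b):
    """Each option is (amount, delay in days); pick the higher present value."""
    return option_a if value(*option_a) >= value(*option_b) else option_b

print(choose((10, 0), (11, 1)))      # $10 today vs $11 tomorrow -> (10, 0)
print(choose((10, 365), (11, 366)))  # same pair a year out      -> (11, 366)
```

With these (assumed) parameters, the model picks $10 over $11 when the $10 is available today, but $11 over $10 when both are about a year away, reproducing the preference reversal described above.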
Prediction
• Choices involving immediate rewards should activate brain areas associated with the dopamine system (e.g., the striatum)
• All choices should recruit the prefrontal cortex

(Figure: task design. Subjects make free-response choices between a sooner and a later monetary reward, and the time to the earliest reward is varied systematically.)

• Dopamine-related areas are preferentially activated when an immediate reward is available
(Figure A/B: MPFC, PCC, VStr, and MOFC show greater % signal change when the earliest reward is available today (d = today) than at d = 2 weeks or d = 1 month.)

Prefrontal cortex
• Regions that respond equally to rewards at all delays: DLPFC, VLPFC, LOFC, PMA, SMA, RPar, VCtx
(Figure A/B: % signal change in these regions is equally strong for d = today, d = 2 weeks, and d = 1 month; equally active for all choices.)

Predicting decisions
• Choices depend on the activity in each of these two systems
(Figure: normalized signal change in PFC versus dopamine-related (DA) areas, split by whether subjects chose the early or the late option.)

Example from game theory
• How do people bid in auctions?
• Common value auctions

Common value auctions
• The item up for bid has a fixed common value x_0
• Auction participants have unbiased estimates x_i drawn from a uniform distribution on [x_0 - \varepsilon, x_0 + \varepsilon] (maximum error \varepsilon)
• The maximum and minimum possible values are known: x_0 \in [x_L, x_H]
• There are n total auction participants

Famous case study (Capen, Clapp, and Campbell, 1971)
• Oil drilling rights in the Gulf of Mexico
• How much is the oil worth under a patch of ocean?

Bayesian solution
• Bid so as to maximize expected earnings
• In a symmetric equilibrium with an increasing bid function \rho, a bidder acting on signal y wins when all n - 1 other signals fall below y; given the true value x_0, that probability is

  p\left(y = \max_j y_j\right) = \prod_{i=1}^{n-1} p(y_i < y) = \prod_{i=1}^{n-1} \frac{y - (x_0 - \varepsilon)}{2\varepsilon} = (2\varepsilon)^{-(n-1)} (y - x_0 + \varepsilon)^{n-1}

• Expected earnings from holding signal x but bidding \rho(y), for interior signals x \in [x_L + \varepsilon, x_H - \varepsilon] (where the posterior over x_0 is uniform on [x - \varepsilon, x + \varepsilon]):

  E(x, y) = (2\varepsilon)^{-n} \int_{x-\varepsilon}^{x+\varepsilon} [x_0 - \rho(y)] (y - x_0 + \varepsilon)^{n-1} \, dx_0

• Maximize:

  \left. \frac{dE(x, y)}{dy} \right|_{y=x} = 0

• Find the differential equation (the first-order condition reduces to):

  2\varepsilon \rho'(x) + n\rho(x) - [n(x - \varepsilon) + 2\varepsilon] = 0

• Solve:

  \rho(x) = x - \varepsilon + \frac{2\varepsilon}{n+1} \, e^{-\frac{n}{2\varepsilon}\left[x - (x_L + \varepsilon)\right]}

• People (and companies) always bid above this
• One way to model that: if winning itself carries an extra reward r, the integrand becomes [x_0 - \rho(y) + r](y - x_0 + \varepsilon)^{n-1}, which pushes the whole bid function up

Winner's curse
• The estimates are unbiased, but the auction selects the most optimistic estimator as the winner, so bidding near your own estimate means the winner systematically overpays

Barry Zito ($18M per year):

  2007: 11 wins, 13 losses
  2008: 10 wins, 17 losses
  2009: 10 wins, 13 losses
  2010:  9 wins, 13 losses
  2011:  0 wins,  1 loss, 6.23 ERA

JaMarcus Russell is another famous example.
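A small Monte Carlo sketch makes the winner's curse concrete: when everyone naively bids their unbiased estimate, the winner loses money on average, while bidders who all use the equilibrium function \rho(x) from the slide come out ahead. The parameter values (n = 6 bidders, x_0 = 100, \varepsilon = 10, x_L = 50) are arbitrary illustrative choices.

```python
import math
import random

# Monte Carlo sketch of the winner's curse in the common-value setup
# above. All parameter values are arbitrary illustrative choices.

N, EPS, XL, X0 = 6, 10.0, 50.0, 100.0

def winner_profit(bid_fn, trials=100_000):
    """Average profit of the winning bidder when everyone uses bid_fn."""
    total = 0.0
    for _ in range(trials):
        signals = [random.uniform(X0 - EPS, X0 + EPS) for _ in range(N)]
        winning_bid = max(bid_fn(x) for x in signals)  # highest bid wins
        total += X0 - winning_bid                      # winner pays own bid
    return total / trials

def naive(x):
    """Bid the unbiased estimate itself."""
    return x

def rnne(x):
    """The equilibrium bid function rho(x) from the slide."""
    return x - EPS + (2 * EPS / (N + 1)) * math.exp(-(N / (2 * EPS)) * (x - (XL + EPS)))

print("naive bidders:", round(winner_profit(naive), 2))  # < 0: the winner's curse
print("rnne bidders: ", round(winner_profit(rnne), 2))   # > 0: curse avoided
```

For signals well inside [x_L + \varepsilon, x_H - \varepsilon] the exponential correction is negligible, so the equilibrium bid is essentially x - \varepsilon: shading the bid by the maximum estimation error is what protects the winner.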
Neural explanation
• The rational strategy is so complicated that people rely on their dopamine system instead
• Behavioral evidence (figure)

Neural evidence
• Use reinforcement learning to fit behavior
• Use the estimated prediction-error signal to analyze brain imaging data
• Finding: dopamine target regions show evidence of an error signal as people learn to bid

The decision neuroscience view on rationality in human behavior
• The dopamine system is adaptive but rigid
  • In many circumstances it will lead to optimal behavior, but it is fallible
• Reasoning (prefrontal cortex) is much more flexible
  • It came up with game theory, after all
  • But the prefrontal cortex is slow and taxing
• Behavior depends on how the structure and context of the task recruit and favor the different value systems