PSYC 103 Winter 2011 Lecture 5
Review questions

1. What is the name of the technique in which an animal acquires a dislike of an attractive flavor after being made ill following consumption of that flavor?
   a. Conditioned inhibition
   b. Extinction
   c. Punishment
   d. Appetitive conditioning
   e. Taste aversion conditioning

2. What, according to Lloyd Morgan's (1894) canon, is the best explanation for animal behavior?
   a. One that makes no appeal to mental states
   b. One that refers to the simplest psychological mechanisms
   c. One that refers to the most complex psychological mechanisms
   d. One that does not appeal to human psychological mechanisms
   e. One that is based upon rigorous experimentation

3. The biologically significant stimulus that is presented after a neutral stimulus in Pavlovian conditioning is referred to as what?
   a. The unconditioned stimulus
   b. The conditioned stimulus
   c. The discriminative stimulus
   d. The cue
   e. The significant stimulus

4. Training in which a conditioned stimulus signals the occurrence of an unconditioned stimulus is known as what?
   a. Instrumental conditioning
   b. Operant conditioning
   c. Excitatory conditioning
   d. Stimulus substitution
   e. Inhibitory conditioning

5. Autoshaping is an example of:
   a. Pavlovian conditioning
   b. Instrumental conditioning
   c. Negative reinforcement
   d. Aversive conditioning
   e. Conditioned suppression

6. In Pavlovian conditioning, a conditioned response that opposes the response elicited by the unconditioned stimulus is called a/an ________ conditioned response.
   a. Consummatory
   b. Reflexive
   c. Alpha
   d. Preparatory
   e. Compensatory

Rescorla–Wagner model: conditioning with a single CS

Associative strength (V): the strength of the connection between the internal representations of the CS and US.

The change in associative strength (ΔV) on each conditioning trial is determined by:

  ΔV = α(λ − V)

where
  α = learning rate parameter, determined by the properties of the CS and US
  V = current strength of the CS→US association
  λ = magnitude of the US, reflecting the maximum CS→US association (learning) possible

Kamin's "surprise" is expressed as (λ − V): "what you got" minus "what you expected". Learning on each trial, ΔV, is proportional to (λ − V).

An application to conditioning with a single CS. Before conditioning (Trial 0), assume λ = 100, α = 0.2 and, as no learning has yet taken place, V = 0.

[Graph: associative strength (V), 0 to 60, plotted across Trials 0 to 5]

On Trial 1, the change in associative strength will be:
  ΔV = 0.2(100 − 0) = 20

On Trial 2, the change in associative strength will be less, for now V = 20:
  ΔV = 0.2(100 − 20) = 16

On Trial 3, the change will be smaller still, for now V = 36:
  ΔV = 0.2(100 − 36) = 12.8

On Trial 4, the change will be smaller again, for now V = 48.8:
  ΔV = 0.2(100 − 48.8) = 10.24

And so on, until V = λ (100) and therefore ΔV = 0.
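The update rule is easy to check numerically. The following minimal Python sketch uses the slide's values (λ = 100, α = 0.2); the function and variable names are ours, as the slides specify only the equation.

```python
# A minimal sketch of the Rescorla-Wagner update for a single CS, using the
# slide's values (lambda = 100, alpha = 0.2). Names are ours; the slides
# specify only the equation dV = alpha * (lambda - V).

def rescorla_wagner_single_cs(alpha=0.2, lam=100.0, n_trials=5):
    """Print and return V after each conditioning trial."""
    v = 0.0                               # Trial 0: no learning has taken place
    history = [v]
    for trial in range(1, n_trials + 1):
        delta_v = alpha * (lam - v)       # surprise: "what you got" - "what you expected"
        v += delta_v
        history.append(v)
        print(f"Trial {trial}: dV = {delta_v:.2f}, V = {v:.2f}")
    return history

rescorla_wagner_single_cs()
# Trial 1: dV = 20.00, V = 20.00
# Trial 2: dV = 16.00, V = 36.00
# Trial 3: dV = 12.80, V = 48.80
# Trial 4: dV = 10.24, V = 59.04
# Trial 5: dV = 8.19, V = 67.23
```

V approaches λ geometrically; in closed form V after n trials is λ(1 − (1 − α)^n), so ΔV shrinks toward 0 as the US becomes fully predicted.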
Surprise and conditioning: Rescorla–Wagner model, conditioning with a compound CS

Blocking

  Group     Element conditioning   Compound conditioning    Test
  Group E   Noise → Shock          Noise & Light → Shock    Light
  Group C   (none)                 Noise & Light → Shock    Light

- V_Noise = λ after the element conditioning for Group E
- Therefore (λ − V_ALL) = 0 during compound conditioning
- No increments in associative strength will take place to the light

Overshadowing
- After compound conditioning in Group C, V_Noise + V_Light = λ
- Therefore V_Noise or V_Light alone < λ
- Conditioning to the light alone would result in V_Light = λ

Inhibitory conditioning
- Example training: Noise → Shock; Light + Noise → no Shock
- As a consequence of this training:
  - V_Noise = λ
  - (λ − V_ALL) takes on a negative value during compound conditioning, because λ = 0 on the no-shock trials
  - V_Light starts at 0, and compound conditioning drives V_Light negative (toward −λ)
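On compound trials the Rescorla–Wagner rule applies the same summed error term, (λ − V_ALL), to every element present. A short sketch illustrates all three effects; the learning rates, trial counts, and the reset shortcut standing in for interleaved noise-alone trials are our own assumptions.

```python
# Sketch of the Rescorla-Wagner rule for a compound CS: every element present
# on a trial is updated by the shared, summed error term (lambda - V_ALL).
# Alphas, trial counts, and the reset shortcut below are illustrative choices.

def compound_trial(v, alphas, lam):
    """One trial with all CSs in `v` presented together."""
    error = lam - sum(v.values())         # (lambda - V_ALL), shared by all elements
    for cs in v:
        v[cs] += alphas[cs] * error
    return v

alphas = {"noise": 0.2, "light": 0.2}

# Blocking (Group E): noise alone was first trained to asymptote (V_Noise = 100),
# so the shock is already fully predicted and the light gains nothing.
v = {"noise": 100.0, "light": 0.0}
for _ in range(10):
    compound_trial(v, alphas, lam=100.0)
print(v)                                  # light is still 0.0

# Overshadowing (Group C): compound training from scratch; the two cues
# share lambda, so each alone ends below 100.
v = {"noise": 0.0, "light": 0.0}
for _ in range(50):
    compound_trial(v, alphas, lam=100.0)
print(v)                                  # each cue levels off near 50

# Conditioned inhibition: compound trials with no shock (lambda = 0) are
# interleaved with noise -> shock trials; resetting V_Noise to 100 stands in
# for those noise-alone trials. V_Light is driven negative, toward -lambda.
v = {"noise": 100.0, "light": 0.0}
for _ in range(50):
    compound_trial(v, alphas, lam=0.0)
    v["noise"] = 100.0
print(v)                                  # light approaches -100
```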
Surprise and conditioning: Rescorla–Wagner model, evaluation of the model

Pros:
- Can explain many of the known phenomena in animal learning, even counter-intuitive ones
- Expressed in formal, mathematical terms, so it can make specific predictions
- An important senior relative of many more contemporary theories of learning; can be applied beyond the study of animal learning, e.g. Siegel & Allen (1996)

Cons:
- Does not propose a mechanism to explain "attentional" phenomena, e.g. latent inhibition
- Fails to explain how a surprising reduction in the number of USs in stage 2 of a blocking experiment generates excitatory learning (Dickinson, Hall & Mackintosh, 1976)
- Provides an incomplete account of discrimination and configural learning (Pearce, 1987)

INSTRUMENTAL CONDITIONING

Operant (instrumental) conditioning
In Pavlovian conditioning: US → UR. In instrumental conditioning, presentation of the US depends on the animal's behavior; the response is "instrumental" in producing the outcome.

  Stimulus → Response → Outcome (reinforcer)

E.L. Thorndike (1874-1949): the puzzle box
Law of effect: "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation."

Contrast with Pavlovian conditioning, US → UR (mostly involuntary):
- Happens automatically; no "thought" required
- Very simple principles could be used to explain all of behavior
- Discrete-trial method: the subject performs the instrumental response only at certain times determined by the experimenter

Discrete-trial apparatuses (S: start box, G: goal box): straight-alley runway, T-maze.
What are the problems with a discrete-trial approach?

B.F. Skinner (1904-1990)
The free-operant method: the subject, rather than the experimenter, decides when to start the next trial. Dependent measure: "rate of responding", i.e. how often the operant response is initiated.

Measuring operant behavior
Cumulative record: a graph of the total number of responses as a function of time.

Establishing an operant response: shaping
Shaping: the gradual change from some initial behavior to some desired target response; reinforcement of successive approximations. In shaping new responses, variability is key.

The conditions of learning: R-O contingencies

                                          Pleasant consequence              Aversive consequence
  Positive contingency:                   Positive reinforcement            Punishment
  response produces the reinforcer        (response increases)              (response decreases)
  Negative contingency:                   Omission training / response      Negative reinforcement / escape
  response prevents the reinforcer        cost (response decreases)         (response increases)

Stimulus → Response → Outcome
- Positive reinforcement: delivery of a stimulus shortly following a response, to increase the future probability of that response. A hungry rat presses a bar to receive food; the bar-pressing is strengthened by the consequence of receiving food.
- Negative reinforcement: removal of an aversive stimulus shortly following a response, to increase the future probability of that response. A rat presses a bar to stop a foot shock.
- Punishment: delivery of an aversive stimulus shortly following a response, to decrease the future probability of that response. A rat presses a bar and receives a mild electrical shock on its feet.

The conditions of learning: contiguity
- Learning occurs when the response is immediately followed by reinforcement
- Logan (1960) trained rats to run down an alley for food. Speed of running was fastest when food was available immediately on entering the goal box, and was directly related to the delay: the gradient of delay
- Lattal & Gleeson (1990): instrumental responding is still possible with a considerable delay (30 sec between response and reinforcement). The rate of responding was slow, but was still maintained

Conditions of learning: response-reinforcer contingency
Hammond (1980)
- Experimental session divided into 1-sec intervals
- If the rat pressed the lever, the US was delivered at the end of that 1-sec interval with p = 0.12
- At the end of training, response rate ≈ 50 responses/min
- Another group was given identical training, but the US was also delivered at the end of intervals that did not contain a response, with p = 0.12
- At the end of training, response rate ≈ 15 responses/min
(A small simulation of this procedure follows.)
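A hedged sketch of Hammond's procedure: the 1-second intervals and the p = 0.12 delivery probability come from the slides; the per-interval response probability and all names are illustrative assumptions.

```python
import random

# Sketch of Hammond's (1980) procedure: USs are scheduled at the end of 1-s
# intervals. In the noncontingent condition, response-free intervals can also
# end in a US, so the response-reinforcer contingency is zero.

def simulate_session(n_intervals=3600, p_respond=0.5,
                     p_us_after_response=0.12, p_us_after_no_response=0.0,
                     seed=0):
    """Count USs earned by responding vs. delivered 'free' in one session."""
    rng = random.Random(seed)
    earned, free = 0, 0
    for _ in range(n_intervals):
        if rng.random() < p_respond:                  # interval contains a response
            earned += rng.random() < p_us_after_response
        else:                                         # response-free interval
            free += rng.random() < p_us_after_no_response
    return earned, free

print(simulate_session())                             # contingent group: (~216, 0)
print(simulate_session(p_us_after_no_response=0.12))  # noncontingent: free USs too
```

In the second condition P(US | response) = P(US | no response) = 0.12, so the contingency ΔP = P(O|R) − P(O|no R) is zero, which matches the much lower response rate Hammond observed.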
Molar theory of reinforcement
Baum (1973): instrumental conditioning is effective IF responding increases the rate of reinforcement. BUT Thomas (1981) showed that instrumental conditioning was effective in rats even when:
- lever pressing had no effect on the rate of reinforcement
- lever pressing decreased the rate of reinforcement

(ii) Conditions of learning: what makes a good reinforcer?
(1) Thorndike (1911): a satisfying state of affairs: "one which the animal does nothing to avoid, often doing things which maintain or renew it". A circular definition!
(2) The Premack principle. Premack (1959): reinforcers = opportunities to engage in behavior, e.g. eating. If two activities (A and B) are freely available, and A is chosen over B, then A is said to be a reinforcer for B.
(3) What counts as a reinforcer can only be determined empirically (Skinner).

The consequences of behavior
Stimulus → Response → Outcome
Reinforcer: any event that strengthens the behavior it follows.
- Primary reinforcer: a (typically) pleasing stimulus. Motivation: satisfies a biological need (food, water, sex, drugs). Size and temporal proximity matter: immediate reinforcement is most effective. Can lead to superstition.
- Secondary (conditioned) reinforcer: learned through association with a primary reinforcer; overcomes the ineffectiveness of delayed reinforcement.
- Marking stimulus: can also overcome the effects of delayed reinforcement.

(i) The nature of instrumental learning: historical background
Thorndike (1898)
- Law of effect: whenever a response is followed by a reinforcer, it strengthens an S-R connection
- Formed the basis of theorizing by Guthrie (1935) and Hull (1943)
- Unintuitive? It does not allow animals to anticipate the outcome
Tolman (1932)
- Animals form R-US associations
Elliott (1928)
- Rats trained to run down an alley for food
- Changing the value of the reinforcer influenced the level of responding

What does the stimulus do?
Stimulus → Response → Outcome
- S-R association (law of effect): the stimulus acts as a catalyst
- S-O association (Clark Hull, 1930s): similar to the CS-US association in Pavlovian conditioning
- S(R-O), the three-term contingency (Skinner): the SD signals the existence of the R-O contingency; the SD is a discriminative stimulus that "sets the occasion" for R-O
There is experimental evidence for all four types of associations: R-O, S-R, S-O, and S(R-O).

Constraints on learning
- Strong behaviorist prediction (Thorndike's law of effect; J.B. Watson): if reinforcement is given, any response can be conditioned
- Experimental results: some behaviors are more easily conditioned than others
  - Very hard for cats to learn to yawn or scratch to get out of the puzzle box
  - Coin (or token) release is hard to train (Breland and Breland); the S-O association dominates in this case
[Skinner movie]

Schedules of reinforcement
When P(O|R) = 1, every occurrence of the response is reinforced. What if P(O|R) ≠ 1? Possible scenarios:
- The response must be repeated before reinforcement is obtained
- Reinforcement is given only after a certain amount of time has passed

Schedules of reinforcement: rules that define which occurrences of the instrumental response are reinforced; primary determinants of behavior.
- Ratio schedule: the number of responses determines reinforcement
- Interval schedule: the timing of the response (since the last reinforcer) determines reinforcement

[Graph: cumulative records; flat segments mark no responding, steep segments a high response rate, shallow segments a low response rate]
The slope of the cumulative record is directly proportional to the rate of responding.

Ratio schedules: the number of responses determines reinforcement; the time between responses does not matter.
- Fixed ratio (FR): the required number of responses is constant
- Variable ratio (VR): the required number of responses varies between reinforcer deliveries

Interval schedules: the passage of time determines the availability of reinforcement; the number of responses made during the interval does not matter.
- Fixed interval (FI): the required interval is constant
- Variable interval (VI): the required interval varies between reinforcer deliveries
- FI schedules produce a characteristic "scallop"; typically, once the interval expires the animal must make one response to collect the reinforcer
Interval schedules produce lower rates of responding than ratio schedules. (The four rules are sketched in code below.)
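The four rules can be phrased as small "is this response reinforced?" predicates. The class design is our own; the slides define the schedules only verbally.

```python
import random

# Minimal sketches of the four basic schedules. `t` is session time in seconds;
# respond(t) returns True if the response made at time t is reinforced.

class RatioSchedule:
    """FR/VR: every nth response is reinforced; timing is irrelevant."""
    def __init__(self, n, variable=False, rng=None):
        self.n, self.variable = n, variable
        self.rng = rng or random.Random(0)
        self.required, self.count = n, 0
    def respond(self, t):
        self.count += 1
        if self.count < self.required:
            return False
        self.count = 0
        if self.variable:                         # VR: draw a fresh requirement
            self.required = self.rng.randint(1, 2 * self.n)   # mean roughly n
        return True

class IntervalSchedule:
    """FI/VI: the first response after the interval elapses is reinforced."""
    def __init__(self, seconds, variable=False, rng=None):
        self.s, self.variable = seconds, variable
        self.rng = rng or random.Random(0)
        self.available_at = seconds
    def respond(self, t):
        if t < self.available_at:                 # early responses earn nothing
            return False
        interval = self.rng.uniform(1, 2 * self.s) if self.variable else self.s
        self.available_at = t + interval          # schedule the next reinforcer
        return True

fr5 = RatioSchedule(5)
print([fr5.respond(t) for t in range(10)])        # every 5th response reinforced
fi30 = IntervalSchedule(30)
print([fi30.respond(t) for t in range(0, 120, 10)])  # about one reinforcer per 30 s
```

Feeding a response stream through these rules and plotting total responses against time yields the cumulative record; its slope is the response rate.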
(ii) Conditions of learning: the nature of the reinforcer
What makes a good reinforcer? Something of biological significance? NO: Schwarz (1989) showed that rats will press a lever simply to turn on a light. Thorndike (1911): a satisfying state of affairs, "one which the animal does nothing to avoid, often doing things which maintain or renew it". A circular definition!

The Premack principle
Premack (1959): reinforcers = opportunities to engage in behavior, e.g. eating. If two activities (A and B) are freely available, and A is chosen over B, then A is said to be a reinforcer for B.
Allison & Timberlake (1974) showed that rats will drink a strong saccharine solution to gain a less preferred weak saccharine solution. Animals seek a state of equilibrium: responses are performed until they reach a bliss point, then stop.

(ii) Conditions of learning: conditioned reinforcement
Hyde (1976)

  Group          Stage 1                Stage 2
  Experimental   Tone → Food            Lever press → Tone
  Control        Tone & Food randomly   Lever press → Tone

- The tone became an appetitive Pavlovian CS and served as a substitute for food (Schwartz, 1958)
- Result: more lever pressing in the Experimental Group than in the Control Group

A conditioned reinforcer may also:
(1) provide feedback that the correct response has been made
(2) cue the next response
(3) counteract the effect of imposing a delay between the response and the primary reinforcer

(iii) Performance of instrumental behavior: deprivation
Hull (1943)
- Drive: a central state activated by needs; it energizes behavior
- Drive will invigorate lever-pressing if the animal is hungry
- BUT drive is non-specific: pain produced by shock should then invigorate responding for food, and it does not (Boe & Church, 1967)
Dual-system theories of motivation
- e.g. Rescorla & Solomon (1967): two motivational systems:
  1) positive: activated by deprivation states (e.g. hunger, thirst)
  2) negative: activated by aversive states (e.g. pain)
- Predicts that training an animal to respond for food will be affected by its deprivation level
- Some support for this, but a major challenge by Balleine (1992)

(iii) Performance of instrumental behavior: Pavlovian-instrumental interactions
Can Pavlovian CSs influence instrumental behavior? If so, how?
Motivational influences
- Konorski (1967): a CS can excite an affective representation of the US and arouse the positive motivational system, so it should be possible to alter the strength of instrumental responding with an appetitive CS
- Lovibond (1983), Pavlovian-instrumental transfer: rabbits were trained to push a lever with the nose for food, then given clicker → food training; the clicker enhanced lever pressing
- Dickinson & Pearce (1977), modified dual-system theories: inhibitory connections between the appetitive and aversive systems explain conditioned suppression

Response-cueing properties of Pavlovian CRs
Colwill & Rescorla (1988)

  Pavlovian conditioning   Instrumental conditioning                    Test
  CS → US1                 R1 → US1, R2 → US2                           CS: R1 vs. R2
                           (responses trained in different sessions)

- Result: more responding to R1 than to R2
- Interpretation: R1 was trained during the memory of US1, and R2 during the memory of US2; the memory of the US acts as a stimulus, setting up an S-R association (e.g. S_US1 → R1); at test the CS evokes the same memory and elicits R1 via the S-R link
(iv) Law of effect and problem solving
Thorndike (1911): all problems are solved in the same way: reward strengthens an accidentally occurring response, making it more likely to occur in the future. BUT are animals more sophisticated than this? Can they use insight? Do they have an understanding of folk physics?

Insight
Kohler (1925)
- Hung a banana from the ceiling of a cage, out of reach of six apes, with a wooden box in the cage
- After a period of inactivity, one chimpanzee suddenly moved the box towards the banana and climbed on it to reach the banana
- Was this problem solved by trial and error? Difficult to say: the apes had prior experience with boxes and sticks, so they may have learned through trial and error, or may not
- Epstein et al. (1984) highlight the importance of prior experience on seemingly insightful behavior
[Drawing based on Kohler, 1957]

Causal inference and folk physics: do animals have an appreciation of the physical and causal properties of the objects they are using?

Primates
Premack (1976)
- A chimpanzee had to replace the shape in the middle of the upper row with the knife to gain reward
- This seems to reveal that the animal understood that the knife cuts apples...
- BUT it could be explained by the animal's past experience of seeing knives and apples together
- FURTHERMORE, Povinelli (2000) cites 27 experiments, all showing that chimps have no understanding of the physical properties of the problems they were solving

Birds
Heinrich & Bugnyar (2005)
- Trained ravens to pull up on a string to acquire food
- Subsequent tests revealed they would also pull down to acquire food
- Implies "an apprehension of a cause-effect relation between string, food and certain body parts"
- BUT test performance could be due to the trained response generalizing to the test response
Weir, Chappell & Kacelnik (2002): crows were required to retrieve a bucket of food from a tube, using wires

Concluding comments
- It is possible to explain many of these findings by appeal to trial-and-error learning and generalization of responding
- Tebbich et al. (2001) describe how even complex tool use by woodpecker finches can be explained by trial and error and the maturation of species-typical behavior
- Many appeals to an understanding of folk physics in animals demand an appreciation of abstract thought, which is a contentious issue...
Figure captions

- The mean number of errors made by two groups of rats in a multiple-unit maze. For the first nine trials the reward for the control group was more attractive than for the experimental group, but for the remaining trials both groups received the same reward (adapted from Elliott, 1928).
- The mean rates at which a single group of rats performed two responses, R1 and R2, that had previously been associated with two different rewards. Before the test sessions, the reward for R1, but not R2, had been devalued. No rewards were presented in the test session (adapted from Rescorla, 1991).
- The mean rate of pressing a lever by a single rat when food was presented 30 seconds after a response (adapted from Lattal & Gleeson, 1990).
- The mean rates of lever pressing for water by three groups of thirsty rats in their final session of training. The groups differed in the probability with which free water was delivered during the intervals between responses. Group 0 received no water during these intervals; Group 0.08 and Group 0.12 received water with a probability of 0.08 and 0.12, respectively, at the end of each 1-second period in which a response did not occur (adapted from Hammond, 1980).
- The total number of lever presses recorded in each session for a rat in the experiment by Thomas (1981).
- The mean rates of lever pressing by three groups of rats that received a burst of noise after each rewarded response (Corr), after some nonrewarded responses (Uncorr), or no noise at all (Food alone) (adapted from Pearce & Hall, 1978).
- A sketch of the apparatus used by Premack (1971a) to determine whether the opportunity to run could serve as a reinforcer for drinking in rats that were not thirsty (adapted from Premack, 1971a).
- The mean rates of lever pressing for a brief tone by two groups of rats. For the experimental group the tone had previously been paired with food, whereas for the control group the tone and food had been presented randomly with respect to each other (adapted from Hyde, 1976).
- The mean number of responses made by six groups of rats in an extinction test session. The left-hand letter of each pair indicates the level of deprivation when subjects were trained to lever press for reward, either satiated (S) or hungry (H); the right-hand letter indicates the deprivation level during test trials. Two of the groups were allowed to consume the reward either satiated, Pre(S), or hungry, Pre(H), prior to instrumental conditioning (adapted from Balleine, 1992).
- The mean rates of performing two responses, R1 and R2, in the presence of an established Pavlovian conditioned stimulus (CS). Prior to testing, instrumental conditioning had been given in which the reinforcer for R1 was the same as the Pavlovian unconditioned stimulus (US), and the reinforcer for R2 was different to the Pavlovian US. Testing was conducted in the absence of any reinforcers in a single session (adapted from Colwill & Rescorla, 1988).
- Sultan stacking boxes in an attempt to reach a banana (drawing based on Kohler, 1956).
- Sketch of an array of objects used by Premack (1976) to test for causal inference in chimpanzees (adapted from Premack, 1976).
- Diagram of the apparatus used by Povinelli (2000) and by Visalberghi and Limongelli (1994) to test whether an animal will push a peanut in the direction that ensures it does not fall into a trap. From Visalberghi and Limongelli, 1994. Copyright © 1994 American Psychological Association. Reproduced with permission.
- Diagram of the apparatus used by Heinrich and Bugnyar (2005). A raven stood on the perch and was expected to retrieve food by pulling the string upwards (left-hand side) or downwards (right-hand side).
