The Psychology of Learning

Operant Conditioning

Thorndike and Skinner

In operant conditioning, voluntary behavior is shaped by consequences.

E.L. Thorndike was an American psychologist who studied animal learning. In a classic set of experiments, he placed cats in a specially designed "puzzle box" that could be opened by operating a latch. The cats learned to open the box through a process of trial and error. Over time, behaviors that were followed by favorable consequences became more likely, while those that were followed by unfavorable consequences became less likely, a principle Thorndike called the law of effect.

American psychologist B.F. Skinner continued research on Thorndike's law of effect. His research focused on operant conditioning, a learning process in which voluntary behavior is shaped by its consequences. He designed the Skinner box (also called an operant conditioning chamber) for his research. This box is an enclosed apparatus that contains a bar an animal can press to obtain food or water. The apparatus automatically records the number of presses made in response to different schedules of food or water delivery. The box can also present lights of different colors or tones of different loudness and pitch as cues to shape behavior.

An Operant Conditioning Chamber (Skinner Box)

An operant conditioning chamber (Skinner box) allows animals to press a bar to obtain food or water. The box records the rate and number of presses. Researchers can use it to study how different schedules of reward or punishment influence learning.

Principles of Reinforcement and Punishment

Desired outcomes increase the frequency of behavior (reinforcement), while undesired outcomes decrease the frequency of behavior (punishment).

Thorndike and Skinner observed that the frequency with which specific behaviors occur depends on their outcomes. Outcomes that increase the frequency of behaviors are called reinforcers. In positive reinforcement, a desired stimulus follows behavior and increases the frequency of the behavior. For example, if pressing a bar leads to a food pellet, bar pressing will increase. Likewise, if a child receives a fun toy for being quiet during a ceremony, the likelihood of her being quiet during future ceremonies will increase. In negative reinforcement, removing an undesired stimulus following a behavior leads to an increase in the behavior. For example, if pressing a bar shuts off an ear-splitting alarm, bar pressing will increase. Likewise, if getting home by curfew consistently leads a teenager's parents to lift restrictions on borrowing the car, the likelihood of the teenager obeying curfew in the future will increase. Both positive and negative reinforcement increase the frequency of a behavior. Reinforcers are often used in behavior therapy to help people build new skills.

In contrast, punishment is an undesired outcome that decreases the frequency of the response. For example, an animal that gets an electric shock for pressing a bar will stop pressing that bar. There are two types of punishment. In positive punishment, an aversive stimulus follows behavior and leads to a decrease in that behavior. For example, a teenager who breaks curfew may have to get up early the next day to do chores. Negative punishment decreases behavior by removing a desired stimulus following the behavior. The curfew-breaking teenager may lose access to something they value, such as driving privileges or use of their phone. Both positive and negative punishment make curfew-breaking less likely to occur in the future.
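The four procedures described above differ on only two questions: whether a stimulus is added or removed after the behavior, and whether the behavior then becomes more or less frequent. The following minimal Python sketch makes that two-by-two structure explicit. The function name and its string arguments are illustrative assumptions, not standard terminology from any library.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant procedure from two observations.

    stimulus_change: "added" or "removed" (what follows the behavior)
    behavior_change: "increases" or "decreases" (what the behavior does over time)
    """
    if stimulus_change not in ("added", "removed"):
        raise ValueError("stimulus_change must be 'added' or 'removed'")
    if behavior_change not in ("increases", "decreases"):
        raise ValueError("behavior_change must be 'increases' or 'decreases'")

    # "Positive" means a stimulus is added; "negative" means one is removed.
    sign = "positive" if stimulus_change == "added" else "negative"
    # Reinforcement raises the frequency of behavior; punishment lowers it.
    kind = "reinforcement" if behavior_change == "increases" else "punishment"
    return f"{sign} {kind}"


# Examples taken from the text above.
print(classify_consequence("added", "increases"))    # food pellet for a bar press
print(classify_consequence("removed", "increases"))  # bar press shuts off an alarm
print(classify_consequence("added", "decreases"))    # early-morning chores after breaking curfew
print(classify_consequence("removed", "decreases"))  # losing phone privileges
```

Note that "positive" and "negative" describe whether a stimulus is added or removed, not whether the outcome is pleasant.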

A discriminative stimulus is a cue to the learner that means "respond now and a reinforcer will be delivered." For example, pressing a bar produces a payoff when a high-pitched tone is played but not when a low-pitched tone is played. Bar pressing when the high-pitched tone is played will increase in frequency. Bar pressing when the low-pitched tone is played will decrease.

There are many different types of reinforcers. Primary reinforcers, such as food, water, and sex, satisfy biological needs. Social reinforcers, such as hugs, smiles, and friendly greetings, satisfy needs for social connectedness. Secondary reinforcers are neutral stimuli that have become associated with primary and social reinforcers. For example, a reward system may provide people with cash or points that can be exchanged for other reinforcers. A child may earn points for chores and later exchange them for a pizza party with friends.

Schedules of Reinforcement

A schedule of reinforcement is a rule that describes how frequently a behavior is reinforced. Different schedules yield different behavioral outcomes.

In the real world, not every behavior leads to a clear positive or negative consequence. The link between behaviors and outcomes determines how quickly learning occurs and how long it lasts. In continuous reinforcement, every correct response is followed by a reinforcer. In a Skinner box, this means that every time the rat presses a bar, it receives a food pellet. In partial reinforcement, only some correct responses are reinforced.

There are four major schedules of reinforcement. In fixed-ratio reinforcement, a reinforcer is delivered after a fixed number of correct responses occurs. An example of this is getting paid after completing a fixed number of tasks, such as stuffing 100 envelopes. This schedule produces rapid and steady learning, but the behavior usually stops quickly after the reinforcers stop being delivered.

In variable-ratio reinforcement, a reinforcer is delivered after a varying number of required responses occurs. Slot machines operate on a variable-ratio reinforcement schedule. The number of times the lever must be pulled in order to get a payout is randomly determined by the machine. This type of schedule produces slower learning because lever pulls are not strongly linked to rewards. However, once learned, the behavior is highly resistant to extinction. This is why slot machines are so addictive. The players never know when the next payoff is going to arrive, so they keep pulling the lever, assuming it will eventually happen.
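The two ratio schedules can be written as simple counting rules. The sketch below is a minimal illustration in Python, not a model of any real apparatus. The class names are hypothetical, and the variable-ratio class assumes that the required number of responses is drawn uniformly between 1 and twice the mean minus 1, which is only one of many ways such a schedule could vary.

```python
import random


class FixedRatio:
    """Deliver a reinforcer after every `ratio`-th response."""

    def __init__(self, ratio: int):
        self.ratio = ratio
        self.count = 0  # responses since the last reinforcer

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True  # reinforcer delivered
        return False


class VariableRatio:
    """Deliver a reinforcer after a varying number of responses.

    Assumption: the requirement is uniform on 1..(2 * mean_ratio - 1),
    so it averages `mean_ratio` responses per reinforcer.
    """

    def __init__(self, mean_ratio: int, rng: random.Random):
        self.mean_ratio = mean_ratio
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()  # the next requirement is unpredictable
            return True
        return False


fr = FixedRatio(5)
print([fr.respond() for _ in range(10)])
# [False, False, False, False, True, False, False, False, False, True]

vr = VariableRatio(5, random.Random(0))
print(sum(vr.respond() for _ in range(1000)), "reinforcers in 1000 responses")
```

Under the fixed-ratio rule the learner can predict exactly which response pays off. Under the variable-ratio rule any response might be the one that pays off, which mirrors the slot machine example.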

In fixed-interval reinforcement, a reinforcer is delivered following the first required response after a fixed period of time has elapsed, such as getting a paycheck after two weeks of working. For animals in a Skinner box, this schedule produces a characteristic "scalloped" performance curve. Responding increases rapidly just before the anticipated reinforcer, drops off sharply once it is delivered, and remains low until the next reinforcer is nearly due. In variable-interval reinforcement, a reinforcer is delivered following the first required response after a varying period of time has elapsed. Surfing is an example of variable-interval reinforcement. The length of time a surfer must wait before another suitable wave arrives can vary anywhere from a few seconds to several minutes. Like variable-ratio reinforcement, variable-interval reinforcement produces slower learning that is nonetheless highly resistant to extinction. Because learners don't know how long they will have to wait for a payoff, they keep responding through long stretches without one.
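Interval schedules depend on elapsed time rather than on a count of responses. The following Python sketch illustrates the rule under stated assumptions: the class names are hypothetical, time is passed in as a plain number in arbitrary units, and the variable-interval wait is assumed to be uniform between 0 and twice the mean.

```python
import random


class FixedInterval:
    """Reinforce the first response made after `interval` time units have elapsed."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval  # time at which a reinforcer becomes available

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self.interval  # restart the clock
            return True
        return False  # responding early earns nothing


class VariableInterval:
    """Reinforce the first response after a varying wait.

    Assumption: each wait is uniform on 0..(2 * mean_interval).
    """

    def __init__(self, mean_interval: float, rng: random.Random):
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self._draw()
            return True
        return False


fi = FixedInterval(10)
print([fi.respond(t) for t in (3, 9, 10, 12, 19, 20)])
# [False, False, True, False, False, True]
```

Extra responses before the interval has elapsed do not bring the reinforcer any sooner, which is consistent with the low response rate seen early in each fixed interval.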

The frequency of rewards and the relationship of rewards to behavior influence both the speed and the retention of learning.

Shaping involves reinforcing small steps that lead to a complex desired behavior. For example, a trainer may teach a nervous horse to load onto a trailer through a series of steps. The trainer may reward the horse first for simply approaching the trailer. Next, the trainer may reward both approaching and standing next to the open trailer door; then approaching the trailer, standing next to the door, and entering halfway; and so on. Operant conditioning techniques have been used successfully to train service animals that support people with disabilities. Animals that appear in movies are also trained to perform their scripted actions through operant conditioning.

Applications of Operant Conditioning

Operant conditioning is a powerful means of modifying and deterring behavior across a wide range of applications. However, overusing rewards can reduce motivation.

Operant conditioning is a powerful technique for shaping the behavior of growing children. The frequency of prosocial behaviors can be increased through reinforcement, while antisocial behaviors can be decreased through punishment. For parenting to be effective, consequences must be consistent and immediate. Consistency is necessary to ensure that the child understands which behaviors are acceptable and which are not. Consequences must be immediate so that the connection between behavior and its outcome is clear. Ultimately, rewards are more useful in shaping behavior than punishments. This is because rewards help children learn the desired behavior, whereas punishment only teaches children what not to do. Punishments such as spanking may accidentally teach aggressive behavior. Numerous studies have found that children are more likely to comply when parents use time-outs rather than spanking. Spanking has also been associated with greater aggression in children.

Disruptive or maladaptive behavior can be successfully addressed through behavior modification therapy, which employs operant conditioning techniques. Token economies are a type of behavior modification intervention in which good behavior is reinforced with tokens (poker chips, stickers, coins, or points) that can be exchanged for desired objects or activities.
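The bookkeeping behind a token economy can be sketched in a few lines of Python. This is a minimal illustration only. The class name, the behaviors, the token values, and the prices are all invented for the example and are not drawn from any specific intervention.

```python
class TokenEconomy:
    """Track tokens earned for target behaviors and spent on rewards."""

    def __init__(self, earn_rules: dict, prices: dict):
        self.earn_rules = earn_rules  # behavior -> tokens earned (hypothetical values)
        self.prices = prices          # reward -> token cost (hypothetical values)
        self.balance = 0

    def record_behavior(self, behavior: str) -> int:
        """Award tokens for a target behavior; other behaviors earn nothing."""
        tokens = self.earn_rules.get(behavior, 0)
        self.balance += tokens
        return tokens

    def exchange(self, reward: str) -> bool:
        """Trade tokens for a reward if the balance covers its price."""
        price = self.prices[reward]
        if self.balance < price:
            return False
        self.balance -= price
        return True


economy = TokenEconomy(
    earn_rules={"finish homework": 2, "help a classmate": 1},
    prices={"extra recess": 5},
)
economy.record_behavior("finish homework")
economy.record_behavior("finish homework")
print(economy.exchange("extra recess"))  # False: only 4 tokens so far
economy.record_behavior("help a classmate")
print(economy.exchange("extra recess"))  # True: 5 tokens spent
```

The tokens here act as secondary reinforcers: they have value only because they can be exchanged for something the person already wants.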

Intrinsic motivation is a drive to engage in a behavior because one finds the behavior itself rewarding. Extrinsic motivation is a drive to engage in a behavior in order to obtain an external reward or to avoid punishment. Excessive use of rewards can decrease a person's interest in a behavior that they previously enjoyed. For example, most children enjoy reading storybooks and will do so on their own to entertain themselves. If they are instead rewarded for reading, reading can become less intrinsically rewarding for them, and the time they spend reading on their own may decline. The decrease in intrinsic motivation that results from excessive external rewards is called the overjustification effect.

Observational Learning and Modeling Behavior

Simply observing behavior and its consequences, without actually participating in the behavior, can still lead to learning.

Observational learning is learning that occurs by watching the behavior of others. Modeling is the process of demonstrating a behavior so that another can learn it. Most human behavior is learned through these processes. By observing others, one forms an idea of how new behaviors are performed. On later occasions, this remembered information serves as a guide for action. Vicarious reinforcement occurs when an individual observes another person receive reinforcement for a behavior. Watching someone else get a reward makes it more likely that people will imitate that behavior.

American psychologist Albert Bandura proposed a four-step model of observational learning. In the first step, a behavior captures the individual's attention. In the second, what was noticed is remembered. In the third, the individual performs the behavior that was observed. In the fourth and final step, the consequences that follow the behavior determine whether the individual will perform the behavior again in the future. Bandura's first test of this model involved children (aged three to six years) watching an adult behave aggressively. The adult directed physical and verbal aggression at a plastic blow-up clown doll named Bobo. The adult modeled novel behaviors such as exclaiming "Sockaroo!" as she struck the doll with a hammer. In one condition, she was rewarded for her behavior. In a second condition, she was put in time-out for behaving aggressively. In a third, no consequences occurred. Each child was then put in a room alone with a variety of toys, including the Bobo doll. When the children saw the adult rewarded or facing no consequences for aggressive behavior, they were likely to imitate it. When the children saw the adult punished for aggressive behavior, they were much less likely to imitate it. Bandura showed that children also imitated the behavior of cartoon characters and that they would imitate both prosocial and aggressive behavior.

Modeling and the Role of Media

There is a correlation between violent media consumption and aggression. Violent media may increase risk for aggression, but the relationship is complex and involves many other factors.

Many studies have explored the link between aggressive behavior and watching violent shows or playing violent video games. It has been hypothesized that observing violence without consequences may make people more likely to act aggressively. It may also desensitize people to the impact of violence on victims. In 2003, researchers published the results of a 15-year study on the impact of watching violent TV shows. Children who watched more violent TV were more likely to push, grab, and shove their spouses as adults and more likely to respond violently to insults. They were also three times more likely to have been convicted of a violent crime.

The American Psychological Association and the American Academy of Pediatrics have summarized large bodies of research. Both organizations raised concerns about violent video games. They concluded that playing violent games increases anger and aggressive acts and decreases prosocial behavior and empathy. These organizations advised against games that award points for killing living targets because this teaches children to associate pleasure and success with their ability to harm others.

However, a correlation between violent media and aggression does not necessarily imply causation, meaning that one variable causes the other. The relationship could mean that violent media increases aggression, but it could also run in the other direction: people predisposed to aggression may enjoy violent media. Some studies have found that the link between violent games and violent behavior disappears after accounting for aggressive tendencies. It is also possible that other variables explain both aggressive behavior and violent media consumption. Many factors influence who spends a significant amount of time watching violent shows or playing violent games. Those factors may also predict aggressive behavior. For example, neglected children may be both free to use age-inappropriate media and more likely to develop problem behaviors.