5.3 Operant Conditioning

Laura Westmaas, BA, MSc

Operant conditioning theory is the simplest of the motivation theories. It states that people will do those things for which they are rewarded and will avoid doing things for which they are punished. This premise is based on Thorndike’s “law of effect,” which states that behavior that is positively reinforced tends to be repeated, whereas behavior that is not reinforced will tend not to be repeated. However, if this were the sum total of conditioning theory, we would not be discussing it here. Operant conditioning theory does offer greater insights than “reward what you want and punish what you don’t,” and knowledge of its principles can lead to effective management practices.

Operant conditioning focuses on the learning of voluntary behaviors (Skinner, 1953; Skinner, 1969; Skinner, 1971). The term operant conditioning indicates that learning results from our “operating on” the environment. After we “operate on the environment” (that is, behave in a certain fashion), consequences result. These consequences determine the likelihood of similar behavior in the future. Learning occurs because we do something to the environment. The environment then reacts to our action, and our subsequent behavior is influenced by this reaction.

The Basic Operant Model

According to operant conditioning theory, we learn to behave in a particular fashion because of consequences that resulted from our past behaviors (Skinner, 1953). The learning process involves three distinct steps: a stimulus, a response, and a consequence. The first step involves a stimulus (S). The stimulus is any situation or event we perceive that we then respond to. A homework assignment is a stimulus. The second step involves a response (R), that is, any behavior or action we take in reaction to the stimulus. Staying up late to get your homework assignment in on time is a response. (We use the words response and behavior interchangeably here.) Finally, a consequence (C) is any event that follows our response and that makes the response more or less likely to occur in the future. If Colleen Sullivan receives praise from her professor for working hard, and if getting that praise is a pleasurable event, then it is likely that Colleen will work hard again in the future. If, on the other hand, the professor ignores or criticizes Colleen’s response (working hard), this consequence is likely to make Colleen avoid working hard in the future. It is the experienced consequence (positive or negative) that influences whether a response will be repeated the next time the stimulus is presented.

General Operant Model: Stimulus → Response → Consequence
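The stimulus, response, and consequence sequence can be sketched in a few lines of Python. This is only an illustration, not a model proposed by Skinner or by this text: the `Episode` class, the `update_tendency` function, the 0-to-1 "tendency" scale, and the learning rate of 0.2 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    stimulus: str       # S: the situation or event perceived
    response: str       # R: the behaviour taken in reaction
    consequence: float  # C: > 0 if experienced as pleasant, < 0 if unpleasant


def update_tendency(tendency: float, episode: Episode, rate: float = 0.2) -> float:
    """Return the new likelihood (0 to 1) of repeating the response.

    Assumed rule, for illustration only: a pleasant consequence moves the
    tendency part of the way toward 1, an unpleasant one part of the way
    toward 0, and a neutral one leaves it unchanged.
    """
    if episode.consequence > 0:
        return tendency + rate * (1.0 - tendency)
    if episode.consequence < 0:
        return tendency - rate * tendency
    return tendency


# Colleen's homework example: the same response, two different consequences.
praised = Episode("homework assignment", "work hard", +1.0)
criticized = Episode("homework assignment", "work hard", -1.0)

start = 0.5
after_praise = update_tendency(start, praised)        # rises above 0.5
after_criticism = update_tendency(start, criticized)  # falls below 0.5
```

The point of the sketch is only that the consequence, not the stimulus, is what changes the likelihood of the response the next time the stimulus appears.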

Reinforcement Interventions

Reinforcement theory can be applied to modify employee behaviour.

*Figure 5.2 Strategies for Behavioural Change. Image: Rice University. Organizational Behavior, CC BY-NC-SA 4.0. Color altered from original.*

Positive reinforcement is a method of increasing the desired behaviour (Beatty & Schneier, 1975). Positive reinforcement involves making sure that behaviour is met with positive consequences. For example, a manager might praise an employee for treating a customer respectfully. If the praise immediately follows the positive behaviour, the employee will see a link between the behaviour and positive consequences and will be motivated to repeat similar behaviours.

Negative reinforcement is also used to increase the desired behaviour. Negative reinforcement involves removal of unpleasant outcomes once desired behaviour is demonstrated. Nagging an employee to complete a report is an example of negative reinforcement: the nagging stops once the report is completed. The negative stimulus in the environment will remain present until positive behaviour is demonstrated. The problem with negative reinforcement is that the negative stimulus may lead to unexpected behaviours and may fail to stimulate the desired behaviour. For example, the person may start avoiding the manager to avoid being nagged.

Extinction is used to decrease the frequency of negative behaviours. Extinction is the removal of rewards following negative behaviour. Sometimes, negative behaviours are demonstrated because they are being inadvertently rewarded. For example, it has been shown that when people are rewarded for their unethical behaviours, they tend to demonstrate higher levels of unethical behaviours (Harvey & Sims, 1978). Thus, when the rewards following unwanted behaviours are removed, the frequency of future negative behaviours may be reduced. For example, if a coworker is forwarding unsolicited e-mail messages containing jokes, commenting and laughing at these jokes may be encouraging the person to keep forwarding these messages. Completely ignoring such messages may reduce their frequency.

Punishment is another method of reducing the frequency of undesirable behaviours. Punishment involves presenting negative consequences following unwanted behaviours. Giving an employee a warning for consistently being late to work is an example of punishment.

The most frequently used punishments (along with the most frequently used rewards) are shown in Table 5.1.

Table 5.1 Frequently Used Rewards and Punishments

Rewards	Punishments
Pay raise	Oral reprimands
Bonus	Written reprimands
Promotion	Ostracism
Praise and recognition	Criticism from superiors
Awards	Suspension
Self-recognition	Demotion
Sense of accomplishment	Reduced authority
Increased responsibility	Undesired transfer
Time off	Termination

The use of punishment is indeed one of the most controversial issues of behavior change strategies. Although punishment can have positive work outcomes—especially if it is administered in an impersonal way and as soon as possible after the transgression—negative repercussions can also result when employees either resent the action or feel they are being treated unfairly. These negative outcomes from punishment are shown in Figure 5.3. Thus, although punishment represents a potent force in corrective learning, its use must be carefully considered and implemented. In general, for punishment to be effective the punishment should “fit the crime” in severity, should be given in private, and should be explained to the employee.

*Figure 5.3 Potential Negative Consequences of Punishment. Image: Rice University. Organizational Behavior, CC BY-NC-SA 4.0. Color altered from original.*

Let’s Focus

Be Effective in Your Use of Discipline

As a manager, sometimes you may have to discipline an employee to eliminate unwanted behaviour. Here are some tips to make this process more effective.

Consider whether punishment is the most effective way to modify behaviour. Sometimes catching people in the act of doing good things and praising or rewarding them is preferable to punishing negative behaviour. Instead of criticizing them for being late, consider praising them when they are on time. Carrots may be more effective than sticks. You can also make the behaviour extinct by removing any rewards that follow undesirable behaviour.
Be sure that the punishment fits the crime. If a punishment is too harsh, both the employee in question and coworkers who will learn about the punishment will feel it is unfair. Unfair punishment may not change unwanted behaviour.
Be consistent in your treatment of employees. Have disciplinary procedures and apply them in the same way to everyone. It is unfair to enforce a rule for one particular employee but then give others a free pass.
Document the behaviour in question. If an employee is going to be disciplined, the evidence must go beyond hearsay.
Be timely with discipline. When a long period of time passes between behaviour and punishment, it is less effective in reducing undesired behaviour because the connection between the behaviour and punishment is weaker.

Adapted from ideas in Ambrose, M. L., & Kulik, C. T. (1999). Old friends, new faces: Motivation research in the 1990s. Journal of Management, 25(3), 231–292; Guffey, C. J., & Helms, M. M. (2001). Effective employee discipline: A case of the Internal Revenue Service. Public Personnel Management, 30(1), 111-127.

In summary, positive reinforcement and avoidance learning (that is, negative reinforcement) focus on bringing about the desired response from the employee. With positive reinforcement the employee behaves in a certain way in order to gain desired rewards, whereas with avoidance learning the employee behaves in order to avoid certain unpleasant outcomes. In both cases, however, the behavior desired by the supervisor is enhanced. In contrast, extinction and punishment focus on supervisory attempts to reduce the incidence of undesired behavior. That is, extinction and punishment are typically used to get someone to stop doing something the supervisor doesn’t like. It does not necessarily follow that the individual will begin acting in the most desired, or correct, manner.
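The four strategies can be read as a two-by-two grid: whether the goal is to increase or decrease a behaviour, and whether a consequence is presented or removed. The lookup below is a minimal sketch of that grid. The names `Strategy` and `classify` and the string labels `"increase"`, `"decrease"`, `"present"`, and `"remove"` are assumptions chosen for the example, not terminology from the sources.

```python
from enum import Enum


class Strategy(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement (avoidance learning)"
    EXTINCTION = "extinction"
    PUNISHMENT = "punishment"


# Keys are (goal for the behaviour, what happens to the consequence).
STRATEGY_GRID = {
    ("increase", "present"): Strategy.POSITIVE_REINFORCEMENT,  # reward follows the behaviour
    ("increase", "remove"): Strategy.NEGATIVE_REINFORCEMENT,   # unpleasant outcome is removed
    ("decrease", "remove"): Strategy.EXTINCTION,               # reward is withdrawn
    ("decrease", "present"): Strategy.PUNISHMENT,              # negative consequence is applied
}


def classify(goal: str, consequence: str) -> Strategy:
    """Look up the strategy for a goal and a consequence action."""
    return STRATEGY_GRID[(goal, consequence)]


# Examples taken from the text above.
praise_for_good_service = classify("increase", "present")
stop_nagging_when_report_done = classify("increase", "remove")
ignore_forwarded_jokes = classify("decrease", "remove")
warning_for_lateness = classify("decrease", "present")
```

Laid out this way, it is easier to see that the first two cells strengthen a desired behaviour while the last two only weaken an undesired one.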

Often students have difficulty seeing the distinction between avoidance and extinction or in understanding how either could have a significant impact on behavior. Two factors are important to keep in mind. The first we will simply call the “history effect.” Not being harassed could reinforce an employee’s prompt arrival at work if in the past the employee had been harassed for being late. Arriving on time and thereby avoiding the past harassment would reinforce arriving on time. This same dynamic would hold true for extinction. If the employee had been praised in the past for arriving on time, then arrived late and was not praised, this would serve to weaken the tendency to arrive late. The second factor we will call the “social effect.” For example, if you see others harassed when they arrive late and then you are not harassed when you arrive on time, this could reinforce your arriving at work on time. Again, this same dynamic would hold true for extinction. If you had observed others being praised for arriving on time, then not receiving praise when you arrived late would serve to weaken the tendency to arrive late.

From a managerial perspective, questions arise about which strategy of behavioral change is most effective. Advocates of behavioral change strategies, such as Skinner, answer that positive reinforcement combined with extinction is the most suitable way to bring about desired behavior. There are several reasons for this focus on the positive approach to reinforcement. First, although punishment can inhibit or eliminate undesired behavior, it often does not provide information to the individual about how or in which direction to change. Also, the application of punishment may cause the individual to become alienated from the work situation, thereby reducing the chances that useful change can be effected. Similarly, avoidance learning tends to emphasize the negative; that is, people are taught to stay clear of certain behaviors, such as tardiness, for fear of repercussions. In contrast, it is felt that combining positive reinforcement with the use of extinction has the fewest undesirable side effects and allows individuals to receive the rewards they desire. A positive approach to reinforcement is believed by some to be the most effective tool management has to bring about favorable changes in organizations.

Schedules of Reinforcement

Having examined four distinct strategies for behavioral change, we now turn to an examination of the various ways, or schedules, of administering these techniques. As noted by Costello and Zalkind (1963), “The speed with which learning takes place and also how lasting its effects will be is determined by the timing of reinforcement” (p. 193). Thus, a knowledge of the types of schedules of reinforcement is essential to managers if they are to know how to choose rewards that will have maximum impact on employee performance. Although there are a variety of ways in which rewards can be administered, most approaches can be categorized into two groups: continuous and partial (or intermittent) reinforcement schedules. A continuous reinforcement schedule rewards desired behavior every time it occurs. For example, a manager could praise (or pay) employees every time they perform properly. With the time and resource constraints most managers work under, this is often difficult, if not impossible. So, most managerial reward strategies operate on a partial schedule. A partial reinforcement schedule rewards desired behavior at specific intervals, not every time desired behavior is exhibited. Compared to continuous schedules, partial reinforcement schedules lead to slower learning but stronger retention. Thus, learning is generally more permanent. Four kinds of partial reinforcement schedules can be identified: (1) fixed interval, (2) fixed ratio, (3) variable interval, and (4) variable ratio (see Table 5.2).

Table 5.2 Schedules of Partial Reinforcement

Schedule of Reinforcement	Nature of Reinforcement	Effects on Behavior When Applied	Effects on Behavior When Terminated	Example
Fixed interval	Reward on fixed time basis	Leads to average and irregular performance	Quick extinction of behavior	Weekly paycheck
Fixed ratio	Reward consistently tied to output	Leads quickly to very high and stable performance	Quick extinction of behavior	Piece-rate pay system
Variable interval	Reward given at variable intervals around some average time	Leads to moderately high and stable performance	Slow extinction of behavior	Monthly performance appraisal and reward at random times each month
Variable ratio	Reward given at variable output levels around some average output	Leads to very high performance	Slow extinction of behavior	Sales bonus tied to selling X accounts, but X constantly changes around some mean

Fixed-Interval Schedule. A fixed-interval reinforcement schedule rewards individuals at specified intervals for their performance, as with a biweekly paycheck. If employees perform even minimally, they are paid. This technique generally does not result in high or sustained levels of performance because employees know that marginal performance usually leads to the same level of reward as high performance. Thus, there is little incentive for high effort and performance. Also, when rewards are withheld or suspended, extinction of desired behavior occurs quickly. Many of the recent job redesign efforts in organizations were prompted by recognition of the need for alternate strategies of motivation rather than paying people on fixed-interval schedules.

Fixed-Ratio Schedule. The second fixed schedule is the fixed-ratio schedule. Here the reward is administered only upon the completion of a given number of desired responses. In other words, rewards are tied to performance in a ratio of rewards to results. A common example of the fixed-ratio schedule is a piece-rate pay system, whereby employees are paid for each unit of output they produce. Under this system, performance rapidly reaches high levels. In fact, according to Hamner (1977), “the response level here is significantly higher than that obtained under any of the interval (time-based) schedules” (p. 105). On the negative side, however, performance declines sharply when the rewards are withheld, as with fixed-interval schedules.

Variable-Interval Schedule. Under variable reinforcement schedules, both variable-interval and variable-ratio, rewards are administered at times that cannot be predicted by the employee. The employee is generally not aware of when the next evaluation and reward period will be. Under a variable-interval schedule, rewards are administered at intervals of time that are based on an average. For example, an employee may know that on the average her performance is evaluated and rewarded about once a month, but she does not know when this event will occur. She does know, however, that it will occur sometime during the interval of a month. Under this schedule, effort and performance will generally be high and fairly stable over time because employees never know when the evaluation will take place.

Variable-Ratio Schedule. Finally, a variable-ratio schedule is one in which rewards are administered only after an employee has performed the desired behavior a number of times, with the number changing from the administration of one reward to the next but averaging over time to a certain ratio of number of performances to rewards. For example, a manager may determine that a salesperson will receive a bonus for every 15th new account sold. However, instead of administering the bonus every 15th sale (as in a fixed-ratio schedule), the manager may vary the number of sales that is necessary for the bonus, from perhaps 10 sales for the first bonus to 20 for the second. On the average, however, the 15:1 ratio prevails. If the employee understands the parameters, then the “safe” level of sales, or the level of sales most likely to result in a bonus, is in excess of 15. Consequently, the variable-ratio schedule typically leads to high and stable performance. Moreover, extinction of desired behavior is slow.
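The difference between the two ratio schedules can be made concrete with a short sketch based on the sales-bonus example above. The function names (`fixed_ratio`, `variable_ratio`, `reward_points`), the uniform random draw, and the 60-sale horizon are assumptions for illustration; the text specifies only the 15:1 average and the 10 and 20 sale requirements.

```python
import itertools
import random
from typing import Iterator, List


def fixed_ratio(n: int) -> Iterator[int]:
    """Every bonus requires exactly n further sales."""
    return itertools.repeat(n)


def variable_ratio(mean: int, spread: int, rng: random.Random) -> Iterator[int]:
    """Each bonus requires a random number of further sales, drawn
    uniformly from mean - spread to mean + spread (an assumed distribution)."""
    while True:
        yield rng.randint(mean - spread, mean + spread)


def reward_points(requirements: Iterator[int], total_sales: int) -> List[int]:
    """Return the cumulative sale counts at which a bonus is paid."""
    points = []
    next_reward = next(requirements)
    while next_reward <= total_sales:
        points.append(next_reward)
        next_reward += next(requirements)
    return points


# Fixed ratio: a bonus on every 15th sale.
fixed = reward_points(fixed_ratio(15), 60)

# The text's variable-ratio example: 10 sales, then 20, averaging 15.
alternating = reward_points(itertools.cycle([10, 20]), 60)

# A randomized variable-ratio schedule with the same 15:1 average.
randomized = reward_points(variable_ratio(15, 5, random.Random(0)), 600)
```

Here `fixed` is `[15, 30, 45, 60]` and `alternating` is `[10, 30, 40, 60]`: both pay four bonuses over 60 sales, but only the first lets the salesperson predict exactly when the next bonus will arrive. An interval schedule could be sketched the same way by counting elapsed time rather than sales.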

Which of these four schedules of reinforcement is superior? In a review of several studies comparing the various techniques, Hamner concludes:

The necessity for arranging appropriate reinforcement contingencies is dramatically illustrated by several studies in which rewards were shifted from a response-contingent (ratio) to a time-contingent (interval) basis. During the period in which rewards were made conditional upon occurrence of the desired behavior, the appropriate response patterns were exhibited at a consistently high level. When the same rewards were given based on time and independent of the worker’s behavior, there was a marked drop in the desired behavior. The reinstatements of the performance-contingent reward schedule promptly restored the high level of responsiveness.

In other words, the performance-contingent (or ratio) reward schedules generally lead to better performance than the time-contingent (or interval) schedules, regardless of whether such schedules are fixed or variable.

Two additional approaches to learning are found in the work of David Kolb (2015) and Mel Silberman and colleagues (2016). Kolb’s experiential learning style theory is typically represented by a four-stage learning cycle in which the learner “touches all the bases.” The cycle is complete when a person progresses through four stages: (1) having a concrete experience, followed by (2) observation of and reflection on that experience, which leads to (3) the formation of abstract concepts (analysis) and generalizations (conclusions), which are then (4) used to test hypotheses in future situations, resulting in new experiences.

Silberman, in his book Active Training, identified eight qualities of an effective and active learning experience. The eight qualities are as follows:

a moderate level of content,
a balance between affective, behavioral, and cognitive learning,
a variety of learning approaches,
opportunities for group participation,
encouraging participants to share their expertise,
recycling concepts and skills learned earlier,
advocating real-life problem solving, and
allowing time for re-entry.

Here’s an Example

Shaping a Salesperson’s Behavior

Sharon Johnson worked for a publishing company based in Nashville, Tennessee, that sold a line of children’s books directly to the public through a door-to-door sales force. Sharon had been a very successful salesperson and was promoted first to district and then to regional sales manager after just four years with the company. Sales bonuses were fixed, and a fixed-dollar bonus was tied to every $1,000 in sales over a specific minimum quota. However, there was a wide variety of rewards, from praise to gift certificates, that were left to Sharon’s discretion.

Sharon knew from her organizational behavior class that giving out praise to those who liked it and gifts to those who preferred them was an important means of reinforcing desired behavior, and she had been quite successful in implementing this principle. She also knew that if you reinforced a behavior that was “on the right track” to the ideal behavior you wanted out of a salesperson, eventually you could shape their behavior, almost without their realizing it.

Sharon had one particular salesperson, Lyle, who she thought had great potential, yet his weekly sales were somewhat inconsistent and often lower than she thought possible. When Lyle was questioned about his performance, he indicated that sometimes he felt that the families he approached could not afford the books he was selling and so he did not think it was right to push the sale too hard. Although Sharon argued that it was not Lyle’s place to decide for others what they could or could not afford, Lyle still felt uncomfortable about utilizing his normal sales approach with these families.

Sharon believed that through subtle reinforcement of certain behaviors she could shape Lyle’s behavior and that over time he would increasingly use his typical sales approach with the families he thought could not afford the books. For example, she knew that in the cases of families Lyle thought could not afford the books, he spent only 3.5 minutes in the house compared to 12.7 minutes in homes of families he judged able to afford the books. Sharon believed that if she praised Lyle whenever the average time he spent in the two kinds of homes was similar, he would increase the time he spent in the homes of families he judged unable to afford the books. She believed that the longer he spent in these homes, the more likely Lyle was to utilize his typical sales approach. This was just one of several ways Sharon thought she could shape Lyle’s behavior without trying to change his mind about pushing books onto people he thought could not afford them.

Sharon saw no ethical issues in this case until she told a friend about it and the friend questioned whether it was ethical to utilize learning and reinforcement techniques to change people’s behavior “against their will” even if they did not realize that this was happening.

Source: This ethical challenge is based on a true but disguised case observed by author J. Stewart Black.

Let’s Review

Reinforcement theory argues that behaviour is a function of its consequences.
By properly tying rewards to positive behaviours, eliminating rewards following negative behaviours, and punishing negative behaviours, leaders can increase the frequency of desired behaviours.
These principles are particularly useful in designing reward systems within a company.

References

This section is adapted from:

4.2 Reinforcement and Behavioral Change and 4.3 Behaviour Modification in Organization in Organizational Behaviour, by Rice University, OpenStax and is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.

Process-Based Theories in NSCC Organizational Behaviour by NSCC is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Beatty, R. W., & Schneier, C. E. (1975). A case for positive reinforcement. Business Horizons, 18, 57–66.

Costello, T. W., & Zalkind, S. S. (1963). Psychology in administration: A research orientation. Prentice-Hall.

Hamner, W. C. (1977). Reinforcement theory. In H. L. Tosi & W. C. Hamner (Eds.), Organizational behavior and management: A contingency approach (pp. 98–105). St Clair.

Harvey, H. W., & Sims, H. P. (1978). Some determinants of unethical decision behaviour: An experiment. Journal of Applied Psychology, 63, 451–457.

Kolb, D. (2015). Experiential learning (2nd ed.). Pearson FT Press.

Silberman, M., Biech, E., & Auerbach, C. (2016). Active training. Wiley.

Skinner, B. F. (1953). Science and human behaviour. Free Press.

Skinner, B. F. (1969). Contingencies of reinforcement. Appleton Century-Crofts.

Skinner, B. F. (1971). Beyond freedom and dignity. Bantam Books.

License

Psychology, Communication, and the Canadian Workplace Copyright © 2022 by Laura Westmaas, BA, MSc is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.