53 Describing Statistical Relationships
Learning Objectives
- Describe differences between groups in terms of their means and standard deviations, and in terms of Cohen’s d.
- Describe correlations between quantitative variables in terms of Pearson’s r.
As we have seen throughout this book, most interesting research questions in psychology are about statistical relationships between variables. In this section, we revisit the two basic forms of statistical relationship introduced earlier in the book—differences between groups or conditions and relationships between quantitative variables—and we consider how to describe them in more detail.
Differences Between Groups or Conditions
Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition. For example, Thomas Ollendick and his colleagues conducted a study in which they evaluated two one-session treatments for simple phobias in children (Ollendick et al., 2009)[1]. They randomly assigned children with an intense fear (e.g., to dogs) to one of three conditions. In the exposure condition, the children actually confronted the object of their fear under the guidance of a trained therapist. In the education condition, they learned about phobias and some strategies for coping with them. In the wait-list control condition, they were waiting to receive a treatment after the study was over. The severity of each child’s phobia was then rated on a 1-to-8 scale by a clinician who did not know which treatment the child had received. (This was one of several dependent variables.) The mean fear rating in the education condition was 4.83 with a standard deviation of 1.52, while the mean fear rating in the exposure condition was 3.47 with a standard deviation of 1.77. The mean fear rating in the control condition was 5.56 with a standard deviation of 1.21. In other words, both treatments worked, but the exposure treatment worked better than the education treatment. As we have seen, differences between group or condition means can be presented in a bar graph like that in Figure 12.5, where the heights of the bars represent the group or condition means. We will look more closely at creating American Psychological Association (APA)-style bar graphs shortly.
It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size. The most widely used measure of effect size for differences between group or condition means is called Cohen’s d, which is the difference between the two means divided by the standard deviation:
d = (M1 − M2) / SD
In this formula, it does not really matter which mean is M1 and which is M2. If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2. Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive. Indeed, Cohen’s d values should always be positive, so it is the absolute difference between the means that is considered in the numerator. The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled within-groups standard deviation. To compute it, add the sum of squared deviations from the group mean for Group 1 to the sum of squared deviations for Group 2, divide this total by the sum of the two sample sizes, and then take the square root of the result. Informally, however, the standard deviation of either group can be used instead.
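As a concrete illustration, here is a minimal Python sketch of this computation. The two groups of scores and the function name cohens_d are hypothetical, and the pooled standard deviation follows the description above (dividing the combined sum of squared deviations by the total sample size).

```python
# Minimal sketch of Cohen's d with a pooled within-groups standard deviation.
# The groups of scores below are hypothetical, purely for illustration.

def cohens_d(group1, group2):
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    ss1 = sum((x - m1) ** 2 for x in group1)  # sum of squared deviations, Group 1
    ss2 = sum((x - m2) ** 2 for x in group2)  # sum of squared deviations, Group 2
    pooled_sd = ((ss1 + ss2) / (len(group1) + len(group2))) ** 0.5
    return abs(m1 - m2) / pooled_sd  # absolute difference, so d is always positive

print(round(cohens_d([3, 4, 2, 5, 3], [4, 5, 3, 6, 4]), 2))  # difference in SD units
```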
Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units. (Notice its similarity to a z score, which expresses the difference between an individual score and a mean in standard deviation units.) A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation). A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations. But how should we interpret these values in terms of the strength of the relationship or the size of the difference between the means? Table 12.4 presents some guidelines for interpreting Cohen’s d values in psychological research (Cohen, 1992)[2]. Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large. Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research. In the research by Ollendick and his colleagues, there was a large difference (d = 0.82) between the exposure and education conditions.
Table 12.4 Guidelines for interpreting Cohen’s d and Pearson’s r in psychological research

| Relationship strength | Cohen’s d | Pearson’s r |
|---|---|---|
| Strong/large | 0.80 | ±0.50 |
| Medium | 0.50 | ±0.30 |
| Weak/small | 0.20 | ±0.10 |
Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on. A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction time measured in milliseconds, number of siblings, or diastolic blood pressure measured in millimeters of mercury. Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.
Be aware that the term effect size can be misleading because it suggests a causal relationship—that the difference between the two means is an “effect” of being in one group or condition as opposed to another. Imagine, for example, a study showing that a group of exercisers is happier on average than a group of nonexercisers, with an “effect size” of d = 0.35. If the study was an experiment—with participants randomly assigned to exercise and no-exercise conditions—then one could conclude that exercising caused a small to medium-sized increase in happiness. If the study was cross-sectional, however, then one could conclude only that the exercisers were happier than the nonexercisers by a small to medium-sized amount. In other words, simply calling the difference an “effect size” does not make the relationship a causal one.
Sex Differences Expressed as Cohen’s d
Researcher Janet Shibley Hyde has looked at the results of numerous studies on psychological sex differences and expressed the results in terms of Cohen’s d (Hyde, 2007)[3]. Following are a few of the values she has found, averaging across several studies in each case. (Note that because she always treats the mean for men as M1 and the mean for women as M2, positive values indicate that men score higher and negative values indicate that women score higher.)
| Variable | Cohen’s d |
|---|---|
| Mathematical problem solving | +0.08 |
| Reading comprehension | −0.09 |
| Smiling | −0.40 |
| Aggression | +0.50 |
| Attitudes toward casual sex | +0.81 |
| Leadership effectiveness | −0.02 |
Hyde points out that although men and women differ by a large amount on some variables (e.g., attitudes toward casual sex), they differ by only a small amount on the vast majority. In many cases, Cohen’s d is less than 0.10, which she terms a “trivial” difference. (The difference in talkativeness discussed in Chapter 1 was also trivial: d = 0.06.) Although researchers and non-researchers alike often emphasize sex differences, Hyde has argued that it makes at least as much sense to think of men and women as fundamentally similar. She refers to this as the “gender similarities hypothesis.”
Correlations Between Quantitative Variables
As we have seen throughout the book, many interesting statistical relationships take the form of correlations between quantitative variables. For example, researchers Kurt Carlson and Jacqueline Conard conducted a study on the relationship between the alphabetical position of the first letter of people’s last names (from A = 1 to Z = 26) and how quickly those people responded to consumer appeals (Carlson & Conard, 2011)[4]. In one study, they sent emails to a large group of MBA students, offering free basketball tickets from a limited supply. The result was that the further toward the end of the alphabet students’ last names were, the faster they tended to respond. These results are summarized in Figure 12.6.
Such relationships are often presented using line graphs or scatterplots, which show how the level of one variable differs across the range of the other. In the line graph in Figure 12.6, for example, each point represents the mean response time for participants with last names in the first, second, third, and fourth quartiles (or quarters) of the name distribution. It clearly shows how response time tends to decline as people’s last names get closer to the end of the alphabet. The scatterplot in Figure 12.7 shows the relationship between 25 research methods students’ scores on the Rosenberg Self-Esteem Scale given on two occasions a week apart. Here the points represent individuals, and we can see that the higher students scored on the first occasion, the higher they tended to score on the second occasion. In general, line graphs are used when the variable on the x-axis has (or is organized into) a small number of distinct values, such as the four quartiles of the name distribution. Scatterplots are used when the variable on the x-axis has a large number of values, such as the different possible self-esteem scores.
The data presented in Figure 12.7 provide a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other (so that the points go from the lower left to the upper right of the graph). The data presented in Figure 12.6 provide a good example of a negative relationship, in which higher scores on one variable tend to be associated with lower scores on the other (so that the points go from the upper left to the lower right).
Both of these examples are also linear relationships, in which the points are reasonably well fit by a single straight line. Nonlinear relationships are those in which the points are better fit by a curved line. Figure 12.8, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best fits the points is a curve—a kind of upside down “U”—because people who get about eight hours of sleep tend to be the least depressed, while those who get too little sleep and those who get too much sleep tend to be more depressed. Nonlinear relationships are not uncommon in psychology, but a detailed discussion of them is beyond the scope of this book.
As we saw earlier in the book, the strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s r. As Figure 12.9 shows, its possible values range from −1.00, through zero, to +1.00. A value of 0 means there is no relationship between the two variables. In addition to his guidelines for interpreting Cohen’s d, Cohen offered guidelines for interpreting Pearson’s r in psychological research (see Table 12.4). Values near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. Like Cohen’s d, Pearson’s r is also referred to as a measure of “effect size” even though the relationship may not be a causal one.
The computations for Pearson’s r are more complicated than those for Cohen’s d. Although you may never have to do them by hand, it is still instructive to see how. Computationally, Pearson’s r is the “mean cross-product of z scores.” To compute it, one starts by transforming all the scores to z scores. For the X variable, subtract the mean of X from each score and divide each difference by the standard deviation of X. For the Y variable, subtract the mean of Y from each score and divide each difference by the standard deviation of Y. Then, for each individual, multiply the two z scores together to form a cross-product. Finally, take the mean of the cross-products. The formula looks like this:

r = (Σ zxzy) / N

where zx and zy are the two z scores for each individual and N is the number of individuals.
Table 12.5 illustrates these computations for a small set of data. The first column lists the scores for the X variable, which has a mean of 4.00 and a standard deviation of 1.90. The second column is the z score for each of these raw scores. The third and fourth columns list the raw scores for the Y variable, which has a mean of 40.00 and a standard deviation of 11.78, and the corresponding z scores. The fifth column lists the cross-products. For example, the first one is 0.00 multiplied by −0.85, which is equal to 0.00. The second is 1.58 multiplied by 1.19, which is equal to 1.88. The mean of these cross-products, shown at the bottom of that column, is Pearson’s r, which in this case is +.53. There are other formulas for computing Pearson’s r by hand that may be quicker. This approach, however, is much clearer in terms of communicating conceptually what Pearson’s r is.
Table 12.5 Sample computations for Pearson’s r

| X | zx | Y | zy | zxzy |
|---|---|---|---|---|
| 4 | 0.00 | 30 | −0.85 | 0.00 |
| 7 | 1.58 | 54 | 1.19 | 1.88 |
| 2 | −1.05 | 23 | −1.44 | 1.52 |
| 5 | 0.53 | 43 | 0.26 | 0.13 |
| 2 | −1.05 | 50 | 0.85 | −0.89 |
| Mx = 4.00 | | My = 40.00 | | r = 0.53 |
| SDx = 1.90 | | SDy = 11.78 | | |
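The same result can be reproduced with a short Python sketch. This is only a check on the Table 12.5 computations; the function name pearson_r is illustrative, and the standard deviations divide by N (not N − 1), matching the values shown in the table.

```python
# Minimal sketch: Pearson's r as the mean cross-product of z scores,
# using the Table 12.5 data.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = (sum((v - mx) ** 2 for v in x) / n) ** 0.5  # SD of X (dividing by N)
    sdy = (sum((v - my) ** 2 for v in y) / n) ** 0.5  # SD of Y (dividing by N)
    zx = [(v - mx) / sdx for v in x]                  # z scores for X
    zy = [(v - my) / sdy for v in y]                  # z scores for Y
    return sum(a * b for a, b in zip(zx, zy)) / n     # mean of the cross-products

x = [4, 7, 2, 5, 2]
y = [30, 54, 23, 43, 50]
print(round(pearson_r(x, y), 2))  # 0.53
```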
As we saw earlier, there are two common situations in which the value of Pearson’s r can be misleading. One is when the relationship under study is nonlinear. Even though Figure 12.8 shows a fairly strong relationship between depression and sleep, Pearson’s r would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s r. The other is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 12.10. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds—represented by the shaded area of Figure 12.11—then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book).
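Restriction of range can also be demonstrated with simulated data. The sketch below uses hypothetical ages and enjoyment ratings, not the data behind Figures 12.10 and 12.11: the correlation for the full age range comes out strongly negative, while the correlation computed only from the 18- to 24-year-olds is much weaker.

```python
# Hypothetical demonstration of restriction of range (illustrative data only).
import random

random.seed(1)
ages = [random.randint(18, 70) for _ in range(200)]
enjoyment = [100 - a + random.gauss(0, 10) for a in ages]  # strong negative trend plus noise

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    sdy = (sum((v - my) ** 2 for v in y) / n) ** 0.5
    return sum((a - mx) / sdx * (b - my) / sdy for a, b in zip(x, y)) / n

print(round(pearson_r(ages, enjoyment), 2))  # strongly negative for the full age range

young = [(a, e) for a, e in zip(ages, enjoyment) if 18 <= a <= 24]  # restricted range
young_ages, young_enjoyment = zip(*young)
print(round(pearson_r(young_ages, young_enjoyment), 2))  # much weaker correlation
```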
- Ollendick, T. H., Öst, L.-G., Reuterskiöld, L., Costa, N., Cederlund, R., Sirbu, C., … Jarrett, M. A. (2009). One-session treatments of specific phobias in youth: A randomized clinical trial in the United States and Sweden. Journal of Consulting and Clinical Psychology, 77, 504–516.
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
- Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.
- Carlson, K. A., & Conard, J. M. (2011). The last name effect: How last name influences acquisition timing. Journal of Consumer Research, 38(2), 300–307. https://doi.org/10.1086/658470
Learning Objectives
- Identify why researchers must provide a detailed description of methodology
- Describe what it means to use science in an ethical way
Research ethics involves examining the way that research is conducted and how the findings will be used. In this section, we’ll consider research ethics from both angles.
Doing science the ethical way
As you now know, researchers must consider their own ethical principles and follow those of their institution, discipline, and community. We’ve already considered many of the ways that social workers strive to ensure the ethical practice of research, such as informing and protecting subjects, but the practice of ethical research doesn’t end once subjects have been identified and data have been collected. Social workers must also fully disclose their research procedures and findings. This means being honest about subject identification and recruitment, data collection and analysis, and being transparent about the study’s ultimate findings.
If researchers fully disclose how they conducted their research, then those who use their work to build research projects, create social policies, or make decisions can have confidence in the work. By sharing how research was conducted, the researcher assures readers that the study was legitimate and that the conclusions were not simply whatever the researcher wanted to find. A description or presentation of research findings that is not accompanied by information about research methodology is missing some relevant information. Sometimes methodological details are left out because there isn’t time or space to share them. This is often the case with news reports of research findings. Other times, there may be a more insidious reason why that important information isn’t there. This may be the case if sharing methodological details would raise questions about the study’s legitimacy. As researchers, it is our ethical responsibility to fully disclose our research procedures. As consumers of research, it is our ethical responsibility to pay attention to such details. We’ll discuss this more in the next section.
There’s a New Yorker cartoon (https://www.art.com/products/p15063407512-sa-i6847806/dana-fradon-filing-cabinets-labeled-our-facts-their-facts-neutral-facts-disput-new-yorker-cartoon.htm?upi=PGQTTQ0) that depicts a set of filing cabinets that aptly demonstrates what we don’t want to see happen with regard to research. Each filing cabinet drawer in the cartoon is labeled differently. The labels include such headings as, “Our Facts,” “Their Facts,” “Neutral Facts,” “Disputable Facts,” “Absolute Facts,” “Bare Facts,” “Unsubstantiated Facts,” and “Indisputable Facts.” The cartoon insinuates that someone could open the file drawer of their choice and pick out the facts that they happen to like the most. While this may occur when using the unscientific ways of knowing described in Chapter 1, it is fortunately not how facts are discovered in social work, or any other science for that matter. There is a method to this madness that we call research.
Honesty in research is facilitated by the scientific principle of replication. Ideally, this means that one scientist could repeat another’s study with relative ease. By replicating a study, we may become more (or less) confident in the original study’s findings. Replication may prove extremely difficult, if not nearly impossible, to achieve with long-term ethnographic studies. Nevertheless, replication sets the standard that all social science researchers should provide as much detail as possible about the way conclusions are reached.
Full disclosure also includes being honest with oneself and others about the strengths and weaknesses of a study. Being aware of the strengths and weaknesses of their own work can help researchers make reasonable recommendations about the next steps other researchers might consider taking in their inquiries. Awareness and disclosure of a study’s strengths and weaknesses can also help highlight the theoretical or policy implications of one’s work. In addition, openness about strengths and weaknesses helps readers evaluate the work and decide for themselves how or whether to rely on its findings. Finally, openness about a study’s sponsors is crucial. How can we effectively evaluate research without knowing who paid the bills?
The standard of replicability, along with openness about a study’s strengths, weaknesses, and funders, enables those who read the research to evaluate it fairly and completely. Knowledge of funding sources is often raised as an issue in medical research, but medical researchers aren’t the only ones who need to be honest about their funding. For example, if we know that a political think tank with ties to a particular party has funded some research, we can take that knowledge into consideration when reviewing the study’s findings and stated policy implications. Lastly, and related to this point, we must consider how, by whom, and for what purpose research may be used.
Using science the ethical way
Science has many uses, and there are many ways it can be understood and applied. Some use science to create laws and social policies, while others use it to understand themselves and those around them. Some people rely on science to improve their own life conditions and those of others, while others may use it to improve their businesses or other undertakings. In any case, there are ethical ways to use science. We can use it to learn about the design and purpose of studies we want to utilize and apply. We can recognize the limitations of our scientific and methodological knowledge and analyze how this impacts our understanding of research. Further, we can learn to apply the findings of scientific investigation to the proper, relevant cases and populations.
Social scientists who conduct research on behalf of organizations and agencies may face additional ethical questions about the use of their research, particularly when the organization controls the final report and the publicity it receives. There is a potential conflict of interest for evaluation researchers who are employees of the agency being evaluated. A similar conflict of interest might exist for independent researchers whose work is funded by a government agency or private foundation.
So who decides what constitutes ethical conduct or use of research? Perhaps we all do.
Key Takeaways
- Conducting research ethically requires that researchers be ethical not only in their data collection procedures but also in reporting their methods and findings.
- The ethical use of research requires an effort to understand research, an awareness of your own limitations in terms of knowledge and understanding, and the honest application of research findings.