8.0 Introduction

Chapter Learning Objectives

By the end of the chapter, you should be able to:

  • Calculate a line of best fit for correlated variables using linear regression.
  • Calculate the coefficient of determination.

In many studies, we measure more than one variable for each individual. For example, we might measure precipitation and plant growth, number of young and nesting habitat, or soil erosion and volume of water. Rather than examining each variable separately (univariate data), we collect pairs of observations and look for ways to describe bivariate data, in which two variables are measured on each subject in our sample. Given such data, we begin by determining whether there is a relationship between the two variables: as the values of one variable change, do we see corresponding changes in the other?

We can describe the relationship between these two variables graphically and numerically. We begin by considering the concept of correlation.

Correlation is defined as the statistical association between two variables.

A correlation exists between two variables when one of them is related to the other in some way. A scatterplot is the best place to start. A scatterplot (or scatter diagram) is a graph of the paired ([latex]x[/latex], [latex]y[/latex]) sample data with a horizontal [latex]x[/latex]-axis and a vertical [latex]y[/latex]-axis. Each individual ([latex]x[/latex], [latex]y[/latex]) pair is plotted as a single point.

A scatterplot detailing the correlation between chest girth and length.
Figure 8.0.1 Scatterplot of chest girth versus length.
Image Description
The image is a scatterplot titled “Scatterplot of Chest.G vs Length.” It shows a positive correlation between two variables: Chest.G (on the y-axis) and Length (on the x-axis). The x-axis is labeled “Length” and ranges from [latex]30[/latex] to [latex]80[/latex], while the y-axis is labeled “Chest.G” and ranges from [latex]20[/latex] to [latex]60[/latex]. The plot contains numerous red dots representing individual data points, which are dispersed in such a way that they generally trend upwards from the lower-left corner to the upper-right corner of the plot, indicating that as Length increases, Chest.G also tends to increase.

In this example, we plot bear chest girth ([latex]y[/latex]) against bear length ([latex]x[/latex]). When examining a scatterplot, we should study the overall pattern of the plotted points. In this example, we see that the value for chest girth does tend to increase as the value of length increases. We can see an upward slope and a straight-line pattern in the plotted data points.
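To make the idea of "one point per ([latex]x[/latex], [latex]y[/latex]) pair" concrete, here is a minimal sketch in Python that places paired data on a character grid. The function name `text_scatter` and the sample values are hypothetical (they are not the bear measurements behind Figure 8.0.1), and a real analysis would normally use a plotting library instead.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude character-grid scatterplot of paired (x, y) data."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # x sets the horizontal position of the point.
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(row).rstrip() for row in grid)


# Made-up illustrative values (assumption): length on x, chest girth on y.
lengths = [36, 45, 53, 60, 67, 72, 75]
girths = [19, 26, 31, 38, 45, 50, 54]

plot = text_scatter(lengths, girths)
print(plot)
```

With these values the stars climb from the lower-left corner to the upper-right corner, which is the same upward, roughly straight-line pattern described for the bear data.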

Key Takeaways

A scatterplot can identify several different types of relationships between two variables.

  • A relationship has no correlation when the points on a scatterplot do not show any pattern.
  • A relationship is non-linear when the points on a scatterplot follow a pattern but not a straight line.
  • A relationship is linear when the points on a scatterplot follow a roughly straight-line pattern. This is the relationship that we will examine.

Linear relationships can be either positive or negative. Positive relationships have points that incline upwards to the right. As [latex]x[/latex] values increase, [latex]y[/latex] values increase. As [latex]x[/latex] values decrease, [latex]y[/latex] values decrease. For example, when studying plants, height typically increases as diameter increases.

A scatterplot visualizing the linear correlation between the height and diameter of plant stems.
Figure 8.0.2 Scatterplot of height versus diameter.
Image Description
A scatter plot showing the relationship between height and diameter. The x-axis is labeled “Diameter” and ranges from [latex]1[/latex] to [latex]5[/latex]. The y-axis is labeled “Height” and ranges from [latex]2[/latex] to [latex]6[/latex]. There are fourteen red points plotted, indicating a positive correlation between height and diameter.

Negative relationships have points that decline downward to the right. As [latex]x[/latex] values increase, [latex]y[/latex] values decrease. As [latex]x[/latex] values decrease, [latex]y[/latex] values increase. For example, as wind speed increases, wind chill temperature decreases.

A scatterplot visualizing the negative correlation between wind speed and wind chill.
Figure 8.0.3 Scatterplot of temperature versus wind speed.
Image Description

This image is a scatterplot graph titled “Scatterplot of Wind Chill at [latex]20^\circ{}F[/latex] vs Speed.” The graph depicts data points representing wind chill temperatures at various wind speeds, measured at [latex]20[/latex] degrees Fahrenheit.

The x-axis is labeled “Speed” and ranges from [latex]5.0[/latex] to [latex]20.0[/latex]. The y-axis is labeled “[latex]20^\circ{}F[/latex]” and ranges from [latex]-10[/latex] to [latex]10[/latex].

Red dots are plotted on the graph to represent the data points. As the speed increases from left to right, the wind chill temperature decreases. At a speed of approximately [latex]5.0[/latex], the wind chill is close to [latex]10.0[/latex]. At a speed of approximately [latex]20.0[/latex], the wind chill is close to [latex]-10.0[/latex].
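The direction of a linear relationship can also be checked numerically. The sketch below is one simple way to do it, not a method taken from this chapter: it sums the products of each pair's deviations from the two means, and the sign of that sum indicates whether [latex]y[/latex] tends to rise or fall as [latex]x[/latex] rises. The function name `direction` and both data sets are made up for illustration.

```python
def direction(xs, ys):
    """Classify a linear trend by the sign of the summed cross-deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Positive when x and y tend to sit on the same side of their means,
    # negative when they tend to sit on opposite sides.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear trend"


# Made-up illustrative values (assumption), not the data behind the figures.
diameter = [1.0, 2.0, 3.0, 4.0, 5.0]
height = [2.1, 3.0, 3.8, 5.2, 5.9]
wind_speed = [5, 10, 15, 20]
wind_chill = [10, 3, -4, -10]

print(direction(diameter, height))
print(direction(wind_speed, wind_chill))
```

Note that this check only describes direction. It says nothing about how tightly the points cluster around a line, and it can be misleading for non-linear patterns, so it complements the scatterplot rather than replacing it.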

Non-linear relationships show an apparent pattern, but the pattern is not a straight line. For example, height increases with age up to a point and then levels off once maximum height is reached.

A scatterplot visualizing the correlation between age and height, a correlation that is non-linear.
Figure 8.0.4 Scatterplot of height versus age.
Image Description

This image is a scatterplot titled “Scatterplot of Height vs Age”. The plot displays individual data points (marked by red dots) representing height on the y-axis and age on the x-axis. The x-axis is labeled “Age” and ranges from [latex]0[/latex] to [latex]60[/latex], while the y-axis is labeled “Height” and ranges from [latex]1[/latex] to [latex]6[/latex].

Most data points cluster between ages [latex]0[/latex] and [latex]10[/latex] with heights ranging from just over [latex]1[/latex] to nearly [latex]4[/latex]. As age increases to between [latex]10[/latex] and [latex]20[/latex], the height of the data points also increases, with the majority reaching heights between [latex]4[/latex] and [latex]6[/latex]. Beyond the age of [latex]20[/latex], the height remains relatively constant around the [latex]6[/latex] mark, with a few data points extending up to age [latex]50[/latex].

When two variables have no relationship, the points show neither a straight-line pattern nor a non-linear pattern. Changes in one variable are not accompanied by any systematic change in the other variable.

A scatterplot visualizing the uncorrelated variables of growth and area.
Figure 8.0.5 Scatterplot of growth versus area.
Image Description

The image is a scatterplot chart titled “Scatterplot of growth vs area.” The x-axis is labeled “area” with values ranging from [latex]0[/latex] to [latex]30[/latex], and the y-axis is labeled “growth” with values ranging from [latex]10[/latex] to [latex]35[/latex]. There are ten red dots plotted at various coordinates, representing different values of growth vs area:

  • Approximately ([latex]2, 25[/latex])
  • Approximately ([latex]4, 15[/latex])
  • Approximately ([latex]5, 23[/latex])
  • Approximately ([latex]10, 12[/latex])
  • Approximately ([latex]15, 30[/latex])
  • Approximately ([latex]20, 21[/latex])
  • Approximately ([latex]20, 15[/latex])
  • Approximately ([latex]22, 27[/latex])
  • Approximately ([latex]25, 32[/latex])
  • Approximately ([latex]30, 18[/latex])
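As a rough numerical check on the "no pattern" impression, the sketch below computes the Pearson correlation coefficient [latex]r[/latex] for the ten approximate points listed above. This section does not define [latex]r[/latex], so treat the formula here as an assumption about the measure used: [latex]r[/latex] is the summed cross-deviations divided by the square root of the product of the two sums of squared deviations, and values near [latex]0[/latex] indicate little linear association. The function name `pearson_r` is hypothetical, and the coordinates are only the approximate readings from the image description.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Approximate (area, growth) points read from the description of Figure 8.0.5.
points = [(2, 25), (4, 15), (5, 23), (10, 12), (15, 30),
          (20, 21), (20, 15), (22, 27), (25, 32), (30, 18)]
area = [p[0] for p in points]
growth = [p[1] for p in points]

r = pearson_r(area, growth)
print(f"r = {r:.2f}")
```

For these approximate points the result is much closer to [latex]0[/latex] than to [latex]1[/latex] or [latex]-1[/latex], which is consistent with the scattered appearance of the plot.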

Attribution

“Chapter 7: Correlation and Simple Linear Regression” from Natural Resources Biometrics by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.

License

Mathematics of Finance Copyright © 2024 by Sharon Wang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.