8.0 Introduction
Chapter Learning Objectives
By the end of the chapter, you should be able to:
- Calculate a line of best fit for correlated variables using linear regression.
- Calculate the coefficient of determination.
In many studies, we measure more than one variable for each individual. For example, we might measure precipitation and plant growth, number of young and nesting habitat, or soil erosion and volume of water. We collect pairs of data, and instead of examining each variable separately (univariate data), we want to describe bivariate data, in which two variables are measured on each subject in our sample. Given such data, we begin by determining whether there is a relationship between the two variables: as the values of one variable change, do we see corresponding changes in the other?
We can describe the relationship between these two variables graphically and numerically. We begin by considering the concept of correlation.
Correlation is defined as the statistical association between two variables.
A correlation exists between two variables when one of them is related to the other in some way. The best place to start looking for a correlation is a scatterplot. A scatterplot (or scatter diagram) is a graph of the paired ([latex]x[/latex], [latex]y[/latex]) sample data with a horizontal [latex]x[/latex]-axis and a vertical [latex]y[/latex]-axis. Each individual ([latex]x[/latex], [latex]y[/latex]) pair is plotted as a single point.
In this example, we plot bear chest girth ([latex]y[/latex]) against bear length ([latex]x[/latex]). When examining a scatterplot, we should study the overall pattern of the plotted points. In this example, we see that the value for chest girth does tend to increase as the value of length increases. We can see an upward slope and a straight-line pattern in the plotted data points.
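A scatterplot gives a graphical summary of paired data; the sample correlation coefficient [latex]r[/latex] is the usual numerical summary of how closely the points follow a straight line. The Python sketch below computes [latex]r = S_{xy}/\sqrt{S_{xx}S_{yy}}[/latex] from sums of deviations about the means. The function name `pearson_r` and the data values are illustrative assumptions; they are not the bear measurements behind the figure.

```python
import math


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations from each mean
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    # Sum of cross-products of deviations; its sign gives the direction of the trend
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired sample (NOT the chapter's bear data): length x, chest girth y
length = [40, 45, 50, 55, 60, 65]
girth = [24, 27, 29, 33, 34, 38]
print(round(pearson_r(length, girth), 3))  # prints 0.993
```

A value of [latex]r[/latex] close to [latex]+1[/latex] matches an upward, straight-line pattern like the one described for the bear data. Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.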
Key Takeaways
A scatterplot can identify several different types of relationships between two variables.
- A relationship has no correlation when the points on a scatterplot do not show any pattern.
- A relationship is non-linear when the points on a scatterplot follow a pattern but not a straight line.
- A relationship is linear when the points on a scatterplot follow a roughly straight-line pattern. This is the type of relationship examined in this chapter.
Linear relationships can be either positive or negative. Positive relationships have points that rise upward to the right: as [latex]x[/latex] values increase, [latex]y[/latex] values increase, and as [latex]x[/latex] values decrease, [latex]y[/latex] values decrease. For example, when studying plants, height typically increases as diameter increases.
Negative relationships have points that decline downward to the right. As [latex]x[/latex] values increase, [latex]y[/latex] values decrease. As [latex]x[/latex] values decrease, [latex]y[/latex] values increase. For example, as wind speed increases, wind chill temperature decreases.
This image is a scatterplot titled “Scatterplot of Wind Chill at [latex]20^\circ{}F[/latex] vs Speed.” The graph shows wind chill temperatures at various wind speeds for an air temperature of [latex]20[/latex] degrees Fahrenheit.
The x-axis is labeled “Speed” and ranges from [latex]5.0[/latex] to [latex]20.0[/latex]. The y-axis is labeled “[latex]20^\circ{}F[/latex]” and ranges from [latex]-10[/latex] to [latex]10[/latex].
Red dots are plotted on the graph to represent the data points. As the speed increases from left to right, the wind chill temperature decreases. At a speed of approximately [latex]5.0[/latex], the wind chill is close to [latex]10.0[/latex]. At a speed of approximately [latex]20.0[/latex], the wind chill is close to [latex]-10.0[/latex].
Non-linear relationships have an apparent pattern, but the pattern is not a straight line. For example, as age increases, height increases up to a point and then levels off after reaching a maximum.
This image is a scatterplot titled “Scatterplot of Height vs Age”. The plot displays individual data points (marked by red dots) representing height on the y-axis and age on the x-axis. The x-axis is labeled “Age” and ranges from [latex]0[/latex] to [latex]60[/latex], while the y-axis is labeled “Height” and ranges from [latex]1[/latex] to [latex]6[/latex].
Most data points cluster between ages [latex]0[/latex] and [latex]10[/latex] with heights ranging from just over [latex]1[/latex] to nearly [latex]4[/latex]. As age increases to between [latex]10[/latex] and [latex]20[/latex], the height of the data points also increases, with the majority reaching heights between [latex]4[/latex] and [latex]6[/latex]. Beyond the age of [latex]20[/latex], the height remains relatively constant around the [latex]6[/latex] mark, with a few data points extending up to age [latex]50[/latex].
When two variables have no relationship, the points follow neither a straight-line nor a non-linear pattern. Changes in one variable are not associated with any systematic change in the other.
The image is a scatterplot chart titled “Scatterplot of growth vs area.” The x-axis is labeled “area” with values ranging from [latex]0[/latex] to [latex]30[/latex], and the y-axis is labeled “growth” with values ranging from [latex]10[/latex] to [latex]35[/latex]. There are ten red dots plotted at various coordinates, representing different values of growth vs area:
- Approximately ([latex]2, 25[/latex])
- Approximately ([latex]4, 15[/latex])
- Approximately ([latex]5, 23[/latex])
- Approximately ([latex]10, 12[/latex])
- Approximately ([latex]15, 30[/latex])
- Approximately ([latex]20, 21[/latex])
- Approximately ([latex]20, 15[/latex])
- Approximately ([latex]22, 27[/latex])
- Approximately ([latex]25, 32[/latex])
- Approximately ([latex]30, 18[/latex])
Attribution
“Chapter 7: Correlation and Simple Linear Regression” from Natural Resources Biometrics by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.