6.5 Probability and the Normal Distribution

Domenic Spilotro, MSc

Learning Objectives

By the end of this section, you will be able to:

Find the probability of a discrete event.
Explain the concept of the standard normal distribution and its importance for inference.
Calculate event probabilities based on transforming raw scores to [latex]z[/latex]-scores and percentiles and understand how they are applied to decision-making situations.
Transform [latex]z[/latex]-scores into raw scores given an event probability.

Probability can seem like a daunting topic for many students. In a mathematical statistics course this might be true, as the meaning and purpose of probability get obscured and overwhelmed by equations and theory. In this chapter we will focus only on the principles and ideas necessary to lay the groundwork for future inferential statistics. We accomplish this by quickly tying the concepts of probability to what we already know about normal distributions and [latex]z[/latex]-scores.

What Is Probability?

When we speak of the probability of something happening, we are talking about how likely it is that the “thing” will happen based on the conditions present. For instance, what is the probability that it will rain? That is, how likely do we think it is that it will rain today under the circumstances or conditions today? To define or understand the conditions that might affect how likely it is to rain, we might look out the window and say, “It’s sunny outside, so it’s not very likely that it will rain today.” Stated using probability language: given that it is sunny outside, the probability of rain is low. “Given” is the word we use to state what the conditions are. As the conditions change, so does the probability. Thus, if it were cloudy and windy outside, we might say, “Given the current weather conditions, there is a high probability that it is going to rain.”

In these examples, we spoke about whether or not it is going to rain. Raining is an example of an event, which is the catch-all term we use to talk about any specific thing happening; it is a generic term that we specified to mean “rain” in exactly the same way that “conditions” is a generic term that we specified to mean “sunny” or “cloudy and windy.”

It should also be noted that the terms “low” and “high” are relative and vague, and they will likely be interpreted differently by different people (in other words: given how vague the terminology was, the probability of different interpretations is high). Most of the time we try to use more precise language or, even better, numbers to represent the probability of our event. Regardless, the basic structure and logic of our statements are consistent with how we speak about probability using numbers and formulas.

Let’s look at a slightly deeper example. Say we have a regular, six-sided die (note that die is singular and dice is plural) and want to know how likely it is that we will roll a [latex]1[/latex]. That is, what is the probability of rolling a [latex]1[/latex], given that the die is not weighted (which would introduce what we call a bias, though that is beyond the scope of this section)? We could roll the die and see if it is a [latex]1[/latex] or not, but that won’t tell us about the probability; it will only tell us a single result. We could also roll the die hundreds or thousands of times, recording each outcome and seeing what the final list looks like, but this is time-consuming, and rolling a die that many times may lead down a dark path to gambling or, worse, playing Dungeons & Dragons. What we need is a simple equation that represents what we are looking for and what is possible.

To calculate the probability of an event, which here is defined as rolling a [latex]1[/latex] on an unbiased die, we need to know two things: how many outcomes satisfy the criteria of our event (stated differently, how many outcomes would count as what we are looking for) and the total number of outcomes possible. In our example, only a single outcome, rolling a [latex]1[/latex], will satisfy our criteria, and there are a total of six possible outcomes (rolling a [latex]1[/latex], rolling a [latex]2[/latex], rolling a [latex]3[/latex], rolling a [latex]4[/latex], rolling a [latex]5[/latex], and rolling a [latex]6[/latex]). Thus, the probability of rolling a [latex]1[/latex] on an unbiased die is [latex]1[/latex] in [latex]6[/latex] or [latex]\frac{1}{6}[/latex]. Put into an equation using generic terms, we get:

[latex]\text{Probability of an event}=\frac{\text{Number of outcomes that satisfy our criteria}}{\text{Total number of possible outcomes}}[/latex]

We can also use [latex]P()[/latex] as shorthand for probability and [latex]A[/latex] as shorthand for an event:

[latex]P(A)=\frac{\text{Number of outcomes that count as A}}{\text{Total number of possible outcomes}}[/latex]

Using this equation, let’s now calculate the probability of rolling an even number on this die:

[latex]P(\text{even number})=\frac{2,4,\text{or}\;6}{1,2,3,4,5,\text{or}\;6}=\frac{3}{6}=\frac{1}{2}[/latex]

So we have a [latex]50\%[/latex] chance of rolling an even number on this die. The principles laid out here operate under a certain set of conditions and can be elaborated into ideas that are complex yet powerful and elegant. However, such extensions are not necessary for a basic understanding of statistics, so we will end our discussion on the math of probability here. Now, let’s turn back to more familiar topics.
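The counting rule above can be sketched in a few lines of Python. This is only an illustration, assuming every outcome is equally likely (as with an unbiased die); the helper name `probability` is made up for this example.

```python
from fractions import Fraction

def probability(event_outcomes, all_outcomes):
    """P(A) = outcomes that count as A / total number of possible outcomes.

    Assumes every outcome in all_outcomes is equally likely.
    """
    all_outcomes = set(all_outcomes)
    favourable = set(event_outcomes) & all_outcomes  # ignore impossible outcomes
    return Fraction(len(favourable), len(all_outcomes))

die = range(1, 7)                      # faces of a six-sided die
p_one = probability({1}, die)          # rolling a 1
p_even = probability({2, 4, 6}, die)   # rolling an even number

print(p_one, p_even, float(p_even))    # prints: 1/6 1/2 0.5
```

Using `Fraction` keeps the answer in the same form as the hand calculation before converting it to a decimal.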

Probability in Graphs and Distributions

We will see shortly that the normal distribution is the key to how probability works for our purposes. To understand exactly how, let’s first look at a simple, intuitive example using pie charts.

Probability in Pie Charts

Recall that a pie chart represents how frequently a category was observed and that all slices of the pie chart add up to [latex]100\%[/latex], or [latex]1[/latex]. This means that if we randomly select an observation from the data used to create the pie chart, the probability of it taking on a specific value is exactly equal to the size of that category’s slice in the pie chart.

Take, for example, the pie chart in Figure 6.5.1 representing the favourite sports of [latex]100[/latex] people. If you put this pie chart on a dart board and aimed blindly (assuming you are guaranteed to hit the board), the likelihood of hitting the slice for any given sport would be equal to the size of that slice. So, the probability of hitting the baseball slice is the highest at [latex]36\%[/latex]. The probability is equal to the proportion of the chart taken up by that section.

Figure 6.5.1. Favourite sports. (“Favorite Sports Pie Chart” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

We can also add slices together. For instance, maybe we want to know the probability of finding someone whose favourite sport is usually played on grass. The outcomes that satisfy this criterion are baseball, football, and soccer. To get the probability, we simply add their slices together to see what proportion of the area of the pie chart is in that region: [latex]36\%+25\%+20\%=81\%[/latex]. We can also add sections together even if they do not touch. If we want to know the likelihood that someone’s favourite sport is not called football somewhere in the world (i.e., baseball and hockey), we can add those slices even though they aren’t adjacent or contiguous in the chart itself: [latex]36\%+20\%=56\%[/latex]. We are able to do all of this because (1) the size of the slice corresponds to the area of the chart taken up by that slice, (2) the percentage for a specific category can be represented as a decimal (this step was skipped for ease of explanation above), and (3) the total area of the chart is equal to [latex]100\%[/latex] or [latex]1.0[/latex], which makes the size of the slices interpretable.

Try It

1) In your own words, what is probability?

Solution

Your answer should include information about an event happening under certain conditions given certain criteria. You could also discuss the relationship between probability and the area under the curve or the proportion of the area in a chart.

Try It

2) There is a bag with [latex]5[/latex] red blocks, [latex]2[/latex] yellow blocks, and [latex]4[/latex] blue blocks. If you reach in and grab one block without looking, what is the probability it is red?

Solution

[latex]\frac{5}{11}=0.4545[/latex]

Probability in Normal Distributions

If the language at the end of the last section sounded familiar, that’s because it’s exactly the language used in Section 6.4 to describe the normal distribution. Recall that the normal distribution has an area under its curve that is equal to [latex]1[/latex] and that it can be split into sections by drawing a line through it that corresponds to a given [latex]z[/latex]-score. Because of this, we can interpret areas under the normal curve as probabilities that correspond to [latex]z[/latex]-scores.

First, let’s look at the area between [latex]z=-1.00[/latex] and [latex]z=1.00[/latex] presented in Figure 6.5.2. We were told earlier that this region contains [latex]68\%[/latex] of the area under the curve. Thus, if we randomly chose a [latex]z[/latex]-score from all possible [latex]z[/latex]-scores, there is a [latex]68\%[/latex] chance that it will be between [latex]z=-1.00[/latex] and [latex]z=1.00[/latex] because those are the [latex]z[/latex]-scores that satisfy our criteria.

Figure 6.5.2. There is a 68% chance of selecting a z-score from the blue-shaded region. (“68 Percent of the Area under the Curve” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

Just like a pie chart is broken up into slices by drawing lines through it, we can also draw a line through the normal distribution to split it into sections. Take a look at the normal distribution in Figure 6.5.3, which has a line drawn through it at [latex]z=1.25[/latex]. This line creates two sections of the distribution: the smaller section called the tail and the larger section called the body. Differentiating between the body and the tail does not depend on which side of the distribution the line is drawn. All that matters is the relative size of the pieces: bigger is always body.

As you can see, we can break up the normal distribution into [latex]3[/latex] pieces (lower tail, body, and upper tail) as in Figure 6.5.2 or into [latex]2[/latex] pieces (body and tail) as in Figure 6.5.3. We can then find the proportion of the area in the body and tail based on where the line was drawn (i.e., at what [latex]z[/latex]-score). Mathematically, this is done using calculus. Fortunately, the values are given to you in the Standard Normal Distribution Table, also known as the [latex]z[/latex]-table. A portion of this table is shown in Table 6.5.1; the examples that follow refer to the entire [latex]z[/latex]-score table. Using the [latex]z[/latex] values in the table, we can find the area under the normal curve in any body and can calculate the area under the normal curve in any tail.

For example, suppose we want to find the area in the body for a [latex]z[/latex]-score of [latex]1.62[/latex]. As shown in Table 6.5.1, the row for [latex]1.6[/latex] and the column for [latex]0.02[/latex] intersect at a value of [latex]0.9474[/latex] for the proportion in the body of the distribution. This cell has been highlighted in yellow to help locate it. Thus, the probability of randomly selecting someone with a [latex]z[/latex]-score less than (to the left of) [latex]z=1.62[/latex] is [latex]94.74\%[/latex] because that is the proportion of the area taken up by values that satisfy our criteria.

Mathematically, we present this solution in the following way:

[latex]P(z<1.62)=0.9474[/latex]

Table 6.5.1 Area Under The Standard Normal Curve ([latex]Z\ge 0[/latex])
Z	0	0.01	0.02	0.03	0.04	0.05	0.06	0.07	0.08	0.09
0	0.5000	0.5040	0.5080	0.5120	0.5160	0.5199	0.5239	0.5279	0.5319	0.5359
0.1	0.5398	0.5438	0.5478	0.5517	0.5557	0.5596	0.5636	0.5675	0.5714	0.5753
0.2	0.5793	0.5832	0.5871	0.5910	0.5948	0.5987	0.6026	0.6064	0.6103	0.6141
0.3	0.6179	0.6217	0.6255	0.6293	0.6331	0.6368	0.6406	0.6443	0.6480	0.6517
0.4	0.6554	0.6591	0.6628	0.6664	0.6700	0.6736	0.6772	0.6808	0.6844	0.6879
0.5	0.6915	0.6950	0.6985	0.7019	0.7054	0.7088	0.7123	0.7157	0.7190	0.7224
0.6	0.7257	0.7291	0.7324	0.7357	0.7389	0.7422	0.7454	0.7486	0.7517	0.7549
0.7	0.7580	0.7611	0.7642	0.7673	0.7704	0.7734	0.7764	0.7794	0.7823	0.7852
0.8	0.7881	0.7910	0.7939	0.7967	0.7995	0.8023	0.8051	0.8078	0.8106	0.8133
0.9	0.8159	0.8186	0.8212	0.8238	0.8264	0.8289	0.8315	0.8340	0.8365	0.8389
1.0	0.8413	0.8438	0.8461	0.8485	0.8508	0.8531	0.8554	0.8577	0.8599	0.8621
1.1	0.8643	0.8665	0.8686	0.8708	0.8729	0.8749	0.8770	0.8790	0.8810	0.8830
1.2	0.8849	0.8869	0.8888	0.8907	0.8925	0.8944	0.8962	0.8980	0.8997	0.9015
1.3	0.9032	0.9049	0.9066	0.9082	0.9099	0.9115	0.9131	0.9147	0.9162	0.9177
1.4	0.9192	0.9207	0.9222	0.9236	0.9251	0.9265	0.9279	0.9292	0.9306	0.9319
1.5	0.9332	0.9345	0.9357	0.9370	0.9382	0.9394	0.9406	0.9418	0.9429	0.9441
1.6	0.9452	0.9463	0.9474	0.9484	0.9495	0.9505	0.9515	0.9525	0.9535	0.9545
1.7	0.9554	0.9564	0.9573	0.9582	0.9591	0.9599	0.9608	0.9616	0.9625	0.9633

Some versions of the [latex]z[/latex]-table only present the area in the body for positive [latex]z[/latex]-scores because the normal distribution is symmetrical. The area in the body of [latex]z=1.62[/latex] is equal to the area in the body for [latex]z=-1.62[/latex], though now — as illustrated in the middle distribution at the top of Table 6.5.1 — the body will be the shaded area to the right of [latex]z[/latex]. (When in doubt, drawing out your distribution and shading the area you need to find will always help.) Because the total area under the normal curve is always equal to [latex]1.00[/latex], the area in the tail is simply the area in the body subtracted from [latex]1.00[/latex] ([latex]1.00-0.9474=0.0526[/latex]). We will also provide the negative side of the [latex]z[/latex]-score table in our course, so we will be able to use it to find probabilities associated with negative [latex]z[/latex]-scores.
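The table lookup for the body and tail can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; its `cdf` method returns the area to the left of a value, which is the same quantity the [latex]z[/latex]-table lists.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal distribution

body = Z.cdf(1.62)              # area to the left of z = 1.62
tail = 1 - body                 # remaining area to the right

# By symmetry, the area to the left of z = -1.62 equals the upper tail.
mirror_tail = Z.cdf(-1.62)

print(round(body, 4), round(tail, 4), round(mirror_tail, 4))
# prints: 0.9474 0.0526 0.0526
```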

Let’s look at another example. This time, let’s find the area corresponding to [latex]z[/latex]-scores more extreme than [latex]z=-1.96[/latex] and [latex]z=1.96[/latex]. That is, let’s find the area in the tails of the distribution for values less than [latex]z=-1.96[/latex] (farther negative and therefore more extreme) and greater than [latex]z=1.96[/latex] (farther positive and therefore more extreme). This region is illustrated in Figure 6.5.4.

Mathematically, we are looking for

[latex]P(z<-1.96)[/latex] and [latex]P(z>1.96)[/latex]

Let’s start with the tail for [latex]z=-1.96[/latex]. Our [latex]z[/latex]-table only provides the area under the curve to the left of a particular [latex]z[/latex]-score. As such, finding [latex]P(z<-1.96)[/latex] is straightforward, and we need only look to the negative side of the [latex]z[/latex]-score table to find it: [latex]P(z<-1.96)=0.0250[/latex]. Since the distribution is symmetric, we can deduce that [latex]P(z>1.96)=0.0250[/latex] as well. Finally, to get the total area in the shaded region, we simply add the areas together to get [latex]0.0500[/latex]. Thus, there is a [latex]5\%[/latex] chance of randomly getting a value more extreme than [latex]z=-1.96[/latex] or [latex]z=1.96[/latex].
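As a rough check of the two-tail calculation, the same areas can be computed with `statistics.NormalDist` from the Python standard library (assumed here; Python 3.8 or later). The variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()                    # standard normal: mean 0, standard deviation 1

lower_tail = Z.cdf(-1.96)           # P(z < -1.96)
upper_tail = 1 - Z.cdf(1.96)        # P(z > 1.96), equal to the lower tail by symmetry
both_tails = lower_tail + upper_tail

print(round(lower_tail, 4), round(upper_tail, 4), round(both_tails, 4))
# prints: 0.025 0.025 0.05
```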

Finally, we can find the area between two [latex]z[/latex]-scores by shading and subtracting. Figure 6.5.5 shows the area between [latex]z=0.50[/latex] and [latex]z=1.50[/latex]. Because this is a subsection of a body (rather than just a body or a tail), we must first find the larger of the two bodies, in this case the body for [latex]z=1.50[/latex], and subtract the smaller of the two bodies, or the body for [latex]z=0.50[/latex]. Aligning the distributions vertically, as in Figure 6.5.5, makes this clearer. From the complete [latex]z[/latex]-table, we see that the area in the body for [latex]z=1.50[/latex] is [latex]0.9332[/latex], and the area in the body for [latex]z=0.50[/latex] is [latex]0.6915[/latex]. Subtracting these gives us [latex]0.9332-0.6915=0.2417[/latex].

Mathematically, we write:

[latex]\begin{align*}P(0.50\le z\le 1.50)&=P(z\le 1.50)-P(z\le 0.50)\\[2ex]P(0.50\le z\le 1.50)&=0.9332-0.6915\\[2ex]P(0.50\le z\le 1.50)&=0.2417\end{align*}[/latex]
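The shade-and-subtract step translates directly into code. The helper `area_between` below is a hypothetical name chosen for this sketch, and it relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores:
    the larger body minus the smaller body."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(area_between(0.50, 1.50), 4))   # prints: 0.2417
```

Because the table entries are already rounded to four decimal places, subtracting two table values can occasionally differ from a direct computation in the last digit.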

Example 6.5.1

Use the [latex]z[/latex]-table to find the following probabilities:

a. [latex]P(z\le 2)[/latex]
b. [latex]P(z\geq 2)[/latex]
c. [latex]P(-2\le z\le 2)[/latex]

Solution

a.
Step 1: Access the full z-table.

Step 2: Determine the probability for [latex]P(z< 2)[/latex]

[latex]P(z< 2)=0.9772[/latex]

b.
Step 1: Write an equation to find the probability for [latex]P(z\geq 2)[/latex]

[latex]P(z\geq 2)=1-P(z< 2)[/latex]

Step 2: Access the full z-table.

Step 3: Determine the probability for [latex]P(z< 2)[/latex] and substitute into the equation.
You can also use your answer from part a.

[latex]P(z\geq 2)=1-0.9772[/latex]

Step 4: Solve.

[latex]P(z\geq 2)=0.0228[/latex]

c.
Step 1: We can find the area between two [latex]z[/latex]-scores by subtracting.

[latex]P(-2\le z \le 2)=P(z\le 2)-P(z\le -2)[/latex]

Step 2: Access the full z-table.

Step 3: Substitute the values into the equation.

[latex]P(-2\le z \le 2)=0.9772-0.0228[/latex]

Step 4: Solve.

[latex]P(-2\le z \le 2)=0.9544[/latex]

Try It

3) Use the z-table to find the following probabilities:

a. [latex]P(z\le -0.73)[/latex]
b. [latex]P(z\ge 2.65)[/latex]
c. [latex]P(-1.8\le z\le 0.82)[/latex]

Solution

a. [latex]P(z\le -0.73)=0.2327[/latex]
b. [latex]P(z\ge 2.65)=0.0040[/latex]
c. [latex]P(-1.8\le z\le 0.82)=0.7580[/latex]

Try It

4) Under a normal distribution, which of the following is more likely? (Note: this question can be answered without any calculations if you draw out the distributions and shade properly.)

Getting a [latex]z[/latex]-score greater than [latex]z=2.75[/latex]
Getting a [latex]z[/latex]-score less than [latex]z=-1.50[/latex]

Solution

Getting a [latex]z[/latex]-score less than [latex]z=-1.50[/latex] is more likely. [latex]z=2.75[/latex] is farther out into the right tail than [latex]z=-1.50[/latex] is into the left tail; therefore, the area beyond [latex]2.75[/latex] is smaller than the area beyond [latex]-1.50[/latex], regardless of the direction.

Standardizing Normal Distributions

In the examples above, we have been given the [latex]z[/latex]-scores directly. However, most of the time, we are studying a random variable [latex]X[/latex] where the mean and standard deviation are different from zero and one, respectively. Thus, we need to use a process called standardization, sometimes referred to in mathematical notation as [latex]X\sim N(\mu, \sigma)[/latex] to [latex]Z\sim N(0,1)[/latex], in order to be able to find the probabilities using [latex]z[/latex]-scores. To do this, use the formula:

[latex]z=\frac{x-\mu}{\sigma}[/latex]

Using this formula standardizes the random variable and finds the [latex]z[/latex]-score associated with the values of interest so that we can find the probabilities in the [latex]z[/latex]-table.

Example 6.5.2

Suppose the birth weights of a population of babies are normally distributed with a mean of [latex]3,370[/latex] grams and a standard deviation of [latex]150[/latex] grams.

a. Find the probability that a baby from that population has a birth weight of less than [latex]3000[/latex] grams.
b. Find the probability that a baby from that population has a birth weight of more than [latex]3500[/latex] grams.
c. Find the probability that a baby from that population has a birth weight between [latex]3300[/latex] grams and [latex]3600[/latex] grams.

Solution

a.
Step 1: Identify what you are looking for.

[latex]P(X<3000)[/latex]

Step 2: Use the formula [latex]z=\frac{x-\mu}{\sigma}[/latex] to find the [latex]z[/latex]-score.

[latex]\begin{align*}P(X<3000)&=P\left(z< \frac{3000-3370}{150}\right)\\P(X<3000)&=P\left(z< -2.47\right)\end{align*}[/latex]

Step 3: Find the probability in the z-table.

[latex]P(X<3000)=0.0068[/latex]

b.
Step 1: Identify what you are looking for.

[latex]P(X>3500)[/latex]

Step 2: Use the formula [latex]z=\frac{x-\mu}{\sigma}[/latex] to find the [latex]z[/latex]-score.
Remember to always round your [latex]z[/latex]-score to two decimal places.

[latex]\begin{align*}P(X>3500)&=P\left(z> \frac{3500-3370}{150}\right)\\[2ex]P(X>3500)&=P\left(z> 0.87\right)\end{align*}[/latex]

Step 3: Write the equation for the probability of a [latex]z[/latex]-score greater than [latex]0.87[/latex].

[latex]P(X>3500)=1-P\left(z< 0.87\right)[/latex]

Step 4: Replace with the probability in the z-table.

[latex]P(X>3500)=1-0.8078[/latex]

Step 5: Subtract.

[latex]P(X>3500)=0.1922[/latex]

c.
Step 1: Identify what you are looking for.

[latex]P\left(3300< X< 3600\right)[/latex]

Step 2: Use the formula [latex]z=\frac{x-\mu}{\sigma}[/latex] to find the [latex]z[/latex]-scores.

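[latex]\begin{align*}P(3300< X< 3600)&=P\left(\frac{3300-3370}{150}< z< \frac{3600-3370}{150}\right)\\[2ex]P(3300< X< 3600)&=P\left(-0.47< z< 1.53\right)\\[2ex]P(3300< X< 3600)&=P\left(z< 1.53\right)-P\left(z< -0.47\right)\end{align*}[/latex]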

Step 3: Replace with the probabilities in the z-table and solve.

[latex]\begin{align*}P(3300< X< 3600)&=0.9370-0.3192\\[2ex]P(3300< X< 3600)&=0.6178\end{align*}[/latex]
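The three parts of this example follow the same recipe: standardize, round the [latex]z[/latex]-score to two decimal places, then read off an area. A minimal sketch of that recipe is below; `z_score` is an illustrative helper name, and `statistics.NormalDist` (Python 3.8 or later) stands in for the printed [latex]z[/latex]-table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, used in place of the z-table

def z_score(x, mu, sigma):
    """Standardize a raw score and round to two decimals, as for a table lookup."""
    return round((x - mu) / sigma, 2)

mu, sigma = 3370, 150  # birth-weight mean and standard deviation in grams

p_below_3000 = Z.cdf(z_score(3000, mu, sigma))        # part a
p_above_3500 = 1 - Z.cdf(z_score(3500, mu, sigma))    # part b
p_between = (Z.cdf(z_score(3600, mu, sigma))
             - Z.cdf(z_score(3300, mu, sigma)))       # part c

for label, p in [("a", p_below_3000), ("b", p_above_3500), ("c", p_between)]:
    print(label, round(p, 4))
```

The printed values should land very close to the table-based answers; small differences in the last digit can appear because the table itself is rounded.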

Example 6.5.3

The scores for a test were normally distributed with a mean of [latex]66\%[/latex] and a standard deviation of [latex]4\%[/latex].

a. Find the probability that a randomly selected student scored less than [latex]55\%[/latex].
b. Find the probability that a student scored between [latex]70\%[/latex] and [latex]75\%[/latex].

Solution

a.
Step 1: Identify what you are looking for.

[latex]P(X<55)[/latex]

Step 2: Use the formula [latex]z=\frac{x-\mu}{\sigma}[/latex] to find the [latex]z[/latex]-score.

[latex]\begin{align*}P(X<55)&=P\left(z< \frac{55-66}{4}\right)\\P(X<55)&=P\left(z< -2.75\right)\end{align*}[/latex]

Step 3: Find the probability in the z-table.

[latex]P(X<55)=0.0030[/latex]

b.
Step 1: Identify what you are looking for.

[latex]P\left(70< X< 75\right)[/latex]

Step 2: Use the formula [latex]z=\frac{x-\mu}{\sigma}[/latex] to find the [latex]z[/latex]-scores.

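[latex]\begin{align*}P(70< X< 75)&=P\left(\frac{70-66}{4}< z< \frac{75-66}{4}\right)\\[2ex]P(70< X< 75)&=P\left(1.00< z< 2.25\right)\\[2ex]P(70< X< 75)&=P\left(z< 2.25\right)-P\left(z< 1.00\right)\end{align*}[/latex]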

Step 3: Replace with the probabilities in the z-table and solve.

[latex]\begin{align*}P(70< X< 75)&=0.9878-0.8413\\[2ex]P(70< X< 75)&=0.1465\end{align*}[/latex]

Try It

5) The heights of women in the United States are normally distributed with a mean of [latex]63.7[/latex] inches and a standard deviation of [latex]2.7[/latex] inches. If you randomly select a woman in the United States, what is the probability that she will be between [latex]65[/latex] and [latex]67[/latex] inches tall?

Solution

The probability that a randomly selected woman is between [latex]65[/latex] inches and [latex]67[/latex] inches tall is [latex]0.2044[/latex] or [latex]20.44\%[/latex].

Try It

6) The heights of men in the United States are normally distributed with a mean of [latex]69.1[/latex] inches and a standard deviation of [latex]2.9[/latex] inches. What proportion of men are taller than [latex]6[/latex] feet ([latex]72[/latex] inches)?

Solution

[latex]15.87\%[/latex] or [latex]0.1587[/latex]

Try It

7) You know you need to score at least [latex]82[/latex] points on the final exam to pass your class. After the final, you find out that the average score on the exam was [latex]78[/latex] with a standard deviation of [latex]7[/latex]. How likely is it that you pass the class?

Solution

It is not likely that you pass the class since the probability of scoring at least 82 points on the final exam is [latex]0.2843[/latex] or [latex]28.43\%[/latex].

Using Percentiles and Z-scores to Find Original Value

A useful tool when studying normal distributions is being able to track backwards to find the value on the original distribution that is associated with a particular [latex]z[/latex]-score or percentile. We can do this by a process sometimes called “un-standardizing”. Essentially, we are just rearranging the standardizing formula to solve for [latex]x[/latex].

[latex]\begin{align*}z&=\frac{x-\mu}{\sigma}\\[2ex]\sigma z&=x-\mu\\[2ex]\sigma z+\mu&=x\end{align*}[/latex]

So we can find the original data value by using the formula: [latex]x=\mu +z\sigma[/latex].

Example 6.5.4

Suppose the birth weights of a population of babies are normally distributed with a mean of [latex]3,370[/latex] grams and a standard deviation of [latex]150[/latex] grams.

a. Find the birth weight of a baby that has a [latex]z[/latex]-score of [latex]2.34[/latex].
b. Find the birth weight of a baby in the [latex]90^{th}[/latex] percentile.

Solution

a.
Step 1: Use the equation [latex]x=\mu+z\sigma[/latex]

Step 2: Substitute the values.

[latex]x=3370+2.34(150)[/latex]

Step 3: Solve.

[latex]x=3721[/latex]

b.
Step 1: Look in the z-table for the area closest to [latex]0.9000[/latex].

The closest area is [latex]0.8997[/latex], which corresponds to a [latex]z[/latex]-score of [latex]1.28[/latex].

Step 2: Substitute the [latex]z[/latex]-score into the equation.

[latex]\begin{align*}x&=\mu+z\sigma\\[2ex]x&=3370+1.28(150)\\[2ex]x&=3562\end{align*}[/latex]
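Going from a [latex]z[/latex]-score or percentile back to a raw score can be sketched the same way. The helper names `raw_score` and `percentile_value` are hypothetical; `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later) plays the role of reading the [latex]z[/latex]-table in reverse, and the result is rounded to two decimals to mimic a table lookup.

```python
from statistics import NormalDist

def raw_score(z, mu, sigma):
    """Un-standardize: x = mu + z * sigma."""
    return mu + z * sigma

def percentile_value(p, mu, sigma):
    """Raw score at cumulative proportion p, using a table-style z (two decimals)."""
    z = round(NormalDist().inv_cdf(p), 2)   # reverse lookup of the area p
    return raw_score(z, mu, sigma)

mu, sigma = 3370, 150  # birth-weight mean and standard deviation in grams

print(round(raw_score(2.34, mu, sigma)))         # part a: 3370 + 2.34 * 150
print(round(percentile_value(0.90, mu, sigma)))  # part b: 3370 + 1.28 * 150
```

Skipping the rounding of [latex]z[/latex] would give a slightly more precise percentile, at the cost of no longer matching a hand calculation from the table.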

Example 6.5.5

A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of [latex]5.85[/latex] cm and a standard deviation of [latex]0.24[/latex] cm.

a. Find the [latex]90^{th}[/latex] percentile for the diameters of mandarin oranges.
b. The middle [latex]20\%[/latex] of mandarin oranges from this farm have diameters between ______ and ______.

Solution

a.
Step 1: Look in the z-table for the area closest to [latex]0.9000[/latex].

The closest area is [latex]0.8997[/latex], which corresponds to a [latex]z[/latex]-score of [latex]1.28[/latex].

Step 2: Substitute the [latex]z[/latex]-score into the equation.

[latex]\begin{align*}x&=\mu+z\sigma\\[2ex]x&=5.85+1.28(0.24)\\[2ex]x&=6.16\end{align*}[/latex]

b.
Step 1: Find the area of the tails for the middle [latex]20\%[/latex].

[latex]\begin{align*} &\;&1-0.20&=0.80\\ &\text{Each tail will have half the value.}\;&0.80\div2&=0.40 \end{align*}[/latex]

Step 2: Find the value for the [latex]40^{th}[/latex] percentile.
Use the z-table to find the area closest to [latex]0.4000[/latex], which is [latex]0.4013[/latex] at [latex]z=-0.25[/latex].

[latex]\begin{align*}x&=\mu+z\sigma\\[2ex]x&=5.85+(-0.25)(0.24)\\[2ex]x&=5.79\end{align*}[/latex]

Step 3: Find the value for the [latex]60^{th}[/latex] percentile, [latex](0.40+0.20=0.60)[/latex].
Use the z-table to find the area closest to [latex]0.6000[/latex], which is [latex]0.5987[/latex] at [latex]z=0.25[/latex].

[latex]\begin{align*}x&=\mu+z\sigma\\[2ex]x&=5.85+(0.25)(0.24)\\[2ex]x&=5.91\end{align*}[/latex]

Step 4: Answer the question in a complete sentence.

The middle [latex]20\%[/latex] of mandarin oranges from this farm have diameters between [latex]5.79[/latex] cm and [latex]5.91[/latex] cm.
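The "middle share" calculation generalizes to any central percentage. The sketch below uses a hypothetical helper, `middle_interval`, built on `statistics.NormalDist` from the Python standard library (Python 3.8 or later); it rounds each [latex]z[/latex]-score to two decimals to mirror a table lookup.

```python
from statistics import NormalDist

def middle_interval(share, mu, sigma):
    """Raw-score bounds that enclose the central `share` of a normal distribution."""
    tail = (1 - share) / 2                              # area left in each tail
    z_low = round(NormalDist().inv_cdf(tail), 2)        # lower cut-off z-score
    z_high = round(NormalDist().inv_cdf(1 - tail), 2)   # upper cut-off z-score
    return mu + z_low * sigma, mu + z_high * sigma

low, high = middle_interval(0.20, mu=5.85, sigma=0.24)  # mandarin diameters in cm
print(round(low, 2), round(high, 2))                    # prints: 5.79 5.91
```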

Try It

8) What proportion of the area under the normal curve is greater than [latex]z=1.65[/latex]?

Solution

[latex]4.95\%[/latex] or [latex]0.0495[/latex]

Try It

9) Find the [latex]z[/latex]-score that bounds [latex]25\%[/latex] of the lower tail of the distribution.

Solution

[latex]z=-0.67[/latex]

Try It

10) Find the [latex]z[/latex]-score that bounds the top [latex]9\%[/latex] of the distribution.

Solution

[latex]z=1.34[/latex] (The top [latex]9\%[/latex] means [latex]9\%[/latex] of the area is in the upper tail and [latex]91\%[/latex] is in the body to the left; the value in the normal table closest to [latex]0.9100[/latex] is [latex]0.9099[/latex], which corresponds to [latex]z=1.34[/latex].)

Probability: The Bigger Picture

The concepts and ideas presented in this chapter are likely not intuitive at first. Probability is a tough topic for everyone, but the tools it gives us are incredibly powerful and enable us to do amazing things with data analysis. They are the heart of how inferential statistics work.

To summarize, the probability that an event happens is the number of outcomes that qualify as that event (i.e., the number of ways the event could happen) compared to the total number of outcomes (i.e., how many things are possible). This extends to graphs like a pie chart, where the biggest slices take up more of the area and are therefore more likely to be chosen at random. This idea then brings us back around to our normal distribution, which can also be broken up into regions or areas, each of which is bounded by one or two [latex]z[/latex]-scores and corresponds to all [latex]z[/latex]-scores in that region. The probability of randomly getting one of those [latex]z[/latex]-scores in the specified region can then be found on the Standard Normal Distribution Table. Thus, the larger the region, the more likely an event is, and vice versa. Because the tails of the distribution are, by definition, the smaller pieces, the farther out into a tail we go, the smaller the probability of finding a result there.

Try It

11) In a distribution with a mean of [latex]70[/latex] and standard deviation of [latex]12[/latex], what proportion of scores are lower than [latex]55[/latex]?

Solution

In this distribution, the proportion of scores that are lower than [latex]55[/latex] is [latex]0.1056[/latex], or [latex]10.56\%[/latex].

Self Check

a) After completing the exercises, use this checklist to evaluate your mastery of the objectives of this section.

b) After looking at the checklist, do you think you are well-prepared for the next section? Why or why not?

Glossary

event: Any specific outcome that could happen.

probability: The likelihood of a statistical result or the number of outcomes that satisfy specific criteria divided by the total number of possible outcomes.

License

Fanshawe Pre-Health Sciences Mathematics 2 Copyright © 2022 by Domenic Spilotro, MSc is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
