1.1 Introduction to Statistics and Key Terms
Learning Objectives
By the end of this chapter, the student should be able to:
- Recognize and differentiate between key terms dealing with statistics
- Identify different types of data
- Identify data collection methods and study designs
- Apply various types of sampling methods to data collection
You are probably asking yourself the question, “When and where will I use statistics?” If you read any newspaper, watch television, or use the Internet, you will see statistical information. There are statistics about crime, sports, education, politics, and real estate. Typically, when you read a newspaper article or watch a television news program, you are given sample information. With this information, you may make a decision about the correctness of a statement, claim, or “fact.” Statistical methods can help you make the “best educated guess.”
Since you will undoubtedly be given statistical information at some point in your life, you need to know some techniques for analyzing the information thoughtfully. Think about buying a house or managing a budget. Think about your chosen profession. The fields of economics, business, psychology, education, biology, law, computer science, police science, and early childhood development require at least one course in statistics.
Included in this chapter are the basic ideas and words of probability and statistics. You will soon understand that statistics and probability work together. You will also learn how data are gathered and how “good” data can be distinguished from “bad.”
The Study of Statistics
We see and use data in our everyday lives. The science of statistics deals with the collection, analysis, interpretation, and presentation of data. These steps make up the data analysis process, which we will expand on in the next section.
You will first learn how to organize and summarize data. Organizing, summarizing, and presenting data is called descriptive statistics. Two ways to summarize data are by graphing and by using numbers (for example, finding an average). After you have studied probability and probability distributions, you will use formal methods for drawing useful conclusions from data while filtering out the noise. The formal methods are called inferential statistics.
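As a small illustration of summarizing data "by using numbers," the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The list `scores` is made-up data for ten hypothetical students; it does not come from any real study.

```python
import statistics

# Hypothetical exam scores for ten students (made-up illustration data).
scores = [72, 85, 90, 66, 78, 85, 94, 58, 81, 85]

# A few common numerical summaries of the data set.
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),      # the arithmetic average
    "median": statistics.median(scores),  # the middle value when sorted
    "mode": statistics.mode(scores),      # the most frequent value
    "min": min(scores),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each number condenses the whole list into a single value, which is the basic idea behind descriptive statistics.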
Effective interpretation of data (inference) is based on good procedures for producing data and thoughtful examination of the data. You will encounter what may seem like a lot of mathematical formulas. Keep in mind that the goal of statistics is not to perform numerous calculations using the formulas, but to interpret data to gain an understanding. The calculations can be done using a calculator or a computer. The understanding must come from you. If you can thoroughly grasp the basics of statistics, you can be more confident in the decisions you make in life. Statistical inference uses probability to determine how confident we can be that our conclusions are correct.
Probability
Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring. For example, if you toss a fair coin four times, the outcomes may not be two heads and two tails. However, if you toss the same coin 4,000 times, the outcomes will be close to half heads and half tails. The expected theoretical probability of heads in any one toss is 1/2 or 0.5. Even though the outcomes of a few repetitions are uncertain, there is a regular pattern of outcomes when there are many repetitions. After reading about the English statistician Karl Pearson, who tossed a coin 24,000 times with a result of 12,012 heads, one of the authors tossed a coin 2,000 times. The results were 996 heads. The fraction 996/2,000 is equal to 0.498, which is very close to 0.5, the expected probability.
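The pattern described above can be explored with a short simulation. This is only a sketch: it uses Python's pseudorandom number generator as a stand-in for a physical coin, and the function name `proportion_of_heads` and the seed value are arbitrary choices made for this illustration.

```python
import random

def proportion_of_heads(num_tosses, rng):
    """Simulate tosses of a fair coin and return the fraction that land heads."""
    # rng.random() returns a number in [0, 1); values below 0.5 count as heads.
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

rng = random.Random(2024)  # arbitrary seed so the run can be repeated

# Compare a few tosses with many tosses.
for n in (4, 40, 4000, 24000):
    print(f"{n:>6} tosses: proportion of heads = {proportion_of_heads(n, rng):.4f}")
```

With only a handful of tosses the proportion can land far from 0.5, while with thousands of tosses it tends to settle near 0.5.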
The theory of probability began with the study of games of chance such as poker. Predictions take the form of probabilities. To predict the likelihood of an earthquake, of rain, or of getting an A in this course, we use probabilities. Doctors use probability to determine the chance of a medical test incorrectly diagnosing the presence of a disease. A stockbroker uses probability to estimate the rate of return on a client’s investments. You might use probability to decide whether to buy a lottery ticket. In your study of statistics, you will use the power of mathematics and probability to analyze and interpret your data.
Key Terms
In statistics, we generally want to study a population. You can think of a population as a collection of persons, things, or objects under study. To study the population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Parameters are numbers that describe a characteristic of the population.
Because it may take a lot of resources (time, money, manpower, etc.) to examine an entire population, we often study only a subset of that population. Taking a sample is a very practical technique to accomplish this. If you wished to compute the overall grade point average at your school, it would make sense to select a sample of students who attend the school. The data collected from the sample would be the students’ grade point averages. In presidential elections, opinion poll samples of 1,000–2,000 people are taken. The opinion poll is supposed to represent the views of the people in the entire country. Manufacturers of canned carbonated drinks take samples to determine if a 16-ounce can contains 16 ounces of carbonated drink.
From the information we collect in our sample, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate of a population parameter. A parameter is a numerical characteristic of the whole population that can be estimated by a statistic. Since we considered all math classes to be the population, then the average number of points earned per student over all the math classes is an example of a parameter.
One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. The accuracy depends on how well the sample represents the population. To be a representative sample, the sample must reflect the characteristics of the population. In inferential statistics, we are interested in both the sample statistic and the population parameter. In a later chapter, we will use the sample statistic to test a claimed value of the population parameter.
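The relationship between a statistic and a parameter can be sketched with simulated data. Everything in this example is an assumption made for illustration: the population of 5,000 grade point averages is generated artificially, and the sample size of 100 is arbitrary. In a real study the parameter is usually unknown, which is exactly why we sample.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed so the run can be repeated

# Hypothetical population: 5,000 made-up grade point averages between 0.0 and 4.0.
population = [
    round(min(4.0, max(0.0, rng.gauss(3.0, 0.5))), 2)
    for _ in range(5000)
]

# Parameter: a number describing the whole population.
parameter = statistics.mean(population)

# Statistic: the same calculation applied to a random sample of 100 students.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:.3f}")
print(f"Sample mean (statistic):     {statistic:.3f}")
```

Because the sample is drawn at random, its mean is typically close to the population mean without matching it exactly. Drawing a different sample would give a slightly different statistic, while the parameter stays fixed.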
Individuals are the units we are collecting information about. An individual could be a person, animal, item, thing, or place. A variable, usually denoted by a capital letter such as X or Y, is a specific characteristic or measurement that can be determined for each individual. The values of a variable are the possible observations of the variable. If multiple variables are collected on an individual, the full set of values recorded for that individual (one row of a data table) may be called a case or observational unit.
Data are the actual values of the variables of interest. They may be numbers or they may be words. We’ll dive into data in the next section.
Example
Determine what the key terms refer to in the following study. We want to know the average (mean) amount of money first year college students spend at ABC College on school supplies that do not include books. We randomly surveyed 100 first year students at the college. Three of those students spent $150, $200, and $225, respectively.
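One possible reading of this study, written as a short sketch: the dictionary `key_terms` records how each term could be matched to the scenario, and the calculation uses only the three amounts actually reported. The variable names are chosen for this illustration, and the mean of three values is not the sample statistic itself, since the other 97 responses are not given.

```python
import statistics

# One reading of how the key terms map onto the study (an interpretation, not the only wording).
key_terms = {
    "population": "all first year students at ABC College",
    "sample": "the 100 first year students who were surveyed",
    "parameter": "mean amount spent on school supplies (excluding books) by all first year students",
    "statistic": "mean amount spent on school supplies (excluding books) by the 100 sampled students",
    "variable": "amount of money one first year student spends on school supplies (excluding books)",
    "data": "the dollar amounts reported, such as $150, $200, and $225",
}

# The three data values reported in the example, in dollars.
reported_data = [150, 200, 225]

# Mean of just these three values; the full sample statistic would use all 100 responses.
mean_of_reported = statistics.mean(reported_data)

for term, meaning in key_terms.items():
    print(f"{term}: {meaning}")
print(f"Mean of the three reported amounts: ${mean_of_reported:.2f}")
```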
Your turn!
Determine what the key terms refer to in the following study. We want to know the average (mean) amount of money spent on school uniforms each year by families with children at Knoll Academy. We randomly survey 100 families with children in the school. Three of the families spent $65, $75, and $95, respectively.
Image References
Figure 1.1: Markus Winkler (2020). “Corona death and new cases stats.” Public domain. Retrieved from: https://unsplash.com/photos/tUEnyweZjEU
Glossary
Statistics: Process of collecting, organizing, and analyzing data
Descriptive statistics: Methods of organizing, summarizing, and presenting data
Inferential statistics: The facet of statistics dealing with using a sample to generalize (or infer) about the population
Probability: The study of randomness; a number between zero and one, inclusive, that gives the likelihood that a specific event will occur
Population: The whole group of individuals who can be studied to answer a research question
Parameter: A number that is used to represent a population characteristic and can only be calculated as the result of a census
Sample: A subset of the population studied
Statistic: A number calculated from a sample
Individual: The person, animal, item, thing, place, etc. that we collect information about
Variable: A characteristic of interest for each person or object in a population
Values: Possible observations of the variable
Data: Actual values (numbers or words) that are collected from the variables of interest