Glossary
- Alternative hypothesis
-
A working hypothesis that is contradictory to the null hypothesis
- Anecdotal evidence
-
Evidence that is based on personal testimony and collected informally
- Association
-
A relationship between variables
- Bernoulli trial
-
An experiment with the following characteristics:
- There are only two possible outcomes called “success” and “failure” for each trial
- The probability (p) of a success is the same for any trial (so the probability q = 1 − p of a failure is the same for any trial)
- Bimodal distribution
-
A distribution that has 2 modes
- Binomial distribution
-
A random variable that counts the number of successes in a fixed number (n) of independent Bernoulli trials each with probability of a success (p)
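As an illustration of this definition, the sketch below computes the probability of exactly k successes in n independent Bernoulli trials using the standard formula C(n, k) · p^k · q^(n − k). The function name `binomial_pmf` and the example values (10 trials, p = 0.5) are assumptions chosen for illustration only.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k), where X counts successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of a failure on any single trial
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: 10 trials, each with success probability 0.5
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]
total = sum(probs)  # probabilities over all possible counts sum to 1
```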
- Bivariate data
-
Data consisting of two variables, often in search of an association
- Blinding
-
Not telling participants which treatment they are receiving
- Block design study
-
Grouping individuals based on a variable into "blocks" and then randomizing cases within each block to the treatment groups
- Case-control study
-
A study that compares a group that has a certain characteristic to a group that does not, often a retrospective study for rare conditions
- Center
-
The central tendency or most typical value of a dataset
- Central limit theorem (CLT)
-
States that if there is a population with mean μ and standard deviation σ and you take sufficiently large random samples from the population, then the distribution of the sample means will be approximately normally distributed
- Class midpoint
-
Found by adding the lower limit and upper limit, then dividing by 2
- Class width
-
The difference in consecutive lower class limits
- Cluster sampling
-
A method of sampling in which the population is already sorted into groups (clusters); one or more clusters are randomly selected, and every individual in the chosen clusters is used as the sample
- Coefficient of determination
-
A numerical measure of the percentage or proportion of variation in the dependent variable (y) that can be explained by the independent variable (x)
- Cohort study
-
Longitudinal study where a group of people (typically having a common factor) are studied and data is collected for a purpose
- Complement
-
The complement of an event consists of all outcomes in a sample space that are NOT in the event
- Completely randomized study
-
Dividing participants into treatment groups randomly
- Conditional probability
-
The likelihood that an event will occur given knowledge of another event
- Confidence interval
-
An interval built around a point estimate for an unknown population parameter
- Confounding (lurking, conditional) variable
-
A variable that has an effect on a study even though it is neither an explanatory variable nor a response variable
- Contingency (two-way) table
-
A table in a matrix format that displays the frequency distribution of different variables
- Continuity correction
-
Adding or subtracting 0.5 to a discrete value to improve the approximation of a discrete distribution by a continuous one
- Continuous random variable
-
A random variable (RV) whose outcomes are measured and can take an uncountably infinite number of values
- Control group
-
A group in a randomized experiment that receives no (or an inactive) treatment but is otherwise managed exactly as the other groups
- Controlled (designed) experiment
-
Type of experiment where variables are manipulated; data is collected in a controlled setting
- Convenience sampling
-
Selecting individuals who are easily accessible, which may result in biased data
- Correlation coefficient
-
A numerical measure that provides a measure of strength and direction of the linear association between the independent variable x and the dependent variable y
- Critical value
-
A point on a distribution that acts as the cut-off value for rejecting or failing to reject the null hypothesis
- Cross-sectional study
-
Data collection on a population at one point in time (often prospective)
- Cumulative distribution function (CDF)
-
A function that gives the probability that a random variable takes a value less than or equal to x
- Cumulative relative frequency
-
The sum of the relative frequencies for all values that are less than or equal to the given value
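A minimal sketch of how frequency, relative frequency, and cumulative relative frequency fit together is shown below. The helper name `cumulative_relative_frequency` and the small dataset are hypothetical and serve only as an example.

```python
from collections import Counter
from itertools import accumulate

def cumulative_relative_frequency(data):
    """Return (value, relative frequency, cumulative relative frequency) rows."""
    counts = Counter(data)            # frequency of each distinct value
    n = len(data)                     # total number of observations
    values = sorted(counts)           # distinct values in increasing order
    rel = [counts[v] / n for v in values]
    cum = list(accumulate(rel))       # running total of relative frequencies
    return list(zip(values, rel, cum))

# Hypothetical dataset of 10 observations
table = cumulative_relative_frequency([2, 3, 3, 5, 5, 5, 7, 7, 8, 8])
```

The last cumulative relative frequency in the table should equal 1 (up to floating-point rounding), since every value is less than or equal to the largest value.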
- Data
-
Actual values (numbers or words) that are collected from the variables of interest
- Data analysis process
-
Process of collecting, organizing, and analyzing data
- Degrees of freedom
-
The number of objects in a sample that are free to vary
- Descriptive statistics
-
Methods of organizing, summarizing, and presenting data
- Designed (controlled) experiment
-
Data collection where variables are manipulated in a controlled setting
- Difference in means
-
The difference in the means of two independent populations
- Discrete random variable
-
A random variable that produces discrete data
- Distribution
-
The possible values a variable can take on, and how often it does so
- Double-blind study
-
The act of blinding both the subjects of an experiment and the researchers who work with the subjects
- Empirical rule
-
For a bell-shaped (approximately normal) distribution, roughly 68% of values are within 1 standard deviation of the mean, roughly 95% are within 2 standard deviations, and roughly 99.7% are within 3 standard deviations
- Event
-
A single outcome, or subset of outcomes, of an experiment that you are interested in
- Expected value
-
Mean of a random variable
- Experimental unit
-
Any individual or object to be measured
- Explanatory variable
-
The independent variable in an experiment; the value controlled by researchers
- Extrapolation
-
The process of predicting outside of the observed x values
- Factors
-
Variables in an experiment
- Frequency
-
The number of times a value of the data occurs
- Graphical descriptive methods
-
Organizing, summarizing, or presenting data visually in graphs, figures, or charts
- Hypothesis testing
-
A decision making procedure for determining whether sample evidence supports a hypothesis
- Independent
-
The occurrence of one event has no effect on the probability of the occurrence of another event
- Individuals
-
The person, animal, item, thing, place, etc. that we collect information about
- Inferential statistics
-
The facet of statistics dealing with using a sample to generalize (or infer) about the population
- Influential points
-
Observed data points that do not follow the trend of the rest of the data and have a large influence on the calculation of the regression line
- Intersection (AND)
-
The shared or common outcomes of two events
- Interval scale level
-
Quantitative data where the difference or gap between values is meaningful
- Law of large numbers
-
As the number of trials in a probability experiment increases, the relative frequency of an event approaches the theoretical probability
- Levels
-
Certain values of variables in an experiment
- Linear regression
-
A mathematical model of a linear association
- Longitudinal study
-
Collecting data multiple times on the same individuals, usually at fixed increments, over a period of time
- Lower class limit
-
The lower end of a bin or class in a frequency table or histogram
- Margin of error (MoE)
-
How much a point estimate can be expected to differ from the true population value; made up of the standard error multiplied by the critical value
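The product described above can be sketched for one common case: a confidence interval for a mean when the population standard deviation is assumed known, so the critical value comes from the standard normal distribution. The function name `margin_of_error` and the example inputs (σ = 15, n = 100, 95% confidence) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error = critical value * standard error (known-sigma case)."""
    # Critical value z* leaves (1 - confidence) / 2 in each tail of N(0, 1)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return z_star * standard_error

# Hypothetical example: sigma = 15, sample size 100, 95% confidence
moe = margin_of_error(sigma=15, n=100)  # approximately 2.94
```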
- Matched pairs design
-
Very similar individuals (or even the same individual) receive two different treatments (or treatment vs. control), then the differences in results are compared
- Mean (average)
-
A number that measures the central tendency of the data
- Measures of location
-
A measure of an observation's standing relative to the rest of the dataset
- Median
-
The middle number in a sorted list
- Modality
-
How many peaks or clusters there appear to be in a quantitative distribution
- Mode
-
The most frequently occurring value
- Mutually exclusive (disjoint)
-
Two events that cannot happen at the same time; they share no common outcomes
- Nominal scale level
-
Categorical data where the categories have no natural, intuitive, or obvious order
- Normal (Gaussian) distribution
-
A commonly used symmetric, unimodal, bell-shaped, continuous probability distribution
- Null hypothesis
-
The claim that is assumed to be true and is tested in a hypothesis test
- Numerical descriptive methods
-
Numbers that summarize some aspect of a dataset, often calculated
- Observational study
-
Data collection where no variables are manipulated
- Ordinal scale level
-
Categorical data where the categories have a natural or intuitive order
- Outcome
-
A particular result of an experiment
- Outlier
-
An observation that stands out from the rest of the data significantly
- P-value
-
The probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true
- Parameter
-
A number that is used to represent a population characteristic and can only be calculated as the result of a census
- Placebo
-
An inactive treatment that has no real effect on the response variable
- Point estimate
-
The value that is calculated from a sample used to estimate an unknown population parameter
- Point estimation
-
Using sample data to calculate a single statistic as an estimate of an unknown population parameter
- Pooled proportion
-
Estimate of the common value of p1 and p2
- Population
-
The whole group of individuals who can be studied to answer a research question
- Population mean
-
The arithmetic mean, or average of a population
- Population mean difference
-
The mean of the differences in a matched pairs design
- Population proportion
-
The number of individuals that have a characteristic we are interested in divided by the total number in the population
- Power
-
The probability of correctly rejecting a false null hypothesis
- Probability
-
The study of randomness; a number between zero and one, inclusive, that gives the likelihood that a specific event will occur
- Probability density function (PDF)
-
A function that describes a continuous random variable; the area under the function over an interval gives the probability of an outcome in that interval
- Probability experiment
-
A random experiment where the result is not predetermined
- Probability mass function (PMF)
-
A function that gives the probability that a discrete random variable is exactly equal to some value (x)
- Probability model
-
A mathematical representation of a random process that lists all possible outcomes and assigns probabilities to each of them
- Prospective study
-
Collecting information as events unfold
- Qualitative (categorical) data
-
Data that describes qualities, or puts individuals into categories
- Quantile
-
Points in a distribution that relate to the rank order of values in that distribution
- Quantitative (numerical) data
-
Numerical data with a mathematical context
- Quantitative continuous data
-
Data produced by a variable that takes on an uncountably infinite number of values
- Quantitative discrete data
-
Data produced by a variable that takes on a countable number of values
- Random variable
-
A representation of a probability model
- Ratio scale level
-
Quantitative data where the difference or gap between values is meaningful AND has a true 0 value
- Relative frequency
-
The percentage, proportion, or ratio of the frequency of a value of the data to the total number of outcomes
- Repeated measures
-
When an individual goes through a single treatment more than once
- Residual (error)
-
A residual measures the vertical distance between an observation and the predicted point on a regression line
- Response variable
-
The dependent variable in an experiment; the value that is measured for change at the end of an experiment
- Retrospective study
-
Collecting or using data after events have taken place
- Robust
-
Not affected by violations of assumptions such as outliers
- Sample
-
A subset of the population studied
- Sample mean
-
The arithmetic mean, or average of a dataset
- Sample proportion
-
The number of individuals that have a characteristic we are interested in divided by the total number in the sample, often found from categorical data
- Sample space
-
The set of all possible outcomes of an experiment
- Sampling bias
-
Bias resulting from all members of the population not being equally likely to be selected
- Sampling distribution
-
The probability distribution of a statistic at a given sample size
- Sampling variability
-
The idea that samples from the same population can yield different results
- Shape
-
What a dataset looks like visually
- Significance level
-
Probability that a true null hypothesis will be rejected, also known as the probability of a Type I error and denoted by α
- Simple random sample (SRS)
-
Each member of the population is equally likely to be chosen for a sample of a given sample size and each sample is equally likely to be chosen
- Slope
-
Tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average
- Spread (variation, variability)
-
The level of variability or dispersion of a dataset; also commonly known as variation/variability
- Standard deviation
-
The average distance (deviation) of each observation from the mean
- Standard error
-
The standard deviation of a sampling distribution
- Standard normal distribution (SND)
-
A normal random variable with a mean of 0 and standard deviation of 1 which z-scores follow; denoted N(0, 1)
- Statistic
-
A number calculated from a sample
- Statistical inference
-
Using information from a sample to answer a question, or generalize, about a population
- Statistically significant
-
Finding sufficient evidence that the effect we see is not just due to variability, often from rejecting the null hypothesis
- Stratified sampling
-
Dividing a population into groups (strata), and then using simple random sampling to identify a proportionate number of individuals from each
- Systematic (probability) sampling
-
Using some sort of pattern or probability based method for choosing your sample
- T-distribution
-
A family of distributions, dependent on degrees of freedom, similar to the normal distribution but with more variability built in
- Test statistic
-
A measure of how far what you observed is from the hypothesized (or claimed) value
- Treatment combinations (interactions)
-
Combinations of levels of variables in an experiment
- Treatments
-
Different values or components of the explanatory variable applied in an experiment
- Tree diagram
-
Diagram that helps calculate and organize the number of possible outcomes of an event or problem
- Type I error
-
The decision is to reject the null hypothesis when, in fact, the null hypothesis is true
- Type II error
-
The decision is to fail to reject the null hypothesis when, in fact, the null hypothesis is false
- Uniform distribution
-
A probability distribution in which all outcomes are equally likely
- Union (OR)
-
The set of all outcomes in two (or more) events
- Upper class limit
-
The upper end of a bin or class in a frequency table or histogram
- Values
-
Possible observations of the variable
- Variable
-
A characteristic of interest for each person or object in a population
- Variance
-
The square of the standard deviation; a computational step along the way to calculating the standard deviation
- Variation (variability, spread)
-
The level of variability or dispersion of a dataset; also commonly known as 'spread'
- Venn diagram
-
A diagram that shows all possible relations between a collection of different sets
- y-intercept
-
The value of y when x is 0 in your regression equation
- z-score
-
A measure of location that tells us how many standard deviations a value is above or below the mean
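As a small worked example of this definition, a z-score can be computed by subtracting the mean from the value and dividing by the standard deviation. The function name `z_score` and the numbers used (a value of 85 from a distribution with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: 85 is 15 units above a mean of 70, and 15 / 10 = 1.5
z = z_score(85, 70, 10)
```

A positive result means the value is above the mean, and a negative result means it is below.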