2.2 Empirically
So far, we have learned how to measure risk when we know the probability distribution of a random variable. Usually, however, we have only historical data on a variable of interest. We use these data to infer the probability distribution from which they are drawn, or at least to estimate statistics of interest such as the sample mean and sample standard deviation. Notice that our sample estimators of the true (population) parameters rest on probability theory, with each observation in the sample treated as equally probable.
Example 6: Sample Mean and Standard Deviation
Suppose the true distribution of the annual BioTech-X stock return is shown in Figure 3. We have drawn a random sample of 10 observations from this underlying distribution; the sample is shown in the second column of Table 3. Suppose these are the realized annual rates of return on the stock over the past ten years. What are the sample mean and standard deviation, and how do we interpret these two statistics?
Figure 5 plots the frequency distribution of the observed returns in the second column of Table 3. The height of each bar shows the number of observations that fall in the interval on the horizontal axis.
Notice how different the histogram in Figure 5 looks compared with the probability density function in Figure 3. We cannot expect a frequency distribution based on a limited sample to look like the underlying theoretical distribution, because every individual sample is subject to uncertainty, or sampling error. You would need thousands of observations to obtain a close approximation of the theoretical distribution of a continuous random variable. However, we rarely know the true, underlying distribution of a variable of interest, and historical data contain important information that we can use in making decisions.
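The gap between a small-sample histogram and the underlying distribution can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the data in Table 3 are not used, the returns are simulated from a normal distribution with a mean of 5% and a standard deviation of 20% (the shape and the 20% are hypothetical choices, not taken from Figure 3), and the function name `frequency_distribution` and the 10-percentage-point bin width are made up for this example.

```python
import math
import random
from collections import Counter


def frequency_distribution(returns_pct, bin_width_pct=10):
    """Count the observations falling in each interval [lower, lower + width)."""
    counts = Counter()
    for r in returns_pct:
        # Index of the bin that contains r; floor handles negative returns correctly.
        k = math.floor(r / bin_width_pct)
        counts[k] += 1
    return {
        (k * bin_width_pct, (k + 1) * bin_width_pct): counts[k]
        for k in sorted(counts)
    }


# Hypothetical stand-in for the true distribution: normal, mean 5%, sd 20%.
rng = random.Random(42)
small_sample = [rng.gauss(5, 20) for _ in range(10)]
large_sample = [rng.gauss(5, 20) for _ in range(10_000)]

for label, sample in (("n = 10", small_sample), ("n = 10,000", large_sample)):
    print(label)
    for (low, high), count in frequency_distribution(sample).items():
        # Relative frequencies make the two samples comparable.
        print(f"  [{low:>4}, {high:>4}): {count / len(sample):.3f}")
```

With ten observations most intervals are empty or hold a single return, while the relative frequencies of the large sample trace out a much smoother, bell-like shape.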
Two important numbers to summarize the historical data are the sample average and sample standard deviation. We can find the sample average (also called the mean return), [latex]\mu_{R}[/latex], by adding up all the returns in Table 3 and dividing by the number of years in our sample:
Our sample estimate of 6.7% is quite different from the true mean of 5%. Gaps of this size are common when the sample is small. However, if we draw a very large sample, the Law of Large Numbers implies that the sample mean will, with high probability, be very close to the population mean. The sample mean is an estimator of the expected value of a random variable. It is our best estimate of the return on an investment in BioTech-X stock in a given year based on the stock's performance over the past ten years.
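A quick simulation can make the Law of Large Numbers concrete. The sketch below is a rough illustration, not a reproduction of the example: the true mean of 5% comes from the text, but the normal shape and the 20% standard deviation are assumptions, and `sample_mean_of_draws` is a hypothetical helper name.

```python
import random
import statistics

TRUE_MEAN = 5.0    # percent; the true mean stated in the text
ASSUMED_SD = 20.0  # percent; hypothetical, chosen only for illustration


def sample_mean_of_draws(n, seed=0):
    """Draw n simulated annual returns and return their sample mean."""
    rng = random.Random(seed)
    draws = [rng.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(n)]
    return statistics.fmean(draws)


# The sample mean settles near the true mean as the sample grows.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: sample mean = {sample_mean_of_draws(n):.2f}%")
```

Changing the `seed` argument gives a different set of draws; the small-sample means vary widely from seed to seed, while the large-sample means stay close to 5%.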
Generally, if [latex]n[/latex] denotes the number of observations and [latex]X_{i}[/latex] the [latex]i[/latex]-th observation of a random variable [latex]X[/latex], the sample mean, denoted by [latex]\mu[/latex], is the arithmetic average of the observed values of [latex]X[/latex]:
(4) [latex]\mu = \frac{1}{n}\sum_{i=1}^{n} X_{i}[/latex]
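As a minimal sketch of the sample mean formula, the function below adds up the observations and divides by their number. The five returns are hypothetical values in percent, invented for this illustration; they are not the Table 3 data, and `sample_mean` is simply a name chosen here.

```python
def sample_mean(observations):
    """Arithmetic average: the sum of the observations divided by n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the sample mean needs at least one observation")
    return sum(observations) / n


# Hypothetical annual returns in percent (NOT the data from Table 3).
hypothetical_returns = [12.0, -8.0, 20.0, 3.0, 8.0]
print(sample_mean(hypothetical_returns))  # 35 / 5 = 7.0
```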
The sample mean is an estimator of the true underlying mean of the random variable. Like every estimator, the sample mean is a rule for using a sample to estimate a desirable but unobserved property of a random variable, such as its expected value.
The sample variance, [latex]\hat{\sigma}^{2}[/latex], can be computed by first subtracting the sample mean from each observed value of the random variable and then squaring that difference to find the squared deviation from the mean for each observation. Finally, we add up the squared deviations and divide by the number of observations minus one:
(5) [latex]\hat{\sigma}^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i} - \mu\right)^{2}[/latex]
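The “hat” is commonly used to indicate that a sample is used to compute the statistic. One is subtracted from the number of observations to ensure that our measure of variance is unbiased. The deviations are measured from the sample mean rather than from the unknown true mean, which makes them slightly too small on average, and dividing by [latex]n-1[/latex] instead of [latex]n[/latex] corrects for this. Unbiasedness is a desirable statistical property for an estimator as it ensures that our estimates of the unobserved parameter will be correct on average.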
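The effect of subtracting one can be checked by simulation. The sketch below draws many small samples from a distribution whose variance we know and averages the two competing variance estimates. Everything about the setup is an assumption made for illustration: the normal distribution, the 20% standard deviation (so the true variance is 400), the sample size of 10, and the helper names.

```python
import random


def variance_estimates(sample):
    """Return the variance estimate dividing by n and the one dividing by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    return squared_deviations / n, squared_deviations / (n - 1)


def average_estimates(n_samples=20_000, sample_size=10, sd=20.0, seed=1):
    """Average both estimators over many simulated samples."""
    rng = random.Random(seed)
    total_by_n = total_by_n_minus_1 = 0.0
    for _ in range(n_samples):
        # Hypothetical returns in percent: normal, mean 5, standard deviation sd.
        sample = [rng.gauss(5.0, sd) for _ in range(sample_size)]
        by_n, by_n_minus_1 = variance_estimates(sample)
        total_by_n += by_n
        total_by_n_minus_1 += by_n_minus_1
    return total_by_n / n_samples, total_by_n_minus_1 / n_samples


avg_by_n, avg_by_n_minus_1 = average_estimates()
print("true variance:            400.0")
print(f"average dividing by n:     {avg_by_n:.1f}")
print(f"average dividing by n - 1: {avg_by_n_minus_1:.1f}")
```

Dividing by the number of observations gives estimates that average noticeably below the true variance, while dividing by the number of observations minus one gives estimates that average close to it.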
The sample standard deviation, [latex]\hat{\sigma}[/latex], is just the square root of the sample variance:
(6) [latex]\hat{\sigma} = \sqrt{\hat{\sigma}^{2}}[/latex]
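The sample variance and sample standard deviation can be sketched step by step as follows. The returns are hypothetical values in percent, invented for this illustration rather than taken from Table 3, and the function names are chosen here. Python's standard `statistics` module also divides by the number of observations minus one in `statistics.variance` and `statistics.stdev`, so it serves as a cross-check.

```python
import math
import statistics


def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / (n - 1)


def sample_std(observations):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(observations))


# Hypothetical annual returns in percent (NOT the data from Table 3).
hypothetical_returns = [12.0, -8.0, 20.0, 3.0, 8.0]

# Mean is 7; squared deviations are 25, 225, 169, 16, 1, which sum to 436.
print(sample_variance(hypothetical_returns))  # 436 / 4 = 109.0
print(round(sample_std(hypothetical_returns), 2))

# Cross-check against the standard library, which also divides by n - 1.
print(statistics.variance(hypothetical_returns))
print(round(statistics.stdev(hypothetical_returns), 2))
```

Because the returns are in percent, the variance is in squared percent, while the standard deviation is again in percent, which is one reason the standard deviation is easier to interpret.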
The sample variance for the returns in Table 3 can be calculated as follows:
and the sample standard deviation is
The use of historical data to make forecasts and inform our decisions about the future is based on the belief that history tends to repeat itself. For example, we investigate historical returns on stocks to form expectations about the return we can expect on our investment in the future. We evaluate the historical performance of home fire insurance to assess the risk that an insured house will burn down in the coming year. However, reliable historical data are often available for only a limited number of years, so our estimates will be subject to sampling error. Further, when using historical data for forecasting, you need to ask yourself whether the historical period covered in your sample adequately reflects the economic conditions you expect to prevail during the forecast period. We need to be careful when extrapolating the past into the future.
Frequency distribution: A graph or table that displays how often various outcomes occurred.
Law of Large Numbers: A theorem stating that the sample mean tends toward the true mean as the number of independent observations becomes sufficiently large. For example, if we flip a fair coin only a few times, it may happen to land on “heads” in most trials. However, if we flip the coin several hundred times, we can be highly confident that about half of the flips will be heads.