38 Uncertainty
Estimating uncertainty is at the core of good engineering measurement practice. It requires both mathematics and sound judgement to produce good estimates, and we will return to this section throughout the term.
Error and Uncertainty
If you know the true value, then the difference between the measured and true values is an error, but you almost never measure when you already know the truth. If you have only a measurement, then your uncertainty is the range around the measured value within which you expect the true value to lie, with 95% probability. (video 11:25)
“Lies, Damned Lies, and Statistics”
That quotation is often attributed to Mark Twain, and we need to be aware of this general distrust to avoid being accused of using “statistics as a drunken man uses lampposts – for support rather than illumination”. Make sure your use of statistics doesn’t go beyond making the best estimates you can. It is very easy to introduce your own bias while misusing statistics.
We will stick with some simple descriptive statistics (mean and standard deviation) and make the assumption that all of our random variables are normally distributed, the simplest, most easily defended assumption.
Mean and Standard Deviation
You have encountered Mean (Average) and Standard Deviation (Variability about the Mean) in a number of settings, and they are easy to calculate for sets of data using built-in functions in just about any spreadsheet (AVERAGE() and STDEV() in Excel) or high-level computer language (numpy.mean() and numpy.std() in Python).
The best way to get a real sense of them is to explore the resulting values for different data sets, as we will do in the active learning sessions and as you will when analyzing your measurement results. Knowing the mechanics helps you understand what happens when you call the functions and will be essential if you want to do the calculations on the fly with the tiny memory in your Arduino.
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ is the mean, calculated just the way you are used to. $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$ is the square root of the variance, the mean of the square of the deviations of each element in the population from the mean.
The bigger the standard deviation, the wider the distribution of values in the population. When we have only a subset or sample of all the possible values, using $N-1$ instead of $N$ in the denominator provides a better estimate of the overall variance or standard deviation. However, it is quite common to simply use $N$ and report the result as the Root Mean Square (RMS) of the sample set.
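In Python, the two versions differ only in the ddof argument passed to numpy.std(). A minimal sketch with made-up readings:

```python
import numpy as np

# A small set of repeated readings (made-up values, for illustration only)
readings = np.array([20.1, 19.8, 20.3, 20.0, 19.9, 20.2])

mean = np.mean(readings)                  # the mean, x-bar
sigma_rms = np.std(readings)              # population (RMS) standard deviation, divides by N
sigma_sample = np.std(readings, ddof=1)   # sample standard deviation, divides by N-1

print(f"mean = {mean:.3f}, RMS std = {sigma_rms:.3f}, sample std = {sigma_sample:.3f}")
```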
The video (6:15) shows a derivation to clear up how to get from the easily understood deviation from the mean to the easily calculated formula based on the sum of the squares.
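That sum-of-squares form, $\sigma^2 = \frac{1}{N}\sum x_i^2 - \bar{x}^2$, is also what makes an on-the-fly calculation with tiny memory possible: only running totals of the count, the sum, and the sum of squares need to be stored. A minimal sketch in Python (made-up readings; the same idea ports directly to the Arduino):

```python
import numpy as np

readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]   # made-up values

# Running totals -- only three numbers are stored, no matter how many samples arrive
n, sum_x, sum_x2 = 0, 0.0, 0.0
for x in readings:
    n += 1
    sum_x += x
    sum_x2 += x * x

mean = sum_x / n
variance = sum_x2 / n - mean**2    # mean of the squares minus the square of the mean
rms = variance**0.5                # population ("RMS") standard deviation

# Check against the numpy functions
assert np.isclose(mean, np.mean(readings))
assert np.isclose(rms, np.std(readings))
print(f"mean = {mean:.3f}, RMS std = {rms:.3f}")
```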
Gaussian Normal Probability Distributions
Gaussian Normal Distributions are the ubiquitous “Bell” probability curve. You can understand the basics and apply them without digging too deeply into the math that leads to the shape. A normal distribution results whenever many small, independent random effects contribute to an outcome, and it is the most reasonable (least biased) assumption for the probability distribution if we only know the mean and standard deviation. Python provides functions that make it easy to explore the characteristics of Gaussian distributions and generate samples of random data drawn from a distribution. (video 18:01, Python Learning Sequence 3.2.0)
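As a quick sketch of that kind of exploration (the mean and standard deviation below are arbitrary placeholders), numpy can generate normally distributed samples and scipy.stats can evaluate the distribution itself:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()

mu, sigma = 20.0, 0.5                          # placeholder mean and standard deviation
samples = rng.normal(mu, sigma, size=10_000)   # simulated readings drawn from the distribution

# The sample statistics should come back close to the values we put in
print(np.mean(samples), np.std(samples, ddof=1))

# Fraction of a normal population expected within +/- 2 standard deviations (about 95%)
print(stats.norm.cdf(2) - stats.norm.cdf(-2))
```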
Uncertainty for a Single Uncalibrated Sensor
You need to refer to the manufacturer’s data sheet, or take many independent measurements from different samples of a sensor, in order to estimate the overall uncertainty. Looking at only one sensor will not let you identify the bias. (video 6:58)
We can obtain different estimates of our uncertainty at different stages in the measurement process. While still at the design stage for a measurement system, our uncertainty will be highest. After we have purchased and calibrated our sensors, we will have a higher state of knowledge about the system and thus a lower level of uncertainty in our measurements. (video 16:25, Lecture Slides PDF “Defining Orders of Uncertainty”)
Always remember that our uncertainty is an estimate. Our information is often not good enough for a rigorous statistical measure and we need to use engineering judgement to make an approximate estimate from the information we have available. This example from our temperature measurements helps explain that estimating process. (video 3:14)
An error in the mean tells us about the bias of an individual device and we can learn more by testing multiple devices. The noise in our measurements translates to a standard deviation, and we know that the random uncertainty of a single measurement will be two standard deviations. (video 7:39)
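A minimal sketch of that bookkeeping, with an invented reference value and readings, might look like this:

```python
import numpy as np

reference = 25.00   # temperature from a trusted, calibrated instrument (invented value)
readings = np.array([25.31, 25.18, 25.27, 25.22, 25.35, 25.24])  # one sensor, read repeatedly

mean = np.mean(readings)
bias = mean - reference              # error in the mean -> bias of this individual device
sigma = np.std(readings, ddof=1)     # noise -> standard deviation
u_random = 2 * sigma                 # random uncertainty of a single reading (~95%)

print(f"bias = {bias:+.3f}, random uncertainty of one reading = +/-{u_random:.3f}")
```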
Combined Uncertainties
You can get a better estimate of a measured value by taking the average of multiple independent measurements. For truly independent measurements, each one would have to be made with a different sensor; however, to be independent of electrical noise, they just have to be taken far enough apart in time for the noise to be different. Radio frequency (RF) noise usually changes faster than our Arduino UNOs can sample, so successive readings are effectively independent of it.
The uncertainty in a single measurement of $x$ can come from multiple independent sources, and those multiple sources should be combined in quadrature, the square root of the sum of the squares:

$$u_x = \sqrt{u_{x,1}^2 + u_{x,2}^2 + u_{x,3}^2 + \cdots}$$

where $u_{x,1}$, $u_{x,2}$, etc., are the different sources of uncertainty in the measured value of $x$. A similar approach can be used if you estimate the true value of $x$ based on the mean of multiple independent measurements of $x$. The uncertainty in the mean is then inversely proportional to the square root of the number of measurements:

$$u_{\bar{x}} = \frac{u_x}{\sqrt{N}}$$
As a result, taking 4 measurements will cut your uncertainty in half, 100 measurements by a factor of 10 and so on, as long as the measured value is steady. Unless things are changing very quickly, you should be able to get similar improvements in noise uncertainty by smoothing, without introducing too much lag.
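A quick simulation (with an assumed noise level, purely for illustration) shows the $1/\sqrt{N}$ behaviour:

```python
import numpy as np

rng = np.random.default_rng()
true_value, sigma = 10.0, 0.2       # assumed steady value and single-reading noise level

for n in (1, 4, 100):
    # Simulate many experiments, each averaging n independent noisy readings
    means = rng.normal(true_value, sigma, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: scatter of the mean = {np.std(means):.4f}  "
          f"(expected {sigma / np.sqrt(n):.4f})")
```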
Uncertainty in Derived Quantities
If the result of a calculation is a function of multiple measurements, then we will still combine the uncertainties in quadrature. If

$$y = f(x_1, x_2, \ldots, x_n)$$

then the uncertainty in the calculated value $y$ due to the uncertainty in $x_i$ is

$$u_{y,i} = f(x_1, \ldots, x_i + u_{x_i}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n) \approx \frac{\partial f}{\partial x_i}\, u_{x_i}$$

where the approximation arises from taking only the first term in a Taylor Series expansion around $x_i$, reasonably accurate as long as $u_{x_i}$ is small. The overall uncertainty in the derived quantity will be the combined contributions from each of the measurements

$$u_y = \sqrt{\sum_i u_{y,i}^2}$$

or, using the same approximation as above,

$$u_y \approx \sqrt{\sum_i \left(\frac{\partial f}{\partial x_i}\, u_{x_i}\right)^2}$$
This equation with the partial derivatives is widely presented in textbooks focussing on pencil and paper analysis; however, I think there is more understanding to be had from the previous two equations, and they are easily calculated numerically for any combination of inputs. You can follow the fully analytical approach, go partially numerical by calculating the derivatives numerically, or do a fully numerical, statistical simulation with a Monte Carlo technique. (video 16:55)
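As a sketch of the last two options for a made-up function of two measurements (the values and uncertainties below are placeholders, treated as one standard deviation each):

```python
import numpy as np

rng = np.random.default_rng()

def f(x1, x2):
    # Stand-in derived quantity; substitute your own function of the measurements
    return x1 * np.sqrt(x2)

x1, u_x1 = 3.0, 0.05    # measured values and assumed uncertainties
x2, u_x2 = 7.0, 0.20

# Partially numerical: perturb each input by its uncertainty, combine in quadrature
u_y1 = f(x1 + u_x1, x2) - f(x1, x2)
u_y2 = f(x1, x2 + u_x2) - f(x1, x2)
u_y = np.sqrt(u_y1**2 + u_y2**2)
print(f"quadrature estimate:  {f(x1, x2):.3f} +/- {u_y:.3f}")

# Fully numerical: Monte Carlo -- sample the inputs, look at the spread of the outputs
y = f(rng.normal(x1, u_x1, 100_000), rng.normal(x2, u_x2, 100_000))
print(f"Monte Carlo estimate: {y.mean():.3f} +/- {y.std():.3f}")
```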
Uncertainty in a Derived Density
This sequence sums up how to get density (hard to measure) from a combination of 3 other easier measurements and an equation of state. This is basic thermodynamics and we’ll use it as an example to better understand uncertainty in combined results. We could get all three of these measures from a single BME 280 sensor. (You could make a pretty good guess with a BMP 280 sensor by assuming a relative humidity of around 50%.) Density of air is a fairly complicated function of temperature, atmospheric pressure, and humidity, even if we only use the ideal gas law. We can’t measure density directly, but we can make a good estimate from those other measures. (video 6:11)
Temperature, Pressure and Relative Humidity come together in a complicated function to calculate density as shown above. The uncertainty in that calculated density depends on the uncertainty in the 3 contributing measured values in a way that would be difficult to calculate analytically, although you are welcome to try.
If we get density from temperature, pressure, and humidity, all with associated uncertainties, what is the uncertainty of the derived value for density? Is that close enough to meet our objectives? Follow along with the concepts using some fairly involved Python functions to do the calculations. (video 21:38)
Use the Python Notebook from the video to follow along and adapt to your own work.
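This is not the notebook from the video, just a minimal sketch of the same idea: an ideal-gas density of moist air (using the Magnus approximation for the saturation vapour pressure) with a Monte Carlo propagation of assumed sensor uncertainties:

```python
import numpy as np

rng = np.random.default_rng()

def air_density(T_C, P_Pa, RH):
    """Ideal-gas density of moist air (kg/m^3).

    T_C in Celsius, P_Pa total pressure in Pa, RH as a fraction (0-1).
    Uses the Magnus approximation for the saturation vapour pressure.
    """
    R_d, R_v = 287.05, 461.5                                   # gas constants, J/(kg K)
    T_K = T_C + 273.15
    p_sat = 610.94 * np.exp(17.625 * T_C / (T_C + 243.04))     # saturation vapour pressure, Pa
    p_v = RH * p_sat                                           # partial pressure of water vapour
    p_d = P_Pa - p_v                                           # partial pressure of dry air
    return p_d / (R_d * T_K) + p_v / (R_v * T_K)

# Measured values and assumed uncertainties (one standard deviation, placeholders)
T, u_T = 22.0, 0.5          # deg C
P, u_P = 101300.0, 100.0    # Pa
RH, u_RH = 0.45, 0.03       # fraction

# Monte Carlo propagation: sample the inputs, look at the spread of the densities
N = 100_000
rho = air_density(rng.normal(T, u_T, N), rng.normal(P, u_P, N), rng.normal(RH, u_RH, N))
print(f"density = {rho.mean():.4f} +/- {rho.std():.4f} kg/m^3")
```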