
$$\bar X = \sum_{i=1}^N X_i/N = (X_1+X_2+\dots+X_{N-1}+X_N)/N$$

or, in shortened notation, $$\bar X = \sum X_i/N $$. The mean is simply the sum of all the individual measurement values, divided by the number of measurements.

The standard deviation (σ) is a measure of the dispersion or scatter of data. Defined as the square root of the variance (σ²), it is appropriate only for normal distributions. The variance is defined as:

$$\sigma^2 = \sum (X_i-\bar X)^2/N $$

Thus the variance is the average squared deviation from the mean, i.e., the sum of squared deviations from the mean divided by the number of data points. Computer programs usually avoid handling each measurement twice (first to calculate the mean and later to calculate the variance) by using an alternative equation: $$\sigma^2 = N^{-1} \sum (X_i^2)-\bar X^2  $$
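As a rough illustration of the two equations above, the following Python sketch computes the variance both ways: a two-pass version that first finds the mean and then sums squared deviations, and a one-pass version that accumulates the sum and the sum of squares together. The function names and the sample data are hypothetical, chosen only for this example. Both versions divide by N, as in the definitions above.

```python
def mean(values):
    """Sum of all measurements divided by the number of measurements."""
    return sum(values) / len(values)


def variance_two_pass(values):
    """Average squared deviation from the mean (handles each value twice)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def variance_one_pass(values):
    """Alternative form: mean of the squares minus the square of the mean."""
    n = 0
    total = 0.0      # running sum of X_i
    total_sq = 0.0   # running sum of X_i squared
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    m = total / n
    return total_sq / n - m * m


# Hypothetical sample data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(data))
print(variance_two_pass(data))
print(variance_one_pass(data))
```

The two results agree for well-behaved data such as this sample. When the mean is very large compared with the scatter, the one-pass form subtracts two nearly equal numbers and can lose precision in floating-point arithmetic, so the two-pass form may be the safer choice in that situation.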

The standard deviation and variance are never negative; they are zero only when every measurement has the same value. The units of standard deviation are the same as those of the x data. Often one needs to compare the scatter to the average value; two handy measures of this relationship are the fractional standard deviation (σ/$\overline{X}$) and percentage standard deviation (100σ/$\overline{X}$).
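The two relative measures can be sketched in a few lines of Python. The function name and the sample data below are hypothetical, and the sketch assumes a nonzero mean, since both ratios are undefined when the mean is zero.

```python
import math


def fractional_std(values):
    """Standard deviation divided by the mean (assumes a nonzero mean)."""
    n = len(values)
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / n  # divide by N, as in the text
    return math.sqrt(var) / m


# Hypothetical sample data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

frac = fractional_std(data)
percent = 100.0 * frac
print(frac, percent)
```

For this sample the mean is 5 and the standard deviation is 2, so the scatter is 0.4 of the mean, or 40%.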

Normal Distribution Function
The normal distribution function, or ‘normal error function’, is shown in Figure 3. This probability distribution function of likely X values is expressed in terms of the ‘true mean’ M and standard deviation σ as:

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}}\, e^{-(x-M)^2/(2\sigma^2)}$$
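A minimal Python sketch of this function follows, together with the area under the curve within k standard deviations of the mean. The function names are hypothetical. The area calculation relies on the standard identity that this area equals erf(k/√2), where erf is the error function provided by Python's `math` module; that identity is not stated in the text and is used here only as a convenient shortcut.

```python
import math


def normal_pdf(x, M=0.0, sigma=1.0):
    """Normal distribution function with true mean M and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - M) ** 2) / (2.0 * sigma ** 2))


def area_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(100.0 * area_within(k), 1))
```

Because the area depends only on k, the same fractions apply to any normal distribution, whatever its mean and standard deviation.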

For data drawn from a normal distribution, we can expect about 68.3% of the measurements to lie within one standard deviation of the mean, with half of the 68.3% above the mean and half below. Similarly, 95.4% of the measurements will lie within two standard deviations of the mean (i.e., within the interval $\overline{X}-2\sigma < x_i < \overline{X}+2\sigma$), and 99.7% of the measurements will lie within three standard deviations of the mean. These percentages are the areas under portions of the normal distribution function, as shown in Figure 3. All statistics books explain how to find the area under any desired portion of the curve, i.e., how to find the expected proportion of the data that will have values between specified limits. Of course, for the finite number of measurements of an individual dataset, we will only approximately observe these percentages. Nevertheless, it is well worth memorizing the following two