
Introduction to Data Analysis and R: Measures of Dispersion

Measures of Dispersion

Measures of dispersion, also called measures of variability, "describe the extent to which the values of a variable are different" (Wallace & Van Fleet, 2012, p. 293). The most common measures of dispersion are range, variance, standard deviation, and the coefficient of variation.

Range 

Range is a very simple statistic, as it is merely the difference between the largest and smallest values. It's not very useful from a statistical standpoint; because it relies only on the outermost values of a dataset, two datasets with the same range could still have drastic differences in their overall distribution. 
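As a minimal sketch of that limitation in R, the example below builds two small vectors that share the same range but are distributed differently. The vector names and values are hypothetical and chosen only for illustration; they do not come from the course dataset.

```r
# Two hypothetical datasets (illustrative values only)
clustered <- c(1, 5, 5, 5, 9)
spread    <- c(1, 1, 5, 9, 9)

# range() returns the minimum and maximum as a two-element vector,
# not the difference between them
range(clustered)

# diff() of that vector gives the range statistic itself
range_clustered <- diff(range(clustered))
range_spread    <- diff(range(spread))

# Both ranges are identical...
same_range <- range_clustered == range_spread

# ...but the standard deviations show how differently the values are spread
sd_clustered <- sd(clustered)
sd_spread    <- sd(spread)
```

Note that `range()` alone does not return a single number, so `diff(range(x))` (or equivalently `max(x) - min(x)`) is needed to get the statistic described above.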

Variance

Variance can refer to two different things: population variance (sometimes called parametric variance) and sample variance. Population variance, typically represented by a lower-case sigma squared ($\sigma^2$), can only be truly calculated if you have observations for every member of your population, which is almost never the case. It's used more often by statisticians, and reaching for it should never be your first instinct. Sample variance, typically represented by $s^2$, is what you should almost always use instead. Its formula can be expressed as:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where $x$ is each value of the variable, $\bar{x}$ is the mean for the variable, and $n$ is the total number of observations. Like the equation for the mean, it looks more complicated than it is. Because variance is expressed in squared units, it's rarely reported on its own and is instead more useful for statisticians. Standard deviation is calculated from the variance, and is generally a more understandable and useful measurement.
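To make the calculation concrete, here is a short R sketch that computes the sample variance by hand and compares it with R's built-in `var()` function, which uses the same n - 1 denominator. The vector `x` is a hypothetical example, not data from this course.

```r
# Hypothetical observations (illustrative values only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n      <- length(x)   # total number of observations
x_mean <- mean(x)     # mean of the variable

# Sample variance by hand: squared deviations from the mean,
# summed and divided by n - 1
manual_var <- sum((x - x_mean)^2) / (n - 1)

# Built-in sample variance
builtin_var <- var(x)

# Population variance divides by n instead; shown only for comparison
population_var <- sum((x - x_mean)^2) / n
```

The hand calculation and `var(x)` should agree, while the population version comes out slightly smaller because it divides by n rather than n - 1.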

Standard Deviation

Like variance, standard deviation can theoretically be calculated for both a population and a sample. The population standard deviation ($\sigma$) is rarely used; sample standard deviation ($s$) should be your first choice. It is easy to calculate: it is simply the square root of the variance. It tends to be more understandable than variance because it expresses the dispersion of a dataset in that dataset's original units. It can be written as:

$$s = \sqrt{s^2}$$

where $s^2$ is the sample variance (see above).
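In R, the relationship between the two measures can be checked directly. The sketch below uses a hypothetical vector `x` (illustrative values only) and compares the square root of `var()` with the built-in `sd()` function, which returns the sample standard deviation.

```r
# Hypothetical observations (illustrative values only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Standard deviation as the square root of the sample variance
sd_from_var <- sqrt(var(x))

# Built-in sample standard deviation
sd_builtin <- sd(x)

# The two approaches should give the same value
sd_matches <- isTRUE(all.equal(sd_from_var, sd_builtin))
```

Because the result is in the same units as `x`, it can be read directly against the original values in a way the variance cannot.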

Coefficient of Variation

If you need to compare the variation for different measurement variables, the coefficient of variation is another useful measure of dispersion. It expresses the amount of variation relative to the size of the mean, which makes it independent of the variable's units. It is calculated by dividing the standard deviation by the mean:

$$CV = \frac{s}{\bar{x}}$$
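Base R has no dedicated function for the coefficient of variation, so one simple approach is to define a small helper. In the sketch below, the function name `cv` and the example vectors are assumptions made for illustration only.

```r
# Hypothetical helper: coefficient of variation as sd divided by mean
cv <- function(x) {
  sd(x) / mean(x)
}

# Hypothetical measurements (illustrative values only)
length_cm <- c(2, 4, 4, 4, 5, 5, 7, 9)

# The same measurements converted to millimetres
length_mm <- length_cm * 10

cv_cm <- cv(length_cm)
cv_mm <- cv(length_mm)

# The standard deviation changes with the units, but the CV does not
sd_changes <- sd(length_mm) > sd(length_cm)
cv_same    <- isTRUE(all.equal(cv_cm, cv_mm))
```

This unit independence is what makes the coefficient of variation suitable for comparing variables measured on different scales. It is only meaningful when the mean is not zero (and, in practice, for variables that take positive values).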

The video below helps demonstrate the relationships between variance, standard deviation, and the coefficient of variation.


Supplemental Readings

R - Dispersion Measures

(Don't hesitate to use the player controls to pause, rewind, or slow down the video as needed! A thorough understanding of the concepts is vastly preferable to just speeding through.)

Dispersion Measures Practice

You will be using the "weatherData" dataset (linked below) to answer the questions in this quiz.