Geometric Mean

The geometric mean of n nonnegative numerical values is the nth root of the product of the n values. For example, the denominator of the Pearson correlation coefficient is the geometric mean of the two variances. The geometric mean is useful for averaging "product moment" values.
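The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the sample values are chosen here for the example only. (Python 3.8 and later also ship `statistics.geometric_mean`, which could be used instead.)

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n nonnegative values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be nonnegative")
    # Product of all n values, then the nth root.
    return math.prod(values) ** (1.0 / len(values))

# Illustrative values (assumed for this sketch): the geometric mean of 2 and 8.
g = geometric_mean([2.0, 8.0])
print(g)
```

For very long lists the product can overflow or underflow, in which case averaging the logarithms of the values and exponentiating the result is the usual workaround.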

Suppose you have two positive data points x and y. The geometric mean of these numbers is a number (g) such that x/g = g/y, that is, g = (xy)^(1/2), and the arithmetic mean (a) is a number such that x - a = a - y, that is, a = (x + y)/2.

Geometric means ("geomeans," as the U.S. Bureau of Labor Statistics calls them) are used extensively by the Bureau in the computation of the U.S. Consumer Price Index, and they are also used in other price indexes. The main statistical use of the geometric mean is for index numbers such as Fisher's ideal index.

If some values are very large in magnitude and others are small, then the geometric mean is a better average. In a geometric series, the most meaningful average is the geometric mean, because the arithmetic mean is heavily biased toward the larger numbers in the series.

As an example, suppose sales of a certain item increase to 110% of the initial level in the first year and to 150% of that in the second year. For simplicity, assume you sold 100 items initially. Then the number sold in the first year is 110 and the number sold in the second is 150% x 110 = 165. The arithmetic average of 110% and 150% is 130%, so we would incorrectly estimate that the number sold in the first year is 130 and the number in the second year is 169. The geometric mean of 110% and 150% is r = (1.65)^(1/2), so we would correctly estimate that we would sell 100 x r^2 = 165 items in the second year.
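The arithmetic in the sales example can be checked with a short Python sketch. The variable names are chosen here for illustration; the figures are the ones given in the text.

```python
import math

initial = 100            # items sold initially
factors = [1.10, 1.50]   # year-over-year growth factors (110% and 150%)

# Actual second-year sales: apply each year's factor in turn.
actual = initial * math.prod(factors)

# Average growth factor computed two ways.
arith = sum(factors) / len(factors)              # 1.30
geo = math.prod(factors) ** (1 / len(factors))   # (1.65)^(1/2)

# Second-year estimate if the average factor is applied in both years.
est_arith = initial * arith ** len(factors)
est_geo = initial * geo ** len(factors)

print(f"actual: {actual:.1f}")
print(f"estimate from arithmetic mean: {est_arith:.1f}")
print(f"estimate from geometric mean: {est_geo:.1f}")
```

Only the geometric mean reproduces the actual second-year figure, because growth factors compound by multiplication rather than addition.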

As another similar example, if a mutual fund goes up by 50% one year and down by 50% the next year, and you hold a unit throughout both years, you have lost money at the end. For every dollar you started with, you now have 75 cents. Thus, the performance is different from gaining (50% - 50%)/2 = 0% each year. It is the same as changing by a multiplicative factor of (1.5 x 0.5)^(1/2) = 0.866 each year. In a multiplicative process, the one value that can be substituted for each of a set of values to give the same "overall effect" is the geometric mean, not the arithmetic mean. As money tends to grow multiplicatively ("it takes money to make money"), financial data are often better combined in this way.
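As a rough sketch of the same idea for any sequence of yearly returns, the hypothetical helper below (its name and signature are assumptions made for this example) converts percentage returns into the single per-year factor that gives the same overall effect.

```python
def equivalent_yearly_factor(returns):
    """Return (overall factor, per-year factor with the same overall effect).

    `returns` holds fractional yearly returns, e.g. 0.5 for +50%.
    """
    overall = 1.0
    for r in returns:
        overall *= 1.0 + r   # compound each year's growth factor
    # Geometric mean of the yearly growth factors.
    per_year = overall ** (1.0 / len(returns))
    return overall, per_year

# The mutual fund from the text: up 50%, then down 50%.
overall, per_year = equivalent_yearly_factor([0.5, -0.5])
arithmetic_mean_return = (0.5 + -0.5) / 2

print(f"value of $1 after two years: {overall:.2f}")
print(f"equivalent factor per year: {per_year:.3f}")
print(f"arithmetic mean return: {arithmetic_mean_return:.0%}")
```

The arithmetic mean return of 0% suggests no change, while the equivalent yearly factor of about 0.866 reflects the actual loss.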

As a survey analysis example, give a sample of people a list of, say 10, crimes ranging in seriousness:

Theft... Assault ... Arson .. Rape ... Murder

Ask each respondent to assign any numerical value they like to one crime in the list (e.g. someone might decide to call arson 100). Then ask them to rate every other crime in the list relative to it, on a ratio scale. If a respondent thought rape was five times as bad as arson, it would be assigned a value of 500; if theft was a quarter as bad, 25. Suppose we now wanted the "average" rating across respondents given to each crime. Since respondents are using their own base value, the arithmetic mean would be useless: people who used large numbers as their base value would "swamp" those who had chosen small numbers. However, the geometric mean (the nth root of the product of the ratings given to each crime by the n respondents) gives equal weighting to all responses. I've used this in a class exercise and it works nicely.
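The "swamping" effect can be seen in a small Python sketch. The two respondents and all of their ratings below are invented for illustration: respondent A uses a base of 100 for arson and rates rape five times as bad, while respondent B uses a base of 1 and rates rape twice as bad.

```python
from statistics import geometric_mean, mean

# Hypothetical ratings; each respondent picked their own base value for arson.
ratings = {
    "respondent_a": {"theft": 25.0, "arson": 100.0, "rape": 500.0},
    "respondent_b": {"theft": 0.5, "arson": 1.0, "rape": 2.0},
}
crimes = ["theft", "arson", "rape"]

# Average each crime's rating across respondents, both ways.
arith = {c: mean([r[c] for r in ratings.values()]) for c in crimes}
geo = {c: geometric_mean([r[c] for r in ratings.values()]) for c in crimes}

# How much worse than arson is rape, according to each kind of average?
arith_ratio = arith["rape"] / arith["arson"]
geo_ratio = geo["rape"] / geo["arson"]

print(f"arithmetic means imply rape is {arith_ratio:.2f} times as bad as arson")
print(f"geometric means imply rape is {geo_ratio:.2f} times as bad as arson")
```

With the arithmetic mean, the implied ratio is almost exactly respondent A's opinion, because A's larger numbers dominate. With the geometric mean, the implied ratio is the geometric mean of the two respondents' ratios (5 and 2), so both opinions count equally regardless of base value.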

It is often good to log-transform such data before regression, ANOVA, etc. These statistical techniques give inferences about the arithmetic mean (which is intimately connected with the least-squares error measure); however, the arithmetic mean of log-transformed data is the log of the geometric mean of the data. So, for instance, a t test on log-transformed data is really a test for location of the geometric mean.
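The link between the log transform and the geometric mean can be checked numerically. The data values in this sketch are made up for illustration.

```python
import math

data = [2.0, 8.0, 4.0]   # illustrative positive values

# Arithmetic mean of the log-transformed data.
log_mean = sum(math.log(x) for x in data) / len(data)

# Geometric mean computed directly from the definition.
geo_direct = math.prod(data) ** (1 / len(data))

# Back-transforming the mean of the logs recovers the geometric mean.
geo_from_logs = math.exp(log_mean)

print(geo_direct, geo_from_logs)
```

Up to floating-point rounding, the two values agree, which is why a location estimate on the log scale, once back-transformed, refers to the geometric mean rather than the arithmetic mean of the original data.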

Copyright © 2006 Statistical Forecasting. All Rights Reserved