Geometric Mean
The geometric mean of n nonnegative numerical values
is the nth root of the product of the n values. The denominator of the
Pearson correlation coefficient is the geometric mean of the two variances.
It is useful for averaging "product moment" values.
Suppose you have two positive data points x and y. The geometric mean of
these numbers is the number g such that x/g = g/y, that is, g = (xy)^(1/2).
The arithmetic mean is the number a such that x - a = a - y, that is,
a = (x + y)/2.
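The two definitions can be checked numerically. The following is a minimal
Python sketch, not a reference implementation; the function name
geometric_mean and the data points 4 and 9 are illustrative assumptions,
and math.prod requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n nonnegative values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be nonnegative")
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical data points chosen only for illustration.
x, y = 4.0, 9.0
g = geometric_mean([x, y])   # square root of 36
a = (x + y) / 2              # arithmetic mean

print("geometric mean:", g)
print("arithmetic mean:", a)
# The geometric mean splits the pair in equal ratios ...
print("x/g equals g/y:", math.isclose(x / g, g / y))
# ... while the arithmetic mean splits it in equal differences.
print("x - a equals a - y:", math.isclose(x - a, a - y))
```

For large data sets the product can overflow or underflow, so in practice
the mean of the logarithms is usually computed instead.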
The U.S. Bureau of Labor Statistics uses geometric means ("geomeans," as
they call them) extensively in the computation of the U.S. Consumer Price
Index. More generally, the main statistical use of the geometric mean is
in index numbers such as Fisher's ideal index, which is the geometric mean
of the Laspeyres and Paasche indexes.
If some values are very large in magnitude and others are small, then
the geometric mean is often a more representative average. In a geometric
series, the most meaningful average is the geometric mean; the arithmetic
mean is heavily weighted toward the larger numbers in the series.
As an example, suppose sales of a certain item increase to 110% of their
initial level in the first year and to 150% of that in the second year.
For simplicity, assume you sold 100 items initially. Then the number sold
in the first year is 110 and the number sold in the second is
150% x 110 = 165. The arithmetic average of 110% and 150% is 130%, so we
would incorrectly estimate that the number sold in the first year is 130
and the number in the second year is 169. The geometric mean of 110% and
150% is r = (1.65)^(1/2), or about 1.285, so we would correctly estimate
that we would sell 100 x r^2 = 165 items in the second year.
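The arithmetic in this example can be reproduced with a short Python
sketch. The variable names are illustrative assumptions; the figures are
the ones used in the text.

```python
import math

initial_sales = 100
# Year-over-year growth factors from the example: 110%, then 150%.
growth_factors = [1.10, 1.50]
n = len(growth_factors)

# Actual sales after applying each year's factor in turn.
actual = initial_sales
for factor in growth_factors:
    actual *= factor

arithmetic_avg = sum(growth_factors) / n             # 1.30
geometric_avg = math.prod(growth_factors) ** (1 / n)  # square root of 1.65

# Project two years ahead using one constant "average" factor.
arithmetic_estimate = initial_sales * arithmetic_avg ** n
geometric_estimate = initial_sales * geometric_avg ** n

print("actual second-year sales:", round(actual, 2))
print("estimate from arithmetic mean:", round(arithmetic_estimate, 2))
print("estimate from geometric mean:", round(geometric_estimate, 2))
```

Only the geometric average, applied as a constant factor, reproduces the
actual second-year figure.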
As another similar example, if a mutual fund goes up by 50% one year
and down by 50% the next year, and you hold a unit throughout both years,
you have lost money at the end. For every dollar you started with, you
now have 75 cents. Thus, the performance is different from gaining
(50% - 50%)/2 = 0% per year. It is the same as changing by a multiplicative
factor of (1.5 x 0.5)^(1/2) = 0.866 each year. In a multiplicative
process, the one value that can be substituted for each of a set of values
to give the same "overall effect" is the geometric mean, not the arithmetic
mean. As money tends to grow multiplicatively ("it takes money to make
money"), financial data are often better combined in this way.
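A hedged sketch of the same idea for arbitrary yearly returns follows. The
function name equivalent_annual_factor is a hypothetical helper, and
returns are assumed to be given as fractions (0.5 for +50%, -0.5 for -50%).

```python
import math

def equivalent_annual_factor(yearly_returns):
    """Constant per-year factor that gives the same end value as the
    given sequence of fractional yearly returns."""
    factors = [1.0 + r for r in yearly_returns]
    if not factors:
        raise ValueError("need at least one yearly return")
    if any(f <= 0 for f in factors):
        raise ValueError("a loss of 100% or more has no such factor")
    return math.prod(factors) ** (1.0 / len(factors))

returns = [0.50, -0.50]   # up 50%, then down 50%

# Value of one dollar after both years.
end_value = math.prod(1.0 + r for r in returns)
# Naive arithmetic average of the yearly returns.
arithmetic_return = sum(returns) / len(returns)
# Geometric mean of the yearly growth factors.
factor = equivalent_annual_factor(returns)

print("end value per dollar:", end_value)
print("arithmetic mean return:", arithmetic_return)
print("equivalent yearly factor:", round(factor, 3))
```

Applying the equivalent factor once per year recovers the end value, which
the arithmetic mean return of zero does not.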
As a survey analysis example, give a sample of people a list of, say,
10 crimes ranging in seriousness:
Theft ... Assault ... Arson ... Rape ... Murder
Ask each respondent to assign any numerical value they like to one crime
in the list (e.g., someone might decide to call arson 100). Then ask them
to rate every other crime in the list relative to it on a ratio scale. If
a respondent thought rape was five times as bad as arson, then a value of
500 would be assigned; if theft was a quarter as bad, 25. Suppose we now
wanted the "average" rating across respondents given to each crime. Since
respondents are using their own base value, the arithmetic mean would be
useless: people who used large numbers as their base value would "swamp"
those who had chosen small numbers. However, the geometric mean -- the nth
root of the product of ratings for each crime of the n respondents -- gives
equal weighting to all responses. I've used this in a class exercise and it
works nicely.
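The "swamping" effect can be illustrated with a small Python sketch. All
ratings below are invented for illustration and are not data from any
survey; the helper names are likewise assumptions. Multiplying one
respondent's ratings by a constant (a different base value) changes the
relative seriousness implied by the arithmetic means, but not the one
implied by the geometric means.

```python
import math

# Hypothetical ratings; each respondent uses a different base value.
ratings = {
    "respondent_a": {"theft": 25, "arson": 100, "rape": 500},
    "respondent_b": {"theft": 2, "arson": 10, "rape": 40},
    "respondent_c": {"theft": 300, "arson": 1000, "rape": 6000},
}

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Exponentiated mean of logs; equivalent to the nth root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def average_by_crime(data, average):
    """Average each crime's ratings across respondents."""
    crimes = next(iter(data.values())).keys()
    return {c: average([r[c] for r in data.values()]) for c in crimes}

def rescale(data, who, k):
    """Copy of data with one respondent's ratings multiplied by k."""
    return {
        name: {c: v * k if name == who else v for c, v in r.items()}
        for name, r in data.items()
    }

# Pretend respondent_b had chosen a base value 100 times larger.
rescaled = rescale(ratings, "respondent_b", 100)

for label, average in [("arithmetic", arithmetic_mean),
                       ("geometric", geometric_mean)]:
    before = average_by_crime(ratings, average)
    after = average_by_crime(rescaled, average)
    print(label, "rape/arson ratio before:", before["rape"] / before["arson"])
    print(label, "rape/arson ratio after: ", after["rape"] / after["arson"])
```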
It is often good to log-transform such data before regression, ANOVA,
etc. These statistical techniques give inferences about the arithmetic
mean (which is intimately connected with the least-squares error measure);
however, the arithmetic mean of log-transformed data is the log of the
geometric mean of the data. So, for instance, a t test on log-transformed
data is really a test for location of the geometric mean.
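The identity between the mean of the logs and the log of the geometric
mean can be verified with a brief Python sketch. The data values are
hypothetical, and statistics.geometric_mean requires Python 3.8 or later.

```python
import math
import statistics

# Hypothetical positive observations.
data = [2.0, 8.0, 32.0]

# Arithmetic mean of the log-transformed data.
logs = [math.log(v) for v in data]
mean_of_logs = statistics.mean(logs)

# Geometric mean of the original data.
geo = statistics.geometric_mean(data)

print("mean of logs:", mean_of_logs)
print("log of geometric mean:", math.log(geo))
# Back-transforming the mean of the logs recovers the geometric mean.
print("exp(mean of logs):", math.exp(mean_of_logs))
```

This is why an estimate or confidence interval computed on the log scale
and then exponentiated describes the geometric mean of the original data
rather than its arithmetic mean.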