Central Limit Theorem
For practical purposes, the main idea of the central
limit theorem (CLT) is that the average of a sample of observations drawn
from a population with a distribution of any shape is approximately
normally distributed if certain conditions are met. In theoretical
statistics there are several versions of the central limit theorem depending
on how these conditions are specified. These are concerned with the types
of assumptions made about the distribution of the parent population (population
from which the sample is drawn) and the actual sampling procedure.
One of the simplest versions of the theorem says that if $X_1, \ldots, X_n$
is a random sample of size n from an infinite population with mean $\mu$
and finite standard deviation $\sigma$, then the standardized sample mean
$Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ converges in distribution to a
standard normal distribution as n increases. Equivalently, for large n
(say, n > 30) the sample mean $\bar{X}$ is approximately normal with mean
equal to the population mean $\mu$ and standard deviation equal to
$\sigma/\sqrt{n}$, the population standard deviation divided by the square
root of the sample size. In applications of the central
limit theorem to practical problems in statistical inference, however,
statisticians are more interested in how closely the distribution
of the sample mean follows a normal distribution for finite sample sizes
than in the limiting distribution itself. Sufficiently close agreement with
a normal distribution allows statisticians to use normal theory for making
inferences about population parameters (such as the mean $\mu$) using the
sample mean, irrespective of the actual form of the parent population.
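The $\sigma/\sqrt{n}$ scaling described above can be checked with a small simulation. The following is a minimal sketch, not part of the original text: it assumes an exponential parent population with rate 1 (so $\sigma = 1$), and the helper name `sample_means`, the seed, and the replication count are illustrative choices.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n draws of draw(rng)."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(2024)  # fixed seed (arbitrary) so the run is reproducible
sigma = 1.0                # sd of an exponential population with rate 1

results = {}
for n in (4, 16, 64):
    means = sample_means(lambda r: r.expovariate(1.0), n, reps=4000, rng=rng)
    # Compare the simulated sd of the sample mean with sigma / sqrt(n).
    results[n] = (statistics.stdev(means), sigma / math.sqrt(n))
    print(f"n={n:3d}  simulated sd={results[n][0]:.4f}  "
          f"sigma/sqrt(n)={results[n][1]:.4f}")
```

Each fourfold increase in n should roughly halve the simulated standard deviation of the sample mean.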
It is well known that whatever the parent population is (provided its
standard deviation $\sigma$ is finite), the standardized variable
$Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, where $\bar{X}$ is the sample mean
and $\mu$ the population mean, has a distribution with mean 0 and standard
deviation 1 under random sampling. Moreover, if the parent population is normal,
then Z is distributed exactly as a standard normal variable for any positive
integer n. The central limit theorem states the remarkable result that,
even when the parent population is non-normal, the standardized variable
Z is approximately normal if the sample size is large enough (say, n > 30).
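As an illustration of the standardized variable, the sketch below simulates $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ for a non-normal parent and checks its mean, its standard deviation, and the share of values within ±1.96 (about 95% for a standard normal variable). The exponential parent with rate 1, the sample size n = 30, the seed, and the replication count are assumptions made for this example only.

```python
import math
import random
import statistics

rng = random.Random(7)   # fixed seed (arbitrary) for reproducibility
n, reps = 30, 5000
mu, sigma = 1.0, 1.0     # mean and sd of an exponential population with rate 1

z_values = []
for _ in range(reps):
    xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    # Standardize the sample mean using the known population mean and sd.
    z_values.append((xbar - mu) / (sigma / math.sqrt(n)))

z_mean = statistics.fmean(z_values)
z_sd = statistics.stdev(z_values)
# Fraction of standardized means falling inside the central 95% normal band.
coverage = sum(abs(z) <= 1.96 for z in z_values) / reps

print(f"mean of Z = {z_mean:.3f}, sd of Z = {z_sd:.3f}, "
      f"share within +/-1.96 = {coverage:.3f}")
```

The mean and standard deviation of Z should be close to 0 and 1 for any n. How close the coverage comes to 0.95 reflects how well the normal approximation works at this sample size.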
It is generally not possible to state precise conditions under which the
approximation given by the central limit theorem works well, or what sample
sizes are needed before the approximation becomes good enough. As a general guideline,
statisticians have used the prescription that if the parent distribution
is symmetric and relatively short-tailed, then the sample mean reaches
approximate normality for smaller samples than if the parent population
is skewed or long-tailed.
One must therefore study the behavior of the mean of samples of different sizes
drawn from a variety of parent populations. Examining sampling distributions
of sample means computed from samples of different sizes drawn from a
variety of distributions allows us to gain some insight into the behavior
of the sample mean under those specific conditions, as well as to examine
the validity of the guidelines mentioned above for using the central
limit theorem in practice.
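The kind of study described above can be sketched in a few lines. This example is an illustration under stated assumptions rather than a prescribed procedure: it compares a symmetric parent (uniform on 0 to 1) with a skewed one (exponential with rate 1) at sample sizes 5 and 30, and uses the moment-based skewness of the simulated sample means as a rough measure of departure from normality. The helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: third central moment over the sd cubed."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n draws of draw(rng)."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(11)  # fixed seed (arbitrary) for reproducibility
parents = {
    "uniform(0,1)": lambda r: r.random(),
    "exponential(1)": lambda r: r.expovariate(1.0),
}

skew = {}
for name, draw in parents.items():
    for n in (5, 30):
        # A normal distribution has skewness 0, so values near 0 are "closer".
        skew[(name, n)] = skewness(sample_means(draw, n, 5000, rng))
        print(f"{name:15s} n={n:2d}  skewness of sample means = "
              f"{skew[(name, n)]:+.3f}")
```

Skewness captures only one aspect of non-normality. Histograms or normal probability plots of the simulated means would give a fuller picture.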
Under certain conditions, in large samples, the sampling distribution
of the sample mean can be approximated by a normal distribution. The
sample size needed for the approximation to be adequate depends strongly
on the shape of the parent distribution. Symmetry (or lack thereof) is
particularly important. For a symmetric parent distribution, even if
very different from the shape of a normal distribution, an adequate approximation
can be obtained with small samples (e.g., 10 or 12 for the uniform distribution).
For symmetric short-tailed parent distributions, the sample mean reaches
approximate normality for smaller samples than if the parent population
is skewed and long-tailed. In some extreme cases (e.g., a highly skewed
binomial), sample sizes far exceeding the typical guideline (say, 30) are needed
for an adequate approximation. For some distributions without finite first and
second moments (e.g., Cauchy), the central limit theorem does not hold.
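The Cauchy case can be seen in a simulation. The mean of n independent standard Cauchy variables is itself standard Cauchy, so averaging does not reduce its spread. The sketch below, with an arbitrary seed and replication count, compares the interquartile range (IQR) of sample means from a Cauchy parent with that from a standard normal parent. The helper names `iqr` and `cauchy` are hypothetical.

```python
import math
import random
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def cauchy(rng):
    """Standard Cauchy draw, obtained by inverting its CDF at a uniform variate."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(3)  # fixed seed (arbitrary) for reproducibility
reps = 4000

spread = {}
for n in (1, 100):
    cauchy_means = [statistics.fmean([cauchy(rng) for _ in range(n)])
                    for _ in range(reps)]
    normal_means = [statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
                    for _ in range(reps)]
    spread[n] = (iqr(cauchy_means), iqr(normal_means))
    print(f"n={n:3d}  IQR of Cauchy means = {spread[n][0]:.3f}  "
          f"IQR of normal means = {spread[n][1]:.3f}")
```

The IQR of the Cauchy means should stay near 2 (the IQR of a standard Cauchy variable) at both sample sizes, while the IQR of the normal means shrinks in proportion to $1/\sqrt{n}$.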