BoxPlot
A boxplot is a graphical display that summarizes several characteristics
of a data set at once. It illustrates the range of the data and flags
possible outliers. It shows measures of dispersion, namely the upper
quartile, lower quartile and interquartile range (IQR) of the data set,
together with the median as a measure of central location, which makes it
useful for comparing sets of data. It also gives an indication of the
symmetry or skewness of the distribution. The main reason for the
popularity of boxplots is that they offer a great deal of information in
a compact way.
A boxplot is a way of summarizing a set of data measured on an interval
scale. It is often used in exploratory data analysis. It is a type of
graph which is used to show the shape of the distribution, its central
value, and variability. The picture produced consists of the most extreme
values in the data set (maximum and minimum values), the lower
and upper quartiles, and the median.
Drawing a BoxPlot
There is a commonly accepted method of drawing the whiskers on a boxplot.
However, there are many methods for drawing the box, which differ in how
the quartiles are computed. Some of these methods were developed because
they extend nicely to percentiles other than 25% and 75%, others were
chosen because of theoretical considerations, and some were developed for
simplicity.
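To see how the choice of method can change the box, the sketch below
computes the quartiles of one small data set three ways. It uses the two
interpolation rules offered by Python's standard statistics.quantiles
function (Python 3.8 or later) and a hypothetical helper, tukey_hinges,
that takes the median of each half of the sorted data. The data values are
made up for illustration.

```python
import statistics

# Illustrative data, not taken from any real study.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# Two interpolation rules available in the standard library.
q_excl = statistics.quantiles(data, n=4, method="exclusive")
q_incl = statistics.quantiles(data, n=4, method="inclusive")


def tukey_hinges(values):
    """Return (lower hinge, upper hinge) as the medians of the two halves.

    When the number of values is odd, the overall median is counted in
    both halves.
    """
    s = sorted(values)
    half = (len(s) + 1) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])


print("exclusive:", q_excl[0], q_excl[2])
print("inclusive:", q_incl[0], q_incl[2])
print("hinges:   ", *tukey_hinges(data))
```

All three rules agree on the median here, but each places the edges of the
box slightly differently, which is why two programs can draw different
boxes for the same data.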
In 1977, John Tukey published an efficient method for displaying a five-number
data summary.
A BoxPlot summarizes the following statistical measures:
- median
- minimum and maximum data values
- upper and lower quartiles
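As a minimal sketch, the five measures above can be computed as follows.
The function name five_number_summary and the sample values are
illustrative assumptions, and the "inclusive" quartile rule is only one of
several possible choices.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, lower quartile, median, upper quartile and maximum."""
    s = sorted(values)
    # Assumption: quartiles use the standard library's "inclusive" rule.
    q1, median, q3 = statistics.quantiles(s, n=4, method="inclusive")
    return {"min": s[0], "q1": q1, "median": median, "q3": q3, "max": s[-1]}


summary = five_number_summary([7, 1, 3, 9, 5, 4, 6, 2, 8])
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```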
The following is an example of a boxplot.

A BoxPlot may be drawn either vertically as in the above diagram, or
horizontally.
Understanding a Boxplot
The boxplot can be interpreted as follows:
- The box itself contains the middle 50% of the data. The upper edge
(hinge) of the box indicates the 75th percentile of the data set,
and the lower hinge indicates the 25th percentile. The distance between
the two hinges is known as the interquartile range (IQR).
- The line in the box indicates the median value of the data.
- If the median line within the box is not equidistant from the hinges,
then the data is skewed.
- The ends of the vertical lines or "whiskers" indicate the minimum
and maximum data values, unless outliers are present, in which case
each whisker extends to the most extreme data value that lies within
1.5 times the interquartile range of the nearer hinge.
- The points outside the ends of the whiskers are outliers or suspected
outliers.
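The whisker and outlier rules above can be sketched in a few lines. The
function name whiskers_and_outliers, the parameter k and the sample data
are illustrative assumptions, and the quartile rule is again the standard
library's "inclusive" method rather than a prescribed one.

```python
import statistics


def whiskers_and_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, list of outliers)."""
    s = sorted(values)
    q1, _, q3 = statistics.quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme values still inside the fences.
    inside = [x for x in s if lower_fence <= x <= upper_fence]
    outliers = [x for x in s if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers


low, high, outliers = whiskers_and_outliers([1, 2, 3, 4, 5, 6, 7, 8, 30])
print("whiskers:", low, "to", high)
print("outliers:", outliers)
```

With no outliers present, the same function simply returns the minimum and
maximum as the whisker ends.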
What else can be interpreted from a BoxPlot
Beyond the basic information, boxplots sometimes are enhanced to convey
additional information:
- The mean and its confidence interval can be shown using a diamond
shape in the box.
- The expected range of the median can be shown using notches in the
box.
- The width of the box can be varied in proportion to the log of the
sample size.
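A rough sketch of two of these enhancements follows. The notch formula,
median plus or minus 1.57 times the IQR divided by the square root of the
sample size, is one widely used convention and is an assumption here, not
something this page specifies. The names notch_interval and
relative_box_width and the sample data are likewise hypothetical.

```python
import math
import statistics


def notch_interval(values, factor=1.57):
    """Approximate range for the median: median +/- factor * IQR / sqrt(n)."""
    s = sorted(values)
    q1, median, q3 = statistics.quantiles(s, n=4, method="inclusive")
    half_width = factor * (q3 - q1) / math.sqrt(len(s))
    return median - half_width, median + half_width


def relative_box_width(n, reference_n):
    """Box width relative to a reference group, proportional to log(n)."""
    return math.log(n) / math.log(reference_n)


lo, hi = notch_interval([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(f"notch from {lo:.2f} to {hi:.2f}")
print(f"a group of 100 is drawn {relative_box_width(100, 10):.1f}x as wide as a group of 10")
```

When the notches of two side-by-side boxes do not overlap, this is usually
read as informal evidence that the medians differ.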
Advantages of Boxplots
Boxplots have the following strengths:
- Graphically display a variable's location and spread at a glance.
- Provide some indication of the data's symmetry and skewness.
- Show outliers, unlike many other methods of data display.
- Allow quick comparison of data sets when a boxplot for each category
is drawn side-by-side on the same graph.
One drawback of boxplots is that they tend to emphasize the tails of
a distribution, which are the least certain points in the data set. They
also hide many of the details of the distribution. Displaying a histogram in
conjunction with the boxplot helps in this regard, and both are important
tools for exploratory data analysis.
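The following sketch illustrates how a boxplot can hide detail: two
made-up data sets with different shapes share the same five-number summary,
so their boxplots would look identical, while a simple text histogram tells
them apart. The helper names and data are illustrative assumptions.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    s = sorted(values)
    q1, median, q3 = statistics.quantiles(s, n=4, method="inclusive")
    return (s[0], q1, median, q3, s[-1])


def text_histogram(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return "\n".join(f"{v:>2} {'#' * counts[v]}" for v in sorted(counts))


evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_clusters = [1, 3, 3, 3, 5, 7, 7, 7, 9]

same = five_number_summary(evenly_spread) == five_number_summary(two_clusters)
print("same five-number summary:", same)
print(text_histogram(evenly_spread))
print()
print(text_histogram(two_clusters))
```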