SUMMARIZATION OF DATA SETS

Measures of Central Tendency

Several thousand radiological analyses have been completed for the NAEG program
over the past several years. These data contain a great deal of information
relative to NAEG objectives, but much of it may not be apparent without
appropriate statistical treatment of the data. Up to this point in time, we have
used simple (arithmetic) means and standard errors to describe data sets such as
the "average" 239Pu soil or vegetation concentrations in various soil activity
strata of safety-shot sites (Table 14, Gilbert et al., 1975).
In this section, we would like to briefly discuss some alternative ways of summarizing a data
set that give additional information to the reader. A common characteristic
of radionuclide data collected from environmental studies is the occurrence of
one or more observations that are much larger in magnitude than the bulk of
the data.
This behavior is illustrated in Figure 5, which compares the symmetric normal
distribution with the asymmetric (right-skewed) lognormal distribution
(Aitchison and Brown, 1969, page 9) for parameters μ = 0 and σ² = 0.50.
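As a rough numerical check on the asymmetry shown in Figure 5, the sketch below computes the coefficient of skewness of each distribution for the figure's parameters. It assumes that μ and σ² denote the mean and variance of log_e X (the usual lognormal parametrization) and uses the standard closed-form lognormal skewness (e^σ² + 2)·√(e^σ² − 1); the function and constant names are illustrative only.

```python
import math

def lognormal_skewness(sigma2: float) -> float:
    """Coefficient of skewness of a lognormal distribution.

    sigma2 is the variance of log_e X; the skewness does not depend on mu.
    """
    w = math.exp(sigma2)
    return (w + 2.0) * math.sqrt(w - 1.0)

# The normal distribution is symmetric, so its skewness is 0 for every mu and sigma2.
NORMAL_SKEWNESS = 0.0

sigma2 = 0.50  # value used in Figure 5 (mu = 0 does not affect the skewness)
print(f"normal:    {NORMAL_SKEWNESS:.2f}")
print(f"lognormal: {lognormal_skewness(sigma2):.2f}")
```

The lognormal value of about 2.94 reflects the long right tail produced by occasional large observations, whereas the normal curve has none.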

Our primary concern here is with how one should summarize the information
contained in a data set drawn from an underlying distribution of the asymmetric
type. We will be particularly interested in choosing the most appropriate
estimator of the "center" of the distribution.

If the underlying distribution (from which the actual data values are drawn)
is symmetric and unimodal, then the mode, median, and arithmetic mean (AM) fall
at the same place on the distribution. The mode is the value that occurs most
frequently; the median is the value above which half the values lie and below
which the other half lie.
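The definitions of mode, median, and AM can be checked on small samples with Python's standard statistics module. The two samples below are made-up illustrations, not NAEG data: the first is symmetric, so all three measures coincide; in the second, a single large value pulls the AM above the median and mode, as happens with the large observations typical of environmental radionuclide data.

```python
import statistics

def center_summary(values):
    """Return (mode, median, arithmetic mean) for a list of numbers."""
    return (
        statistics.mode(values),    # most frequently occurring value
        statistics.median(values),  # half the values lie above, half below
        statistics.mean(values),    # arithmetic mean (AM)
    )

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical symmetric sample
skewed = [1, 2, 2, 2, 3, 4, 9]           # hypothetical sample with one large value

print(center_summary(symmetric))
print(center_summary(skewed))
```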
For an asymmetric distribution, the mode, median, and AM generally do not
coincide. For the lognormal distribution (Figure 5), the mean is always larger
than the median, which in turn is larger than the mode. The extent of the
difference between the mean, median, and mode for the lognormal can be
determined for any μ and σ² by using the formulae in Figure 5.
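The formulae in Figure 5 are not reproduced in this text. Assuming the standard parametrization in which μ and σ² are the mean and variance of log_e X, the lognormal mean, median, and mode are exp(μ + σ²/2), exp(μ), and exp(μ − σ²). The sketch below evaluates them for the Figure 5 parameters; the helper name is hypothetical.

```python
import math

def lognormal_centers(mu: float, sigma2: float) -> dict:
    """Mean, median, and mode of a lognormal with log-scale mean mu and variance sigma2."""
    return {
        "mean": math.exp(mu + sigma2 / 2.0),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma2),
    }

centers = lognormal_centers(mu=0.0, sigma2=0.50)
for name, value in centers.items():
    print(f"{name:>6}: {value:.3f}")
```

For μ = 0 and σ² = 0.50 this gives a mean of about 1.284, a median of 1.000, and a mode of about 0.607. Because the exponents are ordered μ + σ²/2 > μ > μ − σ² whenever σ² > 0, the ordering mean > median > mode always holds, and the gaps widen as σ² grows.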

Another measure of the "center" of a distribution of positive values is the geometric mean (GM)
computed as

$$ GM = \left( \prod_{i=1}^{n} X_i \right)^{1/n} $$

or equivalently

$$ GM = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \log_e X_i \right) $$
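Both forms of the GM can be computed directly and should agree to rounding error; the logarithmic form is usually preferred in practice because the product of many values can overflow or underflow. A minimal sketch, assuming strictly positive concentrations (the logarithm is undefined for zero or negative values); the sample values and helper names are hypothetical.

```python
import math
import statistics

def gm_product(values):
    """GM as the n-th root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))

def gm_log(values):
    """GM as exp of the mean of the natural logs (same quantity, numerically safer)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

sample = [0.8, 1.1, 1.5, 2.0, 12.0]  # hypothetical positive concentrations

print(gm_product(sample), gm_log(sample), statistics.geometric_mean(sample))
print("AM:", statistics.mean(sample))
```

Here the GM is about 2.0 while the AM is 3.48: the single value 12.0 raises the AM much more than the GM.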

This mean gives less weight to the larger data values than does the arithmetic
mean. In fact, for positive data the GM is always less than the AM unless all
data points are equal, in which case the AM and GM are numerically equal. The GM
is sometimes referred to as the "lognormal mean." However, the GM is an estimate
of the median, not the mean, of the lognormal distribution. Strictly speaking, it
is a biased estimate of the median for this distribution (an unbiased estimator
is given by Zellner, 1971), but this may be of little practical importance since
the true distribution is usually unknown and may not be lognormal.
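A small simulation illustrates why the GM estimates the lognormal median rather than the mean. It draws repeated samples with Python's random.lognormvariate (whose arguments are the mean and standard deviation of the underlying normal, so σ = √0.5 here) and compares the average GM and AM with the theoretical median exp(μ) and mean exp(μ + σ²/2). The seed, sample size, and number of replicates are arbitrary choices for illustration, and the estimator of Zellner (1971) is not implemented here.

```python
import math
import random
import statistics

MU, SIGMA2 = 0.0, 0.50    # Figure 5 parameters
N, REPLICATES = 25, 2000  # arbitrary sample size and number of repeats

rng = random.Random(1975)  # fixed seed so the run is reproducible
gms, ams = [], []
for _ in range(REPLICATES):
    sample = [rng.lognormvariate(MU, math.sqrt(SIGMA2)) for _ in range(N)]
    gms.append(statistics.geometric_mean(sample))
    ams.append(statistics.mean(sample))

print(f"average GM: {statistics.mean(gms):.3f}  vs median exp(mu)        = {math.exp(MU):.3f}")
print(f"average AM: {statistics.mean(ams):.3f}  vs mean exp(mu+sigma2/2) = {math.exp(MU + SIGMA2 / 2):.3f}")
```

The average GM tracks the median and stays well below the lognormal mean, which the AM tracks instead. The GM also tends to sit slightly above the median: since log_e GM is normal with mean μ and variance σ²/n, the expected GM is exp(μ + σ²/(2n)), which is the small bias mentioned above and which shrinks as n grows.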
