SUMMARIZATION OF DATA SETS

Measures of Central Tendency

Several thousand radiological analyses have been completed for the NAEG program over the past several years. These data contain a great deal of information relative to NAEG objectives, but much of it may not be apparent without appropriate statistical treatment of the data. Up to this point, we have used simple (arithmetic) means and standard errors to describe data sets such as the "average" ²³⁹Pu soil or vegetation concentrations in various soil activity strata of safety-shot sites (Table 14, Gilbert et al., 1975). In this section, we briefly discuss some alternative ways of summarizing a data set that give additional information to the reader.

A common characteristic of radionuclide data collected from environmental studies is the occurrence of one or more observations that are much larger in magnitude than the bulk of the data. This is illustrated in Figure 5 by comparing the symmetric normal distribution with the asymmetric lognormal distribution (Aitchison and Brown, 1969, page 9) for parameters μ = 0 and σ² = 0.50. Our primary concern here is with how one should summarize the information contained in a data set from an underlying distribution of the asymmetric type. We will be particularly interested in choosing the most appropriate estimator of the "center" of the distribution.

If the underlying distribution (from which the actual data values are drawn) is symmetric, then the mode, median, and arithmetic mean (AM) fall at the same place on the distribution. The mode is the value that occurs most frequently; the median is the value above which and below which half the values occur. For an asymmetric distribution, the mode, median, and AM do not coincide. For the lognormal distribution (Figure 5), the mean is always larger than the median, which in turn is larger than the mode.
The extent of the difference between the mean, median, and mode for the lognormal can be determined for any μ and σ² by using the formulae in Figure 5.

Another measure of the "center" of any distribution is the geometric mean (GM), computed as

$$\mathrm{GM} = \left( \prod_{i=1}^{n} X_i \right)^{1/n}$$

or equivalently

$$\mathrm{GM} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \log_e X_i \right)$$

This mean gives less weight to the larger data values than does the arithmetic mean. In fact, the GM will always be less than the AM unless all data points are equal, in which case the AM and GM will be numerically equal. The GM is sometimes referred to as the "lognormal mean." However, the GM is an estimate of the median, not the mean, of the lognormal distribution. Actually, it is a biased estimate of the median for this distribution (an unbiased estimator is given by Zellner, 1971), but this may be of little practical importance since the true distribution is usually unknown and may not be lognormal.
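
The relationships described above can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes the standard lognormal expressions mode = exp(μ − σ²), median = exp(μ), and mean = exp(μ + σ²/2), which are presumed to correspond to the formulae in Figure 5 (the figure itself is not reproduced here). The function names, the random seed, and the sample size are hypothetical choices made only for illustration.

```python
import math
import random
import statistics


def lognormal_centers(mu, sigma2):
    """Return (mode, median, mean) of a lognormal distribution.

    mu and sigma2 are the mean and variance of log_e(X). These are the
    standard textbook expressions, assumed here to match Figure 5.
    """
    mode = math.exp(mu - sigma2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma2 / 2.0)
    return mode, median, mean


def arithmetic_mean(xs):
    """Simple arithmetic mean (AM) of a sequence of numbers."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """Geometric mean (GM) as exp of the mean of the natural logs.

    All values must be strictly positive.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Parameters used for the lognormal curve in the text: mu = 0, sigma^2 = 0.50.
mode, median, mean = lognormal_centers(0.0, 0.50)
print(f"theoretical mode={mode:.4f} median={median:.4f} mean={mean:.4f}")

# Simulated data (illustrative only; seed and sample size are arbitrary).
rng = random.Random(12345)
sample = [rng.lognormvariate(0.0, math.sqrt(0.50)) for _ in range(10000)]

am = arithmetic_mean(sample)
gm = geometric_mean(sample)
sample_median = statistics.median(sample)
print(f"sample AM={am:.4f} GM={gm:.4f} median={sample_median:.4f}")
```

In a simulation of this kind, the sample GM should fall near the lognormal median exp(μ) rather than near the mean, and the AM should exceed the GM, in line with the discussion above.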
