We are faced then with deciding which estimator to use for asymmetric
distributions. Our choice should depend on the objective of the study and the
use to be made of the estimator. Does one want to estimate where most of the
data in a data set lie, or is it important to give extra weight to the extreme
values for the purpose at hand? In Figure 6, the AM clearly overestimates where
the bulk of the data lies. One could argue, however, that when working with a
potentially harmful substance such as 239Pu in the environment, it may be
preferable to be conservative in the sense that we tend to overestimate rather
than underestimate average soil concentrations.
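
To make the contrast concrete, the following is a minimal Python sketch that
computes the AM and the median for the 47 concentrations listed later in this
section (the values the text says are plotted in Figure 6). The names DATA and
summarize are illustrative and do not come from the original report, and the
value 305.0 is used exactly as printed.

```python
import statistics

# Soil concentrations exactly as printed in the text (47 values).
DATA = [
    8.2, 9.4, 12.8, 1.7, 7.9, 8.9, 6.7, 10.3, 21.3, 11.3,
    0.8, 9.0, 3.5, 0.5, 10.7, 2.0, 7.6, 4.4, 3.1, 11.3,
    2.4, 3.6, 16.4, 4.8, 5.6, 4.4, 5.8, 305.0, 6.8, 8.7,
    18.2, 3.0, 3.4, 5.6, 11.2, 11.0, 2.5, 21.0, 3.6, 20.0,
    1.9, 14.3, 7.9, 9.2, 5.9, 2.6, 10.2,
]


def summarize(values):
    """Return (arithmetic mean, median) for a sequence of numbers."""
    return statistics.mean(values), statistics.median(values)


if __name__ == "__main__":
    am, med = summarize(DATA)
    # Count how many observations fall below the arithmetic mean.
    below = sum(v < am for v in DATA)
    print(f"AM = {am:.2f}, median = {med:.1f}")
    print(f"{below} of {len(DATA)} values lie below the AM")
```

Reporting both numbers side by side shows how far the extreme values pull the
AM away from the bulk of the data, while the median stays with the bulk.
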
Stem-and-Leaf Displays

Probably the best approach when working with asymmetric distributions is to
compute more than one estimate of the "average" of the data set. Even this,
however, will not convey much information about the scatter or variability
present in the data. One method for obtaining such information is to plot the
data in histogram form as was done in Figure 6. A preferred method, however,
is called a "stem-and-leaf" display (Tukey, 1972). This gives all the
information of a histogram in addition to retaining the actual numerical
values, which makes it a simple matter to find the median. The construction of
a stem-and-leaf display is illustrated using the following 239Pu concentrations
in soil that are displayed in histogram form in Figure 6:

     8.2     0.8     2.4    18.2     1.9
     9.4     9.0     3.6     3.0    14.3
    12.8     3.5    16.4     3.4     7.9
     1.7     0.5     4.8     5.6     9.2
     7.9    10.7     5.6    11.2     5.9
     8.9     2.0     4.4    11.0     2.6
     6.7     7.6     5.8     2.5    10.2
    10.3     4.4   305.0    21.0
    21.3     3.1     6.8     3.6
    11.3    11.3     8.7    20.0

The first step is to select a "stem" which corresponds to the intervals of a
histogram. For the above data set, units of 10^0 (ones) appear to be a
reasonable choice. The stem appears as in column (a) in Table 2.
The "leaf" of the display is the next digit of the number, illustrated in
column (b) of Table 2 for the first 5 numbers in column 1 of the above data
set. Doing this for all 47 numbers gives column (c) in the table. Note that two
data with the same stem value appear in the same row, e.g., 21.3 and 21.0. By
reordering the leaf values from smallest to largest for each stem, and by
adding a depth column, we obtain the final stem-and-leaf display given in
column (d). Note that the "leaf" part is just a histogram, but each "bar" of
the histogram now contains the actual numerical values of the data. The "depth"
column is constructed by counting the number of observations starting at both
ends. Thus, the entry at position 7 on the stem contains the median (central
observation).
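
The construction just described can be sketched in Python as follows. This is
an illustration under stated assumptions, not the procedure used to produce
Table 2. The stem is taken to be the integer part of each value and the leaf
the tenths digit, which is consistent with the median falling at stem 7. The
median row shows its own leaf count in parentheses. The hi_cutoff parameter,
which lists stragglers such as the printed 305.0 on a separate HI row instead
of stretching the stem, is a hypothetical convenience. The names DATA and
stem_and_leaf are likewise illustrative.

```python
from collections import defaultdict

# Soil concentrations exactly as printed in the text (47 values).
DATA = [
    8.2, 9.4, 12.8, 1.7, 7.9, 8.9, 6.7, 10.3, 21.3, 11.3,
    0.8, 9.0, 3.5, 0.5, 10.7, 2.0, 7.6, 4.4, 3.1, 11.3,
    2.4, 3.6, 16.4, 4.8, 5.6, 4.4, 5.8, 305.0, 6.8, 8.7,
    18.2, 3.0, 3.4, 5.6, 11.2, 11.0, 2.5, 21.0, 3.6, 20.0,
    1.9, 14.3, 7.9, 9.2, 5.9, 2.6, 10.2,
]


def stem_and_leaf(values, hi_cutoff=None):
    """Return (depth, stem, leaves) rows for a stem-and-leaf display.

    Assumes one decimal place: stem = integer part, leaf = tenths digit.
    hi_cutoff (hypothetical): values at or above it go on a final "HI" row.
    """
    n = len(values)
    high = sorted(v for v in values if hi_cutoff is not None and v >= hi_cutoff)
    by_stem = defaultdict(list)
    for v in values:
        if hi_cutoff is not None and v >= hi_cutoff:
            continue
        # Work in whole tenths to avoid float truncation: 21.3 -> 213 -> (21, 3).
        stem, leaf = divmod(round(v * 10), 10)
        by_stem[stem].append(leaf)

    # Ranks of the middle observation(s); both equal (n + 1) / 2 when n is odd.
    lo_rank, hi_rank = (n + 1) // 2, n // 2 + 1
    rows, seen = [], 0
    for stem in range(min(by_stem), max(by_stem) + 1):
        leaves = sorted(by_stem.get(stem, []))  # empty stems keep their row
        before, seen = seen, seen + len(leaves)
        if seen < hi_rank:
            depth = str(seen)           # counted from the low end
        elif before >= lo_rank:
            depth = str(n - before)     # counted from the high end
        else:
            depth = f"({len(leaves)})"  # row that holds the median
        rows.append((depth, stem, leaves))
    if high:
        rows.append((str(len(high)), "HI", high))
    return rows


if __name__ == "__main__":
    for depth, stem, leaves in stem_and_leaf(DATA, hi_cutoff=100.0):
        shown = " ".join(str(x) for x in leaves)
        print(f"{depth:>5} {stem:>3} | {shown}")
```

Empty stems are kept as blank rows so that the leaf column still reads as a
histogram. The row whose depth is shown in parentheses is the one containing
the median, and the median itself can be read off by counting leaves in that
row up to rank (n + 1) / 2.
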
