DISCUSSION
We examined our own radioisotope quality control data to test the
assumption of normality, and we found that most of our data do not
follow a normal distribution, even when the common transformations of
data, such as the logarithm and square root, are used. Our data seem to
be best described statistically as a heterogeneous normal distribution,
also known as a mixture of normal distributions.
The following example of cesium-134 is typical of our data. I will
present the example in the sequence of the data analysis steps. The first
step is to plot the cumulative distribution of the data on normal
probability paper. The plot of the example data is shown in Figure 1.
It is obvious that the data do not fall on one straight line, the
requisite for a single normal distribution to describe the data. Note
that the slopes of the top and bottom segments are the same, which raises
the possibility that they are two pieces of one distribution. The center
portion of the curve could be composed of one or two line segments.
These questions about the components of the curve must be resolved by
statistical test. The several possible structures were programmed as
statistical functions and compared using the maximum likelihood ratio
test.
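The plotting step can be sketched in a few lines of Python. This is only an illustration: the function name, the sample values, and the plotting position (i - 0.5)/n are assumptions, since the text does not say which plotting position was used on the probability paper. Data from a single normal distribution would give points that fall close to one straight line when the data value is plotted against the normal score.

```python
from statistics import NormalDist


def normal_probability_points(data):
    """Return (value, cumulative percent, normal score) for each sorted datum."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position (i - 0.5) / n; other conventions exist.
        prob = (i - 0.5) / n
        # The normal score is the position on the probability-paper scale.
        points.append((value, 100.0 * prob, std_normal.inv_cdf(prob)))
    return points


# Made-up values for illustration only; these are not the cesium-134 data.
sample = [312.0, 355.0, 298.0, 401.0, 367.0, 340.0, 420.0, 333.0]
for value, percent, score in normal_probability_points(sample):
    print(f"{value:7.1f}  {percent:5.1f}%  {score:+.3f}")
```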
In more detail, these steps are as follows.
We start with a statistical model of the data distribution. The model
for a mixture of three normal distributions is shown in Figure 2. The
parameters denoted as "p" determine the proportion of the data
"explained" by each of the component normal distributions. Since the
total of all proportions must equal 100 percent of the data, two of these
proportions are sufficient to define all three components. The
extensions of this formula to two components, or more than three, should
be obvious. A three-component model has eight parameters to be
determined from the data: two proportions, three mean values, and three
standard deviations.
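Figure 2 is not reproduced in this section. For orientation, the standard way to write a three-component normal mixture is given below; the notation is an assumption and may differ from the figure.

```latex
f(x) = p_1\,\phi(x;\mu_1,\sigma_1) + p_2\,\phi(x;\mu_2,\sigma_2)
       + (1 - p_1 - p_2)\,\phi(x;\mu_3,\sigma_3),
\qquad
\phi(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Written this way, the parameters to be estimated are p_1, p_2, the three means, and the three standard deviations.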
A not so obvious extension is to restrict mean values or standard
deviations to be equal. For example, in Figure 2 we might replace the
μ3 with a μ2, so that μ2 appears twice in the formula, giving a
seven-parameter model. In about one-fifth of our data sets, though not
in the example presented here, this restriction to equal mean values was
statistically the best model.
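A comparison between the eight-parameter model and the seven-parameter model with equal means can be sketched as a likelihood ratio test. The sketch below assumes the usual large-sample chi-square reference distribution with one degree of freedom (one restriction); the text does not state which reference distribution was used, and the log-likelihood values shown are hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_restricted):
    """Likelihood ratio statistic and approximate p-value for one restriction.

    loglik_full:       maximized log-likelihood of the larger model
    loglik_restricted: maximized log-likelihood of the nested, restricted model
    """
    # The restricted model cannot fit better than the full model, so a
    # negative value can only come from numerical error; clamp it at zero.
    statistic = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # Upper tail of a chi-square variable with 1 degree of freedom:
    # P(Z**2 > s) = erfc(sqrt(s / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods, for illustration only.
statistic, p_value = likelihood_ratio_test_1df(-512.4, -513.1)
print(f"statistic = {statistic:.2f}, approximate p = {p_value:.3f}")
```

A large p-value means the restriction costs little in fit, so the simpler model would be preferred. Comparisons between models with different numbers of components do not satisfy the usual conditions for this approximation and need more care.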
The next step is to combine the model with the observed data using the
likelihood function.
This step is explained in detail in most elementary mathematical statistics textbooks.
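As a minimal sketch of this step, the log of the likelihood function for a mixture of normal distributions is the sum, over the observations, of the log of the mixture density. The function and argument names below are illustrative assumptions, not the authors' program.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(data, proportions, means, sigmas):
    """Log-likelihood of the data under a mixture of normal distributions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    total = 0.0
    for x in data:
        # Mixture density at x: proportion-weighted sum of component densities.
        density = sum(
            p * normal_pdf(x, m, s)
            for p, m, s in zip(proportions, means, sigmas)
        )
        total += math.log(density)
    return total
```

Working with the logarithm avoids multiplying many small densities together; the parameter values that maximize the log-likelihood also maximize the likelihood.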
After we have the likelihood function, we wish to choose those parameter
values (in our case the "p's," "μ's," and "σ's") that maximize the
likelihood function; this is the statistical principle called maximum
likelihood estimation.
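To make the idea concrete, the sketch below finds maximum likelihood estimates for a normal mixture with the EM (expectation-maximization) algorithm. EM is a substitute chosen here because it is short and needs only the Python standard library; it is not the general-purpose maximization approach the text refers to. The function names, starting values, and synthetic data are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture_em(data, proportions, means, sigmas, iterations=200):
    """Estimate mixture parameters by EM, starting from the given values."""
    proportions, means, sigmas = list(proportions), list(means), list(sigmas)
    k, n = len(means), len(data)
    for _ in range(iterations):
        # E-step: probability that each observation came from each component.
        resp = []
        for x in data:
            weights = [proportions[j] * normal_pdf(x, means[j], sigmas[j])
                       for j in range(k)]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: weighted proportion, mean, and standard deviation updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            proportions[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Floor keeps a component from collapsing onto a single point.
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return proportions, means, sigmas


# Synthetic, made-up data: evenly spaced quantiles of two normal distributions.
low = [NormalDist(300.0, 10.0).inv_cdf((i + 0.5) / 60) for i in range(60)]
high = [NormalDist(400.0, 10.0).inv_cdf((i + 0.5) / 40) for i in range(40)]
proportions, means, sigmas = fit_mixture_em(
    low + high, [0.5, 0.5], [320.0, 380.0], [30.0, 30.0]
)
print(proportions, means, sigmas)
```

Like any iterative maximizer, EM can stop at a local maximum, so the starting values matter; reading rough means and slopes off the probability plot is one way to pick them.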
The discipline of computer sciences has recently
provided some generalized functional maximization programs which can act
Figure 1. Cumulative Normal Probability Plot (cesium-134, 12-20-74). Horizontal axis: data values; vertical axis: % probability.