The likelihood function was maximized to find the associated parameter values. The particular maximization technique used on this data was the "Simplex" algorithm; there are currently a number of other good algorithms available.

The computations outlined above are performed for a sequence of successively more complex statistical models of mixtures of normal distributions, starting with a single normal distribution, then a mixture of two normal distributions with the same mean, then two normal distributions with different means, and so on. The choice of the most complicated model to try is made from the cumulative distribution plot, such as shown in Figure 1. In our data a maximum of four distributions was used.

Figure 2. Mathematical Expression of a Mixture of Normal Distributions. N(μ,σ) denotes the normal density (1/(σ√(2π))) exp(−(x−μ)²/(2σ²)).

These models were compared using the likelihood ratio test; the details of this test are also found in elementary mathematical statistics texts. Note that, with this test, we cannot decide whether any one model is a good fit to the data or not; we can only decide the relative merits of the models used. The successively more complex models were pairwise compared using the likelihood ratio principle until no statistically significant improvement in the fit to the data was found.

A model consisting of a mixture of three normal distributions was found to best represent our example data; these distributions are plotted as the black curves in Figure 3. The broken-line curve was calculated from the sample mean and standard deviation of all the data, assuming a single normal distribution. The relative size of the black curves, that is, the area under each curve, is drawn here to represent the proportion of the data in each of the component distributions. The broken-line curve is simply drawn a convenient size.
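The fitting procedure described above can be sketched in Python. This is a minimal sketch, not the paper's actual program: the 37 raw data values are not reproduced in the text, so the data below are synthetic stand-ins with the same three-group structure, and SciPy's Nelder-Mead method stands in for the "Simplex" algorithm the text mentions. The chi-square reference for the likelihood ratio statistic is only a rough approximation for mixture models, since the usual regularity conditions fail at the boundary.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
# Hypothetical stand-in data resembling the paper's three-group structure.
data = np.concatenate([
    rng.normal(390, 195, 8),
    rng.normal(394, 11, 8),
    rng.normal(454, 13, 21),
])

def neg_log_lik(params, x, k):
    """Negative log-likelihood of a k-component normal mixture.

    params layout: k-1 free weights, then k means, then k log-sigmas
    (log-sigmas keep the standard deviations positive).
    """
    w = np.append(params[:k - 1], 1.0 - params[:k - 1].sum())
    if np.any(w <= 0):
        return np.inf                      # invalid mixing proportions
    mu = params[k - 1:2 * k - 1]
    sig = np.exp(params[2 * k - 1:])
    dens = sum(wi * stats.norm.pdf(x, mi, si)
               for wi, mi, si in zip(w, mu, sig))
    return -np.sum(np.log(np.maximum(dens, 1e-300)))

def fit_mixture(x, k, start):
    """Maximize the likelihood with the Nelder-Mead ("Simplex") method."""
    return optimize.minimize(neg_log_lik, np.asarray(start, float),
                             args=(x, k), method="Nelder-Mead",
                             options={"maxiter": 20000, "fatol": 1e-8})

# One normal vs. a two-component mixture, compared by likelihood ratio.
one = fit_mixture(data, 1, [data.mean(), np.log(data.std())])
two = fit_mixture(data, 2, [0.5, 400.0, 450.0, np.log(60.0), np.log(15.0)])
lr_stat = 2.0 * (one.fun - two.fun)        # likelihood ratio statistic
p_value = stats.chi2.sf(lr_stat, df=3)     # df = extra parameters (approximate)
print(f"LR statistic = {lr_stat:.1f}, approximate p = {p_value:.4f}")
```

In practice one would continue this pairwise comparison with three- and four-component models, exactly as the text describes, stopping when the improvement is no longer significant.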
Specifically, this data set is best modeled by 21 percent of the data having a mean of 390 and a standard deviation of 195 (the wide curve at the bottom), 22 percent of the data having a mean of 394 and a standard deviation of 11 (the lower middle curve), and 57 percent of the data having a mean of 454 and a standard deviation of 13 (the tallest curve). In the notation of Figure 2, f(x) = p₁·N(μ₁,σ₁) + p₂·N(μ₂,σ₂) + (1−p₁−p₂)·N(μ₃,σ₃), here with p₁ = 0.21, p₂ = 0.22, means of 390, 394, and 454, and standard deviations of 195, 11, and 13.

Some additional information is of interest. The total sample size was 37. The "known value" of the material sent to the laboratories was 452, essentially the same as the 454 mean of the tallest curve; thus we suspect that this component curve represents the "good" laboratories. The average of all the data, represented by the broken-line curve, was 440, which, if used to characterize the data set, suggests a bias from the known value--a bias that disappears if we use the component curves.

Consider the range of concentrations defining the 95-percent area of the "good," or tallest, curve. Eleven percent of the wide distribution is within this good range. Since the wide group represents 21 percent of the data, about 2 percent (21% × 11%) of the data values would be misclassified as good when in fact the values are from poorly performing laboratories that just happened to hit it right this time. Less than one percent of the lower middle curve is actually within this 95-percent interval of the good curve. One could, of course, repeat this exercise with any acceptance criteria one wished. We suspect the lower middle group represents a group of laboratories with good
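The overlap arithmetic in this passage can be checked directly from the quoted component parameters. The sketch below uses SciPy's normal distribution with the means, standard deviations, and proportions stated above; it reproduces roughly the quoted figures (the wide-curve overlap comes out near 10 percent rather than 11, presumably because the paper's exact 95-percent convention differs slightly).

```python
from scipy import stats

# Fitted components quoted in the text
good = stats.norm(454, 13)     # tallest, "good" curve (57% of the data)
wide = stats.norm(390, 195)    # wide curve (21% of the data)
lowmid = stats.norm(394, 11)   # lower middle curve (22% of the data)

# Central 95-percent interval of the good curve
lo, hi = good.ppf(0.025), good.ppf(0.975)

# Fraction of the wide distribution falling inside the good range,
# and the resulting share of all data misclassified as "good".
frac_wide = wide.cdf(hi) - wide.cdf(lo)
misclassified = 0.21 * frac_wide

# Fraction of the lower middle curve inside the same interval
frac_lowmid = lowmid.cdf(hi) - lowmid.cdf(lo)

print(f"good range: [{lo:.0f}, {hi:.0f}]")
print(f"wide curve inside range: {frac_wide:.1%}, "
      f"overall misclassified: {misclassified:.1%}")
print(f"lower middle curve inside range: {frac_lowmid:.2%}")
```

Substituting a different acceptance criterion (say, a 90- or 99-percent interval) only changes the `ppf` arguments, which is the sense in which the exercise can be repeated with any criteria one wishes.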