Cover Story

F-Test: The Pitfalls and Alternatives

Dear Sir or Madam,

In today's newsletter, we will focus on the F-Test for lack of fit (sometimes called Non-linearity). The strengths and weaknesses of this test have been discussed for quite some time in the bioassay community, so we'll begin with a recap of the general principle of the test.

F-Test: How does it work?

The test tries to answer a simple question: Is the chosen model suitable for evaluating the data at hand? As a bare minimum, the deviation of the fitted model from the data should not be significantly larger than the natural variability of the observations. To verify this, the test calculates two kinds of deviation for each dose: the deviation of the mean observation from the value the fitted model predicts for that dose (the lack of fit), and the deviation of the individual replicate observations from that mean (the pure error). The squared deviations are summed and corrected based on the assay setup, mostly by correcting for the number of dose steps and the number of replicate measurements, and the ratio of lack of fit to pure error is computed. This ratio is compared against a critical value to determine whether the test passes or fails.
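The calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the classical lack-of-fit F statistic, not the implementation used in PLA. The function name, the example data, and the straight-line predictions are assumptions made up for this sketch.

```python
from statistics import fmean


def lack_of_fit_f(replicates, predicted, n_params):
    """Return (F, df_lof, df_pe) for the classical lack-of-fit F-test.

    replicates: one list of replicate observations per dose step
    predicted:  value predicted by the fitted model for each dose step
    n_params:   number of parameters of the fitted model
    """
    n_doses = len(replicates)
    n_total = sum(len(r) for r in replicates)
    means = [fmean(r) for r in replicates]

    # Lack of fit: dose means vs. model predictions, weighted by replicate count.
    ss_lof = sum(len(r) * (mu - p) ** 2
                 for r, mu, p in zip(replicates, means, predicted))
    # Pure error: individual replicates vs. their own dose mean.
    ss_pe = sum((y - mu) ** 2
                for r, mu in zip(replicates, means) for y in r)

    df_lof = n_doses - n_params   # degrees of freedom for lack of fit
    df_pe = n_total - n_doses     # degrees of freedom for pure error
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe


# Hypothetical data: 4 dose steps, 2 replicates each.
replicates = [[1.0, 3.0], [3.0, 5.0], [7.0, 9.0], [9.0, 11.0]]
# Predictions of a least-squares straight line through the dose means
# (doses 1..4, slope 2.8, intercept -1), so the model has 2 parameters.
predicted = [1.8, 4.6, 7.4, 10.2]

f_stat, df_lof, df_pe = lack_of_fit_f(replicates, predicted, n_params=2)
print(f"F = {f_stat:.3f} with ({df_lof}, {df_pe}) degrees of freedom")
```

The resulting value would then be compared against the critical value of the F distribution with the two degrees of freedom returned by the function.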

F-Test: The pitfalls

The F-Test fails if the deviations between the predicted values and the mean observations become too large relative to the replicate variability. Since any model is just an approximation of reality, it would be unreasonable to expect all the mean observations to lie exactly on the model curve. If the variability of replicate measurements is moderate, the denominator of the test statistic is large enough to absorb this effect, and the test performs as desired.

However, due to advances in assay design and execution, modern assays often exhibit very low replicate variability. These advances in measurement precision aim to improve assay precision and to reduce the uncertainty of the result, but they also reduce the denominator of the F-Test. For these assays, where the denominator of the test statistic is very small, the test may become overly sensitive and start failing assays that show no apparent flaws.

The following example illustrates this point. First, let's have a look at a sample with high variability in the replicate measurements. Judging from the plot, the sample looks acceptable. Since the predictive error is low in comparison with the high replicate variability, the sample passes the F-Test.

Now, let's take a look at what happens as we increase measurement precision while keeping the mean response for each dose constant. The measurements contract toward the mean and differences between model curve and mean response become more apparent. Although the curves still seem to be acceptable from a visual inspection, the F-Test starts failing samples.
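The effect can be reproduced numerically. The following sketch uses made-up data and a hypothetical helper `shrink` that contracts each replicate toward its dose mean. The dose means, and therefore the model predictions, stay the same, so only the denominator of the test statistic changes.

```python
from statistics import fmean


def f_statistic(replicates, predicted, n_params):
    """Lack-of-fit F statistic: (SS_lof / df_lof) / (SS_pe / df_pe)."""
    means = [fmean(r) for r in replicates]
    ss_lof = sum(len(r) * (mu - p) ** 2
                 for r, mu, p in zip(replicates, means, predicted))
    ss_pe = sum((y - mu) ** 2
                for r, mu in zip(replicates, means) for y in r)
    df_lof = len(replicates) - n_params
    df_pe = sum(len(r) for r in replicates) - len(replicates)
    return (ss_lof / df_lof) / (ss_pe / df_pe)


def shrink(replicates, factor):
    """Contract every replicate toward its dose mean by `factor` (0 < factor <= 1)."""
    shrunk = []
    for r in replicates:
        mu = fmean(r)
        shrunk.append([mu + factor * (y - mu) for y in r])
    return shrunk


# Hypothetical data: 4 dose steps, 2 replicates each, straight-line model
# (2 parameters) fitted to the dose means.
replicates = [[1.0, 3.0], [3.0, 5.0], [7.0, 9.0], [9.0, 11.0]]
predicted = [1.8, 4.6, 7.4, 10.2]

factors = (1.0, 0.5, 0.1)
f_values = [f_statistic(shrink(replicates, k), predicted, n_params=2)
            for k in factors]
for k, f in zip(factors, f_values):
    print(f"replicate spread x {k}: F = {f:.1f}")
```

Because the pure-error sum of squares scales with the square of the spread, reducing the spread to one tenth multiplies F by 100, here from 0.4 to 40. With (2, 4) degrees of freedom the 5% critical value is about 6.94, so the most precise data set in this sketch would fail the test even though the curve and the dose means are unchanged.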

Alternatives to the F-Test

As we've seen, the principal problem of the test stems from the small denominator of the test statistic. There have been some attempts to remedy this weakness by omitting or modifying the denominator, for example, the test on Sum-of-Squares Non-linearity or the test on relative lack-of-fit. While these values can be used to characterize an assay system, no probability distributions are known for these test statistics yet. This makes these tests more laborious to use, since the acceptance margins have to be determined on an assay-by-assay basis. All in all, it seems that no silver bullet for this problem has been found to date.

What is your favorite way to test for lack-of-fit? You can always contact us with suggestions for new features or tests that you'd like to use in PLA. We're looking forward to hearing from you.

Best regards
Mathias von Gellhorn
Marketing Manager

Stegmann Systems GmbH, Raiffeisenstr. 2, 63110 Rodgau, Germany
Phone: +49 (6106) 77010-0, Fax: +49 (6106) 77010-190
