Cramér–von Mises criterion
Statistical test
From Wikipedia, the free encyclopedia
In statistics, the Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function (CDF) $F^*$ compared to a given empirical distribution function $F_n$, or for comparing two empirical distributions. It is also used as a part of other algorithms, such as minimum distance estimation. It is defined as
$$\omega^2 = \int_{-\infty}^{\infty} \left[F_n(x) - F^*(x)\right]^2 \, \mathrm{d}F^*(x).$$
In one-sample applications $F^*$ is the theoretical distribution and $F_n$ is the empirically observed distribution. Alternatively the two distributions can both be empirically estimated ones; this is called the two-sample case.
The criterion is named after Harald Cramér and Richard Edler von Mises, who first proposed it in 1928–1930.[1][2] The generalization to two samples is due to Anderson.[3]
The Cramér–von Mises test is an alternative to the Kolmogorov–Smirnov test (1933).[4]
Cramér–von Mises test (one sample)
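Let $x_1, x_2, \ldots, x_n$ be the observed values, in increasing order. Then the test statistic is[3]: 1153 [5]
$$T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[\frac{2i-1}{2n} - F^*(x_i)\right]^2.$$
If this value is larger than the tabulated value, then the hypothesis that the data came from the distribution $F^*$ can be rejected.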
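As a minimal sketch of the computation, assuming the usual computational form $T = \frac{1}{12n} + \sum_i \left[\frac{2i-1}{2n} - F^*(x_i)\right]^2$ over the sorted sample, the statistic can be evaluated in plain Python. The function name `cvm_one_sample`, the example data, and the choice of a standard normal hypothesis are illustrative assumptions rather than part of the cited sources; the resulting value still has to be compared with a tabulated critical value.

```python
from statistics import NormalDist


def cvm_one_sample(sample, cdf):
    """Return T = n * omega^2 for `sample` against the hypothesized `cdf`."""
    xs = sorted(sample)  # the formula requires increasing order
    n = len(xs)
    total = 1.0 / (12 * n)
    for i, x in enumerate(xs, start=1):
        total += ((2 * i - 1) / (2 * n) - cdf(x)) ** 2
    return total


# Hypothetical data tested against a standard normal distribution.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
T = cvm_one_sample(data, NormalDist(0.0, 1.0).cdf)
print(f"T = {T:.4f}")
```

The statistic is smallest when the fitted CDF values sit at the midpoints $(2i-1)/(2n)$, in which case only the constant term $1/(12n)$ remains.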
Watson test
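A modified version of the Cramér–von Mises test is the Watson test,[6] which uses the statistic $U^2$, where[5]
$$U^2 = T - n\left(\bar{F} - \tfrac{1}{2}\right)^2,$$
where
$$\bar{F} = \frac{1}{n}\sum_{i=1}^{n} F^*(x_i).$$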
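The Watson statistic can be sketched in the same way, assuming the form $U^2 = T - n(\bar{F} - 1/2)^2$ with $\bar{F}$ the mean of the hypothesized CDF evaluated at the observations. The helper name `watson_u2` and the sample values are hypothetical.

```python
def watson_u2(sample, cdf):
    """Return (T, U2): the Cramér–von Mises statistic and Watson's U^2."""
    xs = sorted(sample)
    n = len(xs)
    f_vals = [cdf(x) for x in xs]  # CDF values at the ordered observations
    t_stat = 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - f) ** 2 for i, f in enumerate(f_vals, start=1)
    )
    f_bar = sum(f_vals) / n
    return t_stat, t_stat - n * (f_bar - 0.5) ** 2


# Hypothetical values in [0, 1] tested against the uniform CDF F(x) = x.
t_stat, u2 = watson_u2([0.05, 0.20, 0.35, 0.50, 0.90], lambda x: x)
print(f"T = {t_stat:.4f}, U^2 = {u2:.4f}")
```

Because the subtracted term is non-negative, $U^2$ never exceeds $T$; the two coincide when the mean CDF value is exactly $1/2$.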
Cramér–von Mises test (two samples)
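Let $x_1, x_2, \ldots, x_N$ and $y_1, y_2, \ldots, y_M$ be the observed values in the first and second sample respectively, in increasing order. Within the combined sample of size $N + M$, let $r_1, r_2, \ldots, r_N$ be the ranks of the $x$s and let $s_1, s_2, \ldots, s_M$ be the ranks of the $y$s. Anderson[3]: 1149 shows that
$$T = \frac{NM}{N+M}\,\omega^2 = \frac{U}{NM(N+M)} - \frac{4MN - 1}{6(M+N)},$$
where $U$ is defined as
$$U = N\sum_{i=1}^{N} (r_i - i)^2 + M\sum_{j=1}^{M} (s_j - j)^2.$$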
If the value of $T$ is larger than the tabulated values,[3]: 1154–1159 the hypothesis that the two samples come from the same distribution can be rejected. (Some books give critical values for $U$, which is more convenient, as it avoids the need to compute $T$ via the expression above. The conclusion will be the same.)
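The two-sample computation can be sketched as follows, assuming Anderson's rank form $U = N\sum_i (r_i - i)^2 + M\sum_j (s_j - j)^2$ and $T = \frac{U}{NM(N+M)} - \frac{4MN-1}{6(M+N)}$, and assuming all observations are distinct. The function name `cvm_two_sample` and the data are illustrative only.

```python
def cvm_two_sample(xs, ys):
    """Return (U, T) for two samples that contain no tied values."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    combined = sorted(xs + ys)
    # 1-based rank of every value in the combined sample.
    rank = {value: k for k, value in enumerate(combined, start=1)}
    if len(rank) != n + m:
        raise ValueError("tied values need a tie-handling rule such as midranks")
    u = n * sum((rank[x] - i) ** 2 for i, x in enumerate(xs, start=1))
    u += m * sum((rank[y] - j) ** 2 for j, y in enumerate(ys, start=1))
    t = u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))
    return u, t


# Hypothetical interleaved samples.
U, T = cvm_two_sample([1, 3, 5], [2, 4, 6])
print(U, T)
```

For these interleaved samples the rank differences are 0, 1, 2 for the first sample and 1, 2, 3 for the second, so $U = 3 \cdot 5 + 3 \cdot 14 = 57$ and $T = 57/54 - 35/36 = 1/12$.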
The above assumes there are no duplicates in the $x$, $y$, and $r$ sequences. So $x_i$ is unique, and its rank is $i$ in the sorted list $x_1, \ldots, x_N$. If there are duplicates, and $x_i$ through $x_j$ are a run of identical values in the sorted list, then one common approach is the midrank[7] method: assign each duplicate a "rank" of $(i + j)/2$. In the above equations, in the expressions $(r_i - i)^2$ and $(s_j - j)^2$, duplicates can modify all four variables $r_i$, $i$, $s_j$, and $j$.
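One possible reading of the midrank rule is sketched below: every run of identical values occupying sorted positions $i$ through $j$ receives the rank $(i + j)/2$, and this is applied both to the ranks in the combined sample and to the positions within each sample. The helper names `midranks` and `u_with_midranks` are hypothetical, and applying midranks to the within-sample index is an interpretation of the statement that duplicates can modify all four variables, not a procedure taken from the cited sources.

```python
def midranks(values):
    """Map each distinct value to its midrank within `values` (1-based)."""
    ordered = sorted(values)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        while end + 1 < len(ordered) and ordered[end + 1] == ordered[start]:
            end += 1
        # Positions start+1 .. end+1 hold the same value: use their average.
        ranks[ordered[start]] = (start + 1 + end + 1) / 2
        start = end + 1
    return ranks


def u_with_midranks(xs, ys):
    """U statistic with midranks standing in for ranks and indices."""
    xs, ys = list(xs), list(ys)
    combined = midranks(xs + ys)
    own_x, own_y = midranks(xs), midranks(ys)
    n, m = len(xs), len(ys)
    u = n * sum((combined[x] - own_x[x]) ** 2 for x in xs)
    u += m * sum((combined[y] - own_y[y]) ** 2 for y in ys)
    return u


print(midranks([1, 2, 2, 3]))
print(u_with_midranks([1, 2, 2], [2, 3, 4]))
```

When no values are tied, every midrank is an ordinary rank and the result coincides with the untied statistic $U$.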
Cramér distance
For two distributions on the real line with cumulative distribution functions $F$ and $G$ and finite first moment, the Cramér distance is
$$\ell_2(F, G) = \left(\int_{-\infty}^{\infty} \left[F(x) - G(x)\right]^2 \, \mathrm{d}x\right)^{1/2},$$
a metric on the space of such distributions.[8] Note that some sources define the Cramér distance as $\ell_2^2(F, G)$, but this fails the triangle inequality and so cannot be properly defined as a distance. The Cramér distance is the one-dimensional case of the energy distance via the relationship $D_E^2(F, G) = 2\,\ell_2^2(F, G)$, where $D_E$ denotes the energy distance,[9] and when $G$ represents a single observation $y$ with cumulative distribution $\mathbf{1}\{x \ge y\}$, $\ell_2^2(F, G)$ is equivalent to the continuous ranked probability score, a strictly proper scoring rule.[10]
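For two empirical distributions the integrand is piecewise constant, so the Cramér distance can be computed exactly. The sketch below does this and checks the relationship to the energy distance, assuming the standard expectation form $2\,\mathrm{E}|X - Y| - \mathrm{E}|X - X'| - \mathrm{E}|Y - Y'|$ for the squared energy distance; the function names and sample values are illustrative assumptions.

```python
from bisect import bisect_right
from math import sqrt


def cramer_distance(xs, ys):
    """Cramér distance between the empirical distributions of xs and ys."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    # Both empirical CDFs are constant on each interval [left, right).
    for left, right in zip(grid, grid[1:]):
        f = bisect_right(xs, left) / len(xs)
        g = bisect_right(ys, left) / len(ys)
        total += (f - g) ** 2 * (right - left)
    return sqrt(total)


def energy_distance_squared(xs, ys):
    """2 E|X - Y| - E|X - X'| - E|Y - Y'| under the empirical distributions."""
    def mean_abs(a, b):
        return sum(abs(p - q) for p in a for q in b) / (len(a) * len(b))
    return 2 * mean_abs(xs, ys) - mean_abs(xs, xs) - mean_abs(ys, ys)


# Hypothetical samples.
a, b = [0.0, 1.0], [0.5, 2.0]
d = cramer_distance(a, b)
print(d ** 2, energy_distance_squared(a, b))
```

Here the squared Cramér distance is $0.25 \cdot 0.5 + 0 + 0.25 \cdot 1 = 0.375$, and the squared energy distance is $2 \cdot 1 - 0.5 - 0.75 = 0.75$, twice that value.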

Under the probability integral transform (PIT) $u_i = F^*(x_i)$, the plot of the empirical distribution of the transformed values $u_i$ and the uniform distribution on $[0, 1]$ creates a PIT reliability diagram. The Cramér distance between these two distributions equals $\omega$, the square root of the criterion $\omega^2$, and serves as a numerical score of the calibration error of $F^*$. This may also be referred to as the Root Mean Square Calibration Error (RMSCE).
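A small sketch of this score, assuming a forecast CDF supplied as a Python callable: the PIT values are sorted, the squared gap between their empirical CDF and the diagonal is integrated exactly on each step, and the result is cross-checked against $\sqrt{T/n}$ from the one-sample computational formula $T = \frac{1}{12n} + \sum_i \left[\frac{2i-1}{2n} - u_{(i)}\right]^2$. The function name `rmsce`, the standard normal forecast, and the observations are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rmsce(observations, cdf):
    """Cramér distance between the PIT empirical CDF and Uniform(0, 1)."""
    u = sorted(cdf(x) for x in observations)  # PIT values
    n = len(u)
    knots = [0.0] + u + [1.0]
    total = 0.0
    # On [knots[k], knots[k + 1]) the empirical CDF equals c = k / n, and
    # the integral of (c - t)^2 over [a, b] is ((c - a)^3 - (c - b)^3) / 3.
    for k in range(n + 1):
        a, b, c = knots[k], knots[k + 1], k / n
        total += ((c - a) ** 3 - (c - b) ** 3) / 3
    return sqrt(total)


forecast = NormalDist(0.0, 1.0)  # hypothetical forecast distribution
obs = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
score = rmsce(obs, forecast.cdf)

# Cross-check against omega = sqrt(T / n).
n = len(obs)
pit = sorted(forecast.cdf(x) for x in obs)
t_stat = 1 / (12 * n) + sum(
    ((2 * i - 1) / (2 * n) - v) ** 2 for i, v in enumerate(pit, start=1)
)
print(score, sqrt(t_stat / n))
```

The two printed values agree up to floating-point rounding, which illustrates that the score is the square root of the criterion rather than a separate quantity.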
For a deterministic (point) forecast at $\hat{x}$, the PIT degenerates to a Bernoulli random variable on $\{0, 1\}$ with success probability $p = \Pr(X \ge \hat{x})$, so in the population limit the Cramér distance between the PIT CDF and the uniform distribution evaluates in closed form to
$$\sqrt{\tfrac{1}{3} - p(1 - p)}.$$
This quantity is minimized at $p = 1/2$ (the unbiased case) with value $1/\sqrt{12} \approx 0.289$, establishing a calibration-error floor that no point forecast can fall below regardless of how accurate its central value is. In contrast, a well-calibrated probabilistic forecast can approach 0. Similarly, this quantity is maximized at the bias extremes $p \in \{0, 1\}$ with value $1/\sqrt{3} \approx 0.577$.
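The closed form can be checked numerically. The sketch below assumes the PIT of a point forecast places mass $1 - p$ at 0 and mass $p$ at 1, so that its CDF equals $1 - p$ on $[0, 1)$, and compares $\sqrt{1/3 - p(1 - p)}$ with a midpoint-rule integral of the squared gap to the uniform CDF. Both function names are hypothetical.

```python
from math import sqrt


def point_forecast_rmsce(p):
    """Closed form sqrt(1/3 - p(1 - p)) for PIT mass 1 - p at 0 and p at 1."""
    return sqrt(1.0 / 3.0 - p * (1.0 - p))


def point_forecast_rmsce_numeric(p, steps=100_000):
    """Midpoint-rule integral of (G(u) - u)^2 with G(u) = 1 - p on [0, 1)."""
    h = 1.0 / steps
    total = sum(((1.0 - p) - (k + 0.5) * h) ** 2 for k in range(steps)) * h
    return sqrt(total)


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, point_forecast_rmsce(p), point_forecast_rmsce_numeric(p))
```

Since $p(1 - p)$ peaks at $p = 1/2$ and vanishes at $p = 0$ and $p = 1$, the printed values range between $1/\sqrt{12}$ and $1/\sqrt{3}$.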