# Recommended means of analysis of residual errors

## A. Testing the significance of a random error rate on one control zone

For an individual error rate (i.e. the error rate of a single control zone or the error rate of the whole sample), the statistical test is rather simple. It is actually similar to the test that is performed for the LPIS Quality Element 1b within the LPIS QA exercise.

First, the standard deviation $S_R$ of the error rate R must be estimated. This estimate depends on the type of sample (random or risk-based) and on the group of beneficiaries that the sample represents (one control zone or the whole population of the MS).

If the error rate R is computed from a random sample, one must use the following procedure:

- For each individual dossier in the sample, compute the difference $q_i = E_i - R \cdot C_i$, where $C_i$ is the claimed area and $E_i$ is the area not found;
- Compute $S_q$ as the sample standard deviation of the vector of $q_i$;
- $S_R$ is then computed as:

where $n$ is the number of dossiers in the sample, $TotC$ is the total claimed area over the considered zone (i.e. either the control zone or the full MS territory) and $N$ is the total number of dossiers over the considered zone (again, either the control zone or the full MS territory).
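The steps above can be sketched in Python. The closed form used for $S_R$ is an assumption: it is the usual ratio-estimator standard deviation with a finite population correction, $S_R = \frac{N}{TotC} \, S_q \, \sqrt{(1 - n/N)/n}$, chosen because it uses exactly the symbols $n$, $TotC$ and $N$ defined here. The function and argument names are hypothetical.

```python
import math
import statistics


def std_dev_error_rate(not_found, claimed, rate, total_claimed, n_population):
    """Estimate S_R for a random sample (assumed ratio-estimator form)."""
    n = len(claimed)
    # q_i = E_i - R * C_i for every dossier in the sample
    q = [e - rate * c for e, c in zip(not_found, claimed)]
    # S_q: sample standard deviation of the q_i (n - 1 denominator)
    s_q = statistics.stdev(q)
    # Assumption: S_R = (N / TotC) * S_q * sqrt((1 - n/N) / n)
    fpc = 1.0 - n / n_population
    return (n_population / total_claimed) * s_q * math.sqrt(fpc / n)


# Illustrative (made-up) sample of four dossiers of 10 ha each
not_found = [0.0, 0.2, 0.0, 0.1]
claimed = [10.0, 10.0, 10.0, 10.0]
rate = sum(not_found) / sum(claimed)  # assumed definition of R for this example
s_r = std_dev_error_rate(not_found, claimed, rate, total_claimed=1000.0, n_population=100)
```

The error rate is passed in as an argument so that the same function can be reused whatever definition of R applies to the zone.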

If the error rate R is computed from a risk-based sample, the procedure for estimating the standard deviation of the error rate is more complex.

Once the standard deviation $S_R$ is estimated, one must evaluate the upper limit of error (ULE) with:

where $t_{n-1,0.95}$ is the 95% quantile of the Student distribution with $n-1$ degrees of freedom.
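As a minimal sketch, the upper limit can be evaluated as follows, assuming the one-sided form $ULE = R + t_{n-1,0.95} \cdot S_R$, which is inferred from the definitions in this section. The function name is hypothetical, and the quantile is supplied by the caller (from a table, or for instance from `scipy.stats.t.ppf(0.95, n - 1)` if SciPy is available).

```python
def upper_limit_of_error(rate, s_r, t_quantile):
    """Return the ULE under the assumed form ULE = R + t * S_R."""
    return rate + t_quantile * s_r


# Illustrative values: n = 30 dossiers, so t_{29,0.95} is about 1.699
ule = upper_limit_of_error(rate=0.012, s_r=0.003, t_quantile=1.699)
below_limit = ule < 0.02  # comparison with the 2% limit mentioned in the text
```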

Finally, the conclusion of the test "Is the error rate R smaller than the 2% limit?" depends on which of the following cases is verified:


## B. Comparing random error rates from different control zones

For the comparison of two error rates from independent samples (e.g. error rates from two different control zones, risk versus random samples), the preparation is the same. Only the test differs, as it tries to address questions such as "Is the random error rate A larger than the random error rate B?" or "Are both random error rates equal?".

The random error rates $R_A$ and $R_B$ and their respective standard deviations $S_A$ and $S_B$ must be evaluated individually as described previously, according to the type of sample.

The statistic of the test will be computed as:

Then, according to the test that is considered, the conclusion of the test depends on which of the following cases is verified:

where the value m is computed as:

$t_{m,0.95}$ and $t_{m,0.975}$ are respectively the 95% and 97.5% quantiles of the Student distribution with $m$ degrees of freedom, and $n_A$ and $n_B$ are the sizes of samples A and B respectively.
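The computation can be sketched as below. Both formulas are assumptions: the statistic is taken as $T = (R_A - R_B) / \sqrt{S_A^2 + S_B^2}$ and $m$ as the Welch-Satterthwaite approximation $m = (S_A^2 + S_B^2)^2 / \left( S_A^4/(n_A - 1) + S_B^4/(n_B - 1) \right)$, which is a common choice for independent samples with unequal variances and matches the symbols used here. The function name is hypothetical.

```python
import math


def compare_independent_rates(r_a, s_a, n_a, r_b, s_b, n_b):
    """Return (T, m) under the assumed Welch-type test."""
    var_sum = s_a ** 2 + s_b ** 2
    # Assumed statistic: difference of rates over its standard deviation
    t_stat = (r_a - r_b) / math.sqrt(var_sum)
    # Assumed Welch-Satterthwaite degrees of freedom
    m = var_sum ** 2 / (s_a ** 4 / (n_a - 1) + s_b ** 4 / (n_b - 1))
    return t_stat, m


# Illustrative values for two control zones
t_stat, m = compare_independent_rates(0.03, 0.004, 50, 0.02, 0.003, 50)
```

The statistic is then compared with $t_{m,0.95}$ for a one-sided question or with $t_{m,0.975}$ for a two-sided one; since $m$ is generally not an integer, the quantile is read for the nearest degrees of freedom or computed numerically.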

## C. Comparing two error rates: paired case

For the comparison of two error rates from dependent samples (e.g. in case of reprocessed dossiers during a QC or for testing the equivalence of CwRS and classical checks), the procedure is slightly different. Instead of working on the error rates separately, one must keep the original structure of the sample, i.e. the sample consists of pairs of observed area-not-found values (e.g. a CAPI measurement and a field measurement) on the same dossier.

First, the standard deviation $S_{RDif}$ of the difference of the error rates $R_{Dif} = R_1 - R_2$ must be estimated. This estimation depends on the type of sample, i.e. random or from the risk analysis.

If the difference $R_{Dif}$ is computed from a random sample, one must use the following procedure:

- For each individual dossier, compute the difference $q_i = (E_{1,i} - E_{2,i}) - (R_1 - R_2) \cdot C_i$, where $C_i$ is the claimed area, $E_{1,i}$ is the area not found using the first method and $E_{2,i}$ is the area not found using the second method;
- Compute $S_q$ as the sample standard deviation of the vector of $q_i$;
- $S_{RDif}$ is then computed as:

where $n$ is the number of dossiers in the sample, $TotC$ is the total claimed area over the considered zone (i.e. either the control zone or the full MS territory) and $N$ is the total number of dossiers over the considered zone (again, either the control zone or the full MS territory).
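A minimal sketch of the paired procedure follows. As an assumption, $S_{RDif}$ is given the same ratio-estimator form as a single error rate, $S_{RDif} = \frac{N}{TotC} \, S_q \, \sqrt{(1 - n/N)/n}$, applied to the paired $q_i$. The function and argument names are hypothetical.

```python
import math
import statistics


def std_dev_rate_difference(not_found_1, not_found_2, claimed,
                            r1, r2, total_claimed, n_population):
    """Estimate S_RDif from paired observations on the same dossiers."""
    n = len(claimed)
    # q_i = (E_1i - E_2i) - (R_1 - R_2) * C_i, keeping the pairing by dossier
    q = [(e1 - e2) - (r1 - r2) * c
         for e1, e2, c in zip(not_found_1, not_found_2, claimed)]
    s_q = statistics.stdev(q)  # sample standard deviation (n - 1 denominator)
    # Assumption: same closed form as for a single random error rate
    return (n_population / total_claimed) * s_q * math.sqrt((1.0 - n / n_population) / n)


# Illustrative (made-up) paired measurements on four dossiers of 10 ha each
e_first = [0.0, 0.2, 0.0, 0.1]
e_second = [0.0, 0.1, 0.1, 0.1]
claimed = [10.0, 10.0, 10.0, 10.0]
s_rdif = std_dev_rate_difference(e_first, e_second, claimed,
                                 r1=0.0075, r2=0.0075,
                                 total_claimed=1000.0, n_population=100)
```

Working on the per-dossier differences is what lets the correlation between the two measurements reduce the variance, which is why the pairs must not be split into two separate samples.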

Once the standard deviation $S_{RDif}$ is estimated, one must evaluate the lower and upper limits of error (LLE and ULE) with:

where $t_{n-2,0.975}$ is the 97.5% quantile of the Student distribution with $n-2$ degrees of freedom.

Note that there are $n-2$ degrees of freedom and not $n-1$ as in the test on one isolated error rate in Section A. Also, it is a two-sided test, so there are two limits: LLE and ULE.
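The two limits can be sketched as below, assuming the symmetric form $LLE = R_{Dif} - t_{n-2,0.975} \cdot S_{RDif}$ and $ULE = R_{Dif} + t_{n-2,0.975} \cdot S_{RDif}$, inferred from the definitions in this section. The function name is hypothetical and the quantile is supplied by the caller.

```python
def limits_of_error(rate_diff, s_rdif, t_quantile):
    """Return (LLE, ULE) under the assumed two-sided form R_Dif -/+ t * S_RDif."""
    half_width = t_quantile * s_rdif
    return rate_diff - half_width, rate_diff + half_width


# Illustrative values: n = 30 pairs, so t_{28,0.975} is about 2.048
lle, ule = limits_of_error(rate_diff=0.002, s_rdif=0.004, t_quantile=2.048)
```

A common reading of such an interval, stated here as an assumption, is that the two error rates are not significantly different when zero lies between LLE and ULE.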

Finally, the conclusion of the test "Are the error rates $R_1$ and $R_2$ equal?" depends on which of the following cases is verified:
