Residual Error

From Wikicap - European Commission

Recommended means of analysis of residual errors

A. Testing the significance of a random error rate on one control zone

For an individual error rate (i.e. the error rate of a single control zone or the error rate of the whole sample), the statistical test is rather simple. It is actually similar to the test that is performed for the LPIS Quality Element 1b within the LPIS QA exercise.

First, the standard deviation S_R of the error rate R must be estimated. This estimate depends on the type of sample (random or risk-based) and on the group of beneficiaries that the sample represents (one control zone or the whole population of the MS).

If the error rate R is computed from a random sample, one must use the following procedure:

- For each individual dossier in the sample, compute the difference q_i = E_i - R * C_i, where C_i is the claimed area and E_i is the area not found;
- Compute S_q as the sample standard deviation of the vector of q_i;
- S_R is then computed as:
Sr.png

where n is the number of dossiers in the sample, TotC is the total claimed area over the considered zone (i.e. either the control zone or the full MS territory) and N is the total number of dossiers over the considered zone (again, either the control zone or the full MS territory).
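The steps above can be sketched in Python as follows. The formula image for S_R is not reproduced in this text, so the last step is an assumption: it uses the textbook ratio-estimator form S_R = (N / TotC) * S_q / sqrt(n), without a finite population correction. The definition of R as total area not found divided by total claimed area in the sample is also an assumption, and the function and argument names are illustrative only.

```python
import math
import statistics


def random_sample_error_rate(claimed, not_found, tot_c, n_pop):
    """Return (R, S_q, S_R) for a random sample of dossiers.

    claimed[i]   is C_i, the claimed area of dossier i
    not_found[i] is E_i, the area not found for dossier i
    tot_c        is TotC, the total claimed area over the considered zone
    n_pop        is N, the total number of dossiers over the considered zone
    """
    n = len(claimed)
    # Assumption: R is total area not found over total claimed area in the sample.
    r = sum(not_found) / sum(claimed)
    # q_i = E_i - R * C_i for each dossier.
    q = [e - r * c for c, e in zip(claimed, not_found)]
    # Sample standard deviation of the q_i (n - 1 in the denominator).
    s_q = statistics.stdev(q)
    # Assumption: textbook ratio-estimator form; the original formula is an image.
    s_r = (n_pop / tot_c) * s_q / math.sqrt(n)
    return r, s_q, s_r


# Made-up data for four dossiers in a zone of 40 dossiers claiming 1000 ha.
r, s_q, s_r = random_sample_error_rate(
    claimed=[10.0, 20.0, 30.0, 40.0],
    not_found=[0.1, 0.4, 0.3, 1.2],
    tot_c=1000.0,
    n_pop=40,
)
print(f"R = {r:.4f}, S_q = {s_q:.4f}, S_R = {s_r:.5f}")
```

If the original formula includes a finite population correction, S_R would be multiplied by an additional factor such as sqrt(1 - n/N).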

If the error rate R is computed from a risk-based sample, the procedure for estimating the standard deviation of the error rate is more complex.

Once the standard deviation S_R has been estimated, one must evaluate the upper limit of error (ULE) with:

ULE.png

where t_{n-1,0.95} is the 95% quantile of the Student distribution with n-1 degrees of freedom.
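As a minimal sketch, the ULE can be computed as below. The formula image is not reproduced in this text, so the form ULE = R + t_{n-1,0.95} * S_R is an assumption based on the definitions given here. The quantile is passed in as a number read from a Student table, and the input values are made up for illustration.

```python
def upper_limit_of_error(r, s_r, t_quantile):
    """Upper limit of error, assuming the form R + t_{n-1,0.95} * S_R."""
    return r + t_quantile * s_r


# Illustrative values: n = 10 dossiers, so t_{9,0.95} is about 1.833.
ule = upper_limit_of_error(r=0.012, s_r=0.003, t_quantile=1.833)

# Assumption: the rate is treated as demonstrably below the limit
# only when the ULE itself is below 2%.
below_limit = ule < 0.02
print(f"ULE = {ule:.4%}, below 2% limit: {below_limit}")
```

Because the test is one-sided, the 95% quantile is used rather than the 97.5% quantile.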

Finally, the conclusion of the test "Is the error rate R smaller than the 2% limit?" depends on which of the following cases is verified:

In case no chapter supplied.

B. Comparing random error rates from different control zones

For the comparison of two error rates from independent samples (e.g. error rates from two different control zones, or from a risk-based versus a random sample), the preparation is the same. Only the test differs, as it tries to address questions such as "Is the random error rate A larger than the random error rate B?" or "Are both random error rates equal?".

The random error rates R_A and R_B and their respective standard deviations S_A and S_B must be evaluated individually, as described previously and according to the type of sample.

The statistic of the test will be computed as:

T.png

Then, according to the test that is considered, the conclusion of the test depends on which of the following cases is verified:

TableRaRb.png

where the value m is computed as:

M.png

t_{m,0.95} and t_{m,0.975} are respectively the 95% and 97.5% quantiles of the Student distribution with m degrees of freedom, and n_A and n_B are the sizes of the samples A and B respectively.
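The formula images for the statistic and for m are not reproduced in this text. The sketch below assumes the usual Welch form for two independent estimates, T = (R_A - R_B) / sqrt(S_A^2 + S_B^2), with m given by the Welch-Satterthwaite approximation. Both formulas are assumptions consistent with the symbols defined here, not a transcription of the originals, and the input values are made up.

```python
import math


def compare_independent_rates(r_a, s_a, n_a, r_b, s_b, n_b):
    """Return (T, m) for two error rates estimated from independent samples.

    s_a and s_b are the standard deviations of the error rates themselves
    (S_A and S_B), not of the individual dossier values.
    """
    var_a, var_b = s_a ** 2, s_b ** 2
    # Assumed Welch-type statistic.
    t_stat = (r_a - r_b) / math.sqrt(var_a + var_b)
    # Assumed Welch-Satterthwaite degrees of freedom.
    m = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, m


t_stat, m = compare_independent_rates(
    r_a=0.03, s_a=0.006, n_a=11,
    r_b=0.01, s_b=0.008, n_b=11,
)
print(f"T = {t_stat:.3f}, m = {m:.2f}")
```

The value of m is generally not an integer; the quantile t_{m,0.95} or t_{m,0.975} is then read from a table or a statistics library using that non-integer value or a rounded one.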

C. Comparing two error rates: paired case

For the comparison of two error rates from dependent samples (e.g. dossiers reprocessed during a QC, or a test of the equivalence of CwRS and classical checks), the procedure is slightly different. Instead of working on the error rates separately, one must keep the original structure of the sample: it consists of pairs of observed areas not found (e.g. a CAPI measurement and a field measurement) on the same dossier.

First, the standard deviation S_{RDif} of the difference of the error rates R_{Dif}=R_1 - R_2 must be estimated. This estimation depends on the type of sample, i.e., random or from the risk analysis.

If the error rate difference R_{Dif} is computed from a random sample, one must use the following procedure:

- For each individual dossier, compute the difference
q_i= (E_{1,i}-E_{2,i}) - (R_1 - R_2) * C_i
where C_i is the claimed area and E_{1,i} is the area not found using the first method and E_{2,i} is the area not found using the second method;
- Compute S_q as the sample standard deviation of the vector of q_i;
- S_{RDif} is then computed as:
Srdif.png

where n is the number of dossiers in the sample, TotC is the total claimed area over the considered zone (i.e. either the control zone or the full MS territory) and N is the total number of dossiers over the considered zone (again, either the control zone or the full MS territory).
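The paired procedure can be sketched in the same way. As the formula image for S_{RDif} is not reproduced in this text, the last step assumes the same ratio-estimator form as for a single error rate, S_{RDif} = (N / TotC) * S_q / sqrt(n). Taking R_1 and R_2 as total area not found over total claimed area in the sample is also an assumption, and all names and data are illustrative.

```python
import math
import statistics


def paired_error_rate_difference(claimed, not_found_1, not_found_2, tot_c, n_pop):
    """Return (R_Dif, S_q, S_RDif) for two methods applied to the same dossiers."""
    n = len(claimed)
    total_claimed = sum(claimed)
    # Assumption: each error rate is total area not found over total claimed area.
    r_1 = sum(not_found_1) / total_claimed
    r_2 = sum(not_found_2) / total_claimed
    r_dif = r_1 - r_2
    # q_i = (E_1i - E_2i) - (R_1 - R_2) * C_i, keeping the pairing per dossier.
    q = [
        (e1 - e2) - r_dif * c
        for c, e1, e2 in zip(claimed, not_found_1, not_found_2)
    ]
    s_q = statistics.stdev(q)
    # Assumption: same form as for a single rate; the original formula is an image.
    s_rdif = (n_pop / tot_c) * s_q / math.sqrt(n)
    return r_dif, s_q, s_rdif


# Made-up data: four dossiers measured with two methods.
r_dif, s_q, s_rdif = paired_error_rate_difference(
    claimed=[10.0, 20.0, 30.0, 40.0],
    not_found_1=[0.2, 0.4, 0.9, 1.0],
    not_found_2=[0.1, 0.4, 0.6, 0.4],
    tot_c=1000.0,
    n_pop=40,
)
print(f"R_Dif = {r_dif:.4f}, S_q = {s_q:.4f}, S_RDif = {s_rdif:.5f}")
```

Working on the per-dossier differences is what distinguishes this from the independent case: the variability shared by both measurements of the same dossier cancels out in q_i.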

Once the standard deviation S_{RDif} has been estimated, one must evaluate the lower and upper limits of error (LLE and ULE) with:

LLE.png

where t_{n-2,0.975} is the 97.5% quantile of the Student distribution with n-2 degrees of freedom.

Note that there are n-2 degrees of freedom and not n-1 as in the test on a single error rate in section A. Also, this is a two-sided test, so there are two limits: LLE and ULE.
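A short sketch of the two limits follows. The formula image is not reproduced in this text, so the symmetric form LLE = R_{Dif} - t_{n-2,0.975} * S_{RDif} and ULE = R_{Dif} + t_{n-2,0.975} * S_{RDif} is an assumption. The quantile is supplied from a Student table and the input values are made up.

```python
def paired_limits(r_dif, s_rdif, t_quantile):
    """Return (LLE, ULE), assuming symmetric limits around R_Dif."""
    half_width = t_quantile * s_rdif
    return r_dif - half_width, r_dif + half_width


# Illustrative values: n = 12 paired dossiers, so n - 2 = 10 degrees of
# freedom and t_{10,0.975} is about 2.228.
lle, ule = paired_limits(r_dif=0.01, s_rdif=0.004, t_quantile=2.228)
print(f"LLE = {lle:.6f}, ULE = {ule:.6f}")
```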

Finally, the conclusion of the test "Are the error rates R_1 and R_2 equal?" depends on which of the following cases is verified:

TableLLEULE.png
