Databases with Regional Unmixed Means (RUM-values)

From Agri4castWiki
Revision as of 17:00, 8 November 2013 by H Eerens (talk | contribs) (Equations for the spatial aggregation of the RUM-values)

This procedure computes databases with the mean values of any IMG-variable for a set of administrative regions. Optionally, the means are also “unmixed”, i.e. specific for each crop or land cover class. In this way, the image data become compatible with the other agro-statistical information (official areas/yields, CGMS-outputs,…) in a spatial and thematic sense. The agro-statistical region (or district) replaces the pixel as spatial unit, and the crop/class becomes the thematic unit. The procedure also involves an important data reduction. The format of the files containing the RUM-values is described in the section Formats.

Ancillary maps on regions and land use

In addition to the RS-images, two ancillary inputs are required: a map with the administrative regions, and a land use map. Both must be converted to raster form, congruent with the LoRes images. For the regions map, this is a standard operation, but for the land use there are two possibilities:

  • The land use map is a LoRes hard classification, such as GLC2000. The "unmixing" is then realised by computing the simple mean of the subset of pixels which belong to the concerned region and class (and repeating this for all region x class combinations).
  • More detailed HiRes land use maps, such as CORINE, are first converted into a set of Area Fraction IMGs (AFIs) with the same resolution as the RS-images. The AFIs express for each pixel the area fraction covered by the different classes. The real C-indicators can then be computed (AF-weighted means, with optional AF-thresholds).
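
The two unmixing variants above can be sketched in a few lines of Python. This is only an illustration, not the IMG2RUM implementation: the function names, the flat per-pixel lists and the reading of the AF-threshold (pixels whose area fraction lies below the threshold are skipped) are all assumptions made for the example.

```python
from collections import defaultdict


def hard_class_means(values, regions, classes):
    """Simple mean per (region, class) pair for a hard classification."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, region, cls in zip(values, regions, classes):
        sums[(region, cls)] += value
        counts[(region, cls)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def af_weighted_means(values, regions, fractions, threshold=0.0):
    """AF-weighted mean per region for ONE class.

    fractions[i] is the area fraction (0-1) of that class in pixel i.
    Assumption: pixels with a fraction below `threshold` are ignored.
    """
    weighted = defaultdict(float)
    weights = defaultdict(float)
    for value, region, fraction in zip(values, regions, fractions):
        if fraction <= 0.0 or fraction < threshold:
            continue
        weighted[region] += fraction * value
        weights[region] += fraction
    return {region: weighted[region] / weights[region] for region in weights}


# Four hypothetical pixels: signal value, region ID, hard class, area fraction.
values = [0.2, 0.4, 0.6, 0.8]
regions = [1, 1, 2, 2]
classes = [10, 10, 10, 20]
fractions = [0.5, 1.0, 0.1, 0.9]

hard = hard_class_means(values, regions, classes)
soft = af_weighted_means(values, regions, fractions)
soft_thresholded = af_weighted_means(values, regions, fractions, threshold=0.5)
```

With the threshold set to 0.5, the third pixel (area fraction 0.1) no longer contributes to region 2, so the mean of that region is driven by the fourth pixel alone.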

In practice for MCYFS, RUM/C-indicators are computed for two types of regions:

  • NUTS3 (5935 units, see figure)
  • 25km-meteogrid (22 823 cells)


For the land use, area fraction images of six agricultural classes are used, based on the CORINE and GLC2000 classifications:

  • Arab1
  • Arab2
  • Arab2
  • Past1
  • Past2
  • Rice

Because the remote sensing data come at three different resolutions (1 km, 0.25 km for MODIS and 5 km for MSG), these inputs were computed at each of these three spatial resolutions.


The developed methodology works in two steps. First, every dekad/month when new RS-imagery arrives, a RUM-file is computed for each variable (NDVI, fAPAR, DMP, Ts and the MSG-derived meteo variables) by means of the GLIMPSE-program IMG2RUM. RUM-files (Regional Unmixed Means) are ASCII-formatted and contain all the values for the concerned variable and period (dekad/month), i.e. the RUM-values for all considered regions x land use classes. Second, the files are ingested into an Oracle database.

Principles for ingestion of RUM-values into an Oracle database

  • For each ROI a separate Oracle database, outlined in the figure below, has to be set up, though the data of different sensors can be combined even if they have different resolutions. Hence, one database for MCYFS is sufficient.
  • Five auxiliary and static tables have to be prepared in advance (REG0, CLASSES, METHODS, SENSORS, VARS). Their content is described in the table below. The primary key fields (REG0_ID, CLASSES_ID, ..) contain the same IDs as used in the RUM-files. For instance, table REG0 provides for every region: its ID, its full name (string), absolute area in km², etc. Table VARS contains for every variable: the ID, the name, the physical units (“kgDM/ha/day” for DMP,…), etc.
  • REG0 contains the most detailed regions, for which the RUM/C-indicators were computed – NUTS3 in the case of MCYFS. Optionally, more REG-tables can be created (REG1, REG2, …) for higher spatial levels (country, continent). This allows the computation of aggregated RUM-values on the ORACLE database.
  • Table RUM_HEADERS is of crucial importance. It stores all (so far observed) “cases” or combinations of the primary parameters (region, class, sensor, variable, method, threshold) and assigns a unique ID to them: RUM_ID.



  • The “real” data (Date, relative areas RA1/RA2, Mean and Standard Deviation) are stored in 4 separate tables:
    • RUM10_VALUES/RUM30_VALUES: for the actual S10/S30
    • RUM10_LTA/RUM30_LTA: for the corresponding “historical year” data.

These tables contain all the available info, i.e. for all regions, classes, sensors, variables, methods, thresholds and dates. All four are linked to RUM_HEADERS via the primary key RUM_ID. Note that the LTA-tables do not grow over time (there is only one “historical year”). This design offers a lot of advantages. The database is compact, because the longer items (names of regions, classes, etc.) are only stored once, in the static auxiliary tables. The dynamic part of the database only contains numbers (IDs or data). Moreover, by separating the RUM_IDs from the “real” data, the primary parameters (region, class,…) only have to be stored once, in RUM_HEADERS. For the same reasons, the database queries run fast: a query always searches for a specific combination of the primary parameters, i.e. a single RUM_ID. That search can thus be run on table RUM_HEADERS instead of on the “real” data.

In practice, new RUM-files are copied to an FTP-server from where they are automatically ingested into the database. The procedure performs quality checks, adds new RUM_IDs as needed and optimizes table RUM_HEADERS for querying. If the RUM-file contains C-indicators which are already included in the ORACLE-database, the old data are automatically overwritten. That is an important prerequisite for the frequent re-processing of the data.
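
The header/values split and the overwrite-on-ingest behaviour can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so it is not the operational ingestion procedure. The table names RUM_HEADERS and RUM10_VALUES and the fields RUM_ID, REG0_ID and CLASSES_ID come from the description above; the other column names (SENSORS_ID, VARS_ID, METHODS_ID, THRESHOLD, OBS_DATE, STDEV) and the functions are hypothetical.

```python
import sqlite3

# Assumed, simplified schema: one header row per parameter combination,
# one value row per (RUM_ID, date).
SCHEMA = """
CREATE TABLE RUM_HEADERS (
    RUM_ID     INTEGER PRIMARY KEY,
    REG0_ID    INTEGER NOT NULL,
    CLASSES_ID INTEGER NOT NULL,
    SENSORS_ID INTEGER NOT NULL,
    VARS_ID    INTEGER NOT NULL,
    METHODS_ID INTEGER NOT NULL,
    THRESHOLD  INTEGER NOT NULL,
    UNIQUE (REG0_ID, CLASSES_ID, SENSORS_ID, VARS_ID, METHODS_ID, THRESHOLD)
);
CREATE TABLE RUM10_VALUES (
    RUM_ID   INTEGER NOT NULL REFERENCES RUM_HEADERS (RUM_ID),
    OBS_DATE TEXT    NOT NULL,
    RA1 REAL, RA2 REAL, MEAN REAL, STDEV REAL,
    PRIMARY KEY (RUM_ID, OBS_DATE)
);
"""

KEY_COLUMNS = "REG0_ID, CLASSES_ID, SENSORS_ID, VARS_ID, METHODS_ID, THRESHOLD"


def open_demo_db():
    """Create an in-memory database with the two demo tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def get_rum_id(con, key):
    """Return the RUM_ID of a parameter combination, adding it if unseen."""
    where = " AND ".join(f"{col.strip()} = ?" for col in KEY_COLUMNS.split(","))
    row = con.execute(
        f"SELECT RUM_ID FROM RUM_HEADERS WHERE {where}", key
    ).fetchone()
    if row is not None:
        return row[0]
    cur = con.execute(
        f"INSERT INTO RUM_HEADERS ({KEY_COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)", key
    )
    return cur.lastrowid


def ingest(con, key, obs_date, ra1, ra2, mean, stdev):
    """Store one record; an existing (RUM_ID, date) row is overwritten."""
    rum_id = get_rum_id(con, key)
    con.execute(
        "INSERT OR REPLACE INTO RUM10_VALUES "
        "(RUM_ID, OBS_DATE, RA1, RA2, MEAN, STDEV) VALUES (?, ?, ?, ?, ?, ?)",
        (rum_id, obs_date, ra1, ra2, mean, stdev),
    )
    return rum_id


con = open_demo_db()
key = (101, 1, 1, 1, 1, 50)  # region, class, sensor, variable, method, threshold
first_id = ingest(con, key, "2013-11-01", 0.40, 0.30, 0.52, 0.08)
second_id = ingest(con, key, "2013-11-01", 0.40, 0.30, 0.55, 0.07)  # re-processing
stored = con.execute(
    "SELECT MEAN FROM RUM10_VALUES WHERE RUM_ID = ?", (first_id,)
).fetchall()
header_count = con.execute("SELECT COUNT(*) FROM RUM_HEADERS").fetchone()[0]
```

In this sketch the second call reuses the existing RUM_ID and replaces the earlier row for the same date, which mirrors the overwrite behaviour described above. Only the small header table is searched for the parameter combination; the values table is accessed by RUM_ID.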

Equations for the spatial aggregation of the RUM-values

The data in the Oracle database can be spatially aggregated to higher level regions.

Procedure to spatially aggregate the data to higher level regions.

  • R regions: $r = 1 \dots R$ (these must be aggregated into one "super-region")
  • K classes: $k = 1 \dots K$
  • $\sum_K(\cdot)$ means summation over the K classes, $\sum_R(\cdot)$ summation over the R regions
  • S = Absolute areas (in km², ha, ...)
    • $S_{r,k}$ = area of class k in region r
    • $S_r = \sum_K(S_{r,k})$ = total area of region r (summed over K classes)
    • $S_k = \sum_R(S_{r,k})$ = total area of class k (summed over R regions)
    • $S = \sum_R(S_r) = \sum_K(S_k)$ = total area (over all regions/classes)
  • F/G = Area fractions (dimensionless, 0-1)
    • $F_{r,k} = S_{r,k} / S_r$ = area fraction of class k in region r
    • $G_{r,k} = S_{r,k} / S_k$ = fraction of $S_{r,k}$ in the total area of class k (over R regions)
  • Y = Signal of a certain variable (NDVI, DMP), averaged and hopefully "representative"
    • $Y_{r,k}$ = signal of class k in region r
    • $Y_r$ = signal of region r, aggregated over the K different classes
    • $Y_k$ = signal of class k, aggregated over the R different regions
    • $Y$ = general signal, aggregated over the different classes/regions


For a certain region r, one gets (summation over K classes):

$$Y_r = \frac{Y_{r,1} S_{r,1} + Y_{r,2} S_{r,2} + \dots}{S_{r,1} + S_{r,2} + \dots} = \frac{\sum_K(Y_{r,k} S_{r,k})}{\sum_K(S_{r,k})} = \frac{\sum_K(Y_{r,k} S_{r,k})}{S_r} = \sum_K(Y_{r,k} F_{r,k})$$

This is also the equation for the "linear unmixing", with which the RUM-values $Y_{r,k}$ can be derived from the images (when inverted, applied on 1 km pixels instead of on regions, and when the area fractions $F_{r,k} = S_{r,k} / S_r$ are known per pixel).
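
As a quick check of the region equation, consider a hypothetical region with K = 2 classes. The numbers below are invented purely for illustration and do not come from any MCYFS dataset.

```latex
% Assumed inputs: S_{r,1} = 30 km^2, S_{r,2} = 10 km^2, Y_{r,1} = 0.6, Y_{r,2} = 0.2
\begin{align*}
S_r &= S_{r,1} + S_{r,2} = 30 + 10 = 40 \\
F_{r,1} &= 30 / 40 = 0.75, \qquad F_{r,2} = 10 / 40 = 0.25 \\
Y_r &= \frac{0.6 \cdot 30 + 0.2 \cdot 10}{40} = \frac{20}{40} = 0.5 \\
    &= 0.6 \cdot 0.75 + 0.2 \cdot 0.25 = 0.45 + 0.05 = 0.5
\end{align*}
```

Both forms, area-weighted and fraction-weighted, give the same regional signal, as the equation requires.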

In the aggregation, however, we look for $Y_k$ values that are representative for the group of R contributing regions (now with summation over the R regions):

$$Y_k = \frac{Y_{1,k} S_{1,k} + Y_{2,k} S_{2,k} + \dots}{S_{1,k} + S_{2,k} + \dots} = \frac{\sum_R(Y_{r,k} S_{r,k})}{\sum_R(S_{r,k})} = \frac{\sum_R(Y_{r,k} S_{r,k})}{S_k} = \sum_R(Y_{r,k} G_{r,k})$$

We retain the "practical equation": $Y_k = \sum_R(Y_{r,k} S_{r,k}) / \sum_R(S_{r,k})$

An AUX-file is prepared in advance, containing for each elementary region r (REG0) the corresponding areas for each class ($S_{r,k}$). The aggregation of class k over R elementary regions (part of the aggregated super-region) is then realised as follows:

  • Read $S_{r,k}$ from the AUX-file for all R regions.
  • Read the individual signals $Y_{r,k}$ from the C/RUM-database.
  • Apply the practical equation: $Y_k = \sum_R(Y_{r,k} S_{r,k}) / \sum_R(S_{r,k})$
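
The three steps above amount to an area-weighted mean, which can be sketched in Python as follows. The function name and the dictionary inputs (region ID mapped to area or signal) are hypothetical stand-ins for the AUX-file and the database query. The handling of regions with a missing signal or zero area (they are skipped) is an assumption, not something the procedure above specifies.

```python
def aggregate_class(signals, areas):
    """Area-weighted mean of one class over a group of elementary regions.

    signals: {region_id: Y_rk}, the signal of class k per region (may be None)
    areas:   {region_id: S_rk}, the area of class k per region
    Returns Y_k, or None if no region contributes.
    """
    numerator = 0.0
    denominator = 0.0
    for region, signal in signals.items():
        area = areas.get(region, 0.0)
        # Assumption: regions without a signal or without area are skipped.
        if signal is None or area <= 0.0:
            continue
        numerator += signal * area
        denominator += area
    if denominator == 0.0:
        return None
    return numerator / denominator


# Hypothetical super-region made of three elementary regions.
areas_k = {"r1": 100.0, "r2": 300.0, "r3": 50.0}
signals_k = {"r1": 0.4, "r2": 0.8, "r3": None}

y_k = aggregate_class(signals_k, areas_k)  # (0.4*100 + 0.8*300) / 400, about 0.7
```

Because the weights are the absolute class areas, a large region with a lot of class k dominates the aggregated signal, which is the intended behaviour of the practical equation.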