Time series analysis
Similarity analysis
The objective is to compare each pixel's NDVI or fAPAR profile for the ongoing year with those of the previous years, and thereby to identify the most "resemblant year". The underlying idea is that the crop yields of the current year might be similar to those of this resemblant year (neglecting trend effects). This approach stems from the realm of "scenario analysis" and involves a rather simple N-dimensional multivariate analysis, where N is the number of available dekads in the year (1-36).
The similarity analysis is performed every 10 days on the smoothed NDVI and fAPAR time series of SPOT-VEGETATION, TERRA-MODIS and NOAA-AVHRR with the GLIMPSE-program SIMILI. The analysis is always performed for two seasons: the season starting in March and the season starting in October, each running until the present. Furthermore, the classes ‘cropland’ and ‘pastures’ are treated separately.
The similarity measure used is the root mean squared error (RMSE). For each pixel, the RMSE is calculated for each paired comparison of the ongoing season with each previous season in the archive, taking into account possible shifts, using this formula:
with
- n = number of dekads as defined by Date 1 and Date 2
- s = shift (in this example [-2, -1, 0, +1, +2])
At least 95% of the observations must be valid.
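The per-pixel comparison described above can be sketched in Python. This is a minimal illustration, not the SIMILI implementation. The function names, the use of `None` for a missing observation, the sign convention of the shift (a positive shift compares the current dekad with a later dekad of the previous season) and the way the 95% threshold is applied are all assumptions.

```python
import math


def rmse_with_shift(current, previous, start, n, shift=0, min_valid=0.95):
    """RMSE over n dekads between `current` and `previous` shifted by `shift`.

    Both profiles are lists of equal length; None marks a missing observation.
    Returns None when fewer than `min_valid` of the n pairs are usable
    (assumed interpretation of the 95% rule).
    """
    squared = []
    for t in range(start, start + n):
        u = t + shift  # assumed convention: positive shift looks later in the old season
        if 0 <= u < len(previous):
            a, b = current[t], previous[u]
            if a is not None and b is not None:
                squared.append((a - b) ** 2)
    if len(squared) < min_valid * n:
        return None
    return math.sqrt(sum(squared) / len(squared))


def most_similar_year(current, archive, start, n, shifts=(0,)):
    """Return (year, shift, rmse) with the lowest RMSE over all years and shifts."""
    best = None
    for year, profile in archive.items():
        for s in shifts:
            value = rmse_with_shift(current, profile, start, n, s)
            if value is not None and (best is None or value < best[2]):
                best = (year, s, value)
    return best


# Hypothetical profiles for one pixel (values are illustrative only).
current = [0.2, 0.3, 0.4, 0.5, 0.6, 0.5]
archive = {
    2001: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    2002: [0.2, 0.3, 0.45, 0.5, 0.6, 0.5],
}
no_shift = most_similar_year(current, archive, start=1, n=4)
with_shift = most_similar_year(current, archive, start=1, n=4, shifts=(-2, -1, 0, 1, 2))
```

In this made-up example the year 2002 is the closest match without shifts, while allowing shifts selects 2001 with a shift of +1, because the 2001 profile is the same curve displaced by one dekad.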
The Similarity-tool outputs five byte images. The header files of these images show how the images are scaled (see Headers):
- The most similar year without shifts: value = byte-value + 1900
- The best similarity value without shift (lowest RMSE): value = 0.005 * byte-value
- The most similar year with shift: value = byte-value + 1900
- The best similarity value with shift (lowest RMSE): value = 0.005 * byte-value
- The best shift: value = byte-value – 100
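The scalings listed above can be applied as follows. This is a small sketch with a hypothetical function name. It ignores any special byte codes (for example for missing data) that the header files may define.

```python
def decode_similarity_pixel(year_byte, rmse_byte, shift_byte=None):
    """Convert raw byte values of one pixel to physical values.

    Uses the scalings listed above: year = byte + 1900,
    RMSE = 0.005 * byte, shift = byte - 100.
    `shift_byte` is only relevant for the analysis with shifts.
    """
    year = year_byte + 1900
    rmse = 0.005 * rmse_byte
    shift = None if shift_byte is None else shift_byte - 100
    return year, rmse, shift


# Example: byte values 105, 20 and 98 correspond to the year 2005,
# an RMSE of about 0.1 and a shift of -2 dekads.
year, rmse, shift = decode_similarity_pixel(105, 20, 98)
```

With this scaling, a byte image can represent RMSE values up to 0.005 * 255 = 1.275.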
Every dekad, the following analyses are repeated: 8 similarity analyses, for 2 start dates (m/o) x 2 classes (a/p) x 2 variables (k/b). This yields a total of 40 images (8 x 5 n-values) and 16 QuickLooks (8 x 2 m-values).
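The bookkeeping of these combinations can be checked with a few lines of Python. The letter codes are taken from the text. The integer indices standing in for the five output images and the two QuickLooks per analysis are placeholders, not actual file-name codes.

```python
from itertools import product

START_DATES = ("m", "o")  # March, October
CLASSES = ("a", "p")      # cropland, pastures
VARIABLES = ("k", "b")    # the two vegetation indicators (codes as in the text)

# One similarity analysis per combination of start date, class and variable.
analyses = list(product(START_DATES, CLASSES, VARIABLES))

# Each analysis produces 5 output images and 2 QuickLooks (placeholder indices).
images = [(analysis, i) for analysis in analyses for i in range(5)]
quicklooks = [(analysis, q) for analysis in analyses for q in range(2)]

print(len(analyses), len(images), len(quicklooks))
```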
Cluster classification
The objective of the cluster classification is to obtain a spatial overview of the areas for which a number of indicators are progressing in the same way. The similarity analysis focuses on the per-pixel similarity with previous years. In the cluster classification, the time series of a given pixel is instead compared to the time series of the other pixels in the region of interest, to identify similar evolutions of the season. The analysis is performed for the following data sets:
- Two seasons are considered, with start dates in March and in October (always till present).
- The cluster analysis is performed for three distinct classes: cropland, pastures and rice.
- Two smoothed vegetation indicators are used: NDVI and fAPAR, both as actual values and as relative differences to the long-term average.
- Data from SPOT-VGT, TERRA-MODIS and NOAA-AVHRR are used.
The cluster classification is performed in different steps.
- Step 1: Selection of a subset of training pixels by means of systematic sampling and/or the use of a mask image (e.g. only pasture pixels). The corresponding image values are stored in an ASCII database. These data (typically for about 10 000 pixels) are the input for the next step. This step uses the GLIMPSE-program ClasUIp.
- Step 2: Calibration by means of a more recent version of ISOdata, called ISOclus, using the GLIMPSE-program ClasUIc. The following modifications were introduced:
  - The classical ISOdata/ISOclus requires the user to supply two parameters (LUMP and STDEV), which are very difficult to define in advance because they depend on the dimensionality and structure of the data. The program offers an option to search for these parameters automatically.
  - In the classical version, the Nk clusters are initialised simply by taking the signatures of as many randomly selected pixels. Two more elaborate, iterative approaches were included, based on minimising the within-class variance and maximising the between-class spread. In many cases these immediately yield the final/optimal clustering.
  - The final number of observed clusters Nk varies between the min/max bounds specified by the user. An option was included in which Nk is forced afterwards to a specific value by means of additional iterations. This is useful to always end up with a fixed Nk value (for MCYFS: Nk=7).
  - Classical implementations solve the clustering via a "pure" Minimum Distance (MD) approach: only the cluster mean vectors are relevant, and the VAR/COVAR matrices are implicitly considered as "elementary" (all covariances zero, all variances 1). In other words, all clusters are represented as hyperspheres with constant radii. The new version also accounts for differences in the variances between clusters and image variables (hyperellipsoids with variable axes, though still parallel to the main image axes).
- Step 3: Our standard program for Maximum Likelihood (ML) is used to extrapolate the calibrated algorithm over the entire image set. The fact that in this MD-variant the covariances are zero is irrelevant. Here too, different program options are provided, for instance the restriction of the classification by means of a MASK-image, and a special feature which deals with missing values. This step is realised with the GLIMPSE-program ClasMaxL.
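The difference between the "pure" MD rule and the variant with cluster-specific variances can be illustrated with a short sketch. This is not the ClasUIc or ClasMaxL code. The function name is hypothetical, equal prior probabilities are assumed for all clusters, and missing values (here `None`) are simply skipped, which is only one possible way of dealing with them.

```python
import math


def assign_cluster(pixel, means, variances=None):
    """Return the index of the best-matching cluster for one pixel.

    pixel     : list of values, None marks a missing value (skipped; an assumption)
    means     : list of cluster mean vectors
    variances : list of per-variable variance vectors, or None for pure MD
                (equivalent to all variances equal to 1)

    The score is -2 * log-likelihood (up to a constant) of a Gaussian with
    zero covariances: sum((x - mean)^2 / var + log(var)).
    With var = 1 this reduces to the squared Euclidean distance.
    """
    best_index, best_score = None, math.inf
    for k, mean in enumerate(means):
        score, used = 0.0, 0
        for j, x in enumerate(pixel):
            if x is None:
                continue
            var = 1.0 if variances is None else variances[k][j]
            score += (x - mean[j]) ** 2 / var + math.log(var)
            used += 1
        if used and score < best_score:
            best_index, best_score = k, score
    return best_index


# Hypothetical signatures: cluster 0 is compact, cluster 1 is widely spread.
means = [[0.2, 0.2], [0.8, 0.8]]
variances = [[0.0025, 0.0025], [0.09, 0.09]]
pixel = [0.45, 0.45]

md_label = assign_cluster(pixel, means)             # nearest mean only
ml_label = assign_cluster(pixel, means, variances)  # variances taken into account
```

In this made-up case the pixel lies closer to the mean of cluster 0, so the pure MD rule selects cluster 0. Once the variances are taken into account, the pixel is many standard deviations away from the compact cluster 0 and is assigned to the more widely spread cluster 1.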
The figure below shows an example of the obtained results for SPOT-VGT.
Every dekad, the following analyses are repeated: 24 clusterings, for 2 start dates (m/o) x 3 classes (a/p/r) x 4 variables (k/b/k1/b1). This results in 24 images and 24 QuickLooks.