# Talk:Forecasting methods

To deal with the residual uncertainty arising from the unknown evolution of the season between the moment the forecast is issued and the moment the crop is harvested, agro-meteorological scenarios can be produced and analysed. The scenario analysis consists in finding the most similar agro-meteorological years on the basis of the time series of parameters simulated by the CGMS. The analysis relies on Principal Component Analysis (PCA), Factor Analysis and Cluster Analysis (Hair et al., 1998). The input consists of the CGMS crop indicators for all available years (see Vol. 3 of this series). It is stressed that the climatic similarities are established on the basis of the time series of agro-meteorological parameters. In fact, years that are similar in climatology are not necessarily similar in crop response, because small changes in the sequence of meteorological events can have a major effect on crop behaviour; this is why the approach is run directly on the crop parameters.

The PCA yields new, mutually independent variables (factors) that are combinations of the original ones. The first factors, explaining up to 90% of the variability, are selected, and pairs of factor axes are analysed by plotting the original variables on them. The units (the yearly observations) are then plotted on the new factors to characterise the years (for instance, as a dry and hot season). This is repeated for each country and at crop level (recall that the original variables are the crop growth parameters simulated by the CGMS).

The analyst then launches a cluster analysis on the new factors (normally a hierarchical cluster analysis), obtaining groups of years that are homogeneous with respect to the factors obtained. Similarity or dissimilarity matrices help to rank the years by similarity. Once the similarity scores and their ranking have been obtained, the forecast is computed as the weighted average of the corresponding yields (de-trended where necessary). The weights are given by the similarity indices.
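The weighted-average forecast described above can be sketched in Python as follows. This is a minimal illustration, not the routine used operationally: the function names, the tuple layout, the linear trend model and all numbers are assumptions made for the example, and the weights are simply taken to be the similarity indices of the selected years.

```python
def linear_trend(years, yields):
    """Least-squares line yield = a + b * year; returns (a, b).

    A linear trend is an assumption; the text does not fix the trend model.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def weighted_forecast(similar, target_year, trend=None):
    """Forecast the yield of target_year from a group of similar years.

    similar: list of (year, yield, similarity) tuples (hypothetical layout).
    trend:   optional (a, b) from linear_trend. If given, the weighted average
             is taken on deviations from the trend and the trend value of
             target_year is added back.
    """
    def trend_at(year):
        if trend is None:
            return 0.0
        a, b = trend
        return a + b * year

    total_weight = sum(w for _, _, w in similar)
    mean_deviation = sum(
        w * (y - trend_at(yr)) for yr, y, w in similar
    ) / total_weight
    return trend_at(target_year) + mean_deviation


# Illustrative numbers only (t/ha), not taken from any CGMS run.
history_years = [2000, 2001, 2002, 2003]
history_yields = [4.0, 4.5, 5.0, 5.5]
trend = linear_trend(history_years, history_yields)
similar = [(2000, 4.2, 0.6), (2002, 4.8, 0.4)]
forecast = weighted_forecast(similar, 2004, trend)
print(round(forecast, 2))
```

Passing `trend=None` gives the plain weighted average of the yields, which corresponds to the case without de-trending.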
Simple statistics are also derived from the cluster of similar years: within the group of similar years, the maximum and minimum yield values are used for the optimistic and pessimistic yield scenarios. The routine used in SPSS is the following:

In this example ds, sm, wlai, plai, wb, pb, twc and twr stand respectively for development stage, soil moisture, water-limited leaf area index, potential leaf area index, water-limited biomass, potential biomass, total water consumption and total water requirements; these are the parameters simulated by the CGMS. The initial data are the CGMS simulations per year (the years are the units), with the dekade, crop and country/region/grid fixed.

Step 1) In this example we extract the main variables simulated for a given crop and country, and with a FACTOR analysis we reduce them to a few variables explaining about 90% of the variability (2 in the example). As an alternative to fixing a dekade, the procedure can be run on several dekades, in which case the number of variables can increase substantially.

Step 2) We then obtain (GRAPH) plots of the original units (years) on the new axes. This characterises the current season in terms of its impact on crop growth, i.e. wet and cold; wet and hot; dry and cold; dry and hot.

Step 3) The third step (CLUSTER) is then used to identify the similar years, as the factor analysis graph may not be sufficient to find them. The cluster algorithm is here based on [...]. The similar years are determined by looking at the coefficients of dissimilarity produced in the distance analysis. These coefficients are used in two ways: first, to detect the ten most similar years (or those whose dissimilarity is below a defined threshold); second, as weights to define a prediction.

Step 4) The fourth step is the prediction derived from the similar years (this is not in the routine above). It uses the pairs (year, yield) belonging to the group of similar years, which determine a range of yields and an average (the minimum and maximum can be used as the minimum and maximum scenarios, their explanation being given by the characterisation from the factor analysis). The prediction is then obtained either from the simple average or (better) from a weighted average in which the weights come from the dissimilarity coefficients.
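Steps 3 and 4 can be sketched as follows. This is only an illustration under stated assumptions: the dissimilarity is taken to be the plain Euclidean distance between factor scores, and the conversion of dissimilarities into weights uses a normalised inverse distance, which the text does not specify. All function names, factor scores and yields are hypothetical.

```python
import math


def dissimilarities(scores, target):
    """Euclidean distance in factor space between target and every other year."""
    origin = scores[target]
    return {yr: math.dist(origin, pt) for yr, pt in scores.items() if yr != target}


def similar_years(dist, k=10, threshold=None):
    """Ranked (year, distance) pairs.

    Returns all years below threshold if one is given, else the k nearest.
    """
    ranked = sorted(dist.items(), key=lambda item: item[1])
    if threshold is not None:
        return [(yr, d) for yr, d in ranked if d < threshold]
    return ranked[:k]


def inverse_distance_weights(ranked, eps=1e-9):
    """Normalised weights proportional to 1/d (an assumed conversion)."""
    raw = {yr: 1.0 / (d + eps) for yr, d in ranked}
    total = sum(raw.values())
    return {yr: w / total for yr, w in raw.items()}


def scenario_range(ranked, yields):
    """Minimum and maximum yield among the similar years."""
    values = [yields[yr] for yr, _ in ranked]
    return min(values), max(values)


# Hypothetical factor scores (factor 1, factor 2) and yields, for illustration.
scores = {2003: (0.0, 0.0), 2002: (3.0, 4.0), 1996: (1.0, 0.0), 1976: (0.0, 2.0)}
yields = {2002: 7.1, 1996: 6.4, 1976: 5.8}

ranked = similar_years(dissimilarities(scores, 2003), k=2)
weights = inverse_distance_weights(ranked)
low, high = scenario_range(ranked, yields)
```

Here `low` and `high` play the role of the minimum and maximum scenarios, and `weights` can be used for the weighted average of step 4.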
If a trend is present, all the steps are in fact run on the deviations from the trend (the choice of trend model will affect all of the results).

Here follow some examples:

1 - Germany soft wheat scenarios in 2003, made during the second dekade of May using as input all dekades of soil moisture and development stage values (years analysed from 1975 to 2003):

The scree plot on the left shows the eigenvalues of the factor analysis run on 20 variables. The first two corresponding axes (the most explanatory) are given above on the right. One can note that the first quadrant is correlated with development stage (the development stage variables are all concentrated there), a direct expression of the influence of temperature on crops. The y axis is explained by the crop soil moisture in April. Reading the axes counter-clockwise, the north-east direction in the first quadrant represents the hottest and most humid years (in terms of effect on crops), the north-west direction in the second quadrant the cold and humid years, the south-west direction in the third quadrant the cold and dry years, and the south-east direction in the fourth quadrant the hot and dry years. The graph below shows the position of the years on the first two new axes:

In this example one can note that 2003 was, at that time, not far from the origin, appearing as a slightly dry and cold year in May. However, the position of 2003 in the new system of co-ordinates was opposite to that of 2002 (a year characterised by a very high level of precipitation).

2 - Spain soft wheat scenarios in 2000, made during March using as input all dekades of soil moisture and development stage values (years analysed from 1975 to 2000):

In this example all the variables are analysed in the same dekade. The variables are Development Stage (DS), Soil Moisture (SM), Potential Biomass (PB), Potential Storage Organs (PS), Water-limited Storage Organs (WS), Potential Leaf Area Index (PLAI) and Water-limited Leaf Area Index (WLAI). The difference between potential and water-limited indicators is explained in Vol. 2 of this series.

The factor analysis gave the following results, which show that the first two components explain almost 90% of the variability.

Total Variance Explained (extraction method: Principal Component Analysis)
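The way a "total variance explained" table is read can be sketched as follows: each eigenvalue is divided by the sum of all eigenvalues, and components are kept until the cumulative share reaches the chosen level. The function name, the 90% default and the eigenvalues are assumptions for illustration; they are not the values of the Spain example.

```python
def components_needed(eigenvalues, target=0.90):
    """Return (number of components, cumulative share of variance explained).

    Components are taken in decreasing order of eigenvalue until their
    cumulative share of the total variance reaches target.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    count = 0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= target:
            break
    return count, cumulative


# Invented eigenvalues for seven standardised variables (they sum to 7).
eigenvalues = [4.6, 1.9, 0.2, 0.1, 0.1, 0.05, 0.05]
count, share = components_needed(eigenvalues)
print(count, round(share, 3))
```

With a stricter target, for example `components_needed(eigenvalues, target=0.95)`, more components are retained.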

The corresponding scree plot and the contribution of each variable to the final variability are given here:

Scree Plot

Component Matrix (extraction method: Principal Component Analysis; 2 components extracted)

Below are the corresponding plots of the variables on the first two axes and of the units (years). The year 2000 was placed among the normal years at that time. Note the years in the fourth quadrant of the last chart, which can be read as the area of the dry and hot years (drought years), already well characterised in March.

Component Plot
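The quadrant reading used in these examples can be written as a small rule. The sketch below assumes, as described for the Germany example, that the first axis increases with temperature (development stage) and the second with soil moisture. The function name and the treatment of scores lying exactly on an axis are choices made here for illustration, and the sign convention should be checked against the component plot in each analysis.

```python
def characterise(factor1, factor2):
    """Label a year from its scores on the first two factors.

    Assumes factor1 > 0 means hotter and factor2 > 0 means more humid;
    scores of exactly zero are assigned to the hot / humid side.
    """
    thermal = "hot" if factor1 >= 0 else "cold"
    moisture = "humid" if factor2 >= 0 else "dry"
    return f"{thermal} and {moisture}"


# Hypothetical scores, one per quadrant (first to fourth).
for point in [(1.2, 0.8), (-0.5, 0.3), (-0.2, -0.1), (0.7, -1.1)]:
    print(point, characterise(*point))
```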

This technique helps to understand how the yield prediction could still change before harvest. In theory, the further the growing season advances, the fewer similar years remain and thus the lower the uncertainty. Further studies are under way to validate the approach.