Difference between revisions of "Forecasting methods"
Revision as of 14:39, 9 December 2011
Background
Various authors have proposed subdividing crop yield into three components: mean yield, multiannual trend and residual variation (e.g. Vossen, 1989; Dagnelie et al., 1983; Dennet et al., 1980; Odumodu and Griffits, 1980). The interacting effects of climate, soil, management, technology, etc. are assumed to determine the mean yield. Observed national, regional and subregional yields show a trend in time. The trend is mainly due to long-term economic and technological dynamics such as increased fertiliser application, improved crop management methods, new high yielding varieties, etc. The third component, the residual variation, is considered to be the variation among years (Dennet et al., 1980). It is exactly this part which should be explained by weather, crop and remote sensing indicators.
According to Dennet et al. (1980) and Odumodu and Griffits (1980), the technological time trend should be removed from the crop yield time series, assuming that the residual variation is independent of that trend. This approach can be summarised as (Vossen, 1989):
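Y_{T,obs} = Y_{avg} + f(T) + e

where:
 Y_{T,obs} = observed yield in year T
 Y_{avg} = mean yield
 f(T) = trend as a function of time T
 e = residual variation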
Palm and Dagnelie (1993) fitted various time trend functions to national yield series (ton.ha-1) of several crops for 9 EU member states. Regressions were executed for the period prior to 1983 and a forecast for 1983 was made. This procedure was repeated for successive years up to 1988. The prediction results were compared with national yield values. Of the tested functions, a quadratic function of time performed best. However, differences with a simple linear trend function were small. In a next step, these authors removed the trend from the yield series using the quadratic function. The residuals for the period prior to 1983 were regressed against various meteorological parameters and a prediction for 1983 was made. Again, this procedure was repeated for successive years up to 1988. This was done for 19 Departments in France. Comparing the predicted and official yield series demonstrated that the applied meteorological variables did not improve the prediction accuracy.
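The trend-removal approach described above can be illustrated with a minimal sketch. It assumes a simple linear trend (the cited authors used a quadratic function, but found the difference to be small), it folds the mean yield and the trend into one fitted line, and it uses made-up yield values. The function names are hypothetical and are not part of the CGMS.

```python
def fit_linear_trend(years, yields):
    """Least-squares fit of yield = a + b * year; returns (a, b)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, yields))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def detrend(years, yields):
    """Return the residual variation e left after removing the fitted line."""
    a, b = fit_linear_trend(years, yields)
    return [y - (a + b * t) for t, y in zip(years, yields)]


# Made-up national yields (ton.ha-1), for illustration only.
years = [2000, 2001, 2002, 2003, 2004]
yields = [5.0, 5.3, 5.1, 5.6, 5.5]
residuals = detrend(years, yields)
print(residuals)
```

The residuals are the part of the series that weather, crop or remote sensing indicators would then be asked to explain.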
Swanson and Nyankori (1979) for corn and soybean production in the USA, Sakamoto (1978) for wheat production in South Australia, and Agrawal and Jain (1982) for rice yields in the Raipur District in India considered the technological time trend to be dependent on the residual variation. According to Winter and Musick (1993), Hough (1990) and Smith (1975), weather affects farm management practices such as planted area, timing of field operations, application of inputs, etc. Hence, the time trend should be analysed simultaneously with the explanatory variables. This approach can be summarised as (Vossen, 1989):
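Y_{T,obs} = b_{0} + f(T) + f(weather) + e

where:
 Y_{T,obs} = observed yield in year T
 b_{0} = constant
 f(T) = trend as a function of time T
 f(weather) = function of the weather variables
 e = residual variation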
Swanson and Nyankori (1979) showed that the time trend was underestimated when weather data were not analysed simultaneously with the time trend. Similar results were found for millet in Botswana (Vossen, 1989).
The previous equation does not account for the interaction between crop growth and weather variability. Root characteristics and soil physical properties are not accounted for either. Therefore Vossen (1990b, 1992) proposed to use crop growth simulation results to describe year-to-year yield variation. In a crop growth simulation model, weather and soil characteristics are summarised, and crop characteristics, including yield, form the output; that is, the simulation results quantitatively represent the influence of weather variables on crop growth. The yield can be written as:
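Y_{T,obs} = b_{0} + f(T) + f(simulation) + e

where:
 Y_{T,obs} = observed yield in year T
 b_{0} = constant
 f(T) = trend as a function of time T
 f(simulation) = function of the crop growth simulation results
 e = residual variation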
Prediction model
Official statistics of regional mean yields are predicted by the CGMS using one of the following simulated predictors (see Crop Simulation):
 Potential above ground biomass (ton.ha-1 dry weight)
 Water limited above ground biomass (ton.ha-1 dry weight)
 Potential storage organs biomass (ton.ha-1 dry weight)
 Water limited storage organs biomass (ton.ha-1 dry weight)
Originally, it was intended to predict yields solely by using the water limited weight of storage organs in the prediction model. Later on, the other three were added. Water limited yield, for instance, is inappropriate for a region with a lot of irrigation. Furthermore, drought stress can be strongly reduced in case of groundwater influence. This factor is not included in the CGMS. The simulated biomass indicators were added because these are more robust and less sensitive to modelling errors in the distribution of assimilates. Moreover, they also allow yield prediction during the growing season, when grain filling has not yet started or grains are still very small (de Koning et al., 1993).
Although by default only the 4 crop indicators listed above are taken into account, regression models can be constructed from any combination of indicators available in the Weather Monitoring module, Crop Simulation module and Remote Sensing module. These models can be constructed with SPSS or the user interface of the CGMS statistical tool.
The statistical subsystem of the CGMS uses a combination of a linear time trend and crop growth simulation results as proposed by Vossen (1990b, 1992). This prediction model can be described as:
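Y_{T} = b_{0} + b_{1}T + b_{2}S_{T}

where:
 Y_{T} = regional mean yield in year T
 T = year
 S_{T} = simulated crop indicator (one of the predictors listed above) in year T
 b_{0}, b_{1}, b_{2} = regression constants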
Suboptimal production circumstances such as drought, low temperatures etc. are allowed for by the constant b_{2}, which should lie between 0 and 1.
Per region, for a moving window of at least 9 years, the regression coefficients are established and subsequently used for yield prediction of the 10th year ('one-year-ahead'). The selection of the predictor to forecast the final yield is as follows:
 Each candidate predictor is fitted to the data currently available for this region.
 Candidates with a negative estimate of b_{2} are rejected because of the nature of the process.
 From the remaining ones, that with the lowest jackknife mean square error is selected.
Jackknife errors 

Jackknife errors are calculated by treating an observation as absent and using the predictor, fitted to the remaining observations, to assess its value. This reveals the error in predicting the observation that had been kept out of sight. Obviously, jackknife errors are not entirely relevant in the present situation, where we want to predict the future rather than reconstruct the past. For direct application it is more relevant to investigate the one-year-ahead prediction. Still, the jackknife method is used because the jackknife error-size estimates are less variable, being based on a larger number of predictions. With the same number of observations 'n', the jackknife method has 'n' error estimates, while the one-year-ahead prediction has only 'n - y' error estimates, where 'y' is the number of years on which the prediction is based. More detailed descriptions are given by de Koning et al. (1993) and Jansen (1995).
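The selection steps and the jackknife error described above can be sketched as follows. This is a minimal illustration, not the CGMS implementation: the function names, the indicator names and all data values are made up, T is expressed as years since the start of the window, and the regression is solved with plain normal equations to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(t, sims, yields):
    """Least-squares estimate of (b0, b1, b2) in Y = b0 + b1*T + b2*S."""
    rows = [[1.0, float(ti), float(si)] for ti, si in zip(t, sims)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, yields)) for i in range(3)]
    return solve(xtx, xty)


def jackknife_mse(t, sims, yields):
    """Mean squared error of predicting each year from a fit that excludes it."""
    errors = []
    for i in range(len(t)):
        keep = [j for j in range(len(t)) if j != i]
        b0, b1, b2 = fit([t[j] for j in keep],
                         [sims[j] for j in keep],
                         [yields[j] for j in keep])
        errors.append(yields[i] - (b0 + b1 * t[i] + b2 * sims[i]))
    return sum(e * e for e in errors) / len(errors)


def select_predictor(t, candidates, yields):
    """Return (name, jackknife MSE) of the best candidate, or None if all are rejected."""
    best = None
    for name, sims in candidates.items():
        if fit(t, sims, yields)[2] < 0:  # negative b2: candidate is rejected
            continue
        mse = jackknife_mse(t, sims, yields)
        if best is None or mse < best[1]:
            best = (name, mse)
    return best


# Made-up 9-year window for one region (all values in ton.ha-1).
t = list(range(9))
observed = [5.1, 5.6, 4.9, 6.0, 5.7, 5.9, 5.6, 6.5, 6.3]
candidates = {
    "wl_storage_organs": [8.0, 9.5, 7.2, 10.1, 8.8, 9.0, 7.9, 10.4, 9.3],
    "wl_biomass": [14.2, 15.0, 13.1, 15.9, 15.2, 14.8, 14.0, 16.3, 15.5],
}
print(select_predictor(t, candidates, observed))
```

With 'n' observations, the loop in `jackknife_mse` yields 'n' error estimates, which is the reason given above for preferring it to the one-year-ahead error.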
A quadratic trend function is also considered in the CGMS. However, based on results of Palm and Dagnelie (1993) and de Koning et al. (1993), it was concluded that a linear trend sufficiently describes the increasing official yields. A smooth trend of any type over a large number of years assumes a continuity which might be unrealistic (de Koning et al., 1993; Vossen, 1992; Vossen, 1990a). According to Vossen and Rijks (1995), the predictor should only be based on data from the recent past. The series should nevertheless be long enough to give a sufficient number of degrees of freedom in the regression analysis. A gradual shift in the time trend is allowed for by the shortness of the time series used to derive the predictor.
Required input data are stored in the tables
 DATA_FOR_YIELD_FORECAST (GUI version)
 CROP_YIELD (Batch version)
 EUROSTAT
 NUTS
 STAT_CROP
The statistics have a wider range of crops than the ones considered by the Crop Simulation. Therefore yields of some of the 'statistical crops' are forecasted using the same 'CGMS crop'. This relation is stored in table STAT_CROP.
To be able to run the forecast in batch mode, all model parameters are stored in advance in tables:
 RUN
 MODEL_EXCL_YEARS
 MODEL_INCL_INDICATORS
 MODEL_REGR_INDICATIFS
 MODEL_SCEN_SIM_YEARS
 MODEL_SCEN_INDICATIFS
Every ten days all stored models are run and the results are written to the tables:
 FORECASTED_NUTS_YIELD (GUI version)
 FORECASTED_NUTS_YIELD_HIS (Batch version)
Before the start of each growing season, yield forecasts are produced based on the long term average and corrected for a technological trend. The MARS analyst can change the length of the time series. This redefines the trend function and results in different CGMS level 3 forecasts.
Trend analysis
When the accuracy for a certain combination of country and crop is deemed insufficient, the MARS analyst redefines trend periods and functions using Excel, SPSS or the user interface of the CGMS statistical tool.
First, trends for a longer period (1975 until current year) are determined if yield statistics for such a period are available. Next, trends for more recent periods are studied. For Eastern Europe the period after 1990 is used (to exclude strong changes caused by political changes around 1990). For countries within the European Union the period after 1992 is important because in 1992 the Common Agricultural Policy went through important changes that affected yield and planted areas.
Besides changing the trend period, different trend functions are studied. Yield statistics of each country are taken directly from the CRONOS database, which is updated each month. Linear, quadratic and other types of trends are studied. MARS analysts also study the minimum and maximum trend evolution by separating the data set into two groups representing the 50% highest and 50% lowest values.
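The minimum and maximum trend evolution can be sketched as follows. The text does not state how the two groups are formed, so this example makes an explicit assumption: years are split by whether their residual from an overall linear trend lies above or below the median residual, and a separate linear trend is then fitted to each group. The function names and the data are hypothetical.

```python
from statistics import median


def linear_trend(years, values):
    """Least-squares fit of value = a + b * year; returns (a, b)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
             / sum((t - mean_t) ** 2 for t in years))
    return mean_v - slope * mean_t, slope


def envelope_trends(years, yields):
    """Return ((a, b) of the lower group, (a, b) of the upper group).

    Assumption: the groups are the years whose residual from the overall
    linear trend is below, or at or above, the median residual.
    """
    a, b = linear_trend(years, yields)
    residuals = [y - (a + b * t) for t, y in zip(years, yields)]
    cut = median(residuals)
    upper = [(t, y) for t, y, r in zip(years, yields, residuals) if r >= cut]
    lower = [(t, y) for t, y, r in zip(years, yields, residuals) if r < cut]
    return linear_trend(*zip(*lower)), linear_trend(*zip(*upper))


# Made-up yield series (ton.ha-1) alternating around a rising trend.
years = list(range(2000, 2010))
yields = [5.0 + 0.1 * k + (0.5 if k % 2 == 0 else -0.5) for k in range(10)]
lower_trend, upper_trend = envelope_trends(years, yields)
print(lower_trend, upper_trend)
```

Splitting on raw yield values instead of residuals is another possible reading; with a strong trend that would mostly separate early years from late years, which is why the residual-based split is used in this sketch.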