Calibration Manager

From Agri4castWiki
Jump to: navigation, search


Introduction

The calibration Manager is a Python packages that links Python wofost (PCSE) to an open optimization tool NLopt and various databases required to run and optimize the model. This makes it possible to calibrate (a combination of) crop parameters (e.g. TSUM1 and TSUM2, SPAN and SLATB and TDWI,...) using one or more target variables (e.g. day of anthesis, day of maturity, lai-max, harvest index,...). The selected target variables are combined in a single objective function that is optimized. Additional functionalities are added to the calibration manager, such as normalizing target variables, assigning weights to experiments and calendars, criteria when expert knowledge is allowed to enter the optimization, applying additional crop masks to exclude non agricultural areas from the calibration.

Architecture

Calibration manager architecture.jpg

Components:

  • Calibration manager (Python package)
  • PCSE (Python Wofost crop model)
  • NLopt (nonlinear optimization module)
  • AgroPheno database (SQLite) defining observed:
    • Start of season (DOP or DOE)
    • Flowering (DOA)
    • End of season (DOM or DOH)
    • Target LAI (LAI-max and LAI-end)
    • Target Harvest index (HI)
  • CGMS12 or CGMS14 database (Oracle) defining

Calibrating of TSUMS

TSUM1 and TSUM2 (and TSUMEM) can be calibrated in a combined objective function that is optimized. Instead of using a simple model to simulate phenology, the calibration manager uses the full PCSE, including optional vernalization and photoperiodicity and is also suitable for calibrating winter crops with more complex temperature dependencies.

Using experimental observations

Calibrating tsum1 tsum2.jpg

Preferably, experiments are used to calibrate TSUMS. As a first requirement, each experiment should have an observed start-of-season: to start a simulation and simulate flowering and end-of-season. In the calibration, PCSE starts either on DOP or DOE, depending on what is available. It is not needed to derive one from the other bases on a rule of thumb or calculated local average between observed duration between DOP and DOE. As second requirement the experiment should have observed flowering and/or observed end-of-season; to compare with the simulated flowering and end-of-season. In case an experiment (or calendar) has DOH and not DOM, it is converted into DOM by subtracting a (configurable) number of days. In case an experiment or calendar has both DOH and DOM, DOH is removed.

To summarize: each calibration zone needs at least one DOP-DOA or DOE-DOA and at least one DOP-DOM or DOE-DOM or DOP-DOH or DOE-DOH. Consider the example of a zone with 8 experiments (see graph). For experiments 1-7, the simulation can start. For these experiments, simulated flowering and end-of-season are available. For experiments 1, 2, 3, 5, 6 and 7 observed flowering and/or end-of-season are available (green blocks). These can be compared to the simulated values: 7 pairs for which the difference can be calculated.

All differences are combined in a single objective function f(x):

f(x) = √(f(a)+f(m)) f(a) = mean squared difference of anthesis f(m) = mean squared difference of maturity

The calibration manager searches for TSUM1 and TSUM2 where f(x) is minimized. In the above example, the observations from experiment 8 are not used because there is no observed start-of-season. Thus, flowering and maturity cannot be simulated and differences with observed flowering and maturity cannot be calculated.

Weight for experiments

Calibrating tsum1 tsum2 weighted.jpg

To avoid that grids with many experiment in the same grid have a heavy impact on the calibration for the entire zone, the experiments are weighted in such a way that each grid has the same weight. Consider the example of a calibration zone with 8 experiments and 2 regional calendars (see graph). Three experiments (1, 2 and 3) fall inside grid number 1. Each of these experiments will be given weight 1/3. There is only a single experiment (5) in grid number 2 which is therefore given weight 1. Etc.

Including regional calendars

In case not enough experimental data are available, regional calendars are added. Regional calendars are included if within the zone:

  • There is not a single experiment defining the vegetative phase (DOP-DOA or DOE-DOA)
  • Or there is not a single experiment defining the full season (DOP-DOM or DOE-DOM or DOP-DOH or DOE-DOH)
  • Or the combined number of above phases is less than 15 (configurable).

In the example, there are 2 experiments defining the vegetative phase and there are 5 experiments defining the full season. Theoretically this is enough to calibrate TSUM1 and TSUM2 for this zone. Because the combined number of above observations is lower than 15, also the two regional calendars will be included. Regional calendars are valid for many grids and many years. Therefore a weight is assigned to each used calendar, in order not to outnumber available experimental observations that are only valid for a single grid and a single year. The weight is calculated in such a way that each calendar as a whole has a weight 1. In the example, two regional calendars are found: regional calendar 9 and 10. Calendar 9 is assigned to 5 grids and valid for years 2001-2010: 50 combinations. Calendar 10 is assigned to 4 grids and is valid for years 1991-2010: 80 combinations. Each combination of calendar 9 gets a weight of 1/50. Each combination of calendar 10 gets a weight of 1/80.

To limit the number of included AgroPheno data from regional calendars, only grids are included that fall within the Mapspam crop mask. For experiments this limitation is not applied! By excluding grids with regional calendars outside the Mapspam crop mask, in some zones still no vegetative and full season are defined. In that case also grids outside the Mapspam crop mask are allowed.

Calibration of leaf are dynamics

After having calibrated phenology, leaf area dynamics can be calibrated. Because experimental data are not available to calibrate the leaf area dynamics, one calibration source per crop is added to the AgroPheno database, valid for all zones of respective crop mask, linked to the period 2000-2016, and containing target values for selected target summary variables (e.g. maximum leaf area index reached over the season (LAI-max), end-of-season leaf area index (LAI-end) and end-of-season Harvest index (HI)). To start calibration simulations, sowing dates are taken from the crop calendar in the oracle database. The selected target summary variables are combined in a objective function f(x) that is minimized. All differences are normalized relative to the simulated value, to make the different target summary variables equally important. Thus, the objective function is the root of the weighted mean squared normalized differences.

All single value crop parameters can be selected as target parameters (e.g. SPAN, TDWI, TBASE,...). When xy crop parameters are selected (e.g. FOTB, SLATB,...), it should be defined how to adjust the functions. For instance, in the case of FOTB, all parameter_xvalue’s of FOTB_01, FOTB_02, FOTB_03, FOTB_04, FOTB_05 and FOTB_06 are simultaneously relatively shifted (i.e. shifting the AFGEN function along the x-axes). At the same time, FLTB and FSTB parameter_xvalue’s should be shifted accordingly to compensate for the changes in FOTB, enforcing the sum of parameter_xvalue of FOTB, FLTB and FSTB at each point in time to be 1. For SLATB, the parameter_yvalue’s of SLATB_01, SLATB_02, SLATB_03, SLATB_04 and SLATB_05 are simultaneously relatively shifted (i.e. shifting the AFGEN function along the y-axes).

Processing sequence

1 Loop over zones
   1.1 Get all grid_no’s of zone
   1.2 Loop over grids
      1.2.1 Get all experiments of grid
      1.2.2 Get regional calendars of grid including a flag if the grid has a MAPSPAM pixel > 50 ha of physical area for rainfed wheat
      1.2.3 In case experiment or regional calendar has DOH and not DOM, convert DOH to DOM by subtracting a (configurable) number of days
      1.2.4 In case experiment or regional calendar has DOH and DOM, remove DOH
   1.3 End loop over grids
   1.4 (a) Count experiment combinations with DOP and DOE per grid
   1.5 (b) Count experiment combinations with DOP-DOA and/or DOE-DOA
   1.6 (c) Count experiment combinations with DOP-DOM, DOE-DOM
   1.7 Calculate weights
      1.7.1 Weight experiments = 1/a
      1.7.2 Weight calendars
        if b=0 or c=0 or (b+c) < 15 and grid-year combinations under mapspam > 0
        then weight = 1/(count of grid-year combinations)
        if b=0 or c=0 or (b+c) < 15 and grid-year combinations under mapspam = 0
        then weight = 1/(count of grid-year combinations including grids outside mapspam mask)
        else weight = 0
   1.8 Loop over grid-year combinations for experiments and regional calendars
      1.8.1 run simulation with PCSE
      1.8.2 Calculate differences between simulated and observed flowering and end-of-season
   1.9 End loop over grid-year combinations
   1.10 Calculate combined RMSD of all differences, using weights
   1.11 Calibration: Repeat 1.8, 1.9. and 1.10 with different TSUM1, TSUM2 until combined RMSD is minimized (with NLopt)
2 End loop over zones
3 Export optimal TSUM1, TSUM2 and other calibration stats for each zone

Calibration settings

The following important settings must be specified which are also stored in the output result files.

Max_evaluations

The maximum number of iterations of the calibration. The calibration exits with status ‘MaxEval reached’. The last tested target parameter values is the final calibration results.

MinimumObservationsCount

The minum number of expiments needed to fully ignore regional calendars. Below this number also available regional calendars will be used in the calibration

Nworkers

Number of zones that should be calibrated in parallel.

Debug_mode

Switch for using calibration in debug mode (true or false).

AssumeCropParameterInvariableInZone

Crop parameter remains unchanged throughout the zone and time window (true or false).

AssumeCropCalendarFromDB

Allows that calibration sources from the agropheno database can be used that do not have DOP or DOE. These will be collected from the crop_calendar table in the database.

meteoStartDate

Start date from which weather data should be retrieved.

Optimizer

Specifies the optimizer that is used. Note that multiple optimizer can be provided. Each running after the other. In general we start with a global optimizer (NG_DIRECT_L) to find a global minimum. After this first rough calibration, a local optimizer (LN_COBYLA) further optimizes the target parameter values.

Objfunc_ftol

Tolerances values to exit calibration. If the result of the objective function change less than the tolerance, the calibration exits with status ‘Ftol reached’. When using multiple optimizer, each should have a Objfunc_ftol specified (comma separated).

Output_summary_variables

Variable to be shown in the output summery results.

Target_summary_variables

Parameters to be included in the objective function (comma separated). When calibrating phenology, these are DOA (anthesis) and DOM (maturity). Note that DOH was converted to DOM if needed.

Target_summary_weights

Weights that should be assigned to the weighted mean squared error of the individual summary variables. It can be used to favor one a summary variable over the other or make all summary variables equally important.

Target_parameter_names

Parameters to be calibrated (comma separated).

Parameter_is_absolute

Boolean to specify if value of target_parameter should be varied with absolute steps or relative steps (%). Relative steps are often used to calibrate xy target parameters (AFGEN function).

Parameter_default_values

Default values of the target parameters (comma separated).

Parameter_initial_step

Initial step size for testing different target parameter values (comma separated).

Parameter_lower_bounds

Lower boundary of target parameter values (comma separated).

Parameter_upper_bounds

Upper boundary of target parameter values (comma separated).

Calibration results

For each zone a result file is created, containing:

  • Last values of target_parameter_names (e.g. TSUM1, TSUM2, SPAN, SLATB,...)
  • Last WMSE of each target_summary_variable
  • Count of each target_summary_variable
  • Last correlation coefficient for each target_summary_variable
  • WMSE of objective function
  • Number of simulations per iteration
  • Last optimization method used
  • Status when exiting calibration (maximum number of simulations, tolerance reached or error).
In a post processing step, the results of the individual zones are collected in a summary variety results file that can be used to visualize the calibration results in arcgis.