Interpolation of observed weather

From Agri4castWiki
Jump to: navigation, search


General description

Interpolation from weather stations to regular climate grid.

The daily meteorological data is interpolated towards the centre of a regular climatic grid. The grid definition is stored in object GRIDS. The data of the climatic grid is stored in object WEATHER_OBS_GRID. In the case of the European window (EUR), the regular grid projected in coordinate system ETRS_1989_LAEA.


The grid cell size is based on the assumption that within a region of 25 by 25 kilometers the meteorological data are homogeneous. It is expected that temperature, sunshine, humidity, and wind speed gradually change over distances of 50 to 150 kilometers. Local weather variations may be larger than changes over such distances, most obviously for solar radiation caused by differences in slope aspect, or presence of persistent clouds. Temperature conditions near the surface follow variations in solar radiation but there are also temperature gradients due to cold air drainage or differences in surface cover and in topography. Sometimes large differences occur over a short distance (10-40 km) when there is sharp transition between two different air masses, e.g. when a front passes or when a region under a persistent cloud cover exists adjacent to a region under a cloudless sky (van Diepen, 1998). Beek (1991) reports that if daily means are used, most meteorological variables do not differ too much over distances in the range of 50-150 km.

More complicated is the spatial variation in precipitation, usually in the form of rainfall. Rain may fall from a local cumulonimbus in showers with high density (convective rains), or in a front passage with low density over large areas (depression rains). Geographic patterns of rainfall are influenced by the geometry of land and sea surfaces, and by general circulation patterns. Western facing slopes of hill and mountain ranges receive more than average rainfall, east facing slopes less. For that, the spatial distribution patterns of rainfall are thus irregular (Beek, 1991; van Diepen, 1998).

The methodology for the spatial interpolation of the data of the existing network of meteorological stations towards the climatic grid cells centre is based on the studies of Beek, 1991 and van der Voet et al., 1994. It is described by van der Goot, 1998a. This method was chosen because its simple approach made it easy to automate while the accuracy was sufficient to serve as input to the crop growth model. The interpolation is executed in two steps: first the selection of suitable weather stations to determine representative meteorological conditions for a specific climatic grid cell. Second, a simple average is calculated for most of the meteorological parameters, with a correction for the altitude difference between the station and grid cell centre in case of temperature and vapor pressure. As an exception rainfall data are taken directly from the most suitable station.

It must be stressed that the consequence of the selected interpolation is that values obtained for each climatic grid cell represents an ‘average’ daily condition. They do not necessarily represent meteorological conditions that could be measured at the climatic grid cell centre. For instance, the altitude used is not the altitude that can be measured at the climatic grid cell centre, but rather a value that represents the mean altitude of the agricultural activity in the concerned cell. In addition different elements like rainfall and temperature can be based on different stations as they belong to different interpolation classes.

The interpolation is programmed in CGMS. Required tables are presented in 'flowchart grid weather interpolation historic years'.

General selection of weather stations (temporal availability)

Not all weather stations broadcast a complete set of data and not all stations broadcast continuously. To increase interpolation reliability, and to reduce the computational requirements, the CGMS performs checks on data availability of weather stations (see flowchart). First a classification is made with respect to the data type that stations can deliver. Five classes are distinguished:

  • PRECIPITATION
  • SNOWDEPTH
  • TEMPERATURE
  • REST (radiation, evapotranspiration, vapour pressure and wind speed)
  • HUMIDITY

Secondly, only weather stations within a radius of 250 kilometers around the climatic grid cell centre are candidates for interpolation. The radius is defined in the table INTERPOLATION_PARAMETERS (parameter IPOL_MAXDIST).

The selection of available stations is done differently for the current year and the historic years.

Current year

The current year is defined in the table INTERPOLATION_PARAMETERS. For each climatic grid cell, for each day, for each of the above five classes all stations are selected that are within a radius of 250 km around the grid cell. Thus, in the current year, each combination of grid cell, day and class can have its own unique list of stations. With regard to relative humidity the availability check of a station also include a check on the complete day: a station is available if the temperature and humidity observations of all 3-hourly time steps within a day are available. In case no stations are found for a combination of climatic grid cell, day and class, all stations are selected that are within a radius of 250 km around the climatic grid cell and have data for this specific day and class in table WEATHER_OBS_STATION_LTA ánd are considered reliable. Reliability is defined per station in table STATIONS.


Historic years (the archive)

For interpolation of historic station weather each climatic grid cell uses the same set of stations for a complete year. For most recent years around 3622 stations have a sufficient temporal coverage (enough observations within one year). Only these are used in the interpolation procedure. The selection procedure determines for each weather station if the data availability in WEATHER_OBS_STATION and WEATHER_OBS_STATION_CALCULATED for a class exceeds a certain threshold for each complete year separately. The threshold value can be configured per station and per class. Currently the classes PRECIPITATION, TEMPERATURE, REST and HUMIDITY have a threshold of 79.9% for all stations (defined in table STATIONS). It means i.e. if the station data in a particular class for a historic year is for more than 79.9% complete, the station is a candidate for the interpolation of the data in the concerned class. In case of class SNOWDEPTH the threshold is equal to 0% meaning that any station can be used for interpolation, as long as the station has at least one snow depth measurement for the selected historic year. Reason is the low temporal availability of snow observations during the year.

One additional check is done: the station in the selected class must have climatology data for all 366 calendars days stored in table WEATHER_OBS_STATION_LTA. After determination of availability, CGMS writes the results into the table WEATHER_OBS_STATION_AVAIL. Only these stations are used in the interpolation for historic years.

With regard to relative humidity the availability check of a station also include a check on the complete day. A station is available if the temperature and humidity observations of all 3-hourly time steps within a day are available (complete day) followed by the check of more than 79.9% of all days in that year have complete days.

Final selection of weather stations

To select the final station(s) to be used for a climatic grid cell on a specific day and for a specific class, CGMS applies a selection procedure that relies on the similarity of the 'source' station and the 'targetted' climatic grid cell centre. For the classes PRECIPITATION, SNOWDEPTH and HUMIDITY only 1 station is selected. For these classes the similarity score is expressed as the result of an algorithm that takes the following criteria into account:

  1. distance between the station and the grid cell centre
  2. difference in altitude between station and grid cell
  3. difference in distance to the coast between the station and the grid cell centre
  4. location of the station and the grid cell centre with regard to uniform weather regions

The higher the score, the lower the similarity between the station and the grid cell centre.



For the classes TEMPERATURE and REST, the interpolation to a climatic grid cell can use data from 1 up to 4 stations. To determine the most suitable set of stations the CGMS calculates a set score (combination score) in a similar way as the single station score (see similarity score) for each class TEMPERATURE and REST (vapour pressure, windspeed, evapotranspiration, radiation). It is based on the mean scores of a set of stations. Only the stations that satisfy the temporal availability check are included in the selection.


Theoretically, the set score could be calculated for all possible combinations of one up to four weather stations, taken from all available weather stations with sufficient observations. However obviously this would lead to many unnecessary calculations. Therefore, the CGMS determines the set score only for all combinations of one up to 4 weather stations, taken from the seven weather stations most similar to the climatic grid cell centre. This results in 98 sets for which the set score has to be calculated.

Interpolation of rainfall

Interpolated rainfall should be realistic in terms of number of rainy days, amount of rainfall and temporal distribution. The effects on the soil water balance and consequently on crop growth simulation of various small daily showers is different than for example one large rainfall event per week. When rainfall data from several surrounding stations are averaged, the rainfall peaks are leveled off and the number of rainy days increases. Therefore the CGMS takes for each climatic grid cell the rainfall of the most similar station i.e. the weather station with the lowest score for the concerned grid cell that has data. Only stations that satisfy the temporal availability check are included in the selection.

Stations can have missing data for a specific day. In case of historic years a maximum of 20.1% missing values is allowed (see temporal availability check). If the most similar station has no data for a certain day the second most similar station is checked etc. When no data are available for the 7 most similar stations, the long term average value of the most similar station is taken from table WEATHER_OBS_STATION_LTA.

In case of the current year all the selected stations have data for the day selected (see temporal availability check). Still it is possible that within a radius of 250 km of a grid cell no stations are found that have observations for class PRECIPITATION. To secure data availability for the 25 km grid cell the long term average value of the WEATHER_OBS_STATION_LTA is selected. In this case only stations are used that are identified as reliable stations in table STATIONS.


Interpolation of snow depth

Snow depth is interpolated in a similar manner as rainfall. For each climatic grid cell the snow depth of the most similar station is taken i.e. the weather station with the lowest score for the concerned grid cell that has data. Only stations that satisfy the temporal availability check are included in the selection. However for snow depth the threshold is set to 0% (see temporal availability) meaning that all stations with one or more observations in a historic year will be selected. Missing data of a selected station for historic years is treated in the following way. If the most similar station has no data for a certain day the second most similar station is checked etc. When no data are available for the 7 most similar stations, no snow depth data (NULL value) is returned for the selected grid cell.

In case of the current year all the selected stations have data for the day selected (see temporal availability check). Still it is possible that within a radius of 250 km of a grid cell no stations are found that have observations for class SNOWDEPTH. Unlike rainfall such grid cells will not receive a snow depth value based on climatology.

Finally a post processing is applied to avoid appearance of snow depth in locations that are too warm e.g. interpolation from a hill top to a valley. Snow depth is set to no data (NULL value) in case the specific day has a maximum temperature higher than 5 and a minimum temperature higher than zero degrees Celsius.


Interpolation of other weather data

Once the best set of weather stations for each climatic grid cell has been determined, as expressed by the minimum set score, data of these stations are simply averaged for the climatic grid cell. Temperature and vapor pressure are corrected for the difference in altitude between the selected weather station and the climatic grid cell centre. The correction factors used for the temperature and vapor pressure are respectively -0.006 (°C.m-1) and -2.5% per 100 meter increase (van der Voet et al.,1994). The averaging is carried out without weighting for distance, because in a comparative test it appeared that weighting did not improve the accuracy. This is not surprising, because the procedure for selecting the optimum set of stations contains already a weighing element (van Diepen, 1998).

Stations can have missing data. In case of historic years a maximum of 20.1% missing values is allowed (see temporal availability check). If for a certain day data is missing simply the long term average value of that day is taken from table WEATHER_OBS_STATION_LTA.

For the current year all the selected stations have data for the selected day (see temporal availability check). Still it is possible that within a radius of 250 km of a grid cell no stations are found that have observations for class TEMPERATURE or REST. To secure data availability for the 25 km grid cell the long term average value of the WEATHER_OBS_STATION_LTA is selected. In this case only stations are used that are identified as reliable stations in table STATIONS. For determining reliable stations see 'determining reliable stations'.


Interpolation of humidity

Relative humidity is interpolated in a similar manner as snow depth. For each climatic grid cell the relative humidity of the most similar station is taken i.e. the weather station with the lowest score for the concerned grid cell that has data. Only stations that satisfy the temporal availability check are included in the selection. Missing data of a selected station for historic years is treated in the following way. If the most similar station has no data for a certain day the second most similar station is checked etc. When no data are available for the 7 most similar stations, no relative humidity data (NULL value) is returned for the selected grid cell.

In case of the current year all the selected stations have data for the day selected (see temporal availability check). Still it is possible that within a radius of 250 km of a grid cell no stations are found that have observations for class HUMIDITY. Unlike rainfall (and similar to snow depth) such grid cells will not receive a relative humidity value based on climatology.

Because relative humidity cannot be corrected for altitude differences between the station and the target climatic grid cell, relative humidity is converted to actual vapour pressure before interpolation. For each climatic grid cell the most similar station is selected. Next, for each time step (06, 09, 12, 15, 18 UTC) for a selected station:

  • the saturated vapour pressure is calculated using the temperature and formula: Es = 6.11 * exp((17.4*T)/(T+239)).
  • the actual vapour pressure is calculated by multiplying saturated vapour pressure with the relative humidity divided by 100.

Next:

  • the actual vapour pressure of the similar station is corrected for altitude differences between the similar station and the centre of the climatic grid cell applying a correction factor of vapour pressure (a decrease of 2.5% for an increase of 100 m). Note this correction is also applied in the interpolation of actual vapour pressure in CGMS (see Interpolation of other weather data).
  • the saturated vapour pressure of the climatic grid cell is calculated by using the temperature of the similar station with a correction for altitude (a decrease in temperature of 0.6 degree Celsius for an increase of 100 m).
  • finally, the relative humidity is calculated by dividing the actual vapour by the saturated vapour pressure of the grid cell and multiply it with 100.
  • after applying the altitude corrections the value of relative humidity is checked for the physical plausible limits: 0 and 100%.


Check on missing grid weather

Two checks are performed to monitor station data availability:

  • The Project Management Board (PMB) checks whether the daily number of records in table WEATHER_OBS_STATION and RAIN_OBS_STATION is not below a certain minimum threshold;
  • Each day after interpolation an automatic check is done if the missing number of grid cells (per country) is larger than the long term average missing number of grid cells (regardless if grid cells are assigned to a crop calendar). Only if this is true operators receive an automatic email alert, containing a detailed summary of the grid cell counts per country.

Example:

======================================
MISSING GRID WEATHER FOR DAY 03-01-2014
======================================

daily missing cc:   number of grids with missing weather having a crop calendar
average missing cc: lta number of grids with missing weather having a crop calendar
total cc:           total number of grids having a crop caledar
daily missing:      number of grids with missing weather
average missing:    lta number of grids with missing weather
total:              total number of grids


                      region    daily       average     total      daily      average    total                                                                                           
COUNTRY               code      missing cc  missing cc  cc         missing    missing    missing                                                            
-------------------- ---------- ----------  ----------  ---------- ---------- ---------- ----------                                                                                                       
Algeria                   10531         29       14.6       1953         57       48.2       2052

The long term average missing number of grid cells, used as threshold to trigger the email alert, are stored in table WEATHER_OBS_GRID_LTA_MISS.

These checks allow operators to better monitor the actual data flow and to take any necessary action in case the number of observations drops significantly leading either to i) interpolated values of grid cells that are based on too remote located stations or ii) to ‘empty’ grid cells in important agricultural regions thus without any interpolated values. Incomplete time series of grid cells affects the analysis negatively and blocks the simulation of crop development and biomass. ‘Empty’ grid cells happens in the 'current year' if for the day selected for an element:

  • no observations are found in a radius of 250 km around the centre of the grid cell
  • and also no reliable climatology (reliable stations) is available for the same radius.

Based on the results from the checks operators can respond accordingly:

  • Possibly backorder missing observations and deliver the missing data well before the end of the dekad so that it can be processed before the Crop Simulation and Yield Forecasting of the current dekad.
  • If observations are (still) missing just before the end of a dekad the missing grid weather will be substituted with weather forecast data (ECMWF) in order not to hinder the Crop Simulation and Yield Forecasting. Substitution of data is manually triggered per country per day and the substituted data are flagged (string '2' in all class-clumns of WEATHER_OBS_GRID_INFO table).
  • When the missing observations may arrive later they will be reprocessed for both the weather and crop simulation.