Meteorological data from ground stations
The processing of observed station weather into the MCYFS involves four steps:
Data acquisition from weather stations
The stations are limited to those that regularly collect data and can supply the data in near real time (Burrill and Vossen, 1992). Relevant station information includes station number, station name, latitude, longitude and altitude. These data are available in the table STATION.
Currently the data acquisition and processing applies to two regional windows: Europe and China. The examples in this documentation, however, mainly refer to Europe.
Some of the historic meteorological data were purchased directly from National Meteorological Services. Others were acquired via the . Because the data were obtained from a variety of sources, considerable preprocessing was necessary to convert them to a standard format. Around 1992 the historic meteorological data represented approximately 380 stations in the EU, Switzerland, Poland and Slovenia with data from 1949 to 1991 (Burrill and Vossen, 1992). Later the historic sets were extended with stations in Eastern Europe, western Russia, the Maghreb and Turkey. The historic data were converted into consistent units and checked for realistic values. The database was also scanned for inconsistencies, such as successive days with the same value for a variable, or minimum temperatures higher than maximum temperatures (Burrill and Vossen, 1992).
From 1991 to present, meteorological data are received in near real time from sources like the GTS network for different hours within one day. The data are pre-processed and quality checked using the AMDAC software package (MeteoConsult, 1991), which extracts, decodes and processes the observations. Since 2014, more and more National Meteorological Institutes (NMI) have migrated to the new encoding format BUFR, maintained by the WMO. For decoding BUFR, additional software (FMdecode by MetWatch) is applied.
In 2016 data from Chinese stations were acquired, starting a new service for this region.
The station database stored in table STATIONS holds over 9450 stations distributed over 67 countries in Europe. Over 4400 of these stations provide weather data in near real time. All weather data are stored in the station weather database (table WEATHER_OBS_STATION and table RAIN_OBS_STATION), which currently holds over 41.6 million records. The figures on the right illustrate the increase in the number of stations available for the temperature indicator between 1975 and 2009. In general the station density in the monitored areas is considered sufficiently high for the purpose of the project.
Raw station data are collected from various sources:
- (essential data and data licensed by restrictions)
- European National Meteorological Institutes (NMI) (licensed)
Observations which are provided directly by National Meteorological Institutes or regional authorities come from secondary networks and are provided in proprietary formats.
Meteorological stations selected with priority are those located in agricultural zones and evenly distributed over the mainland (rather than over islands, in particular for Portugal, Spain and Greece). For western Russia (west of the Urals) the main areas covered are the agricultural districts.
In the case of China slightly fewer than 300 stations were selected that meet the following criteria:
- Near real time delivery
- A 20-year archive
- Located in the main agricultural areas
- Covering the elements: precipitation, minimum and maximum temperature, humidity and wind speed
The basic indicators that are received from weather stations include:
- Measured radiation
- Cloud cover
- Vapour pressure
- Wind speed
- Snow depth
Observations of maximum and minimum temperatures, precipitation amounts and sunshine duration (when available) are contained in the main-hour synoptic reports. METAR data provide temperature, dew point, visibility and cloud amount. As far as available, they can be used for intermediate or even non-standard (i.e. all but main and intermediate) hours. From most countries outside Europe, 3-hourly synoptic data are exchanged worldwide.
For China the set is limited to:
- Precipitation (24 hour)
- Temperature (3-hourly)
- Humidity (3-hourly)
- Wind speed (3-hourly)
Data quality check
The software package Actual Meteorological Database Construction (AMDAC) is used to perform decoding, completing and quality evaluation of actual meteorological data which are used as input for agro-meteorological models. The chain of data processing and quality control can be described as follows:
Near real-time pre-processing (3-hourly data)
- Decode intermediate-hour and main-hour reports and METAR reports from weather stations circulating on the GTS in the geographical zone of interest;
- Extract or calculate and store the meteorological parameters in a separate database;
- Check the quality of the observed elements in the received weather reports by performing data range and time consistency checks; the latter is done by comparing the values of reported parameters with those previously or subsequently reported from the same station;
- Correct automatically obvious errors detected while performing these checks;
- Automatically fill gaps in the database through interpolation based on time consistency criteria;
- Flag dubious observations which cannot be corrected automatically;
- Write all automatic corrections and flagged dubious observations to a log file;
- Have the flagged observations checked and, if necessary, corrected by a trained meteorologist; when a correction is done, the derived parameters are recalculated and the data are written back to the database.
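The chain above can be illustrated with a minimal Python sketch. It is not the AMDAC implementation: the function name `quality_check`, the limits and the `max_step` jump threshold are hypothetical, and the gap filling is reduced to averaging the two neighbouring reports of a single missing value.

```python
def quality_check(values, lower, upper, max_step):
    """Check one station's 3-hourly series of a single element.

    values: list of floats, with None for missing reports.
    Returns (cleaned series, log of automatic actions, indices left for review).
    All thresholds are illustrative assumptions, not AMDAC settings.
    """
    cleaned = list(values)
    log = []

    # 1. Data range check: implausible values are treated as missing.
    for i, v in enumerate(cleaned):
        if v is not None and not (lower <= v <= upper):
            log.append((i, "out_of_range", v))
            cleaned[i] = None

    # 2. Time consistency check: compare with the previous and next report.
    for i in range(1, len(cleaned) - 1):
        prev, cur, nxt = cleaned[i - 1], cleaned[i], cleaned[i + 1]
        if None in (prev, cur, nxt):
            continue
        if abs(cur - prev) > max_step and abs(cur - nxt) > max_step:
            log.append((i, "spike", cur))
            cleaned[i] = None

    # 3. Gap filling: only single gaps with two observed neighbours are filled.
    snapshot = list(cleaned)
    for i in range(1, len(snapshot) - 1):
        if (snapshot[i] is None and snapshot[i - 1] is not None
                and snapshot[i + 1] is not None):
            cleaned[i] = (snapshot[i - 1] + snapshot[i + 1]) / 2.0
            log.append((i, "filled", cleaned[i]))

    # 4. Anything still missing is left for review by a meteorologist.
    flagged = [i for i, v in enumerate(cleaned) if v is None]
    return cleaned, log, flagged


series = [10.0, 10.5, 60.0, 11.0, 25.0, 11.5, None, None, 12.0]
cleaned, log, flagged = quality_check(series, lower=-35.0, upper=50.0, max_step=8.0)
print(cleaned)
print(flagged)
```

In this sketch the two-report gap at the end of the series is deliberately not filled, so it ends up in the list handed to the meteorologist.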
Dedicated trained and qualified meteorologists go through the dubious observation values that are flagged as such by the AMDAC automatic pre-checking program. The MIDAS software application is used to graphically visualize and analyze additional information such as:
- Station observation data
- Satellite images
- Radar data
which may serve as the basis of either confirmation or rejection of the observed values.
Conversion to daily values
Once the database has been filled following the method described above, the 3-hourly data are aggregated to daily values. This includes the following indicators:
- Precipitation (daily and 6-hourly)
- Temperature (daily maximum, daily minimum and 3-hourly)
- Measured radiation
- Cloud cover
- Vapour pressure
- Wind speed
- Snow depth
- Humidity (3-hourly)
A final check is then performed on these daily values before an output file is created for further processing. This automated quality check consists in verifying the data according to the table below. If errors are found, the meteorologist will check the data again and make modifications if relevant.
|Element|Accepted range|
|Daily mean of total cloud cover: N|0 to 8 oktas|
|Measured sunshine duration: MeaSun|0 to 24 hours|
|Measured radiation: RadMea|0 to 36 MJ/m²|
|Minimum temperature: Tn|-35 to 35°C depending on region|
|Maximum temperature: Tx|-20 to 50°C depending on region|
|Maximum temperature - Minimum temperature|0 < Tx - Tn < 30°C|
|Daily mean vapour pressure: e|0 to 35 hPa depending on region|
|Daily mean wind speed at 10 metres: ff10|0 to 15 m/s|
|Amount of precipitation from 6 UTC-6 UTC: RRR|0 to 140 mm depending on region|
|Air temperature: TT|-35 to 50°C depending on region|
|Relative humidity: RH|5 to 100% depending on region|
|Daily mean vapour pressure deficit: vpd|0 to 60 hPa depending on region|
|Daily mean slope of saturation vapour pressure vs. temperature curve: slope|0 to 3 hPa/°C|
|Daytime mean of total cloud cover: N|0 to 8 oktas|
|Penman evaporation: ETP|0 to 25 mm/day depending on region|
|Snow cover: SNOW|(Tn+Tx)/2 < 10°C|
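A table-driven check of this kind could look like the following sketch. The dictionary keys reuse the element codes from the table; the function name `check_daily` and the record layout are hypothetical. The snow rule is read here as "snow is only plausible when the mean of Tn and Tx is below 10°C", which is an interpretation of the table entry rather than a documented rule, and the regional variation of the limits is ignored.

```python
# Widest limits from the table above; regional limits are not modelled here.
DAILY_LIMITS = {
    "N": (0, 8),          # oktas
    "MeaSun": (0, 24),    # hours
    "RadMea": (0, 36),    # MJ/m2
    "Tn": (-35, 35),      # degrees C
    "Tx": (-20, 50),      # degrees C
    "e": (0, 35),         # hPa
    "ff10": (0, 15),      # m/s
    "RRR": (0, 140),      # mm
    "RH": (5, 100),       # %
    "vpd": (0, 60),       # hPa
    "ETP": (0, 25),       # mm/day
}


def check_daily(record):
    """Return the names of all checks that a daily record (a dict) fails."""
    failed = []
    for name, (low, high) in DAILY_LIMITS.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            failed.append(name)

    tn, tx = record.get("Tn"), record.get("Tx")
    if tn is not None and tx is not None:
        # Cross-element check: the daily temperature range must be plausible.
        if not (0 < tx - tn < 30):
            failed.append("Tx-Tn")
        # Assumed reading of the snow rule: snow needs a mean temperature < 10.
        snow = record.get("SNOW")
        if snow is not None and snow > 0 and (tn + tx) / 2.0 >= 10:
            failed.append("SNOW")
    return failed


print(check_daily({"Tn": 12.0, "Tx": 10.0, "ff10": 20.0, "N": 4}))
```

A record that fails any check would be handed back to the meteorologist, as described in the text.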
The ranges in the above table are wide, but they can still be exceeded in some countries. For instance, in winter minimum temperatures below -35°C can occur in the northern countries. In most situations, however, it is known beforehand that the plausible range for a variable is much smaller. To detect smaller errors, the MOS (Model Output Statistics) forecasting system of Meteo Consult is used. The MOS forecast errors for the first 24 hours are usually very small. Therefore the observation thresholds are defined as a small range around the MOS forecast. If an observation is outside this range, it is flagged (already during the pre-processing checks of the 3-hourly values) for manual control by a meteorologist. MOS accounts for the season and the applicable climatology. For most elements excessive persistence is very unlikely and spatial consistency is high. This is different for rainfall (see Consecutive zero values for rainfall).
Information on the way the daily element values are constructed/defined is stored in the tables WEATHER_OBS_STATION_INFO and RAIN_OBS_STATION_INFO. Currently the table WEATHER_OBS_STATION_INFO is only used to store information on rainfall e.g. period definition of the daily rainfall sum.
Finally, the metadata of all stations in the MCYFS database are checked once a year.
Consecutive zero values for rainfall
Consecutive time series of zero rainfall are difficult to detect. Such errors can only be identified by inspecting longer time series going back several weeks to several months.
The following procedure is run:
- Each day the rainfall sum over the last 150 days of each station is checked with the following settings:
- Include only stations that report frequently (more than 30 reports in the past 150 days)
- AND report very little or no rainfall (<10 mm)
- AND have 90% of rainfall observations = 0
- AND are located in a region having > 100 mm rainfall (according to climatology)
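The screening criteria can be sketched in Python as follows. The function name and arguments are hypothetical; the sketch assumes that "90%" means at least 90% of the received reports, and that the climatological rainfall refers to the same 150-day window. Neither assumption is confirmed by the text.

```python
def is_suspicious_zero_rain(rain_150d, climatological_sum_mm):
    """Flag a station whose recent rainfall record looks wrongly dry.

    rain_150d: daily rainfall values (mm) for the last 150 days,
               with None where the station did not report.
    climatological_sum_mm: expected rainfall for the region (assumed to
               cover the same window).
    """
    reports = [r for r in rain_150d if r is not None]

    # Only stations that report frequently (more than 30 reports).
    if len(reports) <= 30:
        return False

    total = sum(reports)
    zero_fraction = sum(1 for r in reports if r == 0) / len(reports)

    return (
        total < 10.0                      # very little or no rainfall
        and zero_fraction >= 0.9          # (assumed: at least) 90% zero reports
        and climatological_sum_mm > 100.0 # region is normally not that dry
    )


dry_record = [0.0] * 100 + [None] * 50
print(is_suspicious_zero_rain(dry_record, climatological_sum_mm=250.0))
```

A station flagged this way is not rejected automatically; it only enters the manual comparison with surrounding stations described next.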
Flagged data that have a suspiciously low rainfall sum over the analysed periods are compared with surrounding stations with the MIDAS work station and the MARS viewers.
- In case of suspicious data the historic time series of this specific station and surrounding stations are retrieved.
- If the time series of the station are found to be wrong (thus wrongly zero for a long period) the following actions are executed:
- The station is added to a black list: the station is immediately excluded from the operational station list.
- The erroneous time series are deleted from RAIN_OBS_STATION and the PRECIPITATION value in WEATHER_OBS_STATION is set to ‘Null’. The erroneous values are saved in separate tables (WEATHER_OBS_STATION_ERRORS).
- All affected grid cells (WEATHER_OBS_GRID) and regions (WEATHER_OBS_REGIONCOVER) are reprocessed. In case these erroneous data were also used in the crop simulation and yield forecast these data sets are also reprocessed.
- Before mirroring the data to the analysts, they are informed to secure an optimal analysis environment.
Once a year each station on the blacklist is verified. Afterwards it is decided if stations can return to the operational work flow. Falsely blocked data is backordered, added and reprocessed.
Sufficient observations per country
Each month an overview is created showing the delivered number of stations per country. Information is also added on sudden changes and follow-up actions. Similar listings are made on a daily basis for internal use.
Ingestion into the database
After the station weather data have passed all checks, the daily weather data are exported to a fixed-format ASCII file (S-file) containing the data of a single day, which can be imported into the table WEATHER_OBS_STATION. In the near real time situation an S-file is delivered one day later. For example, in the afternoon of 31 March 2016 the following file is generated: s20160330.dat.
|Format ASCII S-file (daily station weather)|
The 6-hourly rainfall data is exported to a plain ASCII file (rrr-file) containing the data of one 6-hourly time step within one single day. This data can be imported in the table RAIN_OBS_STATION. In the near real time service each day 4 rrr-files are generated at once containing data of 4 6-hourly time steps: 12 UTC (06-12 UTC of previous day), 18 UTC (12-18 UTC of previous day), 00 UTC (18-00 UTC of previous day) and 06 UTC (00-06 UTC of present day). For example in the afternoon of day 31 March 2016 the following files are generated: rrr_2016033012.txt, rrr_2016033018.txt, rrr_2016033100.txt and rrr_2016033106.txt.
|Format ASCII rrr-file (6-hourly station rainfall)|
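The naming pattern of the two file types can be reproduced with a short Python sketch. The function names are hypothetical; the patterns themselves are taken from the examples above (s20160330.dat and the four rrr files generated on 31 March 2016).

```python
from datetime import date, datetime, time, timedelta


def s_file_name(production_day):
    """Name of the S-file generated on production_day (data of the day before)."""
    data_day = production_day - timedelta(days=1)
    return "s{:%Y%m%d}.dat".format(data_day)


def rrr_file_names(production_day):
    """Names of the four rrr-files generated on production_day.

    The 6-hourly steps end at 12, 18 and 00 UTC after the previous day's
    06 UTC, and at 06 UTC of the production day.
    """
    first_step = datetime.combine(production_day - timedelta(days=1), time(12))
    steps = [first_step + timedelta(hours=6 * k) for k in range(4)]
    return ["rrr_{:%Y%m%d%H}.txt".format(step) for step in steps]


print(s_file_name(date(2016, 3, 31)))
print(rrr_file_names(date(2016, 3, 31)))
```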
Calculation of advanced parameters
Global radiation is the daily sum of incoming solar radiation that reaches the earth surface. It is mainly composed of wavelengths between 0.3 μm and 3 μm. Approximately half of the incoming radiation with wavelengths between 0.4 and 0.7 μm is Photosynthetically Active Radiation (PAR). Global radiation is the driving variable in the growth-determining CO2 assimilation process and thus crop growth models are sensitive to radiation data (van Diepen, 1992). A major problem is the scarcity of measured global radiation. In cases where no direct observations are available it must be derived from sunshine duration, cloud cover and/or temperature, on the basis of relatively weak relationships.
The global radiation calculation uses one of three formulae (Ångström-Prescott, Supit-Van Kappel, and Hargreaves), depending on the availability of meteorological parameters. An important component in these formulae is the Angot radiation, which is the extraterrestrial radiation integrated over the day at a certain latitude on a certain day. In fact, all three formulae estimate the fraction of Angot radiation actually received at the earth surface. The calculation of the Angot radiation and the three different formulae are described by Supit et al. (1994) and van der Goot (1998a).
Ångström-Prescott, Supit-Van Kappel, and Hargreaves regression constants
The main problem with the application of the Ångström-Prescott, Supit-Van Kappel, and Hargreaves formulae is the quality of the regression constants. Studies by Supit (1994), Supit and van Kappel (1998) and van Kappel and Supit (1998) showed no relationship between latitude and the coefficients for Europe, although such a relation is frequently used to estimate these regression constants. Initially in MCYFS regression constants of Supit and van Kappel (1998) and van Kappel and Supit (1998) for Europe were used. They obtained sets of regression constants for the formulae for as many weather stations as possible, with a geographic distribution that corresponds to the area of interest for the MCYFS. As a result, a set of 256 reference stations was identified for which a relevant set of measured radiation data and other parameters in the formulae existed. For these stations regression constants were calculated based on measured radiation data for the three formulae mentioned above.
In 2012 the regression coefficients of these solar radiation models for Europe were updated using a new set of weather station data and an alternative source of radiation data: 6 years (2005-2010) of the down-welling surface shortwave radiation flux (DSSF) 30-minute product derived from Meteosat Second Generation satellite data by the Land Surface Analysis Satellite Applications Facility (LSA SAF) (Bojanowski et al., 2013). For each solar radiation model a set of weather stations was selected having sufficient observations of either sunshine duration, or cloud cover and temperature, or only temperature to perform a regression analysis. Results are stored in table STATION_REFERENCE_COEFFICIENTS.
Station archive data for China did not include measured radiation. Therefore radiation was derived from other observed elements, namely cloud cover and minimum and maximum temperature. The Hargreaves and Supit-Van Kappel models have been trained using the modelled radiation of Tang et al. (2013). The 50yrRad database of Tang et al. (2013), containing ‘modelled’ radiation data for 716 CMA stations, has demonstrated superior performance over previous estimates from locally calibrated Ångström-Prescott models.
The program SupitConstants uses this set of data (via the view SUPIT_REFERENCE_STATIONS), consisting of latitude, longitude, altitude and calculated regression constants, to derive the regression constants for all stations in the MCYFS. Interpolation of the regression constants of the reference stations to other stations is based on a distance weighted average of the three nearest stations. This process is carried out once, unless the set of reference stations changes or when new stations are added.
|Interpolation of regression constants|
This body of data, consisting of latitude, longitude, altitude and the regression constants calculated for the reference stations, is being used for the derivation of the regression constants for the set of stations used for the interpolation of the daily meteorological data. This is a process that only has to be carried out once, unless the set of reference stations changes. Once the regression constants have been established for the operational set of stations, the global radiation estimation can proceed using any one of the formulae.
The interpolation of the regression constants is based on a simple distance weighted average of the three nearest stations. For each of the three sets of constants (Ångström-Prescott, Supit-Van Kappel, and Hargreaves) a subset is created from the complete set of reference stations, by selecting only those stations that have the regression coefficients for the desired method. This subset of stations is then sorted based on distance to the station for which the regression coefficients are being calculated. This sorting process is also subject to an altitude threshold test i.e. if the altitude difference between the target station and a reference station is greater than a set threshold the reference station is rejected in favour of the next nearest reference station. Depending on a distance threshold, the nearest one, two or three stations are then used to calculate the regression constants. If the threshold tests exclude all stations, the nearest station will be used, regardless of the distance. The altitude threshold value is 200 m; the distance threshold is 200 km.
The distance weighted average method used is based on the relative distance of the reference stations to the station of interest.
Assume d0, d1 and d2 to be the distances to the three nearest reference stations, and w0, w1 and w2 the weights to be used in the calculation. As an example, assume that d1 is 2*d0, then w1 will be w0/2. More generally, w1 = w0*d0/d1. Similarly, w2 = w0*d0/d2. Furthermore, the sum of the weights should be 1, so w0+w1+w2 = 1. From the above, the following relation can be established: w0 = 1 / (1 + d0/d1 + d0/d2).
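The selection and weighting rules described above can be combined in a short Python sketch. It is not the SupitConstants program: the function names and the dictionary layout of a reference station are hypothetical, distances are assumed to be precomputed, and the fallback "nearest station regardless of the distance" is read here as the nearest station overall.

```python
def inverse_distance_weights(distances):
    """Weights proportional to 1/d that sum to 1 (so w1 = w0*d0/d1)."""
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [x / total for x in inverse]


def interpolate_constants(target_alt_m, references,
                          max_alt_diff_m=200.0, max_dist_km=200.0):
    """Interpolate regression constants for one target station.

    references: list of dicts with keys 'dist_km' (distance to the target),
                'alt_m' and 'constants' (tuple of regression constants).
    """
    ranked = sorted(references, key=lambda r: r["dist_km"])

    # Apply the altitude and distance thresholds, keep at most three stations.
    usable = [
        r for r in ranked
        if abs(r["alt_m"] - target_alt_m) <= max_alt_diff_m
        and r["dist_km"] <= max_dist_km
    ][:3]

    # Fallback (assumed reading): nearest station if nothing passes the tests.
    if not usable:
        usable = ranked[:1]

    # A reference station at the target location needs no interpolation.
    if usable[0]["dist_km"] == 0:
        return tuple(usable[0]["constants"])

    weights = inverse_distance_weights([r["dist_km"] for r in usable])
    n_constants = len(usable[0]["constants"])
    return tuple(
        sum(w * r["constants"][k] for w, r in zip(weights, usable))
        for k in range(n_constants)
    )


print(inverse_distance_weights([10.0, 20.0, 40.0]))
```

For distances of 10, 20 and 40 km the first weight equals 1 / (1 + 0.5 + 0.25), which agrees with the relation between the weights given in the text.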
Interpolated regression constants are written in the table SUPIT_CONSTANTS and copied to table STATIONS. After the regression constants have been established for all stations, global radiation can be calculated by CGMS using any one of the above formulae. Finally, the CGMS writes the derived daily global radiation of every station in the table WEATHER_OBS_STATION_CALCULATED (see flowchart).
The following hierarchical method is used to calculate global radiation in CGMS (Supit and van Kappel, 1998). If observed/measured global radiation is available it will be used.
The principal component of all calculations is the extraterrestrial radiation, or Angot radiation. The extraterrestrial radiation is calculated as:
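As an illustration, the daily extraterrestrial radiation can be computed with the widely used formulation of FAO Irrigation and Drainage Paper 56 (Allen et al., 1998). This is an assumption made for the sketch: the formulation implemented in CGMS (Supit et al., 1994) may differ in its constants and in the way the integration over the day is carried out. The function name is hypothetical and the result is in MJ m-2 day-1.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1, value used in FAO-56


def angot_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0

    # Inverse relative earth-sun distance and solar declination.
    d_r = 1.0 + 0.033 * math.cos(angle)
    delta = 0.409 * math.sin(angle - 1.39)

    # Sunset hour angle; the argument is clamped for polar day and night.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega_s = math.acos(x)

    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * d_r * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


# Worked example from FAO-56: 20 degrees south on 3 September (day 246).
print(round(angot_radiation(-20.0, 246), 1))
```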
Ångström-Prescott formula
In case sunshine duration is available, global radiation is calculated using the equation postulated by Ångström (1924) and modified by Prescott (1940). The two constants in this equation depend on the geographic location.
Supit-Van Kappel formula
When sunshine duration is not available but minimum and maximum temperature and cloud cover are known, the Supit-Van Kappel formula is used, which is an extension of the Hargreaves formula (Supit, 1994). Again, the regression coefficients depend on the geographic location.
Hargreaves formula
Finally, when only the minimum and maximum temperatures are known the equation of Hargreaves et al. (1985) is used. Again, the regression coefficients depend on the geographic location.
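The hierarchy described above can be summarised in a Python sketch. The three formulae are written in their commonly published forms and should be checked against Supit et al. (1994) before use; the function name, the argument names and the coefficient keys (`A`, `B`, `Ca`, `Cb`, `Cc`, `Ha`, `Hb`) are hypothetical.

```python
import math


def global_radiation(angot, coef, measured=None, sunshine_h=None,
                     day_length_h=None, tmin=None, tmax=None,
                     cloud_oktas=None):
    """Return (global radiation, method used), in the units of `angot`.

    coef: dict of station-specific regression constants (hypothetical keys).
    """
    # 1. Measured global radiation always takes precedence.
    if measured is not None:
        return measured, "measured"

    # 2. Angstrom-Prescott: needs sunshine duration and day length.
    if sunshine_h is not None and day_length_h:
        fraction = coef["A"] + coef["B"] * sunshine_h / day_length_h
        return angot * fraction, "Angstrom-Prescott"

    if tmin is not None and tmax is not None:
        temp_range = max(tmax - tmin, 0.0)
        # 3. Supit-Van Kappel: temperature range plus cloud cover.
        if cloud_oktas is not None:
            fraction = (coef["Ca"] * math.sqrt(temp_range)
                        + coef["Cb"] * math.sqrt(1.0 - cloud_oktas / 8.0))
            return angot * fraction + coef["Cc"], "Supit-Van Kappel"
        # 4. Hargreaves: temperature range only.
        return angot * coef["Ha"] * math.sqrt(temp_range) + coef["Hb"], "Hargreaves"

    return None, "unavailable"


coefficients = {"A": 0.18, "B": 0.55}  # illustrative values only
print(global_radiation(30.0, coefficients, sunshine_h=6.0, day_length_h=12.0))
```

Ordering the branches from the most to the least informative input mirrors the text: each fallback is only reached when the data for the preceding method are missing.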
Daily meteorological data collected from stations do not contain potential evapotranspiration. This parameter is calculated by the CGMS with the well-known Penman-Monteith equation (Allen et al., 1998). In general, the evapotranspiration from a reference surface, the so-called reference crop evapotranspiration or reference evapotranspiration, can be described by the FAO Penman-Monteith equation:
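For reference, the daily FAO Penman-Monteith equation as published by Allen et al. (1998) can be written as a small Python function. This is a sketch of the published equation, not of the CGMS code: the function and argument names are hypothetical, and the inputs (net radiation, vapour pressures, slope and psychrometric constant) are assumed to have been computed beforehand.

```python
def fao_penman_monteith(delta, net_radiation, soil_heat_flux, gamma,
                        temperature, wind_2m, e_sat, e_act):
    """Reference evapotranspiration ET0 in mm/day (FAO-56, daily time step).

    delta: slope of the saturation vapour pressure curve (kPa/degC)
    net_radiation, soil_heat_flux: MJ m-2 day-1
    gamma: psychrometric constant (kPa/degC)
    temperature: mean daily air temperature at 2 m (degC)
    wind_2m: wind speed at 2 m (m/s)
    e_sat, e_act: saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (net_radiation - soil_heat_flux)
    aerodynamic_term = (gamma * 900.0 / (temperature + 273.0)
                        * wind_2m * (e_sat - e_act))
    return (radiation_term + aerodynamic_term) / (
        delta + gamma * (1.0 + 0.34 * wind_2m))


# Input values of the daily worked example in FAO-56 (Brussels, 6 July).
et0 = fao_penman_monteith(delta=0.122, net_radiation=13.28, soil_heat_flux=0.0,
                          gamma=0.0666, temperature=16.9, wind_2m=2.078,
                          e_sat=1.997, e_act=1.409)
print(round(et0, 1))
```

Note that FAO-56 works in kPa, whereas the quality-check table above lists vapour pressure in hPa, so a unit conversion would be needed before feeding MCYFS values into this function.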
Evapotranspiration from a wet bare soil surface (ES0) and from a crop canopy (ET0) is calculated with the well-known Penman formula (Penman, 1948). In general, the evapotranspiration from a water surface (E0) can be described by the Penman formula. Only the albedo and surface roughness differ between these types of evapotranspiration, as explained below:
The net absorbed radiation depends on incoming global radiation, net outgoing long-wave radiation, the latent heat and the reflection coefficient of the considered surface (albedo). For E0, ES0 and ET0, albedo values of 0.05, 0.15 and 0.20 are used respectively. The evaporative demand is determined by humidity, wind speed and surface roughness. For a free water surface and for wet bare soil (E0, ES0) a surface roughness value of 0.5 is used. For a more detailed description of the underlying formulae we refer to Supit et al. (1994) and van der Goot (1997).
Note that the coefficients of the Ångström method are required to calculate the atmospheric transmission within the calculation of the net outgoing long-wave radiation. Currently for China only one set of Ångström coefficients has been implemented: A = 0.18 and B = 0.55, taken from Frère and Popov (1979), valid for ‘cold and temperate zones’. This will be replaced by more accurate estimations of these coefficients in the course of 2016. The calculated E0, ES0, and ET0 are stored in table WEATHER_OBS_STATION_CALCULATED.