MAPL History Component
The History component in the MAPL library is one of several specialized components provided by the library. History exists to write diagnostic data from the export state of a gridded component in a MAPL hierarchy. Note that History does not handle the checkpointing of component states for subsequent use as restarts; that is done by code separate from History. The History resource file consists of "collections", each of which defines a group of variables, and the components they can be found in, that are output with identical parameters. In its most basic use, i.e. if you don't explicitly tell History to do something else, it writes each field in its native representation, as it exists in the gridded component the field comes from (i.e. on the same horizontal grid and with the same number of vertical levels as in the component).
The History resource file uses the ESMF config format. The structure is built around the concept of a collection, where a collection is a set of fields that will be written to a common file stream and processed for output with the same options. The basic History resource file consists of three sections and some option keywords that apply to the output as a whole. Note that files created will be named with the EXPID+collection_name+collection_template. The following describes the options for the resource file.
The following are global options that may be set in the resource file:
EXPID: experiment id
FileOrder: optional, sets the order of the variables of the collection in the NetCDF file. The default is alphabetical, with any variables that are part of the metadata, like lons or lats, going first. If you don't want this for some reason, set it to "add_order", which just puts the variables in the order they get added to the NetCDF file.
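As noted above, output files are named from EXPID, the collection name, and the collection template. The following is a minimal Python sketch of that naming, not MAPL's own code. It assumes the three parts are joined with a dot (the separator is not stated here) and only handles the GrADS-style tokens that appear in the example template in this document. The function names `expand_template` and `output_filename` are hypothetical.

```python
from datetime import datetime

# GrADS-style time tokens and how each one is rendered.
# Only the tokens used in this document's example template are covered.
GRADS_TOKENS = {
    "%y4": "{t.year:04d}",    # four-digit year
    "%m2": "{t.month:02d}",   # two-digit month
    "%d2": "{t.day:02d}",     # two-digit day
    "%h2": "{t.hour:02d}",    # two-digit hour
    "%n2": "{t.minute:02d}",  # two-digit minute
}


def expand_template(template, t):
    """Replace each known token in the template with the value taken from t."""
    out = template
    for token, fmt in GRADS_TOKENS.items():
        out = out.replace(token, fmt.format(t=t))
    return out


def output_filename(expid, collection, template, t, sep="."):
    """Join EXPID, collection name and expanded template (sep is an assumption)."""
    return sep.join([expid, collection, expand_template(template, t)])


if __name__ == "__main__":
    when = datetime(2000, 4, 14, 21, 0)
    print(output_filename("myexp", "geosgcm_prog", "%y4%m2%d2_%h2%n2z.nc4", when))
```

Here "myexp" is a made-up experiment id; the collection name and template are the ones used as examples elsewhere on this page.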
The collection list specifies which collections to write. Even if the collection is defined in the rc file, unless it is here, it will not be written. The collection list is specified as follows:
Collections: 'collection_a'
'collection_b'
::
In other words, if you want to temporarily disable writing of a collection, just remove it from this list; there is no need to delete its definition later in the file.
The grid label section provides a list of grid definitions that collections may refer to. These definitions are used for the HORIZONTAL regridding only, so the LM value is irrelevant. If you specify one it will be ignored; the actual non-distributed dimensions of the field are examined to decide how the vertical will be handled. A grid definition defines the horizontal output grid for a collection if the user wants the output regridded to a different horizontal grid than the native grid the requested field is defined on. Currently Lat-Lon and Cubed-Sphere grids are supported. Each entry has the form grid_name.option, where grid_name is the name referred to in the collection. Note that each grid definition must have a GRID_TYPE entry; the rest of the entries vary depending on the grid type. For a Lat-Lon grid, the user specifies the longitudinal size (IM_WORLD), the latitudinal size (JM_WORLD), the pole option (PC or PE, for pole center and pole edge), and the dateline option (DE or DC, for dateline edge and dateline center). Here is an example Lat-Lon definition:
PC96x49-DC.GRID_TYPE: LatLon
PC96x49-DC.IM_WORLD: 96
PC96x49-DC.JM_WORLD: 49
PC96x49-DC.POLE: PC
PC96x49-DC.DATELINE: DC
PC96x49-DC.LM: 72
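To make the pole and dateline options concrete, the following Python sketch computes cell-center coordinates for a Lat-Lon grid. It is an illustration under stated assumptions, not MAPL's grid factory: it assumes that PC places the first and last latitude exactly on the poles, that PE places the pole on the outer cell edge, that DC centers the first cell on the dateline, and that DE puts the western edge of the first cell on the dateline. The function name `latlon_centers` is hypothetical.

```python
def latlon_centers(im_world, jm_world, pole, dateline):
    """Return (lons, lats): cell-center coordinates in degrees."""
    dlon = 360.0 / im_world
    if dateline == "DC":
        lon0 = -180.0               # first cell is centered on the dateline
    elif dateline == "DE":
        lon0 = -180.0 + dlon / 2.0  # first cell has its west edge on the dateline
    else:
        raise ValueError("dateline must be 'DC' or 'DE'")
    lons = [lon0 + i * dlon for i in range(im_world)]

    if pole == "PC":
        dlat = 180.0 / (jm_world - 1)  # poles are grid points
        lat0 = -90.0
    elif pole == "PE":
        dlat = 180.0 / jm_world        # poles are cell edges
        lat0 = -90.0 + dlat / 2.0
    else:
        raise ValueError("pole must be 'PC' or 'PE'")
    lats = [lat0 + j * dlat for j in range(jm_world)]
    return lons, lats


if __name__ == "__main__":
    lons, lats = latlon_centers(96, 49, "PC", "DC")
    print(lons[0], lons[-1], lats[0], lats[-1])
```

For the PC96x49-DC example above, both spacings under these assumptions come out to 3.75 degrees.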
For a complete list of supported grid types and the options for each type, see the following page about creating grids from an ESMF_Config (which is what the History.rc file is): https://github.com/GEOS-ESM/MAPL/wiki/Creating-Grids-with-MAPL-Grid-Factories
The actual grid-to-grid transformation is performed using ESMF, and we currently support bilinear and first-order conservative regridding. For more information see: ESMF Regridding
coll_name.template: grads style template that defines time characteristics of the output file, e.g. %y4%m2%d2_%h2%n2z.nc4
coll_name.format: output file format, 'flat' binary or 'CFIO' netcdf, optional, default 'flat'
coll_name.mode: controls time output, whether to time average or write instantaneous values. Options 'instantaneous' (default) or 'time-averaged'
coll_name.frequency: time interval in HHMMSS format, the frequency at which the collection is written
coll_name.duration: time interval in HHMMSS format, defines how long to write to the current file before creating a new file; by default duration equals the frequency, so only one time is written to each file
coll_name.grid_label: grid definition to use for the output horizontal regridding
coll_name.vscale:
coll_name.vunit:
coll_name.vvars:
coll_name.levels:
coll_name.ref_time: time in HHMMSS format, optional, reference time used in conjunction with ref_date and frequency to determine when to write, default 000000
coll_name.ref_date: date in YYYYMMDD format, optional, reference date used in conjunction with ref_time and frequency to determine when to write, defaults to the date of the application clock
coll_name.end_date: date in YYYYMMDD format, optional, turns off collection at this date, by default no end date
coll_name.end_time: time in HHMMSS format, optional, turns off collection at this time, by default no end time
coll_name.regrid_name:
coll_name.regrid_exch:
coll_name.fields: Definition of the fields that make up the collection, described later
coll_name.monthly:
coll_name.splitField:
coll_name.UseRegex:
coll_name.nbit: bit shaving, integer, optional; if not present, no bit shaving is done, otherwise that many bits of the mantissa are retained; useful for better compression
coll_name.deflate: netcdf compression level, default 0, can be 0-9
coll_name.chunksize: NetCDF chunking; by default the chunk sizes match the dimensions, otherwise this must be a comma-separated list of numbers, one per dimension in the output file. For example, suppose you are outputting on a 180x90 lat-lon grid and there are 3D variables in the file; the file will have 4 dimensions (lon, lat, lev, time), so you could say 90,45,1,1
coll_name.conservative: use conservative regridding, default 0, 0 - bilinear, 1 - conservative
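The frequency and duration options above are both HHMMSS intervals, and together they determine how many time records end up in each file. The following Python sketch shows that arithmetic. It assumes the duration is a whole multiple of the frequency and uses integer division otherwise, which is a simplification and not necessarily what History does. The function names are hypothetical.

```python
def hhmmss_to_seconds(value):
    """Convert an HHMMSS interval (int or string, e.g. '030000') to seconds."""
    n = int(value)
    hours, rest = divmod(n, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def records_per_file(frequency, duration=None):
    """Number of time records written to one file.

    When duration is not given it defaults to the frequency,
    so each file holds a single time.
    """
    freq = hhmmss_to_seconds(frequency)
    dur = freq if duration is None else hhmmss_to_seconds(duration)
    return dur // freq


if __name__ == "__main__":
    print(records_per_file("030000", "240000"))  # 3-hourly output, daily files
    print(records_per_file("060000"))            # no duration given
```

With 3-hourly output and a 24-hour duration this gives 8 records per file; with no duration it gives 1.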
The fields entry is described in more detail here as it has several options. The entry can consist of multiple lines, each of which may have two to four entries. For example:
geosgcm_prog.fields: 'PHIS' , 'AGCM' ,
'SLP' , 'DYN' ,
'U;V' , 'DYN' ,
'ZLE' , 'DYN' , 'H' ,
'OMEGA' , 'DYN' ,
'Q' , 'MOIST' , 'QV' ,
::
Each line consists of the following:
- short_name of the variable in the gridded component
- name of component the variable may be found in
- optional name to use in the output file in place of the short_name
- optional modification to the coupler if time averaging. By default the coupler time-averages over the interval; set this to 'MIN' or 'MAX' if you want the minimum or maximum in the interval instead.
Note the entry with U;V in the example above. This denotes that the two variables separated by the ; form a vector pair and, if regridded to a new grid, should be handled accordingly.
- regrid_method: available on and after v2.22.0, regrid method to use. The options can be found here MAPL REGRIDDING METHODS. It is an error to specify both this and the conservative keyword.
- conservative: (starting from v2.22.0 a new regridding keyword is available; consider this one deprecated when making new collections) use conservative regridding, default 0, 0 - bilinear, 1 - conservative.
- deflate: defaults to 0, deflation level used in NetCDF
- frequency: this is the frequency to output the collection in HHMMSS format.
- levels: list of space-separated levels to output. If no vvars option is specified, these are the actual level indices in the Fortran sense. For example, if you specify 1 2 3, this outputs the levels indexed by 1, 2, and 3 in the undistributed dimension of the underlying Fortran array. If vvars is specified, these are the levels that will be interpolated to and output, in the quantity represented by vvars. For example, if ZLE is specified as vvars, you could specify levels such as 10 20 50 100 1000, which would be the heights in meters you want output.
- mode: 'instantaneous' (default) or 'time-averaged', either time average the fields between writes or just output the instantaneous value.
- nbits: this performs "bit shaving" and sets 24-nbits bits of the mantissa of each output value to zero. This helps compression at the cost of losing some information.
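The effect of bit shaving as described above can be sketched in a few lines of Python. This is a simple masking model of the description, not MAPL's actual routine, which may differ in detail. It assumes 32-bit IEEE floats, whose 24-bit mantissa consists of 23 stored bits plus one implicit leading bit, so keeping nbits means zeroing the lowest 24-nbits stored bits. The function name `shave_bits` is hypothetical.

```python
import struct


def shave_bits(value, nbits):
    """Zero the lowest (24 - nbits) mantissa bits of value stored as float32."""
    if not 1 <= nbits <= 24:
        raise ValueError("nbits must be between 1 and 24")
    drop = 24 - nbits
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    # Convert the masked pattern back to a float.
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    x = 3.14159
    for nbits in (24, 16, 12, 8):
        print(nbits, shave_bits(x, nbits))
```

The trailing zero bits are what make the shaved values compress better when deflate is enabled; the price is a relative error that grows as nbits shrinks.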
A collection can be regridded from the native horizontal grid of the fields in the collection to a different grid. This is controlled via two keywords: grid_label, plus the conservative keyword before MAPL v2.22.0 or the regrid_method keyword on and after v2.22.0.
The grid_label tells it WHAT grid definition to regrid the collection to.
The regrid_method or conservative keyword tells it HOW to regrid to that grid, i.e. whether to use bilinear regridding, conservative regridding, or some other method.
Neither of these have ANY effect on the undistributed dimensions of the field. Those could represent the model levels or something else.
The vertical regridding is controlled via the vvars, vscale, vunit, and levels keywords. The grid_label/regrid_method keywords have absolutely no effect on the vertical regridding.
A collection can have an option called splitField. This is effectively a dimensionality reducer/splitter for fields. Any 4D field with a trailing dimension of size N is split into N 3D fields (the index number is appended to the name of each field). Likewise, any 3D field with a trailing dimension of size N is split into N 2D fields.
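The splitting described above can be illustrated with plain nested lists in Python. This is only a sketch of the idea, not how History implements it. The exact form of the appended index is not given here, so the zero-padded, 1-based three-digit suffix used below is an assumption, as are the function name `split_trailing` and the field name TRACER.

```python
def split_trailing(name, field):
    """Split a nested-list field along its last dimension.

    Returns a dict mapping output names (suffix format is an assumption)
    to fields with one dimension fewer.
    """
    def trailing_size(x):
        # Descend to the innermost list, which holds the trailing dimension.
        while isinstance(x[0], list):
            x = x[0]
        return len(x)

    def take(x, k):
        # Pick element k of every innermost list, keeping the outer nesting.
        if isinstance(x[0], list):
            return [take(sub, k) for sub in x]
        return x[k]

    n = trailing_size(field)
    return {f"{name}{k + 1:03d}": take(field, k) for k in range(n)}


if __name__ == "__main__":
    # A small "3D" field of shape 2 x 2 x 3: trailing dimension N = 3.
    field = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
    for out_name, values in split_trailing("TRACER", field).items():
        print(out_name, values)
```

The same function handles a 4D field unchanged, since it only looks at the innermost dimension.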
- Unless you are interpolating to a set of levels, you cannot mix variables that are defined on the center and on the edge in the vertical in one collection, as only one vertical coordinate may be defined in the output. If you want to output both center and edge variables on the native levels, you must write two collections.
- Likewise if your field has an ungridded dimension you can output it (the ungridded dimension is denoted as a level in the NetCDF file), but it can't have any vertical level as well (unless you use the splitting keyword for 4D fields ...)