GAMAP v2–19 User Guide
Previous | Next | Printable View (no frames)
6. GAMAP data formats and data structures
The ASCII punch file format is being phased out in favor of netCDF file formats. Some versions of the GISS-CTM may still use this format. We shall leave this section here for reference.
This is the original data format developed by M. Prather (now at UC Irvine) for the old 9-layer Harvard CTM driven by GISS–II meteorology. This file format is still used by some older versions of the GISS 9 and 23 layer models. This output file is format has largely been supplanted by both the binary punch format and the netCDF format.
The chief advantage of the ASCII "punch" files is that they can be read easily by Fortran, IDL, or S-Plus code running on either big-endian or little-endian machines. Their main disadvantatage is that they only allow for 2 decimal places of precision, which leads to truncation errors in the diagnostic output.
Here is what the top of an ASCII punch file looks like. The first line is the top-of-file header. The next lines is the header for the first level of the first data block, followed by sample data lines in '(12f6.2)' format.
GLOBAL-RUN--4X5+SOM-PFILT--CTM version 1.0 (3/93 Prather) 24 72 46 20 0 0 1985 IJ-AVG-1 276 2 75960 76680 1.000E-10 72 46 0 288.86288.86288.90288.90288.92288.96288.96288.98288.99289.00289.01289.01 289.01289.02289.01289.00289.01289.02289.00289.01289.02289.02289.03289.04
Note that in the ASCII punch file, each level is stored as a 2-D data block. Therefore, the data for the second level will have the category IJ-AVG-2, etc.
6.2 Binary output ("punch") file format (version 2.0)
Please see the "File I/O with GAMAP" page on the GEOS–Chem wiki for more tips about handling binary punch files!
We are starting to phase out the binary punch format v2.0 in favor of netCDF COARDS format. netCDF file I/O capability has been introduced into GEOS–Chem starting with version v9–01–03. This transition will occur gradually.
The binary "punch" file format (v. 2.0) is now the standard data format used by the GEOS-Chem model. This format was designed to facilitate smooth interaction with GAMAP. A binary punch file also loads much faster and typically only requires 2/3 as much disk space as the equivalent ASCII punch file. This section describes the binary punch file format version 2.0.
6.2.1 General header for binary output files
Field | Type | Description |
---|---|---|
ftype | CHARACTER*40 | The file identifier string (CTM bin 02 or CTM bin 4D) |
toptitle | CHARACTER*80 | A title line, similar to the ASCII punch file format. |
6.2.2 Datablock header for binary output files
First line of datablock header:
Field | Type | Description |
---|---|---|
modelname | CHARACTER*20 | Name of the model (e.g. GEOS–3, GEOS4_30L) |
modelres | 2 x REAL*4 | Resolution of the model (lon, lat) |
halfpolar | INTEGER*4 | =1 if the
model has half-sized boxes at the poles, =0 otherwise |
center180 | INTEGER*4 | =1 if the
model has the first longitude centered on 180°, =0 otherwise |
Second line of datablock header:
Field | Type | Description |
---|---|---|
category | CHARACTER*40 | Diagnostic category name (same as in ASCII punch file) |
tracer | INTEGER*4 | Tracer number |
unit | CHARACTER*40 | Unit string (e.g 'molec/cm2/s', 'ppbv', etc.) |
tau0 | REAL*8 |
Starting TAU value (hours since 0 GMT 1/1/1985) for diagnostic interval |
tau1 | REAL*8 | Ending TAU value (hours since 0 GMT 1/1/1985) for diagnostic interval |
reserved | CHARACTER*40 | Extra string for future use |
dim | 6 x INTEGER*4 | Dimensions and starting box of data block:
NI, NJ, NL, I0, J0, L0 |
skip | INTEGER*4 | Length of data block in bytes |
These two header lines are followed by the data block (REAL*4).
NOTE: While "multilevel" diagnostics are stored as individual layers in ASCII files, they are stored as 3D blocks in binary files. Category names are then e.g. IJ-AVG-$ instead of IJ-AVG-1, IJ-AVG-2, etc
6.3 COARDS-compliant netCDF files
GAMAP can read data from COARDS-compliant netCDF files. This is a regular netCDF file which adheres to a standard set of dimension and attribute names. In particular:
Standard names for dimensions:
Each index array has the same name as its dimension. Therefore, the lon dimension is used to define the lon index array, etc.
Also index array values must be listed as monotonically increasing or decreasing.
Standard attribute names for each variable stored in the netCDF file. In particular, the most important of these are:
The time dimension should also have a calendar attribute, which may be:
For more information about the COARDS standards, please see this web page.
6.3.2 Reading COARDS-compliant netCDF files from disk
The main GAMAP program (as well as the lower-level routines like CTM_GET_DATA, CTM_GET_DATABLOCK) can read directly from COARDS-compliant netCDF files. The only limitation (at present) is that information about the tracer (i.e. category and tracer number) category and tracer name must be present in the diaginfo.dat and tracerinfo.dat files.
You can also read any netCDF file directly with helper routine NCDF_READ. This routine will return a structure containing all of the variables and attributes in the netCDF file.
6.3.3 Writing COARDS-compliant netCDF files from disk
GAMAP routine BPCH2COARDS can convert a bpch file to a COARDS-compliant netCDF file. This may be used in conjunction with routine MAKE_MULTI_NC to convert several bpch files in a single session.
Contains information about a file that has been accessed in a GAMAP session. For each file read by GAMAP, a corresponding FILEINFO structure is stored in a global common block. The global common block is accessible via the include statement @gamap_cmn.
6.4.1 Structure tag names for FILEINFO
Tag Name |
Type |
Description |
---|---|---|
filename | string | Name of the file containing data to be processed. |
ilun | long | Logical unit number. This field links the FILEINFO and DATAINFO structures. |
filetype | fix | 000-099
= ASCII files 100-199 = binary files 200-299 = netCDF files 300-399 = HDF4 or HDF4–EOS files 400-499 = HDF5 or HDF5–EOS files 500-999 = reserved for future use |
status | fix | Should be 1 if everything is correct, 0 indicates error (not thoroughly implemented though) |
toptitle | string | Holds the title line from the model output file (ASCII or binary) |
modelinfo | structure | Contains information about the model, so that grid can be computed (see below) |
gridinfo | pointer | Points to a structure that contains the horizontal and vertical grid (see below) |
GAMAP uses function CREATE3DFSTRU to create new FILEINFO structure(s).
NOTE: The correspondence between IDL and FORTRAN data types is as follows
Contains information about each data record that is in a model file. If a new file is opened, the complete header information is parsed once and the location of the individual data blocks is saved. Therefore, subsequent access is really fast.
6.5.1 Structure tag names for DATAINFO
Tag Name | Type | Description |
---|---|---|
ilun | long | Logical unit number. This field links FILEINFO and DATAINFO structures. |
filepos | double long | Saves position of associated data record in the file |
category | string | The diagnostics name (see data formats) |
tracer | fix | A number identifying a tracer. This number corresponds to the CTM tracers but can be offset by multiples of a large number such as 100 or 1000; (information on all tracers is stored in tracerinfo.dat) |
tracername | string |
The (full) name of the tracer as stored in tracerinfo.dat |
tau0, tau1 | 2 x double | The tau values indicating the model time (e.g. hours since 0 GMT 1/1/1985). In general, GAMAP uses tau0 rather than tau1 to identify data records. |
scale | float |
A scale factor that is applied to the data. Upon reading, the data is scaled automatically (a) to accomodate the scale factor stored in the punch file header, (b) to convert the data to the default unit given in tracerinfo.dat. Therefore, scale will usually have a value of 1. |
unit | string |
The physical unit of the data (e.g. `ppbv'). Units can be converted with CTM_CONVERT_UNIT. |
format | string |
A format string describing the ASCII format of the data (e.g. `(12F6.2)'). Takes the value `binary' for binary data files. If the data is free-format, leave this field empty. |
status | fix | 0 indicates data not read, 1 indicates data read. This field is used in ctm_get_data.pro to specifically select (un)read data records. Convention: 2 indicates either loaded or unloaded data. |
dim | 4 x fix |
Dimensions of the data array: dim[0]=NI, dim[1]=NJ, dim[2]=NL, dim[3]=ntimesteps. This is important to correctly assign a 2D field to LON-LAT or LON-ALT or LAT-ALT. If a dimension is not present, the respective field can be either 0 or 1. |
first | 3 x fix |
First index stored in each dimension. NOTE: these values are FORTRAN indices starting with 1 instead of 0 (which is the IDL convention). |
data | pointer |
Points to a 1-D, 2-D, 3-D, or 4-D float array. Access data as data = *( datainfo[index].data ) |
Use function CREATE3DHSTRU to create new DATAINFO structure(s).
NOTE: The correspondence between IDL and FORTRAN data types is as follows
This structure holds basic information about the model so that a grid can be computed.
ModelInfo = CTM_Type( modelname [, options] )
ModelInfo = CTM_Type( `GEOS_3', resolution=4 )
6.6.3 Structure tag names for MODELINFO
Tag Name | Type | Description |
---|---|---|
name | string |
The name of the model (or name of the grid on which the data resides). Current options are
Also note that the special names
can also be used to denote reduced vertical-resolution grids. |
family | string |
A model family: GEOS, GISS, FSU, generic |
nlayers | long |
The maximum number of vertical layers |
ntrop | long |
The maximum number of layers in the troposphere (i.e. with chemistry) |
ptop, psurf | float |
Pressure at model top and surface (used to convert sigma layers into pressure) |
resolution | 2 x float | an array containing [ DI, DJ ] |
halfpolar | long |
Flag that indicates whether polar boxes are half size or not |
center180 | long |
Flag that indicates whether grid boxes are centered (=1) or edged (=0) on 180 degrees |
fullchem | long | Flag that indicates whether full or reduced chemistry set was used (NOTE: This is now obsolete.) |
hybrid | long |
Flag that indicates whether the model uses a hybrid sigma-pressure grid (=1) or a pure sigma level grid (=0) |
Most parameters are automatically set to the correct values in function CTM_TYPE, so you only need to provide the model name and resolution.
NOTE: The correspondence between IDL and FORTRAN data types is as follows
This structure holds the actual grid for the data set (both horizontal and vertical). It is created from the information stored in MODELINFO, but may be frequently re-generated e.g to accomodate for different surface pressures. The GRIDINFO structure is generated by a call to ctm_grid.pro (see below).
GridInfo = CTM_Grid( ModelInfo )
ModelInfo = ctm_type( 'GEOS_3', resolution=4 ) ; To have the GRIDINFO structure contain vertical layers GridInfo = ctm_grid( ModelInfo ) ; To suppress vertical layers from the GRIDINFO structure GridInfo = ctm_grid( ModelInfo, /No_Vertical )
6.7.3 Structure Tag Names for GRIDINFO
Tag Name | Type | Description |
---|---|---|
IMX | fix | Number of grid boxes in longitude |
JMX | fix | Number of grid boxes in latitude |
DI | float | Width of grid boxes in degrees longitude |
DJ | float | Width of grid boxes in degrees latitude |
XEDGE | float array | Longitude at western edge of grid box in degrees |
XMID | float array | Longitude at center of each grid box in degrees |
YEDGE | float array | Latitude at southern edge of grid box in degrees |
YMID | float array |
Latitude at center of each grid box in degrees |
When GRIDINFO contains vertical information (i.e. when the /NO_VERTICAL keyword to CTM_GRID is omitted), then the following fields are added:
Tag Name | Type | Description |
---|---|---|
LMX | fix | The number of vertical layers |
SIGEDGE | float array | Unitless sigma coordinate at the bottom edge of each vertical level. |
SIGMID | float array | Unitless sigma coordinate at the center of each vertical level. |
ETAEDGE | float array | For hybrid P-sigma coordinate data sets, defines the unitless ETA coordinate at the bottom edge of each vertical level. If ETAEDGE and ETAMID are defined, then SIGEDGE and SIGMID will be zero (and vice-versa). |
ETAMID | float array | For hybrid P-sigma coordinate data sets, defines the unitless ETA coordinate at center of each vertical level. If ETAEDGE and ETAMID are defined, then SIGEDGE and SIGMID will be zero (and vice-versa). |
PEDGE | float array | Pressure at bottom edge of each vertical level (computed from pressure values using a fit to the US standard atmosphere) |
PMID | float array | Pressure at center of each vertical level (computed from pressure values using a fit to the US standard atmosphere) |
ZEDGE | float array | Altitude at bottom edge of each vertical level (computed from pressure values using a fit to the US Standard Atmosphere) |
ZMID | float array | Altitude at center of each vertical level (computed from pressure values using a fit to the US Standard Atmosphere) |
The computation of pressures and altitudes may at some point be improved to allow for different surface pressures for individual grid boxes. However, this can be pretty complicated in terms of averaging.
NOTE: The correspondence between IDL and FORTRAN data types is as follows:
This structure contains information about the various quantities stored in the model output. Tracers are indexed by number, and the number is usually identical to the tracer number used in the model, but it may be offset by a multiple of a large number (usually 100 or 1000) to distinguish between different diagnostics.
This structure will be returned from procedure ctm_tracerinfo.pro. The information about all tracers is stored in the file tracerinfo.dat. This file is only read once, however, it can be reloaded with:
ctm_tracerinfo, /force_reading
You can pass either a tracer name or number to ctm_tracerinfo.pro.
6.8.1 Structure Tag Names for TRACERINFO
Tag Name | Type | Description |
---|---|---|
name | string | Short tracer name (8 characters or less) |
fullname | string | Full tracer name |
mwt | float | Mole weight of tracer in g. Can be used to convert from mixing ratio to mass and vice versa. |
index | long | Tracer number (index). |
molc | float | Number of carbon atoms (used to convert ppbv to ppbC). |
scale | float | Scale factor that must be applied to convert output to standard unit given in tracerinfo.unit. NOTE: This relies on consistent output within all models! |
unit | string |
Standard unit that shall be used for display of data for this tracer. Units can be converted with ctm_convert_unit (still in progress...) |
is_edged | long | Is this tracer defined on level edges? |
NOTE: The correspondence between IDL and FORTRAN data types is as follows:
This structure holds information about the individual diagnostics from all models. It is be returned from procedure ctm_diaginfo.pro. The information about all diagnostics is stored in the file diaginfo.dat. This file is only read once, however, it can be reloaded with
ctm_diaginfo, /force_reading
You can pass either a category name or diagnostic number to ctm_tracerinfo.pro. Note, however, that category names are unique, while numbers may be used repeatedly (e.g. 45 for IJ-AVG-$, 43 for CHEM-L=$).
6.9.1 Structure Tag Names for DIAGINFO
Tag Name | Type | Description |
---|---|---|
category | string | The diagnostic name (e.g. IJ-AVG-$) |
maxtracer | fix | This field is used for "source-type" diagnostics where the tracer number is misused as a general identifier (e.g. such as for lightning NOx in the GISS–II model). For all "good" diagnostics, this number is 1. (NOTE: For the GEOS-CHEM model, this tag name is obsolete, since all "maxtracer-like" diagnostics have been removed). |
offset | fix | An offset (multiple of 100 or 1000) that is added to the tracer number as stored in the output file to uniquely identify the quantity. |
spacing | fix | The interval (usually a number like 100 or 1000) that will be added internally to tracer numbers in order to separate tracers from different diagnostics. |
NOTE: IDL fix = Fortran INTEGER*2.
6.10 About 4-dimensional data blocks in GAMAP
In a nutshell: Reading and writing DATAINFO structure for 4D data blocks is now possible in GAMAP. Reading must be done with CTM_GET_DATA and it is exactly the same as with 2D or 3D cases. For a correct writing, you must specify FileType=106 when calling CTM_MAKE_DATAINFO. That's it.
In GAMAP v2.11, the FileType 106 has been introduced. FileType is a tag of the DATAINFO structures. When set to 106, it is basically a flag for 4D data block. CTM_WRITEBPCH, CTM_READ3DB_HEADER, CTM_OPEN_FILE, and CTM_READ_DATA have been modified to handle that case, so that you can write and read the DATAINFO structure for 4D data.
Also be sure to view our brief tutorial about working with GEOS–Chem timeseries output data (e.g. files generated by the ND48, ND49, ND50, and ND51 diagnostics, as well as met fields and BPCH files).
6.10.1 Writing a 4-D data block into a binary punch (BPCH) file:
Here is a very simple example of writing one 4D data block into a BPCH file:
;; ------ create 4D data: ;; let's create 27 timeseries, each with 4 points Data = Randomu(seed,3,3,3,4) ; Create modelinfo and datainfo structures ; assume GEOS-4 4x5 grid ModeIinfo = CTM_Type( 'GEOS-4', res=4 ) GridInfo = CTM_Grid( modelinfo ) ;; ------ create DataInfo structure: Success = CTM_Make_Datainfo( Data, $ NewDataInfo, $ NewFileInfo, $ Filetype=106, $ Model=ModelInfo, $ Grid=GridInfo, $ DiagN='IJ-AVG-$', $ Tracer=1, $ Tau0=12., $ Tau1=84., $ Dim=Size( Data, /Dimension ), $ First=[1,1,1], $ /No_Global ) ;; ------ write data into a BPCH file:: CTM_WriteBpch, NewDataInfo, NewFileInfo, FileName='filename.bpch'
This is the same as with 3D or 2D data. The only difference is that you need to manually specify FileType=106 in the call to CTM_MAKE_DATAINFO. That's it.
However TAU1 can be useful if correctly set. Here, we have 4 time steps. We want them to cover 4 days, assuming DATA is a daily value. We can set TAU at midday:
tau0 = nymd2tau(19850101,120000) ; --> 12 tau1 = nymd2tau(19850104,120000) ; --> 84
6.10.2 Reading a 4-D data block from binary punch (BPCH) file:
Data can be accessed as usual with CTM_GET_DATA:
;; Info is an array of structures ctm_get_data, Info, file='filename.bpch'
The first 4D data block can be accessed with:
data = *info[0].data help, data
Caution : the gamap.pro routine disregards the 4th dimension. To access the 4th dimension, you need to use CTM_GET_DATA as above.
This is basically it. The main interest here is to combine separate data block with different TAU0 values into one block, while keeping access to a lot of information through the Info structure. Routines to handle the specific cases of ND49 and ND48 GEOS-Chem output are being developed right now. Stay tuned.
6.10.3 Some basic manipulations
A very basic plot:
;; data are in a pointer and 4D plot, (*info[0].data)[0,0,0,*], title='first time series' print, ‘First TS starts at:' + info[0].tau0 print, ‘First TS ends at:' + info[0].tau1
Information about grid , model is easily accessed from:
GetModelAndGridInfo, info[0], modelInfo, gridinfo
TimeSeries location (s) in the first dataset:
;; offsets Lon0 = Info[0].First[0] – 1 Lat0 = Info[0].First[1] – 1 Z0 = Info[0].First[2] – 1 LonVector = GridInfo.XMid[ Lon0 : Lon0+Info[0].Dim[0]-1 ] LatVector = GridInfo.YMid[ Lat0 : Lat0+Info[0].Dim[1]-1 ] LevelVector = GridInfo.ZMid[ Z0 : Z0+Info[0].Dim[2]-1 ]
Then, the timeseries
(*info[0].data)[ i, j, k,*]
is located at:
Lon = LonVector[i] Lat = LatVector[j] Lev = LevelVector[k]