GAMAP v2–19 User Guide

Previous | Next | Printable View (no frames)

6. GAMAP data formats and data structures

6.1 ASCII "punch" file format

The ASCII punch file format is being phased out in favor of netCDF file formats. Some versions of the GISS-CTM may still use this format. We shall leave this section here for reference.

This is the original data format developed by M. Prather (now at UC Irvine) for the old 9-layer Harvard CTM driven by GISS–II meteorology.  This file format is still used by some older versions of the GISS 9 and 23 layer models. This output file is format has largely been supplanted by both the binary punch format and the netCDF format.

The chief advantage of the ASCII "punch" files is that they can be read easily by Fortran, IDL, or S-Plus code running on either big-endian or little-endian machines. Their main disadvantatage is that they only allow for 2 decimal places of precision, which leads to truncation errors in the diagnostic output.

Here is what the top of an ASCII punch file looks like. The first line is the top-of-file header. The next lines is the header for the first level of the first data block, followed by sample data lines in '(12f6.2)' format.

GLOBAL-RUN--4X5+SOM-PFILT--CTM version 1.0 (3/93 Prather)    24 72 46 20  0  0  1985
  IJ-AVG-1  276    2     75960     76680 1.000E-10    72  46   0 

288.86288.86288.90288.90288.92288.96288.96288.98288.99289.00289.01289.01 
289.01289.02289.01289.00289.01289.02289.00289.01289.02289.02289.03289.04

Note that in the ASCII punch file, each level is stored as a 2-D data block. Therefore, the data for the second level will have the category IJ-AVG-2, etc.

6.2 Binary output ("punch") file format (version 2.0)

Please see the "File I/O with GAMAP" page on the GEOS–Chem wiki for more tips about handling binary punch files!

We are starting to phase out the binary punch format v2.0 in favor of netCDF COARDS format. netCDF file I/O capability has been introduced into GEOS–Chem starting with version v9–01–03. This transition will occur gradually.

The binary "punch" file format (v. 2.0) is now the standard data format used by the GEOS-Chem model.  This format was designed to facilitate smooth interaction with GAMAP.  A binary punch file also loads much faster and typically only requires 2/3 as much disk space as the equivalent ASCII punch file.  This section describes the binary punch file format version 2.0.

6.2.1 General header for binary output files

Field Type Description
ftype CHARACTER*40 The file identifier string (CTM bin 02 or CTM bin 4D)
toptitle CHARACTER*80 A title line, similar to the ASCII punch file format.

6.2.2 Datablock header for binary output files

First line of datablock header:

Field Type Description
modelname CHARACTER*20 Name of the model (e.g. GEOS–3, GEOS4_30L)
modelres 2 x REAL*4 Resolution of the model (lon, lat)
halfpolar INTEGER*4 =1 if the model has half-sized boxes at the poles,
=0 otherwise
center180 INTEGER*4 =1 if the model has the first longitude centered on 180°,
=0 otherwise

Second line of datablock header:

Field Type Description
category CHARACTER*40 Diagnostic category name (same as in ASCII punch file)
tracer INTEGER*4 Tracer number
unit CHARACTER*40 Unit string (e.g 'molec/cm2/s', 'ppbv', etc.)
tau0 REAL*8

Starting TAU value (hours since 0 GMT 1/1/1985) for diagnostic interval

tau1 REAL*8 Ending TAU value (hours since 0 GMT 1/1/1985) for diagnostic interval
reserved CHARACTER*40 Extra string for future use
dim 6 x INTEGER*4 Dimensions and starting box of data block:
NI, NJ, NL, I0, J0, L0
skip INTEGER*4 Length of data block in bytes

These two header lines are followed by the data block (REAL*4).

NOTE: While "multilevel" diagnostics are stored as individual layers in ASCII files, they are stored as 3D blocks in binary files.  Category names are then e.g. IJ-AVG-$ instead of IJ-AVG-1, IJ-AVG-2, etc

6.3 COARDS-compliant netCDF files

6.3.1 Overview

GAMAP can read data from COARDS-compliant netCDF files. This is a regular netCDF file which adheres to a standard set of dimension and attribute names. In particular:

  1. Standard names for dimensions:

  2. Each index array has the same name as its dimension. Therefore, the lon dimension is used to define the lon index array, etc.

  3. Also index array values must be listed as monotonically increasing or decreasing.

  4. Standard attribute names for each variable stored in the netCDF file. In particular, the most important of these are:

  5. The time dimension should also have a calendar attribute, which may be:

For more information about the COARDS standards, please see this web page.

6.3.2 Reading COARDS-compliant netCDF files from disk

The main GAMAP program (as well as the lower-level routines like CTM_GET_DATA, CTM_GET_DATABLOCK) can read directly from COARDS-compliant netCDF files. The only limitation (at present) is that information about the tracer (i.e. category and tracer number) category and tracer name must be present in the diaginfo.dat and tracerinfo.dat files.

You can also read any netCDF file directly with helper routine NCDF_READ. This routine will return a structure containing all of the variables and attributes in the netCDF file.

6.3.3 Writing COARDS-compliant netCDF files from disk

GAMAP routine BPCH2COARDS can convert a bpch file to a COARDS-compliant netCDF file. This may be used in conjunction with routine MAKE_MULTI_NC to convert several bpch files in a single session.

6.4 The FILEINFO structure

Contains information about a file that has been accessed in a GAMAP session. For each file read by GAMAP, a corresponding FILEINFO structure is stored in a global common block. The global common block is accessible via the include statement @gamap_cmn.

6.4.1 Structure tag names for FILEINFO

Tag Name

Type

Description

filename string Name of the file containing data to be processed.
ilun long Logical unit number. This field links the FILEINFO and DATAINFO structures.
filetype fix 000-099 = ASCII files
100-199 = binary files
200-299 = netCDF files
300-399 = HDF4 or HDF4–EOS files
400-499 = HDF5 or HDF5–EOS files
500-999 = reserved for future use
status fix Should be 1 if everything is correct, 0 indicates error (not thoroughly implemented though)
toptitle string Holds the title line from the model output file (ASCII or binary)
modelinfo structure Contains information about the model, so that grid can be computed (see below)
gridinfo pointer Points to a structure that contains the horizontal and vertical grid (see below)

GAMAP uses function CREATE3DFSTRU to create new FILEINFO structure(s).

NOTE: The correspondence between IDL and FORTRAN data types is as follows

6.5 The DATAINFO structure

Contains information about each data record that is in a model file. If a new file is opened, the complete header information is parsed once and the location of the individual data blocks is saved. Therefore, subsequent access is really fast.

6.5.1 Structure tag names for DATAINFO

Tag Name Type Description
ilun long

Logical unit number. This field links FILEINFO and DATAINFO structures.

filepos double long

Saves position of associated data record in the file

category string

The diagnostics name (see data formats)

tracer fix

A number identifying a tracer.  This number corresponds to the CTM tracers but can be offset by multiples of a large number such as 100 or 1000; (information on all tracers is stored in tracerinfo.dat)

tracername string

The (full) name of the tracer as stored in tracerinfo.dat

tau0, tau1 2 x double

The tau values indicating the model time (e.g. hours since 0 GMT 1/1/1985).  In general, GAMAP uses tau0 rather than tau1 to identify data records.

scale float

A scale factor that is applied to the data.  Upon reading, the data is scaled automatically (a) to accomodate the scale factor stored in the punch file header, (b) to convert the data to the default unit given in tracerinfo.dat. Therefore, scale will usually have a value of 1.

unit string

The physical unit of the data (e.g. `ppbv').  Units can be converted with CTM_CONVERT_UNIT.

format string

A format string describing the ASCII format of the data (e.g. `(12F6.2)').  Takes the value `binary' for binary data files.  If the data is free-format, leave this field empty.

status fix

0 indicates data not read, 1 indicates data read.  This field is used in ctm_get_data.pro to specifically select (un)read data records.  Convention: 2 indicates either loaded or unloaded data.

dim 4 x fix

Dimensions of the data array: dim[0]=NI, dim[1]=NJ, dim[2]=NL, dim[3]=ntimesteps.  This is important to correctly assign a 2D field to LON-LAT or LON-ALT or LAT-ALT.  If a dimension is not present, the respective field can be either 0 or 1.

first 3 x fix

First index stored in each dimension.  NOTE: these values are FORTRAN indices starting with 1 instead of 0 (which is the IDL convention).

data pointer

Points to a 1-D, 2-D, 3-D, or 4-D float array.  Access data as data = *( datainfo[index].data )

Use function CREATE3DHSTRU to create new DATAINFO structure(s).

NOTE: The correspondence between IDL and FORTRAN data types is as follows

6.6 The MODELINFO structure

This structure holds basic information about the model so that a grid can be computed.

6.6.1 General usage

ModelInfo = CTM_Type( modelname [, options] )

6.6.2 Example

ModelInfo = CTM_Type( `GEOS_3', resolution=4 )

6.6.3 Structure tag names for MODELINFO

Tag Name Type Description
name string

The name of the model (or name of the grid on which the data resides).  Current options are

  • GEOS_1
  • GEOS_STRAT
  • GEOS_3
  • GEOS_4
  • GEOS_5
  • MERRA
  • MERRA-2
  • GEOS_FP
  • GISS_II
  • GISS_II_PRIME (aka II_PRIME)
  • MATCH
  • FSU
  • GCAP
  • GENERIC

Also note that the special names

  • GEOS3_30L
  • GEOS4_30L
  • GEOS5_47L
  • GEOSFP_47L
  • MERRA_47L
  • MERRA2_47L

can also be used to denote reduced vertical-resolution grids.

family string

A model family: GEOS, GISS, FSU, generic

nlayers long

The maximum number of vertical layers

ntrop long

The maximum number of layers in the troposphere (i.e. with chemistry)

ptop, psurf float

Pressure at model top and surface (used to convert sigma layers into pressure)

resolution 2 x float

an array containing [ DI, DJ ]

halfpolar long

Flag that indicates whether polar boxes are half size or not

center180 long

Flag that indicates whether grid boxes are centered (=1) or edged (=0) on 180 degrees

fullchem long

Flag that indicates whether full or reduced chemistry set was used (NOTE: This is now obsolete.)

hybrid long

Flag that indicates whether the model uses a hybrid sigma-pressure grid (=1) or a pure sigma level grid (=0)

Most parameters are automatically set to the correct values in function CTM_TYPE, so you only need to provide the model name and resolution.

NOTE: The correspondence between IDL and FORTRAN data types is as follows

6.7 The GRIDINFO structure

This structure holds the actual grid for the data set (both horizontal and vertical).  It is created from the information stored in MODELINFO, but may be frequently re-generated e.g to accomodate for different surface pressures.  The GRIDINFO structure is generated by a call to ctm_grid.pro (see below).

6.7.1 General usage

GridInfo = CTM_Grid( ModelInfo )

6.7.2 Example

ModelInfo = ctm_type( 'GEOS_3', resolution=4 )
 
; To have the GRIDINFO structure contain vertical layers 
GridInfo = ctm_grid( ModelInfo ) 
 
; To suppress vertical layers from the GRIDINFO structure 
GridInfo = ctm_grid( ModelInfo, /No_Vertical )

6.7.3 Structure Tag Names for GRIDINFO

Tag Name Type Description
IMX fix

Number of grid boxes in longitude

JMX fix

Number of grid boxes in latitude

DI float

Width of grid boxes in degrees longitude

DJ float

Width of grid boxes in degrees latitude

XEDGE float array

Longitude at western edge of grid box in degrees

XMID float array

Longitude at center of each grid box in degrees

YEDGE float array

Latitude at southern edge of grid box in degrees

YMID float array

Latitude at center of each grid box in degrees

When GRIDINFO contains vertical information (i.e. when the /NO_VERTICAL keyword to CTM_GRID is omitted), then the following fields are added:

Tag Name Type Description
LMX fix

The number of vertical layers

SIGEDGE float array

Unitless sigma coordinate at the bottom edge of each vertical level.

SIGMID float array

Unitless sigma coordinate at the center of each vertical level.

ETAEDGE float array

For hybrid P-sigma coordinate data sets, defines the unitless ETA coordinate at the bottom edge of each vertical level.  If ETAEDGE and ETAMID are defined, then SIGEDGE and SIGMID will be zero (and vice-versa). 

ETAMID float array

For hybrid P-sigma coordinate data sets, defines the unitless ETA coordinate at center of each vertical level.  If ETAEDGE and ETAMID are defined, then SIGEDGE and SIGMID will be zero (and vice-versa). 

PEDGE float array

Pressure at bottom edge of each vertical level (computed from pressure values using a fit to the US standard atmosphere)

PMID float array

Pressure at center of each vertical level (computed from pressure values using a fit to the US standard atmosphere)

ZEDGE float array

Altitude at bottom edge of each vertical level (computed from pressure values using a fit to the US Standard Atmosphere)

ZMID float array

Altitude at center of each vertical level (computed from pressure values using a fit to the US Standard Atmosphere)

The computation of pressures and altitudes may at some point be improved to allow for different surface pressures for individual grid boxes.  However, this can be pretty complicated in terms of averaging.

NOTE: The correspondence between IDL and FORTRAN data types is as follows:

6.8 The TRACERINFO structure

This structure contains information about the various quantities stored in the model output.  Tracers are indexed by number, and the number is usually identical to the tracer number used in the model, but it may be offset by a multiple of a large number (usually 100 or 1000) to distinguish between different diagnostics.

This structure will be returned from procedure ctm_tracerinfo.pro.  The information about all tracers is stored in the file tracerinfo.dat.  This file is only read once, however, it can be reloaded with:

ctm_tracerinfo, /force_reading

You can pass either a tracer name or number to ctm_tracerinfo.pro.

6.8.1 Structure Tag Names for TRACERINFO

Tag Name Type Description
name string

Short tracer name (8 characters or less)

fullname string

Full tracer name

mwt float

Mole weight of tracer in g.  Can be used to convert from mixing ratio to mass and vice versa.

index long

Tracer number (index).

molc float

Number of carbon atoms (used to convert ppbv to ppbC).

scale float

Scale factor that must be applied to convert output to standard unit given in tracerinfo.unit.  NOTE: This relies on consistent output within all models!

unit string

Standard unit that shall be used for display of data for this tracer.  Units can be converted with ctm_convert_unit (still in progress...)

is_edged long

Is this tracer defined on level edges?

NOTE: The correspondence between IDL and FORTRAN data types is as follows:

6.9 The DIAGINFO structure

This structure holds information about the individual diagnostics from all models.  It is be returned from procedure ctm_diaginfo.pro. The information about all diagnostics is stored in the file diaginfo.dat.  This file is only read once, however, it can be reloaded with

ctm_diaginfo, /force_reading

You can pass either a category name or diagnostic number to ctm_tracerinfo.pro. Note, however, that category names are unique, while numbers may be used repeatedly (e.g. 45 for IJ-AVG-$, 43 for CHEM-L=$).

6.9.1 Structure Tag Names for DIAGINFO

Tag Name Type Description
category string

The diagnostic name (e.g. IJ-AVG-$)

maxtracer fix

This field is used for "source-type" diagnostics where the tracer number is misused as a general identifier (e.g. such as for lightning NOx in the GISS–II model).  For all "good" diagnostics, this number is 1.  (NOTE: For the GEOS-CHEM model, this tag name is obsolete, since all "maxtracer-like" diagnostics have been removed).

offset fix

An offset (multiple of 100 or 1000) that is added to the tracer number as stored in the output file to uniquely identify the quantity.

spacing fix

The interval (usually a number like 100 or 1000) that will be added internally to tracer numbers in order to separate tracers from different diagnostics.

NOTE: IDL fix = Fortran INTEGER*2.

6.10 About 4-dimensional data blocks in GAMAP

In a nutshell: Reading and writing DATAINFO structure for 4D data blocks is now possible in GAMAP. Reading must be done with CTM_GET_DATA and it is exactly the same as with 2D or 3D cases. For a correct writing, you must specify FileType=106 when calling CTM_MAKE_DATAINFO. That's it.

In GAMAP v2.11, the FileType 106 has been introduced. FileType is a tag of the DATAINFO structures. When set to 106, it is basically a flag for 4D data block. CTM_WRITEBPCH, CTM_READ3DB_HEADER, CTM_OPEN_FILE, and CTM_READ_DATA have been modified to handle that case, so that you can write and read the DATAINFO structure for 4D data.

Also be sure to view our brief tutorial about working with GEOS–Chem timeseries output data (e.g. files generated by the ND48, ND49, ND50, and ND51 diagnostics, as well as met fields and BPCH files).

6.10.1 Writing a 4-D data block into a binary punch (BPCH) file:

Here is a very simple example of writing one 4D data block into a BPCH file:

;; ------ create 4D data:
;; let's create 27 timeseries, each with 4 points
Data      = Randomu(seed,3,3,3,4) 

; Create modelinfo and datainfo structures
; assume GEOS-4 4x5 grid
ModeIinfo = CTM_Type( 'GEOS-4', res=4 )
GridInfo  = CTM_Grid( modelinfo )

;; ------ create DataInfo structure: 
Success   = CTM_Make_Datainfo( Data,                         $
                               NewDataInfo,                  $
                               NewFileInfo,                  $
                               Filetype=106,                 $
                               Model=ModelInfo,              $
                               Grid=GridInfo,                $ 
                               DiagN='IJ-AVG-$',             $
                               Tracer=1,                     $
                               Tau0=12.,                     $
                               Tau1=84.,                     $
                               Dim=Size( Data, /Dimension ), $
                               First=[1,1,1],                $
                               /No_Global )

;; ------ write data into a BPCH file::
CTM_WriteBpch, NewDataInfo, NewFileInfo, FileName='filename.bpch'

This is the same as with 3D or 2D data. The only difference is that you need to manually specify FileType=106 in the call to CTM_MAKE_DATAINFO. That's it.

However TAU1 can be useful if correctly set. Here, we have 4 time steps. We want them to cover 4 days, assuming DATA is a daily value. We can set TAU at midday:

tau0 = nymd2tau(19850101,120000) ; --> 12
tau1 = nymd2tau(19850104,120000) ; --> 84

6.10.2 Reading a 4-D data block from binary punch (BPCH) file:

Data can be accessed as usual with CTM_GET_DATA:

;; Info is an array of structures
ctm_get_data, Info, file='filename.bpch' 

The first 4D data block can be accessed with:

data = *info[0].data
help, data

Caution : the gamap.pro routine disregards the 4th dimension. To access the 4th dimension, you need to use CTM_GET_DATA as above.

This is basically it. The main interest here is to combine separate data block with different TAU0 values into one block, while keeping access to a lot of information through the Info structure. Routines to handle the specific cases of ND49 and ND48 GEOS-Chem output are being developed right now. Stay tuned.

6.10.3 Some basic manipulations

A very basic plot:

;; data are in a pointer and 4D
plot, (*info[0].data)[0,0,0,*], title='first time series' 
print, ‘First TS starts at:' + info[0].tau0
print, ‘First TS ends at:' + info[0].tau1

Information about grid , model is easily accessed from:

GetModelAndGridInfo, info[0], modelInfo, gridinfo

TimeSeries location (s) in the first dataset:

;; offsets
Lon0        = Info[0].First[0] – 1 
Lat0        = Info[0].First[1] – 1
Z0          = Info[0].First[2] – 1
LonVector   = GridInfo.XMid[ Lon0 : Lon0+Info[0].Dim[0]-1 ]
LatVector   = GridInfo.YMid[ Lat0 : Lat0+Info[0].Dim[1]-1 ]
LevelVector = GridInfo.ZMid[ Z0   :   Z0+Info[0].Dim[2]-1 ]

Then, the timeseries

(*info[0].data)[ i, j, k,*]

is located at:

Lon = LonVector[i]
Lat = LatVector[j]
Lev = LevelVector[k]

Previous | Next | Printable View (no frames)