GEOS–CHEM v6–02–05 User's Guide
Contact: Bob Yantosca (bmy@io.harvard.edu)


6. Running the GEOS–CHEM Model


6.1 Input File Checklists for GEOS–CHEM Model Simulations

After you have compiled the GEOS–CHEM source code (Section 2), installed your data directory (Section 4), and set up your run directory (Section 5), you may proceed to run the model. The first thing you will want to do is to ensure that all of the input files in your run directory are set correctly for the type of GEOS–CHEM simulation that you want to perform. The following checklists may be used as a guide before starting your model runs; a quick command-line sanity check is also sketched after the first checklist.


6.1.1 Checklist for NOx-Ox-Hydrocarbon chemistry simulation with SMVGEAR

  1. Set NSRCX=3 and NTRACE=41 in input.ctm
  2. Set LSULF=T in input.geos in order to turn on sulfur and nitrogen species emission & chemistry
  3. Set LCARB=T in input.geos in order to turn on carbon aerosol emission & chemistry
  4. Set LDUST=T in input.geos in order to turn on desert dust aerosol emission & chemistry
  5. Set LSSALT=T in input.geos in order to turn on sea salt aerosol emission & chemistry
  6. Set diagnostic switches in input.ctm
  7. Schedule days for punch file output in input.ctm
  8. Make sure you have output scheduled for the last day of the run in input.ctm
  9. Set start and end time of run, and turn operators on/off in input.geos
  10. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  11. Make sure inptr.ctm contains the proper tracer names and molecular weights
  12. Check your convergence criteria in mglob.dat
  13. Make sure you are using the proper globchem.dat and ratj.d files
  14. Check family tracer specifications in tracer.dat
  15. Check diag.dat to make sure that the tracers of interest will be printed out
  16. Check ND48 station timeseries settings in inptr.ctm
  17. Check ND49 movie timeseries settings in timeseries.dat
  18. Check chemical prod/loss family settings for ND65 diagnostic in prodloss.dat
  19. Check the aircraft flight track settings in Planeflight.dat
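
Before submitting the run, it can be convenient to confirm from the Unix command line that the input files named above are actually present in the run directory. The following is a minimal tcsh sketch of such a check; it only tests for the existence of the files and does not verify the settings inside them:

#!/bin/tcsh -f
# Confirm that the input files named in the checklist above exist in
# the current run directory (existence check only)
foreach f ( input.ctm input.geos inptr.ctm mglob.dat globchem.dat \
            ratj.d tracer.dat diag.dat timeseries.dat prodloss.dat \
            Planeflight.dat )
   if ( ! -e $f ) echo "Missing input file: $f"
end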

6.1.2 Checklist for Radon - Lead - Beryllium simulation

  1. Set NSRCX=1 and NTRACE=3 in input.ctm
  2. Set diagnostic switches in input.ctm
  3. Schedule days for punch file output in input.ctm
  4. Make sure you have output scheduled for the last day of the run in input.ctm
  5. Set start and end time of run, and turn operators on/off in input.geos
  6. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  7. Make sure inptr.ctm contains the proper tracer names and molecular weights
  8. Check diag.dat to make sure that the tracers of interest will be printed out
  9. Check ND48 station timeseries settings in inptr.ctm
  10. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  11. Check the aircraft flight track settings in Planeflight.dat

6.1.3 Checklist for Methyl Iodide (CH3I) simulation

  1. Set NSRCX=2 and NTRACE=5 in input.ctm
  2. Set diagnostic switches in input.ctm
  3. Schedule days for punch file output in input.ctm
  4. Make sure you have output scheduled for the last day of the run in input.ctm
  5. Set start and end time of run, and turn operators on/off in input.geos
  6. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  7. Make sure inptr.ctm contains the proper tracer names and molecular weights
  8. Check diag.dat to make sure that the tracers of interest will be printed out
  9. Check ND48 station timeseries settings in inptr.ctm
  10. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  11. Make sure you are using the special chem.dat and ratj.d files for CH3I simulation (which contain the cross-sections, reagents, and products for the CH3I photolysis reactions).
  12. Check the aircraft flight track settings in Planeflight.dat

6.1.4 Checklist for HCN simulation

  1. Set NSRCX=4 and NTRACE=1 in input.ctm
  2. Set diagnostic switches in input.ctm
  3. Schedule days for punch file output in input.ctm
  4. Make sure you have output scheduled for the last day of the run in input.ctm
  5. Set start and end time of run, and turn operators on/off in input.geos
  6. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  7. Make sure inptr.ctm contains the proper tracer names and molecular weights
  8. Check diag.dat to make sure that the tracers of interest will be printed out
  9. Check ND48 station timeseries settings in inptr.ctm
  10. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  11. Check the aircraft flight track settings in Planeflight.dat

6.1.5 Checklist for Tagged Ox simulation

  1. Set NSRCX=6 and NTRACE=13 in input.ctm
  2. Set diagnostic switches in input.ctm
  3. Schedule days for punch file output in input.ctm
  4. Make sure you have output scheduled for the last day of the run in input.ctm
  5. Set start & end time of run, and turn operators on/off in input.geos
  6. Check the setting of the dynamic timestep variable NTDT in input.geos: NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  7. Make sure inptr.ctm contains the proper tracer names and molecular weights
  8. Check diag.dat to make sure that the tracers of interest will be printed out
  9. Check ND48 station timeseries settings in inptr.ctm
  10. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  11. Check the aircraft flight track and diagnostic settings in Planeflight.dat

6.1.6 Checklist for Tagged CO simulation

  1. Set NSRCX=7 and NTRACE to the # of your tracers in input.ctm
  2. Set diagnostic switches in input.ctm
  3. Schedule days for punch file output in input.ctm
  4. Make sure you have output scheduled for the last day of the run in input.ctm
  5. Set start & end time of run, and turn operators on/off in input.geos
  6. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  7. Make sure inptr.ctm contains the proper tracer names and molecular weights
  8. Check diag.dat to make sure that the tracers of interest will be printed out
  9. Check ND48 station timeseries settings in inptr.ctm
  10. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  11. Check the aircraft flight track settings in Planeflight.dat

6.1.7 Checklist for offline Sulfate–Carbon–Dust–Sea Salt aerosol simulation

  1. Set NSRCX=10 and NTRACE=12 in input.ctm
  2. Set LSULF=T in input.geos in order to turn on sulfur and nitrogen species emission & chemistry
  3. Set LCARB=T in input.geos in order to turn on carbon aerosol emission & chemistry
  4. Set LDUST=T in input.geos in order to turn on desert dust aerosol emission & chemistry
  5. Set LSSALT=T in input.geos in order to turn on sea salt aerosol emission & chemistry
  6. Set diagnostic switches in input.ctm
  7. Schedule days for punch file output in input.ctm
  8. Make sure you have output scheduled for the last day of the run in input.ctm
  9. Set start & end time of run, and turn operators on/off in input.geos
  10. Set the dynamic timestep variable NTDT=1800 (for 4 x 5) or NTDT=900 (for 2 x 2.5)
  11. Make sure inptr.ctm contains the proper tracer names and molecular weights
  12. Check diag.dat to make sure that the tracers of interest will be printed out
  13. Check ND48 station timeseries settings in inptr.ctm
  14. Check ND49, ND50, ND51 movie timeseries settings in timeseries.dat
  15. Check the aircraft flight track settings in Planeflight.dat

6.2 Running a Regular GEOS–CHEM Job

This section describes how to run the GEOS–CHEM model under either the LSF or PBS batch queue system. Alternatively, you may want to use the TESTRUN package, which will automatically compile and run the GEOS–CHEM code via your local batch queue system.

Also, it is STRONGLY RECOMMENDED to test your simulation with a short (1-day or 2-day) run before submitting a very long-term GEOS–CHEM simulation. A shorter run will make it easier to detect errors or problems without tying up precious computer time.


6.2.1 LSF Batch Queue System

If your platform uses the LSF batch queue system, you can use the following commands to submit, delete, and check the status of GEOS–CHEM jobs:

bman            : prints LSF man pages to stdout
bsub or submit  : submits a batch job to a queue
bkill           : kills batch jobs
bjobs           : lists all jobs currently running
bqueues         : lists the available batch queues
bhist           : shows the history of submitted jobs
lsload          : shows the percentage of each machine's resources
                  that is currently utilized

Perhaps the best way to submit batch jobs to the queues is to write a simple job script, such as:

#!/bin/tcsh -f                # Script definition line
cd /scratch/bmy/run.v6–02–05  # cd to your run dir
rm -f log                     # clear pre-existing log files    
time geos > log               # time job; pipe output to log file  
exit(0)                       #  exit normally 

and then save that to a file named job. To submit the job script to the queue system, pick a queue in which to run GEOS–CHEM, and type:

bsub -q queue-name job

at the Unix prompt. You can check the status of the run by looking at the log file. LSF should also email you when your job is done, or if for any reason it dies prematurely.
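
For example, assuming that LSF assigned your job the ID 12345 (a hypothetical value shown here only for illustration), you could monitor or kill it with:

bjobs                 # list your pending and running jobs
bhist -l 12345        # show the detailed history of job 12345
bkill 12345           # kill job 12345, if necessary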


6.2.2 PBS Batch Queue System

If your platform uses the PBS batch queue system, you can use the following commands to submit, delete, and check the status of GEOS–CHEM jobs:

qsub              : submits a PBS job
qstat -Q          : lists all available batch queues
qstat -a @machine : lists all PBS jobs that are running on 'machine'
qstat -f jobid    : lists information about PBS job jobid
qdel jobid        : kills PBS job jobid
xpbs              : graphical user interface for PBS

Then create a simple GEOS–CHEM job script (named job), similar to the above example for LSF:

#!/bin/tcsh -f                # Script definition line
cd /scratch/bmy/run.v6–02–05  # cd to your run dir
rm -f log                     # clear pre-existing log files    
time geos > log               # time job; pipe output to log file  
exit(0)                       #  exit normally 

and then submit this with the qsub command:

qsub -q queue-name -o output-file-name job

at the Unix prompt.
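
For example, assuming that PBS assigned your job the ID 10929.sol (the same hypothetical ID that appears in the pbstat example below), you could monitor or kill it with:

qstat -a              # list the PBS jobs on this machine
qstat -f 10929.sol    # show full information about job 10929.sol
qdel 10929.sol        # kill job 10929.sol, if necessary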

The job status command qstat -f jobid sometimes provides a little too much information. Bob Yantosca has written a script called pbstat (this is already installed at Harvard) which condenses the output from qstat -f jobid. If you type pbstat at the Unix prompt, you will see output similar to:

     --------------------------------------------------------
     PBS Job ID number    : 10929.sol
     Job owner            : pip@sol
     Job name             : run.sh
     Job started on       : Sun Aug 10 17:22:21 2003
     Job status           : Running
     PBS queue and server : q4x64 on amalthea
     Job is running on    : hera/0*4 (R12K processors)
     # of CPUs being used : 4 (max allowed is 4)
     CPU utilization      : 394% (ideal max is 400%)
     Elapsed walltime     : 22:14:43 (max allowed is 64:00:00)
     Elapsed CPU time     : 76:20:40
     Memory usage         : 10475844kb (max allowed is 1700Mb)
     VMemory usage        : 8244704kb

This allows you to obtain information about your run much more easily. If you type pbstat all, you will obtain information about every job that is running. If you type pbstat userid, then you will get information about all of the jobs that user userid is running.


6.3 Profiling GEOS–CHEM execution (SGI only)

You can use the SGI Speedshop profiler to obtain additional information about your run, including how long each individual subroutine takes, how efficient the parallelization is, and which lines of code are the most time consuming. This can be very helpful in determining potential bottlenecks in the code.

A word of warning: profiling runs should ALWAYS be done on a single processor. A multi-processor profiling job that dies can potentially hang the entire machine that it is running on. Therefore, you should only profile GEOS–CHEM code which has been previously tested and is known to be stable.

To invoke the SpeedShop profiler on a single processor (once again assuming that your executable file is named geos), set up the following job script:

#!/bin/tcsh -f                 # shebang line
cd /scratch/bmy/run.v6–02–05   # your run dir
rm -f log                      # clear log file 
time ssrun -pcsamp geos > log  # time job; pipe to log 
exit(0)                        # exit normally 

and submit it to a single-processor queue on your system. This will start the SpeedShop profiler with the PC-sampling option and redirect the screen output to the log file. After the job has finished, you will notice a file named:

geos.pcsamp.m_____

This is the output from the profiler for the main thread. Immediately following the "m" will be a unique number assigned by the system.

These *.pcsamp.* files are binary output files and are not human-readable. To convert them to ASCII files, you must type:

prof -usage -lines geos.pcsamp.m______ > main.pcs

This will generate an ASCII report which details the percentage of time spent in each routine, plus the percentage of time spent at certain lines of code. Using this output, you may determine exactly where your code is spending the most time.

It is also possible to write a shell script that calls the prof command. The shell script can even be submitted to the LSF batch queue system. However, you must make sure to submit the shell script to the same machine that the profiling job ran on; otherwise the prof command will not work.
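
For example, a minimal tcsh sketch of such a script might look like the following (the run-directory path and output file name are illustrative, and the script assumes a single geos.pcsamp.m* file in the directory):

#!/bin/tcsh -f                                # shebang line
cd /scratch/bmy/run.v6–02–05                  # cd to your run dir
prof -usage -lines geos.pcsamp.m* > main.pcs  # convert profiler output to ASCII
exit(0)                                       # exit normally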

Bob Yantosca has some IDL software that can help you read and plot the information contained in these ASCII files. Contact him (bmy@io.harvard.edu) for more information.

NOTE: The -usage switch to the prof command will include statistics on CPU time usage. The -lines switch will identify the lines of code in each subroutine that are the most CPU-intensive. These can be useful tools in identifying bottlenecks in your CTM code. You can omit these options in order to obtain a more basic report. Also see the Unix man pages for ssrun and prof for more information.


6.4 Error output

Almost all of the GEOS–CHEM code supports I/O error trapping. In other words, if an error occurs while reading from a file or writing to a file, the run will stop and an appropriate error message will be displayed. Many of the error messages have the following format:

=============================================================== 
I/O Error Number    4001 in file unit       10 
Encountered in routine read_bpch2:3  
=============================================================== 

This means that an error (#4001) has occurred while reading from logical file unit 10. The routine where the error occurred is read_bpch2 (which belongs to bpch2_mod.f). The string read_bpch2:3 indicates that the error occurred at the third error trap within read_bpch2. If you grep for the string read_bpch2:3 in bpch2_mod.f, you will find the offending line of code.
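
Error-trap strings such as read_bpch2:3 come from an I/O error-trapping idiom along the following lines. This is only a sketch: the unit number, the variable being read, and the IOERROR handler call shown here are illustrative and are not copied verbatim from bpch2_mod.f.

! Illustrative sketch of the I/O error-trapping idiom; this is not
! the actual code from bpch2_mod.f
READ( 10, IOSTAT=IOS ) TITLE
IF ( IOS /= 0 ) CALL IOERROR( IOS, 10, 'read_bpch2:3' )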

For the SGI platform, it is possible to get a more detailed explanation of the error. Simply type the following at the Unix prompt:

explain lib-4001 

which gives the following output:

A READ operation tried to read past the end-of-file.


A Fortran READ operation encountered the endfile record 
(end-of-file), and neither an END nor an IOSTAT specifier 
was present on the READ statement. Either 1) add an END=s 
specifier (s is a statement label) and/or an IOSTAT=i 
specifier (i is an integer variable) to the READ statement, 
or 2) modify the program so that it reads no more data than 
is in the file. For more information, see the input/output
section of your Fortran reference manual.


Because this is an end-of-file condition, the negative of 
this error number is returned in the IOSTAT variable, if 
specified. The error class is UNRECOVERABLE (issued by the
run-time library). 

On SGI, error numbers 4000 and greater are FORTRAN library errors, hence the prefix lib- in the command explain lib-4001. Error numbers 1000 and greater are generated by the Cray F90 library, and so the appropriate command would be explain cf90-1xxx.

Also, you will find that error number 2 is a common I/O error. This error condition usually happens when you try to read from a file that does not exist (i.e. a symbolic link is invalid or the file is not found in the directory).

Finally, you might sometimes be presented with error number 1133. This occurs when there is no more disk space available in the run directory. You will have to remove some large files from your run directory and then restart the run.

If you are not running GEOS–CHEM on the SGI platform, then consult your local computer guru for more information on Fortran error numbers for your particular compiler and operating system.


6.5 GEOS–CHEM Debugging

GEOS–CHEM is an evolving model. New features and functionalities are constantly being added to it by a rapidly increasing group of users. As with any software project, mistakes are inevitable.

Most of the bugs that you will encounter when working on GEOS–CHEM fall into one of two categories:

  1. True bugs: typos, omissions, reading the wrong file, or other outright mistakes.
  2. Design-limitation bugs: that is, code written in such a way that it will be difficult or impossible to extend the functionality of the model at some future time.

Bugs of the first category are, in general, easily rectified. The fixes to these bugs typically involve either correcting a misspelled word or updating an incorrect numerical value.

Bugs belonging to the second category can be rather pernicious. In almost all instances, you will find that the model is working fine -- that is, until you try to add in a new chemistry simulation, diagnostic, or third-party routine. Then you may find that a major modification to the structure of GEOS–CHEM is necessary before the new code can be successfully interfaced.

GEOS–CHEM is a combination of several different individual pieces: emissions, chemistry, and deposition routines from Harvard, transport and convection routines from GSFC, photolysis from UC Irvine, etc. Therefore, the structure of GEOS–CHEM was (and still is) largely defined by the structure of the individual pieces from which it was created. It is not always possible to deviate from this set structure without having to rewrite entire sections of source code.


6.5.1 Debugging Tips

Here are some steps you can take to try to diagnose a particular GEOS–CHEM error.

1.

Turn on diagnostic ND70 in the file input.ctm. This will cause debugging messages (via routine debug_msg contained in error_mod.f) to be written to the log file after operations such as transport, chemistry, emissions, dry deposition, etc. are called from the main program. In this way, you should be able to identify in which operation the error occurred.

NOTE: debug_msg will cause the text to be flushed to disk after it is printed. Most Unix systems feature buffered I/O; that is, the contents of a file or screen output are not updated until an internal buffer (usually 16K of memory) is filled up. If you don't flush the error message to disk, then the last output to the log file may not accurately indicate the location at which the error occurred. Therefore, we recommend using debug_msg instead of a standard Fortran WRITE or PRINT* statement.

 
2.

It may also be necessary to insert additional debug statements into main.f or other routines. This may be done by calling subroutine debug_msg as follows:

CALL DEBUG_MSG( '### after routine X' )

By adding several of these debug statements, you should be able to track down the particular place at which the error is occurring.

 
3.

If you suspect that a problem with one of the meteorological field input files could be causing GEOS–CHEM to die with an unexplained error, then look for the following lines in main.f:

! Update dynamic timestep 
CALL SET_CT_DYN( INCREMENT = .TRUE. )

and, immediately below, insert the following subroutine call:

! ### Debug
CALL MET_FIELD_DEBUG

This will cause GEOS–CHEM to print out the minimum, maximum, and sum of each meteorological field at the top of the dynamic loop. You can then examine this output to determine if the data range of a particular field is invalid.

 
4.

If you just want to print out the minimum and maximum values of an array variable that is not included in met_field_debug, then simply add the following lines of code:

PRINT*, '### Min, Max: ', MINVAL( X ), MAXVAL( X )
CALL FLUSH( 6 )

where X is the name of the variable.

If you want to print out the sum of X instead of the min and max, add these lines of code:

PRINT*, '### Sum of X : ', SUM( X )
CALL FLUSH( 6 )

In both cases we also call the Fortran routine FLUSH(6), which ensures that the output (in this case to the screen, i.e. logical unit #6) is written out immediately instead of being held in the I/O buffer.

 
5.

It is a good idea to periodically compile your code with array-out-of-bounds error checking. This will make sure that all of the arrays are being accessed with indices whose values fall within the specified array dimensions.

For example, if you have the following situation in your code:

REAL*8 :: A(10), B(10)
...
DO I = 1, 10
   B(I) = A(I+1)
ENDDO

then on the final loop iteration (I=10) the code will try to access the 11th element of the A array. But since A only has 10 elements, the code will instead read the next contiguous memory location, which may belong to a different variable altogether. Therefore, a "junk" value will be copied into the 10th element of the B array.
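
In this illustrative example, one simple fix is to stop the loop one element early (or, alternatively, to declare A with 11 elements) so that A(I+1) never runs past the end of the array:

DO I = 1, 9        ! stop at 9 so that A(I+1) never goes beyond A(10)
   B(I) = A(I+1)
ENDDO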

 
6.

To invoke array-out-of-bounds checking in your code, do a make clean, and then make sure that the appropriate line for your platform is selected in your Makefile:

FFLAGS = f90 -cpp -64 -O3 -OPT:reorg_common=off -C   (SGI   )
FFLAGS = f90 -cpp -convert big_endian -tune host -C  (Compaq)
FF     = pgf90 -Mpreprocess -byteswapio -Mbounds     (Linux )
FFLAGS = f90 -xpp=cpp -O4 -xarch=v9 -C               (Sparc)

Then recompile your code as usual. This will build the array-out-of-bounds checking into your executable. Any errors will be detected at runtime, and you should get error output such as:

lib-4964: WARNING 
Subscript is out of range for dimension 1 for array 
'A' at line 8 in procedure 'MAIN__', 
diagnosed by '__f90_bounds_check'.

The above error was generated on the SGI platform; Alpha and Linux compilers should give similar errors.

NOTE: Once you have located and fixed the offending array statement, you should recompile to make an executable without the array-out-of-bounds checking built in. The error checking is thorough, but it can cause your code to slow down noticeably. Therefore, we recommend using array-out-of-bounds checking only in debugging runs and not in production runs.
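
For example, after restoring the Makefile line without the bounds-checking flag, and assuming your usual build command is a plain make, the rebuild is simply:

make clean     # remove the object files built with bounds checking
make           # rebuild the production executable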


[Return to Chapter 5] | [Go on to Chapter 7]