Comments (13)

JiaweiZhuang commented on June 12, 2024

Got the exact same error if the entire HEMCO directory (MainDataDir in rundir) is set to an empty directory.

from gchp_legacy.

lizziel commented on June 12, 2024

The HEMCO log write appears to be interrupted during initialization of DICE_KEROSENE_OCPI. HEMCO is in the process of iterating over lines in HEMCO_Config.rc, in reverse order. Have you changed your HEMCO_Config.rc in any way?

Do you get a different entry in the HEMCO log if MainDataDir is set to an empty directory?

lizziel commented on June 12, 2024

I should clarify, HEMCO_Config.rc is already read, but now the linked list is being iterated over in reverse order to initialize.

JiaweiZhuang commented on June 12, 2024

> Have you changed your HEMCO_Config.rc in any way?

No, except the verbose level.

> Do you get a different entry in the HEMCO log if MainDataDir is set to an empty directory?

The ending entry is different each time and seems completely random, whether an empty directory or the HEMCO-small directory is used. Two tries with identical configuration:

I guess another MPI process is causing the crash, and at that moment the main process can be at a random stage.

lizziel commented on June 12, 2024

I recently had an issue in dev/12.7 where I did not have an updated HEMCO_Config.rc for a chemistry update that required new files, and I also got random location crashes, although not in this specific stage of HEMCO. It seems that there needs to be more error handling in HEMCO regarding missing files, not just in 12.3.2 but in the current version too.

Since the location of the problem isn't easy to pinpoint, the best way forward may be to compare files side by side, or brute force add them all back in and reduce with some kind of algorithm to minimize runs until you home in on the missing ones.

JiaweiZhuang commented on June 12, 2024

> brute force add them all back in and reduce with some kind of algorithm to minimize runs until you home in on the missing ones.

I can't believe that I just did a manual binary search and found the missing file 😂! It's PARANOX.

Here are all the 55 files that exist in HEMCO-big but not HEMCO-small:

SAMPLE_BCs
TAGGED_O3
OH
SF6
WEEKSCALE
HTAP
CEDS
NEI2011_ag_only
GFED2
BRAVO
RRTMG
CH3I
DUST_GINOUX
MERCURY
COUNTRY_ID
OLSON_MAP
NEI2011ek
RCP
CORBETT_SHIP
O3
OXIDANTS
CHLA
MASAGE_NH3
OFFLINE_AEROSOL
FINN
PARANOX
TrashEmis
MODIS_XLAI
grids
BIOBURN
MACCITY
README
CO2
ICOADS_SHIP
MAP_A2A
TNO
ARCTAS_SHIP
GFED3
BCOC_COOKE
RONO2
POPs
VISTAS
TAGGED_CO
LIGHTNOX
EDGAR
raw_data
kgyr_to_kgm2s.sh
NEI99
OFFLINE_SFLUX
OFFLINE_LIGHTNING
BB4CMIP6
XIAO
UCX
Yuan_XLAI
CAC
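
A list like the one above can be produced by comparing the two directories directly. The following is a minimal sketch, not the command that was actually used in this thread: `list_missing` is a hypothetical helper name, and it assumes both directories are passed as paths and that only top-level entries matter.

```shell
# list_missing BIG SMALL
# Print every top-level entry that exists in BIG but not in SMALL.
# (Hypothetical helper, written for POSIX sh.)
list_missing() {
    lm_big=$1
    lm_small=$2
    for lm_path in "$lm_big"/*; do
        # Skip the unexpanded pattern when BIG is empty.
        [ -e "$lm_path" ] || [ -L "$lm_path" ] || continue
        lm_name=$(basename "$lm_path")
        # Report the entry only if SMALL has nothing with that name.
        [ -e "$lm_small/$lm_name" ] || [ -L "$lm_small/$lm_name" ] || printf '%s\n' "$lm_name"
    done
}
```

For example, `list_missing HEMCO_big HEMCO_small > files_all` would write the candidate list to a text file (the file name `files_all` is only an illustration).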

A binary search will find the proper file in log2(55) ~ 6 steps. The search procedure:

  1. Select half (~27) of the files and put them into a text file files_selected.

  2. Inside the HEMCO_small directory, symlink those files from HEMCO_big:

# Link each selected entry from HEMCO_big into the current directory
while read -r line
do
   ln -s "../HEMCO_big/$line" ./
done < files_selected

  3. Run the model to see whether it crashes or succeeds.

  4. Whether or not it crashes, remove the symlinks using the current files_selected before modifying it:

# Remove only the symlinks created in step 2
while read -r line
do
   rm "$line"
done < files_selected

  5. If the simulation succeeded, halve the current files_selected. If it failed, change files_selected to the other half of the file list. Iterate (back to step 2) until reaching a single file.
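
The manual procedure above could be automated roughly as follows. This is only a sketch under stated assumptions: `find_missing` and its arguments are hypothetical, exactly one entry in the list is assumed to be required, the complete directory must be given as an absolute path so the symlinks resolve, and the model is assumed to be launched by a command that exits with status 0 on success.

```shell
# find_missing BIG SMALL LIST CMD [ARGS...]
#   BIG   absolute path of the complete data directory
#   SMALL path of the reduced data directory
#   LIST  text file with one candidate entry name per line
#   CMD   command that exits 0 when the model run succeeds (assumption)
find_missing() {
    fm_big=$1; fm_small=$2; fm_list=$3; shift 3
    fm_cand=$(mktemp); fm_sel=$(mktemp); fm_rest=$(mktemp)
    cp "$fm_list" "$fm_cand"
    while [ "$(grep -c '' "$fm_cand")" -gt 1 ]; do
        # Step 1: split the remaining candidates into two halves.
        fm_half=$(( $(grep -c '' "$fm_cand") / 2 ))
        head -n "$fm_half" "$fm_cand" > "$fm_sel"
        tail -n "+$(( fm_half + 1 ))" "$fm_cand" > "$fm_rest"
        # Step 2: symlink the selected entries into the reduced directory.
        while IFS= read -r fm_name; do
            ln -s "$fm_big/$fm_name" "$fm_small/$fm_name"
        done < "$fm_sel"
        # Step 3: run the model; its output goes to stderr so that only
        # the final answer is written to stdout.
        if "$@" >&2; then fm_ok=1; else fm_ok=0; fi
        # Step 4: always remove the symlinks again.
        while IFS= read -r fm_name; do
            rm "$fm_small/$fm_name"
        done < "$fm_sel"
        # Step 5: keep the half that must contain the required entry.
        if [ "$fm_ok" -eq 1 ]; then
            cp "$fm_sel" "$fm_cand"
        else
            cp "$fm_rest" "$fm_cand"
        fi
    done
    cat "$fm_cand"
    rm -f "$fm_cand" "$fm_sel" "$fm_rest"
}
```

A call might look like `find_missing /abs/path/HEMCO_big /abs/path/HEMCO_small files_all ./run_model.sh`, where `run_model.sh` stands in for whatever launches the simulation. If more than one entry is actually required, this simple halving does not apply as written.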

JiaweiZhuang commented on June 12, 2024

I still have no idea why PARANOX is causing the problem. It does not exist in either ExtData.rc or the output log.

The only place it appears is HEMCO_Config.rc:

102     ParaNOx                : on    NO/NO2/O3/HNO3
    --> LUT data format        :       txt
    --> LUT source dir         :       $ROOT/PARANOX/v2015-02

If GCHP is using this data, how does it get read? Should it actually appear in ExtData.rc? Maybe that's a bug in ExtData.rc?

JiaweiZhuang commented on June 12, 2024

> It seems that there needs to be more error handling in HEMCO regarding missing files, not just in 12.3.2 but in the current version too.

As a minimum requirement, the model should probably print a meaningful error message if the entire HEMCO directory is empty.
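
Until the model itself reports this, a check before the run could catch the most obvious cases. The sketch below is a workaround outside the model, not part of HEMCO or GCHP: `check_hemco_root` is a hypothetical helper, and it assumes data paths appear in HEMCO_Config.rc literally as `$ROOT/<name>/...`, as in the ParaNOx excerpt quoted earlier in this thread. Entries whose first path component contains another variable are skipped.

```shell
# check_hemco_root ROOT_DIR CONFIG_FILE
# Return non-zero with a readable message if ROOT_DIR is missing or empty,
# or if a top-level directory referenced as $ROOT/<name> in CONFIG_FILE
# does not exist under ROOT_DIR. (Hypothetical helper.)
check_hemco_root() {
    ch_root=$1
    ch_config=$2
    ch_status=0
    if [ ! -d "$ch_root" ] || [ -z "$(ls -A "$ch_root" 2>/dev/null)" ]; then
        echo "ERROR: HEMCO data directory '$ch_root' is missing or empty" >&2
        return 1
    fi
    # Ignore comment lines, then take the first path component after $ROOT/.
    for ch_name in $(grep -v '^[[:space:]]*#' "$ch_config" \
                     | grep -o '\$ROOT/[^/[:space:]]*' \
                     | sed 's|^\$ROOT/||' | sort -u); do
        # Skip components that still contain an unexpanded variable.
        case "$ch_name" in *\$*) continue ;; esac
        if [ ! -e "$ch_root/$ch_name" ]; then
            echo "ERROR: '$ch_name' is referenced in $ch_config but missing from $ch_root" >&2
            ch_status=1
        fi
    done
    return "$ch_status"
}
```

Running `check_hemco_root "$MainDataDir" HEMCO_Config.rc` before submitting the job would then flag a missing PARANOX directory by name instead of leaving a crash at a random location.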

lizziel commented on June 12, 2024

> I can't believe that I just did a manual binary search and found the missing file

Great!

> As a minimum requirement, the model should probably print a meaningful error message if the entire HEMCO directory is empty.

I think everyone will agree on that.

The issue you are having is occurring in HEMCO and not ExtData. Typically when a run fails so early and has to do with emissions it is a HEMCO issue, often a typo in the file but possibly other issues too with the setup. I'm just guessing here, but maybe it hit a problem reading the file on one of the threads. The code to read the text file is in hcox_paranox_mod.F90, and the call to paranox init is in hcox_driver_mod. All the error handling looks like this:

IF ( RC /= HCO_SUCCESS ) RETURN

So you aren't going to get any helpful messages about where it fails. Like GMAO is doing in MAPL and core GEOS-Chem, we should replace all of these with a message that prints prior to returning.

yantosca commented on June 12, 2024

@JiaweiZhuang, the PARANOX extension reads lookup tables that cannot be parsed by HEMCO's normal I/O.

In 12.6.1, we now should print all files that are read by GEOS-Chem and HEMCO to either the log file or the HEMCO.log file. I know you are using 12.3.2 but you can look at what I did in 12.6.1 to add the extra file printouts.

Also, yes -- many of the error trappings in HEMCO were originally added to just return but not throw an error. We need to fix those.

lizziel commented on June 12, 2024

@yantosca are those prints for files read only printed by root? If yes, this might still not catch the problematic file if running with MPI.

yantosca commented on June 12, 2024

@lizziel Yes, I think they are printed to root.

lizziel commented on June 12, 2024

I created https://github.com/geoschem/geos-chem/issues/119 to address the HEMCO error handling improvements needed to prevent this issue in the future.
