Comments (6)

yantosca avatar yantosca commented on June 12, 2024

Jiawei Zhuang wrote:

The run also crashes at 00:10 if I save only one collection, SpeciesConc_inst, containing only two species, SpeciesConc_NO and SpeciesConc_O3.

  --- Chemistry done!
  --- Do wetdep now
  --- Wetdep done!

 Setting history variable pointers to GC and Export States:
 SpeciesConc_NO
 SpeciesConc_O3
 AGCM Date: 2016/07/01  Time: 00:10:00
                                             Memuse(MB) at MAPL_Cap:TimeLoop=  4.723E+03  4.494E+03  2.306E+03  2.684E+03  3.260E+03
                                                                      Mem/Swap Used (MB) at MAPL_Cap:TimeLoop=  1.852E+04  0.000E+00
 offline_tracer_advection
ESMFL_StateGetPtrToDataR4_3                     54
DYNAMICSRun                                    703
GCHP::Run                                      407
MAPL_Cap                                       792

But with two collections, SpeciesConc_avg and SpeciesConc_inst, each containing only the two species SpeciesConc_NO and SpeciesConc_O3, the run is able to finish and print full timing information:

 Writing:    144 Slices (  1 Nodes,  1 PartitionRoot) to File:  OutputDir/GCHP.SpeciesConc_avg.20160701_0530z.nc4
 Writing:    144 Slices (  1 Nodes,  1 PartitionRoot) to File:  OutputDir/GCHP.SpeciesConc_inst.20160701_0600z.nc4


  Times for GIGCenv
TOTAL                   :       2.252
INITIALIZE              :       0.000
RUN                     :       2.250
GenInitTot              :       0.004
--GenInitMine           :       0.003
GenRunTot               :       0.000
--GenRunMine            :       0.000
GenFinalTot             :       0.000
--GenFinalMine          :       0.000
GenRecordTot            :       0.001
--GenRecordMine         :       0.000
GenRefreshTot           :       0.000
--GenRefreshMine        :       0.000

HEMCO::Finalize... OK.
Chem::Input_Opt Finalize... OK.
Chem::State_Chm Finalize... OK.
Chem::State_Met Finalize... OK.
   Character Resource Parameter GIGCchem_INTERNAL_CHECKPOINT_TYPE: pnc4
 Using parallel NetCDF for file: gcchem_internal_checkpoint_c24.nc

  Times for GIGCchem
TOTAL                   :     505.760
INITIALIZE              :       3.617
RUN                     :     498.376
FINALIZE                :       0.000
DO_CHEM                 :     488.864
CP_BFRE                 :       0.121
CP_AFTR                 :       4.080
GC_CONV                 :      36.070
GC_EMIS                 :       0.000
GC_DRYDEP               :       0.119
GC_FLUXES               :       0.000
GC_TURB                 :      17.966
GC_CHEM                 :     403.528
GC_WETDEP               :      19.443
GC_DIAGN                :       0.000
GenInitTot              :       2.719
--GenInitMine           :       2.719
GenRunTot               :       0.000
--GenRunMine            :       0.000
GenFinalTot             :       0.963
--GenFinalMine          :       0.963
GenRecordTot            :       0.000
--GenRecordMine         :       0.000
GenRefreshTot           :       0.000
--GenRefreshMine        :       0.000

   -----------------------------------------------------
      Block          User time  System Time   Total Time
   -----------------------------------------------------
   TOTAL                      815.4433       0.0000     815.4433
   COMM_TOTAL                   3.3098       0.0000       3.3098
   COMM_TRAC                    3.3097       0.0000       3.3097
   FV_TP_2D                    90.1448       0.0000      90.1448


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3126 RUNNING AT ip-172-31-0-74
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
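
For reference, the exit code and the exit string in the log above describe the same event. A minimal sketch, assuming the launcher reports the common shell convention of 128 plus the signal number (the helper name `decode_exit_code` is made up for illustration):

```python
import signal

def decode_exit_code(code):
    """Map an exit code of the form 128+N to signal N, else return None.

    Assumes the 128+N convention used by common shells; the launcher
    that produced the log above is assumed to follow it.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128)
    except ValueError:
        # No signal with that number exists on this platform
        return None

sig = decode_exit_code(134)  # 134 - 128 = 6
print(sig)
```

On Linux, signal 6 is SIGABRT, which matches the "Aborted (signal 6)" message. It indicates that the process called abort (for example from a failed runtime check) rather than exiting normally.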

from gchp_legacy.

lizziel avatar lizziel commented on June 12, 2024

This issue is not reproducible on the Harvard Odyssey cluster. If you repeat the same tests multiple times, do you always get the same result? Does the issue go away if transport is turned off (turn it off in runConfig.sh, not input.geos)?

yantosca avatar yantosca commented on June 12, 2024

On the AWS cloud, I can reliably reproduce the run dying at 00:10 when all collections are turned off in HISTORY.rc.

With all collections turned off AND with transport turned off, the run still fails at 00:10.

JiaweiZhuang avatar JiaweiZhuang commented on June 12, 2024

Using OpenMPI 2.1 instead of MPICH 3.3 fixes this problem (#10).
But the run then hits a different problem: it is not able to save diagnostics.

lizziel avatar lizziel commented on June 12, 2024

Upgrading to OpenMPI 3 may fix the remaining issue. We ran into this on the Odyssey cluster and switching to the new OpenMPI fixed it.

lizziel avatar lizziel commented on June 12, 2024

I am closing this issue since it is fixed by switching to OpenMPI 2.1 from MPICH 3.3.
