Comments (30)

LiamBindle commented on September 18, 2024

The build matrix would be especially useful. It would help us identify versions of our dependencies, and combinations of dependencies (if any), that are broken. It would also help simplify the "GCHP Quick Start Guide", which is a bit daunting right now.

It would probably be best to wait until the CMake update, IMO, for the following reasons: (1) gchp_ctm is dropping support for GNU Makefiles, so we would have to set up the CI again in a few months; (2) ESMF becomes an external library, which significantly reduces compile time and the build's memory requirements; and (3) setup with CMake should be faster because there is less reliance on environment variables.

I'll look into getting this going for gchp_ctm.

yantosca commented on September 18, 2024

Maybe as a first step we could try to set up a build matrix for GC-Classic, just to get going. That would exercise everything except MPI. Then we could translate it over to GCHP once the CMake transition there is complete. Just a thought...

yantosca commented on September 18, 2024

@LiamBindle, I might be able to help you out with this as well, as time allows. If we make a container, we should store it on the GCST dockerhub site so that we can all have access to it.

I've recently been trying to fix a lot of the GCPy issues and make the updates to the benchmark output that were requested by the GCSC. But if you need a hand I could probably find some time.

yantosca commented on September 18, 2024

This is great, Liam! I think using Azure pipelines is a good move.

JiaweiZhuang commented on September 18, 2024

@LiamBindle brought up some good ideas in a conversation.

  1. Should also test NetCDF 4.1.x, which is the last version containing both the C and Fortran parts in a single library.
  2. For GC-classic, should also test CMake in addition to Make. This helps detect issues like geoschem/geos-chem#64
  3. Instead of a full "build matrix", an easier & cheaper setup is "perturbing a single component"; otherwise the total number of builds can grow too quickly. For GC-classic, a "standard setup" could be GCC 8.x + NetCDF 4.6 + CMake + Debian. Then, explore each component by:
  • Changing GCC to 4.x, 5.x, 6.x, 7.x, 8.x, 9.x
  • Changing NetCDF to 4.1.x
  • Changing CMake to Make
  • Changing base OS to CentOS
    (that's exactly 10 builds in total)

For GCHP, we can reduce the number of compiler variants (older GCC won't work anyway; probably just test 8.x and 9.x) and add a new "MPI" dimension.
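
For GC-classic, the perturb-one-component idea could look roughly like this as an Azure Pipelines matrix (just a sketch; the container image names are hypothetical placeholders):

strategy:
  matrix:
    # the "standard setup": GCC 8.x + NetCDF 4.6 + CMake + Debian
    standard:
      containerImage: geoschem/build-matrix:gcc8-netcdf4.6-debian
    # perturb one component at a time, keeping everything else standard
    gcc9:
      containerImage: geoschem/build-matrix:gcc9-netcdf4.6-debian
    netcdf4_1:
      containerImage: geoschem/build-matrix:gcc8-netcdf4.1-debian
    centos:
      containerImage: geoschem/build-matrix:gcc8-netcdf4.6-centos
    # the CMake-vs-Make perturbation would be a matrix variable
    # (e.g. buildSystem: make) rather than a separate image

container: $[ variables['containerImage'] ]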

LiamBindle commented on September 18, 2024

Hi everyone,

Just following up on where I got earlier today.

Pipeline for generating build matrix images

I got an initial version of the pipeline for building the build-matrix images done, and it can be found here: LiamBindle/geos-chem-build-matrix. Because it was easiest, this initial version only builds the following images: the latest netcdf-c and netcdf-fortran with gcc4, gcc5, gcc6, gcc7, gcc8, and gcc9, as well as netcdf 4.1 with gcc7. These should be a good start, and we can build on them to cover multiple versions of glibc and different base OSs as next steps.

[Screenshot: the image-build pipeline]

A build matrix pipeline for GEOS-Chem Classic

As a test, I set up a build matrix pipeline on the feature/AzurePipeline branch of LiamBindle/geos-chem. The pipeline can be found here: https://dev.azure.com/lrbindle/geos-chem/_build/results?buildId=35. Right now build tests run on all 7 images (i.e., all major GCCs, plus GCC 7 with the old NetCDF) for Standard, TOMAS, TransportTracers, Hg, and complexSOA_SVPOA with APM. This is what it looks like:

[Screenshot: build matrix results for GEOS-Chem Classic]

I wasn't able to do this on a feature branch of the official GEOS-Chem repo because I don't have the proper permissions. To set this up for GEOS-Chem, we just need to create a GEOS-Chem organization on Azure DevOps, create a GEOS-Chem project, create a new pipeline, and then replace the default azure-pipeline.yml file with this azure-pipeline.yml.

I'm going to be focusing on prepping for our Goddard visit for the next week, but I'll be happy to help set this up once I'm back.

LiamBindle commented on September 18, 2024

@JiaweiZhuang Thanks for mentioning it.

I've come across BLT before, but personally I don't think a CMake macro library like BLT would benefit GEOS-Chem/GCHP. The new MAPL already depends on ecbuild. Larger projects might find these libraries convenient, but I think that for GEOS-Chem, vanilla CMake is clearer and easier to maintain, for now at least.

JiaweiZhuang commented on September 18, 2024

Should we move the discussion to #1?

Update: Will use #1 to track new GCHP development. General discussions will still be posted here.

JiaweiZhuang commented on September 18, 2024

In today's telecon, @sdeastham suggested using fake/null variables in ExtData to minimize the input data size, so we can actually run the model (not just build it) as part of the CI pipeline. This can catch many GCHP run-time errors that cannot be detected at build time.

A long time ago I tried putting /dev/null in an ExtData.rc entry to skip reading a file. Does this set the variable to 0 globally? How would one set a global constant other than zero? @lizziel will look into it.

As long as we can shrink the input data to less than 1 GB, we can package it into a container image, without having to write scripts to pull input data on the fly.

Unlike a real benchmark, the simulation here doesn't have to be scientifically meaningful; the goal is just to catch bugs. Running a real benchmark on CI would be a good next step, but let's just get a synthetic simulation running for now.

lizziel commented on September 18, 2024

From the MAPL manual (2014 draft):

FileTemplate: The full path to the file. The actual filename can be the real file name or a grads style template. In addition you can simply set the import to a constant by specifying the entry as /dev/null:realconstant. If no constant is specified after /dev/null with the colon the import is set to zero.
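
So, sketching just the FileTemplate column of hypothetical entries (all other columns elided as "..."):

... /dev/null            (import set to zero everywhere)
... /dev/null:1.0e-9     (import set to the global constant 1.0e-9)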

JiaweiZhuang commented on September 18, 2024

Better to wait until the CMake update? @LiamBindle probably knows better.

LiamBindle commented on September 18, 2024

That's a good idea.

If no one else plans to take this on (or needs it urgently) I could start looking at it in my downtime. It sounds like a fun side project. I'm pretty busy right now though, so I'm not sure when I'll be able to get around to it.

Are you looking at picking this up @JiaweiZhuang?

LiamBindle commented on September 18, 2024

@yantosca, sounds good. I'll keep you posted on any progress I make then.

JiaweiZhuang commented on September 18, 2024

Maybe as a first step we could try to set up a build matrix for GC-Classic

It would be a good first exercise! I guess testing different GCC versions would be useful. But the biggest use case for this is indeed making the GCHP build more robust; building GC-classic is easy.

JiaweiZhuang commented on September 18, 2024

I could start looking at it in my downtime.

That would be wonderful, @LiamBindle! I would recommend trying Azure Pipelines first, as it seems quite popular right now. I am also going to use it for my package.

LiamBindle commented on September 18, 2024

Hi everyone,

Disclaimer: This is just me spilling my thoughts/findings from last night

I started looking into this last night and I think it looks pretty straightforward. I thought I'd put my findings here so that others with more experience and knowledge of Docker and CI (@JiaweiZhuang, @yantosca, and others) could potentially recognize any rabbit holes/antipatterns in my plan.


Azure pipelines

I found it really easy to set up a simple Azure pipeline with a matrix. Thanks for the suggestion @JiaweiZhuang! The documentation is great, and their YouTube tutorials (e.g. this one, this one, and this one) made setting up an account and project really easy.

I set up a basic azure-pipeline.yml file that runs a few commands, including gfortran --version, with a build matrix for gcc 4, 5, 6, 7, and 8. The build commands at the end don't actually work yet because my master branch doesn't have CMake support, but it's the general idea.

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

strategy:
  matrix:
    gcc4:
      containerImage: gcc:4
    gcc5:
      containerImage: gcc:5
    gcc6:
      containerImage: gcc:6
    gcc7:
      containerImage: gcc:7
    gcc8:
      containerImage: gcc:8

container: $[ variables['containerImage'] ]

steps:
- script: |
    gfortran --version
    mkdir build
    cd build
    cmake -DRUNDIR=IGNORE -DRUNDIR_SIM=standard $(Build.Repository.LocalPath)
    make -j
    make install
  displayName: 'Building GEOS-Chem'

Here is the pipeline project: https://dev.azure.com/lrbindle/geos-chem/_build?definitionId=1.

Essentially, my plan is to just replace the containerImages in the matrix with "geos-chem-build-matrix" images.


GEOS-Chem-Build-Matrix Images

I put together a simple Dockerfile that builds an image with GEOS-Chem Classic's dependencies. You can find it here.

Essentially, I was thinking that I'd set up an Azure pipeline with a matrix of gcc, netcdf (pre-4.2), netcdf-c (post-4.2), and netcdf-fortran (post-4.2) versions that builds images and pushes them to DockerHub. Then geos-chem's azure-pipeline.yml pulls those images.


Further steps

First, I'm going to get the "geos-chem-build-matrix" pipeline going. I thought I would start by building images with all gcc major versions and the latest HDF5, NetCDF-C, and NetCDF-Fortran libraries. I'm planning to push these to DockerHub/liambindle/gcc-netcdf-c-netcdf-fortran with tags like "5-4.7.1-4.4.5" for the gcc, netcdf-c, and netcdf-fortran versions.

The second step will be creating a branch on GEOS-Chem like feature/AzurePipeline with an azure-pipeline.yml file similar to the one I posted above.


Next week I'm going to be focusing on prepping for our visit to Goddard, so after today it will probably be about 2-3 weeks before I can pick this up again. I thought I should get this down to make picking it back up in a few weeks easier.

JiaweiZhuang commented on September 18, 2024

@LiamBindle That's really fast work!

their YouTube tutorials made setting up an account and project really easy.

Oh, I didn't notice this before. They have some great stuff.

I put together a simple Dockerfile that builds an image with GEOS-Chem Classic's dependencies.

Great, just a couple of quick comments: LiamBindle/geos-chem-build-matrix#1 and LiamBindle/geos-chem-build-matrix#2

Essentially, my plan is to just replace the containerImages in the matrix with "geos-chem-build-matrix" images.

I think we should reuse Docker images more cleverly. The build matrix can become very large. We might end up having something like:

  • 2 base OS
  • 3 gcc versions
  • 8 MPI variants and versions

That's 2 × 3 × 8 = 48 parallel builds. Alternatively, we can pick some representative combinations instead of trying all possible cases, but 20+ builds would still be normal.

Some points to consider:

  • We shouldn't rebuild the MPI & NetCDF libraries every time we build GEOS-Chem. Do you know whether Azure Pipelines caches images? Or can it pull pre-built images from DockerHub or other container registries so we don't need to rebuild them every time?
  • When testing different versions of MPI, we can use a single installation of the NetCDF libraries to save time & space. (I don't think GCHP is using the MPI features in NetCDF?)
  • To avoid maintaining too many images, a single Docker image might contain several compilers and libraries, managed by spack environments. On the other hand, putting all the libraries into a single image would make it huge and take forever to build (the build can't be parallelized). A reasonable compromise is having a single compiler version per image (the GCC images you chose are a good starting point) and building NetCDF + the 8 MPI variants inside each image. Or maybe there are more clever ways.

LiamBindle commented on September 18, 2024

Thanks for the feedback!

@JiaweiZhuang, thanks, I thought you would probably have some good insight because I think you have a lot more experience with containers than I do. Maybe what I'm thinking wasn't quite clear, though.

Or can it pull pre-built images from DockerHub or other container registries so we don't need to rebuild them every time?

This is what I am thinking. Essentially, LiamBindle/geos-chem-build-matrix is a project that builds such images and pushes them to DockerHub. We can pull these images later in geoschem/geos-chem for our build matrix pipeline. Because there are many possible combinations, as you mentioned, I'm working on a pipeline that uses a matrix to build all of these images and push them to DockerHub.

Once those are working, we can set up an Azure Pipeline for geoschem/geos-chem that pulls these prebuilt images with a matrix. This should be easy once there are images we can just pull from Docker Hub.

I think that once I get the prebuilt-image pipeline working, it would be good to make the images more complex (e.g. separate images for each gcc, and then install multiple netcdf versions in each image using spack).

What do you think?

JiaweiZhuang commented on September 18, 2024

builds and pushes images to DockerHub.

Oh, I thought you would be building the images on the fly before building GEOS-Chem (I didn't see DockerHub-related commands in your repo). But now I see your plan.

From the "Docker task" section of the Azure docs, I see that:

Use this task in a build or release pipeline to build and push Docker images to any container registry using Docker registry service connection.

The Docker Registry service connection can be either Azure Container Registry or Docker Hub.

If I understand correctly, this "Docker task" is mainly for publishing/deploying/pushing Docker images (say, to get around the resource limits of DockerHub's own build service).

Then the next stage is using container jobs to pull the pre-built images (containing MPI libraries, etc.) and build GCHP inside them.
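
For the publishing step, a minimal sketch (assuming a Docker Hub service connection has been set up in the Azure DevOps project; the connection name, repository, and tag below are hypothetical):

steps:
- task: Docker@2
  displayName: Build and push a base image to Docker Hub
  inputs:
    containerRegistry: dockerHubConnection   # hypothetical service connection name
    repository: liambindle/gcc-netcdf-c-netcdf-fortran
    command: buildAndPush
    Dockerfile: Dockerfile
    tags: 8-4.7.1-4.4.5

The "pull" side is then just the container: / containerImage: pattern from the azure-pipeline.yml posted above.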

JiaweiZhuang commented on September 18, 2024

Basically, there are two independent steps:

1. Build the base images containing NetCDF and MPI libraries.

This can be done on DockerHub itself, on Azure Pipelines, or on many other platforms (even on local machines). This is mostly what your current repo (https://github.com/LiamBindle/geos-chem-build-matrix) is doing. This step only needs to be done once, and rarely needs a rebuild.

One major question for this step is how to manage all the variants of images and minimize redundant installations of libraries. A useful reference for handling image dependencies is Jupyter's docker-stacks: https://github.com/jupyter/docker-stacks (it has a deep dependency chain).

This step needs its own repo; maybe it will eventually be merged into https://github.com/geoschem/geos-chem-docker.

2. Actually run CI inside the images built in step 1

This is where the "build matrix" is defined. Although you can still have a "build matrix" in step 1, it won't be as big as the matrix here (a single image might contain multiple libraries).

Azure Pipelines will be the primary choice, as it provides more resources than Travis & Docker Hub.

This step adds the CI config files to GEOS-Chem's code repo, so the build can be triggered at every commit. GCHP's code repo should keep a tag/hash pointing to a GC-classic commit.

JiaweiZhuang commented on September 18, 2024

In terms of the resource limits on Azure Pipelines, I saw in Parallel jobs that:

Public project: 10 free Microsoft-hosted parallel jobs that can run for up to 360 minutes (6 hours) each time, with no overall time limit per month.

So the matrix can't be too big if we stick to the free plan :) But 6 hours is very long compared to 50 minutes on Travis and 2 hours on Docker Hub. A single job can probably finish 3~4 GCHP builds (another reason why merging multiple libraries into one image is useful).

As a starting point, let's just build 8~10 environments independently, choosing from:

  • Compiler: gcc 7.x, 8.x, 9.x
  • MPI: OpenMPI 3.x, Intel MPI, MVAPICH, MPICH
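
Once such images exist, a single matrix job could even exercise several MPI variants from one image, along these lines (a rough sketch; the per-variant environment scripts are hypothetical):

steps:
- script: |
    # build GCHP once per MPI variant installed in this image
    for mpi in openmpi3 intelmpi mvapich2 mpich; do
      source /opt/env/$mpi/setup.sh   # hypothetical per-MPI environment script
      mkdir build-$mpi && cd build-$mpi
      cmake .. && make -j
      cd ..
    done
  displayName: Build GCHP against each MPI variant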

JiaweiZhuang commented on September 18, 2024

@LiamBindle Thanks for the update!!

Quick question: I cannot access the link https://dev.azure.com/lrbindle/geos-chem/_build/results?buildId=35. I got "401 - Uh oh, you do not have access" after logging into my Azure account. Would you be able to make the link public (no log-in required)?

LiamBindle commented on September 18, 2024

Whoops, sorry about that. Done!

JiaweiZhuang commented on September 18, 2024

Just noticed LLNL's BLT (Building, Linking and Testing) framework (https://github.com/LLNL/blt). Not sure if it's useful for our task, but @LiamBindle might find it interesting:

BLT is a streamlined CMake-based foundation for Building, Linking and Testing large-scale high performance computing (HPC) applications.

JiaweiZhuang commented on September 18, 2024

@LiamBindle Just noticed a paper on CI/CD for HPC using Jenkins and Singularity containers: Continuous Integration and Delivery for HPC: Using Singularity and Jenkins

Following a similar idea, it is totally possible to develop a build-run-plot CI pipeline that automates the entire GEOS-Chem benchmarking workflow, with additional benefits like performance monitoring (see the pydata benchmarks for example). On the "delivery" side, the same pipeline could be further extended to build containers/AMIs/Conda packages/Spack packages for GC-classic/GCHP, for general science users.

CI/CD is rarely done for huge HPC codebases, but I think you have the technical capability to make it work, and then the same workflow could be adopted by other models. It could become a serious research project and lead to a GMD-style publication, if you'd like to continue working on build systems.

The actual framework (AWS vs Azure vs open-source) probably doesn't matter that much, because the high-level logic of these tools is very similar, and they all face the same challenges, such as how to efficiently pull GEOS-Chem's large input data into a container environment.

LiamBindle commented on September 18, 2024

That's very cool. Thanks for the info!

I think run/plot pipelines would be pretty doable with a self-hosted agent or two. Self-hosted agents might also make ifort licensing easier. Right now, though, I don't think I have the capacity to take something like this on, but I think this would be a really cool project for someone to pick up!

I'm going to submit a CI PR for GEOS-Chem Classic this afternoon. Looking forward to hearing everyone's feedback.

JiaweiZhuang commented on September 18, 2024

I think run/plot pipelines would be pretty doable with a self-hosted agent or two. Self-hosted agents might also make ifort licensing easier.

Indeed. That CI/CD paper also runs on-prem; that's why they use Singularity instead of Docker, due to permission issues on local HPC. Running the pipeline on Harvard Odyssey/Cannon would simplify data movement, but it also has many downsides, such as limited compute resources, frequent system maintenance and updates, etc.

but I think this would be a really cool project for someone to pick up!

Yeah, I just put the ideas here. Nothing urgent at all!

JiaweiZhuang commented on September 18, 2024

Thanks @lizziel! So /dev/null:realconstant would be something like /dev/null:3.0, /dev/null:5.0, etc.?

Is there a similar feature in HEMCO for GC-classic?

lizziel commented on September 18, 2024

Continuous integration (build-only) is available with GCHP 13.0.0, so I will close this issue. However, some of the discussion here is relevant to extending the CI capability to include running the model. I am therefore porting this issue to the GCHPctm repository, which replaced this GCHP repository with the release of 13.0.0.

rscohn2 commented on September 18, 2024

Here are some sample configurations for using Intel compilers in public CI systems: https://github.com/oneapi-src/oneapi-ci
