Comments (9)
A new announcement today from the Singularity team:
- A New Open Source Project Integrates Singularity and Slurm via Kubernetes https://www.sylabs.io/2019/05/introducing-hpc-affinities-to-the-enterprise-a-new-open-source-project-integrates-singularity-and-slurm-via-kubernetes/
Maybe this can be used with Amazon EKS.
from gchp_legacy.
Interestingly, Alibaba cloud already supports RDMA-enabled containers:
- Using RDMA on Container Service for Kubernetes https://www.alibabacloud.com/blog/using-rdma-on-container-service-for-kubernetes_594462
They got a bandwidth of ~5500 MB/s from ib_read_bw. The network latency is not shown in the post, but I guess it should be ~1 us, close to an InfiniBand cluster. Not sure what their container runtime is, but probably not Docker.
Anyone is welcome to test it out. I probably won't have time to test it myself...
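For a sense of scale, the ~5500 MB/s figure can be converted to the Gbit/s units in which link speeds are usually quoted. This is only a rough sketch: it assumes ib_read_bw's "MB" means 10^6 bytes (if the tool counts 2^20 bytes per MB, the result is about 5% higher), and the function name is made up for illustration.

```python
def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert MB/s to Gbit/s, assuming 1 MB = 10**6 bytes (an assumption)."""
    bits_per_second = mb_per_s * 1e6 * 8
    return bits_per_second / 1e9


# Approximate bandwidth quoted from the Alibaba Cloud post.
measured_gbit = mb_per_s_to_gbit_per_s(5500)
print(f"{measured_gbit:.1f} Gbit/s")  # prints "44.0 Gbit/s"
```

So the reported bandwidth is roughly 44 Gbit/s under that assumption.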
A very well-written paper just released today:
- Evaluation of Docker Containers for Scientific Workloads in the Cloud: https://arxiv.org/abs/1905.08415
They were able to use Docker over InfiniBand RDMA with very low overhead!
A new post today: Benchmarking MPI Applications in Singularity Containers on Traditional HPC and Cloud Infrastructures
Covers:
- Overview of recent HPC + cloud trend (as of 2019/05)
- OSU bandwidth/latency benchmark on local HPC and Azure
- MPICH vs OpenMPI compatibility issues for Singularity
Just noticed a new post today on using Singularity + MPI on Azure: Accelerating HPC Containers on Azure
@JiaweiZhuang Thanks for sharing. Has anyone attempted running GCHP on Azure? Regardless, I would think the equivalent advances for multi-node capability on other cloud platforms can't be far behind. I'm hopeful.
From the blog:
The latest release also includes a new MPI interface for multi-node job execution without having
to deal with potential configuration complications between Batch multi-instance tasks, the
selected MPI runtime, and the VM instance size selected. Batch Shipyard now provides an
easy-to-use schema for executing your MPI jobs with support for popular MPI frameworks such
as Open MPI, MPICH, MVAPICH, and Intel MPI. This new interface works seamlessly between
both Singularity and Docker containers, which when combined with Azure's HB/HC instances
with 100Gbit/s EDR InfiniBand, provides unparalleled performance for your distributed HPC
container applications in the cloud.
Singularity 3.3 adds a new doc on Singularity and MPI applications. The situation is similar to 2.x in that you still need a compatible MPI installation on the host:
The drawbacks are:
The MPI in the container must be compatible with the version of MPI available on the host.
The configuration of the MPI implementation in the container must be configured for optimal use of the hardware if performance is critical.
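As a rough illustration of the first drawback, one could compare the MPI version string reported on the host with the one reported inside the container before launching a job. The sketch below is not from the Singularity docs: the function names are hypothetical, the sample strings are illustrative, and treating "same major.minor release series" as compatible is only an assumption (actual compatibility depends on the MPI implementation and its ABI guarantees).

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number found in text, e.g. (3, 1, 4)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def same_release_series(host_text: str, container_text: str) -> bool:
    """Assumed heuristic: versions match if major and minor numbers agree."""
    host = parse_version(host_text)
    container = parse_version(container_text)
    if host is None or container is None:
        return False  # cannot tell, so do not claim compatibility
    return host[:2] == container[:2]


# Illustrative strings; in practice they would come from running the MPI
# launcher's version command on the host and inside the container.
print(same_release_series("mpirun (Open MPI) 3.1.4", "mpirun (Open MPI) 3.1.2"))
print(same_release_series("mpirun (Open MPI) 3.1.4", "mpirun (Open MPI) 4.0.1"))
```

The check only compares numbers, so it would not catch a mismatch between different MPI implementations (e.g. MPICH on the host and Open MPI in the container); that would need a separate comparison of the implementation name.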
@lizziel There are definitely examples of multi-node Docker runs (e.g. AWS Batch Multi-node Parallel Jobs). They are just less explored, with limited documentation and benchmark results.
@WilliamDowns and @LiamBindle of GCST are working on multi-node MPI runs with containers. I am closing out this discussion for legacy GCHP. Future discussions about multi-node MPI runs using containers may be created at the new GCHP repository, GCHPctm.