pyDMTCP

Python Interface to DMTCP (+SLURM)

Supercomputers are becoming increasingly important as the demand for computational power and the volume of data to be analyzed grow. As supercomputing systems become larger and serve many users simultaneously, the costs of building and maintaining them increase, so efficiency matters to both providers and users. One of the main factors in a supercomputer's efficiency is the scheduling of user-submitted jobs, which aims mainly to increase system utilization and reduce power consumption. Meanwhile, Python has become one of the major tools for wrapping scientific applications, making it convenient to manage and analyze scientific jobs. Integrating job scheduling into these tools and wrappers may increase system efficiency.

DMTCP is a system-level Checkpoint/Restart (C/R) library that performs C/R operations without any source code modifications. Scientific application wrappers, and job schedulers in particular, can use DMTCP to gain flexibility by pausing, porting and resuming jobs. It can also be used to restart a job that crashed, an event that becomes more common on larger supercomputers.

A Python module, "dmtcp.py", already supports DMTCP checkpointing both from within an interactive Python/IPython session and programmatically from within a Python program. However, that module is used internally by Python programs and cannot interface DMTCP with other black-box executables. In this work, we extend "dmtcp.py" and introduce pyDMTCP, a Python module that lets Python schedulers and wrappers of scientific applications easily use DMTCP checkpointing through a Python interface, externally to the applications.

Prerequisites

First, clone the pyDMTCP code provided here:

git clone https://github.com/Scientific-Computing-Lab-NRCN/pyDMTCP.git

Then install and load the supported packages (i.e. dmtcp, openmpi) in your environment. You will also need to install SLURM. In addition, you should download Python 3.7 with the following packages:

• dmtcp

• openmpi

Know Your Flags

• --start : Specify the binary file you plan to run under pyDMTCP.

• --compress : Specify whether to compress the DMTCP checkpoint files.

• --overwrite : Specify whether to overwrite the last DMTCP checkpoint.

• --rollback : Define how many checkpoint files to keep for rollback.

• --stop : Define the job number to stop.

• --restart : Define the job number to restart (using the DMTCP checkpoint file).
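
To make the flag semantics concrete, here is a minimal sketch of how a command line with these flags could be parsed using Python's standard argparse module. It is not the actual pyDMTCP.py implementation: the helper names (str_to_bool, build_parser), the defaults, and the argument types are assumptions chosen only to match the descriptions above.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret 'True'/'False' style strings (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="pyDMTCP.py")
    parser.add_argument("--start", metavar="BINARY",
                        help="binary file to run under pyDMTCP")
    parser.add_argument("--compress", type=str_to_bool, default=False,
                        help="compress the checkpoint files")
    parser.add_argument("--overwrite", type=str_to_bool, default=False,
                        help="overwrite the last checkpoint")
    parser.add_argument("--rollback", type=int, default=1,
                        help="number of checkpoint files to keep")
    parser.add_argument("--stop", type=int, metavar="JOB",
                        help="job number to stop")
    parser.add_argument("--restart", type=int, metavar="JOB",
                        help="job number to restart from its checkpoint")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# runs on its own.
args = build_parser().parse_args(
    ["--start", "lulesh2.0", "--compress", "True",
     "--overwrite", "True", "--rollback", "1"]
)
print(args.start, args.compress, args.overwrite, args.rollback)
```

Passing the booleans as explicit "True"/"False" strings follows the style of the command lines in this README; a real parser might just as well use store_true switches.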

Examples

python3 pyDMTCP.py --start lulesh2.0 --compress True --overwrite True --rollback 1

python3 pyDMTCP.py --stop 100685

python3 pyDMTCP.py --test_restart lulesh2.0 --compress True --overwrite True --rollback 1
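
Since pyDMTCP is meant to be driven by Python schedulers and wrappers, the start and stop commands above can also be assembled programmatically. The following is a small sketch under stated assumptions: start_command and stop_command are hypothetical helpers, not part of pyDMTCP, and the script is assumed to be reachable as pyDMTCP.py in the working directory.

```python
from typing import List


def start_command(binary: str, compress: bool = True, overwrite: bool = True,
                  rollback: int = 1, script: str = "pyDMTCP.py") -> List[str]:
    """Build the argument list for starting a binary (hypothetical helper)."""
    return [
        "python3", script,
        "--start", binary,
        "--compress", str(compress),
        "--overwrite", str(overwrite),
        "--rollback", str(rollback),
    ]


def stop_command(job_id: int, script: str = "pyDMTCP.py") -> List[str]:
    """Build the argument list for stopping a job (hypothetical helper)."""
    return ["python3", script, "--stop", str(job_id)]


# Print the commands rather than executing them, so the sketch has no
# side effects on a machine without DMTCP or SLURM.
print(" ".join(start_command("lulesh2.0")))
print(" ".join(stop_command(100685)))
```

On a system where pyDMTCP is set up, each list could be passed to subprocess.run(cmd, check=True) to launch the command from a wrapper script.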
