gcpbatchtracker

DRMAA2 JobTracker implementation for Google Batch

Experimental Google Batch support for DRMAA2os.

How gcpbatchtracker Works

The project is designed to be embedded as a backend in https://github.com/dgruber/drmaa2os

What gcpbatchtracker is

It is a basic DRMAA2 implementation for Google Batch, written in Go. A DRMAA2 JobTemplate is used to submit Google Batch jobs, and the DRMAA2 JobInfo struct is used to query the status of a job. Google Batch job states are mapped to the job state model of the DRMAA2 spec.

How to use it

See the examples directory, which uses the interface directly.

Converting a DRMAA2 Job Template to a Google Batch Job

| DRMAA2 JobTemplate | Google Batch Job |
|---|---|
| RemoteCommand | Command to execute in the container, or a script, or a script path |
| Args | For a container, the arguments of the command (if RemoteCommand is empty, the arguments of the entrypoint) |
| CandidateMachines[0] | Machine type, or, when prefixed with "template:", the name of an instance template to use |
| JobCategory | Container image, or $script$ or $scriptpath$ for other runnables, in which case RemoteCommand is interpreted as a script or a script path |
| JobName | JobID |
| AccountingID | Sets a tag "accounting" |
| MinSlots | Specifies the parallelism (how many tasks to run in parallel) |
| MaxSlots | Specifies the number of tasks to run. For MPI set MinSlots = MaxSlots. |
| MinPhysMemory | Memory to request in MB; should be set to raise the request from the default, up to the full machine size |
| ResourceLimits | Supported keys: "cpumilli", "bootdiskmib", and "runtime" (a runtime limit such as "30m" for 30 minutes) |
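
The mapping above can be sketched as a job template literal. The `JobTemplate` type below is a local stand-in that copies only the relevant field names of the DRMAA2 job template so that the example compiles on its own; in real code the type would come from the DRMAA2 Go interface used by drmaa2os. The helper name `newContainerJob`, the image, the machine type, and all values are made up for illustration.

```go
package main

import "fmt"

// JobTemplate is a local stand-in with a subset of the DRMAA2 JobTemplate
// fields listed in the table. It is an assumption made for this sketch,
// not the real type.
type JobTemplate struct {
	RemoteCommand     string
	Args              []string
	JobCategory       string
	JobName           string
	AccountingID      string
	CandidateMachines []string
	MinSlots          int64
	MaxSlots          int64
	MinPhysMemory     int64
	ResourceLimits    map[string]string
}

// newContainerJob (hypothetical helper) builds a template for a container
// job in which all tasks run in parallel (MinSlots == MaxSlots).
func newContainerJob(name, image string, tasks int64) JobTemplate {
	return JobTemplate{
		JobName:           name,  // becomes the job ID
		JobCategory:       image, // container image
		RemoteCommand:     "/bin/sh",
		Args:              []string{"-c", "echo hello"},
		CandidateMachines: []string{"e2-standard-4"}, // example machine type
		AccountingID:      "team-a",                  // sets the "accounting" tag
		MinSlots:          tasks,                     // parallelism
		MaxSlots:          tasks,                     // number of tasks
		MinPhysMemory:     4096,                      // MB
		ResourceLimits:    map[string]string{"runtime": "30m"},
	}
}

func main() {
	jt := newContainerJob("hello-batch", "busybox:latest", 2)
	fmt.Printf("%s runs %d task(s) on %s\n", jt.JobName, jt.MaxSlots, jt.CandidateMachines[0])
}
```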

Override the resource limit "cpumilli" to get the full amount of resources when running just one task per machine (for example, 8000 for 8 cores)!
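
As a small illustration of the "cpumilli" value, the sketch below converts a core count into milli-CPUs (1 core = 1000). The helper `cpuMilliForCores` is hypothetical and not part of the project.

```go
package main

import (
	"fmt"
	"strconv"
)

// cpuMilliForCores (hypothetical helper) converts a core count into the
// milli-CPU string used under the "cpumilli" key.
func cpuMilliForCores(cores int) string {
	return strconv.Itoa(cores * 1000)
}

func main() {
	// Resource limits for one task that should use a whole 8-core machine.
	limits := map[string]string{
		"cpumilli": cpuMilliForCores(8),
		"runtime":  "30m",
	}
	fmt.Println(limits["cpumilli"])
}
```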

For StageInFiles and StageOutFiles see below.

For a container, the following files and directories are always mounted from the host:

    "/etc/cloudbatch-taskgroup-hosts:/etc/cloudbatch-taskgroup-hosts",
    "/etc/ssh:/etc/ssh",
    "/root/.ssh:/root/.ssh",

For a container the following runtime options are set:

  • "--network=host"
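
A minimal sketch of these container defaults, combined with the "docker_options" extension described under JobTemplate Extensions. The names `defaultVolumes`, `defaultDockerOptions`, and `dockerOptions` are hypothetical, and whether the override replaces or extends the defaults in the actual backend is an assumption here.

```go
package main

import "fmt"

// defaultVolumes lists the host paths that are always mounted into a
// container, in "host:container" form.
var defaultVolumes = []string{
	"/etc/cloudbatch-taskgroup-hosts:/etc/cloudbatch-taskgroup-hosts",
	"/etc/ssh:/etc/ssh",
	"/root/.ssh:/root/.ssh",
}

// defaultDockerOptions is the runtime option set when nothing is overridden.
const defaultDockerOptions = "--network=host"

// dockerOptions returns the "docker_options" extension value if it is set
// and non-empty, and the default runtime options otherwise.
// Assumption: the override fully replaces the default.
func dockerOptions(extensions map[string]string) string {
	if opts, ok := extensions["docker_options"]; ok && opts != "" {
		return opts
	}
	return defaultDockerOptions
}

func main() {
	fmt.Println(dockerOptions(nil))
	fmt.Println(dockerOptions(map[string]string{"docker_options": "--privileged"}))
	fmt.Println(len(defaultVolumes), "default mounts")
}
```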

By default, output goes to Cloud Logging. If "OutputPath" is set, the logs policy is changed to LogsPolicy_PATH with OutputPath as the destination.
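
The rule can be sketched as a small function. The `LogsPolicy` struct below is a simplified stand-in for the Google Batch logs policy, and the string constants only mirror the destination names; both are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// LogsPolicy is a simplified stand-in for the Google Batch logs policy.
type LogsPolicy struct {
	Destination string // "CLOUD_LOGGING" or "PATH"
	LogsPath    string // only used when Destination is "PATH"
}

// logsPolicyFor (hypothetical helper) picks Cloud Logging by default and
// switches to a path destination when an output path is given.
func logsPolicyFor(outputPath string) LogsPolicy {
	if outputPath == "" {
		return LogsPolicy{Destination: "CLOUD_LOGGING"}
	}
	return LogsPolicy{Destination: "PATH", LogsPath: outputPath}
}

func main() {
	fmt.Printf("%+v\n", logsPolicyFor(""))
	fmt.Printf("%+v\n", logsPolicyFor("/mnt/dir/job.log"))
}
```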

JobTemplate Extensions

| DRMAA2 JobTemplate Extension Key | DRMAA2 JobTemplate Extension Value |
|---|---|
| ExtensionProlog / "prolog" | String containing a prolog script that is executed at machine level before the job starts |
| ExtensionEpilog / "epilog" | String containing an epilog script that is executed at machine level after the job ends successfully |
| ExtensionSpot / "spot" | "true"/"t"/... when the machine should be a spot instance |
| ExtensionAccelerators / "accelerators" | "Amount*Accelerator name" for the machine (like "1*nvidia-tesla-v100") |
| ExtensionTasksPerNode / "tasks_per_node" | Number of tasks per node |
| ExtensionDockerOptions / "docker_options" | Overrides the docker run options when a container image is used |
| ExtensionGoogleSecretEnv / "secret_env" | Used for populating environment variables from Google Secret Manager. Please use SetSecretEnvironmentVariables() |
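
To make the value formats concrete, here is a sketch that parses an "accelerators" value and a "spot" value. The functions `parseAccelerator` and `isSpot` are hypothetical; using `strconv.ParseBool` for "true"/"t"/... is an assumption about how such values could be read, not a statement about the project's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAccelerator splits a value like "1*nvidia-tesla-v100" into the
// amount and the accelerator name.
func parseAccelerator(spec string) (int64, string, error) {
	amount, name, found := strings.Cut(spec, "*")
	name = strings.TrimSpace(name)
	if !found || name == "" {
		return 0, "", fmt.Errorf("accelerator %q is not in Amount*Name form", spec)
	}
	count, err := strconv.ParseInt(strings.TrimSpace(amount), 10, 64)
	if err != nil || count < 1 {
		return 0, "", fmt.Errorf("invalid accelerator amount in %q", spec)
	}
	return count, name, nil
}

// isSpot reports whether the "spot" extension is set to a true value.
// Assumption: values are interpreted like strconv.ParseBool does.
func isSpot(extensions map[string]string) bool {
	v, ok := extensions["spot"]
	if !ok {
		return false
	}
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	ext := map[string]string{
		"accelerators": "1*nvidia-tesla-v100",
		"spot":         "t",
	}
	count, name, err := parseAccelerator(ext["accelerators"])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(count, name, isSpot(ext))
}
```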

JobInfo Fields

| DRMAA2 JobInfo | Batch Job |
|---|---|
| Slots | Parallelism |

Job Control Mapping

So far there is no known way to put a job on hold, suspend it, or release it. Terminating a job deletes it.

Job State Mapping

| DRMAA2 State | Batch Job State |
|---|---|
| Done | JobStatus_SUCCEEDED |
| Failed | JobStatus_FAILED |
| Suspended | - |
| Running | JobStatus_RUNNING, JobStatus_DELETION_IN_PROGRESS |
| Queued | JobStatus_QUEUED, JobStatus_SCHEDULED |
| Undetermined | JobStatus_STATE_UNSPECIFIED |
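
The state mapping above can be written as a single switch. This sketch uses plain strings for both sides so that it runs without any dependency; the function name `toDRMAA2State` is hypothetical, and mapping every unlisted state to "Undetermined" is an assumption.

```go
package main

import "fmt"

// toDRMAA2State maps a Google Batch job state name (without the
// "JobStatus_" prefix) to the DRMAA2 state name from the table.
func toDRMAA2State(batchState string) string {
	switch batchState {
	case "SUCCEEDED":
		return "Done"
	case "FAILED":
		return "Failed"
	case "RUNNING", "DELETION_IN_PROGRESS":
		return "Running"
	case "QUEUED", "SCHEDULED":
		return "Queued"
	default:
		// Covers STATE_UNSPECIFIED; treating any other unlisted state
		// the same way is an assumption of this sketch.
		return "Undetermined"
	}
}

func main() {
	for _, s := range []string{"QUEUED", "SCHEDULED", "RUNNING", "SUCCEEDED", "FAILED", "STATE_UNSPECIFIED"} {
		fmt.Printf("%s -> %s\n", s, toDRMAA2State(s))
	}
}
```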

File staging using the Job Template

NFS (Google Filestore) and GCS are supported.

For NFS in containers, files can be specified in addition to directories. For a file, its directory is mounted on the host, and from there the file is mounted inside the container at the path given as the key. For the directory case a leading "/" is required.

    StageInFiles: map[string]string{
            "/etc/script.sh": "nfs:10.20.30.40:/filestore/user/dir/script.sh",
            "/mnt/dir": "nfs:10.20.30.40:/filestore/user/dir/",
            "/somedir": "gs://benchmarkfiles", // mount a bucket into container or host
        },
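
To illustrate the two source formats used in the map values, the following sketch splits a value into its parts. The type `stageSource` and the function `parseStageIn` are hypothetical and only reflect the "nfs:server:/path" and "gs://bucket" forms shown in the example; the backend's own parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// stageSource describes a parsed StageInFiles value.
type stageSource struct {
	Kind   string // "nfs" or "gcs"
	Server string // NFS server address, empty for GCS
	Path   string // remote path (NFS) or bucket with optional prefix (GCS)
}

// parseStageIn parses "nfs:<server>:<path>" and "gs://<bucket>" values.
func parseStageIn(src string) (stageSource, error) {
	switch {
	case strings.HasPrefix(src, "gs://"):
		bucket := strings.TrimPrefix(src, "gs://")
		if bucket == "" {
			return stageSource{}, fmt.Errorf("missing bucket in %q", src)
		}
		return stageSource{Kind: "gcs", Path: bucket}, nil
	case strings.HasPrefix(src, "nfs:"):
		server, path, found := strings.Cut(strings.TrimPrefix(src, "nfs:"), ":")
		if !found || server == "" || !strings.HasPrefix(path, "/") {
			return stageSource{}, fmt.Errorf("expected nfs:<server>:/<path>, got %q", src)
		}
		return stageSource{Kind: "nfs", Server: server, Path: path}, nil
	}
	return stageSource{}, fmt.Errorf("unsupported stage-in source %q", src)
}

func main() {
	for _, src := range []string{
		"nfs:10.20.30.40:/filestore/user/dir/script.sh",
		"gs://benchmarkfiles",
	} {
		s, err := parseStageIn(src)
		fmt.Printf("%+v %v\n", s, err)
	}
}
```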

StageOutFiles creates the bucket before the job is submitted if it does not already exist. If that fails, the job submission call fails. Currently only gs:// is evaluated in the StageOutFiles map.

    StageOutFiles: map[string]string{
            "/tmp/joboutput": "gs://outputbucket",
        },
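
A sketch of how the bucket names that need to exist could be collected from a StageOutFiles map, skipping every value that is not a gs:// destination. The function `stageOutBuckets` is hypothetical; the actual bucket creation is left out because it requires a cloud client.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stageOutBuckets returns the sorted, de-duplicated bucket names of all
// gs:// destinations in the map. Other destinations are ignored.
func stageOutBuckets(stageOut map[string]string) []string {
	seen := make(map[string]bool)
	var buckets []string
	for _, dst := range stageOut {
		if !strings.HasPrefix(dst, "gs://") {
			continue
		}
		// Keep only the bucket name, dropping any object prefix.
		name, _, _ := strings.Cut(strings.TrimPrefix(dst, "gs://"), "/")
		if name != "" && !seen[name] {
			seen[name] = true
			buckets = append(buckets, name)
		}
	}
	sort.Strings(buckets)
	return buckets
}

func main() {
	buckets := stageOutBuckets(map[string]string{
		"/tmp/joboutput": "gs://outputbucket",
		"/tmp/ignored":   "nfs:10.20.30.40:/filestore/out/",
	})
	fmt.Println(buckets)
}
```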

Examples

See examples directory.
