
Cedana


Welcome to Cedana! This repository is home to the cedana daemon and the low-level orchestration of our save/migrate/resume (SMR) functionality. It is also the entry point into the larger Cedana ecosystem.

We build on CRIU to provide userspace checkpoint/restore of processes and of the abstraction layers above them. We can also checkpoint/restore the rootfs through both the containerd and CRI-O interfaces, which allows full container checkpoint/restore.

For a list of supported runtimes, see our container support matrix.

We can monitor, migrate, and automate checkpoints across a real-time network and compute configuration, enabling ephemeral, hardware-agnostic compute. See our website for more information about our managed product.

Some problems Cedana can help solve include:

  • Cold starts for containers and processes
  • Keeping a process or container running despite hardware or network failures
  • Managing multiprocess/multinode systems (independent of Kubernetes/SLURM or any orchestration)
  • GPU checkpoint/restore
  • And more!

Build

Cedana needs libgpgme, libbtrfs, and libseccomp installed on the build machine. On a Debian-based system, you can install them with:

apt install libgpgme-dev libseccomp-dev libbtrfs-dev

On CentOS/RHEL:

yum install gpgme-devel libseccomp-devel btrfs-progs-devel

To build:

go build
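
The build steps above can be sketched as a small helper script. This is only an illustration, not part of the repository: the function names `deps_install_cmd` and `detect_pkg_manager` are hypothetical, and the package lists are copied from the commands above. The script only prints the install command, so nothing is installed until you run that command yourself.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Cedana): prints the command that
# installs the build dependencies for a given package manager.
deps_install_cmd() {
  case "$1" in
    apt) echo "apt install libgpgme-dev libseccomp-dev libbtrfs-dev" ;;
    yum) echo "yum install gpgme-devel libseccomp-devel btrfs-progs-devel" ;;
    *)   echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: reports which of the two package managers
# mentioned above is available on PATH (apt is checked first).
detect_pkg_manager() {
  if command -v apt >/dev/null 2>&1; then
    echo apt
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    echo "neither apt nor yum found" >&2
    return 1
  fi
}
```

A possible way to use it: run `deps_install_cmd "$(detect_pkg_manager)"`, execute the printed command with root privileges, and then run `go build` from the repository root.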

Documentation

To get started using cedana locally, check out the docs.

Contributing

See CONTRIBUTING.md for guidelines.
