Plancton: opportunistic computing using Docker containers

Build Status PyPI version

Plancton continuously deploys pilot Docker containers running any application you want based on the amount of available system resources.

Main features

  • Upgrade pilot jobs to pilot containers. Plancton is meant to run "pilot" containers: your container starts and tries to fetch something to do. When the container exits, Plancton replaces it with a brand new one. An example of an application that is easy to containerize is WorkQueue from cctools.

  • Meant for clusters. Pilot applications are containerized and deployed on a cluster of nodes, each one running a Plancton instance. Plancton instances are totally independent, so the system scales naturally.

  • Monitoring. Sends monitoring data to InfluxDB, easy to plot via Grafana.

  • Containers for the masses. Plancton brings the features of Docker containers (environment consistency, isolation, sandboxing) to disposable cluster applications. Plancton is not a replacement for Apache Mesos or Kubernetes, but it is a very simple and lightweight alternative when you don't need all the extra features they offer.
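The "replace exited pilots" logic described above can be sketched as a simple top-up loop. `running_count` and `spawn_container` are hypothetical injected callables standing in for the real Docker API calls, not Plancton's actual internals:

```python
def respawn_loop(running_count, spawn_container, max_docks, iterations):
    """Keep the pool of pilot containers topped up to max_docks.

    running_count and spawn_container are injected stand-ins for the
    Docker API calls; an exited pilot is simply replaced on the next turn.
    """
    for _ in range(iterations):
        if running_count() < max_docks:
            spawn_container()

# Toy demo: containers never exit, so the pool fills up to the cap and stops.
spawned = []
respawn_loop(running_count=lambda: len(spawned),
             spawn_container=lambda: spawned.append("pilot"),
             max_docks=2, iterations=5)
print(len(spawned))  # 2
```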

Instant gratification

A recent Linux operating system and Docker are required.

Install the latest version with pip:

pip install plancton

If you want to install from the master branch (use at your own risk):

pip install git+https://github.com/mconcas/plancton

Plancton can be run as root or as any user with Docker privileges:

planctonctl start

Configure

The configuration file is located at /etc/plancton/config.yaml and can be modified while Plancton is running. By default Plancton starts with an empty configuration and runs dummy busybox containers.

You can get configurations with:

plancton-bootstrap <gh-user/gh-repo:branch>

and they'll be downloaded to the correct place. An example dry run configuration can be obtained with:

plancton-bootstrap <mconcas/plancton-conf:dryrun>
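Assuming the `<gh-user/gh-repo:branch>` spec maps onto GitHub's raw-content URLs (a guess about how plancton-bootstrap resolves it, not confirmed by the source; the `conf/config.yaml` path is also made up), parsing it could look like:

```python
def parse_bootstrap_spec(spec):
    """Split 'user/repo:branch' into its parts; branch defaults to 'master'."""
    repo_part, _, branch = spec.partition(":")
    user, _, repo = repo_part.partition("/")
    return user, repo, branch or "master"

def raw_url(spec, path="conf/config.yaml"):
    # Hypothetical layout: configurations live at a fixed path in the repo.
    user, repo, branch = parse_bootstrap_spec(spec)
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"

print(raw_url("mconcas/plancton-conf:dryrun"))
# https://raw.githubusercontent.com/mconcas/plancton-conf/dryrun/conf/config.yaml
```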

Credits

Credits for the name go to G.

plancton's Issues

Nameless containers make Plancton crash

For unknown reasons Docker sometimes spawns nameless containers instead of the expected plancton-slave-XXXXX ones. Plancton then crashes when it tries to subscript those NoneType keys.

TypeError: 'NoneType' object is unsubscriptable
2016-09-21 07:23:39 plancton CRITICAL [daemon.start] Terminating abnormally...
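Until the root cause is found, a defensive fix is to skip entries whose Names field is missing before subscripting it. This is a sketch, with container dicts mimicking the Docker API list-containers response shape:

```python
def plancton_slaves(containers):
    """Return only validly named plancton-slave containers.

    Entries with a missing (None) or empty Names field -- the crash
    trigger -- are skipped instead of being subscripted.
    """
    slaves = []
    for cont in containers:
        names = cont.get("Names")
        if not names:  # None or empty: the nameless-container case
            continue
        if names[0].lstrip("/").startswith("plancton-slave-"):
            slaves.append(cont)
    return slaves

sample = [{"Names": ["/plancton-slave-a1b2c"]},
          {"Names": None},               # would raise TypeError before
          {"Names": ["/something-else"]}]
print(len(plancton_slaves(sample)))  # 1
```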

Fix README

This is the first thing people see when they land on the page: it has to be much simpler and reflect our changes.

No cvmfs_cache available with CVMFS device mounted inside containers

Mounting a CVMFS repository via FUSE inside containers does not allow us to benefit from a shared cvmfs_cache: on first mount, CVMFS creates locks to avoid data races.

[root@container_1 /]# mount -t cvmfs alice-ocdb.cern.ch /cvmfs/alice-ocdb.cern.ch/
CernVM-FS: running with credentials 498:497
CernVM-FS: loading Fuse module... done
CernVM-FS: mounted cvmfs on /cvmfs/alice-ocdb.cern.ch/

and everything works fine. A second container, however, cannot mount the same Fully Qualified Repository Name and gets:

[root@container_2 /]# mount -t cvmfs alice.cern.ch /cvmfs/alice.cern.ch/
Repository alice.cern.ch is already mounted on /cvmfs/alice.cern.ch/

For reference, I spotted the lines of code where this check is defined.
#L142

Annoying behaviour when the maximum number of launchable containers equals 1

@dberzano
I found that line 187 produces undesirable behaviour when max_docks is set to 1.

  1. At the beginning of main_loop there are no running containers, so min(max(self._count_containers(), 1), self.conf["max_docks"]) clamps the count of 0 up to 1. This sends the program into a grace-spawn cooldown (it always finds a CPU efficiency greater than 0.00%), which means the execution starts with a delay proportional to the grace_spawn: quota in the config file.
  2. If self.conf["max_docks"] is set to 1, we fall again into the case where containers are spawned and then killed. With a limit higher than one container we would end up with max_docks - 1 containers, since the corresponding overhead fits within the threshold, and that would be fine. With max_docks equal to 1, however, Plancton simply spawns and kills a single container over and over.

For these reasons I will open two PRs to fix this; I will need your opinion, though.
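The clamping expression quoted in point 1 can be reproduced in isolation; `count` stands for self._count_containers() and `max_docks` for self.conf["max_docks"]:

```python
def containers_to_consider(count, max_docks):
    # The expression from the report: never below 1, never above max_docks.
    return min(max(count, 1), max_docks)

# Even with zero containers running the result is 1, never 0, so the
# CPU-efficiency check always sees at least "one container's worth" of
# activity -- the trigger for the spurious grace-spawn cooldown.
print(containers_to_consider(0, 1))   # 1
print(containers_to_consider(0, 4))   # 1
print(containers_to_consider(6, 4))   # 4
```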

Use CVMFS from a container without Parrot

CVMFS through parrot_run has a series of known problems, most of them by design (e.g. somewhat low performance, or the adoption of orphan processes). We should try to mount FUSE filesystems (like CVMFS) directly inside the container.

  • Find a way to do that
  • Change the Plancton run script to modprobe fuse in advance (I guess this is needed)
  • Fully document the solution

Max TTL for containers

Containers should have a maximum time to live, no matter what is going on inside them. Once the TTL has passed: docker kill.
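A sketch of the TTL check, assuming we track each container's start timestamp; `kill` is a hypothetical stand-in for the docker kill call:

```python
import time

def reap_expired(started_at, ttl_seconds, kill, now=None):
    """Kill every container whose age exceeds ttl_seconds.

    started_at maps container id -> start timestamp; kill is the injected
    docker-kill callable. Returns the ids that were reaped.
    """
    now = time.time() if now is None else now
    reaped = [cid for cid, t0 in started_at.items() if now - t0 > ttl_seconds]
    for cid in reaped:
        kill(cid)
    return reaped

killed = []
ages = {"old": 0, "young": 9_000}
print(reap_expired(ages, ttl_seconds=3_600, kill=killed.append, now=10_000))
# ['old']: started at t=0, its age of 10000 s exceeds the 3600 s TTL
```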

Behaviour to assume in case of an exception in talking with InfluxDB database

Question:

Basically we have to decide how to manage the requests.exceptions.ConnectionError that may arise while feeding the database.

Description:

There are two cases in which this may occur:

  1. Plancton attempts to create a database during its self.init() phase. This is done once, at startup time.
  2. Plancton periodically sends data to the configured host:port.

My idea:

Plancton should fail to start if we decide that database reachability is a necessary condition to run (yet another flag for this, or a forced design choice).
Plancton should be tolerant enough if the service becomes unreachable for some time, or forever. If the address changes we might want to be able to reconfigure Plancton to feed the correct new one; this is a second-order enhancement, IMHO.
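The tolerant behaviour could look like the sketch below: connection failures are counted and swallowed instead of killing the daemon. `send` is an injected stand-in for the actual InfluxDB write call (e.g. an HTTP POST):

```python
def feed_database(send, points):
    """Deliver monitoring points, tolerating an unreachable database.

    send is the injected write call; a ConnectionError is counted and
    swallowed -- monitoring is best-effort, the daemon keeps running.
    """
    delivered, dropped = 0, 0
    for point in points:
        try:
            send(point)
            delivered += 1
        except ConnectionError:
            dropped += 1
    return delivered, dropped

def flaky_send(point):
    if point % 2:  # simulate an intermittently unreachable database
        raise ConnectionError("influxdb unreachable")

print(feed_database(flaky_send, [0, 1, 2, 3]))  # (2, 2)
```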

Elasticsearch communication

Plancton must be able to communicate information (its status, its containers' statuses, etc.) to an Elasticsearch database.

Configurable policy for spawning containers by looking at host used resources

We need to have a plan for that. Some policy that includes a configurable, say, "mathematical formula" or "Python expression" that calculates the number of containers to be spawned.

You might periodically save a series of parameters. For instance:

  • disk usage
  • load
  • CPU usage

and also save an "averaged" version of them over a period of time (5 minutes, 30 minutes, 1 hour, 12 hours, 1 day).

Decisions can then be taken based on these values.

Note: this is just a plan, we need a draft to discuss before implementing anything.
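The "Python expression" idea could work like this sketch: the configured formula is evaluated against the sampled (and averaged) parameters, with builtins disabled so the config cannot call arbitrary functions. The parameter names and the formula are made up for illustration:

```python
def containers_from_policy(expression, params):
    """Evaluate a configured policy expression over measured parameters.

    Only the sampled parameters are exposed to the expression; builtins
    are disabled to keep the config from calling arbitrary functions.
    """
    n = eval(expression, {"__builtins__": {}}, dict(params))
    return max(0, int(n))

params = {"cpu_avg_5m": 0.40, "load": 1.2, "ncpus": 8}
policy = "ncpus * (1 - cpu_avg_5m)"  # hypothetical formula from config
print(containers_from_policy(policy, params))  # 4
```

Note that `eval` with stripped builtins is a containment measure, not a security boundary; since the configuration is already trusted to control the host, that trade-off seems acceptable here.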

Export JSON for Grafana dashboards

I will consider the possibility of exporting the dashboard's JSON file to a repository, in order to constitute a common baseline for new installations.

@dberzano, let me know if you agree.

Use remote configuration

Find a way to fetch the configuration from, say, a remote HTTP URL. HTTPS would be better, but bear in mind that the CA must be known. In that case it is better to keep the configuration on GitHub (it is even versioned there) than on one of the University servers (as long as it contains no sensitive data).

Add a force-start mode

Plancton should be able to start and remove the drainfile placeholder, if it is present.
force-start = resume + start

Make it possible to send InfluxDB data to configurable hostname/IP

At the moment Plancton defaults to sending data to localhost, which currently works, but we want a configurable InfluxDB target for installations.
The value in the default configuration dictionary currently points to localhost.
We just have to verify the behaviour when a custom value is set in config.yaml to override the default.

Stream data to an InfluxDB/ES database

We would like to monitor:

  • CPU efficiency on every host
  • Number of running containers on each host and overall
  • Average containers' lifetime
  • Plancton status
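The average containers' lifetime, for instance, could be derived from start/stop timestamp pairs before being streamed (the data shape here is illustrative, not Plancton's actual bookkeeping):

```python
def average_lifetime(events):
    """Mean container lifetime in seconds over (started, finished) pairs."""
    lifetimes = [finished - started for started, finished in events]
    return sum(lifetimes) / len(lifetimes) if lifetimes else 0.0

# Two containers: one lived 120 s, the other 240 s.
print(average_lifetime([(0, 120), (10, 250)]))  # 180.0
```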

Daemonize

Make this run as a daemon; have a look at elastiq, which has a Daemon class ready.

Improve configuration mode

General idea (TBD): overlay a set of files/dirs with Docker volumes. Make it independent from Condor. Make every reference to Condor and/or Parrot disappear from the Plancton codebase!

New installation procedure should be shorter

Refactoring the plancton-bootstrap script will lead to some changes:

  • The Plancton repository won't be cloned from GitHub anymore.
  • Prerequisite checks for Python modules won't be performed by the plancton-bootstrap script.
  • The plancton user and home directory will still be created at bootstrap time.

Code refactoring

Better organise the code structure, define common coding conventions, etc.

Make sure latest version of container is used

If you adopt this naming scheme for containers:

mconcas/slc6-container:v1
mconcas/slc6-container:v2
mconcas/slc6-container:latest

where latest always points to the latest version, you can tell Docker to run mconcas/slc6-container:latest.

With the current implementation, if the image does not exist locally it is silently pulled. However, no check is performed to make sure the intended latest version is actually used.

To do that with the command line you'd do:

docker pull mconcas/slc6-container:latest
docker run mconcas/slc6-container:latest [opts...]

The docker pull part downloads nothing if the latest version is already there, but it at least ensures you are using that one.

The exercise is to implement this via the REST API if possible, like you have done with the rest.
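In the Docker Engine REST API the pull corresponds to POST /images/create with fromImage and tag query parameters. The sketch below only builds that request path; actually issuing it over the Docker Unix socket (and then POSTing to /containers/create) is left out:

```python
from urllib.parse import urlencode

def pull_request_path(image):
    """Build the Engine-API path mirroring `docker pull image:tag`."""
    repo, _, tag = image.partition(":")
    query = urlencode({"fromImage": repo, "tag": tag or "latest"})
    return f"/images/create?{query}"

print(pull_request_path("mconcas/slc6-container:latest"))
# /images/create?fromImage=mconcas%2Fslc6-container&tag=latest
```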
