This project forked from slaclab/cryoem-airflow

Data Acquisition and Data Processing Pipelines for CryoEM

Home Page: https://confluence.slac.stanford.edu/display/cryoEM/CryoEM+Data+Acquisition+Pipeline

License: Apache License 2.0

Python 87.97% Shell 10.04% Ruby 0.44% Dockerfile 1.21% Makefile 0.34%

cryoem-airflow

This repo contains Airflow-based workflows for cryoEM data collection and analysis at the SLAC National Accelerator Laboratory.

Specifically, the Airflow workflows handle:

  • Data moving: copying the raw data and metadata from the cryoEM hardware to long-term GPFS storage.
  • Pre-processing: running initial CTF calculations and image alignment, then logging and reporting the results to a time-series datastore, the cryoEM elogbook, and Slack.

Pre-requisites

A docker-compose configuration file is provided to facilitate deployment.

Setup Host

The relevant CIFS and GPFS mounts should be created on all nodes that will participate in the data acquisition (DAQ), e.g.:

# install CIFS
yum install -y samba-client samba-common cifs-utils

# setup credentials for CIFS
cat <<EOF > /etc/samba/tem.creds
username=<user_for_tem>
password=<password_for_tem>
EOF
# ensure permissions
chmod go-rwx /etc/samba/tem.creds

# create the CIFS mountpoints
mkdir -p /srv/cryoem/tem1
mkdir -p /srv/cryoem/tem2
mkdir -p /srv/cryoem/tem3
mkdir -p /srv/cryoem/tem4

# edit fstab for persistence
cat <<EOF >> /etc/fstab

# TEM mountpoints
//<ip_of_tem>/data    /srv/cryoem/tem1/ cifs uid=<cryoem_user>,gid=<cryoem_group>,forceuid,forcegid,dom=<domainname_of_tem>,file_mode=0777,dir_mode=0777,noperm,credentials=/etc/samba/tem.creds 0 0
EOF
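
The fstab entry above covers only tem1, while mountpoints are created for tem1 through tem4. One way to produce an entry per microscope is sketched below in Python. The helper names, the example addresses, and the user, group, and domain values are hypothetical placeholders and not part of this repository.

```python
# Sketch: render one /etc/fstab CIFS entry per TEM, reusing the mount options above.
# Host addresses, user, group and domain names are placeholders.

CIFS_OPTIONS = (
    "uid={user},gid={group},forceuid,forcegid,dom={domain},"
    "file_mode=0777,dir_mode=0777,noperm,credentials=/etc/samba/tem.creds"
)


def fstab_line(tem_ip, mountpoint, user, group, domain):
    """Return a single fstab line mounting //<tem_ip>/data at mountpoint."""
    options = CIFS_OPTIONS.format(user=user, group=group, domain=domain)
    return "//{ip}/data    {mnt}/ cifs {opts} 0 0".format(
        ip=tem_ip, mnt=mountpoint.rstrip("/"), opts=options
    )


def fstab_lines(tem_ips, user, group, domain):
    """Map {'tem1': ip, ...} to fstab lines under /srv/cryoem/<name>, sorted by name."""
    return [
        fstab_line(ip, "/srv/cryoem/" + name, user, group, domain)
        for name, ip in sorted(tem_ips.items())
    ]


if __name__ == "__main__":
    hypothetical_ips = {"tem1": "192.0.2.11", "tem2": "192.0.2.12"}
    for line in fstab_lines(hypothetical_ips, "cryoem", "cryoem", "TEMDOMAIN"):
        print(line)
```

The printed lines can be reviewed and appended to /etc/fstab by hand; the sketch deliberately does not write the file itself.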

Setup Repository and Files

cd
# get the code
git clone https://github.com/slaclab/cryoem-airflow.git cryoem-airflow
# setup persistence
mkdir data/
mkdir data/postgres
mkdir data/redis
mkdir data/logs
# setup cryoem experiments
mkdir -p experiment/tem1
cat > experiment/tem1/tem1-experiment.yaml << EOF
experiment:
  name: 20171204_sroh-hsp
  microscope: krios1
  fmdose: 1.2
EOF
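
If several experiments need to be set up, the same file can be generated rather than typed. The Python sketch below assumes the three settings are nested under a top-level `experiment` key; the `render_experiment_yaml` helper is hypothetical and only illustrates the file layout.

```python
# Sketch: produce the text of an experiment YAML file.
# Assumption: name, microscope and fmdose are nested under 'experiment'.


def render_experiment_yaml(name, microscope, fmdose):
    """Render the experiment settings as YAML text."""
    return (
        "experiment:\n"
        "  name: {name}\n"
        "  microscope: {microscope}\n"
        "  fmdose: {fmdose}\n"
    ).format(name=name, microscope=microscope, fmdose=fmdose)


if __name__ == "__main__":
    # Values taken from the example experiment in this README.
    print(render_experiment_yaml("20171204_sroh-hsp", "krios1", 1.2), end="")
```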

Setup Docker

This work uses Apache Airflow on Docker. It is based on puckel's docker-airflow and uses the official Postgres image as the backend and Redis as the queue.

# install pre-reqs
yum install -y epel-release python-pip wget git
pip install -U pip
yum  -y remove  docker-common docker container-selinux docker-selinux docker-engine
wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce

# optional - setup user privs for docker:
sudo usermod -aG docker <userid>

# on all nodes
sudo systemctl restart docker
sudo systemctl enable docker

We use Docker Swarm to provide quick deployment and scalability.

sudo docker swarm init --force-new-cluster

# on the other nodes
docker swarm join --token <token_from_swarm> <ip_of_swarm_master>:2377

We choose to run 3 managers and 5 workers.
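
Swarm managers coordinate through Raft consensus, which requires a majority of managers to remain available. The arithmetic behind a 3-manager layout can be sketched as follows; the function names are illustrative only.

```python
# Sketch: majority (quorum) arithmetic for a set of swarm managers.


def quorum(managers):
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers):
    """Number of managers that can be lost while a majority remains."""
    return (managers - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With 3 managers, a majority is 2, so the swarm keeps working after losing one manager. Worker count does not affect this calculation.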

For local testing, you may choose to use Docker Compose instead.

Installation and Usage

You may need to build the image first:

docker build --rm -t slaclab/cryoem-airflow:1.8.2 .

# push to dockerhub
docker login
docker push slaclab/cryoem-airflow:1.8.2

Now that everything is set up, start the Airflow containers:

cd cryoem-airflow
docker stack deploy  --prune -c docker-compose.yaml cryoem-airflow

After a little time, all of the services should be up:

$ docker stack ls
NAME                SERVICES
cryoem-airflow      6

$ docker stack ps cryoem-airflow
ID                  NAME                         IMAGE                          NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
troui5fwarqr        cryoem-airflow_postgres.1    postgres:9.6                   cryoem-daq03        Running             Running 7 seconds ago
90ciwxm0ndvb        cryoem-airflow_redis.1       redis:3.2.7                    cryoem-daq01        Running             Running 28 seconds ago
qkuj38s0vr94        cryoem-airflow_worker.1      slaclab/cryoem-airflow:1.8.2   cryoem-daq01        Running             Running 29 seconds ago
zwj1waucgag2        cryoem-airflow_scheduler.1   slaclab/cryoem-airflow:1.8.2   cryoem-daq01        Running             Running 31 seconds ago
i7u36z7qt0gu        cryoem-airflow_flower.1      slaclab/cryoem-airflow:1.8.2   cryoem-daq03        Running             Running 32 seconds ago
30w8vsm7nxvp        cryoem-airflow_webserver.1   slaclab/cryoem-airflow:1.8.2   cryoem-daq02        Running             Running 34 seconds ago
lafftqw1psfh        cryoem-airflow_worker.2      slaclab/cryoem-airflow:1.8.2   cryoem-daq03        Running             Running 30 seconds ago
s62hnw3p6k1g        cryoem-airflow_worker.3      slaclab/cryoem-airflow:1.8.2   cryoem-daq02        Running             Running 29 seconds ago
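
To check this from a script rather than by eye, one option is to request tab-separated output from `docker stack ps` and confirm that each task reports `Running`. The Python sketch below assumes the `--format` placeholders `.Name` and `.CurrentState` and the `desired-state` filter; the helper names are hypothetical.

```python
import subprocess

# Go-template format string: task name, a tab, then the current state.
FORMAT = "{{.Name}}\t{{.CurrentState}}"


def parse_tasks(text):
    """Parse 'name<TAB>current state' lines into a {name: state} dict."""
    tasks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, state = line.partition("\t")
        tasks[name.strip()] = state.strip()
    return tasks


def not_running(tasks):
    """Return sorted names of tasks whose state does not start with 'Running'."""
    return sorted(n for n, s in tasks.items() if not s.startswith("Running"))


def stack_tasks(stack="cryoem-airflow"):
    """Query Docker for tasks that should be running (run this on a manager node)."""
    out = subprocess.check_output(
        ["docker", "stack", "ps", stack,
         "--filter", "desired-state=running", "--format", FORMAT]
    )
    return parse_tasks(out.decode())


if __name__ == "__main__":
    # Sample input only; call stack_tasks() instead on a real swarm manager.
    sample = (
        "cryoem-airflow_postgres.1\tRunning 7 seconds ago\n"
        "cryoem-airflow_worker.1\tPreparing 3 seconds ago\n"
    )
    pending = not_running(parse_tasks(sample))
    print("all running" if not pending else "not running: " + ", ".join(pending))
```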

Check Airflow Documentation

You should then be able to go to localhost:8080 and see all of the workflows.

Technical Details

The workflows are Python scripts placed under dags. Reusable operators and sensors can also be installed under plugins.

file-drop

The file_drop.py DAG reads experiment setup information from a YAML file and prepares the storage for rsyncing the data from the TEM to our long-term file store. It also deletes old data from the TEM.
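
As a rough illustration of the steps just described, and not the actual contents of file_drop.py, the Python sketch below takes an already-parsed experiment dictionary, derives a destination directory, builds an rsync command, and selects source files old enough to delete. The directory layout, the storage root, the retention period, and all function names are assumptions made for this example.

```python
import os
import time


def destination_dir(experiment, root):
    """Hypothetical layout: <root>/<microscope>/<experiment name>."""
    return os.path.join(root, experiment["microscope"], experiment["name"])


def rsync_command(source, destination):
    """Build (but do not run) an rsync call copying source contents into destination."""
    return ["rsync", "-a", source.rstrip("/") + "/", destination.rstrip("/") + "/"]


def older_than(paths_with_mtime, days, now=None):
    """Return paths whose modification time is more than `days` days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [path for path, mtime in paths_with_mtime if mtime < cutoff]


if __name__ == "__main__":
    experiment = {"name": "20171204_sroh-hsp", "microscope": "krios1", "fmdose": 1.2}
    dest = destination_dir(experiment, "/gpfs/cryoem")  # placeholder storage root
    print(" ".join(rsync_command("/srv/cryoem/tem1", dest)))
```

In an Airflow DAG, each of these steps would typically become its own task so that a failed copy can be retried without repeating the setup.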

pre-processing

Notes

Install custom python package

  • Create a file "requirements.txt" listing the desired Python modules
  • Mount this file as a volume: -v $(pwd)/requirements.txt:/requirements.txt
  • The entrypoint.sh script then executes the pip install command (with the --user option)
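
The behaviour in the last bullet can be pictured roughly as follows. This is a Python rendering of what the shell entrypoint is described as doing, not the script itself, and the `pip_install_command` name is hypothetical.

```python
import os
import sys


def pip_install_command(requirements="/requirements.txt"):
    """Return the pip command to run, or None when no requirements file is mounted."""
    if not os.path.isfile(requirements):
        return None
    return [sys.executable, "-m", "pip", "install", "--user", "-r", requirements]


if __name__ == "__main__":
    command = pip_install_command()
    print("nothing to install" if command is None else " ".join(command))
```

Because the install happens only when the file is present, containers started without the volume mount skip the step entirely.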

Contributors

darkk, eshizhan, hadsed-genapsys, ianburrell, jbdalido, kristi, lamroger, mendhak, puckel, yee379