Elixir-Excelerate Demonstrator 4.3

Introduction

Massive sequencing and genotyping of crop and forest plants, and of their pathogens and pests, generates large quantities of genomic variation data. ELIXIR is designing an infrastructure to allow genotype-phenotype analysis for crop plants based on the widest available public datasets. However, these data are scattered across the laboratories that seek to describe and understand the life of plants at the molecular level.

ELIXIR Compute supports the plant science community in tracking and bringing these data together, which enriches the data analysis capabilities of the scientific community: local data can be interpreted in the global context. This EXCELERATE demonstrator shows the fundamental technical integration needed to transfer data from geographically distributed sites onto the scalable Compute platform built by ELIXIR.

This repository contains instructions and scripts for setting up a cloud resource using an Elixir identity, deploying a storage endpoint VM, and using the Elixir Data Transfer Service to move a set of files to the cloud instance.

By cloning this repository and following the instructions, it should be possible to reproduce the demonstration.

A terminal session recording is available at https://asciinema.org/a/5XlGftbaq0KXGgQeGdEpAq4nd .

Prerequisites

  • git - for cloning this repository
    • git clone https://github.com/NBISweden/excelerate-demonstrator-4.3.git
  • an Elixir id
  • an ssh key pair
  • access to resources from a cloud provider
    • in this demonstrator, we have used the deNBI cloud, which is integrated with the Elixir AAI system
  • terraform (infrastructure as code software by HashiCorp)
  • fts3 command line tools
    • documentation
    • for a workstation running CentOS with the EPEL repository enabled, simply run yum install fts-client
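Before starting, it can help to confirm that the command line tools are installed. The following is a minimal sketch; the function name `missing_tools` and the variable `REQUIRED_TOOLS` are our own, hypothetical names, and the tool list is our reading of the prerequisites plus the `openstack`, `openssl` and `ssh` commands used further below.

```shell
# Hypothetical helper: print each named command that is not found on $PATH.
# Returns 0 if all commands are present, 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Commands used in this walkthrough (assumed list; adjust as needed).
REQUIRED_TOOLS="git terraform openstack openssl ssh fts-delegation-init fts-transfer-submit fts-transfer-status"

# Example ($REQUIRED_TOOLS is deliberately unquoted so it splits into words):
#   missing_tools $REQUIRED_TOOLS || echo "please install the commands listed above" >&2
```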

Deploying a storage endpoint

We use terraform to deploy a VM running a gridftp daemon.

First, we download a proxy certificate that will be used to authenticate with the Elixir Data Transfer Service. This requires access to the VO Portal.

Registering for the Elixir VO

  1. Visit the Elixir VO registration page
  2. Send an e-mail to [email protected] and state that you would like to have access to the VO portal.

Obtaining a proxy certificate

Once the VO access is approved, we can log in at the portal and save the proxy certificate in a text file, e.g. by pasting it into cert.txt.

From the VO Portal, we also take note of the identity (typically a string of the form /DC=eu/DC=rcauth/DC=rcauth-clients/O=ELIXIR/CN=Firstname Lastname Randomstring). The identity can of course also be extracted from the proxy certificate itself, using e.g. the openssl tool:

openssl x509 -subject -noout -in cert.txt
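Recent OpenSSL versions print the subject as "DC = eu, DC = rcauth, ..." by default; the `-nameopt compat` option restores the slash-separated form shown above. The sketch below wraps this in small helpers (the names `dn_from_openssl_line`, `cert_dn` and `cert_valid_for` are hypothetical). It assumes OpenSSL 1.1 or later prints `subject=/DC=...` and older versions `subject= /DC=...`. If cert.txt holds an RFC 3820 proxy, its subject is the identity with one extra CN component appended, so the issuer DN may be the one that matches the VO Portal; which case applies depends on the file the portal provides. Since proxy certificates are short-lived, a validity check is included as well.

```shell
# Hypothetical helper: turn a line such as "subject=/DC=eu/.../CN=Name"
# (or "subject= /DC=eu/...", or "issuer=...") into the bare slash-separated DN.
dn_from_openssl_line() {
  sed -e 's/^[a-z]*= *//'
}

# Print the subject (default) or issuer DN of the first certificate in
# file $1, in the /DC=.../CN=... form. $2 is "subject" or "issuer".
cert_dn() {
  local field=${2:-subject}
  openssl x509 -noout "-$field" -nameopt compat -in "$1" | dn_from_openssl_line
}

# Exit status 0 if the certificate in file $1 is still valid for at least
# $2 seconds (openssl's -checkend returns 1 if it expires within that time).
cert_valid_for() {
  openssl x509 -noout -checkend "$2" -in "$1" >/dev/null
}

# Example:
#   cert_dn cert.txt              # subject DN
#   cert_dn cert.txt issuer       # issuer DN (relevant for RFC 3820 proxies)
#   cert_valid_for 3600 cert.txt || echo "proxy expires within an hour" >&2
```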

Deploying endpoint

In this example, we are using a cloud provider that runs OpenStack, so in addition to terraform we need the OpenStack command line tools, which can be installed with e.g. pip install python-openstackclient.

We also need the API credentials associated with our project at the cloud provider. These can be downloaded from the Horizon interface in OpenStack (log in to the cloud dashboard, go to "Access & Security", click "Download OpenStack RC file" and save the file as e.g. openstack.rc).

We source the OpenStack rc file and query the cloud provider for details about the public network:

source openstack.rc
openstack network list --external
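The ID and name of the external network are what the `terraform apply` step below needs; judging from the example values there, we assume `external_gateway` takes the network ID and `pool` the network name. The OpenStack client can print just these two columns with `-f value -c ID -c Name`. The helper below (a hypothetical name) picks out the ID for a given network name.

```shell
# Hypothetical helper: read "ID Name" lines, as printed by
#   openstack network list --external -f value -c ID -c Name
# and print the ID of the network whose name equals $1 exactly.
network_id_for() {
  awk -v name="$1" '{
    id = $1
    sub(/^[^ ]+ +/, "")   # drop the ID column, keep the (possibly multi-word) name
    if ($0 == name) print id
  }'
}

# Example (requires the sourced openstack.rc):
#   EXT_NET=$(openstack network list --external -f value -c ID -c Name \
#             | network_id_for "Public External IPv4 Network")
#   echo "external_gateway=$EXT_NET"
```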

Next, we deploy a VM and install the gridftp daemon using the terraform scripts. We start by initializing terraform, which downloads the terraform OpenStack provider if it is not already present.

terraform init

The VM specifications and network rules are defined in main.tf, while settings that may differ between providers are in variables.tf. We tell terraform to apply this configuration at our cloud provider:

terraform apply \
-var 'external_gateway=52b76a82-5f02-4a3e-9836-57536ef1cb63' \
-var 'pool="Public External IPv4 Network"' \
-var 'certificate="/DC=eu/DC=rcauth/DC=rcauth-clients/O=ELIXIR/CN=Firstname Lastname abc123"' \
-var '[email protected]'

Note that if the variables are not specified as arguments, terraform will ask the user for them.

If the cloud provider has not registered domain names for its public IP addresses, the domain name can be set explicitly via the variable fqdn. A DNS record pointing to the IP address must then be created as well; the installation script can do this if an additional DNS update script is specified via the variable dnsupdatescript.

Upon apply, terraform spins up a VM based on CentOS 7, attaches a public IP address and applies some network rules. The details can be found in main.tf.

Finally, the gridftp server is installed, equipped with a Let's Encrypt server certificate, and configured to map the user certificate to a user account called gridftp on the VM. Terraform does this by uploading the script centos-gridftp-rw.sh and running it on the VM with our certificate identity and e-mail address as arguments.

The terraform provisioning takes a few minutes. When it is complete, terraform prints the IP address of the VM, and it is possible to log in to the VM with ssh (user name: centos). A log of the software installation can be found in /tmp/centos-gridftp-rw.log.
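Because the VM may not accept connections immediately, a small wait loop can be handy before logging in or starting transfers. This is a sketch relying on bash's /dev/tcp pseudo-device and coreutils' `timeout`; the function name `wait_for_port` is hypothetical, 192.0.2.10 is a documentation placeholder for the address terraform prints, and we assume main.tf leaves the GridFTP control port at its default, 2811.

```shell
# Hypothetical helper: try to open a TCP connection to host $1, port $2,
# up to $3 times (default 30), sleeping $4 seconds (default 10) between
# attempts. Returns 0 as soon as a connection succeeds, 1 otherwise.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} delay=${4:-10} i=1
  while [ "$i" -le "$tries" ]; do
    # bash opens /dev/tcp/HOST/PORT as a TCP connection; give up after 5 s.
    if timeout 5 bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (replace 192.0.2.10 with the address printed by terraform):
#   wait_for_port 192.0.2.10 22 && ssh centos@192.0.2.10 tail /tmp/centos-gridftp-rw.log
#   wait_for_port 192.0.2.10 2811   # default GridFTP control port (assumed)
```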

TODO: Locating data

It would be nice to have a section here on finding the data we want to copy to the VM, for example by using the Plant user community query interface.

Transferring data to the virtual machine

Next, we transfer data from different sources to our cloud instance.

Specifying endpoint as destination

The files that we will transfer to the VM are listed in urls.txt. For the transfer job, we need to specify a destination for each file. Here, we choose to store all files on the VM in a directory structure of the form /srv/data/$PRURL, where $PRURL is the protocol-relative URL without its leading //. For example, the URL https://www.elixir-europe.org/system/files/white-orange-logo.png would be stored on the VM as /srv/data/www.elixir-europe.org/system/files/white-orange-logo.png.

For each source URL, we append the corresponding destination URL (using gsiftp as the protocol) and save the result in a new file:

PREFIX='gsiftp:\/\/vm-123.denbi.de\/srv\/data'
sed "s/\(.*:\/\)\(.*$\)/\1\2 $PREFIX\2/" urls.txt > transfers.txt

where vm-123.denbi.de is the host name of our VM.
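The sed expression needs every slash in the prefix escaped. An alternative sketch in plain shell avoids that by stripping the scheme with parameter expansion; it produces the same SOURCE DESTINATION pairs for URLs of the form scheme://host/path. It also includes a check that each line has exactly two fields and a gsiftp:// destination before the file is submitted. Both function names are hypothetical.

```shell
# Hypothetical alternative to the sed one-liner: read source URLs on stdin and
# print "SOURCE DESTINATION" pairs, where DESTINATION is $1/<URL minus scheme>.
build_transfers() {
  local prefix=$1 url
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      '') continue ;;                                  # skip blank lines
      *://*) echo "$url $prefix/${url#*://}" ;;        # strip "https://" etc.
      *) echo "skipping URL without scheme: $url" >&2 ;;
    esac
  done
}

# Sanity check: print every line that does not consist of exactly two fields
# with a gsiftp:// destination; exit status is 1 if any such line exists.
check_transfers() {
  awk 'BEGIN { bad = 0 }
       NF != 2 || $2 !~ /^gsiftp:\/\// { print "line " NR ": " $0; bad = 1 }
       END { exit bad }' "$@"
}
```

Used with the host name from above: `build_transfers gsiftp://vm-123.denbi.de/srv/data < urls.txt > transfers.txt && check_transfers transfers.txt`.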

Submitting a transfer job

We start by using our certificate to authenticate to the Elixir Data Transfer Service:

fts-delegation-init --proxy cert.txt -v -s https://fts3.du2.cesnet.cz:8446

We can then submit our transfer job to the fts3 server:

fts-transfer-submit --proxy cert.txt --nostreams 8 -s https://fts3.du2.cesnet.cz:8446 -f transfers.txt

The data transfer job is then carried out and monitored by the Elixir data transfer service.

Monitoring the transfer

The fts3 service can also be used to monitor the status and progress of transfer jobs. When we submitted the transfer job, a string identifying the transfer was returned. We can query the fts3 service for the status of the job with this ID:

fts-transfer-status --proxy cert.txt --verbose -d -s https://fts3.du2.cesnet.cz:8446 -l bc6b2602-2e83-11e8-a97e-525400cb6b4b
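Instead of re-running the status command by hand, the query can be repeated until the job reaches a final state. This is a sketch under two assumptions: that `fts-transfer-submit` and `fts-transfer-status` (without `--verbose`) print only the job ID and the job state, respectively, and that FINISHED, FINISHEDDIRTY, FAILED and CANCELED are the terminal FTS3 job states. The function names are hypothetical.

```shell
# Returns 0 if $1 is a job state after which FTS no longer changes the job
# (assumed list of terminal FTS3 job states).
is_terminal_state() {
  case $1 in
    FINISHED|FINISHEDDIRTY|FAILED|CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical polling loop: query job $1 every $2 seconds (default 60) until
# its state is terminal, then print the state. Returns 0 only for FINISHED.
wait_for_job() {
  local job_id=$1 interval=${2:-60} state
  local fts=https://fts3.du2.cesnet.cz:8446
  while :; do
    state=$(fts-transfer-status --proxy cert.txt -s "$fts" "$job_id") || return 1
    if is_terminal_state "$state"; then
      echo "$state"
      [ "$state" = FINISHED ]   # exit status of this test becomes the return value
      return
    fi
    sleep "$interval"
  done
}

# Example:
#   JOB_ID=$(fts-transfer-submit --proxy cert.txt --nostreams 8 \
#            -s https://fts3.du2.cesnet.cz:8446 -f transfers.txt)
#   wait_for_job "$JOB_ID" 60
```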

Once the data has been transferred to the cloud instance, we can of course log in, process the data there, and finally use the transfer service to copy the result to some other endpoint, or download to our workstation.

Conclusion

In this short demonstration, we have used our Elixir identity to deploy a cloud instance, and to copy a data set to it using the Elixir Transfer Service.

Acknowledgements

In setting up this demonstrator, we have used cloud resources graciously provided by the SNIC Science Cloud and the deNBI Cloud.
