How to upload data to EGA?

What is EGA?

The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems (from the EGA website).

Prepare documents before getting a submission account

Data encryption

Before uploading data to EGA, you should encrypt your data locally.

Download the data encryptor (EGA Cryptor) from EGA:

$ wget https://ega-archive.org/files/EgaCryptor.zip

$ unzip EgaCryptor.zip

Encrypt data:

# Here I am using a scRNAseq count matrix as an example.

$ tar -czvf ./data/Pool01_1_outs.tar.gz ./data/EGA/GEXdata/Pool01_1/outs/

$ java -jar EGA-Cryptor-2.0.0/ega-cryptor-2.0.0.jar -t {threads} \
    -i ./data/Pool01_1_outs.tar.gz \
    -o ./data/EGA/encryped/

# After the encryption, you will see three outputs.

$ ls ./data/EGA/encryped/

Pool01_1_outs.tar.gz.gpg
Pool01_1_outs.tar.gz.gpg.md5
Pool01_1_outs.tar.gz.md5
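
Before uploading, it can be worth checking that the checksum files match the data. The sketch below is one way to do that, not part of the EGA tooling. It assumes that each .md5 file holds the hex digest as its first field, that Pool01_1_outs.tar.gz.md5 refers to the unencrypted archive and Pool01_1_outs.tar.gz.gpg.md5 to the encrypted one, and that md5sum (GNU coreutils) is available. The function name verify_md5 is hypothetical.

```shell
#!/usr/bin/env bash
# Compare a file's MD5 digest with the digest stored in a checksum file.
# Assumption: the checksum file holds the hex digest as its first
# whitespace-delimited field on the first line.
verify_md5() {
    local file="$1" sidecar="$2"
    local expected actual
    expected=$(awk 'NR == 1 { print tolower($1) }' "$sidecar")
    actual=$(md5sum "$file" | awk '{ print tolower($1) }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Example calls, using the file names from the listing above:
# verify_md5 ./data/Pool01_1_outs.tar.gz \
#     ./data/EGA/encryped/Pool01_1_outs.tar.gz.md5
# verify_md5 ./data/EGA/encryped/Pool01_1_outs.tar.gz.gpg \
#     ./data/EGA/encryped/Pool01_1_outs.tar.gz.gpg.md5
```

A non-zero return value means the digest differs, in which case re-running the encryption is probably safer than uploading.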

Upload data to EGA

I use the Aspera ascp command-line program to upload data. You can also upload files over FTP. More details are available at: https://ega-archive.org/submission/tools/ftp-aspera

Download Aspera CLI

$ wget https://d3gcli72yxqn2z.cloudfront.net/downloads/connect/latest/bin/ibm-aspera-connect_4.2.6.393_linux_x86_64.tar.gz

$ tar xvzf ibm-aspera-connect_4.2.6.393_linux_x86_64.tar.gz

# On my computer, it is installed under the home directory:
~/.aspera/connect/bin/ascp

Upload encrypted data to EGA

# For a single file:
$ ASPERA_SCP_PASS={password} ~/.aspera/connect/bin/ascp  -P33001  -O33001 -QT -m3000M -L- ./data/EGA/encryped/Pool01_1_outs.tar.gz.gpg ega-box-{id}@fasp.ega.ebi.ac.uk:/.

# For all files in a directory:
$ ASPERA_SCP_PASS={password} ~/.aspera/connect/bin/ascp  -P33001  -O33001 -QT -m3000M -L- ./data/EGA/encryped/* ega-box-{id}@fasp.ega.ebi.ac.uk:/.
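
Typing the password inline leaves it in the shell history. A small wrapper can avoid that; the following is only a sketch. The function name ega_upload and the ASCP variable are hypothetical, the ascp flags are copied unchanged from the commands above, and the ascp path is the install location mentioned earlier.

```shell
#!/usr/bin/env bash
# Path to ascp; override ASCP if it is installed elsewhere.
ASCP="${ASCP:-$HOME/.aspera/connect/bin/ascp}"

# Upload one or more files to an EGA submission box.
# Usage: ega_upload <box-user> <file>...
# ascp reads the password from the exported ASPERA_SCP_PASS variable.
ega_upload() {
    if [ "$#" -lt 2 ]; then
        echo "usage: ega_upload <box-user> <file>..." >&2
        return 2
    fi
    local box="$1"
    shift
    "$ASCP" -P33001 -O33001 -QT -m3000M -L- "$@" "${box}@fasp.ega.ebi.ac.uk:/."
}

# Example: prompt for the password without echoing it, then upload.
# read -r -s -p "EGA box password: " ASPERA_SCP_PASS && export ASPERA_SCP_PASS
# ega_upload ega-box-{id} ./data/EGA/encryped/*
```

Because the password is read with a silent prompt and exported only in the current shell session, it does not appear in the command line itself.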

Register sample metadata

You can either enter the sample information one by one in the submission portal or, for large data sets, upload a CSV file to the portal.

Prepare sample CSV file

Here is an example of how this CSV file should look:

$ head sample_meta.csv

title,alias,description,subjectId,bioSampleId,caseOrControl,gender,organismPart,cellLine,region,phenotype
scRNAseq,Pool01_1,tumor,001,,case,female,brain,,cortex,relapsed
scRNAseq,Pool01_2,normal,002,,control,male,brain,,cortex,wt
.....
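
A quick sanity check before uploading is to confirm that every row has as many fields as the header. The sketch below does this with awk. It assumes that no field contains a quoted comma, and the function name check_csv_columns is hypothetical. It checks only the field count, not whether the portal accepts the values.

```shell
#!/usr/bin/env bash
# Report rows whose field count differs from the header row.
# Assumption: fields never contain quoted commas.
check_csv_columns() {
    awk -F',' '
        NR == 1 { expected = NF; next }   # header sets the expected count
        NF == 0 { next }                  # skip empty lines
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example:
# check_csv_columns sample_meta.csv && echo "column counts look consistent"
```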

Link data with registered sample metadata

After registering the samples, you need to link the data uploaded to EGA with them. You can do this manually by clicking on each sample and data item in the EGA submission portal, but that quickly becomes tedious for large data sets. Instead, you can upload a CSV file that links the .bam data items to the samples. Uploading this file to the submission portal automatically links the BAM files and their md5 checksums to the registered samples.

Here is an example of how this CSV file should look:

$ head sample_bam.csv

Sample alias,BAM File,Checksum,Unencrypted checksum
Pool01_1,Pool01_1.bam,,
Pool01_2,Pool01_2.bam,,
......
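
For many samples, the linking file can be generated from the sample metadata instead of being written by hand. The sketch below assumes that the alias is the second column of sample_meta.csv, that each BAM file is named after its alias as in the example, and that the checksum columns are left empty as shown. The function name make_bam_csv is hypothetical.

```shell
#!/usr/bin/env bash
# Build the sample-to-BAM linking CSV from the sample metadata CSV.
# Assumptions: alias is column 2, BAM files are named <alias>.bam,
# and fields never contain quoted commas.
make_bam_csv() {
    local meta="$1"
    echo "Sample alias,BAM File,Checksum,Unencrypted checksum"
    # Skip the header row and any row with an empty alias.
    awk -F',' 'NR > 1 && $2 != "" { printf "%s,%s.bam,,\n", $2, $2 }' "$meta"
}

# Example:
# make_bam_csv sample_meta.csv > sample_bam.csv
```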

Final notes:

  • So far, I have not found a way to link analysis objects with sample metadata by uploading a CSV file. The only way seems to be clicking manually in the portal. Because of that, I recommend putting all the analysis data into one folder, compressing it into a tar archive, and uploading it to the portal.
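
Bundling the analysis folder could look like the sketch below. The function name bundle_analysis and the paths in the example are hypothetical. The resulting archive would still need to be encrypted in the same way as the other files before upload.

```shell
#!/usr/bin/env bash
# Pack one folder into a gzip-compressed tar archive.
# Usage: bundle_analysis <folder> <output.tar.gz>
bundle_analysis() {
    local src_dir="$1" out="$2"
    # -C keeps only the folder name, not its full path, inside the archive.
    tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example (hypothetical paths):
# bundle_analysis ./data/EGA/analysis ./data/analysis.tar.gz
```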

Contributors

baigal628
