cellnet_cloud's Introduction

The CellNet RNA-Seq Web Application

CellNet is a computational tool to assess the establishment of cell type specific gene regulatory networks in engineered cells. Previously, we built a web application that allows researchers to upload microarray data and analyze it using CellNet. We recently adapted CellNet to analyze RNA sequencing data, but processing this type of data is too computationally intensive for our own servers. Below is a walkthrough of how to use the cloud-based CellNet RNA-Seq web application.

The CellNet Web Application delegates compute-intensive tasks to Amazon Web Services (AWS) compute resources. To use all features of the web application, users must have the following: an AWS account, a username and password, an AWS Access Key ID and Secret Access Key, and account permissions to access EC2, S3, and CloudFormation.

Below is an outline of the steps needed to run the web application.

0. Prep your sequencing data.

The required format for uploading FASTQ files from a local machine is a gzipped tar archive. This means that the uncompressed FASTQ files are compressed on the command line using a command like the following:

tar -czvf my_archive_name.tgz fastq_folder_to_compress

Local uploads are capped at 4GB.
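
As a rough illustration of this step, the Python sketch below builds the same kind of gzipped tar archive with the standard-library tarfile module and checks the result against the 4GB cap. The function names are hypothetical, and the sketch assumes that "4GB" means 4 GiB (4 x 1024^3 bytes), which this README does not specify.

```python
import os
import tarfile

# Assumption: "4GB" is read as 4 GiB; the README does not say which unit is meant.
UPLOAD_LIMIT_BYTES = 4 * 1024**3


def make_archive(fastq_dir, archive_path):
    """Write fastq_dir into a gzipped tar archive and return the archive size in bytes."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the folder name in the archive, not the full local path.
        tar.add(fastq_dir, arcname=os.path.basename(os.path.normpath(fastq_dir)))
    return os.path.getsize(archive_path)


def fits_upload_limit(size_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True if an archive of size_bytes is within the local upload cap."""
    return size_bytes <= limit
```

Calling make_archive("fastq_folder_to_compress", "my_archive_name.tgz") and passing the returned size to fits_upload_limit indicates whether a direct upload is possible or whether S3 or down-sampling is the better route.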

If the resulting archive will be larger than 4GB (or if otherwise desired), files should first be stored on AWS S3, and a path to the files can be specified in the Web Application. These files must be stored in a folder containing only the FASTQ files. This is the recommended option.
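
Because the S3 folder must contain only FASTQ files, it may help to check a folder listing before pointing the Web Application at it. The following Python sketch flags entries that do not look like FASTQ files. The helper name and the accepted suffixes (.fastq, .fq, and their .gz variants) are assumptions for illustration, not rules stated by CellNet.

```python
# Assumed naming conventions; adjust to match how your FASTQ files are named.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def non_fastq_entries(names):
    """Return the entries in names that do not end with a known FASTQ suffix."""
    return [n for n in names if not n.lower().endswith(FASTQ_SUFFIXES)]
```

An empty result for the listing of the folder suggests it holds nothing but FASTQ files; any returned names are candidates to move elsewhere before running the analysis.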

If you have >4GB of sequencing data but prefer not to upload your files to S3, we have provided a command line script at the bottom of this README for down-sampling reads to a lower read depth, which you can use to reduce your total file size to ≤4GB.

1. Log in to AWS and select CloudFormation from the Services menu:

2. CloudFormation Homepage

Click "Create New Stack."

3. Paste the provided link to the Stack Template:

Paste the following link:

https://s3.amazonaws.com/cahanlab/remy.schwab/Stack_Templates/CellNet_publicStackTemplate.json

into the 'Specify an Amazon S3 template URL' field.

Clicking the link is not necessary. You only need to copy and paste it.

4. Name your stack as desired:

5. Skip the Options page; no changes are needed.

6. Review

Review the details and hit 'Create.' This will launch an EC2 instance that is running the CellNet Web Application AMI.

7. The instance should take about 5 minutes to initialize, but this can vary.

Amazon will indicate when everything is ready, but the link to the web application may remain unavailable for a few minutes after that. Once the instance is ready, a URL will be available under the 'Outputs' tab.
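
Since the link can lag behind the "ready" status, one option is to poll the URL instead of refreshing by hand. The Python sketch below retries a simple HTTP request until the page answers. The function names, the number of attempts, and the 20-second delay are arbitrary illustrative choices, not values prescribed by CellNet or AWS.

```python
import time
import urllib.error
import urllib.request


def url_is_up(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP errors all count as "not up yet".
        return False


def wait_until_up(url, probe=url_is_up, attempts=30, delay_s=20, sleep=time.sleep):
    """Probe url up to `attempts` times, pausing delay_s seconds between tries."""
    for i in range(attempts):
        if probe(url):
            return True
        if i < attempts - 1:
            sleep(delay_s)
    return False
```

Passing the URL from the 'Outputs' tab to wait_until_up returns True as soon as the page responds, or False if it never does within the allotted attempts.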

8. Click the link or navigate to this URL in another window of your browser.

This will take you to the front page of the Web Application.

9. Homepage

  • Enter your email address; results will be emailed to you as an attachment.
  • Directly upload a previously compressed archive of sequencing files OR specify the path to a FASTQ folder on S3 (see Step 0). Accessing S3 requires your AWS Access Key ID and Secret Access Key.
  • CellNet can compare your data to either the human or the mouse transcriptome. Specify which species your data come from.
  • Submit.

It may take from several minutes to an hour to proceed to the next page, depending on the size of the files, because both transfer and decompression must finish before the next step.

10. Construct or upload sample metadata table.

If uploading, see the CellNet GitHub page or Radley et al., 2016 for the required metadata table format.

Also specify starting and target cell types.

When finished, click Submit.

11. Track your Progress

Several steps in this process are slow, e.g. sequence read mapping and quantification. The entire process may take from 30 minutes to several hours.

You can use the "Cancel Job" button to terminate the entire process and return to the homepage. However, IT WILL NOT TERMINATE THE INSTANCE.

12. Done

This screen indicates that the CellNet analysis is complete!

13. IMPORTANT: Once finished, make sure to delete the Stack.

Confirm "Yes, Delete"

This will terminate the running EC2 instance. AWS will continue to charge the user for computational resources used until the Stack is deleted.



Down-sampling your reads

We have provided a command line tool to down-sample FASTQ files (see Step 0).

Download down.py.

To help you test the down-sampling procedure, we have provided two small FASTQ files of mouse RNA-Seq data. They are compressed:

example fastq 1 example fastq 2

The screenshot above shows what your setup should look like. Put all of the FASTQ files you plan to upload in one directory. For simplicity, we recommend placing down.py alongside that directory, i.e. in the same parent directory. Below is an example command that samples 1 million reads from each FASTQ file. Note that the FASTQ files must be decompressed first.

python down.py -n 1000000 FASTQ

This is what you should see if the down-sampling process has finished successfully. The final output is a gzip-compressed tar archive, which can be uploaded directly to CellNet.
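
For readers curious about what down-sampling involves, here is a minimal Python sketch of one common approach, reservoir sampling, applied to a single uncompressed FASTQ file. It is not the implementation used by down.py, which this README does not describe; the function names, the fixed seed, and the assumption of strict four-line records are illustrative only.

```python
import random


def read_fastq_records(handle):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities) from a text handle."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # an empty header line means end of file
            return
        yield record


def downsample(in_path, out_path, n_reads, seed=0):
    """Write a uniform random sample of at most n_reads records; return the count written."""
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as handle:
        for i, record in enumerate(read_fastq_records(handle)):
            if i < n_reads:
                # Fill the reservoir with the first n_reads records.
                reservoir.append(record)
            else:
                # Replace a stored record with probability n_reads / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < n_reads:
                    reservoir[j] = record
    with open(out_path, "w") as out:
        for record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

For example, downsample("sample_R1.fastq", "sample_R1.down.fastq", 1000000) keeps at most 1,000,000 reads (the file names are placeholders). The sampled records are held in memory and are not written in their original order.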

cellnet_cloud's People

Contributors

remyschwab, emilyklo, pcahan1

cellnet_cloud's Issues

WebServerInstance Create_Failed

Attempted to follow the CellNet WebApp walk-through found in the supplement of https://doi.org/10.1101/614594. At Step 4 ('specify an Amazon S3 template URL') I used the one listed: https://s3.amazonaws.com/cahanlab/remy.schwab/Stack_Templates/CellNet_publicStackTemplate.json. After creating the stack, it threw these errors.

Specifically: The image id '[ami-8e2b0eeb]' does not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAMIID.NotFound; Request ID: f4dadc19-18cd-4687-84c8-e099e69b37de), and The following resource(s) failed to create: [WebServerInstance]. . Rollback requested by user.

Is there a new stack template URL to use on AWS CloudFormation?

No results email

Hello, I used the CellNet_Cloud webapp for 2 RNA-seq samples. The webapp progress bar completed and displayed the message: "Done! Check your email for your results." However, no results were in my email inbox. I repeated the analysis using a different email, and that also didn't work. I tried deleting and creating a new stack, which also didn't work, even after waiting an hour after the webapp said it was done.

There were no errors thrown during the analysis progress. One potential issue: the samples I uploaded had 38-bp reads (R2 end of paired end). Could this be the problem, since the training data were trimmed to 40-bp?
