isugenomics / bioinformatics-workbook

Bioinformatics Workbook repository

Home Page: https://bioinformaticsworkbook.org

License: MIT License


bioinformatics-workbook's Introduction

Bioinformatics Workbook

Preface

The best way to learn bioinformatics is through examples of real-world problems. The Bioinformatics Workbook provides the reader with an in-depth understanding of experimental design, data acquisition, data wrangling, data analysis and visualization. This is accomplished through worked example problems in each of these sections, along with one or more advanced problem sets and corresponding solutions. This book assumes that the reader has some knowledge of biology and a basic understanding of the Unix command line. For the beginner, however, the appendix contains introductory material and tips/tricks for common bioinformatics problems, which the book refers to throughout for more information.

Please start your exploration at the index.

Citing the Bioinformatics workbook

Please use the DOI when citing our work.

Collaborative etiquette

This project is open source and encourages everyone to be fearless in their contributions. However, we have some ground rules to make this a happy, collaborative place for everyone. Our contribution guidelines are listed here. Please pay attention to the Code of Conduct. Don't hesitate to reach out to us if you need help, at [email protected] or by tweeting @isugif.

Happy writing!

Funding

  • Primary support from the SCINet project of the United States Department of Agriculture, Agricultural Research Service (USDA-ARS), project number 0500-00093-001-00-D
  • Additional support from National Science Foundation (NSF) IOS#1546858

bioinformatics-workbook's People

Contributors

aedawid, aseetharam, ephantus-wambui, gitbook-bot, hsiaoyi0504, isugif, j23414, margaretwoodhouse, maryam-sayadi, molecules, remkv6, sharupaul, sivanandan, sm30, tayabsoomro, usha-m

bioinformatics-workbook's Issues

GATK Tutorial - Picard is included in GATK

The GATK Best Practices Workflow for DNA-Seq tutorial currently presents information that may require an update. It is important to note that Picard is included in GATK starting with GATK4.

Would it be possible to update this tutorial to reflect the change in the inclusion of Picard within GATK4? While it is understood that the tutorial may be intended to support GATK versions prior to 4.0, it is important to bring attention to this update for the sake of accuracy and clarity.

Optionally, it may be beneficial to link the tutorial to the relevant GATK Best Practices Workflows along with a note about any modifications necessary to adapt to non-human data (e.g. maize, plants).
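As a minimal sketch of what the requested update might look like, the snippet below prints the same duplicate-marking step written two ways: with standalone Picard's legacy `KEY=value` syntax, and through the `gatk` wrapper that ships with GATK4. The file names are hypothetical, and the `picard` launcher name is an assumption (some installations use `java -jar picard.jar` instead). The commands are only echoed, not executed.

```shell
#!/bin/sh
set -eu

# Hypothetical file names, used only for illustration.
in_bam="sample.sorted.bam"
out_bam="sample.dedup.bam"
metrics="sample.dedup.metrics.txt"

# Standalone Picard, legacy KEY=value argument style
# (assumes a "picard" launcher script is on PATH).
picard_cmd="picard MarkDuplicates I=$in_bam O=$out_bam M=$metrics"

# The same Picard tool invoked through the GATK4 wrapper.
gatk_cmd="gatk MarkDuplicates -I $in_bam -O $out_bam -M $metrics"

# Dry run: print both commands side by side instead of running them.
echo "$picard_cmd"
echo "$gatk_cmd"
```

To run either step for real, drop the `echo` and execute the command string that matches the installed toolchain.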

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)
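One workaround sometimes used in response to this concern is to request a generous number of hits from BLAST and then select the best hit per query afterwards. The sketch below illustrates that post-filtering step on a small, made-up tabular file in the standard `-outfmt 6` column order (query ID in column 1, subject ID in column 2, bit score in column 12). The hit values are invented for illustration, and this is not necessarily how the workbook's tutorials handle it.

```shell
#!/bin/sh
set -eu

# Made-up BLAST tabular hits (-outfmt 6 column order); values are illustrative only.
hits=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  q1 sA 90.0 100 10 0 1 100 1 100 1e-30 120 \
  q1 sB 99.0 100 1 0 1 100 1 100 1e-50 180 \
  q2 sC 95.0 100 5 0 1 100 1 100 1e-40 150 > "$hits"

tab=$(printf '\t')

# Sort by query ID, then by bit score (column 12) from highest to lowest,
# and keep only the first (highest-scoring) row seen for each query.
best=$(sort -t "$tab" -k1,1 -k12,12gr "$hits" \
  | awk -F '\t' '!seen[$1]++ { print $1 "\t" $2 }')

echo "$best"
rm -f "$hits"
```

Sorting explicitly on bit score makes the selection criterion visible in the script, which also serves as the kind of in-code note the issue asks for.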

fastq-dump parallel error in tutorial

Thanks for this great resource. fastq-dump is fairly poorly documented, so this workbook was a great starting point for me.

I think there is a small error on: https://bioinformaticsworkbook.org/dataAcquisition/fileTransfer/sra.html#gsc.tab=0

parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" ::: SRR.numbers

I believe this will only work if you cat SRR.numbers

parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" ::: $( cat SRR.numbers)
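The point above is that `:::` passes the literal string `SRR.numbers` as the only argument, so the file's contents have to be expanded somehow. GNU parallel can also read arguments from a file directly with `::::` (four colons) or `-a`. As a minimal dry-run sketch that does not depend on parallel or the SRA Toolkit being installed, the snippet below uses `xargs` to turn each line of a hypothetical accession list into one `fastq-dump` command and prints the commands instead of running them. The accession IDs are placeholders.

```shell
#!/bin/sh
set -eu

# Hypothetical accession list, one run ID per line (placeholder IDs).
workdir=$(mktemp -d)
printf '%s\n' SRR0000001 SRR0000002 SRR0000003 > "$workdir/SRR.numbers"

# xargs reads the file contents, so each accession becomes its own command.
# -P 3 allows up to three jobs at once; "echo" makes this a dry run.
# With GNU parallel the equivalent would be:
#   parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" :::: SRR.numbers
xargs -P 3 -I {} echo fastq-dump --split-files --origfmt --gzip {} \
  < "$workdir/SRR.numbers" | sort > "$workdir/commands.txt"

cat "$workdir/commands.txt"
```

Removing `echo` would launch the downloads themselves, assuming `fastq-dump` is on the PATH.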

Add README.md section to support other developers

Here is some material I think should be added to README.md. It was originally rejected in #10, and @aseetharam suggested that I post it in the issues section here, which had disappeared for at least two weeks.

For Developers

To run the repo locally, go to the root directory of this repo

  • install bundler and jekyll by RubyGems:

      gem install bundler jekyll
  • install the dependencies:

      bundle install
  • run the site:

      bundle exec jekyll serve
  • when developing, use the watch mode to automatically regenerate the site:

      bundle exec jekyll serve --watch

Braker2: Acquiring Transcript/EST data for a de novo genome

Hello!

Thank you for the great bioinformatics workbook. I've been using the Braker tutorial as inspiration to annotate my own de novo genome and it's been quite helpful thus far.

I am trying to generate the transcript/EST information as part of the (required?) input for Braker2. Is the Transcript/EST information a different datatype entirely than RNA-seq, and if I don't have it then this step must be skipped? Or can it be generated from the RNA-seq data?

I was thinking that if it can be generated then perhaps I'd generate a gtf file with stringtie/guided trinity, pull the fasta file from that with bedtools, and then re-align the fasta to the genome. This being said I may be way off-base and missing something obvious.

If you have any advice here then that would be very valuable!

Best,
Dustin

WGCNA doesn't recommend filtering by DEGs

Hi Jennifer,

Thanks for writing the awesome tutorial!

I found you wrote "Optionally you could subset to only genes that are differentially expressed between groups." in this WGCNA tutorial. However, from what I have noticed from here, the author actually doesn’t recommend this.

Taotao
