isugenomics / bioinformatics-workbook



Bioinformatics Workbook repository

Home Page: https://bioinformaticsworkbook.org

License: MIT License

HTML 98.18% Ruby 0.01% JavaScript 1.11% SCSS 0.69% Jupyter Notebook 0.02%

bioinformatics

bioinformatics-workbook's Introduction

Bioinformatics Workbook

Preface

The best way to learn bioinformatics is through examples of real-world problems. The Bioinformatics Workbook provides the reader with an in-depth understanding of experimental design, data acquisition, data wrangling, data analysis, and visualization. This is accomplished through a worked-out example problem in each of these sections, along with one or more advanced problem sets and corresponding solutions. This book assumes that the reader has some knowledge of biology and a basic understanding of the Unix command line. However, for the beginner, the appendix contains introductory material and tips and tricks for common bioinformatics problems, which are referred to throughout the book for more information.

Please start your exploration at the index.

Citing the Bioinformatics workbook

Please use when citing our work.

Collaborative etiquette

This project is open source and encourages everyone to be fearless in their contributions. However, we have some ground rules to make this a happy, collaborative place for everyone. Our contribution guidelines are listed here. Please pay attention to the Code of Conduct. Don't hesitate to reach out to us if you need help at [email protected] or tweet @isugif.

Happy writing!

Funding

Primary support from the SCINet project of the United States Department of Agriculture, Agricultural Research Service (USDA-ARS), project number 0500-00093-001-00-D
Additional support from the National Science Foundation (NSF), IOS #1546858

bioinformatics-workbook's People

Contributors

Stargazers

Watchers

Forkers

kyounghyoun ruixiangliu hsiaoyi0504 dylansosa weedcentipede qliugithub deeplearningpk wangmz0617 cnyuanh maryam-sayadi grlazo jasonjwilliamsny davan690 sunbymoon dayedepps cparsania santosmafalda htnani wenchaolin bioyliu usda-ars-fsepru jtrachsel leornardzhou vikash84 tayyrov subhi nikolaospapachristou eddug yuzhenpeng solaymane harimenath pythseq bioinformatics-hub-ke tanguylallemand thyagoleal rsuchecki sailfish009 akrammendez mostafaabuzaid25 williamrowell sm30 mkyriak ahmedelhosseiny radhu903 antonioggsousa atchon molecules wangchengww gashanjakim rpucheq m-naghizadeh schlogl2017 babasaraki vsatheesh kovimallik biosharp-dotnet-labs rintukutum dariushghasemi margaretwoodhouse ephantus-wambui ahkamsaddam sharupaul zky17715002 huoww07 juadiegaitan bass-cigass izasilva15 xgrau lbundalian noure-bess manasealoo namlq shivanshss bioinfo-hub makujabi albertrockg gc-content crowmane420 kkpenn sumeetmankar171 kurtshowmaker olivia-c-haley amazingshi

bioinformatics-workbook's Issues

fastq-dump parallel error in tutorial

Thanks for this great resource. fastq-dump is fairly poorly documented, so this workbook was a great starting point for me.

I think there is a small error on: https://bioinformaticsworkbook.org/dataAcquisition/fileTransfer/sra.html#gsc.tab=0

```
parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" ::: SRR.numbers
```

I believe this only works if you `cat` SRR.numbers:

```
parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" ::: $(cat SRR.numbers)
```
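Another option, sketched here assuming GNU parallel (not the unrelated `parallel` from moreutils): GNU parallel's `::::` separator reads arguments from a file, one per line, so the accession file can be passed directly without command substitution. The helper name `run_per_accession` is hypothetical; it only wraps the call so the job count, accession file, and command template are explicit.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU parallel or the SRA Toolkit).
# Usage: run_per_accession JOBS ACCESSION_FILE COMMAND_TEMPLATE
# Assumes GNU parallel is on PATH; "{}" in COMMAND_TEMPLATE is replaced
# by each line of ACCESSION_FILE.
run_per_accession() {
  jobs=$1         # number of concurrent jobs
  accessions=$2   # file with one SRR accession per line
  template=$3     # command to run for each accession
  # ":::: FILE" reads the argument list from FILE;
  # "::: FILE" would pass the literal string FILE as the only argument.
  parallel --jobs "$jobs" "$template" :::: "$accessions"
}
```

For example, `run_per_accession 3 SRR.numbers "fastq-dump --split-files --origfmt --gzip {}"` should behave like running `parallel --jobs 3 "fastq-dump --split-files --origfmt --gzip {}" :::: SRR.numbers` directly.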

GATK Tutorial - Picard is included in GATK

The GATK Best Practices Workflow for DNA-Seq tutorial currently presents information that may require an update. It is important to note that Picard is included in GATK starting with GATK4.

Would it be possible to update this tutorial to reflect the change in the inclusion of Picard within GATK4? While it is understood that the tutorial may be intended to support GATK versions prior to 4.0, it is important to bring attention to this update for the sake of accuracy and clarity.

Optionally, it may be beneficial to link the tutorial to the relevant GATK Best Practices Workflows along with a note about any modifications necessary to adapt to non-human data (e.g. maize, plants).
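A minimal sketch of what the change looks like for one Picard step, using MarkDuplicates as the example. It assumes GATK4's `gatk` launcher is on PATH; the helper name `mark_duplicates_gatk4` and the file names are hypothetical placeholders. The commented line shows the older stand-alone Picard invocation for comparison.

```shell
#!/bin/sh
# Hypothetical helper: mark duplicate reads with the Picard tool bundled in GATK4.
# Usage: mark_duplicates_gatk4 INPUT.bam OUTPUT.bam METRICS.txt
#
# Pre-GATK4 equivalent with a separate Picard jar (legacy argument syntax):
#   java -jar picard.jar MarkDuplicates I=INPUT.bam O=OUTPUT.bam M=METRICS.txt
mark_duplicates_gatk4() {
  gatk MarkDuplicates \
    -I "$1" \
    -O "$2" \
    -M "$3"
}
```

With this form, a workflow only needs the GATK4 installation rather than a separate `picard.jar`.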

Braker2: Acquiring Transcript/EST data for a de novo genome

Hello!

Thank you for the great bioinformatics workbook. I've been using the Braker tutorial as inspiration to annotate my own de novo genome, and it's been quite helpful so far.

I am trying to generate the transcript/EST information as part of the (required?) input for Braker2. Is the Transcript/EST information a different datatype entirely than RNA-seq, and if I don't have it then this step must be skipped? Or can it be generated from the RNA-seq data?

I was thinking that if it can be generated, then perhaps I'd generate a GTF file with stringtie/guided trinity, pull the FASTA sequences from that with bedtools, and then re-align the FASTA to the genome. That being said, I may be way off base and missing something obvious.

If you have any advice here then that would be very valuable!

Best,
Dustin

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

dataAnalysis/RNA-Seq/annotating-transcripts.md

Based on the recently published report "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows", there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)
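If the intent is to keep only the best hit per query, one commonly used workaround (a sketch, not this repository's method) is to let BLAST report more than one target per query and select the top hit afterwards. The snippet assumes tabular output from `-outfmt 6` with the default 12 columns, so the bit score is column 12, and that query and subject IDs contain no whitespace; `best_hit_per_query` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep the highest-bit-score hit for each query.
# Input: BLAST tabular output (-outfmt 6, default 12 columns:
#   qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore)
# Reads the files given as arguments, or standard input if none are given.
best_hit_per_query() {
  # Sort by query ID, then by bit score (column 12) from highest to lowest,
  # and print only the first line seen for each query ID.
  sort -k1,1 -k12,12gr "$@" | awk '!seen[$1]++'
}
```

For example, `best_hit_per_query blast_results.tsv > best_hits.tsv` (file names are placeholders). When two hits for the same query share the top bit score, which one is kept depends on sort's whole-line tie-breaking, so ties may need an explicit rule.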

Add README.md section to support other developers

Here is some material I think should be added to README.md. It was originally rejected in #10, and @aseetharam suggested that I post it in the issues section here, which had disappeared for at least two weeks.

For Developers

To run the site locally, go to the root directory of this repo, then:

Install Bundler and Jekyll with RubyGems:
```
gem install bundler jekyll
```
Install the dependencies:
```
bundle install
```
Run the site:
```
bundle exec jekyll serve
```
When developing, use watch mode to automatically regenerate the site:
```
bundle exec jekyll serve --watch
```

Add contribution guidelines

as suggested by @hsiaoyi0504

Consider using a darker color for item sub-titles

It seems to me that the current color of item sub-titles is not very noticeable, for example "Introduction to Unix" in the following figure.

WGCNA doesn't recommend filter by DEGs

Hi Jennifer,

Thanks for writing the awesome tutorial!

I noticed that you wrote "Optionally you could subset to only genes that are differentially expressed between groups." in this WGCNA tutorial. However, from what I have read here, the author actually doesn't recommend this.

Taotao