update rmarkdown homework & reference material

the rmarkdown lesson links to some out-of-date resources, e.g., https://github.com/bioconnector/workshops/blame/252121cfaa059c318d75778512c2e1f81d6e26ce/r-rmarkdown.Rmd#L27. these should probably be cleaned up. cc @vpnagraj (you're teaching this one, right @vpnagraj?)

take down link to http://bioconnector.org/markdown (for now at least), since this stuff's a bit dated
link to RStudio's rmarkdown documentation instead http://rmarkdown.rstudio.com
update the help page for markdown/rmarkdown to link to better resources, update cheat sheet, etc.
http://commonmark.org/help/tutorial/

revise landing page contents (index.md)

index.md/html is pretty boring, and mostly leftover cruft from bims8382. update this to make it more relevant to the workshop series, perhaps with links to registration on the dot edu site.

update/remove topics page

topics page in navbar mostly pertains to leftovers from BIMS8382. this should go away, or be rolled into the lessons menu, or perhaps moved into an existing page under the ? menu

BIMS8382 class 2016 was made by using a Makefile to render all Rmd files to HTML, and committing all files to both master and gh-pages (source). Using RMarkdown websites has been much easier to maintain here, and cleaner (only source files on master, only rendered files on gh-pages).

I'd like to use the RMarkdown website functionality to render the 2017 course materials. Rather than going back to bioconnector/bims8382, updating everything, and maintaining two parallel repos, I propose we add bims8382-specific materials to this repo. Do you agree, @vpnagraj?

Some things to do (check them off here as you create separate issues):

add a syllabus (see http://bioconnector.org/bims8382/syllabus.html)
update the existing FAQ (see http://bioconnector.org/bims8382/faq.html)
add homework/assignments for all lessons from bioconnector/bims8382
update old bims8382's website's landing page bioconnector/bims8382#8 to link to this repo

kubrik table doesn't print nicely

the kubrik table here http://bioconnector.org/workshops/r-interactive-viz.html#highcharter doesn't print nicely, too wide. consider selecting out a few columns?

@vpnagraj

survival: colondeath==2

the survival analysis lesson states that you should use the death data, but filters where etype==1 (recurrence, not survival). options are:

Change text to "recurrence" and keep the output
Change etype==1 to 2 and update all output and exercises.

Option 2 is probably the better choice.

update lesson material
update exercises
update exercise handout compiled md/pdf. Ugh.
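
A minimal sketch of option 2, assuming the lesson uses the `colon` dataset from the survival package (where `etype == 2` marks death records and `etype == 1` marks recurrence) and that the filtered data frame is named `colondeath`. The lesson's actual object names and model formula may differ.

```r
library(survival)

# Keep only the death records (etype == 2); etype == 1 is recurrence.
colondeath <- subset(colon, etype == 2)

# Kaplan-Meier estimate of overall survival by treatment arm
fit <- survfit(Surv(time, status) ~ rx, data = colondeath)
print(fit)
```

Because every downstream number changes with the filter, the lesson output and exercise keys would need to be regenerated after this switch.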

hadley/precis

Keep an eye on https://github.com/hadley/precis - maybe useful for replacing base::summary() for data frames especially if it becomes a canonical part of the tidyverse.

rmarkdown: paged tables

See http://rmarkdown.rstudio.com/html_document_format.html#paged_printing

review stats hw

ST to review ~~stat hw~~ stats hw

add style guide link

http://style.tidyverse.org/

add PDF exercises to handouts/

add pdf exercises to handouts

dplyr-yeast
ggplot2

exercises for stats lesson

Add exercises for stats lesson

descriptives
continuous data hypothesis testing
categorical data tests
power/sample size

survival: cheat sheet RTCGA package

add packages needed to build site

add commands to install all packages needed to build the site.

Better in the README or CONTRIBUTING md file? @vpnagraj
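
Wherever the instructions end up, a sketch like the following could back them. The package list here is hypothetical and would need to be replaced with what the lessons actually load; Bioconductor packages would also need their own installer rather than `install.packages()`.

```r
# Hypothetical list; replace with the packages the site really needs.
site_packages <- c("rmarkdown", "knitr", "tidyverse")

# Return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(site_packages)
if (length(to_install) > 0) {
  # Print the command instead of running it, so nothing installs unasked.
  message("Run: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}
```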

better gh-pages publishing

may obviate the need for the git deploy subtree madness

https://github.com/blog/2228-simpler-github-pages-publishing

extras

ggplot2

ggthemeassist https://github.com/calligross/ggthemeassist
ggthemes

stats lesson problem ex 2 doesn't specify outcome variable

add IPA results to rna-seq lesson

40b85aa added results from IPA on significant results. add some of these to the end of the RNA-seq lesson.

upstream regulators table
inflammatory network
eNOS pathway

survival: handouts + slides

Handout should have a background section with information about the Cox model, and knitted exercises.

predictive analysis lesson

Outline

Predictive modeling

Data

Data cleaning & prep

Model fitting

Model comparison

Predicting new samples

Quick forecasting demo

rmarkdown: update the reproducible research section / handout

"good enough..." is published:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

see also: COS, Leek, more recent papers...

reading list for stats/etc

Nice list from Mike Love

Rendered: https://biodatascience.github.io/compbio/

Source: https://raw.githubusercontent.com/biodatascience/compbio/gh-pages/index.Rmd

What is the role of the computational biologist / statistician?
- All biology is computational biology (Florian Markowetz)
- Questions, Answers and Statistics (Deborah Nolan)
- 50 Years of Data Science (David Donoho)
- The Future of Data Analysis (John Tukey; this article, discussed by Donoho, is from 1962)
- Ten Simple Rules for Effective Statistical Practice (Kass, Caffo, Davidian, Meng, Yu, and Reid)
- Statistical Modeling: The Two Cultures (Leo Breiman)

Exploratory data analysis

Bioconductor
- Orchestrating high-throughput genomic analysis with Bioconductor (Huber et al)

Distances and normalization
- Differential expression analysis for sequence count data (Simon Anders and Wolfgang Huber)
- Tackling the widespread and critical impact of batch effects in high-throughput data (Leek et al)
- Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis (Jeffrey Leek and John Storey)
- Normalization of RNA-seq data using factor analysis of control genes or samples (Risso et al)
- Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses (Stegle et al)

Multiple testing
- Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing (Yoav Benjamini and Yosef Hochberg)
- A direct approach to false discovery rates (John Storey)
- Statistical significance for genomewide studies (John Storey and Robert Tibshirani)
- Large-scale simultaneous hypothesis testing (Bradley Efron)
- Empirical Bayes Analysis of a Microarray Experiment (Efron et al)
- Measuring reproducibility of high-throughput experiments (Li et al)

Expectation maximization
- What is the expectation maximization algorithm? (Chuong B Do and Serafim Batzoglou)
- Gaussian mixture models and the EM algorithm (Ramesh Sridharan)
- EM algorithm notes (Andrew Ng)
- MEME: discovering and analyzing DNA and protein sequence motifs (Bailey et al)

Hierarchical models
- Linear models and empirical Bayes methods for assessing differential expression in microarray experiments (Gordon Smyth)
- Analyzing ’omics data using hierarchical models (Hongkai Ji and X Shirley Liu)
- Stein's Paradox in Statistics (Bradley Efron and Carl Morris)
- Stein's estimation rule and its competitors - an empirical Bayes approach (Bradley Efron and Carl Morris)

Signal processing
- An Introduction to Hidden Markov Models (Lawrence Rabiner and Biing-Hwang Juang)
- Hidden Markov models approach to the analysis of array CGH data (Fridlyand et al)

Network analysis
- Static And Dynamic DNA Loops Form AP-1 Bound Activation Hubs During Macrophage Development (Phanstiel et al)

update rna-seq lessons

The RNA-seq lesson really needs to be updated to the latest best practices starting with transcript quantification rather than alignment and gene-counting. Rough outline:

get original fastq data
quant with kallisto/salmon
get separate output files, import with tximport, export count table
or just start with separate outputs and a tx2gene table
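
The import step in the outline above might look roughly like this. It is a sketch only: it assumes salmon was used (one `quant.sf` per sample directory) and that the Bioconductor package tximport is installed; `quant_dir`, `samples`, and `tx2gene` are hypothetical inputs, not names from the lesson.

```r
# Import salmon quantifications and return a gene-level count matrix.
import_salmon <- function(quant_dir, samples, tx2gene) {
  # One quant.sf file per sample: <quant_dir>/<sample>/quant.sf
  files <- file.path(quant_dir, samples, "quant.sf")
  names(files) <- samples
  stopifnot(all(file.exists(files)))

  # tx2gene: data frame with transcript IDs in column 1, gene IDs in column 2
  txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)

  # Estimated counts, genes in rows and samples in columns
  txi$counts
}
```

The returned matrix could then be written out with `write.csv()` to give the count table mentioned in the outline.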

r version dependency in setup

student had trouble installing tidyverse on R 3.1.0. looks like tibble requires R >= 3.1.2. Perhaps require R >= 3.2.0?
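
A small check like this could go at the top of the setup instructions. The 3.2.0 minimum is the value floated in this issue, not a confirmed requirement, and `r_is_recent` is a hypothetical helper name.

```r
# Proposed minimum from this issue; adjust once the real requirement is settled.
min_r_version <- "3.2.0"

# TRUE when the running R is at least `minimum`
r_is_recent <- function(minimum) getRversion() >= minimum

if (!r_is_recent(min_r_version)) {
  stop("Please upgrade R: these lessons need R >= ", min_r_version,
       " but you have ", as.character(getRversion()), call. = FALSE)
}
```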

Add back in all old lessons from BIMS8382

update survival lesson with new features in survminer 0.2.4

http://www.sthda.com/english/wiki/survminer-0-2-4

Of note: surv_summary(), faceting ggsurvplot(), and using ggsurvplot() to plot a Cox model. Also new functions for visualizing optimal cutpoints. See post above for details, and update lesson.

add contributing.md file

add a contributing file that links to resources used to build this using the rmarkdown website framework. also, document how to push _site to gh-pages using a subtree push. add this file to the exclusion list on _site.yml, e.g.,

exclude: ["contributing.md"]

Really nice resource, 3Q

Thanks for all your contributions. The website is wonderful, and I can learn a lot from it.

interactive viz: hc_add_series_scatter deprecated

Some warnings on building the interactive viz lesson:

Warning messages:
1: 'hc_add_series_scatter' is deprecated.
Use 'hc_add_series' instead.
See help("Deprecated")
2: 'hc_add_series_scatter' is deprecated.
Use 'hc_add_series' instead.
See help("Deprecated")

@vpnagraj
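
A sketch of the replacement call the warning points to, assuming a data frame input. `mtcars` stands in for the lesson data, and the `wt`/`mpg` mapping is purely illustrative.

```r
library(highcharter)

# Deprecated: hc_add_series_scatter(...)
# Replacement: hc_add_series() with a data frame, a series type,
# and an hcaes() mapping of columns to axes.
hc <- hc_add_series(
  highchart(),
  data = mtcars,
  type = "scatter",
  mapping = hcaes(x = wt, y = mpg)
)
```

Printing `hc` in an R Markdown chunk renders the chart.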

add setup instructions for interactive viz and shiny

add setup instructions for interactive viz and shiny classes from #9 @vpnagraj

ncbi lesson: `entrez_fetch` fasta sequence causes build failure

Line 216 in the ncbi lesson fails...

entrez_fetch(db = "nuccore", id = nuclinkid, rettype = "fasta")

...with the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Operation was aborted by an application callback

This results in a build error. In 1022146 I turned off (eval=FALSE) a few chunks that create or refer to this data.

@vpnagraj
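
As an alternative to `eval=FALSE`, the fetch could be wrapped so that a network failure yields `NULL` rather than a build error. This is a sketch; `try_fetch` is a hypothetical helper, not part of rentrez.

```r
# Run `fetch_fun()`; on error, warn and return NULL instead of stopping the knit.
try_fetch <- function(fetch_fun) {
  tryCatch(
    fetch_fun(),
    error = function(e) {
      warning("Fetch failed, skipping: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Usage in the lesson (needs rentrez and network access; `nuclinkid` comes
# from an earlier chunk):
# fasta <- try_fetch(function() {
#   rentrez::entrez_fetch(db = "nuccore", id = nuclinkid, rettype = "fasta")
# })
# if (!is.null(fasta)) cat(substr(fasta, 1, 500))
```

Downstream chunks would still need to handle the `NULL` case, so this trades a hard failure for a lesson page that may be missing output.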

need a better readme

more information about the class, bioconnector
link to CONTRIBUTING.markdown
???

build warnings (old R, geom_point NAs)

I get the following warnings when I build:

Output created: docs/index.html
Warning messages:
1: package 'DESeq2' was built under R version 3.3.1 
2: package 'S4Vectors' was built under R version 3.3.1 
3: package 'GenomeInfoDb' was built under R version 3.3.1 
4: Removed 32155 rows containing missing values (geom_point). 
5: Removed 32270 rows containing missing values (geom_point).

1-3 don't much matter. Fix by updating R.
4-5 - where are these coming from? maybe the viz lesson, perhaps when you try to do something on the log scale? figure this out if possible.
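
One way to test the log-scale guess is to count, for each plotted column, the values that have no finite log10 (zeros, negatives, and NAs) and compare the totals with the numbers in the warnings. The vector below is toy data, not lesson data.

```r
# Count values that cannot be placed on a log10 axis.
count_unplottable_log10 <- function(x) {
  sum(is.na(x) | x <= 0)
}

# Toy vector standing in for a column from the lesson data
counts <- c(0, 5, 120, NA, 0, 33, -1)
count_unplottable_log10(counts)
# 4
```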

improve setup instructions for RNA-seq lesson

download necessary data
install packages
pre-requisite knowledge of dplyr/ggplot2. link to lessons
links to recommended reading/papers

stats additions

add to stats lessons:

paired vs unpaired t-tests
one-tailed vs two-tailed tests
batch effects
biological vs technical replicates
- What is "n" in cell culture experiments?
- Replicates and repeats: what is the difference and is it significant?
~~self-guided e.g., swirl, learnr ?~~
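
For the paired vs unpaired and one-tailed vs two-tailed items, a possible starting point is the built-in `sleep` data, where the same 10 subjects were measured under two drugs. This is only a sketch of what the lesson could show.

```r
# Extra hours of sleep for the same 10 subjects on each drug
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Unpaired (Welch) test ignores that subjects are matched
unpaired <- t.test(drug1, drug2)

# Paired test works on the within-subject differences
paired <- t.test(drug1, drug2, paired = TRUE)

# One-tailed: only asks whether drug1 gives less extra sleep than drug2
one_tailed <- t.test(drug1, drug2, alternative = "less")

c(unpaired = unpaired$p.value,
  paired = paired$p.value,
  one_tailed = one_tailed$p.value)
```

Here the paired test gives a smaller p-value than the unpaired one, and the one-tailed p-value is half the two-tailed one because the observed difference lies in the hypothesized direction.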

breaking changes in dplyr/tibble

dplyr 0.5 and tibble 1.1 introduce some breaking changes. things that come to mind are

no arranging within groups,
tibble functions being split out / not exported from dplyr which may require loading separately,
warnings about future deprecation of summarize_each, mutate_each?
they're now called "tibbles" not "table underscore DFs" or whatever we called them before. Change the narrative?

perhaps @vpnagraj could help test this one and suggest changes

find a place to show the rmarkdown video

https://www.youtube.com/watch?feature=youtu.be&v=s3JldKoA0zw

build warnings on stats lesson from missing data in ggplot

Output created: _site/index.html
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: Removed 166 rows containing non-finite values (stat_bin).
2: Removed 31 rows containing non-finite values (stat_bin).
3: Removed 31 rows containing non-finite values (stat_bin).
4: Removed 166 rows containing missing values (geom_point).
5: Removed 31 rows containing non-finite values (stat_smooth).
6: Removed 31 rows containing missing values (geom_point).
7: Removed 723 rows containing non-finite values (stat_bin).
8: Removed 31 rows containing non-finite values (stat_boxplot).
9: Removed 31 rows containing non-finite values (stat_smooth).
10: Removed 31 rows containing missing values (geom_point).
11: Removed 31 rows containing non-finite values (stat_smooth).
12: Removed 31 rows containing missing values (geom_point).

options:

pipe to na.omit before plotting
ignore them and close this issue.
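
If the first option is chosen, it is probably safer to drop incomplete rows only for the columns being plotted, so that unrelated NAs do not remove usable data. A sketch using the built-in `airquality` data as a stand-in for the lesson data:

```r
# Columns used in the plot (hypothetical choice for illustration)
plot_vars <- c("Ozone", "Temp")

# Drop rows with NAs in the plotted columns only
plot_data <- na.omit(airquality[, plot_vars])

# Rows before and after filtering
c(before = nrow(airquality), after = nrow(plot_data))

# plot_data can then be passed to ggplot(), e.g.
# ggplot(plot_data, aes(Temp, Ozone)) + geom_point()
```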

fix issues with dplyr homework on yeast data

dplyr 0.5.0 introduces a breaking change with distinct(): now only keeps the distinct variables. If you want to return all variables (using the first row for non-distinct values) use .keep_all = TRUE
ambiguous q10.
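
The `distinct()` change can be illustrated on a toy data frame. The column names below are hypothetical and not taken from the yeast data.

```r
library(dplyr)

genes <- data.frame(
  symbol = c("A", "A", "B"),
  rate = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# dplyr >= 0.5.0: only the variables named in distinct() are returned
symbols_only <- distinct(genes, symbol)

# Keep all columns, taking the first row for each distinct symbol
first_rows <- distinct(genes, symbol, .keep_all = TRUE)
first_rows
```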

survival: remove tidyverse/dplyr dependency?

tbl_df to as_tibble

See http://dplyr.tidyverse.org/news/index.html#dplyr-0-5-0-9000

dplyr will soon re-export as_tibble() from tibble, and tbl_df() will be deprecated.
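
In the lessons the change would amount to something like the following, shown here on the built-in `iris` data rather than on lesson data.

```r
library(tibble)

# Old: dplyr::tbl_df(iris)
# New:
iris_tbl <- as_tibble(iris)
class(iris_tbl)
```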

link to old workshops website

from 2014-2015, add link on homepage or FAQ page somewhere.

setup & loading in lessons: library(tidyverse)

https://twitter.com/hadleywickham/status/773264703711617024

The tidyverse package installs and loads pretty much all the tidy packages we use here - ggplot2, dplyr, readr, tidyr, tibble, stringr, broom, etc.

Once this hits CRAN (next week?) change setup instructions and lesson-specific library calls.

ncbi: entrez_fetch causing build failure

The following lines seem to be causing a build failure.

recs <- entrez_fetch(db = "pubmed", id = res$ids[1:25], rettype = "xml", parsed = TRUE)

recs <- entrez_fetch(db = "pubmed", web_history = res$web_history, rettype = "xml", parsed = TRUE)

They both work fine when run interactively but throw the following error when trying to knit the Rmd.

Error in entrez_check(response) : 
  HTTP failure: 502, bad gateway. This error code is often returned when trying to download many records in a single request.  Try using web history as described in the rentrez tutorial
Calls: <Anonymous> ... entrez_fetch -> do.call -> <Anonymous> -> entrez_check
Execution halted

Exited with status 1.

I turned these off with eval=FALSE for now.

cc @vpnagraj
see also #43
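
Since the error message mentions large single requests, one thing to try is fetching the IDs in smaller chunks. This is only a sketch, and it may not resolve a failure that appears solely at knit time; `chunk_ids` is a hypothetical helper and the IDs below are toy values standing in for `res$ids`.

```r
# Split a vector of IDs into chunks of at most `size` elements.
chunk_ids <- function(ids, size) {
  split(ids, ceiling(seq_along(ids) / size))
}

ids <- as.character(1:25)
chunks <- chunk_ids(ids, size = 10)
lengths(chunks)

# Each chunk could then be fetched separately (needs rentrez and network access):
# recs <- lapply(chunks, function(x) {
#   rentrez::entrez_fetch(db = "pubmed", id = x, rettype = "xml", parsed = TRUE)
# })
```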

rmarkdown: burn it all down

http://bioconnector.org/workshops/r-rmarkdown.html
https://github.com/bioconnector/workshops/blob/master/r-rmarkdown.Rmd

This whole lesson probably needs reworking. No one needs to know the gory details presented here. It should really be taught top-down instead of bottom-up: show people some RMarkdown in action first, then get into the details.

See also #49 - the reproducible research section is woefully out of date.

Some resources to look at:

On reproducibility:

Reproducibility and replicability is a glossy science now so watch out for the hype · Simply Statistics
https://simplystatistics.org/2017/03/02/rr-glossy/

Instead of research on reproducibility, just do reproducible research · Simply Statistics
https://simplystatistics.org/2015/12/11/instead-of-research-on-reproducibility-just-do-reproducible-research/

A Simple Explanation for the Replication Crisis in Science · Simply Statistics
https://simplystatistics.org/2016/08/24/replication-crisis/

Practical computational reproducibility in the life sciences | bioRxiv
https://www.biorxiv.org/content/early/2017/10/11/200683

Project-oriented workflow - Tidyverse
https://www.tidyverse.org/articles/2017/12/workflow-vs-script/

cc @vpnagraj
