The statistical utility for RBP functions (SURF)

License: GNU General Public License v3.0

SURF

The Statistical Utility for RBP Functions (SURF) is an integrative analysis framework to identify alternative splicing (AS), alternative transcription initiation (ATI), and alternative polyadenylation (APA) events regulated by individual RBPs and to elucidate the protein-RNA interactions governing these events. We used SURF to analyze data for 104 RBPs (K562 cells, available from ENCODE).

A detailed vignette is available here.

Installation

You can install the development version of surf from GitHub with:

# install.packages("devtools")
devtools::install_github("fchen365/surf")

What can you do with SURF?

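SURF is versatile in handling analyses centered on alternative transcriptional regulation (ATR) events, that is, the AS, ATI, and APA events introduced above. Depending on the data you provide, SURF can perform four different tasks.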

    Data                  Format               Task
1   genome annotation     any (gtf, gff, …)    parse ATR events
2   + RNA-seq             alignment (bam)      detect differential ATR events
3   + CLIP-seq            alignment (bam)      detect functional association
4   + external RNA-seq    summarized table     differential transcriptional activity

SURF Pipeline

— One task at one call

The four tasks of the SURF pipeline are streamlined. Once you have the data in hand (see the following sub-section), each step can be performed with a single function call:

library(surf)

event <- parseEvent(anno_file)                              # task 1
drr <- drseq(event, rna_seq_sample)                         # task 2
far <- faseq(drr, clip_seq_sample)                          # task 3
dar <- daseq(far, getRankings(exprMat), ext_sample)         # task 4

Here, anno_file, rna_seq_sample, clip_seq_sample, and ext_sample are data descriptions, and exprMat is a table of external transcriptome quantification (e.g., TCGA, GTEx, …).

— Tell surf about your data

Describing your data should be easy. Simply follow the example below.

For task 1, the path to the annotation file will do.

anno_file <- "gencode.v24.annotation.filtered.gtf"

For task 2, surf needs to know where the alignment files (bam) are and the experimental condition for differential analysis (e.g., RBP “knock-down” and “wild-type” control).

rna_seq_sample <- data.frame(
  row.names = c('sample1', 'sample2', 'sample3', 'sample4'),
  bam = paste0("rna-seq/bam/sample", 1:4, ".bam"),
  condition = c('knock-down', 'knock-down', 'wild-type', 'wild-type'),
  stringsAsFactors = FALSE
)
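
Before passing such a table to drseq(), it can help to sanity-check it. The helper below is only a sketch and is not part of surf: check_sample_table() is a hypothetical name, and the requirement of exactly two conditions is an assumption taken from the knock-down versus wild-type example above.

```r
# Hypothetical helper (NOT part of surf): basic sanity checks on a sample table.
check_sample_table <- function(sample_data, require_files = FALSE) {
  stopifnot(is.data.frame(sample_data))

  # The examples in this README use a 'bam' and a 'condition' column.
  missing_cols <- setdiff(c("bam", "condition"), colnames(sample_data))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  bam <- as.character(sample_data$bam)
  if (anyDuplicated(bam) > 0) {
    stop("duplicated bam paths")
  }

  # Assumption: a differential comparison involves exactly two conditions.
  n_cond <- length(unique(as.character(sample_data$condition)))
  if (n_cond != 2) {
    stop("expected 2 conditions, found ", n_cond)
  }

  # Optionally verify that every alignment file exists on disk.
  if (require_files) {
    absent <- bam[!file.exists(bam)]
    if (length(absent) > 0) {
      stop("bam file(s) not found: ", paste(absent, collapse = ", "))
    }
  }
  invisible(TRUE)
}

rna_seq_sample <- data.frame(
  row.names = c('sample1', 'sample2', 'sample3', 'sample4'),
  bam = paste0("rna-seq/bam/sample", 1:4, ".bam"),
  condition = c('knock-down', 'knock-down', 'wild-type', 'wild-type'),
  stringsAsFactors = FALSE
)
check_sample_table(rna_seq_sample)
```

Set require_files = TRUE once the bam files are in place to catch path typos before a long run starts.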

Similarly for task 3, surf needs to know where the alignment files (bam) are and the experimental condition (e.g., “IP” and the input control “SMI”).

clip_seq_sample <- data.frame(
  row.names = c('sample5', 'sample6', 'sample7'),
  bam = paste0('clip-seq/bam/', 5:7, '.bam'),
  condition = c('IP', 'IP', 'SMI'),
  stringsAsFactors = FALSE
)

Finally, for task 4, surf assumes that you have transcriptome quantification summarized in a table exprMat, whose rows correspond to genomic features (e.g., genes, transcripts, …) and whose columns correspond to samples. You can use any measure you prefer (e.g., TPM, RPKM, …). Then tell surf which group (condition) each sample belongs to:

ext_sample <- data.frame(
  row.names = colnames(exprMat),
  condition = rep(c('TCGA', 'GTEx'), c(173, 337))
)
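
Note that the rep() call above assumes the columns of exprMat are ordered by group, with all TCGA samples first. To make the expected shapes concrete, here is a toy sketch with made-up numbers; the feature names, sample names, and group sizes are illustrative assumptions, not real TCGA or GTEx data.

```r
# Toy data only: 3 features x 5 samples of made-up expression values.
set.seed(1)
exprMat <- matrix(
  rexp(15),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:5))
)

# Group sizes (illustrative) must add up to the number of columns of exprMat,
# and the columns must already be ordered group by group.
group_size <- c(TCGA = 2, GTEx = 3)
stopifnot(sum(group_size) == ncol(exprMat))

ext_sample <- data.frame(
  row.names = colnames(exprMat),
  condition = rep(names(group_size), group_size)
)
```

Checking the group sizes against ncol(exprMat) up front avoids a less obvious error from data.frame() when the lengths disagree.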

Reference

Chen, F., Keleş, S. SURF: integrative analysis of a compendium of RNA-seq and CLIP-seq datasets highlights complex governing of alternative transcriptional regulation by RNA-binding proteins. Genome Biol 21, 139 (2020). doi:10.1186/s13059-020-02039-7

surf's Issues

parseEvent not working

Hi,

I get an error when parsing the events from the annotation. I tested using the same input as you did, but I still get this error.
Here is my code:

library(rtracklayer)
library(usethis)
library(surf)

anno_file <- "/Data/gencode.v32.primary_assembly.annotation.gtf"
anno_hs <- import(anno_file)
gene_id <- anno_hs[seqnames(anno_hs) == "chr16" &
                   anno_hs$gene_type == "protein_coding" &
                   anno_hs$type == "gene"]$gene_id
gene_id_sampled <- sample(unique(gene_id), 24)
anno_hs_select <- anno_hs[anno_hs$gene_id %in% gene_id_sampled]

export(anno_hs_select, "/Data/gencode.v32.primary.example.gtf")

event <- parseEvent("/Data/gencode.v32.primary.example.gtf")

Error in initialize(value, ...) :
'initialize' method returned an object of class “DFrame” instead of the required class “surf”

The issue seems to come from

res <- new("surf",
           anno_event,
           genePartsList = genePartsList,
           sampleData = DataFrameList())

in the annotateEvent() function.

Thank you in advance for your help.

Best,

Joao Lourenco

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] usethis_2.1.6 rtracklayer_1.56.1 surf_0.99.1 DEXSeq_1.42.0 RColorBrewer_1.1-3
[6] AnnotationDbi_1.58.0 DESeq2_1.36.0 SummarizedExperiment_1.26.1 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[11] IRanges_2.30.1 S4Vectors_0.34.0 MatrixGenerics_1.8.1 matrixStats_0.62.0 Biobase_2.56.0
[16] BiocGenerics_0.42.0 BiocParallel_1.30.3 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2

loaded via a namespace (and not attached):
[1] fs_1.5.2 bitops_1.0-7 bit64_4.0.5 filelock_1.0.2 progress_1.2.2
[6] httr_1.4.4 tools_4.2.0 doRNG_1.8.2 utf8_1.2.2 R6_2.5.1
[11] DBI_1.1.3 colorspace_2.0-3 tidyselect_1.1.2 prettyunits_1.1.1 bit_4.0.4
[16] curl_4.3.2 compiler_4.2.0 cli_3.4.1 xml2_1.3.3 DelayedArray_0.22.0
[21] scales_1.2.1 genefilter_1.78.0 rappdirs_0.3.3 stringr_1.4.1 digest_0.6.29
[26] Rsamtools_2.12.0 XVector_0.36.0 pkgconfig_2.0.3 dbplyr_2.2.1 fastmap_1.1.0
[31] rlang_1.0.6 rstudioapi_0.14 RSQLite_2.2.18 BiocIO_1.6.0 generics_0.1.3
[36] hwriter_1.3.2.1 dplyr_1.0.10 RCurl_1.98-1.9 magrittr_2.0.3 GenomeInfoDbData_1.2.8
[41] Matrix_1.5-1 Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.2
[46] yaml_2.3.5 stringi_1.7.8 zlibbioc_1.42.0 BiocFileCache_2.4.0 grid_4.2.0
[51] blob_1.2.3 crayon_1.5.2 lattice_0.20-45 Biostrings_2.64.1 splines_4.2.0
[56] annotate_1.74.0 hms_1.1.2 KEGGREST_1.36.3 locfit_1.5-9.6 knitr_1.40
[61] pillar_1.8.1 rjson_0.2.21 rngtools_1.5.2 geneplotter_1.74.0 codetools_0.2-18
[66] biomaRt_2.52.0 XML_3.99-0.11 glue_1.6.2 png_0.1-7 vctrs_0.4.2
[71] gtable_0.3.1 purrr_0.3.4 tidyr_1.2.1 assertthat_0.2.1 cachem_1.0.6
[76] ggplot2_3.3.6 xfun_0.33 xtable_1.8-4 restfulr_0.0.15 survival_3.3-1
[81] tibble_3.1.8 GenomicAlignments_1.32.1 memoise_2.0.1 statmod_1.4.37 ellipsis_0.3.2
