vccri / ularcirc

An R-shiny app that provides backsplice and canonical splicing analysis for both circular RNA (circRNA) and parental transcripts

License: GNU General Public License v3.0

ularcirc's Introduction

Ularcirc

An R package that provides analysis and visualisation of canonical and backsplice junctions. Takes output provided by the STAR aligner as well as CIRI2 and circExplorer2 output and enables circRNA downstream analysis.

Author and maintainer: David Humphreys (d.humphreys at victorchang dot edu dot au)

Ularcirc manuscript now available through Nucleic Acids Research.

Installation

You can install Ularcirc using the 'devtools' package.

> install.packages("devtools")
> library(devtools)
> devtools::install_github("VCCRI/Ularcirc", build = TRUE, build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))

Ularcirc can annotate circRNA with overlapping gene information, which it obtains from available Bioconductor databases. Use the following commands to identify which databases to download:

> library("Ularcirc")
> all_dbs <- Compatible_Annotation_DBs() # This will return all compatible databases
> mmu_dbs <- Compatible_Annotation_DBs(search_term = 'mm10') # returns mm10 compatible databases
> # Let's see what is stored in mmu_dbs
> mmu_dbs

   annotation     genome                         txdb
16 "org.Mm.eg.db" "BSgenome.Mmusculus.UCSC.mm10" "TxDb.Mmusculus.UCSC.mm10.ensGene"
17 "org.Mm.eg.db" "BSgenome.Mmusculus.UCSC.mm10" "TxDb.Mmusculus.UCSC.mm10.knownGene"

> # Now let's download all of the above databases
> if (!requireNamespace("BiocManager", quietly = TRUE))
+     install.packages("BiocManager")   # Make sure R is looking at the Bioconductor repository
> BiocManager::install(c(mmu_dbs))

To start the Ularcirc Shiny app:

> library('Ularcirc')
> Ularcirc()

Documentation

Please refer to the vignette within R. Additionally, there are a number of screen casts that show how to get going with Ularcirc.

Screen casts

Please click this link to view a ~5 minute screen cast that walks through a simple circRNA analysis using Ularcirc.

The following link demonstrates how to upload and recover sequence information from BSJ and FSJ

Features

Friendly user interface
Circular RNA detection independent of gene annotation.
Provides visualisation of forward AND backsplice junctions
Recover predicted circRNA sequence
Recover sequence of backsplice junctions and forward splice junctions
Supports both single-read and paired-end sequencing (paired-end preferred).
Detect miRNA binding sites
Detect putative open reading frames of circRNA

ularcirc's Issues

Step 2 Not Working

STEP2: Load annotation databases: Ularcirc comes with one existing data set that has been aligned to hg38. While annotation is not required to identify circRNAs we recommend downloading the respective human annotation databases as follows:

if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")

\dontrun { 
  BiocManager::install(c("BSgenome.Hsapiens.UCSC.hg19",         # Genome; enables sequence analysis
                      "TxDb.Hsapiens.UCSC.hg19.knownGene",    # Transcript database
                      "org.Hs.eg.db"))                        # Annotation database
      }

When I run this, it says Error: unexpected symbol in "\dontrun"
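
The `\dontrun{}` wrapper is Rd documentation markup rather than R code, so pasting the vignette snippet verbatim into the console fails to parse. Below is a minimal sketch of the same step with the wrapper removed, assuming the three package names quoted above are the intended ones. The `interactive()` guard and the check for already-installed packages are additions, so that sourcing the snippet from a script does not trigger downloads.

```r
# Package names copied from the snippet above (hg19 builds, as quoted).
annotation_pkgs <- c("BSgenome.Hsapiens.UCSC.hg19",        # genome; enables sequence analysis
                     "TxDb.Hsapiens.UCSC.hg19.knownGene",  # transcript database
                     "org.Hs.eg.db")                       # annotation database

# Work out which of them are not installed yet.
installed <- vapply(annotation_pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- annotation_pkgs[!installed]

# Install only what is missing, and only from an interactive session.
if (interactive() && length(to_install) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(to_install)
}
```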

PDF Export Default Extension is CSV

Could there be a better default name, perhaps containing the gene name and ending in .pdf? With the current default file name, the file is opened by default with a spreadsheet program, which results in an error.

loadSTAR_chimeric Missing

Loading new files into the application terminates it with an error:

'loadSTAR_chimeric' is not an exported object from 'namespace:Ularcirc'

Searching the package's R code only finds a function call, not a definition, for loadSTAR_chimeric.

Name Prefix Must End with Period But Not Documented

STAR appends its suffixes directly to the end of the --outFileNamePrefix value without any delimiter. For example, if the prefix is Patient1, the file name of the splice junctions file will be Patient1SJ.out.tab. However, Ularcirc demands a period separating the prefix from the suffix, which is not what STAR does by default.

File must end with either .SJ.out.tab or .ReadsPerGene.out.tab or .Chimeric.out.junction
Please review the following files that failed to upload:
OC1hg38Chimeric.out.junction
OC1hg38SJ.out.tab

Can the period be optional since STAR doesn't use one to separate the prefix from the suffix? In the meantime, I'll work around it by using soft links.
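
Until the period is optional, the renaming can be scripted. The sketch below assumes the three suffixes listed in the error message above are the only ones checked; `add_prefix_period()` is a hypothetical helper, not part of Ularcirc. It inserts a period before the STAR suffix only when the prefix did not already end with one.

```r
# Insert a period before the STAR suffix when the prefix lacks one.
# The suffix list is taken from the upload error message quoted above.
add_prefix_period <- function(file_names) {
  pattern <- "(?<!\\.)(SJ\\.out\\.tab|ReadsPerGene\\.out\\.tab|Chimeric\\.out\\.junction)$"
  sub(pattern, ".\\1", file_names, perl = TRUE)
}

star_files <- c("OC1hg38Chimeric.out.junction",
                "OC1hg38SJ.out.tab",
                "Patient1.SJ.out.tab")   # already has the period; left unchanged
renamed <- add_prefix_period(star_files)
print(renamed)
```

For the files whose names changed, `file.symlink(from, to)` or `file.copy(from, to)` can then create the copies to upload.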

Large File Silently Added To User Home Directory

I was wondering why I got an automatic disk quota warning, and then I realised that clicking the Build Table button in the Gene View tab creates a large binary file in a hidden .cache folder in the user's home directory.

$ ls -lht $HOME/.cache/BiocFileCache/ | head -n 2
total 570M
-rw-r----- 1 dario stgrad 570M Jan 14 11:20 608926b2a025_608926b2a025

Can there be some instructions in the vignette about how to change this to a different folder?

Bug reports when using Ensembl ID based de novo assembly transcriptome for the species other than human

Hi,
Thanks so much for making such amazing software.

I ran into some problems when using my de novo assembly transcriptome with an Ensembl reference for a species other than human.

In the file Ularcirc/inst/shiny-app/circRNA/Server.R, on lines 269 and 1338, the regular expressions ^ENSG([-0-9]+) and ^ENSG([-0-9]+) are only valid for human.

The Ensembl ID format is ENS<species prefix><feature type prefix><a unique eleven-digit number>. For example, ENSMUSG00000017167 is the Ensembl stable ID of a mouse gene.
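
The format just described can be checked with a single base R regular expression. This is a sketch only: `is_ensembl_gene_id()` is a hypothetical helper, and it assumes the species prefix consists of uppercase letters (empty for human), that gene IDs use the feature type letter G, and that an optional ".version" suffix may follow the eleven digits.

```r
# TRUE for Ensembl gene stable IDs of any species, FALSE otherwise.
# Pattern: "ENS", optional species letters, "G", eleven digits, optional version.
is_ensembl_gene_id <- function(ids) {
  grepl("^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$", as.character(ids))
}

ids <- c("ENSG00000000003.14",   # human gene with version suffix
         "ENSMUSG00000017167",   # mouse gene
         "ENST00000373020.8",    # human transcript, not a gene
         "HIPK3")                # gene symbol
print(is_ensembl_gene_id(ids))
```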

Therefore, I changed the original Ensembl ID test to the following:

For line 269:

  test0 <- gsubfn::strapplyc(as.character(GeneName),pattern="^(ENS[[:alpha:]]*).*")
  test <- gsubfn::strapplyc(as.character(GeneName),pattern=paste0("^",test0[[1]],"([-0-9]+)"))
  if (length(test[[1]]) > 0)  # Ensembl ID

For line 1338:

 ensembl_IDs <- gsubfn::strapplyc(as.character(ensembl_IDs),"^ENS[0-9]+")

Also, when I tried to use my custom BSgenome and TxDb packages based on the de novo assembly transcriptome with an Ensembl reference, I got similar errors, because the original function “Gene_Transcript_Features” is based on ENTREZID but my circexplorer2 output is based on Ensembl IDs.

Therefore, I added an Ensembl ID test to the “Gene_Transcript_Features” function as below:

For line 450:

  test0 <- gsubfn::strapplyc(as.character(Gene_Symbol),pattern="^(ENS[[:alpha:]]*).*")
  test <- gsubfn::strapplyc(as.character(Gene_Symbol),pattern=paste0("^",test0[[1]],"([-0-9]+)"))
  if (length(test[[1]]) > 0)  # Ensembl ID
  {
    ensembl_gene <- paste(test0[[1]],test[[1]],sep="")
    a <- select(GeneList$Annotation_Library, keys = Gene_Symbol, columns=c("ENTREZID", "SYMBOL", "ENSEMBL"),keytype="ENSEMBL")
    if("EXONRANK"%in%keytypes(GeneList$transcript_reference)){
      b <- select(GeneList$transcript_reference, keys = a$ENSEMBL, columns=c('GENEID', 'TXNAME'),keytype="GENEID")
    }else{
      b <- select(GeneList$transcript_reference, keys = a$ENSEMBL, columns=c('GENEID', 'TXNAME', 'EXONRANK'),keytype="GENEID")
    }
  }
  else
  {
    a <- select(GeneList$Annotation_Library, keys = Gene_Symbol, columns=c("ENTREZID", "SYMBOL"),keytype="SYMBOL")
    if("EXONRANK"%in%keytypes(GeneList$transcript_reference)){
      b <- select(GeneList$transcript_reference, keys = a$ENTREZID, columns=c('GENEID', 'TXNAME'),keytype="GENEID")
    }else{
      b <- select(GeneList$transcript_reference, keys = a$ENTREZID, columns=c('GENEID', 'TXNAME', 'EXONRANK'),keytype="GENEID")
    }
  }

I forked this project and made the above changes in that branch. Could you please check my modifications?

https://github.com/wong-ziyi/Ularcirc

Thanks.

Essential Dependencies Not Installed Automatically

I installed the software from Bioconductor, but it took multiple tries to get the application to run. Firstly, I didn't have shinydashboard installed. Then, I reloaded it and found it also requires shinyjs. Eventually, after also installing moments and Organism.dplyr, it ran. Perhaps these should be in the Imports field of the DESCRIPTION file, so they are automatically installed when Ularcirc is installed.

Also, the README file contains an instruction which Bioconductor no longer supports and produces an error in contemporary versions of R.

source("http://bioconductor.org/biocLite.R")
Error: With R version 3.5 or greater, install Bioconductor packages using BiocManager

How to bring in custom annotations (BSgenome, TxDb)?

Hi,

thanks for this interesting tool. I am currently trying to get Ularcirc to run with some of my data.

Unfortunately, the reference genome used for the alignments doesn't match the UCSC chromosome naming conventions, so I thought of creating my own BSgenome and TxDb. I have already forged the BSgenome; the TxDb is yet to come.

For now, with the BSgenome loaded into the namespace, I tried to find it in the Shiny app under Setup configuration. My custom BSgenome was not listed. I imagine this is due to the missing TxDb (yet to be produced).

My question for you:
Is it yet possible to bring in custom genome + annotation and if so, how can I achieve that?

best,
-Michael

Cannot upload new data

Hi David
Thank you for this tool. I had a problem with the "upload new data" function.
I tried the "SRR444655.Chimeric.out.junction" file and it seems to work. However, when I upload my own Chimeric.out.junction file, the webpage just closes. I also tried the output file of CIRCexplorer2 and the webpage also closes. Can you help with this? Thanks a lot.

Here's the information from RStudio;
Loading data SRR444655.Chimeric.out.junction
Loading data 10872179
Loading data
Loading data C:\Users\DC\AppData\Local\Temp\RtmpWE73jR/20e277aa12d290a2a451f945/0.junction
Loading data circularRNA_know_sort_1.txt
Loading data 229190
Loading data text/plain
Loading data C:\Users\DC\AppData\Local\Temp\RtmpWE73jR/ba54d2d98a1aa49d2581e931/0.txt
Warning: Error in <-: replacement has length zero
77: eval [C:\Program Files\R\R-3.6.1\library\Ularcirc\shiny-app\circRNA/server.R#928]
76: eval
75: withProgress
74: LoadJunctionData [C:\Program Files\R\R-3.6.1\library\Ularcirc\shiny-app\circRNA/server.R#733]
73: observeEventHandler [C:\Program Files\R\R-3.6.1\library\Ularcirc\shiny-app\circRNA/server.R#1847]
2: shiny::runApp
1: Ularcirc

installation

Hi,

I am an 'entry-level' R user. I am having an issue with installing this package (please see the attached screenshot). Any guidance would be highly appreciated.
Thanks!

Genomic Information Not Retrieved for Any Junction

No matter which junction I click on in Gene_View, when I click on the Junction_View tab I get the pop-up error "Cannot retrieve genomic information for this gene". Below the load database button in Setup tab, I see Hsapiens.UCSC.hg38, so it seems the annotations have successfully been loaded.

Problem with BS junction sequences

Hi David,
First, thank you for your great tool. I'm wondering why the BS junction sequence does not match the acceptor / donor sequences from the circRNA sequence. This is also the case with your example project TwoSzabo. Do you have any explanation?
In your function Grab_BS_Junc_Sequence, I'm not sure what the following lines are for; they might be the cause of the problem:

    if (TranscriptStrand =="+")
    TranscriptStrand <- "-"
    else
    TranscriptStrand <- "+"

Thank you in advance,
Yuna

Chimeric File Crashes Ularcirc

To enable STAR's chimeric output files to work with STAR-Fusion, the option --chimOutJunctionFormat 1 is used. Loading such a file into Ularcirc causes the application to crash with a traceback:

Detected 21 column names but the data has 20 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
Warning: Error in : non-numeric argument to binary operator
  78: FilterChimeric [/dskh/nobackup/biostat/Bioconductor/Ularcirc/shiny-app/circRNA/Server.R#979]
  74: LoadJunctionData [/dskh/nobackup/biostat/Bioconductor/Ularcirc/shiny-app/circRNA/Server.R#733]
  73: observeEventHandler [/dskh/nobackup/biostat/Bioconductor/Ularcirc/shiny-app/circRNA/Server.R#1847]

I don't know if that's related to the crash, but it's the only difference to the default output I can think of.
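
The "21 column names but the data has 20 columns" warning is consistent with the file carrying a header row and comment lines that a plain tabular read does not expect. As a possible pre-processing step before upload, the sketch below strips such lines. It assumes that the header row starts with `chr_donorA`, that comment lines start with `#`, and that only the first 14 columns are needed downstream. All three are assumptions to be checked against the actual file, and `read_chimeric_junctions()` is a hypothetical helper, not part of Ularcirc.

```r
# Read a Chimeric.out.junction file, skipping a header row and "#" comment lines.
read_chimeric_junctions <- function(path, n_cols = 14) {
  lines <- readLines(path)
  is_comment <- startsWith(lines, "#")
  is_header  <- startsWith(lines, "chr_donorA")   # assumed first header field
  tab <- read.table(text = lines[!(is_comment | is_header)], sep = "\t",
                    header = FALSE, fill = TRUE, quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
  # Keep only the leading columns (n_cols is an assumption, see lead-in).
  tab[, seq_len(min(n_cols, ncol(tab))), drop = FALSE]
}

# Tiny made-up file to exercise the function (not real STAR output).
demo_file <- tempfile(fileext = ".Chimeric.out.junction")
data_row <- function(read_name) {
  paste(c("chr1", 100, "+", "chr1", 50, "+", 1, 0, 0, read_name,
          90, "10M", 40, "10M", 1, 20, 18, 20, 20, 0), collapse = "\t")
}
writeLines(c("chr_donorA\tbrkpt_donorA\tstrand_donorA",
             data_row("read1"), data_row("read2"),
             "# Nreads 2\tNreadsUnique 2\tNreadsMulti 0"), demo_file)

junctions <- read_chimeric_junctions(demo_file)
print(dim(junctions))
```

The cleaned table could then be written back with `write.table(..., sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)` and uploaded in place of the original file.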

Ularcirc crashing with STAR data and gene annotation

Hi
I am dealing with BSJ and FSJ data generated by STAR. When I tried to display the table annotated by genes under the "Gene_View" section, the program crashed. It happens with my data and also with the test data that comes with the software.

What am I doing wrong?

Thanks for your help

Annotate With Parental Gene Missing

I read the instruction "on the left hand panel select Annotate with parental gene" in the vignette but my view doesn't seem to have that option. If I click on Build Table, I get an error No Data Loaded, although I seem to have successfully loaded TwoSzabo. I can see two samples with checked check boxes in the Selected Sample section of the Project tab.

Result Table Column Explanations

The output table in Gene View tab has 22 columns. Could an explanation for each be in the vignette? For example, I see BSJ_vs_FSJ is always 0 and I wonder what that means. The last few columns are a mystery.

Couldn't visit Youtube

Dear author,
Ularcirc is very interesting; I have read through your paper and installed your software.
However, I ran into an issue: the ORGANISM part of the setup does not seem to work and reports Error: invalid 'pattern' argument
By the way, we cannot view the YouTube demonstrations from China, even though there are many circRNA researchers in China.

BS junction sequences: one-base shift

Hi David,
Thank you for the updated code for the BS junction sequence. I still have a minor problem when using CIRI output: because it uses bwa and not STAR, a shift of one base appears in the sequence. For instance, when using the default data resulting from CIRCexplorer, your tool gives this BS sequence for HIPK3:

Using CIRI output only, for the same circRNA your tool gives:

Best,
Yuna

Following Ularcirc tutorial gives error

I am trying to follow the tutorial in the vignette and getting an error when trying to perform step 3 (building the annotations). Here is the output from the R console:

These messages appear upon running Ularcirc():

> Ularcirc()

Listening on http://127.0.0.1:6085
Warning in min(Gene_Transcripts$Transcript$start) :
  no non-missing arguments to min; returning Inf
Warning in max(Gene_Transcripts$Transcript$stop) :
  no non-missing arguments to max; returning -Inf

This appears after navigating to the Gene-View tab (NOTE: the example project file TwoSzabo has been successfully loaded and I can see two files SRR1721284 and SRR1721290 in the selected samples):

Loading species transcriptome coordinates Fri Nov 16 16:54:30 2018
  Displaying list of  61050  genes built Fri Nov 16 16:54:30 2018
Warning in IdentifyDataSets() :
  No data loaded or selected so nothing to display
Please navigate back to PROJECT tab and load a data set

I select Annotate with parental gene: in the Gene-View tab and click Annotate. I get the following message and the app terminates:

Warning: Error in if: argument is of length zero
  73: observeEventHandler [/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Ularcirc/shiny-app/circRNA/server.R#1488]
   2: shiny::runApp
   1: Ularcirc

Gene Count Table Requirements Missing

Could there be a specification provided of what format a valid gene count table must have? I have gene counts from RSEM and I am wondering how to convert them into a format that Ularcirc requires.

$ head OC1.genes.results # Using GENCODE Genes
gene_id transcript_id(s)        length  effective_length        expected_count  TPM     FPKM
ENSG00000000003.14      ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4,ENST00000614008.4       2229.13 2077.88 2361.00 6.97    24.07
ENSG00000000005.6       ENST00000373031.5,ENST00000485971.1     1205.00 1053.75 4.00    0.02    0.08
ENSG00000000419.12      ENST00000371582.8,ENST00000371584.8,ENST00000371588.9,ENST00000413082.1,ENST00000466152.5,ENST00000494752.1     1078.13 926.88  583.00  3.86    13.33
ENSG00000000457.14      ENST00000367770.5,ENST00000367771.11,ENST00000367772.8,ENST00000423670.1,ENST00000470238.1      3750.92 3599.66 565.00  0.96    3.33
ENSG00000000460.17      ENST00000286031.10,ENST00000359326.9,ENST00000413811.3,ENST00000459772.5,ENST00000466580.6,ENST00000472795.5,ENST00000481744.5,ENST00000496973.5,ENST00000498289.5      2727.37 2576.12    94.00   0.22    0.77
ENSG00000000938.13      ENST00000374003.7,ENST00000374004.5,ENST00000374005.8,ENST00000399173.5,ENST00000457296.5,ENST00000468038.1,ENST00000475472.5   1925.87 1774.62 60.00   0.21    0.72
ENSG00000000971.15      ENST00000359637.2,ENST00000367429.8,ENST00000466229.5,ENST00000470918.1,ENST00000496761.1,ENST00000630130.2     3523.08 3371.83 1724.99 3.14    10.84
ENSG00000001036.13      ENST00000002165.10,ENST00000367585.1,ENST00000451668.1  2255.20 2103.94 693.00  2.02    6.98
ENSG00000001084.13      ENST00000504353.1,ENST00000504525.1,ENST00000505197.1,ENST00000505294.5,ENST00000509541.5,ENST00000510837.5,ENST00000513939.6,ENST00000514004.5,ENST00000514373.3,ENST00000514933.2,ENST00000515580.1,ENST00000616923.5,ENST00000643939.1,ENST00000650454.1        2066.31 1915.08 703.00  2.25    7.78

Also,

For full functionality at least one FSJ, one BSJ, and one gene count data set be loaded per sample.

What are other possible combinations of uploaded files, and which kinds of analysis can be done if those combinations are uploaded? What is the reduced functionality that this sentence hints at?

installation error (Failed to install 'Ularcirc' from GitHub)

Dear Sir or Madam,

I am trying to use Ularcirc to visualize the circRNAs I found. I followed the installation instructions, but the installation failed with this error:

package ‘yaml’ successfully unpacked and MD5 sums checked
Error: Failed to install 'Ularcirc' from GitHub:
(converted from warning) cannot remove prior installation of package ‘yaml’

Do you know how to resolve the issue?

thanks,

Shan
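
The "(converted from warning)" prefix indicates that the installer promoted a warning to an error. One possible workaround is sketched below. It assumes the installation goes through the remotes package (which `devtools::install_github` relies on) and that no other running R session has yaml loaded, so closing other sessions and restarting R first is advisable. The `interactive()` guard is an addition to keep scripts from installing anything.

```r
# Ask remotes not to turn installation warnings into errors.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Retry the installation from a fresh interactive session.
if (interactive()) {
  devtools::install_github("VCCRI/Ularcirc")
}
```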