ropensci / ucscxenatools

An R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq: https://cran.r-project.org/web/packages/UCSCXenaTools/

Home Page: https://docs.ropensci.org/UCSCXenaTools

License: GNU General Public License v3.0

Languages: R 86.59%, XQuery 11.14%, TeX 2.27%
Topics: ucsc-xena, downloader, api-client, tcga, ccle, icgc, ucsc, toil, treehouse, r

ucscxenatools's People

Contributors: shixiangwang

ucscxenatools's Issues

RemoveGermlineCNV does not remove germline CNVs?

Dear author, I have recently been using the package to download copy number variation data and have a question. I ran:

AA=getTCGAdata(project = 'OV',GisticCopyNumber = TRUE, Gistic2Threshold = FALSE,
               download = TRUE, RemoveGermlineCNV = FALSE)

AA=getTCGAdata(project = 'OV',GisticCopyNumber = TRUE, Gistic2Threshold = FALSE,
               download = TRUE, RemoveGermlineCNV = TRUE)

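The two downloads are identical, even though RemoveGermlineCNV is FALSE in one call and TRUE in the other. Is the data inherently the same, or is something else going on? Please let me know; this question is very important to me. Thank you very much!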
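One way to confirm that the two calls really produced the same data is to compare the downloaded files byte for byte. The Python sketch below hashes two files with SHA-256. The paths you pass in are placeholders for wherever each getTCGAdata() call saved its file, and `file_sha256` / `same_file_contents` are hypothetical helpers, not part of UCSCXenaTools.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if the two files have the same SHA-256 digest (i.e. identical bytes)."""
    return file_sha256(Path(path_a)) == file_sha256(Path(path_b))
```

Identical digests show that the files are the same. They do not, by themselves, explain why the option had no effect.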

Missing "OS", "OS.time", "OS.unit", "RFS", "RFS.time", "RFS.unit" columns in the clinical information file downloaded from TCGA

Hello,

I'm following this tutorial "TCGA Pan-cancer data download" (https://xsliulab.github.io/tumor-immunogenicity-score/#data-download-and-preprocessing) to download and clean TCGA clinical data.
However, columns like "OS", "OS.time", "OS.unit", "RFS", "RFS.time" and "RFS.unit" are expected but absent from the clinical information files.
Is this due to an update of the UCSCXenaTools package?

Best,
Danshu
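To see exactly which of the expected survival columns are missing from a downloaded file, it is enough to read the header row. The following is a minimal Python sketch that assumes a tab-separated file with a single header row. `missing_columns` is a hypothetical helper and is not part of UCSCXenaTools.

```python
import csv
import io

# Survival columns the tutorial expects (taken from the issue text above).
EXPECTED_SURVIVAL_COLUMNS = ("OS", "OS.time", "OS.unit", "RFS", "RFS.time", "RFS.unit")


def missing_columns(tsv_text, expected=EXPECTED_SURVIVAL_COLUMNS):
    """Return the expected column names that are absent from the TSV header row."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    present = set(header)
    return [name for name in expected if name not in present]
```

For a real file, pass its text, e.g. `with open("clinical.tsv") as f: print(missing_columns(f.read()))`, where the file name is a placeholder.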

issue: couldn't download pancanAtlas data

Hi authors,
I tried to download the Pan-Cancer Atlas dataset through UCSCXenaTools, but it failed. My code is pasted below. I also tried opening the URL shown in the output, but it doesn't return the proper data. Could you help me with it? Thank you!

> pcA_cohort = XenaData %>% 
+     filter(XenaHostNames == "pancanAtlasHub") # select pancanAtlas Hub
> cli_query = pcA_cohort %>% 
+     filter(DataSubtype == "gene expression RNAseq") %>%  # select RNAseq data
+     XenaGenerate() %>%  # generate a XenaHub object
+     XenaQuery() %>% 
+     XenaDownload()
This will check url status, please be patient.
All downloaded files will under directory /var/folders/k2/zhwq4hld003_vbl84g1qvxcr0000gn/T//RtmpAjrRSW.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
Downloading EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
Can not find fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz, this file maybe not compressed.
Try downloading fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena...
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
Your network is bad (try again) or the data source is invalid (report to the developer).
Warning messages:
1: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
2: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
3: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
4: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
5: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
6: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
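Each URL in the log above is retried three times, yet every attempt returns 403 Forbidden from the S3 redirect target. That points to a server-side access problem rather than an unreliable network. The Python sketch below mirrors the ".gz first, then uncompressed" fallback visible in the log. Unlike the log's behavior, it stops retrying a URL after a 4xx response and reports the status codes. `download_with_fallback` and the injected `fetch` callable are hypothetical and are not how UCSCXenaTools is implemented.

```python
import urllib.error


def candidate_urls(url):
    """Yield the URL as given, then without a trailing '.gz' (as in the log above)."""
    yield url
    if url.endswith(".gz"):
        yield url[: -len(".gz")]


def download_with_fallback(url, fetch, attempts=3):
    """Try each candidate URL up to `attempts` times using `fetch(url) -> bytes`.

    HTTP 4xx errors (e.g. 403 Forbidden, 404 Not Found) are treated as permanent,
    so the remaining retries for that URL are skipped. Returns (url, data) on
    success; raises RuntimeError listing every failure otherwise.
    """
    failures = []
    for candidate in candidate_urls(url):
        for attempt in range(1, attempts + 1):
            try:
                return candidate, fetch(candidate)
            except urllib.error.HTTPError as err:  # must precede OSError (it is a subclass)
                failures.append(f"{candidate} (attempt {attempt}): HTTP {err.code}")
                if 400 <= err.code < 500:
                    break  # client error: retrying the same URL will not help
            except OSError as err:  # other network errors: retry
                failures.append(f"{candidate} (attempt {attempt}): {err}")
    raise RuntimeError("all downloads failed:\n" + "\n".join(failures))
```

A real `fetch` could be `lambda u: urllib.request.urlopen(u).read()`, since `urlopen` raises `HTTPError` for a 403 response.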

more detail in NEWS.md?

hi, thanks for including changes in your NEWS file https://github.com/ropensci/UCSCXenaTools/blob/master/NEWS.md

I wonder if you could include some details of what was done in each release? For example, instead of just

* #14 fixed

Include some details of what was done so users can quickly get a sense of the changes that were made, for example:

* fixed wrong url in the vignette (#14)

Having the issue number in parentheses links it to the issue on GitHub.

API function for querying single gene or sample does not work

Using .p_dataset_probe_values and .p_dataset_gene_probe_avg as examples:

library(UCSCXenaTools)
hub = "https://pancanatlas.xenahubs.net"
dataset = "EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena"
samples = c("TCGA-02-0047-01","TCGA-02-0055-01")
probes =c("TP53", "RB1")

Works:

> .p_dataset_probe_values(hub, dataset, samples, probes)
[[1]]
  strand chromend chromstart chrom
1      + 49056122   48877911 chr13
2      -  7590868    7565097 chr17

[[2]]
      [,1]  [,2]
[1,] 10.84  9.96
[2,] 11.22 10.15

> .p_dataset_gene_probe_avg(hub, dataset, samples, probes) 
  gene                     position       scores
1 TP53   -, 7590868, 7565097, chr17  10.84, 9.96
2  RB1 +, 49056122, 48877911, chr13 11.22, 10.15

Does not work for a single sample:

> .p_dataset_probe_values(hub, dataset, "TCGA-02-0055-01", probes)
[[1]]
  strand chromend chromstart chrom
1      + 49056122   48877911 chr13
2      -  7590868    7565097 chr17

[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN
[2,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN

  gene                     position                                                                    scores
1 TP53   -, 7590868, 7565097, chr17 NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
2  RB1 +, 49056122, 48877911, chr13 NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN

Does not work for a single probe (e.g. a gene):

> .p_dataset_probe_values(hub, dataset, samples, "TP53")
 Error in UCSCXenaTools:::.xena_post(host, UCSCXenaTools:::.call(xquery,  : 
  Internal Server Error (HTTP 500). 
> .p_dataset_gene_probe_avg(hub, dataset, samples, "TP53") 
 Error in UCSCXenaTools:::.xena_post(host, UCSCXenaTools:::.call(xquery,  : 
  Internal Server Error (HTTP 500). 

Interestingly, .p_dataset_gene_probes_values works for a single gene, but not for a single sample:

> .p_dataset_gene_probes_values(hub, dataset, samples, "TP53")
[[1]]
[[1]]$position
  strand chromend chromstart chrom
1      -  7590868    7565097 chr17

[[1]]$name
[1] "TP53"


[[2]]
      [,1] [,2]
[1,] 10.84 9.96

> .p_dataset_gene_probes_values(hub, dataset, "TCGA-02-0047-01", "TP53")
[[1]]
[[1]]$position
  strand chromend chromstart chrom
1      -  7590868    7565097 chr17

[[1]]$name
[1] "TP53"


[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN
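Here is one plausible reading of the NaN output. Both single sample IDs are 15 characters long, and each result has exactly 15 columns. That is what you would get if a lone string were sent where an array was expected and the server iterated over its characters. The HTTP 500 for a single probe may have the same cause. This is an inference from the output, not a confirmed diagnosis. In R a scalar is just a length-1 vector, so code that formats arrays has to handle length 1 explicitly. The Python sketch below shows the pitfall and the usual fix of wrapping a lone value in a list before rendering an array literal. `as_array` and `array_literal` are hypothetical helpers, not UCSCXenaTools or Xena functions.

```python
import json


def as_array(value):
    """Wrap a lone string in a list; turn any other iterable into a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


def array_literal(values):
    """Render identifiers as a JSON array, always with brackets."""
    return json.dumps(as_array(values))


sample = "TCGA-02-0055-01"
# Iterating a bare string yields its characters: 15 "samples" instead of 1.
print(len(list(sample)))      # 15
print(len(as_array(sample)))  # 1
print(array_literal(sample))  # ["TCGA-02-0055-01"]
```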

treehouse update fails

Hi Shixiang,

Thank you for developing the package! It is very easy to use. However, updating the Treehouse datasets fails; see below.
Thanks!

> packageVersion("UCSCXenaTools")
[1] ‘1.3.1’
> XenaDataUpdate()
=> Obtaining info from UCSC Xena hubs...
==> Searching cohorts for host https://ucscpublic.xenahubs.net...
==> Trying #1
===> #37 cohorts found.
===> Querying datasets info...
===> #114 datasets found.
...
==> Searching cohorts for host https://xena.treehouse.gi.ucsc.edu...
==> Trying #1
==> Trying #2
==> Trying #3
Error in value[[3L]](cond) : 
  Tried 3 times but failed, please check URL or your internet connection!
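In the log, XenaDataUpdate() aborts with an error once the Treehouse hub cannot be reached, so one unreachable hub blocks the update for the rest. The Python sketch below shows an alternative, assuming each hub can be queried independently. It retries each hub a few times, records any hub that still fails, and moves on. `update_all_hubs` and the `query` callable are hypothetical; this is not how the R package works.

```python
import time


def update_all_hubs(hubs, query, attempts=3, wait=0.0):
    """Query every hub in `hubs` ({name: url}), retrying each up to `attempts` times.

    `query(hub_url)` is a caller-supplied function returning the hub's
    cohort/dataset info. A hub that still fails is recorded in `failed`
    instead of aborting the whole update, so results from other hubs are kept.
    """
    results, failed = {}, {}
    for name, url in hubs.items():
        for attempt in range(1, attempts + 1):
            try:
                results[name] = query(url)
                break
            except OSError as err:  # network-level errors (URLError, timeouts, ...)
                if attempt == attempts:
                    failed[name] = f"tried {attempts} times: {err}"
                else:
                    time.sleep(wait)  # optional pause before the next attempt
    return results, failed
```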

downloadTCGA download error

downloadTCGA(project = "OV", data_type = "Phenotype", file_type = "Clinical Information", destdir = tempdir())
This will check url status, please be patient.
Error: Evaluation error: An unknown option was passed in to libcurl.

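Hi Shixiang, I ran into the problem above when using UCSCXenaTools and cannot download the data. A small follow-up question: what is the difference between the GDC TCGA and the TCGA data, and why does Xena host this data twice? Thanks a lot!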

--- Zhiqiang, Institute of Computing Technology

Error downloading CCLE datasets from publicHub

Hi,

I'm trying to download the CCLE files, but I get a missing-file error:

cannot open URL 'https://ucscpublic.xenahubs.net/download/ccle/CCLE_copynumber_2013-12-03.seg.txt': HTTP status was '404 Not Found'

The code I use:

mysets <- XenaGenerate(subset = XenaHostNames=="publicHub") %>%
    XenaFilter(filterCohorts = "CCLE")
XenaQuery(mysets) %>%
    XenaDownload() -> ccle_download

If I try the same with MAGIC datasets, it works fine.

Explanation of terms

First of all thank you for developing this package. I am new to clinical analysis and this package and the vignettes / examples were a good start.

Regarding the survival analysis vignette, I have been trying to find a resource that explains how the variable names in the clinical data table map to those used by the studies / Xena. For example, the term OS.time doesn't show up when I search either the Xena portal or the Pan-Cancer Atlas. I am assuming it means time to remission, but that is just a guess. My question: is there a metadata table that explains what OS.time (and other terms) mean?

cheers,
António

Adopt R GitHub Actions

From https://github.com/thomasp85/patchwork/blob/master/.github/workflows/R-CMD-check.yaml

on: [push, pull_request]

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
        - { os: windows-latest, r: '3.6', args: "--no-manual"}
        - { os: macOS-latest, r: '3.6'}
        - { os: macOS-latest, r: 'devel', args: "--no-manual"}
        - { os: ubuntu-16.04, r: '3.2', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.3', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.4', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.5', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.6', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }

    env:
      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
      CRAN: ${{ matrix.config.cran }}

    steps:
      - uses: actions/checkout@v1

      - uses: r-lib/actions/setup-r@master
        with:
          r-version: ${{ matrix.config.r }}

      - uses: r-lib/actions/setup-pandoc@master

      - uses: r-lib/actions/setup-tinytex@master
        if: contains(matrix.config.args, 'no-manual') == false

      - name: Cache R packages
        uses: actions/cache@v1
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-r-${{ matrix.config.r }}-${{ hashFiles('DESCRIPTION') }}

      - name: Install system dependencies
        if: runner.os == 'Linux'
        env:
          RHUB_PLATFORM: linux-x86_64-ubuntu-gcc
        run: |
          Rscript -e "install.packages('remotes')" -e "remotes::install_github('r-hub/sysreqs')"
          sysreqs=$(Rscript -e "cat(sysreqs::sysreq_commands('DESCRIPTION'))")
          sudo -s eval "$sysreqs"
      - name: Install dependencies
        run: Rscript -e "install.packages('remotes')" -e "remotes::install_deps(dependencies = TRUE)" -e "remotes::install_cran('rcmdcheck')"

      - name: Check
        run: Rscript -e "rcmdcheck::rcmdcheck(args = '${{ matrix.config.args }}', error_on = 'warning', check_dir = 'check')"

      - name: Upload check results
        if: failure()
        uses: actions/upload-artifact@master
        with:
          name: ${{ runner.os }}-r${{ matrix.config.r }}-results
          path: check

      - name: Test coverage
        if: matrix.config.os == 'macOS-latest' && matrix.config.r == '3.6'
        run: |
          Rscript -e 'remotes::install_github("r-lib/covr@gh-actions")'
          Rscript -e 'covr::codecov(token = "${{secrets.CODECOV_TOKEN}}")'

bad option check in fetch_dense_values

> host = "https://toil.xenahubs.net"
> dataset = "tcga_RSEM_gene_tpm"
> samples = c("TCGA-02-0047-01","TCGA-02-0055-01","TCGA-02-2483-01","TCGA-02-2485-01")
> probes = c('ENSG00000282740.1', 'ENSG00000000005.5', 'ENSG00000000419.12')
> genes =c("TP53", "RB1", "PIK3CA")
> fetch_dense_values(host, dataset, genes, samples, check = TRUE, use_probeMap = TRUE)
Checking identifiers...
The following identifiers have been removed fro host https://toil.xenahubs.net dataset tcga_RSEM_gene_tpm
[1] NA NA NA
Done.
Checking samples...
Done.
Checking if the dataset has probeMap...
Done. ProbeMap is found.
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent
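The log suggests that, with `use_probeMap = TRUE`, all three gene symbols were flagged as removed (printed as NA), and building the result matrix then failed because the dimensions no longer matched. A likely cause is that the symbols were checked against the dataset's probe IDs instead of the genes in the probeMap. That is an interpretation of the output, not a confirmed diagnosis. The Python sketch below chooses which namespace to validate against from the same flag and fails early when nothing survives. `check_identifiers` and its arguments are hypothetical, and the probe-to-gene dict stands in for a real probeMap.

```python
def check_identifiers(identifiers, probe_ids, probe_map, use_probe_map):
    """Split identifiers into (kept, removed).

    With use_probe_map=True the identifiers are gene symbols, so they are checked
    against the gene values of the probe map; otherwise they are checked against
    the dataset's probe IDs. probe_map maps probe ID -> gene symbol.
    """
    valid = set(probe_map.values()) if use_probe_map else set(probe_ids)
    kept = [x for x in identifiers if x in valid]
    removed = [x for x in identifiers if x not in valid]
    if not kept:
        # Fail early with a clear message instead of building an empty matrix.
        raise ValueError("none of the identifiers were found in the dataset")
    return kept, removed
```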

Function naming strategy

Comments from ropensci/software-review#315

  • Argument naming is not consistent - for example, the fetch_ API functions are in snake_case but a number of other functions are camel-cased with the first letter capitalized (i.e. XenaScan) or regular camel-cased (getTCGAdata). It would be helpful to adopt a similar casing style even if snake_case doesn't work because of consistency with other tools for these data sets.

This issue should be addressed in the next backward-incompatible release.
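If the package does move to a single naming style, the mechanical part of the rename can be scripted. The Python sketch below derives snake_case names from CamelCase names. The resulting names are illustrative, not an agreed API. `OVERRIDES` is a hypothetical table for names such as getTCGAdata, where an acronym runs straight into a lowercase word and no regex can find the boundary.

```python
import re

# Hypothetical hand-written overrides for names a regex cannot split correctly.
OVERRIDES = {"getTCGAdata": "get_tcga_data"}


def to_snake_case(name):
    """Convert CamelCase / camelCase names to snake_case."""
    if name in OVERRIDES:
        return OVERRIDES[name]
    # Insert "_" before a capital that follows a lowercase letter or digit: "XenaScan" -> "Xena_Scan".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split an acronym from a following capitalised word: "XMLParser" -> "XML_Parser".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()
```

With these rules, `XenaScan` becomes `xena_scan` and `downloadTCGA` becomes `download_tcga`, while names that are already snake_case, such as `fetch_dense_values`, pass through unchanged.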

Add ProbeMap download

Some datasets have probe maps used for converting between various IDs; this could be supported in XenaQuery().
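A probeMap is essentially a table that maps each probe identifier in a dataset to a gene, usually along with a genomic position. The Python sketch below uses such a table to convert probe IDs into gene symbols. It assumes a tab-separated file with a header row whose first two columns are the probe ID and the gene. That layout, like the helper names `read_probe_map` and `probes_to_genes`, is an assumption for illustration and is not taken from a specific Xena file.

```python
import csv
import io


def read_probe_map(tsv_text):
    """Parse a probeMap-style TSV into a dict of probe ID -> gene symbol.

    Assumes a header row and that the first two columns are probe ID and gene.
    """
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows)  # skip the header row
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def probes_to_genes(probe_ids, probe_map):
    """Translate probe IDs to gene symbols; unknown probes map to None."""
    return [probe_map.get(probe) for probe in probe_ids]
```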

New feature: XenaExperiment?

This comes from the to-do list of xenaR; maybe I can implement it.

XenaExperiment() to represent a collection of datasets from XenaHub(), subset to contain specific samples and features.

Basic data retrieval of all or part of the assays present in a XenaExperiment.
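The Python sketch below, built on plain dicts, shows one way to read "subset to contain specific samples and features". The class and method names are hypothetical, and a real XenaExperiment, presumably in R, could be designed quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class XenaExperiment:
    """Hypothetical container for several assays that share sample IDs.

    Each assay is stored as {feature: {sample: value}}.
    """
    assays: dict = field(default_factory=dict)

    def subset(self, samples=None, features=None):
        """Return a new experiment restricted to the given samples and/or features.

        None means "keep everything" along that dimension.
        """
        keep_s = None if samples is None else set(samples)
        keep_f = None if features is None else set(features)
        out = {}
        for name, table in self.assays.items():
            out[name] = {
                feat: {s: v for s, v in row.items() if keep_s is None or s in keep_s}
                for feat, row in table.items()
                if keep_f is None or feat in keep_f
            }
        return XenaExperiment(out)
```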

Public API functions exposed by xenaPython

from . import xenaQuery as xena

def Gene_values (hub, dataset, samples, gene):
    values = xena.dataset_gene_values (hub, dataset, samples, [gene])
    return values[0]["scores"][0]

def Genes_values (hub, dataset, samples, genes):
    values = [x["scores"][0] for x in xena.dataset_gene_values (hub, dataset, samples, genes)]
    return values

def Probe_values (hub, dataset, samples, probe):
    values = xena.dataset_probe_values (hub, dataset, samples, [probe])
    return values[0]

def Probes_values (hub, dataset, samples, probes):
    values = xena.dataset_probe_values (hub, dataset, samples, probes)
    return values

def dataset_samples (hub,dataset):
    return xena.dataset_samples(hub, dataset)

def dataset_fields (hub, dataset):
    return xena.dataset_field (hub, dataset)

def all_cohorts(hub):
    return xena.all_cohorts(hub)

CRAN checks

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_UCSCXenaTools.html.

Please correct before 2021-07-24 to safely retain your package on CRAN.

It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'

This needs correction whether or not the resource recovers.

The CRAN Team

Remove warning

In dir.create(i, recursive = TRUE) : 'data/Xena' already exists
Error: Unable to establish connection with R session
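The warning comes from creating a directory that already exists. In R, this is usually avoided by checking `dir.exists()` first or by passing `showWarnings = FALSE` to `dir.create()`. The closest Python equivalent is `os.makedirs(..., exist_ok=True)`. The short Python sketch below shows the idempotent version; `ensure_dirs` is a hypothetical helper.

```python
import os


def ensure_dirs(paths):
    """Create each directory (and its parents) unless it already exists.

    exist_ok=True makes repeated calls silent instead of raising FileExistsError.
    Returns whether each path is now a directory.
    """
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return [os.path.isdir(p) for p in paths]
```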

Cannot access the GDC dataset

> XenaGenerate(subset = XenaHostNames=="gdcHub") %>% 
+   XenaFilter(filterDatasets = "methylation|phenotype") %>% 
+   XenaFilter(filterDatasets = "UCS") -> df_todo
> XenaQuery(df_todo) %>%
+   XenaDownload() -> xe_download
This will check url status, please be patient.
All downloaded files will under directory /tmp/RtmpciLecI.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
'/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices' already exists
'/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices' already exists
/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices/TCGA-UCS.GDC_phenotype.tsv.gz, the file has been download!
/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices/TCGA-UCS.methylation450.tsv.gz, the file has been download!


Interface for fetching sparse data

.p_sparse_data("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502",
               samples = list("HCE4_OESOPHAGUS", "NCIH2818_PLEURA"), genes = list("TP53"))
.p_sparse_data_examples("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502", 2)
UCSCXenaTools::fetch_dataset_samples("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502")

Write a fetch_sparse_value function
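Sparse datasets such as the CCLE MAF above store one record per event, for example one mutation per sample and gene, instead of a dense sample-by-gene matrix. The Python sketch below shows what a `fetch_sparse_value`-style filter might do after the records are retrieved. It keeps the rows for the requested samples and genes and reports which requested samples had no matching rows. The record fields (`sample`, `gene`) and the function itself are hypothetical; the actual Xena sparse query format is not shown on this page.

```python
def fetch_sparse_value(records, samples, genes):
    """Filter sparse (one row per event) records by sample and gene.

    Returns (rows, samples_without_rows). A sample with no rows may simply have
    no recorded event for these genes, provided it is part of the dataset.
    """
    want_s, want_g = set(samples), set(genes)
    rows = [r for r in records if r["sample"] in want_s and r["gene"] in want_g]
    seen = {r["sample"] for r in rows}
    return rows, [s for s in samples if s not in seen]
```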

Cannot query copy number data

This works in xenaPython

hub = "https://tcga.xenahubs.net" 
dataset = "TCGA.PANCAN.sampleMap/Gistic2_CopyNumber_Gistic2_all_data_by_genes"   
samples = ["TCGA-02-0047-01","TCGA-02-0055-01","TCGA-02-2483-01","TCGA-02-2485-01"] 
In [15]: xena.dataset_fetch(hub, dataset, samples, ["TP53"])                                                                         
Out[15]: [[-0.012, -0.323, -0.033, -0.025]]

In [16]: xena.dataset_probe_values(hub, dataset, samples, ["TP53"])                                                                  
Out[16]: [None, [[-0.012, -0.323, -0.033, -0.025]]]
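In the xenaPython session above, `dataset_probe_values` returns a two-element list. The first element holds probe positions, which are None here, presumably because the dataset has no position information for this probe. The second holds one row of values per probe. The Python sketch below turns that shape into a `{probe: {sample: value}}` mapping. The helper name is hypothetical, and the input shape is taken from the session output.

```python
def probe_values_to_dict(result, probes, samples):
    """Unpack a [positions, values] result into {probe: {sample: value}}.

    `positions` may be None (as in the session above) and is ignored here;
    `values` holds one row per probe, in the same order as `probes`.
    """
    _positions, values = result
    if len(values) != len(probes):
        raise ValueError(f"expected {len(probes)} rows, got {len(values)}")
    out = {}
    for probe, row in zip(probes, values):
        if len(row) != len(samples):
            raise ValueError(f"{probe}: expected {len(samples)} values, got {len(row)}")
        out[probe] = dict(zip(samples, row))
    return out
```

Usage with the names from the session: `probe_values_to_dict(xena.dataset_probe_values(hub, dataset, samples, ["TP53"]), ["TP53"], samples)`.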
