ropensci / ucscxenatools

:package: An R package for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq https://cran.r-project.org/web/packages/UCSCXenaTools/

Home Page: https://docs.ropensci.org/UCSCXenaTools

License: GNU General Public License v3.0

R 86.59% XQuery 11.14% TeX 2.27%

ucsc-xena downloader api-client tcga ccle icgc ucsc toil treehouse r

ucscxenatools's Issues

use a GitHub release after each CRAN update?

so that the next rOpenSci newsletter might include the update https://ropensci.org/blog/2021/03/16/ropensci-news/

Xena download links have been updated globally

https://xenabrowser.net/datapages/?dataset=Caldas2007%2FchinSFGenomeBio2007_genomicMatrix&host=https%3A%2F%2Fucscpublic.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443

Treehouse is not online yet

https://xenabrowser.net/datapages/?host=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443

The CRAN check also reports an error for the README

Hiplot server deploy and mirror setting

Does RemoveGermlineCNV not remove germline CNVs?

Dear author, I have recently been using this tool to download copy number variation data, and I have a question. I ran

AA=getTCGAdata(project = 'OV',GisticCopyNumber = TRUE, Gistic2Threshold = FALSE,
               download = TRUE, RemoveGermlineCNV = FALSE)

AA=getTCGAdata(project = 'OV',GisticCopyNumber = TRUE, Gistic2Threshold = FALSE,
               download = TRUE, RemoveGermlineCNV = TRUE)

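The downloaded data are identical, although RemoveGermlineCNV is FALSE in one call and TRUE in the other. Are the two datasets really the same, or is something else going on? I would appreciate an answer, as this question is very important to me. Thank you very much.

Check the CRAN "not OK" status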

https://cran.r-project.org/web/checks/check_results_UCSCXenaTools.html

missing "OS", "OS.time", "OS.unit", "RFS", "RFS.time", "RFS.unit" columns in the downloaded clinical information file from TCGA

Hello,

I'm following this tutorial "TCGA Pan-cancer data download" (https://xsliulab.github.io/tumor-immunogenicity-score/#data-download-and-preprocessing) to download and clean TCGA clinical data.
However, columns like "OS", "OS.time", "OS.unit", "RFS", "RFS.time", "RFS.unit" are expected but absent from the clinical information files.
Is this due to updates of the "UCSCXenaTools" package?

Best,
Danshu

options(use_hiplot = TRUE) doesn't work

options(use_hiplot = TRUE) doesn't work and the url "https://xena.hiplot.com.cn/" is not available anymore.

issue: couldn't download pancanAtlas data

Hi authors,
I tried to download the pancanAtlas dataset through UCSCXenaTools, but it failed. The code is pasted below. I also tried opening the URL shown in the output, but it does not return proper data. Could you help me with it? Thank you!

> pcA_cohort = XenaData %>% 
+     filter(XenaHostNames == "pancanAtlasHub") # select pancanAtlas Hub
> cli_query = pcA_cohort %>% 
+     filter(DataSubtype == "gene expression RNAseq") %>%  # select RNAseq data
+     XenaGenerate() %>%  # generate a XenaHub object
+     XenaQuery() %>% 
+     XenaDownload()
This will check url status, please be patient.
All downloaded files will under directory /var/folders/k2/zhwq4hld003_vbl84g1qvxcr0000gn/T//RtmpAjrRSW.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
Downloading EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
Can not find fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz, this file maybe not compressed.
Try downloading fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena...
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
Your network is bad (try again) or the data source is invalid (report to the developer).
Warning messages:
1: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
2: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
3: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
4: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
5: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
6: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'

New feature: XenaScan()

XenaScan is a function that can be used before XenaGenerate(). It scans all rows for a user-supplied regular expression (natural language processing might be used in the future).

A related package.

https://github.com/VerbalExpressions/RVerbalExpressions
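
The row-scanning idea proposed for XenaScan() can be illustrated with a minimal base R sketch. This is not the package's implementation: `scan_rows` and the `toy` data frame are hypothetical names made up for illustration, and the real function may behave differently.

```r
# Hypothetical helper (not the package's XenaScan()): keep every row of a
# data frame in which at least one cell matches a regular expression.
scan_rows <- function(df, pattern, ignore.case = TRUE) {
  # One logical vector per column: TRUE where the cell matches the pattern.
  hits <- lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = ignore.case)
  })
  # A row is kept when any of its cells matched; the all-FALSE start value
  # also covers a data frame without columns.
  keep <- Reduce(`|`, hits, rep(FALSE, nrow(df)))
  df[keep, , drop = FALSE]
}

# Made-up rows that only mimic the shape of a dataset table.
toy <- data.frame(
  XenaCohorts = c("TCGA Ovarian Cancer (OV)",
                  "Cancer Cell Line Encyclopedia (CCLE)",
                  "TCGA Lung Adenocarcinoma (LUAD)"),
  DataSubtype = c("copy number", "gene expression RNAseq", "phenotype"),
  stringsAsFactors = FALSE
)

result <- scan_rows(toy, "ccle|phenotype")
print(result)
```

Scanning every column keeps the call simple for users who do not know which column holds the term they are looking for; a stricter variant could restrict the scan to selected columns.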

more detail in NEWS.md?

hi, thanks for including changes in your NEWS file https://github.com/ropensci/UCSCXenaTools/blob/master/NEWS.md

I wonder if you could include some details of what was done in each release? For example, instead of just

* #14 fixed

Include some details of what was done so users can quickly get a sense for the changes that were made

* fixed wrong url in the vignette (#14)

and having the issue number in parens will link to the issue on github

Treehouse is inaccessible due to a certificate problem

Related to #26 #27

Twitter link https://twitter.com/UCSCXena/status/1283082867929538561

API function for querying single gene or sample does not work

Use .p_dataset_probe_values and .p_dataset_gene_probe_avg as example.

library(UCSCXenaTools)
hub = "https://pancanatlas.xenahubs.net"
dataset = "EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena"
samples = c("TCGA-02-0047-01","TCGA-02-0055-01")
probes =c("TP53", "RB1")

Work:

> .p_dataset_probe_values(hub, dataset, samples, probes)
[[1]]
  strand chromend chromstart chrom
1      + 49056122   48877911 chr13
2      -  7590868    7565097 chr17

[[2]]
      [,1]  [,2]
[1,] 10.84  9.96
[2,] 11.22 10.15

> .p_dataset_gene_probe_avg(hub, dataset, samples, probes) 
  gene                     position       scores
1 TP53   -, 7590868, 7565097, chr17  10.84, 9.96
2  RB1 +, 49056122, 48877911, chr13 11.22, 10.15

Does not work for single sample:

> .p_dataset_probe_values(hub, dataset, "TCGA-02-0055-01", probes)
[[1]]
  strand chromend chromstart chrom
1      + 49056122   48877911 chr13
2      -  7590868    7565097 chr17

[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN
[2,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN

  gene                     position                                                                    scores
1 TP53   -, 7590868, 7565097, chr17 NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
2  RB1 +, 49056122, 48877911, chr13 NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN

Does not work for single probe (like gene):

> .p_dataset_probe_values(hub, dataset, samples, "TP53")
 Error in UCSCXenaTools:::.xena_post(host, UCSCXenaTools:::.call(xquery,  : 
  Internal Server Error (HTTP 500). 
> .p_dataset_gene_probe_avg(hub, dataset, samples, "TP53") 
 Error in UCSCXenaTools:::.xena_post(host, UCSCXenaTools:::.call(xquery,  : 
  Internal Server Error (HTTP 500).

Interestingly, .p_dataset_gene_probes_values works for a single gene, but not for a single sample:

> .p_dataset_gene_probes_values(hub, dataset, samples, "TP53")
[[1]]
[[1]]$position
  strand chromend chromstart chrom
1      -  7590868    7565097 chr17

[[1]]$name
[1] "TP53"


[[2]]
      [,1] [,2]
[1,] 10.84 9.96

> .p_dataset_gene_probes_values(hub, dataset, "TCGA-02-0047-01", "TP53")
[[1]]
[[1]]$position
  strand chromend chromstart chrom
1      -  7590868    7565097 chr17

[[1]]$name
[1] "TP53"


[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN

treehouse update fails

Hi Shixiang,

Thank you for developing the package! It is very easy to use. However, I failed to update the dataset of treehouse. See below.
Thanks!

> packageVersion("UCSCXenaTools")
[1] ‘1.3.1’
> XenaDataUpdate()
=> Obtaining info from UCSC Xena hubs...
==> Searching cohorts for host https://ucscpublic.xenahubs.net...
==> Trying #1
===> #37 cohorts found.
===> Querying datasets info...
===> #114 datasets found.
...
==> Searching cohorts for host https://xena.treehouse.gi.ucsc.edu...
==> Trying #1
==> Trying #2
==> Trying #3
Error in value[[3L]](cond) : 
  Tried 3 times but failed, please check URL or your internet connection!

downloadTCGA reports a download error

downloadTCGA(project = "OV", data_type = "Phenotype", file_type = "Clinical Information", destdir = tempdir())
This will check url status, please be patient.
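Error: Evaluation error: An unknown option was passed in to libcurl.

Hello Shixiang, I ran into the problem above while using UCSCXenaTools and cannot download the data. One more small question: what is the difference between the GDC TCGA data and the TCGA data, and why does Xena host this data twice? Thanks for your help!

--- Zhiqiang, Institute of Computing Technology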

Error downloading CCLE datasets from publicHub

Hi,

I'm trying to download CCLE files, but I get the file missing message:

cannot open URL 'https://ucscpublic.xenahubs.net/download/ccle/CCLE_copynumber_2013-12-03.seg.txt': HTTP status was '404 Not Found'

The code I use:

mysets <- XenaGenerate(subset = XenaHostNames=="publicHub") %>%
    XenaFilter(filterCohorts = "CCLE")
XenaQuery(mysets) %>%
    XenaDownload() -> ccle_download

If I try the same with MAGIC datasets, it works fine.

Explanation of terms

First of all thank you for developing this package. I am new to clinical analysis and this package and the vignettes / examples were a good start.

Regarding the survival analysis vignette, I have been trying to find a resource that explains or maps the variable names in the clinical data table to those used by the studies and by Xena. For example, the term OS.time doesn't show up in my searches of either the Xena portal or the Pan-Cancer Atlas. I am assuming it means time to remission, but that is just a guess. Is there a metadata table that explains what OS.time (and other terms) mean?

cheers,
António

Check whether no genes are left

UCSCXenaTools/R/fetch.R

Line 83 in 8b5edc7

identifiers <- identifiers[which_in]
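
A check of this kind could look like the following sketch. It is not the code in R/fetch.R: `filter_identifiers` and its `available` argument are hypothetical names, and the message wording is an assumption.

```r
# Hypothetical helper, not the code in R/fetch.R: drop identifiers that are
# absent from the dataset and stop early when nothing is left.
filter_identifiers <- function(identifiers, available) {
  which_in <- identifiers %in% available
  if (!all(which_in)) {
    message("Removed identifiers: ",
            paste(identifiers[!which_in], collapse = ", "))
  }
  identifiers <- identifiers[which_in]
  # Guard against an empty query before any request is sent.
  if (length(identifiers) == 0L) {
    stop("No valid identifiers are left after filtering.", call. = FALSE)
  }
  identifiers
}

# One valid and one invalid identifier: the valid one survives.
kept <- filter_identifiers(c("TP53", "NOT_A_GENE"), c("TP53", "RB1"))

# Only invalid identifiers: the function stops with an informative error.
failed <- tryCatch(
  filter_identifiers("NOT_A_GENE", c("TP53", "RB1")),
  error = function(e) conditionMessage(e)
)
```

Stopping before the query is sent gives the user a clear message instead of an error raised later while the result is being reshaped.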

Introduce R actions

From https://github.com/thomasp85/patchwork/blob/master/.github/workflows/R-CMD-check.yaml

on: [push, pull_request]

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
        - { os: windows-latest, r: '3.6', args: "--no-manual"}
        - { os: macOS-latest, r: '3.6'}
        - { os: macOS-latest, r: 'devel', args: "--no-manual"}
        - { os: ubuntu-16.04, r: '3.2', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.3', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.4', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.5', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }
        - { os: ubuntu-16.04, r: '3.6', cran: "https://demo.rstudiopm.com/all/__linux__/xenial/latest", args: "--no-manual" }

    env:
      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
      CRAN: ${{ matrix.config.cran }}

    steps:
      - uses: actions/checkout@v1

      - uses: r-lib/actions/setup-r@master
        with:
          r-version: ${{ matrix.config.r }}

      - uses: r-lib/actions/setup-pandoc@master

      - uses: r-lib/actions/setup-tinytex@master
        if: contains(matrix.config.args, 'no-manual') == false

      - name: Cache R packages
        uses: actions/cache@v1
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-r-${{ matrix.config.r }}-${{ hashFiles('DESCRIPTION') }}

      - name: Install system dependencies
        if: runner.os == 'Linux'
        env:
          RHUB_PLATFORM: linux-x86_64-ubuntu-gcc
        run: |
          Rscript -e "install.packages('remotes')" -e "remotes::install_github('r-hub/sysreqs')"
          sysreqs=$(Rscript -e "cat(sysreqs::sysreq_commands('DESCRIPTION'))")
          sudo -s eval "$sysreqs"
      - name: Install dependencies
        run: Rscript -e "install.packages('remotes')" -e "remotes::install_deps(dependencies = TRUE)" -e "remotes::install_cran('rcmdcheck')"

      - name: Check
        run: Rscript -e "rcmdcheck::rcmdcheck(args = '${{ matrix.config.args }}', error_on = 'warning', check_dir = 'check')"

      - name: Upload check results
        if: failure()
        uses: actions/upload-artifact@master
        with:
          name: ${{ runner.os }}-r${{ matrix.config.r }}-results
          path: check

      - name: Test coverage
        if: matrix.config.os == 'macOS-latest' && matrix.config.r == '3.6'
        run: |
          Rscript -e 'remotes::install_github("r-lib/covr@gh-actions")'
          Rscript -e 'covr::codecov(token = "${{secrets.CODECOV_TOKEN}}")'

Wrong URL

UCSCXenaTools/vignettes/USCSXenaTools.Rmd

Line 77 in b5b097c

* Singel Cell Xena hub: <https://singlecell.xenahubs.net>

bad option check in fetch_dense_values

> host = "https://toil.xenahubs.net"
> dataset = "tcga_RSEM_gene_tpm"
> samples = c("TCGA-02-0047-01","TCGA-02-0055-01","TCGA-02-2483-01","TCGA-02-2485-01")
> probes = c('ENSG00000282740.1', 'ENSG00000000005.5', 'ENSG00000000419.12')
> genes =c("TP53", "RB1", "PIK3CA")
> fetch_dense_values(host, dataset, genes, samples, check = TRUE, use_probeMap = TRUE)
Checking identifiers...
The following identifiers have been removed fro host https://toil.xenahubs.net dataset tcga_RSEM_gene_tpm
[1] NA NA NA
Done.
Checking samples...
Done.
Checking if the dataset has probeMap...
Done. ProbeMap is found.
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

Function naming strategy

Comments from ropensci/software-review#315

Argument naming is not consistent - for example, the fetch_ API functions are in snake_case but a number of other functions are camel-cased with the first letter capitalized (i.e. XenaScan) or regular camel-cased (getTCGAdata). It would be helpful to adopt a similar casing style even if snake_case doesn't work because of consistency with other tools for these data sets.

This issue should be fixed in the next incompatible version.

Basic data retrieval of all or part of the assays present in a XenaExperiment.

API functions exposed by xenaPython

from . import xenaQuery as xena

def Gene_values (hub, dataset, samples, gene):
    values = xena.dataset_gene_values (hub, dataset, samples, [gene])
    return values[0]["scores"][0]

def Genes_values (hub, dataset, samples, genes):
    values = [x["scores"][0] for x in xena.dataset_gene_values (hub, dataset, samples, genes)]
    return values

def Probe_values (hub, dataset, samples, probe):
    values = xena.dataset_probe_values (hub, dataset, samples, [probe])
    return values[0]

def Probes_values (hub, dataset, samples, probes):
    values = xena.dataset_probe_values (hub, dataset, samples, probes)
    return values

def dataset_samples (hub,dataset):
    return xena.dataset_samples(hub, dataset)

def dataset_fields (hub, dataset):
    return xena.dataset_field (hub, dataset)

def all_cohorts(hub):
    return xena.all_cohorts(hub)

CRAN checks

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_UCSCXenaTools.html.

Please correct before 2021-07-24 to safely retain your package on CRAN.

It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'

This needs correction whether or not the resource recovers.

The CRAN Team
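
The policy quoted above can be met in several ways. One possible pattern, shown here as a sketch rather than as what the package does, wraps any network call so that errors and warnings become an informative message and a NULL result. `fetch_gracefully` is a hypothetical name, and the failing function only simulates an unavailable resource.

```r
# Hypothetical wrapper: run any function that touches the network and turn
# errors and warnings into an informative message plus a NULL result.
fetch_gracefully <- function(fetch_fun, ...) {
  tryCatch(
    fetch_fun(...),
    error = function(e) {
      message("Resource unavailable: ", conditionMessage(e))
      NULL
    },
    warning = function(w) {
      message("Resource problem: ", conditionMessage(w))
      NULL
    }
  )
}

# A call that succeeds returns its value unchanged.
ok <- fetch_gracefully(function(x) x * 2, 21)

# A call that fails (simulated here) returns NULL instead of raising.
bad <- fetch_gracefully(function() stop("HTTP status was '404 Not Found'"))
```

Callers, examples, and vignettes can then test `is.null()` on the result and return early, so a hub outage yields a message rather than a check warning or error.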

Update how the pkgdown list is presented

Remove warning

In dir.create(i, recursive = TRUE) : 'data/Xena' already exists
Error: Unable to establish connection with R session

> XenaGenerate(subset = XenaHostNames=="gdcHub") %>% 
+   XenaFilter(filterDatasets = "methylation|phenotype") %>% 
+   XenaFilter(filterDatasets = "UCS") -> df_todo
> XenaQuery(df_todo) %>%
+   XenaDownload() -> xe_download
This will check url status, please be patient.
All downloaded files will under directory /tmp/RtmpciLecI.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
'/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices' already exists'/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices' already exists/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices/TCGA-UCS.GDC_phenotype.tsv.gz, the file has been download!
/tmp/RtmpciLecI/TCGA-UCS/Xena_Matrices/TCGA-UCS.methylation450.tsv.gz, the file has been download!

Interface for fetching sparse data

.p_sparse_data("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502",
               samples = list("HCE4_OESOPHAGUS", "NCIH2818_PLEURA"), genes = list("TP53"))
.p_sparse_data_examples("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502", 2)
UCSCXenaTools::fetch_dataset_samples("https://ucscpublic.xenahubs.net", "ccle/CCLE_DepMap_18Q2_maf_20180502")

Write a fetch_sparse_value

hub = "https://tcga.xenahubs.net" 
dataset = "TCGA.PANCAN.sampleMap/Gistic2_CopyNumber_Gistic2_all_data_by_genes"   
samples = ["TCGA-02-0047-01","TCGA-02-0055-01","TCGA-02-2483-01","TCGA-02-2485-01"]

In [15]: xena.dataset_fetch(hub, dataset, samples, ["TP53"])                                                                         
Out[15]: [[-0.012, -0.323, -0.033, -0.025]]

In [16]: xena.dataset_probe_values(hub, dataset, samples, ["TP53"])                                                                  
Out[16]: [None, [[-0.012, -0.323, -0.033, -0.025]]]