TCGA.DATA R Package

This R package retrieves gene expression, mutation and clinical data from the TCGA (The Cancer Genome Atlas) database. It retrieves a single type of cancer at a time.

We publish different packages on the releases page so that the datasets can be used quickly.

The gene expression datasets are already in matrix format, ready to be used. The data is in FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format. Any additional normalization needed for use in models must be performed by the user.
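As an illustration of the kind of additional normalization a user might apply, the sketch below runs two common transformations on a toy FPKM matrix: a log2(FPKM + 1) transform and a per-sample rescaling to TPM. The matrix values are made up, and the orientation (genes in rows, samples in columns) is an assumption; transpose first if the packaged matrices are stored the other way around. This is not a recommendation from the package itself, only one possible starting point.

```r
# Toy FPKM matrix standing in for the package data (values are made up):
# 3 genes (rows) x 4 samples (columns). matrix() fills column by column.
fpkm <- matrix(
  c(0, 10, 250,
    1, 12, 300,
    0,  8, 180,
    2, 15, 400),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:4))
)

# log2(FPKM + 1): the pseudocount of 1 keeps zero values finite.
log.fpkm <- log2(fpkm + 1)

# FPKM -> TPM: rescale each sample (column) so its values sum to one million.
tpm <- sweep(fpkm, 2, colSums(fpkm), "/") * 1e6
```

Which transformation is appropriate depends on the downstream model; for instance, penalized regression models are often fitted on log-transformed and standardized values.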

Package information

How to use the dataset

  1. Install one of the data packages (brca.data, prad.data or skcm.data) using the devtools package.

  2. Load the library

  3. Load the required datasets (one or more of the following)

    • multiAssay
    • gdc.original

In older versions of this package, prior to September 2018, the datasets were named fpkm.per.tissue and mutation. Since then, storage has been improved by using a MultiAssayExperiment object from Bioconductor.

To recover the datasets in the old matrix format, use the following:

data('multiAssay')
fpkm.data <- build.matrix('RNASeqFPKM', multiAssay)
fpkm.per.tissue <- fpkm.data$data
fpkm.clinical   <- fpkm.data$clinical

Example for BRCA package

# Install the release tarball (if the library is loaded first, the install
#  function can also be called without the 'BiocManager::' prefix)
BiocManager::install('https://github.com/averissimo/tcga.data/releases/download/2016.12.15-brca/brca.data_1.0.tar.gz')
#
# Load the brca.data package
library(brca.data)
# Start using the data, for example the tissue data
data(fpkm.per.tissue)
# fpkm.per.tissue is now in the environment and will be loaded the first
#  time it is used. For example:
names(fpkm.per.tissue)

How to build your own data package

  1. Open vignettes/build_data.Rmd
  2. In the header of the Rmd (at the beginning of the document), change the project param to the target TCGA project
  3. Open DESCRIPTION and change the name of the package to the desired name
    • we use a convention of ####.data where #### is the TCGA project name in lowercase
  4. Run vignettes/build_data.Rmd to build the cache of the data
  5. Run devtools::document() to create the documentation
  6. Run devtools::build() to build the actual package
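
The steps above could be scripted roughly as follows. This is a minimal sketch, not part of the package: the helper names data.package.name, set.package.name and build.data.package are hypothetical, and it assumes the project is given as a short code such as "BRCA" and that the vignette accepts the same value through its project param (the exact value the vignette expects is not stated here and may differ). Only the DESCRIPTION helpers run on their own; the build function needs the rmarkdown and devtools packages and the repository checked out locally.

```r
# Hypothetical helper: derive the package name from a TCGA project code,
# following the "####.data" convention (e.g. "BRCA" -> "brca.data").
data.package.name <- function(project) {
  paste0(tolower(project), ".data")
}

# Hypothetical helper: rewrite the Package field of a DESCRIPTION file.
# Note: write.dcf() may re-wrap long fields when writing the file back.
set.package.name <- function(description.path, project) {
  desc <- read.dcf(description.path)
  desc[1, "Package"] <- data.package.name(project)
  write.dcf(desc, file = description.path)
  invisible(data.package.name(project))
}

# Hypothetical wrapper around the build steps (defined here, not called).
build.data.package <- function(project, pkg.dir = ".") {
  # Rename the package in DESCRIPTION
  set.package.name(file.path(pkg.dir, "DESCRIPTION"), project)
  # Render the vignette to build the data cache; assumes `project` is
  # declared under `params:` in the YAML header of the Rmd
  rmarkdown::render(
    file.path(pkg.dir, "vignettes", "build_data.Rmd"),
    params = list(project = project)
  )
  # Create the documentation and build the package tarball
  devtools::document(pkg.dir)
  devtools::build(pkg.dir)
}
```

For example, build.data.package("SKCM") run from the repository root would be expected to rename the package to skcm.data before rendering the vignette and building the tarball.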

Acknowledgements

This package was developed primarily by André Veríssimo, with support from Marta Lopes, Eunice Carrasquinha and Susana Vinga.

This work was supported by:

  • FCT, through IDMEC, under LAETA, projects (UID/EMS/50022/2013);
  • Susana Vinga acknowledges support by program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH);
  • André Veríssimo acknowledges support from FCT (SFRH/BD/97415/2013).
