shizhaojingszj / cancerlocator

This project forked from dangertrip/cancerlocator

A Java package for non-invasive cancer diagnosis using methylation profiles of cell-Free DNA.

License: MIT License

CancerLocator

A tool for cell-free DNA based cancer diagnosis and tissue-of-origin prediction.

Prerequisites

Java 1.8
Apache Commons Math (if you want to build the source)

Usage

java -jar CancerLocator.jar config_file

config_file is a configuration file in Java Properties format.

Options in the configuration file:
trainFile: the training file; only methylation values are needed
testMethyFile: the testing file with methylation values
testDepthFile: the testing file with number of CpG measurements for each cluster
typeMappingFile: the file used to map sample types to prediction classes
resultFile: the output file
thetaStep: the interval of theta values used in the inference
methylationRangeCutoff: methylation range cutoff used for feature filtering
logLikelihoodRatioCutoff: the cutoff of log-likelihood ratio used in prediction
nThreads: number of threads
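
As a minimal sketch of how a file with these options can be read, the Java snippet below loads them with the standard java.util.Properties class. The option names come from the list above. Every value (the file names, 0.01, 0.25, 1.0, 4) is a made-up placeholder, not a recommended setting, and the class name ConfigSketch is hypothetical. CancerLocator's own loading code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ConfigSketch {
    // Hypothetical configuration text; all values are placeholders.
    static final String SAMPLE_CONFIG =
        "trainFile=train_methy.txt\n" +
        "testMethyFile=test_methy.txt\n" +
        "testDepthFile=test_depth.txt\n" +
        "typeMappingFile=type_mapping.txt\n" +
        "resultFile=result.txt\n" +
        "thetaStep=0.01\n" +
        "methylationRangeCutoff=0.25\n" +
        "logLikelihoodRatioCutoff=1.0\n" +
        "nThreads=4\n";

    // Parse Java Properties text into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE_CONFIG);
        // Numeric options are stored as strings and must be converted explicitly.
        double thetaStep = Double.parseDouble(props.getProperty("thetaStep"));
        int nThreads = Integer.parseInt(props.getProperty("nThreads"));
        System.out.println("trainFile = " + props.getProperty("trainFile"));
        System.out.println("thetaStep = " + thetaStep);
        System.out.println("nThreads  = " + nThreads);
    }
}
```

To read a real file instead of the embedded string, pass a java.io.FileReader for config_file to Properties.load.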

File Instruction

All the input and output files are tab delimited. trainFile, testMethyFile and testDepthFile must have the same number of columns.

trainFile:
Each line represents a training sample. The first column is the sample type and the remaining columns are methylation values (beta values) of the features.

testMethyFile:
Each line represents a testing sample. The first column is the sample ID and the remaining columns are methylation values (beta values) of the features.

testDepthFile:
Each line represents a testing sample. The first column is the sample ID and the remaining columns are CpG counts on reads aligned to these clusters.
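
The row layout described above (one label column followed by numeric feature columns) can be sketched in Java as follows. This is an illustration only: the class and method names (RowSketch, parseRow, sameColumnCount) and the sample rows are hypothetical, and the sketch assumes every feature field is a plain number. How CancerLocator itself handles missing values is not covered here.

```java
public class RowSketch {
    // One row: a label (sample type or sample ID) plus its numeric columns.
    static final class Row {
        final String label;
        final double[] values;

        Row(String label, double[] values) {
            this.label = label;
            this.values = values;
        }
    }

    // Split a tab-delimited line into the label and the numeric columns.
    static Row parseRow(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected a label and at least one value");
        }
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i]);
        }
        return new Row(fields[0], values);
    }

    // Check that all given lines have the same number of tab-delimited columns.
    static boolean sameColumnCount(String... lines) {
        int expected = lines[0].split("\t", -1).length;
        for (String line : lines) {
            if (line.split("\t", -1).length != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up rows: a training row, a test methylation row, a test depth row.
        String train = "LUAD\t0.82\t0.10\t0.45";
        String methy = "sample1\t0.80\t0.12\t0.40";
        String depth = "sample1\t35\t12\t48";

        Row row = parseRow(train);
        System.out.println(row.label + " has " + row.values.length + " features");
        System.out.println("same column count: " + sameColumnCount(train, methy, depth));
    }
}
```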

typeMappingFile:
Column 1: the sample types
Column 2: corresponding classes in the prediction
For example, the following two lines in this file indicate that both LUAD and LUSC samples are treated as lung cancer in the prediction.
LUAD lung cancer
LUSC lung cancer
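
A small Java sketch of reading such a mapping into a lookup table is shown below. It assumes the two columns are separated by a single tab, as stated for all input files. The class name TypeMappingSketch and the method parse are hypothetical and not part of CancerLocator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMappingSketch {
    // Build a map from sample type (column 1) to prediction class (column 2).
    static Map<String, String> parse(String text) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Limit the split to 2 so the class name may contain spaces.
            String[] fields = line.split("\t", 2);
            if (fields.length != 2) {
                throw new IllegalArgumentException("expected two tab-delimited columns: " + line);
            }
            mapping.put(fields[0].trim(), fields[1].trim());
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The two example lines from the text above.
        String text = "LUAD\tlung cancer\nLUSC\tlung cancer\n";
        Map<String, String> mapping = parse(text);
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```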

resultFile:
Column 1: sample ID
Column 2: likelihood ratio in logarithmic scale
Column 3: predicted blood tumor burden (i.e. theta value)
Column 4: predicted sample class (normal or one of the cancer tissues)
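
For downstream processing, a result line with the four columns listed above could be parsed along these lines. This is a sketch only: the class ResultSketch, its field names, and the sample line are hypothetical, and the numbers are invented for illustration rather than taken from a real run.

```java
public class ResultSketch {
    // One line of the result file, following the four-column layout above.
    static final class Result {
        final String sampleId;
        final double logLikelihoodRatio;
        final double theta;
        final String predictedClass;

        Result(String sampleId, double logLikelihoodRatio, double theta, String predictedClass) {
            this.sampleId = sampleId;
            this.logLikelihoodRatio = logLikelihoodRatio;
            this.theta = theta;
            this.predictedClass = predictedClass;
        }
    }

    // Parse one tab-delimited result line into a Result object.
    static Result parseLine(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 columns, got " + fields.length);
        }
        return new Result(
            fields[0],
            Double.parseDouble(fields[1]),
            Double.parseDouble(fields[2]),
            fields[3]);
    }

    public static void main(String[] args) {
        // Invented example line, not real output.
        Result r = parseLine("sample1\t12.5\t0.08\tlung cancer");
        System.out.println(r.sampleId + ": class=" + r.predictedClass
            + ", theta=" + r.theta + ", logLR=" + r.logLikelihoodRatio);
    }
}
```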

Example

Examples of all the files needed to run CancerLocator are provided in the "example" folder.

To run the example:
Linux/OSX:
./run_example.sh

Windows:
run_example.cmd

Reference

CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA
Genome Biology, March 2017 18:53. DOI:10.1186/s13059-017-1191-5
Shuli Kang, Qingjiao Li, Quan Chen, Yonggang Zhou, Stacy Park, Gina Lee, Brandon Grimes, Kostyantyn Krysan, Min Yu, Wei Wang, Frank Alber, Fengzhu Sun, Steven M. Dubinett*, Wenyuan Li*, Xianghong Jasmine Zhou* (* Joint corresponding author)

Contact

[email protected]
