cmuschwartzlab / sc-tusv-ext

Sc-TUSV-ext: Tumor evolutionary tree reconstruction from single-cell DNA-seq data of single nucleotide variants (SNV), copy number alterations (CNA) and structural variants (SV).

sc-tusv-ext's Introduction

Sc-TUSV-ext

Sc-TUSV-ext is an integer linear programming (ILP) based tumor clonal lineage inference method using single nucleotide variants (SNV), copy number alterations (CNA) and structural variants (SV) from single-cell DNA sequencing data.

Installation

To run Sc-TUSV-ext, two separate environments are required.

  • The first environment is for MEDICC2. You can create the medicc2 environment by running the following commands:

    conda create -n medicc_env
    
    conda activate medicc_env
    
    conda install -c bioconda -c conda-forge medicc2
    

    Alternatively, follow the instructions at https://bitbucket.org/schwarzlab/medicc2/src/master/.

  • The second environment runs the program itself under Python 2.7. Create it with the following commands:

    conda create -n sctusvext python=2.7
    conda activate sctusvext
    conda config --add channels conda-forge
    conda config --add channels bioconda
    

    Then install the following packages in the sctusvext environment:
    - numpy
    - pandas
    - ete2
    - gurobipy
    - graphviz
    - biopython
    - scipy
    - PyVCF

  • We use the Gurobi optimizer for Sc-TUSV-ext. To acquire a Gurobi license, you can sign up as an academic user on the Gurobi website: https://www.gurobi.com/downloads/end-user-license-agreement-academic/.
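Before running the pipeline, it can help to confirm that the packages listed above are importable from inside the sctusvext environment. The following is a minimal sketch and not part of Sc-TUSV-ext: the helper name `missing_packages` is hypothetical, and the mapping relies on biopython being imported as `Bio` and PyVCF as `vcf`. It uses only the standard library and should run under Python 2.7 as well as Python 3.

```python
from __future__ import print_function

import importlib

# Package name as listed in the README -> module name used in "import ...".
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "ete2": "ete2",
    "gurobipy": "gurobipy",
    "graphviz": "graphviz",
    "biopython": "Bio",
    "scipy": "scipy",
    "PyVCF": "vcf",
}


def missing_packages(required=REQUIRED):
    """Return the sorted names of packages whose module cannot be imported."""
    missing = []
    for package, module in sorted(required.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the sctusvext environment activated. An importable `gurobipy` does not by itself show that a valid Gurobi license is installed.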

Inputs and Outputs

Input

The input folder should contain the processed, variant-called scDNA-seq files in VCF format. An example can be found in the example/sample/ folder.
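A quick way to sanity-check an input folder is to list the VCF files it contains. This is a minimal sketch under one assumption that the README does not state: that the input files carry a `.vcf` extension. The function name `list_vcf_files` is hypothetical.

```python
import os


def list_vcf_files(input_dir):
    """Return sorted paths of regular files ending in .vcf (non-recursive)."""
    paths = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        # Assumption: input files use the .vcf extension (case-insensitive).
        if os.path.isfile(path) and name.lower().endswith(".vcf"):
            paths.append(path)
    return paths
```

For example, `list_vcf_files("example/sample/")` would return the VCF paths in the bundled example folder; an empty list suggests the folder is not a usable input.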

Outputs

  • Medicc2 output folder: Output of the MEDICC2 method.
  • clusters.tsv: The clone assignments for the single-cells according to the MEDICC2 distances. This file is saved inside the MEDICC2 output folder.
  • T.dot: Output tree with the clone assignments at the nodes and the phylogenetic cost/number of SNVs and SVs mapped to the branches.
  • Z.tsv: Clone assignment matrix.
  • C.tsv: The estimated copy numbers of the clones.
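After a run, the presence of the files listed above can be checked with a few lines of Python. This is a minimal sketch, not part of the tool. The README does not say which directory T.dot, Z.tsv and C.tsv are written to, so both directories are passed in explicitly. The name `missing_outputs` is hypothetical.

```python
import os

# File names taken from the output list above.
TREE_OUTPUTS = ("T.dot", "Z.tsv", "C.tsv")


def missing_outputs(result_dir, medicc2_dir):
    """Return the expected output paths that do not exist as files."""
    expected = [os.path.join(result_dir, name) for name in TREE_OUTPUTS]
    # clusters.tsv is documented as being saved inside the MEDICC2 output folder.
    expected.append(os.path.join(medicc2_dir, "clusters.tsv"))
    return [path for path in expected if not os.path.isfile(path)]
```

An empty return value means all four files were found.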

Instructions for running

We suggest running the sctusvext.sh file, editing it as needed. The command to run the file is:

./sctusvext.sh input_folder output_folder number_of_leaves

For example, to build a tree with 3 leaves, i.e. 5 clones, run the following command:

./sctusvext.sh example/sample/ example/sample_output 3
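The wrapper call can also be assembled from Python, for instance when looping over several leaf counts. The sketch below is illustrative only. It assumes the relation between leaves and clones is that of a full binary tree (n leaves give 2n - 1 nodes), which matches the "3 leaves, i.e. 5 clones" example but is not stated explicitly in this README. The function names are hypothetical.

```python
from __future__ import print_function

import subprocess


def num_clones(num_leaves):
    """Number of tree nodes (clones), assuming a full binary tree: 2n - 1."""
    if num_leaves < 1:
        raise ValueError("num_leaves must be at least 1")
    return 2 * num_leaves - 1


def wrapper_command(input_dir, output_dir, num_leaves, script="./sctusvext.sh"):
    """Build the argument list for: ./sctusvext.sh input output number_of_leaves."""
    return [script, input_dir, output_dir, str(num_leaves)]


def run_wrapper(input_dir, output_dir, num_leaves):
    """Run the wrapper script; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(wrapper_command(input_dir, output_dir, num_leaves))


if __name__ == "__main__":
    cmd = wrapper_command("example/sample/", "example/sample_output", 3)
    print(" ".join(cmd), "->", num_clones(3), "clones expected")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.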

In addition, the framework can be run with a different set of settings, which are accessible on line 13 of the sctusvext.sh file. The settings are the following:

python sc-tusv-ext.py -i example/sample -o "example/sample_output/output_sctusvext" -c2cl "example/sample_output/medicc2_output/clusters.tsv" -n 3 -c 10 -t 2 -r 2 -p 8 -m 1000 -col -b -sv_ub 80 -C 120 

The following inputs are mandatory:

  • -i : input folder
  • -o : output folder
  • -c2cl : clone assignment file (the clusters.tsv file in the MEDICC2 output folder)
  • -n : number of leaves.
  • -c : maximum copy number allowed for any breakpoint or segment on any node
  • -t : maximum number of coordinate-descent iterations
  • -r : number of random initializations of the coordinate-descent algorithm
  • -col : binary flag indicating whether to collapse redundant nodes
  • -sv_ub : the number of subsampled SV breakpoints
  • -const : number of total subsampled breakpoints and SNVs
  • -m : maximum time (seconds) in each coordinate descent iteration

Optional parameters:

  • -b : binary flag to set the regularization parameters automatically
  • -l : lambda regularization parameter for weighting the phylogenetic cost
  • -p : number of processors to use (uses all the available cores by default)
  • -s : number of segments (in addition to those containing breakpoints) that are randomly kept (default keeps all the segments)
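When experimenting with different settings, it can be convenient to build the sc-tusv-ext.py command from a list of options instead of editing the shell script by hand. The sketch below is a hypothetical helper (`build_command` is not part of the tool) that reproduces the example command shown earlier. Two caveats: the option `-C 120` is copied verbatim from that example, whereas the parameter list names the same setting `-const`, so check the script before relying on either spelling; and the value-less flags are placed at the end, which is assumed not to matter to the script's option parser.

```python
from __future__ import print_function


def build_command(options, flags=(), script="sc-tusv-ext.py"):
    """Build an argument list from (flag, value) pairs plus value-less flags."""
    cmd = ["python", script]
    for flag, value in options:
        cmd.extend([flag, str(value)])
    cmd.extend(flags)
    return cmd


# Values copied from the example command in this README.
EXAMPLE_OPTIONS = [
    ("-i", "example/sample"),
    ("-o", "example/sample_output/output_sctusvext"),
    ("-c2cl", "example/sample_output/medicc2_output/clusters.tsv"),
    ("-n", 3),
    ("-c", 10),
    ("-t", 2),
    ("-r", 2),
    ("-p", 8),
    ("-m", 1000),
    ("-sv_ub", 80),
    ("-C", 120),  # spelled -const in the parameter list; unverified which is right
]
EXAMPLE_FLAGS = ["-col", "-b"]

if __name__ == "__main__":
    print(" ".join(build_command(EXAMPLE_OPTIONS, EXAMPLE_FLAGS)))
```

The resulting list can be passed directly to `subprocess.check_call` from within the sctusvext environment.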

sc-tusv-ext's People

Contributors

nishatbristy007

sc-tusv-ext's Issues

error running example

Hello,

I thought I would give your tool a go after seeing the preprint last month :)

I have created the necessary conda environments and am now trying to run the tool on the example data with this command: bash sctusvext.sh example/sample output_test 3

However, I am currently getting this error:

('The mutations at ', array([], dtype=int64), ' will be removed due to non-existing bp in CNV')
('The mutations at ', array([], dtype=int64), ' will be removed due to non-paired breakpoints')
('The mutations at ', array([], dtype=int64), ' will be removed due to non-paired breakpoints')
('The mutations at ', array([], dtype=float64), ' will be removed due to non-existing position for unsampled SNVs in CNV')
Traceback (most recent call last):
  File "sc-tusv-ext.py", line 769, in <module>
    main(sys.argv[1:])
  File "sc-tusv-ext.py", line 51, in main
    unmix(args['input_directory'], args['output_directory'], args['cell_to_clone_file'],args['num_leaves'], args['c_max'], args['lambda1'], args['restart_iters'], args['cord_desc_iters'], args['processors'], args['time_limit'], args['metadata_file'], args['num_subsamples'], args['overide_lambdas'], args['constant'], args['sv_upperbound'], args['only_leaf'], args['collapse'], args['threshold'], args['multi_num_clones'],args['set_root'])
  File "sc-tusv-ext.py", line 61, in unmix
    F_unsampled_info_phasing, sampled_snv_list_sort, unsampled_snv_list_sort, sampled_sv_list_sort, unsampled_sv_list_sort, df_clones = gm.get_mats(in_dir, c2cl, n, const=const, sv_ub=sv_ub)
  File "help/generate_matrices.py", line 178, in get_mats
    F_unsampled_phasing = compute_F_mode(F_unsampled_phasing, sampleList, df_clones)
  File "help/generate_matrices.py", line 59, in compute_F_mode
    group_data = F[indices, :]
IndexError: too many indices for array

Do you know how I can fix this?

Thanks! 😄
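For context on the last line of the traceback: NumPy raises `IndexError: too many indices for array` when an array is indexed with more indices than it has dimensions, so `F[indices, :]` fails whenever `F` is 1-D. The sketch below reproduces that behaviour in isolation and shows a guard that gives a clearer message. It is not a confirmed diagnosis of this issue: that `F` is an empty 1-D array here is only an assumption suggested by the empty arrays printed in the log, and `select_rows` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def select_rows(F, indices):
    """Select rows of a 2-D matrix, failing early with a descriptive error."""
    F = np.asarray(F)
    if F.ndim != 2:
        raise ValueError(
            "expected a 2-D matrix, got an array with shape %r" % (F.shape,)
        )
    return F[indices, :]


def one_d_indexing_fails():
    """Return True if indexing a 1-D array with two indices raises IndexError."""
    F = np.array([])                    # 1-D array with shape (0,)
    indices = np.array([], dtype=int)
    try:
        F[indices, :]                   # two indices on a 1-D array
    except IndexError:
        return True
    return False
```

If the guard fires on the example data, inspecting the shape of the matrix passed to `compute_F_mode` would be a reasonable next debugging step.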
