
raphael-group / hotnet2

HotNet2 is an algorithm for finding significantly altered subnetworks in a large gene interaction network

License: Other

Python 62.78%, C 0.27%, Fortran 2.86%, HTML 32.01%, Shell 2.07%

hotnet2's Introduction

HotNet2

We have introduced a new method, Hierarchical HotNet, that improves on HotNet2. As a result, HotNet2 is no longer actively updated. Please see the Hierarchical HotNet manuscript and GitHub repository for details.


HotNet2 identifies subnetworks of a protein-protein interaction network with more mutations ("heat") than expected.

HotNet2 was developed by the Raphael research group at Brown University.

Setup

Requirements

Latest tested version in parentheses:

  1. Python (2.7.12)

    a. NumPy (1.12.1)

    b. SciPy (0.19.0)

    c. NetworkX (1.11)

    d. h5py (2.7.0)

  2. gcc and gfortran (5.4.0)
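Before comparing your results against reference output, it can help to record how your environment differs from the tested versions above. The script below is a sketch, not part of HotNet2: environment_report is a hypothetical name, and it relies only on each module's __version__ attribute.

```python
import importlib
import sys

# Tested package versions listed above (Python itself was tested at 2.7.12).
TESTED = {"numpy": "1.12.1", "scipy": "0.19.0", "networkx": "1.11", "h5py": "2.7.0"}


def environment_report():
    """Return (name, installed_version_or_None, tested_version) tuples."""
    rows = [("python", "%d.%d.%d" % tuple(sys.version_info[:3]), "2.7.12")]
    for name, tested in sorted(TESTED.items()):
        try:
            module = importlib.import_module(name)
            installed = getattr(module, "__version__", "unknown")
        except Exception:  # missing or broken install (ImportError, ABI errors, ...)
            installed = None
        rows.append((name, installed, tested))
    return rows


if __name__ == "__main__":
    for name, installed, tested in environment_report():
        print("%-9s installed=%-10s tested=%s" % (name, installed or "missing", tested))
```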

Python dependencies

We recommend using virtualenv to install the Python requirements. After installing virtualenv, you can install the Python requirements for HotNet2 as follows:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Compilation

The C and Fortran extensions are optional, but they significantly speed up HotNet2. You can compile them as follows:

python hotnet2/setup_fortran.py build_src build_ext --inplace
python hotnet2/setup_c.py build_src build_ext --inplace

Usage

Recent update

We recently updated HotNet2 to simplify usage. This update requires changes to your scripts and data; consult the previous README for comparison.

Data preprocessing

  • Heat scores. Use makeHeatFile.py to create a JSON file of gene heat scores (weights). These scores can be generated from mutation data or from several other formats. HotNet2.py requires a path to at least one heat file as input.
  • Interaction network. Use makeNetworkFiles.py to generate the network files required for running HotNet2. These include the influence matrix of your input network, as well as permuted networks and their corresponding influence matrices. Each of these files is in HDF5 (.h5) format. HotNet2.py requires at least one network file and the path to the permuted network files as input.
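The influence matrix can be illustrated with the diffusion formulation described in the HotNet2 paper, F = beta * (I - (1 - beta) * W)^-1, where W is the adjacency matrix with each column divided by that gene's degree. The sketch below is a dense NumPy illustration under that assumption, not the code in makeNetworkFiles.py; the function name influence_matrix is hypothetical, and it neither writes HDF5 files nor handles isolated genes.

```python
import numpy as np


def influence_matrix(edges, genes, beta):
    """Dense diffusion matrix F = beta * (I - (1 - beta) * W)^-1.

    W[i, j] = 1 / deg(j) when genes i and j interact (column-normalized
    adjacency). Assumes an undirected network in which every gene has an edge.
    """
    index = {g: k for k, g in enumerate(genes)}
    n = len(genes)
    A = np.zeros((n, n))
    for u, v in edges:
        A[index[u], index[v]] = A[index[v], index[u]] = 1.0
    W = A / A.sum(axis=0)  # divide each column j by deg(j)
    return beta * np.linalg.inv(np.eye(n) - (1 - beta) * W)
```

For example, influence_matrix([("a", "b"), ("b", "c")], ["a", "b", "c"], 0.4) returns a 3 x 3 matrix whose columns each sum to 1 and whose (a, c) entry is positive even though a and c are not adjacent. Dense inversion costs O(n^3), so this is only practical for small or moderate networks.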

See paper/paper_commands.sh for examples of using the makeHeatFile.py and makeNetworkFiles.py scripts.

HotNet2

After generating a heat file and the network files using the scripts above, use the HotNet2.py script to run HotNet2. The minimum arguments required for HotNet2.py are as follows:

python HotNet2.py -nf <network_file> -pnp <permuted_networks_path> -hf <heat_file> -o <output_directory>

See paper/paper_commands.sh for an example of running the HotNet2.py script on the outputs of the makeHeatFile.py and makeNetworkFiles.py scripts.

The output of HotNet2.py consists of a directory containing the following:

  • {network_name}-{heat_name}/: For each (network, heat score) pair, HotNet2.py outputs a directory of results. This directory contains one subdirectory, with a name starting with "delta", for each delta parameter tested; each subdirectory contains the subnetworks and statistical significance associated with that delta parameter.
  • consensus/: The consensus/ directory contains the consensus file across all networks and heat scores.
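Each delta subdirectory corresponds to one threshold delta. As described in the HotNet2 paper, the exchanged heat matrix is E = F * diag(h) (F the influence matrix, h the heat scores), edges are kept where E exceeds delta, and the candidate subnetworks are the strongly connected components of the resulting directed graph. The sketch below follows that description using SciPy; it is not HotNet2's own implementation, and delta_subnetworks, the >= comparison, and the min_size filter are assumptions. Reversing every edge leaves the strongly connected components unchanged, so the edge-direction convention does not affect the result.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def delta_subnetworks(F, heat, genes, delta, min_size=2):
    """Strongly connected components of the exchanged-heat graph at threshold delta."""
    F = np.asarray(F, dtype=float)
    heat = np.asarray(heat, dtype=float)
    # E[i, j] = F[i, j] * heat[j]: heat that gene j sends to gene i (E = F @ diag(heat)).
    E = F * heat[np.newaxis, :]
    # Keep an edge wherever exchanged heat reaches delta (>= is an assumption).
    adjacency = (E >= delta).astype(int)
    np.fill_diagonal(adjacency, 0)  # self-loops never change SCC membership
    _, labels = connected_components(csr_matrix(adjacency), directed=True,
                                     connection="strong")
    groups = {}
    for gene, label in zip(genes, labels):
        groups.setdefault(label, []).append(gene)
    # Largest components first; anything smaller than min_size is dropped.
    return sorted((sorted(g) for g in groups.values() if len(g) >= min_size),
                  key=len, reverse=True)
```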

Other usages

  1. Generate consensus from single runs. Use scripts/consensus.py to generate the consensus file from the results of HotNet2 on a single network and heat score.
  2. Create dendrogram of strongly connected components. Use scripts/createDendrogram.py to generate a dendrogram of strongly connected components in the HotNet2 exchanged heat graph.
  3. Permute a single network. Use scripts/permuteNetwork.py to create a permuted edge list from an input network.
  4. Create influence matrix. Use scripts/createInfluenceMatrix.py to create a HotNet or HotNet2 influence matrix from an input network.
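The exact procedure in scripts/permuteNetwork.py is not described here. A common way to permute an interaction network while keeping each gene's degree fixed is repeated double-edge swaps, and the sketch below assumes that approach. permute_edges and degrees are hypothetical names, the cap of 100 attempts per requested swap is an arbitrary choice that guarantees termination, and the input needs at least two edges.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every gene in an undirected edge list."""
    return Counter(v for edge in edges for v in edge)


def permute_edges(edges, num_swaps, seed=None):
    """Degree-preserving permutation of an undirected edge list.

    Each accepted swap replaces (a, b), (c, d) with (a, d), (c, b), skipping
    swaps that would create a self-loop or duplicate an existing edge.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = set(frozenset(e) for e in edges)
    done = attempts = 0
    while done < num_swaps and attempts < 100 * num_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize the orientation of the second edge
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```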

Testing

See testing/README.md for testing instructions.

Example

See paper/paper_commands.sh for a short but complete set of commands for reproducing the experiments in the HotNet2 paper.

Visualization

We provide scripts to run an interactive web application to view the output of HotNet2.py, including the subnetworks in the consensus and individual runs. See viz/README.md and the wiki for additional instructions and details.

Support

Please visit the HotNet Google Group to post questions and view discussions from other users about HotNet or HotNet2, or contact us through our research group's website.

Reference

If you use HotNet2 in your work, please cite:

M.D.M. Leiserson*, F. Vandin*, H.T. Wu, J.R. Dobson, J.V. Eldridge, J.L. Thomas, A. Papoutsaki, Y. Kim, B. Niu, M. McLellan, M.S. Lawrence, A.G. Perez, D. Tamborero, Y. Cheng, G.A. Ryslik, N. Lopez-Bigas, G. Getz, L. Ding, and B.J. Raphael. (2014) Pan-Cancer Network Analysis Identifies Combinations of Rare Somatic Mutations across Pathways and Protein Complexes. Nature Genetics 47, 106–114 (2015).

If you use HotNet in your work, please cite:

F. Vandin, E. Upfal, and B.J. Raphael. (2011) Algorithms for Detecting Significantly Mutated Pathways in Cancer. Journal of Computational Biology. 18(3):507-22.

F. Vandin, P. Clay, E. Upfal, and B. J. Raphael. Discovery of Mutated Subnetworks Associated with Clinical Data in Cancer. In Proc. Pacific Symposium on Biocomputing (PSB), 2012.

(* denotes equal contribution)

hotnet2's People

Contributors

jveldridge, matthewreyna, mdml, melkebir

hotnet2's Issues

Gene & Module Selection

A few more questions if you don't mind:

  1. Do you have any recommendations for gene choice? Should I include all genes or a subset that are significant by some threshold?
  2. How many genes per sample? Does it matter?
  3. How many samples? Is there a lower or upper bound?
  4. How is the heat map constructed? What does each cell represent and how is the clustering done?
  5. What is the origin of the included modules?
  6. What should custom modules look like? (# of modules, number of genes covered, size of modules, connectivity etc.)

I'm happy to be pointed to relevant material; I may have missed these items. I've got lots of relevant input and am trying to follow best practices.

Daryl

Consensus visualization only shows interactions from single network

The consensus visualization only shows interactions from a single network, which is not clear to the user or the intended behavior for the visualization.

From HotNet2.py and hotnet/viz.py, it looks like we need either a generate_viz_json function in hotnet/viz.py that handles inputs from multiple networks, or another function that merges multiple outputs from generate_viz_json.

Permuted networks were not put into subdirectories

  1. The configs in examples are outdated, perhaps from v1.1.

  2. After running makeNetworkFiles.py, the permuted networks were not put into subdirectories named 1, 2, 3, ..., so the subsequent run of HotNet2.py fails.

updating release tags

The latest tagged release is from 2015. It'd be nice to keep these up to date, both to avoid working with intermediate development states and to know when to update our installations. I've just taken a user request to install this on our cluster, and up-to-date tags would help me out.

Thanks for your consideration!

HINT & MULTINET influence matrices

Thanks for providing the permutations for these additional protein interaction datasets mentioned in the Nature Genetics paper. However in each downloaded dataset I don't see the "dataset"_edge_list representing the true (not permuted) protein interactions within the network and neither do I see them in the main package. Are these available for download somewhere else?

Visualization color scale

Hi,
The visualization figures can be hard to interpret since they're colored according to the local max and min gene score in each figure. Can the user input max and min scores so that all the figures are colored with relation to the same scale?
Thanks,
Priya
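The normalization requested here can be sketched as follows: compute one minimum and one maximum over the scores in every figure, then map each score onto [0, 1] using those shared bounds instead of each figure's own. The visualization itself is a web application, so this Python function (shared_color_scale, a hypothetical name) only illustrates the arithmetic, not the viz code.

```python
def shared_color_scale(figures):
    """Map every figure's gene scores onto [0, 1] using one global min and max.

    figures: dict of figure name -> {gene: score}. Returns the same shape.
    """
    all_scores = [s for scores in figures.values() for s in scores.values()]
    lo, hi = min(all_scores), max(all_scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when every score is equal
    return {name: {gene: (s - lo) / span for gene, s in scores.items()}
            for name, scores in figures.items()}
```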

Any comments on a similar method - MashUp?

Hi Raphael group,

Just found out about your method from a talk at CSHL, and it reminded me of another method (published 1 year after yours). Their paper actually didn't seem to mention yours at all, despite the similarities.

Just wondering, if you guys have heard about this, and if so, any comments about differences between HotNet2 and MashUp?

Can be anything: focus, concepts, technicalities, etc.

burden test

Nice paper.

Is it possible to fudge the analysis to run on generic gene-level p/q-values? I've got lots of germline burden-testing results that I'd like to test.

Daryl
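One common way to turn gene-level p- or q-values into heat scores is -log10(p), so that more significant genes receive more heat. The sketch below assumes that transformation, and it assumes that makeHeatFile.py scores accepts a plain text file with one "gene<TAB>score" line per gene; check makeHeatFile.py scores --help before relying on that format. Both function names are hypothetical.

```python
import math


def pvalues_to_heat(gene_pvalues, floor=1e-300):
    """Convert gene-level p- or q-values to non-negative heat via -log10(p)."""
    # floor avoids log10(0); max(0.0, ...) clamps p > 1 and turns -0.0 into 0.0.
    return {gene: max(0.0, -math.log10(max(p, floor)))
            for gene, p in gene_pvalues.items()}


def heat_file_text(heat):
    """One 'gene<TAB>score' line per gene, sorted by gene name (assumed format)."""
    return "".join("%s\t%g\n" % (gene, score) for gene, score in sorted(heat.items()))
```

Write the string returned by heat_file_text to a file and pass that file to makeHeatFile.py scores with -hf.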

Tests Failing

I ran the test script checkHotNet2Consensus.sh and found that the computed results do not match the reference. How do I diagnose this issue?

This is the output from the test script:

3,7c3,6
< 0     [ay, cg, ct, fl, fw, go, hh, hs, iw, kk, kp, kr, v]
< 1     [ar, bp, by, cq, cw, dv, ep, gn, hz, io, jx, kl, s] cv, hq, hy
< 2     [cl, dl, dy, ek, et, fx, fy, gi] dx
< 3     [av, ee, is, kn] gh, iz
< 4     [cm, eg, ii, jm]
\ No newline at end of file
---
> 0     [ay, cg, cm, ct, eg, fl, fw, go, hh, hs, ii, iw, jm, kk, kp, kr, v] gy
> 1     [cl, dl, dy, ek, et, fx, fy, gi] dx
> 2     [av, ee, is, kn] gh, iz
> 3     [ar, bp, by, cq, cw, dv, ep, gn, hz, io, jx, kl, s] cv, hq, hy
\ No newline at end of file

My only guess is that this may be caused by differing versions of Python packages. Is that possible? I tried to replicate the versions listed in the README, but conda reported compatibility issues between h5py 2.4.0 and NumPy 1.6.2/SciPy 0.10.1, so it wouldn't let me install the exact versions listed (NetworkX 1.7 may also have conflicted with these versions of NumPy/SciPy; I don't remember exactly). This is my current Python environment:

Python 2.7.13
NumPy 1.9.3
SciPy 0.17.0
NetworkX 1.7
h5py 2.4.0

Thanks,
Liron Ganel
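Because components are renumbered between runs, a line-by-line diff overstates the differences. In the output above, the components match once numbering is ignored, except that the ">" side's component 0 is the union of the "<" side's components 0 and 4, with gy added. The sketch below compares two consensus files as sets; the line format (index, bracketed core genes, optional extra genes) is inferred from the diff above, and parse_consensus and compare_consensus are hypothetical names.

```python
import re

# One consensus line: index, bracketed core genes, then optional extra genes.
LINE = re.compile(r"^\s*\d+\s+\[(?P<core>[^\]]*)\]\s*(?P<extra>.*)$")


def _genes(text):
    """Split a comma-separated gene list into a frozenset, ignoring blanks."""
    return frozenset(g.strip() for g in text.split(",") if g.strip())


def parse_consensus(lines):
    """Return the set of (core genes, extra genes) pairs, dropping component numbers."""
    components = set()
    for line in lines:
        match = LINE.match(line)
        if match:
            components.add((_genes(match.group("core")), _genes(match.group("extra"))))
    return components


def compare_consensus(lines_a, lines_b):
    """Return (components only in a, components only in b)."""
    a, b = parse_consensus(lines_a), parse_consensus(lines_b)
    return a - b, b - a
```

Open file objects can be passed directly, since iterating a file yields its lines.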

makeHeatFile.py scores -hf data/heats/pan12.gene2mutsig.txt or mutsig flag?

Hi @matthewreyna, I was running a test today to exercise as many HotNet2 features as possible. Quick question: I noticed that the filename here denotes MutSig, but the makeHeatFile.py scores subcommand is used (see the lines below).

python $hotnet2/makeHeatFile.py \
scores \
-hf data/heats/pan12.gene2mutsig.txt \

Can you verify whether this is correct?

According to the usage info of makeHeatFile.py, each input type should be run with its own subcommand and arguments, so I wanted to make sure this part is correct.

makeHeatFile.py mutsig --mutsig_score_file MUTSIG_SCORE_FILE
makeHeatFile.py mutation --snv_file SNV_FILE
makeHeatFile.py scores --heat_file HEAT_FILE
makeHeatFile.py music --music_score_file MUSIC_SCORE_FILE

Running HotNet2 over a custom network

I would like to report an issue (or maybe my lack of understanding) with the following line:

min_score = min([score for score in heat.values() if score > 0])

This line is where my execution of HotNet2 halts. On further inspection, none of the values in heat.values() is larger than 0, so the list comprehension produces an empty list and min() raises an exception.

If all values are zero (which is the case for me), none of the genes in that specific subnetwork have heat, so I wonder how this subnetwork became a significant hit. That's where my confusion starts.

Would you please help figure out what I am missing?

Thanks!
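A defensive version of the quoted line returns None instead of raising when no gene has positive heat, so the caller can report the problem clearly. This is a sketch, not a patch from the HotNet2 maintainers, and min_positive_score is a hypothetical name. An all-zero heat dictionary at this point means no gene with positive heat reached this step, so the heat file and the genes it shares with the network are worth checking first.

```python
def min_positive_score(heat):
    """Smallest strictly positive heat score, or None if no gene has positive heat."""
    positive = [score for score in heat.values() if score > 0]
    return min(positive) if positive else None
```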

Request - Technical Documentation

I'd like to use the HotNet2 algorithm/pipeline inside a Jupyter Notebook and be able to embed it in other Python scripts.

I have a couple questions which could be addressed with a bit of technical documentation:

  • Is it possible for me to bring my own NetworkX (di)graph?
  • If so, what schema its node and edge data dictionaries would need to follow to run the algorithm?
  • What functions run the algorithm?
  • Where do the results go?

If you have answers, I'd be happy to write it in Sphinx format and submit a PR :) I enjoyed your paper and thanks for making this code open source.

Error in findThreshold mutations

I get an error about a local variable referenced before assignment when I run findThreshold.py mutations with the following input:

findThreshold.py mutations -r MAF0.1.CADD0.mutations.iref -if /home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/influence_matrices/irefindex/index_genes_uniform -hf ./MAF0.1.CADD0.heat -n 5 -c 16 -o ./MAF0.1.CADD0.iref.deltas.mutation -mf /home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/influence_matrices/irefindex/iref_ppr_0.55.mat -glf refflat.glf -gof refflat.gof -b 0.003 --bmr_file corr.bmr.MAF0.1.CADD0.combined.bmr

Performing permuted mutation data delta selection...
Traceback (most recent call last):
  File "/home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/bin/findThreshold.py", line 161, in
    run(get_parser().parse_args(sys.argv[1:]))
  File "/home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/bin/findThreshold.py", line 102, in run
    deltas = get_deltas_for_mutations(args, infmat, infmat_index, heat_params)
  File "/home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/bin/findThreshold.py", line 147, in get_deltas_for_mutations
    heat_params["min_freq"], args.num_permutations, args.num_cores)
  File "/home/storage/jpriest_backup/zoran/priest_apps/hotnet2-master/hotnet2/permutations.py", line 117, in generate_mutation_permutation_heat
    genes = set([snv.gene for snv in snvs] + [cna.gene for cna in cnas])
UnboundLocalError: local variable 'snvs' referenced before assignment
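The body of generate_mutation_permutation_heat is not shown here, but this UnboundLocalError typically means snvs is assigned only on some code paths (for example, only when a particular input is supplied). The hypothetical functions below reproduce that pattern and the usual fix, which is to initialize the lists before any conditional branch; they are an illustration, not HotNet2's code.

```python
def genes_buggy(snvs_input=None, cnas_input=None):
    """Reproduces the bug: each list is assigned only inside an if-branch."""
    if snvs_input is not None:
        snvs = snvs_input
    if cnas_input is not None:
        cnas = cnas_input
    # Raises UnboundLocalError when either input was None.
    return set(snvs + cnas)


def genes_fixed(snvs_input=None, cnas_input=None):
    """Same logic, but every list has a value on every code path."""
    snvs = list(snvs_input) if snvs_input is not None else []
    cnas = list(cnas_input) if cnas_input is not None else []
    return set(snvs + cnas)
```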

--alpha Arguments written in hprd.config and irefindex.config

Since the last update (June 12), the --alpha argument of makeRequiredPPRFiles.py seems to have been changed to --beta, but --alpha has not been changed to --beta inside hprd.config and irefindex.config.

hprd.config
--edgelist_file influence_matrices/hprd/hprd_edge_list
--gene_index_file influence_matrices/hprd/hprd_index_genes
--prefix hprd
--alpha 0.60
--output_dir influence_matrices/hprd
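Until the config files are updated, the flag can be renamed with a small script like the one below (alpha_to_beta is a hypothetical name). It only renames the option. Whether the numeric value can be kept as-is is not stated here; if the old alpha was defined as 1 - beta, the value would need converting too, so check makeRequiredPPRFiles.py --help first.

```python
import re


def alpha_to_beta(config_text):
    """Rename '--alpha' options to '--beta' in the text of a .config file.

    Only whole '--alpha' options at the start of a line are renamed; the
    numeric value is left unchanged.
    """
    return re.sub(r"^(\s*)--alpha\b", r"\1--beta", config_text, flags=re.MULTILINE)
```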

Incorrect Edges Lead to Disconnected Components

Hello,

I'm trying to run HotNet2 on the BioPlex network (located at http://bioplex.hms.harvard.edu/data/BioPlex_interactionList_v4a.tsv). After running the program, I noticed a component containing a pair of nodes that was disconnected from the component's other nodes. Even with the minimum component size threshold set higher than two, the pair was still present in the output.

I've been trying to debug the program and I've traced the issue as best as I could to narrow it down. Here's what I've found:

  1. The two nodes are not connected to any of the other nodes that are output as being in the same component in the original network. This appears to be true in the network as read in from the hdf5 file as well as the original text file.
  2. The two nodes are connected to one other node in the network internally. This appears to happen after the weighted graph is constructed in the run_helper function.
  3. When the visualization is output, the program looks at the original network (read through the hdf5 file) and not the weighted graph constructed by the similarity matrix, so that pair of nodes is disconnected.

I have a hard time understanding how the similarity_matrix() and weighted_graph() functions are supposed to work, and my Python skills aren't great, so I've hit a wall in debugging this. Overall, though, it looks like there are some errors in how the network is represented internally in the program.

Please let me know if I can be of more help here. I know I haven't provided many specifics, but I haven't figured out how to narrow the test case down to a small example that's trivially reproduced.
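HotNet2 builds subnetworks from a weighted graph derived from heat diffusion, and diffusion can link genes that are not direct neighbors, so a reported component need not be connected in the input network. The check below splits a component into its connected pieces using only the original edge list; more than one piece means some links exist only in the weighted graph. induced_components is a hypothetical helper, not part of HotNet2.

```python
from collections import defaultdict


def induced_components(component, edges):
    """Split a reported subnetwork into connected pieces of the ORIGINAL network.

    component: iterable of gene names; edges: iterable of (gene, gene) pairs.
    Only edges with both endpoints inside the component are considered.
    """
    nodes = set(component)
    neighbors = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            neighbors[u].add(v)
            neighbors[v].add(u)
    pieces, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited gene.
        stack, piece = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            piece.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        pieces.append(sorted(piece))
    return pieces
```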

Error executing HotNet2

Good morning,
I tried executing HotNet2 on a Mac with Python 3.6.5 (command: python -V). As recommended on the README.txt file (present in the project’s root), I used virtualenv (v. 16.2.0, installed with the command: pip install virtualenv).

I used “paper_commands.sh” as a guide to the commands to run in the terminal, and based my command on:
> python ../makeNetworkFiles.py -e  data/networks/hint+hi2012/hint+hi2012_edge_list -i  data/networks/hint+hi2012/hint+hi2012_index_gene -nn hint+hi2012 -p  hint+hi2012 -b  0.4 -o  data/networks/hint+hi2012 -np 100 -c  1

Executing it, it gives me this error:
File "makeNetworkFiles.py", line 60
if not args.only_permutations:
^
SyntaxError: invalid syntax

The error is related to the -op (only permutations) parameter, which is not set in the command. So I tried adding the file “data/heats/pan12.gene2freq.txt”, which is included in the project, but it still does not work (the command becomes python ../makeNetworkFiles.py -e  data/networks/hint+hi2012/hint+hi2012_edge_list -i  data/networks/hint+hi2012/hint+hi2012_index_gene -nn hint+hi2012 -p  hint+hi2012 -b  0.4 -o  data/networks/hint+hi2012 -np 100 -c  1 -op data/heats/pan12.gene2freq.txt).

Next, I went into the "example" folder. Following the README.txt file in that folder, I tried to run the command:
> python makeRequiredPPRFiles.py @example/configs/influence_matrix.config

The command ran, but Python still couldn’t find the file “makeRequiredPPRFiles.py”, giving me this error:

python: can't open file 'makeRequiredPPRFiles.py': [Errno 2] No such file or directory

Do you have any suggestion?

Best regards

Alessandro LUMACA

Preparing the permutation files for hotnet2

Hi,
I have some difficulties when preparing the permutation files.
Firstly, I don't want to generate the permutation files myself; I'd like to use the 1000 files you provided. I generated the influence matrix with createPPRMat.py from the old version of HotNet2. However, I noticed that the -pnp argument of HotNet2.py needs an h5 file rather than a directory containing 1000 files, and I do not know how to create this file if I just want to use the permutation files you provided on this website.
Secondly, I tried to create all of the permutation files myself with makeNetworkFiles.py, following the example in paper/paper_commands.sh, but an error occurred:
Traceback (most recent call last):
  File "/home/ruibinxi_pkuhpc/lustre1/software/hotnet2-master/makeNetworkFiles.py", line 92, in <module>
    run(get_parser().parse_args(sys.argv[1:]))
  File "/home/ruibinxi_pkuhpc/lustre1/software/hotnet2-master/makeNetworkFiles.py", line 65, in run
    save_diffusion_to_file( HOTNET2, args.beta, args.gene_index_file, args.edgelist_file, pprfile, params=params)
  File "/lustre1/ruibinxi_pkuhpc/software/hotnet2-master/hotnet2/network.py", line 81, in save_diffusion_to_file
    hnio.save_hdf5(output_file, output)
  File "/lustre1/ruibinxi_pkuhpc/software/hotnet2-master/hotnet2/hnio.py", line 420, in save_hdf5
    f = h5py.File(file_path, 'a')
  File "/lustre1/ruibinxi_pkuhpc/jzj/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/lustre1/ruibinxi_pkuhpc/jzj/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 115, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2804)
  File "h5py/h5f.pyx", line 98, in h5py.h5f.create (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/h5f.c:2290)
IOError: Unable to create file (Unable to open file: name = '/home/ruibinxi_pkuhpc/lustre1/software/hotnet2-master/paper/data/networks/hint+hi2012/hint+hi2012_ppr_0.4.h5', errno = 17, error message = 'file exists', flags = 15, o_flags = c2)
However, at the beginning there was no hint+hi2012_ppr_0.4.h5 in the output directory; this file is created by makeNetworkFiles.py itself. So I was confused: the script stopped because a file it had just created already existed. Could you give me some advice? (If my first problem could be solved so that I can skip the permutation-generation step, that would be the best outcome for me.)

Thanks a lot!
Yang
