greenelab / xswap-analysis

Analysis and experiments for https://github.com/greenelab/xswap-manuscript

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 99.59% Python 0.41%
hetnets xswap permutation networks edge-prior degree notebooks tool software

xswap-analysis's Introduction

Analysis and figure generation for XSwap project

This repository hosts notebooks, images, and data for the XSwap project. The results of these analyses are described in the manuscript titled:

The probability of edge existence due to node degree: a baseline for network-based predictions

Along with the manuscript, this project produced an open-source implementation of our network permutation algorithm. The implementation is available on PyPI as xswap, with source code on GitHub in hetio/xswap.

Layout

The analyses for this repository are performed by sequentially numbered Jupyter notebooks in the nb directory. Data is written to the data directory and figures are exported to the img directory.

The analyses depend on the Hetionet HetMat dataset, which can be downloaded by running the following Python command from this repo's root directory:

from hetmatpy.hetmat.archive import load_archive
load_archive(
    archive_path="https://github.com/hetio/hetionet/raw/b467b8b41087288390b41fdb796577ada9f03bda/hetnet/matrix/hetionet-v1.0.hetmat.zip",
    destination_dir="data/task1/hetionet-v1.0.hetmat",
)

Environment

This repository uses conda to manage its environment, as specified in environment.yml. Create, update, or export the environment with:

# install new xswap-analysis environment
conda env create --file=environment.yml

# update existing xswap-analysis environment
conda env update --file=environment.yml

# export the locked environment specification, which lists every package
# installed in the environment including implicit dependencies.
# Includes build numbers and is operating system specific.
conda env export --name=xswap-analysis > environment-lock-linux.yml

Then use conda activate xswap-analysis and conda deactivate to activate or deactivate the environment.

License

The entire repository is released under a BSD 3-Clause License. Furthermore:

  • the contents of the data directory are released under CC0 (public domain dedication).
  • the contents of the img directory are released under CC BY 4.0 (attribution required). For images that are used in the xswap-manuscript, please attribute this manuscript as the source.

xswap-analysis's People

Contributors

dhimmel, zietzm

Forkers

zietzm dhimmel

xswap-analysis's Issues

Deposit repository to Zenodo including ignored data files

refs greenelab/xswap-manuscript#64

Expanding from #26 (comment):

Larger files should be included. Total size of data/task1 is 13 GB, data/task3 is 3.8 GB, and data/feature-degree is 1.8 GB. Will need to upload these elsewhere, e.g. Zenodo.

Zenodo looks like a good option. It doesn't seem like they support directories, as per zenodo/zenodo#1089, so we'd want to create a zip archive prior to upload. If we make the zip ourselves, we could then include files ignored by .gitignore. We should also consider not including .git.

We could then add a short section to the README about how to populate the data directory from the Zenodo download with some shell commands.
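As a rough illustration of the archiving step discussed above, here is a minimal Python sketch (Python rather than shell, to match the rest of this repository). The function names `archive_directory` and `populate_from_archive` and the archive file name are hypothetical and not part of this repository. Because the sketch walks the filesystem rather than asking git, files ignored by .gitignore are included, while the .git directory is skipped.

```python
import zipfile
from pathlib import Path


def archive_directory(root, archive_path, skip_dirs=(".git",)):
    """Zip every file under root, skipping paths that pass through skip_dirs.

    Hypothetical helper: files ignored by .gitignore are still included,
    because the filesystem is walked directly.
    """
    root = Path(root)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            relative = path.relative_to(root)
            if not path.is_file():
                continue
            if any(part in skip_dirs for part in relative.parts):
                continue
            if path.resolve() == archive_path.resolve():
                continue  # never add the archive to itself
            archive.write(path, arcname=relative.as_posix())
    return archive_path


def populate_from_archive(archive_path, destination):
    """Extract a downloaded archive, restoring its directory layout."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
```

A call such as `archive_directory(".", "../xswap-analysis.zip")` from the repository root would produce a single file suitable for upload, and `populate_from_archive` would restore the data directory from the downloaded copy.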

General methods for incorporating permuted networks

I am trying to build a simple example application of XSwap-permuted networks for generic inference tasks. The simplest example that came to mind is to take a simple network (call it original network A), drop a fraction of its edges (resulting in reduced network B), and attempt to predict the dropped edges. So that the method does not depend on the DWPC metric and its gamma hurdle model, I'm using DWPC as well as two other metrics for prediction: the Jaccard coefficient and the number of common neighbors.
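The setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is undirected and given as a list of node pairs, and the helper names (`drop_edges`, `build_neighbors`, `common_neighbors`, `jaccard`) are hypothetical rather than taken from this repository. DWPC is omitted because it requires the hetmatpy machinery.

```python
import random


def drop_edges(edges, fraction, seed=0):
    """Split the edges of A into (edges kept in B, dropped edges to predict)."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    n_dropped = int(round(fraction * len(shuffled)))
    return shuffled[n_dropped:], shuffled[:n_dropped]


def build_neighbors(edges):
    """Map each node to the set of its neighbors in an undirected network."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    return neighbors


def common_neighbors(neighbors, source, target):
    """Number of nodes adjacent to both source and target."""
    return len(neighbors.get(source, set()) & neighbors.get(target, set()))


def jaccard(neighbors, source, target):
    """Shared neighbors divided by the size of the combined neighborhood."""
    first = neighbors.get(source, set())
    second = neighbors.get(target, set())
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Toy network A; B keeps half of the edges and the rest become prediction targets.
edges_a = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
edges_b, dropped = drop_edges(edges_a, fraction=0.5)
neighbors_b = build_neighbors(edges_b)
features = {
    pair: (common_neighbors(neighbors_b, *pair), jaccard(neighbors_b, *pair))
    for pair in dropped
}
```

The `features` dictionary holds, for each dropped edge, the metrics computed on B only. A full evaluation would also need the same metrics for node pairs that were never connected in A, so that a classifier has negatives to compare against.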

Overall, my hypothesis is that degree-preserving random networks contain information that is beneficial to the edge prediction task. Specifically, the permuted networks could provide a way to isolate the confounders of node degree and network density.

It's not obvious exactly how random networks should be incorporated. I have a few general ideas, but all have some downsides:

  1. Concatenate the metrics computed on B with the averages of those metrics across permutations of B. Then compare the performance of logistic regression on these features to its performance on the B features alone. This is probably the easiest method, though it wouldn't be very interpretable and wouldn't capture any potential nonlinear relationships.

  2. Compute averages of metrics across permutations of B, then subtract averages from B metrics and predict edges. This makes some assumptions about the relationships between metrics and their averages, which may or may not be valid.

  3. Compare performance based on B metrics with performance based on metrics of permuted B. This is basically the delta AUROC method used in Rephetio. It works well on the task for which it was designed, but I don't think it is appropriate here. We are only dealing with a single metaedge, and it's not clear what we could interpret from computing a delta AUROC.

  4. (Not yet well-developed) Compute a prior edge probability or prior metric probability using permuted networks and update some conditional probability based either on B metrics themselves or logistic-regression derived probabilities.

  5. (Not yet well-developed) Somewhat like 4 but nonparametric. Rank potential connections using metrics computed on permuted networks. Somehow combine this information with B metrics (whether the metrics themselves or logistic-regression derived probabilities) to either get probabilities for each connection or at least rank potential connections. This would allow us to easily compute an AUROC value that could be compared to a simple no-permutation edge prediction method, though it's not clear how these data could be "combined."
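To make ideas 2 and 4 above more concrete, here is a minimal Python sketch of one possible reading of them. It assumes an undirected simple network given as a list of node pairs. The function `swap_permute` is a simplified, pure-Python stand-in for degree-preserving edge swapping, not the implementation in the xswap package. All function names are hypothetical, and taking the edge prior to be the fraction of permutations in which a pair is connected is an assumption made for illustration.

```python
import random


def swap_permute(edges, multiplier=10, seed=0):
    """Degree-preserving permutation of an undirected simple network.

    Simplified stand-in for XSwap: repeatedly pick two edges and swap
    their endpoints, rejecting swaps that create self-loops or duplicates.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(edge)) for edge in edges]
    edge_set = set(edge_list)
    for _ in range(multiplier * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new_1, new_2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = new_1, new_2
    return edge_list


def build_neighbors(edges):
    """Map each node to the set of its neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def common_neighbors(neighbors, source, target):
    return len(neighbors.get(source, set()) & neighbors.get(target, set()))


def permutation_adjusted(edges_b, pairs, n_permutations=100, seed=0):
    """For each pair, return the B metric minus its permuted mean (idea 2)
    and the fraction of permutations containing the pair as an edge (idea 4)."""
    totals = {pair: 0.0 for pair in pairs}
    edge_counts = {pair: 0 for pair in pairs}
    for k in range(n_permutations):
        permuted = swap_permute(edges_b, seed=seed + k)
        permuted_set = set(permuted)
        permuted_neighbors = build_neighbors(permuted)
        for pair in pairs:
            totals[pair] += common_neighbors(permuted_neighbors, *pair)
            edge_counts[pair] += tuple(sorted(pair)) in permuted_set
    neighbors_b = build_neighbors(edges_b)
    return {
        pair: {
            "adjusted": common_neighbors(neighbors_b, *pair) - totals[pair] / n_permutations,
            "edge_prior": edge_counts[pair] / n_permutations,
        }
        for pair in pairs
    }
```

For example, `permutation_adjusted(edges_b, candidate_pairs)` returns one record per candidate pair. How the "adjusted" value and the "edge_prior" value should be combined into a single score is exactly the open question raised in ideas 4 and 5, and this sketch does not attempt to answer it.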
