hetio / hetnetpy

Hetnets in Python (relocated from dhimmel/hetio)

Home Page: https://het.io/software

License: Other

Python 82.29% Jupyter Notebook 17.71%
rephetio hetnet python graph heterogeneous-network edge-prediction

hetnetpy's Introduction

Hetnets in Python

Overview

Hetnetpy is a Python 3 package for creating, querying, and operating on hetnets. This software provides convenient data structures for hetnets, as well as algorithms for edge prediction. It is specifically tailored and streamlined for hetnets compared to other more generic network software. See https://het.io/software for additional software packages designed specifically for hetnets.

Package relocation

Note that this package was previously named hetio, available at the following repositories:

In July 2019, the package was renamed to hetnetpy to more clearly represent its functionality and disambiguate it from other products.

Background

Hetnets: Hetnets, also called heterogeneous information networks, are graphs with multiple node and edge types. Hetnets are both multipartite and multirelational. They provide a scalable, intuitive, and frictionless structure for data integration.

Purpose: This package provides data structures for hetnets and algorithms for edge prediction. It only supports hetnets, which is its primary advantage compared to other network software. Node/edge attributes and edge directionality are supported.

Impetus: Development originated with a study to predict disease-associated genes and continues with a successive study to repurpose drugs.

Caution: Documentation is currently spotty, testing coverage is moderate, and the API is not fully stable. Contributions are welcome. Please use GitHub Issues for feedback, questions, or troubleshooting.

Installation

PyPI

To install the current PyPI version (recommended), run:

pip install hetnetpy

For the latest GitHub version, run:

pip install git+https://github.com/hetio/hetnetpy.git#egg=hetnetpy

For development, clone or download-and-extract the repository. Then run pip install --editable . from the repository's root directory. The --editable flag specifies editable mode, so updating the source updates your installation.

Once installed, tests can be executed by running py.test test/ from the repository's root directory.

Design

A Graph object stores a heterogeneous network and relies on the following classes:

  1. Graph
  2. MetaGraph
  3. Edge
  4. MetaEdge
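
As a rough illustration of how these four classes can relate, here is a toy sketch. It is not hetnetpy's implementation: every class and method name below (ToyGraph, ToyMetaGraph, ToyEdge, ToyMetaEdge, add_node, add_edge) is hypothetical, and the only idea it assumes is that a metagraph declares the permitted edge types and a graph checks each edge against it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyMetaEdge:
    """A permitted edge type between two node kinds (hypothetical)."""
    source_kind: str
    target_kind: str
    kind: str
    direction: str  # e.g. "both", "forward", "backward"


@dataclass(frozen=True)
class ToyEdge:
    """A concrete edge; nodes are (kind, identifier) tuples."""
    source: tuple
    target: tuple
    metaedge: ToyMetaEdge


@dataclass
class ToyMetaGraph:
    """The schema: the set of edge types the hetnet may contain."""
    metaedges: set = field(default_factory=set)


@dataclass
class ToyGraph:
    """The data: nodes and edges, validated against the metagraph."""
    metagraph: ToyMetaGraph
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_node(self, kind, identifier):
        node = (kind, identifier)
        self.nodes.add(node)
        return node

    def add_edge(self, source, target, kind, direction):
        metaedge = ToyMetaEdge(source[0], target[0], kind, direction)
        if metaedge not in self.metagraph.metaedges:
            raise ValueError(f"edge type not in metagraph: {metaedge}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        edge = ToyEdge(source, target, metaedge)
        self.edges.append(edge)
        return edge
```

Separating the schema (metagraph) from the data (graph) is what lets a hetnet reject edges of an undeclared type at insertion time.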

Development

This repo uses pre-commit:

# run once per local repo before committing
pre-commit install

The following is only relevant for maintainers. Create a new release at https://github.com/hetio/hetnetpy/releases/new. GitHub Actions will build the distribution and upload it to PyPI. The version information is inferred from the Git tag using setuptools_scm.

License

This repository is dual licensed, available under either or both of the following licenses:

  1. BSD-2-Clause Plus Patent License at LICENSE-BSD.md
  2. CC0 1.0 Universal Public Domain Dedication at LICENSE-CC0.md

hetnetpy's People

Contributors

ben-heil, dhimmel, gwaybio, kkloste, veleritas, zietzm

hetnetpy's Issues

Speeding up the Neo4j import

Hi Daniel,

What was the reasoning behind importing the nodes and edges of the hetnet using the py2neo interface? I'm finding that the import process is quite slow even for small sized networks, and was wondering whether I should look into the batch CSV import that neo4j comes with.

From my experiments it seems like importing 20000 nodes and 22000 edges into neo4j with the current code takes roughly 45 minutes on an AWS instance with 8 cores and 32 GB RAM. At this speed it would basically take forever to load the entire network, so I'm wondering if I'm missing anything here.

Best,
Toby
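
For anyone weighing the batch CSV route, a minimal sketch of writing nodes and edges to CSV files follows. The header fields (:ID, :LABEL, :START_ID, :END_ID, :TYPE) are the ones Neo4j's bulk importer recognizes, but the function name write_neo4j_csv and the tuple layouts are assumptions made for illustration. This is not part of hetio.

```python
import csv
import io


def write_neo4j_csv(nodes, edges, node_file, edge_file):
    """Write nodes and edges as CSV for Neo4j's bulk importer.

    nodes: iterable of (identifier, name, label) tuples (assumed layout)
    edges: iterable of (source_id, target_id, type) tuples (assumed layout)
    """
    node_writer = csv.writer(node_file, lineterminator="\n")
    node_writer.writerow(["identifier:ID", "name", ":LABEL"])
    for identifier, name, label in nodes:
        node_writer.writerow([identifier, name, label])

    edge_writer = csv.writer(edge_file, lineterminator="\n")
    edge_writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for source_id, target_id, edge_type in edges:
        edge_writer.writerow([source_id, target_id, edge_type])


# Usage with in-memory buffers; real use would pass open file handles.
node_buffer, edge_buffer = io.StringIO(), io.StringIO()
write_neo4j_csv(
    nodes=[("n1", "node one", "Gene"), ("n2", "node two", "Disease")],
    edges=[("n1", "n2", "ASSOCIATES")],
    node_file=node_buffer,
    edge_file=edge_buffer,
)
```

The files would then be handed to the importer offline, which avoids issuing one request per node or edge.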

Permuting Specific Genes Option

I am constructing hetnets in https://github.com/greenelab/interpret-compression and am looking to generate network permutations to use as a null distribution. I am running into the issue of potentially inflated z-scores (see example of similar procedure). Could the inflation be because there are many more genes in the network than what I am comparing against?

For example, if cell 6 doesn't include the total population of genes in the hetnet, won't there be an inflation of artificial zeros in the permuted swap? Would this then cause a deflated null distribution in the matrix multiplication in cell 9?

I am wondering if there could be functionality here to permute a hetnet for only certain genes, rather than only certain metaedges. Perhaps adding a variable nodes_to_include that defaults to all nodes would help. Maybe this addition could happen before deciding to loop over nodes here:

https://github.com/dhimmel/hetio/blob/9d9ef1320ee47609e3c61c9f4918531d3c1c8c96/hetio/permute.py#L15-L18

Would it be of interest to add this functionality here?

An alternative (perhaps an easier one) would be to regenerate the hetnets in my original scripts so that they only include genes of interest.
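
To make the nodes_to_include idea concrete, here is a minimal sketch of the filtering step that could run before any permutation. The helper name split_pairs_by_nodes is hypothetical, and it assumes edges are (source, target) pairs with the gene as the first element. It is not existing hetio functionality.

```python
def split_pairs_by_nodes(pair_list, nodes_to_include=None):
    """Split edges into those eligible for permutation and those held out.

    An edge is eligible when its source (first element) is in
    nodes_to_include. With nodes_to_include=None, every edge is eligible.
    """
    if nodes_to_include is None:
        return list(pair_list), []
    keep = set(nodes_to_include)
    eligible, held_out = [], []
    for pair in pair_list:
        if pair[0] in keep:
            eligible.append(pair)
        else:
            held_out.append(pair)
    return eligible, held_out


# Usage: only edges whose gene is of interest would be permuted.
pairs = [("g1", "c1"), ("g2", "c1"), ("g3", "c2")]
eligible, held_out = split_pairs_by_nodes(pairs, nodes_to_include={"g1", "g3"})
```

Only the eligible list would be passed to the permutation routine; the held-out edges would be added back unchanged afterwards.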

metaedge_to_adjacency_matrix as a Graph class method?

Do we want to make metaedge_to_adjacency_matrix a method of the hetio.hetnet.Graph class? Suggested implementation:

    def metaedge_to_adjacency_matrix(self, *args, **kwargs):
        import hetio.matrix
        return hetio.matrix.metaedge_to_adjacency_matrix(self, *args, **kwargs)

This is relevant for changes we're considering in hetmech, specifically the construction of a HetMat class.

basic how-to

Hey all,

Is there a basic how-to guide that I'm missing for this work? It looks interesting and I want to use it to explore Hetionet.

Use setup.cfg to use README.md for PyPI description

Currently, we have to use README.rst for the long_description in setuptools to create the description for PyPI.

According to this tutorial, we may be able to create a setup.cfg file containing:

[metadata]
description-file = README.md

That would be awesome. This rst format and duplicating the README has been a real annoyance. Will wait on the merging of #6, so this doesn't cause any conflicts.

Return numpy ndarrays not matrices

In our Python 3.4 builds with numpy 1.15.2, we're getting the following PendingDeprecationWarning:

/home/travis/virtualenv/python3.4.8/lib/python3.4/site-packages/numpy/matrixlib/defmatrix.py:68: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  return matrix(data, dtype=dtype, copy=False)

hetio.permute.permute_pair_list modifies pair_list in place

I recently discovered that permute_pair_list modifies the pair list it is given, in place. The returned pair list is then equivalent to the pair list that was passed as an argument to the function itself. I had not expected this behavior. There is no bug in the code that causes this. I am simply wondering if this is in fact the function's desired behavior. I could imagine that one wouldn't always want to modify an edge list in place.
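
For comparison, a non-mutating variant could copy the list before swapping. The sketch below is a simplified degree-preserving swap written for illustration. The name permute_pairs_copy and its parameters are hypothetical, and it is not the hetio implementation.

```python
import random


def permute_pairs_copy(pair_list, n_swaps, seed=0):
    """Return a permuted copy of pair_list, leaving the input untouched.

    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    node keeps its number of edges. Swaps that would create an edge that
    already exists are skipped.
    """
    rng = random.Random(seed)
    pairs = list(pair_list)  # shallow copy is enough: tuples are immutable
    if len(pairs) < 2:
        return pairs
    pair_set = set(pairs)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(pairs)), 2)
        (a, b), (c, d) = pairs[i], pairs[j]
        new_1, new_2 = (a, d), (c, b)
        if new_1 in pair_set or new_2 in pair_set:
            continue  # would duplicate an existing edge (or change nothing)
        pair_set -= {pairs[i], pairs[j]}
        pair_set |= {new_1, new_2}
        pairs[i], pairs[j] = new_1, new_2
    return pairs


# Usage: the original list can still be compared with the permuted one.
original = [("g1", "c1"), ("g2", "c2"), ("g3", "c3"), ("g1", "c2")]
permuted = permute_pairs_copy(original, n_swaps=20, seed=1)
```

Copying costs one extra list per call, which is usually small next to the swapping itself.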

py2neo version 4 doesn't have an httpstream package

Version 4 of py2neo moves away from http towards bolt. As a result, they removed the httpstream package from their repo so anyone using version 4 will get an import error whenever neo4j.py is used.

This could be fixed either by checking the py2neo version before importing py2neo.packages, or by requiring version 3 of py2neo. I'm not familiar enough with the rest of the code in neo4j.py to know which would be the better course.
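
A version check along those lines might look like the sketch below. The helper names major_version and py2neo_supports_httpstream are hypothetical, and the rule "httpstream exists only before version 4" is taken from the description above rather than verified against every py2neo release.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a dotted version string."""
    return int(version_string.split(".")[0])


def py2neo_supports_httpstream():
    """Guess whether the installed py2neo still ships httpstream.

    Returns False when py2neo is not installed at all.
    """
    try:
        installed = metadata.version("py2neo")
    except metadata.PackageNotFoundError:
        return False
    return major_version(installed) < 4


# Usage: guard the import instead of failing at module load time.
if py2neo_supports_httpstream():
    import py2neo.packages.httpstream  # only attempted on py2neo < 4
```

Pinning py2neo to version 3 in the install requirements would be the simpler option if supporting version 4 is not a goal.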

hetio.abbreviation.metaedges_from_metapath breaks for integers

I have been building hetnets for MSigDB at https://github.com/greenelab/interpret-compression and have been using MSigDB collection names for hetio metagraph IDs.

However, I am receiving an error in the function metapath_from_abbrev. For example, when calling graph.metagraph.metapath_from_abbrev('GpC1')

The traceback is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-2cbc2580bcf8> in <module>()
      1 DWPCs = collections.OrderedDict()
      2 for name, graph in hetnets.items():
----> 3     metapath = graph.metagraph.metapath_from_abbrev('GpC1')
      4     rows, cols, dwpc_matrix, seconds = dwpc(graph, metapath, damping=0.4)
      5     DWPCs[name] = dwpc_matrix

~/anaconda3/envs/interpret-compression/lib/python3.6/site-packages/hetio/hetnet.py in metapath_from_abbrev(self, abbrev)
    333         for metaedge_abbrev in metaedge_abbrevs:
    334             metaedge_id = hetio.abbreviation.metaedge_id_from_abbreviation(
--> 335                 self, metaedge_abbrev)
    336             metaedges.append(self.get_edge(metaedge_id))
    337         return self.get_metapath(tuple(metaedges))

~/anaconda3/envs/interpret-compression/lib/python3.6/site-packages/hetio/abbreviation.py in metaedge_id_from_abbreviation(metagraph, abbreviation)
    128     abbrev_to_kind = {v: k for k, v in metagraph.kind_to_abbrev.items()}
    129     source_kind = abbrev_to_kind[source_abbrev]
--> 130     target_kind = abbrev_to_kind[target_abbrev]
    131     metanode = metagraph.get_node(source_kind)
    132     for edge in metanode.edges:

KeyError: 'C'

It appears the root of the error is at line 332, in the call to metaedges_from_metapath:

hetio.abbreviation.metaedges_from_metapath('GpC1')
>>> ['GpC']

This also appears to break at line 114 of hetio/abbreviation.py.

I have not looked at the specific line in much detail, but I was wondering if there would be a way to accept integers in metapath abbreviations.

pattern = regex.compile('(?<=^|[a-z<>])[A-Z]+[a-z<>]+[A-Z]+')
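
One way digits could be accepted is sketched below with the standard-library re module, replacing the lookbehind pattern with a tokenizer. The function name metaedges_from_metapath_sketch is hypothetical, and the rule that a node abbreviation is an uppercase letter followed by uppercase letters or digits is an assumption, not hetio's current behavior.

```python
import re

# Node abbreviations: uppercase letter, then uppercase letters or digits.
# Edge abbreviations: lowercase letters plus the direction marks < and >.
TOKEN = re.compile(r"[A-Z][A-Z0-9]*|[a-z<>]+")


def metaedges_from_metapath_sketch(abbreviation):
    """Split a metapath abbreviation into metaedge abbreviations."""
    tokens = TOKEN.findall(abbreviation)
    valid = (
        len(tokens) >= 3
        and len(tokens) % 2 == 1
        and tokens[0][0].isupper()
        and "".join(tokens) == abbreviation
    )
    if not valid:
        raise ValueError(f"cannot parse metapath abbreviation: {abbreviation!r}")
    nodes = tokens[0::2]  # tokens alternate node, edge, node, ...
    edges = tokens[1::2]
    return [nodes[i] + edges[i] + nodes[i + 1] for i in range(len(edges))]
```

With this rule, 'GpC1' yields a single metaedge 'GpC1' rather than the truncated 'GpC', while abbreviations without digits such as 'GiGpBP' still split into 'GiG' and 'GpBP'.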

Py2neo 3 compatibility

I'm working on upgrading hetio to be compatible with py2neo 3. Since many parts of the rephetio pipeline use hetio, what do you think the easiest way would be to test them all?

So far I've managed to get hetio to load nodes into Neo4j 3.1.1.

Hetio v0.2.3 breaks integration code

Hi Daniel,

Just wanted to note that the new assertions you added in v0.2.3, which check whether a node or edge has already been integrated into the network, break earlier code in the integrate repository.

Looking at the blame for hetnet.py shows that those assertion lines were added only 21 days ago with commit d32ba2a, and are therefore very recent (I wasn't having these problems before).

For now I've just commented out the cells which add edges which already exist in the network, but I just wanted you to know that dhimmel/integrate/integrate.ipynb won't run directly anymore with the latest hetio release.

Automatic abbreviation code and kind_to_abbrev safety checks

For compatibility with graphs saved prior to ef0b76b and for convenience, kind_to_abbrev (the dictionary of node and edge kind abbreviations) should have the ability to be automatically generated. Most of the code to perform this task used to be in the module, so it will not be difficult.

Additionally, safety checks should be executed to ensure that abbreviations do not create ambiguity. In other words, abbreviated metapaths should unambiguously identify the actual metapath.
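
A minimal sketch of one such safety check follows: it reports abbreviations that are shared by more than one kind. The function name find_ambiguous_abbreviations is hypothetical, and the sketch assumes kind_to_abbrev is a plain dictionary mapping each kind to its abbreviation. It covers only exact duplicates, not every way a metapath abbreviation could be ambiguous.

```python
from collections import defaultdict


def find_ambiguous_abbreviations(kind_to_abbrev):
    """Return {abbreviation: [kinds]} for abbreviations used more than once."""
    abbrev_to_kinds = defaultdict(list)
    for kind, abbrev in kind_to_abbrev.items():
        abbrev_to_kinds[abbrev].append(kind)
    return {
        abbrev: sorted(kinds)
        for abbrev, kinds in abbrev_to_kinds.items()
        if len(kinds) > 1
    }


# Usage: an empty result means no two kinds share an abbreviation.
conflicts = find_ambiguous_abbreviations(
    {"gene": "G", "group": "G", "compound": "C"}
)
```

Running a check like this when a graph is loaded would surface conflicts early, instead of letting a later dictionary inversion silently keep only one of the conflicting kinds.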

Package does not install with pip

The hetio package fails to properly install when using the command pip install git+https://github.com/dhimmel/hetio.git#egg=hetio as specified in the README.

From a virtual environment with pip 8.1.2 and setuptools 28.0.0 only, running pip install git+https://github.com/dhimmel/hetio.git#egg=hetio gives the following error:

Collecting hetio from git+https://github.com/dhimmel/hetio.git#egg=hetio
  Cloning https://github.com/dhimmel/hetio.git to /tmp/pip-build-7e3bcckm/hetio
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-7e3bcckm/hetio/setup.py", line 2, in <module>
        import pypandoc
    ImportError: No module named 'pypandoc'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7e3bcckm/hetio/

This error goes away if you run pip install pypandoc before installing, but the installation then gets stuck at a different step (the pandoc package is missing).
