Giter Site home page Giter Site logo

uzhdag / pathpy Goto Github PK

View Code? Open in Web Editor NEW
49.0 7.0 17.0 2.23 MB

An OpenSource python package for the analysis of time series data on networks using higher-order and multi-order graphical models.

Home Page: http://www.pathpy.net

License: GNU Affero General Public License v3.0

Makefile 0.50% Python 92.10% HTML 7.40%

pathpy's Introduction

pathpy logo

Note: This is the repository of pathpy 2. It will soon be replaced by pathpy 3, which has a new home on gitHub.

Introduction

pathpy is an OpenSource python package for the analysis of time series data on networks using higher- and multi-order network models.

pathpy is specifically tailored to analyse temporal networks as well as time series and sequence data that capture multiple short, independent paths observed in an underlying graph or network. Examples for data that can be analysed with pathpy include time-stamped social networks, user click streams in information networks, biological pathways, citation networks, or information cascades in social networks.

Unifying the modelling and analysis of path statistics and temporal networks, pathpy provides efficient methods to extract causal or time-respecting paths from time-stamped network data. The current package distributed via the PyPI name pathpy2 supersedes the packages pyTempnets as well as version 1.0 of pathpy.

pathpy facilitates the analysis of temporal correlations in time series data on networks. It uses a principled model selection technique to infer higher-order graphical representations that capture both topological and temporal characteristics. It specifically allows to answer the question when a network abstraction of time series data is justified and when higher-order network representations are needed.

pathpy facilitates the analysis of temporal correlations in time series data on networks. It uses model selection and statistical learning to generate optimal higher- and multi-order models that capture both topological and temporal characteristics. It can help to answer the important question when a network abstraction of complex systems is justified and when higher-order representations are needed instead.

The theoretical foundation of this package, higher- and multi-order network models, was developed in the following published works:

  1. I Scholtes: When is a network a network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks, In KDD'17 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Nova Scotia, Canada, August 13-17, 2017
  2. I Scholtes, N Wider, A Garas: Higher-Order Aggregate Networks in the Analysis of Temporal Networks: Path structures and centralities, The European Physical Journal B, 89:61, March 2016
  3. I Scholtes, N Wider, R Pfitzner, A Garas, CJ Tessone, F Schweitzer: Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks, Nature Communications, 5, September 2014
  4. R Pfitzner, I Scholtes, A Garas, CJ Tessone, F Schweitzer: Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks, Phys Rev Lett, 110(19), 198701, May 2013

pathpy extends higher-order modelling approaches towards multi-order models for paths that capture temporal correlations at multiple length scales simultaneously. All mathematical details of the framework can be found in this openly available preprint.

Illustration of Multi-Order Model

A broader view on optimal higher-order models in the analyis of complex systems can be found here.

pathpy is fully integrated with jupyter, providing rich and interactive in-line visualisations of networks, temporal networks, higher-, and multi-order models. Visualisations can be exported to HTML5 files that can be shared and published onthe Web.

Download and installation

pathpy is pure python code. It has no platform-specific dependencies and should thus work on all platforms. pathpy requires python 3.x. It builds on numpy and scipy. The latest release version 2.0 of pathpy can be installed by typing:

> pip install pathpy2

Please make sure that you use the pyPI name pathpy2 as the package name pathpy is currently blocked.

If you want to install the latest development version, you can install it directly from the gitHub repository by typing:

> pip install git+git://github.com/uzhdag/pathpy.git

Tutorial

A comprehensive 3 hour hands-on tutorial that shows how you can use pathpy to analyse data on pathways and temporal networks is available online.

An explanatory video that introduces the science behind pathpy is available here:

Watch promotional video

A promotional video showcasing some of pathpy's features is available here:

Watch promotional video

Documentation

The code is fully documented via docstrings which are accessible through python's built-in help system. Just type help(SYMBOL_NAME) to see the documentation of a class or method. A reference manual is available here https://uzhdag.github.io/pathpy/hierarchy.html

Releases and Versioning

The first public beta release of pathpy (released February 17 2017) is v1.0-beta. Following versions are named MAJOR.MINOR.PATCH according to semantic versioning. The latest release version is 2.0.0.

Known issues

  • Depending on whether or not scipy has been compiled with MKL or openblas, considerable numerical differences can occur, e.g. for eigenvalue centralities, PageRank, spectral clustering, and other measures that depend on the eigenvectors and eigenvalues of matrices. Please refer to scipy.show_config() to show compilation flags. We are investigating this issue.
  • Interactive visualisations in jupyter are currently only supported for juypter notebooks, stand-alone HTML files, and the jupyter display integrated in IDEs like Visual Studio Code (which we highly recommend to work with pathpy). Due to its new widget mechanism, interactive d3js visualizations are currently not available for jupyterLab. Due to the complex document object model generated by jupyter notebooks, visualization performance is best in stand-alone HTML files and in Visual Studio Code.
  • The visualisation module currently does not support the drawing of edge arrows for temporal networks with directed edges. However, a powerful templating mechanism is available to support custom interactive and dynamic visualisations both for static and temporal networks.
  • The visualisation of paths in terms of alluvial diagrams within jupyter is currently unstable. This is due to the asynchronous loading of external scripts and possible network latencies e.g. in wireless networks. We will replace this in a future version.

Acknowledgements

The research and development behind pathpy is generously funded by the Swiss National Science Foundation via grant 176938.

The research behind this data analytics package was previously funded by the Swiss State Secretariate for Education, Research and Innovation via grant C14.0036. The development of the predecessor package pyTempNets was supported by the MTEC Foundation in the context of the project "The Influence of Interaction Patterns on Success in Socio-Technical Systems: From Theory to Practice".

Contributors

Ingo Scholtes (project lead, development)
Luca Verginer (development, test suite integration)

Past Contributors

Roman Cattaneo (development)
Nicolas Wider (testing)

Copyright

pathpy is licensed under the GNU Affero General Public License.

(c) Copyright ETH Zürich & University of Zurich, 2015-2018

pathpy's People

Contributors

gaofangshu avatar gotec avatar ingoscholtes avatar jarolim14 avatar nanumyan avatar simonnick avatar verginer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

pathpy's Issues

save_state_file prints the full information of nodes

From pathpy created by giava90 : sg-dev/pathpy#54

It creates a file where after the #Vertices , it writes each node with all its properties, e.g.

*Vertices 5
0 "{'inweight': array([ 0.,  0.]), 'outweight': array([ 5.,  2.]), 'indegree': 0, 'outdegree': 1}"
1 "{'inweight': array([ 11.,   3.]), 'outweight': array([ 11.,   5.]), 'indegree': 2, 'outdegree': 2}"
2 "{'inweight': array([ 0.,  0.]), 'outweight': array([ 6.,  1.]), 'indegree': 0, 'outdegree': 1}"
3 "{'inweight': array([ 5.,  3.]), 'outweight': array([ 0.,  0.]), 'indegree': 1, 'outdegree': 0}"
4 "{'inweight': array([ 6.,  2.]), 'outweight': array([ 0.,  0.]), 'indegree': 1, 'outdegree': 0}"

Instead, we usually need only its name (at least when using this .net files for infomap), e.g.

0  "1"
1 "2"
2 "3"
3 "4"
4 "5"

Special characters cause bug in visualization

Using some special characters results in all nodes being on top of eachother in the visualization. The characters include, but are probably not limited to:
"',[]{}!?:;.

At least warning would be useful, if this bug proves to be hard.

Add coverage test for pull requests

From pathpy created by verginer : sg-dev/pathpy#14

Add integration for test coverage service, some examples:

  • codecov.io
  • coveralls
  • codeclimate

Having a good feel for the coverage and how it changes should allow to edit the core code more
easily and understand which parts of the code are not properly tested.

Problem with pip and anaconda scipy versions due to MKL

From pathpy created by verginer : sg-dev/pathpy#39

The results of functions which rely on scipy linear algebra (e.g. scipy.sparse.linalg.eigen) can give significant different results depending on whether scipy was compiled with MKL (i.e. conda install) or without (i.e. pip install).

The differences in results are significant. For example from the tests:

test_eigen_centrality
...
1.7205880187995493 != 1.7461339874793727
2.007652443416882 != 2.030946758

Similarly convergence is inconsistent depending on MKL.

To make this more transparent it might be good to explicitely tell the user when using affected functions through a log message (i.e. INFO) that the results might be different if an other scipy version is used.

The following invocation will list any MKL compile time flags:

In [1]: import scipy
In [2]: scipy.show_config()

openblas_lapack_info:
  NOT AVAILABLE
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['anaconda3/envs/pathpy/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['anaconda3/envs/pathpy/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/anaconda3/envs/pathpy/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/anaconda3/envs/pathpy/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/anaconda3/envs/pathpy/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/anaconda3/envs/pathpy/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/anaconda3/envs/pathpy/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/anaconda3/envs/pathpy/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/anaconda3/envs/pathpy/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/anaconda3/envs/pathpy/include']

Performance Issues with layer_likelihood (sparse matrix representation)

From pathpy created by verginer : sg-dev/pathpy#50

The layer_likelihood function in tests has a bottleneck the indexing of the transition matrix
https://github.com/sg-dev/pathpy/blob/595fc2d497b7446112ad34c562582d57079f6656/pathpy/classes/multi_order_model.py#L473-L480

Some stats changing the type of sparse matrix used:

dense lil csr csc
layer_likelihood 2.8 3.5 10 27
estimate_order 16.8 17.6 24 42
with instantiation 29.1 30 41 51

Pip install fails when numpy is not installed

pip install git+git://github.com/uzhdag/pathpy.git fails with error message:

ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\patrick\AppData\Local\Temp\pip-req-build-rc8ef3uw\setup.py", line 7, in <module>
        from pathpy import __version__
      File "C:\Users\patrick\AppData\Local\Temp\pip-req-build-rc8ef3uw\pathpy\__init__.py", line 10, in <module>
        from .classes import *
      File "C:\Users\patrick\AppData\Local\Temp\pip-req-build-rc8ef3uw\pathpy\classes\__init__.py", line 5, in <module>
        from .paths import Paths
      File "C:\Users\patrick\AppData\Local\Temp\pip-req-build-rc8ef3uw\pathpy\classes\paths.py", line 30, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\patrick\AppData\Local\Temp\pip-req-build-rc8ef3uw\

This is caused by setup.py importing __version__ from the package directly and therefore loading numpy which is not yet installed.

Could use a sperate file __version__.py with a __version__ variable and then load it like:

here = os.path.abspath(os.path.dirname(__file__))

with open(os.path.join(here, NAME, '__version__.py')) as f:
    exec(f.read(), about)
...
setup(
  ...
  version = about['__version__']
  ...
)

To avoid loading dependencies before they are actually installed.

Use tab as default separator

We should consider switching the default separator of adjacency list files in Network, TemporalNetwork, and HigherOrderNetwork to a tab.

For most data sets, this will minimize the chance of generating conflicts with node names,.

However, to avoid breaking backwards compatibility with files saved with the current default separator comma, we need a deprecation procedure.

Coloring nodes in temporal networks

It seems that coloring nodes is allowed only for simple networks but not for temporal networks.

n = pp.Network()
n.add_edge('a', 'b')
n.add_edge('b', 'a')
n.add_edge('a', 'c')
n.add_edge('b', 'c')
n.add_edge('c', 'd')
n.add_edge('d', 'c')

style = {
    'node_color' : {'a': '#000000'}
}

pp.visualisation.plot(n, **style)

Capture1

n = pp.TemporalNetwork()
n.add_edge('a', 'b',1)
n.add_edge('b', 'a',1)
n.add_edge('a', 'c',2)
n.add_edge('b', 'c',2)
n.add_edge('c', 'd',3)
n.add_edge('d', 'c',4)

style = {
    'node_color' : {'a': '#000000'}
}

pp.visualisation.plot(n, **style)

Capture2

Having integers as node names induces error while visualizing

Message is the following:
\pathpy\visualisation\html.py in fix_node_name(v)
236 # prefix nodes starting with number as such IDs are not supported in HTML5
237 def fix_node_name(v):
--> 238 if v[0].isdigit():
239 return "n_" + v
240 return v
Error is :TypeError: 'int' object is not subscriptable .

Code that reproduces the error:

import pathpy as pp
n = pp.Network()
n.add_edge(1,2)
n

Temporal walks vs Ngram

Can I feed unique temporal walks (i.e., random walks forward in the time) as ngrams to calculate the optimal k-th order of the underlying network?

Bug in remove_node

In certain undirected networks with self-loops, remove_node raises a KeyError Exception. The reason for this is that in these cases a node itself can be both its predecessor and its successor.

Misleading "to_undirected" method in Network class

Issue moved from legacy repo at IngoScholtes/pathpy opened by @VincenzoPerri

The to_undirected method of the networks class loses the edge weights.
It would be better to provide a method that allows preserving weights and removes only the directedness. The user should be able to choose how this is performed (sum, average...)

Performance of path calculation

From pathpy created by IngoScholtes : sg-dev/pathpy#9

The function HigherOrderNetwork.getShortestPaths is a bottleneck for the calculation of centralities.

I believe that we can make this more efficient, as it currently takes several minutes even for higher-order networks that only have a few hundred nodes and a few thousand links.

Moreover, for the calculation of betweenness centrality, we can adopt faster algorithms like the one by Brandes et al., see here: www.tandfonline.com/doi/abs/10.1080/0022250X.2001.9990249

Weighted Temporal Edges

There isn't an option to assign edge weights when adding new links to a temporal network.

Bug in path_extraction.paths_from_temporal_network

When two or more edges have the same timestamp in a temporal network, only the first one is used to generate a path.

Minimal code:

t = pp.TemporalNetwork()
t.add_edge(source="a", target="b", ts=1)
t.add_edge(source="b", target="c", ts=2)
t.add_edge(source="b", target="d", ts=2)

paths = pp.path_extraction.paths_from_temporal_network(t, delta=10)
network = pp.HigherOrderNetwork(paths=paths, k=2)
network.edges

This results in:

defaultdict(dict, {('a,b', 'b,c'): {'weight': array([ 0., 1.])}})

However, we should have got something like this, as the two edges are simultaneous:

defaultdict(dict,
{('a,b', 'b,c'): {'weight': array([ 1., 0.])},
('a,b', 'b,d'): {'weight': array([ 1., 0.])}})

Eigenvector centrality does not work for Network object

pp.algorithms.centralities.eigenvector(g)
returns

AssertionError: network must be an instance of HigherOrderNetwork

which is weird.

Also, using leading eigenvector function is weird. Code looks like this:
g.leading_eigenvector(g.adjacency_matrix)
reasons:
leading_eigenvector is expected to be found in algorithms.spectral
leading_eigenvector is supposed to give the leading eigenvector for this specific network, why is it necessary to give the adjacency matrix as the argument?

Add multi-order matrix powers

From pathpy created by IngoScholtes : sg-dev/pathpy#49

A multi-order matrix powers function would be very handy. The basic idea is to generalize the concept of adjacency matrix powers to a multi-order model.

For a first-order network with adjacency matrix $A$, the $k$-th power gives the number of different paths between all pairs of nodes. In a multi-order network with max. order $K$ and adjacency matrices $A_i$ we can generalize the matrix power to:
$A_1 \cdot A_2 \cdot ... \cdot ... A_K^{k-K}$

Enable pre-defining node positions

A feature where user can define node positons, and then the visualisation is set with forces set to zero, and initialized in the positions given by the user.

Molloy Reed algorithm

Molloy Reed algorithm uses is_graphical_sequence function with directed = True argument.

Molloy Reed and is_graphical_sequence need to be revised anyway, though.

Function pp.HigherOrderNetwork.from_temporal_network() not working

With TG a TemporalNetwork, the line
G = pp.HigherOrderNetwork.from_temporal_network(TG)
raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-9ac35034d065> in <module>
----> 1 G = pp.HigherOrderNetwork.from_temporal_network(TG)

~/.local/lib/python3.7/site-packages/pathpy/classes/network.py in from_temporal_network(cls, tempnet)
    206         by edge weights.
    207         """
--> 208         network = cls(directed=True)
    209 
    210         for (v, w, t) in tempnet.tedges:

TypeError: __init__() got an unexpected keyword argument 'directed'

MWE:

import pathpy as pp

TG = pp.TemporalNetwork()

tedges = [('A', 'B', 1),
            ('B', 'C', 2),
            ('C', 'D', 3),
            ('D', 'A', 4),
           ]
for tedge in tedges :
    TG.add_edge(*tedge)

G = pp.HigherOrderNetwork.from_temporal_network(TG)

Infomap for a single path is wrong.

_mld_paths()
Tested on a toy example that was in the lecture, gives too large number. (72 was the correct, it gave over 80).
Good algorithm is in the exercises.

Type error when creating a k regular network

when running pp.algorithms.random_graphs.random_k_regular(100,5) the following error occurs


TypeError Traceback (most recent call last)
in ()
----> 1 pp.algorithms.random_graphs.random_k_regular(100,5)

~\Documents\pathpy\pathpy\algorithms\random_graphs.py in random_k_regular(n, k, self_loops, node_names)
179 """
180 assert n*k%2 == 0, 'Error: parameters lead to non-graphic degree sequence.'
--> 181 return molloy_reed([k]*n, self_loops, node_names)
182
183

~\Documents\pathpy\pathpy\algorithms\random_graphs.py in molloy_reed(degree_sequence, node_names, self_loops, multi_edges)
128 if node_names is None:
129 node_names = [str(x) for x in range(n)]
--> 130 assert len(node_names) >= n, 'Error: Number of node names not matching degree sequence length'
131
132 network = Network(directed=False)

TypeError: object of type 'bool' has no len()

Random walks on weighted networks

Example

import pathpy as pp
import numpy as np

nwk = pp.algorithms.random_graphs.erdoes_renyi_gnm(10,20)
for edge in nwk_w1.edges:
    nwk.edges[edge]["weight"] = np.random.random()
pp.algorithms.random_walk.generate_walk(nwk)

The problem is that the probabilities are not weighted, because weights are not normalized.
The thing is that in an udirected network they cannot be normalized, for example, take graph
A-B-C
because A and C have only one link, weights have to be 1 on both AB and BC edges. But then weights around B are not normalized.

The degrees of freedom of a higher order network are not computed correctly when the data is not complete

I made a few corrections to the computation of the degrees of freedom of a higher order network, so that it works now, if the size of the data is large enough. If it isn't, the degrees of freedom will be overestimated.

The calculation goes as follows: we need to compute the number of links of a higher order network, and the number of nodes with a non-zero out-degree. The first, we infer from the first order network (sum of elements of the power of the adjacency matrix). The second we calculate by iterating through higher order nodes, and counting the ones with the non-zero out-degree.

The degrees of freedom are the difference of the two numbers:
< dof > = < number of higher-order links > - < number of higher order non-zero out-degree nodes >

Since the first order network is recovered whole with not much data, the first quantity is determined correctly. The second quantity, however, is underestimated when the data is insufficient, because the higher-order network is not complete.

This means that we overestimate the degrees of freedom of an order when we do not have enough data for the order. It finds the correct value when the data is large enough to visit all the nodes, however.

Temporal Network node names cant be integers if you visualize.

Error message:
pathpy\visualisation\html.py in fix_node_name(v)
486 def fix_node_name(v):
487 new_v = v
--> 488 if v[0].isdigit():
489 new_v = "n_" + v
490 if new_v[0] == '_':

TypeError: 'int' object is not subscriptable

Reproduction code:

test = pp.TemporalNetwork()
test.add_edge(1,0,1)
test

Add .travis CI support

From pathpy created by verginer : sg-dev/pathpy#13

Several problems in working on the codebase could be caught if before merging into the main branch the code would be tested on travis against several python versions on pristine machines.

  • investigate the cost to have a private repo on travis
  • set up for the public repo trevis tests for pull requests

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.