
linkpred's People

Contributors

dependabot-preview[bot], rafguns


linkpred's Issues

how to explain the input file format

Hi, I am learning the project's code, but I don't know what the labels in the input file mean.
Some labels look like this:
1 "Pereira, JCR"
2 "Peters, HPF"
3 "Widhalm, C"
4 "Verbeek, A"
5 "Salvador, P"

UndefinedError: Measure is undefined if there are no relevant or retrieved items

I am getting the undefined error when plotting ROC Curve of an evaluation using the following code:

n = len(test)
num_universe = n * (n - 1) // 2

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set, num_universe)
plt.plot(evaluation.fallout(), evaluation.recall())
plt.show()
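This error is raised when there are no relevant (or retrieved) items at all, as the message itself says; a common cause is that the sampled test subgraph happens to contain no edges, so recall and fallout have a zero denominator. A library-free sketch of a guard for that case (the function names are illustrative, not linkpred API):

```python
def universe_size(n_nodes):
    """Number of possible undirected links among n_nodes (no self-loops)."""
    return n_nodes * (n_nodes - 1) // 2

def check_evaluation_inputs(n_nodes, test_edges):
    """Guard against the zero-relevant-items case behind UndefinedError."""
    if not test_edges:
        raise ValueError(
            "no relevant items: the sampled test subgraph has no edges; "
            "resample with more nodes or a different seed"
        )
    return universe_size(n_nodes)

# Example: 5 test nodes with 2 test edges -> 10 possible pairs
print(check_evaluation_inputs(5, [(1, 2), (3, 4)]))  # -> 10
```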

error when running linkpred from terminal

Hi Rafguns,

I get an error when trying to run linkpred from the mac terminal. The error says: object of type 'generator' has no len().

The complete error is this:


Stefans-MacBook-Air:lib stefan$ linkpred /Users/stefan/linkprediction/network1.graphml -p SimRank --output recall-precision
14:50:06 - INFO - Reading file '/Users/stefan/linkprediction/network1.graphml'...
14:50:06 - INFO - Successfully read file.
14:50:06 - INFO - Starting preprocessing...
Traceback (most recent call last):
File "/Users/stefan/anaconda3/bin/linkpred", line 64, in
main()
File "/Users/stefan/anaconda3/bin/linkpred", line 57, in main
linkpred.preprocess()
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/linkpred.py", line 169, in preprocess
self.training = preprocessed(self.training)
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/linkpred.py", line 163, in
without_selfloops(G), minimum=self.config['min_degree'])
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/preprocess.py", line 85, in without_selfloops
"Removing...".format(len(loops)))
TypeError: object of type 'generator' has no len()
Stefans-MacBook-Air:lib stefan$


Do you perhaps know what I am doing wrong here?

Kind regards,

Stefan Bloemheuvel

Simplify evaluation to plain functions

sklearn.metrics is in a way much simpler, using plain functions. Can we do something analogous or even depend on scikit-learn for stuff like ROC, recall-precision etc.?
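For comparison, the plain-function style in sklearn.metrics looks like this; y_true and y_score here are made-up toy data, where y_true marks whether a candidate pair is a real test link and y_score is the predictor's score for it:

```python
from sklearn.metrics import precision_recall_curve, roc_curve

# Toy data: each entry is one candidate link.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Plain functions, no listener/sheet machinery needed:
fpr, tpr, thresholds = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)
```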

AssertionError with self-loops from Python but not from command line

The code given in the README raises AssertionError: Predicted link (981, 981) is a self-loop!, but the terminal command with the exact same training file works fine.

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
 in 
      7 
      8 simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
----> 9 simrank_results = simrank.predict(c=0.5)
     10 
     11 evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test.edges())

~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/predictors/base.py in predict_and_postprocess(*args, **kwargs)
     65                 for u, v in self.excluded:
     66                     try:
---> 67                         del scoresheet[(u, v)]
     68                     except KeyError:
     69                         pass

~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/evaluation/scoresheet.py in __delitem__(self, key)
    193 
    194     def __delitem__(self, key):
--> 195         return dict.__delitem__(self, Pair(key))
    196 
    197     def process_data(self, data, weight='weight'):

~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/evaluation/scoresheet.py in __init__(self, *args)
    125                 "__init__() takes 1 or 2 arguments in addition to self")
    126         # For link prediction, a and b are two different nodes
--> 127         assert a != b, "Predicted link (%s, %s) is a self-loop!" % (a, b)
    128         self.elements = self._sorted_tuple((a, b))
    129 

AssertionError: Predicted link (381, 381) is a self-loop!
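Until the discrepancy is resolved, one workaround is to strip self-loops from the graph before constructing the predictor. A sketch using networkx (which linkpred builds on):

```python
import networkx as nx

def drop_self_loops(G):
    """Return a copy of G without self-loop edges, which linkpred's
    Pair construction rejects with an AssertionError."""
    H = G.copy()
    # Materialize the generator first: removing while iterating fails.
    H.remove_edges_from(list(nx.selfloop_edges(H)))
    return H

G = nx.Graph([(1, 1), (1, 2), (2, 3)])  # one self-loop on node 1
H = drop_self_loops(G)
print(H.number_of_edges())  # -> 2
```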

Allow specifying one's own file name for evaluation

At the moment, the listeners in linkpred.evaluation.listeners use fixed file names (well, they change, depending on dataset, predictor and time stamp, but it's not really possible to specify your own name). That sucks.

My original thought was that base class Listener would just accept an extra argument in its ctor, which could then be used by all descendant classes. The problematic cases, however, are CachePredictionListener and CacheEvaluationListener, since they can actually generate multiple files (e.g., one per predictor). Possibilities:

  • change them to deliver all their output to one file
  • figure out how we'd like to handle them
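One possible shape for this, sketched with hypothetical names (nothing here is in linkpred's current API): accept a filename *template* in the base class, so that listeners producing one file per predictor can still interpolate the predictor name:

```python
import time

class Listener:
    """Hypothetical base class: takes a filename template instead of
    hard-coding one, so per-predictor listeners can vary the name."""

    def __init__(self, filename=None):
        self.filename = filename  # e.g. "results-{predictor}.csv"

    def output_name(self, **fields):
        if self.filename is None:
            # Fall back to the current timestamped scheme.
            fields.setdefault("timestamp", time.strftime("%Y%m%dT%H%M%S"))
            return "{dataset}-{predictor}-{timestamp}".format(**fields)
        return self.filename.format(**fields)

listener = Listener("results-{predictor}.csv")
print(listener.output_name(predictor="SimRank"))  # -> results-SimRank.csv
```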

Typo while importing a module in Community class

The file misc.py located in linkpred/predictors/ contains a typo while importing generate_dendrogram from the community package.

Line 24 reads from community import generate_dendogram, partition_at_level but there's a missing "r" in dendogram.

Obs: I have installed linkpred using pip install linkpred

Add Scoresheet.from_file() and Scoresheet.to_file() methods

It should be possible to save Scoresheets to a CSV-like format and easily create them as well. This would also allow us to replace the code in CachePredictionListener.on_prediction_finished with

def on_prediction_finished(self, scoresheet, dataset, predictor):
    self.fname = _timestamped_filename("%s-%s-predictions" % (dataset, predictor))
    scoresheet.to_file(self.fname)

Added bonus: subclasses of Scoresheet could change the serialization; it would no longer be presupposed that scoresheet keys are tuples.
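A sketch of what the serialization might look like, written here as free functions over a plain {pair: score} dict rather than linkpred's actual Scoresheet class:

```python
import csv

def scoresheet_to_file(scores, fname, delimiter="\t"):
    """Write one (u, v, score) row per predicted pair."""
    with open(fname, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter)
        for (u, v), score in scores.items():
            writer.writerow([u, v, score])

def scoresheet_from_file(fname, delimiter="\t"):
    """Read rows back into a {(u, v): float} dict. Note that node
    labels come back as strings; subclasses could override parsing."""
    scores = {}
    with open(fname, newline="") as fh:
        for u, v, score in csv.reader(fh, delimiter=delimiter):
            scores[(u, v)] = float(score)
    return scores
```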

EvaluationSheet: fully understand the "universe" term

I don't think I understand the "universe" term that is used as a parameter in linkpred/evaluation/static/StaticEvaluation() and also in EvaluationSheet(), or how I should choose it. You stated that this parameter is needed to return the accuracy.

Also, how do I get the confusion matrix, recall, precision and accuracy?

Concerning the accuracy, do I pick the max value, like this: evaluation.accuracy().max(), or is that wrong?
Or should I do this: acc = (sum(evaluation.tp + evaluation.tn)) / (sum(evaluation.tp + evaluation.tn + evaluation.fp + evaluation.fn))? (I also imported division from __future__.)

I want to use sklearn, but what's confusing me is how to retrieve y_true and y_pred from a graph for sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None).
How do I get these data from the graph so I can use them in other machine learning algorithms such as SVM?

This is my full code:

import linkpred
import random
from matplotlib import pyplot as plt

random.seed(100)

# Read network
G = linkpred.read_network('BUP_full.net')

# Create test network
test = G.subgraph(random.sample(list(G.nodes()), 33))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
# The third argument should be the size of the universe (the number of
# possible links), not the prediction results themselves:
n = len(test)
num_universe = n * (n - 1) // 2
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test_set, num_universe)

plt.plot(evaluation.recall(), evaluation.precision())
plt.show()

Thank you
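On the sklearn question: one common approach is to treat every scored pair as one sample, with label 1 if the pair is a true link in the test set, and threshold the scores to get hard predictions. This is an illustrative, library-free sketch (the names and toy data are not linkpred API):

```python
# Toy prediction scores keyed by node pair, plus the true test links.
predictions = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "c"): 0.2}
test_links = {("a", "b"), ("b", "c")}

# y_true: 1 if the pair really is a link; y_score: the predictor's score.
y_true = [int(pair in test_links) for pair in predictions]
y_score = [predictions[pair] for pair in predictions]

# Threshold the scores to get hard labels for a confusion matrix:
threshold = 0.5
y_pred = [int(s >= threshold) for s in y_score]
print(y_true, y_pred)  # -> [1, 0, 1] [1, 0, 0]
```

These vectors can then be fed to sklearn.metrics.confusion_matrix(y_true, y_pred), or y_score can serve as a feature column for other classifiers such as SVMs.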

Clarify `excluded` parameter to `Predictor.__init__`

#12 showed that the intended use of excluded can be misunderstood. As I write there:

Note that the excluded argument to a predictor (SimRank in this case) is intended to exclude certain edges from appearing in the results, not to exclude them during training.

So I think two things need to happen:

  • rename this to be clearer
  • document what it's for
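To illustrate the intended semantics, here is a toy model of the post-processing step (not linkpred's actual code): excluded pairs are deleted from the result scoresheet *after* prediction, while the training data is untouched:

```python
def apply_excluded(scores, excluded):
    """Mimic the post-processing step: drop excluded pairs from the
    results. Pairs are unordered, so normalize both sides by sorting."""
    excluded = {tuple(sorted(p)) for p in excluded}
    return {p: s for p, s in scores.items() if tuple(sorted(p)) not in excluded}

scores = {(1, 2): 0.9, (2, 3): 0.7, (1, 3): 0.4}
print(apply_excluded(scores, excluded=[(2, 1)]))  # -> {(2, 3): 0.7, (1, 3): 0.4}
```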

Check SimRank implementation

Our current implementation is based on equations and an algorithm in Antonellis et al. (2008) (see Appendix A and Algorithm 1). While it yields fairly good results, it seems that this is not equal to SimRank as defined by Jeh and Widom (2002). Unit tests disagree with results obtained therein.
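For checking purposes, a direct (if slow) implementation of the Jeh and Widom (2002) fixed-point definition can serve as a reference. This sketch is library-free and assumes an undirected, unweighted graph given as an adjacency dict:

```python
def simrank(adj, c=0.8, iterations=10):
    """Naive SimRank per Jeh and Widom (2002): s(u, u) = 1, otherwise
    s(u, v) = c / (|N(u)| * |N(v)|) * sum over neighbor pairs of s(a, b).
    adj: {node: set of neighbors}. Returns {(u, v): similarity}."""
    nodes = list(adj)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[(u, v)] = 1.0
                elif adj[u] and adj[v]:
                    total = sum(sim[(a, b)] for a in adj[u] for b in adj[v])
                    new[(u, v)] = c * total / (len(adj[u]) * len(adj[v]))
                else:
                    new[(u, v)] = 0.0
        sim = new
    return sim

# Path graph 1-2-3: nodes 1 and 3 share their only neighbor (2),
# so they should come out highly similar.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
scores = simrank(adj)
```

Comparing this against the matrix-based implementation on small graphs would show whether the Antonellis et al. variant diverges from the original definition.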

test_katz fails in CI

On both 3.8 and 3.11. Here's the output for 3.11:

 =================================== FAILURES ===================================
  __________________________________ test_katz ___________________________________
  
      def test_katz():
          G = nx.Graph()
          G.add_weighted_edges_from(
              [(1, 2, 1), (0, 2, 5), (2, 3, 1), (0, 4, 2), (1, 4, 1), (3, 5, 1), (4, 5, 3)]
          )
      
          beta = 0.01
          I = np.identity(6)
          for weight in ("weight", None):
              katz = Katz(G).predict(beta=beta, weight=weight)
      
              nodes = list(G.nodes())
              M = nx.to_numpy_array(G, nodelist=nodes, weight=weight)
              K = np.linalg.matrix_power(I - beta * M, -1) - I
      
              x, y = np.asarray(K).nonzero()
              for i, j in zip(x, y):
                  if i == j:
                      continue
                  u, v = nodes[i], nodes[j]
  >               assert K[i, j] == pytest.approx(katz[(u, v)], abs=1e-5)
  E               assert 0.010038160831933126 == 0.010101010100000004 ± 1.0e-05
  E                 comparison failed
  E                 Obtained: 0.010038160831933126
  E                 Expected: 0.010101010100000004 ± 1.0e-05
  
  tests/test_predictors_path.py:28: AssertionError
  ----------------------------- Captured stdout call -----------------------------
  Computing matrix powers: [............................................................] 0/5
  Computing matrix powers: [############................................................] 1/5
  Computing matrix powers: [########################....................................] 2/5
  Computing matrix powers: [####################################........................] 3/5
  Computing matrix powers: [################################################............] 4/5
  Computing matrix powers: [############################################################] 5/5

Weirdly enough, this test passes on my computer. Perhaps this is due to a difference in the version of some package, like numpy?

Complete example

I would like to use your link prediction library, but I am missing a complete example, including evaluation code. For now I have the following code:

import linkpred
import random

# Read network
G = linkpred.read_network('linkpred-master/examples/inf1990-2004.net')

# Create test network
test = G.subgraph(random.sample(list(G.nodes()), 300))

# Exclude test network from learning phase
simrank = linkpred.predictors.SimRank(G, excluded=test.edges())
simrank_results = simrank.predict(c=0.5)

Could you please provide a full example, i.e., how to calculate precision, recall, the ROC curve, etc.?

linkpred is in maintenance mode

I have decided to put linkpred in maintenance mode. Here's some information on what that means as well as the reasons behind this decision.

What can you expect? I will fix critical bugs and may also fix minor bugs if feasible. Furthermore, I will also try to keep the package usable in a modern Python installation, i.e. make changes to keep it working under newer Python versions and recent versions of numpy, networkx etc. The general idea is that if you have been using linkpred before, you shouldn't be required to keep an ancient installation lying around just to be able to use it.

What not to expect? Most likely, I will not implement any new features. Neither will I make big architectural changes to how linkpred works.

Why? Several reasons:

  • De facto linkpred has been in maintenance mode for a few years at least. It's time to make this explicit, so users can make an informed decision on whether or not to use this software.
  • I haven't done any link prediction work for several years. Linkpred started as a way of 'scratching my own itch', but that doesn't apply anymore.
  • The previous point also entails that I'm not up-to-date on current evolutions in link prediction research. So even if I wanted to, say, implement newer link prediction algorithms, I wouldn't be able to without significant effort.
  • Some fairly big architectural changes are needed to get the package to a state where new feature development would really pay off (especially on the evaluation side, see #3, #10). I lack the time and energy to put in all that work for something I don't use myself.

ROC Plot

I followed the example provided in #12, using an edgelist as input and common neighbours as predictor, but the ROC plot is empty. Maybe I don't pass the correct arguments to ROCPlotter, but I couldn't find any example. My code is this:


import linkpred
import random
from matplotlib import pyplot as plt
import math

random.seed(100)

# Read network
G = linkpred.read_network('FollowGraph.edgelist')

testSize = math.ceil(len(G.nodes())*0.2)

# Create test network
test = G.subgraph(random.sample(list(G.nodes()), testSize))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

cn = linkpred.predictors.AdamicAdar(training, excluded=training.edges())
cn_results = cn.predict()

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set)

linkpred.evaluation.listeners.ROCPlotter(evaluation)

plt.show()

Any suggestion?
Thank you in advance
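For what it's worth, ROCPlotter appears to be designed as a listener reacting to events inside the LinkPred pipeline rather than something to call directly on an EvaluationSheet, which would explain the empty figure. The curve itself can also be computed from the ranked predictions; below is a library-free sketch of that calculation (the names are illustrative, not linkpred API):

```python
def roc_points(ranked_pairs, relevant, num_universe):
    """Walk down predictions by decreasing score, accumulating
    recall (TPR, y-axis) and fallout (FPR, x-axis)."""
    tp = fp = 0
    n_relevant = len(relevant)
    n_irrelevant = num_universe - n_relevant
    points = [(0.0, 0.0)]
    for pair in ranked_pairs:
        if pair in relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_irrelevant, tp / n_relevant))
    return points

ranked = [(1, 2), (1, 3), (2, 3)]          # best score first
pts = roc_points(ranked, relevant={(1, 2), (2, 3)}, num_universe=3)
print(pts)  # -> [(0.0, 0.0), (0.0, 0.5), (1.0, 0.5), (1.0, 1.0)]
```

With an EvaluationSheet, the equivalent is plotting evaluation.fallout() against evaluation.recall(), as in the EvaluationSheet example elsewhere in this tracker.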

README should have full (standalone) code example

We should give a full standalone code example in the README, including evaluation.

Spinoff from #12. The example I gave there is this (slightly reworked to take advantage of a7121f8):

import linkpred
import random
from matplotlib import pyplot as plt

random.seed(100)

# Read network
G = linkpred.read_network('examples/inf1990-2004.net')

# Create test network
test = G.subgraph(random.sample(list(G.nodes()), 300))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test.edges())

plt.plot(evaluation.recall(), evaluation.precision())

Make linkpred easier to use from within Python

The prediction part is fairly straightforward; however, the evaluation part is terribly convoluted. Much of the heavy lifting is done by the LinkPred object. I can think of two steps:

  1. Make LinkPred easier to use. For instance:
    • It should be possible to just call LinkPred() without any arguments.
    • It supposes that the training and test networks still have to be read from a file, which is unlikely if used within Python
    • Preprocessing is hard to control exactly.
  2. Simplify the whole evaluation part. See #3.

How to do link prediction on a large graph without 'MemoryError'

Hi,
I used this library, and what I really want to do is load a large graph and use SimRank for link prediction, but I get the following error:

Traceback (most recent call last):
  File "pred.py", line 12, in <module>
    results = model.predict(c=0.4)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/base.py", line 64, in predict_and_postprocess
    scoresheet = func(*args, **kwargs)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/eigenvector.py", line 88, in predict
    sim = simrank(self.G, nodelist, c, num_iterations, weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 73, in simrank
    M = raw_google_matrix(G, nodelist=nodelist, weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 87, in raw_google_matrix
    weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
    M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan
MemoryError

Could you help me?
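The traceback shows SimRank materializing a dense n-by-n matrix (np.zeros((nlen, nlen))), so memory grows quadratically with the number of nodes. Local predictors such as common neighbours only need 2-hop neighborhoods and scale much better. A library-free sketch of the idea (linkpred also ships its own CommonNeighbours predictor):

```python
def common_neighbours(adj):
    """adj: {node: set of neighbors}. Score each non-adjacent pair by
    the number of shared neighbors; memory stays proportional to the
    number of scored pairs, not n*n. Assumes orderable node labels."""
    scores = {}
    for u, neighbors in adj.items():
        for v in adj:
            if v <= u or v in neighbors:
                continue  # skip duplicate orderings and existing edges
            overlap = len(neighbors & adj[v])
            if overlap:
                scores[(u, v)] = overlap
    return scores

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(common_neighbours(adj))  # -> {(1, 4): 1, (2, 4): 1}
```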

How to use linkpred for evaluation

Hi, I am a newcomer to the field of link prediction. I want to know how to use linkpred to evaluate the relevant metrics. I found that there are a lot of evaluation functions built in, but when I call them I don't get a single value; I get a series of values. For example, precision returns a list of precisions, and I don't know what this means. I found in another issue that the sklearn package can be used for evaluation, but I don't know how to do that. Finally, is there documentation or a manual for this tool?

Enable testing with tox

Something like waf may do the trick. I'm thinking of e.g.:

# Create a tarball
git archive --format=tar.gz --prefix=linkpred/ HEAD > linkpred.tar.gz

# Create an installer
pyinstaller linkpred.spec

# Do all tests and count coverage
nosetests --with-doctest --with-coverage --cover-package=linkpred
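For tox specifically (per the issue title), a minimal configuration might look like the following sketch; the environment list and the nose-based test command are assumptions carried over from the commands above:

```ini
# Hypothetical tox.ini sketch; Python versions and nose usage are assumptions.
[tox]
envlist = py37, py38

[testenv]
deps =
    nose
    coverage
commands =
    nosetests --with-doctest --with-coverage --cover-package=linkpred
```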

Test Community predictor

We should add tests for the Community predictor. This is easier now that python-louvain is pip-installable (URL should be updated in the docstring as well).
