rafguns / linkpred
Easy link prediction tool
License: Other
Hi, I am learning the project's code, but I don't understand what the labels in the input file mean. Some labels look like this:
1 "Pereira, JCR"
2 "Peters, HPF"
3 "Widhalm, C"
4 "Verbeek, A"
5 "Salvador, P"
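For what it's worth, these lines look like the vertex section of a Pajek .net file: each row pairs a numeric node id with a quoted node label (here, author names). A minimal sketch, assuming the file follows the Pajek format (the demo file and its contents are made up):

```python
import networkx as nx

# Minimal Pajek file: two ids with quoted labels, then one edge (assumed format).
pajek = '*Vertices 2\n1 "Pereira, JCR"\n2 "Peters, HPF"\n*Edges\n1 2 1\n'
with open("demo.net", "w") as f:
    f.write(pajek)

G = nx.read_pajek("demo.net")
print(sorted(G.nodes()))  # the quoted labels become the node names
```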
I am getting an "undefined" error when plotting the ROC curve of an evaluation using the following code:
import linkpred
from matplotlib import pyplot as plt

n = len(test)  # test: the held-out test network; cn_results: predictor output
num_universe = n * (n - 1) // 2
test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set, num_universe)
plt.plot(evaluation.fallout(), evaluation.recall())
plt.show()
Nose has been unmaintained for several years. It makes sense to move the test suite to pytest.
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
Hi Rafguns,
I get an error when trying to run linkpred from the mac terminal. The error says: object of type 'generator' has no len().
The complete error is this:
Stefans-MacBook-Air:lib stefan$ linkpred /Users/stefan/linkprediction/network1.graphml -p SimRank --output recall-precision
14:50:06 - INFO - Reading file '/Users/stefan/linkprediction/network1.graphml'...
14:50:06 - INFO - Successfully read file.
14:50:06 - INFO - Starting preprocessing...
Traceback (most recent call last):
File "/Users/stefan/anaconda3/bin/linkpred", line 64, in
main()
File "/Users/stefan/anaconda3/bin/linkpred", line 57, in main
linkpred.preprocess()
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/linkpred.py", line 169, in preprocess
self.training = preprocessed(self.training)
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/linkpred.py", line 163, in
without_selfloops(G), minimum=self.config['min_degree'])
File "/Users/stefan/anaconda3/lib/python3.7/site-packages/linkpred/preprocess.py", line 85, in without_selfloops
"Removing...".format(len(loops)))
TypeError: object of type 'generator' has no len()
Stefans-MacBook-Air:lib stefan$
Do you perhaps know what I am doing wrong here?
Kind regards,
Stefan Bloemheuvel
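The traceback points at len() being called on the result of networkx's self-loop lookup, which became a generator in networkx 2.x. A minimal reproduction and workaround sketch (the toy graph is made up):

```python
import networkx as nx

G = nx.Graph([(1, 1), (1, 2)])      # graph with one self-loop
loops = nx.selfloop_edges(G)        # a generator in networkx >= 2.0
# len(loops) would raise: object of type 'generator' has no len()
loops = list(loops)                 # materialize before taking len()
print(len(loops))                   # 1
```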
sklearn.metrics is in a way much simpler, using plain functions. Can we do something analogous, or even depend on scikit-learn for things like ROC, recall-precision, etc.?
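For illustration, a hedged sketch of what the scikit-learn route could look like, with made-up labels and scores (linkpred itself would have to supply y_true and y_score):

```python
from sklearn.metrics import roc_curve, precision_recall_curve

# Made-up ground truth and prediction scores for five candidate links.
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.4, 0.7, 0.2, 0.1]

fpr, tpr, _ = roc_curve(y_true, y_score)                  # ROC curve points
precision, recall, _ = precision_recall_curve(y_true, y_score)
print(list(fpr), list(tpr))
```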
The code given in the README raises AssertionError: Predicted link (981, 981) is a self-loop!, but the terminal command with the exact same training file does the job.
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in
7
8 simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
----> 9 simrank_results = simrank.predict(c=0.5)
10
11 evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test.edges())
~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/predictors/base.py in predict_and_postprocess(*args, **kwargs)
65 for u, v in self.excluded:
66 try:
---> 67 del scoresheet[(u, v)]
68 except KeyError:
69 pass
~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/evaluation/scoresheet.py in __delitem__(self, key)
193
194 def __delitem__(self, key):
--> 195 return dict.__delitem__(self, Pair(key))
196
197 def process_data(self, data, weight='weight'):
~/anaconda3/envs/social-media/lib/python3.7/site-packages/linkpred/evaluation/scoresheet.py in __init__(self, *args)
125 "__init__() takes 1 or 2 arguments in addition to self")
126 # For link prediction, a and b are two different nodes
--> 127 assert a != b, "Predicted link (%s, %s) is a self-loop!" % (a, b)
128 self.elements = self._sorted_tuple((a, b))
129
AssertionError: Predicted link (381, 381) is a self-loop!
The networkx version is too old.
At the moment, the listeners in linkpred.evaluation.listeners use fixed file names (well, they change depending on dataset, predictor, and time stamp, but it's not really possible to specify your own name). That sucks.
My original thought was that the base class Listener would just accept an extra argument in its ctor, which could then be used by all descendant classes. The problematic cases, however, are CachePredictionListener and CacheEvaluationListener, since they can actually generate multiple files (e.g., one per predictor). Possibilities:
The file misc.py located in linkpred/predictors/ contains a typo when importing generate_dendrogram from the community package. Line 24 reads from community import generate_dendogram, partition_at_level, but there's a missing "r" in dendogram.
Obs: I have installed linkpred using pip install linkpred
I want to know the meaning of the third column.
It should be possible to save Scoresheets to a CSV-like format and easily create them as well. This would also allow us to replace the code in CachePredictionListener.on_prediction_finished with:
def on_prediction_finished(self, scoresheet, dataset, predictor):
    self.fname = _timestamped_filename("%s-%s-predictions" % (dataset, predictor))
    scoresheet.to_file(self.fname)
Added bonus: subclasses of Scoresheet could change the serialization; it would no longer be presupposed that scoresheet keys are tuples.
I don't think I understand the "universe" term that is used as a parameter, or how to choose it, in linkpred/evaluation/static/StaticEvaluation() and also in EvaluationSheet(); you stated that this parameter is important for returning the accuracy.
Also, how do I get the confusion matrix, recall, precision, and accuracy?
Concerning the accuracy, do I pick the max value, like this: evaluation.accuracy().max(), or is that wrong? Or should I do this: acc = (sum(evaluation.tp + evaluation.tn)) / (sum(evaluation.tp + evaluation.tn + evaluation.fp + evaluation.fn)) (I also imported division from __future__)?
I want to use sklearn, but what's confusing me is how to retrieve y_true and y_pred from a graph for sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None).
How do I get these data from the graph to use them in other machine learning algorithms such as SVM?
This is my full code:
import linkpred
import random
from matplotlib import pyplot as plt
random.seed(100)
# Read network
G = linkpred.read_network('BUP_full.net')
# Create test network
test = G.subgraph(random.sample(G.nodes(), 33))
# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())
simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)
test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test_set, simrank_results)
plt.plot(evaluation.recall(), evaluation.precision())
Thank you
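On the sklearn question: one way to build y_true and y_score is to treat every scored node pair as one sample. The sketch below uses a plain dict of {pair: score} as a stand-in for a linkpred scoresheet; the pair names, scores, and threshold are all assumptions for illustration:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Stand-in for a scoresheet: candidate link -> predicted score (made up).
scores = {("a", "b"): 0.9, ("a", "c"): 0.3, ("b", "c"): 0.7}
test_pairs = {("a", "b"), ("b", "c")}             # links present in the test network

y_true = [1 if pair in test_pairs else 0 for pair in scores]
y_score = [scores[pair] for pair in scores]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed decision threshold

print(confusion_matrix(y_true, y_pred))           # rows: true 0/1, cols: predicted 0/1
print(roc_auc_score(y_true, y_score))             # 1.0 on this toy data
```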
#12 showed that the intended use of excluded can be misunderstood. As I write there:
Note that the excluded argument to a predictor (SimRank in this case) is intended to exclude certain edges from appearing in the results, not to exclude them during training.
So I think two things need to happen:
I think this would simply need a CommonNeighbors = CommonNeighbours somewhere.
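The alias itself is a one-liner; a hedged sketch, with a placeholder class standing in for linkpred's real predictor:

```python
# Placeholder standing in for linkpred's CommonNeighbours predictor class.
class CommonNeighbours:
    pass

# Proposed US-spelling alias: both names refer to the same class object.
CommonNeighbors = CommonNeighbours

print(CommonNeighbors is CommonNeighbours)  # True
```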
See #12 (comment)
SimRank expects that G.nodes() returns a list, whereas it actually returns a NodeView now.
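In networkx >= 2.0, G.nodes() returns a NodeView, which iterates fine but does not behave like a list under positional indexing (indexing a NodeView looks up a node's attribute dict instead). A minimal sketch of the obvious fix:

```python
import networkx as nx

G = nx.path_graph(3)
view = G.nodes()          # NodeView in networkx >= 2.0, not a list
nodelist = list(view)     # materialize so nodelist[i] works positionally
print(nodelist)           # [0, 1, 2]
```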
Our current implementation is based on equations and an algorithm in Antonellis et al. (2008) (see Appendix A and Algorithm 1). While it yields fairly good results, it seems that this is not equal to SimRank as defined by Jeh and Widom (2002). Unit tests disagree with results obtained therein.
On both 3.8 and 3.11. Here's the output for 3.11:
=================================== FAILURES ===================================
__________________________________ test_katz ___________________________________
    def test_katz():
        G = nx.Graph()
        G.add_weighted_edges_from(
            [(1, 2, 1), (0, 2, 5), (2, 3, 1), (0, 4, 2), (1, 4, 1), (3, 5, 1), (4, 5, 3)]
        )
        beta = 0.01
        I = np.identity(6)
        for weight in ("weight", None):
            katz = Katz(G).predict(beta=beta, weight=weight)
            nodes = list(G.nodes())
            M = nx.to_numpy_array(G, nodelist=nodes, weight=weight)
            K = np.linalg.matrix_power(I - beta * M, -1) - I
            x, y = np.asarray(K).nonzero()
            for i, j in zip(x, y):
                if i == j:
                    continue
                u, v = nodes[i], nodes[j]
>               assert K[i, j] == pytest.approx(katz[(u, v)], abs=1e-5)
E               assert 0.010038160831933126 == 0.010101010100000004 ± 1.0e-05
E               comparison failed
E               Obtained: 0.010038160831933126
E               Expected: 0.010101010100000004 ± 1.0e-05
tests/test_predictors_path.py:28: AssertionError
----------------------------- Captured stdout call -----------------------------
Computing matrix powers: [............................................................] 0/5
Computing matrix powers: [############................................................] 1/5
Computing matrix powers: [########################....................................] 2/5
Computing matrix powers: [####################################........................] 3/5
Computing matrix powers: [################################################............] 4/5
Computing matrix powers: [############################################################] 5/5
Weirdly enough, this test passes on my computer. Perhaps this is due to a difference in the version of some package, like numpy?
I would like to use your link prediction library, but I am missing a complete example, including evaluation code. For now I have the following code:
import linkpred
import random
# Read network
G = linkpred.read_network('linkpred-master/examples/inf1990-2004.net')
# Create test network
test = G.subgraph(random.sample(G.nodes(), 300))
# Exclude test network from learning phase
simrank = linkpred.predictors.SimRank(G, excluded=test.edges())
simrank_results = simrank.predict(c=0.5)
Could you please provide a full example, i.e., how to calculate precision, recall, the ROC curve, etc.?
I have decided to put linkpred in maintenance mode. Here's some information on what that means as well as the reasons behind this decision.
What can you expect? I will fix critical bugs and may also fix minor bugs if feasible. Furthermore, I will also try to keep the package usable in a modern Python installation, i.e. make changes to keep it working under newer Python versions and recent versions of numpy, networkx etc. The general idea is that if you have been using linkpred before, you shouldn't be required to keep an ancient installation lying around just to be able to use it.
What not to expect? Most likely, I will not implement any new features. Neither will I make big architectural changes to how linkpred works.
Why? Several reasons:
I followed the example provided in #12, using an edgelist as input and common neighbours as predictor, but the ROC plot is empty. Maybe I don't pass the correct arguments to ROCPlotter, but I couldn't find any example. My code is this:
import linkpred
import random
from matplotlib import pyplot as plt
import math
random.seed(100)
# Read network
G = linkpred.read_network('FollowGraph.edgelist')
testSize = math.ceil(len(G.nodes())*0.2)
# Create test network
test = G.subgraph(random.sample(G.nodes(), testSize))
# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())
cn = linkpred.predictors.AdamicAdar(training, excluded=training.edges())
cn_results = cn.predict()
test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set)
linkpred.evaluation.listeners.ROCPlotter(evaluation)
plt.show()
Any suggestion?
Thank you in advance
We should give a full standalone code example in the README, including evaluation.
Spinoff from #12. The example I gave there is this (slightly reworked to take advantage of a7121f8):
import linkpred
import random
from matplotlib import pyplot as plt
random.seed(100)
# Read network
G = linkpred.read_network('examples/inf1990-2004.net')
# Create test network
test = G.subgraph(random.sample(G.nodes(), 300))
# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())
simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test.edges())
plt.plot(evaluation.recall(), evaluation.precision())
Dice is a monotone transformation of Jaccard, so the two produce identical rankings, but it might be nice to have both as a convenience.
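To make the relationship concrete: for two sets, Dice D = 2J / (1 + J), where J is the Jaccard coefficient, so one is a monotone function of the other. A small self-contained check with made-up sets:

```python
def jaccard(a, b):
    # |intersection| / |union|
    return len(a & b) / len(a | b)

def dice(a, b):
    # 2 * |intersection| / (|a| + |b|)
    return 2 * len(a & b) / (len(a) + len(b))

a, b = {1, 2, 3}, {2, 3, 4}
j, d = jaccard(a, b), dice(a, b)
print(j, d)                          # 0.5 and 2/3
assert abs(d - 2 * j / (1 + j)) < 1e-12
```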
The prediction part is fairly straightforward; however, the evaluation part is terribly convoluted. Much of the heavy lifting is done by the LinkPred object. I can think of two steps: make LinkPred easier to use, for instance by allowing LinkPred() without any arguments.
Hi,
I used this library, and what I really want to do is load a large graph and use SimRank for link prediction, but I get the following error:
Traceback (most recent call last):
File "pred.py", line 12, in <module>
results = model.predict(c=0.4)
File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/base.py", line 64, in predict_and_postprocess
scoresheet = func(*args, **kwargs)
File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/eigenvector.py", line 88, in predict
sim = simrank(self.G, nodelist, c, num_iterations, weight)
File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 73, in simrank
M = raw_google_matrix(G, nodelist=nodelist, weight=weight)
File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 87, in raw_google_matrix
weight=weight)
File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan
MemoryError
Could you help me?
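The traceback shows the failure inside raw_google_matrix at np.zeros((nlen, nlen)), i.e. allocating a dense n x n matrix, so memory grows quadratically with the number of nodes. A back-of-the-envelope estimate (the node count is illustrative):

```python
# Dense n x n float64 matrix: memory grows quadratically with node count.
n = 100_000                          # illustrative node count
bytes_needed = n * n * 8             # 8 bytes per float64 entry
print(bytes_needed / 1e9, "GB")      # 80.0 GB
```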
I am trying to use it, but when I call it, it shows the following error:
/usr/bin/env: ‘python\r’: No such file or directory
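The stray '\r' in ‘python\r’ usually means the installed script's shebang line has Windows (CRLF) line endings. A hedged sketch of stripping them with sed, demonstrated on a throwaway file (GNU sed shown; on macOS use sed -i '' instead):

```shell
# Create a demo script with CRLF line endings, then strip the carriage returns.
printf '#!/usr/bin/env python\r\nprint("hi")\r\n' > demo_crlf.py
sed -i 's/\r$//' demo_crlf.py
head -1 demo_crlf.py   # shebang now ends cleanly, without \r
```

The same substitution applied to the real linkpred entry-point script should fix the error.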
Hi, I am a newcomer to the field of link prediction. I want to know how to use the tool linkpred to evaluate related indicators. I found that there are a lot of built-in evaluation functions, but when I call them I don't get a single value; rather, I get a series of values. For example, precision returns a list of precisions. I don't know what this means. I found in another question that you can use the sklearn package for evaluation, but I don't know how to do it. Finally, does this tool have related documentation or a manual?
def _sorted_tuple(t):
    a, b = t
    return (a, b) if a > b else (b, a)
TypeError: '>' not supported between instances of 'str' and 'int'
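That TypeError indicates the graph mixes node types (e.g. int and str), which Python 3 refuses to order. A hedged workaround, not an official fix: relabel every node to a string before running prediction.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, "a")                   # mixed node types trigger the error
H = nx.relabel_nodes(G, {n: str(n) for n in G})
print(sorted(H.nodes()))             # now all nodes are comparable strings
```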
It probably doesn't work in Python 3 and is untested. Smokesignal (https://github.com/shaunduncan/smokesignal) looks like it's a modern, maintained and simple signaling package.
Depending on smokesignal might even help with issue #3?
It seems that the current default can lead to erroneous results, like negative weights.
Something like waf may do the trick. I'm thinking of e.g.:
# Create a tarball
git archive --format=tar.gz --prefix=linkpred/ HEAD > linkpred.tar.gz
# Create an installer
pyinstaller linkpred.spec
# Do all tests and count coverage
nosetests --with-doctest --with-coverage --cover-package=linkpred
linkpred/linkpred/predictors/neighbour.py, line 28 in ae7adc8
In AdamicAdar I got ZeroDivisionError: float division by zero.
Any idea on how to fix this? Thank you.
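Adamic-Adar sums 1 / log(degree(z)) over the common neighbours z of a candidate pair, so if a neighbour's (possibly weighted) degree works out to exactly 1, log(1) == 0 and the division fails. A minimal illustration (the skip-zero guard is an assumption, not linkpred's own fix):

```python
import math

# 1 / log(degree) blows up when a neighbour's degree is exactly 1.
degrees = [1, 3, 5]
safe = [1 / math.log(d) for d in degrees if math.log(d) != 0]
print(safe)  # contributions for degrees 3 and 5 only; degree 1 is skipped
```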
We should add tests for the Community predictor. This is easier now that python-louvain is pip-installable (the URL should be updated in the docstring as well).