neo4j-graph-analytics / ml-link-prediction-notebooks
Notebooks for the ML Link Prediction Course
In [cell 32](https://github.com/neo4j-graph-analytics/ml-link-prediction-notebooks/blob/main/04_Predictions.ipynb), I get an `IndexError`:

```
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21220/3422983529.py in <module>
----> 1 training_df = apply_triangles_features(training_df, "trianglesTrain", "coefficientTrain")
2 test_df = apply_triangles_features(test_df, "trianglesTest", "coefficientTest")
~\AppData\Local\Temp/ipykernel_21220/2050145394.py in apply_triangles_features(data, triangles_prop, coefficient_prop)
17 "coefficientProp": coefficient_prop
18 }
---> 19 features = graph.run(query, params).to_data_frame()
20 return pd.merge(data, features, on = ["node1", "node2"])
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
403 :return:
404 """
--> 405 return self.auto().run(cypher, parameters, **kwparameters)
406
407 def evaluate(self, cypher, parameters=None, **kwparameters):
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
976 result = self._connector.run(self.ref, cypher, parameters)
977 else:
--> 978 result = self._connector.auto_run(cypher, parameters,
979 graph_name=self.graph.name,
980 readonly=self.readonly)
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\__init__.py in auto_run(self, cypher, parameters, pull, graph_name, readonly)
1341 if pull != 0:
1342 try:
-> 1343 cx.pull(result, n=pull)
1344 except TypeError:
1345 # If the RUN fails, so will the PULL, due to
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in pull(self, result, n, capacity)
941 result.append(response, final=(n == -1))
942 try:
--> 943 self._sync(response)
944 except BrokenWireError as error:
945 result.transaction.mark_broken()
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in _sync(self, *responses)
745 self.send()
746 for response in responses:
--> 747 self._wait(response)
748
749 def _audit(self, task):
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in _wait(self, response)
740 """
741 while not response.full() and not response.done():
--> 742 self._fetch()
743
744 def _sync(self, *responses):
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in _fetch(self)
715 failed state into an exception.
716 """
--> 717 tag, fields = self.read_message()
718 if tag == 0x70:
719 self._responses.popleft().set_success(**fields[0])
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in read_message(self)
642
643 def read_message(self):
--> 644 tag, fields = self._reader.read_message()
645 if tag == 0x71:
646 # If a RECORD is received, check for more records
D:\Anaconda3\envs\data_science\lib\site-packages\py2neo\client\bolt.py in read_message(self)
94 chunks.append(self.wire.read(size))
95 message = b"".join(chunks)
---> 96 _, n = divmod(message[0], 0x10)
97 try:
98 unpacker = UnpackStream(message, offset=2)
IndexError: index out of range
```
Component | Version |
---|---|
Neo4j | 4.3.1 |
GDS | 1.6.1 |
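
The last frame fails on `divmod(message[0], 0x10)`, and `message` is built with `b"".join(chunks)`. If the assembled message is empty (no bytes were read for it), indexing its first byte raises exactly this `IndexError`. A minimal sketch of just that step, assuming an empty chunk list; `parse_header` is a hypothetical helper that mirrors the failing line and is not part of py2neo's API:

```python
def parse_header(message: bytes) -> int:
    # Mirrors the failing line in py2neo's bolt.py: split the first byte
    # into high and low nibbles and keep the low one (for a PackStream
    # tiny-struct marker such as 0xB1, the low nibble is the field count).
    _, n = divmod(message[0], 0x10)
    return n


chunks = []                 # assumption: no bytes were read for this message
message = b"".join(chunks)  # b"" (zero-length message)

try:
    parse_header(message)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: index out of range
```

With a non-empty message such as `b"\xb1\x70"` the same line succeeds and returns `1`, so the error points at an empty message arriving from the connection rather than at the notebook's own code.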
The cell titled "Next, use full text search and Personalized PageRank to find interesting articles for different authors" results in the following error:

```
ClientError: [Procedure.ProcedureCallFailed] Failed to invoke procedure `gds.pageRank.stream`: Caused by: java.lang.IllegalArgumentException: Source nodes do not exist in the in-memory graph: ['105328', '118756', ... ]
```

I believe this happens because, in the query below, Personalized PageRank is given source nodes that are not part of the anonymous projection's node set: `sourceNodes` collects every article the author wrote or cited, whereas `nodeQuery` only returns the articles matching the full-text search.
```python
query = """
MATCH (a:Author {name: $author})<-[:AUTHOR]-(article)-[:CITED]->(other)
WITH a, collect(article) + collect(other) AS sourceNodes
CALL gds.pageRank.stream({
nodeQuery: 'CALL db.index.fulltext.queryNodes("articles", $searchTerm)
YIELD node, score
RETURN id(node) as id',
relationshipQuery: 'MATCH (a1:Article)-[:CITED]->(a2:Article)
RETURN id(a1) as source,id(a2) as target',
sourceNodes: sourceNodes,
validateRelationships:false,
parameters: {searchTerm: $searchTerm}})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
WHERE not(exists((a)<-[:AUTHOR]-(n))) AND score > 0
RETURN n.title as article, score, [(n)-[:AUTHOR]->(author) | author.name][..5] AS authors
order by score desc limit 10
"""
```
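
Conceptually, the failing check is a subset test: every source node must also be one of the nodes returned by `nodeQuery`. A minimal Python sketch of that condition, using made-up integer IDs; `check_source_nodes` is a hypothetical helper, not the actual GDS implementation:

```python
def check_source_nodes(projected_ids, source_ids):
    """Raise if any source node is missing from the projected node set."""
    projected = set(projected_ids)
    missing = [i for i in source_ids if i not in projected]
    if missing:
        raise ValueError(
            f"Source nodes do not exist in the in-memory graph: {missing}"
        )


# Hypothetical IDs: the full-text search returns articles 1-3, while the
# author's own and cited articles also include 4 and 5, which did not match.
projected_ids = [1, 2, 3]
source_ids = [2, 4, 5]
try:
    check_source_nodes(projected_ids, source_ids)
except ValueError as exc:
    print(exc)  # Source nodes do not exist in the in-memory graph: [4, 5]
```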
I was able to reproduce the results shown in the cell's original output by slightly altering the query as follows:
```python
query = """
MATCH (a:Author {name: $author})<-[:AUTHOR]-(article)-[:CITED]->(other)
WITH a, collect(article) + collect(other) AS sourceNodes
CALL db.index.fulltext.queryNodes("articles", $searchTerm)
YIELD node, score
WITH a, sourceNodes, collect(id(node)) AS ids
CALL gds.pageRank.stream({
nodeQuery: 'UNWIND $ids AS id
RETURN id',
relationshipQuery: 'MATCH (a1:Article)-[:CITED]->(a2:Article)
RETURN id(a1) as source,id(a2) as target',
sourceNodes: [article IN sourceNodes WHERE id(article) IN ids | article],
validateRelationships:false,
parameters: {ids: ids, searchTerm: $searchTerm}
})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
WHERE not(exists((a)<-[:AUTHOR]-(n))) AND score > 0
RETURN n.title as article, score, [(n)-[:AUTHOR]->(author) | author.name][..5] AS authors
order by score desc limit 10
"""
```
The query otherwise behaves the same; the only difference is that `sourceNodes` is filtered down to the nodes present in the anonymous projection before being passed to PageRank.
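
The filtering step in the fixed query, `[article IN sourceNodes WHERE id(article) IN ids | article]`, can be sketched in plain Python as follows; `filter_source_nodes` is a hypothetical helper that works on node IDs rather than on node objects:

```python
def filter_source_nodes(source_ids, projected_ids):
    """Keep only source nodes that belong to the projection, preserving order.

    Python analogue of the Cypher list comprehension
    [article IN sourceNodes WHERE id(article) IN ids | article].
    """
    projected = set(projected_ids)  # set lookup instead of Cypher's list IN
    return [i for i in source_ids if i in projected]


print(filter_source_nodes([2, 4, 5, 3], [1, 2, 3]))  # [2, 3]
```

Using a set for the membership test keeps the filter linear in the number of source nodes, and the original order of `sourceNodes` is preserved.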
I'm using Neo4j Desktop with the following versions:
Product | Version |
---|---|
Neo4j | 4.3.1 |
APOC | 4.3.0.4 |
GDS | 1.6.1 |
Thanks for the great course and I hope you find this useful!