Comments (4)
I'm not sure I understand all of it :)
At least allShortestPaths and also dijkstra should be lazy, i.e. if you only pull X results (with LIMIT x) then the others are never pulled from the database in the first place.
If you look into the implementation it turns the results from the dijkstra-pathfinder into a (lazy) stream of results.
You are right when you don't know the end-node the path-expander is what you need but afaik it also uses an lazy iterator from the database (using the Traversal Framework) and turns that into a lazy stream.
Do you have any indication / evidence that would state the contrary?
from neo4j-apoc-procedures.
@InverseFalcon can you elaborate perhaps with a simple example.
from neo4j-apoc-procedures.
Sure. I'm still trying to see if this is actually an issue, or if I'm overlooking something that already does this, rendering my request unnecessary.
We can tweak the example movie database to set up an example.
After creating the movie database, let's change a fraction of the people in our graph to :Doctors.
MATCH (p:Person)
WHERE rand() < .1
REMOVE p:Person
SET p:Doctor
I'm after a query to show all :People in the graph paired with the closest :Doctor (by traversing any relationships), and the number of relationship hops to reach that :Doctor.
As far as I can tell, if we try using shortestPath(), we can't use LIMIT 1, as we can only use it in a WITH or a RETURN, which affects the entire result set (if we could use this within a subquery with shortestPath(), then our problem would be solved).
Because we don't have a definite end node for shortestPath, it will generate shortest paths to ALL :Doctors in the graph, and it's up to us to perform sorting based on the length of the path so we collect the doctors (sorted in relationship distance order) and grab the first in the list:
match (p:Person)
match path = shortestPath((p)-[*]-(doc:Doctor))
with p, doc, length(path) as hops
order by hops asc
return p, head(collect(doc)), hops
That's a lot of extra work, and the cost grows as the graph grows, as paths are calculated to all doctors instead of stopping after getting the single closest.
Likewise, we can't use Dijkstra, since it requires inputing start and end nodes, which would end up matching :Person start nodes to ALL :Doctor end nodes, which is pretty much a repeat of the above.
APOC's PathExpander has a neat means of indicating the label of an end-node for the expansion ('/Doctor' as the label filter, in this case), but that only prevents the expander from continuing down that particular path...it will still look down all other paths (note that changing max traversals to 10 can result in a hang), still leaving us with the responsibility of sorting the paths by length and getting the head:
match (p:Person)
call apoc.path.expand(p, null, '/Doctor', 0, 5) yield path
with p, last(nodes(path)) as end, length(path) as hops
order by hops asc
return p, head(collect(end)), hops
What I'm looking for is some means of either performing a subquery with its own limit (so we can only grab the single closest :Doctor for each :Person from a shortestPath() function with LIMIT 1...is something like this present in neo4j 3.1+?), or an APOC Procedure that will do the equivalent, stopping once we have the specified limit of results.
While subquery support would pretty much solve the issue, these could work, to a degree, in the meantime:
match (p:Person)
// new shortestPath with the limit of shortestPaths per start/end as a parameter
// ideally this should respect a predicate too
match path = shortestPath( (p)-[*]-(doc:Doctor), 1 )
return p, doc, length(path) as hops
or this
match (p:Person)
// path expander with new optional parameter for the limit of paths to find before stopping
call apoc.path.expand(p, null, '/Doctor', 0, 5, 1) yield path
return p, last(nodes(path)) as end, length(path) as hops
from neo4j-apoc-procedures.
Fulfilled by #326 , which provides a limit to path expansion results.
from neo4j-apoc-procedures.
Related Issues (20)
- Solve flaky cluster routing test
- Solve flaky mongodb test
- Query generates arbitrary nodes; termination doesn't roll them back HOT 2
- Solve MongoTest startup error
- Add support for Mistral AI for ml procedures
- Check how to integrate vector databases via rest APIs
- Add support for Groq AI for ml procedures
- apoc.convert.toTree() returns wrong result if there is a cycle from root of the path HOT 3
- apoc.convert.toTree include/exclude not working with multiple node labels
- Solve Cypher*Test and testSetupAndDropCustomsWithUseSystemClause
- Add self-explanation to the model, include the verbal schema description to the flow
- Top k parameters in just one call, to retrieve k results
- In case of the llm generates a wrong query, improve by sending the query with the error to llm so it can improve instead of just generating new query, still retry with new query if no results
- Add a reverse explanation of the query: the input is a Cypher query the output is a natural language description of it, given the graph model.
- Given (a set of) queries return the schema + explanation of the subgraph
- Add a procedure for RAG you pass the user question plus a graph pattern (paths) and relevant attributes and it creates a prompt to answer the user question using the data on those paths and executes that with the llm provider and returns the answer
- apoc.export.csv.query() config for quotes: 'ifNeeded' not working HOT 3
- The apoc.cypher.runSchemaFile() never finishes execution HOT 2
- LoadLdapContainerTest.testLoadLDAPWithDefaultPort fails on TC
- Implement an apoc.convert.toYaml function
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from neo4j-apoc-procedures.