Queries 7 and 8 return inconsistent results between Neo4j and Kùzu (kuzudb-study, closed)

prrao87 commented on September 24, 2024

Queries 7 and 8 return inconsistent results between Neo4j and Kùzu


acquamarin commented on September 24, 2024

Hi Prashanth,

To confirm the result, I reran Q7 and Q8 using both Kùzu and DuckDB. Kùzu and DuckDB generate the same results for both queries.
For Q7:
Kùzu actually returns 170 rather than 169 (the same result as DuckDB). Can you rerun your testing script?
Query 7:

        MATCH (p:Person)-[:LivesIn]->(:City)-[:CityIn]->(s:State)
        WHERE p.age >= $age_lower AND p.age <= $age_upper AND s.country = $country
        WITH p, s
        MATCH (p)-[:HasInterest]->(i:Interest)
        WHERE lower(i.interest) = lower($interest)
        RETURN count(p.id) AS numPersons, s.state AS state, s.country AS country
        ORDER BY numPersons DESC LIMIT 1

State in United States with the most users between ages 23-30 who have an interest in photography:
shape: (1, 3)
┌────────────┬────────────┬───────────────┐
│ numPersons ┆ state      ┆ country       │
│ ---        ┆ ---        ┆ ---           │
│ i64        ┆ str        ┆ str           │
╞════════════╪════════════╪═══════════════╡
│ 170        ┆ California ┆ United States │
└────────────┴────────────┴───────────────┘

For Q8:
Both Kùzu and DuckDB return 1214477.


acquamarin commented on September 24, 2024

BTW, here are the DDL and COPY statements I used to load the data into DuckDB:

create table person(id int64, name varchar, gender varchar, birthday date, age int64, isMarried bool, primary key(id));
create table city (id int64, city varchar, state varchar, country varchar, lat float, lon float, population int, primary key(id));
create table state(id int64, state varchar, country varchar, primary key(id));
create table interest(id int64, interest varchar, primary key(id));
create table livesin(id1 int64, id2 int64);
create table cityin(id1 int64, id2 int64);
create table hasinterest(id1 int64, id2 int64);
create table follows(id1 int64, id2 int64);

copy person from '${outputPath}/nodes/persons.csv' (AUTO_DETECT TRUE);
copy city from '${outputPath}/nodes/cities.csv' (AUTO_DETECT TRUE);
copy state from '${outputPath}/nodes/states.csv' (AUTO_DETECT TRUE);
copy interest from '${outputPath}/nodes/interests.csv' (AUTO_DETECT TRUE);
copy livesin from '${outputPath}/edges/lives_in.csv' (AUTO_DETECT TRUE);
copy cityin from '${outputPath}/edges/city_in.csv' (AUTO_DETECT TRUE);
copy hasinterest from '${outputPath}/edges/interests.csv' (AUTO_DETECT TRUE);
copy follows from '${outputPath}/edges/follows.csv' (AUTO_DETECT TRUE);
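For anyone reproducing this without DuckDB, here is a rough Python equivalent of the load step using the standard-library `sqlite3` and `csv` modules. It is only a sketch: `copy_csv` is a hypothetical helper, and it assumes each CSV has a comma-separated header row whose names match the table's columns (DuckDB's `AUTO_DETECT` sniffs this for you instead). It also shows that a COPY-style load appends rather than replaces, so loading the same file twice into tables that were not recreated duplicates every edge.

```python
import csv
import io
import sqlite3


def copy_csv(conn, table, fileobj, delimiter=","):
    """Hypothetical stand-in for COPY ... FROM: append a headered CSV to `table`.

    Assumes the first row holds column names matching the table's columns.
    Returns the number of data rows inserted.
    """
    reader = csv.reader(fileobj, delimiter=delimiter)
    header = next(reader)  # assumed header row
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# SQLite accepts INT64 as a type name and gives it INTEGER affinity,
# so the CSV strings "1", "2", ... are stored as integers.
conn.execute("CREATE TABLE follows(id1 INT64, id2 INT64)")

# Hypothetical stand-in for edges/follows.csv.
follows_csv = "id1,id2\n1,2\n2,1\n3,1\n"

copy_csv(conn, "follows", io.StringIO(follows_csv))
print(conn.execute("SELECT COUNT(*) FROM follows").fetchone()[0])  # 3

# Loading the same file again appends, doubling the edge count.
copy_csv(conn, "follows", io.StringIO(follows_csv))
print(conn.execute("SELECT COUNT(*) FROM follows").fetchone()[0])  # 6
```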

Q7 in SQL:

SELECT COUNT(p.id) AS numPersons, s.state AS state, s.country AS country
FROM person p
JOIN livesin pl ON p.id = pl.id1
JOIN city c ON pl.id2 = c.id
JOIN cityin ci ON c.id = ci.id1
JOIN state s ON ci.id2 = s.id
JOIN hasinterest hi ON p.id = hi.id1
JOIN interest i ON hi.id2 = i.id
WHERE p.age >= 23 AND p.age <= 30 AND s.country = 'United States'
    AND LOWER(i.interest) = LOWER('photography')
GROUP BY s.state, s.country
ORDER BY numPersons DESC
LIMIT 1;
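To make the Cypher-to-SQL mapping concrete, here is a minimal sketch that runs the same Q7 SQL against an in-memory `sqlite3` database (assumed here as a stand-in for DuckDB so it runs without extra installs) on a few hypothetical rows. The hard-coded filters become `?` parameters mirroring `$age_lower`, `$age_upper`, `$country` and `$interest`. Each `MATCH` hop turns into a join through an edge table, and the non-aggregated items in the Cypher `RETURN` become the explicit `GROUP BY`.

```python
import sqlite3

# Column lists trimmed to what Q7 touches; all rows below are hypothetical.
SCHEMA = """
CREATE TABLE person(id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE city(id INTEGER PRIMARY KEY);
CREATE TABLE state(id INTEGER PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE interest(id INTEGER PRIMARY KEY, interest TEXT);
CREATE TABLE livesin(id1 INTEGER, id2 INTEGER);
CREATE TABLE cityin(id1 INTEGER, id2 INTEGER);
CREATE TABLE hasinterest(id1 INTEGER, id2 INTEGER);
"""

Q7 = """
SELECT COUNT(p.id) AS numPersons, s.state AS state, s.country AS country
FROM person p
JOIN livesin pl ON p.id = pl.id1      -- (p)-[:LivesIn]->(c)
JOIN city c ON pl.id2 = c.id
JOIN cityin ci ON c.id = ci.id1       -- (c)-[:CityIn]->(s)
JOIN state s ON ci.id2 = s.id
JOIN hasinterest hi ON p.id = hi.id1  -- (p)-[:HasInterest]->(i)
JOIN interest i ON hi.id2 = i.id
WHERE p.age >= ? AND p.age <= ? AND s.country = ?
    AND LOWER(i.interest) = LOWER(?)
GROUP BY s.state, s.country
ORDER BY numPersons DESC
LIMIT 1
"""


def run_q7(conn, age_lower, age_upper, country, interest):
    """Return the (numPersons, state, country) row for the top state."""
    return conn.execute(Q7, (age_lower, age_upper, country, interest)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO state VALUES (?, ?, ?)", [
    (1, "California", "United States"),
    (2, "Texas", "United States"),
    (3, "Ontario", "Canada"),
])
conn.executemany("INSERT INTO city VALUES (?)", [(10,), (20,), (30,)])
conn.executemany("INSERT INTO cityin VALUES (?, ?)", [(10, 1), (20, 2), (30, 3)])
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, 25), (2, 28), (3, 40), (4, 24), (5, 26)])
conn.executemany("INSERT INTO livesin VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10), (4, 20), (5, 30)])
conn.executemany("INSERT INTO interest VALUES (?, ?)",
                 [(100, "Photography"), (101, "Hiking")])
conn.executemany("INSERT INTO hasinterest VALUES (?, ?)",
                 [(1, 100), (2, 100), (3, 100), (4, 100), (4, 101), (5, 100)])

print(run_q7(conn, 23, 30, "United States", "photography"))
# (2, 'California', 'United States')
```

With ages 23-30, California wins 2 to 1 over Texas: the 40-year-old and the Ontario resident are filtered out, and person 4's "Hiking" row is dropped by the interest filter, so it does not inflate Texas's count.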

Q8 in SQL:

SELECT COUNT(f.id1) AS numFollowers
FROM person p1
JOIN follows f ON p1.id = f.id1
JOIN person p2 ON f.id2 = p2.id
WHERE p1.id > p2.id;
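Q8 only filters and counts `Follows` edges, so on a toy graph it can be cross-checked against a plain Python count. In this sketch (hypothetical data, with `sqlite3` again standing in for DuckDB), the two joins keep an edge only when both endpoints exist in `person`; the Python version mirrors that explicitly, so a dangling edge is ignored by both.

```python
import sqlite3

Q8 = """
SELECT COUNT(f.id1) AS numFollowers
FROM person p1
JOIN follows f ON p1.id = f.id1
JOIN person p2 ON f.id2 = p2.id
WHERE p1.id > p2.id
"""

# Hypothetical toy graph: 5 persons and a handful of directed Follows edges.
PERSONS = [1, 2, 3, 4, 5]
FOLLOWS = [(1, 2), (2, 1), (3, 1), (1, 3), (4, 2), (2, 5)]


def q8_sql(persons, follows):
    """Load the toy graph into an in-memory DB and run the Q8 SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person(id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE follows(id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?)", [(p,) for p in persons])
    conn.executemany("INSERT INTO follows VALUES (?, ?)", follows)
    (count,) = conn.execute(Q8).fetchone()
    conn.close()
    return count


def q8_python(persons, follows):
    """Same semantics as the joins: both endpoints must exist in person."""
    known = set(persons)
    return sum(1 for src, dst in follows
               if src in known and dst in known and src > dst)


print(q8_sql(PERSONS, FOLLOWS), q8_python(PERSONS, FOLLOWS))  # 3 3
```

Only (2, 1), (3, 1) and (4, 2) have a source id greater than the target id, so both counts are 3.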


prrao87 commented on September 24, 2024

Hi @acquamarin, it looks like both of my databases were in an inconsistent state, which is why the counts didn't match up. I cleared both DBs entirely, regenerated the data, reran the queries, and it all adds up! Very relieved that this isn't a bug. 😅

Thanks a LOT for doing this check in DuckDB as well -- this helped me learn how the graph actually uses relational algebra under the hood, and it also helps to have a reputable SQL DB show the same results as the graph. It will help in a lot of future work, too!

I'm confident that this isn't a bug, so we can close this issue and move on to other things. Cheers!

Query 7:
 
        MATCH (p:Person)-[:LivesIn]->(:City)-[:CityIn]->(s:State)
        WHERE p.age >= $age_lower AND p.age <= $age_upper AND s.country = $country
        WITH p, s
        MATCH (p)-[:HasInterest]->(i:Interest)
        WHERE lower(i.interest) = lower($interest)
        RETURN count(p.id) AS numPersons, s.state AS state, s.country AS country
        ORDER BY numPersons DESC LIMIT 1

State in United States with the most users between ages 23-30 who have an interest in photography:
shape: (1, 3)
┌────────────┬────────────┬───────────────┐
│ numPersons ┆ state      ┆ country       │
│ ---        ┆ ---        ┆ ---           │
│ i64        ┆ str        ┆ str           │
╞════════════╪════════════╪═══════════════╡
│ 170        ┆ California ┆ United States │
└────────────┴────────────┴───────────────┘
            
Query 7 completed in 0.012754s

Query 8:
 
        MATCH (p1:Person)-[f:Follows]->(p2:Person)
        WHERE p1.id > p2.id
        RETURN count(f) as numFollowers
    
Number of second degree connections reachable in the graph:
shape: (1, 1)
┌──────────────┐
│ numFollowers │
│ ---          │
│ i64          │
╞══════════════╡
│ 1214477      │
└──────────────┘
Query 8 completed in 0.103467s
