Comments (8)

mwatts15 commented on May 21, 2024

@Downchuck We're already bundling params per type of command sent to execute in addN. What is your proposed change?

from rdflib-sqlalchemy.

Downchuck commented on May 21, 2024

We may have been using an old version; I'll circle back after debugging.

With echo logging turned on, I saw separate INSERT statements for my addN / sql_graph += memory_graph call.

Downchuck commented on May 21, 2024

Something like this would be significantly faster, using placeholders and executemany: placeholders are about 3x faster than a dictionary in SQLAlchemy, and executemany is significantly faster for inserts on every SQL engine that supports it.

                statement = self._add_ignore_on_conflict(command['statement'], with_placeholders)
                connection.execute(statement, command['rows'])

Instead of

                for command in commands_dict.values():
                    statement = self._add_ignore_on_conflict(command['statement'])
                    connection.execute(statement, command["params"])

I load the data through the bulk method, addN, because parsing directly into the store calls add, which inserts one triple at a time:

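from rdflib import ConjunctiveGraph

# Parse into an in-memory graph first, then add it to the SQL-backed graph in bulk.
test_graph = ConjunctiveGraph()
test_graph.parse(data=test_data, format='nt')
# Bulk insert seems OK here (%time is an IPython magic):
%time sql_graph = sql_graph + test_graph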

mwatts15 commented on May 21, 2024

At version 0.4.0, currently on master, in

                for command in commands_dict.values():
                    statement = self._add_ignore_on_conflict(command['statement'])
                    connection.execute(statement, command["params"])

command["params"] is a list of dicts holding the parameters for statement, which is a sqlalchemy.sql.dml.Insert such as INSERT INTO kb_bec6803d52_type_statements (id, member, klass, context, termcomb) VALUES (:id, :member, :klass, :context, :termComb). Passing a list of dicts to execute should already result in an executemany call. One way to confirm this is to call addN on an rdflib-sqlalchemy store backed by sqlite3 with cProfile enabled. You should see do_executemany called in sqlalchemy and executemany called on the sqlite3 cursor.
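The cProfile check described above can be sketched with the standard library alone. This is only an illustration at the sqlite3 driver level, not the rdflib-sqlalchemy code path: the table name, the rows, and the bulk_insert helper are made up for the example, and a profile of a real addN call would additionally show SQLAlchemy's own frames.

```python
import cProfile
import pstats
import sqlite3

# Hypothetical table and rows, loosely modeled on the statement quoted above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE type_statements "
    "(id INTEGER PRIMARY KEY, member TEXT, klass TEXT, context TEXT, termcomb INTEGER)"
)
rows = [
    {"id": i, "member": f"m{i}", "klass": "k", "context": "c", "termComb": 0}
    for i in range(1000)
]


def bulk_insert(connection, params):
    """Insert every row through a single cursor.executemany call."""
    cursor = connection.cursor()
    cursor.executemany(
        "INSERT OR IGNORE INTO type_statements "
        "(id, member, klass, context, termcomb) "
        "VALUES (:id, :member, :klass, :context, :termComb)",
        params,
    )
    connection.commit()


# Profile only the insert, then look for executemany among the profiled calls.
profiler = cProfile.Profile()
profiler.runcall(bulk_insert, conn, rows)
stats = pstats.Stats(profiler)

# Profile keys are (filename, line, function name) tuples; C-level methods
# such as Cursor.executemany appear with a descriptive function name.
profiled_names = [func_name for (_file, _line, func_name) in stats.stats]
executemany_seen = any("executemany" in name for name in profiled_names)
inserted = conn.execute("SELECT COUNT(*) FROM type_statements").fetchone()[0]
print(executemany_seen, inserted)
```

Calling stats.sort_stats("cumulative").print_stats(10) prints the usual profile table if you prefer to read it by eye.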

Downchuck commented on May 21, 2024

Ah, I either misread my debug output ("echo" works just fine for SQLite3 too) or I did a bad job of calling addN. Closing issue.

I did notice benchmarks that show placeholder styles could be a bit faster for large data, but that's not the purpose of this report. Closing.
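A rough way to check the placeholder-style claim on your own data is to time both styles directly. The sketch below uses only the standard-library sqlite3 driver, so it measures driver-level binding and says nothing about SQLAlchemy's parameter handling. The table, row count, and function names are assumptions made for the example, and the timings will vary by machine and driver.

```python
import sqlite3
import timeit

N = 20_000  # arbitrary row count for the comparison
tuple_rows = [(i, str(i)) for i in range(N)]
dict_rows = [{"a": i, "b": str(i)} for i in range(N)]


def make_conn():
    """Create a fresh in-memory database with a small two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    return conn


def insert_positional():
    """Bulk insert using positional (qmark) placeholders and tuples."""
    conn = make_conn()
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", tuple_rows)
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count


def insert_named():
    """Bulk insert using named placeholders and dicts."""
    conn = make_conn()
    conn.executemany("INSERT INTO t (a, b) VALUES (:a, :b)", dict_rows)
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count


# Take the best of a few runs to reduce noise.
positional_s = min(timeit.repeat(insert_positional, number=1, repeat=3))
named_s = min(timeit.repeat(insert_named, number=1, repeat=3))
print(f"positional: {positional_s:.3f}s  named: {named_s:.3f}s")
```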

Downchuck commented on May 21, 2024

While this is open: I ran a view/materialization query. That one, using a literal, did use placeholders, but it also wound up inserting one record at a time.

Is there something just off with this count query?

    """PREFIX tg: <http://www.context.com>
    INSERT {
        GRAPH <tg:po-cnt> {?p ?o ?ct}
    }
    WHERE {
        SELECT ?p ?o (count(*) as ?ct)
        {?s ?p ?o}
        GROUP BY ?p ?o
    }""")

mwatts15 commented on May 21, 2024

For that, it depends on the combinations of predicates and objects in your source graph -- the INSERT template is evaluated once for every unique combination of ?p and ?o, and each evaluation happens to translate into one call to addN.
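To make the cost concrete, here is a small stand-in in plain Python that contrasts one addN call per solution with a single batched call. Nothing in it is rdflib or rdflib-sqlalchemy code: CountingStore is a hypothetical class that only counts calls, plain tuples stand in for RDF terms, and the Counter plays the role of the GROUP BY aggregate. With rdflib itself, the same idea would presumably mean running the SELECT yourself and passing all resulting quads to addN once.

```python
from collections import Counter


class CountingStore:
    """Hypothetical stand-in for a store; it only records how addN is called."""

    def __init__(self):
        self.quads = []
        self.addn_calls = 0

    def addN(self, quads):
        self.addn_calls += 1
        self.quads.extend(quads)


# Plain tuples stand in for RDF terms in this sketch.
triples = [
    ("s1", "p1", "o1"),
    ("s2", "p1", "o1"),
    ("s3", "p2", "o2"),
]
context = "tg:po-cnt"

# Rough equivalent of SELECT ?p ?o (count(*) as ?ct) ... GROUP BY ?p ?o
counts = Counter((p, o) for _s, p, o in triples)

# One addN call per solution, as described above.
per_solution = CountingStore()
for (p, o), ct in counts.items():
    per_solution.addN([(p, o, ct, context)])

# One addN call carrying every solution.
batched = CountingStore()
batched.addN([(p, o, ct, context) for (p, o), ct in counts.items()])

print(per_solution.addn_calls, batched.addn_calls)
```

Both stores end up with the same quads; only the number of addN calls, and therefore the number of round trips to the database in a real store, differs.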

Please check out RDFLib support resources for further questions.

Downchuck commented on May 21, 2024

Thanks @mwatts15 -- that's the issue I was wondering about. It's unfortunate that each call results in one addN, instead of a bulk insertion, given a trivial query.

That said, this does seem like a potential issue upstream -- so I will work further on understanding those abstractions to see if these trivial aggregates can be improved.
