rdflib / rdflib-sqlalchemy

RDFLib store using SQLAlchemy dbapi as back-end

License: Other

Python 99.09% Shell 0.91%

rdflib-sqlalchemy's Introduction

RDFLib-SQLAlchemy

A SQLAlchemy-backed, formula-aware RDFLib Store. It stores its triples in the following partitions:

Asserted non rdf:type statements.
Asserted rdf:type statements (in a table which models Class membership). The motivation for this partition is primarily query speed and scalability, since in most graphs rdf:type statements outnumber statements with any other predicate.
All Quoted statements.

In addition, it persists namespace mappings in a separate table. Table names are prefixed kb_{identifier_hash}, where identifier_hash is the first ten characters of the SHA1 hash of the given identifier.
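
As a rough illustration of the naming scheme described above, the prefix could be derived as follows. This is a sketch based only on the description here: the helper name `table_prefix` and the UTF-8 encoding of the identifier are assumptions, not the store's actual code.

```python
import hashlib


def table_prefix(identifier: str) -> str:
    """Return 'kb_' plus the first ten hex characters of the SHA1 of identifier."""
    # Assumption: the identifier is hashed as UTF-8 bytes.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return "kb_{}".format(digest[:10])


prefix = table_prefix("rdflib_test")
# A table name is then the prefix plus a suffix such as "_namespace_binds".
print(prefix + "_namespace_binds")
```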

Back-end persistence

Back-end persistence is provided by SQLAlchemy.

Tested dialects are:

SQLite, using the built-in Python driver
MySQL, using the MySQLdb-python driver or, for Python 3, mysql-connector
PostgreSQL, using the psycopg2 driver or the pg8000 driver.

pysqlite: https://pypi.python.org/pypi/pysqlite

MySQLdb-python: https://pypi.python.org/pypi/MySQL-python

mysql-connector: http://dev.mysql.com/doc/connector-python/en/connector-python.html

psycopg2: https://pypi.python.org/pypi/psycopg2

pg8000: https://pypi.python.org/pypi/pg8000

Development

Note: Currently, rdflib-sqlalchemy is in maintenance mode. That means the current maintainer (@mwatts15) will do what he can to keep the package working for existing use-cases, but new features will not be added and newer versions of SQLAlchemy will not be supported. If you have an interest in further development of rdflib-sqlalchemy, please get in touch with @mwatts15 or core RDFLib project developers.

Github repository: https://github.com/RDFLib/rdflib-sqlalchemy

Continuous integration: https://travis-ci.org/RDFLib/rdflib-sqlalchemy/

An illustrative unit test:

import unittest
from rdflib import plugin, Graph, Literal, URIRef
from rdflib.store import Store


class SQLASQLiteGraphTestCase(unittest.TestCase):
    ident = URIRef("rdflib_test")
    uri = Literal("sqlite://")

    def setUp(self):
        self.graph = Graph("SQLAlchemy", identifier=self.ident)
        self.graph.open(self.uri, create=True)

    def tearDown(self):
        self.graph.destroy(self.uri)
        try:
            self.graph.close()
        except Exception:
            pass

    def test01(self):
        self.assertIsNotNone(self.graph)
        print(self.graph)

if __name__ == '__main__':
    unittest.main()

Running the tests

pytest is supported as a test runner, typically called via tox. Select the SQL back-end by setting a DB environment variable. Select the database connection by setting the DBURI variable. With tox, you can also specify the Python version.

Using pytest::

DB='pgsql' DBURI='postgresql+psycopg2://user:password@host/dbname' pytest

Using tox::

DB='pgsql' DBURI='postgresql+psycopg2://user:password@host/dbname' tox -e py310

DB variants are 'pgsql', 'mysql' and 'sqlite'. Except in the case of SQLite, you'll need to create the database independently, before execution of the test.

Sample DBURI values::

import os
from rdflib import Literal

dburi = Literal("mysql://username:password@hostname:port/database-name?other-parameter")
dburi = Literal("mysql+mysqldb://user:password@hostname:port/database?charset=utf8")
dburi = Literal('postgresql+psycopg2://user:password@hostname:port/database')
dburi = Literal('postgresql+pg8000://user:password@hostname:port/database')
dburi = Literal('sqlite:////absolute/path/to/foo.db')
dburi = Literal("sqlite:///%(here)s/development.sqlite" % {"here": os.getcwd()})
dburi = Literal('sqlite://') # In-memory

rdflib-sqlalchemy's Issues

SQLAlchemy.open should raise exception if create=False and store doesn't exist

According to the docs in rdflib:

https://github.com/RDFLib/rdflib/blob/fbf95a8ea506f95394c36a09241448e6900fbd77/rdflib/store.py#L171

However, open in the SQLAlchemy store returns None if create=False is passed in.
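
Until the store itself raises, a caller can guard against this on its own side. The following is only a sketch: `StoreNotFound`, `open_existing` and `FakeGraph` are hypothetical names, and the constant is redefined locally (mirroring `rdflib.store.VALID_STORE`) so that the snippet runs without rdflib installed.

```python
# Mirrors rdflib.store.VALID_STORE; redefined here so the sketch is standalone.
VALID_STORE = 1


class StoreNotFound(Exception):
    """Hypothetical exception used only in this sketch."""


def open_existing(graph, configuration):
    """Open a store that must already exist; raise instead of passing None on."""
    result = graph.open(configuration, create=False)
    if result != VALID_STORE:
        raise StoreNotFound(
            "open() returned {!r} for {!r}".format(result, configuration)
        )
    return result


class FakeGraph:
    """Stand-in for a Graph whose open() returns a fixed value."""

    def __init__(self, result):
        self._result = result

    def open(self, configuration, create=False):
        return self._result


try:
    open_existing(FakeGraph(None), "sqlite://")
    raised = False
except StoreNotFound:
    raised = True
print(raised)
```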

Working with huge graphs

"huge" as in "too big to fit into memory". I don't think I've seen this mentioned in the readme, so please excuse me if it's a dumb question. But considering that rdflib works in-memory, I was wondering if this plugin also needs to load all the graph in memory in order to work.

Extra "test" subdirectory in tests ?

While investigating issue #71 in this fork, https://github.com/rchateauneu/rdflib-sqlalchemy/tree/SQLAlchemy_1_4_Count, and running the tests, I was wondering whether the "test" subdirectories specified in some scripts are needed:

test/test_aggregate_graphs.py

self.graph4.parse('test/rdf-schema.n3', format='n3')

test/test_store_performance.py

inputloc = os.getcwd() + '/test/sp2b/%s.n3' % i
self.path = "sqlite:///%(here)s/test/tmpdb.sqlite" % {
            "here": os.getcwd()
        }

I have to remove these "test/" prefixes in order to run the tests with pytest, like this:

pytest -v test

Did I do something wrong ? Thanks.
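
One way to make such paths independent of the working directory, offered as a suggestion rather than as what the repository does, is to resolve data files relative to the test module itself. The helper name `data_path` is hypothetical.

```python
import os


def data_path(module_file, *parts):
    """Build an absolute path to a data file that sits next to a test module."""
    here = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(here, *parts)


# Inside test/test_aggregate_graphs.py one would pass __file__, for example:
#     self.graph4.parse(data_path(__file__, "rdf-schema.n3"), format="n3")
example = data_path(os.path.join("repo", "test", "test_aggregate_graphs.py"),
                    "rdf-schema.n3")
print(example)
```

With this approach the same test file works whether pytest is started from the repository root or from inside the test directory.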

MySQL-sqlalchemy cannot create database

I wrote the following code, following https://rdflib.readthedocs.org/en/latest/persistence.html, to use MySQL instead. There are two cases:

If I create rdfstore in MySQL before running, with collation 1) Server Default, 2) UTF-8_general_ci or 3) UTF-8_unicode_ci, none passes the assertion.
If rdfstore is not in the MySQL server when I run the code, the message sqlalchemy.exc.OperationalError: (OperationalError) (1049, "Unknown database 'rdfstore'") None None comes up.

If my mistake is something very simple, could you please advise me where I can find documentation for rdflib so I won't bother you too much with simple stuff.

import rdflib
from rdflib.graph import ConjunctiveGraph as Graph
from rdflib import plugin
from rdflib.store import Store, NO_STORE, VALID_STORE
from rdflib.namespace import Namespace
from rdflib.term import Literal
from rdflib.term import URIRef
from tempfile import mkdtemp

uri = Literal("mysql://username:password@localhost:3306/rdfstore")
ident = URIRef("rdflib_test")

store = plugin.get("SQLAlchemy", Store)(ident)
#default_graph_uri = "http://rdflib.net/rdfstore"
#configString = "/var/tmp/rdfstore"
#Get the Sleepycat plugin.
#store = plugin.get('Sleepycat', Store)('rdfstore')

g = Graph(store="SQLAlchemy", identifier=ident)
rt = g.open(uri, create=False)
#Open previously created store, or create it if it doesn't exist yet
#graph = Graph(store="Sleepycat",
#identifier = URIRef(default_graph_uri))
#path = mkdtemp()
#rt = graph.open(path, create=False)
if rt == NO_STORE:
    # There is no underlying Sleepycat infrastructure, create it
    g.open(uri, create=True)
else:
    assert rt == VALID_STORE, "The underlying store is corrupt"

SQLAlchemy.close() (and open()) fails

The close method tries to close the engine before setting it to None.

https://github.com/RDFLib/rdflib-sqlalchemy/blob/0.3.5/rdflib_sqlalchemy/store.py#L623

However, the Engine class in SQLAlchemy doesn't have a close() method so an exception is raised.

Add mwatts to PyPi maintainers

In abeyance, pending confirmation / advice of pypi username (perhaps https://pypi.org/user/MarkWatts/)?

How to query the database?

It looks like the store doesn't implement the query method. So calling query on a graph using RDFLib-SQLAlchemy as a store throws a NotImplementedError.

Drop support for old Python versions

Python 2.7 has been deprecated since last year, and it's increasingly difficult to maintain this project on Python 2.7, so it's past time to stop supporting it. Support for Python versions earlier than 3.6 will also be dropped as they have also reached EOL.

“Watch” flag is unintuitively disabled by default for repo co-owners.

TIL that the “watch/unwatch” flag is automatically enabled for repo owners but is disabled by default for repo co-owners.

[Suggestion] Better Documentation and Examples

I've been trying to use this module. It's great, but it lacks good usage examples and introductory documentation for beginners. It would be nice to add these to the rdflib documentation.

Allow re-inserting triples in SQLite

Currently, you get an exception if you try to insert a triple more than once with this store, at least when using a SQLite back-end. Instead of raising an error, the store should silently replace duplicate triple insertions.
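
The requested behaviour can be sketched at the SQLite level with the standard `sqlite3` module. The `statements` table below is a simplified, hypothetical stand-in for the store's tables, and `INSERT OR IGNORE` is SQLite-specific; this is not a description of how the store handles the conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements ("
    " subject TEXT, predicate TEXT, object TEXT,"
    " UNIQUE (subject, predicate, object))"
)

triple = ("http://example.org/s", "http://example.org/p", "http://example.org/o")

# A plain INSERT of the same row twice would raise sqlite3.IntegrityError;
# INSERT OR IGNORE skips the conflicting row instead.
for _ in range(2):
    conn.execute("INSERT OR IGNORE INTO statements VALUES (?, ?, ?)", triple)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
print(count)  # 1
```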

size of sqlite database

Hi, thanks for this very useful rdflib plugin! I am running some tests and comparisons and am noticing that using sqlite results in very large db sizes. I have a graph of ~10k triples and it serializes on disk to ~2MB using rdf-xml and a sqlite db of almost 14MB - is this expected? Or is there some setup step I'm missing that would make the db more reasonable? Thanks!

Bulk insertion: avoid committing each time a triple is added

Hello,

The current behaviour of the plugin makes SQLAlchemy work in autocommit mode.

Each time a triple is added, it is committed, making bulk insertion very slow: 1 min 30 s for 500 triples with SQLite, 25 seconds with MySQL (using the triples files of rdflib-benchmark).

The old rdflib-mysql plugin was not issuing a commit on each triple insertion but only when the commit method of the store was used.

I made a quick and dirty change to the plugin to test the impact on performance: begin a transaction when the store is opened and commit only when the store's commit method is called.
In this context, 500 triples are added in 0.3 seconds for SQLite and 1.15 seconds for MySQL.

https://github.com/ktbs/rdflib-sqlalchemy/blob/avoid_autocommit/rdflib_sqlalchemy/SQLAlchemy.py

Maybe autocommitting could be a store parameter?
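
The effect described here can be reproduced outside the plugin with the standard `sqlite3` module. This is a minimal sketch with a hypothetical `statements` table and a hypothetical `load` helper; absolute timings depend on the disk and are not meant to match the figures above.

```python
import os
import sqlite3
import tempfile
import time


def load(path, rows, commit_each):
    """Insert rows, committing per row or once at the end; return elapsed seconds."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)"
    )
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?)", row)
        if commit_each:
            conn.commit()  # one transaction per triple
    conn.commit()  # commits the whole batch when commit_each is False
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


rows = [("s%d" % i, "p", "o") for i in range(500)]
with tempfile.TemporaryDirectory() as tmp:
    per_row = load(os.path.join(tmp, "a.sqlite"), rows, commit_each=True)
    batched = load(os.path.join(tmp, "b.sqlite"), rows, commit_each=False)
print("per-row commits: %.3fs, single commit: %.3fs" % (per_row, batched))
```

An on-disk database is used on purpose, since the cost of a commit is mostly the cost of syncing to disk.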

Regards

entry_points removal and plugin registration issue

Hi,

I did a project depending on rdflib-sqlalchemy in the past. Recently, I updated rdflib-sqlalchemy and started to have troubles, getting the message:

PluginException: No plugin registered for (SQLAlchemy, <class 'rdflib.store.Store'>)

It seems to have to do with the removal of entry_points in setup.py, which, from what I understand, is a mechanism to auto-register plugins. I traced it back to commit 7ce9756.

I'm not familiar at all with entry_points. Could you confirm it is related? Is there a new procedure to register the plugin?

Thanks.

create_engine kwargs is empty

In store.py (line 235), the call is "self.engine = sqlalchemy.create_engine(configuration)". Given the signature "create_engine(*args, **kwargs)", shouldn't the configuration be passed in the second argument, "kwargs"?

Pg8000 and non-mapped type

Good day!
Is pg8000 Python driver for postgres supported?

I have errors like these, due to the non-mapped type of @prefix xsd: <http://www.w3.org/2001/XMLSchema#> in my RDF:

File "pg8000/core.py", line 1885, in make_params
"not mapped to pg type")

sqlalchemy.exc.NotSupportedError: (pg8000.core.NotSupportedError) type <class 'rdflib.term.URIRef'>not mapped to pg type [SQL: u'INSERT INTO kb_bec6803d52_literal_statements (subject, predicate, object, context, termcomb, objlanguage, objdatatype) VALUES (%s, %s, %s, %s, %s, %s, %s)'] [parameters: (u'http://example.com/penid1109001', u'http://www.w3.org/1999/02/22-rdf-syntax-ns#ID', u'1109001', u'example', 9, None, rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#integer'))]

sqlalchemy.exc.NotSupportedError: (pg8000.core.NotSupportedError) type <class 'rdflib.term.URIRef'>not mapped to pg type [SQL: u'INSERT INTO kb_bec6803d52_literal_statements (subject, predicate, object, context, termcomb, objlanguage, objdatatype) VALUES (%s, %s, %s, %s, %s, %s, %s)'] [parameters: (u'ub1bL12C21', u'http://www.w3.org/2002/07/owl#minQualifiedCardinality', u'2', u'example2', 69, None, rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#nonNegativeInteger'))]

SQLAlchemy version is 1.0.6, RDFLib 4.2.1.

Bulk insertion: use executemany option

Brought up in this thread around addN: #9 - bulk insertion could group by parameterized query, making only one call to SQLAlchemy per prepared statement.

For many drivers, an "INSERT INTO" statement is passed on to executemany instead of execute when a list of tuples is provided in the execute call. This significantly improves performance.
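
At the DBAPI level the idea looks like the sketch below, which uses the standard `sqlite3` driver and a hypothetical `statements` table. It illustrates `executemany` itself, not the SQLAlchemy call path inside the store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)"
)

rows = [("s%d" % i, "p", "o%d" % i) for i in range(1000)]

# One parameterized statement, one driver call for all parameter tuples.
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", rows)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
print(total)  # 1000
```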

Prefix retrieval won't work as expected if the prefix is empty

Hi there !
Thanks for the wonderful work on this package :) Let's talk about the issue.

If you set a prefix by default, which you would do by using an empty prefix such as "" when you bind, it is unsupported by rdflib-sqlalchemy while it's clearly supported by RDFLib :

In rdflib/namespace.py

    def qname(self, uri):
        prefix, namespace, name = self.compute_qname(uri)
        if prefix == "":
            return name
        else:
            return ":".join((prefix, name))

prefix can be found to be "". But if we look at the prefix lookup in the SQLAlchemy store :

In rdflib_sqlalchemy/store.py

    def prefix(self, namespace):
        """Prefix."""
        with self.engine.connect() as connection:
            nb_table = self.tables["namespace_binds"]
            namespace = text_type(namespace)
            s = select([nb_table.c.prefix]).where(nb_table.c.uri == namespace)
            res = connection.execute(s)
            rt = [rtTuple[0] for rtTuple in res.fetchall()]
            res.close()
            return rt and rt[0] or None

return rt and rt[0] or None will return None if rt[0] == ""
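
The pitfall can be reproduced without a database. In this sketch, `first_or_none_buggy` mirrors the expression quoted above, and `first_or_none` is one possible alternative; both function names are hypothetical.

```python
def first_or_none_buggy(rt):
    # An empty string is falsy, so `rt[0] or None` turns "" into None.
    return rt and rt[0] or None


def first_or_none(rt):
    # Tests only whether a row was found, not whether the prefix is truthy.
    return rt[0] if rt else None


print(repr(first_or_none_buggy([""])))  # None
print(repr(first_or_none([""])))        # ''
print(repr(first_or_none([])))          # None
```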

Is "klass" a typo?

Should klass be class instead?

rdflib-sqlalchemy/rdflib_sqlalchemy/tables.py

Line 76 in abe2088

Column("klass", TermType, nullable=False),

Implications of adding id/pk field to _statements tables?

In my fork I set the SQLAlchemy table definitions to include id fields as primary keys in the four _statements tables. Is there any reason you can think of that this might break, either in this store plugin or in rdflib more generally? So far I am seeing no issue. I included it because I want to build a Django admin interface directly on the SQLAlchemy store in pursuit of some other goals.

Problem when loading data into sqlite

I am currently trying to load from a sample file from yago ontology (n3). This one for example.

But I am getting this error.

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: kb_20f710cd41_namespace_binds [SQL: 'SELECT kb_20f710cd41_namespace_binds.prefix \nFROM kb_20f710cd41_namespace_binds \nWHERE kb_20f710cd41_namespace_binds.uri = ?'] [parameters: ('http://www.w3.org/XML/1998/namespace',)]

The complete stacktrace for this is

Traceback (most recent call last):
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: kb_20f710cd41_namespace_binds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib/graph.py", line 1041, in parse
    parser.parse(source, self, **args)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1897, in parse
    conj_graph.namespace_manager = graph.namespace_manager
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib/graph.py", line 330, in _get_namespace_manager
    self.__namespace_manager = NamespaceManager(self)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib/namespace.py", line 281, in __init__
    self.bind("xml", "http://www.w3.org/XML/1998/namespace")
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib/namespace.py", line 394, in bind
    bound_prefix = self.store.prefix(namespace)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/rdflib_sqlalchemy/store.py", line 659, in prefix
    res = connection.execute(s)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception
    exc_info
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/Users/max/.virtualenvs/twitter-trending/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: kb_20f710cd41_namespace_binds [SQL: 'SELECT kb_20f710cd41_namespace_binds.prefix \nFROM kb_20f710cd41_namespace_binds \nWHERE kb_20f710cd41_namespace_binds.uri = ?'] [parameters: ('http://www.w3.org/XML/1998/namespace',)]

The code that produces this error is

from rdflib import Graph
from rdflib_sqlalchemy.store import SQLAlchemy
from sqlalchemy import create_engine

engine = create_engine('sqlite:///foo.db')

store = SQLAlchemy(engine=engine)
graph = Graph(store)

graph.load('yago/sample.rdf', format='n3')

testLenInMultipleContexts test fails.

https://github.com/RDFLib/rdflib-sqlalchemy/blob/master/test/context_case.py#L130

def testLenInMultipleContexts(self):
    oldLen = len(self.graph.store)
    print("self.graph.store: oldLen", oldLen, self.graph.store)
    self.addStuffInMultipleContexts()
    newLen = len(self.graph.store)
    print("stuffaddedinMultipleContexts: newLen", newLen, self.graph.store)
    # addStuffInMultipleContexts is adding the same triple to
    # three different contexts. So it's only + 1
    print("self.graph.triples with no context",
          len(list(self.graph.triples((None, None, None)))))
    print("self.graph.triples from context-1", len(
        list(self.graph.triples((None, None, None), context=self.c1))))
    print("self.graph.triples from context-2", len(
        list(self.graph.triples((None, None, None), context=self.c2))))
    print("asserting len(self.graph.store) = oldLen + 1 == %s" %
          len(self.graph.store))
    self.assertEquals(len(self.graph.store), oldLen + 1,
                      [self.graph.store, oldLen + 1])

    graph = Graph(self.graph.store, self.c1)
    self.assertEquals(len(graph.store), oldLen + 1,
                      [graph.store, oldLen + 1])

('self.graph.store: oldLen', 0, <Partitioned MySQL N3 Store>)
('stuffaddedinMultipleContexts: newLen', 3, <Partitioned MySQL N3 Store>)
('self.graph.triples with no context', 1)
('self.graph.triples from context-1', 1)
('self.graph.triples from context-2', 1)
asserting len(self.graph.store) = oldLen + 1 == 3
FAIL
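
One reading of this output, stated here as an assumption rather than a diagnosis, is that the store's length counts one row per (triple, context) pair, whereas the test expects the number of distinct triples. The difference can be shown with plain tuples; the context names are made up for illustration.

```python
# The same triple asserted in three contexts, as in addStuffInMultipleContexts.
quads = [
    ("s", "p", "o", "context-1"),
    ("s", "p", "o", "context-2"),
    ("s", "p", "o", "default"),
]

rows_stored = len(quads)
distinct_triples = len({(s, p, o) for s, p, o, _ in quads})

print(rows_stored, distinct_triples)  # 3 1
```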

What is the license for this repo? Can you add a LICENSE file?

Rebinding a prefix fails with SQLite

The current SQLite schema for binds looks like

CREATE TABLE kb_f526f39709_namespace_binds (
	prefix VARCHAR(20) NOT NULL, 
	uri TEXT, 
	PRIMARY KEY (prefix), 
	UNIQUE (prefix)
);

and the code in store.bind does (as of version 0.3.8):

ins = self.tables["namespace_binds"].insert().values(
prefix=prefix, uri=namespace)
connection.execute(ins)

which fails when rebinding a prefix, as the old prefix is not replaced...
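
At the SQLite level, a rebind that replaces the old row can be sketched with `INSERT OR REPLACE`. The table name is shortened, the `bind` helper is hypothetical, and the statement is SQLite-specific, so this is an illustration of the idea rather than the fix applied in the store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE namespace_binds ("
    " prefix VARCHAR(20) NOT NULL,"
    " uri TEXT,"
    " PRIMARY KEY (prefix))"
)


def bind(prefix, uri):
    # INSERT OR REPLACE removes the row with the conflicting primary key
    # before inserting the new one.
    conn.execute(
        "INSERT OR REPLACE INTO namespace_binds (prefix, uri) VALUES (?, ?)",
        (prefix, uri),
    )
    conn.commit()


bind("ex", "http://example.org/old#")
bind("ex", "http://example.org/new#")

rows = conn.execute("SELECT prefix, uri FROM namespace_binds").fetchall()
print(rows)  # [('ex', 'http://example.org/new#')]
```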

Release 0.4.0

Features

Adding a max_terms_per_where option that limits the number of terms in generated SQL in a call to addN

Fixes

Restoring the setuptools RDFLib store plugin entrypoint so users no longer need to call registerplugins in most cases
Making addN, add, bind not error out when there's a conflicting entry

Maintenance

Removing uses of rdflib.py3compat

python3

From Travis, it looks like the unit tests are failing under Python 3?

ERROR: Failure: SyntaxError (invalid syntax (term.py, line 1559))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/imp.py", line 190, in load_module
    return load_package(name, filename)
  File "/home/travis/virtualenv/python3.3.5/lib/python3.3/imp.py", line 160, in load_package
    return _bootstrap.SourceFileLoader(name, path).load_module(name)
  File "<frozen importlib._bootstrap>", line 584, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1022, in load_module
  File "<frozen importlib._bootstrap>", line 1003, in load_module
  File "<frozen importlib._bootstrap>", line 560, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 868, in _load_module
  File "<frozen importlib._bootstrap>", line 313, in _call_with_frames_removed
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/build/src/test/__init__.py", line 1, in <module>
    from rdflib import plugin
  File "/home/travis/virtualenv/python3.3.5/src/rdflib/rdflib/__init__.py", line 112, in <module>
    from rdflib.term import (
nose.proxy.SyntaxError: invalid syntax (term.py, line 1559)

How do I load this plugin?

Hi! I'd like to use this plugin to persist an rdflib graph into sqlite. Unfortunately I do not understand how to load this plugin. In the README there is only a test class that I don't fully understand. Could you please add an example to show how to use the plugin with rdflib?

Thank you a lot for making this plugin.

Upload package to PyPI

Would it be possible for you to upload this package to PyPI?

Currently there is only the outdated rdflib-sqlite package on PyPI. It would be great to get this more up-to-date package up there too, so that other projects can easily depend on rdflib-sqlalchemy.

using unlogged table feature in postgres

Hi,

We are trying to use this with postgres as the underlying data engine.
The RDF data we have is quite large and is taking a bit of time to get loaded.

I was wondering if there is any way we can control the back-end table so that it is created
as an unlogged table in postgres.

thanks.
sandeep

Create a new release, consider switching to use git-flow conventions for development

Specifically:
What would be involved currently in tagging and pushing to PyPI a new versioned release of the project after recent merge of my previous PR #17 ?

More generally, I noticed the repository does not make use of git tags or branches to track released versions or automatically deploy tagged releases to PyPI.

I'd like to suggest adopting the widely-used git-flow convention (and associated CLI tool) for managing this going forward.

TL;DR this means:

all ongoing development happens in 'develop' branch
'master' branch always reflects latest released version to which hot fixes can be applied directly if needed
individual feature work is done on feature/* branches, and merged back to develop via PRs
individual releases go through a release/* branch and get a git tag. This makes it easy to jump between different released versions, apply hotfixes and maintain ongoing development branch.

Further, this model allows for easy release to PyPI - once a release is tagged and merged to master, we can define branch-specific CI logic that will build distributable and deploy it to PyPI (using something like this: https://docs.travis-ci.com/user/deployment/pypi/)

SQLAlchemy-1.4.0-cp37-cp37m-win_amd64.whl

There is apparently a problem with rdflib_sqlalchemy-0.4.0-py3-none-any.whl and the new version of SQLAlchemy-1.4.0-cp37-cp37m-win_amd64.whl compared to SQLAlchemy-1.3.23-cp37-cp37m-win_amd64.whl

My tests running on Travis used to work until yesterday https://travis-ci.com/github/rchateauneu/survol/jobs/491016928 and they are broken with the latest SQLAlchemy-1.4.0 version : Same problems with Linux Python 2 and 3, and Windows Python 3.

All the other packages loaded by Travis CI are identical; only this dependency differs.

One of the errors is for example:

        for table, whereClause, tableType in select_components:

            if select_type == COUNT_SELECT:
>               select_clause = table.count(whereClause)
E               AttributeError: 'Alias' object has no attribute 'count'
c:\python37\lib\site-packages\rdflib_sqlalchemy\sql.py:56: AttributeError

Thanks

rdflib.py3compat deprecated in 5x

rdflib_sqlalchemy.termutils is importing from rdflib.py3compat. This still works with RDFLib 4.2.2, but breaks with the 5x (master) branch. Much of the functionality of py3compat is moving to compat instead, but the format_doctest_out decorator is nowhere to be found in master. I don't see an existing issue raised for this, but it is likely to cause problems down the road. I also don't know what the fix is, but will investigate as time permits.

Here's what I am seeing:

Traceback (most recent call last):
   File "/usr/local/lib/python3.6/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
     worker.init_process()
   File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 126, in init_process
     self.load_wsgi()
   File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
     self.wsgi = self.app.wsgi()
   File "/usr/local/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
     self.callable = self.load()
   File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
     return self.load_wsgiapp()
   File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
     return util.import_app(self.app_uri)
   File "/usr/local/lib/python3.6/site-packages/gunicorn/util.py", line 352, in import_app
     __import__(module)
   File "/usr/src/app/skjold/wsgi.py", line 16, in <module>
     application = get_wsgi_application()
   File "/usr/local/lib/python3.6/site-packages/django/core/wsgi.py", line 12, in get_wsgi_application
     django.setup(set_prefix=False)
   File "/usr/local/lib/python3.6/site-packages/django/__init__.py", line 19, in setup
     configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
   File "/usr/local/lib/python3.6/site-packages/django/conf/__init__.py", line 56, in __getattr__
     self._setup(name)
   File "/usr/local/lib/python3.6/site-packages/django/conf/__init__.py", line 43, in _setup
     self._wrapped = Settings(settings_module)
   File "/usr/local/lib/python3.6/site-packages/django/conf/__init__.py", line 106, in __init__
     mod = importlib.import_module(self.SETTINGS_MODULE)
   File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
     return _bootstrap._gcd_import(name[level:], package, level)
   File "/usr/src/app/skjold/settings.py", line 132, in <module>
     from rdflib_sqlalchemy.store import SQLAlchemy
   File "/usr/local/lib/python3.6/site-packages/rdflib_sqlalchemy/store.py", line 40, in <module>
     from rdflib_sqlalchemy.base import SQLGeneratorMixin
   File "/usr/local/lib/python3.6/site-packages/rdflib_sqlalchemy/base.py", line 7, in <module>
     from rdflib_sqlalchemy.termutils import (
   File "/usr/local/lib/python3.6/site-packages/rdflib_sqlalchemy/termutils.py", line 4, in <module>
     from rdflib.py3compat import format_doctest_out
 ModuleNotFoundError: No module named 'rdflib.py3compat'

Working with MySQL, getting random "Specified key was too long; max key length is 3072 bytes" error

I am new to rdflib-sqlalchemy and I try to use it with MySQL.
I am currently running
rdflib 5.0.0
rdflib-sqlalchemy 0.4.0
SQLAlchemy 1.3.9
mysql-connector-python 8.0.21

By simply trying to create a graph in the database it results most of the time in the following error:
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 3072 bytes')
[SQL: CREATE UNIQUE INDEX kb_efa2ab3084_quoted_spoc_key ON kb_efa2ab3084_quoted_statements (subject(200), predicate(200), object(200), objlanguage(200), context(200))]

What is really strange is that sometimes, the creation works and the error does not show.
Any suggestions ?

Regards,
Etienne
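
The arithmetic behind the limit can be worked through from the CREATE INDEX statement above: five columns, each with a 200-character prefix. The byte widths per character are those of MySQL's utf8mb3 and utf8mb4 character sets. Which character set the tables actually use in this setup is not stated, so treating a character-set difference as the cause of the intermittent behaviour is only a hypothesis.

```python
MAX_KEY_BYTES = 3072  # limit quoted in the error message
COLUMNS = 5           # subject, predicate, object, objlanguage, context
PREFIX_CHARS = 200    # prefix length per column in the CREATE INDEX statement

key_sizes = {}
for charset, bytes_per_char in (("utf8mb3", 3), ("utf8mb4", 4)):
    key_sizes[charset] = COLUMNS * PREFIX_CHARS * bytes_per_char

for charset, key_bytes in key_sizes.items():
    verdict = "fits" if key_bytes <= MAX_KEY_BYTES else "too long"
    print(charset, key_bytes, verdict)
# utf8mb3 3000 fits
# utf8mb4 4000 too long
```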

Engine.has_table() method is deprecated

There seems to be a future deprecation in SQLAlchemy because I get this warning:

python3.8/site-packages/rdflib_sqlalchemy/store.py:762: SADeprecationWarning: The Engine.has_table() method is deprecated and will be removed in a future release.  Please refer to Inspector.has_table(). (deprecated since: 1.4)
    if not self.engine.has_table(table_name):

It is apparently replaced by this:

[Reflecting Database Objects Inspector.has_table](https://docs.sqlalchemy.org/en/14/core/reflection.html#sqlalchemy.engine.reflection.Inspector.has_table)

Many thanks.

PS: Also, when will be the next release, please ?? Thanks !

Add appropriate error handling for use before calling open()

addN returns how many triples were actually added?

So I understand that addN ignores-on-conflict for any duplicate triples (at least with psycopg2). Are there any methods that return the number of triples that were actually inserted into the database? Could this functionality be added? It'd be nice to know if this function was simply a no-op or not.
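
What is being asked for can be sketched at the SQLite level by counting rows before and after an ignore-on-conflict bulk insert. The `statements` table and the `add_many` helper are hypothetical, and the before/after count is only reliable when no other writer touches the table in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements ("
    " subject TEXT, predicate TEXT, object TEXT,"
    " UNIQUE (subject, predicate, object))"
)


def add_many(rows):
    """Insert rows, ignoring duplicates; return how many were actually inserted."""
    before = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
    conn.executemany("INSERT OR IGNORE INTO statements VALUES (?, ?, ?)", rows)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
    return after - before


first = add_many([("s", "p", "o1"), ("s", "p", "o2")])
second = add_many([("s", "p", "o2"), ("s", "p", "o3")])
print(first, second)  # 2 1
```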

Cleanup unit-testing invocation

Currently we have the following:

a tox.ini file for running tests with the Tox test runner
run_tests.py and run_tests_py3.sh wrapper scripts for running tests using nose, along with a skiptests.list file which seems to be empty / not used?
a .travis.yml configuration file for Travis CI which sets up dependencies / environment and invokes nosetests directly, along with associated requirements-py{2,3}.txt files

I would suggest we consolidate all of these to use tox so that a single tox.ini file contains all the definitions and dependencies required to run all unit-tests locally as well as in CI, so that travis CI's configuration file can be reduced to simply setting up tox and invoking it. We should be able to get rid of a few of the above files / definitions and make it clearer how to actually run unit-tests.

Along those lines, I would also consider running flake8 (Python linter / style checker) as part of the unit-test run on CI. This will help catch common issues early on (e.g., unused imports/variables).

Maintain a fixed bound on memory usage in triples() calls

Build error for Python 2.7 for MySQL

Error observed during test testAdd (test.test_sqlalchemy_mysql.SQLAMySQLContextTestCase).

https://travis-ci.org/github/RDFLib/rdflib-sqlalchemy/jobs/764264566

Traceback (most recent call last):
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/test/test_sqlalchemy_mysql.py", line 64, in setUp
    uri=self.uri, storename=self.storename)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/test/context_case.py", line 29, in setUp
    self.graph.open(uri, create=True)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/rdflib/graph.py", line 373, in open
    return self.__store.open(configuration, create)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/rdflib_sqlalchemy/store.py", line 276, in open
    self.engine = sqlalchemy.create_engine(url, **kwargs)
  File "<string>", line 2, in create_engine
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
    return fn(*args, **kwargs)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/create.py", line 518, in create_engine
    u = _url.make_url(url)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 711, in make_url
    return _parse_rfc1738_args(name_or_url)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 774, in _parse_rfc1738_args
    return URL.create(name, **components)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 152, in create
    cls._str_dict(query),
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 211, in _str_dict
    for key, value in dict_items
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 211, in <dictcomp>
    for key, value in dict_items
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 189, in _assert_value
    return tuple(_assert_value(elem) for elem in val)
  File "/home/travis/build/RDFLib/rdflib-sqlalchemy/.tox/py27/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 189, in <genexpr>
    return tuple(_assert_value(elem) for elem in val)

The last frame repeats until max recursion depth is reached.

rdflib_sqlalchemy.SQLAlchemy.SQLAlchemy.__repr__ tracebacks

The SQLAlchemy.__repr__ method raises an unhandled exception. A self-contained test:

In [6]: import rdflib
In [7]: x = rdflib.plugin.get('SQLAlchemy', rdflib.store.Store)(rdflib.URIRef("rdflib_test"))
In [8]: print x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-2d264e11d975> in <module>()
----> 1 print x

/home/piro/src/rdfgambit/env/src/rdflib-sqlalchemy/rdflib_sqlalchemy/SQLAlchemy.pyc in __repr__(self)
    928                 None, ASSERTED_LITERAL_PARTITION), ]
    929         q = unionSELECT(selects, distinct=False, selectType=COUNT_SELECT)
--> 930         with self.engine.connect() as connection:
    931             res = connection.execute(q)
    932             rt = res.fetchall()

AttributeError: 'SQLAlchemy' object has no attribute 'engine'
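The traceback shows `__repr__` reading `self.engine` before `open()` has ever set it. One possible guard is sketched below with a hypothetical stand-in class (`UnopenedStoreExample` is not the real store, and its `open()` signature is simplified for illustration): the attribute is initialized to `None` in the constructor, and `__repr__` reports the unopened state instead of querying the database.

```python
class UnopenedStoreExample(object):
    """Hypothetical stand-in for a store; not the real rdflib_sqlalchemy class."""

    def __init__(self, identifier):
        self.identifier = identifier
        # Always define the attribute so later code can test it safely.
        self.engine = None

    def open(self, engine):
        # Simplified: the real store builds an engine from a configuration string.
        self.engine = engine

    def __repr__(self):
        name = type(self).__name__
        if self.engine is None:
            # No database access is attempted before open() has been called.
            return "<%s %r (not opened)>" % (name, self.identifier)
        return "<%s %r (engine=%r)>" % (name, self.identifier, self.engine)
```

The same `self.engine is None` check could also back a clearer error message in other methods that need a connection.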

What is the intersection with rdfalchemy and rdflib-sqlalchemy?

What is the intersection with https://github.com/gjhiggins/RDFAlchemy and https://github.com/RDFLib/rdflib-sqlalchemy?

Which responsibilities does each one cover?

Python3 import error with import sqlalchemy being thought of as a relative import on a Mac

I'm using Python 3.5, and when running python setup.py build I noticed that the line import sqlalchemy in rdflib_sqlalchemy/SQLAlchemy.py was turned into from . import sqlalchemy

I tracked the issue down to the probably_a_local_import method in the Python module lib2to3/fixes/fix_import.py, which is passed sqlalchemy and checks whether it is a local import. It ends up checking whether build/src/rdflib_sqlalchemy/sqlalchemy.py is a file.

That file doesn't really exist; the file that does exist is build/src/rdflib_sqlalchemy/SQLAlchemy.py, with different capitalization. However, by default Macs are case INsensitive (yes, that's really annoying), so os.path.exists returns True.

This is rather tricky, as it means that you can't have a file with the same (lowercased) name as a module that you're importing when building your module. Obviously, later on people will be okay with it, as they will do import sqlalchemy and import rdflib_sqlalchemy.SQLAlchemy and Python recognizes those as two different modules. I'm surprised this doesn't come up more often.

I'm not sure of the best way to go about fixing this. Right now it is preventing me from using rdflib-sqlalchemy on a Mac. I could turn it into a case sensitive file system but that doesn't help out someone else with the default.

Maybe there's a flag to setup.py that can be used to avoid this nonsense?
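To illustrate the case-insensitivity problem described in this issue: a check that compares the file name against the actual directory listing behaves the same on case-sensitive and case-insensitive file systems, unlike `os.path.exists`. This is only a sketch of the idea, not a patch to lib2to3; the helper name `exists_with_exact_case` is made up for illustration.

```python
import os
import tempfile


def exists_with_exact_case(path):
    """Return True only if the last path component matches a directory entry exactly."""
    directory, name = os.path.split(os.path.abspath(path))
    try:
        # os.listdir reports names as stored on disk, preserving their case.
        return name in os.listdir(directory)
    except OSError:
        # The parent directory is missing or unreadable.
        return False


def demo():
    """Create SQLAlchemy.py in a temp dir and probe both spellings."""
    directory = tempfile.mkdtemp()
    open(os.path.join(directory, "SQLAlchemy.py"), "w").close()
    return (
        exists_with_exact_case(os.path.join(directory, "SQLAlchemy.py")),
        exists_with_exact_case(os.path.join(directory, "sqlalchemy.py")),
    )
```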

Error running python setup.py install python 2.7???

Installed c:\python27\lib\site-packages\rdfextras-0.4-py2.7.egg
Searching for pyparsing<=1.5.7
Reading http://pypi.python.org/simple/pyparsing/
Reading http://pyparsing.wikispaces.com/
Reading http://sourceforge.net/project/showfiles.php?group_id=97203
Reading http://pyparsing.sourceforge.net/
Reading http://sourceforge.net/projects/pyparsing
Best match: pyparsing 1.5.7
Downloading http://sourceforge.net/projects/pyparsing/files/pyparsing/pyparsing-1.5.7/pyparsing-1.5.7.win32-py2.7.exe/download
Processing download
error: Couldn't find a setup script in c:\users\admin\appdata\local\temp\easy_install-jarxxt\download

"database is locked"

I'm having some issues with concurrency. I'm using rdflib-sqlalchemy with a sqlite backend and I see this error from time to time

sqlalchemy.exc.OperationalError
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: PRAGMA main.table_info("kb_cabe862c6b_asserted_statements")]
(Background on this error at: http://sqlalche.me/e/e3q8)

I think I might be opening/closing too many connections, but I'm not sure how to handle this.
What is the correct usage for dealing with concurrent requests? Should I open a global rdflib.Graph() object and keep using that? I think SQLAlchemy is supposed to maintain a connection pool and handle concurrency automatically, but I still see these errors.
Thank you.
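One common mitigation for intermittent "database is locked" errors with SQLite is to retry the failing operation after a short pause. The sketch below shows that general technique with the standard-library sqlite3 module; it is not a statement about how rdflib-sqlalchemy is meant to be used, and `retry_when_locked` with its `attempts` and `delay` parameters is a hypothetical helper. Through SQLAlchemy the exception arrives wrapped as sqlalchemy.exc.OperationalError, so the `except` clause would need adapting.

```python
import sqlite3
import time


def retry_when_locked(operation, attempts=5, delay=0.1):
    """Call operation(); retry with a growing pause while SQLite reports a lock."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            is_lock_error = "database is locked" in str(exc)
            is_last_attempt = attempt == attempts - 1
            # Re-raise unrelated errors at once, and lock errors on the last try.
            if not is_lock_error or is_last_attempt:
                raise
            time.sleep(delay * (attempt + 1))
```

Separately, `sqlite3.connect()` accepts a `timeout` argument that controls how long a connection waits for a lock before raising this error, which may reduce how often a retry is needed.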

Loading a large ontology using PostgreSQL as the backend results in BTREE index error

When I try to load a large ontology such as CHEBI (https://bioportal.bioontology.org/ontologies/CHEBI) I get a BTREE error with PostgreSQL. As mitigation, I disabled the index creation.

Catch sqlalchemy.exc.IntegrityError

Hi,

I would like to catch IntegrityError in the presence of potential duplicates (triples). My current env:

rdflib-sqlalchemy (0.3.8)
rdflib (4.2.2)
SQLAlchemy (1.1.9)

from __future__ import print_function
from rdflib import plugin, Graph, Literal, URIRef
from rdflib.store import Store
from rdflib_sqlalchemy import registerplugins
from sqlalchemy.exc import IntegrityError

registerplugins()
identifier = URIRef("test")
db_uri = Literal('sqlite:///test.db')
store = plugin.get("SQLAlchemy", Store)(identifier=identifier, configuration=db_uri)
g = Graph(store)
s1=URIRef('http://example.com/Volvo')
s2=URIRef('http://example.com/VW')
p=URIRef('http://example.com/is-a')
o=URIRef('http://example.com/car')
for s,p,o in ((s1,p,o),(s1,p,o),(s2,p,o)):
   try:
      g.add((s,p,o))
   except IntegrityError:
      pass
print(g.serialize(format='turtle'))

ERROR:rdflib_sqlalchemy.store:Add failed with statement: INSERT INTO kb_a94a8fe5cc_asserted_statements (id, subject, predicate, object, context, termcomb) VALUES (:id, :subject, :predicate, :object, :context, :termComb), params: {'termComb': 1, 'predicate': rdflib.term.URIRef(u'http://example.com/is-a'), 'object': rdflib.term.URIRef(u'http://example.com/car'), 'context': rdflib.term.BNode('N6c8b02c004c54e18b5712f2aef28a567'), 'subject': rdflib.term.URIRef(u'http://example.com/Volvo')}
Traceback (most recent call last):
  File "/home/arni/.testenv/lib/python2.7/site-packages/rdflib_sqlalchemy/store.py", line 287, in add
    connection.execute(statement, params)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/arni/.testenv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: kb_a94a8fe5cc_asserted_statements.subject, kb_a94a8fe5cc_asserted_statements.predicate, kb_a94a8fe5cc_asserted_statements.object, kb_a94a8fe5cc_asserted_statements.context [SQL: u'INSERT INTO kb_a94a8fe5cc_asserted_statements (subject, predicate, object, context, termcomb) VALUES (?, ?, ?, ?, ?)'] [parameters: (u'http://example.com/Volvo', u'http://example.com/is-a', u'http://example.com/car', u'N6c8b02c004c54e18b5712f2aef28a567', 1)]
>>> print(g.serialize(format='turtle'))
@prefix ns1: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:VW ns1:is-a ns1:car .

ns1:Volvo ns1:is-a ns1:car .

Add parameters for limits in statement size

SQLite particularly has limits in the sizes of various parts of its statements. There should be parameters for limiting parts of the statements created by rdflib-sqlalchemy.

Some limits:

limit the number of terms in a "where" clause created by addN
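One way such a limit could work is to split the incoming terms into fixed-size batches and build one statement per batch, so that no single statement exceeds the backend's limit. The sketch below shows only the batching step; `chunked` and its `size` parameter are hypothetical names, not part of rdflib-sqlalchemy.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items, consuming the input lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls only the next `size` items, so memory use stays bounded.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

A caller would loop over `chunked(terms, max_terms)` and issue one statement per batch, with `max_terms` taken from a configuration parameter.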

Maintain a fixed bound on memory usage in calls to addN()

How to query the database

Can someone explain to me how to query data stored in a database?

I tried the following:

from rdflib import plugin, Graph, Literal, URIRef

from rdflib.store import Store
from rdflib_sqlalchemy import registerplugins

registerplugins()
identifier = URIRef("benchmark")
db_uri = Literal('postgresql+psycopg2://postgres:mysecretpassword@localhost:5432/benchmark')
store = plugin.get("SQLAlchemy", Store)(identifier=identifier, configuration=db_uri)
graph = Graph(store)

query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

    SELECT ?product ?label
    WHERE {
        ?product rdfs:label ?label .
        ?product rdf:type bsbm:Product .
        FILTER regex(?label, "e")
    }
    """
query_result = graph.query(query)
for product, label in query_result:
    print(product, label)

Any hint would be great.

mysql connections might not be closed

When I use rdflib-sqlalchemy (0.3.8) with MySQL, the number of connections climbs to 200+ and keeps increasing until Python's garbage collector runs.

I looked at the code and found "self.engine = None" in class SQLAlchemy (store.py, line 256). I believe this might be the cause.