threatify / arango-orm
A SQLAlchemy-like ORM implementation using python-arango as the backend library
License: GNU General Public License v3.0
Hi!
I am just starting to play with arango-orm, but I'm facing an issue.
Maybe I am doing something wrong, so I would appreciate some recommendations.
I have this:
from arango_orm.fields import String, Date, Nested
from arango_orm import Collection, Relation, Graph, GraphConnection
from arango_orm.references import relationship, graph_relationship
from datetime import datetime

class Department(Collection):
    __collection__ = 'department'
    _index = [{'type': 'hash', 'unique': False, 'fields': ['name']}]
    _key = String(required=True)  # registration number
    name = String(required=True, allow_none=False)
    # employees = relationship(__name__ + ".Employee", '_key', target_field='department_key')
    # dob = Date()

class Role(Collection):
    __collection__ = 'role'
    _index = [{'type': 'hash', 'unique': False, 'fields': ['name']}]
    _key = String(required=True)  # registration number
    name = String(required=True, allow_none=False)
    # employees = relationship(__name__ + ".Employee", '_key', target_field='role_key')
    # dob = Date()

class Employee(Collection):
    __collection__ = 'employee'
    _index = [{'type': 'hash', 'unique': False, 'fields': ['name']}]
    _key = String(required=True)  # registration number
    name = String(required=True, allow_none=False)
    # department_key = String()
    # role_key = String()
    hired_on = Date(default=datetime.now)
    department = Nested(Department)
    role = Nested(Role)
What I am trying to do is have a relation, or a nested document, in ArangoDB. In the employee collection I should see a role and a department for each employee.
Here is how I fill in the data.
What am I doing wrong?
from models_aDB import Department, Employee, Role
from arango import ArangoClient
from arango_orm import Database

# Arango
client = ArangoClient(protocol='http', host='172.18.0.12', port=8529)

def init_arangodb():
    dev_db = client.db('dev', username='xxx', password='yyyyy*', verify=True)
    db = Database(dev_db)
    if not db.has_collection('department'):
        db.create_collection('department')
    if not db.has_collection('employee'):
        db.create_collection('employee')
    if not db.has_collection('role'):
        db.create_collection('role')
    # create the fixture
    engineering = Department(name='engineering')
    db.add(engineering)
    hr = Department(name='Human Resources')
    db.add(hr)
    manager = Role(name='manager')
    db.add(manager)
    engineer = Role(name='engineer')
    db.add(engineer)
    peter = Employee(name='Peter')
    peter.role = engineer
    peter.department = engineering
    db.add(peter)
    roy = Employee(name='Roy')
    roy.department = engineering
    roy.role = engineer
    db.add(roy)
    tracy = Employee(name='Tracy')
    tracy.role = manager
    tracy.department = hr
    db.add(tracy)
Thanks for your help
The docs state that Database and ConnectionPool are interchangeable, but I ran into some problems moving from the former to the latter. ConnectionPool doesn't have the has_graph and graph methods. These are just examples of the issues I had; it may be missing more that I didn't check. Because of this I have problems instantiating Graph objects and managing my database graphs.
Am I missing something obvious when moving from one to the other? My code is like this:
With Database:
from arango import ArangoClient
from arango_orm import Database
client = ArangoClient(protocol=DB_PROTOCOL, host=DB_HOST, port=DB_PORT)
db = Database(client.db(DB_NAME, username=DB_USERNAME, password=DB_PASSWORD))
graph = MyGraph(connection=db)
if not db.has_graph(graph.__graph__):
    db.create_graph(graph)
With ConnectionPool:
from arango import ArangoClient
from arango_orm import ConnectionPool
clients = [ArangoClient(protocol=DB_PROTOCOL, host=DB_HOST, port=DB_PORT) for i in range(CONNECTION_POOL_SIZE)]
db = ConnectionPool(clients, dbname=DB_NAME, username=DB_USERNAME, password=DB_PASSWORD)
graph = MyGraph(connection=db)
if not db.has_graph(graph.__graph__):
    db.create_graph(graph)
I'm trying to create a new database if it doesn't exist, but I get an arango.exceptions.DatabaseCreateError: [HTTP 404][ERR 1228] database not found exception.
Why does it raise a "database not found" exception when I'm trying to create a new one?
As of python-arango 4.0.0 the database constructor takes another mandatory parameter, executor. There are also many other changes in the newest version of python-arango.
Also, a note on version compatibility could be added to the readme.
Should Query.by_key raise an exception instead of returning None?
With a single call, I can test whether it returned None:
obj = self.db.query(self.Model).by_key(key)
if not obj:
    raise NotFound('No object with key {}'.format(key))
However, in the scenario below, an exception may be more reasonable:
mol_keys = ['111', '222']  # from user input
try:
    mols = [db.query(Molecule).by_key(k) for k in mol_keys]
except XXXException:
    raise BadRequest
Here is the behavior of arangosh:
127.0.0.1:8529@Amidala> db._document('Task/14382999')._key
14382999
127.0.0.1:8529@Amidala> db._document('Task/xxx')._key
JavaScript exception in file '/usr/share/arangodb3/js/client/modules/@arangodb/arangosh.js' at 97,7: ArangoError 1202: document not found
! throw error;
! ^
stacktrace: ArangoError: document not found
at Object.exports.checkRequestResult (/usr/share/arangodb3/js/client/modules/@arangodb/arangosh.js:95:21)
at ArangoDatabase._document (/usr/share/arangodb3/js/client/modules/@arangodb/arango-database.js:626:12)
at <shell command>:1:4
If the document contains extra fields that are not present in the class schema, they should be stored somewhere without any type conversion etc.
Currently the ORM checks for the presence of the _key field and then checks the DB for an existing record when adding new records. For cases where we manually specify the _key field's value, this does not do us any good. A better option would be to use a separate field named _new, just like we use _dirty.
Though we can keep the current unittest-based test cases, using pytest seems better in the long run, especially with fixtures (or perhaps I'm just more used to pytest now). So putting in some effort now to convert the existing test cases to pytest would be good.
Allow connecting to multiple arango nodes within a cluster and distribute requests across the nodes using round-robin strategy.
Can be done by creating a connection pooler class that mimics the Database class but returns a different database session object each time from the connection pool.
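To make the idea concrete, here is a minimal sketch of such a pooler. It is not arango-orm's actual ConnectionPool: RoundRobinPool and FakeSession are hypothetical names, and FakeSession only stands in for a Database-like session object bound to one node.

```python
import itertools
import threading


class RoundRobinPool:
    """Hypothetical pooler: forwards each call to the next session in turn."""

    def __init__(self, sessions):
        sessions = list(sessions)
        if not sessions:
            raise ValueError("at least one session is required")
        self._cycle = itertools.cycle(sessions)
        self._lock = threading.Lock()

    def _next_session(self):
        # The lock keeps the rotation consistent when threads share the pool.
        with self._lock:
            return next(self._cycle)

    def __getattr__(self, name):
        # Only reached for attributes the pool itself does not define.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._next_session(), name)


class FakeSession:
    """Stand-in for a Database-like session connected to one cluster node."""

    def __init__(self, node):
        self.node = node

    def whoami(self):
        return self.node


pool = RoundRobinPool([FakeSession("node-a"), FakeSession("node-b")])
served = [pool.whoami() for _ in range(4)]
```

Because the pool resolves a fresh session on every attribute access, two consecutive calls may land on different nodes, which is the point of the rotation but also means a multi-step operation is not pinned to one node.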
It seems this method is not working, after importing the Collection from the arango-orm package python is not recognizing this method for a collection class instance.
Allow creating all graphs, collections etc. using a single call. The method accepts a list of objects or strings (strings so that we can use a module's __all__ property) containing either names or objects of type Collection, Relation and Graph. In turn it should create all missing collections and graphs.
Graphs should be created/updated first (ref #19), and then all the remaining collections of the above list should get created if they don't already exist.
The graph module should allow specifying Collection Classes as orphan collections
Currently our schema is defined as a subclass of marshmallow.Schema, separate from the model class itself. This is easy to implement in some ways, but it is often inconvenient.
[...] dump result. However, dump is better suited to front-end or cross-platform exchange, which is not the same as what goes to the DB.
Below is just a code sample from a project using Mezzanine (a Django-based CMS):
class Article(Displayable, Ownable, RichText, AdminThumbMixin):
    category = models.ForeignKey("Category", verbose_name="Category")
    featured_image = FileField(
        verbose_name=_("Featured Image"),
        upload_to=upload_to("news.Article.featured_image", "news"),
        format="Image",
        max_length=255,
        null=True,
        blank=True
    )
    location = models.CharField(max_length=200, null=True, blank=True)
    admin_thumb_field = "featured_image"

    class Meta:
        ordering = ("-publish_date", )

    def get_absolute_url(self):
        return reverse('article-detail', kwargs={'slug': self.slug})
When writing a new query, it is possible to specify the return fields, but that is the limit.
It would be beneficial to have the flexibility to include relations (with the source/target vertex) and the depth of the traversal, when querying collections.
This can be done with a graph, though the required records must first be retrieved and then expanded, which is inefficient.
As a design suggestion, could a new method be added to the Query class that allows for specifying relations to include, and the depth of the traversal?
We've frozen the dependency at marshmallow 2.x, while 3.x is now the stable release. Update to support the latest version.
I see in #17 you changed to the SQLAlchemy "declarative style" where collections and fields are referenced directly in a class derived from Collection.
Is it still possible to operate with the Schema separated from the user class in a style more like SQLAlchemy's "classic" or "mapper" syntax where user classes are independent and not derived from any arango-orm classes but are instead associated with the persistence system using a mapper concept (I would assume in this case, a Marshmallow Schema)?
Although the declarative-in-class style has many benefits in many cases, I believe there are also benefits to keeping your persistence layer out of your domain classes, and the mapper style provides this.
Once #11 is implemented, see if backref can be implemented similar to SQLAlchemy's relationship's backref parameter.
Allow specifying the indices to create for the collection within the collection class definition. Should support all types of indices (like hash, skiplist, fulltext, persistent, geo and vertex-centric indices).
If we set the _key attribute to None in the schema (or don't provide a _key attribute), then ArangoDB will create a key automatically. We should have a way to access it via the object.
As per PEP8 specifications, the __collection__ variable in Collection declarations should be renamed to something else, e.g. __collection (which would invoke name mangling, if that is desired) or _collection.
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles
I'm updating a relation (edge) with db.update(relationModel), setting _from and _to to the ids of two vertices, and that works nicely. After the update I'd like to access relationModel._object_from and relationModel._object_to, but they're None after the update.
I'd expect the update operation to fill in these values, as they are after loading them from the database.
In your examples, all members of collections are simple primitive python types like string or integer or are types for which you have a specific field (such as date-time). How do you nest another object instance, (or an array of instances) as a field within a collection?
For example, in my Customer class I have a list of Address objects, which I want to be stored within my ArangoDB document under the key "address":
{
    "username": "Bobby Joe",
    "address": [
        {
            "street": "123 Anywhere",
            "city": "New York",
            "state": "NY",
            "zipcode": "12345"
        },
        {
            "street": "PO BOX 12544",
            "city": "New York",
            "state": "NY",
            "zipcode": "12346"
        }
    ]
}
I know that in Marshmallow you can nest one schema as a field of another schema using fields.Nested(), so that one schema appears as a data item of another document.
However, if I do this within a collection by creating a marshmallow schema for the nested object and using fields.Nested, arango-orm creates the correct structure when persisting, but when loading the record back in, the nested object becomes a Python dictionary (because there is no link to my domain class for it).
How do I set up my collection to support object instances within it?
I have a graph with three vertices, (User) -> (User Group) -> (App)
I'm trying to get a result by expanding (or AQL) so that the start node is the (App), and get all the connected vertices in the path, including the User, but I only get the (User Group) vertex information.
The code:
relations = vertex.expand(direction, depth)  # function returns vertex._relations['relations']
result = [
    rel._next.to_dict()  # returns the model attributes as a dictionary
    for rel in relations
]
Is there any way I can get ALL the vertices connected to the (App) vertex, even if they're inbound links?
Allowing extra model properties to be loaded, dumped and also saved into the DB caused me a lot of confusion before I realized that Collection._allow_extra_fields=False is required. I've run into this issue several times.
When a model has extra properties (defined by @property), you cannot prevent them from being saved into the DB with Collection._allow_extra_fields=True.
Why not always disallow extra fields (or at least by default)?
When running an AQL query against a graph, I get KeyError: 'vertices'. The query I'm using:
FOR v, e, p IN 1..2 ANY 'collection/_key' GRAPH 'graph' RETURN v
The error occurs in _objectify_results(results):
for p_dict in results:
    for v_dict in p_dict['vertices']:
Here p_dict is a dictionary for the model I'm searching for. It only contains the model attributes, nothing about vertices or edges.
I think the AQL function should also support returning a list of vertex/edge models instead of always relying on returning the path.
For some reason, it appears that fields aren't being validated when pushed to the database, as per the schema. Am I missing something here?
from arango_orm import Collection
from arango_orm.fields import String, Date
from arango import ArangoClient
from arango_orm import Database
from datetime import date
class Student(Collection):
    __collection__ = 'students'
    # _index = [{'type': 'hash', 'fields': ['name'], 'unique': False}]
    _key = String(required=True, allow_none=False)  # registration number
    name = String(required=True, allow_none=False)
    dob = Date()
client = ArangoClient(hosts='http://localhost:8529')
test_db = client.db('test', username='root', password='root')
db = Database(test_db)
db.drop_collection(Student)
db.create_collection(Student)
s = Student(name=777, dob=date(year=2016, month=9, day=12))
db.add(s)
print(s._id)
Running the code above produces the following DB object:
{
    "_id": "students/<random_key_generated_by_arango>", // `required` argument is broken.
    "_key": "<random_key_generated_by_arango>",
    "name": "777", // `int` implicitly converted to String
    "dob": "2016-09-12"
}
Is this intended?
Thanks in advance :)
Update:
Even the following fails to raise a ValidationError:
s = Student(_key=None, name=None, dob=date(year=2016, month=9, day=12))
db.add(s)
print(s._id)
And creates the following object:
{
    "_id": "students/<random_key_generated_by_arango>",
    "_key": "<random_key_generated_by_arango>",
    "name": null,
    "dob": "2016-09-12"
}
Is it possible to use transactions somehow?
see https://www.reddit.com/r/Python/comments/qa6g7/any_reason_not_to_use_self_class_with_super/
I've made a subclass of arango_orm.Database in my project, so I ran into this kind of bug after fetching the latest version of arango-orm.
marshmallow is mainly for serialization and deserialization, while an ORM/ODM is a bridge between the DB and the app logic:
+-----------------------------+                    +------------------+                  +----------+
|                             |                    |                  |                  |          |
| Front-end / External System +<------------------>+ Python Datatypes +<---------------->+ Database |
|                             |    Marshmallow     |                  |       ORM        |          |
+-----------------------------+  (Data Exchange)   +------------------+                  +----------+
They have different concerns. And I feel marshmallow only shares about 60% similarities with ORM.
[...] dump) for exposing data to the client and for saving to the DB. These are two different scenarios, but they are affected by the same set of field options, which often leads to unexpected bugs for me.
[...] marshmallow.Field make no sense for an ORM, or have different semantics.
marshmallow.Field.dump_only: the docs describe it as "read-only", but arango-orm just uses the dump result for writing to the DB. marshmallow.Field.load_only has a similar issue.
Field.default shouldn't take effect when NULL is explicitly returned from the DB. It should only take effect when saving to the DB without the field.
This performs a correct update, because the whole value of user.address is changed:
user.address = "value"
db.update(user)
But if user.addresses is a dict, and I remove a key and perform an update, the key is not deleted in the DB:
user.addresses.pop("home")
db.update(user)
The same happens if I do this:
user.addresses = {}
db.update(user)
I can work around it like this, but it is not the right way:
user.addresses = None
db.update(user)
user.addresses = {"home": {"street": "my street"}}
db.update(user)
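A likely explanation, as far as I understand ArangoDB's document update, is that an update is a partial patch that merges nested objects by default, so a key missing from the patch is simply left alone. The sketch below models that behaviour in plain Python under that assumption; merge_update and its keep_none flag are illustrative names, not the arango-orm or python-arango API.

```python
def merge_update(stored, patch, keep_none=True):
    """Recursively merge `patch` into a copy of `stored` (assumed update model)."""
    merged = dict(stored)
    for key, value in patch.items():
        if value is None and not keep_none:
            # With keep_none disabled, a None value removes the attribute.
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged, so keys absent from the patch survive.
            merged[key] = merge_update(merged[key], value, keep_none)
        else:
            merged[key] = value
    return merged


stored = {
    "_key": "u1",
    "addresses": {"home": {"street": "my street"}, "work": {"street": "office road"}},
}

# Patch sent after user.addresses.pop("home"): "home" is absent, not deleted.
after_pop = merge_update(stored, {"_key": "u1", "addresses": {"work": {"street": "office road"}}})

# Patch sent after user.addresses = {}: merging an empty object changes nothing.
after_empty = merge_update(stored, {"_key": "u1", "addresses": {}})

# The workaround: None overwrites the whole attribute, so the next write starts clean.
after_none = merge_update(stored, {"_key": "u1", "addresses": None})
after_reset = merge_update(after_none, {"_key": "u1", "addresses": {"work": {"street": "office road"}}})
```

If this model is right, the cleaner fixes are either to replace the whole document or to disable object merging for that update, rather than writing None first.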
Collection._load currently raises an ambiguous RuntimeError when deserialization fails.
I use it in my REST service and want to return 400 BAD REQUEST on a parsing error, so my code looks like this:
class TaskResource(BaseResource):
    schema = Task._Schema()

    def list(self):
        ret = self.db.query(Task).all()
        return self.schema.dump(ret, many=True).data

    def retrieve(self, key):
        ret = self.db.query(Task).by_key(key)
        return self.schema.dump(ret).data

    def create(self):
        try:
            task = Task._load(request.json)
        except RuntimeError:
            return ('', 400)
        res = db.add(task)
        return (res, 201)
However, I don't think it's a good idea to catch RuntimeError.
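One possible shape for this, sketched with hypothetical names only: a dedicated exception that subclasses RuntimeError, so existing callers that catch RuntimeError keep working while new code can catch the narrower type. DeserializationError and load_task are assumptions for illustration; load_task merely stands in for Task._load.

```python
class DeserializationError(RuntimeError):
    """Hypothetical error type; not part of arango-orm."""

    def __init__(self, errors):
        super().__init__("failed to deserialize: {}".format(errors))
        self.errors = errors


def load_task(payload):
    # Stand-in for Task._load: validate, then raise the specific error.
    errors = {}
    if not isinstance(payload.get("title"), str):
        errors["title"] = ["Not a valid string."]
    if errors:
        raise DeserializationError(errors)
    return dict(payload)


def create(payload):
    try:
        task = load_task(payload)
    except DeserializationError as exc:
        # Only parsing problems become a 400; other RuntimeErrors still propagate.
        return (exc.errors, 400)
    return (task, 201)


bad = create({"title": 5})
good = create({"title": "write docs"})
```

Carrying the per-field errors on the exception also lets the REST layer return them in the response body instead of an empty string.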
Given an updated graph class and a database connection (instance of ArangoClient); allow graph creation, deletion (with optional collection deletion) and update.
Update may require adding new vertex collections, new edge collections and adding, deleting or replacing edge definitions.
In commit 6236eff @wonderbeyond added support for the parameter only, but this is not reflected in the Relation._load() function, which in turn now raises a TypeError:
Traceback (most recent call last):
File "/.venv/api-3xq5-S2D/lib/python3.6/site-packages/bottle.py", line 862, in _handle
return route.call(**args)
File "/.venv/api-3xq5-S2D/lib/python3.6/site-packages/bottle.py", line 1740, in wrapper
rv = callback(*a, **ka)
File "/src/api/app/controllers.py", line 385, in list
links = self._service.find_all_by(_filter)
File "/src/api/app/services.py", line 222, in find_all_by
return q.all()
File "/.venv/api-3xq5-S2D/lib/python3.6/site-packages/arango_orm/query.py", line 216, in all
ret.append(self._CollectionClass._load(rec, only=only, db=self._db))
TypeError: _load() got an unexpected keyword argument 'only'
When having a data structure like this:
{
    "first_name": "Valentin",
    "last_name": "Grégoire",
    "hobbies": [
        {
            "name": "Guitar",
            "type": "Music"
        }, {
            "name": "Speedcubing",
            "type": "Brain games"
        }
    ]
}
To map this with arango-orm, it would look something like this:
from arango_orm import Collection
from arango_orm.fields import String, List, Nested
from marshmallow import Schema

class Hobby(Schema):
    name: str = None
    type: str = None

class Person(Collection):
    __collection__ = "persons"
    _key = String(required=True)
    first_name = String(required=True)
    last_name = String(required=True)
    hobbies = List(Nested(Hobby, required=True))
We use the Nested type because my data is known up front (structured data); otherwise I would use Dict, as stated in the docs.
As you can see, we don't have access to Schema directly from the arango-orm library the way we do for the fields. It might be interesting to expose it.
So far, I can only map a list of objects by declaring it as a Dict() and providing it as a Python dictionary, but not as an object.
hobbies = Nested(Hobby, required=True)
I'm using an event listener for my models with the pre_add and pre_update events. The events update the updated_at and created_at attributes of the model.
Now the issue is that when running db.update(entity, only_dirty=True), the update won't catch the modified attribute updated_at as dirty, because the event is dispatched AFTER the dirty attributes are checked:
data = entity._dump()
if only_dirty:
    if not entity._dirty:
        return entity
    data = {k: v for k, v in data.items() if k == '_key' or k in entity._dirty}
dispatch(entity, 'pre_update', db=self)
setattr(entity, '_db', self)
res = collection.update(data, **kwargs)
The event should be dispatched BEFORE the if only_dirty: check, so that the update also includes the attributes modified in the event listener.
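The proposed ordering can be sketched in isolation. Everything here is a simplified stand-in, not arango-orm code: Entity is a toy model that tracks dirty fields, touch_updated_at plays the role of a pre_update listener, and build_update_payload is a hypothetical helper that mirrors only the relevant part of the update logic.

```python
from datetime import datetime, timezone


class Entity:
    """Toy model that records which fields were assigned after construction."""

    _fields = ("_key", "name", "updated_at")

    def __init__(self, **values):
        object.__setattr__(self, "_dirty", set())
        for key, value in values.items():
            # Initial values bypass dirty tracking.
            object.__setattr__(self, key, value)

    def __setattr__(self, name, value):
        if name in self._fields:
            self._dirty.add(name)
        object.__setattr__(self, name, value)

    def _dump(self):
        return {field: getattr(self, field, None) for field in self._fields}


def touch_updated_at(entity):
    # Plays the role of a pre_update listener.
    entity.updated_at = datetime.now(timezone.utc).isoformat()


def build_update_payload(entity, listeners, only_dirty=True):
    # Listeners run first, so their changes are dumped and counted as dirty.
    for listener in listeners:
        listener(entity)
    data = entity._dump()
    if only_dirty:
        if not entity._dirty:
            return None
        data = {k: v for k, v in data.items() if k == "_key" or k in entity._dirty}
    return data


user = Entity(_key="u1", name="Ann", updated_at=None)
user.name = "Anna"
payload = build_update_payload(user, [touch_updated_at])
```

One consequence worth deciding on: with this ordering a listener that always touches updated_at makes every entity dirty, so the early return for unchanged entities no longer triggers.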
The Graph.expand method accepts the parameter only for traversing just some nodes instead of all of them. This is currently implemented using FILTER (it was implemented before PRUNE was available). The new PRUNE keyword in AQL is more suitable for such traversal scenarios, so use PRUNE for this.
Is the version marshmallow~=2.10.0 mandatory?
I'm trying to install webargs alongside arango-orm, but it requires marshmallow>=2.15.0, which leads to unresolvable dependencies.
Is it possible to change the marshmallow version dependency?
Similar to #11, allow connecting to collections in a graph joined by edge collections. The resulting referenced object should contain fields from both the relation/edge object and the referenced collection (using _next).
Hello.
Could you tell me, please: I have a Teacher collection and a Student collection, and there is an EDGE from one teacher to many students. How do I get all the students for one teacher?
Thank you.
Need to pin the version of marshmallow to < 3.0 due to backwards-incompatible changes in how validation and load errors are handled.
From marshmallow's changelog:
3.0.0b7 (2018-02-03)
Features:
Backwards-incompatible: Schemas are always strict (#377). The strict parameter is removed.
Backwards-incompatible: Schema().load and Schema().dump return data instead of a (data, errors) duple (#598).
Backwards-incompatible: Schema().load(None) raises a ValidationError (#511).
I have code along the lines of
# testdb is an arango-orm Database instance
# MyRelation inherits from Relation
items = testdb.query(MyRelation).all()
This used to work with arango-orm 0.2.2, but does not anymore with 0.2.3.
The reason seems to be that the signature of Collection._load() changed recently (with 60a5ee8), but the signature of Relation._load() did not. The former now takes an additional db parameter. Because Query#all() calls _load with its db argument, it doesn't work for relations anymore.
Is that intended? If yes, how can you call .all() for Relations?
I checked the source, and the "exclude" argument is not implemented in the _dump method, the equivalent of marshmallow's dump().
Excluding just a few sensitive fields is a common case.
https://github.com/threatify/arango-orm/blob/master/arango_orm/collections.py#L31
@classmethod
def schema(cls):
    fields_dict = {}
    for attr_name in dir(cls):
        attr_obj = getattr(cls, attr_name)
        if not callable(attr_obj) and isinstance(attr_obj, fields.Field):
            # add to schema fields
            fields_dict[attr_name] = attr_obj
Note getattr(cls, attr_name). If it meets a descriptor such as:
class lazy_property(object):
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, cls):
        value = self.fget(instance)
        setattr(instance, self.fget.__name__, value)
        return value

# -------------------------------------------------------------------
class Task(Collection):
    _key = String()
    # ...

    @lazy_property
    def joiner(self):
        pass
The joiner descriptor attribute will be invoked directly on the Task class, with self being None.
Descriptors are only expected to be accessed as instance attributes: https://docs.python.org/2/howto/descriptor.html
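One way to avoid triggering descriptors while scanning a class is the standard library's inspect.getattr_static, which looks attributes up without invoking the descriptor protocol. The sketch below is a stand-alone illustration, not a patch to arango-orm: Field stands in for marshmallow's fields.Field, collect_fields is a hypothetical helper, and the calls list exists only to show whether the property body ran.

```python
import inspect


class lazy_property:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, cls):
        value = self.fget(instance)
        setattr(instance, self.fget.__name__, value)
        return value


class Field:
    """Stand-in for marshmallow's fields.Field."""


calls = []  # records every invocation of the lazy property body


class Task:
    _key = Field()

    @lazy_property
    def joiner(self):
        calls.append(self)
        return "joined"


def collect_fields(cls):
    """Collect Field attributes without running any descriptor's __get__."""
    found = {}
    for name in dir(cls):
        attr = inspect.getattr_static(cls, name, None)
        if isinstance(attr, Field):
            found[name] = attr
    return found


fields_found = collect_fields(Task)
```

With plain getattr the same loop would call joiner with self set to None; the static lookup returns the lazy_property object itself, which is simply skipped because it is not a Field.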
This is the kind of error I got from my tests:
    def __init__(self, db):
        self._db = db
>       super(Database, self).__init__(db._conn)
E       TypeError: __init__() missing 1 required positional argument: 'executor'
../lib/python3.6/site-packages/arango_orm/database.py:24: TypeError
It would be good to pin python-arango to < 4.x before making any changes.
Below is what I have in mind:
from arango_orm import Collection, event
from .models import Person, Car

def post_save_handler(sender, instance):
    pass

event.listen(Person, 'post_save', post_save_handler)
# Connect one handler to some event from multiple models
event.listen([Person, Car], 'post_save', post_save_handler)
# Connect one handler to some event from all models
event.listen(Collection, 'post_save', post_save_handler)

### Or ###
@event.listens_for(Person, 'post_save')
def post_save_handler(*args, **kwargs):
    pass
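To show that the proposed interface is small to implement, here is a minimal registry sketch. It is an assumption-laden illustration rather than arango-orm's event module: the handler signature (sender, instance, **kwargs) and the rule that a handler registered on a base class also fires for subclasses are my choices, and Collection, Person and Car are empty stand-in classes.

```python
from collections import defaultdict

# (model class, event name) -> list of handlers
_listeners = defaultdict(list)


def listen(targets, event_name, handler):
    """Register `handler` for one model class or a list of them."""
    if isinstance(targets, type):
        targets = [targets]
    for target in targets:
        _listeners[(target, event_name)].append(handler)


def listens_for(targets, event_name):
    """Decorator form of listen()."""
    def decorator(handler):
        listen(targets, event_name, handler)
        return handler
    return decorator


def dispatch(instance, event_name, **kwargs):
    # Walking the MRO lets a handler on a base class see all subclasses.
    for cls in type(instance).__mro__:
        for handler in _listeners.get((cls, event_name), []):
            handler(type(instance), instance, **kwargs)


class Collection:
    pass


class Person(Collection):
    pass


class Car(Collection):
    pass


seen = []


@listens_for(Person, "post_save")
def on_person_saved(sender, instance, **kwargs):
    seen.append(("person", sender.__name__))


def on_any_saved(sender, instance, **kwargs):
    seen.append(("any", sender.__name__))


listen(Collection, "post_save", on_any_saved)

dispatch(Person(), "post_save")
dispatch(Car(), "post_save")
```

Registering on Collection covers the "all models" case from the proposal without a separate wildcard mechanism.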
I have a case like this:
for c in collection
filter
c.a == "a" and
c.b == "b" and
c.c == "c" and
c.d == "d" and
(c.expires >= 5 or c.expires == null)
I am fully aware of the _or parameter. However, this results in a query without brackets, so it returns everything matching all my filters OR everything that has null as the value of the expires field:
for c in collection
filter
c.a == "a" and
c.b == "b" and
c.c == "c" and
c.d == "d" and
c.expires >= 5 or
c.expires == null
How can I solve this problem? I can't use filter_by because that function only allows the == operator. Perhaps we should be allowed to pass a list of operators as well?
Thanks!
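Until the query builder supports grouping, one workaround is to assemble the condition string yourself with explicit parentheses and pass it as a single filter or as raw AQL. The helpers below are hypothetical, plain string builders that do not call arango-orm; the bind variable names (@a, @b, @min_expires) are likewise made up for the example.

```python
def any_of(*conditions):
    """Join conditions with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(conditions) + ")"


def all_of(*conditions):
    """Join conditions with AND."""
    return " AND ".join(conditions)


condition = all_of(
    "c.a == @a",
    "c.b == @b",
    any_of("c.expires >= @min_expires", "c.expires == null"),
)
query = "FOR c IN collection FILTER {} RETURN c".format(condition)
bind_vars = {"a": "a", "b": "b", "min_expires": 5}
```

Since AND binds more tightly than OR in AQL, only the OR group needs parentheses, and using bind variables keeps user-supplied values out of the query text.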
At the moment there is no support for batch operations with the Database class.
It is still possible to call the begin_batch_execution method and create a new batch database connection, which is good. The problem arises when calling the add or update functions: there is a check for the _key property, and since the BatchJob is not iterable, a TypeError is raised.
Could begin_batch_execution be implemented on the arango_orm Database so that it returns a Database instance?
Also, could I suggest that either a wrapper for Job be implemented, with a callback function invoked when calling result, or, as a quick fix, that the job result is simply returned?
Happy to implement this as a PR.