agateblue / lifter

A generic query engine, inspired by Django ORM

License: ISC License

Makefile 1.09% Python 98.91%

lifter's Issues

Allow sorting by multiple fields at once

As suggested on stackoverflow:

results = manager.order_by('-field', 'parent__field')
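
For illustration, here is a minimal plain-Python sketch of how such multi-field ordering could behave. The `get_path` and `order_by` helpers are hypothetical and not part of lifter; the sketch assumes rows are nested dictionaries and relies on the stability of Python's sort.

```python
from functools import reduce

def get_path(obj, path):
    """Follow a Django-style 'a__b' path through nested dictionaries."""
    return reduce(lambda current, key: current[key], path.split('__'), obj)

def order_by(values, *fields):
    """Sort by several fields; a leading '-' means descending."""
    result = list(values)
    # Python's sort is stable, so sorting by the least significant
    # field first and the most significant last gives a multi-key order.
    for field in reversed(fields):
        descending = field.startswith('-')
        path = field.lstrip('-')
        result.sort(key=lambda obj: get_path(obj, path), reverse=descending)
    return result

rows = [
    {'field': 1, 'parent': {'field': 'b'}},
    {'field': 2, 'parent': {'field': 'b'}},
    {'field': 2, 'parent': {'field': 'a'}},
]
results = order_by(rows, '-field', 'parent__field')
```

Rows are ordered by `field` descending, and ties are broken by `parent__field` ascending.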

Is it possible to parse the query from a string?

Is it possible to parse the query from a string? I mean, let users write arbitrary queries to be parsed by lifter without allowing them to write arbitrary python?
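
One possible approach, sketched below under assumptions: parse the string with the standard `ast` module and only accept a small whitelist of node types, so nothing is ever passed to `eval`. The `compile_query` function and its tiny grammar (field name, comparison operator, literal, joined by `and`/`or`) are hypothetical, not an existing lifter feature, and the sketch needs Python 3.8+ for `ast.Constant`.

```python
import ast
import operator

# Whitelisted comparison operators; anything else is rejected.
COMPARISONS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}

def compile_query(text):
    """Turn e.g. "age > 30 and is_active == True" into a predicate."""
    tree = ast.parse(text, mode='eval').body

    def build(node):
        if isinstance(node, ast.BoolOp):
            parts = [build(value) for value in node.values]
            combine = all if isinstance(node.op, ast.And) else any
            return lambda obj: combine(part(obj) for part in parts)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            compare = COMPARISONS.get(type(node.ops[0]))
            left, right = node.left, node.comparators[0]
            if (compare and isinstance(left, ast.Name)
                    and isinstance(right, ast.Constant)):
                return lambda obj: compare(obj[left.id], right.value)
        # Calls, attribute access, subscripts, etc. all end up here.
        raise ValueError('unsupported expression: %s' % ast.dump(node))

    return build(tree)
```

The resulting predicate can be applied to dictionaries, for instance `[u for u in users if compile_query("age > 30")(u)]`.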

Implement a caching mechanism

It would be great to implement caching at various levels to minimize I/O operations.

Documentation example broken (AttributeError: 'module' object has no attribute 'models')

Using

import lifter

users = [
    {
        "is_active": True,
        "age": 35,
        "eye_color": "brown",
        "name": "Bernard",
        "gender": "male",
        "email": "[email protected]",
    },
    {
        "is_active": True,
        "age": 34,
        "eye_color": "brown",
        "name": "Manny",
        "gender": "male",
        "email": "[email protected]",
    },
    {
        "is_active": True,
        "age": 35,
        "eye_color": "brown",
        "name": "Fran",
        "gender": "female",
        "email": "[email protected]",
    },
    # And so on ...
]

User = lifter.models.Model('User')
manager = User.load(users)

results = manager.filter(User.age == 26, User.is_active == True)

I get:

Traceback (most recent call last):
  File "C:\htpc\top_sites\testlifter.py", line 169, in <module>
    User = lifter.models.Model('User')
AttributeError: 'module' object has no attribute 'models'

Some complex lookups fail on nested iterables

Considering the following data:

users = [
    {
      "name": "Hazel Robertson",
      "gender": "female",
      "friends": [
        {
          "name": "Mays Conway",
          "tags": [
            {"name": "dolore"},
            {"name": "laborum"},
          ]
        },
      ],
    },
]

The following raises an exception:

manager = lifter.load(users)
manager.filter(friends__name=lifter.startswith('Mays'))

With the following stacktrace:

/path/lifter/lifter/query.py in filter(self, **kwargs)
     97     def filter(self, **kwargs):
     98         _filter = self._build_filter(**kwargs)
---> 99         return self._clone(filter(_filter, self._values))
    100 
    101     def exclude(self, **kwargs):

/path/lifter/lifter/query.py in _clone(self, new_values)
     39 
     40     def _clone(self, new_values):
---> 41         return self.__class__(new_values)
     42 
     43     def __repr__(self):

/path/lifter/lifter/query.py in __init__(self, values)
     19 class QuerySet(object):
     20     def __init__(self, values):
---> 21         self._values = list(values)
     22 
     23     def __iter__(self):

/path/lifter/lifter/query.py in object_filter(obj)
     57                 if hasattr(value, '__call__'):
     58                     # User passed a callable for a custom comparison
---> 59                     if not value(getter(obj)):
     60                         return False
     61                 else:

/path/lifter/lifter/lookups.py in __call__(self, value)
      2 class BaseLookup(object):
      3     def __call__(self, value):
----> 4         return self.lookup(value)
      5 
      6     def lookup(self, value):

/path/lifter/lifter/lookups.py in lookup(self, value)
     33 class startswith(OneValueLookup):
     34     def lookup(self, value):
---> 35         return value.startswith(self.reference_value)
     36 
     37 class istartswith(OneValueLookup):

AttributeError: 'IterableAttr' object has no attribute 'startswith'

@Ogreman since you worked on this, you may have an idea about this. It seems all lookups that access a part of the iterable value (startswith, endswith, contains...) fail, while the ones using the whole value (such as value_in) work as expected.
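
A possible direction, as a sketch only: apply element-wise lookups to each item when the resolved value is a list, and match if any item matches. The `startswith` class below is a stand-in for the lookup shown in the traceback, and `match_any` is a hypothetical helper, not lifter's actual code.

```python
class startswith(object):
    """Stand-in for the lookup shown in the traceback above."""
    def __init__(self, reference_value):
        self.reference_value = reference_value

    def __call__(self, value):
        return value.startswith(self.reference_value)

def match_any(lookup, value):
    """Apply lookup to value, or to each item if value is a list/tuple."""
    if isinstance(value, (list, tuple)):
        return any(lookup(item) for item in value)
    return lookup(value)

# Assumption: this is what friends__name resolves to for the first user.
friend_names = ['Mays Conway']
found = match_any(startswith('Mays'), friend_names)
```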

`__repr__` declared twice in QuerySet

https://github.com/EliotBerriot/lifter/blob/develop/lifter/query.py#L31

Allow querying on fields that do not exist on all models

Currently, if you query on a field that does not exist on all models, lifter will raise an error. Making this behaviour optional would be an improvement. Example:

manager.filter(User.optional_field.exists(), User.optional_field == 'value')

This was first raised on reddit

Make lifter able to query any kind of data (LINQ)

At the moment, lifter is focused on querying against Python iterables.

However, there is an opportunity to make it much more than that. Some people (e.g. on reddit) mentioned Microsoft .NET LINQ as a possible use case for lifter, and I recently stumbled across pynq, an attempt to implement this in Python.

As you can see in the project wiki, there are indeed some similarities between our APIs, but the goals of LINQ (and therefore pynq) are much wider than lifter's at the moment.

Basically, LINQ (Language Integrated Query) is a way to query any kind of provider, using the same API every time. A provider can be a collection/Python iterable, a REST API, a relational database or basically anything that can return results.

Of course, we're not Microsoft, and we're building a package here, not a Python implementation, so we'll probably never reach this kind of clean syntax:

var results =  from u in users
               where u.age < 10 AND u.is_active = True
               select new (u.first_name, u.email);

But imagine if you could query an IMAP mailbox with lifter:

from lifter.backends.imap import ImapBackend, Email

backend = ImapBackend(host='imap.example.com', user='test', password='test')

# Get unread email from INBOX
backend.select(Email).filter(Email.unseen == True, Email.directory == 'Inbox')

Or a REST API:

from lifter.backends.http import RESTBackend, JSONModel

class MyStoreApi(RESTBackend):
    endpoint = 'https://store.example.com/api/'
    content_type = 'json'

class Product(JSONModel):
    path = 'products'

backend = MyStoreApi()
for product in backend.select(Product).all():
    print(product.title, product.description)

Or a SQL Database:

from lifter.backends.sql import SQLiteBackend, SQLModel

class User(SQLModel):
    table_name = 'users'

SQLiteBackend('/path/to/db').select(User).filter(User.is_active == False).count()

And, of course, a Python iterable:

from lifter.backends.python import PythonModel as APIResult

manager = APIResult.load(api_results)
sorted_results = manager.order_by(~APIResult.relevance)

I can imagine an infinite number of use cases for such a set of features. If I sum it up, lifter would grow from an iterable query engine, to a generic query engine.

What does it imply:

All iterable-specific code should be moved to a dedicated backend
Queries should be backend independent, and will be interpreted by the backend
Backend results (SQL query, HTTP JSON response, etc.) would be translated into plain Python objects after the query is executed; that's probably a job for the Model class.
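
To make the "backend independent" point concrete, here is a rough sketch in which the same condition object is interpreted once as a Python predicate and once as a parameterized SQL clause. All class names (`Field`, `Condition`, `PythonBackend`, `SQLBackend`) are hypothetical and only illustrate the split; they are not the planned lifter API.

```python
import operator

class Field(object):
    """A named field; comparing it builds a backend-neutral condition."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Condition(self.name, '=', other)

    def __lt__(self, other):
        return Condition(self.name, '<', other)

class Condition(object):
    """Plain data: the backends decide what to do with it."""
    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

class PythonBackend(object):
    """Interprets a condition as a test over dictionaries."""
    OPERATORS = {'=': operator.eq, '<': operator.lt}

    def __init__(self, rows):
        self.rows = rows

    def filter(self, condition):
        compare = self.OPERATORS[condition.op]
        return [row for row in self.rows
                if compare(row[condition.field], condition.value)]

class SQLBackend(object):
    """Interprets the same condition as a parameterized WHERE clause."""
    def compile(self, table, condition):
        sql = 'SELECT * FROM %s WHERE %s %s ?' % (
            table, condition.field, condition.op)
        return sql, (condition.value,)

age = Field('age')
query = age < 10
```

The query itself carries no knowledge of where the data lives, so `PythonBackend(rows).filter(query)` and `SQLBackend().compile('users', query)` can both consume it.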

That would be a pretty big change, but also an exciting one because it opens so many possibilities.

Making querysets lazy

It would lead to serious performance improvements if querysets were lazily evaluated and all queries were combined into a single one before iterating over the values.

This was first raised on reddit:

the idea is fun but the implementation looks bad. I supposed from a fast review that every time a filter asked, it is applied. It result with lots of copy and lots of iteration.

to avoid that every filters, sorting fields and limit or whatever should be store in the QuerySet and a method all() first() or aggregation method could optimize iterations and copy.
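
A minimal sketch of what the quoted suggestion could look like, assuming predicates are plain callables: the queryset only stores them and walks the data once, when it is actually iterated. `LazyQuerySet` is a hypothetical name, not the current lifter class.

```python
class LazyQuerySet(object):
    """Stores predicates and only touches the data when iterated."""
    def __init__(self, values, predicates=()):
        self._values = values
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # No iteration here: just return a clone with one more predicate.
        return LazyQuerySet(self._values, self._predicates + (predicate,))

    def exclude(self, predicate):
        return self.filter(lambda obj: not predicate(obj))

    def __iter__(self):
        # A single pass over the data, checking every stored predicate.
        for obj in self._values:
            if all(predicate(obj) for predicate in self._predicates):
                yield obj

    def first(self):
        # Stops at the first match instead of building a full list.
        return next(iter(self), None)
```

Chaining `filter(...).exclude(...)` then costs one loop over the values instead of one loop and one copy per call.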

Create a benchmark suite for easy comparison of lifter versus python code performance

Set up real documentation

The README is big enough; it's time to think about writing real documentation (I'm for leaving a few examples in the README, though, just to give a taste of the package).

I've already some experience using Read The Docs, but if you have other suggestions, your input is welcome :)

Add a shortcut to `load`

As suggested on reddit, we could use a shortcut to lifter's entry point (the load method). Using L would probably do the trick.

Pandas compatibility

I'm not really familiar with this library, so I don't know exactly what being pandas compatible would mean, or for which data structures it would be really useful. Does anyone here use pandas and have some use cases involving lifter?

Missing backends module in 0.4

Make nested lookups work on iterables just like django does

Considering the following structure:

users = [
    {
        'name': 'Kurt',
        'tags': [
            {'name': 'nice'},
            {'name': 'friendly'},
        ]
    }
]

It would be nice if nested lookups allowed us to run queries on the iterable objects, instead of the iterable itself, such as:

manager.filter(tags__name='nice')
# >>> return all users having at least a tag named 'nice'

EDIT:

Ideally, lookups on iterables would also be nestable:

companies = [
    {
        'name': 'blackbooks',
        'employees': [
            {
                'name': 'Manny',
                'tags': [
                    {'name': 'nice'},
                    {'name': 'friendly'},
                ]
            }
        ]
    },
    {
        'name': 'community',
        'employees': [
            {
                'name': 'Britta',
                'tags': [
                    {'name': 'activist'},
                ]
            }
        ]
    }
]

manager = lifter.load(companies)
assert list(manager.filter(employees__tags__name='friendly')) == [companies[0]]
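
As a rough illustration of the nestable behaviour, the sketch below resolves a `a__b__c` path by flattening every list it meets on the way, then keeps objects where at least one resolved value matches. `resolve` and `filter_by` are hypothetical helpers written for this example, not lifter functions.

```python
def resolve(obj, path):
    """Return every value reachable via an 'a__b__c' path, descending into lists."""
    current = [obj]
    for key in path.split('__'):
        following = []
        for item in current:
            value = item[key]
            # Lists are flattened so the next key applies to each element.
            if isinstance(value, list):
                following.extend(value)
            else:
                following.append(value)
        current = following
    return current

def filter_by(values, path, expected):
    """Keep objects where at least one resolved value equals expected."""
    return [obj for obj in values if expected in resolve(obj, path)]
```

With the `companies` structure above, `filter_by(companies, 'employees__tags__name', 'friendly')` would keep only the first company.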

Create a soft-failing mode when querying fields that do not exist

As reported on reddit recently, the current behaviour of crashing when querying a field that does not exist on all models is not always handy.

The soft mode could be enabled by default, catching any MissingAttribute error and continuing, and disabled with something such as queryset.permissive(False).
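
A small sketch of the two modes, assuming dictionary rows: in permissive mode a missing field simply means "no match", while strict mode lets the error propagate. The `filter_field` function and the `MISSING` sentinel are hypothetical names used only for this illustration.

```python
MISSING = object()  # sentinel: distinguishes "absent" from a None value

def filter_field(values, field, expected, permissive=True):
    """Keep objects whose field equals expected.

    In permissive mode, objects lacking the field are skipped;
    otherwise the KeyError propagates, as in the strict behaviour.
    """
    matches = []
    for obj in values:
        value = obj.get(field, MISSING) if permissive else obj[field]
        if value is not MISSING and value == expected:
            matches.append(obj)
    return matches
```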

Add a simple django contrib app to allow locally filtering querysets

Assuming the app is configured correctly:

INSTALLED_APPS = [
    'lifter.contrib.django',
]

One should be able to use lifter's python backend to manipulate querysets locally, without calling the database:

# django qs
qs = User.objects.filter(date_joined__year=2016)

# local sorting using lifter
qs.locally().order_by('-date_joined')

Optimize query matching when chaining filters

The current pattern when chaining filters does not merge queries together:

manager.filter(query1).exclude(query2)

When evaluating the queryset, it will loop over the available values, return the ones that match query1, then loop over these remaining values and exclude the ones that match query2.

It would be more efficient to loop only once on all values, and return only values that match all queries.

JSON list as Python generator?

I am collecting information about the possibility of using a generator instead of loading the full JSON into memory when the manager is called:

Possible algorithm:

The manager loads the JSON and creates two generators: one kept as a blueprint, the other consumed by each filtering operation
Filtering, or any other action, consumes the generator
The function returns the resulting filtered output
A new generator is copied from the blueprint to serve the next operation
Optional: create an index (a dictionary mapping JSON value to JSON position in the array) for subsequent function calls, or instead keep a copy of the generator in memory (to avoid rebuilding the generator at each filtering call, see the caveat below).

I couldn't find any memory- or CPU-friendly method in the standard library to clone or deep-copy a generator in memory. The only one is tee(), but it seems to have downsides for our use case:

This answer here underlines the fact that creating the generator twice is CPU-intensive while dumping a copy into a list() can be better if you think of consuming the generator until the end
Consider these three cases
See the snippet here for creating an iterable class

Does it sound like a good idea?
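
For reference, here is a sketch of the two options discussed above: a re-iterable wrapper that rebuilds the generator on each pass, and `itertools.tee`. The `ReIterable` class and the `read_items` placeholder are assumptions made for the example; a real version would stream items from the JSON source.

```python
import itertools

class ReIterable(object):
    """Wraps a generator factory so each iteration starts a fresh generator."""
    def __init__(self, factory):
        self._factory = factory

    def __iter__(self):
        return self._factory()

def read_items():
    # Placeholder for a streaming JSON reader; here it just yields dicts.
    for age in (35, 34, 35):
        yield {'age': age}

items = ReIterable(read_items)
older = [item for item in items if item['age'] > 34]     # first pass
younger = [item for item in items if item['age'] <= 34]  # second, fresh pass

# itertools.tee also gives two iterators, but it buffers every item that
# one copy has consumed and the other has not yet read.
first, second = itertools.tee(read_items())
```

The wrapper trades CPU (the source is re-read on every pass) for memory, whereas `tee` or a plain `list()` trades memory for CPU.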

Any plans to support querystrings / Allow queries to be serialized from other formats

I was thinking about using lifter with a web service. It would be nice to accept something like ?key=title, product, qty&orderby=title and simply pass it to lifter ^^
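
A sketch of how such a querystring could be interpreted with the standard library, assuming `key` lists the fields to keep and `orderby` names a single sort field. `apply_querystring` is a hypothetical helper working on plain dictionaries, not a lifter function.

```python
from urllib.parse import parse_qs

def apply_querystring(rows, querystring):
    """Select and order rows from 'key=a,b&orderby=a' style parameters."""
    params = parse_qs(querystring.lstrip('?'))
    raw_keys = params.get('key', [''])[0]
    keys = [key.strip() for key in raw_keys.split(',') if key.strip()]
    order = params.get('orderby', [None])[0]
    if order:
        rows = sorted(rows, key=lambda row: row[order])
    if keys:
        # Keep only the requested fields of each row.
        rows = [{key: row[key] for key in keys} for row in rows]
    return rows
```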

Use of `__getattr__` in QuerySet

From a Python3 perspective, is it not wrong to use __getattr__ here instead of __getattribute__? Is there any design decision involved?

Allow flat results on aggregates

Right now, aggregate always returns a dictionary:

manager.filter(gender='female').aggregate(lifter.Avg('age'), lifter.Min('age'))
# >>> {'age__avg': 289, 'age__min': 15}

As suggested on reddit, it would be nice to return a flat list:

manager.filter(gender='female').aggregate(lifter.Avg('age'), lifter.Min('age'), flat=True)
# >>> [289, 15]

This would enable unpacking on aggregates:

avg_age, min_age = manager.filter(gender='female')\
                          .aggregate(lifter.Avg('age'), lifter.Min('age'), flat=True)
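
A sketch of how a `flat` option might work, assuming each aggregate knows its field, a short name and a reducing function. The `Aggregate`, `Avg`, `Min` and `aggregate` definitions below are simplified stand-ins written for this example, not lifter's implementation.

```python
class Aggregate(object):
    """Hypothetical aggregate: a field name plus a reducing function."""
    def __init__(self, field, name, function):
        self.field, self.name, self.function = field, name, function

    def compute(self, values):
        return self.function([obj[self.field] for obj in values])

def Avg(field):
    return Aggregate(field, 'avg', lambda numbers: sum(numbers) / len(numbers))

def Min(field):
    return Aggregate(field, 'min', min)

def aggregate(values, *aggregates, flat=False):
    """Return a dict keyed 'field__name', or a list when flat=True."""
    results = [(agg.field + '__' + agg.name, agg.compute(values))
               for agg in aggregates]
    if flat:
        # Same order as the aggregates passed in, so unpacking is safe.
        return [value for _, value in results]
    return dict(results)
```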

Store lookups (gte, gt, eq, etc.) in a dedicated registry

Since we want operators to be customizable / extensible, a registry seems the way to go.
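
As a starting point for discussion, a registry could be as simple as a name-to-callable mapping with a decorator for registration. `LookupRegistry` and the lookup names below are illustrative assumptions, not an existing lifter module.

```python
import operator

class LookupRegistry(object):
    """Maps lookup names to comparison callables."""
    def __init__(self):
        self._lookups = {}

    def register(self, name):
        def decorator(function):
            self._lookups[name] = function
            return function
        return decorator

    def get(self, name):
        try:
            return self._lookups[name]
        except KeyError:
            raise LookupError('unknown lookup: %r' % name)

registry = LookupRegistry()
registry.register('eq')(operator.eq)
registry.register('gt')(operator.gt)
registry.register('gte')(operator.ge)

# Third-party code could add its own lookups the same way.
@registry.register('startswith')
def startswith(value, reference):
    return value.startswith(reference)
```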

How to query value in list?

From reading the code, I think lifter supports value_in queries on lists, but I cannot find any example or tip on how to use this in my filter.
It would also be nice to have startswith and the other lookups documented.

My data is a simple list of dictionaries, and sent_folders is a list.

Doesn't work:
sent_messages = objects.filter(folder__in = sent_folders)

Doesn't work:
sent_messages = objects.filter(Message.folder in sent_folders)

The Combining Nodes section is not helpful either.

qn = (Message.date > this_year) & (Message.folder == 'INBOX.Sent') | (Message.folder == 'Sent') | (Message.folder == 'Enviados') | (Message.folder == 'INBOX.Enviados')
sent_query = lifter.query.Query(action='select', filters=qn)
sent_messages = objects.filter(sent_query)

Please help! :)
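
One general Python point may explain the second attempt: the `in` operator always reduces its result to a plain `True`/`False`, so an expression like `Message.folder in sent_folders` cannot return a query object, whatever the library does. The sketch below shows this with hypothetical `Field`, `Node` and `AnyOf` classes (not lifter's own), and builds the OR chain programmatically instead.

```python
import functools
import operator

class Node(object):
    """Hypothetical query node produced by comparing a field."""
    def __init__(self, field, value):
        self.field, self.value = field, value

    def __or__(self, other):
        return AnyOf([self, other])

    def matches(self, obj):
        return obj[self.field] == self.value

class AnyOf(object):
    """Matches when at least one of its nodes matches."""
    def __init__(self, nodes):
        self.nodes = nodes

    def __or__(self, other):
        return AnyOf(self.nodes + [other])

    def matches(self, obj):
        return any(node.matches(obj) for node in self.nodes)

class Field(object):
    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Node(self.name, value)

folder = Field('folder')
sent_folders = ['INBOX.Sent', 'Sent', 'Enviados', 'INBOX.Enviados']

# `in` compares with == and converts the outcome to a bool, so the
# nodes are lost: this is just True, not a query.
collapsed = folder in sent_folders

# Building the OR chain programmatically keeps the nodes.
query = functools.reduce(operator.or_, [folder == name for name in sent_folders])
```

Separately, note that in Python `&` binds more tightly than `|`, so in the last attempt above the date condition only applies to the first folder unless the folder alternatives are wrapped in their own parentheses.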

Less complexity for the Store -> RefinedStore -> Manager stack?

Currently, one should implement at least two classes to get a working backend:

The Store class, that holds general data about a backend (think database in SQL world)
The RefinedStore class that deals with querying about a specific model / dataset (think table in the SQL world)

However, this is far from ideal, and I'm not really fond of this RefinedStore class. It feels like it could be removed from the whole thing, and everything could be done at the store level.

TinyDB query API / Should we change the query API?

Hi, first of all, please excuse my poor English.

I was looking at the TinyDB query API and I think it looks pretty nice, so I am trying to reproduce it on top of lifter. You can find the WIP here. It is Python 3.5 only for now, but it would not be hard to port to older versions. It is also (almost) lazy. I understand it is very different from lifter's queries and not perfect in some cases (brackets in complex queries, the q/p/a objects), but it would be great to know your thoughts about this. Or maybe you can find some inspiration in it.

In my mind it looks something like this:

usage

import lifter
from lifter.tiny import TinyQuerySet, q, p, a, Order

from tests.fake_data import fake


manager = lifter.load(fake, queryset_class=TinyQuerySet)

Please note that q/p/a are important entities that are used in different methods:

q = Query()  # filter, exclude, get
p = Path()  # order_by, values, values_list
a = Aggregation()  # aggregate

filter

# a Query object should be used
manager.filter(q.age < 30)
manager.filter((q.name == 'Manny') & (q.has_beard == True))
manager.filter((q.name == 'Manny') | (q.name == 'Fran'))
manager.filter(q.name == 'Manny').filter(q.has_beard == True)
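
To give an idea of how a `q` object like this can be built, here is a minimal sketch using attribute access and operator overloading. It is written from scratch for illustration, assuming rows are nested dictionaries, and is not the WIP implementation linked above; `Predicate` is a hypothetical helper class.

```python
import operator

class Query(object):
    """Builds predicates from attribute access and comparisons."""
    def __init__(self, path=()):
        self._path = path

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        # q.company.name -> Query(('company', 'name'))
        return Query(self._path + (name,))

    def _resolve(self, obj):
        for key in self._path:
            obj = obj[key]
        return obj

    def _compare(self, compare, other):
        return Predicate(lambda obj: compare(self._resolve(obj), other))

    def __eq__(self, other):
        return self._compare(operator.eq, other)

    def __lt__(self, other):
        return self._compare(operator.lt, other)

    def __gt__(self, other):
        return self._compare(operator.gt, other)

class Predicate(object):
    """A callable test that can be combined with & and |."""
    def __init__(self, test):
        self.test = test

    def __call__(self, obj):
        return self.test(obj)

    def __and__(self, other):
        return Predicate(lambda obj: self(obj) and other(obj))

    def __or__(self, other):
        return Predicate(lambda obj: self(obj) or other(obj))

q = Query()
```

The parentheses around each comparison are required because `&` and `|` bind more tightly than `==` and `<` in Python.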

get

# will return None if the value does not exist
# or the first object if multiple objects are returned
manager.get((q.name == 'Fran') & (q.gender == 'female'))
manager.filter(q.has_beard == False).get(q.gender == 'male')

exclude

manager.exclude((q.gender == 'male') & (q.age > 21))

order_by

manager.order_by(p.age, Order.DESC)

count

manager.filter(q.gender == 'female').count()
manager.filter(q.gender == 'male').count()

exists

# I have some doubts about this method

first

manager.filter(q.company.name != 'blackbooks').first()

last

manager.filter((q.age > 30) & (q.age < 50)).last()

values

# will return a list of dictionaries as follows:
# [
#     {'name': 'Bernard', 'email': '[email protected]', 'company.name': 'house of congress'},
#     {'name': 'Manny', 'email': '[email protected]', 'company.name': 'house of congress'},
# ]
manager.values(p.name, p.email, p.company.name)

values_list

# will return a list of tuples as follows:
# [
#     ('Bernard', '[email protected]'),
#     ('Manny', '[email protected]'),
# ]
manager.values_list(p.name, p.email)

# will return a list as follow:
# ['Bernard', 'Manny']
manager.all().values_list(p.name, flat=True)

distinct

# will return ['blue', 'brown', 'green', 'purple']
manager.order_by(p.eye_color).values_list(p.eye_color, flat=True).distinct()

spanning lookups

# will filter users with a company whose name is "blackbooks"
manager.filter(q.company.name == 'blackbooks')

# return a list of all companies names, without duplicates
manager.values_list(p.company.name, flat=True).distinct()

complex lookups

# return all users older than 37
manager.filter(q.age > 37)

# exclude all users under 43
manager.exclude(q.age < 43)

# return all users between 21 and 27 years old
manager.filter(q.age.test(lambda age: 21 <= age <= 27))

# return users with brown or green eyes
manager.filter(q.eye_color.test(lambda c: c in ['brown', 'green']))

# leave only users whose age is odd
manager.exclude(q.age.test(lambda v: v % 2 == 0))

aggregations

# return the total number of children of all users combined, like this:
# {'number_of_children': 215}
manager.aggregate(a.number_of_children(sum))

# {'avg_age': 44.26229508196721, 'children': 215}
from statistics import mean
manager.aggregate(avg_age=a.age(mean), children=a.number_of_children(sum))

# [215]
manager.aggregate(children=a.number_of_children(sum), flat=True)

So what do you think?