
flywheel's Introduction

Hello

On Github, I am primarily known for authoring and maintaining several Neovim plugins.

These days, between work and family I have very little free time available. I can no longer get to every reported issue, so I have implemented a priority queue:

  1. Pull requests - these are the highest priority for me and will be looked at first.
  2. After PRs, I will triage new issues with P0, P1, and P2 labels.
  3. P0 issues will be worked on as soon as possible.
  4. P1 issues may get worked on when I get spare time and feel like it, which is uncommon these days.
  5. P2 issues will not be worked on. Pull requests are the only way to move these forward.

flywheel's People

Contributors

dainedanielson, garyd203, geonu, hoov, josegonzalez, ls-aron-kyle, rsflux, skastel, stevearc, timgates42



flywheel's Issues

Add Python 3 support

I'll have to rip out boto and replace it with botocore, but I didn't have anything better to do with my time anyway.

Field.ddb_dump assumes None -> None translation for all data_types

There probably aren't many types that would care, but I came across this code while looking into attribute_names.

Any reason why None always dumps to None, instead of being delegated to the data_type's ddb_dump call? I'd expect most custom types to handle None anyway, and the current behavior leaves no room for types that care.

Unless there's a strong case for not supporting custom load/dump for None, this condition should probably be removed.

import uuid
import dynamo3
from flywheel import Field, Model, Engine
from flywheel.fields.types import TypeDefinition, STRING

class NonNullType(TypeDefinition):
    data_type = int
    ddb_data_type = STRING

    def coerce(self, value, force):
        return value

    def ddb_dump(self, value):
        if value is None:
            return ''
        return str(value)

    def ddb_load(self, value):
        if value == '':
            return None
        return int(value)

class MyModel(Model):
    __metadata__ = {
        '_name': str(uuid.uuid4())
    }
    pkey = Field(data_type=NonNullType(), hash_key=True)

dynamo = dynamo3.DynamoDBConnection.connect(...)
engine = Engine(dynamo=dynamo)
engine.register(MyModel)
engine.create_schema()

pk = None

from_typedef = NonNullType().ddb_dump(pk)
print(type(from_typedef))

m = MyModel(pkey=pk)
from_model = m.ddb_dump_field_('pkey')
print(type(from_model))

engine.sync(m)

And the output:

Traceback (most recent call last):
  File "crossj.py", line 77, in <module>
    engine.sync(m)
  File "/home/crossj/ws/flywheel/flywheel/engine.py", line 615, in sync
    self.save(item, overwrite=False)
  File "/home/crossj/ws/flywheel/flywheel/engine.py", line 465, in save
    **expected)
  File "build/bdist.linux-x86_64/egg/dynamo3/connection.py", line 382, in put_item
  File "build/bdist.linux-x86_64/egg/dynamo3/connection.py", line 207, in call
  File "build/bdist.linux-x86_64/egg/dynamo3/exception.py", line 46, in raise_if_error
dynamo3.exception.DynamoDBError: ValidationException: One or more parameter values were \
  invalid: Missing the key pkey in the item
Args: {'expected': {'pkey': {'ComparisonOperator': 'NULL'}},
 'item': {},
 'return_consumed_capacity': 'NONE',
 'return_values': 'NONE',
 'table_name': '91f18196-2d5c-41c3-ad1e-15d8f007e6a8'}

Bug: .one() raises "Expected one result!" but all() returns just one entry

Hello,
I have observed that our code raises an "Expected one result!" exception.

engine.query(instance).filter(fqdn='some', deleted_at=None).one()

The problem is that when i call this code

engine.query(instance).filter(fqdn='some', deleted_at=None).all()

It returns an array with just one result (as expected).

Some observations:

  • Adding a limit(2) condition prevents any results from being returned.
  • The request runs against a global secondary index. There is more than one item with fqdn 'some', but only one where deleted_at is set to None.

Adding validators to fields

More of an enhancement: it would be great if there were a way to define not just the type of the field, but also a few additional rules.

Example (found on the SQLAlchemy documentation):

username = Column(String(18))

Can save return the item's new values?

By default, I believe DynamoDB returns the updated item for free.
Is it possible to get the save method on a Model to return the saved item's values?

Maybe it already does, I haven't dug through the code too much yet.

Model.__setattr__ unconditionally marks Fields as dirty

The flywheel/models.py code for Model calls self.mark_dirty_() for every field that the user assigns to. However, it's possible the values before/after are identical, wasting a round trip to DynamoDB and write capacity.

Would you consider checking the value and if the assignment would have no effect, skipping it? I'm ok if the check is only shallow and doesn't cover nested structures (dict within dict etc.), although that might be nice too. Thanks!
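A shallow value check of this kind can be sketched independently of flywheel; the class and method names below are hypothetical stand-ins, not flywheel's actual internals:

```python
class TrackedModel:
    """Minimal sketch of a model that skips no-op assignments.

    Hypothetical stand-in for flywheel's Model; mark_dirty_ here just
    records the field name in a set.
    """

    def __init__(self):
        # Bypass __setattr__ so _dirty exists before any tracking happens
        object.__setattr__(self, '_dirty', set())

    def mark_dirty_(self, name):
        self._dirty.add(name)

    def __setattr__(self, name, value):
        # Shallow check: only mark dirty when the value actually changes
        if getattr(self, name, object()) != value:
            self.mark_dirty_(name)
        object.__setattr__(self, name, value)
```

A sync could then write only the fields in `_dirty` and clear the set afterwards; as noted above, this check is shallow and would not catch in-place mutation of nested structures.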

version 0.5 breaks ddb_dump_()

I've observed a breaking bug in version 0.5 of flywheel. It took me a while to figure out the issue.

........
    else:
        print('Setting {}'.format(arg))
        setattr(user, arg, args[arg])

print(user.secret_question)
print(user.ddb_dump_())
user.update()

The result: I would see the secret question I just set on my user object, but in the output of user.ddb_dump_() all of my newly set attrs were gone (and it appeared some values for attrs defined statically in the model were missing). Thus, none of my newly added attributes were being saved, and my project and tests were breaking when trying to update a model.

There is something wrong with the user.ddb_dump_() method. I believe that method is called for every model object being saved to the database?

For updating I use the engine.save(self, overwrite=True)

I had to downgrade to 0.4.11 and everything started saving and working again.

Query all giving an error

Here is the code I execute:

def c_composite(analyzer_name, data_type, data_idx, creation_time):
    """
    Custom composite handler for range key of AppDataModel
    """
    return '%s:%s:%s:%s' % (analyzer_name, data_type, str(data_idx), str(creation_time))

class AppDataModel(Model):
    __metadata__ = {
        'global_indexes': [
            GlobalIndex.all('gcc-index', 'country_code'),
            GlobalIndex.all('gat-index', 'app_type')
        ]
    }

    country_code = Field(data_type=STRING)
    app_type = Field(data_type=STRING)
    app_id = Field(data_type=STRING)
    app_version = Field(data_type=STRING)
    composite_appid = Composite('country_code', 'app_type', 'app_id', 'app_version', hash_key=True)
    analyzer_name = Field(data_type=STRING, index='lan-index')
    data_type = Field(data_type=STRING, index='ldt-index')
    data_index = Field(data_type=NUMBER)
    creation_datetime = Field(data_type=datetime, index='lc-index')
    r_key = Composite('analyzer_name', 'data_type', 'data_index', 'creation_datetime', merge=c_composite, range_key=True)

    def __init__(....)

Later on I do this:

dyn = DynamoDBConnection( **args )
eng = Engine( dynamo=dyn, namespace=['appdata'] )

test = AppDataModel( country_code='united states',app_type='android',app_id='123', app_version='1.0')
test.eng = eng

i = test.eng.query(AppDataModel).all()
for r in i:
    print r

I get this traceback:

Traceback (most recent call last):
  ...
  File ".../flywheel/query.py", line 79, in all
  File ".../flywheel/query.py", line 48, in gen
  File ".../flywheel/fields/conditions.py", line 54
ValueError: Bad query arguments. You must provide a hash key and may optionally constrain on exactly one range key

In fact, when I check composite_appid it is set to u'USA:android:123:1.0', which is the composited hash key. Am I missing something?

Unable to use boolean

I'm trying to store a True/False in DynamoDB. I'm getting this error:

TypeError: Field 'auto_retry_job' must be <type 'unicode'>! 'True'

I have only configured the auto_retry_job as a Field() without any special type. Should this work?

Stream support

It would be great if flywheel supported the creation of DynamoDB streams.

Has any work been done (or planned) on streams support? It's something I need, and may have time to take a look at integrating directly into flywheel, if you think it's useful. (Until then, I can dive in and do it imperatively through boto.)

Problem with Composite ...

Here is my schema...

__metadata__ = {
   'global_indexes': [
        GlobalIndex.all( 'gcc-index','_country_code' ),
        GlobalIndex.all( 'gat-index','_app_type' )
    ]
}

_country_code = Field( data_type=STRING )
_app_type = Field( data_type=STRING )
_app_id = Field( data_type=STRING )
_app_version = Field( data_type=STRING )
_full_appid = Composite( '_country_code', '_app_type', '_app_id', '_app_version', data_type=STRING,  hash_key=True )
_analyzer_name = Field( data_type=STRING, range_key=True )
_data_type = Field( data_type=STRING, index='ldt-index' )
_creation_datetime = Field( data_type=datetime, index='lc-index' )

def __init__( self, country_code=None, app_type=None, app_id=None, app_version=None, analyzer_name=None, data_type=None, creation_datetime=None ):
    self._country_code = country_code
    self._app_type = app_type.lower() if app_type else None
    self._app_id = app_id
    self._app_version = app_version
    self._analyzer_name = analyzer_name
    self._data_type = data_type
    self._creation_datetime = creation_datetime or datetime.utcnow()

When I initialize everything and do an 'eng.save()' I get the following error returned:

boto.dynamodb.exceptions.DynamoDBNumberError: BotoClientError: TypeError numeric for <flywheel.fields.Composite object at 0x9eece2c>
Cannot convert <flywheel.fields.Composite object at 0x9eece2c> to Decimal

This must be wrong.

Use of 'datetime' in composite field

I've got the following in a schema:

analyzer_name = Field( data_type=STRING, index='lan-index' )
data_type = Field( data_type=STRING, index='ldt-index' )
creation_datetime = Field( data_type=datetime, index='lc-index' )
r_key = Composite( 'analyzer_name', 'data_type', 'creation_datetime', range_key=True)

When I try to use it I get the following returned to me:
  File "marsdata.py", line 157, in call_control
    self.eng.save(self.instance)
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/engine.py", line 401, in save
    batch.put_item(data=item.ddb_dump_())
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/models.py", line 424, in ddb_dump_
    data[name] = self.ddb_dump_field_(name)
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/models.py", line 414, in ddb_dump_field_
    val = getattr(self, name)
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/models.py", line 234, in __getattribute__
    return field.resolve(self)
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/fields/__init__.py", line 445, in resolve
    return self.coerce(self.merge(*args))
  File "/home/dev/Projects/MARS2/PYTHON/lib/python2.7/site-packages/flywheel/fields/__init__.py", line 424, in <lambda>
    self.merge = lambda *args: ':'.join(args)
TypeError: sequence item 2: expected string or Unicode, datetime.datetime found

It is clear that Composite is trying to "push" strings together to create the range key and doesn't know what to do with a non-string type (here, a datetime). Does this require me to create a custom composite handler, or is this a bug?
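A custom merge handler that stringifies each component before joining would sidestep the default `':'.join`; a minimal sketch (assuming, as the traceback suggests, that the raw field values are passed to merge):

```python
from datetime import datetime

def merge_keys(*args):
    # Coerce every component (datetime, numbers, ...) to str before
    # joining, since ':'.join rejects non-string items
    return ':'.join(str(a) for a in args)
```

It could then be wired in as Composite('analyzer_name', 'data_type', 'creation_datetime', merge=merge_keys, range_key=True).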

Causes conflict with properties

When using new style class properties, inheriting Model causes a conflict.

I assume this is because all non "_" prefixed class attributes are expected to be Fields.

For example:

from flywheel import Model, Field
class Breakfast(Model):
  meal_id = Field(hash_key=True, data_type=int)
  @property
  def eggs(self):
    return self._eggs

  @eggs.setter
  def eggs(self, value):
    self._eggs = value

  @eggs.deleter
  def eggs(self):
    del self._eggs

  def __init__(self):
    self._eggs = None
    if not self.eggs:
      self.eggs = 'scrambled'


if __name__ == '__main__':
  t = Breakfast()
  print(t.eggs)

This prints None

vs

class Breakfast():
  @property
  def eggs(self):
    return self._eggs

  @eggs.setter
  def eggs(self, value):
    self._eggs = value

  @eggs.deleter
  def eggs(self):
    del self._eggs

  def __init__(self):
    self._eggs = None
    if not self.eggs:
      self.eggs = 'scrambled'


if __name__ == '__main__':
  t = Breakfast()
  print(t.eggs)

this properly prints "scrambled".

I think I can overcome what I intended to do with properties by using custom data types. I just wanted to bring this up.

Query not returning result when using first()

Version: flywheel==0.4.4

I have encountered an empty-result issue when using the "first" function on a flywheel query object, but got a result when using the "all" function (with the same query).

DB schema:


class User(Model):
    __metadata__ = {
        '_name': 'test_table',
        'global_indexes': [
            GlobalIndex('name-age-index', 'name', 'age_min').throughput(5, 3),
        ]
    }
    name = Field(hash_key=True)
    age_min = Field(data_type=int)      # OPTIONAL
    age_max = Field(data_type=int)      # OPTIONAL

Query:

1. engine.query(User).filter(User.name == 'Erik', User.age_min <= age, User.age_max >= 18).first()
result: None

2.  engine.query(User).filter(User.name == 'Erik', User.age_min <= age, User.age_max >= 18).all()
result: [<object>]

Engine-wide table name prefix

Observation/Situation

When working with a multi-tenant AWS account, it's a reasonable practice to prefix table names with some sort of token to indicate what tenant/application a table belongs to. This same design pattern is also relevant for having tables for different environments, such as dev/qa/prod.

Hacky Failed Attempt

I attempted to handle this with a sort of configuration parameter for a table prefix, which I could then update the class models' __metadata__ attributes with, but it did not work:

APP_NAME = 'abc'
APP_ENV = 'dev'
TABLE_NAME_PREFIX = '-'.join([
    APP_NAME,
    APP_ENV,
])

# Register our models with the engine
for this_model in [
    OAuthUser,
    User,
]:
    # Insert the table name prefix
    if '_name' in this_model.__metadata__:
        print("Updating {}.__metadata__".format(this_model.__name__))
        print("  (old)_name='{}'".format(this_model.__metadata__['_name']))

        this_model.__metadata__.update({
            '_name': TABLE_NAME_PREFIX + this_model.__metadata__['_name'],
        })

        print("  (new)_name='{}'".format(this_model.__metadata__['_name']))

    engine.register(this_model)

engine.create_schema()

>>> Updating OAuthUser.__metadata__
>>>   (old)_name='oauth_user'
>>>   (new)_name='abc-dev_oauth_user'
>>> Updating User.__metadata__
>>>   (old)_name='user'
>>>   (new)_name='abc-dev_user'

This resulted in the creation of tables that were still only named oauth_user and user.

Potential Resolutions

  • Add ability to the engine to set an engine-wide table prefix string

  • Add ability to the engine to set an engine-wide table name function

    def table_namer(original_name):
        return "my-prefix_" + original_name
    
    engine.set_name_function(table_namer)

    This implementation may also resolve @stevearc's enhancement request in
    #4, as the table_namer function could independently handle naming the
    table with a timestamp:

    import datetime
    
    def table_namer(original_name):
        if original_name == 'timeseries_table':
            now = datetime.datetime.now()
            return ''.join([
                original_name,
                str(now.year),
                str(now.month),
            ])
        else:
            return original_name

No results returned when querying a date field

Given a record with the following shape:

{'allocated_datetime': 1506648875.071361,
  ... snip ...
 'status': 'ALLOCATED'}

the following works with aws cli:

$ aws dynamodb scan --filter-expression='allocated_datetime BETWEEN :ago AND :now AND #status = :status' --expression-attribute-values='{":ago":{"N":"1506646742.820100"},":now":{"N":"1506646742.820110"},":status":{"S":"ALLOCATED"}}' --expression-attribute-names='{"#status":"status"}' --table-name='my-table' --select=COUNT --endpoint=http://localhost:8000
{
    "Count": 1,
    "ScannedCount": 92086,
    "ConsumedCapacity": null
}

but this does not work in a python script:

results = client.scan(
            TableName='my-table',
            Select='COUNT',
            ConsistentRead=True,
            FilterExpression='allocated_datetime BETWEEN :ago AND :now AND #status = :status',
            ExpressionAttributeNames={
                '#status': 'status'
            },
            ExpressionAttributeValues={
                ':ago': {
                    'N': '1506646742.820100'
                },
                ':now': {
                    'N': '1506646742.820110'
                },
                ':status': {
                    'S': 'ALLOCATED'
                }
            }
        )

pprint(results)

The results are:

{'Count': 0,
 'LastEvaluatedKey': {'hash_key': {'S': '8926a53fe484dd413ede6b45fb82159f76db60c49a1280cd96ccfe52'},
                      'range_key': {'S': '000080000111'}},
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '161',
                                      'content-type': 'application/x-amz-json-1.0',
                                      'server': 'Jetty(8.1.12.v20130726)',
                                      'x-amz-crc32': '2370556166',
                                      'x-amzn-requestid': '09cb3dea-bd2f-4df1-a006-0828674a5f1e'},
                      'HTTPStatusCode': 200,
                      'RequestId': '09cb3dea-bd2f-4df1-a006-0828674a5f1e',
                      'RetryAttempts': 0},
 'ScannedCount': 1394}

Not sure why the cli scan would return the expected results, but the scan through flywheel does not.

engine.create_schema() fails to create tables

When there are more than 100 tables in the user's account and a table for a model already exists, engine.create_schema() will fail with:
ResourceInUseException: Table already exists: table-name

This happens because in engine.py line 204 the call to:

        tablenames = set(self.dynamo.list_tables())

will return only the first 100 tables, and in model_meta.py line 440:

        if tablename in tablenames:
            return None

will miss the fact that the table already exists.

Proposed solution:
For each table to create deliberately validate that it doesn't exist. Like (proto-code):

        for model in self._ddb_engine.models.itervalues():
            if not self._ddb_engine.dynamo.describe_table(model.meta_.ddb_tablename()):
                model.meta_.create_dynamo_schema(self._ddb_engine.dynamo, tablenames=[], wait=True)
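A runnable version of that idea with a stubbed connection (this sketch assumes, as the proto-code above does, that describe_table returns None for a missing table):

```python
class FakeDynamo:
    """Stub connection: describe_table returns metadata, or None if absent."""

    def __init__(self, existing):
        self._tables = set(existing)

    def describe_table(self, name):
        return {'TableName': name} if name in self._tables else None

def ensure_tables(dynamo, wanted, create):
    # Check each table individually instead of trusting a possibly
    # truncated list_tables() result
    created = []
    for name in wanted:
        if dynamo.describe_table(name) is None:
            create(name)
            created.append(name)
    return created
```

The extra describe calls cost one request per model, but that is a one-time schema-setup cost.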

Query and return multiple records based on the hash key

I seem to be having an issue selecting multiple conversation ids based on the model below

class Transaction(Model):
    conversation_id = Field(hash_key=True)
    msisdn = Field(range_key=True)
    timestamp = Field(type=datetime, index='timestamp-index')
    amount = Field(type=NUMBER)
    posted_timestamp = Field(type=datetime)
    transaction_time = Field(type=datetime)

So I constructed the following query

tx_ids = ['b501d92c-27e0-11e9-a206-06299faea7c2', '80a2107e-2147-11e9-a206-06299faea7c2']
txs = engine.query(
    Transaction
).filter(
    Transaction.conversation_id.in_(tx_ids)
).all()

Which errors out with

/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/flywheel/fields/conditions.py in query_kwargs(self, model)
     66 
     67         if ordering is None:
---> 68             raise ValueError("Bad query arguments. You must provide a hash "
     69                              "key and may optionally constrain on exactly one "
     70                              "range key")

ValueError: Bad query arguments. You must provide a hash key and may optionally constrain on exactly one range key

Which is odd. I thought I could provide multiple hash keys. A single one works as expected

tx_ids = ['b501d92c-27e0-11e9-a206-06299faea7c2', '80a2107e-2147-11e9-a206-06299faea7c2']
txs = engine.query(
    Transaction
).filter(
    Transaction.conversation_id == tx_ids[0]
).all()

An or_filter with multiple filters also doesn't work

tx_ids = ['b501d92c-27e0-11e9-a206-06299faea7c2', '80a2107e-2147-11e9-a206-06299faea7c2']
txs = engine.query(Transaction).filter(
    Transaction.conversation_id == tx_ids[0]
).filter(
    Transaction.conversation_id == tx_ids[1]
).all(filter_or=True)

print(len(txs))  # Returns 1 instead of 2
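Since a DynamoDB Query needs exactly one hash-key value, in_ over the hash key can't be pushed down to a single request. One workaround is to fan out one exact-match query per id and concatenate the results; a sketch with a stand-in query callable:

```python
from itertools import chain

def query_each(run_query, hash_keys):
    # Issue one exact-match query per hash key and flatten the results;
    # run_query is any callable mapping a key to a list of items
    return list(chain.from_iterable(run_query(k) for k in hash_keys))
```

With flywheel, run_query might be something like `lambda k: engine.query(Transaction).filter(Transaction.conversation_id == k).all()`; a server-side batch get, where available, would be cheaper than N queries.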

gen() iterator should support resumable paged queries/scans

I’d like to be able to provide an API to our users that is backed by a query that returns a large number of results. We’d like to be able to retrieve and store a cursor from a result set (query or scan), then create a new result set that picks up where the last one left off, possibly minutes later. I know that dynamo3 supports this internally, but it's not exported via flywheel yet.

Steve replied via email:

I think you'd have to have a new terminator on the query (like gen(), all()). I think you'd need page() to fetch a single page from DynamoDB so that you have some knowledge of where the database query left off. The default gen() implementation just provides a straight-up iterator that handles it all transparently. We could make page() return an object that extends list and has a 'cursor' attribute. Then you make the query terminators also take 'cursor' as a keyword argument, pass that through to dynamo3 and it should just work. I'll have to make a couple of changes in dynamo3 to support fetching pages instead of just iterators, but that should be pretty easy.
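The page() return value Steve describes — a list that also carries a resume cursor — could be as small as this sketch (names hypothetical):

```python
class Page(list):
    """A single page of results plus the cursor needed to resume."""

    def __init__(self, items, cursor=None):
        super().__init__(items)
        # Opaque resume token (e.g. DynamoDB's LastEvaluatedKey);
        # None means there are no more pages
        self.cursor = cursor

first = Page([{'id': 1}, {'id': 2}], cursor={'id': 2})
```

A caller could then store first.cursor (e.g. in a user-facing API token) and pass it back as the keyword argument to a later query terminator.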

Minor: developer usability, `Query.one()` and `ValueError`

The docstring for Query.one() says the following:

        Return the result of the query. If there is not exactly one result,
        raise a ValueError

This is technically true - it actually raises one of two subclasses of ValueError, DuplicateEntityException or EntityNotFoundException. The trouble is, the docstring hints that the caller should catch ValueError directly - and the implementation of .one() calls .all(), which can wind up calling Condition.query_kwargs(). This method can end up raising a ValueError directly, if the query has bad arguments. This then accidentally gets caught at the top level by the caller of Query.one() - which was catching ValueError to detect the query not returning any results.

Do you think it's worth clarifying the docstring (and any other docs) to explicitly call out the DuplicateEntityException and EntityNotFoundException classes, and encourage those to be caught directly, rather than catching ValueError?
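The hazard is easy to reproduce in isolation: except ValueError also catches its subclasses, so catching the specific classes first is the only way to tell "no unique result" apart from "bad query arguments" (minimal stand-in classes below, named as in the discussion above):

```python
class EntityNotFoundException(ValueError):
    pass

class DuplicateEntityException(ValueError):
    pass

def handle(raiser):
    # Catching the specific subclasses before the bare ValueError
    # distinguishes "no unique result" from a genuine bad-arguments error
    try:
        raiser()
    except (EntityNotFoundException, DuplicateEntityException):
        return 'no-unique-result'
    except ValueError:
        return 'bad-query'
    return 'ok'
```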

Feature Suggestions: Option to set to default value when None and option to treat "" as None.

When a field is set to None due to an explicit None value, there could be an option to set it to its default value before validation.

I've written a field validator loop (#29) and I've just recently added this check to it. Ideally you would want to check whether nullable is false first, but it looks like nullable isn't an attribute (it's only used in __init__ to append the not-null check).

It might be possible to do what I propose in a check method, but python isn't my strongest language.

  def validate(self):
    self._fields_with_errors = []
    for field in self.meta_.fields.values():
      try:
        # Check if the field's value is None and it has a default; if so, set it to its default
        if field.resolve(self) is None and field.default is not None:
          setattr(self, field.name, field.default)

        field.validate(self)
      except ValueError as e:
        res = re.search('Validation check on field (.*) failed for value (.*)', str(e))
        self._fields_with_errors.append(res.group(1))
    return len(self._fields_with_errors) == 0

My other suggestion is to treat "" as None for the nullable check for str field types. Or to at least have that option. This one's not a big deal and can easily be overcome with a check.

lambda x: x != ""

Exception using scan with undefined `name` field

There seems to be a bug where flywheel tries to decode the string value of fields (such as name) that are not defined on the model.

Conditions seem to be:

  • name field present in dynamodb table while no corresponding field defined in model
  • custom table name defined using __metadata__ and _name
  • objects are obtained from dynamodb using scan and not get (did not check query yet)

Sample Code

from flywheel import Model, Field, Engine
from flywheel.fields.types import StringType
import uuid

class Item(Model):
    __metadata__ = {
        '_name': 'items'
    }
    uuid = Field(data_type=StringType(), hash_key=True, nullable=False)
    # name = Field(data_type=StringType())  # only not having this field defined will let the script crash

# setup
engine = Engine()
engine.connect_to_region('eu-west-1')
engine.register(Item)
engine.create_schema()

# create an item
unique_id = str(uuid.uuid1())
i = Item()
i.uuid = unique_id
i.name = 'a name' # using name (which is not a field) while _name is defined in __metadata__
engine.save(i)

# scan for
engine.get(Item, [unique_id]) # no crash
engine.scan(Item).filter().all() # crash happens here

Stack Trace

Traceback (most recent call last):
  File "addition_field.py", line 27, in <module>
    engine.scan(Item).filter().all() # crash happens here
  File "/usr/local/lib/python3.5/site-packages/flywheel/query.py", line 115, in all
    exclusive_start_key=exclusive_start_key))
  File "/usr/local/lib/python3.5/site-packages/flywheel/query.py", line 301, in gen
    yield self.model.ddb_load_(self.engine, result)
  File "/usr/local/lib/python3.5/site-packages/flywheel/models.py", line 493, in ddb_load_
    obj.set_ddb_val_(key, val)
  File "/usr/local/lib/python3.5/site-packages/flywheel/models.py", line 485, in set_ddb_val_
    setattr(self, key, Field.ddb_load_overflow(val))
  File "/usr/local/lib/python3.5/site-packages/flywheel/fields/__init__.py", line 265, in ddb_load_overflow
    return json.loads(val)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

edit: I noticed something more. Actually, running the script above works fine. But when you have a closer look at the content of the table / raw data returned by dynamo3 for the scan, you will notice the additional quotation marks: {'name': '"a name"', 'uuid': '5ebc745a-4def-11e6-864d-20c9d07f4883'}

Manually removing them (e.g. {'name': 'a name', 'uuid': '6b87e066-4def-11e6-a7b2-20c9d07f4883'}) will trigger the exception/bug described above. I didn't dig deeper into why and where these quotes get inserted. But there is definitely a problem with relying on them when the data in the table also gets touched by other software.
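A tolerant overflow loader would fall back to the raw string when the stored value isn't valid JSON; a sketch (hypothetical helper, not flywheel's actual ddb_load_overflow):

```python
import json

def load_overflow(val):
    # flywheel JSON-encodes overflow fields on save, so '"a name"' decodes
    # to 'a name'; data written by other software may be a bare string,
    # which json.loads rejects
    try:
        return json.loads(val)
    except (ValueError, TypeError):
        return val
```

The trade-off is that a bare string that happens to be valid JSON (e.g. "42") would still be decoded, so round-tripping with other writers remains ambiguous.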

Cross-table linking

Some way to have models reference models. One-to-one, one-to-many, and many-to-many should be supported.

Since we don't have the benefit of a SQL database, we'll have to consider the following:

  • Save and sync actions should probably not propagate
  • Backrefs are possible, but...hard. Maybe add those later?
  • Query/model option to eager-load referenced models
  • What happens if a value is set but the object doesn't exist in the DB? None? AttributeError?

add exists static method to models

I would find it useful to have a static method for models like in Active Record. It would take a primary key value as input and return True or False depending on whether the entry exists.

For a User model it would be called like this:

User.exists("some-user-id") 
True
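A sketch of what such a classmethod could look like, with a stub standing in for a real flywheel Engine (the names and the get signature here are hypothetical):

```python
class FakeEngine:
    """Stub standing in for a real flywheel Engine."""

    def __init__(self, items):
        self._items = items

    def get(self, model, pkey):
        # Return the stored item, or None when the key is absent
        return self._items.get(pkey)

class User:
    engine = None  # injected at application setup

    @classmethod
    def exists(cls, pkey):
        # True when the primary key resolves to an item
        return cls.engine.get(cls, pkey) is not None

User.engine = FakeEngine({'some-user-id': {'name': 'alice'}})
```

Note that this still costs a read; a projection of just the key attributes would keep the consumed capacity minimal.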

Add attribute_name to Field

This is an absolutely fantastic library, by the way.


Since attribute names count against the 64kb limit for items and against provisioned iops (docs), it would be great to specify a short attribute name, and still refer to the field by a longer name.

For example:

class MyModel(Model):
    my_long_id = Field(hash_key=True, data_type=BINARY, attribute_name='id')
    unreasonably_long_field_name_for_data = Field(attribute_name='data')

I imagine that all external references would still use the model's meta field names - my_long_id - which would require an extra field_name -> attribute_name translation when constructing queries. Fallback when attribute_name isn't specified would use the existing behavior.
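That translation layer could be little more than a dict lookup applied while building query parameters; a sketch with a hypothetical helper and mapping:

```python
def to_attribute_names(filters, attr_map):
    # Rewrite long model field names to their short stored attribute names;
    # fields without an attribute_name fall through unchanged
    return {attr_map.get(name, name): value for name, value in filters.items()}

# Hypothetical mapping derived from the MyModel example above
ATTR_MAP = {
    'my_long_id': 'id',
    'unreasonably_long_field_name_for_data': 'data',
}
```

The inverse mapping would be applied when loading items, so model code only ever sees the long names.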

If you're interested, I can get to work on a pr.


This came about because I tried to change the name of a field before mapping it to an object as a workaround, but it didn't persist. That seems odd - maybe it's worth checking if the field's name is None here before overwriting it?

A simple test case:

f = Field(hash_key=True)
f.name = 'custom name'
print(f.name)  # prints 'custom name'
class MyModel(Model):
    id = f
print(MyModel.id.name)  # prints 'id'
print(f.name)  # prints 'id'

dynamo3 dependency released breaking change

I realize this is an unsupported library, but for anyone still using it, it appears the dynamo3 package just released a new version with some breaking changes. If you are missing a typing_extensions module suddenly, you may want to specify dynamo3==0.4.10 in your requirements.txt as a workaround.

Is it possible to get all validation errors?

I'm working on some basic CRUD stuff and noticed that flywheel appears to raise an exception on the first validation failure and that's it.

Is it possible to get a list of all the fields with failed validations?

Add support for engine.connect()

It would be good to have support for dynamo3's connect method instead of two separate connection methods: connect_to_host and connect_to_region.

Also, dynamo3 seems to have deprecated the connect_to_host and connect_to_region methods in the version which is used by the latest flywheel release. So there are DeprecationWarnings everywhere.

JSON decode issue when querying database

First of all, thanks for creating flywheel. Very helpful.

I do have the following issue:

Version is 0.4.4 running on Python 3.4.3 under Windows 7
The db structure is:

class scrobble(Model):
    artist = Field(hash_key=True)
    ts = Field(data_type=datetime, range_key=True)

Works fine when the database is queried through boto or when exploring through the DynamoDB Web interface. I believe the issue is that for some reason the timestamps have a dozen or more digits to the right of the decimal point when I view them in DynamoDB.

Whether that's it or not, all queries are throwing an Exception as follows:

>>> z = engine(scrobble).filter(artist="Lucinda Williams")
>>> z.first()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\site-packages\flywheel\query.py", line 155, in first
    attributes=attributes, filter_or=filter_or):
  File "C:\Python34\lib\site-packages\flywheel\query.py", line 80, in gen
    yield self.model.ddb_load_(self.engine, result)
  File "C:\Python34\lib\site-packages\flywheel\models.py", line 458, in ddb_load_
    obj.set_ddb_val_(key, val)
  File "C:\Python34\lib\site-packages\flywheel\models.py", line 450, in set_ddb_val_
    setattr(self, key, Field.ddb_load_overflow(val))
  File "C:\Python34\lib\site-packages\flywheel\fields\__init__.py", line 265, in ddb_load_overflow
    return json.loads(val)
  File "C:\Python34\lib\json\__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "C:\Python34\lib\json\decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python34\lib\json\decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
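The traceback shows ddb_load_overflow feeding a non-JSON value straight into json.loads. A defensive sketch of that loader (a workaround idea, not flywheel's actual code):

```python
import json

def safe_overflow_load(val):
    """Fall back to the raw value when an overflow attribute is not
    valid JSON, instead of raising the way the traceback above does."""
    if not isinstance(val, str):
        return val
    try:
        return json.loads(val)
    except ValueError:
        return val

print(safe_overflow_load('[1, 2]'))         # decodes normally
print(safe_overflow_load('not json here'))  # falls back to the raw string
```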

Distinguishable Exception

Because Query.one() raises the same type of exception for different errors, it's difficult to handle exceptions on the application side. It would be nice to have custom exceptions like "EntityNotFound".
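What this could look like, as a minimal sketch; the exception classes and wrapper are hypothetical, not part of flywheel's API:

```python
class EntityNotFound(Exception):
    """Raised when a query expected to match exactly one item matches none."""

class MultipleEntitiesFound(Exception):
    """Raised when a query expected to match exactly one item matches several."""

def strict_one(results):
    """A one() with distinguishable failures, so callers can catch
    'not found' separately from 'too many'. `results` is any iterable
    of query results."""
    items = list(results)
    if not items:
        raise EntityNotFound("expected exactly one result, got none")
    if len(items) > 1:
        raise MultipleEntitiesFound(
            "expected exactly one result, got %d" % len(items))
    return items[0]
```

Application code can then write `except EntityNotFound:` instead of parsing a generic exception's message.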

Need for case-insensitive comparisons

In my filter() I'd like to be able to compare a field case-insensitively. Right now

foo.key=='My Test'

will only look for an exact match. There are times I'd like to do something like:

foo.key=='case Insensitive TEST'

and have it match whatever is in the key, regardless of case. Possible?
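DynamoDB comparisons are case-sensitive, so there is no direct support for this. A common workaround is to store a normalized copy of the field and run exact-match filters against it; a plain-Python sketch of the normalization (the shadow-field name is made up):

```python
def make_filterable(value):
    """Normalize a value for case-insensitive matching; write this copy
    into a second field (e.g. 'key_lower') at save time and filter on it."""
    return value.casefold()

# At save time:
item = {"key": "My Test", "key_lower": make_filterable("My Test")}

# At query time, normalize the search value the same way:
print(item["key_lower"] == make_filterable("my TEST"))  # True
```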

Possible new issue...

I do the following:

class AppDataModel(Model):
    """
    The application data storage model. The indexed data includes:

        _country_code - ISO 3166-1 alpha-3, 3 character country code
        _app_type - application type, i.e. android, ios, etc. Always lower case
        _app_id - unique application identifier, usually a hash of some kind
        _app_version - version of application
        _analyzer_name - name of the analyzer that is responsible for some specific generated data
        _data_type - type of recorded data, i.e. log, call, output
        _creation_datetime - the ISO 8601 formatted date and time of when the data was created

    The following indices are available:

        Global index gcc-index allowing searching by _country_code
        Global index gat-index allowing searching by _app_type
        Global index lan-index allowing searching by _analyzer_name

        Primary key:
            Hash key composed of _country_code, _app_type, _app_id, _app_version
            Range key composed of _analyzer_name

        Local index ldt-index allowing searching within an application by _data_type
        Local index lc-index allowing searching within an application by _creation_datetime
    """
    __metadata__ = {
        'global_indexes': [
            GlobalIndex.all('gcc-index', 'country_code'),
            GlobalIndex.all('gat-index', 'app_type'),
            GlobalIndex.all('gboth-index', 'app_type', 'app_id'),
        ]
    }

    country_code = Field(data_type=STRING)
    app_type = Field(data_type=STRING)
    app_id = Field(data_type=STRING)
    app_version = Field(data_type=STRING)
    composite_appid = Composite('country_code', 'app_type', 'app_id', 'app_version', hash_key=True)
    analyzer_name = Field(data_type=STRING, index='lan-index')
    data_type = Field(data_type=STRING, index='ldt-index')
    data_index = Field(data_type=NUMBER)
    creation_datetime = Field(data_type=datetime, index='lc-index')
    r_key = Composite('analyzer_name', 'data_type', 'data_index', 'creation_datetime', merge=c_composite, range_key=True)

def AppDataEngine(**kwargs):
    # Connect to DynamoDB
    dyn = DynamoDBConnection(**kwargs)

    # Connect to the server and create the table schema for the 'appdata' table.
    eng = Engine(dynamo=dyn, namespace=['appdata'])
    eng.register(AppDataModel)
    eng.create_schema()

    # Return the engine.
    return eng

Later on I do this:

eng = AppDataEngine(**kwargs)

followed by this:

    # Create the instance for us to use...
    i = AppDataModel(country_code=self.cc, app_type=self.at,
                     app_id=self.aid, app_version=self.av,
                     analyzer_name=self.an, data_type='CALL',
                     creation_datetime=self.dt)

    # Write out the control data as a JSON struct
    i.data = {"WhoCalledMe": who_called_me, "args": args}
    self.eng.save(i)

When I execute this code I get the following error returned:

boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'Message': u'Cannot do operations on a non-existent table', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}

Since I'm running the local DynamoDB system, I can check the database and see all the data in SQLite. So I know the table is there: 'appdata-AppDataModel' appears in the list of tables.
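For reference, flywheel composes the table name from the namespace prefix and the model name. A hypothetical reimplementation of that composition (assuming a '-' join, which matches the 'appdata-AppDataModel' table observed above) confirms which table the save should target; if that table exists, the save() is likely going through a connection pointed at a different endpoint or port than the one create_schema() used:

```python
def table_name(namespace, model_name):
    """Sketch of namespace-prefixed table naming (a '-' join is an
    assumption here, inferred from the observed table name)."""
    return "-".join(list(namespace) + [model_name])

print(table_name(['appdata'], 'AppDataModel'))  # appdata-AppDataModel
```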

Delete subject

Is there a way to delete a subject container using the Flywheel SDK?

Engine.refresh() assumes batch_get() returns items in order; fails when changing pkey

Today I ran some old code that uses Engine.refresh() on a list of models, all from the same table, and the call fails with "AttributeError: Cannot change an item's primary key!". This code worked a few months ago, so I dug into how Flywheel implements refresh().

The cause of the error is that Flywheel expects batch_get() to return my results in the order the keys were specified, but DynamoDB seems to be returning them in a random order. I'm watching the JSON response from DynamoDB and they really are in a different order each time. I can reproduce this with a small number of small items, which easily fit in one response and require no paging. Also, I have no UnprocessedKeys in my results.

Did DynamoDB change its behavior recently? I see a comment in the 0.4.5 branch that asserts we'll get the results in order, and I'm pretty sure refresh() used to work in my program, so I'm currently stumped.

DynamoDB's docs don't have a lot to say about result ordering. It mentions attributes aren't ordered, but doesn't explicitly talk about result items.
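Whatever DynamoDB's historical behavior, the robust fix is to match results back to the requested keys explicitly instead of relying on response order. A sketch of what that could look like inside refresh() (key_fn and the wrapper are hypothetical):

```python
def reorder_by_key(requested_keys, results, key_fn):
    """BatchGetItem makes no ordering guarantee, so map each returned
    item back to its requested key instead of zipping positionally.
    `key_fn` extracts an item's primary key; missing items become None."""
    by_key = {key_fn(item): item for item in results}
    return [by_key.get(key) for key in requested_keys]

requested = ['a', 'b', 'c']
shuffled = [{'id': 'c'}, {'id': 'a'}, {'id': 'b'}]  # order from DynamoDB varies
ordered = reorder_by_key(requested, shuffled, lambda item: item['id'])
print([item['id'] for item in ordered])  # ['a', 'b', 'c']
```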

filter_expression with 2 possible values

How can a query apply, in addition to the primary key filter, an extra filter on an attribute with 2 possible values?

With the boto3 client it is possible with this syntax:
filter_expression=Attr('status').ne('terminated') & Attr('status').ne('deleting')

Kind regards :-)
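For clarity, the semantics of that combined condition, written in plain Python independent of any client library:

```python
def matches(item):
    """Plain-Python equivalent of
    Attr('status').ne('terminated') & Attr('status').ne('deleting'):
    both inequality tests must hold (logical AND)."""
    return item["status"] != "terminated" and item["status"] != "deleting"

items = [{"status": "running"}, {"status": "terminated"}, {"status": "deleting"}]
print([i["status"] for i in items if matches(i)])  # ['running']
```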

Flywheel raises SerializationException when using int Fields

When trying to use an integer field in a model, a serialization exception is raised when saving (or retrieving) an item. If I do not use data_type=int, everything else works. I am using Python 2.7, flywheel-0.2.0rc2, dynamo3-0.1.1, and botocore-0.35.0.

Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> from flywheel import Model, Field, Engine
>>> class Tweet(Model):
...     userid = Field(hash_key=True)
...     id = Field(range_key=True)
...     ts = Field(data_type=datetime, index='ts-index')
...     text = Field()
...     val = Field(data_type=int)
...     def __init__(self, userid, id, ts, text, val):
...         self.userid = userid
...         self.id = id
...         self.ts = ts
...         self.text = text
...         self.val = val
...
>>> engine = Engine()
>>> engine.connect_to_region('us-east-1', access_key='mykey', secret_key='hunter2')
>>> engine.register(Tweet)
>>> engine.create_schema()
['Tweet']
>>> tweet = Tweet('myuser', '1234', datetime.utcnow(), '@AWScloud hey', 2)
>>> engine.save(tweet)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/flywheel/engine.py", line 455, in save
    item.post_save_()
  File "/usr/local/lib/python2.7/dist-packages/dynamo3/batch.py", line 142, in __exit__
    self.flush()
  File "/usr/local/lib/python2.7/dist-packages/dynamo3/batch.py", line 192, in flush
    resp = self._batch_write_item(items)
  File "/usr/local/lib/python2.7/dist-packages/dynamo3/batch.py", line 226, in _batch_write_item
    return self.connection.call('BatchWriteItem', request_items=data)
  File "/usr/local/lib/python2.7/dist-packages/dynamo3/connection.py", line 200, in call
    raise_if_error(kwargs, response, data)
  File "/usr/local/lib/python2.7/dist-packages/dynamo3/exception.py", line 46, in raise_if_error
    **error)
dynamo3.exception.DynamoDBError: SerializationException:
Args: {'request_items': {'Tweet': [{'PutRequest': {'Item': {'id': {'S': u'1234'},
    'text': {'S': u'@AWScloud hey'},
    'ts': {'N': u'1395120835.9691159725189208984375'},
    'userid': {'S': u'myuser'},
    'val': {'N': 2}}}}]}}
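The 'val': {'N': 2} at the bottom of that request is the problem: DynamoDB's wire format requires numeric values to be encoded as strings. A minimal sketch of the correct serialization:

```python
def serialize_number(value):
    """Encode a number for DynamoDB's wire format: the 'N' value must
    be a string, e.g. {'N': '2'}. Sending a bare int, as in the failing
    request above, triggers the SerializationException."""
    return {"N": str(value)}

print(serialize_number(2))  # {'N': '2'}
```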
