Comments (11)
After a fantastic test case by Nate and some digging, it appears that the ExclusiveStartKey for an index query (global and local) needs to contain both the primary key of the index and the primary key of the table. This isn't documented anywhere I could find in the Dynamo docs because they just say "use the thing we give you as LastEvaluatedKey" and assume that you won't be constructing it yourself.
I'll account for that in the method I'm making for grabbing the primary key from an index and it should Just Work™
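A minimal sketch of that rule, assuming the last item of a page is available as a plain dict. The helper `build_exclusive_start_key` and its arguments are hypothetical names, not part of flywheel or dynamo3, and the choice of `oid` as the table's key is an assumption made only for illustration:

```python
def build_exclusive_start_key(item, table_keys, index_keys):
    """Collect the index key fields and the table key fields from one item.

    Hypothetical helper for illustration; not a flywheel or dynamo3 API.
    """
    key = {}
    for name in list(index_keys) + list(table_keys):
        if name not in item:
            raise KeyError("item is missing key field %r" % name)
        key[name] = item[name]
    return key


# Last item of the previous page (made-up values).
last_item = {"oid": "av-1", "app_id": "app-1", "date": 1450915200, "notes": "x"}

# Assumption: the table is keyed on `oid`, the index on (`app_id`, `date`).
start_key = build_exclusive_start_key(
    last_item, table_keys=["oid"], index_keys=["app_id", "date"]
)
```

Non-key attributes such as `notes` are left out, so the resulting dict has exactly the index key fields plus the table key fields.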
from flywheel.
@stevearc This sounds good. I like that gen() would be implemented in terms of page(). Hopefully page size would be configurable as well in the first call.
from flywheel.
I ended up going about it in a slightly different way than initially planned, but it's pretty similar.
You can see an example of what the flow would look like in this test case
You should be able to configure the query behavior to do what you want with the Limit class. It's in dynamo3, but it's also imported in flywheel so you can import it directly from there.
Released flywheel 0.4.6. Let me know if this works for you!
from flywheel.
I ran some tests on this code, and it seems to be basically working. I found some issues with the current approach I’m not sure how to handle.
- Getting the primary key for an index
Your test code shows how to use pk_dict_ to get the primary key from the last item in a result set in order to restart the scan at that point. However, for a query on an index, it looks like I need to specify all the keys (and includes) from that index. How do I get the equivalent of pk_dict_ for an index?
- Query/scan terminates late
I’m running a query of an index with 4 matching items, with a Limit(scan_limit=2, strict=True). Debug log shows that each query is submitted properly and the results are what I expect, but the LastEvaluatedKey in the response for the 2nd query is non-null. (It is the last item in that result, item #4). According to the docs, it should be null instead when you hit the end of the results.
Sequence:
- Query(app_id='x', ExclusiveStartKey=None).limit(2) -> two Items, last_evaluated_key=items[1]
- Query(app_id='x', ExclusiveStartKey=items[1]).limit(2) -> two Items, last_evaluated_key=items[3] **
- Query(app_id='x', ExclusiveStartKey=items[3]).limit(2) -> no Items, no last_evaluated_key
** I expect LastEvaluatedKey to be null here
If I continue with the query, the 3rd query gets a response of 0 items, so my code terminates then. But it would be nice not to have an extra round trip each time just to find there are no more results. I’m not sure if this is a DDB issue or Flywheel.
Here’s my rough workaround:
last_evaluated_key = None
while True:
    # This calls .filter().limit().all() on the index
    versions = app.get_items(engine, last_evaluated_key)
    for x in versions:
        print(x.id)
    # Catch partial or no results = done
    if len(versions) < PAGE_SIZE:
        break
    # Get primary key for index query to resume
    dump = x.ddb_dump_()
    last_evaluated_key = {'id': dump['id'], 'oid': dump['oid'], 'date': dump['date']}
- I don’t quite understand item_limit vs. scan_limit
I assume item_limit is when you want a specific number of items and you don’t care how many per Query/Scan operation, nor do you want to handle pagination. In the scan_limit case, you want to control the number of items fetched. Is that right?
Thanks!
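The extra round trip in the sequence above can be reproduced with a toy in-memory model. This is only a sketch of the observed behavior, not DynamoDB or flywheel code: `query_page` and `fetch_all` are hypothetical names, and list positions stand in for real keys.

```python
def query_page(items, limit, exclusive_start=None):
    """Toy stand-in for one Query call; returns (page, last_evaluated_key).

    As in the sequence above, a key is returned whenever the page is full,
    even if no items come after it.
    """
    start = 0 if exclusive_start is None else exclusive_start + 1
    page = items[start:start + limit]
    last_key = start + len(page) - 1 if len(page) == limit else None
    return page, last_key


def fetch_all(items, limit):
    """Follow last_evaluated_key until it is None, counting the calls made."""
    results, calls, key = [], 0, None
    while True:
        page, key = query_page(items, limit, key)
        calls += 1
        results.extend(page)
        if key is None:
            break
    return results, calls


four_items, four_calls = fetch_all(["a", "b", "c", "d"], limit=2)
three_items, three_calls = fetch_all(["a", "b", "c"], limit=2)
```

With four items and a limit of 2 the loop needs a third, empty call before the key comes back as None; with three items the short second page ends the loop after two calls.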
from flywheel.
- You should be able to just pass in the dict that has the hash key and optional range key of that particular index. I'll find a place to add a convenient method to do that for you.
- This seems to be a DynamoDB behavior. When you pass in a Limit and it happens to hit the limit right at the final item, it will return a LastEvaluatedKey. ¯\_(ツ)_/¯
- Yeah, it's not as clear as it could be. scan_limit is passed in directly to DynamoDB as the Limit parameter. A careful reading of the docs reveals that Limit doesn't do what you would expect, nor probably what you want most of the time. It sets a hard limit on the items scanned, which does not mean it will be the number of items returned. If you don't have any scan filters then it behaves as expected. If you have any scan filters it's possible for the query/scan to return no results even if some exist in the table, because Dynamo will scan up to the Limit and then return whatever it found.
This is why I added item_limit. If you pass in an item_limit, dynamo3 will continue to query DynamoDB until either the item_limit is reached OR there are no more results in the DB. min_scan_limit is there for the case when you're fetching a small item_limit (say, 1). If you set the Limit to just be 1, you may have to do many many queries to finally retrieve a result. min_scan_limit is there to make sure you're always scanning a minimum number of items. Since your results only come back in pages, the total results may exceed the item_limit (ex. you only need 1 more, you make another query and get back 5 results). By default it will return all of the results, but you can pass in strict=True to chop off the extras.
I hope this clears up some of the confusion. I still think item_limit isn't a great name, so if you have suggestions for a better one please let me know. Aaaaand I'll look for somewhere to put a method that will construct primary keys for indexes.
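The difference between the limits can be sketched with a toy in-memory scan. This is a simplified illustration of the description above, not dynamo3's implementation: `scan_page` and `fetch_items` are hypothetical names, and the default of 20 for `min_scan_limit` is an arbitrary value chosen for the sketch.

```python
def scan_page(table, predicate, scan_limit, start=0):
    """Toy scan: examine at most scan_limit rows, then apply the filter."""
    examined = table[start:start + scan_limit]
    matches = [row for row in examined if predicate(row)]
    next_start = start + len(examined)
    # Simplification: report None as soon as the end of the table is reached.
    return matches, (next_start if next_start < len(table) else None)


def fetch_items(table, predicate, item_limit, min_scan_limit=20, strict=False):
    """Keep paging until item_limit matches are collected or the table ends."""
    results, start = [], 0
    while start is not None and len(results) < item_limit:
        needed = item_limit - len(results)
        page, start = scan_page(table, predicate, max(needed, min_scan_limit), start)
        results.extend(page)
    # strict=True chops off any extras returned by the final page.
    return results[:item_limit] if strict else results


table = list(range(100))


def rare(n):
    return n > 0 and n % 30 == 0


def even(n):
    return n % 2 == 0


# A bare scan limit can return nothing even though matches exist further on.
empty_page, resume_at = scan_page(table, rare, scan_limit=10)

# item_limit keeps going until a match is found.
first_rare = fetch_items(table, rare, item_limit=1)

# A page may overshoot item_limit unless strict=True.
loose = fetch_items(table, even, item_limit=1, min_scan_limit=5)
exact = fetch_items(table, even, item_limit=1, min_scan_limit=5, strict=True)
```

Here the bare scan of 10 rows finds no matches, while the item_limit loop pages on until it reaches the first one; the `even` filter shows how a single page can return more than item_limit items when strict is left off.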
from flywheel.
I actually had to pass in an ExclusiveStartKey comprised of 3 fields in the case of an includes index that had a hash key, range key, and one other included field. That surprised me since I thought it would just need the hash/range key for the index. Including anything less than those 3 fields gives this exception:
File "bluesteel/models.py", line 347, in <module>
main()
File "bluesteel/models.py", line 345, in main
last_key = {'oid': x.oid, 'date': x.ddb_dump_()['date']}
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/flywheel/query.py", line 79, in gen
for result in results:
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/six.py", line 558, in next
return type(self).__next__(self)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/result.py", line 254, in __next__
return six.next(self.iterator)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/result.py", line 283, in fetch
data = self.connection.call(*self.args, **self.kwargs)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/connection.py", line 230, in call
exc.re_raise()
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/exception.py", line 22, in re_raise
six.reraise(type(self), self, self.exc_info[2])
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/connection.py", line 220, in call
data = op(**kwargs)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/botocore/client.py", line 310, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/botocore/client.py", line 396, in _make_api_call
raise ClientError(parsed_response, operation_name)
dynamo3.exception.DynamoDBError: ValidationException: Exclusive Start Key must have same size as table's key schema
Args: {
'ExclusiveStartKey': {'date': {'N': u'1450915200.000000'},
'oid': {'S': u'av-J4DFNhwnbDf'}},
...
Ignore the lazy use of ddb_dump_() as this was just prototype code.
This index was defined as:
__metadata__ = {
    'global_indexes': [
        GlobalIndex.include('appversions-for-app', 'app_id', 'date',
                            includes=['oid']),
    ],
}
Good explanation on the various limits. That should definitely go in the docs. Maybe call it desired_page_size or something?
from flywheel.
I may not be reading this correctly, but from the example you gave it looks like you were passing in date and oid, when you needed to pass in app_id and date. Have you tried those two and was the result the same? I tried it on a global index locally and it seemed to work fine.
from flywheel.
Very nice, thanks for tracking this down.
from flywheel.
Okay, I pushed out version 0.4.7 which has the index_pk_dict_() method on the model, and index_pk_dict() on the metadata object. See if that works for you.
from flywheel.
Sorry, I’m getting a crash when using an index that has a datetime as part of the key. The key coming back from index_pk_dict() looks OK:
{'date': datetime.datetime(2015, 12, 20, 0, 0, tzinfo=<flywheel.fields.types.UTCTimezone object at 0x110c6c250>), 'oid': u'av-VmogFmEt6ey', 'app_id': u'app-IzGMSIS0p0m'}
But then the datetime is not being encoded, resulting in this exception:
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/flywheel/query.py", line 115, in all
exclusive_start_key=exclusive_start_key))
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/flywheel/query.py", line 78, in gen
**kwargs)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/connection.py", line 1132, in query
self.dynamizer.maybe_encode_keys(exclusive_start_key)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/types.py", line 170, in maybe_encode_keys
ret[k] = self.encode(v)
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/types.py", line 175, in encode
return dict([self.raw_encode(value)])
File "/Users/nate/venvs/orangecrush/lib/python2.7/site-packages/dynamo3/types.py", line 156, in raw_encode
(value, type(value)))
ValueError: No encoder for value '2015-12-20 00:00:00+00:00' of type '<type 'datetime.datetime'>'
from flywheel.
If you're calling the index_pk_dict() method on the metadata object directly, ddb_dump defaults to False. Try passing in ddb_dump=True, or use the index_pk_dict_() method on the model itself.
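The earlier error output shows the date being sent as {'N': u'1450915200.000000'}, which suggests datetimes are stored as epoch seconds. Under that assumption, the kind of conversion a dump step performs could be sketched like this. `encode_key_values` is a hypothetical helper, not flywheel's actual ddb_dump logic, which may differ in detail:

```python
import calendar
from datetime import datetime, timezone


def encode_key_values(key):
    """Replace datetime values with epoch seconds so they encode as numbers.

    Hypothetical helper for illustration; not a flywheel or dynamo3 API.
    """
    encoded = {}
    for name, value in key.items():
        if isinstance(value, datetime):
            # utctimetuple() normalizes aware datetimes to UTC;
            # naive datetimes are assumed to already be in UTC.
            value = float(calendar.timegm(value.utctimetuple()))
        encoded[name] = value
    return encoded


raw_key = {
    "date": datetime(2015, 12, 20, tzinfo=timezone.utc),
    "oid": "av-VmogFmEt6ey",
    "app_id": "app-IzGMSIS0p0m",
}
encoded_key = encode_key_values(raw_key)
```

After this step every value in the key is a plain string or number, which is the form the encoder in the traceback above was expecting.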
from flywheel.