
dyno's Introduction

Dyno

Dyno provides a DynamoDB client that adds functionality beyond what the aws-sdk-js offers.

Overview

Native JavaScript objects

Dyno operates as an extension to the aws-sdk's DocumentClient. This means that instead of interacting with typed JavaScript objects representing your DynamoDB records, you can use more "native" objects. For example, the following represents a typed object that could be stored in DynamoDB:

{
  id: { S: 'my-record' },
  numbers: { L: [{ N: '1' }, { N: '2' }, { N: '3' }] },
  data: { B: Buffer.from('Hello World!') },
  version: { N: '5' },
}

Using Dyno, you can represent the same data in a "native" object:

{
  id: 'my-record',
  numbers: [1, 2, 3],
  data: Buffer.from('Hello World!'),
  version: 5
}
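
As a sketch of what this buys you in practice (assuming a client configured for a hypothetical table named my-table; parameter shapes may differ slightly between versions), a write looks like:

var Dyno = require('dyno');

// Configure a client bound to a single table; Dyno fills in TableName.
var dyno = Dyno({ table: 'my-table', region: 'us-east-1' });

// Write the "native" item from above -- no type annotations required.
dyno.putItem({
  Item: {
    id: 'my-record',
    numbers: [1, 2, 3],
    data: Buffer.from('Hello World!'),
    version: 5
  }
}, function(err) {
  if (err) throw err;
});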

Streaming query and scan results

A large query or scan operation may require multiple HTTP requests to retrieve all the desired data. Dyno provides functions that let you read that data from a native Node.js Readable stream. Behind the scenes, Dyno manages the paginated requests to DynamoDB for you, emitting an object for each record in the aggregated response.
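
For example, a query that spans many pages can be consumed item by item (a sketch assuming the queryStream method; scanStream would behave the same way for scans):

dyno.queryStream({
  ExpressionAttributeNames: { '#c': 'collection' },
  ExpressionAttributeValues: { ':c': 'my-collection' },
  KeyConditionExpression: '#c = :c'
})
  .on('data', function(item) {
    // Each emitted object is one record, even when the results span
    // multiple paginated HTTP requests.
    console.log(item.id);
  })
  .on('error', function(err) { throw err; })
  .on('end', function() { console.log('done'); });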

Chunked batch getItem and writeItem requests

BatchGetItem and BatchWriteItem requests come with limits on how much data you can move in a single HTTP request. Dyno functions allow you to present the entire set of batch operations you wish to make; Dyno splits your set into an array of request objects, each of which falls within the limits of a single acceptable request. You can then send each of these requests and handle each response individually.
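
For example, using the batchWriteItemRequests and sendAll names that appear in the issues below (a sketch; the exact return type may vary by version):

var manyPutRequests = [];
for (var i = 0; i < 100; i++) {
  // Far more than the 25 writes allowed in one BatchWriteItem call.
  manyPutRequests.push({ PutRequest: { Item: { id: 'item-' + i } } });
}

// Dyno splits the set into individually acceptable requests...
var requestSet = dyno.batchWriteItemRequests({
  RequestItems: { 'my-table': manyPutRequests }
});

// ...and sendAll runs them, here with up to 10 in flight at once.
requestSet.sendAll(10, function(err, responses, unprocessed) {
  if (err) throw err;
});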

Multi-table client

For situations where you may wish to write to one database and read from another, Dyno allows you to configure a client with parameters for two different tables, then routes each request to the appropriate one.
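
A sketch using the Dyno.multi constructor referenced in the issues below (option names assumed):

var Dyno = require('dyno');

// Reads are routed to the first table, writes to the second.
var dyno = Dyno.multi(
  { table: 'read-table', region: 'us-east-1' },
  { table: 'write-table', region: 'us-east-1' }
);

dyno.getItem({ Key: { id: 'my-record' } }, function(err, data) { /* read-table */ });
dyno.putItem({ Item: { id: 'my-record', version: 6 } }, function(err) { /* write-table */ });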

De/serialization

Dyno exposes functions that serialize and deserialize native JavaScript objects representing DynamoDB records to and from wire-formatted strings acceptable as the body of any DynamoDB HTTP request.
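
For example, using the Dyno.serialize and Dyno.deserialize statics (Dyno.deserialize appears in a stack trace below; the exact output shown is illustrative):

// One wire-formatted string per record, suitable for line-oriented dumps.
var str = Dyno.serialize({ id: 'my-record', version: 5 });
// e.g. '{"id":{"S":"my-record"},"version":{"N":"5"}}'

// ...and back to a native object.
var item = Dyno.deserialize(str);
// { id: 'my-record', version: 5 }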

Command-line interface

Dyno provides a CLI tool for reading and writing individual records, scanning tables, and exporting and importing data into a DynamoDB table.
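
For instance, based on the scan-and-put idiom quoted in the issues below (the region/table argument format is an assumption; run dyno --help for the exact usage):

# Copy every record from one table to another.
dyno scan us-east-1/source-table | dyno put us-east-1/destination-table

# Dump a table (schema on the first line, one record per line after), then load it back.
dyno export us-east-1/my-table > backup.jsonl
dyno import us-east-1/my-table < backup.jsonl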

Documentation

See API.md for documentation of the JavaScript client, and use dyno --help for details on CLI usage.

dyno's People

Contributors

ae-mo, alexanderbelokon, ingalls, mapsam, mcwhittemore, miafan23, mick, morganherlocker, rclark, sgillies, springmeyer, tcql, tmcw, vsmart, willwhite, yhahn


dyno's Issues

Conditional update shortcut?

dyno does a nice job giving us a shortcut for writing queries / filters

dyno.query({ id: { EQ: 'some-value' } }, function(err)...

Could it do the same for the Expected conditions during putItem/updateItem? Currently you pass exactly what is fed to the DynamoDB API.

something along the lines of:

{
    expected: {
        conditionalOperator: 'OR', // defaults to 'AND'
        attributename: { EQ: 'somevalue' },
        anotherattribute: { BETWEEN: [ 'a', 'b' ] }
    }
}
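
For reference, the raw DynamoDB legacy parameters that this shorthand would expand to look like:

{
  ConditionalOperator: 'OR',
  Expected: {
    attributename: {
      ComparisonOperator: 'EQ',
      AttributeValueList: [{ S: 'somevalue' }]
    },
    anotherattribute: {
      ComparisonOperator: 'BETWEEN',
      AttributeValueList: [{ S: 'a' }, { S: 'b' }]
    }
  }
}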

Refactor to make better use of native aws-sdk

I've been thinking critically over the weekend about dyno. I'm feeling like several of the helpful aspects that dyno has provided have become convoluted or are now supported by the AWS SDK for JS:

  • dyno has lagged and has no clear plan for the support of condition & update expressions, leaving it unable to manipulate parts of List and Map attributes atomically
  • dyno recently developed mixed support for "dyno-style" objects and for "wire formatted" objects. This continues to be awkward to manage and maintain
  • dyno's lingo for specifying conditions and pagination isn't bad, but it is certainly not as well documented as the aws-sdk's
  • aws-sdk now has embedded support for "dyno-style" objects: https://aws.amazon.com/about-aws/whats-new/2015/09/amazon-dynamodb-develop-locally-scale-globally/

This has left me trying to list out what dyno's biggest contributions are at this point. I wonder if dyno needs to undergo some significant refactoring to focus on these aspects:

  • Able to turn the caller's batch needs into multiple HTTP requests that fall within the constraints of DynamoDB's batch request limits
  • Readable streams of query and scan results that emit per-item and cross pagination boundaries
  • Present, but poor, support for aggregating consumed capacity metrics across multiple requests
  • Some pretty useful CLI tools
  • Dyno.multi, which basically just routes read and write requests to two different tables

I believe we might be able to make dyno more maintainable by building the dyno client as an extension of an aws-sdk client, without overriding that client's methods like we do now. We could inherit getItem, putItem, etc. from the aws-sdk, and provide our own methods focused on the things dyno can still contribute.

Definitely looking for thoughts and input on this @mick @willwhite. Did I miss anything that you consider important dyno functionality that's not offered by the aws-sdk? Talk me back from the ledge please.

Allow ReturnConsumedCapacity

We should let requests ReturnConsumedCapacity. @mick what do you think about:

dyno.putItems(items, { capacity: true }, function(err, results, capacity) { });

The SDK provides a choice between INDEXES and TOTAL reports. I would just use the INDEXES option, as it seems to provide a more comprehensive report.

Handling batch write failures

Batch writes are performed in parallel chunks of 25, but if one chunk fails, the returned error will only contain unprocessed items from that chunk. Meanwhile, if other chunks of the same batch write request fail, their unprocessed items will not be returned.

@willwhite am I understanding that correctly?

better docs

More examples for:

  • query
  • scan
  • paging
  • streaming
  • query options

Mock dynamodb-stream objects in kinesis are kind of lies

I'm realizing that we've confused the distinction between records in a stream and events that something like Lambda or SNS would receive. What this boils down to is that it looks like we're putting a DynamoDB record, wrapped in a DynamoDB event, into a Kinesis record.

Then if I'm using lambda to consume the stream, I'm getting a dynamo record wrapped in a dynamo event wrapped in a kinesis record wrapped in a kinesis event. More JSON to help explain here: https://gist.github.com/rclark/a97a73a61fc124e7f78d. This highlights some of the subtle differences between dynamodb streams and kinesis streams that we can't really be aware of before being able to actually use dynamodb streams.

I think that a smarter approach for now would be to drop the dynamodb event wrapper. This means a test like this one would change its assertion to:

t.deepEqual(clean(found), {
    Data: JSON.stringify({
        Keys: { id: { S: 'the-id' } },
        SequenceNumber: '0',
        SizeBytes: 0,
        StreamViewType: 'KEYS_ONLY'
    }),
    PartitionKey: JSON.stringify({ id: { S: 'the-id' } })
}, 'expected updateItem params');

cc @mick @willwhite @freenerd

getAll abstraction

Would be great to have a fn that allows the client to pull all items from a table, abstracting away the need to page through and concatenate the data.
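
A hypothetical sketch of such a helper, assuming scan passes DynamoDB's Items and LastEvaluatedKey fields through unchanged (older dyno responses used a lowercase items, per the response-shape issue below):

function getAll(dyno, callback) {
  var all = [];
  (function page(startKey) {
    dyno.scan({ ExclusiveStartKey: startKey }, function(err, data) {
      if (err) return callback(err);
      all = all.concat(data.Items);
      // Keep paging until DynamoDB stops returning a LastEvaluatedKey.
      if (data.LastEvaluatedKey) return page(data.LastEvaluatedKey);
      callback(null, all);
    });
  })(undefined);
}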

support parallel scans

This should be relatively simple -- we just need to allow users to pass Segment and TotalSegments params to dyno.scan as options.
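
Under that proposal, a caller running the second of four parallel workers would pass DynamoDB's standard parameters straight through (a sketch):

dyno.scan({
  Segment: 1,       // this worker's segment, 0-indexed
  TotalSegments: 4  // total number of parallel workers
}, function(err, data) {
  if (err) throw err;
});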

`dyno import` does not seem to work

Hello,

Using dyno export produces a file whose first line is the table definition; the remaining lines are the contents of the table. Unfortunately, dyno import fails to parse the first line because it expects DynamoDB-typed records like those on the following lines.

Has anyone already used dyno import successfully?

All the best,
Cam

Allow caller to determine type on consumed capacity

Make these options allowable:

If set to TOTAL, the response includes ConsumedCapacity data for tables and indexes. If set to INDEXES, the response includes ConsumedCapacity for indexes. If set to NONE (the default), ConsumedCapacity is not included in the response.

Possible values include:
"INDEXES"
"TOTAL"
"NONE"

dyno import does not work

/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/request.js:30
throw err;
^
TypeError: Cannot read property 'length' of undefined
at validateList (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:75:31)
at validateMember (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:63:21)
at validateStructure (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:48:14)
at validateMember (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:61:21)
at validate (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:9:10)
at Request.VALIDATE_PARAMETERS (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/event_listeners.js:88:32)
at Request.callListeners (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at callNextListener (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
at /usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/event_listeners.js:75:9
at finish (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/config.js:244:7)

The first line of the JSON file is as follows:
{"AttributeDefinitions":[{"AttributeName":"id","AttributeType":"S"}],"TableName":"AtomicCounter","KeySchema":[{"AttributeName":"id","KeyType":"HASH"}],"ProvisionedThroughput":{"ReadCapacityUnits":10,"WriteCapacityUnits":10},"TableArn":"arn:aws:dynamodb:us-west-1:1111111111:table/AtomicCounter"}

Catch undefined value

When a value is undefined for an item property, calling toDynamoTypes throws Cannot read property 'datatype' of undefined, which originates in the call to formatDataType in the dynamo-doc code (https://github.com/mapbox/dyno/blob/master/lib/types.js#L17).

As per discussion, an undefined value should be ignored and not sent to dynamo, unless the attribute is a database key, in which case it should call back with an error rather than throwing.
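
A minimal sketch of that behavior (stripUndefined is a hypothetical helper, not part of dyno's API):

// Hypothetical: drop undefined attributes before serialization. Undefined
// key attributes would instead need to produce a callback error.
function stripUndefined(item) {
  Object.keys(item).forEach(function(key) {
    if (item[key] === undefined) delete item[key];
  });
  return item;
}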

query support

Simplify query.

will look something like:

dyno.query({ id: { 'EQ' : 'yo' }, range: { 'BETWEEN': [4,6] } }, itemResp);

NULL/NOT_NULL support for conditionals

NULL/NOT_NULL doesn't make sense in the context of queries because dynamo doesn't support null values, but it is needed for expectations in updates so you can ensure that a property is set before allowing the request to succeed.

batchWriteItem smart batching

Given an array of many things to put in an index, split them into chunks and write them with one or more BatchWriteItem calls.
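
A hypothetical chunking helper illustrating the idea:

// Split write requests into groups of 25, the per-call BatchWriteItem limit.
function chunk(requests, size) {
  var chunks = [];
  for (var i = 0; i < requests.length; i += size) {
    chunks.push(requests.slice(i, i + size));
  }
  return chunks;
}

var batches = chunk(allWriteRequests, 25); // then one BatchWriteItem call per batch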

0.7 Release

@mick how about a 0.7 release for the paging support? I just added a test of paging with an offset start, the key feature for cardboard paging.

`BatchGetItems` implementation

Hi all,

Love dyno.

Would love to see BatchGetItems implemented.

Here's a super simple implementation that could obviously be made much more robust (chunking, etc) but works for our simple use case.

// Assumes underscore (_), dyno's internal types helpers, and the
// dynamoRequest/opts context available in dyno's lib.
items.getItems = function(items, cb) {
    var getItemsRequest = {
        // Map each table name to the set of keys to fetch, converting
        // native objects to DynamoDB's typed format.
        RequestItems: _(items).keys().reduce(function(requestItemsList, itemKey) {
            requestItemsList[itemKey] = {Keys: types.toDynamoTypes(items[itemKey])};
            return requestItemsList;
        }, {})
    };

    return dynamoRequest('batchGetItem', getItemsRequest, opts, cb);
};

Add batchWriteItem

DynamoDB has BatchWriteItem, which allows you to put and delete records within a single request. Dyno breaks away from this pattern and instead implements two separate batch functions for putting and deleting. What's the thinking behind this change?

We should expose a function that allows users to take advantage of the combined request.

/cc @rclark

Attach request ids to error objects

Dyno could ensure that an error object always comes with request ids attached, if this doesn't get implemented upstream.

Basically, instead of passing through native aws-sdk functions, we would write small wrappers. For example, from index.js:

// this is what we do now
var dyno = {
  putItem: docClient.put.bind(docClient)
};

// we could, instead
var dyno = {
  putItem: function(params, callback) {
    return docClient.put(params, callback)
      .on('retry', function(response) {
        if (!response.error) return;
        response.error.amzId = response.httpResponse.headers['x-amz-request-id'];
        response.error.amzId2 = response.httpResponse.headers['x-amz-id-2'];
      });
  }
}

cc @willwhite

use Select: 'ALL_PROJECTED_ATTRIBUTES' when no attributes on query

Otherwise you get:

[ValidationException: One or more parameter values were invalid: Select type ALL_ATTRIBUTES is not supported for global secondary index cell because its projection type is not ALL]

This happens when the index doesn't project all the attributes. From the DynamoDB docs for ALL_PROJECTED_ATTRIBUTES:

If the index is configured to project all attributes, this is equivalent to specifying ALL_ATTRIBUTES.
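
A sketch of the proposed default, using standard aws-sdk query parameters:

// When querying a GSI without asking for specific attributes, fall back to
// the index's projection instead of ALL_ATTRIBUTES.
if (params.IndexName && !params.AttributesToGet && !params.Select) {
  params.Select = 'ALL_PROJECTED_ATTRIBUTES';
}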

Unable to put more than 25 items

Hi!

For some reason, when using the dyno scan ... | dyno put ... idiom, I am unable to import more than 25 items into my dynamodb-local table. It is unclear to me why this happens; here is the stack trace I got:

undefined:0


SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Function.Dyno.deserialize (/usr/local/lib/node_modules/dyno/index.js:76:16)
    at Transform.Parser.parser._transform (/usr/local/lib/node_modules/dyno/bin/cli.js:90:23)
    at Transform._read (_stream_transform.js:179:10)
    at Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Transform.Writable.write (_stream_writable.js:217:11)
    at Stream.ondata (stream.js:51:26)
    at Stream.emit (events.js:107:17)

Any idea?

Best,
Camille

Auto-paginate .query and .scan requests

Propose adding an additional numeric parameter Pages to query and scan requests. These would come in alongside all the other options that the aws-sdk already accepts. The idea would be:

  • a 0 would call back with an error, as this is an invalid parameter value
  • an undefined, null, or 1 value would return 1 page of results (default behavior)
  • any other number will return up to that number of pages
  • a value of Infinity would return all available pages

So an example request like this one would return up to 10 pages of results:

dyno.query({
  ExpressionAttributeNames: { '#c': 'collection' },
  ExpressionAttributeValues: { ':c': 'my-collection' },
  KeyConditionExpression: '#c = :c',
  Pages: 10
}, function() {});

The response should look identical to the response for any single page, except that data.Items would include all the records that were read. Additionally, if ReturnConsumedCapacity is requested, those metrics should be the sum of capacity consumed by all the requests performed.

cc @mick

Eliminate shared config behavior

dyno currently shares config across multiple instances inside a single process. Instead, each instance should be independently configurable.

requestSet.sendAll should automatically retry unprocessed items

Almost everywhere I've used it, I've ended up writing something like:

var params = {
  RequestItems: { 'my-table': [ ... ] }
};

(function write(requestSet) {
  requestSet.sendAll(10, function(err, responses, unprocessed) {
    if (err) return callback(err);
    if (unprocessed) return write(unprocessed);
    callback(null, responses);
  });
})(dyno.batchWriteItemRequests(params));

This is essentially just "retry until there are no unprocessed items left". This should be the default behavior of .batchWriteItemRequests and .batchGetItemRequests. There should also be a way to opt out of this behavior.

Import: schema optional?

The import CLI command currently requires that the first line of the stream be a table schema definition. If this were optional, it would be useful for bulk-importing new docs into an existing empty table.

raw option

Right now update, put, and scan all heavily modify the objects that get sent to dynamo, as well as the response.

Add a raw option to bypass these enhancements.

automatic paging for scan and queries

If the caller is using a callback, this should be optional, defaulting to false. If they are streaming, it should be optional, defaulting to true.

This will make #2 possible

dyno.create, dyno.queryBatchGet

Add two functions:

  • dyno.create: performs a conditional putItem request that asserts that the key of the item you're about to create does not already exist (see the sketch after this list)
  • dyno.queryBatchGet: performs a series of requests -- Query on a GSI that uses a keys-only projection, then BatchGetItem to fetch the individual documents and flesh out the query response.
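
A sketch of what dyno.create might do under the hood, assuming id is the table's partition key:

// Hypothetical: fail the put if an item with this key already exists.
dyno.putItem({
  Item: { id: 'my-record', version: 1 },
  ConditionExpression: 'attribute_not_exists(id)'
}, function(err) {
  // A ConditionalCheckFailedException means the key was already taken.
  if (err) throw err;
});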

Normalize responses across requests

Currently:

getItem: { Item: { dyno-style item } }
putItem: {}
updateItem: {}
query: { count: X, items: [ dyno-style items ] }
scan: { count: X, items: [ dyno-style items ] }
batch requests: You get back what the aws-sdk gives you

Questions

  • Can getItem just return the item without the wrapper?
  • Is it important to return count on query/scan?
  • batch requests should have a simplified description of the results #27
  • how to return consumed capacity information #29

Dyno.session()

Spitballing an idea here, looking for feedback. What if Dyno.session() provided you a dyno client that could keep track of things across all the queries that you made within that session?

Some examples of what might be nice to have stashed in memory alongside your client:

  • table definitions: knowing the key schema can allow you to calculate the size of items as they are stored in DynamoDB. It could also help with higher-level functions like #101
  • you could keep track of how much storage you've added within the session
  • a running count on capacity consumed across requests within your session

cc @mick @emilymcafee @willwhite

Support for ConditionExpression

It seems like Expected is deprecated.

Expected - (map): There is a newer parameter available. Use ConditionExpression instead. Note that if you use Expected and ConditionExpression at the same time, DynamoDB will return a ValidationException exception.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.html

And has been replaced by ConditionExpression:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.SpecifyingConditions.html
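
For comparison, a legacy Expected condition and its ConditionExpression equivalent (a sketch using standard DynamoDB parameters; values are native here because dyno extends the DocumentClient):

// Legacy:
// Expected: { version: { ComparisonOperator: 'EQ', AttributeValueList: [{ N: '5' }] } }

// Modern equivalent:
{
  ConditionExpression: '#v = :v',
  ExpressionAttributeNames: { '#v': 'version' },
  ExpressionAttributeValues: { ':v': 5 }
}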

travis test use real dynamo?

There are some edge cases that mocked DynamoDB can miss. It would be great to have Travis double-check the tests against a real DynamoDB.
