
dyno's Introduction

Dyno

Dyno provides a DynamoDB client that adds functionality beyond what the aws-sdk-js offers.

Overview

Native JavaScript objects

Dyno operates as an extension to the aws-sdk's DocumentClient. This means that instead of interacting with typed JavaScript objects representing your DynamoDB records, you can use more "native" objects. For example, the following represents a typed object that could be stored in DynamoDB:

{
  id: { S: 'my-record' },
  numbers: { L: [{ N: '1' }, { N: '2' }, { N: '3' }] },
  data: { B: Buffer.from('Hello World!') },
  version: { N: '5' },
}

Using Dyno, you can represent the same data in a "native" object:

{
  id: 'my-record',
  numbers: [1, 2, 3],
  data: Buffer.from('Hello World!'),
  version: 5
}
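
As a sketch of what this buys you in practice (assuming a client configured for a hypothetical table named my-table; parameter shapes may differ slightly between versions), a write looks like:

var Dyno = require('dyno');

// Configure a client bound to a single table; Dyno fills in TableName.
var dyno = Dyno({ table: 'my-table', region: 'us-east-1' });

// Write the "native" item from above -- no type annotations required.
dyno.putItem({
  Item: {
    id: 'my-record',
    numbers: [1, 2, 3],
    data: Buffer.from('Hello World!'),
    version: 5
  }
}, function(err) {
  if (err) throw err;
});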

Streaming query and scan results

A large query or scan operation may require multiple HTTP requests to retrieve all the desired data. Dyno provides functions that let you read that data from a native Node.js Readable stream. Behind the scenes, Dyno manages the paginated requests to DynamoDB for you, emitting an object for each record in the aggregated response.
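
For example, a query that spans many pages can be consumed item by item (a sketch assuming the queryStream method; scanStream would behave the same way for scans):

dyno.queryStream({
  ExpressionAttributeNames: { '#c': 'collection' },
  ExpressionAttributeValues: { ':c': 'my-collection' },
  KeyConditionExpression: '#c = :c'
})
  .on('data', function(item) {
    // Each emitted object is one record, even when the results span
    // multiple paginated HTTP requests.
    console.log(item.id);
  })
  .on('error', function(err) { throw err; })
  .on('end', function() { console.log('done'); });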

Chunked batch getItem and writeItem requests

BatchGetItem and BatchWriteItem requests come with limits on how much data you can move in a single HTTP request. Dyno functions allow you to present the entire set of batch operations you wish to make; Dyno splits your set into an array of request objects, each of which falls within the limits of a single acceptable request. You can then send each of these requests and handle each response individually.
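
For example, using the batchWriteItemRequests and sendAll names that appear in the issues below (a sketch; the exact return type may vary by version):

var manyPutRequests = [];
for (var i = 0; i < 100; i++) {
  // Far more than the 25 writes allowed in one BatchWriteItem call.
  manyPutRequests.push({ PutRequest: { Item: { id: 'item-' + i } } });
}

// Dyno splits the set into individually acceptable requests...
var requestSet = dyno.batchWriteItemRequests({
  RequestItems: { 'my-table': manyPutRequests }
});

// ...and sendAll runs them, here with up to 10 in flight at once.
requestSet.sendAll(10, function(err, responses, unprocessed) {
  if (err) throw err;
});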

Multi-table client

For situations where you may wish to write to one database and read from another, Dyno allows you to configure a client with parameters for two different tables, then routes each request to the appropriate one.
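
A sketch using the Dyno.multi constructor referenced in the issues below (option names assumed):

var Dyno = require('dyno');

// Reads are routed to the first table, writes to the second.
var dyno = Dyno.multi(
  { table: 'read-table', region: 'us-east-1' },
  { table: 'write-table', region: 'us-east-1' }
);

dyno.getItem({ Key: { id: 'my-record' } }, function(err, data) { /* read-table */ });
dyno.putItem({ Item: { id: 'my-record', version: 6 } }, function(err) { /* write-table */ });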

De/serialization

Dyno exposes functions that serialize and deserialize native JavaScript objects representing DynamoDB records to and from wire-formatted strings acceptable as the body of any DynamoDB HTTP request.
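
For example, using the Dyno.serialize and Dyno.deserialize statics (Dyno.deserialize appears in a stack trace below; the exact output shown is illustrative):

// One wire-formatted string per record, suitable for line-oriented dumps.
var str = Dyno.serialize({ id: 'my-record', version: 5 });
// e.g. '{"id":{"S":"my-record"},"version":{"N":"5"}}'

// ...and back to a native object.
var item = Dyno.deserialize(str);
// { id: 'my-record', version: 5 }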

Command-line interface

Dyno provides a CLI tool for reading and writing individual records, scanning tables, and exporting and importing data into a DynamoDB table.
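
For instance, based on the scan-and-put idiom quoted in the issues below (the region/table argument format is an assumption; run dyno --help for the exact usage):

# Copy every record from one table to another.
dyno scan us-east-1/source-table | dyno put us-east-1/destination-table

# Dump a table (schema on the first line, one record per line after), then load it back.
dyno export us-east-1/my-table > backup.jsonl
dyno import us-east-1/my-table < backup.jsonl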

Documentation

See API.md for documentation of the JavaScript client, and use dyno --help for details on CLI usage.

dyno's People

Contributors

ae-mo, alexanderbelokon, ingalls, mapsam, mcwhittemore, miafan23, mick, morganherlocker, rclark, sgillies, springmeyer, tcql, tmcw, vsmart, willwhite, yhahn


dyno's Issues

Conditional update shortcut?

dyno does a nice job giving us a shortcut for writing queries / filters

dyno.query({ id: { EQ: 'some-value' } }, function(err)...

Could it do the same for the Expected conditions during putItem/updateItem? Currently you pass exactly what is fed to the DynamoDB API.

something along the lines of:

{
    expected: {
        conditionalOperator: 'OR', // defaults to 'AND'
        attributename: { EQ: 'somevalue' },
        anotherattribute: { BETWEEN: [ 'a', 'b' ] }
    }
}
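
For reference, the raw DynamoDB legacy parameters that this shorthand would expand to look like:

{
  ConditionalOperator: 'OR',
  Expected: {
    attributename: {
      ComparisonOperator: 'EQ',
      AttributeValueList: [{ S: 'somevalue' }]
    },
    anotherattribute: {
      ComparisonOperator: 'BETWEEN',
      AttributeValueList: [{ S: 'a' }, { S: 'b' }]
    }
  }
}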

Refactor to make better use of native aws-sdk

I've been thinking critically over the weekend about dyno. I'm feeling like several of the helpful aspects that dyno has provided have become convoluted or are now supported by the AWS SDK for JS:

  • dyno has lagged and has no clear plan for the support of condition & update expressions, leaving it unable to manipulate parts of List and Map attributes atomically
  • dyno recently developed mixed support for "dyno-style" objects and for "wire formatted" objects. This continues to be awkward to manage and maintain
  • dyno's lingo for specifying conditions and pagination isn't bad, but it is certainly not as well documented as the aws-sdk's
  • aws-sdk now has embedded support for "dyno-style" objects: https://aws.amazon.com/about-aws/whats-new/2015/09/amazon-dynamodb-develop-locally-scale-globally/

This has left me trying to list out what dyno's biggest contributions are at this point. I wonder if dyno needs to undergo some significant refactoring to focus on these aspects:

  • Able to turn the caller's batch needs into multiple HTTP requests that fall within the constraints of DynamoDB's batch request limits
  • Readable streams of query and scan results that emit per-item and cross pagination boundaries
  • Present, but poor, support for aggregating consumed capacity metrics across multiple requests
  • Some pretty useful CLI tools
  • Dyno.multi, which basically just routes read and write requests to two different tables

I believe we might be able to make dyno more maintainable by building the dyno client as an extension of an aws-sdk client, without overriding that client's methods like we do now. We could inherit getItem, putItem, etc. from the aws-sdk, and provide our own methods focused on the things dyno can still contribute.

Definitely looking for thoughts and input on this @mick @willwhite. Did I miss anything that you consider important dyno functionality that's not offered by the aws-sdk? Talk me back from the ledge please.

Allow ReturnConsumedCapacity

We should let requests ReturnConsumedCapacity. @mick what do you think about:

dyno.putItems(items, { capacity: true }, function(err, results, capacity) { });

The SDK provides a choice between INDEXES and TOTAL reports. I would just use the INDEXES option, as it seems to provide a more comprehensive report.

Handling batch write failures

Batch writes are performed in parallel chunks of 25, but if one chunk fails, the returned error will only contain unprocessed items from that chunk. Meanwhile, if other chunks of the same batch write request fail, their unprocessed items will not be returned.

@willwhite am I understanding that correctly?

better docs

More examples for:

  • query
  • scan
  • paging
  • streaming
  • query options

Mock dynamodb-stream objects in kinesis are kind of lies

I'm realizing that we've confused the distinction between records in a stream and events that something like Lambda or SNS would receive. What this boils down to is that it looks like we're putting a DynamoDB record, wrapped in a DynamoDB event, into a Kinesis record.

Then if I'm using lambda to consume the stream, I'm getting a dynamo record wrapped in a dynamo event wrapped in a kinesis record wrapped in a kinesis event. More JSON to help explain here: https://gist.github.com/rclark/a97a73a61fc124e7f78d. This highlights some of the subtle differences between dynamodb streams and kinesis streams that we can't really be aware of before being able to actually use dynamodb streams.

I think that a smarter approach for now would be to drop the dynamodb event wrapper. This means a test like this one would change its assertion to:

t.deepEqual(clean(found), {
    Data: JSON.stringify({
        Keys: { id: { S: 'the-id' } },
        SequenceNumber: '0',
        SizeBytes: 0,
        StreamViewType: 'KEYS_ONLY'
    }),
    PartitionKey: JSON.stringify({ id: { S: 'the-id' } })
}, 'expected updateItem params');

cc @mick @willwhite @freenerd

getAll abstraction

Would be great to have a fn that allows the client to pull all items from a table, abstracting away the need to page through and concatenate the data.
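
A hypothetical sketch of such a helper, assuming scan passes DynamoDB's Items and LastEvaluatedKey fields through unchanged (older dyno responses used a lowercase items, per the response-shape issue below):

function getAll(dyno, callback) {
  var all = [];
  (function page(startKey) {
    dyno.scan({ ExclusiveStartKey: startKey }, function(err, data) {
      if (err) return callback(err);
      all = all.concat(data.Items);
      // Keep paging until DynamoDB stops returning a LastEvaluatedKey.
      if (data.LastEvaluatedKey) return page(data.LastEvaluatedKey);
      callback(null, all);
    });
  })(undefined);
}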

support parallel scans

This should be relatively simple -- we just need to allow users to pass Segment and TotalSegments params to dyno.scan as options.
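
Under that proposal, a caller running the second of four parallel workers would pass DynamoDB's standard parameters straight through (a sketch):

dyno.scan({
  Segment: 1,       // this worker's segment, 0-indexed
  TotalSegments: 4  // total number of parallel workers
}, function(err, data) {
  if (err) throw err;
});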

`dyno import` does not seem to work

Hello,

Using dyno export produces a file whose first line is the table definition; the remaining lines are the contents of the table. Unfortunately, dyno import fails to parse the first line because it expects DynamoDB-typed records like those on the following lines.

Has anyone already used dyno import successfully?

All the best,
Cam

Allow caller to determine type on consumed capacity

Make these options allowable:

If set to TOTAL, the response includes ConsumedCapacity data for tables and indexes. If set to INDEXES, the response includes ConsumedCapacity for indexes. If set to NONE (the default), ConsumedCapacity is not included in the response.

Possible values include:
"INDEXES"
"TOTAL"
"NONE"

dyno import does not work

/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/request.js:30
throw err;
^
TypeError: Cannot read property 'length' of undefined
at validateList (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:75:31)
at validateMember (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:63:21)
at validateStructure (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:48:14)
at validateMember (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:61:21)
at validate (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/param_validator.js:9:10)
at Request.VALIDATE_PARAMETERS (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/event_listeners.js:88:32)
at Request.callListeners (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at callNextListener (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
at /usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/event_listeners.js:75:9
at finish (/usr/local/lib/node_modules/dyno/node_modules/aws-sdk/lib/config.js:244:7)

The first line of the JSON file is as follows:
{"AttributeDefinitions":[{"AttributeName":"id","AttributeType":"S"}],"TableName":"AtomicCounter","KeySchema":[{"AttributeName":"id","KeyType":"HASH"}],"ProvisionedThroughput":{"ReadCapacityUnits":10,"WriteCapacityUnits":10},"TableArn":"arn:aws:dynamodb:us-west-1:1111111111:table/AtomicCounter"}

Catch undefined value

When a value is undefined for an item property, calling toDynamoTypes throws Cannot read property 'datatype' of undefined, which originates in the call to formatDataType in the dynamo-doc code (https://github.com/mapbox/dyno/blob/master/lib/types.js#L17).

As per discussion, an undefined value should be ignored and not sent to dynamo, unless the attribute is a database key, in which case it should call back with an error rather than throwing.
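
A minimal sketch of that behavior (stripUndefined is a hypothetical helper, not part of dyno's API):

// Hypothetical: drop undefined attributes before serialization. Undefined
// key attributes would instead need to produce a callback error.
function stripUndefined(item) {
  Object.keys(item).forEach(function(key) {
    if (item[key] === undefined) delete item[key];
  });
  return item;
}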

query support

Simplify query.

will look something like:

dyno.query({ id: { 'EQ' : 'yo' }, range: { 'BETWEEN': [4,6] } }, itemResp);

NULL/NOT_NULL support for conditionals

NULL/NOT_NULL doesn't make sense in the context of queries because dynamo doesn't support null values, but it is needed for expectations in updates so you can ensure that a property is set before allowing the request to succeed.

batchWriteItem smart batching

Given an array of many things to put in an index, split them into chunks and write them with one or more BatchWriteItem calls.
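
A hypothetical chunking helper illustrating the idea:

// Split write requests into groups of 25, the per-call BatchWriteItem limit.
function chunk(requests, size) {
  var chunks = [];
  for (var i = 0; i < requests.length; i += size) {
    chunks.push(requests.slice(i, i + size));
  }
  return chunks;
}

var batches = chunk(allWriteRequests, 25); // then one BatchWriteItem call per batch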

0.7 Release

@mick how about a 0.7 release for the paging support? I just added a test of paging with an offset start, the key feature for cardboard paging.

`BatchGetItems` implementation

Hi all,

Love dyno.

Would love to see BatchGetItems implemented.

Here's a super simple implementation that could obviously be made much more robust (chunking, etc) but works for our simple use case.

// Assumes underscore (_), dyno's internal types helpers, and the
// dynamoRequest/opts context available in dyno's lib.
items.getItems = function(items, cb) {
    var getItemsRequest = {
        // Map each table name to the set of keys to fetch, converting
        // native objects to DynamoDB's typed format.
        RequestItems: _(items).keys().reduce(function(requestItemsList, itemKey) {
            requestItemsList[itemKey] = {Keys: types.toDynamoTypes(items[itemKey])};
            return requestItemsList;
        }, {})
    };

    return dynamoRequest('batchGetItem', getItemsRequest, opts, cb);
};

Add batchWriteItem

DynamoDB has BatchWriteItem, which allows you to put and delete records within a single request. Dyno breaks away from this pattern and instead implements two separate batch functions for putting and deleting. What's the thinking behind this change?

We should expose a function that allows users to take advantage of the combined request.

/cc @rclark

Attach request ids to error objects

Dyno could ensure that an error object always comes with request ids attached, if this doesn't get implemented upstream.

Basically, instead of passing through native aws-sdk functions, we would write small wrappers. For example, from index.js:

// this is what we do now
var dyno = {
  putItem: docClient.put.bind(docClient)
};

// we could, instead
var dyno = {
  putItem: function(params, callback) {
    return docClient.put(params, callback)
      .on('retry', function(response) {
        if (!response.error) return;
        response.error.amzId = response.httpResponse.headers['x-amz-request-id'];
        response.error.amzId2 = response.httpResponse.headers['x-amz-id-2'];
      });
  }
}

cc @willwhite

use Select: 'ALL_PROJECTED_ATTRIBUTES' when no attributes on query

Otherwise you get:

[ValidationException: One or more parameter values were invalid: Select type ALL_ATTRIBUTES is not supported for global secondary index cell because its projection type is not ALL]

This happens when the index doesn't project all the attributes. From the DynamoDB docs for ALL_PROJECTED_ATTRIBUTES:

If the index is configured to project all attributes, this is equivalent to specifying ALL_ATTRIBUTES.
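
A sketch of the proposed default, using standard aws-sdk query parameters:

// When querying a GSI without asking for specific attributes, fall back to
// the index's projection instead of ALL_ATTRIBUTES.
if (params.IndexName && !params.AttributesToGet && !params.Select) {
  params.Select = 'ALL_PROJECTED_ATTRIBUTES';
}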

Unable to put more than 25 items

Hi!

For some reason, when using the dyno scan ... | dyno put ... idiom, I am unable to import more than 25 items into my dynamodb-local table. It is unclear to me why this happens; here is the stack trace I got:

undefined:0


SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Function.Dyno.deserialize (/usr/local/lib/node_modules/dyno/index.js:76:16)
    at Transform.Parser.parser._transform (/usr/local/lib/node_modules/dyno/bin/cli.js:90:23)
    at Transform._read (_stream_transform.js:179:10)
    at Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Transform.Writable.write (_stream_writable.js:217:11)
    at Stream.ondata (stream.js:51:26)
    at Stream.emit (events.js:107:17)

Any idea?

Best,
Camille

Auto-paginate .query and .scan requests

Propose adding an additional numeric parameter Pages to query and scan requests. These would come in alongside all the other options that the aws-sdk already accepts. The idea would be:

  • a 0 would call back with an error, as this is an invalid parameter value
  • an undefined, null, or 1 value would return 1 page of results (default behavior)
  • any other number will return up to that number of pages
  • a value of Infinity would return all available pages

So an example request like this one would return up to 10 pages of results:

dyno.query({
  ExpressionAttributeNames: { '#c': 'collection' },
  ExpressionAttributeValues: { ':c': 'my-collection' },
  KeyConditionExpression: '#c = :c',
  Pages: 10
}, function() {});

The response should look identical to the response for any single page, except that data.Items would include all the records that were read. Additionally, if ReturnConsumedCapacity is requested, those metrics should be the sum of capacity consumed by all the requests performed.

cc @mick

Eliminate shared config behavior

dyno currently shares config across multiple instances inside a single process. Instead, each instance should be independently configurable.

requestSet.sendAll should automatically retry unprocessed items

Almost everywhere I've used it, I've ended up writing something like:

var params = {
  RequestItems: { 'my-table': [ ... ] }
};

(function write(requestSet) {
  requestSet.sendAll(10, function(err, responses, unprocessed) {
    if (err) return callback(err);
    if (unprocessed) return write(unprocessed);
    callback(null, responses);
  });
})(dyno.batchWriteItemRequests(params));

This is essentially just "retry until there are no unprocessed items left". This should be the default behavior of .batchWriteItemRequests and .batchGetItemRequests. There should also be a way to opt out of this behavior.

Import: schema optional?

The import CLI command currently requires that the first line of the stream be a table schema definition. If this were optional, it would be useful for bulk-importing new docs into an existing empty table.

raw option

Right now update, put, and scan all heavily modify the objects that get sent to dynamo, as well as the response.

Add a raw option to bypass these enhancements.

automatic paging for scan and queries

If the caller is using a callback, this should be optional, defaulting to false. If they are streaming, it should be optional, defaulting to true.

This will make #2 possible

dyno.create, dyno.queryBatchGet

Add two functions:

  • dyno.create: performs a conditional putItem request that asserts that the key of the item you're about to create does not already exist (see the sketch after this list)
  • dyno.queryBatchGet: performs a series of requests -- Query on a GSI that uses a keys-only projection, then BatchGetItem to fetch the individual documents and flesh out the query response.
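
A sketch of what dyno.create might do under the hood, assuming id is the table's partition key:

// Hypothetical: fail the put if an item with this key already exists.
dyno.putItem({
  Item: { id: 'my-record', version: 1 },
  ConditionExpression: 'attribute_not_exists(id)'
}, function(err) {
  // A ConditionalCheckFailedException means the key was already taken.
  if (err) throw err;
});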

Normalize responses across requests

Currently:

getItem: { Item: { dyno-style item } }
putItem: {}
updateItem: {}
query: { count: X, items: [ dyno-style items ] }
scan: { count: X, items: [ dyno-style items ] }
batch requests: You get back what the aws-sdk gives you

Questions

  • Can getItem just return the item without the wrapper?
  • Is it important to return count on query/scan?
  • batch requests should have a simplified description of the results #27
  • how to return consumed capacity information #29

Dyno.session()

Spitballing an idea here, looking for feedback. What if Dyno.session() provided you a dyno client that could keep track of things across all the queries that you made within that session?

Some examples of what might be nice to have stashed in memory alongside your client:

  • table definitions: knowing the key schema can allow you to calculate the size of items as they are stored in DynamoDB. It could also help with higher-level functions like #101
  • you could keep track of how much storage you've added within the session
  • a running count on capacity consumed across requests within your session

cc @mick @emilymcafee @willwhite

Support for ConditionExpression

It seems like Expected is deprecated.

Expected - (map): There is a newer parameter available. Use ConditionExpression instead. Note that if you use Expected and ConditionExpression at the same time, DynamoDB will return a ValidationException exception.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.html

And has been replaced by ConditionExpression:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.SpecifyingConditions.html
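
For comparison, a legacy Expected condition and its ConditionExpression equivalent (a sketch using standard DynamoDB parameters; values are native here because dyno extends the DocumentClient):

// Legacy:
// Expected: { version: { ComparisonOperator: 'EQ', AttributeValueList: [{ N: '5' }] } }

// Modern equivalent:
{
  ConditionExpression: '#v = :v',
  ExpressionAttributeNames: { '#v': 'version' },
  ExpressionAttributeValues: { ':v': 5 }
}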

travis test use real dynamo?

There are some edge cases that mocked DynamoDB can miss. It would be great to have Travis double-check the tests against a real DynamoDB.
