mapbox / dynamodb-migrator
For migration and cleanup operations on your DynamoDB table
License: ISC License
Many migration scripts are meant to target a very specific table and could lead to Bad Things if run on others.
I've currently hacked in some sanity checks on the table being targeted (bailing out on a mismatch) by inspecting process.argv from my migrate module, but it would be cleaner and more reliable for a setup or preflight function to be called with these options (method, region, and table might be sufficient?).
I can take a quick pass at implementing this if this sounds reasonable.
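One way the proposed hook could look is sketched below. Everything here is hypothetical: `preflight`, the shape of its `options` argument, and the `runPreflight` helper illustrate the idea and are not part of dynamodb-migrator's actual API.

```javascript
// Hypothetical migration module: `preflight` is the proposed hook, not an existing API.
const migration = {
  // Assumed to be called once, before any record is touched, with the CLI options.
  preflight(options) {
    const expected = { region: 'us-east-1', table: 'my-table' }; // placeholder values
    if (options.region !== expected.region || options.table !== expected.table) {
      throw new Error(
        'Refusing to run against ' + options.region + '/' + options.table
      );
    }
  }
};

// Hypothetical runner-side helper: calls the hook only if the migration defines one.
function runPreflight(mod, options) {
  if (typeof mod.preflight === 'function') mod.preflight(options);
  return true;
}

// Passes silently because region and table match the expected target.
runPreflight(migration, { method: 'scan', region: 'us-east-1', table: 'my-table' });
```

Throwing from the hook lets the runner bail out before reading or writing anything, which replaces the process.argv inspection described above.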
The change I made in #3 (use Dyno.deserialize, expecting wire-format from a backup instead of line-delimited JSON) doesn't play nice with feeding in a list of keys that look like { id: "foo", collection: "foo:bar"}.
I think the migrator should continue to support line-delimited JSON, especially in stream mode (i.e. for "give me a list of keys to target, and I'll do stuff to them"). Maybe this should be an additional option passed only in stream mode, like a --json flag or something. Since I messed this up in the first place I can take this on, let me know what you think @rclark.
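As a rough illustration of what line-delimited JSON input means here, the sketch below parses one key object per line. `parseJsonLines` is a hypothetical helper, not code from the migrator, and it assumes each line is strict JSON (quoted property names), unlike the shorthand key shown above.

```javascript
// Hypothetical helper: parses line-delimited JSON, one key object per line.
function parseJsonLines(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // tolerate blank lines and a trailing newline
    .map((line, i) => {
      try {
        return JSON.parse(line);
      } catch (err) {
        // `i` counts non-empty lines only, since blank lines were filtered out above.
        throw new Error('Invalid JSON on non-empty line ' + (i + 1) + ': ' + err.message);
      }
    });
}

const input =
  '{"id":"foo","collection":"foo:bar"}\n' +
  '{"id":"baz","collection":"foo:bar"}\n';
const keys = parseJsonLines(input);
console.log(keys.length, keys[0].id);
```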
Right now we store all objects in a buffer while we wait to run them.
For many migrations, you will want to filter out objects that don't need to be migrated.
We should consider supporting a filter function that takes the object and MUST return true or false.
This filter should run before we add the object to the queue.
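A minimal sketch of filtering before enqueueing follows. The names `enqueueFiltered` and `needsMigration`, and the `version` attribute, are made up for illustration; the migrator's real queue and record shapes may differ.

```javascript
// Hypothetical sketch: none of these names come from dynamodb-migrator itself.
function enqueueFiltered(records, filter, queue) {
  for (const record of records) {
    const keep = filter(record);
    // The proposal says the filter MUST return true or false, so reject anything else.
    if (typeof keep !== 'boolean') {
      throw new TypeError('filter must return true or false');
    }
    // Only records that pass the filter ever reach the buffer.
    if (keep) queue.push(record);
  }
  return queue;
}

// Example filter: only records that still lack a `version` attribute need migrating.
const needsMigration = (record) => record.version === undefined;

const queue = enqueueFiltered(
  [{ id: 'a' }, { id: 'b', version: 2 }, { id: 'c' }],
  needsMigration,
  []
);
console.log(queue.map((r) => r.id));
```

Because rejected records are dropped before they are pushed, the buffer only grows with objects that will actually be migrated.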
Provide a simple shell script that reads a backup file, runs a user-provided function over each line, and pipes the output into another file.
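The same idea can be sketched in Node rather than shell. `transformFile` below is a hypothetical helper, and the file names and the `migrated` attribute are placeholders. For brevity it reads the whole file into memory, so a large backup would likely need a streaming reader instead.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: applies `fn` to each non-empty line of `inFile`
// and writes the results, one per line, to `outFile`.
function transformFile(inFile, outFile, fn) {
  const lines = fs
    .readFileSync(inFile, 'utf8')
    .split('\n')
    .filter((l) => l.length > 0);
  const out = lines.map((line) => fn(line));
  fs.writeFileSync(outFile, out.join('\n') + (out.length ? '\n' : ''));
  return out.length; // number of lines processed
}

// Demo against a throwaway directory so no real backup is touched.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'migrate-'));
const inFile = path.join(dir, 'backup.json');
const outFile = path.join(dir, 'migrated.json');
fs.writeFileSync(inFile, '{"id":"a"}\n{"id":"b"}\n');

const count = transformFile(inFile, outFile, (line) => {
  const record = JSON.parse(line);
  record.migrated = true; // example of a user-provided change
  return JSON.stringify(record);
});
console.log('processed', count, 'lines');
```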
Issues #6 and #10 bring up some problems with logging, but logging also has serious memory buffering issues during large migrations. I'm going to take a pass through to see if it can be improved, but there's a pretty good chance we can't without impacting the way folks expect to be able to write logs in their migration scripts.
We may have to fall back to just console.log + piping to a file as part of the bash command.
The table name being operated on is not exposed at runtime through the dyno object. If developers are using TypeScript, the table name is a required part of the update expression. So, to avoid the compilation error, developers must give their migrations another way to get the table name being worked on, duplicating what was already passed in the command line arguments used to invoke dynamodb-migrator.
If the dyno object made the table name string available, we'd be able to avoid this extra step.
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html#scan-property
@rclark - possible fix for high cpu migrations running at low concurrencies?
Right now we support scan and stream. While we have filter functionality to let you transform a stream so the migration will only run on some items in the export, we can only use this where a database export is usable.
To support filtered results where an export is not viable, we should consider supporting the query operation.
This would require the migration file to export a query attribute. I'm torn on whether it should be a function that accepts a dyno instance or a JSON object that describes the dynamo query.
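To make the two options concrete, here is a sketch that accepts either shape. `resolveQuery`, both example modules, and the `defaultLimit` property on the stand-in dyno object are hypothetical. The parameter names inside the query object (`KeyConditionExpression`, `ExpressionAttributeNames`, `ExpressionAttributeValues`, `Limit`) are standard DynamoDB query parameters.

```javascript
// Hypothetical normalizer: accepts a migration whose `query` is either a plain
// object or a function of the dyno instance, and returns query parameters.
function resolveQuery(migration, dyno) {
  const q = migration.query;
  if (typeof q === 'function') return q(dyno);
  if (q !== null && typeof q === 'object') return q;
  throw new TypeError('migration must export a query object or function');
}

// Option 1: a JSON object that describes the query.
const declarative = {
  query: {
    KeyConditionExpression: '#c = :c',
    ExpressionAttributeNames: { '#c': 'collection' },
    ExpressionAttributeValues: { ':c': 'foo:bar' }
  }
};

// Option 2: a function that receives the dyno instance and builds the query.
const functional = {
  query: (dyno) => Object.assign({}, declarative.query, { Limit: dyno.defaultLimit })
};

const fakeDyno = { defaultLimit: 25 }; // stand-in only; not the real dyno client
console.log(resolveQuery(declarative, fakeDyno));
console.log(resolveQuery(functional, fakeDyno));
```

Supporting both keeps simple cases declarative while still letting a migration compute its query at runtime.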
I think this is supposed to be logged at the beginning of a migration, but right now it's logged at the end. It would be useful to have this up front.
It would be awesome to get a timestamp on log files, and maybe other descriptive information, too.
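As a small illustration of the timestamp idea, a log line could be prefixed with an ISO 8601 time. `formatLogLine` is a hypothetical helper, not something the migrator provides.

```javascript
// Hypothetical formatter: prefixes a log message with an ISO 8601 timestamp.
// `now` is injectable so the output is predictable in tests.
function formatLogLine(message, now = new Date()) {
  return '[' + now.toISOString() + '] ' + message;
}

const line = formatLogLine('migrated record foo', new Date(0));
console.log(line);
```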
The rate flag only works right now when you're running in scan mode. stream + --rate = crash