bchew / dynamodump
Simple backup and restore for Amazon DynamoDB using AWS SDK for Python (boto3)
License: MIT License
Hello,
Thank you for your work on this project.
I am using 1.9.0 from pip to back up and restore a DynamoDB database. There appears to be a bug when restoring binary object data inside maps; I assume this would affect lists as well.
I've set up an example dataset and used this command to dump the data:
~/.local/bin/dynamodump -m backup -r local -s Test --accessKey a --secretKey a
{
"Items": [
{
"SubObject": {
"M": {
"BinaryInMap": {
"B": "MA=="
}
}
},
"PK": {
"S": "1"
},
"BinaryData": {
"B": "MA=="
}
}
],
"Count": 1,
"ScannedCount": 1
}
Now if we restore the data back to the database as-is:
~/.local/bin/dynamodump -m restore -r local -s Test -d Test --accessKey a --secretKey a
{
"Items": [
{
"SubObject": {
"M": {
"BinaryInMap": {
"B": "TUE9PQ=="
}
}
},
"PK": {
"S": "1"
},
"BinaryData": {
"B": "MA=="
}
}
],
"Count": 1,
"ScannedCount": 1
}
The BinaryInMap key in the restored database has been restored with an incorrect value, TUE9PQ==. If we base64-decode TUE9PQ==, we get MA==, the correct value. So the base64 value is getting double-encoded.
This bug appears to only affect values inside maps (and probably lists), the top-level binary values appear to be restored correctly.
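The double encoding is easy to reproduce in plain Python; the raw byte value b"0" below is an assumption consistent with the example dump:

```python
import base64

raw = b"0"                        # assumed raw binary attribute value
once = base64.b64encode(raw)      # what the dump file stores
twice = base64.b64encode(once)    # what ends up in the restored table

print(once.decode())   # MA==
print(twice.decode())  # TUE9PQ==
```

Encoding MA== a second time yields exactly the TUE9PQ== seen in the restored table.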
I'm not sure whether this is a dynamodump issue or a localstack issue, but since the AWS CLI works against localstack, I assume it's a dynamodump issue.
bash-4.2# dynamodump --host xxxxx-container-localstack --port 4566 -m backup -r local -s xxxxxxxxx --accessKey test --secretKey test
INFO:root:Found 1 table(s) in DynamoDB host to backup: comm-dev-pricing-price-retails
INFO:root:Starting backup for xxxxxxxxx.
INFO:root:Dumping table schema for xxxxxxxxx
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/site-packages/dynamodump/dynamodump.py", line 673, in do_backup
table_desc = dynamo.describe_table(TableName=table_name)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 415, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 745, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the DescribeTable operation: Cannot do operations on a non-existent table
^CTraceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/dynamodump/dynamodump.py", line 1337, in main
args.read_capacity,
AttributeError: 'Namespace' object has no attribute 'read_capacity'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/dynamodump", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/dynamodump/dynamodump.py", line 1367, in main
q.join()
File "/usr/lib64/python3.8/queue.py", line 89, in join
self.all_tasks_done.wait()
File "/usr/lib64/python3.8/threading.py", line 302, in wait
waiter.acquire()
KeyboardInterrupt
Amazon now allows multiple GSIs; I suspect that may be the cause of this failure:
python dynamobackup.py -m restore -r us-west-2 -s dev_UserSeeds -d dev_UserSeeds4
INFO:root:dev_UserSeeds4 table deleted!
INFO:root:Starting restore for dev_UserSeeds to dev_UserSeeds4..
INFO:root:Creating dev_UserSeeds4 table with temp write capacity of 100
INFO:root:Waiting for dev_UserSeeds4 table to be created.. [CREATING]
INFO:root:Waiting for dev_UserSeeds4 table to be created.. [CREATING]
INFO:root:dev_UserSeeds4 created.
INFO:root:Restoring data for dev_UserSeeds4 table..
INFO:root:Processing 0001.json of dev_UserSeeds4
Traceback (most recent call last):
File "dynamobackup.py", line 419, in
do_restore(conn, sleep_interval, args.srcTable, dest_table, args.writeCapacity)
File "dynamobackup.py", line 291, in do_restore
batch_write(conn, sleep_interval, destination_table, put_requests)
File "dynamobackup.py", line 118, in batch_write
response = conn.batch_write_item(request_items)
File "/Library/Python/2.7/site-packages/boto/dynamodb2/layer1.py", line 420, in batch_write_item
body=json.dumps(params))
File "/Library/Python/2.7/site-packages/boto/dynamodb2/layer1.py", line 2842, in make_request
retry_handler=self._retry_handler)
File "/Library/Python/2.7/site-packages/boto/connection.py", line 953, in _mexe
status = retry_handler(response, i, next_sleep)
File "/Library/Python/2.7/site-packages/boto/dynamodb2/layer1.py", line 2882, in _retry_handler
response.status, response.reason, data)
boto.dynamodb2.exceptions.ValidationException: ValidationException: 400 Bad Request
{u'message': u'One or more parameter values were invalid: Type mismatch for Index Key thingId Expected: S Actual: M IndexName: gsi_UserSeeds_thingId', u'__type': u'com.amazon.coral.validate#ValidationException'}
It would be very handy to be able to use this simply to create a table (keys, indexes, etc.) without loading any data into it. That would make it easy to repeatably create a new setup. A simple true/false flag as a parameter would do.
Along with restore, can we have a sync function, where any stale keys in the target table are deleted?
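One way such a sync pass could work is a set difference between the dumped keys and the target table's keys; this is only a sketch, and the function name is hypothetical, not part of dynamodump:

```python
import json

def stale_keys(source_keys, target_keys):
    """Return keys present in the target table but absent from the dump.

    Keys are DynamoDB-style dicts, e.g. {"PK": {"S": "1"}}. They are
    canonicalised via json.dumps so nested structures compare equal.
    """
    source = {json.dumps(k, sort_keys=True) for k in source_keys}
    return [k for k in target_keys if json.dumps(k, sort_keys=True) not in source]
```

The returned keys would then be fed to BatchWriteItem DeleteRequest entries after the restore completes.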
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: Syntax error: expecting end of expression or separator near } "pre-
Hi, I'm having trouble using dynamodump with a wildcard source table specification. I want to dump a collection of tables to disk, with names like test.atlas.checks and test.atlas.deploys. I want all the tables matching the glob test.atlas.*, but it fails even when I use --srcTable test*.
Here's the command I'm using:
python dynamodump.py --mode backup --region us-east-1 --srcTable test* --log debug
Here's the relevant part of the debug output. Boto returned a list of all my tables, but dynamodump didn't match the wildcard as I expected.
DEBUG:boto:Saw HTTP status: 200
DEBUG:boto:Validating crc32 checksum for body: {"TableNames":["prod.atlas.api_keys","prod.atlas.checks","prod.atlas.deploys","prod.atlas.graphs","prod.atlas.hosts","prod.atlas.regions","prod.atlas.services","prod.scorekeeper.requests_seen","test.atlas.checks","test.atlas.deploys","test.atlas.graphs","test.atlas.hosts","test.atlas.regions","test.atlas.services","test.scorekeeper.requests_seen"]}
DEBUG:boto:{"TableNames":["prod.atlas.api_keys","prod.atlas.checks","prod.atlas.deploys","prod.atlas.graphs","prod.atlas.hosts","prod.atlas.regions","prod.atlas.services","prod.scorekeeper.requests_seen","test.atlas.checks","test.atlas.deploys","test.atlas.graphs","test.atlas.hosts","test.atlas.regions","test.atlas.services","test.scorekeeper.requests_seen"]}
INFO:root:Found 0 table(s) in DynamoDB host to backup:
INFO:root:Backup of table(s) test* completed!
I took a look at the code but I don't really understand how the wildcard matching is supposed to work. Is there any way I can dump all the test.atlas.* tables?
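For reference, the behaviour I expected is what Python's standard fnmatch module does; this sketch only illustrates that expectation against the table names from the debug output, not how dynamodump actually matches:

```python
import fnmatch

# Abbreviated table list as returned by ListTables in the debug output above.
table_names = ["prod.atlas.checks", "test.atlas.checks", "test.atlas.deploys"]

print(fnmatch.filter(table_names, "test.atlas.*"))
# ['test.atlas.checks', 'test.atlas.deploys']
print(fnmatch.filter(table_names, "test*"))
# ['test.atlas.checks', 'test.atlas.deploys']
```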
It's nice and simple the way it works now. It would be great to be able to specify something on the output so that multiple versions could be kept separately.
Hello.
The --srcTable option doesn't accept dash characters.
For example, my-tables* results in "found 0 tables" on backup or restore.
Thanks for this great tool.
Hi, really nice project, keep it up.
I just found one issue when I tried to back up and restore a table with a LocalSecondaryIndex. It turns out that IndexSizeBytes, ItemCount and IndexArn are not accepted when creating a LocalSecondaryIndex.
Here are the error messages:
[ec2-user@ip-10-0-1-110 dynamodump]$ dynamodump -m restore -r ap-southeast-2 -s demo2 --schemaOnly
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
About to delete table demo2. Type 'yes' to continue: yes
INFO:root:demo2 table deleted!
INFO:root:Starting restore for demo2 to demo2..
INFO:root:Creating demo2 table with temp write capacity of 25
Traceback (most recent call last):
File "/home/ec2-user/.local/bin/dynamodump", line 8, in
sys.exit(main())
File "/home/ec2-user/.local/lib/python3.7/site-packages/dynamodump/dynamodump.py", line 1497, in main
billing_mode=args.billingMode,
File "/home/ec2-user/.local/lib/python3.7/site-packages/dynamodump/dynamodump.py", line 924, in do_restore
**optional_args
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/client.py", line 514, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/client.py", line 902, in _make_api_call
api_params, operation_model, context=request_context
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/client.py", line 963, in _convert_to_request_dict
api_params, operation_model
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/validate.py", line 381, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in LocalSecondaryIndexes[0]: "IndexSizeBytes", must be one of: IndexName, KeySchema, Projection
Unknown parameter in LocalSecondaryIndexes[0]: "ItemCount", must be one of: IndexName, KeySchema, Projection
Unknown parameter in LocalSecondaryIndexes[0]: "IndexArn", must be one of: IndexName, KeySchema, Projection
After I removed IndexSizeBytes, ItemCount and IndexArn from the dumped schema, it worked perfectly fine.
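A minimal sketch of that cleanup (the helper name is hypothetical, not dynamodump's): DescribeTable returns these read-only fields, but CreateTable only accepts IndexName, KeySchema and Projection for an LSI, so they have to be stripped before the restore call.

```python
# Fields returned by DescribeTable that CreateTable rejects for an LSI.
READ_ONLY_KEYS = ("IndexSizeBytes", "ItemCount", "IndexArn")

def clean_local_secondary_indexes(indexes):
    """Strip describe-only fields before passing indexes to create_table."""
    return [
        {k: v for k, v in index.items() if k not in READ_ONLY_KEYS}
        for index in indexes
    ]
```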
I tried to restore a table using this command: dynamodump -m restore -r ap-northeast-1 --billingMode PAY_PER_REQUEST -s demo2 --schemaOnly --log DEBUG
The source table has a global secondary index (attribute2-3-index) configured, and both read and write capacity are on-demand.
I encountered the following error message:
DEBUG:botocore.endpoint:Sending http request: <AWSPreparedRequest stream_output=False, method=POST, url=https://dynamodb.ap-northeast-1.amazonaws.com/, headers={'X-Amz-Target': b'DynamoDB_20120810.CreateTable', 'Content-Type': b'application/x-amz-json-1.0', 'User-Agent': b'Boto3/1.24.90 Python/3.7.16 Linux/5.10.165-143.735.amzn2.x86_64 Botocore/1.27.96', 'X-Amz-Date': b'20230307T094011Z', 'X-Amz-Security-Token': b'IQoJb3JpZ2luX2VjEHEaCXVzLWVhc3QtMSJHMEUCIH65HH7LPsAOQGdibadIbyTqvf202wOAI0LCi+fzChXYAiEAvEcni1AWU4IfIq+7rCG5QhkBz2ofcN5BQCeEl+W3tJIqngIIKhAAGgw5Mzc2MjYyNTY1ODkiDGmaiHuauEuMbrmgXir7Add4/L/lxG2ppONGJLsiVYbpD4koRfojVSMZZLZxc5YIAKI5HJoAxXCQXP1dG68x9D52t5au6Ct147HqY2XLmdUYffSdqxReSlT4l4XAT/vQcXaaRgL2KoeTCfEJmamOt/d/tBTabaqJ4vGFlzRVTC1RrHZBUONrfpCe4bZlExpS1sTAMwRQddAzmRStf2nY5sZdD8A2WJ+pah1bp3NduFJXb+/i2aW4HRQp1E9awwLKbkqdu76Po84NH46wCVIPdTv3pHgli5Wf2FJvPK0h3CV0zvWZoTdUVXTStqTIf/+82eE/qW9P8byFEKeHtKQtZrIkTWoWGKVOfp+wMKH3m6AGOp0Belqc019kneOi/ITzhsbcwGfJjbXRBfd188ztwMMtNcU9TIeNozCUbE5ZOgAycoDGUwkMbIjS3ZXLJxxE6n9BsUg5ciqd01wYZFEK0wLIkQT4o81gIGuyNrnXQ2oZDXT8+41yUwRJWI84B0/7DVbkNiLs3o9HQSazrceG8MgbI2eQQsRRnuAmbUQhfl5oIocWha09sT0hIxkBvUpx9Q==', 'Authorization': b'AWS4-HMAC-SHA256 Credential=ASIA5UTXBITGW7I4SFUF/20230307/ap-northeast-1/dynamodb/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token;x-amz-target, Signature=dcf0d126229ff693964a9dd8933d62c10d6ea10baf436a7e89bcfddfd73db46e', 'amz-sdk-invocation-id': b'fa21e345-390f-4076-982e-65c35bd0b388', 'amz-sdk-request': b'attempt=1', 'Content-Length': '630'}>
DEBUG:botocore.httpsession:Certificate path: /home/ec2-user/.local/lib/python3.7/site-packages/botocore/cacert.pem
DEBUG:urllib3.connectionpool:https://dynamodb.ap-northeast-1.amazonaws.com:443 "POST / HTTP/1.1" 400 225
DEBUG:botocore.parsers:Response headers: {'Server': 'Server', 'Date': 'Tue, 07 Mar 2023 09:40:11 GMT', 'Content-Type': 'application/x-amz-json-1.0', 'Content-Length': '225', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'KS01FNVA0DCAVUT114ED3TKH7BVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '791733731'}
DEBUG:botocore.parsers:Response body:
b'{"__type":"com.amazon.coral.validate#ValidationException","message":"One or more parameter values were invalid: ProvisionedThroughput should not be specified for index: attribute2-3-index when BillingMode is PAY_PER_REQUEST"}'
DEBUG:botocore.hooks:Event needs-retry.dynamodb.CreateTable: calling handler <botocore.retryhandler.RetryHandler object at 0x7f375217afd0>
DEBUG:botocore.retryhandler:No retry needed.
ERROR:root:An error occurred (ValidationException) when calling the CreateTable operation: One or more parameter values were invalid: ProvisionedThroughput should not be specified for index: attribute2-3-index when BillingMode is PAY_PER_REQUEST
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.7/site-packages/dynamodump/dynamodump.py", line 924, in do_restore
**optional_args
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/client.py", line 514, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/botocore/client.py", line 938, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateTable operation: One or more parameter values were invalid: ProvisionedThroughput should not be specified for index: attribute2-3-index when BillingMode is PAY_PER_REQUEST
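A possible fix sketch, assuming the restore path passes dumped GSI definitions straight to create_table: when the billing mode is PAY_PER_REQUEST, drop ProvisionedThroughput from each GSI first. The function name is illustrative, not dynamodump's actual code.

```python
def clean_gsis_for_on_demand(gsis, billing_mode):
    """Drop ProvisionedThroughput from GSI definitions for on-demand tables.

    CreateTable rejects ProvisionedThroughput on an index when
    BillingMode is PAY_PER_REQUEST.
    """
    if billing_mode != "PAY_PER_REQUEST":
        return gsis
    return [
        {k: v for k, v in gsi.items() if k != "ProvisionedThroughput"}
        for gsi in gsis
    ]
```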
Hi Benny,
I used your repo to help a customer migrate their DynamoDB database from one account to another, and found it really useful. I would like to write an AWS blog post about this tool. I would like to check with you whether it is OK for me to write about it, and whether you could review and contribute to the post if possible; I can add you as a co-author. Let me know your thoughts. You can reach me at [email protected] if you have any queries.
Regards
Tan Jicong
dynamodump has an option to use local DynamoDB by setting the region to local. The problem is that the AWS regions are defined explicitly in an enum in the C# client, and local is not contained within that enum.
DynamoDB Local uses the region property to differentiate sets of tables, so if you create the tables under the region local and you aren't able to specify the region local when using the DynamoDB client, you aren't able to access any tables.
> aws dynamodb list-tables --endpoint-url http://localhost:2732 --region local
{
"TableNames": [
"Table1",
"Table2",
"Table3",
]
}
> aws dynamodb list-tables --endpoint-url http://localhost:2732 --region us-east-1
{
"TableNames": []
}
If we were able to add an additional flag such as --useDynamoDBLocal in tandem with a region specified by --region, then we could specify an existing AWS region so DynamoDB Local partitions the tables into a namespace that's accessible from typed-language clients where the local region does not exist.
This line in the file is the issue: if you pass "local", it sets the region to "local" as well, which makes the tables partition under the "local" region.
dynamodump/dynamodump/dynamodump.py
Line 1252 in 77e888f
I have 220 tables in DynamoDB, because I have a dev environment, a staging environment and a product environment.
I wanted to back up with 'product-*', but it could not be used (it cannot find the tables).
I worked around it by changing this line:
https://github.com/bchew/dynamodump/blob/master/dynamodump.py#L20
conn.list_tables("product-area", None)
"product-area" is my product environment's first table name, and with that it works.
Hello there.
May I know if it's possible to restore a DynamoDB table from AWS Account A to AWS Account B?
These are the steps I've taken:
1. Back up DynamoDB from AWS Account A to an S3 bucket in AWS Account B.
2. Restore from the AWS Account B S3 bucket to DynamoDB in AWS Account B.
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates are awaiting their schedule. Click on a checkbox to get an update now.
docker-compose.yml
.devcontainer/Dockerfile
mcr.microsoft.com/vscode/devcontainers/python 3.12
Dockerfile
python 3.12.2-alpine
.github/workflows/build.yml
actions/checkout v4
actions/setup-python v5
.github/workflows/codeql-analysis.yml
actions/checkout v4
github/codeql-action v3
github/codeql-action v3
github/codeql-action v3
.github/workflows/dependency-review.yml
actions/checkout v4
actions/dependency-review-action v4
.github/workflows/docker.yml
actions/checkout v4
docker/setup-qemu-action v3
docker/setup-buildx-action v3
docker/login-action v3
docker/login-action v3
docker/login-action v3
docker/metadata-action v5
docker/build-push-action v5
.github/workflows/linting.yml
actions/checkout v4
actions/setup-python v5
.github/workflows/stale.yml
actions/stale v9
.github/workflows/test.yml
actions/checkout v4
actions/setup-python v5
actions/checkout v4
actions/setup-python v5
actions/checkout v4
actions/setup-python v5
pyproject.toml
setuptools >=42
requirements-dev.txt
black ==24.3.0
flake8 ==7.0.0
pre-commit ==3.7.0
requirements.txt
boto3 ==1.34.75
six ==1.16.0
setup.py
boto3 ==1.34.75
six ==1.16.0
.pre-commit-config.yaml
psf/black-pre-commit-mirror 24.3.0
PyCQA/flake8 7.0.0
Hello, I am using dynamodump in an environment where we use session tokens that expire after less than an hour, and can expire before the backup or restore of a large table is complete. To get around this, I have a script that refreshes my session token and writes a new key/secret to ~/.aws/credentials. However, the operation will still sometimes fail because the boto client still has the old credentials in memory. To mitigate this, I would like a backoff/retry loop that waits a few seconds and tries again after encountering an ExpiredTokenException.
I will try to implement a solution soon and open a pull request, but I wanted to open this issue now in case anyone had a similar issue or knew of a workaround. I've tried refreshing the token before the advertised expiration time and issuing SIGSTOP/SIGCONT signals to the dynamodump process to suspend until the new credentials are written and resume, but no luck.
Example output from a request that failed due to an expired token:
[ec2-user@xxx ~]$ dynamodump --mode backup --srcTable mybigtable
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:root:Found 1 table(s) in DynamoDB host to backup: mybigtable
INFO:root:Starting backup for mybigtable..
INFO:root:Dumping table schema for mybigtable
INFO:root:Dumping table items for mybigtable
### some time later ###
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/ec2-user/.local/lib/python3.9/site-packages/dynamodump/dynamodump.py", line 729, in do_backup
scanned_table = dynamo.scan(
File "/home/ec2-user/.local/lib/python3.9/site-packages/botocore/client.py", line 553, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/ec2-user/.local/lib/python3.9/site-packages/botocore/client.py", line 1009, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when calling the Scan operation: The security token included in the request is expired
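The retry loop I have in mind looks roughly like this. It is only a sketch: the function names are mine, not dynamodump's, and the exception class stands in for botocore's ExpiredTokenException so the example is self-contained. The key point is rebuilding the client after the backoff, so boto3 re-reads the refreshed ~/.aws/credentials.

```python
import time

class ExpiredTokenError(Exception):
    """Stand-in for botocore's ExpiredTokenException in this sketch."""

def call_with_refresh(operation, rebuild_client, retries=5, delay=5.0):
    """Run operation(client), rebuilding the client on expired tokens.

    rebuild_client() must construct a fresh client, which is what forces
    boto3 to re-read the refreshed credentials file.
    """
    client = rebuild_client()
    for _ in range(retries):
        try:
            return operation(client)
        except ExpiredTokenError:
            time.sleep(delay)            # wait for the refresh script
            client = rebuild_client()    # pick up the new key/secret
    raise ExpiredTokenError("credentials still expired after retries")
```

In practice the except clause would catch botocore.exceptions.ClientError and check that the error code is "ExpiredTokenException" before retrying, re-raising anything else.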