
amazon-s3-resumable-upload's Introduction

S3 Resumable Migration Version 2

English README: README.en.md

Multi-threaded resumable transfer, designed for batch migration of large files: S3 upload, download to local, and object-storage-to-object-storage migration. Supports Amazon S3, Alibaba Cloud OSS, Tencent COS, Google GCS and other object storage compatible with the S3 API; Azure Blob Storage support is planned.
Version 2 covers every scenario in a single application, selected by configuration: single-node upload, single-node download, a cluster-mode node that scans source files, or a cluster-mode distributed transfer worker. It has been rewritten in Golang for better performance and adds a set of extended features: exclude lists, source no-sign-request, source request-payer, destination storage-class, destination ACL, transfer of object Metadata, and more.

Features

  • Multi-threaded concurrent transfer to multiple kinds of object storage, with resumable transfers and automatic retries. Multiple files are transferred concurrently to make full use of bandwidth, with an optimized flow-control mechanism. In a typical test, migrating 1.2 TB of data from S3 in us-east-1 to S3 in cn-northwest-1 took only one hour.

  • Supported sources and destinations: a local directory or a single file, Amazon S3, Alibaba Cloud OSS, Tencent COS, Google GCS and other object storage. There is no need to choose a work mode; specify the source and destination URLs or local paths and the tool detects the scenario and starts the transfer. A source or destination can be a single file or object, a whole directory, or an S3 bucket/prefix URL.

  • Data passes through the transfer node's memory one part at a time and is never written to the node's disk, which saves time and is more secure. Object sizes from 0 bytes up to the TB range are supported.

  • Supports setting the destination storage class, e.g. STANDARD|REDUCED_REDUNDANCY|STANDARD_IA|ONEZONE_IA|INTELLIGENT_TIERING|GLACIER|DEEP_ARCHIVE|OUTPOSTS|GLACIER_IR|SNOW

  • Supports specifying the destination S3 ACL: private|public-read|public-read-write|authenticated-read|aws-exec-read|bucket-owner-read|bucket-owner-full-control

  • Supports source object storage that is no-sign-request (anonymous access) or request-payer (requester pays).

  • Supports copying the source object's Metadata to the destination object. Note that this requires one extra Head request per object, which affects performance and increases request charges on the source S3.

  • Automatically compares file names and sizes between the source and destination buckets and only transfers objects that differ. By default it lists and transfers at the same time, i.e. it checks each destination object and transfers it right away, so the transfer starts immediately after the command is entered (similar to the AWS CLI). Optionally, set the -l flag to list the destination objects first and then transfer; listing is more efficient than head-checking objects one by one and also saves request charges.

  • Version 2 adds multi-threaded parallel List, which finishes listing much faster for buckets with many objects. For example, listing a bucket with 30 million objects normally takes more than 90 minutes (e.g. aws s3 ls), but with 64 concurrent List workers (16 vCPU) it takes only 1 to 2 minutes.

  • Supports writing the compared task list to a file; writing a log of messages already sent to SQS to a file; an exclude list, so that source keys or local paths matching the list are not transferred; a DRYRUN mode that only compares the source and destination buckets and reports counts and sizes without transferring data (see the example after this list); and a mode that skips comparison entirely and overwrites the destination directly.

  • Supports setting the resumable-transfer threshold, the number of parallel workers, the request timeout, the maximum number of retries, and whether to skip the confirmation prompt and run immediately.
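For example, a dry run that only compares two buckets and reports object counts and total size could look like the following (the bucket names here are placeholders, not from this project):

./s3trans s3://source-bucket/prefix s3://dest-bucket/prefix --work-mode DRYRUN -l -n 8 -y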

Usage Instructions

Install the Go runtime

The Golang runtime is required the first time you use the tool. Using Linux as an example:

sudo yum install -y go git

If you are in a region where access to the default Go module proxy is slow (e.g. mainland China), you can download Go dependencies through a Go proxy by running one extra command:

go env -w GOPROXY=https://goproxy.cn,direct   

Download and build this project

git clone https://github.com/aws-samples/amazon-s3-resumable-upload
cd amazon-s3-resumable-upload
go build .  # download dependencies and build the binary

Run ./s3trans -h to get help.

Usage

  • Download S3 files to local:
./s3trans s3://bucket-name/prefix /local/path
# The above uses the default AWS profile in ~/.aws/credentials. On EC2 with an IAM Role and no profile configured, specify the Region:
./s3trans s3://bucket-name/prefix /local/path --from-region=my_region
# To use a specific S3 profile:
./s3trans s3://bucket-name/prefix /local/path --from-profile=source_profile
  • Upload local files to S3:
./s3trans /local/path s3://bucket-name/prefix
# The above uses the default AWS profile in ~/.aws/credentials. On EC2 with an IAM Role and no profile configured, specify the Region:
./s3trans /local/path s3://bucket-name/prefix --to-region=my_region
# To use a specific S3 profile:
./s3trans /local/path s3://bucket-name/prefix --to-profile=dest_profile
  • From S3 to S3. If no region is specified, the tool first looks up the bucket's Region automatically:
./s3trans s3://bucket-name/prefix s3://bucket-name/prefix --from-profile=source_profile --to-profile=dest_profile
# If from-profile is not set, the default profile or the EC2 IAM Role is used; in that case specify the region:
./s3trans s3://bucket-name/prefix s3://bucket-name/prefix --from-region=my_region --to-profile=dest_profile
  • For non-AWS S3-compatible storage, specify the endpoint:
./s3trans s3://bucket-gcs-test s3://bucket-virginia --from-profile=gcs_profile --to-profile=aws_profile --from-endpoint=https://storage.googleapis.com
# The endpoint above can also be replaced with a short name, i.e. --from-endpoint=google_gcs; other short names: ali_oss, tencent_cos, azure_blob (TODO: azure)
  • -l lists the target first and then syncs (saves request charges, but adds the time of one full List).
  • -n (NumWorkers) sets the number of parallel List and transfer workers. At most n objects are transferred concurrently, each object uses at most 2n concurrent parts, and listing a bucket uses at most 4n threads; n <= number of vCPUs is recommended.
  • -y skips the confirmation prompt and runs immediately.
./s3trans C:\Users\Administrator\Downloads\test\ s3://huangzb-virginia/win2/ --to-profile sin -l -n 8 -y
  • Supports an exclude list (--ignore-list-path): source S3 keys or local paths that match the list are not transferred. For example, with --ignore-list-path="./ignore-list.txt" and the file content:
test2/
test1

Source paths matching these prefixes are skipped and not transferred, e.g. test2/abc.zip, test1/abc.zip, test1, test1.zip, test2/cde/efg, etc. Paths whose leading prefix does not match are transferred normally, e.g. test3/test1, test3/test2/, etc. See the example command below.
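As a complete sketch (the bucket names are placeholders), the exclude list above can be combined with a normal sync like this:

./s3trans s3://source-bucket/ s3://dest-bucket/backup/ --from-profile=source_profile --to-profile=dest_profile --ignore-list-path="./ignore-list.txt" -l -n 8 -y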

Cluster mode

Cluster mode: the List (sender) node

Compares the source Bucket/Prefix with the destination Bucket/Prefix and writes the objects that differ to an SQS queue, for the transfer nodes to consume later.
Specify the source and destination S3 URLs, plus an SQS queue for sending the task list: the SQS URL and the AWS profile that can access it. If no sqs profile is specified, the tool obtains credentials from the EC2 IAM Role, and the Region is extracted from the sqs-url automatically.
Optional:
--joblist-write-to-filepath writes the compared task list to a file;
--sqs-log-to-filename writes a log of the messages sent to SQS to a file.

./s3trans s3://from_bucket/ s3://to_bucket/prefix --from-profile us --to-profile bjs \
    --work-mode SQS_SEND \
    --sqs-profile us \
    --sqs-url "https://sqs.region.amazonaws.com/my_account_number/sq_queue_sname" \
    --joblist-write-to-filepath "./my_joblist.log" \
    --sqs-log-to-filename  "./sqssent.log" \
    -y -l -n 8
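The SQS queue itself is not created by s3trans. A minimal sketch of creating one with the AWS CLI (the queue name and profile here are assumptions, not part of this project) is:

aws sqs create-queue --queue-name s3-migration-queue --profile us --region us-east-1

The QueueUrl returned by this command is the value to pass as --sqs-url.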

Cluster mode: the transfer (worker) nodes

Fetches the task list from the SQS queue and then transfers the data. Specify the source and destination S3 URLs, plus the SQS queue: the SQS URL and the AWS profile that can access it. If no sqs profile is specified, the tool obtains credentials from the EC2 IAM Role, and the Region is extracted from the sqs-url automatically.

./s3trans s3://from_bucket/prefix s3://to_bucket/ --from-profile us --to-profile bjs \
    --work-mode SQS_CONSUME \
    --sqs-profile us \
    --sqs-url "https://sqs.region.amazonaws.com/my_account_number/sq_queue_sname" \
    -y -l -n 8
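On each worker node the consumer usually needs to keep running unattended. One simple, non-authoritative way to do this (the log file name is an assumption) is to start it in the background with nohup:

nohup ./s3trans s3://from_bucket/prefix s3://to_bucket/ --from-profile us --to-profile bjs \
    --work-mode SQS_CONSUME \
    --sqs-profile us \
    --sqs-url "https://sqs.region.amazonaws.com/my_account_number/sq_queue_sname" \
    -y -l -n 8 > s3trans-worker.log 2>&1 &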

Additional help

./s3trans -h

s3trans transfers data from the source to the target.
./s3trans FROM_URL TO_URL [OPTIONS]
FROM_URL: URL of the data source, e.g. /home/user/data or s3://bucket/prefix
TO_URL: URL of the transfer target, e.g. /home/user/data or s3://bucket/prefix

Usage:
s3trans FROM_URL TO_URL [flags]

Flags:  
      --acl string                ACL for the target S3 bucket, e.g. private|public-read|public-read-write|authenticated-read|aws-exec-read|bucket-owner-read|bucket-owner-full-control. private means only the object owner can read and write. If not set, the bucket's default applies, which is usually private.
      --from-endpoint string      API endpoint of the data source, e.g. https://storage.googleapis.com; https://oss-shenzhen.aliyuncs.com; https://cos.<region>.myqcloud.com. Not needed for AWS S3 or a local path.
      --from-profile string       AWS profile in ~/.aws/credentials for the data source. If not specified, the default profile is used; if there is no default profile, specify the region instead.
      --from-region string        Region of the data source, e.g. cn-north-1. If not specified but a profile is set, the bucket's Region is looked up automatically.
  -h, --help                      Help.
      --http-timeout int          API request timeout in seconds (default 30).
  -l, --list-target               Recommended. List the target S3 bucket and compare existing objects before transferring. Listing is more efficient than checking objects one by one, but the transfer only starts after the full listing and comparison are done, so startup feels slower. To mitigate this, the tool lists with multiple threads in parallel. Without --list-target, each target object is checked right before it is transferred, which uses more API calls but starts faster. To skip comparison entirely and overwrite everything, use --skip-compare instead of --list-target.
      --max-retries int           Maximum number of API request retries (default 5).
      --no-sign-request           The source bucket does not require signed requests (i.e. allows anonymous access).
  -n, --num-workers int           NumWorkers x 1 concurrent files in flight; NumWorkers x 2 concurrent parts per file; NumWorkers x 4 concurrent threads for listing the target bucket. NumWorkers <= number of vCPUs is recommended (default 4).
      --request-payer             The source bucket is configured as requester pays.
      --resumable-threshold int   Files larger than this size (MB) use resumable multipart transfer (default 50).
  -s, --skip-compare              Skip comparing the names and sizes of source and target objects and overwrite all objects directly. Neither lists the target nor checks whether target objects already exist.
      --sqs-profile string        For work-mode SQS_SEND or SQS_CONSUME: which AWS profile in ~/.aws/credentials to use for accessing the SQS queue. If not specified, credentials are obtained from the EC2 IAM Role, and the Region is extracted from the sqs-url.
      --sqs-url string            For work-mode SQS_SEND or SQS_CONSUME: the SQS queue URL to send to or consume from, e.g. https://sqs.us-east-1.amazonaws.com/my_account/my_queue_name
      --storage-class string      Storage class of the target S3 bucket, e.g. STANDARD|REDUCED_REDUNDANCY|STANDARD_IA|ONEZONE_IA|INTELLIGENT_TIERING|GLACIER|DEEP_ARCHIVE|OUTPOSTS|GLACIER_IR|SNOW, or another S3-compatible class.
      --to-endpoint string        API endpoint of the transfer target, e.g. https://storage.googleapis.com . Not needed for AWS S3 or a local path.
      --to-profile string         AWS profile in ~/.aws/credentials for the transfer target. If not specified, the default profile is used; if there is no default profile, specify the region instead.
      --to-region string          Region of the transfer target, e.g. cn-north-1. If not specified but a profile is set, the bucket's Region is looked up automatically.
      --transfer-metadata         Fetch metadata from the source S3 object and apply it to the target object. This requires one extra API call per object to get the source metadata.
      --work-mode string          SQS_SEND | SQS_CONSUME | DRYRUN. SQS_SEND: sender node, lists and compares the source and target S3 and sends transfer task messages to the SQS queue. SQS_CONSUME: worker node, fetches task messages from the SQS queue and transfers objects from the source S3 to the target S3. DRYRUN: only compares the source and destination buckets and reports counts and sizes without transferring data.
  -y, --y                         Skip the confirmation prompt and run immediately.
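Putting several of these flags together, a hedged example (bucket names and values are placeholders) of an S3-to-S3 copy that sets the storage class and ACL and carries over metadata:

./s3trans s3://source-bucket/prefix s3://dest-bucket/prefix \
    --from-profile=source_profile --to-profile=dest_profile \
    --storage-class=STANDARD_IA --acl=bucket-owner-full-control \
    --transfer-metadata -l -n 8 -y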

Other notes

Example SQS access policy for S3-triggered SQS

Permission to write to SQS: "Service": "s3.amazonaws.com". Permission to read from SQS: the EC2 Role, or the AWS Account Number directly.

{
  "Version": "2008-10-17",
  "Id": "__default_policy_ID",
  "Statement": [
    {
      "Sid": "__owner_statement",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::my_account_number:root"
      },
      "Action": "SQS:*",
      "Resource": "arn:aws:sqs:us-west-2:my_account_number:s3_migration_queque"
    },
    {
      "Sid": "__sender_statement",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:us-west-2:my_account_number:s3_migration_queque"
    },
    {
      "Sid": "__receiver_statement",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::my_account_number:root"
      },
      "Action": [
        "SQS:ChangeMessageVisibility",
        "SQS:DeleteMessage",
        "SQS:ReceiveMessage"
      ],
      "Resource": "arn:aws:sqs:us-west-2:my_account_number:s3_migration_queque"
    }
  ]
}
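This policy only authorizes S3 to send messages; the source bucket still needs an event notification that targets the queue. A minimal sketch using the AWS CLI (the bucket name is a placeholder; the queue ARN matches the policy above):

aws s3api put-bucket-notification-configuration --bucket my-source-bucket \
    --notification-configuration '{"QueueConfigurations":[{"QueueArn":"arn:aws:sqs:us-west-2:my_account_number:s3_migration_queque","Events":["s3:ObjectCreated:*"]}]}'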

Configuration file

Instead of the command-line flags above, you can put a config.yaml file in the program's working directory with the content below, and then simply run ./s3trans FROM_URL TO_URL.

from-profile: "your_from_profile"
to-profile: "your_to_profile"
from-endpoint: "your_from_endpoint"
to-endpoint: "your_to_endpoint"
from-region: "your_from_region"
to-region: "your_to_region"
storage-class: "your_storage_class"
acl: "your_acl"
no-sign-request: false
request-payer: false
db-location: "./your_download_status.db"
list-target: false
skip-compare: false
transfer-metadata: false
http-timeout: 30
max-retries: 5
retry-delay: 5
chunk-size: 5
resumable-threshold: 50
num-workers: 4
y: false
work-mode: "your_work_mode"
sqs-url: "your_sqs_url"
sqs-profile: "your_sqs_profile"
joblist-write-to-filepath: "your_joblist_write_to_filepath"
sqs-log-to-filename: "your_sqs_log_to_filename"
ignore-list-path: "your_ignore_list_path"

These settings can also be provided as environment variables.

License

This library is licensed under the MIT-0 License. See the LICENSE file.


Author: Huang, Zhuobin (James)


amazon-s3-resumable-upload's People

Contributors

amazon-auto, bnusunny, huangzbaws, joeshi


amazon-s3-resumable-upload's Issues

Can't do "cdk synth" on cdk-cluster.

I'm following the steps from Readme and on cdk synth I get:

cdk synth
Traceback (most recent call last):
File "app.py", line 57, in
resource_stack.s3_deploy)
File "/Users/.../AWS_Sync/amazon-s3-resumable-upload/cluster/cdk-cluster/.env/lib/python3.7/site-packages/jsii/_runtime.py", line 69, in call
inst = super().call(*args, **kwargs)
TypeError: init() takes 3 positional arguments but 12 were given
Subprocess exited with error 1

Cluster mode cannot sync the data; error: Fail to list multipart upload - hawkey999/s3-migration-test/...

  1. Create a Systems Manager Parameter Store entry
  • Name: s3_migration_credentials
  • Type: SecureString
  • Content:
{
  "aws_access_key_id": "your_aws_access_key_id",
  "aws_secret_access_key": "your_aws_secret_access_key",
  "region": "cn-northwest-1"
}
  2. Configure the AWS CDK app.py settings for the source / destination S3 bucket and prefix
[{
    "src_bucket": "ray-cross-region-sync-oregon",
    "src_prefix": "broad-references",
    "des_bucket": "ray-cross-region-sync-zhy",
    "des_prefix": "broad-references"
    }]
  3. Deploy the CDK
cd amazon-s3-resumable-upload/cluster/cdk-cluster
source ~/python3/env/bin/activate
pip3 install -r requirements.txt
export AWS_DEFAULT_REGION=us-west-2
npm install -g aws-cdk
cdk synth
cdk deploy s3-migration-cluster* --profile ${AWS_GLOBAL_PROFILE} --outputs-file "stack-outputs.json"
  4. Testing
  • CDK deploys the resources for you; locate the S3 bucket NewS3BucketMigrateObjects in the global region for uploading new objects
  • upload the files
aws s3 cp amazon-corretto-8-x64-linux-jdk.rpm s3://s3-migration-cluster-reso-s3migratebucket676429fa-lt3gfz9nbfn7/crr-ningxia/
  • check the des_bucket
aws s3 ls s3://ray-cross-region-sync-zhy --region cn-northwest-1 --profile china
  • check the cloudwatch dashboard s3migrate* created by CDK
  • check the s3-migration-cluster-ec2-applog* CloudWatch log group
2020-09-17 03:24:21,724 INFO - Start multipart: s3-migration-cluster-reso-s3migratebucket676429fa-lt3gfz9nbfn7/crr-ningxia/amazon-corretto-8-x64-linux-jdk.rpm, Size: 117537571, versionId: null

2020-09-17 03:24:21,724 INFO - Getting unfinished upload id list - hawkey999/s3-migration-test/crr-ningxia/amazon-corretto-8-x64-linux-jdk.rpm..


2020-09-17 03:24:22,547 ERROR - Fail to list multipart upload - hawkey999/s3-migration-test/crr-ningxia/amazon-corretto-8-x64-linux-jdk.rpm - An error occurred (NoSuchBucket) when calling the ListMultipartUploads operation: The specified bucket does not exist

2020-09-17 03:24:22,547 INFO - Create multipart upload - hawkey999/s3-migration-test/crr-ningxia/amazon-corretto-8-x64-linux-jdk.rpm

2020-09-17 03:24:22,748 ERROR - Fail to create new multipart upload - hawkey999/s3-migration-test/crr-ningxia/amazon-corretto-8-x64-linux-jdk.rpm - An error occurred (NoSuchBucket) when calling the CreateMultipartUpload operation: The specified bucket does not exist

3 TB of data syncing from OSS to S3 - is there any way to speed this up and finish within 5 hours?

Hello,

Only Single mode supports OSS as the data source.

AWS test EC2 instance type: m5a.xlarge
Test duration: 10 minutes
Result synced to S3: 11736 objects, 1.2 GB
Modified s3_upload_config.ini settings: MaxThread = 10, MaxParallelFile = 10

Extrapolating, roughly 7 GB can be synced per hour, so 3 TB of data would take about 42 days.

The OSS files to be synced fall under 6 prefixes, so I can run 6 machines, one per prefix, which should in theory finish in about 7 days.

As in the title: is there any way to finish the sync in about 5 hours? Any advice is appreciated, thanks!

ModuleNotFoundError: No module named 'aws_cdk.aws_events_targets'

OS: Mac OS 10.14.6
CDK Version: 1.44.0 (build 1cd832b)
Error:

Traceback (most recent call last):
  File "app.py", line 13, in <module>
    import aws_cdk.aws_events_targets as target
ModuleNotFoundError: No module named 'aws_cdk.aws_events_targets'

Steps to reproduce the error:

cd serverless/cdk-serverless
# change bucket_para, Des_bucket_default, alarm_email
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
cdk synth

Crontab s3_migration_cluster_jobsender.py

I can't find any s3_migration_cluster_jobsender.py files on the jobsender EC2 instance.

Is there any other information on how to set up the cron job besides the following:

If need to cron run Jobsender, you can logon to the EC2 to setup (crontab -e ) to run s3_migration_cluster_jobsender.py

Thank you

S3 Object Metadata has been changed

Version: Serverless

I have images in the src bucket that have the S3 Object Metadata value Content-Type = image/png.

After they are replicated to the dest bucket, the Metadata is changed to binary/octet-stream.

This causes the browser to download the image instead of rendering it.

Get errors when running too many tasks

If I run many s3trans tasks at the same time, some tasks output errors like:

2024/01/05 10:10:08 Failed to read from response body: bucketname xxx/yyy.gz 33 unexpected EOF
2024/01/05 10:15:47 Failed to read from response body: bucketname xxx/yyy.gz 44 net/http: request canceled (Client.Timeout or context cancellation while reading body)

and the final output gz file is not complete.

But if I run only one s3trans task at a time, it does not output errors and the gz file is complete.

So how can I resolve the error? My command is s3trans -y -l -n 4 --max-retries 30 --http-timeout 500

thanks~
Si

cannot create subfolder under windows environment

[Issue]
When this script runs in a Windows environment, it fails to create subdirectories on S3 correctly.

[Example]
On Windows, the original path of test.txt is "C:/videos/2020/02/24/test.txt" and the SrcDir parameter is set to "C:/videos/".
Expected behavior: S3 automatically creates the four sub-prefixes videos, 2020, 02 and 24, and under them one S3 object with the key test.txt.
Actual behavior: only one folder, videos, is created on S3, containing a single S3 object whose key is "2020\02\24\test.txt". The way Windows handles the path separator causes this problem.

[Impact]
All log files or images organized by day end up in one flat directory.

Serverless version CN to Standard Partition failed

Version: Serverless
Source: AWS CN partition
Dest: AWS Standard partition

The CDK is deployed in the AWS Standard partition.

Error msg:

(AccessDenied) when calling the CreateMultipartUpload operation: Access Denied
(AccessDenied) when calling the PutObject operation: Access Denied

cdk-serverless can not deploy

Hello, I want to deploy the serverless version of amazon-s3-resumable-upload, but I ran into some problems when deploying the CDK and would like to ask for advice.

Here is some of my app.py configuration:

# Define bucket and prefix for Jobsender to compare
bucket_para = [{
    "src_bucket": "jp-bucket",
    "src_prefix": "",
    "des_bucket": "cn-bucket",
    "des_prefix": ""
}]
StorageClass = 'STANDARD'

# Des_bucket_default is only used for non-jobsender use case, S3 trigger SQS then to Lambda.
# There is no destination buckets information in SQS message in this case, so you need to setup Des_bucket_default
Des_bucket_default = 'cn-bucket'
Des_prefix_default = 'test-dir'

JobType = 'PUT'
# My EC2 and Lambda will both use the Japan account
# 'PUT': Destination Bucket is not the same account as Lambda.
# 'GET': Source bucket is not the same account as Lambda.

JobsenderCompareVersionId = 'True'  # Jobsender should compare the versionId of the source S3 bucket with the versionId in DDB
UpdateVersionId = 'False'  # get the latest version id from s3 before get object
GetObjectWithVersionId = 'True'  # get object together with the specified version id

I also configured the Parameter Store settings.


After configuring these, I used aws configure to set up my EC2 and installed the CDK; the version is:

1.57.0 (build 2ccfc50)

Then I installed the dependencies from requirements.txt and ran

cdk synth

cdk deploy

but the latter keeps hanging during execution and never completes. Could you advise what the cause might be?


Thanks!

pre signed S3 url download usage help

Hi, I wanted to use this tool to download files where I am given a pre-signed S3 URL, and I want to download them into an S3 bucket I own. Will I be able to do that? If so, an example command I should run would be appreciated.

I tried ./s3trans $url /Users/ec2User/dev/open-sources/amazon-s3-resumable-upload/ouput/
but it exits with the error:
Failed to create directories: {the url variable}

same size and same file name with different content will be ignored

There is an abc.txt in a us-east-1 S3 bucket; run the S3-To-S3 job to sync the file to a cn-north-1 S3 bucket.
Modify the abc.txt content but keep the file size the same.
Upload abc.txt to the us-east-1 S3 bucket and run the S3-To-S3 job again. abc.txt is treated as a duplicate file and ignored:
2020-02-13 13:02:05,053 INFO - Start file: beijing-crr/abc.txt
2020-02-13 13:02:05,053 INFO - Duplicated. beijing-crr/abc.txt same size, goto next file.

s3-migrate-serverless-s3migratejobsender execution error

My understanding is that this Lambda runs once per hour, fetches the object list and updates the DynamoDB Table.

Here is the error message:

Job sqs queue is not empty or fail to get_queue_attributes. Stop process.

The result is that files newly uploaded to the source S3 are not transferred to the dest S3.

support s3 bucket with RequestPayer

The current build does not support S3 buckets with RequestPayer; it replies Access Denied.
Need to add RequestPayer to all ListObject and GetObject calls:

RequestPayer='requester'
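For reference, the Version 2 CLI documented above exposes this via the --request-payer flag; a hedged example (the bucket names here are placeholders) would be:

./s3trans s3://requester-pays-bucket/prefix s3://dest-bucket/prefix --request-payer --from-region=us-east-1 --to-profile=dest_profile -l -n 8 -y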
