aws-samples / amazon-s3-multipart-upload-transfer-acceleration

Uploading large objects to S3 using multipart upload, and transfer acceleration

License: MIT No Attribution

JavaScript 67.73% TypeScript 21.94% HTML 6.71% CSS 3.62%
apigateway lambda react s3

amazon-s3-multipart-upload-transfer-acceleration's Introduction

Uploading large objects to Amazon S3 using multipart upload feature and transfer acceleration

Amazon Photos is a secure online storage service for photos and videos. Users can upload their photos and videos to Amazon Photos using a web browser. Uploading files to a web server is a common feature in many web applications accessible through a web browser. A web application communicates with a web server using the HyperText Transfer Protocol ("HTTP"). A single HTTP connection used to upload a file cannot use the full bandwidth available to the web application because of the underlying TCP throughput limits. To overcome this limit, a large file can be split into multiple parts and uploaded concurrently over multiple HTTP connections. A web application uploading a large file to Amazon S3 can take advantage of the S3 multipart upload feature to improve throughput and upload speed. See this link for the benefits of using the S3 multipart upload feature. Amazon S3 Transfer Acceleration is a bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. See this link for details on S3 Transfer Acceleration.

This prototype project demonstrates how to implement multipart upload and transfer acceleration directly from the browser using presigned URLs.
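As a rough sketch of the core primitive, the browser can PUT a single part directly to S3 through a presigned URL and record the ETag that S3 returns, which is needed later to complete the multipart upload. The helper below is illustrative only and assumes the presigned URL was obtained from the backend:

// Illustrative sketch: upload one part to S3 through a presigned URL.
// Assumes `presignedUrl` was obtained from the backend; the returned ETag
// must be collected for the complete-multipart-upload step.
async function uploadPart(presignedUrl, partBlob) {
  const response = await fetch(presignedUrl, { method: "PUT", body: partBlob });
  if (!response.ok) {
    throw new Error(`Part upload failed with HTTP ${response.status}`);
  }
  // Reading the ETag header requires the bucket's CORS rules to expose it.
  return response.headers.get("ETag");
}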

Deploy the application

Prerequisite

Backend

  • Clone this repository to your local computer.
  • From the backendv2 folder, run "npm install" to install all dependencies. Use "npm audit" to check for known vulnerabilities in the dependent packages.
  • Use CDK to deploy the backend to AWS. For example,
cdk deploy --context env="randnumber" --context whitelistip="xx.xx.xxx.xxx"

An additional context variable called "urlExpiry" can be used to set specific expiration time on the S3 presigned URL. The default value is set at 300 seconds (5 min). A new S3 bucket with the name "document-upload-bucket-randnumber" is created for storing the uploaded files, and the whitelistip value is used to allow API Gateway access from this IP address only.

An additional context variable called "functionTimeout" can be used to set specific timeout for the AWS Lambda function responsible for generating presigned URLs. With a higher number of parts, timeouts may occur, but it can be extended as needed.

  • Make note of the API Gateway endpoint URL.

Frontend

  • From the frontend folder, run "npm install" to install the packages.
  • Optionally, run "npm audit --production" to check for vulnerabilities.
  • Run "npm run start" to launch the frontend application in the browser.
  • Follow the steps shown in the user interface.
  • For Step 1, enter the API Gateway endpoint URL.
  • For Steps 2 and 3, pick baseline numbers. Use your available bandwidth, TCP window size, and retry time requirements to determine the optimal part size; note that S3 requires a minimum part size of 5 MB. Web browsers limit the number of concurrent connections to the same server (in Firefox, the default is 6), so specifying more concurrent connections than that will cause requests to block on the browser side. A sketch of this calculation follows this list.
  • For Step 4, choose whether to use the transfer acceleration feature.
  • For Step 5, pick a large file to upload.
  • The final part of the user interface will show upload progress and the time to upload the file to S3.
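For intuition on the part size and concurrency settings, here is an illustrative sketch (not the project's actual code) of deriving the part count from a chosen part size and capping the number of uploads in flight. The getUrlForPart helper is hypothetical and stands in for a call to the backend:

// Sketch only: split a File into parts of `partSizeMB` and upload them with
// at most `maxConcurrent` connections in flight.
async function uploadInParts(file, partSizeMB, maxConcurrent, getUrlForPart) {
  // S3 multipart uploads require every part except the last to be >= 5 MB.
  const partSize = Math.max(partSizeMB, 5) * 1024 * 1024;
  const partCount = Math.ceil(file.size / partSize);
  const etags = new Array(partCount);
  let next = 0;

  // Each worker pulls the next part index until none remain; JavaScript's
  // single-threaded event loop makes the shared counter safe to increment.
  async function worker() {
    while (next < partCount) {
      const index = next++;
      const blob = file.slice(index * partSize, (index + 1) * partSize);
      const url = await getUrlForPart(index + 1); // hypothetical backend call
      const response = await fetch(url, { method: "PUT", body: blob });
      etags[index] = response.headers.get("ETag");
    }
  }

  // Keep concurrency at or below the browser's per-host connection limit
  // (6 by default in Firefox).
  const workers = Array.from({ length: Math.min(maxConcurrent, partCount) }, worker);
  await Promise.all(workers);
  return etags;
}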

Improved throughput – You can upload parts in parallel to improve throughput

In this section, a sample file, "Docker Desktop Installer.exe" (485 MB), is used to test improved throughput. The web application and the S3 bucket are in the US East Region. An internet speed test on the web browser client showed that the client can upload at 79 megabits per second. The results of uploading the sample file as a single part, a single part with transfer acceleration, multiple parts, and multiple parts with transfer acceleration are shown below for reference.

Test 1: Single part upload (72 seconds)

Test 2: Single part upload with transfer acceleration (43 seconds)

Test 3: Multiple parts upload (45 seconds)

Test 4: Multiple parts upload with transfer acceleration (28 seconds)

Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error

Test 5: Recover from a simulated network issue

A network issue is simulated by activating Firefox's "Work Offline" mode while a multiple parts upload with transfer acceleration is in progress. As shown below, the client-side web application waits for a certain period before retrying the upload. When the browser goes back online ("Work Offline" is deactivated), the parts that failed to upload are uploaded automatically. This minimizes the impact of restarting a failed upload due to a transient network issue.

Recommendation

As seen from the results, uploading a large file using the S3 multipart upload feature and transfer acceleration can speed up the upload process by about 61% ((72 − 28) / 72 × 100). This is made possible by improving throughput with multipart upload and reducing latency with transfer acceleration. By uploading smaller parts on the client side and using an exponential backoff retry strategy for failed uploads, quicker automatic recovery from network issues is made possible. A sketch of such a retry strategy is shown below.
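The retry behavior can be approximated with a loop that doubles the wait between attempts. This is an illustrative sketch under assumed parameters, not the project's exact implementation:

// Illustrative sketch: retry a part upload with exponential backoff.
// Waits 1 s, 2 s, 4 s, ... between attempts, up to maxAttempts tries.
async function uploadPartWithRetry(presignedUrl, partBlob, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch(presignedUrl, { method: "PUT", body: partBlob });
      if (response.ok) return response.headers.get("ETag");
      throw new Error(`HTTP ${response.status}`);
    } catch (error) {
      if (attempt === maxAttempts - 1) throw error;
      // Back off exponentially, e.g. while the browser is offline.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
    }
  }
}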

Cleanup

  • Use "cdk destroy" to delete the stack of cloud resources created in this solution deployment.

Security

  • SonarLint is used in VSCode to confirm there are no problems in the codebase.
  • npm audit is used to confirm there are no security vulnerabilities in the project dependencies. For the frontend package audit, use the command "npm audit --production".
  • The S3 bucket created in this project is set up to enforce SSL-only requests and to encrypt data at rest.
  • The S3 bucket is set up to block public access.
  • API Gateway is set up with a resource policy that allows requests only from the IP address specified during deployment. Presigned S3 URLs are set to expire in 5 minutes by default.

Architecture

sequenceDiagram
  autonumber
  participant User
  participant SPA as Browser (SPA)
  participant Server as HTML Server
  participant API as API Gateway/Lambda
  participant S3
  User-->>SPA: Open Web App
  SPA-->>Server: Get App
  User-->>SPA: Upload File
  SPA-->>API: Initialize Multipart Upload
  API-->>S3: Get Upload ID
  SPA-->>API: Get Multipart Presigned URLs (optional: transfer acceleration)
  API-->>S3: Get Presigned URLs
  par Parallel Upload
    SPA-->>SPA: Make part 1
    SPA-->>S3: Upload Part 1
  end
  SPA-->>API: Finalize Multipart Upload
  API-->>S3: Mark upload complete
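Expressed as client-side calls, the sequence above maps roughly to the sketch below. The endpoint paths (/initialize, /getPreSignedUrls, /finalize) and payload shapes are assumptions made for illustration, not the project's documented API:

// Illustrative end-to-end flow matching the diagram; endpoint names are assumed.
async function multipartUpload(apiBase, file, partSize) {
  // 1. Initialize the multipart upload and receive an upload ID.
  const init = await fetch(`${apiBase}/initialize`, {
    method: "POST",
    body: JSON.stringify({ name: file.name }),
  });
  const { uploadId, key } = await init.json();

  // 2. Request presigned URLs for all parts (optionally accelerated).
  const partCount = Math.ceil(file.size / partSize);
  const urlsResponse = await fetch(`${apiBase}/getPreSignedUrls`, {
    method: "POST",
    body: JSON.stringify({ uploadId, key, parts: partCount }),
  });
  const { urls } = await urlsResponse.json();

  // 3. Upload the parts directly to S3 (done in parallel in the real app).
  const parts = [];
  for (let i = 0; i < partCount; i++) {
    const blob = file.slice(i * partSize, (i + 1) * partSize);
    const response = await fetch(urls[i], { method: "PUT", body: blob });
    parts.push({ PartNumber: i + 1, ETag: response.headers.get("ETag") });
  }

  // 4. Finalize so S3 assembles the parts into a single object.
  await fetch(`${apiBase}/finalize`, {
    method: "POST",
    body: JSON.stringify({ uploadId, key, parts }),
  });
}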

Credit

This project is inspired by a blog post from LogRocket but uses AWS serverless services to implement the backend. The frontend was rebuilt from scratch, with enhancements for transfer acceleration and usability.

amazon-s3-multipart-upload-transfer-acceleration's People

Contributors

akomandooru, amazon-auto, dependabot[bot], kakakakakku, raamak-r


amazon-s3-multipart-upload-transfer-acceleration's Issues

CREATE_FAILED | AWS::S3::Bucket | documentuploadbucket1B3E1E18

Hi!

I just cloned the repo and attempted to deploy, but encountered an error.

I followed the documentation to deploy the backendV2. During the cdk deploy command, I encountered a peculiar error with a null message and an "AlreadyExists" error. This is perplexing because I don't have any buckets in my project except the one from cdk bootstrap.

It's unclear to me how this is supposed to work, and I'm struggling to understand why there seem to be capital letters in the bucket name "documentuploadbucket1B3E1E18". Is that the error?

Thank you in advance for your help!

Unable to access API gateway - 403 forbidden

I successfully deployed the CloudFormation stack using this command:
cdk deploy --context env="9189086" --context whitelistip="0.0.0.0"

Every time I call the initialize endpoint, I get a 403 Forbidden on the CORS OPTIONS call.

I checked the resource policy and saw that it denies requests from any IP other than 0.0.0.0, which is bad, so I removed the deny section completely.

I'm trying to test it locally, but can't get past this 403 on the initialize OPTIONS call.

How to upload file again

It seems that after you select a file, the upload is triggered immediately. How can I upload a second, third, or further files? I tried selecting a different file, but nothing happened. It would be more intuitive to add an "Upload" button that triggers the upload after a file is selected. Currently, I have to refresh the page, refill the settings, and then choose a different file to upload.
