Remotely Signed S3 Requests

This is a library and tool designed to support doing Amazon S3 uploads and downloads with request signing occurring on a different machine than the machine performing the actual upload or download.

With this library, you can allow untrusted hosts to upload to an object which you've protected with whichever authentication and authorization scheme you choose.

There is support for both multipart and singlepart uploads and downloads. The high-level interfaces provided will automatically select which version to use.

Architecture

This project is divided into a server-side component called the Controller. Instances of the Controller know how to generate and sign all the requests involved in uploading files to S3. Controllers also know how to run the methods which must be run from the server: the initiate, complete and abort methods of multipart uploads.

The Runner class is used by both the Client and Controller to run the underlying HTTP requests. Requests are passed between the Controller and Client, or between the Controller and Runner, in something called interchange format. This format is a generalized HTTP request description which omits the body; the body must be provided to the HTTP request separately.

Here's an example of a simple request in this format:

{
  "url": "https://www.hostname.com:443/path/to/resource?query=string",
  "method": "GET",
  "headers": {
    "Content-Length": "1234",
  }
}
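
For illustration only, here is a minimal sketch of how such a description could be executed with Node's built-in https module. Within this library, the Runner class plays this role; none of the names below are part of its API.

// Illustrative sketch: execute an interchange-format request with Node's
// built-in https module (Node >= 10.9 for the URL-string signature).
const https = require('https');

function runInterchangeRequest(req, body) {
  return new Promise((resolve, reject) => {
    const request = https.request(req.url, {method: req.method, headers: req.headers}, res => {
      const chunks = [];
      res.on('data', chunk => chunks.push(chunk));
      res.on('end', () => resolve({statusCode: res.statusCode, body: Buffer.concat(chunks)}));
    });
    request.on('error', reject);
    if (body) {
      request.write(body); // the body travels separately from the description
    }
    request.end();
  });
}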

TODO: Write some stuff:

  • Controller.prototype.generateGet: return a v4 signed request which allows access to an object
  • Client support for downloading files
  • command line tool to run all the requests
  • command line tool to do a complete upload locally -- mainly as an integration test

Method Signatures

In all cases, permissions and tags are optional parameters. The parameters of all functions are validated using Joi schemas. These schemas are partially specified in the method body and partially composed of schemas stored in src/schemas.js.
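
As a hypothetical sketch of what such validation might look like (the field names below are assumptions for illustration, not the library's actual schemas):

const Joi = require('joi');

// Hypothetical sketch only: the real, composed schemas live in src/schemas.js.
const initiateSchema = Joi.object({
  bucket: Joi.string().required(),
  key: Joi.string().required(),
  sha256: Joi.string().hex().length(64).required(),
  size: Joi.number().integer().positive().required(),
  permissions: Joi.object().optional(), // optional in all cases
});

// Joi.attempt throws on invalid input, otherwise returns the validated value.
const params = Joi.attempt({bucket: 'b', key: 'k', sha256: 'ab'.repeat(32), size: 1024}, initiateSchema);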

Controller

  • new Controller({region, runner, runnerOpts})
  • Controller.prototype.initiateMultipartUpload({bucket, key, sha256, size, permissions}) -> uploadId
  • Controller.prototype.generateMultipartRequest({bucket, key, uploadId, parts}) -> [{url, method, headers}]
  • Controller.prototype.completeMultipartUpload({bucket, key, etags, tags, uploadId}) -> 'ETAG_OF_OBJECT'
  • Controller.prototype.abortMultipartUpload({bucket, key, uploadId}) -> void
  • Controller.prototype.generateSinglepartRequest({bucket, key, sha256, size, tags, permissions}) -> {url, method, headers}
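
A minimal sketch of the server-side flow, based on the signatures above. The export shape is an assumption, as is that these methods return promises.

const {Controller} = require('remotely-signed-s3'); // assumed export shape

async function signUpload({bucket, key, sha256, size, parts}) {
  const controller = new Controller({region: 'us-west-2'});

  // Initiate the multipart upload from the server...
  const uploadId = await controller.initiateMultipartUpload({bucket, key, sha256, size});

  // ...then generate signed part requests, in interchange format, to hand
  // to the untrusted client.
  const requests = await controller.generateMultipartRequest({bucket, key, uploadId, parts});
  return {uploadId, requests};
}

// Later, once the client reports each part's ETag:
//   await controller.completeMultipartUpload({bucket, key, etags, uploadId});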

Runner

The public API of this class is the .run() method. All other methods which aren't prefixed with double underscores are OK to use externally but are not supported.

  • new Runner(agent, agentOpts)
  • Runner.prototype.run({req, body, streamingOutput}) -> {body | bodyStream, headers, statusCode, statusMessage}
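
For example, running a single signed request might look like this (a sketch, assuming .run() returns a promise):

// Sketch: run one interchange-format request with its body.
async function runPart(runner, req, partData) {
  const {statusCode, headers, body} = await runner.run({req, body: partData});
  if (statusCode !== 200) {
    throw new Error('Part upload failed: ' + body);
  }
  return headers.etag; // assumption: the part's ETag header, lowercased by Node
}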

Client

The partsize parameter is the size, in bytes, of each part of a multipart upload; it specifies how large each individual upload request will be. The multisize parameter is the file size at which the method switches from a singlepart upload to a multipart upload.

  • new Client({runner, runnerOpts, partsize, multisize})
  • Client.prototype.prepareUpload({filename, forceSP, forceMP, partsize}) -> {filename, sha256, size, parts: [] | undefined}
  • Client.prototype.runUpload(request, upload) -> ['ETAG_OF_EACH_REQUEST']
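
A minimal sketch of an end-to-end upload from the untrusted host; fetchSignedRequests is a hypothetical placeholder for whatever transport carries the metadata to your Controller-equipped server.

const {Client} = require('remotely-signed-s3'); // assumed export shape

async function uploadFile(filename) {
  const client = new Client({partsize: 25 * 1024 * 1024, multisize: 100 * 1024 * 1024});

  // Hash and measure the file locally; parts is undefined for singlepart uploads.
  const uploadInfo = await client.prepareUpload({filename});

  // Send the metadata to the server you control; it uses a Controller to
  // return signed requests in interchange format.
  const request = await fetchSignedRequests(uploadInfo); // hypothetical helper

  // Run the actual upload from this host; returns the ETag of each request.
  return client.runUpload(request, uploadInfo);
}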

Command line tools

TODO: write the command line tool that does upload and download

Examples

TODO: write some examples

Hacking

This library has a suite of unit tests which can be run without credentials.

npm install .
npm test

There are integration tests which can be run, but require S3 credentials and an existing bucket to test against.

export BUCKET=mybucket
export AWS_ACCESS_KEY_ID=myaccesskeyid
export AWS_SECRET_ACCESS_KEY=mytoken

npm test

Issues

Concurrent multipart uploads

Right now, for multipart uploads, we upload each part sequentially. This means that we aren't getting the full benefit of multipart uploads through concurrency. The naive approach would be to start all the part uploads using Promise.all, but that could mean hundreds of uploads happening at the same time, which would likely trigger API limits and would probably slow down the overall process.

The solution should be some sort of batching of the parts. There are some nice libraries which might be able to help us do this; a sketch of one possible approach follows below.

There are still reasons for doing multipart uploads outside of concurrency, so this doesn't need to be a day-1 feature.
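
A minimal sketch of one such batching approach, a simple worker pool; all names here are illustrative, and runPart stands in for whatever runs a single signed part request.

// Sketch: run at most `limit` part uploads concurrently, instead of
// launching every part at once with Promise.all.
async function runBatched(partRequests, runPart, limit = 5) {
  const results = new Array(partRequests.length);
  let next = 0;

  async function worker() {
    while (next < partRequests.length) {
      const index = next++; // safe: no await between read and increment
      results[index] = await runPart(partRequests[index], index);
    }
  }

  // Start `limit` workers which pull parts off the shared queue.
  await Promise.all(new Array(Math.min(limit, partRequests.length)).fill(null).map(worker));
  return results; // e.g. the ETag of each part, in order
}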

Uploading a large file (512164 bigfile.gz) is causing RequestTimeout errors

It seems that what's happening is that the request starts and runs, but once the data is completely written, the socket sits idle long enough for the S3 server to abort the request.

The error message looks like:

{ Error: Failed to run a request PUT https://test-bucket-for-any-garbage.s3-us-west-2.amazonaws.com/fcae4bfa-276d-4b26-aca1-bccbe4461dfe
    at Client._loop2$ (/home/jhford/taskcluster/remotely-signed-s3/src/client.js:324:19)
    <snip>
  url: 'https://test-bucket-for-any-garbage.s3-us-west-2.amazonaws.com/fcae4bfa-276d-4b26-aca1-bccbe4461dfe',
  method: 'PUT',
  headers: 
   { 'x-amz-meta-metadata1': 'metadata-1-value',
     'x-amz-meta-content-sha256': 'eab08c87a06d2af28892184b6a128e3080575afb38a99bfe4ee08e24e96f10a6',
     'x-amz-meta-content-length': '524288000',
     'x-amz-meta-transfer-sha256': '0cdaa5160cc994a38207a1b81b88e5a8e0ec498c411a092b06e02de70398b64f',
     'x-amz-meta-transfer-length': '524447948',
     'x-amz-storage-class': 'STANDARD',
     'x-amz-content-sha256': '0cdaa5160cc994a38207a1b81b88e5a8e0ec498c411a092b06e02de70398b64f',
     'content-length': '524447948',
     'content-type': 'application/json',
     'content-encoding': 'gzip',
     'x-amz-tagging': 'tag1=tag-1-value',
     Host: 'test-bucket-for-any-garbage.s3-us-west-2.amazonaws.com',
     'X-Amz-Date': '20170721T125222Z',
     Authorization: 'AWS4-HMAC-SHA256 Credential=SNIP/20170721/us-west-2/s3/aws4_request, SignedHeaders=content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-meta-content-length;x-amz-meta-content-sha256;x-amz-meta-metadata1;x-amz-meta-transfer-length;x-amz-meta-transfer-sha256;x-amz-storage-class;x-amz-tagging, Signature=SNIP' },
  body: '<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>43ED0FF6149574F1</RequestId><HostId>NYe+B0kbDJeA8Ey0awHTAxdQ3CxQjev+f6M/M5A/QBzNdMsH/NOIso6clOxa0D1IWX15cr60IbU=</HostId></Error>' }

It seems to work reliably when gzip encoding is not being used, though all of the gzip compression is done outside of the upload function. Looking at networking activity and correlating the logs:

COMPLETE PUT https://test-bucket-for-any-garbage.s3-us-west-2.amazonaws.com/e449b332-bfea-4902-8a75-fb2c0be93f3c 188fa003860e10fcf858e97d45e856cd44325ab20160855d1c4b80f14d3958d8 524288000 bytes +2m
  remote-s3:Runner:res RESPONSE 400 Bad Request PUT https://test-bucket-for-any-garbage.s3-us-west-2.amazonaws.com/e449b332-bfea-4902-8a75-fb2c0be93f3c HEADERS: {"x-amz-request-id":"104B9CA922FEBAA7","x-amz-id-2":"VDHi5PHovTOuzDB2JLZ1L3OoAwq4dCS/KlB5d9WEYB+iW0vkqXi5aP0IkrU3pyqti/iKy1JULTA=","content-type":"application/xml","transfer-encoding":"chunked","date":"Fri, 21 Jul 2017 13:01:32 GMT","connection":"close","server":"AmazonS3"}

shows that the upload is finished, even though the connection isn't being closed. I'm not sure what's going on here. This does not block the artifact API deployment, though, as this does not relate to the type of requests which happen in the controller.

CODE_OF_CONDUCT.md isn't correct

Your required text does not appear to be correct

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report is required and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found in the Firefox Debugger project and Common Voice. (The optional part is commented out in the raw template file and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please reach out to [email protected].

(Message COC003)
