tus / tus-resumable-upload-protocol Goto Github PK

View Code? Open in Web Editor NEW

1.4K 53.0 100.0 303 KB

Open Protocol for Resumable File Uploads

Home Page: https://tus.io

License: MIT License

protocol tus resumable file-upload mit cross-platform http open-source

tus-resumable-upload-protocol's Introduction

tus resumable upload protocol

The protocol is in the protocol.md file.

It is rendered as HTML via Jekyll/Kramdown by the tus.io repository (type make preview there).

License

Licensed under the MIT license, see LICENSE.txt.

OpenAPI specification

The OpenAPI Specification (OAS) defines a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection.

There exists tools to create http servers and clients to access APIs using the OpenAPI description as input, e.g.:

go-swagger
Various generators in https://swagger.io.

The directory OpenAPI contains the OpenAPI (version 3.0.1) definitions of the tus protocol. Use a converter, e.g. API Spec Converter, source if you need a different version.

Since implementators are free to use different endpoints, the endpoints documented in the OpenAPI directory are to be considered examples.

tus-resumable-upload-protocol's People

Stargazers

Watchers

Forkers

esk525 powtac damirtrputec henvic dev-life hugozhu jaryhjz seacoastboy hfeeki lzq0904 biubiu dharmastyle web5design seco jirapong harmjanluth acconut sudoconf vayam vencax openium eltonoliveira th3conc3pt3ur 314687289 gocloudxyz jryans fikalefaza suxinde2009 bhstahl findyr nerro janko ifedapoolarewaju anilpuvvadi interweblt luckyarthur richiethom relax94 dima1034 athiwatp aetherus trevenen myang32 cactusnoon khadishi mrhubgit sgml ronag volkansenturk2012 polytronicgr hiproz ssttevee the-cc-dev brunofrani misselvexu rakou alimy lancelixiang prarot ashim-kr-saha sushrita acidburn0zzz 355731090990871 cladis alfxp butonic masterashu chainarong24nutdogg felix-schwarz forkkit nigoroll smatsson lexis-project rifkisia leelu611 jorgeluisrdz198 collabsoft c3333 suneelkumar94120 namezis mustafafi stantoxt realotz isabella232 kjm0001 alex1528 nh2 launchpado2 yialo yassinhajaj devdevdany 00mjk dee11ad renowncoder kkpan11 rbgadotti seanpm2001 parniaarb kakkoidev

tus-resumable-upload-protocol's Issues

202 Accepted Response?

Could a 202 Accepted Response be good for use under heavy usage/throttling uploads?

Reference
http://tools.ietf.org/html/rfc2616#section-10.2.3
http://mathieu.fenniak.net/the-api-checklist/ number 4 in the list, there also may be some other related things in this post

Describe large file chunking as a part of protocol

Many frameworks and webservers have predefined request maximum size. Hanlding a huge file upload in one request is not trivial task. The current version of protocol says:

Clients SHOULD send all remaining bytes of a resource in a single PATCH request, but MAY also use multiple small requests for scenarios where this is desirable (e.g. NGINX buffering requests before they reach their backend).

But how can client know what is the maximum size of 1 chunk? In most cases if client tries to send too big request the server would return a 413 error or something like this and client would not know what to do now.

I think that an additional header (lets say Max-Content-Length) that server returns on initial POST and HEAD requests can help us with that.

I haven't found any existing headers for such task so I suggest to use a custom one. Here is a small example (we want to send 50 mb file):

Request:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Final-Length: 52428800

Response:

HTTP/1.1 201 Created
Max-Content-Length: 10485760
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216

Ok. Now client knows that only 10 mb per request is allowed. Than it sends chunks. If something went wrong - make HEAD request, detect the offset and continue.

HEAD request with non-existent previous upload should return 404

According to the protocol, HEAD don't have to handle unknown resources. I suggest adding to the protocol that with HEAD requests, if there is no previous upload, return a 404 status.

PATCH vs PUT

As pointed out by @Baughn in IRC, PATCH (partial entity update) seems more appropriate than PUT (replacing an entire entity) for resuming uploads. But it might cause issues with mobile http proxies as PATCH is somewhat exotic. Needs investigation.

More 1.0 implementations

Now that the protocol is stabilizing, we're looking to have more 1.0 implementations. A few ideas to kick things off:

A Client test suite that can be used to run automated tests against servers, to see if they are 100% tus 1.0 compatible (in Go?)
A Python client was requested (update 2016-09-14: @ifedapoolarewaju is thinking about building one now which is ready for use: https://github.com/ifedapoolarewaju/tus-py-client)
A Ruby client was requested and is in the works by @alexanderbez
An Nginx server module (update 2016-09-14: @adrusi is thinking about building one now)
A concurrent server implementation in Erlang
A Java server was requested (update 2016-09-14: @terrischwartz has one in the works)
Some work for a tus rust server was done in https://github.com/Rio/tus-rust
A Node.js (CoffeeScript?) server. We could probably base it on @vayam's brewtus?

Block Uploads vs. Offset Ranges

In this issue I will propose two extensions which both aim implement the features needed for parallel uploads and non-contiguous chunks (see #3). In the end we have to choose one of them or go with something else.

Before starting I want to clarify that streaming uploads (with unknown length at the beginning) have not been included in these thoughts since you should not use them in conjunction with parallel uploads.

The first solution uses offset ranges. Instead of defining one offset which starts at 0 and is incremented for each PATCH request the server would store one or multiple ranges of allowed and free offsets. These offsets will be returned in the Offset-Range header in the HEAD request replacing the Offset response (!) header. The client then uses this information to choose the offset and uploads the same way as it's currently implemented.

Here is an example of a 300-byte file of which the second 100 bytes (100-199) have been uploaded:

HEAD /files/foo HTTP/1.1

HTTP/1.1 204 No Content
Enitity-Length: 300
Offset-Range: 0-99, 200-299

PATCH /files/foo HTTP/1.1
Content-Length: 100
Offset: 200

[100 bytes]

HTTP/1.1 204 No Content

HEAD /files/foo HTTP/1.1

HTTP/1.1 204 No Content
Enitity-Length: 300
Offset-Range: 0-99

The range of the last 100 bytes (200-299) has been removed since this buffer has been filled successfully by the upload.

While this solution allows the maximum of flexibility (compared to my second proposal) since you can upload at any offset (as long as it's available) it may be a though extension to implement for the servers. It has to ensure that the start of the offset range against which the chunk is uploaded is available and the end of the offset. Using the example from above you're not allowed to patch a 150-byte chunk at the offset of 0 because the bytes starting from 100 have already been written.

The second solution I came up with involves a bit more: When creating a new upload (using the file creation extension or somehow else) a blocksize is defined using which the file is separated into different blocks. For example, considering a file of 5KB and a blocksize of 2KB you would end up with two blocks of 2KB and a single one of 1KB. The important point is that each of the blocks has its own offset which starts at position 0 relative to the starting position of the block.

Considering the last example, the relative offset 100 of the second block would be the absolute offset of 2148: 2048 (2KB starting position of the second block) + 100 relative offset.

Only one upload is allowed at the same time per block. In this example a maximum of three parallel uploads are allowed. Each new PATCH request must resume where the last upload of the block has stopped, jumps are not allowed.

In following example we consider having a file of 5KB with the blocksize of 2KB. The first block is already fully uploaded (2048 bytes), the second with is filled with 100 bytes and the last one has not a single write yet. We are going to upload 100 bytes to the relative offset of 100 into the second block:

HEAD /files/bar HTTP/1.1

HTTP/1.1 204 No Content
Enity-Length: 5120
Blocksize: 2048
Block-Offset: 2048, 100, 0

PATCH /files/bar HTTP/1.1
Content-Length: 100
Offset: 2148

[100 bytes]

HTTP/1.1 204 No Content

HEAD /files/bar HTTP/1.1

HTTP/1.1 204 No Content
Enity-Length: 5120
Blocksize: 2048
Block-Offset: 2048, 200, 0

Please post your opinion about these solutions (I prefer the my last proposal) or any additional way we could achieve parallel and non-contiguous uploads. Also take the time to consider the work of implementations for servers and clients.

make example urls in protocol work

we host a tusd version anyway, why not make life easier for early protocol implementors by making the example requests executable

Standardize support for streaming uploads

As @felixge pointed out in #26, it would be good to have a standardized way of providing URL endpoints where a client can retrieve a file that is currently being uploaded that will stay open until the entire file has been sent.

Following the decision in #26 to replace Offset with Content-Length, clients will by default be getting only the bytes that have been uploaded at the time of the request. A conforming client might be able to detect the Entity-Length header and keep the connection open to stream more bytes, but it would be good to define the protocol in such a way that "normal" HTTP clients would be able to request a file being uploaded and receive the entire file too.

One way of achieving this might be to change the default behavior of HEAD and GET requests to by default serve Content-Length = Entity-Length and stream the file to the client, but add a request flag a client to send if they wish to only get the uploaded bytes and not wait for the rest. Something like Accept: incomplete, except with a more appropriate header field (Accept is only for content types).

Upload-Length Entity-Length Upload-Offset Offset

Seeing that all the projects are as recently updated, I'm wondering what are the correct headers?

tus/tusd@46fabad3 says one thing https://github.com/tus/tus-js-client, https://github.com/tus/TUSKit, https://github.com/tus/tus-java-client seem to follow but i didn't check if all the headers matches

http://tus.io/protocols/resumable-upload.html seem to talk about Entity-Length and Offset and then there's also Tus- prefixed headers (vs the Upload- prefixed headers)

Thanks

How to properly use Content-Type

This is a continuation of a discussion from issue #14. I created a new issue because it's a completely different subject. But you should read the posts there too.

The topic here is: should the protocol specify Content-Type of valid requests, and if so, what for and what types to use?

Upload expiration

I feel there's a need to add an optional upload expiration to the protocol. If someone starts an upload and never finishes it, it shouldn't just sit around eternally abandoned. Eventually you'd want the server to clean those up and remove them. A client should be informed when this might occur.

Clarify that url does not have to be fixed

see https://news.ycombinator.com/reply?id=5563688&whence=item%3fid%3d5563569

worth extending protocol to define GET (download)?

I had a brief look, but couldn't find a discussion of downloading a large file in parallel, resumable chunks.

I have a use case that may involve passing large confidential blobs back and forth between the client and server. Before I found tus, I was considering using https://github.com/feross/webtorrent .

Upload bandwidth does seem more precious than download bandwidth, but I wonder if there wouldn't be some benefit to addressing the same concerns for both directions.

I imagine getting downloads to work would require any client-side solution to define a storage adapter, so that an official storage-agnostic download algorithm could keep identify missing / corrupted chunks, etc.

Is it valid to POST the `location` in the response of POST before?

After POST a upload url, I will get the location for upload,
after a few PATCHs, is it valid to POST the location url?

In the second POST, is it valid to update metadata and upload-length(if defer upload length)?

Rename Final-Length to Entity-Length

Using Entity- as the consistent prefix makes more sense for the other headers we'll need (Entity-Content-Type, etc.).

Add S3 Multipart Uploads to Prior art

Although it has a misleading name (often confused with form/multipart), Amazon's solution could be listed under the Prior art section.

Here are some of its highlights:

You're not required to know the size of the file in advance;
The minimum part size is 5MB (you upload parts of 5MB, concurrent uploads are possible and encouraged);
You must complete the request by providing information on all the parts (each part upload returns an ETag you must provide at the end);
It supports uploads up to 5GB;
To resume an upload, you request the list of parts and then proceed to upload the ones that are missing.

Reduce implications of Location header field.

I'm glad to see such projects in the wild. Uploading files was a complete mess until now. :)

I considered implementing this protocol to the Symfony2 bundle OneupUploaderBundle and stumbled over the following part of the specification.

Servers MUST acknowledge a successful file creation request with a 201 Created response code and include an absolute url for the created resource in the Location header.

See section 6.1.3.1 for details.

This has multiple severe implications on application backends that support tus.

The enforcement of a Location header that is an absolute url for the created resource assumes that there is always a public url for an uploaded file. Even though possible use-cases might be edge cases, I don't think a protocol should enforce such behavior.
It implies that the backend stores currently uploading files to the same directory than previously (and complete) uploaded files. The protocol itself does not mention a possibility to send a final destination to the frontend after having a file upload complete. (See section 5.1). The usage of a temporary directory would make the cleanup pretty easy.
Obviously there is no way of accessing MIME data while uploading an image, as there is no way of proving that the file headers are uploaded completely. This makes it impossible to name the file according to its mime type, which is as far as I can tell pretty common.

Given the fact that there must be a valid identifier which can be sent along in subsequent requests, I can think of the following possibility:

Do not enforce an accessible url for the created resource in the following requests:

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Type: application/offset+octet-stream
Content-Length: 30
Offset: 70

[remaining 30 bytes]

Think of 24e533e02ec3bc40c387f1a0e460e216 as an identifier rather than a file name.

And extend the last response from the backend by a Location field to send an accessible url to the client. (This could be optional given the fact that there might be none. RECOMMENDED or OPTIONAL)

This way it is possible to name the file after the last uploaded bytes and move it from a temporary directory to its final place.

What do you think? Would this be a reasonable way to go, or am I missing something (maybe I misunderstand the specification at this point)?

cc 1up-lab/OneupUploaderBundle#52

http2 support?

any planning for addding http2 support?

Content-Type for PATCH

we need to define / set a content type for our patch requests.

Project status?

We have adopted the semantics of the protocol in one of our applications. What is the current status of the project? Are there any plans on updating / enhancing the protocol (e.g. checksums)? It seems, that the development has somewhat stalled. If not, what is the best way to contribute to the project? Reporting/discussing in issues or PR or both?

The protocol SHOULD describe retry for clients :)

I think it would be useful if the protocol describes how clients should automatically retry a few times during XX seconds. If this behavior is implemented consistently across all clients, it's becomes easy for people who run tus servers to do failovers or spread load, without end-users even noticing anything except for maybe a few seconds delay added to their upload.

I guess you could consider this the responsibility of the implementer, but if you can assume all clients handle this gracefully, it makes architecting server-side infrastructure much simpler.

Define behavior for GET on incomplete uploads

Edge case. Don't really have a strong opinion on whether it should simply return 404 Not Found or give some sort of indication that an upload is under way, like 416 Requested Range Not Satisfiable. For the latter, the RFC says servers "SHOULD" respond with 416 if a Range header was sent in the request but it doesn't say anything about it not being allowed or recommended otherwise.

Probably best to just leave a small note on the protocol draft stating that default behavior is to report 404 until the upload has been completed but server implementations are free to roll in their own behavior as they deem fit.

Ressource allocation/timeout

Once the client requests an upload via POST (or should we call that an upload session) the server should reserve enough resources to deal with the announced size of the upload. However, to prevent super simple DOSing the server should implement a timeout mechanism to clean up stale upload sessions.

I don't know if we need to specify the timeout mechanism, as there are probably a number of different algorithms that make sense here; however, the server should have some way to inform the client that his upload session is not valid anymore due to a timeout.

Describe how a client may abandon a transfer if it cannot resume it

Today at work we've found a corner case when the client knows it was uploading something, but is unable to resume that.

Should TUS specify a way to gracefully abort an upload session? Of course it cannot be enforced, but in case of a closed environment, you may want to implement it, to release resources on the server sooner.

I'm fine if we decide it should be out of scope. But I guess it can be as simple as sending a request using DELETE instead of PATCH, so perhaps we can specify that? We only need to think how to respond to further PATCH and/or GET requests, but I guess 404 is the answer here.

Opinions?

100-Continue Research

Figure out if we need to support / require https' 100-Continue feature.

Describe limits for upload size

Some HTTP frameworks I came across won't accept Content-Length values over a signed 32bit integer (2147483647 bytes, 2GB).

On the other hand, the RFC mandates that the Content-Length header must be ignored if the transfer encoding is anything but identity (i.e. chunked transfer encoding):

Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored.

You should probably include a note explaining the limit for this kind of uploads will depend on the servers and/or client and that chunked transfer encoding is not an option.

Is the `reason` of response from tus server must match the protocol?

of course the status code must be the same to the protocol,
but what about the reason part?

Is it also must same to the protocol?

Respond with 412 Precondition Failed when required request headers are missing/invalid

Currently, a 412 Precondition Failed is used when the client is not supported by the server.

But, this could also be used if the request headers don't satisfy the protocol, such as if the Upload-Offset header is missing inPATCH requests

PATCH /files/17f44dbe1c4bace0e18ab850cf2b3a83 HTTP/1.1
Content-Length: 11
Tus-Resumable: 1.0.0

hello world

HTTP/1.1 412 Precondition Failed

or the Upload-Length, Upload-Defer-Length headers with POST requests.

POST /files HTTP/1.1
Tus-Resumable: 1.0.0

HTTP/1.1 12 Precondition Failed

Thoughts?

"Range" vs. "Content-Range"

"Range" is a request header for GET requests in section 14.35 of the HTTP/1.1 spec:

HTTP retrieval requests using conditional or unconditional GET methods
MAY request one or more sub-ranges of the entity

On the other hand, "Content-Range" in section 14.16 of the spec is a response header field:

The Content-Range entity-header is sent with a partial entity-body to specify
where in the full entity-body the partial body should be applied

Why do you use "Range" for the response and "Content-Range" for the request?

Consider using 204 No Content

See: https://news.ycombinator.com/reply?id=5564024&whence=item%3fid%3d5563569

how to response HEAD method when supporting concatenation-unfinished extention?

when server supports tus-defer-length and concatenation-unfinished ,
the upload-length is unknown, Is the response like this?

HEAD /files/ab HTTP/1.1

HTTP/1.1 200 OK
Upload-Defer-Length: 1
Upload-Concat: final;/files/a /files/b

when all the parts length are known, but upload is not finished,
should it to response with a header to indicate that it is not finished?
or upload-offset header?

Developer Guide

For some time, we have had the idea of a separate document containing further explanation about the protocol and best practices for implementations which don't fit into the specification. Here are some points which may be contained in this guide:

Retries (exponential backoff, automated reconnecting etc)
Single vs multiple requests
Method overriding
100-continue
Custom extensions
Checksums
Header naming scheme (prefix and singular)
Empty files (length = 0)
Obtaining upload URL without Creation extension
Handling 200, 204 and 2xx the same
Explain tus vocabulary (upload, upload URL, resuming an upload)
HTTP2 support

If you have further ideas, feel free to comment!

Response to HEAD should use Content-Length, not Offset

The protocol dictates that the response to a HEAD request on an incomplete upload should reply with an Offset header indicating how much data the server already has. Wouldn't it make more sense to use the Content-Length header here, as we are really telling the client how much data is already stored on the server?

Do we need a separate request to create uploads?

The file creation extension suggests an initial POST to create the file on the server:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Final-Length: 100

Any thoughts on allowing clients attempt to upload the complete file to that request? If the upload finishes, you can respond the same (201 with a Location header). If it fails, return the appropriate HTTP status and the Location header. That way clients only have to worry about the resumable protocol when necessary.

Where to specify filename

The protocol implementations I've seen store the chunked data in a .bin file. No where in the protocol does it mention how a client should provide a target file name to move the upload to once complete.

Video and audio files, among others no longer work/are recognized when they've been renamed from something like my_video.mp4 to 036f1378006746bcb40a0e3981257552.bin. Has the protocol left this out for a reason?

Checksum Support (MD5,CRC32,SHA1,etc.)

Just a vague idea I'm throwing out here:

Should there be some mechanism for verifying the upload contents somewhere in the process? e.g supplying a md5 hash of the file when responding to a HEAD request?
Or is this outside the scope of the protocol and something that should left as option for implementations to handle if they want to?

Mention authentication in protocol

Basically clarifying that the usual suspects can be used (basic auth, cookies, etc).

tus Concatenation Extension Proposal

After reviewing the spec for the concatenation extension I think that the
current proposal can be simplified a bit.

Right now each client is required to make individual file creation requests with
individual content lengths for each segment, keep track of all of them and the
order they should be in, and then send another request with the list of the
individual segment urls.

This puts a bit of a burden on the client implementation that can be avoided if
the server does more of the work.

Proposal

Have the client make a file creation request with the final Upload-Length and
a new header Upload-Segments which would be an integer greater than 0, but
less than the total Upload-Length.

If the number of requested segments seems too high the server can return an error
response.

The server would reply with a response containing a header with the URIs of all
of the segments the client will need to send data to in the order they should
appear possibly containing the total length of each segment as well.

It could also return the normal Location response with the location for the
finished upload which you could use HEAD against to get the overall status based
on the status of the individual segments.

Why?

Having to manage parallel requests to create individual segments is cumbersome
and might lead to strange bugs. In javascript particularly doing parallel
asynchronous stuff can be annoying.
It cuts down on the number of http requests. In the current proposal if you
have 10 segments, it will require 11 HTTP requests just for file creation (not
including PATCH requests and HEAD requests). In this new proposal, it only
requires 1 HTTP request
Currently the server has to wait on an Upload-Concat: final and deal with
the possibility of it coming in before the uploads are finished using the
concatenation-unfinished extension. That will no longer be necessary

Example

This is an example of what the creation request might look like. Not sure if
this is the best way to represent the expected segments and their sizes, but it
will require the least amount of parsing on the client cause you can just split
at comma.

The expected ranges/sizes technically do not need to be returned by this request
if there is an agreed upon way to calculate the segment sizes.

Possible calculation example

sizePerSegment = Math.floor(uploadLength / numSegments)

segmentSize = sizePerSegment
if isLastSegment
    segmentSize = sizePerSegment + uploadLength % sizePerSegment

Initialize the upload

Request

POST /files HTTP/1.1
Upload-Length: 1005
Upload-Segments: 10

Response

HTTP/1.1 201 Created
Location: https://tus.example.org/files/5c8da61d68591b1300a46e3e2766daef
Upload-Segments: /files/5c8,/files/a3e,/files/f87,/files/e4e,/files/cc5,/files/7d2,/files/88a,/files/13b,/files/b3b,/files/6a4
Upload-Expected-Ranges: 0-99,100-199,200-299,300-399,400-499,500-599,600-699,700-799,800-899,900-1004

Patch the expected segment to each of the segment URIs
Use HEAD https://tus.example.org/files/5c8da61d68591b1300a46e3e2766daef to check overall state or HEAD /files/cc5 to check individual segment state.

Note

This assumes that every segment will be an equal length, but I can't
think of a case where you would want to use different lengths for different
segments.
Originally I had this for the response header, but I realized the ranges would probably reduce the amount of work required on the client
```
Upload-Expected-Sizes: 100,100,100,100,100,100,100,100,100,105
```

Allow protocol discovery between client and server.

If writing an application that is able to upload to many different servers, the application will need to check if the remote server supports this protocol. Therefore, I suggest the following protocol extension to start a conversion on the topic.

(Note: that servers that do not support the tus protocol will reply to HEAD requests with either 200 OK or 404 Not Found)

Protocol Discovery

All implementations SHOULD include the protocol discovery API to allow clients to negotiate, with a server, a common supported version of this protocol.

When making a HEAD request for a new file the client MAY include a TUS-Resumable header with the version of this protocol.

When receiving a HEAD request with the TUS-Resumable header the server MAY reply with the 202 Accepted status and the current Offset.

Example

Request:

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
TUS-Resumable: 1.0

Response:

HTTP/1.1 202 Accepted
Offset: 0

[Core] There's no way to find out if transfer is completed on server side

Let's look at this:

"Clients SHOULD send all remaining bytes of a resource in a single PATCH request, but MAY also use multiple small requests for scenarios where this is desirable"

So how the server is going to know if transfer is completed since it doesn't know how many subsequent patch requests are going to be sent?
It looks like the only way to find out if file is completely transferred is to use POST file creation extension and keep expected file size somewhere.

I think file size to be uploaded should be in the core protocol, since file creation is not specified in the core and does't have to be implemented the way of POST extension. Each PATCH request should be "self-supporting" and should include what is the final file size (especially giving the fact that client knows what the file size is).

Define behaviour when uploading non-contiguous chunks

As per IRC convo:

Should the server accept uploading of non-contiguous chunks?
For example client uploads ranges [0-100], then [240-360] then [400-420]. Do we accept this happily or throw toys out of the pram?

If acceptable behaviour, how should results of such be displayed when making a HEAD request?

with contiguous blocks Range: bytes=0-100
with non-contigous blocks Range: bytes=0-100,240-360,400-420 ?? no idea if this is sane or not

Fear of creating extra requests?

Hi,

I really like the fact that you are creating a protocol for uploads. Building an upload server and html5 client I struggled a lot with deciding how to handle big, resumable uploads. I finally went with a protocol alike to the Amazon S3 multipart upload. It works really well and I am able to fully saturate the available upload bandwidth with parallel chunks (which is important because usually files are multiple GBs big), pause uploads (also in-explicitly, when the network connection drops), and resume them. I was going through the tickets that are currently open, and I made the following observation.

It seams that you want to put every piece of information in as few requests as possible. I see mentions of max-content-length (#24) , checksum support (#7), protocol discovery (#29), etc to be implemented in the first POST request. Isn't this wat an OPTIONS request is for?

Also, finishing the upload happens now because the server knows the length of the to be uploaded data up-front. Streaming uploads are to be implemented in the future however. So why not leave room or implement a "finalising" request. I think that it is also the right place for a final checksum (or hashtree) to be exchanged. And a place to handle things like Entity-Location (#30) (In my case, all chunks are assembled by a background task. The last request returns the tasks status location url to be consumed by the client).

We are talking about a resumable upload protocol, so I assume most use-cases concern big files because otherwise a single POST or PUT would suffice. So I can't imagine the extra requests would be of any concern, but correct me if I make the wrong assumptions.

I can see the beauty in the way the protocol is currently designed (single start request, the rest managed by headers), but for me that beauty fades away quickly when the features Parallel Chunks, Checksums and Streams will be implemented within the same constraints.

Tino

Support streams with unknown length?

see discussion here: http://www.tus.io/protocols/resumable-upload.html#comment-865108402

Disallow offsets smaller than uploaded size

Citing the current spec for the PATCH request:

The Offset value SHOULD be equal, but MAY also be smaller than the current offset of the resource, and servers MUST handle PATCH operations containing the same data at the same absolute offsets idempotently.

In v0.2.2 of tus offsets smaller then the offset of the resource are accepted allowing clients to change already uploaded chunks of the resource. While this makes sense in some rare cases it hardens the server implementations. The problem is that the server must wait until everything is uploaded to start processing the resource because the client may change parts of the file. For example, a server may start converting a big video file right from the start allowing faster conversion and better resource usage.

Another point to consider is that streaming downloads of uploads are impossible if offsets smaller than the amount of already uploaded bytes.

If the client knows that parts of the file will change in the future they shouldn't be uploaded in the first place and only in their final state. The server should not to take care of that.

I suggest to change to specification to disallow this behaviour. In cases where the clients sends such values the request should be rejected returning a 409 Conflict.

Web site uses protocol version 0.2.1 instead of 0.2.2

http://tus.io/protocols/resumable-upload.html has the protocol version 0.2.1 instead of 0.2.2.

1.0 Launch todos

We're thinking what is required before we launch 1.0, and are putting all todos in this page.

Are we missing anything? Please feel free to add.

Before 1.0 Launch

1.0 Launch!

After 1.0 Launch

protocol: Finish Developer Guide
protocol: RAML spec
client: Implementation tester

Describe how to provide metada

This issue is to define section "6.4. Metadata" of the protocol, i.e. how to provide additional meta information when uploading files.

I'm going to start with my use case. I think I need metadata to let a client specify what kind of file it is uploading, so that server can trigger appropriate processing. Let's say clients can upload two types of files: measurement data and logfiles.

What do you think about including this in the file creation request, as POST data? E.g. filetype=measurements or filetype=logs?

POST /v1/files HTTP/1.1
Entity-Length: 3000
Content-Length: 21
Content-Type: application/x-www-form-urlencoded

filetype=measurements

When using POST request's content to send metadata, there could be some conflicts in case protocol wanted to include some metadata as well, e.g. a checksum. In such case some naming convention should be used to avoid conflicts. E.g. keys of metadata defined in the protocol must start with a prefix "tus-". This means that users are free to pick any names they want, but namespace "tus-*" is reserved for future use by the protocol.

Your thoughts?

[duplicate] Optional checksumming

The client should be allowed to send a header indicating which checksum algorithms it accepts, and the server should optionally checksum each data chunk, allowing the client to resend on checksum error.

Something like an 'Accept-Checksum: md5,crc32', either in the initial POST or in each PUT, and a following Checksum header in the response to each data chunk. The checksum algo should be negotiated, but something like md5 should probably be recommended.

The client is in charge of deciding whether a resend is required.

Rework language to use MUST, MAY, etc.

This is helpful for implementors to check if their implementation comply with the protocol.

Clarify that POST is an optional step

If you already have the final file url, you can directly PUT to it.

tus / tus-resumable-upload-protocol Goto Github PK

tus-resumable-upload-protocol's Introduction

tus resumable upload protocol

License

OpenAPI specification

tus-resumable-upload-protocol's People

Stargazers

Watchers

Forkers

tus-resumable-upload-protocol's Issues

Proposal

Why?

Example

Possible calculation example

Note

Protocol Discovery

Example

Before 1.0 Launch

1.0 Launch!

After 1.0 Launch

Recommend Projects

Recommend Topics

Recommend Org