
packaging-on-the-web's People

Contributors

dbaron, dret, slightlyoff, ylafon

packaging-on-the-web's Issues

Nested packages

What if my package contains an HTML page that links to a sub-package, which is also part of the primary package? What would the syntax look like?

This could very well be a scenario for distributing a full website whose content has been optimized and packaged ready for streaming.

We are considering using this format as a serialization format for Assetgraph, which would almost guarantee that we will hit this case.

Multiple pages in a package

There are good use cases for packaging in locations where the Web is still behind a slow or intermittent connection. Packaging can be used for alternative distribution of web content: via local sharing (ShareIt, etc.), on memory cards, from local servers, and so on. Simply capturing a set of articles or pages and then using them offline as reference material can be very useful when data plans are limited and expensive.
Here is a doc with some desired properties and use cases for such conditions.

One of the interesting capabilities is for the package to contain multiple pages, potentially from varied origins. Two elements are likely needed for that:

  • some sort of index at the beginning of a package that allows 'random' access to resources without unpacking the potentially large package into parts or some local file system; multiple pages and resources (and perhaps a couple of movies) can result in a package that is prohibitively expensive to 'unpack', especially on a mobile device (a hypothetical sketch follows this list)
  • a notion of a 'main page' that is opened when the user 'opens' a package, in case it is loaded in some way onto the device and is 'opened' by a browser
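
As a purely hypothetical sketch (the draft defines no such index; the entry names, fields, and numbers are invented for illustration), the first part of a package could list where each resource sits within the stream:

/index.html       offset=256        length=4096
/app.js           offset=4352       length=18230
/movie.webm       offset=22582      length=104857600

A consumer could then seek directly to /movie.webm without unpacking anything else, and a designated 'main page' entry could tell the browser what to open.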

Claim of greater efficiency lacks evidence

The document says:

Delivering a package of these files could be more efficient than delivering individual files. Downloading each file has a connection overhead which is particularly impactful on low-bandwidth mobile devices and on secure connections.

This claim needs clarification, data, and citations. In particular: what is the impact for a common website over HTTP? What would it be over HTTP/2? What could it theoretically be using packaging?

I know people have done the research for HTTP and HTTP/2, but probably not for the kind of packaging being suggested by this document.

Link headers as an alternative?

Late to the party here, but I just read the spec and can't figure out what the advantages of this are over simply making links between things.

HTTP/2 makes the number of individual requests irrelevant to perf or reliability, so I don't get the 'populating caches' use case (2.1).

In the web page use case (2.2), I can't see any benefit to linking to a package when you have a preload scanner that's going to harvest all the subresource links from the page anyway, preload hint headers to deal with anything the preload scanner doesn't spot, and ServiceWorker to cache atomic packages of resources if you want offline access. Adding an additional permissions model for 'installing' a package seems redundant given other work (see Quota Management and Storage Durability).

Distributing libraries (2.3) is currently done using package managers that install multi-file packages from git repos, which seems to work OK, and the subsequent concatenation dramas are eliminated by HTTP/2.

In downloading data (2.4) with associated metadata, I guess it kind of makes sense that if you download a CSV you'd want the documentation too, but the current proposal would require anything that currently recognises the CSV format to now also recognise the new package format in order to make the new format as useful as the unpackaged original. Your starting point is breaking all CSV readers, and then gradually fixing them by supporting the new format. Wouldn't a better mechanism be to include links in the HTTP response headers, e.g.:

Link: <https://example.com/docs/about-my-csv.html>; rel="describedby"
Link: <https://example.com/data/month-2.csv>; rel="next"

etc.

I suspect I'm missing something quite blindingly obvious...

Is it safe to deploy over plain-text HTTP?

Quoting from the document:

Developers who cannot yet use HTTP/2 may find that using packages can provide performance benefits through reducing numbers of requests

I think there's an assumption here that this file format is safe to deploy over plain-text HTTP (since TLS is probably the main hurdle to adoption of HTTP/2).

Since this format enables cache population under a certain path, is it something we know to be safe to use over plain HTTP? Can't it be abused, beyond the current possible abuse of the HTTP cache in MITM scenarios?

I think it'd be good if the document addressed that point.

Why not use `Content-Encoding`?

Maybe it's just me, but I found the use of Transfer-Encoding to indicate that a message part is compressed rather confusing. Why can't we use Content-Encoding the same way it is used for entire messages?
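
For instance (a sketch of the suggestion, not syntax from the draft), a compressed part could carry the same headers it would carry as a standalone HTTP message:

Content-Type: text/css
Content-Encoding: gzip

...gzip-compressed CSS bytes...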

As an aside: I don't think that compressing the entire package would make it less streamable from a decoding perspective. It may not make sense if it contains binary files, but issue 3 seems based on wrong assumptions.

Reflect the discussion in the April 2015 F2F

  • remove performance aspects
  • explore sending only references
  • explore the idea of an inflatable package where the body may or may not be directly inside (with no defined way on how to populate it)

Caching and delta updates to packages

It would be great if the document could speak more clearly (maybe in its own section) on how caching works and how updates to particular resources could happen.

Like, if I send a package that contains my site's logo + 50 other files, but then I only update my logo, does the user need to re-download the whole package? What happens if I decide to remove a file from a package or add some new files, etc.?

Please "un-gut" this specification

The last commit "gutted" this draft specification: 65af284

While I understand that WebPackage covers a similar topic area and list of use cases, the commit removes several features (human legibility, streaming, internal references via fragment identifiers) which are still of interest to the Publishing WG and possibly others.

"Gutting" any specification in this way also hampers discussion by breaking links and hiding the actual spec behind a "wall".

Please reconsider this approach and revert this commit: 65af284

Thank you!
🎩

IETF review

Since this draft defines a new media type (that's sort of a multipart type, but not), it really needs early review from the appropriate communities (e.g., the media types list, apps-discuss).

When is the SPF stream finished? Is Content-Length enough?

As there is no end marker in the Streamable Package Format (SPF), how do we know that the stream is finished and there are no more files, and that the connection was not broken for some other reason?

This could presumably be solved at the HTTP level, but it could also be an issue for the file format. How do I know a .pack file is not half-written to disk?

In the SPF examples there is no Content-Length header. Presumably the length can be difficult to know in advance for a truly streaming server, in which case Transfer-Encoding SHOULD be used with a chunked transfer coding.

See https://tools.ietf.org/html/rfc7230#section-3.3.3
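
For illustration (the media type shown is assumed, not taken from the draft), a chunked response ends with an explicit zero-length chunk, so a truncated transfer is detectable:

HTTP/1.1 200 OK
Content-Type: application/package
Transfer-Encoding: chunked

400
...first 1024 (0x400) bytes of the package...
400
...next 1024 bytes...
0

If the connection drops before the final zero-length chunk arrives, the recipient knows the message is incomplete.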

"scope" attribute

This sounds very much like the concept of link context in RFC5988; see
http://tools.ietf.org/html/rfc5988#section-5.2

If this spec is going to go to the trouble of defining a "scope" attribute on the link element, it'd be really really nice if it was defined to line up with that concept, and available for all link relations, not just "package".
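
To illustrate the parallel (the link element form is assumed from the draft's description; the header form uses RFC 5988's anchor parameter, which sets the link context):

<link rel="package" href="/assets/site.pack" scope="/assets/">

Link: </assets/site.pack>; rel="package"; anchor="/assets/"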

Streaming Packaging Format

The term "streaming" often refers to the timed delivery of data, with timestamps and clocks, in particular with audio and video data. This package format is more about progressive download. Consider renaming "Streaming Packaging Format" to "Progressively Downloadable Packaging Format" or at least adding text disambiguating the term.

tarballs

Hi, just wondering if there was a discussion about the tarball format.

From the README:

The main problem with using zips is that the central directory record, which lists the valid files within the zip archive, appears at the end of the zip. 

Tarballs are stream-friendly where ZIPs aren't, so this issue wouldn't apply.
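
As a minimal sketch of tar's stream-friendliness (Python, using the stdlib tarfile module in its explicit streaming mode):

import io
import sys
import tarfile

# A tar archive is a plain sequence of 512-byte headers and data blocks,
# so entries can be written and read one at a time without seeking.
tar = tarfile.open(fileobj=sys.stdout.buffer, mode="w|")  # "|" = stream mode
data = b"hello, package\n"
info = tarfile.TarInfo(name="hello.txt")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()

A reader can consume the same stream incrementally with mode="r|" as entries arrive.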

What's the origin of a signed package?

The introduction says:

Initiatives such as Firefox OS and Chrome OS demonstrate the potential of trusted, installable applications built with web technologies. To be used in this way, applications must be self-contained packages of resources that can be tested and signed.

Firefox OS and Chrome OS use the presence of a signature from Mozilla or Google to allow an application to request permissions that normal websites can't request. The code with access to these permissions may be tricked into misusing them if a less-trusted application can write to its storage. However, any code running on the same origin can write to a trusted application's storage. I think that implies that a signed package built by the owners of https://example.com/ can't have the same origin as non-packaged code fetched from https://example.com/.

Maybe suborigins (@metromoxie) can help with this. [Edit: Nope: "there should be no way for Suborigins to obtain such permissions"]

Package signing and key continuity for trusted apps

I work on OpenPGP.js and would love to see a standard for signed packaged apps on the web. Much like how Chrome Apps work but without a central CA or app store to go through.

@diracdeltas suggested that the browser check for key continuity when verifying app updates, much like how Android works. But in the web's case there should be no central CA; the browser just does trust-on-first-use for the developer's public key on the initial installation. On subsequent updates the browser then verifies that the signature is indeed from the same key.
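
A minimal sketch of that trust-on-first-use check in Python (all names are hypothetical; a real implementation would verify the signature cryptographically rather than just compare fingerprints):

# app_id -> key fingerprint pinned at first install (hypothetical store)
pinned_keys = {}

def accept_package(app_id: str, signing_key_fingerprint: str) -> bool:
    if app_id not in pinned_keys:
        # First install: trust and pin the developer's key.
        pinned_keys[app_id] = signing_key_fingerprint
        return True
    # Update: accept only if signed by the same key as the first install.
    return pinned_keys[app_id] == signing_key_fingerprint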

This is especially relevant for client-side crypto apps like WhisperSystems' Signal or encrypted webmail apps like Protonmail. I'm also working on an OpenPGP mail client (https://hoodiecrow.com) and could give feedback on the security aspects of this use case.

Specifically, I'm talking about the following problem: http://tonyarcieri.com/whats-wrong-with-webcrypto

Or Moxie's remarks here: https://news.ycombinator.com/item?id=11307992

In that case a packaged app would have to exist in its own sandbox, much like Chrome Apps, with its own origin and storage, so that e.g. private PGP key material could be stored by the trusted app and JavaScript from an untrusted hosted page could not access it.

Populating Caches

The "populating caches" use case is deeply flawed. Having one resource populate a different URI's cache entry is a huge security hole; http://host.com/~evil can insert things into cache for http://host.com/~alice.

If the host opts into this sort of cross-population (e.g., by putting something indicating that at a well-known URI), it's a different story, but the default has to be safe.

(yes, ServiceWorker has a similar problem; working with Alex on that one).

There's a higher-level issue here about granularity of authority on the Web. Right now we have some / many security mechanisms that operate on the granularity of an origin, but that doesn't mean that new mechanisms (like this) can be introduced with origin scoping safely. It'd be good to have a general discussion about this, because e.g., CORS took great pains to allow finer-grained granularity of authority even though arguably it wasn't necessary in that case.

Package Contents Header Field?

Interesting stuff! Did you consider having a specific header field for listing the package's contents in the HTTP response? I haven't thought through this completely, and maybe it is subject to the same objections that you're listing for adding other HTTP-level constructs (such as a new status code), but for now I am just wondering if that might be another possibility for how to expose the contents of a package. In essence, it would lift packaging to being an HTTP-level mechanism, instead of just a new media type.
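
Something along these lines, purely as a hypothetical sketch (the header field name and its syntax are invented):

HTTP/1.1 200 OK
Content-Type: application/package
Package-Contents: "/index.html", "/styles/site.css", "/scripts/app.js"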

Icon container as use case

Half-baked idea...

As screen densities increase and web apps become "installable", it's likely to become more common that developers include high-resolution icons for their web applications either in a container or directly in HTML using multiple <link rel=icon>s.
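
For example, today that looks something like:

<link rel="icon" href="/icon-64.png" sizes="64x64">
<link rel="icon" href="/icon-256.png" sizes="256x256">
<link rel="icon" href="/icon-512.png" sizes="512x512">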

One can use a .ico file to contain icons, but the maximum size of an icon is only 256×256. Apple's .icns file format supports larger icons, but it's a proprietary format and not widely supported by browsers.

A generic packaging format could potentially help address the shortcomings of .ico, particularly if the central directory included some metadata about each file (e.g., width, height, and maybe target density).

Vary header use

In the examples, Vary is used with HTTP response header names as its value. Vary is defined to take request header names as its value, so I'm not sure how that can work.
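
For comparison, a correct use of Vary names request header fields, e.g.:

Vary: Accept-Encoding, Accept-Language

which tells caches that the response varies according to those request headers.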

maybe use EPUB as a scenario?

Maybe it would be interesting to look at ebooks, and EPUB in particular, as a popular use case? After all, ebooks benefit a lot from packaging, but the web would benefit from this packaging being more open/standardized, so that web architecture is able to reach into ebooks and not just to their packaged level.

Consider manifest-like approach instead of SPF

We didn't come to any resolutions on the public-web-perf list, so I just want to raise this here so that the feedback and discussion don't get lost. The highlights:

The entire thread is worth reading for full context, but in a nutshell the proposal is to revisit the use cases in light of omitting the whole "streaming package format" in favor of a manifest-like approach. I believe we can meet the required use cases without SPF, while also avoiding any performance complications introduced by SPF.

Zip CAN be streaming

It is wrongly argued in https://github.com/w3ctag/packaging-on-the-web#zip-as-a-packaging-format that:

The TAG discussed the use of zipped files as a packaging format. The main problem with using zips is that the central directory record, which lists the valid files within the zip archive, appears at the end of the zip. Implementations therefore need to wait until the whole zip is downloaded before the files within it can be read. This makes it unsuitable for a packaging format for efficient delivery of content on the web (the first of the requirements described above).

ZIP files also start each file with a local ZIP entry header, so there is no need for the central directory for anything except knowing that there are no more files.

https://gist.github.com/stain/46dd33fb49e843d333bd is an example in Java where a file is written each second.
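
The same behaviour can be sketched in a few lines of Python with the stdlib zipfile module (the pause just mimics the gist's one-file-per-second writer):

import time
import zipfile

with zipfile.ZipFile("stream.pack", mode="w") as zf:
    for i in range(3):
        # writestr() emits a local file header followed immediately by the
        # entry's data, so a reader tailing the growing file can already
        # parse every completed entry.
        zf.writestr(f"file-{i}.txt", f"payload {i}\n")
        time.sleep(1)
# Only closing the archive (leaving the with-block) appends the central
# directory at the end.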

Why not the non-zip formats?

The Introduction explains why not to use a Zip-based format, but several of the options, including MHTML, aren't zip-based and are more widely supported than this new proposal. The Introduction should explain why we need something new.

Can a package be authenticated to come from a particular secure origin?

One use of this packaging format could be to let folks exchange websites offline. However, if I copy a package from you and open it, and it wants to do something like install a service worker or record video, it needs to live in a secure context, meaning the browser needs a cryptographically secure signature vouching that a particular origin created the package. Has anyone thought about how to embed that signature and its certificate chain into a package? Can the HTTPS client extract a trustworthy signature automatically, or does it need the server's cooperation?

@talo @slightlyoff

Security use cases for packaging

The intro says:

Initiatives such as Firefox OS and Chrome OS demonstrate the potential of trusted, installable applications built with web technologies. To be used in this way, applications must be self-contained packages of resources that can be tested and signed.

IMO, the ability to verify trusted signatures on packaged apps provides a huge security advantage over regular web apps that don't use packaging. This is part of why many developers who make encrypted messaging apps implement them as browser extensions or Chrome apps instead of as web pages (popular examples include Google's End to End and Cryptocat).

The rest of the document doesn't say anything about signatures, though #8 suggests that the SPF format may include a signature header. FWIW, I don't think it is a good idea to put a signature in a package header or part header, because the content of the other headers should be signed as well. PGP/MIME is an example of how to securely include a signature over a MIME structure (see http://www.ietf.org/rfc/rfc3156.txt, section 5).
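
For reference, RFC 3156 wraps the entire signed entity, headers included, as the first part of a multipart/signed message; applied to a package it would look roughly like this (boundaries, media types, and the micalg parameter are illustrative):

Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="sig"

--sig
Content-Type: application/package

...the entire package, including all of its part headers...
--sig
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
...
-----END PGP SIGNATURE-----
--sig--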

Given the importance of signatures, especially for installable apps, it would be nice for this draft to be specific about the signature format and how the client is supposed to verify it. For instance, is the public signing key supposed to be included in the package? Should the client always verify an SPF signature, if one is included, before loading the resources (this means it has to wait until the entire package is downloaded)? How do signatures interact with caching and partial package updates? etc.
