w3c / dpub-pwp-loc

Locator Task force of the W3C DPUB IG

License: Other

dpub-pwp-loc's Introduction

The Digital Publishing Interest Group, which managed this repository, is now closed, and so is this repository. The group's activities have been taken up by:

All these groups are part of Publishing@W3C, born out of the merger of IDPF and W3C.

– Ivan Herman ([email protected])

pwp-loc

Locator Task Force discussions related to PWP

See the GitHub view page for a readable version of the HTML files in the repo.

dpub-pwp-loc's Issues

Clarify the manifest retrieval from the package?

Currently we're pretty evasive about how the manifest is retrieved from a packaged PWP. The algorithm says:

If the response is a packaged PWP instance (...) unpack the package, and retrieve the manifest embedded in the package

Under the hood, several scenarios can be envisioned:

there is a standalone manifest at a known location (as is the case, to some extent, in EPUB)
the manifest is combined from several manifests; for instance, a "main" HTML document in the package may contain an embedded manifest and/or link to manifest(s).

The "packaged" step in the algorithm may well have to go back to the "resource as HTML file" step at some point after unpacking.

It seems odd that we clearly define the manifest retrieval and priorities for the "single HTML file" but we do not for the "packaged" stage.
On the other hand, I can hardly see how we could do that given that we leave the packaging solution entirely open...

Editorial: use the example.org domain for example URLs

URLs in examples currently use the domain books.org. I would suggest systematically using https://example.org/ instead.

The example.org domain is reserved for this purpose. Using any other domain also introduces security risks if the domain ever falls into the wrong hands and readers of the spec try to navigate to these URLs.

Editorial: clarify the use of L, P, M, etc

Sometimes the letters are limited to use in examples:

“The example PWP in this document is denoted P.”

Other times they seem to be used in place of the concepts themselves:

“Throughout the document, we will use L as the canonical locator of a published PWP.”

I would suggest that:

all general “spec-level” statements are written with the explicit terms
letters are only used for examples

Generally, I find that letters are convenient shorthand for our internal discussions, but IMHO they make the document harder to read for newcomers (as they add another level of abstraction to what is already very abstract).

Should packaged states impose the same structure as the unpacked state?

At the moment the document relies on a package file that reflects, internally, the same file structure as the unpacked state.

If this is not the case, some of the URL translations in the manifest may become more complicated. To be specified.

Explicitly describe manifest order?

At the moment, the algorithm implies the following order:

HTTP Link Header
Manifest from package OR manifest as payload OR Manifest embedded in the HTML
Manifest via a <link> in <head>

I am not disagreeing, but maybe we should explicitly state this somewhere, so that readers do not have to fully parse the algorithm or the figure to work it out.
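One way to state the order explicitly would be a small table or enumeration rather than prose. The sketch below is only an illustration of that idea: the names `ManifestSource` and `order_manifests` are hypothetical, and the middle step groups the three "OR" alternatives listed above under one label.

```python
from enum import IntEnum


class ManifestSource(IntEnum):
    """Hypothetical labels for where a manifest was found (lower = consulted first)."""
    HTTP_LINK_HEADER = 1
    PRIMARY = 2            # from the package, as the payload, or embedded in the HTML
    HTML_LINK_ELEMENT = 3  # via a <link> in <head>


def order_manifests(found):
    """Sort (source, manifest) pairs in the order the algorithm implies."""
    return sorted(found, key=lambda pair: pair[0])


found = [
    (ManifestSource.HTML_LINK_ELEMENT, {"name": "linked"}),
    (ManifestSource.HTTP_LINK_HEADER, {"name": "header"}),
    (ManifestSource.PRIMARY, {"name": "embedded"}),
]
ordered = order_manifests(found)
```

Whether this order also expresses priority when manifests are combined is a separate question that the sketch does not settle.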

Should the resources in a PWP follow a hierarchical "tree like" structure?

At the moment, the discussion relies on the fact that the resources follow a "tree like" structure in their local organization, which means that a single base plus the local, relative URI is enough to find the locator for a resource. This may not always be true; if so, the manifest may require further definitions to ensure a mapping.

This is also related to issue #3.
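To make the "single base plus relative URI" assumption concrete, here is a minimal sketch using standard URL resolution. The base URL, the file names, and the `overrides` mapping are illustrative assumptions; the mapping stands in for the "further definitions" a manifest might need when the resources do not form a tree.

```python
from urllib.parse import urljoin


def resource_locator(base, relative, overrides=None):
    """Return the locator for a resource of the PWP.

    With a tree-like layout, resolving the relative URI against the base
    is enough. `overrides` is a hypothetical explicit mapping for
    resources that do not live under the base.
    """
    if overrides and relative in overrides:
        return overrides[relative]
    return urljoin(base, relative)


base = "https://example.org/books/1/"
in_tree = resource_locator(base, "img/cover.jpg")
out_of_tree = resource_locator(
    base,
    "fonts/serif.woff",
    overrides={"fonts/serif.woff": "https://example.org/shared/serif.woff"},
)
```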

How to fetch a resource from within a PWP?

Given a situation where a PWP is only published packed, e.g., once zipped and once tarred, and a resource is requested, the question is how this resource gets to the end user.

Proposal: This depends on the configuration of the server. If the server implements server-side unpacking of the PWP, it could return the resource immediately. However, if the server can only return the entire package, the PWP processor can see (based on the returned content type) what kind of package is returned when an individual resource is requested, unpack P client-side, and return the resource from the local unpacked P to the reading system.
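The client-side half of this proposal could look roughly like the sketch below. It assumes the package arrives as bytes with a content type of `application/zip` or `application/x-tar` (the packaging format is left open in the document, so these are assumptions), and that the package mirrors the unpacked file structure. The function name `extract_resource` is hypothetical.

```python
import io
import tarfile
import zipfile


def extract_resource(body: bytes, content_type: str, path: str) -> bytes:
    """Return the bytes of `path`, unpacking client-side if a package came back."""
    if content_type == "application/zip":
        with zipfile.ZipFile(io.BytesIO(body)) as package:
            return package.read(path)
    if content_type == "application/x-tar":
        with tarfile.open(fileobj=io.BytesIO(body), mode="r:*") as package:
            member = package.extractfile(path)
            if member is None:  # e.g. the path names a directory
                raise KeyError(path)
            return member.read()
    # Otherwise assume the server unpacked for us and sent the resource itself.
    return body


# Build a tiny zipped package in memory to stand in for the server response.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("chapter1.html", "<p>Hello</p>")

resource = extract_resource(buffer.getvalue(), "application/zip", "chapter1.html")
direct = extract_resource(b"<p>Hello</p>", "text/html", "chapter1.html")
```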

Hierarchy of identifiers/locators (was: Consider finding a different name for "canonical locator")

This term seems to mislead readers about what is going on (part of the feedback at the WWW2016 presentation).

Change M1 and M2 to Ma and Mb in the definition?

In http://w3c.github.io/dpub-pwp-loc/#dfn-combination-of-manifests, the combination of different manifests is described using M1 and M2.

To avoid confusion with the M2 in http://w3c.github.io/dpub-pwp-loc/#algorithm-to-find-the-right-values-for-lp-and-lu, maybe we can change the numbered M1 and M2 in http://w3c.github.io/dpub-pwp-loc/#dfn-combination-of-manifests to Ma and Mb?

Are details of manifest retrieval in scope for this Note?

To some extent, I still believe that the algorithm and priorities eventually depend on the detailed technical solutions, which are out of scope in the note. For instance:

we do not (and cannot) describe retrieval and priority mechanisms in the packaged state
if I understand correctly, the eventual spec for PWP can still decide that the canonical locator always returns a single media type (manifest, HTML document, or packaged state); in other words, it can discard a content negotiation (conneg) approach, in which case several branches of the algorithm can simply be pruned.

I appreciate that the algorithm definitely clarifies what we're talking about, but IMHO giving it too much "definitive" credit is wishful thinking at this stage?

Part of my concerns are handled in the notes after the algorithm, but I feel that the whole section looks too "spec like". Shouldn't we just make general statements in this note, and leave the exact mechanisms to the final spec?

I'm thinking of statements along the lines of:

there can be several manifest resources, to be defined in the final spec
manifests have priorities (to be defined), and must eventually be combined
HTML allows embedded and linked manifests, in which case the embedded one should always have higher priority
a manifest can be linked via HTTP headers, in which case it has higher priority than the payload's manifest

What to do with a locally downloaded PWP?

Proposal: L must be part of any published or downloaded PWP. Then, the PWP processor can link this local PWP with the online published PWP via L, and thus connect to, e.g., online annotations, or sync annotations made offline to the online account.

Preference of media type (a.k.a. content) of the PWP processor

Whether this preference is based on some local consideration, like network speed, PWP size, etc., or is set by the user, is a different problem that will require separate considerations.

What if the PWP processor has not given a preference on which media type should be returned, or what if the preference of the user is not available on the server? Should we as a task force specify an order, or let the server decide?
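One possible answer, shown here only as a sketch, is to let the server fall back to its own order when the client states no preference or asks for something unavailable. The manifest media type `application/pwp-manifest+json` is made up for illustration, and `choose_representation` is a hypothetical helper, not something the document defines.

```python
MANIFEST = "application/pwp-manifest+json"  # hypothetical media type
HTML = "text/html"
PACKAGE = "application/zip"


def choose_representation(client_preferences, available, server_order):
    """Pick a media type: the client's preferences first, then the server's order."""
    for media_type in client_preferences:
        if media_type in available:
            return media_type
    for media_type in server_order:
        if media_type in available:
            return media_type
    return None  # nothing the server can offer


server_order = [MANIFEST, HTML, PACKAGE]

no_preference = choose_representation([], {HTML, PACKAGE}, server_order)
unavailable = choose_representation([MANIFEST], {PACKAGE}, server_order)
honoured = choose_representation([PACKAGE, HTML], {HTML, PACKAGE}, server_order)
```

If the task force instead specified an order, `server_order` would simply become a fixed constant shared by all servers.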

Server-side operations

As raised by @lrosenthol (https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0043.html)

potential definitions of the server-side operations are as valid (or potentially more so) than the client and should be present in the document as well

Proposal: change last sentence of the definition of server to "This server will always need some kind of configuration or maybe more complex functionalities. The minimal requirement of a server is that when L is requested, it should return M (which in turn could be combined by using different parts of the HTTP response, see http://w3c.github.io/dpub-pwp-loc/#pwp-processor). Depending on the implemented functionalities of the server, it could take the preferences of the client-side into account (see #8)".

The following example might clarify the issue (and could also be inserted in the document somewhere):

When L is requested, the server returns either the manifest, the main HTML page (what the main HTML page is remains to be defined), or a package, with an optional HTTP Link Header (see http://w3c.github.io/dpub-pwp-loc/#pwp-processor).
The returned result depends on three factors: the preference of the client side (see #8), the capabilities of the server, and the published states of the PWP.

If we assume the PWP is only published zipped, and the preference of the client-side is zipped, then the server does not need any more capabilities than just returning the package.
- If the preference of the client-side is the manifest, and the server is just a file server, then the server can only return the zip-package, and the client is burdened with unpacking the zip client-side and extracting the manifest.
- If the server has the functionality, it could unpack the zip server-side and return the manifest directly to the client.
The reverse and other alternative scenarios are also possible, e.g., a client requests the zipped PWP and the server only has the unpacked PWP published. Depending on the capabilities of the server, either
- the client is burdened with packing the PWP client-side (dumb server), or
- the server packs the PWP server-side (smart server).
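The scenarios above can be summarised as a small decision function. This is a sketch under simplifying assumptions: states are plain strings, a "smart" server can convert between any two states, and a "dumb" file server just returns the first state it has published. The name `plan_response` is hypothetical.

```python
def plan_response(requested, published, server_can_convert):
    """Return (state sent by the server, who has to pack or unpack).

    `published` is an ordered list of the states the PWP is published in.
    """
    if requested in published:
        return requested, "nobody"
    if server_can_convert:
        # Smart server: packs or unpacks server-side before answering.
        return requested, "server"
    # Dumb server: sends what it has; the client does the conversion.
    return published[0], "client"


only_zip = ["zip"]
same_state = plan_response("zip", only_zip, server_can_convert=False)
dumb_server = plan_response("manifest", only_zip, server_can_convert=False)
smart_server = plan_response("manifest", only_zip, server_can_convert=True)
```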

Isn't the real objective of the algorithm to retrieve the manifest?

Currently it reads "Algorithm to find the right values for Lp and Lu". I think the point of the algorithm is to retrieve the whole manifest, so why not say so directly?

There are (or will be) use cases where I need information on the content of the PWP, but don't care about its absolute locator.

Consequently, I'd remove the paragraph starting with:

In fact, the algorithm does more than...

Do we have to be exhaustive in the various options on getting to manifests?

The document contains a description of how the various manifests can be retrieved (and subsequently combined, although see issue #6). Is this description in the spec exhaustive, or are implementations allowed to do more? Does this description define the minimal alternatives?

Describe an example of M+L retrieval as an algorithm?

The diagram in Fig. 1 is a good visual aid, but the same retrieval workflow could also be described as an algorithm with pseudo-code. It may actually help clarify the diagram, in addition to making it accessible to visually impaired people.

Obviously, we'd need to figure out how to deal with #6 and #7 first.

Do we have to specify what happens when the PWP processor already has a cached version?

I.e., if the user is online, the online version may have been updated, and P allows for automatic updating, does the cache then have to be updated? Is this an implementation-level issue, or should it be reflected in the specification?
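If this were handled at the HTTP level, ordinary conditional requests would already cover much of it. The sketch below assumes the processor stored an ETag with its cached copy and that "P allows automatic updating" is available as a boolean; both helper names are hypothetical.

```python
def conditional_headers(cached_etag):
    """Headers for revalidating a cached copy; empty if nothing is cached."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def should_replace_cache(status_code, auto_update):
    """Decide whether to overwrite the cached copy after revalidation.

    304 Not Modified means the cached copy is still current;
    200 means the server sent a newer representation.
    """
    return status_code == 200 and auto_update


headers = conditional_headers('"v1"')
unchanged = should_replace_cache(304, auto_update=True)
updated = should_replace_cache(200, auto_update=True)
pinned = should_replace_cache(200, auto_update=False)
```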

What to do with a PWP that is published nowhere, i.e., is created locally?

Do we have to allow for ways to have canonical locators without online reference?

What is the relative priority of various manifests?

When retrieving the manifests, a PWP processor may have a number of manifests from different sources that must be combined. If terms overlap, the question of priority comes up: e.g., a manifest coming through a <link> element may have a lower priority than the embedded manifest (or the other way round).

Should a PWP specification specify this, or should it be left to implementations?
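To illustrate what "priority" means when terms overlap, here is a minimal sketch that treats each manifest as a flat dictionary of terms and lets the first (highest-priority) manifest win. The choice of embedded over linked is just one of the two options mentioned above, and the terms and values are invented.

```python
def combine_manifests(manifests_by_priority):
    """Merge manifests listed from highest to lowest priority.

    On overlapping terms, the value from the higher-priority manifest is kept.
    """
    combined = {}
    for manifest in manifests_by_priority:
        for term, value in manifest.items():
            combined.setdefault(term, value)
    return combined


embedded = {"title": "Embedded title", "language": "en"}
linked = {"title": "Linked title", "author": "A. Writer"}

combined = combine_manifests([embedded, linked])
reversed_priority = combine_manifests([linked, embedded])
```

Swapping the list order is all it takes to model the opposite priority, which is why the answer to this issue matters for interoperability.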