w3c / dpub-pwp-loc Goto Github PK
View Code? Open in Web Editor NEWLocator Task force of the W3C DPUB IG
License: Other
Locator Task force of the W3C DPUB IG
License: Other
Do we have to allow for ways to have canonical locators without online reference?
In http://w3c.github.io/dpub-pwp-loc/#dfn-combination-of-manifests, the combination of different manifests is described using M1 and M2.
To avoid confusion with the M2 in http://w3c.github.io/dpub-pwp-loc/#algorithm-to-find-the-right-values-for-lp-and-lu, maybe we can change the numbered M1 and M2 in http://w3c.github.io/dpub-pwp-loc/#dfn-combination-of-manifests to Ma and Mb?
URLs in examples currently use the domain books.org
. I would suggest systematically using https://example.org/ instead.
The example.org
domain is reserved for this purpose. Using any other domain also introduces security risks if ever the domain goes into wrong hands and readers of the spec do try to navigate to these URLs.
To some extent, I still believe that the algorithm and priorities eventually depend on the detailed technical solutions, which are out of scope in the note. For instance:
I appreciate that the algorithm definitely clarifies what we're talking about, but IMHO giving it too much "definitive" credit is wishful thinking at this stage?
Part of my concerns are handled in the notes after the algorithm, but I feel that the whole section looks too much "spec like". Shouldn't we just make general statements in this note, and leave the exact mechanisms to the final spec ?
I'm thinking of statements along the line of:
At the moment, the algorithm implies following order:
I am not disagreeing, but maybe we should explicitly state this somewhere, to avoid people having to fully parse the algorithm or figure to realize this.
Sometimes the letters are limited for use in examples:
“The example PWP in this document is denoted P.”
Other times they seem to be used in replacement for the concepts:
“Throughout the document, we will use L as the canonical locator of a published PWP. ”
I would suggest that:
Generally, I find that letters are convenient shorthand for our internal discussions, but IMHO they make the document harder to read for newcomers (as they add another level of abstraction to what is already very abstract).
When retrieving the manifests a PWP processor may have a number of manifests from different sources that must be combined. If there are overlap in terms, the question of priority comes up: e.g., a manifest coming through a <link>
element may have a lower priority than the embedded manifest (or the other way round).
Should a PWP specification specify this, or should it be left to implementations
Proposal: L must be part of any published or downloaded PWP. Then, the PWP processor can link this local PWP with the online published PWP via L, and thus connect to, e.g., online annotations, or sync offline made annotations to the online account.
I.e., if the user is online, and the online version may be updated, (and P allows for automatic updating,) does the cache than have to be updated? Is this an implementation level issue, or should it be reflected in the specification?
Currently we're pretty evasive about how the manifest is retrieved from a packaged PWP. The algorithm says:
If the response is a packaged PWP instance (...) unpack the package, and retrieve the manifest embedded in the package
Under the hood, several scenarios can be envisioned:
The "packaged" step in the algorithm may well have to go back to the "resource as HTML file" step, at some point after the unpackaging.
It seems odd that we clearly define the manifest retrieval and priorities for the "single HTML file" but we do not for the "packaged" stage.
On the other hand, I can hardly see how we could do that given that we leave the packaging solution entirely open...
Whether this preference is based on some local consideration, like network speed, PWP size, etc., or is set by the user, is a different problem that will require separate considerations.
What if the PWP processor has not given a preference on which media type should be returned, or what if the preference of the user is not available on the server? Should we as a task force specify an order, or let the server decide?
At the moment the document relies on a package file that reflects, internally, the same file structure as the unpacked state.
If this is not the case, some of the URL translations in the manifest may become more complicated. To be specified.
The document contains a description on how the various manifests can be retrieved (and subsequently combined, although see issue #6). Is this description in the spec exhaustive, or are implementations allowed to do more? Does this description defines the minimal alternatives?
Current it reads "Algorithm to find the right values for Lp and Lu". I think the point of the algorithm is to retrieve the whole manifest, why not say so directly ?
There are (or will be) UCs where I need info on the content of the PWP, but don't care about its absolute locator.
Consequently, I'd remove the paragraph starting with:
In fact, the algorithm does more than...
Given a situation where a PWP is only published packed, e.g., once zipped and once tarred, and a resource is requested, the question is how does this resource get to the end-user.
Proposal: This depends on the configuration of the server. If the server has the implemented functionality to unpack the PWP server-side, it could return the resource immediately. However, if the server can only return the entire package, the PWP processor can see (based on the returned content type) what kind of package is returned when an individual resource is requested, unpack P client-side, and return the resource of the local unpacked P to the reading system.
It seems that this term is misleading in understanding what is going on (part of the feedback at the WWW2016 presentation).
As raised by @lrosenthol (https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0043.html)
potential definitions of the server-side operations are as valid (or potentially more so) than the client and should be present in the document as well
Proposal: change last sentence of the definition of server to "This server will always need some kind of configuration or maybe more complex functionalities. The minimal requirement of a server is that when L is requested, it should return M (which in turn could be combined by using different parts of the HTTP response, see http://w3c.github.io/dpub-pwp-loc/#pwp-processor). Depending on the implemented functionalities of the server, it could take the preferences of the client-side into account (see #8)".
Following example might clarify the issue (and could also be inserted in the document somewhere):
When L is requested, the server returns either the manifest, the main HTML page (to be defined what the main HTML page is), or a package, with optional HTTP Link Header (see http://w3c.github.io/dpub-pwp-loc/#pwp-processor).
This returning result depends on three factors: the preference of the client-side (see #8 ), the capabilities of the server, and the published states of the PWP.
The diagram in Fig. 1 is a good visual aid, but the same retrieval workflow could also be described as an algorithm with pseudo-code. It may actually help clarify the diagram, in addition to making it accessible to visually impaired people.
Obviously, we'd need to figure out how to deal with #6 and #7 first.
At the moment, the discussion relies on the fact that the resources follow a "tree like" structures in its local organization, which means that a single base plus the local, relative, URI makes it possible to find the locator for a resource. This may not always be true; if so, the manifest may require further definitions to ensure a mapping.
This is also related to issue #3.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.