dpub-pwp-loc's Introduction

The Digital Publishing Interest Group, which managed this repository, is now closed, and so is this repository. The group's activities have been taken up by:

All these groups are part of the Publishing@W3C, born out of the merger of IDPF and W3C.

– Ivan Herman ([email protected])


pwp-loc

Locator Task Force discussions related to PWP

See the GitHub view page for a readable version of the HTML files in the repo.

dpub-pwp-loc's People

Contributors

bjdmeest, iherman, rdeltour


dpub-pwp-loc's Issues

Clarify the manifest retrieval from the package?

Currently we're pretty evasive about how the manifest is retrieved from a packaged PWP. The algorithm says:

If the response is a packaged PWP instance (...) unpack the package, and retrieve the manifest embedded in the package

Under the hood, several scenarios can be envisioned:

  • there is a standalone manifest at a known location (as is the case, to some extent, in EPUB)
  • the manifest is combined from several manifests; for instance, a "main" HTML document in the package may contain an embedded manifest and/or link to manifest(s).

The "packaged" step in the algorithm may well have to go back to the "resource as HTML file" step, at some point after the unpackaging.

It seems odd that we clearly define the manifest retrieval and priorities for the "single HTML file" but we do not for the "packaged" stage.
On the other hand, I can hardly see how we could do that given that we leave the packaging solution entirely open...
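As a rough illustration of the first scenario only, the sketch below reads a standalone manifest from a fixed path inside a ZIP package. The ZIP format, the JSON serialization, the path `manifest.json`, and the function names are all assumptions made for the example; the draft leaves the packaging solution open.

```python
import io
import json
import zipfile

# Hypothetical fixed location of the manifest inside the package.
MANIFEST_PATH = "manifest.json"


def read_packaged_manifest(package_bytes, manifest_path=MANIFEST_PATH):
    """Unpack a ZIP-packaged PWP in memory and return its standalone manifest."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if manifest_path not in package.namelist():
            raise LookupError(f"no manifest at {manifest_path!r}")
        return json.loads(package.read(manifest_path))


def build_example_package():
    """Create a tiny in-memory package so the sketch can be run as-is."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(MANIFEST_PATH, json.dumps({"title": "Example PWP"}))
        package.writestr("index.html", "<html></html>")
    return buffer.getvalue()


manifest = read_packaged_manifest(build_example_package())
```

The second scenario (a manifest combined from several sources inside the package) would need the same priority rules as the "single HTML file" case, which is exactly the gap this issue points at.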

Editorial: use the example.org domain for example URLs

URLs in examples currently use the domain books.org. I would suggest systematically using https://example.org/ instead.

The example.org domain is reserved for this purpose. Using any other domain also introduces a security risk if the domain ever falls into the wrong hands and readers of the spec try to navigate to these URLs.

Editorial: clarify the use of L, P, M, etc

Sometimes the letters are limited for use in examples:

“The example PWP in this document is denoted P.”

Other times they seem to be used in replacement for the concepts:

“Throughout the document, we will use L as the canonical locator of a published PWP. ”

I would suggest that:

  • all general “spec-level” statements are written with the explicit terms
  • letters are only used for examples

Generally, I find that letters are convenient shorthand for our internal discussions, but IMHO they make the document harder to read for newcomers (as they add another level of abstraction to what is already very abstract).

Explicitly describe manifest order?

At the moment, the algorithm implies the following order:

  1. HTTP Link Header
  2. Manifest from package OR manifest as payload OR Manifest embedded in the HTML
  3. Manifest via a <link> in <head>

I am not disagreeing, but maybe we should state this explicitly somewhere, so that readers do not have to parse the whole algorithm or figure to work it out.
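One way to make the order explicit is to name the sources and number them. The sketch below does only that; the enum and function names are invented for illustration, and step 2 is kept as a single entry because the list above treats its three variants as alternatives.

```python
from enum import IntEnum


class ManifestSource(IntEnum):
    """Manifest sources in the order implied by the algorithm (1 comes first)."""
    HTTP_LINK_HEADER = 1
    PACKAGE_PAYLOAD_OR_EMBEDDED = 2
    HTML_LINK_ELEMENT = 3


def in_retrieval_order(found):
    """Sort (source, manifest) pairs by the implied order."""
    return sorted(found, key=lambda pair: pair[0])


found = [
    (ManifestSource.HTML_LINK_ELEMENT, {"from": "link element"}),
    (ManifestSource.HTTP_LINK_HEADER, {"from": "Link header"}),
]
ordered = [source.name for source, _ in in_retrieval_order(found)]
```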

Should the resources in a PWP follow a hierarchical "tree like" structure?

At the moment, the discussion relies on the resources following a "tree-like" structure in their local organization, which means that a single base plus the local, relative URI is enough to find the locator for a resource. This may not always be true; if so, the manifest may require further definitions to ensure a mapping.

This is also related to issue #3.
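The tree-like assumption amounts to plain relative-URL resolution. The sketch below shows that case, plus a fallback table for resources that live outside the tree; the `overrides` mapping and the URLs are hypothetical stand-ins for whatever "further definitions" the manifest might carry.

```python
from urllib.parse import urljoin


def locate(base, relative, overrides=None):
    """Resolve a resource locator from a base and a relative URI.

    `overrides` is a hypothetical manifest-provided mapping for resources
    that do not sit under the base.
    """
    overrides = overrides or {}
    if relative in overrides:
        return overrides[relative]
    return urljoin(base, relative)


base = "https://example.org/books/1/"
in_tree = locate(base, "chapters/ch1.html")
out_of_tree = locate(
    base,
    "fonts/a.woff",
    overrides={"fonts/a.woff": "https://cdn.example.org/fonts/a.woff"},
)
```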

How to fetch a resource from within a PWP?

Consider a PWP that is only published packed, e.g., once zipped and once tarred. When a single resource is requested, how does that resource get to the end user?

Proposal: This depends on the configuration of the server. If the server implements the functionality to unpack the PWP server-side, it can return the resource immediately. However, if the server can only return the entire package, the PWP processor can see (based on the returned content type) what kind of package was returned when an individual resource was requested, unpack P client-side, and return the resource from the locally unpacked P to the reading system.
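The client-side half of this proposal could look roughly like the following. It is a sketch under assumptions: the media types `application/zip` and `application/x-tar` are used to recognize packages, anything else is treated as the resource itself, and the function name is made up.

```python
import io
import tarfile
import zipfile


def extract_resource(content_type, body, resource_path):
    """Return the bytes of one resource, unpacking client-side if needed."""
    if content_type == "application/zip":
        with zipfile.ZipFile(io.BytesIO(body)) as package:
            return package.read(resource_path)
    if content_type == "application/x-tar":
        with tarfile.open(fileobj=io.BytesIO(body)) as package:
            member = package.extractfile(resource_path)
            if member is None:
                raise LookupError(f"{resource_path!r} is not a regular file")
            return member.read()
    # Not a known package type: assume the server unpacked the PWP
    # itself and returned the requested resource directly.
    return body
```

Both `zipfile` and `tarfile` raise `KeyError` when the path is not in the package, so a processor would also need to decide how to report a missing resource.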

Are details of manifest retrieval in scope for this Note?

To some extent, I still believe that the algorithm and priorities eventually depend on the detailed technical solutions, which are out of scope in the note. For instance:

  • we do not (and cannot) describe retrieval and priorities mechanisms in the packaged state
  • If I understand correctly, the eventual spec for PWP can still decide that the canonical locator always returns a single media type (manifest, or HTML document, or packaged state); in other words, it can discard a conneg approach. In that case, several branches of the algorithm can simply be pruned.

I appreciate that the algorithm definitely clarifies what we're talking about, but IMHO giving it too much "definitive" credit is wishful thinking at this stage?

Part of my concerns are handled in the notes after the algorithm, but I feel that the whole section looks too "spec-like". Shouldn't we just make general statements in this note, and leave the exact mechanisms to the final spec?

I'm thinking of statements along the lines of:

  • there can be several manifest resources, to be defined in the final spec.
  • manifests have priorities (to be defined), and must eventually be combined
  • HTML allows embedded and linked manifests, in which case the embedded one should always have higher priority
  • a manifest can be linked via HTTP headers, in which case it has higher priority than the payload's manifest

What to do with a locally downloaded PWP?

Proposal: L must be part of any published or downloaded PWP. Then, the PWP processor can link this local PWP with the online published PWP via L, and thus connect to, e.g., online annotations, or sync annotations made offline to the online account.

Preference of media type (a.k.a. content) of the PWP processor

Whether this preference is based on some local consideration, like network speed, PWP size, etc., or is set by the user, is a different problem that will require separate considerations.

What if the PWP processor has not given a preference on which media type should be returned, or what if the preference of the user is not available on the server? Should we as a task force specify an order, or let the server decide?
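To make the open question concrete, here is one possible fallback rule, not a recommendation: honor the first client preference the server can satisfy, otherwise let the server's own ordering decide. The function name and the media types in the example are assumptions.

```python
def choose_media_type(client_preferences, server_available):
    """Pick the media type to return for a request on the canonical locator.

    client_preferences: media types in the client's order of preference
                        (may be empty if no preference was expressed).
    server_available:   media types the server can return, in the
                        server's own default order (must be non-empty).
    """
    for media_type in client_preferences:
        if media_type in server_available:
            return media_type
    # No preference given, or none can be satisfied: the server decides.
    return server_available[0]


available = ["application/zip", "text/html"]
picked = choose_media_type(["application/json", "text/html"], available)
default = choose_media_type([], available)
```

If the task force specified an order instead, `server_available` would simply be sorted by that order before the lookup.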

Server-side operations

As raised by @lrosenthol (https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0043.html)

potential definitions of the server-side operations are as valid (or potentially more so) than the client and should be present in the document as well

Proposal: change last sentence of the definition of server to "This server will always need some kind of configuration or maybe more complex functionalities. The minimal requirement of a server is that when L is requested, it should return M (which in turn could be combined by using different parts of the HTTP response, see http://w3c.github.io/dpub-pwp-loc/#pwp-processor). Depending on the implemented functionalities of the server, it could take the preferences of the client-side into account (see #8)".

The following example might clarify the issue (and could also be inserted in the document somewhere):

When L is requested, the server returns either the manifest, the main HTML page (what the main HTML page is remains to be defined), or a package, with an optional HTTP Link Header (see http://w3c.github.io/dpub-pwp-loc/#pwp-processor).
The returned result depends on three factors: the preference of the client-side (see #8), the capabilities of the server, and the published states of the PWP.

  • If we assume the PWP is only published zipped, and the preference of the client-side is zipped, then the server does not need any more capabilities than just returning the package.
    • If the preference of the client-side is the manifest, and the server is just a file server, then the server can only return the zip-package, and the client is burdened with unpacking the zip client-side and extracting the manifest.
    • If the server has the functionality, it could unpack the zip server-side and return the manifest directly to the client.
  • Vice versa and other alternative scenarios are also possible, e.g., a client requests the zipped PWP and the server only has the unpacked PWP published. Depending on the capabilities of the server, either
    • the client is burdened with packing the PWP client-side (dumb server), or
    • the server packs the PWP server-side (smart server).
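The scenarios above can be condensed into a small decision function. This is only a sketch of the reasoning: the state labels ("zip", "unpacked", "manifest"), the single `server_can_convert` flag, and the function name are simplifications invented for the example.

```python
def plan_response(requested, published, server_can_convert):
    """Return (state the server sends, who has to convert it).

    requested:          state the client prefers, e.g. "zip" or "manifest".
    published:          states the PWP is published in, in server order.
    server_can_convert: True for a "smart" server that can pack, unpack,
                        or extract the manifest; False for a file server.
    """
    if requested in published:
        return requested, "none"
    if server_can_convert:
        return requested, "server"
    # Dumb server: send what exists and leave the work to the client.
    return published[0], "client"


zip_only = ["zip"]
same_state = plan_response("zip", zip_only, server_can_convert=False)
dumb_server = plan_response("manifest", zip_only, server_can_convert=False)
smart_server = plan_response("manifest", zip_only, server_can_convert=True)
```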

Isn't the real objective of the algorithm to retrieve the manifest?

Currently it reads "Algorithm to find the right values for Lp and Lu". I think the point of the algorithm is to retrieve the whole manifest, so why not say so directly?

There are (or will be) use cases where I need info on the content of the PWP, but don't care about its absolute locator.

Consequently, I'd remove the paragraph starting with:

In fact, the algorithm does more than...

Describe an example of M+L retrieval as an algorithm?

The diagram in Fig. 1 is a good visual aid, but the same retrieval workflow could also be described as an algorithm with pseudo-code. It may actually help clarify the diagram, in addition to making it accessible to visually impaired people.

Obviously, we'd need to figure out how to deal with #6 and #7 first.

What is the relative priority of various manifests?

When retrieving the manifests, a PWP processor may have a number of manifests from different sources that must be combined. If terms overlap, the question of priority comes up: e.g., a manifest coming through a <link> element may have a lower priority than the embedded manifest (or the other way round).

Should a PWP specification specify this, or should it be left to implementations?
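Whichever way the question is answered, combining reduces to a merge in which the higher-priority manifest wins on overlapping terms. The sketch below takes the priority order as an input instead of fixing it; manifests are modeled as flat dictionaries, which is an assumption, as are the term names.

```python
def combine_manifests(manifests_by_priority):
    """Merge manifests; on overlapping terms the earlier manifest wins."""
    combined = {}
    for manifest in manifests_by_priority:
        for term, value in manifest.items():
            # setdefault keeps the value from the higher-priority manifest.
            combined.setdefault(term, value)
    return combined


embedded = {"title": "Embedded title", "author": "A. Author"}
linked = {"title": "Linked title", "language": "en"}

embedded_first = combine_manifests([embedded, linked])
linked_first = combine_manifests([linked, embedded])
```

With `embedded` first the combined title is "Embedded title"; swapping the order flips it, which is the whole difference between the two options in the example above.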
