Intertwingler — An Engine for Dense Hypermedia
Intertwingler is an engine, very much like WordPress is an engine: you
use it to make websites. You can think of Intertwingler, at least this
implementation of it, as a demonstrator for the kind of infrastructure
necessary to make the Web do genuine dense hypermedia.
The way to understand dense hypermedia is to contrast it with what
the Web is off the shelf, which is sparse hypermedia: big clunky
pages with not a lot of links, and what links do exist are
sequestered into regions like navigations and other UI. What we want
instead are smaller, more composable units, the mechanism of
composition being ordinary links (and, in the end, a much greater
density of them). The effect we are after with Intertwingler is to
pulverize Web content, dramatically increasing its addressability.
This affords not only practical benefits like content reuse, but also
new affordances for software tools and creative expression.
Strategy
The main problem Intertwingler has to solve, then, is the fact that
links on the Web are extremely brittle. Links on the Web are brittle
because it's very cheap to change the URL of a Web resource, and very
expensive to change all the places where that URL is referenced.
Intertwingler solves this problem the following way:
- It stores links (i.e., referent-reference pairs✱) as first-class objects,
- It assigns every resource a canonical identifier that doesn't budge,
- It overlays human-friendly address components (slugs) on top,
- It remembers prior values for these address components if you change them,
- It uses a custom resolver to do everything in its power to match a requested URL to exactly one resource,
- It also has a mechanism for the principled handling of parametrized and derived resources, maintaining a registry of parameter names, syntaxes, semantics, and other metadata.
Intertwingler accomplishes all this by bringing your organization's
entire address space (i.e., every Web URL under every domain you control)
under its management.
✱ Actually, Intertwingler stores links as triples where the third element is the kind of link it is. More on this later.
Also packaged with the Intertwingler demonstrator are the means for
creating websites with dense hypermedia characteristics:
- A file system handler, for transition from legacy configurations
- A content-addressable store handler, for bulk storage and caching of opaque resources
- A pluggable markup generation handler, for rendering transparent resources
- Mutation handlers (e.g. PUT and POST) for both opaque and transparent resources
- A set of transforms for manipulating resource representations, specifically HTML/XML markup and images
Architecture
Concepts
This is a brief glossary of terms that are significant to
Intertwingler. It is not exhaustive, as it is assumed that the
reader is familiar with Web development terminology. Note that I will
typically prefer the more generic term "URI" (identifier) rather than
"URL" (locator), and use "HTTP" to refer to both HTTP and HTTPS unless
the situation demands more precision.
(Information) Resource
An information resource is a relation between one or more identifiers (in this case URIs) and one or more representations. A familiar type of information resource is a file, which has exactly one representation and usually one, but possibly more than one identifier (file name/path). Web resources have an additional dimension, which is the request method or verb with which the resource was requested.
Opaque Resource
An opaque resource is named such because the enclosing information system does not need to "see into" it. An opaque resource may have more than one representation, but one representation will always be designated as canonical.
Transparent Resource
A transparent resource is the complement of an opaque resource: the enclosing information system can, and often must, "see into" its structure and semantics. Since the canonical representation of a transparent resource resides only in live working memory, all serializations (that neither discard information nor make it up) are considered equivalent.
Representation
A representation (of an information resource on the Web) is a literal sequence of bytes (octets) that represents the given information resource. Representations can vary by media type, natural language, character set, compression, and potentially many other dimensions.
HTTP Transaction
An HTTP(S) transaction refers to the process of a client issuing a single request to a server, and that server responding in kind. In other words, a single request-response pair.
Handler
An Intertwingler handler is a microservice with certain
characteristics. All handlers are designed to be run as stand-alone
units for bench testing and system segmentation. A handler responds to
at least one request method for at least one URI. Handlers have a
manifest that lists the URIs, request methods, parameters, content
types, etc. under their control. This enables the Intertwingler engine
to perform preemptive input sanitization, and efficiently route
requests to the correct handler.
Engine
The Intertwingler engine is a special-purpose handler that
marshals all other handlers and transforms, resolves URIs, and routes
requests to handlers. This is the part that faces the external network.
Transform
A transform is a special-purpose handler that encapsulates one or
more operations (each identified by URI) over a request body. As such,
transforms only respond to POST requests. Like all handlers,
transforms have lists of content types for each URI in their manifest
that they will both accept and emit. Transforms are configured to run
in a queue, with the output of one being fed into the input of the
next. Through its interaction with an HTTP message, a transform may
also trigger a subsequent transform to be added to its own, or
another queue.
Request Transform
A request transform operates over HTTP requests. It can modify the request's method, URI, headers, body (if present), or any combination thereof.
Response Transform
A response transform operates over HTTP responses. Analogous to request transforms, response transforms can manipulate the response status, headers, body, or any combination thereof. Unlike a request transform, there are multiple queues for response transforms: an early-run queue and a late-run queue, with an addressable queue sandwiched between them.
Handlers
Everything in Intertwingler is a handler, including the engine
itself. At root, a handler is a microservice created in compliance
with the host language's lightweight Web server interface (in our
case with Ruby, that would be Rack).
A handler is intended to be interfaced with only over HTTP (or, again, the Web server interface's approximation of it). That is, a handler instance is a callable object that accepts a request object and returns a response object. A handler is expected to contain at least one URI that will respond to at least one request method.
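As a minimal sketch of what "a callable object that accepts a request and returns a response" means under the Rack convention, the following hypothetical handler answers GET on a single URI. The class name, the route table, and the response text are illustrative assumptions, not Intertwingler's actual classes.

```ruby
# A hypothetical handler: a plain callable following the Rack convention
# of taking an environment hash and returning [status, headers, body].
class HelloHandler
  # Illustrative table of the URIs and request methods this handler answers.
  ROUTES = { '/hello' => %w[GET] }.freeze

  def call(env)
    methods = ROUTES[env['PATH_INFO']]
    return respond(404, 'Not Found') unless methods

    unless methods.include?(env['REQUEST_METHOD'])
      return respond(405, 'Method Not Allowed', { 'allow' => methods.join(', ') })
    end

    respond(200, 'Hello, world!')
  end

  private

  # Build a plain-text Rack response triple.
  def respond(status, text, extra = {})
    headers = {
      'content-type'   => 'text/plain',
      'content-length' => text.bytesize.to_s
    }
    [status, headers.merge(extra), [text]]
  end
end

status, _headers, body = HelloHandler.new.call(
  { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/hello' }
)
puts "#{status} #{body.join}"
```

Because the handler is only a callable, it can be bench-tested by passing it a hash, with no server running.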
Intertwingler Engine
The Intertwingler engine imagines itself one day turned into a
high-performance, stand-alone reverse proxy, with hot-pluggable
handlers (and by extension, transforms) that can be written in any
language, and interface internally over HTTP. That is the lens through
which to view the design. The engine is meant to be put at the edge of
an organization's Web infrastructure and manage the Web address space
for all of the organization's (DNS) domains.
When an HTTP transaction occurs completely within the engine's process space (i.e., it does not try to access handlers running in other processes/engines), the engine has strategies to mitigate the amount of extraneous parsing and serialization that would otherwise occur.
Intertwingler Handler Manifests (In Progress)
Still in progress at the time of this writing is a finalized design for handler manifests, though some details are certain. A manifest is intended to advertise the set of URIs that a given handler will respond to, along with:
- what request methods are recognized,
- what content types are available as a response,
- what URI query parameters are recognized, their data types, cardinality, etc.,
- what content types are accepted in requests (at least the ones that send body content),
- in the case of POSTed HTML forms (application/x-www-form-urlencoded and multipart/form-data types), parameter lists analogous to query parameters,
- etc…
The exact format of the manifest payload is yet to be determined.
What is known is that handler manifests will be retrieved by the
special OPTIONS * request, intended to address the server (in this
case microservice) directly rather than any one particular resource it
manages. Since the HTTP specification does not explicitly define
semantics for any content in response to OPTIONS *, we future-proof by
only sending the manifest if a Prefer: return=representation header is
present in the request, in addition to the ordinary content
negotiation headers, Accept and so on.
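The gating on the Prefer header could be sketched as follows. Since the manifest format is undecided, the JSON payload here is purely a placeholder, and the sketch assumes the Web server interface passes the asterisk request target through as the path. The method name and the manifest contents are hypothetical.

```ruby
require 'json'

# Placeholder manifest; the real payload format is not yet determined.
MANIFEST = {
  '/hello' => { 'methods' => %w[GET], 'produces' => %w[text/plain] }
}.freeze

# Answer `OPTIONS *`, attaching the manifest only when the client asks
# for it with `Prefer: return=representation`.
def options_star(env)
  unless env['REQUEST_METHOD'] == 'OPTIONS' && env['PATH_INFO'] == '*'
    return [404, {}, []]
  end

  prefer = env['HTTP_PREFER'].to_s.split(',').map { |p| p.strip.downcase }
  unless prefer.include?('return=representation')
    # No preference stated: an ordinary, body-less OPTIONS response.
    return [204, { 'allow' => 'GET, OPTIONS' }, []]
  end

  headers = {
    'allow'              => 'GET, OPTIONS',
    'content-type'       => 'application/json',
    'preference-applied' => 'return=representation'
  }
  [200, headers, [JSON.generate(MANIFEST)]]
end

status, _headers, body = options_star(
  { 'REQUEST_METHOD' => 'OPTIONS', 'PATH_INFO' => '*',
    'HTTP_PREFER' => 'return=representation' }
)
puts status, body.join
```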
State
Intertwingler maintains its state (at least the transparent
resources) in an RDF graph database. The current implementation uses
a very simple, locally-attached quad store. Opaque resources, or
rather their literal representations, are held primarily in a
content-addressable store. Intertwingler also includes a file system
handler to help transition from legacy configurations.
Both the graph database and the content-addressable store are candidates for stand-alone systems that could be scaled up and out.
Addressing
Intertwingler maintains URI continuity by ascribing durable
canonical identifiers to every resource, and then overlaying
human-friendly yet potentially perishable identifiers on top. The goal
of the Intertwingler resolver is to eliminate the possibility of a
user receiving a 404 error, at least in practice. (In principle it
will always be possible to request URIs that Intertwingler has never
had under its management.)
While it is possible, for aesthetic reasons, to ascribe an explicit
path as an overlay URI, Intertwingler only needs as much path
information as is necessary to match exactly one canonical
identifier. That is, if the database only contains one resource with a
slug of my-summer-vacation, then the full URI
https://my.website/my-summer-vacation is enough to positively
identify it. (If a longer path was explicitly specified, then
Intertwingler will redirect.) If a second resource shows up in the
graph with the same slug, Intertwingler will return 300 Multiple
Choices with the shortest URIs that will unambiguously identify both
options.
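The matching rule can be sketched with an in-memory table standing in for the graph. Everything here is a simplifying assumption: the table layout, the second and third UUIDs, the 301 for a former slug, and the use of canonical UUID paths in the 300 response (the real resolver offers the shortest unambiguous overlay URIs instead).

```ruby
# Hypothetical stand-in for the graph: canonical UUID => current slug
# plus any slugs the resource used to have.
RESOURCES = {
  'd3e20207-1ab0-4e65-a03c-2580baab25bc' =>
    { slug: 'my-summer-vacation', former: ['summer-trip'] },
  '7f1c2a9e-3b4d-4c5e-8f60-112233445566' => { slug: 'notes', former: [] },
  '0a1b2c3d-4e5f-4a6b-8c7d-8e9f00112233' => { slug: 'notes', former: [] }
}.freeze

# Match a slug to exactly one resource if at all possible.
def resolve(slug, resources = RESOURCES)
  current = resources.select { |_, r| r[:slug] == slug }.keys
  former  = resources.select { |_, r| r[:former].include?(slug) }.keys

  if current.size == 1
    [200, current.first]
  elsif current.empty? && former.size == 1
    # A remembered old slug: redirect to the current address.
    [301, "/#{resources[former.first][:slug]}"]
  elsif (current + former).empty?
    [404, nil]
  else
    # Ambiguous: list every candidate by canonical identifier.
    [300, (current + former).uniq.map { |uuid| "/#{uuid}" }]
  end
end

p resolve('my-summer-vacation')
p resolve('summer-trip')
p resolve('notes')
```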
URI path segments prior to the terminating one correspond to
arbitrary entities in the graph that happen to have been appropriately
tagged. Again, the only purpose they serve is to unambiguously
identify the terminating path segment. For the path /A/B/C/d to
resolve, A has to exist and be connected (again, arbitrarily) to B, B
to C, and C to d. If only part of the path resolves, then that is one
of the few situations in which you will encounter a 404, because the
path is overspecified. That is something you can only do if you enter
the path manually, as Intertwingler will only ever expose (explicit
overrides notwithstanding) the shortest uniquely-identifying overlay
path for any resource. As such, if d can be uniquely identified using
a shorter path, the default behaviour of Intertwingler is to redirect.
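The connectivity check for a path like /A/B/C/d can be sketched as a walk over adjacent segment pairs. The adjacency table below is a hypothetical stand-in for whatever graph relations have been tagged for this purpose; a false result corresponds to the overspecified-path 404 described above.

```ruby
require 'set'

# Hypothetical adjacency: each slug maps to the slugs it is connected to.
EDGES = {
  'A' => Set['B'],
  'B' => Set['C'],
  'C' => Set['d']
}.freeze

# True when every segment exists and each one is connected to the next.
def path_resolves?(path, edges = EDGES)
  segments = path.split('/').reject(&:empty?)
  return false if segments.empty?

  known = edges.keys.to_set | edges.values.reduce(Set.new, :|)
  return false unless segments.all? { |s| known.include?(s) }

  segments.each_cons(2).all? do |from, to|
    edges.fetch(from, Set.new).include?(to)
  end
end

puts path_resolves?('/A/B/C/d')
puts path_resolves?('/A/C/d')
```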
In practice, this behaviour subsumes what we ordinarily think of as "folders" or "containers", and it will be possible to configure which resource and relation types get considered for "container-ness". In general, though, Intertwingler does not recognize the concept of a container as a category of entity that is meaningfully distinct from a non-container.
Canonical Identifiers
Intertwingler uses UUIDs for the bulk of its canonical identifiers,
with the exception of those that correspond 1:1 to byte segments (that
is to say, the opaquest of the opaque), which use URIs derived from
cryptographic digests. The former can always be reached by accessing,
e.g.:
https://my.website/d3e20207-1ab0-4e65-a03c-2580baab25bc
and the latter, e.g.:
https://my.website/.well-known/ni/sha-256/jq-0Y8RhxWwGp_G_jZqQ0NE5Zlz6MxK3Qcx02fTSfgk
…per RFC6920. If the resolver finds a suitable overlay address for the UUID form, it will redirect, but the hash URI form remains as-is. Direct requests to these hash URIs (at least from the outside) will also bypass any response transforms, in order to preserve the cryptographic relationship between the URI and the representation.
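For illustration, the hash URI form can be derived from a sequence of bytes as below: a SHA-256 digest, base64url-encoded without padding, under the RFC 6920 well-known prefix. The method name ni_path is a hypothetical helper, not part of Intertwingler or Store::Digest.

```ruby
require 'digest'

# Build the RFC 6920 ".well-known" path for a sequence of bytes.
def ni_path(bytes)
  digest = Digest::SHA256.digest(bytes)
  # Base64 ('m0' = no line breaks), made URL-safe, padding removed.
  encoded = [digest].pack('m0').tr('+/', '-_').delete('=')
  "/.well-known/ni/sha-256/#{encoded}"
end

puts ni_path('Hello, world!')
```

Because the address is a pure function of the bytes, any change to the representation (such as a response transform would make) yields a different URI, which is why direct requests to these URIs bypass transforms.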
Intertwingler Transform Protocol
The transform protocol is inspired by the FastCGI specification, and
its use in server modules like Apache's mod_authnz_fcgi. In this
configuration, the main server issues a subrequest to a FastCGI
daemon, and then uses the response, in this case, to determine if the
outermost request is authorized. The reasoning goes that this
behaviour can be generalized to ordinary HTTP (in our era of reverse
proxies, FastCGI is an extra step), as well as handle other concerns
in addition to authorization. (Indeed, FastCGI itself also specifies a
filter role, but I have not seen a server module that can take
advantage of it.)
A direct request to a transform looks like a POST to the transform's
URI where the request body is the object to be transformed. Additional
parameters can be fed into the transform using the URI's query
component, it being on a separate band from the request body. POSTs to
transforms must include a Content-Type header and should include an
Accept header to tell the transform what the caller prefers as a
response. The Content-Length, Content-Type, Content-Language, and
Content-Encoding headers of the transform's response will be
automatically merged into the original HTTP message.
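To make the shape of such a request concrete, here is a minimal sketch of a transform written as a Rack-style callable. The class is hypothetical and the regex-based comment removal is deliberately naive; the 405 and 415 responses are ordinary HTTP conventions rather than anything the protocol description mandates, and the 304 follows the no-op convention the transform protocol uses.

```ruby
require 'stringio'

# Hypothetical comment-stripping transform as a Rack-style callable.
class StripComments
  ACCEPTS = %w[text/html application/xhtml+xml application/xml].freeze

  def call(env)
    # Transforms only respond to POST.
    return [405, { 'allow' => 'POST' }, []] unless env['REQUEST_METHOD'] == 'POST'

    # Content-Type is mandatory; ignore parameters such as charset.
    type = env['CONTENT_TYPE'].to_s.split(';').first.to_s.strip.downcase
    return [415, {}, []] unless ACCEPTS.include?(type)

    input  = env['rack.input'].read
    output = input.gsub(/<!--.*?-->/m, '')
    # Nothing changed: signal a no-op instead of echoing the body back.
    return [304, {}, []] if output == input

    headers = { 'content-type' => type, 'content-length' => output.bytesize.to_s }
    [200, headers, [output]]
  end
end

env = {
  'REQUEST_METHOD' => 'POST',
  'CONTENT_TYPE'   => 'text/html; charset=utf-8',
  'rack.input'     => StringIO.new('<p>hi<!-- secret --></p>')
}
status, _headers, body = StripComments.new.call(env)
puts "#{status} #{body.join}"
```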
Entire-Message Transforms
Transforms can modify the entire HTTP message by serializing the
message (or the part desired to be modified) into the body of the
request to the transform, and using the content type message/http.
Transforms that accept serialized HTTP messages as request bodies
should respond in kind.
That is, if you were writing an entire-request-manipulating request transform, it would expect the POSTed content to be a serialized request, and would likewise return a serialized request. An analogous response transform would expect a serialized response in the request body, and likewise respond with a serialized response, all message/http.
For entire-message-manipulating transforms, it is only necessary to pass in the part of the HTTP message that one wishes to have transformed, plus any additional information needed for the transformation to be successful. (It is, however, necessary to include the request line or status line, for request transforms and response transforms, respectively.) Results will be merged into the original HTTP message. Responding with an identical value as the request (request line, status line, or header) will leave it unchanged, or in the case of headers, it is safe to omit them. To signal that a header ought to be deleted, include it in the outgoing header set with the empty string for a value.
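The header half of this merge rule is small enough to sketch directly. The function below is a hypothetical illustration, assuming headers are held in a hash with case-insensitive names: omitted headers are left alone, supplied ones overwrite, and an empty string deletes.

```ruby
# Merge a transform's partial header set into the original message.
def merge_headers(original, changes)
  merged = original.transform_keys(&:downcase)
  changes.each do |name, value|
    key = name.downcase
    if value.to_s.empty?
      merged.delete(key)   # empty string: delete the header
    else
      merged[key] = value  # otherwise: add or overwrite
    end
  end
  merged
end

original = { 'Content-Type' => 'text/html', 'X-Debug' => '1' }
changes  = { 'x-debug' => '', 'content-language' => 'en' }
p merge_headers(original, changes)
```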
URI Rewriting and No-Ops
The response codes 303 See Other and 304 Not Modified have special
meaning with respect to transforms. If a request transform returns a
303, its Location header should be interpreted as a simple internal
rewrite of the request-URI. A 304 indicates that the transform
(request or response) has made no changes at all. All other 3XX
responses are forwarded to the client.
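A sketch of how an engine might dispatch on a request transform's status code follows. The function name and the action symbols are hypothetical, and the treatment of 2XX as "apply the result" and of everything else as an error is an assumption; only the 303, 304, and other-3XX branches come from the description above.

```ruby
# Decide what to do with a request transform's response.
def interpret(status, headers, request_uri)
  case status
  when 303      then [:rewrite, headers.fetch('location')] # internal rewrite
  when 304      then [:unchanged, request_uri]             # no-op
  when 300..399 then [:forward_to_client, status]          # real redirect
  when 200..299 then [:apply, request_uri]                 # assumed: merge result
  else               [:error, status]                      # assumed
  end
end

p interpret(303, { 'location' => '/new/path' }, '/old/path')
p interpret(304, {}, '/old/path')
p interpret(307, { 'location' => 'https://elsewhere.example/' }, '/old/path')
```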
Redirect responses from addressable response transforms that return their own URI path with different query parameter values are translated backwards into the outermost request with different path parameter values.
Addressable Transforms
Most transforms are configured statically, but some response transforms are addressable through the use of path parameters, a lesser-known feature of URIs. The advantage of using path parameters to identify response transforms is that they stack lexically, so the example:
https://my.website/some/image;crop=200,100,1200,900;scale=640,480
…would fetch /some/image from a content handler, and then in a
subrequest, POST the resulting response body to, say,
/transform/crop?x=200&y=100&width=1200&height=900, receive that
response body, and then POST it to
/transform/scale?width=640&height=480, the response to which would
be reattached to the outgoing response to the client. The mapping that
relates the comma-separated positional arguments in the path
parameters to key-value query parameters is expressed using the
Transformation Functions Ontology.
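The translation from stacked path parameters to a chain of subrequest URIs can be sketched as follows. The PARAMS table is a hard-coded, hypothetical stand-in for the mapping that the Transformation Functions Ontology would supply, and the sketch assumes path parameters appear only on the final path segment.

```ruby
require 'uri'

# Hypothetical mapping from positional path-parameter values to named
# query parameters (in practice this would come from the graph).
PARAMS = {
  'crop'  => %w[x y width height],
  'scale' => %w[width height]
}.freeze

# Split a request path into the underlying resource and the ordered
# list of transform URIs to POST the response body through.
def transform_chain(path, params = PARAMS)
  resource, *steps = path.split(';')
  chain = steps.map do |step|
    name, values = step.split('=', 2)
    keys  = params.fetch(name)
    query = URI.encode_www_form(keys.zip(values.to_s.split(',')))
    "/transform/#{name}?#{query}"
  end
  [resource, chain]
end

resource, chain = transform_chain('/some/image;crop=200,100,1200,900;scale=640,480')
puts resource
puts chain
```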
Handler Inventory
Everything in Intertwingler is a handler, but the undecorated term
"handler" refers to content handlers. These are the stripped-down
microservices that actually respond to outside requests.
File System Handler
This is a rudimentary handler that provides content-negotiated GET
support to one or more document roots on the local file system.
Markup Generation Handler
This handler generates (X)HTML+RDFa (or other markup) documents from subjects in the graph. Pluggable sub-handlers can be attached to different URIs or RDF classes.
Generic Markup Generation Sub-Handler
This creates a simple (X)HTML document with embedded RDFa intended for subsequent manipulation downstream. This sub-handler and others will eventually be supplanted by a hot-configurable Loupe handler.
Atom Feed Sub-Handler
This will map resources typed with certain RDF classes to Atom feeds
when the request's content preference is for application/atom+xml.
Google Site Map Sub-Handler
This will generate a Google site map at the designated address.
skos:ConceptScheme Sub-Handler
This is a special alphabetized list handler for SKOS concept schemes.
sioct:ReadingList Sub-Handler
This is a special alphabetized list handler for bibliographies.
Person/Organization List Sub-Handler
This is a special alphabetized list handler for people, groups, and organizations.
All Classes Sub-Handler
This handler will generate a list of all RDF/OWL classes known to
Intertwingler. Useful for composing into interactive interfaces.
Adjacent Property Sub-Handler
This handler will generate a resource containing a list of RDF properties that are in the domain of the subject's RDF type(s). Useful for composing into interactive interfaces.
Adjacent Class Sub-Handler
This handler will generate a resource containing a list of subjects
with ?s rdf:type ?Class . statements where ?Class is in the range
of a given property. Useful for composing into interactive interfaces.
Content-Addressable Store Handler
The content-addressable store handler wraps Store::Digest::HTTP
(which itself wraps Store::Digest). This handler maps everything
under /.well-known/ni/. You can add a new object to the store by
POSTing it to that address. Store::Digest::HTTP also generates
rudimentary index pages.
Reverse Proxy Handler (TODO)
While the plan is to include a reverse proxy handler, and while they are relatively easy to write, I am leaving it out until I can determine a sensible policy for not letting the entire internet access the entire rest of the internet through the reverse proxy.
Linked Data Patch Handler
The LD-Patch handler processes PATCH requests with text/ldpatch
content and applies them to the graph. This can be used in
conjunction with the RDF-KV Transform.
Transform Inventory
Much of the labour of Web development is considerably simplified if you realize that many of the operations that bulk up Web applications can be construed as transformations over HTTP message bodies. Most transforms don't need much, if any, information outside of the segment of bytes they get as input. Most transforms, moreover, are tiny pieces of code.
Request Transforms
Again, request transforms are run in advance of the content handlers, generally making small adjustments to headers and sometimes manipulating request bodies.
Markdown Hook Transform
This simple transform adds text/markdown to the request's Accept
header, so downstream content negotiation selects Markdown variants
when it wouldn't otherwise. It also hooks the Markdown to HTML
response transform.
Note that if the Accept header already contains text/markdown with a higher score than text/html or application/xhtml+xml, the markdown passes through to the client untouched.
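The header adjustment itself might look like the sketch below. The function name and the q-value of 0.5 are assumptions made for illustration; the description above does not say what weight the hook assigns.

```ruby
# Add text/markdown to an Accept header unless it is already listed.
def add_markdown(accept)
  ranges = accept.to_s.split(',').map(&:strip).reject(&:empty?)
  listed = ranges.any? do |range|
    range.split(';').first.strip.downcase == 'text/markdown'
  end
  return accept if listed

  # Assumed weight; low enough not to outrank explicit preferences.
  (ranges + ['text/markdown;q=0.5']).join(', ')
end

puts add_markdown('text/html, application/xhtml+xml;q=0.9')
puts add_markdown('text/markdown, text/html;q=0.5')
```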
Sass Hook Transform
In a virtually identical move, this transform adds the
text/x-vnd.sass and text/x-vnd.sass.scss content types to the
request's Accept header, and hooks the Sass Transform.
Pseudo-File PUT Transform
This transform will take a PUT request to a particular URI and
generate the graph statements needed to fake up a file path, while
transforming the request into a POST to /.well-known/ni/ to store the
content.
This is a basic mechanism for getting content onto the site in lieu of a fully-fledged WebDAV infrastructure, which will come later. I have implemented a WebDAV server before and it was an entire project unto itself.
RDF-KV Transform
This transform will take POST requests with
application/x-www-form-urlencoded or multipart/form-data bodies
that conform to the RDF-KV protocol and transform the request into a
PATCH with a text/ldpatch body, suitable for the LD-Patch handler.
Response Transforms
Response transforms are run after the selected content handler, in three phases: early-run, addressable, and late-run. Theoretically any response transform can be run in any phase, but some transforms will only make sense to run in certain phases, and/or before or after other transforms in the same phase.
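The three-phase ordering can be sketched as a nested fold, with plain lambdas standing in for the POST subrequests a real transform would receive. The queue contents and the function name are hypothetical.

```ruby
PHASES = %i[early addressable late].freeze

# Feed a response body through each phase in order, and through each
# transform within a phase in order.
def run_response_transforms(body, queues)
  PHASES.reduce(body) do |current, phase|
    queues.fetch(phase, []).reduce(current) do |text, transform|
      transform.call(text)
    end
  end
end

# Stand-in transforms: trim whitespace early, append a newline late.
queues = {
  early:       [->(text) { text.strip }],
  addressable: [],
  late:        [->(text) { "#{text}\n" }]
}

print run_response_transforms('  <p>hi</p>  ', queues)
```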
Markdown to HTML Transform
This transform will take Markdown and turn it into (X)HTML.
Sass Transform
This transform will take Sass content and turn it into CSS.
Tidy Transform
This transform will run HTML Tidy over (X)HTML content.
RDF Transform
Turn an RDFa document into Turtle, N-Triples, RDF/XML, or JSON-LD.
Also turn any of those types into each other.
Strip Comments Transform
Removes the comments from HTML/XML markup.
(X)HTML Conversion Transform
Transforms HTML to XHTML and vice versa.
Rewrite <head> Transform
Ensures the correct <title> and <base href="…"> elements are present
in an (X)HTML document, as well as <link>, <meta>, <script> and
<style>.
Rehydrate Transform
Transforms certain inline elements in an (X)HTML document (<dfn>,
<abbr>…) into links to term definitions, people, places, companies…
Add Social Media Metadata Transform
Adds Google, Facebook, Twitter, etc. metadata to the <head> of an
(X)HTML document.
Add Backlinks Transform
Adds a chunk of markup containing backlinks to every block or section element in an (X)HTML document that is an identifiable RDF subject.
Rewrite Links Transform
Rewrites all links embedded in a markup document to the most up-to-date URIs.
Mangle mailto: Transform
Obfuscates e-mail addresses/links in a manner serviceable to recovery by client-side scripting.
Add Amazon Tag Transform
Adds an affiliate tag to Amazon links.
Normalize RDFa Prefixes Transform
Moves RDFa prefix declarations to the root (or otherwise outermost available) node where possible; overwrites alternate prefix identifiers with those configured in the resolver; prunes out unused prefix declarations.
Add xml-stylesheet PI Transform
This transform will add an <?xml-stylesheet …?> processing
instruction to the top of an XML document, for use with XSLT or CSS.
Apply XSLT Transform
Applies an XSLT stylesheet to an XML document.
Reindent Transform
Normalizes the indentation of an HTML/XML document.
Image Conversion Transform
Converts a raster image from one format to another.
Crop Transform
Crops an image.
Scale Transform
Scales an image down.
Desaturate Transform
Makes an image black and white.
Posterize Transform
Posterizes an image.
Knockout Transform
Generates a transparency mask based on a colour value.
Brightness Transform
Manipulates an image's brightness.
Contrast Transform
Manipulates an image's contrast.
Gamma Transform
Manipulates an image's gamma value.
Implementation Note
Parts of Intertwingler, notably the URI resolver and markup generation
handlers, depend on a reasoner to make inferences about assertions in
the database. In 2018, when I began working on Intertwingler's
predecessor, RDF::SAK, the only workable implementations of reasoners
were in Java and Ruby (which still appears to more or less be the
case). I chose Ruby because it was easier for prototyping. My vision
for Intertwingler, though, is that it eventually has implementations
in as many languages as it can.
Installation
For now I recommend just running the library out of its source tree:
~$ git clone git@github.com:doriantaylor/rb-intertwingler.git intertwingler
~$ cd intertwingler
~/intertwingler$ bundle install
Configuration
Intertwingler is a form of middleware, meaning it's effectively useless without mounds of content. Until further notice, my recommendation is to monitor the Getting Started guide.
Sponsorship
The bulk of the overhaul that transformed RDF::SAK into Intertwingler
was funded through a generous research fellowship by the Ethereum
Foundation, through their inaugural Summer of Protocols program. This
was a unique opportunity for which I am sincerely grateful. I would
also like to thank Polyneme LLC for their financial support and
ongoing interest in the project.
Contributing
Bug reports and pull requests are welcome at the GitHub repository.
Copyright & License
©2018-2023 Dorian Taylor
This software is provided under the Apache License, 2.0.