
Comments (42)

surma avatar surma commented on May 27, 2024 4

What are the advantages of this proposal, vs being able to create an iframe that runs in a different thread?

FWIW, I think being able to create document fragments in a Worker that can be manipulated without paying the cost of layout or rendering, but that can be sent to a renderer thread, seems valuable to me and sufficiently different from an iframe.

(I suppose a case could be made to introduce something like <iframe no-render> that can skip layout and rendering and effectively becomes a Worker with a DOM ™️ . Not sure if that has second-order implications tho).

from dom.

jakearchibald avatar jakearchibald commented on May 27, 2024 1

It feels like folks think there's a single line in browsers like:

if (isWorkerEnvironment) return;
exposeDOMAPIs();

But that isn't the case. It isn't that DOM APIs are simply not exposed in workers, it's that DOM APIs are not designed to work in non-document environments. Allowing DOM APIs to exist in workers will be a massive undertaking in terms of spec and implementation.

I'm not saying it's impossible, but it's not just flipping a flag.

DOM APIs are massively interlinked with style and rendering. It might be easier to create a new set of interfaces that don't have that issue, and can be cloned/transferred, and upgraded to HTMLElement & co within a document context.


WebReflection avatar WebReflection commented on May 27, 2024 1

... so they're not an option!

and imho they shouldn't be in general, now that I think about it, because an iframe with a guaranteed thread (like a worker) would compete with workers at that point, making workers kinda redundant and inferior to iframes (no DOM parsing ability), besides the security concerns when foreign scripts might try to access their content.


BenjaminAster avatar BenjaminAster commented on May 27, 2024 1

I think there are some fringe cases for DOM-in-Worker that make this particularly tricky.

All of the problems you mentioned have already been solved when browsers implemented DOMParser and DOMImplementation::createHTMLDocument(). If DOM APIs in workers were to be spec'd, we could simply reuse the behavior that currently exists with them, only now in workers.

what happens to inline scripts when parsing?

Nothing. JS doesn't get executed, as is currently the case on the main thread with DOMParser and createHTMLDocument()

what happens to iframe or other nested documents (<svg foreignObject>, <embed> et al) in parsed documents within the Worker (do they get forced into the same thread?)

External content (iframe, embed, img, ...) doesn't get loaded at all. <foreignObject> gets parsed normally & on the same thread.

how would things like the media attribute work given that a document and its nodes have no direct relationship with display?

It doesn't. The media attribute in e.g.

<link rel="stylesheet" href="dark.css" media="(prefers-color-scheme: dark)" />

would do absolutely nothing, and it doesn't matter since the CSS file doesn't get loaded anyway. Again, all of this is already the case today with "fake" documents created on the main thread via DOMParser or DOMImplementation::createHTMLDocument().


rniwa avatar rniwa commented on May 27, 2024 1

The way browser engines such as Blink, Gecko, & WebKit are written right now, the vast majority of DOM code assumes that it's running in the main thread. Making it possible to run that code in a worker is a massive undertaking. Is it theoretically possible? Yes, but it's by no means simple or easy. It could easily be a multi-year/multi-engineer effort.


jakearchibald avatar jakearchibald commented on May 27, 2024 1

Fwiw, in my linear() app, I wanted to be able to analyse SVG paths off the main thread. To do this, I needed to bring another implementation of SVG paths into a worker. I couldn't use the built-in APIs because there's no easy standard way to run them in a different thread. A different-thread iframe (rendered or not) would have solved this.

That might be a different use-case though.

In terms of DOM-in-workers, any thoughts on mine and @developit's suggestion to have a different, minimal API for this? As in, it doesn't create HTMLImageElements, where you have things like naturalWidth and decode(), but a simpler tree model that can be later upgraded to real elements, and that upgrade can only happen in a document.
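
A rough sketch of what "a simpler tree that is upgraded later" could look like, for illustration only. The plain-object shape (`tag`, `attrs`, `children`) and the `upgrade` helper are hypothetical and not part of any proposal in this thread; the only real DOM calls used are `createElement`, `setAttribute`, `createTextNode` and `append`.

```javascript
// Hypothetical lightweight tree: plain objects that survive postMessage.
// It could be built in a worker and upgraded later inside a document.
const liteTree = {
  tag: 'figure',
  attrs: { class: 'chart' },
  children: [
    { tag: 'img', attrs: { src: 'chart.png', alt: 'Chart' }, children: [] },
    'A caption',
  ],
};

// The upgrade step needs a real Document, so it can only run where one
// exists. Side effects such as image loading would start here, not in
// the worker that produced the tree.
function upgrade(node, doc) {
  if (typeof node === 'string') return doc.createTextNode(node);
  const el = doc.createElement(node.tag);
  for (const [name, value] of Object.entries(node.attrs)) {
    el.setAttribute(name, value);
  }
  el.append(...node.children.map((child) => upgrade(child, doc)));
  return el;
}
```

On the main thread this might be wired up as `worker.onmessage = (event) => document.body.append(upgrade(event.data, document));`.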


WebReflection avatar WebReflection commented on May 27, 2024 1

The rest of the thread talks of UI.

I never mentioned UI as a desired feature, and others mentioned no-render too, as UI is not interesting or requested (and arguably nonsense from a Worker?) ... the OP, which I agree with, is about having the parser exposed ... it's true that this requires a broader discussion around what we want to happen to the resulting document when listeners are added, or other special things (see Jake's mention of naturalWidth), but it looks like we all agree (Surma's desire to post fragments aside) that a parser that produces a lightweight tree but still validates input would already be a huge step forward for this feature request.


developit avatar developit commented on May 27, 2024 1

@WebReflection it seems like you're more arguing for DOMParser as a pure standalone implementation using the DOM's structure with the parsing and attribute semantics of HTML, but not including any of the base element prototypes. That seems a lot more feasible, and also seems reasonably in line with where folks have found value in things like LinkeDOM/WorkerDOM/etc.

Devs would be able to build sync mechanisms atop this just as they can with userland DOM implementations, they just wouldn't have to implement the DOM tree, parsing and events from scratch. I do think it might be the case that many of the most compelling use-cases for a dynamic DOM in Worker require property-level MutationObserver (I know my current project does).
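
As a loose illustration of such a sync mechanism (an assumption about how one might be built, not how LinkeDOM or WorkerDOM actually work): the worker records mutations as plain, cloneable records, and the main thread replays them. `createRecorder`, `applyRecords` and the record shape are all hypothetical names.

```javascript
// Worker side (hypothetical): log mutations as plain objects instead of
// touching a real document. The records array can be sent via postMessage.
function createRecorder() {
  const records = [];
  return {
    records,
    setAttribute(id, name, value) {
      records.push({ type: 'attr', id, name, value });
    },
    setText(id, text) {
      records.push({ type: 'text', id, text });
    },
  };
}

// Main-thread side: replay the records against real elements.
// `lookup` maps an id to an element, e.g. (id) => document.getElementById(id).
function applyRecords(records, lookup) {
  for (const record of records) {
    const el = lookup(record.id);
    if (!el) continue; // the element may have been removed in the meantime
    if (record.type === 'attr') el.setAttribute(record.name, record.value);
    else if (record.type === 'text') el.textContent = record.text;
  }
}
```

A property-level observer, as mentioned above, would be the piece that produces such records automatically instead of through explicit calls.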


annevk avatar annevk commented on May 27, 2024 1

A lot more is needed with regard to steps 1-7 of https://whatwg.org/faq#adding-new-features. If you think an issue helps with that I won't oppose it, but it strikes me as a rather specific suggestion which seems too early in the general conversation of "what problem are we trying to solve?".


WebReflection avatar WebReflection commented on May 27, 2024

While I'd be +1 on this, this part is misleading:

I don't mean having direct access to the current document (that wouldn't make sense, of course)

that's already possible with coincident/window and it does make sense ... we use that to drive WASM-targeting programming languages from a worker, without ever blocking via Atomics, giving them the ability to interact 1:1 with the DOM API (or anything else only available on main), so it's a solved problem for us. Surely having it native would be awesome, yet we're good, and we have in-demand, working, and usable use cases; even my own DOM libraries work in there out of the box, so please let's not spread FUD around what's desirable or possible, as that's not necessary, thanks.

edit P.S. you'd probably be good with that module too, just use those API as they are from a worker and give it a shot, you might be surprised by everything just working out of the box. If not, please file an issue to the project, thanks again.


jakearchibald avatar jakearchibald commented on May 27, 2024

What are the advantages of this proposal, vs being able to create an iframe that runs in a different thread?


BenjaminAster avatar BenjaminAster commented on May 27, 2024

@jakearchibald That seems a bit... clunky? Coming from a worker, you'd have to pass a message to the main thread, which sends it to the sandboxed iframe, which sends the result back to the main thread, which sends it back to the worker. Am I missing something here? At the end of the day, using an iframe for that is a hack, and not what iframes were designed to do. DOMParser & friends are not something that are architecturally coupled to the main thread, so they simply should just be available in workers as well.


@WebReflection Hmmm... A thing that makes web workers so awesome is that they are completely isolated from the main thread – on modern systems, they even run in separate CPU cores – and therefore are not constrained by having to finish any synchronous work before the browser renders the next frame. Stuff like DOM operations with the current document are fundamentally synchronous operations and have to operate on the main thread which manages it. Of course, you could give workers access to the current document, but the way this would work internally in the browser is that the worker would somehow notify the main thread to make a DOM operation, the main thread then does this synchronously, and sends a "done" message back to the worker. And this is exactly what libraries like your coincident, via.js or comlink are already doing, just by implementing it themselves with Proxies, Atomics, postMessage, etc. And don't get me wrong: I think it absolutely is an awesome developer experience to be able to modify the current DOM directly from a worker, but building this natively into web browsers simply improves DX because you don't have to use a library for that anymore (or implement all the Proxy/Atomics/postMessage horror yourself), but I don't think you would get any performance benefits from it, as the DOM operations would still have to be executed on the main thread at the end, just that the browser would do it for you and you (or your library) don't have to worry about it anymore.

The proposal I'm talking about wouldn't involve the current document – and therefore the main thread – at all, and would work truly independently of anything outside the worker itself, which is not at all possible today (except if you use an iframe as Jake mentioned, or if you build your own HTML/XML parser, custom "virtual" DOM implementation, and HTML/XML serializer – which will never be as performant as the browser's native methods). This would give actual performance benefits, as you have your own separate thread and can do a long, synchronous operation, like parsing a giant HTML/XML file that may take dozens of milliseconds, while the document and the main thread simultaneously do their own independent thing.

you'd probably be good with that module too, just use those API as they are from a worker and give it a shot, you might be surprised by everything just working out of the box.

Going back to my use case of parsing a large number of XML files extracted from a zipped DOCX file, existing libraries like your coincident or the other ones mentioned above do provide awesome developer experiences, but they do not solve my use case: even though you can then create a DOMParser in a worker, everything is still just a proxy to the main thread (correct me if I'm wrong here) and the actual XML parsing would be executed on the main thread – which is exactly what I'm trying to avoid.
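
For illustration, worker code that wants to stay off the main thread could feature-detect a native parser and otherwise fall back to a userland one. This is only a sketch: `pickParser` and `fallbackParse` are hypothetical names, and the sketch relies on the fact that DOMParser is currently not exposed in worker scopes, so the fallback branch is the one taken there.

```javascript
// Choose a parse function for the given global scope.
// `scope` is normally `globalThis`; `fallbackParse` is a hypothetical
// userland parser with the signature (source, mimeType) => tree.
function pickParser(scope, fallbackParse) {
  if (typeof scope.DOMParser === 'function') {
    const parser = new scope.DOMParser();
    return (source, mimeType) => parser.parseFromString(source, mimeType);
  }
  // No native parser in this scope: parsing stays in this thread,
  // but uses the slower userland implementation.
  return fallbackParse;
}
```

Inside a worker this would be called as `pickParser(globalThis, myUserlandParse)`, so the parsing work never hops to the main thread either way.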


WebReflection avatar WebReflection commented on May 27, 2024

@BenjaminAster you are right, proxied stuff will operate from the main when it comes to main-only utilities, but if an iframe already uses a separate thread (or ... does it?) you can use coincident or other projects from that iframe and delegate to the iframe to eventually communicate stuff to its parent? if the iframe doesn't create its own thread though, I agree having DOMParser in workers is desirable and surely less hacky.


jakearchibald avatar jakearchibald commented on May 27, 2024

@BenjaminAster

That seems a bit... clunky? Coming from a worker…

Yeah, that's fair. If your starting point is a worker, the iframe solution isn't great. But, maybe being able to create one of these iframes from a worker is a solution.

At the end of the day, using an iframe for that is a hack, and not what iframes were designed to do.

I don't find this very compelling. You could equally, and truthfully, say that DOM APIs weren't designed to be in workers. Whatever solution is employed here will involve changing the intentional design of something.

DOMParser & friends are not something that are architecturally coupled to the main thread

Yes they are. They're absolutely coupled to documents. That's why they aren't available in workers.

Maybe their design could be changed so they don't need to be coupled to documents, but that isn't where we're at right now.


WebReflection avatar WebReflection commented on May 27, 2024

DOM APIs are massively interlinked with style and rendering

but (new DOMParser).parseFromString(...) works already, right? I am not sure, if stuff is never live, how this API could be problematic once exposed via Worker 🤔


I went ahead and did a test ... the iframe hack is awkward (it needs a sandbox that apparently allows a different thread and at the same time is discouraged and it warns but it's needed for worker to execute).

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <script src="../../mini-coi.js"></script>
    <script>
      addEventListener('message', ({data}) => {
        document.body.append(data);
      });
    </script>
  </head>
  <body>
    <iframe src="iframe.html"
      sandbox="allow-scripts allow-same-origin"
      frameborder="0" width="0" height="0"
      style="position:absolute;top:-1px;left:-1px"
    ></iframe>
  </body>
</html>

iframe.html

<!DOCTYPE html>
<script type="module">
import coincident from '../../window.js';
coincident(new Worker('./worker.js', {type: 'module'}));
</script>

worker.js

import coincident from '../../window.js';

const {window} = coincident(self);
const parser = new window.DOMParser;

const document = parser.parseFromString(
  '<!doctype html>',
  'text/html'
);

document.body.textContent = 'Hello World';

// send a message to the parent
window.parent.postMessage(document.documentElement.outerHTML);
// <html><head></head><body>Hello World</body></html>

Live test here

I believe this would cover @BenjaminAster's non-blocking use case via a whole DOM API that should not execute on the main thread, but I couldn't find any encouraging discussion around this assumption, yet it seems to be a de facto standard.


WebReflection avatar WebReflection commented on May 27, 2024

btw ... I've just realized that if the iframe is already on a different thread, coincident is kinda useless ... I just used it to be sure I could at least have it running from an iframe but if it uses the iframe thread and that's sync, there's no advantage in doing that at all ... so iframe doesn't look like an answer if we can't guarantee it runs on a separate, non-blocking, thread.


jakearchibald avatar jakearchibald commented on May 27, 2024

@WebReflection

if stuff is never live, how this API could be problematic once exposed via Worker 🤔

What do you mean by 'live'? Remember that some elements have actions when they're constructed, not just when they're connected. Eg creating an image.

it needs a sandbox that apparently allows a different thread

I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.

iframe doesn't look like an answer if we can't guarantee it runs on a separate, non-blocking, thread

Right, that's why I was proposing a feature that did that.


WebReflection avatar WebReflection commented on May 27, 2024

Remember that some elements have actions when they're constructed, not just when they're connected. Eg creating an image.

of course I did not think about that, fair enough then.

I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.

from live tests reported on SO, iframes run in a different thread if:

  • the src points to a different domain
  • the sandbox attribute is used ... at least that's what devs observed and tested live

Right, that's why I was proposing a feature that did that.

it'd be awesome, and if it's not too problematic and it can speed things up more, @surma's hint around no-render would be a strawberry on the cake.


WebReflection avatar WebReflection commented on May 27, 2024

Remember that some elements have actions when they're constructed ... Eg creating an image.

wait a minute though ... I don't see any network activity in here ... that's what I meant by live ... if we parse to retrieve a document I don't think the parser constructs out of the box those elements until these are live/adopted ... what am I missing?

(new DOMParser).parseFromString('<img src="shenanigans.png">', 'text/html')


jakearchibald avatar jakearchibald commented on May 27, 2024

Maybe images were a bad example then - my point is that someone is going to have to go through all the elements and check that their constructor behaviours are worker compatible.


WebReflection avatar WebReflection commented on May 27, 2024

I'd be curious to know which element might have issues though, as I think most of them need to be adopted and pass through the adopt algorithm before having any meaning for the current environment ... I've tested <base>, custom elements, others, I can't find anything working at all unless adopted by the "live document". MDN also doesn't specify anything around this behavior and standards mention that scripts will be flagged as not-executable https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-parsefromstring-dev

that's still something to consider while adopting those nodes ... moreover:

The document's encoding will be left as its default, of UTF-8. In particular, any XML declarations or meta elements found while parsing string will have no effect.

In the parsing model it's also not clear why this would be unsafe if the document is created via the API ... looking forward to some enlightenment around this.


BenjaminAster avatar BenjaminAster commented on May 27, 2024

Yes they are. They're absolutely coupled to documents. That's why they aren't available in workers.

Of course it will be some work to implement, but all of the things like computed CSS styles, layout, scripts, resource loading, ... aren't a thing in documents created by DOMParser or DOMImplementation::createHTMLDocument. That's what I meant by "not architecturally coupled to the main thread". I remember that when implementing OffscreenCanvas, that was a lot of work because suddenly stuff like font rendering and CSS parsing (via context2d.{font, fillStyle, etc.}) needed to work in workers. The only thing related to this that comes to my mind now is Document::styleSheets, which gives access to parsed CSS stylesheets and I think currently works also with "fake" documents. For this to work in workers, yes, there would have to be a basic CSS parser available in workers, but if that's too difficult to implement, I guess for the "minimum viable product" of worker DOM APIs, browsers could just leave this empty and not parse the CSS at all? I think the use cases for parsing CSS in a worker are minimal anyways.

Edit: Ok, it turns out Document::styleSheets does not work, but HTMLStyleElement::sheet does work, i.e.

new DOMParser().parseFromString("<!DOCTYPE html><style> body { color: red } </style>", "text/html").querySelector("style").sheet.cssRules

returns the correctly parsed CSS with one rule containing one declaration.

I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.

I know @WebReflection already mentioned that now, but at least in Chrome where I tested it, it seems that iframes with a sandbox attribute do run in their separate thread. You can try it out with e.g. this setup:

index.html:

<!DOCTYPE html>
<html lang="en">
<head>
	<script type="module">
		const frame = () => {
			millis.textContent = performance.now()
			requestAnimationFrame(frame)
		}
		requestAnimationFrame(frame)
	</script>
</head>
<body>
	<div id="millis"></div>
	<iframe src="iframe.html" sandbox="allow-scripts"></iframe>
</body>
</html>

iframe.html:

<!DOCTYPE html>
<html lang="en">
<head>
	<script type="module">
		const frame = () => {
			millis.textContent = performance.now()
			requestAnimationFrame(frame)
		}
		requestAnimationFrame(frame)
		block.onclick = () => {
			while(true);
		}
	</script>
</head>
<body>
	<div id="millis"></div>
	<button id="block">block</button>
</body>
</html>

If you click the "block" button in the iframe, the iframe is totally blocked but the parent frame continues to run.

Live demo now published at benjaminaster.com/playground/async-iframe


jakearchibald avatar jakearchibald commented on May 27, 2024

at least in Chrome where I tested it, it seems that iframes with a sandbox attribute do run in their separate thread

Interesting. That wasn't the case a couple of months ago when I last tested it. Is that the case on mobile too?


WebReflection avatar WebReflection commented on May 27, 2024

Is that the case on mobile too?

in the SO thread somebody mentioned that on Android the heuristics can be different (no guarantees, depends on ... things ...) but on Desktop it seems to be consistent.

The thread also mentions that multiple iframes, even with the sandbox attribute, will share the same thread, so if you add 2 iframes in the above example a click in one will (should) block the other iframe too (still not the main thread).


jakearchibald avatar jakearchibald commented on May 27, 2024

If you click the "block" button in the iframe, the iframe is totally blocked but the parent frame continues to run.

Live demo now published at benjaminaster.com/playground/async-iframe

It blocks the whole tab for me. Desktop Chrome 115.0.5790.114 on mac.


BenjaminAster avatar BenjaminAster commented on May 27, 2024

It blocks the whole tab for me. Desktop Chrome 115.0.5790.114 on mac.

Ha, I had tested it in Chromium 113 on my Raspberry Pi (separate threads), and now in Chrome 115 on Windows and Android, where it indeed blocks the main thread... Interesting. So it either changed in a very recent Chrome version, or my Raspberry Pi somehow handles that differently. Anyways, yep, you're right, iframes do generally block the whole tab, so they're not an option!


developit avatar developit commented on May 27, 2024

Might be worth splitting the discussion here up into two topics:

  1. Ergonomic differences between iframe-as-thread and worker-with-dom
  2. Spec + technical feasibility (of exposing a DOM to Workers, and of allowing <iframe sandbox> to strictly imply OMT)

For #1:

It seems like any ergonomic warts in the process of constructing an iframe are solvable in userland (essentially, add an optimized mechanism for using <iframe sandbox> in JS-loading-JS scenarios rather than just HTML-loading-HTML).

The ergonomics of DOM-in-Worker are clearer to me; the issues there seem to be more on the spec and implementation side.

For #2:
I think there are some fringe cases for DOM-in-Worker that make this particularly tricky. Some potential cases off the top of my head:

  • what happens to inline scripts when parsing?
  • what happens to iframe or other nested documents (<svg foreignObject>, <embed> et al) in parsed documents within the Worker (do they get forced into the same thread?)
  • how would things like the media attribute work given that a document and its nodes have no direct relationship with display?

I can think of possible answers to these things, but they would all seem to require substantial revisions to DOM specs. Seems like it would be easier to spec out a "lite" DOM interface that avoids all of these issues by omitting presentation-related APIs.


WebReflection avatar WebReflection commented on May 27, 2024

@developit I agree with @BenjaminAster there: nothing you mentioned is an issue with the current living standard, because DOMParser and parseFromString do nothing until nodes created from that document get adopted.

In Workers, there's no way to adopt these in any meaningful way ("live content") because nothing is ever live ... no src, no source, no CSS, nothing ... the parseFromString rightly does parsing only, the rest is performed only when stuff gets adopted on the main, live, thread (which can't be the case within workers as we can't postMessage DOM nodes, as per structured clone algorithm specs).


WebReflection avatar WebReflection commented on May 27, 2024

Is it theoretically possible? Yes, but it's by no means simple or easy.

I don't think anyone in here believes it's a flag switch, like Jake suggested, but it would be interesting to understand why the main is so special in "just parsing" regards (which of course needs many other classes exposed to work properly).

It could easily be a multi-year/multi-engineer effort.

LinkeDOM (or other projects that already run in workers) could be a great polyfill in the meantime, but if there's no vendor interest in moving forward with this proposal there won't be interest in making these projects closer to standards than they are now.


jakearchibald avatar jakearchibald commented on May 27, 2024

I don't think anyone in here believes it's a flag switch, like Jake suggested

They really do. See the thread that started this one w3c/ServiceWorker#846 - the feeling there is very much that service workers chose to block DOM APIs from that context. Even down to the latest comment w3c/ServiceWorker#846 (comment).


WebReflection avatar WebReflection commented on May 27, 2024

They really do.

sad thread ... and I should've specified in here 😅

@rniwa on a second thought about this:

It could easily be a multi-year/multi-engineer effort.

I think that if we had a way to ensure a separate, non-blocking thread for an iframe we could cut some corners and have what we want, in terms of functionality, even if that's not exactly where we want it (workers) ... as apparently in some circumstances iframes already get that thread, would @surma's suggestion around having a no-render (or any other name) be a fast way forward, hopefully relatively easier than bringing the DOM to Workers?


keithamus avatar keithamus commented on May 27, 2024

It might be good to contextualise what people want. The ability to de-serialize an HTML string into some kind of object model - and back again - is a hugely different problem than reifying HTML into a DOM; as others have alluded to.

If the ask is "I don't want to bring my own HTML parser when the browser has a perfectly good one outside of Workers" then that closes the scope to a large degree compared to "I want to have the full suite of DOM APIs and shuttle tree fragments between threads".

What gives me pause about this discussion is that, while I don't think people are naive enough to believe the DOM is intentionally blocked from workers, I do think that even people in this thread are failing to correctly grasp (or articulate) exactly what they want and the ramifications of that. I think the reason DOMParser() exists, and not HTMLParser(), is that it answers a question: it gives developers a fully reified DOM, which sits at the very end of the set of steps that take HTML and turn it into UI. Everything in between is full of so much nuance that it's hard to find one place to settle on.

An HTML parser would relieve you of some code within workers, and maybe give you a nice performance boost, but I think if people asked for it, they'd end up disappointed with what they get (not the DOM). Having a tree of objects that don't ascribe any semantic meaning to each node gives you very little, and once all that data gets sent to the main thread it still needs to be reified into the DOM, and all the things that your application wants like event listeners. The OP gives some good use cases for having general purpose serialisation but those cases aren't UI, they're data transformation. The rest of the thread talks of UI.

On the other hand having an object model that represents HTML requires full reification, which includes all the aforementioned steps and all the decisions about that must come from somewhere - so you're either introducing a fake environment which means whatever DOM you pass back to the main thread needs to effectively go through the same reification all over again (which means reification gets done twice and possibly diverges in each, making a worker DOM not WYSIWYG) or you need to introduce shenanigans tying a worker to a main thread's DOM so you can marshal data back and forth in order to make decisions, at which point you're back to blocking and may as well have done it in the main thread.

In addition, to talk about some of the use cases in the OP: I don't think they are quite as compelling on second glance. Let's take for example markdown to HTML. The final artefact is indeed DOM but it's much simpler to write a markdown to HTML converter (that is, converting one string to another string), then hand that to a browser to convert into DOM, than it is to write a markdown to DOM converter. While it would be useful to have an HTML parser to sanitize input, that is the last step in a chain of operations that has to happen before DOM, and pretty much where the contract ends. Up until sanitization the fastest and easiest way to generate HTML from markdown is string to string. DOM APIs would give us nothing in converting markdown.
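
To illustrate the string-to-string point, here is a deliberately tiny sketch. `miniMarkdownToHtml` is a hypothetical helper that handles only `#` headings and paragraphs (a real converter does far more); it escapes text before emitting HTML and uses no DOM APIs, so it would run unchanged in a worker.

```javascript
// Escape characters that are significant in HTML text content.
const escapeHtml = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// String in, string out: blocks are separated by blank lines.
// Only single-line "# Heading" blocks and plain paragraphs are supported.
function miniMarkdownToHtml(markdown) {
  return markdown
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter(Boolean)
    .map((block) => {
      const heading = /^(#{1,6})\s+(.*)$/.exec(block);
      if (heading) {
        const level = heading[1].length;
        return `<h${level}>${escapeHtml(heading[2])}</h${level}>`;
      }
      return `<p>${escapeHtml(block)}</p>`;
    })
    .join('\n');
}
```

The resulting string can be posted to the main thread, where the browser's own parser turns it into DOM.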


jakearchibald avatar jakearchibald commented on May 27, 2024

and all the things that your application wants like event listeners

Yeah, this is where things get messy. Let's say you did this in a worker:

const div = workerDOM.createElement('div');
div.addEventListener('click', () => console.log('click'));

self.postMessage(div);

…would that event listener 'work'? Would preventDefault in that listener work?

You end up with the same question for every bit of state an element can have that sits outside of the serialisable tree. Pixels on a canvas, styles in a sheet etc etc.


WebReflection avatar WebReflection commented on May 27, 2024

that fails at the structured clone level:

  • no callbacks
  • no DOM nodes
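
This can be checked with `structuredClone`, which runs the same structured clone algorithm that postMessage uses. The sketch below only covers the callback case with plain objects; `canClone` is a hypothetical helper, and the `withListener` and `plainTree` shapes are made up for the example.

```javascript
// Returns true when the value survives the structured clone algorithm.
function canClone(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    return false; // functions (and, in browsers, DOM nodes) are rejected
  }
}

// An element-like object carrying a callback cannot be cloned...
const withListener = { tag: 'div', onclick: () => console.log('click') };
// ...while plain data, or a serialized string, can.
const plainTree = { tag: 'div', children: ['Hello World'] };
const serialized = '<div>Hello World</div>';
```

Here `canClone(withListener)` is false, while `canClone(plainTree)` and `canClone(serialized)` are true.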

IMHO, if DOMParser could validate and produce dummy nodes which are all just Node and ParentNode interfaces + Document with just querySelector or other features such as XPath to parse & validate + lightweight search crawling ability, it'd be pretty awesome. The EventListener or EventTarget interface feels unnecessary to me, or these could be added via utilities when and/or if needed. Maybe I am reducing the scope of the proposal too much, yet I don't see postMessaging DOM nodes making much sense: it'd be great for DX, but there's too much magic involved and tons of surprises.

Posting outerHTML would already go a long way to me, when or if that's needed.


jakearchibald avatar jakearchibald commented on May 27, 2024

that fails at the structured clone level:

My assumption from the OP was that folks wanted some way to send this stuff from the worker to the document.


BenjaminAster avatar BenjaminAster commented on May 27, 2024

Basically, all I really need is three things:

  • convert an HTML/XML string to some magical tree of node objects (aka. Document) or create a new one
  • mess around with that tree by adding, modifying and removing nodes
  • convert the tree back to a valid HTML/XML string

I don't really need anything like event listeners or even being able to postMessage the tree to the main thread. I'm ok with the idea of a "lite" "DOM alternative"; I think that would reduce down to something like the following API shape (names TBD of course):

  • LiteNode:
    • .childNodes
    • .firstChild (?)
    • .lastChild (?)
    • .nextSibling
    • .nodeName
    • .parentElement
    • .parentNode
    • .previousSibling
    • .textContent
    • .cloneNode()
    • .compareDocumentPosition() (?)
    • .contains()
    • .getRootNode() (?)
    • .isEqualNode() (?)
    • .normalize()
  • LiteElement (extends LiteNode):
    • .classList (?) (for convenience)
    • .dataset (?) (for convenience)
    • .innerHTML (?)
    • .innerText (?) (always treats element like white-space: normal)
    • .localName
    • .namespaceURI
    • .nextElementSibling
    • .outerHTML (?)
    • .prefix
    • .previousElementSibling
    • .tagName
    • .after() (?)
    • .append() (?)
    • .before() (?)
    • .getAttribute()
    • .getAttributeNS()
    • .getAttributeNames()
    • .hasAttribute()
    • .hasAttributeNS()
    • .insertAdjacentElement() (?)
    • .insertAdjacentHTML() (?)
    • .insertAdjacentText() (?)
    • .prepend() (?)
    • .remove()
    • .removeAttribute()
    • .removeAttributeNS()
    • .replaceChildren()
    • .replaceWith()
    • .toggleAttribute()
  • LiteElement & LiteDocument (both extend LiteNode):
    • .childElementCount
    • .children
    • .firstElementChild (?)
    • .lastElementChild (?)
    • .getElementsByClassName()
    • .getElementsByTagName()
    • .getElementsByTagNameNS()
  • LiteDocument (extends LiteNode):
    • .body (?)
    • .documentElement
    • .head (?)
    • .getElementById()
  • LiteDocumentFragment (extends LiteNode)
  • equivalent of DOMParser
  • equivalent of XMLSerializer
  • equivalent of window.document.implementation.createDocument()
  • equivalent of window.document.implementation.createDocumentType()
  • equivalent of window.document.implementation.createHTMLDocument()
  • equivalent of window.document.createAttribute()
  • equivalent of window.document.createAttributeNS()
  • equivalent of window.document.createCDATASection()
  • equivalent of window.document.createComment()
  • equivalent of window.document.createElement()
  • equivalent of window.document.createElementNS() (?)
  • equivalent of window.document.createProcessingInstruction()
  • equivalent of window.document.createTextNode()
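As a rough illustration of how small such a tree can be, here is a sketch of a few of the members listed above. It is not a proposed implementation: the class names mirror the placeholder names in the list, `LiteText` and `serialize()` are invented for the example, `setAttribute()` is an assumption (it isn't in the list), and parsing, namespaces and void elements are ignored entirely.

```typescript
// Hypothetical sketch only: none of these classes exist in any spec or browser.
abstract class LiteNode {
  parentNode: LiteElement | null = null;
  abstract get textContent(): string;
  abstract serialize(): string;

  // Detach this node from its parent, if it has one.
  remove(): void {
    const parent = this.parentNode;
    if (!parent) return;
    parent.childNodes.splice(parent.childNodes.indexOf(this), 1);
    this.parentNode = null;
  }
}

class LiteText extends LiteNode {
  constructor(public data: string) {
    super();
  }
  get textContent(): string {
    return this.data;
  }
  // Escape the characters that would otherwise be parsed as markup.
  serialize(): string {
    return this.data.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
}

class LiteElement extends LiteNode {
  readonly childNodes: LiteNode[] = [];
  private readonly attributes = new Map<string, string>();

  constructor(public readonly localName: string) {
    super();
  }

  // Strings become text nodes; nodes are moved out of any previous parent.
  append(...items: (LiteNode | string)[]): void {
    for (const item of items) {
      const node = typeof item === "string" ? new LiteText(item) : item;
      node.remove();
      node.parentNode = this;
      this.childNodes.push(node);
    }
  }

  getAttribute(name: string): string | null {
    return this.attributes.get(name) ?? null;
  }
  setAttribute(name: string, value: string): void {
    this.attributes.set(name, value);
  }
  removeAttribute(name: string): void {
    this.attributes.delete(name);
  }

  get textContent(): string {
    return this.childNodes.map((child) => child.textContent).join("");
  }

  // No layout, style or rendering state is involved: just a string walk.
  get outerHTML(): string {
    let attrs = "";
    this.attributes.forEach((value, name) => {
      attrs += ` ${name}="${value.replace(/&/g, "&amp;").replace(/"/g, "&quot;")}"`;
    });
    const inner = this.childNodes.map((child) => child.serialize()).join("");
    return `<${this.localName}${attrs}>${inner}</${this.localName}>`;
  }
  serialize(): string {
    return this.outerHTML;
  }
}

// Usage: build a tree, then turn it back into a string.
const list = new LiteElement("ul");
list.setAttribute("class", "a&b");
const item = new LiteElement("li");
item.append("1 < 2");
list.append(item);
const html = list.outerHTML;
console.log(html);
```

The resulting string is plain data, so it could be handed to the main thread with `postMessage()` and parsed there.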

Some things to consider:

  • Shadow roots? (probably out of scope)
  • Sanitizer API?
  • CSS selector parser?
    • .closest() (?)
    • .matches() (?)
    • .querySelector(All)() (?) (would be very helpful!)
  • .before()/.after()/.prepend()/.append() vs .insertAdjacentElement()/.insertAdjacentText()? (only one of them is needed)
  • Non-element nodes (text nodes, comments, CDATA sections and XML processing instructions): Should they just use the LiteNode interface directly or get their own respective interfaces like in "main" DOM?
  • XPath?
  • XSLT?
  • MutationObserver?

from dom.

WebReflection avatar WebReflection commented on May 27, 2024

@developit yup, we're aligned, and so seems to be @BenjaminAster 👍

from dom.

bahrus avatar bahrus commented on May 27, 2024

I raised a related issue here, but that web site seems to be down (not sure if that's permanent), so I would like to make the suggestion here, if I may:

Cloudflare Workers, which are modeled on service workers but run on the server side, introduced something quite innovative: the HTML Rewriter.

I think this would be a great first step towards the more ambitious goals mentioned above. It would provide the ability to inject dynamic data (say, from IndexedDB) into the HTML stream as the content streams into the browser. From my experiments, having this API would allow developers to build a DOM parser. Perhaps such a DOM parser, built in userland, could then become a candidate for inclusion once it proves useful and mature.

Also, link preview functionality would be doable with this, and it would avoid an extra hop through a Cloudflare worker.

There have been implementations in WebAssembly, which suggests that we would have a running start on getting this implemented in the browser, and that polyfilling it would be quite feasible.

from dom.

annevk avatar annevk commented on May 27, 2024

This is starting to sound like a duplicate of issue #270.

from dom.

bahrus avatar bahrus commented on May 27, 2024

There are some similarities between the HTML Rewriter and the DOMTreeConstruction class. But I think the HTML Rewriter API provides an extra ability to filter nodes based on a subset of CSS selector matching, which seems quite useful. Should I open a separate issue to propose the HTML Rewriter API? (I don't want to be accused of spamming by opening duplicates.)

from dom.

bakkot avatar bakkot commented on May 27, 2024

What are the advantages of this proposal, vs being able to create an iframe that runs in a different thread?

The ability to create an iframe is gated on CSP: frame-src 'none' will prevent that from working. worker-src similarly gates workers, of course, but it's a lot more justifiable (and also objectively less dangerous) to relax your worker-src to do off-thread computation than to relax your frame-src.

from dom.
