
Comments (6)

jtojnar commented on May 29, 2024

I think the tidy step is somewhat valuable (maybe we could even extend it to remove img without src) so I would add source and other void elements to the list. It is not like new HTML elements appear that often.


Perhaps the Medium page is an XHTML5 page? [Edit: Never mind, they do not close the img.] For an XML parser, a self-closing tag should be equivalent to an empty element followed by an explicit closing tag. I verified this with the validator on the following document:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test void element with closing tag</title>
</head>
<body>
 <picture>
  <source srcset="[…]" type="image/webp"></source>
  <source data-testid="og" srcset="[…]"></source>
  <img alt="" class="bg mj mk c" width="651" height="478" loading="lazy" role="presentation" />
 </picture>
</body>
</html>

It needs to be uploaded as an .xhtml file; direct input does not detect XML correctly.
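The same equivalence can be checked locally with any XML parser. Below is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not Graby code. The helper names `parse_body` and `shape` are made up for illustration, and the attribute values are placeholders rather than the real Medium markup.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_body(markup: str) -> ET.Element:
    """Wrap a fragment in a minimal XHTML document and parse it as XML (hypothetical helper)."""
    doc = f'<html xmlns="{XHTML_NS}"><body>{markup}</body></html>'
    return ET.fromstring(doc)


def shape(el: ET.Element):
    """Reduce an element tree to plain tuples so two trees can be compared."""
    return (el.tag, sorted(el.attrib.items()), el.text, [shape(c) for c in el])


# Same fragment, once with an explicit closing tag and once self-closed.
explicit = parse_body('<picture><source type="image/webp"></source><img alt=""/></picture>')
self_closed = parse_body('<picture><source type="image/webp"/><img alt=""/></picture>')

same = shape(explicit) == shape(self_closed)
print(same)
```

This only says something about XML parsing. An HTML parser handles `</source>` differently, which is why the document above has to be served as XHTML.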

from graby.

Kdecherf commented on May 29, 2024

> maybe we could even extend it to remove img without src

Removing img elements that have no src attribute would break the picture elements containing them, even when source elements are present; see wallabag/wallabag#6414 (comment)


j0k3r commented on May 29, 2024

I think the fastest fix is to add source to the list.
As for adding more elements to the list, is this one enough? https://developer.mozilla.org/en-US/docs/Glossary/Void_element

Funny, I didn't see the iframe element in the list.
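For reference, here is a sketch of what such a list could look like, written in Python purely for illustration (Graby itself is PHP, and the names `VOID_ELEMENTS` and `is_void` are hypothetical). The entries are the void elements defined by the current HTML Standard; the MDN glossary page may additionally mention deprecated ones. Note that iframe is not a void element, since it requires a closing tag, which would explain why it is missing from a void-element list.

```python
# Void elements as defined by the HTML Standard (deprecated ones omitted).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def is_void(tag: str) -> bool:
    """Return True if the tag name is an HTML void element (case-insensitive)."""
    return tag.lower() in VOID_ELEMENTS


print(is_void("source"), is_void("iframe"))
```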


jtojnar commented on May 29, 2024

Oh, I see what Kevin means now. The filter cleans up all elements without content, even when they are meaningful. This includes void elements but is not exclusive to them. For example, empty tds are needed for empty table cells because removing them would jumble the table. And while we could enumerate void elements quite easily, the non-void ones are much harder since there is probably no official list of elements that are meaningful without content.

Kevin mentioned two options: continuing to extend the blacklist, or removing the filter completely. Alternatively, we could switch to a whitelist of elements to remove when empty (e.g. p|span|font). That would give us at least some tidiness. We could also log all other empty elements not in the whitelist, which would let us grow coverage over time.
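To make the whitelist idea concrete, here is a rough sketch in Python using `xml.etree.ElementTree`. It is not how Graby implements its filter. The names `REMOVE_WHEN_EMPTY`, `is_empty`, `remove_keeping_tail` and `prune_empty` are invented for this example, and "empty" is assumed to mean no child elements and no non-whitespace text.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("empty_element_filter")  # hypothetical logger name

# Assumption: only these elements are safe to drop when they have no content.
REMOVE_WHEN_EMPTY = frozenset({"p", "span", "font"})


def is_empty(el: ET.Element) -> bool:
    """An element counts as empty if it has no children and no non-whitespace text."""
    return len(el) == 0 and not (el.text or "").strip()


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child but keep the text that follows it in the document."""
    tail = child.tail or ""
    siblings = list(parent)
    index = siblings.index(child)
    if index > 0:
        prev = siblings[index - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def prune_empty(parent: ET.Element) -> list[str]:
    """Drop empty whitelisted elements in place; return the tags of other empty elements."""
    kept_empty: list[str] = []
    for child in list(parent):
        # Visit descendants first so wrappers that become empty are caught too.
        kept_empty.extend(prune_empty(child))
        if not is_empty(child):
            continue
        if child.tag in REMOVE_WHEN_EMPTY:
            remove_keeping_tail(parent, child)
        else:
            logger.info("kept empty <%s>", child.tag)
            kept_empty.append(child.tag)
    return kept_empty


root = ET.fromstring(
    "<div><p></p><p>text<span> </span></p>"
    "<picture><source srcset='a.webp'/><img alt=''/></picture>"
    "<table><tr><td></td><td>x</td></tr></table></div>"
)
kept = prune_empty(root)
print(kept)
```

In this example the empty p and span are dropped, while source, img and the empty td survive and are only reported, so the picture and the table keep their structure.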

But the choice of action depends on the goals of Graby – do we want to preserve content even when it might be a mess, or do we want a clean content model at the cost of it being potentially incomplete?
