
Comments (12)

danbri commented on July 20, 2024

Would we ignore the native-to-that-format semantics of any other XML attributes or elements, e.g. ones that mess with base URIs or currently-in-scope objects? I'd assume so, to avoid chaos.

from microdata.

chaals commented on July 20, 2024

A very quick test suggests that at least the Yandex Structured Data Validator, the open-source Structured Data Linter and Google's Structured Data Testing Tool all recognise microdata markup in SVG that is included in HTML. For a pure SVG sample, the Google and Yandex tools extracted the data. It is unclear whether the Structured Data Linter refused to recognise it or whether there was another parsing error. More tests needed…

Ping @gkellogg


gkellogg commented on July 20, 2024

Do you have example SVG I can try?


chaals commented on July 20, 2024

<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <title itemprop="name">A microdata in SVG test</title>
 <desc itemprop="description">This is a test case for an svg file
  to see whether microdata processing tools
  actually process the attributes defined in microdata
  the same way in SVG as they do in HTML</desc>
</svg>

Actually, the SDL did extract the data from this version. My earlier quick test wasn't valid XML: I had used the itemscope attribute bare, HTML style, which is what stopped it parsing.


gkellogg commented on July 20, 2024

As you note, the reason the SDL shows an error is that all parsing goes through the RDFa parser, which does host-language detection. In this case the host language is detected as SVG, which is parsed using the XML parser. That parser requires attributes to have values, hence the following errors:

Specification mandate value for attribute itemscope
attributes construct error

Wrap it in HTML, and it is parsed using the HTML5 parser (now Gumbo), where such attributes do not require values:

<html>
<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <title itemprop="name">A microdata in SVG test</title>
 <desc itemprop="description">This is a test case for an svg file
  to see whether microdata processing tools
  actually process the attributes defined in microdata
  the same way in SVG as they do in HTML</desc>
</svg>
</html>
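
The difference between the two parsing modes can be reproduced with a small sketch. It uses Python's standard-library parsers as stand-ins (an assumption: these are not the libxml2 or Gumbo parsers the Linter uses, and the helper names `xml_accepts` and `html_attrs` are made up for illustration), but the behaviour around valueless attributes is the same in kind.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The same root element, once with a bare attribute and once with a value.
BARE = '<svg xmlns="http://www.w3.org/2000/svg" itemscope></svg>'
VALUED = '<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"></svg>'


def xml_accepts(markup):
    """Return True if the markup is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # XML requires every attribute to have a quoted value.
        return False


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag the HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs.update(attrs)


def html_attrs(markup):
    """Return the attributes found by an HTML tokenizer (hypothetical helper)."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs
```

With these definitions, `xml_accepts(BARE)` is false while `xml_accepts(VALUED)` is true, and `html_attrs(BARE)` still reports an `itemscope` key, with `None` as its value.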


chaals commented on July 20, 2024

Yeah - in pure SVG, the microdata is not parsed by SDLinter.

My inclination is to suggest that this is a bug, and that the attributes, which are in the null namespace, should generally work unless a specific language tries to say they don't.

@gkellogg that would mean you're non-conformant - what do you think about making the change?

Has anyone tested other tools?


gkellogg commented on July 20, 2024

To say it's a Linter bug would imply that SVG should be parsed as HTML, not XML. According to the SVG Spec, the mime-type for SVG is image/svg+xml. XML does not allow attributes without values. To say that it's a bug for the Linter to reject a valueless attribute when given SVG implies that it is not correct to parse SVG as XML, so I beg to differ.

Of course, parsing SVG embedded within HTML does use an HTML parser, so in that context a valueless attribute is parsed just fine.
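
As a rough illustration of host-language detection by media type, here is a minimal sketch. The function name `pick_parser` and its return values are hypothetical, and it is not the Linter's actual detection logic. It only shows why a standalone SVG document ends up in an XML parser while the same markup served as HTML does not.

```python
def pick_parser(content_type):
    """Choose a parser family from a Content-Type value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # Generic XML types, plus anything using the "+xml" structured suffix.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"
```

Any SVG media type ending in `+xml` maps to `"xml"` here, so valueless attributes would be rejected before any microdata processing starts.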

Are there real-world cases where Microdata is used in pure SVG and this is expected? Is there some basis for thinking that SVG should be parsed as HTML in this case?


iherman commented on July 20, 2024

The microdata parser incorporated into the RDFLib library (which is probably the most widely used RDF library in Python land) uses the html5parser, and also includes statements that are very much HTML5-specific (e.g., usage of the <base> element). SVG parsing would be doable, of course, but would require extra work.


chaals commented on July 20, 2024

@gkellogg

To say it's a Linter bug would imply that SVG should be parsed as HTML, not XML. According to the SVG Spec, the mime-type for SVG is image/svg+xml. XML does not allow attributes without values.

Sorry, it seems I have not been clear.

Assume correct XML such as:

<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <title itemprop="name">A microdata in SVG test</title>
 <desc itemprop="description">This is a test case for an svg file
  to see whether microdata processing tools
  actually process the attributes defined in microdata
  the same way in SVG as they do in HTML</desc>
</svg>

As far as I can tell - and I haven't tested rigorously enough to be satisfied myself - the SDLinter applies an RDFa parser, so it gets some data but not what one would expect as microdata. Google's and Yandex's tools apply a microdata parser, and extract the microdata as one might naively expect.
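
To make "as one might naively expect" concrete, here is a minimal sketch of microdata extraction over well-formed XML. It is a deliberately simplified reading of the microdata algorithm (no itemref, no itemid, no URL-valued elements), the function names are made up for illustration, and the `desc` text is shortened. It is not how any of the tools mentioned above are implemented.

```python
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <title itemprop="name">A microdata in SVG test</title>
 <desc itemprop="description">This is a test case</desc>
</svg>"""


def read_item(scope):
    """Build a dict for one element carrying itemscope (simplified)."""
    item = {"type": scope.get("itemtype", "").split(), "properties": {}}
    collect(scope, item["properties"])
    return item


def collect(parent, props):
    """Gather itemprop values below parent, stopping at nested itemscopes."""
    for child in parent:
        names = child.get("itemprop", "").split()
        if "itemscope" in child.attrib:
            # A nested item is the property value; do not descend past it here.
            value = read_item(child)
        else:
            value = "".join(child.itertext())
            collect(child, props)
        for name in names:
            props.setdefault(name, []).append(value)


def extract(markup):
    """Return the top-level items: itemscope elements without itemprop."""
    root = ET.fromstring(markup)
    return [read_item(el) for el in root.iter()
            if "itemscope" in el.attrib and "itemprop" not in el.attrib]
```

Because the microdata attributes are unprefixed, they are looked up by their plain names even though the elements themselves are in the SVG namespace.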

According to the current spec, the attributes are meaningless except in HTML. So Google and Yandex are reading data the spec claims isn't there, and SDL is conformant. I haven't looked at the other parsers I know of, let alone those I don't.

I don't have data on real-world usage of microdata in SVG, so making this potential change isn't high priority. It was motivated by my initial observation that it worked, so I began looking at the interoperability.

There are potential use cases for it:

  • including information for direct searchability, since SVG has few native semantics that search can use
  • describing the accessibility of SVG
  • enhancing SVG with metadata, as a richer transfer format


chaals commented on July 20, 2024

If only the search engines' tools process this, I suggest that we mark it as wontfix, at least for the current version, and wait for some groundswell of demand. It's a trivial spec change if we find broad interoperability later.


gkellogg commented on July 20, 2024

Using the RDFa parser on rdf.greggkellogg.net/distiller, I get the following Turtle for your example:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 [
     a <https://schema.org/CreativeWork>;
     <https://schema.org/description> """This is a test case for an svg file
  to see whether microdata processing tools
  actually process the attributes defined in microdata
  the same way in SVG as they do in HTML""";
     <https://schema.org/name> "A microdata in SVG test"
 ] .

It looks like the linter gets pretty much the same results. (I'd be really worried if it didn't!)

Note that the RDFa parser is used, as it looks for all other embedded formats, including RDF/XML, Microdata and anything inside a script element with an @type related to some RDF format.
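
For readers who want to see how an extracted item maps onto Turtle of the shape shown above, here is a minimal sketch. The input dict layout, the function names and the `vocab` parameter are assumptions made for illustration. This is not the distiller's serializer, and it only handles plain string values on an item that has at least one type or property.

```python
def escape(text):
    """Escape a string for a double-quoted Turtle literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t"))


def item_to_turtle(item, vocab="https://schema.org/"):
    """Serialize one item as a blank node property list (hypothetical helper)."""
    # Each type becomes an rdf:type statement, written with the "a" shorthand.
    lines = ["  a <{}>".format(t) for t in item["type"]]
    # Property names are resolved against the assumed vocabulary URL.
    for name, values in sorted(item["properties"].items()):
        for value in values:
            lines.append('  <{}{}> "{}"'.format(vocab, name, escape(value)))
    return "[\n" + " ;\n".join(lines) + "\n] ."
```

The sketch escapes newlines instead of using triple-quoted long literals, so multi-line descriptions come out on a single line but denote the same string.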


chaals commented on July 20, 2024

So the Linter, the distiller, and Yandex's and Google's tools all get the "right answer".

Using arbitrary XML, the Linter, Google and Yandex get the same answers...

<testel xmlns="http://example.org/2017/tests" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <hohum itemprop="name">A microdata in XML test</hohum>
 <dots itemprop="description">This is a test case for an svg file
  to see whether microdata processing tools
  actually process the attributes defined in microdata
  the same way in arbitrary XML as they do in HTML</dots>
</testel>

I'm going to remove the restriction on the meaning of the attributes. If we collide with someone else's XML attributes that are also in the null namespace, that shows there was a questionable design decision in XML, but doesn't seem to change anything in practice.
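
The null-namespace point can be checked with a short sketch using Python's standard-library XML parser (chosen here only for illustration; the sample is a shortened version of the one above). A default `xmlns` declaration applies to element names but not to unprefixed attributes.

```python
import xml.etree.ElementTree as ET

DOC = """<testel xmlns="http://example.org/2017/tests" itemscope="itemscope"
   itemtype="https://schema.org/CreativeWork">
 <hohum itemprop="name">A microdata in XML test</hohum>
</testel>"""

root = ET.fromstring(DOC)

# ElementTree reports names in Clark notation: "{namespace}local".
element_name = root.tag

# Unprefixed attributes carry no namespace, so their keys have no "{...}" part.
attribute_names = sorted(root.attrib)
```

Here `element_name` is `{http://example.org/2017/tests}testel`, while `attribute_names` is `['itemscope', 'itemtype']`. This is why the same attribute lookup works unchanged across HTML, SVG and arbitrary XML vocabularies.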

I propose we remove the restriction on attributes only having meaning in HTML. I'll make a PR.

