Comments (12)
Would we ignore the format-native semantics of any other XML attributes or elements, e.g. ones that alter base URIs or the objects currently in scope? I'd assume so, to avoid chaos.
from microdata.
A very quick test suggests that at least the Yandex Structured Data Validator, the open-source Structured Data Linter and Google's Structured Data Testing Tool all recognise microdata markup in SVG that is included in HTML. For a pure SVG sample, the Google and Yandex tools extracted the data. It is unclear whether the Structured Data Linter refused to recognise it or whether there was another parsing error. More tests needed…
Ping @gkellogg
Do you have example SVG I can try?
<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
itemtype="https://schema.org/CreativeWork">
<title itemprop="name">A microdata in SVG test</title>
<desc itemprop="description">This is a test case for an svg file
to see whether microdata processing tools
actually process the attributes defined in microdata
the same way in SVG as they do in HTML</desc>
</svg>
Actually, the SDL did extract the data when I did this - but my quick test wasn't valid XML - I just used the itemscope attribute bare, HTML style, which is what stopped it parsing.
As you note, the reason the SDL shows an error is that all parsing goes through the RDFa parser, which does host-language detection. In this case, the host language is detected as svg, which is parsed using the XML parser, which does require that attributes have values, thus the following errors:
Specification mandate value for attribute itemscope
attributes construct error
Wrap it in HTML, and it is parsed using the HTML5 parser (now Gumbo), where such attributes do not require values:
<html>
<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
itemtype="https://schema.org/CreativeWork">
<title itemprop="name">A microdata in SVG test</title>
<desc itemprop="description">This is a test case for an svg file
to see whether microdata processing tools
actually process the attributes defined in microdata
the same way in SVG as they do in HTML</desc>
</svg>
</html>
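The difference between the two parsing paths can be reproduced with a small sketch. It assumes Python's standard-library parsers as stand-ins: xml.etree.ElementTree for an XML parser and html.parser for a lenient HTML tokenizer. Neither is the parser the Linter actually uses, html.parser is not a full HTML5 parser, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Same markup twice: once with a bare (HTML-style) itemscope, once XML-style.
BARE = ('<svg xmlns="http://www.w3.org/2000/svg" itemscope>'
        '<title itemprop="name">A test</title></svg>')
VALUED = ('<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope">'
          '<title itemprop="name">A test</title></svg>')


def xml_accepts(markup):
    """Return True if the markup is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class AttrCollector(HTMLParser):
    """Collect every (name, value) attribute pair seen on start tags."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_attrs(markup):
    """Tokenize the markup as HTML and return the attributes found."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs


print(xml_accepts(BARE))    # the XML parser rejects the valueless attribute
print(xml_accepts(VALUED))  # the XML-style spelling is well-formed
print(html_attrs(BARE))     # the HTML tokenizer reports itemscope with value None
```

With these stand-ins, only the XML path fails on the bare attribute, which matches the behaviour described above.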
Yeah - in pure SVG, the microdata is not parsed by SDLinter.
My inclination is to suggest that this is a bug, and that the attributes, which are in the null namespace, should generally work unless a specific language tries to say they don't.
@gkellogg that would mean you're non-conformant - what do you think about making the change?
Has anyone tested other tools?
To say it's a Linter bug would imply that SVG should be parsed as HTML, not XML. According to the SVG spec, the MIME type for SVG is application/svg+xml. XML does not allow attributes without values. To say that it's a bug for the Linter to reject a valueless attribute when given SVG implies that it is not correct to parse SVG as XML, so I beg to differ.
Of course, parsing SVG embedded within HTML does use an HTML parser, so in that context a valueless attribute is parsed just fine.
Are there actually real-world cases where Microdata is used in pure SVG and this is expected? Is there some basis for thinking that SVG should be parsed as HTML in this case?
The microdata parser incorporated into the RDFLib library (which is probably the most widely used RDF library in Python land) uses the html5parser, and also includes statements that are very much HTML5-specific (e.g., usage of the <base> element). SVG parsing would be doable, of course, but would require extra work.
To say it's a Linter bug would imply that SVG should be parsed as HTML, not XML. According to the SVG Spec, the mime-type for SVG is application/svg+xml. XML does not allow attributes without values.
Sorry, it seems I have not been clear.
Assuming correct XML such as
<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
itemtype="https://schema.org/CreativeWork">
<title itemprop="name">A microdata in SVG test</title>
<desc itemprop="description">This is a test case for an svg file
to see whether microdata processing tools
actually process the attributes defined in microdata
the same way in SVG as they do in HTML</desc>
</svg>
As far as I can tell - and I haven't tested rigorously enough to be satisfied myself - the SDLinter applies an RDFa parser, so it gets some data but not what one would expect as microdata. Google's and Yandex's tools apply a microdata parser, and extract the microdata as one might naively expect.
According to the current spec, the attributes are meaningless except in HTML, so Google and Yandex are reading data the spec claims isn't there, SDL is conformant, and I haven't looked at other parsers I know of, let alone those I don't.
I don't have data on real-world usage of microdata in SVG, so making this potential change isn't high priority. It was motivated by my initial observation that it worked, so I began looking at the interoperability.
There are potential use cases for it:
- include information for direct searchability. SVG has few native semantics that search can use.
- describing the accessibility of SVG
- enhancing SVG with metadata, as a richer transfer format
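To make the "naive" microdata reading of well-formed SVG concrete, here is a minimal sketch using Python's xml.etree.ElementTree. It is an illustration only, not the extraction algorithm from the microdata spec: it ignores itemref, itemid and URL-valued properties, and extract_items is a hypothetical helper name. The description text is shortened from the sample above.

```python
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg" itemscope="itemscope"
     itemtype="https://schema.org/CreativeWork">
  <title itemprop="name">A microdata in SVG test</title>
  <desc itemprop="description">A short description</desc>
</svg>"""


def extract_items(root):
    """Return top-level items as dicts (simplified, hypothetical helper)."""
    top_level = []

    def visit(el, parent_item):
        item = parent_item
        if "itemscope" in el.attrib:
            # This element starts a new item.
            item = {"type": el.attrib.get("itemtype"), "properties": {}}
            if "itemprop" in el.attrib and parent_item is not None:
                # A nested item is the value of a property of its parent.
                for name in el.attrib["itemprop"].split():
                    parent_item["properties"].setdefault(name, []).append(item)
            else:
                top_level.append(item)
        elif "itemprop" in el.attrib and parent_item is not None:
            # A plain property: use the element's text content as the value.
            text = "".join(el.itertext())
            for name in el.attrib["itemprop"].split():
                parent_item["properties"].setdefault(name, []).append(text)
        for child in el:
            visit(child, item)

    visit(root, None)
    return top_level


items = extract_items(ET.fromstring(SVG))
print(items)
```

The attribute lookups use plain names such as "itemprop" because the attributes are unprefixed, so the element's SVG namespace does not get in the way.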
If only the search engines' tools process this, I suggest that we mark it as wontfix at least for the current version, and wait for some upswell of demand. It's a trivial spec change, if we find broad interoperability later.
Using the RDFa parser on rdf.greggkellogg.net/distiller, I get the following Turtle for your example:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[
a <https://schema.org/CreativeWork>;
<https://schema.org/description> """This is a test case for an svg file
to see whether microdata processing tools
actually process the attributes defined in microdata
the same way in SVG as they do in HTML""";
<https://schema.org/name> "A microdata in SVG test"
] .
It looks like the linter gets pretty much the same results. (I'd be really worried if it didn't!)
Note that the RDFa parser is used, as it looks for all other embedded formats, including RDF/XML, Microdata and anything inside a script element with an @type related to some RDF format.
So the Linter, the distiller, Yandex' and Google's tools all get the "right answer".
Using arbitrary XML, the Linter, Google and Yandex get the same answers...
<testel xmlns="http://example.org/2017/tests" itemscope="itemscope"
itemtype="https://schema.org/CreativeWork">
<hohum itemprop="name">A microdata in XML test</hohum>
<dots itemprop="description">This is a test case for an svg file
to see whether microdata processing tools
actually process the attributes defined in microdata
the same way in arbitrary XML as they do in HTML</dots>
</testel>
I'm going to remove the restriction on the meaning of the attributes. If we collide with someone else's XML attributes that are also in the null namespace, that shows there was a questionable design decision in XML, but doesn't seem to change anything in practice.
I propose we remove the restriction on attributes only having meaning in HTML. I'll make a PR.
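The point about the null namespace can be checked with a short sketch, again assuming Python's xml.etree.ElementTree as the XML parser. A default xmlns declaration applies to element names but not to unprefixed attributes, so the microdata attributes look the same whatever vocabulary the elements belong to. split_clark is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML = ('<testel xmlns="http://example.org/2017/tests" itemscope="itemscope" '
       'itemtype="https://schema.org/CreativeWork">'
       '<hohum itemprop="name">A microdata in XML test</hohum>'
       '</testel>')


def split_clark(name):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if name.startswith("{"):
        namespace, local = name[1:].split("}", 1)
        return namespace, local
    return None, name


root = ET.fromstring(XML)

# The element picks up the default namespace...
element_name = split_clark(root.tag)
# ...but the unprefixed attributes stay in no namespace at all.
attribute_names = [split_clark(key) for key in root.attrib]

print(element_name)
print(attribute_names)
```

A collision would therefore only arise with another vocabulary that also defines unprefixed attributes with the same names.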