The microdata from w3c

See also #2, #5, #15 and https://lists.w3.org/Archives/Public/public-whatwg-archive/2012Aug/0101.html (which gives a sense of the changes required).

Clarify most accessibility information gets lost

Time at TPAC?

Would you like time at TPAC to update the WG on progress and/or bring issues up for discussion?

Please let me know by Friday 16th June. If yes, please also let me know how much time you think you'll need.

Microdata should include RDF processing model

The primary use of Microdata on the web is to contain schema.org-based metadata. According to the recent Common Crawl, 2.5/3 million pages include Microdata. All of this data has an RDF interpretation, which is in fact critical to extracting information from the pages.

The microdata spec should integrate the work from the Microdata to RDF Note.

Requiring both @itemscope and @itemtype when using @itemid seems unnecessary

Note that Microdata to RDF makes no such restriction, and includes a process for crafting URIs for @itemprop values based on the document location. Either this restriction should be removed (requiring only @itemscope), or we’ll need to remove that mechanism from a future Microdata to RDF update. In the absence of either @itemid or @itemtype, a Microdata to RDF processor will generate triples using blank node identifiers. Either such a processor should not ever generate triples without seeing an @itemtype, or @itemid should be allowed without @itemtype.

Capitalization of "microdata"

The document uses "microdata" instead of "Microdata", except in these cases:

Title:

HTML Microdata

ToC:

Converting Microdata to other formats

Body:

Vocabulary specifications must not define property names for Microdata that contain […]

The original specification for Microdata was developed by Ian Hickson.

Whichever variant gets used (I would prefer "Microdata"), I think it should be consistent.

propose a simple IDL interface

There are no properties in the DOM for microdata - a parser in JS needs to use document.querySelectorAll('[itemscope]') to find items, and getAttribute() and friends to process them.

give examples and algorithms a URL

There should be some readable thing, and a URL, for each example and algorithm.

(Picked up from @unor in #65)

Does anyone convert microdata to JSON?

If not, should we remove that bit?

Add a definition of 'property' sufficient for referencing from elsewhere

We say 'property' a lot, without saying what it is.

Microdata's property ordering semantics are unclear (and perhaps unused) - can we simplify?

From https://w3c.github.io/microdata/#the-microdata-model

5.1 The microdata model

The microdata model consists of groups of name-value pairs known as items.

Each group is known as an item. Each item can have item types, a global identifier (if the vocabulary specified by the item types support global identifiers for items), and a list of name-value pairs. Each name in the name-value pair is known as a property, and each property has one or more values. Each value is either a string or itself a group of name-value pairs (an item). The names are unordered relative to each other, but if a particular name has multiple values, they do have a relative order.

Q: What does "they do have a relative order" vs "are unordered" actually mean? Did anyone implement against this distinction?

A test case to explore could be based on something like:

<div itemscope itemtype="http://schema.org/Book">
   <meta itemprop="bookFormat" content="EBook/DAISY3"/>
   <meta itemprop="accessibilityFeature" content="largePrint/CSSEnabled"/>
   <meta itemprop="accessibilityFeature" content="highContrast/CSSEnabled"/>
   <span itemprop="author">
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Alice Aardvark</span>
    </div>
   </span>
   <span itemprop="author">
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Zac Zebedee</span>
    </div>
   </span>
</div>

The spec seems to say that the relative order of accessibilityFeature vs author on this Book is unimportant, whereas the values for accessibilityFeature are ordered relative to each other, and the ordering of the two authors listed is also considered in some sense significant. For example, perhaps a later accessibilityFeature declaration overrides an earlier one; or perhaps a first-listed author is implicitly said to be a more significant contributor. Microdata delegates such details to vocabularies such as Schema.org. Schema.org says that it does not attach meaning at this level. Does anyone else?

So - I would like to explore clarifications in this area. Neither Schema.org nor the earlier datavocabulary.org vocabulary assigns semantics to this kind of property ordering. At Google we extract schema.org and datavocabulary Microdata into re-order-able triples / graphs; our parser currently assumes other uses of Microdata follow this pattern. I suspect @gkellogg and other parser writers may have implemented structures that represent the property ordering, but I do not know of anyone making use of such facilities.

I suggest that "but if a particular name has multiple values, they do have a relative order" may lack implementations beyond parsers, i.e. in vocabularies and the publisher/consumer ecosystem. Is "parsers can handle this distinction" enough of an argument to preserve this aspect of Microdata, or can the spec be simplified in the light of experience here?

We might consider clarifying that the entire Microdata structure can be viewed as fully ordered as HTML, considered in the context of its life within a larger HTML document. This can be very important for use cases such as editors. However we might choose to say that order is not significant / meaningful when considering Microdata as a carrier of factual claims.

One way to state this idea would be to try to agree that any circumstances that are captured by the above test case ought to also be equally accurately described by the following test case (in which I have reordered everything):

<div itemscope itemtype="http://schema.org/Book">
   <span itemprop="author">
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Zac Zebedee</span>
    </div>
   </span>
   <span itemprop="author">
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Alice Aardvark</span>
    </div>
   </span>
   <meta itemprop="accessibilityFeature" content="highContrast/CSSEnabled"/>
   <meta itemprop="accessibilityFeature" content="largePrint/CSSEnabled"/>
   <meta itemprop="bookFormat" content="EBook/DAISY3"/>
</div>

These distinctions are a bit easier to state for languages that explicitly extract into atomic triples, but I think we can find a way.
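
A minimal Python sketch of the proposed equivalence, using data modelled on the test cases in this issue. It assumes the items have already been parsed into nested dictionaries; that layout and the helper names as_claims and person are illustrative assumptions, not anything the spec defines. Each item is flattened into an unordered set of (name, value) claims, so neither the order of property names nor the order of values for one name affects the comparison.

```python
def as_claims(item):
    """Flatten a parsed item into a hashable, order-free structure.

    `item` is a hypothetical parsed form:
    {"type": str, "properties": {name: [values]}}
    where each value is a string or a nested item of the same shape.
    """
    claims = set()
    for name, values in item["properties"].items():
        for value in values:
            if isinstance(value, dict):
                value = as_claims(value)  # nested item becomes a hashable tuple
            claims.add((name, value))
    return (item["type"], frozenset(claims))


def person(name):
    return {"type": "http://schema.org/Person", "properties": {"name": [name]}}


book_a = {
    "type": "http://schema.org/Book",
    "properties": {
        "bookFormat": ["EBook/DAISY3"],
        "accessibilityFeature": ["largePrint/CSSEnabled", "highContrast/CSSEnabled"],
        "author": [person("Alice Aardvark"), person("Zac Zebedee")],
    },
}
book_b = {
    "type": "http://schema.org/Book",
    "properties": {
        "author": [person("Zac Zebedee"), person("Alice Aardvark")],
        "accessibilityFeature": ["highContrast/CSSEnabled", "largePrint/CSSEnabled"],
        "bookFormat": ["EBook/DAISY3"],
    },
}

# Viewed as unordered claims, the two books say the same thing.
same_claims = as_claims(book_a) == as_claims(book_b)
# Viewed with per-name value order preserved, they differ.
same_ordered = book_a == book_b
```

Under these assumptions same_claims is true while same_ordered is false, which is exactly the distinction the question is about: a consumer that only ever uses the first view would not notice if the spec dropped the relative-order wording.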

Does anyone know of a use of Microdata which depends upon "but if a particular name has multiple values, they do have a relative order."?

/cc @tmarshbing @nicolastorzec @chaals, @betehess (and @pmika for old time's sake) for Bing, Yahoo, Yandex, Apple perspective on this.

incomplete sentence "User agents are"

A sentence fragment says "User agents are " (including the trailing space) and nothing more. Either it needs more or it should be deleted. It's in the W3C Working Draft of June 26, 2017, section 5.2, Note (green box), at the end.

What does it mean to "support" itemid

The spec talks about whether or not a vocabulary supports itemid - but provides no explanation of how to make it clear whether this happens, nor what it means if it is not supported.

I suggest that we remove the question of whether itemid is supported by a vocabulary, and just state that if present it represents an identifier for the element it is on.

Add reverse property

Microdata makes it hard to have inverses, unlike RDFa and JSON-LD. This means that any vocabulary which wants to work with all three has to add a whole set of inverse properties to make microdata useful.

A reverse property, or similar, as per w3c/microdata-rdf#24 would be useful.

See also https://www.w3.org/wiki/WebSchemas/InverseProperties (notes from schema.org discussions a few years ago).

html-extensions isn't recognised by respec

Hopefully this will be closed by adding it to specref

Reference to [microdata-rdf] should be changed

At the moment, microdata-rdf is listed as a dependency. As we are adding a JSON-LD and an RDF conversion into the document (normatively), I wonder what the fate of that note, and therefore the dependency, should be.

In my view, the cleanest option will be to rescind (or something similar) the microdata-rdf document. Indeed, JSON-LD and RDF are RDF serializations, therefore it seems to be unnecessary to keep that document. A note in the new microdata spec may want to make that clear.

(In view of the limited, but nevertheless existing, deployment of microdata-rdf, it may be worth comparing and making sure that we define the same mappings in terms of RDF…)

Consider acknowledging the @content attribute

An @content attribute is used elsewhere in the HTML universe (at least RDFa).

It appears that at schema.org we have mistakenly assumed it was part of Microdata or HTML proper. If you grep for @content appearing alongside @itemprop in the schema.org examples, there are lots of examples which use it. This idiom is intended to allow a more machine-friendly property value to be parsed out, while something more appropriate to human audiences is also accessible for non-machines. It may also help with i18n where schema designs contain e.g. English-language strings but the markup is otherwise in another natural language.

Do parsers apply the value of "base" elements when resolving URLs?

If an HTML document is hosted somewhere other than http://example.org, and it has <base href="http://example.org">, do parsers resolve the URL relative to the base element, or not?

And likewise for XML…
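
To make the two possible behaviours concrete, here is a small Python sketch using the standard library's urllib.parse.urljoin. The document URL, the base href value and the relative value being resolved are all made-up assumptions for illustration; the sketch does not claim to show what any existing microdata parser does.

```python
from urllib.parse import urljoin

# Assumed inputs, for illustration only.
document_url = "http://example.com/some/page.html"  # where the page is hosted
base_href = "http://example.org/"                   # value of <base href>
relative = "items/42"                               # e.g. a relative URL in the markup

# Behaviour 1: the parser ignores <base> and resolves against the document URL.
without_base = urljoin(document_url, relative)

# Behaviour 2: the parser first resolves <base href> against the document URL,
# then resolves the relative value against the result.
effective_base = urljoin(document_url, base_href)
with_base = urljoin(effective_base, relative)
```

Here without_base is http://example.com/some/items/42 and with_base is http://example.org/items/42, so the two behaviours produce different identifiers for the same markup.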

@gkellogg, @iherman

No description of how numeric property values are obtained.

The Values section discusses getting a value from elements including data and meter, but this is returned as textContent. However, the JSON Serialization section specifically says to serialize JSON using "no unnecessary zero digits in numbers", implying that values may be numbers, which could only come from these elements. Certainly the intention of the data and meter elements is that the content is machine readable, and descriptive text in HTML 5.2 does suggest that this is numeric (at least for the meter element).

The Microdata to RDF spec treats this content as numeric if it is either valid xsd:integer or xsd:double, and as text otherwise.
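
A rough Python sketch of that rule, as described above: integer if the text is a valid xsd:integer, double if it is a valid xsd:double, plain text otherwise. The regular expressions are simplified approximations of the XML Schema lexical forms, and the function name classify_value and the returned tuples are assumptions made for illustration.

```python
import re

# Simplified approximations of the XML Schema lexical spaces (assumption).
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")
XSD_DOUBLE = re.compile(
    r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)


def classify_value(text):
    """Return a (datatype, value) pair for the text of a data or meter value."""
    if XSD_INTEGER.fullmatch(text):
        return ("xsd:integer", int(text))
    if XSD_DOUBLE.fullmatch(text):
        return ("xsd:double", float(text))
    # Anything else, including values with surrounding whitespace, stays text.
    return ("text", text)
```

For example, classify_value("42") is treated as an integer, classify_value("0.75") as a double, and classify_value("three") as text. The microdata spec would need comparable wording if JSON output is expected to contain numbers.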

Is microdata processed in e.g. SVG?

The specification claims it is only processed in HTML. Is that true?

add spec to html-extensions document

Since we extend HTML, we should make sure that it is listed for addition. http://w3c.github.io/html-extensions

describe security considerations, if any

From the W3C security and privacy questionnaire

How do property names inherit the URL base of the itemtype?

If you specify an itemtype, then there is an idea that its specification describes "relevant types", and so you don't need to use a full URL to parse them as part of the same specification.

What defines this? Does

<div itemscope itemtype="http://schema.org/Thing">
  <p itemprop="name">My thing</p>
</div>

mean that the Thing has a schema.org name? That is my understanding of what happens in reality, but as far as I can understand the specification, if that code is at "http://example.org/some/page" the property should be "http://example.org/some/pagename".

Which means that the interpretation of the property as a schema name is happening by some undocumented magic - parsing according to the "Microdata to RDF" note, or just by deciding that this is how to parse schema.org typed items because that makes sense.
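
The behaviour that seems to happen in practice can be sketched in Python as follows. This is only an approximation in the spirit of the "Microdata to RDF" note, not a reproduction of its rules: it assumes the vocabulary URL is the item type with everything after the last "/" or "#" removed, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def vocabulary_from_type(itemtype):
    """Keep the item type URL up to and including its last '/' or '#'."""
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1]


def expand_property(name, itemtype):
    """Expand a short property name against the vocabulary of the item type."""
    if urlsplit(name).scheme:
        # Already an absolute URL: leave it untouched.
        return name
    return vocabulary_from_type(itemtype) + name


schema_name = expand_property("name", "http://schema.org/Thing")
```

With these assumptions schema_name is http://schema.org/name, which matches what consumers of schema.org markup appear to expect, and the document location plays no part in the result.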

/cc @danbri @iherman @gkellogg

Global Identifier

Section 4 uses the term "global identifier", but does not reference it. Additionally, this section would seem to be about @itemid, however it is not discussed in this context. It looks like there is some missing text.

What is a 'vocabulary'

The specification mentions vocabularies, and vocabulary specifications, dozens of times. It makes assertions about vocabulary design, and about constraints that are imposed by vocabularies. But it never actually says what a vocabulary is.

I think that a lot of the fixing needed is editorial, but given that there is no formal way of processing a vocabulary, we might end up making some substantive changes like removing constraints, or instead of saying "only if it is allowed by a vocabulary" provide the more actionable "unless invalid according to a machine-readable specification of the item type" or some such.

CFC: Move Microdata to FPWD

This Call For Consensus (CFC) is to move the current Microdata Editors Draft (ED) to First Public Working Draft (FPWD).

Changes between the 2013 W3C Note and the current ED:

Remove DOM API. This API has been removed from browsers that did implement it.
Remove unused references.

Please respond to this CFC by the end of day on Monday 17 April 2017. Positive responses are encouraged, in the form of a +1 or -1 on this thread, or by posting a message to [email protected]. Silence will be taken as consent with the proposal to move to a FPWD.

Keeping markup inside values

For various reasons - internationalisation, accessibility, … - it is helpful to have rich / marked up text for content, but microdata currently strips everything back to raw text. Is it possible to change that?

Describe conversion to RDFa

For e.g. serious internationalisation microdata has some pretty fundamental issues - see e.g. #21, #22. It is possible to work around most of these by converting to another format. Since most tools seem to work with both, instead of trying to rewrite microdata which would make it more complex, I think we should just suggest that people use RDFa or JSON-LD if they need their capabilities.

ToC is broken

As noted in #7 the ToC is broken.

Improve privacy concerns section

clarify that using microdata to increase machine-readability of identifying information may mean that more machines will process it - potentially in combination with data found elsewhere, so limiting the data in a given place is ineffective as a privacy mitigation.
note that processors should consider privacy implications of information that they process from content - including their own privacy policies, and protection of information

Thanks to Nick Doty and Christine Runnegar for comments leading to this issue

RDFa should generate to RDFa Lite

At present, the RDFa generation does not generate RDFa Lite. I believe that changing, in step 5, the reference to the about attribute to resource would do the trick. Given the way microdata uses itemid, this will, I believe, lead to identical results.

(I admit my RDFa has become a bit rusty, so I rely on @gkellogg to watch over my shoulder…)

Extend content models, or define microdata errors

Section 7.1 currently extends content models by making various attributes required in circumstances where the microdata processing won't otherwise work.

I suggest that we state that it is a microdata error if something is missing leading to broken parsing. It seems at first glance reasonable to add the content model constraints, which basically amount to defining authoring errors, but we should think about this.

Incomplete sentence: "User agents are"

Incomplete sentence in a Note in section 5.2:

User agents are

Provide an example of itemid

itemid is a great addition, but it would be great if there was an example of it being used. The current description is minimal.

"This is an absolute URL that provides a global identifier for an item. The itemid attribute must not be specified on elements that do not have both an itemscope attribute and an itemtype attribute specified." - https://www.w3.org/TR/2017/WD-microdata-20170626/#dfn-itemid

Simply including the attribute on one of the page's examples would be very helpful to new microdata users.

Link to Danbri is broken

In the list of editors, the link to "Dan Brickley" returns a 404.

remove drag and drop

It's apparently unimplemented.

A quick messy manual test that might not show much more than the original demo I adapted it from

remove application/microdata+json?

This doesn't seem to be used much, nor very useful - its claimed purpose is to define the data carried in a drag and drop operation, but that isn't implemented anywhere I could find.

Use the same example for JSON-LD and RDFa

I believe it would be better to use the same example for JSON-LD, RDFa (and JSON). It is better for the readability of the document...

(There may be several examples; the current RDFa example contains the itemref trick, which is great because RDFa can indeed reproduce that...)

Textual property value does not use language of the element

The Values section does not make use of the language of the element (as established using @lang or @xml:lang on an ancestor or self).

This could certainly pertain to the textContent of an element and potentially the value of the @content attribute. RDFa uses the current language when creating a literal from @content, but it could be argued either way.

Of course, the JSON expression cannot make use of the language, but it is useful to have in an abstract model for the purposes of generating RDF or JSON-LD.

RDFa and JSON-LD are not equivalent

RDFa and JSON-LD are both serializations of RDF. This means that, when converted to RDF, both conversions should produce equivalent graphs.

However... this does not seem to be the case. At least the way I read it:

JSON-LD has a top level items property, which yields, in RDF, one subject (a blank node, actually) which has a number of <items> _:XYZ pairs, where _:XYZ are blank nodes with the content coming from a specific itemscope
RDFa yields a number of _:XYZ triples without any common subject binding them together.

This can be easily solved. Either

The JSON-LD structure uses a top level @graph construct, which can be used to specify a number of more or less independent groups of triples with common subjects
The RDFa version is extended by artificial HTML code providing the equivalent of the JSON-LD items

I am more in favour of the first approach to solve this, but the second one is also a solution.

(As an aside, the JSON-LD example is incomplete, there is no @context.)

Cc: @gkellogg

Syntax highlighting not working correctly

There seems to be something wrong with the syntax highlighting of a few examples in section 4.2.

Colors are missing in these parts (each list item represents one example):

<div itemscope itemtype
<div itemscope itemtype
<div itemscope>
<figure>
<span itemscope> and <figure>

(btw., it would be helpful to number the examples and/or give them ids)

iframe and embed don't have a data attribute

Section 7.1 says the data attribute must be present when there is an itemprop attribute on the iframe and embed elements. I think it should be the src attribute...

describe privacy considerations, if any

From the W3C security and privacy questionnaire

i18n self-check

Language

Language basics

[FAIL] It should be possible to associate a language with any piece of natural language text that will be read by a user. more
[FAIL] Where possible, there should be a way to label natural language changes in inline text. more
Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing. more
[-] A language declaration that indicates the text-processing language for a range of text must associate a single language value with a specific range of text. more
[-] Use the HTML lang and XML xml:lang language attributes where appropriate, rather than creating a new attribute or mechanism. more
A metadata-type language declaration that indicates the intended use of the resource, rather than the language of a specific range of text, may be associated with multiple language values. more

Defining language values

[-] Values for language declarations must use BCP 47. more
[-] Refer to BCP 47, not to RFC 5646. more
[-] Be specific about what level of conformance you expect for language tags. The word "valid" has special meaning in BCP 47. Generally "well-formed" is a better choice.
[-] Reference BCP47 for language tag matching.

Declaring language at the resource level

[FAIL] The specification should indicate how to define the default text-processing language for the resource as a whole. more
[FAIL] Content within the resource should inherit the language of the text-processing declared at the resource level, unless it is specifically overridden.
Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource. more
[-] If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource. more

Establishing the language of a content block

By default, blocks of content should inherit any text-processing language set for the resource as a whole. more
[FAIL] It should be possible to indicate a change in language for blocks of content where the language changes. more

Establishing the language of inline runs

[FAIL] It should be possible to indicate language for spans of inline text where the language changes. more

Text direction

Basic requirements

[FAIL] It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone. more
It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone. more
Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts. more

Background information

Do not assume that direction can be determined from language information. more

Handling direction in markup

Characters

Choosing a definition of 'character'

Specifications, software and content MUST NOT require or depend on a one-to-one correspondence between characters and the sounds of a language. more
Specifications, software and content MUST NOT require or depend on a one-to-one mapping between characters and units of displayed text. more
Protocols, data formats and APIs MUST store, interchange or process text data in logical order. more
Independent of whether some implementation uses logical selection or visual selection, characters selected MUST be kept in logical order in storage. more
[-] Specifications of protocols and APIs that involve selection of ranges SHOULD provide for discontiguous logical selections, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs. more
Specifications and software MUST NOT require nor depend on a single keystroke resulting in a single character, nor that a single character be input with a single keystroke (even with modifiers), nor that keyboards are the same all over the world. more
Specifications, software and content MUST NOT require or depend on a one-to-one relationship between characters and units of physical storage. more
When specifications use the term 'character' the specifications MUST define which meaning they intend. more
Specifications SHOULD use specific terms, when available, instead of the general term 'character'. more

Defining a Reference Processing Model

Textual data objects defined by protocol or format specifications MUST be in a single character encoding. more
All specifications that involve processing of text MUST specify the processing of text according to the Reference Processing Model described by the rest of the recommendations in this list. more
Specifications MUST define text in terms of Unicode characters, not bytes or glyphs. more
For their textual data objects specifications MAY allow use of any character encoding which can be transcoded to a Unicode encoding form. more
Specifications MAY choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior MUST be the same as if the processing happened as follows: (a) The character encoding of any textual data object received by the application implementing the specification MUST be determined and the data object MUST be interpreted as a sequence of Unicode characters - this MUST be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form, (b) All processing MUST take place on this sequence of Unicode characters, (c) If text is output by the application, the sequence of Unicode characters MUST be encoded using a character encoding chosen among those allowed by the specification. more
If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all textual data objects. more

Including and excluding character ranges

Specifications SHOULD NOT arbitrarily exclude code points from the full range of Unicode code points from U+0000 to U+10FFFF inclusive. more
Specifications MUST NOT allow code points above U+10FFFF. more
Specifications SHOULD NOT allow the use of codepoints reserved by Unicode for internal use. more
Specifications MUST NOT allow the use of surrogate code points. more
Specifications SHOULD exclude compatibility characters in the syntactic elements (markup, delimiters, identifiers) of the formats they define. more

Using the Private Use Area

Specifications MUST NOT require the use of private use area characters with particular assignments. more
Specifications MUST NOT require the use of mechanisms for defining agreements of private use code points. more
Specifications and implementations SHOULD NOT disallow the use of private use code points by private agreement. more
Specifications MAY define markup to allow the transmission of symbols not in Unicode or to identify specific variants of Unicode characters. more
Specifications SHOULD allow the inclusion of or reference to pictures and graphics where appropriate, to eliminate the need to (mis)use character-oriented mechanisms for pictures or graphics. more

Choosing character encodings

Identifying character encodings

Specifications MUST NOT propose the use of heuristics to determine the encoding of data. more
Specifications MUST define conflict-resolution mechanisms (e.g. priorities) for cases where there is multiple or conflicting information about character encoding. more

Designing character escapes

Specifications should provide a mechanism for escaping characters, particularly those which are invisible or ambiguous. more
Specifications SHOULD NOT invent a new escaping mechanism if an appropriate one already exists. more
The number of different ways to escape a character SHOULD be minimized (ideally to one). more
Escape syntax SHOULD require either explicit end delimiters or a fixed number of characters in each character escape. Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided. more
Whenever specifications define character escapes that allow the representation of characters using a number, the number MUST represent the Unicode code point of the character and SHOULD be in hexadecimal notation. more
Escaped characters SHOULD be acceptable wherever their unescaped forms are; this does not preclude that syntax-significant characters, when escaped, lose their significance in the syntax. In particular, if a character is acceptable in identifiers and comments, then its escaped form should also be acceptable. more

Storing text

Protocols, data formats and APIs MUST store, interchange or process text data in logical order. more
Specifications of protocols and APIs that involve selection of ranges SHOULD provide for discontiguous logical selections, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs. more

Specifying sort and search functionality

Software that sorts or searches text for users SHOULD do so on the basis of appropriate collation units and ordering rules for the relevant language and/or application. more
Where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' SHOULD be determined to be that of the current user, and may thus differ from user to user. more
Software that allows users to sort or search text SHOULD allow the user to select alternative rules for collation units and ordering. more
Specifications and implementations of sorting and searching algorithms SHOULD accommodate text that contains any character in Unicode. more

Converting to a Common Unicode Form

Handling Case Folding

Defining 'string'

Specifications SHOULD NOT define a string as a 'byte string'. more
The 'character string' definition SHOULD be used by most specifications. more

Indexing strings

The character string is RECOMMENDED as a basis for string indexing. more
A code unit string MAY be used as a basis for string indexing if this results in a significant improvement in the efficiency of internal operations when compared to the use of character string. more
Grapheme clusters MAY be used as a basis for string indexing in applications where user interaction is the primary concern. more
Specifications that define indexing in terms of grapheme clusters MUST either: (a) define grapheme clusters in terms of default grapheme clusters as defined in Unicode Standard Annex #29, Text Boundaries [UTR #29], or (b) define specifically how tailoring is applied to the indexing operation. more
The use of byte strings for indexing is NOT RECOMMENDED. more
Specifications that need a way to identify substrings or point within a string SHOULD provide ways other than string indexing to perform this operation. more
Specifications SHOULD understand and process single characters as substrings, and treat indices as boundary positions between counting units, regardless of the choice of counting units. more
Specifications of APIs SHOULD NOT specify single characters or single 'units of encoding' as argument or return types. more
When the positions between the units are counted for string indexing, starting with an index of 0 for the position at the start of the string is the RECOMMENDED solution, with the last index then being equal to the number of counting units in the string. more

Referring to Unicode characters

Use U+XXXX syntax to represent Unicode code points in the specification. more

Referencing the Unicode Standard

[n/a] Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646. more
[n/a] A generic reference to the Unicode Standard MUST be made if it is desired that characters allocated after a specification is published are usable with that specification. A specific reference to the Unicode Standard MAY be included to ensure that functionality depending on a particular version is available and will not change over time. more
All generic references to the Unicode Standard MUST refer to the latest version of the Unicode Standard available at the date of publication of the containing specification. more
All generic references to ISO/IEC 10646 MUST refer to the latest version of ISO/IEC 10646 available at the date of publication of the containing specification. more

Resource identifiers

Basics

Resource identifiers must permit the use of characters outside those of plain ASCII.
Specifications MUST define when the conversion from IRI references to URI references (or subsets thereof) takes place, in accordance with Internationalized Resource Identifiers (IRIs).

Markup & syntax

Defining elements and attributes

[ - ] Do not define attribute values that will contain user readable content. Use elements for such content.
[ FAIL ] If you do define attribute values containing user readable content, provide a means to indicate directional and language information for that text separately from the text contained in the element.
Provide a way for authors to annotate arbitrary inline content using a span-like element or construct.

Defining identifiers

[ FAIL ] Identifiers should be case-sensitive.

Working with plain text

Avoid natural language text in elements that only allow for plain text and in attribute values.
Provide a span-like element that can be used for any text content to apply information needed for internationalization.

Typographic support

Text decoration

[ n/a ] Text decoration such as underline and overline should allow lines to skip ink.
[ n/a ] It should be possible to specify the distance of overlines and underlines from the text.

Vertical text

[ n/a ] It should be possible to render text vertically for languages such as Japanese, Chinese, Korean, Mongolian, etc.
[ n/a ] Vertical text must support line progression from LTR (eg. Mongolian) and RTL (eg. Japanese)
[ n/a ] By default, text decoration, ruby, and the like in vertical text where lines are stacked from left to right (eg. Mongolian) should appear on the same side as for CJK vertical text. Placement should not rely on the before and after line locations.
[ n/a ] Vertical writing modes that are equivalent to the vertical- values in CSS (only) should use UTR50 to apply default text orientation of characters. (This does not apply to writing modes that are equivalent to sideways- in CSS.)
[ n/a ] By default, glyphs of scripts that are normally horizontal should run along a line in vertical text such that the top of the character is toward the right side of the vertical line, but there should also be a mechanism to allow them to progress down the line in upright orientation. Such a mechanism should use grapheme clusters as a minimum text unit, but where necessary allow syllabic clusters to be treated as a unit when they involve more than one grapheme cluster.
[ n/a ] Upright Arabic text in vertical lines should use isolated letter forms and the order of text should read top to bottom.
[ n/a ] It should be possible for some sequences of characters (particularly digits) to run horizontally within vertical lines (tate chu yoko).
[ n/a ] Writing modes should provide values like sideways-lr and sideways-rl in CSS to allow for vertical rotation of lines of horizontal script text. UTR50 is not applicable for these cases.

Setting box positioning coordinates when text direction varies

[ n/a ] Box positioning coordinates must take into account whether the text is horizontal or vertical.

Ruby text annotations

'Ruby' style annotations alongside base text should be supported for Chinese, Japanese, Korean and Mongolian text, in both horizontal and vertical writing modes.
[ n/a ] Ruby implementations should support zhuyin fuhao (bopomofo) ruby for Traditional Chinese.
[ n/a ] Ruby implementations should support a tabular content model (such that ruby contents can be arranged in a sequence approximating to rb rb rt rt).
[ n/a ] Ruby implementations should make it possible to use an explicit rb tag for ruby bases.
[ n/a ] Ruby implementations should allow annotations to appear on either or both sides of the base text.

Miscellaneous

[ n/a ] Line heights must allow for characters that are taller than English.
[ n/a ] Box sizes must allow for text expansion in translation.
[ n/a ] Line wrapping should take into account the special rules needed for non-Latin scripts.
[ n/a ] Avoid specifying presentational tags, such as b for bold, and i for italic.

Local dates, times and formats

Working with time

When defining calendar and date systems, be sure to allow for dates prior to the common era, or at least define handling of dates outside the most common range.
[ partial ] When defining time or date data types, ensure that the time zone or relationship to UTC is always defined.
Provide a health warning for conversion of time or date data types that are "floating" to/from incremental types, referring as necessary to the Time Zones WG Note.
Allow for leap seconds in date and time data types.
Use consistent terminology when discussing date and time values. Use 'floating' time for time zone independent values.
[ partial ] Keep separate the definition of time zone from time zone offset.
[ N/A ] Use IANA time zone IDs to identify time zones. Do not use offsets or LTO as a proxy for time zone.
[ - ] Use a separate field to identify time zone.
[ - ] When defining rules for a "week", allow for culturally specific rules to be applied.
[ - ] When defining rules for week number of year, allow for culturally specific rules to be applied.
[ - ] When non-Gregorian calendars are permitted, note that the "month" field can go to 13 (undecimber).

Designing forms

[ N/A ] When defining email field validation, allow for EAI (smtputf8) names.

Working with numbers

When parsing user input of numeric values, allow for digit shaping (non-ASCII digits).
When formatting numeric values for display, allow for culturally sensitive display, including the use of non-ASCII digits (digit shaping).
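
As a rough illustration of the two number items above, the sketch below parses input written with non-ASCII digits and formats output with Arabic-Indic digits. It relies on Python's built-in `int()`, which accepts any Unicode decimal digit; the helper names and the choice of Arabic-Indic digits are assumptions made for the example, and a real implementation would normally use a locale-aware library:

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660..U+0669).
ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + d) for d in range(10))
)


def parse_user_integer(text):
    """Parse an integer typed with any Unicode decimal digits."""
    return int(text.strip())


def format_arabic_indic(number):
    """Render an integer using Arabic-Indic digits instead of ASCII ones."""
    return str(number).translate(ARABIC_INDIC)


print(parse_user_integer("\u0661\u0662\u0663"))  # Arabic-Indic one, two, three
print(format_arabic_indic(2017))
```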

Navigation

Providing for content negotiation based on language

[ N/A ] In a multilingual environment it must be possible for the user to receive text in the language they prefer. This may depend on implicit user preferences based on the user's system or browser setup, or on user settings explicitly negotiated with the user.

Values section title odd

The title is "Values: the content attribute"; however, the section generally describes getting a value for a property, which includes, but is not limited to, the content attribute.

If meter, data use element text after content/value attributes

<div itemscope itemtype="http://schema.org/CreativeWork">
<data itemprop="name" value="data value attribute" content="data content attribute" >data element text content</data>
<data itemprop="name" value="lone data value attribute" >missed data element' value attribute [BUG]</data>
<data itemprop="name">data used its element's textContent</data>
<meter itemprop="name" content="meter content attribute" value="meter value attribute">meter element text content</meter>
<meter itemprop="name" value="lone meter value attribute" >missed meter element's value attribute [BUG]</meter>
<meter itemprop="name">meter used its element's textContent</meter>
</div>

In the SDL, this generates:

data content attribute
meter used its element's textContent
lone meter value attribute
data used its element's textContent
lone data value attribute
meter content attribute

and Google's SDTT gives:

@type: CreativeWork
name: data content attribute
name: missed data element' value attribute [BUG]
name: data used its element's textContent
name: meter content attribute
name: missed meter element's value attribute [BUG]
name: meter used its element's textContent

I'll do more testing, but modulo the apparent bug in Google's tool of not reading the value attribute at all, I think we should align the value algorithm to match this behaviour. See also #20, #38.
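
The precedence suggested by the SDL output can be sketched as follows. This is an assumption-laden illustration, not the spec's algorithm: the names `VALUE_ATTRIBUTE`, `property_value` and `ItempropCollector` are hypothetical, only data and meter are covered, and nested itemprop elements are not handled.

```python
from html.parser import HTMLParser

# Assumed element-specific attribute consulted after `content`.
VALUE_ATTRIBUTE = {"data": "value", "meter": "value"}


def property_value(tag, attrs, text):
    """Pick content first, then the element-specific attribute, then the text."""
    if "content" in attrs:
        return attrs["content"]
    specific = VALUE_ATTRIBUTE.get(tag)
    if specific in attrs:
        return attrs[specific]
    return text


class ItempropCollector(HTMLParser):
    """Collect values for flat (non-nested) itemprop elements, in document order."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._open = None  # (tag, attrs, text chunks) of the element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._open = (tag, attrs, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            open_tag, attrs, chunks = self._open
            self.values.append(property_value(open_tag, attrs, "".join(chunks)))
            self._open = None
```

Feeding the markup from this issue to `ItempropCollector` and reading `values` should yield the same six strings the SDL reports, although in document order rather than the order the SDL prints them.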

handle time elements like RDFa?

Google and SDL both do this already:

<div itemscope itemtype="http://schema.org/CreativeWork">
<time itemprop="name" content="time content attribute"
   datetime="2017-05-19T02:59">time element text content</time>
<time itemprop="name" datetime="2017-05-19T02:59">time element text content</time>
<time itemprop="name">time element only has text content</time>
</div>

gives 3 names:

time content attribute
2017-05-19T02:59
time element only has text content

I'm proposing to match this behaviour in the algorithm for determining values. @iherman ?
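
A minimal sketch of the proposed lookup order for time elements, based only on the three results listed above: content, then datetime, then the element's text. The function name `time_value` is hypothetical, and the attributes are passed in as a plain dict rather than read from a DOM:

```python
def time_value(attrs, text):
    """Return the value for a <time itemprop> element.

    Assumed order: content attribute, then datetime attribute, then text.
    """
    for name in ("content", "datetime"):
        if name in attrs:
            return attrs[name]
    return text


examples = [
    ({"content": "time content attribute", "datetime": "2017-05-19T02:59"},
     "time element text content"),
    ({"datetime": "2017-05-19T02:59"}, "time element text content"),
    ({}, "time element only has text content"),
]
for attrs, text in examples:
    print(time_value(attrs, text))
```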

relationship to RDFa, JSON-LD, µformats

Describe the similarities and differences. Related to #3

microdata's Issues