yaml / yaml-test-suite
Comprehensive, language independent Test Suite for YAML
License: MIT License
test-case CXX2 considers

```yaml
--- &anchor a: b
```

a valid yaml-stream; but in fact, the grammar doesn't allow block collection nodes to appear on the `---` line.
This also excludes the simpler

```yaml
--- a: b
```

as well as sequences such as

```yaml
--- - x
```
The reason becomes obvious when we follow/trace the productions (with n = -1 and c = block-in coming from rule 207):

```
[208] l-explicit-document ::= c-directives-end l-bare-document
[207] l-bare-document ::= s-l+block-node(-1,block-in)
[196] s-l+block-node(n,c) ::= s-l+block-in-block(n,c) | s-l+flow-in-block(n)
[198] s-l+block-in-block(n,c) ::= s-l+block-scalar(n,c) | s-l+block-collection(n,c)
[200] s-l+block-collection(n,c) ::= ( s-separate(n+1,c) c-ns-properties(n+1,c) )?
                                    s-l-comments
                                    ( l+block-sequence(seq-spaces(n,c)) | l+block-mapping(n) )
[79] s-l-comments ::= ( s-b-comment | /* Start of line */ ) l-comment*
[77] s-b-comment ::= ( s-separate-in-line c-nb-comment-text? )? b-comment
[76] b-comment ::= b-non-content | /* End of file */
[30] b-non-content ::= b-break
```
So the problem is that in order to match `s-l+block-collection(n,c)` (rule 200), we must match `s-l-comments`; but since we are not on column 1 (we're 3 characters into the line due to `---`), we must instead match `s-b-comment`, and that demands matching a line break via `b-break` (or an EOF).
Hence, we cannot match an `s-l+block-collection` on the same line right after the directives-end marker `---`.
QED :-)
PS: One obvious workaround would be to make matching `s-l-comments` optional; but that would make other currently disallowed cases valid, such as

```yaml
k: - x
- y
```

(for which, iirc, there's at least one test which checks that YAML parsers reject it as a parsing error). I don't know if there's an easy fix to make the grammar accept CXX2 without at the same time being too liberal and allowing other syntax that isn't intended to be valid.
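The same distinction can be observed in practice; here is a quick sanity check with PyYAML (one implementation among several, not the spec itself):

```python
import yaml

# A scalar may follow the '---' marker on the same line...
assert yaml.safe_load('--- a') == 'a'

# ...but a block collection has to start on the following line.
assert yaml.safe_load('---\na: b') == {'a': 'b'}
```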
I've been messing about in different languages and different YAML implementations. To make this easier, I made a JSON file containing all named tests and their data.
This file will be outdated soon and only has the named tests, so how about adding something similar containing all tests?
Also, finding the data branch was way too hard; please make the readme section explaining it bigger and/or bolder ;)
I'm looking at an issue of my parser with L24T/01, added in #105. My parser gives back a string without a trailing newline.
If I look at L24T/01/in.yaml in the data-2022-01-17 tag, the file does not have a trailing newline:

```
$ hexdump -C yaml-test-suite/L24T/01/in.yaml
00000000 66 6f 6f 3a 20 7c 0a 20 20 78 0a 20 20 20 |foo: |. x. |
0000000e
```
If my understanding of the spec is correct, `b-chomped-last` would then use the `<end-of-input>` alternative rather than `b-as-line-feed`, so there should be no trailing newline parsed. Is this correct? Should the file be generated with a trailing newline, or should out.yaml be adapted? Or is my understanding of the spec incorrect?
I have the same issue with JEF9/02, added in #90.
Pinging @ingydotnet & @perlpunk because you were involved in the PR that added this test.
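For the simpler case where the last content line itself has no break, this reading can be sanity-checked against PyYAML (used here only as one data point, not as an arbiter of the spec):

```python
import yaml

# input ends with a line break: clip chomping keeps exactly one newline
assert yaml.safe_load('foo: |\n  x\n') == {'foo': 'x\n'}

# input ends without any line break: there is no break left to keep
assert yaml.safe_load('foo: |\n  x') == {'foo': 'x'}
```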
Test case 8G76 contains an empty in.json file, but an empty file is not a valid JSON document.
I'd need confirmation, but I think in.yaml parses canonically as null (since it's the empty string, and it matches null in the 1.2 core schema). In this case, I'd expect in.json to just contain the string `null`.
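Python's `json` module illustrates both points (assuming the canonical-null reading is what's intended):

```python
import json

# an empty file is not a valid JSON document...
try:
    json.loads('')
    raise AssertionError('empty input unexpectedly parsed')
except json.JSONDecodeError:
    pass

# ...whereas the literal `null` parses to the null value
assert json.loads('null') is None
```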
Most of the examples from the spec are included as tests, but I notice 5.13 and 5.14 are absent.
In particular, I don't see any test in this suite which uses escaped 32-bit unicode characters like "\U00000041"
I'm happy to submit a PR, but I'm not sure how these test files are generated. Are the `tree`/`dump`/`emit` fields just handwritten?
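For reference, the 8-hex-digit `\U` escape from §5.13 is the kind of input such a test would cover; PyYAML, for one, already resolves it:

```python
import yaml

# \U takes exactly eight hex digits (a 32-bit code point)
assert yaml.safe_load('"\\U00000041"') == 'A'
assert yaml.safe_load('"\\U0001F600"') == '\U0001F600'  # emoji, outside the BMP
```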
The test-case S98Z contains the YAML document

```
empty block scalar: >
<SPC>
<SPC><SPC>
<SPC><SPC><SPC>
# comment
```

and claims this to be semantically equivalent to

```json
{ "empty block scalar": "" }
```
However, according to 8.1.1.1. Block Indentation Indicator:

> Typically, the indentation level of a block scalar is detected from its first non-empty line. It is an error for any of the leading empty lines to contain more spaces than the first non-empty line.
> Detection fails when the first non-empty line contains leading content space characters. Content may safely start with a tab or a “#” character.
And in Example 8.2. Block Indentation Indicator,

```
- >
·
··
··# detected
```

is given as a positive example with an inferred indentation level of 2; as well as a negative example in Example 8.3. Invalid Block Scalar Indentation Indicators:

```
- |
··
·text
```

which expects a parser failure ("A leading all-space line must not have too many spaces.")
In order to detect the indentation level `n` of a block scalar which wasn't manually specified, we need to find the first non-empty line. And 8.1.1.1 is quite clear that "Content may safely start with a tab or a '#' character", so a line starting with a `#` can be considered content (as it does in example 8.2), and thus qualifies as the first non-empty line.
The YAML stream in S98Z however appears to consider the `# comment` line an actual comment, even though according to section 8.1.1.1 it is in fact the first non-empty line, and thus to be considered content of the block scalar.
Moreover, the preceding all-space lines contain more spaces (i.e. 3) than the inferred `n` level (i.e. 2), so an error along the lines of "A leading all-space line must not have too many spaces." is warranted.
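The claim that a `#` line at the content indentation is scalar content, not a comment, is easy to confirm against an implementation; PyYAML, for example:

```python
import yaml

# inside a block scalar, a '#' at the content indentation
# starts content, not a comment
assert yaml.safe_load('- >\n  # detected\n') == ['# detected\n']
```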
Some parsers have demonstrated difficulty with colon-prefixed plain scalars within flow sequences, e.g.:

```yaml
[[], :@]
---
[[], :%]
---
[[], :^]
---
[[], :$]
---
[[], ::]
---
[[], :\t]
---
[[], :`]
```
These are valid YAML documents, parsed as expected by e.g. libfyaml, libyaml, and others; but e.g. pyyaml, SnakeYAML, YAML::PP, Ruamel, and JS yaml (the last as the result of a regression, now fixed) fail on some or all of them, some only when the leading flow sequence is the first element in the outer flow sequence, some regardless.
Hi, just wanted to let you know that the Scala-yaml project uses this test suite, and it could be included in the list of libs testing against it. Here's a link to the integration testing suite: https://github.com/VirtusLab/scala-yaml/tree/main/integration-tests
Referred: VirtusLab/scala-yaml#257
The README says that repo includes 2 kinds of things. Where are these defined? It would help me with questions like:
- `tree:` in src: the notation is pretty self-explanatory, but what are all possible options there?
- `=VAL`? `=VAL "` means "a string that was enclosed in double quotes", but what are all possible options?

See https://gist.github.com/anonymous/4e02671301cc07d9730b92e1b34b26fe
If the global tag prefix or tag contains `%3E%20`, which is `> ` (a greater-than sign followed by a space), then the test output is

```
=VAL <!prefix-foobar> name> :scalar
```

or

```
=VAL <tag:example.com,2000:app/tag> > :baz
```

I think the value should stay URI-encoded.
When doing `make data-update` and the data branch wasn't pulled before, older commits get overwritten.
Users of the test suite should ideally only use the tml files from master/releases, or the files from the `data-*` releases.
We will squash the `data` branch regularly to save some space, but also try to do more regular releases.
Since this hasn't been done before and users might still be using the data branch, I created this issue to notify potential users (taken from the list of libraries in the readme):
Why is there no (valid) json for testcase 4ABK?
Found many errors in Y79Y test.event files.

Input:

```
-———»-
```

Expected event output:

```
+STR
+DOC
+SEQ
+SEQ
```

Found:

```
+STR
+DOC
+SEQ
```

Input:

```
- ——»-
```

Expected event output:

```
+STR
+DOC
+SEQ
```

Found:

```
+STR
+DOC
+SEQ
+SEQ []
```

Input:

```
?——»-
```

Expected event output:

```
+STR
+DOC
+MAP
```

Found:

```
+STR
+DOC
+SEQ
+SEQ []
```

Input:

```
? -
-——»-
```

Expected event output:

```
+STR
+DOC
+MAP
+SEQ
=VAL
-SEQ
```

Found:

```
+STR
+DOC
+SEQ
+SEQ []
```

Input:

```
?——»key:
```

Expected event output:

```
+STR
+DOC
+MAP  # unsure about where it will terminate here
```

Found:

```
+STR
+DOC
+SEQ
+SEQ []
```

Input:

```
? key:
:——»key:
```

Expected event output:

```
+STR
+DOC
+MAP
+MAP
=VAL :key
=VAL :
-MAP  # unsure about where it will terminate here
```

Found:

```
+STR
+DOC
+SEQ
+SEQ []
```
This test corresponds to Example 9.3 of the spec, where a stream has either two or three documents, depending on whether the second one is skipped or not:
```yaml
Bare
document
...
# No document
...
|
%!PS-Adobe-2.0 # Not the first line
```
The current in-json and the test-event stream skip the second one, but the out-yaml includes it as an explicit document. I'm not sure what's really right here, but I'd at least prefer to include the second empty/null document in the stream.
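As a baseline for how empty documents surface to a consumer, PyYAML's multi-document loader yields an explicit `None` for a document that contains only a comment (using `---` separators here to sidestep the `...`-skipping question):

```python
import yaml

docs = list(yaml.safe_load_all('Bare document\n---\n# no content\n---\n"third"\n'))
# the comment-only document is reported as a null document, not skipped
assert docs == ['Bare document', None, 'third']
```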
If mapping values are clearly not allowed there, then why not simply convert it to the string `a: "a:"`, without forcing the user to add checks for it?
I wrote a program to calculate the expected JSON from the test.event file and got it to run comparisons against all the in.json files. I observed the following anomalies:
2JQS:
Uses duplicate key values (both null) so is invalid YAML (from section 3.2.1.1 "The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique")
4ABK, 5WE3, 7W2P, 8KHE, C2DT, DBG4, FRK4:
Omitted values should be represented as JSON null, not empty string.
8UDB:
in.json is syntactically invalid.
DBG4:
Unquoted numbers are represented as strings in in.json.
G4RS:
The value for the "quoted" key in in.json is wrong (it should only have single quotes).
27NA, 2AUY, 2SXE, 2XXW, 35KP, 3GZX, 4GC6, 4UYU, 57H4, 5TYM, 6CK3, 6FWR, 6JWB, 6LVF, 6M2F, 6VJK, 6ZKB, 74H7, 77H8, 7A4E, 7BUB, 7FWL, 7T8X, 8G76, 96L6, 98YD, 9WXW, 9YRD, BEC7, BP6S, CC74, CUP7, DWX9, E76Z, EHF6, F2C7, FH7J, G992, HMQ5, HS5T, JHB9, JS2J, K527:
in.json not included.
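On the omitted-values point: loading an omitted value and serializing it with a JSON encoder shows the expected `null` (PyYAML and Python's `json` module used here purely for illustration):

```python
import json
import yaml

# an omitted mapping value loads as null, not as the empty string
assert yaml.safe_load('key:') == {'key': None}
assert json.dumps(yaml.safe_load('key:')) == '{"key": null}'
```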
Let us add support to check the flow style in MappingStartEvent and SequenceStartEvent (`+MAP {}` and `+SEQ []`).
I would like to use it in SnakeYAML.
How can I help?
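For comparison, PyYAML's event API already carries this information via the `flow_style` attribute on collection-start events; a sketch of the kind of check the test format could then support:

```python
import yaml

starts = [
    e for e in yaml.parse('{a: [1, 2]}')
    if isinstance(e, (yaml.MappingStartEvent, yaml.SequenceStartEvent))
]
# both collections in this document were written in flow style
assert len(starts) == 2
assert all(e.flow_style for e in starts)
```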
At the moment this YAML is considered to be invalid (wrongly indented flow sequence):

```yaml
---
flow: [a,
b,
c]
```

JS-YAML agrees with me that it is perfectly valid YAML.
Why is it invalid?

edit @perlpunk: put code delimiters around example
Can we have a test case for the parsing of "no"?
These tests have a map start on the same line as the directives-end marker. In 9KBC, this is expected to result in an error, whereas in CXX2 the collection should be rendered without a problem. The only difference between them is that the latter has only one key, while the former has two.
Furthermore, I at least can't find a clear explanation in the spec about what content is or is not allowed on the directives-end line. The spec does include some examples that have at least plain and block scalars on the directives-end line, as well as tags for collections that start on the following line. So where does it say that collections can't start there either?
Currently we have only tests with one of the directives, but no document where both are used at the same time.
See also #49
If we have an open-ended block scalar at the end of the stream:

```yaml
keep: |+
  line1
```

It should be emitted as:

```yaml
keep: |+
  line1
...
```
But maybe this rule should not only apply to the last document in a stream, but to every document, which is especially important in a streaming context:

```yaml
---
keep: |+
  line1
--- doc two
```

Every document should be able to be taken out of a stream and still represent the same content. If you take it out and accidentally add a newline, then it has one more empty line.
So when emitting the above documents, the output should look like this:

```yaml
---
keep: |+
  line1
...
--- doc two
```
What do you think? @ingydotnet @hvr @eemeli @pantoniou @am11
Note: This rule is for emitters. The marker would not be required when parsing.
Note 2: "Open-ended" here only means block scalars with trailing empty lines (`|+`, `>+`). I know that in YAML 1.1 open-ended has a slightly different meaning.
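For reference, the three chomping modes behind this proposal behave as follows (PyYAML shown; any conforming parser should agree):

```python
import yaml

# strip (-): drop the final break and any trailing empty lines
assert yaml.safe_load('s: |-\n  a\n\n') == {'s': 'a'}
# clip (default): keep exactly one final break
assert yaml.safe_load('s: |\n  a\n\n') == {'s': 'a\n'}
# keep (+): keep the final break and the trailing empty lines
assert yaml.safe_load('s: |+\n  a\n\n') == {'s': 'a\n\n'}
```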
I'd like to use the test cases here to test the Go YAML implementation (https://gopkg.in/yaml.v2) but the fact that they're formatted in the relatively obscure TestML data format makes them quite inaccessible (and I don't have the time to write a TestML parser in Go). Would it be possible to provide them in a more universally understood format, such as JSON, please?
The semantics of the test.event file are not entirely clear, and there's no obvious documentation for what the various events mean.
Why does the out.yaml have an explicit document end marker?

in.yaml:

```yaml
--- scalar
```

out.yaml:

```yaml
--- scalar
...
```

This does not seem logical to me :-)

test.event:

```
+STR
+DOC ---
=VAL :scalar
-DOC
-STR
```
Specifically, 2JQS has repeated empty keys and E76Z has an alias referring to another key in the same map. These are both errors, as the spec requires mapping keys to be unique.
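Note that many real-world parsers silently accept such duplicates rather than erroring, which may be why these cases slipped through; PyYAML, for instance, just keeps the last value:

```python
import yaml

# the spec calls duplicate keys an error, but PyYAML lets the last key win
assert yaml.safe_load('a: 1\na: 2') == {'a': 2}
```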
This test-case is taken from the YAML 1.2 spec, i.e. from http://yaml.org/spec/1.2/spec.html#id2801606, and contains the YAML document(s)

```yaml
%YAML 1.2
--- |
%!PS-Adobe-2.0
...
%YAML1.2
---
# Empty
...
```
The relevant productions are

```
l-directive ::= “%” ( ns-yaml-directive | ns-tag-directive | ns-reserved-directive ) s-l-comments
ns-yaml-directive ::= “Y” “A” “M” “L” s-separate-in-line ns-yaml-version
ns-yaml-version ::= ns-dec-digit+ “.” ns-dec-digit+
s-separate-in-line ::= s-white+ | /* Start of line */
```

In other words, `s-separate-in-line` demands a whitespace between `YAML` and `1.2`.
To me this looks like a typo in the example YAML, as it's the only occurrence throughout the spec where the white-space was omitted.
Hi,
We have a project to evaluate and compare several YAML parsers, and wanted to include your test suite.
For us to do so, the test suite would need a license. Could you add one, preferably Apache 2.0, MIT, or another notice-type license?
Thank you,
Yuan Kang
The test was introduced a few weeks ago in 30f372c. The relevant directives are defined with:

```
[82] l-directive ::= "%"
                     ( ns-yaml-directive | ns-tag-directive | ns-reserved-directive )
                     s-l-comments
[83] ns-reserved-directive ::= ns-directive-name
                               ( s-separate-in-line ns-directive-parameter )*
[84] ns-directive-name ::= ns-char+
[86] ns-yaml-directive ::= "Y" "A" "M" "L" s-separate-in-line ns-yaml-version
[66] s-separate-in-line ::= s-white+ | /* Start of line */
```

Therefore, a directive line `%YAML1.2` should get parsed as an `ns-reserved-directive`, to which the following instruction applies: "A YAML processor should ignore unknown directives with an appropriate warning."
Effectively, as "appropriate warning" is rather implementation-specific, it might be best to just remove this test completely. Or to just leave out the error, as parsers are expected to deal with this situation without an error.
```make
gh-pages:
	git clone $$(git config remote.origin.url) -b $@ $@
```

This doesn't work for me. Same with the target `data`.
I don't have a remote called `origin`; I rename the author remote to `author` whenever I fork.
@sigmavirus24 suggested using `git config --get branch.<branch-name>.remote` to get the remote name.
So something like this:

```sh
remote=$(git config --get branch.master.remote)
url=$(git config remote.$remote.url)
git clone $url -b $@ $@
```
Is there some system to the test names/URLs?
E.g. https://github.com/yaml/yaml-test-suite/blob/main/src/7A4E.yaml is "Example 7.6" from the spec.
But I cannot easily find "Example 7.6" in this repo, because GitHub doesn't seem to have phrase search.
(Ok, I know, I should check out and search with grep. But the original question stands.)
We have tests like

```yaml
1: 2
```

and those have an `in-json` data point.
Most libraries are fine with that, as they load numeric keys as strings anyway.
But, for example, libfyaml, YamlDotNet and Lua lyaml are showing failures for those JSON tests.
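The mismatch comes from JSON requiring string keys; encoders that coerce keys (like Python's `json` module) mask it, while stricter comparisons surface it:

```python
import json
import yaml

# YAML resolves the key as a native integer...
assert yaml.safe_load('1: 2') == {1: 2}
# ...and json.dumps silently coerces it to a string key
assert json.dumps(yaml.safe_load('1: 2')) == '{"1": 2}'
```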
According to HsYAML and the reference parser it is valid.
ruamel.yaml, NimYAML, JS js-yaml, JS yaml, and YAML::PP parse it; libyaml, yaml-cpp, pyyaml, and Ruby psych don't.
Consider the following valid cases:

```yaml
--- blah
--- hello=world
--- +190:20:30
--- x:y
```

vs. the invalid case of a mapping after the DocumentStart:

```yaml
--- x: y
```

Currently, spec tests 27NA and 6LVF etc. cover the cases where document-start followed by a value token is on lines other than the first one.
In the spirit of tightening the net, it would probably be a good idea to add tests clarifying what is allowed after the document start marker, and whether document-start on the first line matters.
This test should result in an error, since `c-forbidden` in the spec forbids these markers everywhere in the document, including in flow mode.
Opinions?
This is not correct at the moment, because all subtests get the `empty-key` tag.
The data-symlinks script needs to be fixed.
Originally posted by @perlpunk in #100 (comment)
Hello @perlpunk, I just wanted to confirm whether this was a breaking change from YAML 1.1 in 1.2?
The 1.1 spec states (https://yaml.org/spec/1.1/#l-first-document):

> If the document does specify any directives, all directives of previous documents, if any, are ignored.

The 1.2 spec states (https://yaml.org/spec/1.2/spec.html#id2784064):

> The choice of tag handle is a presentation detail and must not be used to convey content information. In particular, the tag handle may be discarded once parsing is completed.
My current understanding is that

```yaml
%YAML 1.1
%TAG !x! tag:example.com,2014:
--- !x!foo
x: 0
--- !x!bar
x: 1
```

should parse, but that by replacing `1.1` with `1.2`, it should fail.
Thanks!
Possible additional tests:
- Numbers with different bases (`0x`, `0b`, `0o`)
- Numbers which might be interpreted wrongly (e.g. `0755`)
- Numbers with inappropriate tags (e.g. `!!float 0xff`)
This should be covered:

```python
from yaml import load, CLoader as Loader, CDumper as Dumper

data = load('"bar"\t', Loader=Loader)
```

This can be extended:
https://github.com/yaml/yaml-test-suite/blob/main/src/DE56.yaml
I'm somewhat confused by the 4FJ6 test case and can't find the answer in the spec, so I thought I'd ask here. The test case specifies that for this input:
```yaml
---
[
[ a, [ [[b,c]]: d, e]]: 23
]
```
The events are:

```
+STR
+DOC ---
+SEQ []
+MAP {}
+SEQ []
=VAL :a
+SEQ []
+MAP {}
+SEQ []
+SEQ []
=VAL :b
=VAL :c
-SEQ
-SEQ
=VAL :d
-MAP
=VAL :e
-SEQ
-SEQ
=VAL :23
-MAP
-SEQ
-DOC
-STR
```
However, I would have thought that the two maps here which use sequences as keys are not flow style (i.e. no curly braces). Is it the case that the "flow" style is inherited by all child maps and sequences?
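As far as the spec's productions go, yes: once inside `[`...`]` the context is flow, so every nested collection is a flow collection, and a `key: value` entry forms an implicit single-pair flow mapping even without braces. A minimal PyYAML illustration of the braceless case:

```python
import yaml

# 'a: 23' inside a flow sequence becomes an implicit single-pair flow mapping
events = [type(e).__name__ for e in yaml.parse('[a: 23]')]
assert events.count('MappingStartEvent') == 1
assert yaml.safe_load('[a: 23]') == [{'a': 23}]
```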
To be able to reliably compare JSON, the in.json files should all be converted to the default `jq` output.
To ensure that future additions also have the correct format, we would need a script that automatically converts in-json in a tml file.
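A minimal sketch of such a normalizer in Python (assuming jq's default 2-space indentation; the actual script and file layout are up to the maintainers):

```python
import json

def canonical_json(text: str) -> str:
    # parse and re-serialize so that formatting no longer matters for comparison
    return json.dumps(json.loads(text), indent=2, ensure_ascii=False)

# differently formatted but equal documents normalize to the same string
assert canonical_json('{"a":1,"b":[true,null]}') == canonical_json('{ "a": 1, "b": [ true, null ] }')
```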
The spec tests covering the error cases are currently exercised as:
A next level of improvement could be if the reason of error is spelled out in terms of explicit error messages. Some benefits are:
Current spec test runners will continue to work with binary logic, by checking the existence of the error tag (as happens today). Some implementations may choose to align their error messages exactly with what the spec test describes, for hardening.
For the existing (73) error cases, error messages could be proofread and included in the spec test after being collected from a particular implementation like libyaml, including the location of the error but without the stacktrace, e.g.:

```
Failed to parse YAML. mapping values are not allowed in this context.
in /path/to/file.yaml, line: 2, column: 8
```
It points to a non-existent file.
The current plan is:
- `data` branch for development
- `data-1.2.3` with one commit from current `data`