yaml-test-suite's Issues

CXX2 violates YAML 1.2 grammar

test-case CXX2 considers

--- &anchor a: b

a valid YAML stream, but in fact the grammar does not allow block collection nodes to appear on the --- line

This also excludes the simpler

--- a: b

or also sequences such as

--- - x

The reason becomes obvious when we follow/trace the productions:

[208]  l-explicit-document  ::=  c-directives-end  l-bare-document

[207]  l-bare-document  ::=  s-l+block-node(-1,block-in)



n = -1
c = block-in

[196]  s-l+block-node(n,c)  ::=  s-l+block-in-block(n,c) | s-l+flow-in-block(n)

[198]  s-l+block-in-block(n,c)  ::=  s-l+block-scalar(n,c) | s-l+block-collection(n,c)


[200] s-l+block-collection(n,c) ::= ( s-separate(n+1,c) c-ns-properties(n+1,c) )?
                                    s-l-comments
                                    ( l+block-sequence(seq-spaces(n,c)) | l+block-mapping(n) ) 	 

[79]  s-l-comments  ::=  ( s-b-comment | /* Start of line */ ) l-comment*

[77]  s-b-comment  ::=  ( s-separate-in-line c-nb-comment-text? )?  b-comment

[76]  b-comment  ::=  b-non-content | /* End of file */ 

[30]  b-non-content  ::=  b-break

so the problem is that in order to match s-l+block-collection(n,c) (rule 200), we must match s-l-comments, but since we are not at the start of a line (we are already three characters in because of ---), we must instead match s-b-comment, which requires matching a line break via b-break (or the end of input).

Hence, we cannot match an s-l+block-collection on the same line right after the directives-end marker ---.

QED :-)


PS: One obvious workaround would be to make matching s-l-comments optional; but that would make other currently disallowed cases valid, such as

k: - x
   - y

(for which, IIRC, there is at least one test that expects YAML parsers to refuse it with a parsing error). I don't know if there's an easy fix to make the grammar accept CXX2 without at the same time being too liberal and allowing other syntax that isn't intended to be valid.

Make test data available as JSON

I've been messing about in different languages and different YAML implementations. To make this easier I made a JSON file containing all named tests and their data.

This file will be outdated soon and only has the named tests, so how about adding something similar containing all tests?

Also, finding the data branch was way too hard; please make the readme section explaining it bigger and/or bolder ;)

Should L24T/01/in.yaml and JEF9/02/in.yaml end with \n?

I'm looking at an issue in my parser with L24T/01, added in #105. My parser gives back a string without a trailing newline.

If I look at L24T/01/in.yaml in the data-2022-01-17 tag, the file does not have a trailing newline:

$ hexdump -C yaml-test-suite/L24T/01/in.yaml 
00000000  66 6f 6f 3a 20 7c 0a 20  20 78 0a 20 20 20        |foo: |.  x.   |
0000000e

If my understanding of the spec is correct, b-chomped-last would then use the alternative b-as-line-feed rather than <end-of-input>, so there should be no trailing newline parsed. Is this correct? Should the file be generated with a trailing newline, or should out.yaml be adapted? Or is my understanding of the spec incorrect?

I have the same issue with JEF9/02, added in #90.

Pinging @ingydotnet & @perlpunk because you were involved in the PR that added this test.

2LFX and 6LVF appear identical

The only difference between 2LFX and 6LVF is a line break before "foo in in.yaml and out.yaml.
Is this redundant, or is it meant to assert that a downstream emitter should respect the line break before "foo (even though the expected event streams are identical)?

8G76: an empty file is not a valid JSON document

Test case 8G76 contains an empty in.json file, but an empty file is not a valid JSON document.

I'd need confirmation, but I think in.yaml parses canonically as null (since it's the empty string, and it matches null in the 1.2 core schema). In this case, I'd expect in.json to just contain the string null.
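
A quick check of the JSON side of the claim (Python's json module used here only as one example JSON parser):

import json

# An empty input is not a valid JSON document...
try:
    json.loads("")
except json.JSONDecodeError as e:
    print("rejected:", e)   # e.g. "Expecting value: line 1 column 1 (char 0)"

# ...whereas the literal token `null` is:
assert json.loads("null") is None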

Missing tests for spec sections 5.13 and 5.14 (Escaped Characters)

Most of the examples from the spec are included as tests, but I notice 5.13 and 5.14 are absent.

In particular, I don't see any test in this suite that uses escaped 32-bit Unicode characters like "\U00000041".
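
For reference, a minimal check of the escape forms from sections 5.13/5.14, using PyYAML only as one example parser that supports them:

import yaml  # PyYAML, used here only as one example parser

# "\U00000041" is an 8-digit escaped Unicode code point (U+0041, "A");
# "\u0041" and "\x41" are the 4- and 2-digit forms of the same character.
assert yaml.safe_load(r'"\U00000041"') == "A"
assert yaml.safe_load(r'"\u0041"') == "A"
assert yaml.safe_load(r'"\x41"') == "A"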

I'm happy to submit a PR, but I'm not sure how these test files are generated. Are the tree/dump/emit fields just handwritten?

S98Z ought to be an error under YAML 1.2

The test-case S98Z contains the YAML document

empty block scalar: >
<SPC>
<SPC><SPC>
<SPC><SPC><SPC>
 # comment

and claims this to be semantically equivalent to

{ "empty block scalar": "" }

However, according to

8.1.1.1. Block Indentation Indicator

Typically, the indentation level of a block scalar is detected from its first non-empty line. It is an error for any of the leading empty lines to contain more spaces than the first non-empty line.

Detection fails when the first non-empty line contains leading content space characters. Content may safely start with a tab or a “#” character.

...

And in Example 8.2. Block Indentation Indicator

- >
·
··
··# detected

is given as a positive example with an inferred indentation level 2; as well as a negative example in Example 8.3. Invalid Block Scalar Indentation Indicators:

- |
··
·text

which expects a parser failure (A leading all-space line must not have too many spaces.)


In order to detect the indentation level n of a block scalar that wasn't specified manually, we need to find the first non-empty line. And section 8.1.1.1 is quite clear that "Content may safely start with a tab or a “#” character.", so a line starting with a # can be considered content (as it is in Example 8.2) and thus qualifies as the first non-empty line.

The YAML stream in S98Z, however, appears to consider the # comment line an actual comment, even though according to section 8.1.1.1 it is in fact the first non-empty line and is thus to be considered content of the block scalar.

Moreover, the preceding all-space lines contain more spaces (i.e. 3) than the inferred indentation level n (i.e. 2), so an error along the lines of "A leading all-space line must not have too many spaces." is warranted.

Add test for e.g. `[[], :@]`

Some parsers have demonstrated difficulty with colon-prefixed plain scalars within flow sequences, e.g.:

[[], :@]
---
[[], :%]
---
[[], :^]
---
[[], :$]
---
[[], ::]
---
[[], :\t]
---
[[], :`]

These are valid YAML documents, parsed as expected by e.g. libfyaml, libyaml, and others; but e.g. pyyaml, SnakeYAML, YAML::PP, Ruamel, and JS yaml (the last as a result of a regression, now fixed) fail on some or all of them, some requiring the leading flow sequence as the first element in the outer flow sequence, some not.
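
A quick way to exercise one of these cases against a given parser (a sketch using PyYAML, one of the parsers reported above to fail on some of these inputs):

import yaml  # PyYAML, one of the parsers reported above to fail on some of these inputs

# Expected reading: `:@` is a plain scalar (":" followed by a plain-safe
# character), so the document should load as [[], ':@'].
try:
    print(yaml.safe_load("[[], :@]"))   # a conforming parser yields [[], ':@']
except yaml.YAMLError as exc:
    print("rejected:", exc)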

Explain Token stream notation, Event stream notation

The README says that repo includes

  • Token stream notation
  • Event stream notation

Where are these defined?

Would help me with questions like:

  • I see only tree: in src, not two kinds of things
  • The tree: notation is pretty self-explanatory, but what are all possible options there?
  • What are the "indicator" chars allowed after =VAL ?
    • Eg =VAL " means "a string that was enclosed in double quotes" but what are all possible options?

Squash `data` branch

Users of the test suite should ideally only use the tml files from master/releases, or the files from the data-* releases.

We will squash the data branch regularly to save some space, but also try to do more regular releases.

Since this hasn't been done before and users might still be using the data branch, I created this issue to notify potential users (taken from the list of libraries in the readme):

Fix generated test in Y79Y

I found many errors in the Y79Y test.event files.

Y79Y/004

Input:

-———»-

Expected event output:

+STR
+DOC
+SEQ
+SEQ

Found:

+STR
+DOC
+SEQ

Y79Y/005

Input:

- ——»-

Expected event output:

+STR
+DOC
+SEQ

Found:

+STR
+DOC
+SEQ
+SEQ []

Y79Y/006

Input:

?——»-

Expected event output:

+STR
+DOC
+MAP

Found:

+STR
+DOC
+SEQ
+SEQ []

Y79Y/007

Input:

? -
-——»-

Expected event output:

+STR
+DOC
+MAP
+SEQ
=VAL
-SEQ

Found:

+STR
+DOC
+SEQ
+SEQ []

Y79Y/008

Input:

?——»key:

Expected event output:

+STR
+DOC
+MAP      # unsure where it will terminate here

Found:

+STR
+DOC
+SEQ
+SEQ []

Y79Y/009

Input:

? key:
:——»key:

Expected event output:

+STR
+DOC
+MAP
+MAP
=VAL :key
=VAL :
-MAP       # unsure where it will terminate here

Found:

+STR
+DOC
+SEQ
+SEQ []

M7A3: Either in-json or out-yaml is wrong

This test corresponds to Example 9.3 of the spec, where a stream has either two or three documents, depending on whether the second one is skipped or not:

Bare
document
...
# No document
...
|
%!PS-Adobe-2.0 # Not the first line

The current in-json and the test-event stream skip the second one, but the out-yaml includes it as an explicit document. I'm not sure what's really right here, but I'd at least prefer to include the second empty/null document in the stream.

Issues with in.json files

I wrote a program to calculate the expected JSON from the test.event file and got it to run comparisons against all the in.json files. I observed the following anomalies:

2JQS:
Uses duplicate key values (both null) so is invalid YAML (from section 3.2.1.1 "The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique")

4ABK, 5WE3, 7W2P, 8KHE, C2DT, DBG4, FRK4:
Omitted values should be represented as JSON null, not the empty string (see the sketch after this list).

8UDB:
in.json is syntactically invalid.

DBG4:
Unquoted numbers are represented as strings in in.json.

G4RS:
The value for the "quoted" key in in.json is wrong (it should only have single quotes).

27NA, 2AUY, 2SXE, 2XXW, 35KP, 3GZX, 4GC6, 4UYU, 57H4, 5TYM, 6CK3, 6FWR, 6JWB, 6LVF, 6M2F, 6VJK, 6ZKB, 74H7, 77H8, 7A4E, 7BUB, 7FWL, 7T8X, 8G76, 96L6, 98YD, 9WXW, 9YRD, BEC7, BP6S, CC74, CUP7, DWX9, E76Z, EHF6, F2C7, FH7J, G992, HMQ5, HS5T, JHB9, JS2J, K527:
in.json not included.
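
Regarding the omitted-values point above, a minimal illustration (PyYAML used only as an example loader):

import json
import yaml  # PyYAML, used only as an example loader

# An omitted value loads as null, so the JSON representation should be null,
# not the empty string:
print(json.dumps(yaml.safe_load("key:")))   # {"key": null}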

Support flow style check in start events

Let's add support for checking the flow style in MappingStartEvent and SequenceStartEvent
('+MAP {}' and '+SEQ []').
I would like to use it in SnakeYAML.
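
For illustration, a sketch of the kind of check meant here, using PyYAML's low-level event API only as an example (its collection start events carry a flow_style flag that could drive the '+MAP {}' / '+SEQ []' notation):

import yaml  # PyYAML, shown only as an example of a parser exposing flow style on start events

# Print the suite's notation, marking flow-style collections with {} / [].
for event in yaml.parse("{a: [1, 2], b: c}"):
    if isinstance(event, yaml.MappingStartEvent):
        print("+MAP {}" if event.flow_style else "+MAP")
    elif isinstance(event, yaml.SequenceStartEvent):
        print("+SEQ []" if event.flow_style else "+SEQ")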

How can I help?

Why is 9C9N invalid?

At the moment this YAML is considered to be invalid (wrongly indented flow sequence):

---
flow: [a,
b,
c]

JS-YAML agrees with me that it is perfectly valid YAML.

Why is it invalid?

edit @perlpunk: put code delimiters around example

Tests 9KBC and CXX2 can't both be right

These tests have a map start on the same line as the directives-end marker. In 9KBC, this is expected to result in an error, whereas in CXX2 the collection should be rendered without a problem. The only difference between the two is that the latter has only one key, while the former has two.

Furthermore, I at least can't find a clear explanation in the spec about what content is or is not allowed on the directives-end line. The spec does include some examples that have at least plain and block scalars on the directives-end line, as well as tags for collections that start on the following line. So where does it say that collections can't start there either?

Document end marker after every open ended document?

See also #49

If we have an open ended block scalar at the end of the stream:

keep: |+
  line1


It should be emitted as:

keep: |+
  line1


...

But maybe this rule should not only be for the last document in a stream, but for every document, which is especially important in streaming context:

---
keep: |+
  line1


--- doc two

Every document should be able to be taken out of a stream and represent the same content. If you take it out and accidentally add a newline, then it has one more empty line.

So when emitting the above documents, the output should look like this:

---
keep: |+
  line1


...
--- doc two

What do you think? @ingydotnet @hvr @eemeli @pantoniou @am11

Note: This rule is for emitters. The marker would not be required when parsing.

Note 2: Open-ended only means block scalars with trailing empty lines (|+, >+). I know that in YAML 1.1 open-ended has a slightly different meaning.

provide test cases in non-TestML format

I'd like to use the test cases here to test the Go YAML implementation (https://gopkg.in/yaml.v2) but the fact that they're formatted in the relatively obscure TestML data format makes them quite inaccessible (and I don't have the time to write a TestML parser in Go). Would it be possible to provide them in a more universally understood format, such as JSON, please?

K54U out.yaml

Why does the out.yaml have an explicit document end marker?

in.yaml

--- scalar

out.yaml

--- scalar
...

This does not seem logical to me :-)

test.event

+STR
+DOC ---
=VAL :scalar
-DOC
-STR

Address #40 and #54

PRs #40 and #54 were closed when I removed the master branch after replacing it with main.

I didn't realize that would happen.

This issue will keep them relevant for now.

W4TN (Spec Example 9.5. Directives Documents)

This test-case is taken from the YAML 1.2 spec, i.e. from http://yaml.org/spec/1.2/spec.html#id2801606 and contains the YAML document(s)

%YAML 1.2
--- |
%!PS-Adobe-2.0
...
%YAML1.2
---
# Empty
...

The relevant productions are

l-directive ::= “%” ( ns-yaml-directive | ns-tag-directive | ns-reserved-directive ) s-l-comments

ns-yaml-directive ::= “Y” “A” “M” “L” s-separate-in-line ns-yaml-version

ns-yaml-version ::= ns-dec-digit+  “.”  ns-dec-digit+

s-separate-in-line ::= s-white+ | /* Start of line */

In other words, s-separate-in-line demands whitespace between YAML and 1.2.

To me this looks like a typo in the example YAML, as it's the only occurrence throughout the spec where the whitespace was omitted.

Licensing

Hi,

We have a project to evaluate and compare several YAML parsers, and wanted to include your test suite.
For us to do so, the test suite would need a license. Could you add one, preferably Apache 2.0, MIT, or another notice-type license?

Thank you,
Yuan Kang

Test NTY5 (Missing space in YAML directive) should not be an error

The test was introduced a few weeks ago in 30f372c. The relevant directives are defined with:

[82]           l-directive ::= "%"
                               ( ns-yaml-directive | ns-tag-directive | ns-reserved-directive )
                               s-l-comments
[83] ns-reserved-directive ::= ns-directive-name
                               ( s-separate-in-line ns-directive-parameter )*
[84]     ns-directive-name ::= ns-char+
[86]     ns-yaml-directive ::= "Y" "A" "M" "L" s-separate-in-line ns-yaml-version
[66]    s-separate-in-line ::= s-white+ | /* Start of line */

Therefore, a directive line %YAML1.2 should get parsed as ns-reserved-directive, regarding which the following instruction applies: "A YAML processor should ignore unknown directives with an appropriate warning."
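
A rough sketch of that argument in code (the character classes below are simplified approximations of ns-char and s-white, not the exact productions):

import re

# Simplified approximations: ns-char ~ any non-space printable, s-white ~ space/tab.
ns_yaml_directive     = re.compile(r"YAML[ \t]+[0-9]+\.[0-9]+$")
ns_reserved_directive = re.compile(r"\S+([ \t]+\S+)*$")

line = "YAML1.2"  # the directive line after the leading "%"
print(bool(ns_yaml_directive.match(line)))      # False: the separating space is missing
print(bool(ns_reserved_directive.match(line)))  # True: it still matches a reserved directive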

Effectively, as "appropriate warning" is rather implementation-specific, it might be best to just remove this test completely. Or to just leave out the error, as parsers are expected to deal with this situation without an error.

Finding out remote url needs to be fixed

gh-pages:
 	git clone $$(git config remote.origin.url) -b $@ $@

This doesn't work for me. Same with the data target.
I don't have a remote called origin; I rename the author's remote to author whenever I fork.

@sigmavirus24 suggested this: git config --get branch.<branch-name>.remote to get the remote name.
So something like this:

remote=$(git config --get branch.master.remote)
url=$(git config remote.$remote.url)
git clone $url -b $@ $@

is there some logic to the naming of tests?

Is there some system to the test names/URLs?

Eg https://github.com/yaml/yaml-test-suite/blob/main/src/7A4E.yaml is "Example 7.6" from the spec.
But I cannot easily find "Example 7.6" in this repo, because GitHub doesn't seem to have phrase search.

(Ok, I know, I could check out the repo and search with grep. But the original question stands.)

Convert all numeric mapping keys to strings

We have tests like

1: 2

and those have an in-json data point.
Most libraries are fine with that, as they load numeric keys as strings anyway.
But libfyaml, YamlDotNet, and lua lyaml, for example, show failures for those JSON tests.
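
For illustration, the round trip that turns the numeric key into a string (PyYAML and Python's json module used only as examples):

import json
import yaml  # PyYAML, used only as an example loader

data = yaml.safe_load("1: 2")
print(data)              # {1: 2}   -- the key is loaded as an integer
print(json.dumps(data))  # {"1": 2} -- JSON object keys must be strings, so it is coerced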

Add a test for `{a: b?}`

According to HsYAML and the reference parser it is valid.
ruamel.yaml, NimYAML, JS js-yaml, JS yaml, and YAML::PP parse it;
libyaml, yaml-cpp, pyyaml, and ruby psych don't.

Test to cover "Value after document-start" (on first line)

Consider the following valid cases:

--- blah
--- hello=world
--- +190:20:30
--- x:y

vs. invalid case of mapping after the DocumentStart:

--- x: y

Currently, spec tests 27NA, 6LVF, etc. cover the cases where a document-start marker followed by a value token appears on a line other than the first one.

In the spirit of tightening the net, it would probably be a good idea to add tests clarifying what is allowed after the document-start marker, and whether a document-start on the first line matters.

Does the error case QLJ7 indicate a breaking change between 1.1 and 1.2?

Hello @perlpunk, I just wanted to confirm whether this was a breaking change from YAML 1.1 in 1.2?

1.1 spec states (https://yaml.org/spec/1.1/#l-first-document):

If the document does specify any directives, all directives of previous documents, if any, are ignored.

1.2 spec states (https://yaml.org/spec/1.2/spec.html#id2784064):

The choice of tag handle is a presentation detail and must not be used to convey content information. In particular, the tag handle may be discarded once parsing is completed.

My current understanding is:

%YAML 1.1
%TAG !x! tag:example.com,2014:
--- !x!foo
x: 0
--- !x!bar
x: 1

should parse, but after replacing 1.1 with 1.2 it should fail.

Thanks!

more numeric tag tests needed

Possible additional tests:

  • Numbers with different bases (0x, 0b, 0o)
  • Numbers which might be interpreted wrongly (e.g. 0755; see the sketch below)
  • Numbers with inappropriate tags (e.g. `!!float 0xff`)
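
The 0755 case in particular is easy to demonstrate (PyYAML used here only as an example of a loader with YAML 1.1 resolution rules):

import yaml  # PyYAML resolves plain scalars with YAML 1.1 rules

# Under YAML 1.1, a leading zero marks an octal integer, so 0755 loads as 493.
# Under the 1.2 core schema, octal is spelled 0o755, and "0755" matches the
# plain decimal pattern instead, resolving to 755.
print(yaml.safe_load("0755"))   # 493 with a 1.1 resolver such as PyYAML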

4FJ6 events

I'm somewhat confused by the 4FJ6 test case and can't find the answer in the spec, so I thought I'd ask here. The test case specifies that for this input:

---
[
  [ a, [ [[b,c]]: d, e]]: 23
]

The events are:

+STR
+DOC ---
+SEQ []
+MAP {}
+SEQ []
=VAL :a
+SEQ []
+MAP {}
+SEQ []
+SEQ []
=VAL :b
=VAL :c
-SEQ
-SEQ
=VAL :d
-MAP
=VAL :e
-SEQ
-SEQ
=VAL :23
-MAP
-SEQ
-DOC
-STR

however, I would have thought that the two maps here that use sequences as keys are not flow style (i.e. no curly braces). Is it the case that the "flow" style is inherited by all child maps and sequences?

in.json should be converted to canonical format

To be able to reliably compare JSON, the in.json files should all be converted to the default jq output format.
To ensure that future additions also have the correct format, we would need a script that automatically converts the in-json section of a tml file.
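
A minimal sketch of such a script, assuming one JSON document per in.json file (which does not hold for the multi-document cases) and using Python's json module to approximate jq's default output:

import json
import sys

def canonicalize(path: str) -> None:
    """Re-emit a JSON file in one stable layout, comparable to `jq .`."""
    with open(path) as f:
        value = json.load(f)
    with open(path, "w") as f:
        json.dump(value, f, indent=2, ensure_ascii=False)
        f.write("\n")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        canonicalize(path)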

Error cases with messages

The spec tests covering the error cases are currently exercised as:

  • a sentinel indicating whether there should be an error
  • which tokens were parsed before the error

A next level of improvement would be to spell out the reason for the error as explicit error messages. Some benefits are:

  • captures the test author's intent.
  • forgoes ambiguities that depend on the algorithm used by the implementations:
    • there could be multiple candidate validations that might be violated, and streamlining this would bring about consistency across implementations.
    • there could be a potentially wrong assumption leading to other ambiguities in an implementation, which could be avoided early on.

Current spec test runners will continue to work with binary logic, by checking for the existence of the error tag (as happens today). Some implementations may choose to align their error messages exactly with what the spec test describes, for hardening.

For the existing (73) error cases, the error messages could be collected from a particular implementation like libyaml, proofread, and included in the spec tests, including the location of the error but without the stack trace, e.g.:

Failed to parse YAML. mapping values are not allowed in this context.
  in /path/to/file.yaml, line: 2, column: 8

Versioning of data

The current plan is:

  • We keep adding commits to the data branch for development
  • If we think it is worth making a release, we create a version number v1.2.3
    • We tag master with v1.2.3
    • We create an orphaned data branch called data-1.2.3 with one commit from current data
    • The data branch will be squashed to one commit, and we start adding commits to it again

TODOs

  • Make sure, when adding a commit to data, that the data branch is fetched and rebased onto origin
  • Include the short commit sha from master into the commit message
  • Write a utility for "tagging" or actually creating data-release branches
