Simon's Markup Language

SIML is a spiritual successor to YAML, intended for the same niches, but with a different syntax, and a slightly different feature set.

SIML's syntax is optimized for humans, and not for efficient machine-to-machine communication. It is highly suitable for things like configuration files and structured XML/HTML-like markup. That said, the grammar is very simple and very easy for a computer to parse, meaning that the code size of the parser is quite small.

SIML's syntax is a "superset" of JSON, similar to YAML 1.2. All valid JSON can also be parsed as valid SIML.

Features

  • Optimized for human readability.

  • Struct-like syntax with built-in type tags:

    • MyTypeTag { field1: 123.0, field2: "string" }
  • Built-in support for internal references.

    • Prefix any value with @my_name to give it a name.

    • Refer to the value with &my_name, or *my_name to merge it into the current mapping/sequence.

  • Simple grammar, very fast recursive descent parser.

  • Rust-specific: Built-in support for serde.
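As a rough illustration of the internal references listed above, the sketch below resolves `@name`, `&name`, and `*name` over a tiny tree. The `Node` type and `resolve` function are hypothetical and are not the siml crate's API. The sketch also assumes that names must be defined before they are used, and it only covers merging into sequences, not mappings.

```rust
use std::collections::HashMap;

/// A hypothetical, simplified tree; not the actual siml syntax tree.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Scalar(String),
    Seq(Vec<Node>),
    /// `@name value`: a value that is also registered under `name`.
    Named(String, Box<Node>),
    /// `&name`: stands for the whole named value.
    Ref(String),
    /// `*name`: splices the items of the named sequence into the enclosing one.
    Merge(String),
}

/// Replaces references by the values they name, walking in document order.
/// Assumption: a name must be defined before it is referenced.
fn resolve(node: &Node, names: &mut HashMap<String, Node>) -> Result<Node, String> {
    match node {
        Node::Scalar(s) => Ok(Node::Scalar(s.clone())),
        Node::Named(name, inner) => {
            let value = resolve(inner, names)?;
            names.insert(name.clone(), value.clone());
            Ok(value)
        }
        Node::Ref(name) => names
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown name: {}", name)),
        Node::Merge(name) => Err(format!("*{} is only valid inside a sequence", name)),
        Node::Seq(items) => {
            let mut out = Vec::new();
            for item in items {
                match item {
                    Node::Merge(name) => match names.get(name) {
                        Some(Node::Seq(inner)) => out.extend(inner.iter().cloned()),
                        Some(_) => return Err(format!("*{} does not name a sequence", name)),
                        None => return Err(format!("unknown name: {}", name)),
                    },
                    other => out.push(resolve(other, names)?),
                }
            }
            Ok(Node::Seq(out))
        }
    }
}

fn main() {
    // Roughly: [ @base [a, b], &base, [*base, c] ]
    let s = |t: &str| Node::Scalar(t.to_string());
    let doc = Node::Seq(vec![
        Node::Named("base".to_string(), Box::new(Node::Seq(vec![s("a"), s("b")]))),
        Node::Ref("base".to_string()),
        Node::Seq(vec![Node::Merge("base".to_string()), s("c")]),
    ]);
    println!("{:?}", resolve(&doc, &mut HashMap::new()));
}
```

The difference between the two forms is visible in the last two items: `&base` yields the sequence itself as one item, while `*base` contributes its items to the surrounding sequence.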

Planned features

  • High-level C API for interoperability with other languages.

What does it look like?

SIML:

# Comment
key1: unquoted-value
key2: "spaces need quotes"
list: [
  1.0,
  2.0,
  3.0,
]
mapping: {
  key1: "indentation is not significant"
}
type_tags: MyObject {
  field1: 123
}

YAML:

# Comment
hello: unquoted-value
key2: spaces do not need quotes
list:
- 1.0
- 2.0
- 3.0

mapping:
  key1: "indentation is significant"

type_tags: !<MyObject>
  field1: 123

Here are some translated examples from https://json.org/example.html.

glossary: {
    title: "example glossary"

    GlossEntry {
        ID: SGML
        SortAs: SGML
        Acronym: SGML
        Abbrev: "ISO 8879:1986"
        GlossDef: {
            para: "A meta-markup language, used to create markup languages such as DocBook"
        }
        GlossSee: markup
    }
}
menu: {
    header: "SVG Viewer"

    # Items don't need keys.

    { id: Open }
    { id: OpenNew, label: "Open New" }
    Separator                          # Identifiers can be items
    { id: ZoomIn, label: "Zoom In" }
    { id: ZoomOut, label: "Zoom Out" }
    { id: OriginalView, label: "Original View" }
    # ...
}
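
The menu example above mixes a keyed entry (header) with items that have no key. One way to picture that shape on the Rust side is a list of entries whose key is optional. The names `Value`, `Entry`, `items`, and `get_all` below are hypothetical, chosen for illustration only, and are not the siml crate's API.

```rust
/// A hypothetical model of a SIML value; not the actual siml crate API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    /// Scalars stay strings; conversion is left to a higher layer.
    Scalar(String),
    /// A mapping with an optional type tag, e.g. `GlossEntry { ... }`.
    Mapping { tag: Option<String>, entries: Vec<Entry> },
}

/// One entry of a mapping: `key: value`, or a bare item when `key` is `None`.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: Option<String>,
    value: Value,
}

impl Value {
    fn scalar(s: &str) -> Value {
        Value::Scalar(s.to_string())
    }
}

/// Returns the values of all key-less entries (the "contents" of a mapping).
fn items(entries: &[Entry]) -> Vec<&Value> {
    entries.iter().filter(|e| e.key.is_none()).map(|e| &e.value).collect()
}

/// Returns every value stored under `key`; duplicate keys are kept, in order.
fn get_all<'a>(entries: &'a [Entry], key: &str) -> Vec<&'a Value> {
    entries
        .iter()
        .filter(|e| e.key.as_deref() == Some(key))
        .map(|e| &e.value)
        .collect()
}

/// Builds a cut-down version of the `menu` mapping from the example.
fn menu() -> Vec<Entry> {
    let open = Value::Mapping {
        tag: None,
        entries: vec![Entry { key: Some("id".into()), value: Value::scalar("Open") }],
    };
    vec![
        Entry { key: Some("header".into()), value: Value::scalar("SVG Viewer") },
        Entry { key: None, value: open },
        Entry { key: None, value: Value::scalar("Separator") },
    ]
}

fn main() {
    let menu = menu();
    println!("items: {:?}", items(&menu));
    println!("header: {:?}", get_all(&menu, "header"));
}
```

Storing entries in a `Vec` rather than a map keeps document order and lets duplicate keys coexist, which matches the hybrid mapping described in this document.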

Stringly typed

One of the major differences from JSON, and to some extent YAML, is that all scalar values are strings from the perspective of the parser. Validation and conversion to and from other data types are delegated to whatever serialization framework is used on top of SIML.

This gives significant freedom to design domain-specific schemas for SIML documents, and avoids pitfalls where the serialization format imposes specific restrictions on the data model.

For example, one well-known pitfall of YAML (as of YAML 1.1) is that the unquoted character sequence no is parsed as the boolean false. This is unexpected in many contexts, such as a list of country codes, where no stands for Norway.

Another example is numeric types, where tying a data model to a particular representation is often limiting. For example, you might want big numbers to look like numbers, even if they can't be represented by an i64 or an f64.
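
To make the two examples above concrete, here is a minimal sketch of a scalar that stays text until the caller asks for a type. `Scalar` and its methods are hypothetical names for illustration, not part of the siml crate. The conversions use Rust's standard `FromStr` implementations.

```rust
use std::str::FromStr;

/// A scalar as the parser would hand it over: just the text.
/// (`Scalar` is a hypothetical type for illustration, not the siml API.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct Scalar<'a>(&'a str);

impl<'a> Scalar<'a> {
    /// The text is always available, whatever it looks like.
    fn as_str(self) -> &'a str {
        self.0
    }

    /// Conversion happens only when the caller asks for a concrete type.
    fn parse<T: FromStr>(self) -> Option<T> {
        self.0.parse().ok()
    }
}

fn main() {
    // In a list of country codes, `no` is simply the string "no".
    let code = Scalar("no");
    assert_eq!(code.as_str(), "no");
    // Rust's bool parser only accepts "true" and "false", so nothing is coerced.
    assert_eq!(code.parse::<bool>(), None);

    // A number too large for i64 can still be read when the schema asks for u128,
    // and the original text remains available for a big-number library.
    let big = Scalar("340282366920938463463374607431768211455");
    assert_eq!(big.parse::<i64>(), None);
    assert_eq!(big.parse::<u128>(), Some(u128::MAX));
    println!("{} -> {:?}", big.as_str(), big.parse::<u128>());
}
```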

Given this, SIML is only structurally "self-describing" (sequences and mappings), prioritizing human-friendliness over validation.

Rust-specific: The implementation of serde::Deserializer supports deserialize_any(), but it has to make an educated guess about the type of each scalar: it tries to parse scalars as booleans and numeric types, and falls back to a string if those fail. This may not be what you want, so it is good practice to avoid deserialize_any() when possible.
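
The kind of guessing described above can be sketched in a few lines. This is not the siml implementation: the `Guess` type, the `guess` function, and the exact order of attempts (bool, then i64, then f64, then string) are assumptions made for illustration.

```rust
/// The result of guessing a scalar's type (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Guess<'a> {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(&'a str),
}

/// Tries bool, then i64, then f64, and falls back to the original text.
/// Assumption: the real deserializer may use a different order or rules.
fn guess(text: &str) -> Guess<'_> {
    if let Ok(b) = text.parse::<bool>() {
        return Guess::Bool(b);
    }
    if let Ok(i) = text.parse::<i64>() {
        return Guess::Int(i);
    }
    if let Ok(f) = text.parse::<f64>() {
        return Guess::Float(f);
    }
    Guess::Str(text)
}

fn main() {
    // "no" is not a Rust bool literal, so it stays a string.
    assert_eq!(guess("no"), Guess::Str("no"));
    assert_eq!(guess("true"), Guess::Bool(true));
    // Guessing can lose information: leading zeros and the ".0" are gone.
    assert_eq!(guess("007"), Guess::Int(7));
    assert_eq!(guess("1.0"), Guess::Float(1.0));
    println!("{:?}", guess("007"));
}
```

The last two cases show why a typed schema is preferable: an identifier such as 007 or a version string such as 1.0 is silently turned into a number when the type is guessed.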

Comparison with YAML and JSON

  • Identifier-like or number-like strings do not need to be in quotes (similar to "plain" scalars in YAML). However, they cannot contain whitespace.
  • Scalars are strings. Parsing of numbers, booleans, etc. is handled by the higher-level deserializer. This means that there are no "special" strings in the syntax (true means the same as "true"), and all scalar values can always be deserialized as strings.
  • Indentation is not significant. Sequences and mappings are always surrounded by delimiters such as {...}, [...], or (...), like JSON.
  • Newlines are significant within mappings, where a newline can be the delimiter between entries instead of a comma.
  • Duplicate keys are explicitly allowed in mappings, and no attempt is made to prevent that use case.
  • Mappings are a hybrid between key-value pairs and sequences, meaning that the syntax explicitly supports entries in mappings without keys. This is to support the common (XML-like) pattern of having objects that have both attributes and "contents", which would otherwise only be possible in JSON and YAML by having a special items attribute.
    • To explicitly indicate an item without a key in a mapping, use the syntax ?: value.
    • To explicitly indicate the presence of a key in a key-value pair, use the syntax ?key: value.
    • Complex keys (mappings and sequences) must always be preceded by ?, like ?[1,2,3]: value1.
  • The root of a document is always logically a mapping, but that mapping can have a single scalar child element. The input foo is parsed as { ?: foo }.
  • It is possible to distinguish between lists and tuples, by choosing either square brackets or parentheses when typing a sequence. Deserializers may choose to consider the style of the sequence significant or not.
  • Sequences and mappings have explicit support in the syntax for a "type tag", which makes them look like Rust expressions. This is particularly handy for any heterogeneously typed container.
    • The implementation of serde::Serializer and serde::Deserializer uses this feature to indicate enum variants.
  • Complex keys (sequences and mappings) are supported in mappings, by prefixing the key with ? (like YAML).
  • Trimmed and folded string literals are supported.
    • -"..." trims all leading and trailing whitespace.
    • |"..." trims the leading indentation of each line.
    • >"..." trims leading and trailing whitespace for each line, and folds all lines into a single line, separated by a space.
    • ```...``` trims nothing, and supports unescaped double-quote and single-quote characters, and does not interpret any escape sequences (verbatim notation, like Markdown).
  • Comments are preserved in the syntax tree, and have logical attachment to nodes. For example, a key-value pair preceded by a comment on a separate line, or followed by a comment on the same line, will be annotated such that the comment is available after parsing the document.
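
The trimmed and folded string literals in the list above can be approximated with plain string functions. The sketch below is not taken from the siml source, and the function names are hypothetical. It assumes that "indentation" for `|"..."` means the whitespace prefix shared by all non-blank lines, and that folding with `>"..."` skips blank lines; the actual rules may differ.

```rust
/// `-"..."`: trims all leading and trailing whitespace.
fn trim_all(s: &str) -> &str {
    s.trim()
}

/// `|"..."`: strips the indentation of each line.
/// Assumption: the indentation is the prefix common to all non-blank lines,
/// made of ASCII spaces or tabs.
fn trim_indent(s: &str) -> String {
    let indent = s
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.len() - l.trim_start().len())
        .min()
        .unwrap_or(0);
    s.lines()
        // Lines shorter than the indentation (blank lines) become empty.
        .map(|l| l.get(indent..).unwrap_or(""))
        .collect::<Vec<_>>()
        .join("\n")
}

/// `>"..."`: trims each line and joins the lines with a single space.
/// Assumption: blank lines are skipped.
fn fold(s: &str) -> String {
    s.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let text = "    first\n      second\n    third";
    println!("{:?}", trim_all("  padded \n"));
    println!("{:?}", trim_indent(text));
    println!("{:?}", fold(text));
}
```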

siml-fmt

Styling properties (like string folding options, sequence styles, etc.) and comments are preserved by the parser and can be used by the emitter. Thus, the implementation of siml-fmt first parses the document and then simply emits it again.
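
The emitting half of that round trip might look roughly like the sketch below. It is not the siml-fmt source: the `SeqStyle`, `Value`, and `Entry` types and the `emit` function are hypothetical, the tree is built by hand instead of being parsed, and quoting of scalars is left out.

```rust
/// Bracket style of a sequence, kept so the emitter can reproduce it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SeqStyle {
    List,  // [a, b]
    Tuple, // (a, b)
}

/// A hypothetical value type; scalars are assumed not to need quotes here.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Scalar(String),
    Seq(SeqStyle, Vec<Value>),
}

/// A key-value entry together with the comments attached to it.
struct Entry {
    leading_comment: Option<String>,  // comment on the line before the entry
    key: String,
    value: Value,
    trailing_comment: Option<String>, // comment on the same line
}

fn emit_value(v: &Value) -> String {
    match v {
        Value::Scalar(s) => s.clone(),
        Value::Seq(style, items) => {
            let (open, close) = match style {
                SeqStyle::List => ('[', ']'),
                SeqStyle::Tuple => ('(', ')'),
            };
            let inner: Vec<String> = items.iter().map(emit_value).collect();
            format!("{}{}{}", open, inner.join(", "), close)
        }
    }
}

/// Emits one entry per line, re-attaching the preserved comments.
fn emit(entries: &[Entry]) -> String {
    let mut out = String::new();
    for e in entries {
        if let Some(c) = &e.leading_comment {
            out.push_str(&format!("# {}\n", c));
        }
        out.push_str(&format!("{}: {}", e.key, emit_value(&e.value)));
        if let Some(c) = &e.trailing_comment {
            out.push_str(&format!("  # {}", c));
        }
        out.push('\n');
    }
    out
}

fn main() {
    let scalar = |t: &str| Value::Scalar(t.to_string());
    let doc = vec![Entry {
        leading_comment: Some("Window size".to_string()),
        key: "size".to_string(),
        value: Value::Seq(SeqStyle::Tuple, vec![scalar("800"), scalar("600")]),
        trailing_comment: Some("pixels".to_string()),
    }];
    print!("{}", emit(&doc));
}
```

Because the sequence style and both comments live in the tree, the emitter needs no knowledge of the original text to reproduce them.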

Footnotes

  1. The ? prefix for complex keys significantly simplifies the parser by eliminating the need for left-recursion.
