Giter Site home page Giter Site logo

Comments (7)

Gabriella439 avatar Gabriella439 commented on September 26, 2024 1

To answer your questions:

  • I mainly start with an ABNF in the style of RFC 5234 since it's convenient to work with and an IETF standard
  • Once the ABNF is stable we can then translate it to the format that various parser generators expect (such as PEG)
  • I do eventually plan an RFC just for the Dhall grammar with the goal of standardizing a new content type for Dhall (i.e. "application/dhall" or something like that)

I also finished a first draft of the ABNF format so you have something more concrete to work with. It's only missing the specification for how to parse whitespace and Doubles and omits support for trailing commas and bars (which I'm still not sure if I want to keep). I threw this together pretty hastily from the Haskell parser so there may be a few errors, but hopefully this helps you get started:

; ABNF syntax based on RFC 5234, except string literals are case-sensitive and
; assumed to be UTF8-encoded

; TODO: Spaces, tabs, newlines, and comments
whitespace = ???

; Uppercase or lowercase ASCII letter
letter = %x41-5A / %x61-7A

; ASCII digit
digit = %x30-39  ; 0-9

; UTF8 "\""
double-quote = %x22

; UTF8 "\'"
single-quote = %x27

; UTF8 "\\"
back-slash = %x5C

; UTF8 "\t"
tab = %x09

; UTF8 "\r"
carriage-return = %x0D

; UTF8 "\n"
line-feed = "%x0A

simple-label = (letter / "_") *(letter / digit / "-" / "/" / "_")

label = ("`" simple-label "`" / simple-label) white-space

; Printable characters except double quote and backslash, and escape sequences
double-quote-character
    = %x5C.5C ; "\\\\"
    / %x5C.22 ; "\\\""
    / %x5C.6e ; "\\n"
    / %x5C.72 ; "\\r"
    / %x5C.74 ; "\\t"
    / %x20-21
        ; %x22 = '"'
    / %x23-5B
        ; %x5C = "\\"
    / %x5D-7E

double-quote-chunk
    = "${" expression "}"     ; Interpolation
    / "''${"                  ; Escape interpolation
    / double-quote-character

; All printable characters except single quote, and newlines and tabs
not-a-single-quote
    = %x20-26
        ; %x27 = "'"
    / %x27-7E
    / line-feed
    / carriage-return
    / tab

single-quote-chunk
    = "'''"                   ; Escape two single quotes
    / "${" expression "}"     ; Interpolation
    / "''${"                  ; Escape interpolation
    / "'" not-a-single-quote
    / not-a-single-quote

text-literal-raw
    = double-quote *double-quote-chunk double-quote
    / single-quote single-quote *single-quote-chunk single-quote single-quote

text-literal = text-literal-raw whitespace

lambda = (%xCE.BB / back-slash) whitespace

forall = (%xE2.88.80 / "forall") whitespace

arrow = (%xE2.86.92 / "->") whitespace

open-parens = "(" whitespace

close-parens = ")" whitespace

colon = ":" whitespace

if = "if" whitespace

then = "then" whitespace

else = "else" whitespace

let = "let" whitespace

equal = "=" whitespace

in = "in" whitespace

merge = "merge" whitespace

open-bracket = "[" whitespace

close-bracket = "]" whitespace

list = "List" whitespace

optional = "Optional" whitespace

or = "||" whitespace

plus = "+" whitespace

text-append = "++" whitespace

list-append = "#" whitespace

and = "&&" whitespace

combine = (%xE2.88.A7 / forward-slash back-slash) whitespace

prefer = (%xE2.AB.BD / forward-slash forward-slash) whitespace

times = "*" whitespace

double-equal = "==" whitespace

not-equal = "!=" whitespace

dot = "." whitespace

open-brace = "{" whitespace

close-brace = "}" whitespace

open-angle = "<" whitespace

close-angle = ">" whitespace

bar = "|" whitespace

comma = "," whitespace

double = ??? ; TODO

; Perhaps this should require a non-zero starting digit
integer = 1*digit whitespace

natural = "+" integer whitespace

at = "@" whitespace

var = label [ at natural ]

; Printable characters other than " ()[]{}<>/\\"
head-path-character
        ; %x20 = " "
    = %x21-27
        ; %x28 = "("
        ; %x29 = ")"
    / %x2A-2E
        ; %x2F = "/"
    / %x2G-3B
        ; %x3C = "<"
    / %x3D
        ; %x3E = ">"
    / %x3F-5A
        ; %x5B = "["
        ; %x5C = "\\"
        ; %x5D = "]"
    / %x5E-7A
        ; %x7B = "{"
    / %x7C
        ; %x7D = "}"
    / %x7E

path-character = path-character / back-slash / forward-slash

; The first character of an absolute path cannot be a forward slash or
; back-slash since that conflicts with the ASCII versions of the "∧" and "⫽"
; operators
file-raw
    = "/" head-path-character *path-character
    / "./" *path-character
    / "~/" *path-character

file = file-raw whitespace

url = ("https://" | "http://") *path-character whitespace
      ["using" whitespace path-type]

env = "env:" 1*pathChar whitespace

path-type = file / url / env

import = path-type [ "as" whitespace "Text" whitespace ]

expression
    ; "λ(x : a) → b"
    = lambda open-parens label colon expression close-parens arrow expression
    ; "if a then b else c"
    / if expression then expression else expression
    ; "let x : t = e1 in e2"
    / let label [ colon expression ] equal expression in expression
    ; "∀(x : a) → b"
    / forall open-parens label colon expression close-parens arrow expression
    ; "a → b"
    / operator-expression arrow expression
    / annotated-expression

annotated-expression
    = merge selector-expression selector-expression [ colon application-expression ]
    / open-bracket (empty-collection / non-empty-optional)
    / operator-expression (colon expression / "")

empty-collection = close-bracket colon (list / optional) selector-expression

non-empty-optional = expression close-bracket colon optional selector-expression

operator-expression = or-expression

or-expression = plus-expression *(or plus-expression)

plus-expression = text-append-expression *(plus text-append-expression)

text-append-expression =
    list-append-expression *(text-append list-append-expression)

list-append-expression = and-expression *(list-append and-expression)

and-expression = combine-expression *(and combine-expression)

combine-expression = prefer-expression *(combine prefer-expression)

prefer-expression = times-expression *(prefer times-expression)

times-expression = equal-expression *(times equal-expression)

equal-expression = not-equal-expression *(equal not-equal-expression)

not-equal-expression =
    application-expression *(not-equal application-expression)

application-expression = 1*selector-expression

selector-expression = primitive-expression *(dot label)

primitive-expression
    = double
    / natural
    / integer
    / text-literal
    / open-brace record-type-or-literal close-brace
    / open-angle union-type-or-literal close-angle
    / list-literal
    / import
    / "Natural/fold" whitespace
    / "Natural/build" whitespace
    / "Natural/isZero" whitespace
    / "Natural/even" whitespace
    / "Natural/odd" whitespace
    / "Natural/toInteger" whitespace
    / "Natural/show" whitespace
    / "Integer/show" whitespace
    / "Double/show" whitespace
    / "List/build" whitespace
    / "List/fold" whitespace
    / "List/length" whitespace
    / "List/head" whitespace
    / "List/last" whitespace
    / "List/indexed" whitespace
    / "List/reverse" whitespace
    / "Optional/fold" whitespace
    / "Optional/build" whitespace
    / "Bool" whitespace
    / "Optional" whitespace
    / "Natural" whitespace
    / "Integer" whitespace
    / "Double" whitespace
    / "Text" whitespace
    / "List" whitespace
    / "True" whitespace
    / "False" whitespace
    / "Type" whitespace
    / "Kind" whitespace
    / var
    / open-parens expression close-parens

; TODO: This doesn't cover support for trailing commas
record-type-or-literal
    = equal ; Empty record literal
    / ""    ; Empty record type
    / non-empty-record-type-or-literal

non-empty-record-type-or-literal
    = label (non-empty-record-literal / non-empty-record-type)

non-empty-record-type
    = colon expression *(comma label colon expression)

non-empty-record-literal
    = equal expression *(comma label equal expression)

; TODO: This doesn't cover support for trailing bars
union-type-or-literal
    = ""  ; Empty union type
    / non-empty-union-type-or-literal

non-empty-union-type-or-literal
    = label
      ( equal expression union-type
      / colon expression (bar non-empty-union-type-or-literal / "")
      )

union-type = *(bar label colon expression)

; TODO: This doesn't cover support for trailing commas
list-literal = open-bracket expression *(comma expression) close-bracket

; All expressions end with trailing whitespace.  This just adds a final
; whitespace prefix for the top-level of the program
complete-expression = whitespace expression

from dhall-lang.

Gabriella439 avatar Gabriella439 commented on September 26, 2024

Yes, I would love to standardize the grammar

I highly recommend starting from the Haskell parser since it makes the precedence of the various constructs very explicit (and it's also mostly left-factored), which disambiguates some common corner cases

https://github.com/dhall-lang/dhall-haskell/blob/master/src/Dhall/Parser.hs

For example, consider this expression:

λ(x : Natural)  x : Natural

The Dhall grammar ensures that there is one unambiguous parse, which is:

λ(x : Natural)  (x : Natural)

... and not:

(λ(x : Natural)  x) : Natural

Some other unassorted notes:

  • support for trailing commas and bars for record/union types/literals is one of the messy parts of the grammar to parse if you have to left factor the parser (i.e. LL(1) parsing)
    • I've considered just getting rid of support for this, especially given that Dhall now has a code formatter which does not take advantage of this language feature
  • The grammar (not the type-checker) enforces the trailing type annotation for empty lists, so if the user omits them its a parse error, not a type error
  • The Haskell parser includes a special case for a type annotation for non-empty lists, but this is not necessary
    • It's an artifact from back when Dhall used to make type annotations mandatory for all lists
  • Same thing for merge:
    • the type annotation used to be mandatory and now it's only required for empty merges
    • omitting the annotation on the empty merge is a parse error instead of a type error
  • The grammar for {=} now lets it be three separate tokens (i.e. { = }, for example)
  • Identifiers that conflict with reserved keywords can be escaped by surrounding them with backticks
    • i.e. let `let` = 2 in [ `let` ]
    • Note that right now you can only escape reserved identifiers, but in theory this could be loosened to escaping any identifier to permit Unicode identifiers, for example
  • Block-comments can be nested (i.e. Haskell-style rules, not C-style rules)
    • That means that {- {- -} -} is valid, but {- {- -} is not
  • Haskell identifier rules:
  • You can find the full set of all valid string characters and escape sequences here:
  • Don't forget support for string interpolation

I'll work on an ABNF verson of the Dhall grammar today and so we can compare notes

from dhall-lang.

jmitchell avatar jmitchell commented on September 26, 2024

Excellent, and thank you for the advice.

I'll work on an ABNF verson of the Dhall grammar

Would you prefer ABNF? I'd be happy to help collaborate on that instead of continuing with PEG.

ABNF is commonly used in IETF RFCs. Do you intend to submit a spec to a standards body like IETF?

from dhall-lang.

Gabriella439 avatar Gabriella439 commented on September 26, 2024

I have a pull request out for the full ABNF grammar here: #14

from dhall-lang.

chepner avatar chepner commented on September 26, 2024

For standardization purposes, it might be simpler to split the grammar into a lexical grammar to identify tokens (like whitespace, comments, keywords, identifiers, etc) and a syntax grammar, whose input is a token stream (minus whitespace and comments) from the lexical phase.

from dhall-lang.

Gabriella439 avatar Gabriella439 commented on September 26, 2024

@chepner: The problem with that is that Dhall whitespace supports nested block comments and Dhall string literals support string interpolation, neither of which you can lex using regular expressions

from dhall-lang.

Gabriella439 avatar Gabriella439 commented on September 26, 2024

This is now complete. You can find the latest draft of the standard grammar here:

https://github.com/dhall-lang/dhall-lang/blob/bbc74064b2625d9c7635c7bfd1a55cede6a229aa/standard/dhall.abnf

I've also double-checked the grammar by rewriting the entire parser for the Dhall interpreter to follow the grammar as closely as possible:

https://github.com/dhall-lang/dhall-haskell/blob/dca0f4ea0f13342eb8b8657721d154f4e6c9db98/src/Dhall/Parser.hs

... and I also added several files that you can use to test parsing conformance here:

https://github.com/dhall-lang/dhall-haskell/tree/dca0f4ea0f13342eb8b8657721d154f4e6c9db98/tests/parser

There may still be a few small errors left in the grammar but I think I've caught almost all of them at this point

Let me know if you run into any issues implementing a parser using the grammar

from dhall-lang.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.