Comments (7)
To answer your questions:
- I mainly start with an ABNF in the style of RFC 5234 since it's convenient to work with and an IETF standard
- Once the ABNF is stable we can then translate it to the format that various parser generators expect (such as PEG)
- I do eventually plan an RFC just for the Dhall grammar with the goal of standardizing a new content type for Dhall (i.e.
"application/dhall"
or something like that)
I also finished a first draft of the ABNF format so you have something more concrete to work with. It's only missing the specification for how to parse whitespace and Double
s and omits support for trailing commas and bars (which I'm still not sure if I want to keep). I threw this together pretty hastily from the Haskell parser so there may be a few errors, but hopefully this helps you get started:
; ABNF syntax based on RFC 5234, except string literals are case-sensitive and
; assumed to be UTF8-encoded
; TODO: Spaces, tabs, newlines, and comments
whitespace = ???
; Uppercase or lowercase ASCII letter
letter = %x41-5A / %x61-7A
; ASCII digit
digit = %x30-39 ; 0-9
; UTF8 "\""
double-quote = %x22
; UTF8 "\'"
single-quote = %x27
; UTF8 "\\"
back-slash = %x5C
; UTF8 "\t"
tab = %x09
; UTF8 "\r"
carriage-return = %x0D
; UTF8 "\n"
line-feed = "%x0A
simple-label = (letter / "_") *(letter / digit / "-" / "/" / "_")
label = ("`" simple-label "`" / simple-label) white-space
; Printable characters except double quote and backslash, and escape sequences
double-quote-character
= %x5C.5C ; "\\\\"
/ %x5C.22 ; "\\\""
/ %x5C.6e ; "\\n"
/ %x5C.72 ; "\\r"
/ %x5C.74 ; "\\t"
/ %x20-21
; %x22 = '"'
/ %x23-5B
; %x5C = "\\"
/ %x5D-7E
double-quote-chunk
= "${" expression "}" ; Interpolation
/ "''${" ; Escape interpolation
/ double-quote-character
; All printable characters except single quote, and newlines and tabs
not-a-single-quote
= %x20-26
; %x27 = "'"
/ %x27-7E
/ line-feed
/ carriage-return
/ tab
single-quote-chunk
= "'''" ; Escape two single quotes
/ "${" expression "}" ; Interpolation
/ "''${" ; Escape interpolation
/ "'" not-a-single-quote
/ not-a-single-quote
text-literal-raw
= double-quote *double-quote-chunk double-quote
/ single-quote single-quote *single-quote-chunk single-quote single-quote
text-literal = text-literal-raw whitespace
lambda = (%xCE.BB / back-slash) whitespace
forall = (%xE2.88.80 / "forall") whitespace
arrow = (%xE2.86.92 / "->") whitespace
open-parens = "(" whitespace
close-parens = ")" whitespace
colon = ":" whitespace
if = "if" whitespace
then = "then" whitespace
else = "else" whitespace
let = "let" whitespace
equal = "=" whitespace
in = "in" whitespace
merge = "merge" whitespace
open-bracket = "[" whitespace
close-bracket = "]" whitespace
list = "List" whitespace
optional = "Optional" whitespace
or = "||" whitespace
plus = "+" whitespace
text-append = "++" whitespace
list-append = "#" whitespace
and = "&&" whitespace
combine = (%xE2.88.A7 / forward-slash back-slash) whitespace
prefer = (%xE2.AB.BD / forward-slash forward-slash) whitespace
times = "*" whitespace
double-equal = "==" whitespace
not-equal = "!=" whitespace
dot = "." whitespace
open-brace = "{" whitespace
close-brace = "}" whitespace
open-angle = "<" whitespace
close-angle = ">" whitespace
bar = "|" whitespace
comma = "," whitespace
double = ??? ; TODO
; Perhaps this should require a non-zero starting digit
integer = 1*digit whitespace
natural = "+" integer whitespace
at = "@" whitespace
var = label [ at natural ]
; Printable characters other than " ()[]{}<>/\\"
head-path-character
; %x20 = " "
= %x21-27
; %x28 = "("
; %x29 = ")"
/ %x2A-2E
; %x2F = "/"
/ %x2G-3B
; %x3C = "<"
/ %x3D
; %x3E = ">"
/ %x3F-5A
; %x5B = "["
; %x5C = "\\"
; %x5D = "]"
/ %x5E-7A
; %x7B = "{"
/ %x7C
; %x7D = "}"
/ %x7E
path-character = path-character / back-slash / forward-slash
; The first character of an absolute path cannot be a forward slash or
; back-slash since that conflicts with the ASCII versions of the "∧" and "⫽"
; operators
file-raw
= "/" head-path-character *path-character
/ "./" *path-character
/ "~/" *path-character
file = file-raw whitespace
url = ("https://" | "http://") *path-character whitespace
["using" whitespace path-type]
env = "env:" 1*pathChar whitespace
path-type = file / url / env
import = path-type [ "as" whitespace "Text" whitespace ]
expression
; "λ(x : a) → b"
= lambda open-parens label colon expression close-parens arrow expression
; "if a then b else c"
/ if expression then expression else expression
; "let x : t = e1 in e2"
/ let label [ colon expression ] equal expression in expression
; "∀(x : a) → b"
/ forall open-parens label colon expression close-parens arrow expression
; "a → b"
/ operator-expression arrow expression
/ annotated-expression
annotated-expression
= merge selector-expression selector-expression [ colon application-expression ]
/ open-bracket (empty-collection / non-empty-optional)
/ operator-expression (colon expression / "")
empty-collection = close-bracket colon (list / optional) selector-expression
non-empty-optional = expression close-bracket colon optional selector-expression
operator-expression = or-expression
or-expression = plus-expression *(or plus-expression)
plus-expression = text-append-expression *(plus text-append-expression)
text-append-expression =
list-append-expression *(text-append list-append-expression)
list-append-expression = and-expression *(list-append and-expression)
and-expression = combine-expression *(and combine-expression)
combine-expression = prefer-expression *(combine prefer-expression)
prefer-expression = times-expression *(prefer times-expression)
times-expression = equal-expression *(times equal-expression)
equal-expression = not-equal-expression *(equal not-equal-expression)
not-equal-expression =
application-expression *(not-equal application-expression)
application-expression = 1*selector-expression
selector-expression = primitive-expression *(dot label)
primitive-expression
= double
/ natural
/ integer
/ text-literal
/ open-brace record-type-or-literal close-brace
/ open-angle union-type-or-literal close-angle
/ list-literal
/ import
/ "Natural/fold" whitespace
/ "Natural/build" whitespace
/ "Natural/isZero" whitespace
/ "Natural/even" whitespace
/ "Natural/odd" whitespace
/ "Natural/toInteger" whitespace
/ "Natural/show" whitespace
/ "Integer/show" whitespace
/ "Double/show" whitespace
/ "List/build" whitespace
/ "List/fold" whitespace
/ "List/length" whitespace
/ "List/head" whitespace
/ "List/last" whitespace
/ "List/indexed" whitespace
/ "List/reverse" whitespace
/ "Optional/fold" whitespace
/ "Optional/build" whitespace
/ "Bool" whitespace
/ "Optional" whitespace
/ "Natural" whitespace
/ "Integer" whitespace
/ "Double" whitespace
/ "Text" whitespace
/ "List" whitespace
/ "True" whitespace
/ "False" whitespace
/ "Type" whitespace
/ "Kind" whitespace
/ var
/ open-parens expression close-parens
; TODO: This doesn't cover support for trailing commas
record-type-or-literal
= equal ; Empty record literal
/ "" ; Empty record type
/ non-empty-record-type-or-literal
non-empty-record-type-or-literal
= label (non-empty-record-literal / non-empty-record-type)
non-empty-record-type
= colon expression *(comma label colon expression)
non-empty-record-literal
= equal expression *(comma label equal expression)
; TODO: This doesn't cover support for trailing bars
union-type-or-literal
= "" ; Empty union type
/ non-empty-union-type-or-literal
non-empty-union-type-or-literal
= label
( equal expression union-type
/ colon expression (bar non-empty-union-type-or-literal / "")
)
union-type = *(bar label colon expression)
; TODO: This doesn't cover support for trailing commas
list-literal = open-bracket expression *(comma expression) close-bracket
; All expressions end with trailing whitespace. This just adds a final
; whitespace prefix for the top-level of the program
complete-expression = whitespace expression
from dhall-lang.
Yes, I would love to standardize the grammar
I highly recommend starting from the Haskell parser since it makes the precedence of the various constructs very explicit (and it's also mostly left-factored), which disambiguates some common corner cases
https://github.com/dhall-lang/dhall-haskell/blob/master/src/Dhall/Parser.hs
For example, consider this expression:
λ(x : Natural) → x : Natural
The Dhall grammar ensures that there is one unambiguous parse, which is:
λ(x : Natural) → (x : Natural)
... and not:
(λ(x : Natural) → x) : Natural
Some other unassorted notes:
- support for trailing commas and bars for record/union types/literals is one of the messy parts of the grammar to parse if you have to left factor the parser (i.e. LL(1) parsing)
- I've considered just getting rid of support for this, especially given that Dhall now has a code formatter which does not take advantage of this language feature
- The grammar (not the type-checker) enforces the trailing type annotation for empty lists, so if the user omits them its a parse error, not a type error
- The Haskell parser includes a special case for a type annotation for non-empty lists, but this is not necessary
- It's an artifact from back when Dhall used to make type annotations mandatory for all lists
- Same thing for
merge
:- the type annotation used to be mandatory and now it's only required for empty
merge
s - omitting the annotation on the empty merge is a parse error instead of a type error
- the type annotation used to be mandatory and now it's only required for empty
- The grammar for
{=}
now lets it be three separate tokens (i.e.{ = }
, for example) - Identifiers that conflict with reserved keywords can be escaped by surrounding them with backticks
- i.e.
let `let` = 2 in [ `let` ]
- Note that right now you can only escape reserved identifiers, but in theory this could be loosened to escaping any identifier to permit Unicode identifiers, for example
- i.e.
- Block-comments can be nested (i.e. Haskell-style rules, not C-style rules)
- That means that
{- {- -} -}
is valid, but{- {- -}
is not
- That means that
- Haskell identifier rules:
- First character is ASCII letter (upper or lower case) or
_
: - Remaining characters can also have numbers,
-
, or/
- First character is ASCII letter (upper or lower case) or
- You can find the full set of all valid string characters and escape sequences here:
- Don't forget support for string interpolation
I'll work on an ABNF verson of the Dhall grammar today and so we can compare notes
from dhall-lang.
Excellent, and thank you for the advice.
I'll work on an ABNF verson of the Dhall grammar
Would you prefer ABNF? I'd be happy to help collaborate on that instead of continuing with PEG.
ABNF is commonly used in IETF RFCs. Do you intend to submit a spec to a standards body like IETF?
from dhall-lang.
I have a pull request out for the full ABNF grammar here: #14
from dhall-lang.
For standardization purposes, it might be simpler to split the grammar into a lexical grammar to identify tokens (like whitespace, comments, keywords, identifiers, etc) and a syntax grammar, whose input is a token stream (minus whitespace and comments) from the lexical phase.
from dhall-lang.
@chepner: The problem with that is that Dhall whitespace supports nested block comments and Dhall string literals support string interpolation, neither of which you can lex using regular expressions
from dhall-lang.
This is now complete. You can find the latest draft of the standard grammar here:
I've also double-checked the grammar by rewriting the entire parser for the Dhall interpreter to follow the grammar as closely as possible:
... and I also added several files that you can use to test parsing conformance here:
There may still be a few small errors left in the grammar but I think I've caught almost all of them at this point
Let me know if you run into any issues implementing a parser using the grammar
from dhall-lang.
Related Issues (20)
- `with` record update syntax removes other record fields HOT 4
- ABNF grammar should list "as Bytes" import mode
- Should the ABNF grammar disallow shebangs inside expressions? HOT 1
- eta-reducing to merge HOT 4
- Builtins operators reference: Missing record projection HOT 2
- ABNF grammar should explicitly disallow keywords as identifiers? HOT 2
- ABNF grammar should include a mandatory whitespace after `import-hashed`? HOT 1
- Improvements and fixes in the standard documentation HOT 3
- Is this an incorrect test file: `dhall-lang/tests/import/success/unit/ImportRelativeToHomeB.dhall`? HOT 2
- Link to non existing tweet
- A minimalistic proposal for do-notation
- Thoughts on introducing a minimum amount of type inference in Dhall HOT 1
- Introduce Bytes/length and Text/length as built-ins? HOT 3
- Is there a security hole: malicious sha256-protected cached content? HOT 1
- Eta-equivalence in `assert`? HOT 13
- A type level equivalent of the `with` keyword HOT 11
- Convert assertions to Leibniz equality types HOT 3
- A minimalistic proposal for adding row and column polymorphism to Dhall HOT 2
- A proposal for a "lightweight Dhall implementations" standard
- Year, Month, and Date extraction from a Date HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dhall-lang.