harpocrates / language-rust

Parser and pretty-printer for the Rust language

Home Page: https://hackage.haskell.org/package/language-rust

License: BSD 3-Clause "New" or "Revised" License

Haskell 73.96% Logos 5.47% Yacc 16.97% Rust 2.06% Shell 0.68% Python 0.86%
parser pretty-printer rust syntax ast

language-rust's Introduction

Parser and pretty printer for Rust

language-rust aspires to efficiently and accurately parse and pretty print the Rust language. The underlying AST structures are also intended to be as similar as possible to the libsyntax AST that rustc itself uses.

A typical use looks like:

>>> :set -XTypeApplications +t
>>> import Language.Rust.Syntax

>>> -- Sample use of the parser
>>> import Language.Rust.Parser
>>> let inp = inputStreamFromString "fn main () { println!(\"Hello world!\"); }"
inp :: InputStream
>>> let sourceFile = parse' @(SourceFile Span) inp
sourceFile :: SourceFile Span

>>> -- Sample use of the pretty printer
>>> import Language.Rust.Pretty
>>> pretty' sourceFile
fn main() {
  println!("Hello world!");
}
it :: Doc b

Building

Cabal

With Cabal and GHC, run

cabal install happy --constraint 'happy >= 1.19.8'
cabal install alex
cabal configure
cabal build

Stack

With the Stack tool installed, run

stack init
stack build

The second command is responsible for pulling in all of the dependencies (including executable tools like Alex, Happy, and GHC itself) and then compiling everything. If Stack complains about the version of Happy installed, you can explicitly install a recent one with stack install happy-1.19.8.

Evolution of Rust

As Rust evolves, so will language-rust. A best effort will be made to support unstable features from nightly as they come out, but only compatibility with stable is guaranteed. The last component of the version number indicates the nightly Rust compiler version against which tests were run. For example, 0.1.0.26 is tested against rustc 1.26.0-nightly.
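The version convention can be read mechanically. The following is a minimal sketch, not part of language-rust: `testedNightly` and `splitOnDot` are hypothetical helpers, and the mapping from the last component N to `rustc 1.N.0-nightly` is an assumption generalized from the single example above.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper (not part of language-rust).
-- Assumes a four-component version whose last component N was tested
-- against rustc 1.N.0-nightly, as in the README's "0.1.0.26" example.
testedNightly :: String -> Maybe String
testedNightly version =
  case splitOnDot version of
    parts@[_, _, _, n]
      | all isNumeric parts -> Just ("rustc 1." ++ n ++ ".0-nightly")
    _ -> Nothing
  where
    isNumeric s = not (null s) && all isDigit s

-- Split a string on '.' characters, keeping empty chunks.
splitOnDot :: String -> [String]
splitOnDot s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnDot rest
```

Under those assumptions, `testedNightly "0.1.0.26"` evaluates to `Just "rustc 1.26.0-nightly"`, and anything that is not four numeric components yields `Nothing`.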

Bugs

Please report any bugs to the GitHub issue tracker.

Parser

Any difference between what is accepted by the rustc parser and the language-rust parser indicates one of

  • a bug in language-rust (this is almost always the case)
  • a bug in rustc
  • a newer version of rustc that made a breaking change to this syntax

If the AST/parser of rustc changes, the rustc-tests test suite should start failing - it compares the JSON AST debug output of rustc to our parsed AST.

Pretty printer

For the pretty printer, bugs are a bit tougher to list exhaustively. Suggestions for better layout algorithms are most welcome! The fmt-rfcs repo is loosely used as the reference for "correct" pretty printing.

language-rust's People

Contributors

acw, harpocrates

language-rust's Issues

Add `catch` expression

Another day, another new construct which has a "weak" keyword catch. Tracking issue.

  • parsing
  • pretty-printing
  • resolving

Since catch is a "weak" keyword there are problems around using it as the scrutinee of if/while/etc. just like structs.

None of this is actually in any way decided. Relevant internals thread.

Clean up parsing expressions

Right now, due to a bug in Happy, the expression grammar is difficult to refactor (moving productions around changes the semantics!!). Once this is resolved, several things need to happen:

  • re-organize productions (not just expressions) so that they are grouped more nicely
  • factor out repeated productions in the expression grammar (and group expressions together)
  • test the currently commented-out fix for parsing expressions that start with union but are not unions (e.g. union.tag + 1; union is a "weak" keyword)

Parse 'asm' macros

There is currently an AST representation of asm! macros (which can be appropriately pretty-printed). It may be worth parsing asm expression macros directly into this representation. Currently, asm! macros are treated just like any other macro.

Reuse `Span` defined in `pretty`

The project already has a dependency on pretty for pretty-printing. In Text.PrettyPrint.Annotated.HughesPJ, there is

data Span a = Span
  { spanStart      :: !Int
  , spanLength     :: !Int
  , spanAnnotation :: a
  } deriving (Show,Eq)

Which might be a good candidate as the output tag type for parsing. Some thoughts on this:

  • One uniform Span data type for both pretty-printing and parsing is slick, especially a reused one.

  • I'm not sure I see any use for spanAnnotation with parsing, other than perhaps as a means of plugging in more helpful information about position. For that, one would probably define

    data Pos = Pos { row :: !Int, col :: !Int } deriving (Eq, Ord)
    type ParseSpan = Span Pos
    

    then parsing functions would produce things of type Expr ParseSpan, Ty ParseSpan, etc.

All changes would be in the Parser module, just returning this type.
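One operation a parser needs on such a type is combining the spans of two sub-nodes into the span of their parent. Here is a minimal sketch of that, assuming the offsets are character offsets into the input. `Span` is redefined locally as a mirror of the one in pretty so that the snippet stands alone, and `cover` is a hypothetical name, not an existing function.

```haskell
-- Local mirror of 'Span' from Text.PrettyPrint.Annotated.HughesPJ,
-- redefined here only to keep the sketch self-contained.
data Span a = Span
  { spanStart      :: !Int
  , spanLength     :: !Int
  , spanAnnotation :: a
  } deriving (Show, Eq)

-- Row/column position, as proposed in the issue text.
data Pos = Pos { row :: !Int, col :: !Int } deriving (Show, Eq, Ord)

type ParseSpan = Span Pos

-- Hypothetical helper: the smallest span covering both arguments.
-- The annotation (start position) is taken from whichever span starts first.
cover :: ParseSpan -> ParseSpan -> ParseSpan
cover a b = Span start (end - start) (spanAnnotation earlier)
  where
    earlier = if spanStart a <= spanStart b then a else b
    start   = spanStart earlier
    end     = max (spanStart a + spanLength a) (spanStart b + spanLength b)
```

For instance, covering a 3-character span at offset 0 and a 2-character span at offset 6 gives a span of length 8 starting at offset 0.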

Finish parsing

Now that parsing is almost finished, it is worth writing down a list of things not done:

Important

  • parsing attributes
  • parsing range expressions properly
  • parsing a whole file (CrateConfig, then mods)
  • statement macros (MacStmt and MacStmtStyle)
  • deal with macro related stuff (MacroDef, KleeneOp, Nonterminals)
  • match arms can have more types of expressions on their RHS (like another match)
  • edge cases of view paths
  • Haddock

Can be punted

  • parsing asm! expressions
  • parsing shebangs (and tokenizing them properly too!)
  • add precedences to get rid of conflicts
  • dealing properly with Span (don't just shove in mempty whenever withSpan doesn't immediately work)
  • properly document the grammar

Bare trait object edge with leading paren

The following test case (taken from rustc's testsuite) doesn't parse:

// build-pass (FIXME(62277): could be check-pass?)

#![allow(bare_trait_objects)]

type A = Box<(Fn(u8) -> u8) + 'static + Send + Sync>; // OK (but see #39318)

fn main() {}

Besides being already controversial (check out the issue referenced) and one hell of an edge case, bare trait objects are on their way out.

Package everything properly

Some things are missing before this can be properly made a package

  • Haddock documentation
  • Loosening package bounds, and supporting older GHC versions (not sure about the second)
  • Remove stack.yaml (and make sure everything is Stackage happy)

Some things that would be nice to support

  • Backpack, when it comes out, for parsing (instead of the ugly CPP hack)

Rewrite pretty printing with `wl-pprint-annotated`

This appears to be a more modern (yay) more experimental (nay) iteration on Wadler's wl-pprint. I like it because it lets me print blocks nicely, by giving me access to the underlying constructors of Doc a (but I must ensure not to violate any invariants).

{-# LANGUAGE OverloadedStrings #-}
import Text.PrettyPrint.Annotated.WL

-- | Asserts a 'Doc a' cannot render on multiple lines. 
oneLine :: Doc a -> Bool
oneLine (FlatAlt d _) = oneLine d
oneLine (Cat a b) = oneLine a && oneLine b
oneLine (Union a b) = oneLine a && oneLine b
oneLine (Annotate _ d) = oneLine d
oneLine Line = False
oneLine _ = True

-- | Make a curly-brace delimited block. When possible, permit fitting everything on one line
block :: Doc a -> Doc a
block b | oneLine b = hsep ["{", b, "}"] `Union` vsep [ "{", indent 2 b, "}" ]
        | otherwise = vsep [ "{", indent 2 b, "}" ]

This should let me print curly-brace delimited macros nicely - if they are short and clearly meant to go on one line, I fit them on one line. Otherwise, I play it safe and use the multiline approach.

Downsides

  • incur dependency on less well-known package
  • #1 is no longer possible

Float lexing is slightly wrong

This line in the lexer:

@lit_float2 / ( [^\._a-zA-Z] | \r | \n )

Implements the "not immediately followed" part in the Rust lexical syntax spec: https://doc.rust-lang.org/reference/tokens.html#floating-point-literals

The code uses _a-zA-Z as the beginning of an identifier, but according to the Rust spec any XID_Start character is a valid first character of an identifier, so that part of the lexer should be updated to use _ and @xid_start. (I'm not sure what the right syntax for this is in Alex.)

(I don't have a Haskell toolchain installed, otherwise I would submit a PR.)
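The intended lookahead rule can be sketched in plain Haskell, outside of Alex. This is only an illustration: `startsIdentifier` and `floatMayEndBefore` are hypothetical names, `Data.Char.isAlpha` is used as a rough stand-in for XID_Start (the two sets are not identical), and accepting end of input is an assumption.

```haskell
import Data.Char (isAlpha)

-- Rough approximation of "can start an identifier".
-- ASSUMPTION: isAlpha stands in for Unicode XID_Start; they differ in detail.
startsIdentifier :: Char -> Bool
startsIdentifier c = c == '_' || isAlpha c

-- Given the character right after a literal such as "1." (Nothing at end
-- of input), decide whether the literal may be lexed as a float: it must
-- not be followed by '.', '_' or an identifier start.
floatMayEndBefore :: Maybe Char -> Bool
floatMayEndBefore Nothing  = True  -- assumption: end of input is acceptable
floatMayEndBefore (Just c) = c /= '.' && not (startsIdentifier c)
```

The difference from the ASCII-only character class shows up on non-ASCII letters: an `é` directly after `1.` is rejected here, whereas `[^\._a-zA-Z]` would accept it.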

Special support for `InlineAsm` and `GlobalAsm`.

The Rust AST has special forms for inline and global assembly. It might be worth adding explicit support for these. This is not urgent since both inline and global assembly can already be represented via regular macros.

It would nonetheless be nice to not have to muck around with token trees in order to generate/read these special macros.

The main challenge is in parsing.

Testing

Here are some approaches to testing correctness of parsing and pretty printing

  • Hand-written unit tests
  • Use the unstable nightly -Z option to output the AST as JSON, parse that JSON into our AST, and diff it against the AST we get from regular parsing.
  • Parse, pretty print, re-parse it, and check that the two ASTs match.
  • Re-parse spanned sub-elements from the substring of the input, and check that the two ASTs match.
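The parse, pretty print, re-parse approach can be written as a generic property. The sketch below is self-contained and does not use language-rust: `roundTrips` is a hypothetical helper, and lists of integers with `readMaybe`/`show` are toy stand-ins for the Rust parser and pretty printer.

```haskell
import Text.Read (readMaybe)

-- Round-trip property: parse the source, pretty print the result,
-- parse the printed text again, and require the two ASTs to be equal.
-- A source that does not parse at all counts as a failure here.
roundTrips :: Eq ast => (String -> Maybe ast) -> (ast -> String) -> String -> Bool
roundTrips parse render src =
  case parse src of
    Nothing  -> False
    Just ast -> parse (render ast) == Just ast

-- Toy stand-ins for a real parser and pretty printer.
parseInts :: String -> Maybe [Int]
parseInts = readMaybe

renderInts :: [Int] -> String
renderInts = show
```

`roundTrips parseInts renderInts " [1, 2,3] "` holds even though the printed text differs from the input, which is the point: only the ASTs are compared. A real test for Rust sources would also have to ignore source spans, since those differ between the two parses.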

Macros 2.0

It is expected that the following will be valid macro definitions:

macro foo($a: ident) => {
    return $a + 1;
}
pub macro foo { ... }

Add quasi quotation

Besides the obvious quasi quotation, I want to capture variables with SubstNt tokens in Exp and bind variables with MatchNt in Pat. The following interaction should work:

>>> import Language.Rust.Quote
>>> :set -XQuasiQuotes
>>> let one = [lit| 1i32 |]
>>> [expr| |x: i32| -> $ret:ty $body:expr |] = [expr| |x: i32| -> i32 { x + $one } |]
ret :: Ty Span
body :: Expr Span
>>> import Language.Rust.Pretty
>>> pretty ret
i32
>>> pretty body
{ x + 1 }

Here is my current approach. It is terrible. I can't believe I have been reduced to this, but I've been thinking non-stop about this for 2 weeks now and have come up with nothing better.

  • Find an (unsafe) way to get a Q (String -> Maybe Type) function (instead of String -> Q Type) and use that to make the swapping function. Alternately, lex twice and extract all of the interesting tokens, look them up, and pass them in as a function in the second lexing pass.
  • Swap in NonTerminals containing error <variable-name> - but of the right type!
  • Make view patterns for the parser (instead of pattern matching directly) and make those patterns check for errors using something atrocious like this. Hopefully this function is cheap - if it isn't, we may have a problem (although I can think of more typeclass based hacks with specialization)
  • In the quoters below in dataToPatQ and dataToExpQ, whenever something is of the right type, check if it is defined. If so, do nothing, otherwise, use the error message (ewwwwww!) to lookup variable or make an identifier pattern.

Is there nothing better?

Maybe there is. Here are some unorganized thoughts on this topic:

  • Parametrize the P monad over an arbitrary monad (in particular Q and Identity, allowing me to swap tokens in this monad) - this is already done in another branch, although I have misgivings about performance. I had to disable the monomorphism restriction to get the generated files to compile, and I have a feeling that polymorphic lexer/parser will take a hit in performance.
  • Add "variable" cases to the AST. That has the crappy effect of annoying a user who is not using quasi quotes - the extra cases don't make sense in the context of pure Rust. I can always not export those constructors, but then pattern matches will never be exhaustive. :(

The "good"™ solution is probably an extensible AST with type families, but given that the AST is already ~1kloc, adding a type family for every constructor may get... a bit too big. Plus this also looks ugly to the end user.

Also, the solution described initially isn't so bad. It does not change the reliability of regular parsing - it only risks failing in the case of quasi quotes (which will be marked experimental and non-portable, and will only cause compile-time headaches).

Generics and WhereClause

Every generic has a where clause, but it seems rather that generics and where clauses just come together (not that one should be nested inside the other). Perhaps we should move the where clause out of the Generic a data type.

Generic associated types

The following sorts of things should parse:

trait Foo {
    type Bar<'a>;
    type Bar<'a, 'b>;
    type Bar<'a, 'b,>;
    type Bar<'a, 'b, T>;
    type Bar<'a, 'b, T, U>;
    type Bar<'a, 'b, T, U,>;
    type Bar<'a, 'b, T: Debug, U,>;
    type Bar<'a, 'b, T: Debug, U,>: Debug;
    type Bar<'a, 'b, T: Debug, U,>: Deref<Target = T> + Into<U>;
    type Bar<'a, 'b, T: Debug, U,> where T: Deref<Target = U>, U: Into<T>;
    type Bar<'a, 'b, T: Debug, U,>: Deref<Target = T> + Into<U>
        where T: Deref<Target = U>, U: Into<T>;
}

Performance testing

Add test suites for performance and allocations, probably using criterion and weigh.

Finish pretty printing

Pretty printing is nearly complete. The following things still need to be addressed:

  • figure out how to pretty print paths (there is a failing test as a starting point)
  • bring up to Haddock level the documentation in the Pretty and Pretty.Internal modules
  • make multiline outputs from pretty printing nice looking (this should already be the case, but given that there are no tests yet, there are likely lots of subtle issues to be handled here)
  • add tests for multiline outputs (more philosophically, it would be nice to have some "fuzzing" here)

Short of any drastic new enhancements, this issue will be the tracking point for everything pretty-printing related.

Attributes

Attributes are currently sometimes not parsed, not printed, and not tested. The following needs to happen:

  • go through the parser to find where attributes ought to be parsed. To do this, see which nodes on the AST support attributes and then figure out where those attributes fit in concrete syntax.
  • add test case files in sample-sources specifically for this as part of rustc-tests (since attributes are constantly evolving).
  • fix Diff to support attributes
  • modify the rustc-tests to also check that pretty-print and re-parse is a no-op; this will check that attributes are properly pretty-printed.
  • update the AST form of attributes (relevant PR)

C-compatible variadics

The following should parse, but it doesn't currently:

pub unsafe extern "C" fn test(_: i32, ap: ...) { }
pub unsafe extern "C" fn test_valist_forward(n: u64, mut ap: ...) -> f64 {
    rust_valist_interesting_average(n, ap.as_va_list())
}

Decide what to do around `mod ...;`

Rust supports two syntaxes for mod: one for declaring a module inline and another for referencing the contents of another file. We don't currently try to follow those files (since that requires IO and parsing is otherwise pure).

  • Add an IO based parse that chases down mod ...; items to expand them
  • Think about what to do in tests (right now, I have to manually take out mod ...; forms to make the difference tests pass)
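The pure half of chasing a `mod name;` item is working out which files to look for. A minimal sketch, assuming the declaring file is a `lib.rs`, `main.rs` or `mod.rs` and that there is no `#[path]` attribute; `modCandidates` is a hypothetical name, not an existing function.

```haskell
import System.FilePath ((</>), (<.>), takeDirectory)

-- Hypothetical helper: where `mod name;` declared in 'declaringFile'
-- may live on disk, in the order the candidates would be tried.
-- ASSUMPTION: 'declaringFile' is lib.rs, main.rs or mod.rs, and the
-- item carries no #[path] attribute.
modCandidates :: FilePath -> String -> [FilePath]
modCandidates declaringFile name =
  [ dir </> name <.> "rs"      -- e.g. src/parser.rs
  , dir </> name </> "mod.rs"  -- e.g. src/parser/mod.rs
  ]
  where
    dir = takeDirectory declaringFile
```

An IO-based parse could then check each candidate with `doesFileExist` from System.Directory and parse the first one that exists, leaving the existing pure parser untouched.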

Use language-c's approach to NodeInfo in Happy

Looking at language-c's Parser.y, I think their approach to keeping track of spans is better than what I am currently doing (using Span as a Monad). It should also be

  • more efficient
  • cleaner to look at

On that note, if they use snoc instead of an extra production rule then reverse, it might be worth doing what they are doing (and cutting down on the number of generic production rules whose types have to be inferred). Maybe just have star :: Parser [a], plus :: Parser (NonEmpty a), and optional :: Parser (Maybe a)...
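The snoc idea can be illustrated without Happy. Left-recursive rules see the list built so far on the left and the new element on the right, so each reduction appends at the end; keeping the list reversed makes that append constant-time, with a single reverse when the list is complete. The names below (`Reversed`, `rempty`, `snoc`, `unreverse`) are chosen for this sketch and are not claimed to match language-c's actual definitions.

```haskell
-- A list stored back to front, so that appending at the end is O(1).
newtype Reversed a = Reversed [a]

-- The empty reversed list (what the base case of a rule would return).
rempty :: Reversed a
rempty = Reversed []

-- Append one element at the end (what each left-recursive step would do).
snoc :: Reversed a -> a -> Reversed a
snoc (Reversed xs) x = Reversed (x : xs)

-- Recover the list in source order, once, after the last element.
unreverse :: Reversed a -> [a]
unreverse (Reversed xs) = reverse xs
```

A left fold mimics the sequence of reductions: `unreverse (foldl snoc rempty "abc")` gives back `"abc"`.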

Parsing and Resolving range expressions

Range expressions are a queer operator: it looks like precedence on the left is not the same as precedence on the right. It also has a prefix and postfix form. Right now, it is broken.

  • add tests in rustc-tests
  • fix parser
  • fix parenthesizing in pretty-printer
