syn's Introduction

Parser for Rust source code

Syn is a library for parsing a stream of Rust tokens into a syntax tree of Rust source code.

Currently this library is geared toward use in Rust procedural macros, but contains some APIs that may be useful more generally.

  • Data structures — Syn provides a complete syntax tree that can represent any valid Rust source code. The syntax tree is rooted at syn::File which represents a full source file, but there are other entry points that may be useful to procedural macros including syn::Item, syn::Expr and syn::Type.

  • Derives — Of particular interest to derive macros is syn::DeriveInput which is any of the three legal input items to a derive macro. An example below shows using this type in a library that can derive implementations of a user-defined trait.

  • Parsing — Parsing in Syn is built around parser functions with the signature fn(ParseStream) -> Result<T>. Every syntax tree node defined by Syn is individually parsable and may be used as a building block for custom syntaxes, or you may dream up your own brand new syntax without involving any of our syntax tree types.

  • Location information — Every token parsed by Syn is associated with a Span that tracks line and column information back to the source of that token. These spans allow a procedural macro to display detailed error messages pointing to all the right places in the user's code. There is an example of this below.

  • Feature flags — Functionality is aggressively feature gated so your procedural macros enable only what they need, and do not pay in compile time for all the rest.

Version requirement: Syn supports rustc 1.60 and up.

Release notes


Resources

The best way to learn about procedural macros is by writing some. Consider working through this procedural macro workshop to get familiar with the different types of procedural macros. The workshop contains relevant links into the Syn documentation as you work through each project.


Example of a derive macro

The canonical derive macro using Syn looks like this. We write an ordinary Rust function tagged with a proc_macro_derive attribute and the name of the trait we are deriving. Any time that derive appears in the user's code, the Rust compiler passes their data structure as tokens into our macro. We get to execute arbitrary Rust code to figure out what to do with those tokens, then hand some tokens back to the compiler to compile into the user's crate.

[dependencies]
syn = "2.0"
quote = "1.0"

[lib]
proc-macro = true

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(MyMacro)]
pub fn my_macro(input: TokenStream) -> TokenStream {
    // Parse the input tokens into a syntax tree
    let input = parse_macro_input!(input as DeriveInput);

    // Build the output, possibly using quasi-quotation
    let expanded = quote! {
        // ...
    };

    // Hand the output tokens back to the compiler
    TokenStream::from(expanded)
}

The heapsize example directory shows a complete working implementation of a derive macro. The example derives a HeapSize trait which computes an estimate of the amount of heap memory owned by a value.

pub trait HeapSize {
    /// Total number of bytes of heap memory owned by `self`.
    fn heap_size_of_children(&self) -> usize;
}

The derive macro allows users to write #[derive(HeapSize)] on data structures in their program.

#[derive(HeapSize)]
struct Demo<'a, T: ?Sized> {
    a: Box<T>,
    b: u8,
    c: &'a str,
    d: String,
}
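
To make the goal of the derive concrete, here is a hand-written sketch of roughly what an implementation for Demo could look like. The leaf impls for u8, String, references and Box<T> are assumptions made for this illustration; the code generated by the actual heapsize example may differ in detail.

```rust
use std::mem;

pub trait HeapSize {
    /// Total number of bytes of heap memory owned by `self`.
    fn heap_size_of_children(&self) -> usize;
}

// Assumed leaf impls, written by hand for this sketch.
impl HeapSize for u8 {
    fn heap_size_of_children(&self) -> usize {
        0
    }
}

impl HeapSize for String {
    fn heap_size_of_children(&self) -> usize {
        self.capacity()
    }
}

impl<'a, T: ?Sized> HeapSize for &'a T {
    // A shared reference does not own the value it points to.
    fn heap_size_of_children(&self) -> usize {
        0
    }
}

impl<T: ?Sized + HeapSize> HeapSize for Box<T> {
    // The box owns its pointee plus whatever the pointee owns.
    fn heap_size_of_children(&self) -> usize {
        mem::size_of_val::<T>(&**self) + (**self).heap_size_of_children()
    }
}

pub struct Demo<'a, T: ?Sized> {
    a: Box<T>,
    b: u8,
    c: &'a str,
    d: String,
}

// Roughly what a derive could expand to: add a `HeapSize` bound to each
// type parameter and sum the contribution of every field.
impl<'a, T: ?Sized + HeapSize> HeapSize for Demo<'a, T> {
    fn heap_size_of_children(&self) -> usize {
        0 + self.a.heap_size_of_children()
            + self.b.heap_size_of_children()
            + self.c.heap_size_of_children()
            + self.d.heap_size_of_children()
    }
}
```

Summing per field keeps the generated code independent of what the field types are, which is why a missing HeapSize impl on a field type surfaces as an ordinary trait bound error.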

Spans and error reporting

The token-based procedural macro API provides great control over where the compiler's error messages are displayed in user code. Consider the error the user sees if one of their field types does not implement HeapSize.

#[derive(HeapSize)]
struct Broken {
    ok: String,
    bad: std::thread::Thread,
}

By tracking span information all the way through the expansion of a procedural macro as shown in the heapsize example, token-based macros in Syn are able to trigger errors that directly pinpoint the source of the problem.

error[E0277]: the trait bound `std::thread::Thread: HeapSize` is not satisfied
 --> src/main.rs:7:5
  |
7 |     bad: std::thread::Thread,
  |     ^^^^^^^^^^^^^^^^^^^^^^^^ the trait `HeapSize` is not implemented for `std::thread::Thread`

Parsing a custom syntax

The lazy-static example directory shows the implementation of a functionlike!(...) procedural macro in which the input tokens are parsed using Syn's parsing API.

The example reimplements the popular lazy_static crate from crates.io as a procedural macro.

lazy_static! {
    static ref USERNAME: Regex = Regex::new("^[a-z0-9_-]{3,16}$").unwrap();
}

The implementation shows how to trigger custom warnings and error messages on the macro input.

warning: come on, pick a more creative name
  --> src/main.rs:10:16
   |
10 |     static ref FOO: String = "lazy_static".to_owned();
   |                ^^^

Testing

When testing macros, we often care not just that the macro can be used successfully but also that when the macro is provided with invalid input it produces maximally helpful error messages. Consider using the trybuild crate to write tests for errors that are emitted by your macro or errors detected by the Rust compiler in the expanded code following misuse of the macro. Such tests help avoid regressions from later refactors that mistakenly make an error no longer trigger or be less helpful than it used to be.


Debugging

When developing a procedural macro it can be helpful to look at what the generated code looks like. On a nightly toolchain, use cargo rustc -- -Zunpretty=expanded, or use the cargo expand subcommand.

To show the expanded code for some crate that uses your procedural macro, run cargo expand from that crate. To show the expanded code for one of your own test cases, run cargo expand --test the_test_case where the last argument is the name of the test file without the .rs extension.

This write-up by Brandon W Maister discusses debugging in more detail: Debugging Rust's new Custom Derive system.


Optional features

Syn puts a lot of functionality behind optional features in order to optimize compile time for the most common use cases. The following features are available.

  • derive (enabled by default) — Data structures for representing the possible input to a derive macro, including structs and enums and types.
  • full — Data structures for representing the syntax tree of all valid Rust source code, including items and expressions.
  • parsing (enabled by default) — Ability to parse input tokens into a syntax tree node of a chosen type.
  • printing (enabled by default) — Ability to print a syntax tree node as tokens of Rust source code.
  • visit — Trait for traversing a syntax tree.
  • visit-mut — Trait for traversing and mutating in place a syntax tree.
  • fold — Trait for transforming an owned syntax tree.
  • clone-impls (enabled by default) — Clone impls for all syntax tree types.
  • extra-traits — Debug, Eq, PartialEq, Hash impls for all syntax tree types.
  • proc-macro (enabled by default) — Runtime dependency on the dynamic library libproc_macro from rustc toolchain.

Proc macro shim

Syn operates on the token representation provided by the proc-macro2 crate from crates.io rather than using the compiler's built in proc-macro crate directly. This enables code using Syn to execute outside of the context of a procedural macro, such as in unit tests or build.rs, and we avoid needing incompatible ecosystems for proc macros vs non-macro use cases.

In general all of your code should be written against proc-macro2 rather than proc-macro. The one exception is in the signatures of procedural macro entry points, which are required by the language to use proc_macro::TokenStream.

The proc-macro2 crate will automatically detect and use the compiler's data structures when a procedural macro is active.


License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

syn's Issues

Discriminants should be expressions

syn is currently unable to parse complex explicit discriminant expressions in enum definitions, where "complex expression" can be as simple as a negative integer.

Ultimate test of parsing

Write a test to:

  1. Iterate through Rust files in a directory
  2. Parse using syn >> print using syn >> parse using syntex
  3. Parse the original using syntex
  4. Assert that the syntex AST from 2 and 3 are identical

I think this would give us more confidence in the parser than any unit test suite we could write, and it is also far easier to implement than individually testing every parser element. Once we make more progress on #4 we could run this test against a large repo like the full rustc source code.

cc @gregkatz. I plan to work on this today or tomorrow. In the meantime you don't have to worry about unit testing parsers you write (unless it is useful to you in implementing them). Let's focus on flying through #4 without tests and leave testing to this one.

Implement spans

This is a requirement for implementing something like rustfmt against syn.

consider forwards compatibility

I realise you're still pre-1.0, but it is probably worth considering the forwards compatibility story early. A reason that macros are moving to tokens/strings rather than AST is for the stability of procedural macros. This requires that libraries such as Syn have a forwards compatibility policy.

The idea is that when we add a feature to Rust which changes the syntax, you need to add this to Syn, and to handle such code downstream proc macros must upgrade to the new version. But, the important thing is that if downstream macros do not want to handle the new feature, they can upgrade Syn without breaking.

E.g., say we add unions (well, we already have, but imagine we hadn't), then where previously an Item enum had Struct and Enum variants, now it needs Union too. That would be a breaking change, since exhaustive matches are no longer exhaustive.

Exactly how you handle forwards compatibility is an open question. You could combine an unenforced policy that clients have to abide by with some degree of coding techniques, or you could rely solely on code (but that might not be possible with some other design decisions).

One example might be that clients should not pattern match structs, Syn avoids struct variants, and every enum has an Unknown variant which clients should not match.
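
A minimal sketch of the Unknown-variant idea, using a hypothetical and heavily simplified Item enum rather than Syn's real types. The names Item, Unknown and describe are invented for illustration.

```rust
// Hypothetical stand-in for a syntax tree enum; not Syn's actual definition.
#[derive(Debug)]
pub enum Item {
    Struct(String),
    Enum(String),
    // Added in a later release. Older clients never name this variant.
    Union(String),
    // Placeholder that clients agree, by policy, never to match explicitly.
    // Its presence forces every client match to carry a wildcard arm.
    #[doc(hidden)]
    Unknown,
}

// Client code written before `Union` existed. Because of the wildcard arm
// it keeps compiling after the library adds new variants.
pub fn describe(item: &Item) -> String {
    match item {
        Item::Struct(name) => format!("struct {}", name),
        Item::Enum(name) => format!("enum {}", name),
        _ => String::from("unsupported item"),
    }
}
```

The policy half of this scheme is unenforced: nothing stops a client from matching Unknown. Rust later gained the #[non_exhaustive] attribute, which lets the compiler enforce the wildcard arm for downstream crates.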

How can I continue navigating into expressions inside a statement?

Here's a piece of code rudely extracted from my playing-around:

syn::ItemKind::Fn(_, _, _, _, _, block) => {
    for stmt in block.stmts {
        match stmt {
            syn::Stmt::Expr(i) => {
                // i.node; // `node` is private
                println!("e {:?}", i); // but I can see the interesting block under here
            }
            _ => unimplemented!(),
        }
    }
    // Descend
}

I am trying to parse this code:

pub fn unsafe_block_inside() {
    unsafe {}
}

My goal is to be able to answer a "simple" true/false question: does a crate use unsafe code? If you have pointers for a better way I should be doing that, I'd much appreciate it!

Thanks for making such a useful library! ❤️

Return a result instead of panicking

I'm looking at possibly using this in Diesel as we switch to Macros 1.1. I was surprised to see that there's a lot of panic! in the parser. I'd expect this function to return a Result and allow the caller to decide how to handle the error.
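
As an illustration of the shape being asked for, here is a sketch of a tiny parser that reports failures through Result instead of panicking. ParseError and parse_int_literal are hypothetical names for this example and are not part of syn.

```rust
// Hypothetical error type carrying a message and the byte offset of the failure.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
    pub offset: usize,
}

// Parses a decimal integer literal. Every failure path returns `Err`,
// so the caller decides whether to panic, recover, or report.
pub fn parse_int_literal(input: &str) -> Result<u64, ParseError> {
    if input.is_empty() {
        return Err(ParseError {
            message: "expected integer literal".to_string(),
            offset: 0,
        });
    }
    let mut value: u64 = 0;
    for (offset, ch) in input.char_indices() {
        let digit = ch.to_digit(10).ok_or_else(|| ParseError {
            message: format!("unexpected character {:?}", ch),
            offset,
        })?;
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(u64::from(digit)))
            .ok_or_else(|| ParseError {
                message: "integer literal out of range".to_string(),
                offset,
            })?;
    }
    Ok(value)
}
```

A caller that still wants the old behavior can call unwrap() on the result, while a macro author can turn the error into a compiler diagnostic.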

Add the rest of Rust syntax behind a feature flag (off by default)

Parsing structs and enums is enough for Macros 1.1 but parsing all of Rust may be useful for other things.

Items

  • ExternCrate
  • Use
  • Static
  • Const
  • Fn
  • Mod
  • ForeignMod
  • Ty
  • Enum (enabled by default)
  • Struct (enabled by default)
  • Union
  • Trait
  • DefaultImpl
  • Impl
  • Mac

Expressions

  • Box
  • Vec
  • Call
  • MethodCall
  • Tup
  • Binary
  • Unary
  • Lit
  • Cast
  • Type
  • If
  • IfLet
  • While
  • WhileLet
  • ForLoop
  • Loop
  • Match
  • Closure
  • Block
  • Assign
  • AssignOp
  • Field
  • TupField
  • Index
  • Range
  • Path
  • AddrOf
  • Break
  • Continue
  • Ret
  • Mac
  • Struct
  • Repeat
  • Paren
  • Try

Statements

  • Local
  • Item
  • Expr
  • Semi
  • Mac

Patterns

  • Wild
  • Ident
  • Struct
  • TupleStruct
  • Path
  • Tuple
  • Box
  • Ref
  • Lit
  • Range
  • Slice
  • Mac

Other

Factor out loop label parsing

Currently we parse labels as lifetimes, then always need to do lt.map(|lt| lt.ident). Let's factor this into a separate label parser.

Implement split_for_impl without cloning

The current implementation does a bunch of cloning and returns (Generics, Generics, WhereClause). Instead it should return (ImplGenerics<'a>, TyGenerics<'a>, &'a WhereClause) defined as:

struct ImplGenerics<'a>(&'a Generics);
struct TyGenerics<'a>(&'a Generics);

These are wrappers around the reference that implement ToTokens in the right way.
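
A sketch of how the borrowed wrappers could work. To stay self-contained it uses a hypothetical, simplified Generics type and implements std::fmt::Display in place of ToTokens; the field names are assumptions.

```rust
use std::fmt;

// Simplified stand-ins for the real syntax tree types.
pub struct TypeParam {
    pub name: String,
    pub bounds: Vec<String>,
}

pub struct WhereClause {
    pub predicates: Vec<String>,
}

pub struct Generics {
    pub params: Vec<TypeParam>,
    pub where_clause: WhereClause,
}

pub struct ImplGenerics<'a>(&'a Generics);
pub struct TyGenerics<'a>(&'a Generics);

impl Generics {
    // No cloning: all three results borrow from `self`.
    pub fn split_for_impl(&self) -> (ImplGenerics<'_>, TyGenerics<'_>, &WhereClause) {
        (ImplGenerics(self), TyGenerics(self), &self.where_clause)
    }
}

// Prints parameters with their bounds, as needed after `impl`.
impl<'a> fmt::Display for ImplGenerics<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.0.params.is_empty() {
            return Ok(());
        }
        f.write_str("<")?;
        for (i, param) in self.0.params.iter().enumerate() {
            if i > 0 {
                f.write_str(", ")?;
            }
            f.write_str(&param.name)?;
            if !param.bounds.is_empty() {
                write!(f, ": {}", param.bounds.join(" + "))?;
            }
        }
        f.write_str(">")
    }
}

// Prints parameter names only, as needed after the type name.
impl<'a> fmt::Display for TyGenerics<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.0.params.is_empty() {
            return Ok(());
        }
        let names: Vec<&str> = self.0.params.iter().map(|p| p.name.as_str()).collect();
        write!(f, "<{}>", names.join(", "))
    }
}
```

Both wrappers point at the same Generics value and differ only in how they print it, so the split costs two pointer copies instead of two deep clones.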

Release build fails with SIGSEGV

When building with the cargo build --release, the build fails with (signal: 11, SIGSEGV: Invalid memory reference). Building with cargo build (no --release flag) works. Here is the full output, along with the rust version:

ubuntu@host:~/syn$ git describe
0.8.0
ubuntu@host:~/syn$ cargo build --release --verbose
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling quote v0.2.0
     Running `rustc /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/quote-0.2.0/src/lib.rs --crate-name quote --crate-type lib -C opt-level=3 -C metadata=9442466506b24325 -C extra-filename=-9442466506b24325 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --cap-lints allow`
   Compiling syn v0.8.0 (file:///home/ubuntu/syn)
     Running `rustc src/lib.rs --crate-name syn --crate-type lib -C opt-level=3 --cfg feature=\"default\" --cfg feature=\"printing\" --cfg feature=\"parsing\" --cfg feature=\"quote\" -C metadata=977935f812d0e598 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --extern quote=/home/ubuntu/syn/target/release/deps/libquote-9442466506b24325.rlib`
error: Could not compile `syn`.

Caused by:
  Process didn't exit successfully: `rustc src/lib.rs --crate-name syn --crate-type lib -C opt-level=3 --cfg feature="default" --cfg feature="printing" --cfg feature="parsing" --cfg feature="quote" -C metadata=977935f812d0e598 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --extern quote=/home/ubuntu/syn/target/release/deps/libquote-9442466506b24325.rlib` (signal: 11, SIGSEGV: Invalid memory reference)
ubuntu@host:~/syn$ rustc --version --verbose
rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0

parse_expr can’t parse `macro_rules!`

Test case:

syn::parse_expr("macro_rules! noop_expr { ($e: expr) => { $e } }").unwrap();

Output:

thread 'test' panicked at 'called `Result::unwrap()` on an `Err` value: "failed to parse tokens after expression: \"! noop_expr { ($e: expr) => { $e } }\""', ../src/libcore/result.rs:799

Rename macro input to derive input?

syn has a parse_macro_input function and a MacroInput type. Despite their name, they are very specific to implementing a custom derive.

Macros 1.1 will presumably support macros like foo!(), and syn will be useful for implementing those too. (Maybe with something like Vec<TokenTree> to represent the input.)

Should parse_macro_input and MacroInput be renamed to parse_derive_input and DeriveInput? Or maybe parse_type_definition and TypeDefinition?

Spacing after keywords

Reported by email from @gregkatz:

I ran into a little bit of a problem doing the syn parsers. It's a little more complicated than I realized. Specifically, I was experimenting with the expression after the while keyword to see what works and what doesn't in the actual Rust compiler, and my parser doesn't actually match the behavior of the compiler. For example here's how the compiler handles the following:

whilevariable {} //error
whiletrue //error
while{variable} {} //ok
while(variable) {} //ok
while1<2 {} //error
while-1<2 {} //ok
while&1<&2 {} //ok
while*variable {} //ok
while!variable {} //ok

I believe my parser as it currently stands would allow all of this, but I'm not sure how to make it match the behavior of the real compiler.
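
One rule that is consistent with every case listed above is that a keyword only matches when the next character cannot continue an identifier. The sketch below assumes that rule; the keyword function is a hypothetical helper, not syn's parser, and it models only the word boundary, not the rest of the while expression.

```rust
/// Returns the remaining input if `input` starts with `kw` as a whole word,
/// meaning the keyword is not immediately followed by a letter, digit or `_`.
pub fn keyword<'a>(input: &'a str, kw: &str) -> Option<&'a str> {
    let rest = input.strip_prefix(kw)?;
    match rest.chars().next() {
        // `whilevariable` and `while1` lex as one identifier, not a keyword.
        Some(ch) if ch.is_alphanumeric() || ch == '_' => None,
        // Punctuation, whitespace or end of input ends the keyword.
        _ => Some(rest),
    }
}
```

Under this rule while{variable}, while(variable), while-1, while&1, while*variable and while!variable all match, and whilevariable, whiletrue and while1 are rejected, in line with the compiler behavior reported above.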

Do not parse keywords as Ident

The match keyword in #48 gets parsed as an ident after backtracking from trying to parse a match expression. Reserved keywords should never parse as ident.
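
A sketch of an identifier parser that refuses reserved words, so that backtracking out of a failed match expression cannot fall through to an identifier. The ident function and the abbreviated KEYWORDS list are assumptions for illustration; the rule for what counts as an identifier character is also simplified.

```rust
// Abbreviated list of reserved words, for illustration only.
const KEYWORDS: &[&str] = &[
    "as", "break", "const", "continue", "else", "enum", "fn", "for", "if",
    "impl", "let", "loop", "match", "mod", "mut", "pub", "ref", "return",
    "struct", "trait", "type", "unsafe", "use", "where", "while",
];

/// Splits a leading identifier off `input`, returning `(ident, rest)`.
/// Returns `None` if there is no identifier or the word is reserved.
pub fn ident(input: &str) -> Option<(&str, &str)> {
    let mut end = 0;
    for (i, ch) in input.char_indices() {
        let ok = if i == 0 {
            ch.is_alphabetic() || ch == '_'
        } else {
            ch.is_alphanumeric() || ch == '_'
        };
        if !ok {
            break;
        }
        end = i + ch.len_utf8();
    }
    if end == 0 {
        return None;
    }
    let (word, rest) = input.split_at(end);
    // Compare the whole word, so `matches` is fine while `match` is rejected.
    if KEYWORDS.contains(&word) {
        None
    } else {
        Some((word, rest))
    }
}
```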

Better error message when failing to parse array length

Rust allows arbitrary code as an array length:

pub struct Screen(pub [Color; {
    fn holy_smokes() {
        println!("why am I part of this type?");
    }
    12288
}]);

To minimize compile time, syn supports integer literals only (EDIT: now a few other simple expressions too, but still limited). The difference in compile time is more than a factor of 2 between supporting integers only vs supporting expressions, so currently I believe the tradeoff makes sense.

Given that this is more restrictive than Rust, it would be nice to give a better message than the usual "failed to parse macro input" when the failure is related to an array length.

Error reporting

Doesn't matter for Macros 1.1, but for parse_crate it would be helpful to be more specific about where parsing failed. Possibly behind a feature gate if it affects compile time.

Some ideas about error management: nom/docs/error_management.md

Implement operator precedence

Parenthesization is represented in the AST which means precedence is not relevant when just reading source code and writing it back, so this is not urgent, but it becomes relevant if somebody wants to process or transform the AST.
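
To show where precedence starts to matter, here is a sketch of a printer for a transformed tree that no longer carries explicit parenthesis nodes. Expr, BinOp and print are hypothetical simplified types, limited to two left-associative operators.

```rust
pub enum Expr {
    Lit(i64),
    Binary(Box<Expr>, BinOp, Box<Expr>),
}

#[derive(Clone, Copy)]
pub enum BinOp {
    Add,
    Mul,
}

impl BinOp {
    // Higher number binds tighter.
    fn precedence(self) -> u8 {
        match self {
            BinOp::Add => 1,
            BinOp::Mul => 2,
        }
    }

    fn symbol(self) -> &'static str {
        match self {
            BinOp::Add => "+",
            BinOp::Mul => "*",
        }
    }
}

/// Prints `expr`, inserting parentheses only where precedence requires them.
pub fn print(expr: &Expr) -> String {
    print_prec(expr, 0)
}

fn print_prec(expr: &Expr, min_prec: u8) -> String {
    match expr {
        Expr::Lit(n) => n.to_string(),
        Expr::Binary(lhs, op, rhs) => {
            let prec = op.precedence();
            // Operators are left-associative, so the right operand must bind
            // strictly tighter than this operator to go without parentheses.
            let inner = format!(
                "{} {} {}",
                print_prec(lhs, prec),
                op.symbol(),
                print_prec(rhs, prec + 1)
            );
            if prec < min_prec {
                format!("({})", inner)
            } else {
                inner
            }
        }
    }
}
```

Without the precedence check, a transformation that builds Mul over an Add subtree would print 1 + 2 * 3 for a tree that means (1 + 2) * 3.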

How does BlockCheckMode differ from Unsafety?

pub enum BlockCheckMode {
    Default,
    Unsafe,
}
pub enum Unsafety {
    Unsafe,
    Normal,
}

Other than the ordering and choice of Normal vs Default, do these really differ from each other?

Implement type macros

We need to be careful about compile time; the macro parsing code is not currently compiled in Macros 1.1 mode.

Simplify processing of structured MetaItems

Diesel uses the following to parse attributes of the form #[changeset_options(treat_none_as_null = "true")]:

match options_attr.value {
    syn::MetaItem::List(_, ref values) => {
        if values.len() != 1 {
            usage_err();
        }
        match values[0] {
            syn::MetaItem::NameValue(ref name, ref value)
                if name.as_ref() == "treat_none_as_null" => value == "true",
            _ => usage_err(),
        }
    }
    _ => usage_err(),
}

I expect this use case to be pretty common so let's provide helpers to make it less bad.
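
One possible shape for such a helper, sketched against a hypothetical simplified MetaItem that stores names and values as plain strings. The name_value function is invented for this example and is not an existing syn API.

```rust
// Simplified stand-in for the attribute tree; not syn's real definition.
pub enum MetaItem {
    Word(String),
    List(String, Vec<MetaItem>),
    NameValue(String, String),
}

/// Looks up `key = "value"` inside a list-style attribute such as
/// `#[changeset_options(treat_none_as_null = "true")]`.
/// Returns `None` if `item` is not a list or the key is absent.
pub fn name_value<'a>(item: &'a MetaItem, key: &str) -> Option<&'a str> {
    match item {
        MetaItem::List(_, values) => values.iter().find_map(|value| match value {
            MetaItem::NameValue(name, value) if name == key => Some(value.as_str()),
            _ => None,
        }),
        _ => None,
    }
}
```

With a helper like this the Diesel snippet reduces to a single lookup followed by a comparison with "true". Unlike the original code, this sketch does not reject lists with more than one entry; a stricter variant would need to check the length as well.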

Parsing error messages not always helpful

Reduced test case:

syn::parse_expr("match foo { Some(a) => a, , None => 0 }").unwrap()

Output:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "failed to parse tokens after expression: \" foo { Some(a) => a, , None => 0 }\""', ../src/libcore/result.rs:799

The error message says that the part of the input that failed to parse is: all of it, except for the initial match keyword. The message is similar when parsing a much larger match expression, with the syntax error somewhere in the middle of it.
