lang-c's People

Contributors

fotonick, hyunsukimsokcho, jeehoonkang, kelnos, l3nn0x, ramihg, vickenty

lang-c's Issues

Internal functions are not accepted

I've found this kind of code which is accepted by Clang and GCC, but not by lang-c.

int fn() {
        int ifn() {
                return 5;
        }
        return ifn();
}

fn is a function having an internal function, named ifn. ifn is only in scope inside the body of fn. This is not standard C, but a GNU extension. I don't think it's in ANSI C or ISO C99, or K&R, or anything like that. So I am not sure if we want or need to accept this kind of code in lang-c.

I'm getting this error message:

SyntaxError(SyntaxError { source: "# 0 \"c/fninfn.c\"\n# 0 \"<built-in>\"\n# 0 \"<command-line>\"\n# 1 \"/usr/include/stdc-predef.h\" 1 3 4\n# 0 \"<command-line>\" 2\n# 1 \"c/fninfn.c\"\nint fn() {\n int ifn() {\n  return 5;\n }\n return ifn();\n}\n", line: 8, column: 12, offset: 156, expected: {";", "asm", "(", "[", ",", "="} })

What do you think?

Various syntax errors

The following gcc-valid sources cannot be parsed:

void f(void) {
};
struct s {
    struct t {
        int i;
    } __attribute((packed)) v;
};
struct s {
    union { int i; } __attribute__((aligned(8)));
};
struct s {
    int i;;
};
struct s {
    int __attribute__((aligned(8))) *i;
};

Support GNU case ranges

Case ranges are a GNU extension that allows matching a range of consecutive values with a single case label, like so:

#include <stdio.h>
int main() {
	int v = 4;

	switch (v) {
		case 0 ... 4: puts("Between 0 and 4"); break;
		case 5: puts("5"); break;
		default: puts("something else"); break;
	}

	return 0;
}

This will be compiled without errors with gcc -std=gnu11 range.c. However, when parsing with lang-c, this results in a SyntaxError:

use lang_c::driver::{Config, parse};

fn main() {
    let config = Config::with_gcc();
    let p = parse(&config, "range.c");
    if let Err(e) = p {
        println!("{}", e);
    }
}

Output:

syntax error: unexpected token at line 733 column 11, expected '[_a-zA-Z]'

Compound literals do not work

Compound literals do not seem to work.

Test preprocessed C file:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
typedef struct
{
    int value;
} test_t;

void test(test_t* myStruct)
{
    *myStruct = (test_t) {.value = 1};
}

Error:

SyntaxError {
    source: --- see above ---,
    line: 12,
    column: 26,
    offset: 173,
    expected: {
        "<",
        "!=",
        "*=",
        "%=",
        "&",
        "==",
        ">",
        "<<",
        "&&",
        ">>",
        "u8",
        "[",
        "~",
        ">=",
        "*",
        "/",
        "++",
        "->",
        "--",
        "+",
        "<<=",
        ";",
        "?",
        "|=",
        "[_a-zA-Z]",
        ">>=",
        "[uUL]",
        "^=",
        ",",
        ".",
        "!",
        "||",
        "(",
        "-=",
        "\"",
        "/=",
        "<=",
        "^",
        "%",
        "=",
        "+=",
        "&=",
        "|",
        "-",
    },
}

Returning compound literal of typedef struct

This works:

struct thing
{
    int value;
};

struct thing returnthing()
{
    return (struct thing){ 1 };
}

This does not:

typedef struct
{
    int value;
} thing;

thing returnthing()
{
    return (thing){ 1 };
}

Error:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
typedef struct
{
    int value;
} thing;

thing returnthing()
{
    return (thing){ 1 };
}

SyntaxError {
    source: "# 1 \"test.c\"\r\n# 1 \"<built-in>\"\r\n# 1 \"<command-line>\"\r\n# 1 \"test.c\"\r\ntypedef struct\r\n{\r\n    int value;\r\n} thing;\r\n\r\nthing returnthing()\r\n{\r\n    return (thing){ 1 };\r\n}\r\n",
    line: 12,
    column: 19,
    offset: 157,
    expected: {
        "<=",
        "[",
        ".",
        "<<",
        ">",
        "|=",
        "/=",
        ";",
        "<<=",
        "++",
        "(",
        "*",
        "!=",
        "%",
        "&",
        ">=",
        "-=",
        "&=",
        "^=",
        "||",
        ",",
        "->",
        "%=",
        "[uUL]",
        "<",
        "/",
        "^",
        "=",
        "*=",
        "--",
        "[_a-zA-Z]",
        "+=",
        "?",
        "!",
        "-",
        "u8",
        "==",
        "&&",
        ">>=",
        "\"",
        "|",
        "~",
        ">>",
        "+",
    },
}

Formatting the AST

I am implementing a so-called transpiler, which processes C source before sending it to the real compiler.

I found lang-c very useful for parsing the sources, but it seems that formatting the AST back to source code is not implemented yet.

I think a line-precise formatter would be a very useful feature here.

Parse error in struct typedef

typedef struct mi_heap_area_s {
  size_t x;
} mi_heap_area_t;

==> unexpected token at line 9 column 3, expected '<typedef_name>', '}'

Notice also the wrong line number!

Curiously, this works:

typedef struct mi_heap_area_s {
  void* x;
} mi_heap_area_t;

More precise parser error

When the parser encounters a type that is unknown, it fails with the following error:

expected: {"<typedef_name>"}

It would be extremely helpful if, at the following point in the parser code, the parser made an effort to print the actual type that it cannot parse:

__state.mark_failure(__pos, "<typedef_name>");
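
One way to surface the offending text is to slice the source at the failure offset. The sketch below is a standalone illustration and not lang-c's code: token_at is a hypothetical helper, and it assumes the reported offset is a byte index into the preprocessed source.

```rust
/// Hypothetical helper: returns the identifier-like token that starts at
/// byte `offset`, or the single character there if it is not an identifier.
fn token_at(source: &str, offset: usize) -> &str {
    // `get` yields None when the offset is out of range or not on a
    // character boundary; both cases are treated as "nothing to show".
    let rest = source.get(offset..).unwrap_or("");
    let len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if len > 0 {
        &rest[..len]
    } else {
        // Not an identifier: fall back to the first character, if any.
        rest.chars().next().map_or("", |c| &rest[..c.len_utf8()])
    }
}

fn main() {
    let source = "struct s {\n  size_t x;\n};\n";
    let offset = source.find("size_t").unwrap();
    println!("unknown type name: '{}'", token_at(source, offset));
}
```

An error message built this way could name the unknown identifier alongside the expected set.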

Missing support for Non-standard Clang "block pointers"

It might be out-of-scope for this project, but having support for Clang's non-standard block pointer types would be nice

e.g.

int (^ _Nonnull __compar)(const void *, const void *)

in

typedef unsigned long long size_t;
void *bsearch_b(const void *__key, const void *__base, size_t __nel,
     size_t __width, int (^ _Nonnull __compar)(const void *, const void *) __attribute__((__noescape__)))
     __attribute__((availability(macosx,introduced=10.6)));

Purpose:

Be able to parse stdlib.h and other header files from macOS SDKs

Doesn't work on Windows

It seems a lot of assumptions about the preprocessor are made that don't apply to the preprocessor on Windows.

DerivedDeclarator inconsistency

Enum DerivedDeclarator combines pointer, array, and function declarators. However, pointer declarators in C behave differently from array and function declarators. Pointer declarators are considered in right-to-left order, but array declarators go from left to right. For example:

int * const * volatile a[2][4];

This declares a as (array 2 of (array 4 of (volatile pointer to (const pointer to (int))))), but the DerivedDeclarators will be provided in the following order: Pointer(const), Pointer(volatile), Array(2), Array(4).

When using this list of DerivedDeclarators to build C types, special care should be taken to apply pointers in one direction and when they end, apply the rest in reverse direction. It would be more convenient to have RTL declarators and LTR declarators separated in the AST type system.

This is only my wish and suggestion and not a bug report. Please feel free to reject this issue as "won't do" if it goes against the philosophy of your crate.
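
The "special care" described above can be sketched as follows. The Derived and Type enums are simplified, hypothetical stand-ins rather than lang-c's AST types, and the sketch assumes the list has the shape from the example: leading pointers followed by array declarators.

```rust
// Hypothetical, simplified stand-ins for illustration; these are not
// lang-c's AST types.
#[derive(Debug, Clone, PartialEq)]
enum Derived {
    Pointer(&'static str), // qualifier on the pointer
    Array(usize),          // array length
}

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Pointer(&'static str, Box<Type>),
    Array(usize, Box<Type>),
}

/// Applies the leading pointers in list order, then the remaining
/// declarators in reverse order.
fn build_type(base: Type, derived: &[Derived]) -> Type {
    let split = derived
        .iter()
        .position(|d| !matches!(d, Derived::Pointer(_)))
        .unwrap_or(derived.len());
    let (pointers, rest) = derived.split_at(split);

    let mut ty = base;
    for d in pointers.iter().chain(rest.iter().rev()) {
        ty = match d {
            Derived::Pointer(q) => Type::Pointer(*q, Box::new(ty)),
            Derived::Array(n) => Type::Array(*n, Box::new(ty)),
        };
    }
    ty
}

/// Renders a type in the "array N of (...)" notation used above.
fn describe(ty: &Type) -> String {
    match ty {
        Type::Int => "int".to_string(),
        Type::Pointer(q, inner) => format!("{} pointer to ({})", q, describe(inner)),
        Type::Array(n, inner) => format!("array {} of ({})", n, describe(inner)),
    }
}

fn main() {
    // int * const * volatile a[2][4];
    let derived = [
        Derived::Pointer("const"),
        Derived::Pointer("volatile"),
        Derived::Array(2),
        Derived::Array(4),
    ];
    println!("{}", describe(&build_type(Type::Int, &derived)));
}
```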

Mixed type specifiers

Hello all,

Currently lang-c accepts the following C code:

struct S {}
int z;

while the code is not valid (the semicolon ; is missing after the struct declaration). I believe this is because the grammar accepts the declaration's specifiers as a list:

lang-c/grammar.rustpeg

Lines 447 to 453 in a76a36c

declaration0 -> Declaration =
gnu<K<"__extension__">>? _ s:list1<declaration_specifier> _ d:cs0<node<init_declarator>> _ ";" {
Declaration {
specifiers: s,
declarators: d,
}
}

So it puts both struct S {} and int into the list as type specifiers, and the invalid code is parsed as if it were:

struct S {} int z;

Similarly, the invalid declaration char int z; is also accepted.

Many thanks for any feedback.
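
A check of this kind could run after parsing, over the collected type specifiers. The sketch below is only an illustration with a hypothetical function name: it works on plain keyword strings rather than lang-c's AST, and its table is a partial list of the combinations C allows (it omits, for example, _Bool, _Complex, struct, union, enum, and typedef names).

```rust
// Partial table of valid type-specifier combinations. Order within a
// combination does not matter in C, so comparison is done on sorted lists.
const ALLOWED: &[&[&str]] = &[
    &["void"],
    &["char"],
    &["signed", "char"],
    &["unsigned", "char"],
    &["short"],
    &["short", "int"],
    &["unsigned", "short"],
    &["unsigned", "short", "int"],
    &["int"],
    &["signed"],
    &["signed", "int"],
    &["unsigned"],
    &["unsigned", "int"],
    &["long"],
    &["long", "int"],
    &["unsigned", "long"],
    &["unsigned", "long", "int"],
    &["long", "long"],
    &["long", "long", "int"],
    &["unsigned", "long", "long"],
    &["unsigned", "long", "long", "int"],
    &["float"],
    &["double"],
    &["long", "double"],
];

/// Hypothetical validator: true if the specifiers form one of the
/// combinations in ALLOWED, in any order.
fn is_valid_type_specifier_set(specs: &[&str]) -> bool {
    let mut given: Vec<&str> = specs.to_vec();
    given.sort_unstable();
    ALLOWED.iter().any(|combo| {
        let mut expected = combo.to_vec();
        expected.sort_unstable();
        expected == given
    })
}

fn main() {
    println!("unsigned int: {}", is_valid_type_specifier_set(&["unsigned", "int"]));
    println!("char int: {}", is_valid_type_specifier_set(&["char", "int"]));
}
```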

Syntax Error using MinGW/GCC

With the following program:

use std::path::Path;
use lang_c::driver as c;

fn lint_file<P: AsRef<Path>>(config: &c::Config, file: P) {
    let result = c::parse(config, file);
        
    match result {
        Ok(parsed) => {
            println!("SUCCESS");
            println!("{:#?}", parsed.source);
        }

        Err(c::Error::PreprocessorError(error)) => {
            println!("PREPROCESSOR");
            println!("{:#?}", error.into_inner().unwrap());
        }

        Err(c::Error::SyntaxError(error)) => {
            println!("SYNTAX");
            println!("{}", error.source);
            println!("{:#?}", error);
        }
    }
}

fn main() {
    let config = c::Config::with_gcc();
    lint_file(&config, "main.c");
}

And a minimal main.c file:

int main()
{
    return 0;
}

Prints this output:

SYNTAX
# 1 "main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "main.c"
int main()
{
    return 0;
}

SyntaxError {
    source: "# 1 \"main.c\"\r\n# 1 \"<built-in>\"\r\n# 1 \"<command-line>\"\r\n# 1 \"main.c\"\r\nint main()\r\n{\r\n    return 0;\r\n}\r\n",
    line: 5,
    column: 11,
    offset: 78,
    expected: {
        ";",
        "asm",
        "<typedef_name>",
        "{",
        "=",
        "[",
        ",",
        "(",
    },
}

I am using:

> gcc --version
gcc.exe (MinGW.org GCC-8.2.0-3) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS 

On Windows 10.

Am I not using this correctly?
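
The reported position (line 5, column 11, offset 78) is the \r that follows int main() in the source string, which suggests the CRLF line endings are involved. As a possible workaround, and assuming that is indeed the cause, the preprocessed text could be normalized before it is handed to the parser. The helper name below is hypothetical and the sketch uses only the standard library.

```rust
/// Hypothetical helper: converts Windows line endings to Unix ones.
fn normalize_newlines(source: &str) -> String {
    source.replace("\r\n", "\n")
}

fn main() {
    let preprocessed = "# 1 \"main.c\"\r\nint main()\r\n{\r\n    return 0;\r\n}\r\n";
    let normalized = normalize_newlines(preprocessed);
    // No carriage returns remain after normalization.
    println!("contains CR: {}", normalized.contains('\r'));
}
```

Note that this changes byte offsets relative to the original preprocessor output, so any reported positions refer to the normalized text.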

Cannot parse macOS SDK <stdlib.h>

I'm not sure if this is in the scope of this project, but it seems <stdlib.h> cannot be parsed on macOS.

syntax error: unexpected token at "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/mach/arm/_structs.h" line 498 column 2, expected '<typedef_name>', '}'
  included from tests/thing.h:6
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdlib.h:66
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/wait.h:109
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/signal.h:146
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/machine/_mcontext.h:34
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/arm/_mcontext.h:36
  included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/mach/machine/_structs.h:35

From looking at the error, it looks like it gets stuck on the __uint128_t and __uint32_t built-in types of the following definition:

struct __darwin_arm_neon_state64
{
 __uint128_t __v[32];
 __uint32_t __fpsr;
 __uint32_t __fpcr;
};

I tried specifying the flavor, but that does not seem to help.

Consider keeping a line number or something in the Span object

I've started a crummy linter. Not much good yet; it's just got the one checker so far and that is the you're-actually-using-the-variable-before-it's-even-been-initialized checker. But it's finding problems where I put them, so.

The thing is I'd like to report to my end user something like "I found an issue on line 57". But information about line numbers is actually discarded by the time I figure out there's even a problem to report. I have reference to a Span object though, which so far as I can figure out, lets me know the (preprocessed) character count. I wonder if we can add the line number in there.
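
Until something like that exists, a line number can be recomputed from the offset. This is a minimal sketch, assuming the span start is a byte offset into the preprocessed source text; line_col is a hypothetical helper, and the result refers to lines of the preprocessed text, not of the original file.

```rust
/// Hypothetical helper: 1-based (line, column) of a byte offset.
/// Columns are counted in bytes. Returns None if the offset is out of range.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() {
        return None;
    }
    let before = &source.as_bytes()[..offset];
    // Lines are numbered from 1, so add one to the newline count.
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;
    // The current line starts just after the last newline, or at 0.
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |p| p + 1);
    Some((line, offset - line_start + 1))
}

fn main() {
    let source = "int a;\nint b;\n";
    let offset = source.find('b').unwrap();
    println!("{:?}", line_col(source, offset));
}
```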

Using the AST

It's very nice to have a fully compliant C11 parser, but seems like the AST is fairly hard to search or manipulate at present. One possibility would be to match the visit, visit_*_mut, and fold functionality as exposed in the syn crate for manipulating Rust ASTs. Do you have direct plans? Or else a path that you like and would accept if submitted? Or else would you recommend providing such functionality in a separate crate?

Typo in `FloatBase` and `IntegerBase` enum

In src/ast.rs

#[derive(Debug, PartialEq, Clone)]
pub enum IntegerBase {
    Decimal,
    Octal,
    Hexademical,
}

// ..

#[derive(Debug, PartialEq, Clone)]
pub enum FloatBase {
    Decimal,
    Hexademical,
}

I think that in both enums, Hexademical should be changed to Hexadecimal, unless the spelling is intentional (for example, to prevent some confusion).

P.S. I am enjoying using your library; my colleagues and I are building an educational C compiler written in Rust.

Fails on macos availability

When parsing

int ptsname_r(int fildes, char *buffer, size_t buflen) __attribute__((availability(macos,introduced=10.13.4))) __attribute__((availability(ios,introduced=11.3))) __attribute__((availability(tvos,introduced=11.3))) __attribute__((availability(watchos,introduced=4.3)));

it chokes on the 10.13.4 in the availability attribute.

For robustness, it might be good to have a "raw" attribute: if an __attribute__ declaration fails to parse, just store the attribute string.
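
The "raw" fallback could be as simple as capturing the balanced parenthesized text. The following is a rough sketch with a hypothetical function name, not part of lang-c. It assumes the input starts at the __attribute__ keyword and does not account for parentheses inside string literals.

```rust
/// Hypothetical fallback: returns the balanced parenthesized argument text
/// that follows `__attribute__`, without interpreting it.
fn raw_attribute(input: &str) -> Option<&str> {
    let body = input.strip_prefix("__attribute__")?.trim_start();
    if !body.starts_with('(') {
        return None;
    }
    let mut depth = 0usize;
    for (i, c) in body.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                // Cannot underflow: the first character is '(' and the
                // function returns as soon as depth reaches zero.
                depth -= 1;
                if depth == 0 {
                    return Some(&body[..=i]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced parentheses
}

fn main() {
    let decl = "__attribute__((availability(macos,introduced=10.13.4))) ;";
    println!("{:?}", raw_attribute(decl));
}
```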

Readme example doesn't compile on 1.30.1 stable

When trying to compile the example in the readme:
println!("{:?}", parse(&config, "example.c"));

I'm getting the following error:
the size for values of type 'str' cannot be known at compilation time

This is resolved when passing a String reference instead of an str:
println!("{:?}", parse(&config, &"example.c".to_string()));
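
This error is typical of a generic parameter that is taken by reference while carrying the implicit Sized bound. The sketch below reproduces the situation with hypothetical stand-in functions; it assumes the parse signature at the time took source: &P, which the report does not confirm.

```rust
use std::path::Path;

// Hypothetical stand-in with the implicit `Sized` bound on P.
// Calling it with "example.c" would infer P = str and fail to compile.
fn path_text_sized<P: AsRef<Path>>(source: &P) -> String {
    source.as_ref().display().to_string()
}

// Relaxing the bound with `?Sized` lets P be `str`, so a plain string
// literal is accepted.
fn path_text_unsized<P: AsRef<Path> + ?Sized>(source: &P) -> String {
    source.as_ref().display().to_string()
}

fn main() {
    // Works: P = String, which is Sized.
    println!("{}", path_text_sized(&"example.c".to_string()));
    // Works: P = str, allowed by `?Sized`.
    println!("{}", path_text_unsized("example.c"));
}
```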

C Generator Feature

I've really enjoyed using this library so far! However, I am wondering if there is any desire to write a function or suite of functions to generate C code given an AST (i.e. the opposite of parse). Such a function would be nice to have, and wouldn't be too hard to write since most of the heavy lifting is already done via the Visit trait.
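
To give a rough idea of the shape such a generator might take, here is a toy sketch over a hypothetical miniature AST. The Expr and Stmt types are invented for illustration and are far smaller than lang-c's AST; a real implementation would presumably walk the crate's own node types.

```rust
// Hypothetical miniature AST, not lang-c's.
enum Expr {
    Int(i64),
    Ident(String),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

enum Stmt {
    Return(Expr),
}

fn emit_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Ident(name) => name.clone(),
        // Always parenthesize so the output never depends on precedence.
        Expr::Binary(l, op, r) => format!("({} {} {})", emit_expr(l), op, emit_expr(r)),
    }
}

fn emit_stmt(s: &Stmt) -> String {
    match s {
        Stmt::Return(e) => format!("return {};", emit_expr(e)),
    }
}

fn main() {
    let stmt = Stmt::Return(Expr::Binary(
        Box::new(Expr::Ident("x".to_string())),
        "+",
        Box::new(Expr::Int(1)),
    ));
    println!("{}", emit_stmt(&stmt));
}
```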

K&R support seems broken

The following simple program using K&R function definitions results in an error using lang-c. It compiles fine with gcc however.

extern int puts(char *);

int main(argc, argv)
int argc;
char **argv;
{
        puts("Hello!");

        return 0;
}

Rust test program:

/* Test program to show bug in K&R parsing.
 * 
 * Input program:
 *
 * extern int puts(char *);
 *
 * int main(argc, argv)
 * int argc;
 * char **argv;
 * {
 *      puts("Hello!");
 *
 *      return 0;
 * }
 *
 * Output:
 * Err(SyntaxError(SyntaxError { source: "# 1 \"kr.c\"\n# 1 \"<built-in>\"\n# 1 \"<command-line>\"\n# 31 \"<command-line>\"\n# 1 \"/usr/include/stdc-predef.h\" 1 3 4\n# 32 \"<command-line>\" 2\n# 1 \"kr.c\"\nextern int puts(char *);\n\nint main(argc, argv)\nint argc;\nchar **argv;\n{\n puts(\"Hello!\");\n\n return 0;\n}\n", line: 10, column: 14, offset: 184, expected: {"[_a-zA-Z]", "[_a-zA-Z0-9]", ")"} }))
 *
 */

extern crate lang_c;
use lang_c::driver::{Config, parse}; 

fn main() {
    let config = Config::default();
    println!("{:?}", parse(&config, "kr.c"));
}

Output:

Err(SyntaxError(SyntaxError { source: "# 1 \"kr.c\"\n# 1 \"<built-in>\"\n# 1 \"<command-line>\"\n# 31 \"<command-line>\"\n# 1 \"/usr/include/stdc-predef.h\" 1 3 4\n# 32 \"<command-line>\" 2\n# 1 \"kr.c\"\nextern int puts(char *);\n\nint main(argc, argv)\nint argc;\nchar **argv;\n{\n puts(\"Hello!\");\n\n return 0;\n}\n", line: 10, column: 14, offset: 184, expected: {"[_a-zA-Z]", "[_a-zA-Z0-9]", ")"} }))

Chained LogicalAnds are parsed incorrectly

Hi - thanks a lot for this crate, it is really great! I did run into one bug: it looks like chained && operators are not parsed correctly. An example:

extern crate lang_c;

use lang_c::driver::{Config, parse_preprocessed};

fn main() {
    let parse = parse_preprocessed(&Config::default(), "
int foo(void) {
    return 1 && 2 && 3;
}".to_string()).unwrap();
    println!("{:?}", parse);
}

Trimming the output down substantially and to just the return, the resulting parsed ast is:

Return(Some(BinaryOperator(BinaryOperatorExpression {
  operator: LogicalAnd,
  lhs: Constant(Integer(Integer { base: Decimal, number: "1" })),
  rhs: BinaryOperator(BinaryOperatorExpression {
    operator: BitwiseAnd,
    lhs: Constant(Integer(Integer { base: Decimal, number: "2" })),
    rhs: UnaryOperator(UnaryOperatorExpression {
      operator: Address,
      operand: Constant(Integer(Integer { base: Decimal, number: "3" })),
    }),
  }),
})))

The second && is being broken up into a bitwise-and followed by an address-of, as though the input were:

return (1 && (2 & (&3)));

but I believe it should be parsed as though it were

return (1 && 2) && 3;

Adding parens around either 1 && 2 or 2 && 3 does not have this problem. I was also not able to recreate it with other infix operators, although I'm sure I didn't try all of them.
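
For reference, the expected left-associative grouping can be illustrated with a tiny standalone sketch. This is not the crate's grammar: the Expr type and function are hypothetical, and the input is limited to integer literals joined by &&, with the whole && token matched before any single & is considered.

```rust
// Hypothetical miniature expression type for illustration only.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    And(Box<Expr>, Box<Expr>),
}

/// Parses "a && b && c ..." into a left-nested tree: ((a && b) && c).
/// Returns None if any operand is not an integer literal.
fn parse_and_chain(input: &str) -> Option<Expr> {
    // Splitting on the two-character token keeps "&&" from being read
    // as two separate '&' characters.
    let mut operands = input
        .split("&&")
        .map(|s| s.trim().parse::<i64>().ok().map(Expr::Num));
    let mut acc = operands.next()??;
    for operand in operands {
        acc = Expr::And(Box::new(acc), Box::new(operand?));
    }
    Some(acc)
}

fn main() {
    println!("{:?}", parse_and_chain("1 && 2 && 3"));
}
```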

Typedef of function pointer with qualified return type

test.c:

typedef const char* (*fnPtr) ();

Output:

# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
typedef const char* (*fnPtr) ();

SyntaxError {
    source: "# 1 \"test.c\"\r\n# 1 \"<built-in>\"\r\n# 1 \"<command-line>\"\r\n# 1 \"test.c\"\r\ntypedef const char* (*fnPtr) ();\r\n",
    line: 5,
    column: 9,
    offset: 76,
    expected: {
        "<typedef_name>",
    },
}

Wide character literals produce syntax errors

Example code:

int foo(void) {
    return L'a';
}

I get the following:

syntax error: unexpected token at "1.c" line 2 column 13, expected '!=', '"', '%', '%=', '&', '&&', '&=', '(', '*', '*=', '+', '++', '+=', ',', '-', '--', '-=', '->', '.', '/', '/=', ';', '<', '<<', '<<=', '<=', '=', '==', '>', '>=', '>>', '>>=', '?', '[', '[_a-zA-Z0-9]', '^', '^=', '|', '|=', '||'

Convert span offsets to those in the original (pre-preprocessing) files

Is it possible to convert a Span from this library to a (Filename, Span) pair in the original files on disk? I'm trying to show some information about the C code using the ariadne crate. I'm currently using the preprocessed text from the source field of Parse, but I would like to do that conversion so that the spans show the correct line numbers and file names.
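
One possible approach is to replay the preprocessor's line markers (lines of the form # 5 "file.c") up to the offset in question. The sketch below uses only the standard library, with hypothetical function names. It assumes GCC-style markers and file names without escaped quotes, and it returns a file name and line number rather than a full span.

```rust
/// Parses a GCC-style line marker such as `# 12 "foo.h" 1 3`.
fn parse_marker(text: &str) -> Option<(usize, String)> {
    let rest = text.strip_prefix("# ")?;
    let (number, tail) = rest.split_once(' ')?;
    let line = number.parse().ok()?;
    let tail = tail.strip_prefix('"')?;
    let close = tail.find('"')?; // escaped quotes are not handled
    Some((line, tail[..close].to_string()))
}

/// Hypothetical helper: maps a byte offset in the preprocessed source to
/// (file name, line) in the original files. Returns None if the offset is
/// out of range or falls on a marker line.
fn locate(source: &str, offset: usize) -> Option<(String, usize)> {
    let mut file = String::new();
    let mut line = 1usize;
    let mut pos = 0usize;
    for raw in source.split_inclusive('\n') {
        let end = pos + raw.len();
        let text = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
        if let Some((n, name)) = parse_marker(text) {
            // The marker says: the next line is line `n` of `name`.
            file = name;
            line = n;
        } else {
            if offset >= pos && offset < end {
                return Some((file, line));
            }
            line += 1;
        }
        pos = end;
    }
    None
}

fn main() {
    let source = "# 1 \"main.c\"\n# 1 \"<built-in>\"\n# 1 \"main.c\"\nint main()\n{\n    return 0;\n}\n";
    let offset = source.find("return").unwrap();
    println!("{:?}", locate(source, offset));
}
```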
