
C parser and interpreter written in Python with automatic ctypes interface generation

License: BSD 2-Clause "Simplified" License

Python 99.19% C 0.81%

pycparser's Introduction

PyCParser

https://github.com/albertz/PyCParser

A C parser and interpreter written in Python. Also includes an automatic ctypes interface generator.

It is looser than the C grammar, i.e. it should support a superset of the C language in general.

Some of the support may be a bit incomplete or wrong at this point, because I didn't strictly follow the language specs but rather improved the parser by iterating on real-world source code.

Similar projects

Parsers / ctypes interface generators:

  • Eli Bendersky's pycparser. Complete C99 parser in pure Python. It depends on Python Lex-Yacc (PLY). (I haven't tested it yet. Seems to be the most complete and most professional project. If you don't want a C interpreter, this is probably the project you should use.)
  • pyclibrary (Github fork). Is quite slow and didn't work that well for me.
  • ctypesgen. Also uses Lex+Yacc.
  • codegen. Uses GCC-XML. See below about the disadvantages of such an approach.

Interpreters:

  • CInterpreter. Python.
  • CINT. Not in Python. Probably the most famous one.
  • Ch. Not in Python. Is not really free.
  • ups debugger. Not in Python.
  • PicoC. Not in Python. "A very small C interpreter."
  • BIC. Not in Python.

Why this project?

  • Be more flexible. With a hand-written parser it is much easier to operate on specific stages of the parsing pipeline.
  • I wanted to have some self-contained code which can also easily run on the end-user side. So the end-user can just update the lib and its headers and then some application using this Python lib will automatically use the updated lib. This is not possible if you generated the ctypes interface statically (via some GCC-XML based tool or so).
  • I wanted to implement PySDL and didn't want to translate the SDL headers by hand. Also, I didn't want to use existing tools for this, to avoid further maintenance work at some later time. See the project for further info.
  • This functionality could be used similarly for many other C libraries.
  • A challenge for myself. Just for fun. :)

Examples

  • PySDL. Also uses the automatic ctypes wrapper and maps it to a Python module.
  • PyCPython. Interpret CPython in Python.
  • PyLua. Interpret Lua in Python.

Also see the tests/test_interpreter.{c,py} 'Hello world' example.

Also try out ./demos/interactive_interpreter.py --debug.

Current state

  • Many simple C programs should be parsed and interpreted correctly now.
  • I'm quite sure that function pointer typedefs are handled incorrectly. E.g. typedef void f(); and typedef void (*f)(); are just the same right now. See cpre3_parse_typedef and do some testing if you want to fix this.
  • Many functions from the standard C library are still missing.
  • There might be some bugs. :)
  • C++ isn't supported yet. :)
  • The code style does not conform to PEP8 and standard Python conventions in many places, as it is quite old. Also, it probably should be restructured, as it has grown too much in single files. I'm slowly fixing this.

How does the interpreter work

This is probably a bit unusual. We wrap the most important standard C library functions directly to the native libc, via ctypes. We translate the parsed C code to an equivalent Python AST (via ast), which makes heavy use of ctypes. Then we just run this generated Python code. But we can also dump it. Thus we can compile C code to an equivalent Python program.
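The translate-then-run idea can be illustrated with a minimal sketch. It does not use PyCParser's own API; the function build_add_function and the hand-built tree are assumptions made for illustration only. It builds the Python AST that a translator might emit for the C function `int add(int a, int b) { return a + b; }`, runs it, and dumps it back to Python source (ast.unparse needs Python 3.9 or later).

```python
import ast
import ctypes


def build_add_function():
    """Hand-build a Python AST for: int add(int a, int b) { return a + b; }

    Hypothetical output of a C-to-Python translator, not PyCParser's real output.
    """
    def value_of(name):
        # <name>.value reads the Python int stored inside a ctypes.c_int
        return ast.Attribute(
            value=ast.Name(id=name, ctx=ast.Load()), attr="value", ctx=ast.Load())

    # ctypes.c_int(a.value + b.value) keeps the result in a C int
    c_int = ast.Attribute(
        value=ast.Name(id="ctypes", ctx=ast.Load()), attr="c_int", ctx=ast.Load())
    return_stmt = ast.Return(value=ast.Call(
        func=c_int,
        args=[ast.BinOp(left=value_of("a"), op=ast.Add(), right=value_of("b"))],
        keywords=[]))
    func = ast.FunctionDef(
        name="add",
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg="a"), ast.arg(arg="b")],
            kwonlyargs=[], kw_defaults=[], defaults=[]),
        body=[return_stmt],
        decorator_list=[])
    module = ast.Module(body=[func], type_ignores=[])
    return ast.fix_missing_locations(module)


module = build_add_function()

# Run the generated code: compile the AST and execute it in a fresh namespace.
namespace = {"ctypes": ctypes}
exec(compile(module, "<c-to-python>", "exec"), namespace)
result = namespace["add"](ctypes.c_int(2), ctypes.c_int(3))

# Dump the generated code: the same AST rendered as Python source text.
source = ast.unparse(module)
```

Here result is a ctypes.c_int holding the sum, and source holds the generated function as text, which is the "compile C to a Python program" step in miniature.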


--- Albert Zeyer, http://www.az2000.de

pycparser's People

Contributors

albertz, antibluequirk, dsblank, kb-1000, markjenkins


pycparser's Issues

octal and hex escapes in char and string declarations require further support

Octal (\NNN) and hex escape (\xNN) sequences in char and string declarations require further support.

Test C code:

#include <stdio.h>

#define NUM_TEST_SINGLE_CHAR 19
char test_single_char[NUM_TEST_SINGLE_CHAR] = {
  '\0',  // octal
  '\00', // octal
  '\000',// octal
  '\x0', // hex
  '\x00',// hex
  
  '\1',  // octal
  '\01', // octal
  '\001',// octal
  '\x1', // hex
  '\x01',// hex

  '\t',
  '\11', // octal
  '\011',// octal
  '\x9', // hex
  '\x09',// hex

  'z',
  '\172',// octal
  '\x7a',// hex
  '\x7A',// hex
};


#define NUM_TEST_STR_SINGLE_CHAR 19
char * test_str_single_char[NUM_TEST_STR_SINGLE_CHAR] = {
  "\0",  // octal
  "\00", // octal
  "\000",// octal
  "\x0", // hex
  "\x00",// hex
  
  "\1",  // octal
  "\01", // octal
  "\001",// octal
  "\x1", // hex
  "\x01",// hex

  "\t",
  "\11", // octal
  "\011",// octal
  "\x9", // hex
  "\x09",// hex

  "z",
  "\172",// octal
  "\x7a",// hex
  "\x7A",// hex
};

#define NUM_TEST_STR_MULTI_CHAR 13
char * test_str_multi_char[] = {
  "Tab\tGap",
  "Tab\11Gap",       // octal
  "Tab\011Gap",      // octal
  "Tab\x9Gap",       // hex, barely works as G is not a hex character
  "Tab\x9" "Gap",    // hex, but more clear and less scary
  "Tab\x09Gap", // hex, two characters
  "Tab\x9" "animals", // hex, restarting the quoting is a necessity
  "zzzAre sleepy time",
  "z\x7a\x7A" "Are sleepy time", // necessary restarting of quoting
  "z z z Are sleepy time",
  "\172 \x7a \x7A Are sleepy time", // no restart of quoting required
  "time for zzzz",
  "time for \x7a\172\x7Az",
};

int main(){
  int i=0;
  for (i=0; i<NUM_TEST_SINGLE_CHAR; i++){
    // print single in quotes with trailing new line
    // \x for hex and %.2x to print in 2 digit hex format
    printf("'\\x%.2x'\n", test_single_char[i]);
  }
  printf("\n");
  for (i=0; i<NUM_TEST_STR_SINGLE_CHAR; i++){
    // print in double quotes with trailing new line
    // \x for hex and %.2x to print in 2 digit hex format
    printf("\"\\x%.2x\"\n", *test_str_single_char[i]);
  }
  printf("\n");
  
  for (i=0; i<NUM_TEST_STR_MULTI_CHAR; i++){
    printf("%s\n", test_str_multi_char[i]);
  }
  
  return 0;
}

For hex escape sequences this leads to cparser.simple_escape_char being invoked by cpre2_parse() with 'x' as an argument. Hex escape sequences are not of the simple kind that simple_escape_char is designed for. Handling for '\0' and "\0" doesn't recognize that these particular sequences are octal escapes.

Additional states are required in cpre2_parse().
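The escape rules exercised by the test program can be sketched as a standalone decoder. This is not the code of cpre2_parse() or simple_escape_char; the name decode_c_escapes and its structure are assumptions for illustration. It takes the text between the quotes of a char or string literal and treats simple escapes, octal escapes (one to three octal digits) and hex escapes (every hex digit that follows \x) as three separate cases.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b",
    "f": "\f", "v": "\v", "\\": "\\", "'": "'", '"': '"', "?": "?",
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_c_escapes(body):
    """Decode the escapes in the text between the quotes of a C literal.

    Hypothetical helper, not part of PyCParser.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        ch = body[i]
        if ch in OCTAL_DIGITS:
            # Octal escape: one to three octal digits, so '\0' is octal too.
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(body[i:j], 8)))
            i = j
        elif ch == "x":
            # Hex escape: consumes every hex digit that follows, with no
            # length limit. That is why "Tab\x9" "animals" needs the
            # quoting restarted.
            j = i + 1
            while j < len(body) and body[j] in HEX_DIGITS:
                j += 1
            if j == i + 1:
                raise ValueError("\\x used with no following hex digits")
            out.append(chr(int(body[i + 1:j], 16)))
            i = j
        elif ch in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[ch])
            i += 1
        else:
            raise ValueError("unknown escape sequence: \\" + ch)
    return "".join(out)
```

For example, decode_c_escapes(r"Tab\x9Gap") stops the hex escape at G, while decode_c_escapes(r"Tab\x9animals") reads \x9a as a single character. The sketch does not check that a value fits into a C char; a real implementation would have to decide how to handle that.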

The output of the above should be:

'\x00'
'\x00'
'\x00'
'\x00'
'\x00'
'\x01'
'\x01'
'\x01'
'\x01'
'\x01'
'\x09'
'\x09'
'\x09'
'\x09'
'\x09'
'\x7a'
'\x7a'
'\x7a'
'\x7a'

"\x00"
"\x00"
"\x00"
"\x00"
"\x00"
"\x01"
"\x01"
"\x01"
"\x01"
"\x01"
"\x09"
"\x09"
"\x09"
"\x09"
"\x09"
"\x7a"
"\x7a"
"\x7a"
"\x7a"

Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	animals
zzzAre sleepy time
zzzAre sleepy time
z z z Are sleepy time
z z z Are sleepy time
time for zzzz
time for zzzz

I have some initial code to address the hex escapes in double-quoted strings. After this issue is opened I'll reference the issue number.

Can't run an example out of the box

Steps:

git clone https://github.com/albertz/PyCParser.git
cd PyCParser
mkvirtualenv -p python2.7 -r ./requirements.txt pycparser
./runcprog.py ./demos/test_interpreter.c

Result:

EXCEPTION
Traceback (most recent call last):
File "./runcprog.py", line 56, in
line: from cparser import State, parse
locals:
cparser =
State =
parse =
File "/home/roba/work/github/PyCParser/cparser.py", line 14, in
line: from .cparser_utils import unicode, long, unichr
locals:
from =
from.cparser_utils =
unicode = <type 'unicode'>
long = <type 'long'>
unichr =
ValueError: Attempted relative import in non-package

Is there something to do before trying to use the runcprog.py script?

Fedora patch, order of entries in system paths for unit testing

We needed this patch to package PyCParser for Fedora; you might want to include it:

HG changeset patch

User Scott Tsai [email protected]

Date 1358446261 -28800

Node ID 12aa73c5da595a08f587c14a74e84bf72f0bf7a0

Parent a46039840b0ed8466bebcddae9d4f1df60d3bc98

tests/all_tests.py: add local paths to the front of sys.path

While doing pycparser development on a machine that already has an
older version of pycparser installed, we want unit tests to run against
the local copy instead of the system wide copy of pycparser.
This patch adds '.' and '..' to the front of sys.path instead of the back.

diff --git a/tests/all_tests.py b/tests/all_tests.py
--- a/tests/all_tests.py
+++ b/tests/all_tests.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python

import sys
-sys.path.extend(['.', '..'])
+sys.path[0:0] = ['.', '..']

import unittest
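The effect of the one-line change can be shown on an ordinary list standing in for sys.path. This is a small sketch; the helper names and the example directory are made up for illustration. Python searches the path in order, so entries placed at the front win over an installed copy further back.

```python
def extend_back(search_path, local_dirs):
    """Old behaviour: local directories are searched last."""
    result = list(search_path)
    result.extend(local_dirs)
    return result


def insert_front(search_path, local_dirs):
    """Patched behaviour: slice assignment at [0:0] puts them first."""
    result = list(search_path)
    result[0:0] = local_dirs
    return result


# Hypothetical system-wide location that holds an older installed copy.
system_path = ["/usr/lib/python/site-packages"]

old_order = extend_back(system_path, [".", ".."])
new_order = insert_front(system_path, [".", ".."])
```

With old_order the system-wide directory is consulted first, so an installed copy would shadow the checkout; with new_order the local directories come first.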

Syntax Errors reported by pycparser but not gcc

The following code generates no warnings or errors when compiled with gcc with a large number of warning options, yet pycparser reports a syntax error. #includes omitted.

int main()
{
int i = 5, j = 6, k = 1;
if ((i=j && k == 1) || k > j)
printf("Hello, World\n");
return 0;
}
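For reference, standard C operator precedence groups the condition as (i = (j && (k == 1))) || (k > j), which is why gcc accepts it. The sketch below is a small precedence-climbing parser covering only the operators in this report. It is not PyCParser's parser; the table and function names are assumptions for illustration.

```python
import re

# token -> (precedence, right_associative); a higher number binds tighter.
BINARY_OPS = {
    "=": (1, True),
    "||": (2, False),
    "&&": (3, False),
    "==": (4, False),
    ">": (5, False),
}
TOKEN_RE = re.compile(r"\s*(\|\||&&|==|[=>()]|\w+)")


def tokenize(text):
    """Split the expression into operator, parenthesis and name/number tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_primary(tokens):
    """Parse a name, a number, or a parenthesised expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token == ")" or token in BINARY_OPS:
        raise SyntaxError("unexpected token %r" % token)
    return token


def parse_expression(tokens, min_prec=1):
    """Precedence climbing: returns nested (operator, left, right) tuples."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in BINARY_OPS:
        prec, right_assoc = BINARY_OPS[tokens[0]]
        if prec < min_prec:
            break
        op = tokens.pop(0)
        # A right-associative operator lets its right side reuse the same level.
        right = parse_expression(tokens, prec if right_assoc else prec + 1)
        left = (op, left, right)
    return left


tree = parse_expression(tokenize("(i=j && k == 1) || k > j"))
```

The resulting tree has || at the root, with the assignment to i as its left operand and the comparison k > j as its right operand.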

PEP 8

I'd rather not use tabs for indent, but four spaces.

Real compiler script

A real compiler script with input file options as well as options like -I, -D and -o would be great.
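As a rough idea of what the option handling for such a script could look like, here is a sketch using argparse. Nothing in it exists in PyCParser: the program name, the destination names and the helper parse_defines are hypothetical, and the step that would actually parse and translate the inputs is left out.

```python
import argparse


def build_arg_parser():
    """Option parser for a hypothetical cc-like front end."""
    parser = argparse.ArgumentParser(
        prog="pycc-sketch",
        description="Hypothetical cc-like front end (illustration only).")
    parser.add_argument("inputs", nargs="+", metavar="FILE",
                        help="C source files to process")
    parser.add_argument("-I", dest="include_dirs", action="append", default=[],
                        metavar="DIR", help="add DIR to the include search path")
    parser.add_argument("-D", dest="defines", action="append", default=[],
                        metavar="NAME[=VALUE]", help="predefine a macro")
    parser.add_argument("-o", dest="output", default=None, metavar="OUTPUT",
                        help="where to write the result")
    return parser


def parse_defines(defines):
    """Turn ['A', 'B=2'] into {'A': '1', 'B': '2'}, as C compilers usually do."""
    macros = {}
    for item in defines:
        name, sep, value = item.partition("=")
        macros[name] = value if sep else "1"
    return macros


args = build_arg_parser().parse_args(
    ["-I", "include", "-D", "DEBUG", "-D", "LEVEL=2", "-o", "out.py", "main.c"])
macros = parse_defines(args.defines)
```

The parsed values (args.inputs, args.include_dirs, macros, args.output) would then be handed to whatever does the parsing and code generation.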
