albertz / pycparser

C parser and interpreter written in Python with automatic ctypes interface generation

License: BSD 2-Clause "Simplified" License

Python 99.19% C 0.81%

pycparser python parsers interpreter c

pycparser's Introduction

PyCParser

https://github.com/albertz/PyCParser

A C parser and interpreter written in Python. Also includes an automatic ctypes interface generator.

It is looser than the C grammar, i.e. it should support a superset of the C language in general.

Some of the support may be a bit incomplete or wrong at this point because I didn't strictly follow the language specs but rather improved the parser by iterating on real-world source code.

Similar projects

Parsers / ctypes interface generators:

Eli Bendersky's pycparser. Complete C99 parser in pure Python. It depends on Python Lex-Yacc (PLY). (I haven't tested it yet. It seems to be the most complete and most professional project. If you don't want a C interpreter, this is probably the project you should use.)
pyclibrary (Github fork). It is quite slow and didn't work that well for me.
ctypesgen. Also uses Lex+Yacc.
codegen. Uses GCC-XML. See below about the disadvantages of such an approach.

Interpreters:

CInterpreter. Python.
CINT. Not in Python. Probably the most famous one.
Ch. Not in Python. Is not really free.
ups debugger. Not in Python.
PicoC. Not in Python. "A very small C interpreter."
BIC. Not in Python.

Why this project?

Be more flexible. With a hand-written parser, it is much easier to operate on specific levels of the parsing pipeline.
I wanted to have some self-contained code which can also easily run on the end-user side. So the end-user can just update the lib and its headers and then some application using this Python lib will automatically use the updated lib. This is not possible if you generate the ctypes interface statically (e.g. via some GCC-XML based tool).
I wanted to implement PySDL and didn't want to translate the SDL headers by hand. I also didn't want to use existing tools for this, to avoid further maintenance work later. See the project for further info.
This functionality could be used similarly for many other C libraries.
A challenge for myself. Just for fun. :)

Examples

PySDL. Also uses the automatic ctypes wrapper and maps it to a Python module.
PyCPython. Interpret CPython in Python.
PyLua. Interpret Lua in Python.

Also see the tests/test_interpreter.{c,py} 'Hello world' example.

Also try out ./demos/interactive_interpreter.py --debug.

Current state

Many simple C programs should be parsed and interpreted correctly now.
I'm quite sure that function pointer typedefs are handled incorrectly. E.g. typedef void f(); and typedef void (*f)(); are just the same right now. See cpre3_parse_typedef and do some testing if you want to fix this.
Many functions from the standard C library are still missing.
There might be some bugs. :)
C++ isn't supported yet. :)
The code style does not conform to PEP8 and standard Python conventions in many places, as it is quite old. Also, it probably should be restructured, as it has grown too much in single files. I'm slowly fixing this.

How does the interpreter work

This is probably a bit unusual. We wrap the most important standard C library functions directly to the native libc via ctypes. We translate the parsed C code to an equivalent Python AST (via ast), which makes heavy use of ctypes. Then we just run this generated Python code. But we can also dump it. Thus we can compile C code to an equivalent Python program.
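
The pipeline described above (build a Python AST that uses ctypes, then either run it or dump it) can be illustrated with a minimal sketch. This is not PyCParser's actual output: the C function `add`, the Python source standing in for its translation, and the `<translated C>` filename label are all assumptions made for illustration. Only the standard `ast` and `ctypes` modules are used.

```python
import ast
import ctypes

# Hypothetical C input (an assumption, not taken from PyCParser):
#     int add(int a, int b) { return a + b; }
# Hand-written Python standing in for what a translator might emit.
translated_source = (
    "def add(a, b):\n"
    "    return ctypes.c_int(a.value + b.value)\n"
)

# Build the Python AST for the translated function.
tree = ast.parse(translated_source, filename="<translated C>")

# Run it: compile the AST and execute it in a namespace that provides ctypes.
namespace = {"ctypes": ctypes}
exec(compile(tree, "<translated C>", "exec"), namespace)
result = namespace["add"](ctypes.c_int(2), ctypes.c_int(3))
print(result.value)  # 5

# Dump it: the same AST can be printed instead of executed.
dumped = ast.dump(tree)
print(dumped)
```

On Python 3.9 or newer, `ast.unparse(tree)` would turn the AST back into Python source, which is closer to the idea of compiling C code to a Python program.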

--- Albert Zeyer, http://www.az2000.de

pycparser's Issues

Can't run an example out of the box

Steps:

git clone https://github.com/albertz/PyCParser.git
cd PyCParser
mkvirtualenv -p python2.7 -r ./requirements.txt pycparser
./runcprog.py ./demos/test_interpreter.c

Result:

EXCEPTION
Traceback (most recent call last):
  File "./runcprog.py", line 56, in <module>
    line: from cparser import State, parse
    locals:
      cparser =
      State =
      parse =
  File "/home/roba/work/github/PyCParser/cparser.py", line 14, in <module>
    line: from .cparser_utils import unicode, long, unichr
    locals:
      from =
      from.cparser_utils =
      unicode = <type 'unicode'>
      long = <type 'long'>
      unichr =
ValueError: Attempted relative import in non-package

Is there something I need to do before using the runcprog.py script?

Real compiler script

A real compiler script with input file options as well as options like -I, -D and -o would be great.

Syntax Errors reported by pycparser but not gcc

The following code compiles with gcc without any warnings or errors, even with a large number of warning options enabled, yet pycparser reports a syntax error. #includes omitted.

int main()
{
    int i = 5, j = 6, k = 1;
    if ((i=j && k == 1) || k > j)
        printf("Hello, World\n");
    return 0;
}

octal and hex escapes in char and string declarations require further support

Octal (\NNN) and hex escape (\xNN) sequences in char and string declarations require further support.

Test C code:

#include <stdio.h>

#define NUM_TEST_SINGLE_CHAR 19
char test_single_char[NUM_TEST_SINGLE_CHAR] = {
  '\0',  // octal
  '\00', // octal
  '\000',// octal
  '\x0', // hex
  '\x00',// hex
  
  '\1',  // octal
  '\01', // octal
  '\001',// octal
  '\x1', // hex
  '\x01',// hex

  '\t',
  '\11', // octal
  '\011',// octal
  '\x9', // hex
  '\x09',// hex

  'z',
  '\172',// octal
  '\x7a',// hex
  '\x7A',// hex
};


#define NUM_TEST_STR_SINGLE_CHAR 19
char * test_str_single_char[NUM_TEST_STR_SINGLE_CHAR] = {
  "\0",  // octal
  "\00", // octal
  "\000",// octal
  "\x0", // hex
  "\x00",// hex
  
  "\1",  // octal
  "\01", // octal
  "\001",// octal
  "\x1", // hex
  "\x01",// hex

  "\t",
  "\11", // octal
  "\011",// octal
  "\x9", // hex
  "\x09",// hex

  "z",
  "\172",// octal
  "\x7a",// hex
  "\x7A",// hex
};

#define NUM_TEST_STR_MULTI_CHAR 13
char * test_str_multi_char[] = {
  "Tab\tGap",
  "Tab\11Gap",       // octal
  "Tab\011Gap",      // octal
  "Tab\x9Gap",       // hex, barely works as G is not a hex character
  "Tab\x9" "Gap",    // hex, but more clear and less scary
  "Tab\x09Gap", // hex, two characters
  "Tab\x9" "animals", // hex, restarting the quoting is a necessity
  "zzzAre sleepy time",
  "z\x7a\x7A" "Are sleepy time", // necessary restarting of quoting
  "z z z Are sleepy time",
  "\172 \x7a \x7A Are sleepy time", // no restart of quoting required
  "time for zzzz",
  "time for \x7a\172\x7Az",
};

int main(){
  int i=0;
  for (i=0; i<NUM_TEST_SINGLE_CHAR; i++){
    // print single in quotes with trailing new line
    // \x for hex and %.2x to print in 2 digit hex format
    printf("'\\x%.2x'\n", test_single_char[i]);
  }
  printf("\n");
  for (i=0; i<NUM_TEST_STR_SINGLE_CHAR; i++){
    // print in double quotes with trailing new line
    // \x for hex and %.2x to print in 2 digit hex format
    printf("\"\\x%.2x\"\n", *test_str_single_char[i]);
  }
  printf("\n");
  
  for (i=0; i<NUM_TEST_STR_MULTI_CHAR; i++){
    printf("%s\n", test_str_multi_char[i]);
  }
  
  return 0;
}

For hex escape sequences this leads to cparser.simple_escape_char being invoked by cpre2_parse() with 'x' as an argument. Hex escape sequences are not of the simple kind that simple_escape_char is designed for. Handling for '\0' and "\0" doesn't recognize that these particular sequences are octal escapes.

Additional states are required in cpre2_parse().

The output of the above should be:

'\x00'
'\x00'
'\x00'
'\x00'
'\x00'
'\x01'
'\x01'
'\x01'
'\x01'
'\x01'
'\x09'
'\x09'
'\x09'
'\x09'
'\x09'
'\x7a'
'\x7a'
'\x7a'
'\x7a'

"\x00"
"\x00"
"\x00"
"\x00"
"\x00"
"\x01"
"\x01"
"\x01"
"\x01"
"\x01"
"\x09"
"\x09"
"\x09"
"\x09"
"\x09"
"\x7a"
"\x7a"
"\x7a"
"\x7a"

Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	animals
zzzAre sleepy time
zzzAre sleepy time
z z z Are sleepy time
z z z Are sleepy time
time for zzzz
time for zzzz

I have some initial code to address the hex escapes in double-quoted strings. After this issue is opened I'll reference the issue number.
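
The escape handling this issue asks for can be sketched as a standalone function, rather than as new states inside cpre2_parse(). This is only an illustration: the names `decode_c_escapes` and `SIMPLE_ESCAPES` are hypothetical and not part of cparser, and masking values to one byte is a simplification. Octal escapes take one to three octal digits, while hex escapes take as many hex digits as follow the `x`, which is why some of the test strings above have to restart the quoting.

```python
# Hypothetical helper, not part of cparser: decode the escapes in the body
# of a C character or string literal (the text between the quotes).
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", "'": "'", '"': '"', "?": "?",
}
OCT_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_c_escapes(body):
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        ch = body[i]
        if ch in OCT_DIGITS:
            # Octal escape: one to three octal digits.
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCT_DIGITS:
                j += 1
            out.append(chr(int(body[i:j], 8) & 0xFF))  # masked to one byte (simplification)
            i = j
        elif ch == "x":
            # Hex escape: greedy, consumes every hex digit that follows.
            j = i + 1
            while j < len(body) and body[j] in HEX_DIGITS:
                j += 1
            if j == i + 1:
                raise ValueError("\\x needs at least one hex digit")
            out.append(chr(int(body[i + 1:j], 16) & 0xFF))  # masked to one byte (simplification)
            i = j
        elif ch in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[ch])
            i += 1
        else:
            raise ValueError("unknown escape: \\" + ch)
    return "".join(out)


print(repr(decode_c_escapes(r"Tab\x9Gap")))      # 'Tab\tGap'
print(repr(decode_c_escapes(r"Tab\x9animals")))  # \x9a is read as one escape
```

The second call shows why `"Tab\x9" "animals"` needs the quoting restarted: without the split, the `a` of `animals` is consumed as a hex digit.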

Fedora patch, order of entries in system paths for unit testing

We needed this patch to package PyCParser for Fedora; you might want to include it:

# HG changeset patch
# User Scott Tsai [email protected]
# Date 1358446261 -28800
# Node ID 12aa73c5da595a08f587c14a74e84bf72f0bf7a0
# Parent a46039840b0ed8466bebcddae9d4f1df60d3bc98

tests/all_tests.py: add local paths to the front of sys.path

While doing pycparser development on a machine that already has an
older version of pycparser installed, we want unit tests to run against
the local copy instead of the system wide copy of pycparser.
This patch adds '.' and '..' to the front of sys.path instead of the back.

diff --git a/tests/all_tests.py b/tests/all_tests.py
--- a/tests/all_tests.py
+++ b/tests/all_tests.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python

import sys
-sys.path.extend(['.', '..'])
+sys.path[0:0] = ['.', '..']

import unittest
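
The effect of the one-line change in the patch can be shown on an ordinary list, without touching the real sys.path. This is a small sketch: the function names and the example system entry are made up for illustration. Python searches the path list from front to back, so entries inserted at the front win over an installed copy, while entries appended at the back do not.

```python
def with_local_paths_first(search_path, local_paths):
    """Return a copy of search_path with local_paths inserted at the front."""
    result = list(search_path)
    result[0:0] = local_paths  # slice assignment inserts without replacing anything
    return result


def with_local_paths_last(search_path, local_paths):
    """Return a copy of search_path with local_paths appended at the back."""
    result = list(search_path)
    result.extend(local_paths)
    return result


system_path = ["/usr/lib/python/site-packages"]  # made-up example entry
print(with_local_paths_first(system_path, [".", ".."]))
print(with_local_paths_last(system_path, [".", ".."]))
```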

My failed try with py27 on win7 64bits

I am not familiar with git; I only know how to use the GitHub web interface,

so please read it at https://github.com/retsyo/albertz_PyCParser_Windows

and test_interpreter.py is in fact in the demos directory, as in your original one

struct XYZ
{
    Number Number[10];
}xyz;

PyCParser is not able to parse the 2nd structure; it complains about Number[10]. The above is a valid C declaration.