lawrancej / compilerkit
Compiler construction library in C.
License: GNU Lesser General Public License v2.1
Currently, examples/visitor-demo.c shows a string representation for regular expressions. It'd be nice to have another way of representing regular expressions, as a tree.
Add a file to the examples folder with a function called regex_graphviz_printer. It should output a .dot file that shows the tree structure of the regex.
Update generate.sh, too.
Write a function that takes in a regular expression and a character, and produces a new regular expression, called the derivative, as described in http://matt.might.net/articles/parsing-with-derivatives/ and http://matt.might.net/papers/might2011derivatives.pdf. The function will use the visitor class (see examples/visitor-demo.c for an idea).
GObject *compilerkit_derivative_parser(GObject *regex, gunichar symbol)
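As a rough illustration of the derivative idea, here is a minimal sketch in plain C. It does not use the CompilerKit GObject classes or the visitor class: the `Regex` struct and the `node`, `nullable`, `derivative` and `matches` names are hypothetical stand-ins, and nodes are deliberately never freed or simplified to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { EMPTY_SET, EMPTY_STRING, SYMBOL, CONCAT, ALT, STAR } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;        /* used by SYMBOL only */
    const struct Regex *left;   /* CONCAT, ALT, and the operand of STAR */
    const struct Regex *right;  /* CONCAT and ALT only */
} Regex;

/* Allocate a node; this sketch never frees nodes. */
const Regex *node(Kind kind, unsigned int symbol,
                  const Regex *left, const Regex *right)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = kind;
    re->symbol = symbol;
    re->left = left;
    re->right = right;
    return re;
}

/* True if the regex can match the empty string. */
bool nullable(const Regex *re)
{
    switch (re->kind) {
    case EMPTY_STRING:
    case STAR:   return true;
    case CONCAT: return nullable(re->left) && nullable(re->right);
    case ALT:    return nullable(re->left) || nullable(re->right);
    default:     return false;  /* EMPTY_SET, SYMBOL */
    }
}

/* Derivative of re with respect to the symbol c. */
const Regex *derivative(const Regex *re, unsigned int c)
{
    switch (re->kind) {
    case SYMBOL:
        return node(re->symbol == c ? EMPTY_STRING : EMPTY_SET, 0, NULL, NULL);
    case CONCAT: {
        /* D(ab) = D(a)b, plus D(b) when a can match the empty string. */
        const Regex *first = node(CONCAT, 0, derivative(re->left, c), re->right);
        if (!nullable(re->left))
            return first;
        return node(ALT, 0, first, derivative(re->right, c));
    }
    case ALT:
        return node(ALT, 0, derivative(re->left, c), derivative(re->right, c));
    case STAR:
        /* D(a*) = D(a)a* */
        return node(CONCAT, 0, derivative(re->left, c), re);
    default:
        return node(EMPTY_SET, 0, NULL, NULL);  /* EMPTY_SET, EMPTY_STRING */
    }
}

/* Match by taking one derivative per input character. */
bool matches(const Regex *re, const char *input)
{
    for (; *input != '\0'; input++)
        re = derivative(re, (unsigned char)*input);
    return nullable(re);
}
```

A regex matches a string exactly when the result of deriving by each character in turn is nullable, which is what `matches` checks. A real implementation would likely simplify the intermediate results so the tree does not grow with every input character.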
Once the grammar classes are ready, it'd be nice to have an implementation of the Sequitur algorithm, which infers a context-free grammar from examples. http://sequitur.info/
compilerkit_concatenation_new(compilerkit_symbol_new('a'), compilerkit_concatenation_new(compilerkit_symbol_new('b'), compilerkit_symbol_new('c')))
compilerkit_string_concatenation_new("abc");
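One way such a string helper could build the same right-nested concatenation is sketched below in plain C. The `Regex` struct, `regex_new` and `string_concatenation_new` are hypothetical stand-ins for the GObject classes, not the library's actual API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { EMPTY_STRING, SYMBOL, CONCAT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;          /* used by SYMBOL only */
    struct Regex *left, *right;   /* used by CONCAT only */
} Regex;

Regex *regex_new(Kind kind, unsigned int symbol, Regex *left, Regex *right)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = kind;
    re->symbol = symbol;
    re->left = left;
    re->right = right;
    return re;
}

/* Build concat(a, concat(b, c)) from "abc", working from the last
 * character backwards. An empty string yields the EmptyString node. */
Regex *string_concatenation_new(const char *text)
{
    size_t len = strlen(text);
    if (len == 0)
        return regex_new(EMPTY_STRING, 0, NULL, NULL);

    Regex *result = regex_new(SYMBOL, (unsigned char)text[len - 1], NULL, NULL);
    for (size_t i = len - 1; i-- > 0; ) {
        Regex *symbol = regex_new(SYMBOL, (unsigned char)text[i], NULL, NULL);
        result = regex_new(CONCAT, 0, symbol, result);
    }
    return result;
}
```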
Write a visitor to convert regular expressions into equivalent regular grammars.
Kay-Vuongs-MacBook-Pro:build KBVuong1$ cmake ..
CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles". CMAKE_MAKE_PROGRAM is not set. You probably need to select a different build tool.
CMake Error: Error required internal CMake variable not set, cmake may be not be built correctly.
Missing variable is:
CMAKE_C_COMPILER_ENV_VAR
CMake Error: Error required internal CMake variable not set, cmake may be not be built correctly.
Missing variable is:
CMAKE_C_COMPILER
CMake Error: Could not find cmake module file:/Users/KBVuong1/CompilerKit/build/CMakeFiles/CMakeCCompiler.cmake
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
EmptyString currently allows multiple instances, which is wasteful of space. We only need one instance of EmptyString.
Modify compilerkit_empty_set_new so that it statically allocates a single instance of that object, which it returns.
Modify compilerkit_empty_set_dispose so that it doesn't bother to deallocate the instance.
This depends on the grammar implementation to be more cooked up. Write a GraphViz visualizer for the grammar, nonterminal and terminal classes. It should produce output that looks like this: http://json.org/
It'd be nice to have convenience functions to produce the following character classes. Don't create separate classes for these, just use the character class definition (or compilerkit_alternation) from issue #27. Each of these will probably be one-liner functions inside src/convenience.c.
/** Return a character class corresponding to [0-9] */
GObject *compilerkit_regex_digits(void);
/** Return a character class corresponding to [a-z] */
GObject *compilerkit_regex_lower(void);
/** Return a character class corresponding to [A-Z] */
GObject *compilerkit_regex_upper(void);
/** Return a character class corresponding to all punctuation */
GObject *compilerkit_regex_punct(void);
/** Return a character class corresponding to all whitespace */
GObject *compilerkit_regex_whitespace(void);
Once automata is complete, write a DFA minimizer, as described here: http://en.wikipedia.org/wiki/DFA_minimization
It'd be nice to enhance the build script CMakeLists.txt so that it generates installable packages (e.g., .deb, .rpm, .msi, .dmg) using CPack (part of CMake). Note that CMake does not yet target .msi, but some folks have sent in patches for msi support. In any event, writing .msi packages would require use of WiX.
EmptySet currently allows multiple instances, which is wasteful of space. We only need one instance of EmptySet.
Modify compilerkit_empty_set_new so that it statically allocates a single instance of that object, which it returns.
Modify compilerkit_empty_set_dispose so that it doesn't bother to deallocate the instance.
This should just use the Kleene star and concatenation to get things done, i.e., a(a)*
In a new file in the examples folder, write a GraphViz visualizer for the FSM class. It should write a .dot file for GraphViz to open. See http://www.graphviz.org/content/fsm for an example.
If either the left or the right side of a concatenation happens to be the EmptyString, then don't bother to allocate or return a new concatenation. Instead, return the other side.
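The shortcut described above might look roughly like the following plain C sketch. It assumes a single shared EmptyString instance so that the check is cheap; the `Regex` struct, `empty_string_get`, `symbol_new` and `concatenation_new` are hypothetical names, not the GObject-based API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { EMPTY_STRING, SYMBOL, CONCAT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;          /* used by SYMBOL only */
    struct Regex *left, *right;   /* used by CONCAT only */
} Regex;

/* One statically allocated EmptyString shared by every caller. */
static Regex empty_string = { EMPTY_STRING, 0, NULL, NULL };

Regex *empty_string_get(void)
{
    return &empty_string;
}

Regex *symbol_new(unsigned int symbol)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = SYMBOL;
    re->symbol = symbol;
    re->left = re->right = NULL;
    return re;
}

/* Return the other side when one side is the EmptyString;
 * allocate a concatenation node only when both sides matter. */
Regex *concatenation_new(Regex *left, Regex *right)
{
    if (left->kind == EMPTY_STRING)
        return right;
    if (right->kind == EMPTY_STRING)
        return left;

    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = CONCAT;
    re->symbol = 0;
    re->left = left;
    re->right = right;
    return re;
}
```

Because the constructor may hand back one of its arguments, its return type has to be the general node type rather than a concatenation-specific type, which is why the issue asks for `GObject *`.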
Modify compilerkit_concatenation_new to return GObject *. Check if either the left or right is the EmptyString, and if so, return the other side instead.
Write a test case in tests/concatenation-test.c that verifies that compilerkit_concatenation_new works as intended.
It'd be nice to have ready-made Abstract Syntax Tree (AST) classes in place.
Complement should work almost exactly like the KleeneStar class does now, taking in a single parameter in the constructor function compilerkit_complement_new.
Replace int dummy with GObject *node.
In Kleene star a*, if a is the empty string, we don't need Kleene star. Likewise, if a is the emptyset, we don't need Kleene star either.
In compilerkit_kleene_star_new, change the return type to GObject *.
Modify the constructor (compilerkit_kleene_star_new) so that if the node is either the emptystring or the emptyset, return just the emptystring or emptyset, respectively.
Write a test case in tests/kleene-star-test.c that verifies the constructor works as specified.
We should use gunichar instead of gchar in the constructor for symbol, as well as in the private member field and the getter function.
If we call compilerkit_symbol_new('a') twice, it allocates two objects, which wastes space. The second time compilerkit_symbol_new('a') is called, it should return the instance of symbol a allocated previously.
Modify compilerkit_symbol_new to use a statically allocated hash table to track symbol instances, keyed by the character. Also, write a test case in tests/symbol-test.c that compares the pointers returned by compilerkit_symbol_new to ensure it's doing the job correctly.
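A minimal sketch of the interning idea, in plain C with a hand-rolled chained hash table, might look like this. The `Symbol` struct, `BUCKET_COUNT` and `symbol_new` are hypothetical; `unsigned int` stands in for `gunichar`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the symbol class. */
typedef struct Symbol {
    unsigned int code_point;   /* stand-in for gunichar */
    struct Symbol *next;       /* next entry in the same bucket */
} Symbol;

#define BUCKET_COUNT 64

/* Statically allocated table; zero-initialized, so every bucket starts empty. */
static Symbol *buckets[BUCKET_COUNT];

/* Return the existing instance for code_point, allocating it on first use. */
Symbol *symbol_new(unsigned int code_point)
{
    Symbol **bucket = &buckets[code_point % BUCKET_COUNT];

    for (Symbol *s = *bucket; s != NULL; s = s->next)
        if (s->code_point == code_point)
            return s;                      /* already interned */

    Symbol *s = malloc(sizeof *s);
    assert(s != NULL);
    s->code_point = code_point;
    s->next = *bucket;                     /* push onto the bucket's chain */
    *bucket = s;
    return s;
}
```

The pointer comparison the issue asks for then reduces to checking that two calls with the same character return the same address. In the GLib-based code, a `GHashTable` would presumably take the place of the hand-rolled buckets.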
There's already an initial stab at this, but the true test is whether another class can implement the generated interface. Do this on a separate branch!
It'd still be nice to have a macro to shorten this up. Perhaps in a separate file: CompilerKit/convenience.h?
Using the existing regular expression classes, write a function that converts two characters, lo and hi, into a character class. For example, if lo=a and hi=z, then the character class to match is [a-z]. Using the existing regular expression classes, this becomes the alternation a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z.
GObject *compilerkit_character_class_new (gunichar lo, gunichar hi)
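The expansion of a range into a chain of alternations could be sketched as follows, in plain C rather than with the GObject classes. The `Regex` struct and the `regex_new`, `character_class_new` and `count_symbols` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { SYMBOL, ALT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;          /* used by SYMBOL only */
    struct Regex *left, *right;   /* used by ALT only */
} Regex;

Regex *regex_new(Kind kind, unsigned int symbol, Regex *left, Regex *right)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = kind;
    re->symbol = symbol;
    re->left = left;
    re->right = right;
    return re;
}

/* Build lo|(lo+1)|...|hi as a right-nested alternation,
 * starting from hi and prepending one symbol at a time. */
Regex *character_class_new(unsigned int lo, unsigned int hi)
{
    assert(lo <= hi);
    Regex *result = regex_new(SYMBOL, hi, NULL, NULL);
    for (unsigned int c = hi; c-- > lo; )
        result = regex_new(ALT, 0, regex_new(SYMBOL, c, NULL, NULL), result);
    return result;
}

/* Count the symbols in a class; only here to check the sketch. */
size_t count_symbols(const Regex *re)
{
    if (re->kind == SYMBOL)
        return 1;
    return count_symbols(re->left) + count_symbols(re->right);
}
```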
If either the left or the right side of an alternation happens to be the EmptySet, then don't bother to allocate or return an alternation. Instead, return the other side.
Modify compilerkit_alternation_new to return GObject *. Check if either the left or right is the emptyset, and if so, return the other side instead.
Write a test case in src/alternation-test.c that verifies that compilerkit_alternation_new works as intended.
Need to replace char and char* with wide chars & strings.
See: http://developer.gnome.org/glib/2.32/glib-Unicode-Manipulation.html
A regex usually supports a{k} or a{k,l} where k and l are numbers indicating the number of times to match regex a. We can simulate that for a{k} by producing the concatenation of a k times. For a{k,l}, we'd start with a concatenated k times, followed by a|empty-string concatenated l-k times, so that the result matches between k and l copies of a.
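A sketch of the bounded-repetition expansion in plain C follows. It assumes a{k,l} means "at least k and at most l copies", so the optional part is repeated l-k times. The `Regex` struct and the `regex_new`, `repeat_range`, `min_length` and `max_length` names are hypothetical, and the same `a` node is shared by several parents to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { EMPTY_STRING, SYMBOL, CONCAT, ALT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;          /* used by SYMBOL only */
    struct Regex *left, *right;   /* used by CONCAT and ALT */
} Regex;

Regex *regex_new(Kind kind, unsigned int symbol, Regex *left, Regex *right)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = kind;
    re->symbol = symbol;
    re->left = left;
    re->right = right;
    return re;
}

/* a{k,l}: k copies of a, then (l - k) copies of (a | empty string).
 * Calling it with k == l gives a{k}. */
Regex *repeat_range(Regex *a, unsigned int k, unsigned int l)
{
    assert(k <= l);
    Regex *result = regex_new(EMPTY_STRING, 0, NULL, NULL);
    for (unsigned int i = k; i < l; i++) {      /* optional copies */
        Regex *optional = regex_new(ALT, 0, a, regex_new(EMPTY_STRING, 0, NULL, NULL));
        result = regex_new(CONCAT, 0, optional, result);
    }
    for (unsigned int i = 0; i < k; i++)        /* mandatory copies */
        result = regex_new(CONCAT, 0, a, result);
    return result;
}

/* Shortest and longest match lengths; only here to check the sketch. */
unsigned int min_length(const Regex *re)
{
    switch (re->kind) {
    case SYMBOL: return 1;
    case CONCAT: return min_length(re->left) + min_length(re->right);
    case ALT: {
        unsigned int x = min_length(re->left), y = min_length(re->right);
        return x < y ? x : y;
    }
    default:     return 0;  /* EMPTY_STRING */
    }
}

unsigned int max_length(const Regex *re)
{
    switch (re->kind) {
    case SYMBOL: return 1;
    case CONCAT: return max_length(re->left) + max_length(re->right);
    case ALT: {
        unsigned int x = max_length(re->left), y = max_length(re->right);
        return x > y ? x : y;
    }
    default:     return 0;  /* EMPTY_STRING */
    }
}
```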
It'd be nice to have a code generation interface, as well as some example code generators and AST visitors in place to output code.
Write a visitor to convert each kind of regular expression to a finite automaton.
Then:
The current code in add_transition and next_state function assumes a DFA. That is, CompilerKitFSM is already a DFA. But we should also implement an NFA.
It'd be nice to have diagrams here.
CMake Warning at CMakeLists.txt:11 (PROJECT):
To use the NMake generator, cmake must be run from a shell that can use the
compiler cl from the command line. This environment does not contain
INCLUDE, LIB, or LIBPATH, and these must be set for the cl compiler to
work.
-- The C compiler identification is unknown
CMake Warning at c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/Platfo
rm/Windows-cl.cmake:28 (ENABLE_LANGUAGE):
To use the NMake generator, cmake must be run from a shell that can use the
compiler cl from the command line. This environment does not contain
INCLUDE, LIB, or LIBPATH, and these must be set for the cl compiler to
work.
Call Stack (most recent call first):
c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/CMakeCInformation.cma
ke:60 (INCLUDE)
CMakeLists.txt:11 (PROJECT)
CMake Error: your RC compiler: "CMAKE_RC_COMPILER-NOTFOUND" was not found. Ple
ase set CMAKE_RC_COMPILER to a valid compiler path or name.
-- Check for CL compiler version
-- Check for CL compiler version - failed
-- Check if this is a free VC compiler
-- Check if this is a free VC compiler - yes
-- Using FREE VC TOOLS, NO DEBUG available
-- Check for working C compiler: cl
CMake Warning at CMakeLists.txt:2 (PROJECT):
To use the NMake generator, cmake must be run from a shell that can use the
compiler cl from the command line. This environment does not contain
INCLUDE, LIB, or LIBPATH, and these must be set for the cl compiler to
work.
CMake Warning at c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/Platfo
rm/Windows-cl.cmake:28 (ENABLE_LANGUAGE):
To use the NMake generator, cmake must be run from a shell that can use the
compiler cl from the command line. This environment does not contain
INCLUDE, LIB, or LIBPATH, and these must be set for the cl compiler to
work.
Call Stack (most recent call first):
c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/CMakeCInformation.cma
ke:60 (INCLUDE)
CMakeLists.txt:2 (PROJECT)
CMake Error at c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/CMakeRCI
nformation.cmake:22 (GET_FILENAME_COMPONENT):
get_filename_component called with incorrect number of arguments
Call Stack (most recent call first):
c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/Platform/Windows-cl.c
make:28 (ENABLE_LANGUAGE)
c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/CMakeCInformation.cma
ke:60 (INCLUDE)
CMakeLists.txt:2 (PROJECT)
CMake Error: CMAKE_RC_COMPILER not set, after EnableLanguage
CMake Error: your C compiler: "cl" was not found. Please set CMAKE_C_COMPILER
to a valid compiler path or name.
CMake Error: Internal CMake error, TryCompile configure of cmake failed
-- Check for working C compiler: cl -- broken
CMake Error at c:/Program Files (x86)/CMake 2.8/share/cmake-2.8/Modules/CMakeTes
tCCompiler.cmake:52 (MESSAGE):
The C compiler "cl" is not able to compile a simple test program.
It fails with the following output:
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:11 (PROJECT)
CMake Error: your C compiler: "cl" was not found. Please set CMAKE_C_COMPILER
to a valid compiler path or name.
-- Configuring incomplete, errors occurred!
Write a visitor to convert grammar, terminal, and nonterminal into an equivalent pushdown automaton
a|b|c
compilerkit_alternation_new(compilerkit_symbol_new('a'), compilerkit_alternation_new(compilerkit_symbol_new('b'), compilerkit_symbol_new('c')))
compilerkit_alternation_newv(compilerkit_symbol_new('a'),compilerkit_symbol_new('b'),compilerkit_symbol_new('c'), NULL)
variable length arguments. va_list
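A variadic constructor along these lines could be sketched with `va_list` as shown below. This is plain C, not the GObject-based API: the `Regex` struct, `MAX_ARGS`, `regex_new` and `alternation_newv` are hypothetical, and the argument list is assumed to end with a NULL sentinel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { SYMBOL, ALT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;          /* used by SYMBOL only */
    struct Regex *left, *right;   /* used by ALT only */
} Regex;

#define MAX_ARGS 64   /* arbitrary limit for this sketch */

Regex *regex_new(Kind kind, unsigned int symbol, Regex *left, Regex *right)
{
    Regex *re = malloc(sizeof *re);
    assert(re != NULL);
    re->kind = kind;
    re->symbol = symbol;
    re->left = left;
    re->right = right;
    return re;
}

/* Build a|(b|(c|...)) from a NULL-terminated argument list.
 * Callers should pass the sentinel as (Regex *)NULL for portability. */
Regex *alternation_newv(Regex *first, ...)
{
    Regex *items[MAX_ARGS];
    size_t count = 0;
    va_list args;

    /* Collect the arguments up to the sentinel. */
    va_start(args, first);
    for (Regex *item = first; item != NULL; item = va_arg(args, Regex *)) {
        assert(count < MAX_ARGS);
        items[count++] = item;
    }
    va_end(args);
    assert(count > 0);

    /* Fold from the right so the nesting matches the hand-written form. */
    Regex *result = items[count - 1];
    while (--count > 0)
        result = regex_new(ALT, 0, items[count - 1], result);
    return result;
}
```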
I'd like to know what percentage of code the test suite exercises. Instructions here: http://www.cmake.org/Wiki/CTest/Coverage
$ cmake --build .
The system cannot find the file specified
CMake Error: Generator: execution of make failed. Make command was: nmake /NOLOG
O
T delays. Be there 10 minutes later.
Write a function that tests whether a regular expression can derive the empty string (see: http://matt.might.net/articles/parsing-with-derivatives/ for a definition). Use the visitor class to assist with this.
gboolean compilerkit_nullable(GObject *regex)
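As a sketch of how the nullability check could be organized in a visitor-like way, the plain C example below dispatches through a table with one function per node kind. It does not use the CompilerKit visitor class; the `Regex` struct, the `visit_*` functions and `nullable` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CompilerKit regex classes. */
typedef enum { EMPTY_SET, EMPTY_STRING, SYMBOL, CONCAT, ALT, STAR, KIND_COUNT } Kind;

typedef struct Regex {
    Kind kind;
    unsigned int symbol;               /* used by SYMBOL only */
    const struct Regex *left, *right;  /* operands; STAR uses left only */
} Regex;

bool nullable(const Regex *re);

/* One visit function per kind of node, in the spirit of a visitor. */
static bool visit_false(const Regex *re)  { (void)re; return false; }
static bool visit_true(const Regex *re)   { (void)re; return true; }
static bool visit_concat(const Regex *re) { return nullable(re->left) && nullable(re->right); }
static bool visit_alt(const Regex *re)    { return nullable(re->left) || nullable(re->right); }

/* Dispatch table indexed by node kind. */
static bool (*const visitors[KIND_COUNT])(const Regex *) = {
    [EMPTY_SET]    = visit_false,
    [EMPTY_STRING] = visit_true,
    [SYMBOL]       = visit_false,
    [CONCAT]       = visit_concat,   /* both sides must be nullable */
    [ALT]          = visit_alt,      /* either side may be nullable */
    [STAR]         = visit_true,     /* zero repetitions is always allowed */
};

/* True if the regex can derive the empty string. */
bool nullable(const Regex *re)
{
    assert(re != NULL && re->kind < KIND_COUNT);
    return visitors[re->kind](re);
}
```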