vpisarev / ficus Goto Github PK

View Code? Open in Web Editor NEW

71.0 71.0 9.0 24.72 MB

The programming language Ficus

License: Apache License 2.0

Makefile 0.03% OCaml 0.27% HLSL 5.72% FLUX 8.55% C 85.37% Python 0.05% CMake 0.01%

ficus's People

Contributors

Stargazers

Watchers

Forkers

4ekmah hf-clouds andrey-golubev watch-later zihaomu puneeth072003 kshubham20 fengyuentau dmitrys99

ficus's Issues

Regexp in ficus

Regular Expressions in Ficus

right now the standard library, module Re, offers 2 functions:

Re.compile(regexp_str: string) : Re.regex_t and
Re.match_(regexp_obj: Re.regex_t, matched_str: string): bool

The implementation uses Rob Pike's VM approach (O(N) regexp matching algorithm, where N is the matched string length), described and implemented at https://swtch.com/~rsc/regexp/; in particular, it's a wrapper on top of re1 library.

The implementation is just at "technology preview" level of functionality and stability. Here is the list of things to make this module really useful:

- change the implementation to directly process UTF32 strings instead of UTF8 strings. This is one of the first things to make, because it will let us to avoid conversion of matched strings to UTF8 (extra step that requires O(N) time and extra O(N) space) and will also let us to return the positions of matched sub-expressions as-is, without UTF8 to UTF32 conversion.
- avoid the use of global variables, in particular, the pool of "fx_re_Sub" buffers (freesub). It should be possible to invoke the regexp engine simultaneously from multiple threads (using different compiled regexp objects or even the same regexp).
- change the error handling. Now in the case of error exit() function is called, which is inappropriate. We should return negative error code to the user (i.e. throw exception using Ficus runtime facilities) with proper deallocation of temporary allocated buffers. Probably, it will require to make the step 4 first.
- rewrite regexp parsing code without using yacc. The regexp syntax is quite simple and it can be parsed using a simple recursive approach. Currently, we use statically compiled into .c Yacc/BISON-based parser, and with any further extensions to regexp syntax (which are expected) we'd need to regenerate .c and update it (add fx_re prefixes etc.).
- optimize match; skip the initial ".*?" processing (the first 2-3 instructions in regexp VM). Now we just run "find" and then check if sub[0] == 0 && sub[1] == str_length.
- return sub-matches indices to the user: Re.match_(r: Re.regex_t, s: string): (bool, (int, int)[]). That is, in addition to the success flag, return positions of submatches as 1D array of 2-element tuples (start_j, end_j). (start_0, end_0) — position of the whole regexp within the searched string. Remember that int in Ficus corresponds to int_ ~ ptrdiff_t in C.
- add syntax r"regexp_str" to Ficus to automatically mirror back slashes.
- (optional) add s ~ regexp syntax to Ficus (syntactic sugar): if input_filename ~ "{basename:.+}\.fx" { println(f"base name of the ficus source file is {basename}") }. That is, regexp matching, as in Perl, can be an operator with optional value capture. If matching is successful, we can get the named captured values and use them.
- implement Re.find(regexp_obj: Re.regex_t, str: string): (bool, (int, int) []). The semantics matches Re.match_(), but the match can be a substring of str, instead of the whole str. Maybe there should be overloaded version that will take extra offset: int parameter — from where to start the search.
- implement Re.findall(regexp_obj: Re.regex_t, str: string): (int, int) [,]. Returns matrix. i-th row corresponds to i-th match and contains the position of the whole match and positions of sub-matches.
- implement Re.replace(regexp_obj: Re.regexp_t, str: str: string, replace_with: string): string. replace_with string can use \<n> syntax, e.g. \0 (the whole matched string) \1 (the first matched sub-expression), \2(the second matched sub-expression) etc.
- (optional) We can also extend Re.replace() functionality. \U1 will mean convert the first submatch to uppercase, \L2 will mean convert the second submatch to lowercase, \C0 will mean capitalize the whole submatch (convert the first letter of it to uppercase).
- extend regexp syntax with the following constructions:
- [[^]c1-c2c3-c4c5-c6] syntax for specifying positive and negative character ranges. There should be new instructions in VM to implement this. Maybe it should be a variable-length instruction FX_RE_OP_POSRANGE and FX_RE_OP_NEGRANGE. Single characters can be encoded (internally) by duplicating the beginning and end of the range, e.g. [+-0-9] can be encoded as [+-+\--\-0-9].
- \b - word boundary. There should probably be a new VM instruction (like FX_RE_OP_WORDBOUND) to implement it.
- support \t, \r, \n, \x.., \u...., \U........ syntax to specify various characters.
- built-in class characters: \s (space), \d (decimal digit), \x (hexadecimal digit), \a (letter), \w (letter, decimal digit or underscore), and maybe some others. fx_isdigit() and other standard functions from ficus runtime can be used to implement this part, and probably a new VM instruction (e.g. FX_RE_OP_CHARCLASS <class_id>) should take care of it.
- (partially emulated by passing multiline=true) Add support for:
- ^ (beginning-of-line) anchor (FX_RE_OP_BEGIN_OF_LINE)
- $ (end-of-line) anchor (FX_RE_END_OF_LINE).
- (apparently, it's not needed, several alternatives, separated with '|', can be used to emulate the same behaviour) (optional). A very useful functionality for data parsing. Not sure about API, but it can be something like (not optimal) Re.matchsome(rl: Re.regex_t list, str: string[, offset: int]): (int, (int, int) []). That is, we check if str or str[offset:] matches with any of regexp from rl. If there are several matches, the longest is chosen. If there are several matches of the same length, the earliest expression in the list is chosen. The index of matched expression is returned. Such functionality can be useful to implement parsers (but using lex should be much more efficient in general for such tasks).
- (optional). Add optional flags to Re.match*(), Re.find*() and Re.replace(). The most important flags are ignore-case and multi-line matches. Need to double-check if it's consistent with other regexp engines. In the case of multi-line matches . or \s can match \r or \n or \r\n, ^ can match beginning of each line, $ - the end of each line.

=======

IMPORTANT NOTE:

It's a lot of work to bring the current very early draft to a solid state, even though many of the items are relatively easy to implement. Probably, instead of trying to make re1-based engine more mature, it makes sense to write a ficus wrapper on top of re2, implemented in C++, where all of the above features (and probably more) are already implemented.

adding 'deci' type

this is not a high-priority item at all, but maybe it would be nice to add a true 'decimal' floating-point number type to ficus, based on https://github.com/vpisarev/DEC64/tree/alt.

The rationale:

since it's purely software implementation that does not use floating-point operations internally, the operations on deci are guaranteed to give absolutely the same 'bit-exact' results on every platform, which can be useful for some accuracy-critical algorithms, reference implementations in tests etc.
for quite a broad range of values scanning/printing operations are 'bit-exact', i.e. the textual decimal representation of floating-point numbers perfectly matches the internal representation, unlike with IEEE754 binary-2 numbers.
the number of bits in deci's mantissa is greater than in double-precision IEEE754 format, hence the better accuracy can be achieved out-of-the-box. Besides, some extra primitives, such a super-accurate dot-product or Horner scheme (polynomial evaluation) can be added to the runtime to obtain even better accuracy (for IEEE754 formats such functions are possible and useful to implement as well though)

The details:

the type may be called 'deci', short for 'decimal'.
there should be 'd'/'D' suffix to specify literals of 'deci' type.
basic and not so basic operations on deci() (+, -, *, /, fmod, pow, sqrt, sin, cos, exp, log, etc.) will be added to ficus runtime and the standard library. Note: the current implementation partly uses __int128 type, which is GCC/clang extension. Need to add a branch for MSVC (the necessary multiply/divide intrinsics exist in MSVC).
operations on the type are converted to runtime calls at the C code generation phase.
some compile-time optimizations (e.g. constant folding, a+0, a*1 etc.) maybe applied to 'deci' operations

incorrect environment used to instantiate template functions

templ() and test_templ() in test/fst.fx example demonstrate the problem:

fun templ(x: 't): 't = x + (i :> 't)

fun test_templ()
{
    val i = 5
    templ(7.0)
}

the correct behaviour would be to report error about missing i.

Problem with prerequisite code for windows users.

The code is not working or can anyone tell in a detailed way how to setup in our local computer

ficus stdlib TODO

(in bold there are higher-priority items)

- Basic ops on strings, lists, arrays, numbers, tuples etc. ...
- Real complex type & ops (need support at compiler level, but mostly can be done in the standard library, just like "option")
- Introspection, compiler as a library
- JSON parser & emitter
- XML, YAML parser & emitter
- Async, parallel programming (Mutex class/module or equivalent functionality is necessary to implement first for map-reduce algorithms)
- OS services (files, directories, processes, ...)
- Time & Date (arithmetic operations on time and date, calendar etc.)
- Unit test framework
- Logging
- Pretty printing
- Regexp (RE2? or something like that)
- Parsing module for context-free grammars or some more constrained grammars
- Functional & imperative data structures (balanced trees, hash tables, ...)
- Basic operations on graphs (?)
- Matrix operations, including popular decompositions
- Bindings to BLAS & LAPACK
- Math functions
- Random number generation
- Solvers (Linear programming, convex programming etc.)
- Geometry (rotations, projections, multiple-view geometry etc.)
- BigInt a.k.a. long or bigint or decimal (bindings to GMP?) (libBF by Fabrice Bellard? or big.js?). See also #12
- GPGPU OpenCL (bindings & runtime loader of OpenCL; memory management)
- GPGPU CUDA (???)
- OpenGL API (or maybe something more modern?)
- Vector graphics (AGG, NanoVG, ...)
- Bindings to OpenCV
- Tiny image processing library (filters, resize & warping, color conversion etc.)
- Tiny DSP library (DFT, DCT, FIR/IIR filter, windows, ...)
- (work in progress) Deep learning inference library (importers + execution)
- Machine learning library (SVM, kNN, ...)
- Image I/O
- Bindings to zlib, ... (lossless data compression)
- Bindings to sqlite, convenient SQL queries
- Video encoding/decoding
- Audio I/O (including mp3 decoder & microphone & playback)
- Camera & other sensor access
- Small network library (sockets, file downloading from url, ...)
- Simple UI framework (?, LCUI? GLFW? tiny ui)

a few issues/annoyances

... found during conversion of the compiler to ficus:

Project status

What is the project status?

Do you want contributions? What kind of contributions do you accept?
Should one file issues for the project?

Should #5 and #16 be treated as roadmaps?

Thanks!

ficus 1.0 TODO

NOTE: the previous big ticket was renamed and closed.

Now we have some more things to be done to finally release Ficus 1.0 (or at least Ficus 1.0 beta):

- proper stack trace reporting in the case of exceptions, asserts etc. The stack trace should report the original filenames & lines in the original ficus code, not in the generated .c code
- option to generate shared lib (or static lib), not just executable (the only available option for now). In the case of shared lib probably it should be option to compile ficus runtime as a separate shared library, in order to use several ficus-generated modules together in the same final app.
- as a part of the previous item, provide some method to export clean C and/or C++ API instead of ugly-looking mangled names.
- (lower priority, but still important) option to generate binary python module
- (lower priority) dynamic loading of ficus-generated shared libraries
- automatically define platform-specific symbols that can be checked with preprocessor, like OS, CPU_ARCH etc.
- proper handling of complex modules. To avoid ambiguity, probably there should be __init__.fx in each ficus module directory (including subdirectories)
- add pragma pragma "framework:<framework>" to link Apple frameworks easily without using -clibs flag.
- add pragma pragma "pkg:<pkg>" to link packages via pkg-config more easily without using -cflags and -clibs
- partial implementation of Language Service Protocol: syntax highlighting, including inline C/C++ code, folding, jump to symbol definition, intellisense.
- (quite difficult to do) some debugger support, probably we should try to keep the original local variables' names as much as possible and provide helper scripts for GDB to visualize arrays, lists, variants etc.
- better support for records and variants with cases with record attributes; maybe if we have a variant case SomeCase: {...}, there should be automatically introduced SomeCase.t as type for the attribute record. Or some other way to pack and move data for the particular case.
- prepare proper string interpolation and the corresponding string() and print() with complete support for Python 3 format.
- automatic printing of variants
- fix bug when some type declarations are thrown away.
- better support for exceptions at C/C++ level, especially exceptions with some attributes, at least simple attributes (integers, strings).
- check once again if return is properly implemented, maybe not
- (much) more unit tests
- some threading without openmp? we will definitely need parallel for & @sync. What about tasks and asynchronous execution?
- round-up the standard library #5
- full-scale "tensor" data type for nd dense arrays, which type is unknown at compile time. That will help to wrap OpenCV better, implement something like numpy and also improve implementation of deep learning inference engine. Also, tensor could be stored on an acceleration device (GPU, NPU), which is a critical feature.
- implement various scalar intrinsics (partially done): round(), floor(), sin(), cos(), popcount() ...
- maybe not for 1.0, but it would be nice to provide some introspection and meta-compiler features usign something like Python decorators. Probably for that we would need a full-scale JIT

ficus 1.0 alpha TODO

This is the list of features to be implemented before ficus 1.0. Some of the features are optional and marked as such.

- for a@index <- A, i.e. the notation to get indices of the array element elements, not just their values. index is a tuple in the case of multi-dimensional arrays.
- (...) notation in generic function argument types to denote that the particular argument should be a tuple, but its size can be an arbitrary (for each specific size and each specific set of tuple element types a separate instance of generic function is created). There is a similar notation for records: {...}
- 'typ [+] notation to denote array arguments of varying dimensionality. For each specific dimensionality a separate instance is created.
- [\A; \B], i.. use of \array notation for compound array initialization. [\A1; \A2; ... ; \An] denotes "vertical" concatenation of arrays, where rows of \Ai are put next to each other. [\A1, \A2, ..., \An] denotes horizontal concatenation, where columns of \Ai are put next to each other. There can also be a mixed combination, and also it's possible to put scalar elements.
- support for inline functions; automatic inline expansion of small functions
- elimination of index range checks when arrays are accessed sequentially and unconditionally inside the loops.
- optimized concatenation of multiple strings. Instead of iterative construction of "a" + "b" + "c" + "d" + ... as (("a" + "b") + "c") + "d" ..., which results in multiple memory allocations and super-linear complexity, the compiler computes the final concatenation with a single call of intrinsic function.
- when keyword in the patten matching to add custom predicates.
- possibility to process multiple cases in match expression with a single action, e.g. match s { | "a" | "b" => foo() | _ => bar() }. Currently, when using multiple cases, it's not possible to capture values, but this is a minor inconvenience.
- object types that will let users write lst.rev() instead of List.rev(lst) etc. This is very light-weight method to introduce object-oriented notation, even for very basic, intrinsic types.
- str[.-n] syntax, i.e. unary minus with preceding dot. It helps to conveniently access the last string character, the last element of array etc.
- [very desirable, but optional] fun foo (a: t1, b: t2, ~param3:t3=defval3) notation for named/optional parameters.
- [unzip for ...] notation to produce a tuple of arrays as the result of comprehension instead of a single array of tuples.
- [optional] automatic detection of "nothrow" functions. If such functions are called directly, we can eliminate the check of return value
- automatic detection of stack overflow. Ficus exception is thrown in such a case.
- detection of division by zero. Ficus exception is thrown in such a case.
- (mostly done except for variants) automatically generate print() and string() for any type
- automatically generate comparison function for any type.
- OOP (object types (see above) + interfaces)
- (postponed till after 1.0; hand-written ficus lexer has been implemented) ~~port the lexer generator to ficus (fxlex)~~
- (postponed till after 1.0; hand-written ficus parser has been implemented); ~~port the parser generator to ficus (fxyacc)~~
- port ficus compiler to ficus
- implement nested modules
- (postponed till after 1.0) ~~[optional] module interfaces (separate files with .fxi extension, similar to .mli files in Ocaml)~~
- separated compilation when each ficus module is compiled into a dedicated .c file
- (there is never enough tests, but at least we have some) add extensive set of unit tests
- write ficus tutorial
- parallel keyword before for loops (the current implementation uses OpenMP)
- [optional] implement full-scale Python3-like format f"{:}"
- [optional] implement image/matrix/tensor border extrapolation using img.clip[y,x], img.zero[y,x] or similar notation.
- implement loop fusion
- add immutable RRB vector data type and the associated operations including concatenation, comprehension etc.
- implement a small but versatile standard library (see #5) (for 1.0 release only a small part of the planned library will be implemented)

==== some extra items ====

- add optional finally clause into try-catch to close files & sockets, unlock mutexes etc. safely. Shall we also add Python-like with expression?
- check for invalid type casts, implement more type cast modes (literal to tuple, tuple to tuple, tuple to scalar type (in the latter case it means that each tuple element is casted to this scalar type and this rule is applied recursively)
- [optional] optimize tuple packing/unpacking
- [optional] add back quote syntax `expr` that expands to (expr, expr-captured-as-string, modulename, line). The feature will help to make assert's much more informative.
- more intelligent type inference in the case of overloaded/template functions and type constructors (in particular, Math.... functions, None constructor etc.)
- insert more try-catch blocks into the compiler to catch compile errors
- (maybe later) [optional] add some syntax sugar for containers. Convert for (k, d) <- mymap {...} into Map.app(), fold ... for x <- myset {...} into Set.foldl(...) etc. In general, if the object we iterate over is a module type, we will convert the loop body into a lambda function and depending on whether it's a list comprehension, folding or a simple loop, we convert it to map, foldl or app.
- [optional] add syntax to embed data files into source code, val myfont = @data "myfont.ttf"
- more smooth support for the records and record patterns in functions.
- (optional) not only literals, but also id's as default initializers in the record fields or default function parameters, e.g. None etc.
- (optional) string() and print() for function types. Maybe it should also have the __name__ attribute.
- report warnings about unused variables. preferably also report about unused or unmatched variant tags in match {} expressions.
- it looks like immutable RRB vector (see item 32, now postponed) can be super-efficient in ficus' basic operations - comprehensions and iterations. And it will do many other things in O(1) or O(logN) time. Therefore, it should become the main collection type. Therefore, we need to change array comprehensions and literals from [ ... ] to [| ... |], and leave [...] to vectors. cons-based lists will stay [: ... :].
- maybe refactor compiler to speedup incremental compilation mode when just a few lines in one of the module are modified. This should include caching of k-forms, a bit different handling of template functions/types, and completely separable compilation and maybe even some changes in the fundamental id_t type. The goal is to avoid influence of one module's k-form to another, i.e. if the module itself or the modules it depends on are not edited, it's initial k-form should be exactly the same (and thus we can eliminate the stage of k-form optimization, c code generation and c code compilation alltogether, not just the latter one).
- implement hash table, use it whenever possible instead of Map or Set, e.g. inside K_remove_unused.
- maybe add non-type template arguments to make Map and Set more efficient.
- add "-rebuild" option to rebuild an app from scratch.
- refactor backtrace dump. Use the system call backtrace() to retrieve backtrace (just the stack). Do not use "update backtrace" inside FX_CALL() macro
- test everything on Linux
- need to extend pattern matching to allow nested alternating patterns, e.g. match tokens { | (PLUS | MINUS) as op :: rest => ... }. This will also solves the problem with when, which is now not clear where to attach in the case of several alternatives.
- stylistic change. need to normalize naming where possible. For example, function formal parameters should be called "parameters", not "arguments". Whereas the actual arguments passed to function at runtime should be called "arguments". Now arguments are used almost everywhere.
- (optimization idea: just put it here for now not to forget). When internal functions access non-local values from outer scope, and those internal functions are not passed around, just called, they can be augmented to take all those extra values as arguments. Probably, the best place to do it is inside k_lift, because mutable values from outer scope still need to be converted to references before being passed as arguments.
- (update: mimalloc was replaced with a tiny yet mighty rpmalloc, which is now included into ficus runtime) download and build (or include inside or make a git submodule) mimalloc as a dependency. mimalloc significantly boosts performance of the compiler and, in general, ficus-generated code. So, it would be nice to have it enabled by default.
- (maybe later) ~~maybe build everything (mimalloc, other useful 3rd-party libraries, runtime, reference ficus compiler, ficus compiler, tests, examples etc.) using cmake~~
- (maybe later, openmp is quite good) ~~add task framework into the runtime and the standard library, probably based on https://github.com/Pithikos/C-Thread-Pool.~~
- (openmp runtime for mac is included, on linux it's always available, on windows need to test, but it should work fine there too) implement @parallel for based on the task framework, so that it's always available, not only when the produced binaries are linked with OpenMP runtime. Getting rid of OpenMP dependency will also make ficus binaries more compact. Note that OpenMP-based parallel for backend is also possible.
- (ficus compiler is built from scratch in 21 seconds with -O1 flag on M1 macbook pro. On Linux it takes <15 seconds. That is, it's fast enough) try to make ficus compiler as parallel as possible. At minimum make the last phase parallel - generation of the C code, emitting C code, invoking C compiler on the updated files. At maximum, K-form optimization should also become mostly parallel with some synchronization points.
- test everything on Windows
- @sync-blocks for map-reduce style algorithms.
- mutable record fields
- Some form of conditional compilation, e.g. -pp flag that will run .fx code through cpp preprocessor. In addition, there should also -Dparam[=value] command-line option to define certain macros that are passed to the preprocessor

vpisarev / ficus Goto Github PK

ficus's People

Contributors

Stargazers

Watchers

Forkers

ficus's Issues

Regexp in ficus

Regular Expressions in Ficus

adding 'deci' type

incorrect environment used to instantiate template functions

Problem with prerequisite code for windows users.

ficus stdlib TODO

a few issues/annoyances

Project status

ficus 1.0 TODO

ficus 1.0 alpha TODO

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent