db47h / ragel
Go driver for ragel scanners
License: MIT License
The arbitrary 32 KiB buffer size can be problematic in some cases. Very large comments in source code are not unusual, and for scanning to succeed they need to fit in the buffer.
This can be done with no incompatible changes to the API via functional options or a public variable.
The buffer size could start with a reasonable default, like min(requestedSize, 32768), and expand dynamically up to requestedSize. The upper limit would be math.MaxInt.
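The growth policy described above could look roughly like the sketch below. It is only an illustration: `Scanner`, `Option`, `MaxBufferSize`, `NewScanner` and `grow` are hypothetical names invented for this example, not the package's actual API.

```go
package main

import "fmt"

// defaultBufSize is the 32 KiB default mentioned in the issue.
const defaultBufSize = 32768

// Scanner is a hypothetical stand-in for the real scanner type.
type Scanner struct {
	buf     []byte
	maxSize int // upper bound the buffer may grow to
}

// Option is a functional option (hypothetical name).
type Option func(*Scanner)

// MaxBufferSize sets the size the buffer may grow to.
// Non-positive values are ignored.
func MaxBufferSize(n int) Option {
	return func(s *Scanner) {
		if n > 0 {
			s.maxSize = n
		}
	}
}

// NewScanner allocates min(maxSize, defaultBufSize) bytes up front.
func NewScanner(opts ...Option) *Scanner {
	s := &Scanner{maxSize: defaultBufSize}
	for _, opt := range opts {
		opt(s)
	}
	initial := defaultBufSize
	if s.maxSize < initial {
		initial = s.maxSize
	}
	s.buf = make([]byte, initial)
	return s
}

// grow doubles the buffer, capped at maxSize, and keeps its contents.
// It reports false when the buffer is already at its limit.
func (s *Scanner) grow() bool {
	if len(s.buf) >= s.maxSize {
		return false
	}
	n := 2 * len(s.buf)
	if n > s.maxSize || n < len(s.buf) { // second test guards against overflow
		n = s.maxSize
	}
	grown := make([]byte, n)
	copy(grown, s.buf)
	s.buf = grown
	return true
}

func main() {
	s := NewScanner(MaxBufferSize(100000))
	fmt.Println(len(s.buf))
	for s.grow() {
		fmt.Println(len(s.buf))
	}
}
```

With this shape, a "token too long" error would only be reported once `grow` returns false, and callers that never pass the option keep the current behavior.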
No need to have ruby installed and consistent unicode version in all parts of the application.
The idea is to distinguish lexing errors from I/O errors.
On the other hand, if an I/O error occurs, s.abort is set to true, so any subsequent calls to next will return an EOF token.
Clients that need to distinguish error types may consider the built-in Error token as fatal and implement their own soft error token in their ragel file.
Something possibly more versatile is to have any returned error token added to a list of errors (with the actual error interface) and add ErrorCount and Errors methods to Scanner.
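A minimal sketch of the error-list idea follows. Only the method names ErrorCount and Errors come from the proposal above; the `LexError` type, the `recordError` hook, the `isLexError` helper and the `errIO` value are assumptions made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// LexError is a hypothetical error type for lexing problems, so that
// clients can tell them apart from I/O errors.
type LexError struct {
	Offset int
	Msg    string
}

func (e *LexError) Error() string {
	return fmt.Sprintf("offset %d: %s", e.Offset, e.Msg)
}

// errIO stands in for an error returned by the underlying reader.
var errIO = errors.New("simulated I/O failure")

// Scanner is a hypothetical stand-in that only tracks errors.
type Scanner struct {
	errs []error
}

// recordError would be called wherever an Error token is emitted.
func (s *Scanner) recordError(err error) {
	s.errs = append(s.errs, err)
}

// ErrorCount returns the number of errors recorded so far.
func (s *Scanner) ErrorCount() int { return len(s.errs) }

// Errors returns a copy so callers cannot modify the scanner's list.
func (s *Scanner) Errors() []error {
	return append([]error(nil), s.errs...)
}

// isLexError reports whether err is, or wraps, a *LexError.
func isLexError(err error) bool {
	var le *LexError
	return errors.As(err, &le)
}

func main() {
	s := &Scanner{}
	s.recordError(&LexError{Offset: 1, Msg: "invalid character ' '"})
	s.recordError(errIO)
	for _, err := range s.Errors() {
		fmt.Printf("%v (lexing error: %t)\n", err, isLexError(err))
	}
}
```

Keeping the real error values in the list lets clients use errors.As or errors.Is instead of inspecting token text.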
A concurrent version that can be applied on top of cdb0e77 is available here: https://gist.github.com/db47h/51579601ceb620aaf1427857e37da691
BenchmarkNext_largeishFile reports 6 ms/op in concurrent mode vs. 1.17 ms/op with a FIFO on an AMD Bulldozer CPU. On an Intel Skylake CPU, the performance drop is less dramatic at 2.53 ms/op vs. 1.22 ms/op.
The implementation will stay non-concurrent until I see a proper use case where the parser gets busy enough to offset the performance drop caused by the IPC (if that's even possible).
Even though the AMD Bulldozer architecture is pathetic in terms of IPC, I am loath to implement optimizations that only marginally benefit higher-end hardware while severely penalizing the rest.
The current implementation does not allow fcall out-of-the-box. Need to add stack and top.
Time for a v3 and rethink SaveVars/GetVars?
EDIT: stack handling can be implemented via a custom ragel.Interface
word.rl:27:19: graph lookup of "uletter" failed
I had to change line 22 from
include WCHAR "utf8.rl";
...to...
include UTF8 "utf8.rl";
Suppose the following grammar:
'/*' { s.Emit(ts, slashStar, string(data[ts:te])) };
'//' { s.Emit(ts, slashSlash, string(data[ts:te])) };
where a single / has no meaning.
Trying to scan / will likely fail with an unexpected error (i.e. a very unhelpful error).
Same for '/ //': we'll get a 1:2: invalid character ' '. ragel's built-in error handling might be useful here.
This sample code will fail with a token too long error (input buffer overflow):
input := "42"
l := lexer.New("", strings.NewReader(input), ragel.BufferSize(len(input)))
tok, err := l.Next()
Workaround: add 1 to buffer size.
Fix: Do an EOF check before complaining.
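One way the EOF check could work is sketched below. This is not the scanner's actual code: `fill`, `fillString` and `errTokenTooLong` are hypothetical names, and the sketch simplifies by treating the whole buffer as a single pending token.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errTokenTooLong mirrors the error message quoted above.
var errTokenTooLong = errors.New("token too long")

// fill reads from r into buf. When buf ends up full, it probes for one
// more byte before complaining, so that input ending exactly at the
// buffer boundary counts as EOF rather than overflow. The probe byte
// is discarded here; a real scanner would have to keep it.
func fill(r io.Reader, buf []byte) (n int, eof bool, err error) {
	n, err = io.ReadFull(r, buf)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		return n, true, nil // input ended before the buffer filled up
	}
	if err != nil {
		return n, false, err // genuine I/O error
	}
	// The buffer is full: check for EOF before reporting an overflow.
	var probe [1]byte
	switch _, perr := io.ReadFull(r, probe[:]); perr {
	case nil:
		return n, false, errTokenTooLong
	case io.EOF:
		return n, true, nil
	default:
		return n, false, perr
	}
}

// fillString runs fill over a string with a buffer of the given size.
func fillString(input string, size int) (int, bool, error) {
	return fill(strings.NewReader(input), make([]byte, size))
}

func main() {
	for _, in := range []string{"42", "421"} {
		n, eof, err := fillString(in, 2)
		fmt.Printf("%q: n=%d eof=%t err=%v\n", in, n, eof, err)
	}
}
```

With a 2-byte buffer, the input "42" is read in full and reported as EOF, while "421" still yields the overflow error.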
it simply returns EOF. This should be handled as part of the buffer overrun check.
Not in the buffer overrun check, but in lines 305 to 310 in 9735ac3, where we need to check that if the current state != start state, then something is missing.