Giter Site home page Giter Site logo

recursiveregexpraptor-vs-benchmarks's Introduction

Performance comparison of regular expression engines

the original test environment by dark100 at http://sljit.sourceforge.net/regex_perf.html

best view [here](https://nasciiboy.github.io/raptorVSworld/)

Introduction

Processing text or raw byte-sequence are among the common tasks performed by most software tools. These tasks usually involve pattern matching algorithms, and the most popular tool for such purpose is regular expressions. Regular expressions has been evolved a lot since Kleene defined the regular sets in the 1950’s. Today we have several widely used regular expression engines which have different features which makes any performance comparison a difficult task, since a faster engine is not necessary better. Depending on the use case it might be enough to know whether a POSIX compatible regular expression matches to a line, even the position of the match is unneeded (grep utility). On the contrary other use cases require the position of capturing brackets, unicode support, conditional and atomic block (handling a byte sequence as a single character, like ‘sch’ in German language) support just to name a few. The former case needs a less sophisticated algorithm, which is likely be much faster than the latter, but again, that does not mean the former is better. More about these engine types can be found here.

Participants

The following popular engines were choosen:

and

Before anyone jump to any conclusions, I should note the followings:

  • The engines were not fine tuned (because of my lack of knowledge about their internal workings). I just compiled them with the default options. I know enabling or disabling some features can heavily affect the results. If you feel that you have a better configuration just drop me an e-mail and I will update the results (mailto:[email protected]).
  • The regular expression engines are compiled with -O3 to allow the best performance.
  • This comparison page was inspired by the work of John Maddock (See his own regex comparison here). The input is also the same he used before: mtent12.zip. It is a text file (e-book) which size is about 20 Mbytes.
  • Only common patterns are selected, they are not pathological cases nor have any PERL specific features. The comparison was caseful.

Results

x86-64 bit Intel Cerelon 847 1.1GHz (GCC 7.2.0, Go 1.9.1 – GNU/Linux)

Regular expressiontrere2Golangre2pcre-JITpcre-DFApcreonigregexp4Golangregexp4regexp3
".|\n"
"."
6486ms (19285324)34159ms (19285324)9072ms (19285324)1130ms (19285324)6677ms (19285324)4448ms (19285324)12876ms (19285324)1553ms (19285324)605ms (19285324)1808ms (19285324)
"\d"
":d"
1044ms (27084)3166ms (27084)134ms (27084)55ms (27084)68ms (27084)65ms (27084)127ms (27084)1340ms (27084)594ms (27084)1853ms (27084)
"\S"
":S"
4633ms (15451664)32530ms (15451664)7405ms (15451664)931ms (15451664)4593ms (15451664)3086ms (15451664)10453ms (15451664)2056ms (15451664)810ms (15451664)1912ms (15451664)
"\S+"
":S+"
2565ms (3414592)8000ms (3414592)1957ms (3414592)333ms (3414592)1619ms (3414592)924ms (3414592)3114ms (3414592)1521ms (3414592)654ms (3414592)1059ms (3414592)
"\w"
":w"
4769ms (14751878)32506ms (14751878)7052ms (14751878)952ms (14751878)4537ms (14751878)2994ms (14751878)10865ms (14751878)1811ms (14750958)819ms (14750958)1888ms (14750958)
"[:w_]"
"\w"
4762ms (14751878)29682ms (14751878)7066ms (14751878)952ms (14751878)4486ms (14751878)2989ms (14751878)10663ms (14751878)2472ms (14751878)1160ms (14751878)3171ms (14751878)
"[a-zA-Z0-9_]"
"[a-zA-Z0-9_]"
4650ms (14751878)29437ms (14751878)7121ms (14751878)994ms (14751878)4538ms (14751878)3074ms (14751878)10483ms (14751878)2735ms (14751878)1104ms (14751878)5260ms (14751878)
"[a-zA-Z]+"
"[a-zA-Z]+"
2317ms (3495761)7582ms (3495761)2048ms (3495761)327ms (3495761)1551ms (3495761)1007ms (3495761)2956ms (3495761)2250ms (3495761)860ms (3495761)2771ms (3495761)
"[.\s]+"
"[.:s]+"
2205ms (991813)9541ms (3430783)2036ms (3430783)399ms (3430783)1037ms (3430783)949ms (3430783)2676ms (3430783)2851ms (3430783)1363ms (3430783)3741ms (3430783)
"([^\n]+)"
"<[^\n]+>"
1591ms (314387)5611ms (314387)430ms (314387)82ms (314387)1038ms (314387)210ms (314387)697ms (314387)1301ms (314387)578ms (314387)788ms (314387)
"e"
"e"
520ms (1781425)2842ms (1781425)1000ms (1781425)139ms (1781425)439ms (1781425)380ms (1781425)1434ms (1781425)1739ms (1781425)663ms (1781425)1809ms (1781425)
"(((((e)))))"
"<<<<<e>>>>>"
529ms (1781425)4399ms (1781425)1001ms (1781425)203ms (1781425)1269ms (1781425)1256ms (1781425)1971ms (1781425)6168ms (1781425)2360ms (1781425)19939ms (1781425)
"((((((((((e))))))))))"
"<<<<<<<<<<e>>>>>>>>>>"
525ms (1781425)6709ms (1781425)1006ms (1781425)298ms (1781425)1850ms (1781425)1945ms (1781425)2176ms (1781425)12171ms (1781425)4034ms (1781425)58376ms (1781425)
"Twain"
"Twain"
1022ms (2388)12ms (2388)8ms (2388)49ms (2388)47ms (2388)10ms (2388)52ms (2388)1610ms (2388)589ms (2388)2450ms (2388)
"(Twain)"
"<Twain>"
1015ms (2388)13ms (2388)8ms (2388)49ms (2388)48ms (2388)14ms (2388)53ms (2388)2294ms (2388)927ms (2388)6027ms (2388)
"(?i)Twain"
"#*Twain"
1337ms (2657)3515ms (2657)166ms (2657)51ms (2657)287ms (2657)203ms (2657)349ms (2657)1616ms (2657)727ms (2657)2575ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])"
"(Tw)#*((a?A?[i_I]{0,2})?[nN])"
1375ms (2989)3474ms (2989)168ms (2989)56ms (2989)319ms (2989)236ms (2989)353ms (2989)2154ms (2989)1021ms (2989)9216ms (2989)
"[a-z]shing"
"[a-z]shing"
1597ms (1877)3614ms (1877)277ms (1877)47ms (1877)2302ms (1877)1487ms (1877)49ms (1877)3250ms (1877)1267ms (1877)5413ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
1779ms (396)4431ms (396)126ms (396)8ms (396)72ms (396)70ms (396)117ms (396)3894ms (396)1456ms (396)6599ms (396)
"[a-q][^u-z]{13}x"
"[a-q][^u-z]{13}x"
4446ms (5021)11181ms (5021)613ms (5021)5ms (5021)6169ms (5021)1742ms (5021)135ms (5021)9796ms (5021)3347ms (5021)10391ms (5021)
"Tom|Sawyer|Huckleberry|Finn"
"Tom|Sawyer|Huckleberry|Finn"
3151ms (3015)9003ms (3015)134ms (3015)89ms (3015)96ms (3015)97ms (3015)141ms (3015)6608ms (3015)2608ms (3015)10659ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)"
"<Tom|Sawyer|Huckleberry|Finn>"
3302ms (3015)8876ms (3015)131ms (3015)87ms (3015)100ms (3015)102ms (3015)142ms (3015)7240ms (3015)2964ms (3015)23149ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
3656ms (534)4503ms (534)250ms (534)247ms (534)889ms (534)638ms (534)687ms (534)2903ms (534)1589ms (534)10916ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
1844ms (2)4541ms (2)152ms (2)44ms (2)247ms (2)211ms (2)236ms (2)N/AN/AN/A
"ing[^a-zA-Z]"
"ing[^a-zA-Z]"
1149ms (85956)228ms (85956)111ms (85956)54ms (85956)243ms (85956)140ms (85956)140ms (85956)1611ms (85956)627ms (85956)3131ms (85956)
"[a-zA-Z]ing[^a-zA-Z]"
"[a-zA-Z]ing[^a-zA-Z]"
1859ms (85823)3884ms (85823)309ms (85823)56ms (85823)2376ms (85823)1549ms (85823)143ms (85823)3274ms (85823)1328ms (85823)6761ms (85823)
"([a-zA-Z]+ing)"
2070ms (95863)5411ms (95863)318ms (95863)213ms (95863)6616ms (95863)4174ms (95863)2603ms (95863)N/AN/AN/A

x86-64 bit AMD A6-9500 3.8GHz (GCC 7.2.1, Go 1.9.1 – GNU/Linux)

Regular expressiontrere2Golangre2pcre-JITpcre-DFApcreonigregexp4Golangregexp4regexp3
".|\n"
"."
2039ms (19285324)12477ms (19285324)3361ms (19285324)594ms (19285324)1673ms (19285324)1379ms (19285324)4188ms (19285324)559ms (19285324)190ms (19285324)660ms (19285324)
"\d"
":d"
347ms (27084)1064ms (27084)46ms (27084)16ms (27084)19ms (27084)20ms (27084)43ms (27084)478ms (27084)253ms (27084)669ms (27084)
"\S"
":S"
1468ms (15451664)9661ms (15451664)2779ms (15451664)339ms (15451664)1040ms (15451664)964ms (15451664)3649ms (15451664)721ms (15451664)287ms (15451664)709ms (15451664)
"\S+"
":S+"
908ms (3414592)2793ms (3414592)728ms (3414592)113ms (3414592)501ms (3414592)303ms (3414592)943ms (3414592)531ms (3414592)200ms (3414592)368ms (3414592)
"\w"
":w"
1483ms (14751878)9036ms (14751878)2637ms (14751878)345ms (14751878)1001ms (14751878)927ms (14751878)3387ms (14751878)637ms (14750958)305ms (14750958)693ms (14750958)
"\w"
"[:w_]"
1479ms (14751878)9049ms (14751878)2627ms (14751878)345ms (14751878)1003ms (14751878)920ms (14751878)3364ms (14751878)849ms (14751878)433ms (14751878)1190ms (14751878)
"[a-zA-Z0-9_]"
"[a-zA-Z0-9_]"
1476ms (14751878)9898ms (14751878)2636ms (14751878)346ms (14751878)1016ms (14751878)966ms (14751878)3384ms (14751878)920ms (14751878)390ms (14751878)1953ms (14751878)
"[a-zA-Z]+"
"[a-zA-Z]+"
782ms (3495761)3242ms (3495761)744ms (3495761)116ms (3495761)491ms (3495761)292ms (3495761)977ms (3495761)750ms (3495761)301ms (3495761)927ms (3495761)
"[.\s]+"
"[.:s]+"
658ms (991813)3080ms (3430783)684ms (3430783)117ms (3430783)329ms (3430783)276ms (3430783)865ms (3430783)955ms (3430783)484ms (3430783)1338ms (3430783)
"([^\n]+)"
"<[^\n]+>"
562ms (314387)1909ms (314387)149ms (314387)32ms (314387)342ms (314387)64ms (314387)234ms (314387)433ms (314387)174ms (314387)246ms (314387)
"e"
"e"
166ms (1781425)952ms (1781425)339ms (1781425)54ms (1781425)149ms (1781425)120ms (1781425)440ms (1781425)581ms (1781425)231ms (1781425)679ms (1781425)
"(((((e)))))"
"<<<<<e>>>>>"
166ms (1781425)1472ms (1781425)347ms (1781425)94ms (1781425)433ms (1781425)381ms (1781425)540ms (1781425)2126ms (1781425)824ms (1781425)7649ms (1781425)
"((((((((((e))))))))))"
"<<<<<<<<<<e>>>>>>>>>>"
166ms (1781425)2267ms (1781425)342ms (1781425)136ms (1781425)841ms (1781425)617ms (1781425)678ms (1781425)3981ms (1781425)1465ms (1781425)24102ms (1781425)
"Twain"
"Twain"
334ms (2388)5ms (2388)3ms (2388)12ms (2388)16ms (2388)4ms (2388)13ms (2388)559ms (2388)211ms (2388)1030ms (2388)
"(Twain)"
"<Twain>"
333ms (2388)5ms (2388)3ms (2388)12ms (2388)16ms (2388)5ms (2388)14ms (2388)781ms (2388)336ms (2388)2506ms (2388)
"#*Twain"
"(?i)Twain"
445ms (2657)1204ms (2657)65ms (2657)13ms (2657)96ms (2657)71ms (2657)117ms (2657)543ms (2657)246ms (2657)1061ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])"
"(Tw)#*((a?A?[i_I]{0,2})?[nN])"
458ms (2989)1191ms (2989)68ms (2989)25ms (2989)102ms (2989)80ms (2989)113ms (2989)729ms (2989)334ms (2989)3617ms (2989)
"[a-z]shing"
"[a-z]shing"
524ms (1877)1320ms (1877)90ms (1877)11ms (1877)750ms (1877)498ms (1877)12ms (1877)1073ms (1877)436ms (1877)1837ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
548ms (396)1511ms (396)42ms (396)4ms (396)23ms (396)23ms (396)39ms (396)1271ms (396)494ms (396)2483ms (396)
"[a-q][^u-z]{13}x"
"[a-q][^u-z]{13}x"
1467ms (5021)3540ms (5021)359ms (5021)3ms (5021)1985ms (5021)588ms (5021)40ms (5021)3306ms (5021)1190ms (5021)3356ms (5021)
"Tom|Sawyer|Huckleberry|Finn"
"Tom|Sawyer|Huckleberry|Finn"
931ms (3015)2794ms (3015)44ms (3015)24ms (3015)30ms (3015)31ms (3015)47ms (3015)2295ms (3015)864ms (3015)4132ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)"
"<Tom|Sawyer|Huckleberry|Finn>"
937ms (3015)2956ms (3015)44ms (3015)26ms (3015)31ms (3015)33ms (3015)47ms (3015)2511ms (3015)1011ms (3015)9374ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
1104ms (534)1526ms (534)90ms (534)81ms (534)288ms (534)203ms (534)227ms (534)954ms (534)594ms (534)3782ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
604ms (2)1520ms (2)57ms (2)20ms (2)76ms (2)71ms (2)76ms (2)N/AN/AN/A
"ing[^a-zA-Z]"
"ing[^a-zA-Z]"
379ms (85956)76ms (85956)41ms (85956)23ms (85956)79ms (85956)48ms (85956)39ms (85956)592ms (85956)222ms (85956)1255ms (85956)
"[a-zA-Z]ing[^a-zA-Z]"
"[a-zA-Z]ing[^a-zA-Z]"
639ms (85823)1341ms (85823)103ms (85823)24ms (85823)767ms (85823)485ms (85823)39ms (85823)1143ms (85823)461ms (85823)2355ms (85823)
"([a-zA-Z]+ing)"
717ms (95863)1903ms (95863)106ms (95863)75ms (95863)1766ms (95863)1424ms (95863)925ms (95863)N/AN/AN/A

Compile

Deps

  • bash
  • make
  • gcc
  • g++
  • go

get data

  1. $ wget http://www.gutenberg.org/files/3200/old/mtent12.zip
  2. $ dos2unix mtent12.txt data.txt
  3. $ dos2unix data.txt
  4. $ rm mtent12*

build benchmarks

  1. $ ./rebuild.sh
  2. $ ./rebench.sh

clean

$ ./reclean.sh

recursiveregexpraptor-vs-benchmarks's People

Contributors

nasciiboy avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Forkers

ehzuf

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.