Giter Site home page Giter Site logo

regex-golf's Introduction

regex-golf

Optimal solver for Peter Norvig's regex golf program, which was inspired by xkcd 1313, which was likely inspired by regex.alf.nu.

Peter uses a greedy algorithm to find a set of regex components that together cover all winners (strings that shall be matched). Starting with no components, he greedily adds the best-looking component, updates what's left to do, and repeats until done.

My findregex replacement does the same, except that at each step, it also explores what happens when not using that best-looking component. This is optimal, in the sense that it finds a shortest possible regex that's a disjunction of the provided regex components.

Result and runtime comparison for the two main examples (Star Wars/Trek and presidents):

Peter                9 chars  0.57 seconds  ' T|E.P| N'
Optimal              9 chars  0.38 seconds  ' T|E.P|^A'
Peter with pairs     9 chars  0.82 seconds  ' .?T|P.*E'
Optimal with pairs   7 chars  0.69 seconds  ' T|P.*E'

Peter               54 chars  3.37 seconds  'a.a|a..i|j|li|a.t|i..n|bu|oo|n.e|ay.|r.e$|tr|ls|po|v.l'
Optimal             52 chars  1.51 seconds  'j|po|a.a|a..i|bu|ma|ix|ho|li|v.l|ls|a.t|ay.|n.e|r.e$'
Peter with pairs    49 chars  3.64 seconds  'i.*o|o.*o|a.a|.f|bu|n.e|ay.|a.t|j|tr|di|rc|po|v.l'
Optimal with pairs  47 chars  2.80 seconds  'po|i.*o|bu|o.*o|j|a.a|a.t|v.l|r.i|rc|ay.|ma|n.e'

In the "with pairs" versions, I added components of the forms a.*b, a.+b and a.?b.
Peter's (as of Jan 11, 2014) is slower because it repeats checking for matches.

My findregex function represents a given problem as a component-to-winners mapping. The recursive search function first reduces the problem. Many components are useless because they cover a subset of what another covers and aren't shorter. So they're thrown away. Then, it looks for winners only matched by one component. As these components are needed anyway, they're used right away. This might make more components useless, so the two steps are repeated until nothing changes anymore. After this, every still not covered winner has at least two components that can cover it, so we can pick the best looking component and try with and without it.

Variables of findregex explained:

winners: Strings that must be matched. Abbreviated w (single) or ws (plural).
losers: Strings that must not be matched.
c and cs: The regex components (single and plural).
covers: Dictionary mapping each component to a set of winners it matches.
answer: Semi-global variable holding the best regex found so far.
regex: The regex being built (with '|'-prefix for simplicity).

regex-golf's People

Contributors

pochmann avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.