Giter Site home page Giter Site logo

abnf's Introduction

= abnf
This is a library to convert ABNF (Augmented Backus-Naur Form) to
Regexp (Regular Expression) written in Ruby.
It parses a description according to ABNF defined by RFC2234 and some variants.
Then the parsed grammar is transformed to recursion free.
Although the transformation is impossible in general, 
the library transform left- and right-recursion.
The recursion free grammar can be printed as Regexp literal
which is suitable in Ruby script.
The literal is pretty readable because parentheses are minimized and
properly indented by the Wadler's pretty printing algebra.
It is also possible to use the Regexp just in place.

== Author
Tanaka Akira <[email protected]>

== Home Page
http://www.a-k-r.org/abnf/

== Example
Following example shows URI-reference defined by RFC 2396 can be converted to
Regexp.
Note that URI-reference has no recursion.
Also note that the example works well without installing the library after make.

  require 'pp'
  require 'abnf'

  # URI-reference [RFC2396]
  pp ABNF.regexp_tree(<<-'End')
        URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
        absoluteURI   = scheme ":" ( hier_part | opaque_part )
        relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

        (...snip...)

        lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                   "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                   "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
        upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                   "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                   "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
        digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                   "8" | "9"
  End

The result is follows:
  %r{(?:[a-z][\x2b\x2d\x2e0-9a-z]*:
        (?:(?://
              (?:(?:(?:(?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                          %[0-9a-f][0-9a-f]|
                          [\x24&\x2b,:;=])*@)?
                    (?:(?:(?:[0-9a-z]|[0-9a-z][\x2d0-9a-z]*[0-9a-z])\x2e)*
                       (?:[a-z]|[a-z][\x2d0-9a-z]*[0-9a-z])\x2e?|
                       \d+\x2e\d+\x2e\d+\x2e\d+)(?::\d*)?)?|
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                    %[0-9a-f][0-9a-f]|
                    [\x24&\x2b,:;=@])+)
              (?:/
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
                 (?:;
                    (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:=@])*)*
                 (?:/
                    (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:=@])*
                    (?:;
                       (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                          %[0-9a-f][0-9a-f]|
                          [\x24&\x2b,:=@])*)*)*)?|
              /(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
              (?:;
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                    %[0-9a-f][0-9a-f]|
                    [\x24&\x2b,:=@])*)*
              (?:/
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
                 (?:;
                    (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:=@])*)*)*)
           (?:\x3f(?:[!\x24&-;=\x3f@_a-z~]|%[0-9a-f][0-9a-f])*)?|
           (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:;=\x3f@])
           (?:[!\x24&-;=\x3f@_a-z~]|%[0-9a-f][0-9a-f])*)|
        (?://
           (?:(?:(?:(?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:;=])*@)?
                 (?:(?:(?:[0-9a-z]|[0-9a-z][\x2d0-9a-z]*[0-9a-z])\x2e)*
                    (?:[a-z]|[a-z][\x2d0-9a-z]*[0-9a-z])\x2e?|
                    \d+\x2e\d+\x2e\d+\x2e\d+)(?::\d*)?)?|
              (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:;=@])+)
           (?:/(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
              (?:;
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                    %[0-9a-f][0-9a-f]|
                    [\x24&\x2b,:=@])*)*
              (?:/
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
                 (?:;
                    (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:=@])*)*)*)?|
           /(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
           (?:;(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*)*
           (?:/(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
              (?:;
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                    %[0-9a-f][0-9a-f]|
                    [\x24&\x2b,:=@])*)*)*|
           (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,;=@])+
           (?:/(?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
              (?:;
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                    %[0-9a-f][0-9a-f]|
                    [\x24&\x2b,:=@])*)*
              (?:/
                 (?:[!'-\x2a\x2d\x2e0-9_a-z~]|%[0-9a-f][0-9a-f]|[\x24&\x2b,:=@])*
                 (?:;
                    (?:[!'-\x2a\x2d\x2e0-9_a-z~]|
                       %[0-9a-f][0-9a-f]|
                       [\x24&\x2b,:=@])*)*)*)?)
        (?:\x3f(?:[!\x24&-;=\x3f@_a-z~]|%[0-9a-f][0-9a-f])*)?)?
     (?:\x23(?:[!\x24&-;=\x3f@_a-z~]|%[0-9a-f][0-9a-f])*)?}xi


Following example contains right-recursion:

  pp ABNF.regexp_tree(<<'End')
    s0 = n0 s0 / n1 s2 / n2 s1 / ""
    s1 = n0 s1 / n1 s0 / n2 s2
    s2 = n0 s2 / n1 s1 / n2 s0
    n0 = "0" / "3" / "6" / "9"
    n1 = "1" / "4" / "7"
    n2 = "2" / "5" / "8"
  End

The above ABNF description represents decimal numbers
which are multiples of 3 and the result is follows:

  %r{(?:[0369]|
        [147](?:[0369]|[147][0369]*[258])*(?:[147][0369]*[147]|[258])|
        [258][0369]*
        (?:[147]|
           [258](?:[0369]|[147][0369]*[258])*(?:[147][0369]*[147]|[258])))*}xi

A converted regexp can be used to match in place as:

  p /\A#{ABNF.regexp <<'End'}\z/o =~ "::13.1.68.3"
    IPv6address = "::"                  /
          7( hex4 ":" )          hex4   /
        1*8( hex4 ":" )      ":"        /
          7( hex4 ":" )    ( ":" hex4 ) /
          6( hex4 ":" ) 1*2( ":" hex4 ) /
          5( hex4 ":" ) 1*3( ":" hex4 ) /
          4( hex4 ":" ) 1*4( ":" hex4 ) /
          3( hex4 ":" ) 1*5( ":" hex4 ) /
          2( hex4 ":" ) 1*6( ":" hex4 ) /
           ( hex4 ":" ) 1*7( ":" hex4 ) /
                  ":"   1*8( ":" hex4 ) /
          6( hex4 ":" )                     IPv4address /
          6( hex4 ":" ) ":"                 IPv4address /
          5( hex4 ":" ) ":" 0*1( hex4 ":" ) IPv4address /
          4( hex4 ":" ) ":" 0*2( hex4 ":" ) IPv4address /
          3( hex4 ":" ) ":" 0*3( hex4 ":" ) IPv4address /
          2( hex4 ":" ) ":" 0*4( hex4 ":" ) IPv4address /
           ( hex4 ":" ) ":" 0*5( hex4 ":" ) IPv4address /
                  "::"      0*6( hex4 ":" ) IPv4address
    IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT
    hex4    = 1*4HEXDIG
  End

== Document
* abnf.html : abnf.rb reference manual
* regexptree.html : regexptree.rb reference manual
* natset.html : natset.rb reference manual

== Requirements

* ruby-1.7.3 (2002-10-08) (older version doesn't work.)
* racc-1.3.11 (older version may work.)

== Download

* latest release: http://www.a-k-r.org/abnf/abnf-0.6.tar.gz

* development version: https://github.com/akr/abnf

== Install

  % make
  % mkdir /usr/local/lib/ruby/site_ruby/abnf
  % cp abnf/*.rb /usr/local/lib/ruby/site_ruby/abnf
  % cp abnf.rb natset.rb regexptree.rb /usr/local/lib/ruby/site_ruby

== License

Ruby's

abnf's People

Contributors

akr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

abnf's Issues

.regex wrongly appends 'i'

ABNF is usually case sensitive, right?

While the standard suggests it is illegal to use string literals "a" / "b" and instead to escape them in to unreadable muck, many people use them anyway.

In these (common) cases, it is often correct to respect the input case.

.regex always seems to return /something/i.

I suppose this may be a workaround for bad input, but at least an option to stop it would be cute.

Duplicate Definitions Extend Previous

If I define two times the same entity, it extends the previous entity.

For example:

a = "a"
a = "b"

Expected result: "b", "a" or (preferably) error.

Result:

%r{[ab]}xi

Maybe this is correct or documented, however it doesn't match my expectation. (I was under the impression in the RFC you have to use a special operator for extension, not = but rather *= or something similar.)

Thanks heaps for this library by the way, it is gold!

Method to_s returns weird prefix + suffix

print "Regexp for A to C: "
pp ABNF.regexp_tree(<<-'End').to_s
 a-to-c="abc"
End

Expected output: Regexp for A to C: "[abc]" or maybe ... "/[abc]/"

Observed output: Regexp for A to C: "(?i-m:abc)"

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.