Giter Site home page Giter Site logo

acorn's Introduction

Acorn Build Status

๐Ÿšง Under Construction ๐Ÿ‘ท

A state machine compiler with no runtime dependency. Define a grammar using a subset of regular expression notation, then compile it into a blazing-fast state machine. Acorn supports lexers or custom string-based state machines.

Installation

Add this to your application's shard.yml:

dependencies:
  acorn:
    github: "rmosolgo/acorn"

Usage

  • Define the grammar in a build file:

    # ./build/my_lexer.cr
    require "acorn"
    
    class MyLexer < Acorn::Lexer
      # optional, rename the generated class:
      # name("Namespace::MyLexer")
      token :letter "a-z"
      token :number "0-9"
      generate "./src/my_lexer.cr"
    end
  • Generate the lexer:

    crystal run ./build/my_lexer.cr
    
  • Use the compiled lexer:

    require "./src/my_lexer.cr"
    MyLexer.scan(input) # => Array(Tuple(Symbol, String))

Regular Expressions

Tokens are defined with a small regular expression language:

Feature Example
Character a, 1, โค๏ธ
Sequence ab, 123
Alternation a|b
Grouping (ab)|c
Any character .
One of [abc]
Not one of [^abc]
Escape \[, \.
Unicode character range a-z, 0-9
Zero-or-more a*
One-or-more a+
Zero-or-one a?
Specific number a{3}
Between numbers a{3,4}
At least a{3,}

Build Step

An Acorn module is a Crystal program that generates code. To get a lexer, you have to run the Acorn module. Then, your main program should use the generated code.

For example, if you define a lexer:

# build/my_lexer.cr
class MyLexer < Acorn::Lexer
  # ...
  generate("./app/my_lexer.cr")
end

You should run the file with Crystal to generate the specified file:

crystal run build/my_lexer.cr

Then, your main program should require the generated file:

# my_app.cr
require "app/my_lexer"
MyLexer.scan(input) # => Array(Tuple(Symbol, String))

The generated code has no dependency on Acorn, so you only need this library during development.

Tokens

Acorn returns an of array tokens. Each token is a tuple with:

  • Symbol: the name of this token in the lexer definition
  • String: the segment of input which matched the pattern
  • {Int32, Int32}: line number and column number where the token began
  • {Int32, Int32}: line number and column number where the token ended

Line numbers and column numbers are 1-indexed, so the first character in the input is 1:1.

Custom Machines

Acorn lexers are actually a special case of state machine. You can specify a custom machine, too.

  • class MyMachine < Acorn::Machine to bring in the macros
  • alias Accumulator = ... to specify the data that will be modified during the process
    • It will be initialized with .new
    • It will be returned from .scan
  • use action :name, "pattern" { |acc, str, ts, te| ... } to define patterns
    • acc is an instance of your Accumulator
    • str is the original input
    • ts is the index in str where this token began
    • te is the index in str where this token ended (if the match is one character long, ts == te)
  • optionally, name("MyNamespace::MachineName") to rename the generated Crystal class

Development

  • rebuild fixtures with crystal run spec/prepare.cr
  • crystal spec

Goals & Non-Goals

Goals:

  • Great runtime performance
  • Linear time complexity
  • Reasonable code size (.cr file and memory usage)
  • No runtime dependency
  • Plain-Crystal API (source files are .cr, not a special format)
  • Easy to write a lexer, possible to write something else

Non-goals:

  • Great compile-time performance (choose simplicity over performance)
  • Fancy regexp features

TODO

  • Finish regexp language ((...), ., [^...])
    • A move on a is also a move on :any, how is that handled?
  • Better line/col awareness:
    • Add line/col to tokens
    • Add line/col to errors

License

LGPLv3

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.