Giter Site home page Giter Site logo

automata's Introduction

Automata

Follow on X    Join Discord

A tool for generating synthetic data that results from grammars writen for various automata (eg Finite State Machines, Pushdown Automata and Turing Machines). Grammars are saved as json, and you can build 1000 valid strings that match this grammar with:

$ automata -i my_grammar.json -o output.json -n 1000
aXb
aaXbb
aaaXbbb
aaaaXbbbb
...

This is intended to help train the neurallambda project and confer reasoning ability to LLMs.

Layout

rules/  # example grammars
data/   # data generated from those rules
app/    # the CLI tool
src/    # the supporting library

PDA example

There's a grammar that looks like a^nXb^n, which is any string like aaXbb and aaaaaXbbbbb, where the same number of bs follow the same number of as (and in the middle is X). You cannot write a regex for this (try defining that n), but you can write a pushdown automata that recognizes it. A transition function involves the following mappings between states (slightly simplified, see rules/anbn.json):

{
    "machine": "PDA",
    "symbols": ["a", "b", "X"],
    "rules": [
         # input symbol
           |  # current state         # resulting state
           |    |  # top of stack     |    # stack operation
           |    |          |          |      |
        [["a", "INITIAL", "A"],     ["Q1", ["push", "A"]]],
        [["X", "Q1", "A"],          ["Q2", "nullop"]],
        [["b", "Q2", "A"],          ["Q2", "pop"]],
    ]
}

Notice there are exactly 5 items in each transition rule for a PDA, and they come as 3 on the Left and 2 on the Right. These represent the left and right hand side of the relation that defines a PDA. In english, the rules say:

  1. given an input symbol 'a', if I'm in the state Q1, and the top of the stack is "A", let's transition by staying in state Q1, and push an "A" on the stack.

  2. if the input symbol is 'X', we know we need to transition to 'Q2', which implies we're gonna start looking for 'b's.

  3. each time we see a 'b', pop the stack. This is how we can ensure that the same number of pushes and pops happen, ie, same number of 'b's and 'a's.

Again, see rules/anbn.json) for the full implementation.

About the tool

Haskell was chosen because it produces trustworthy code, especially in this department of computer languages. When I next face a funky loss while training some NN, I want to minimize the risk that the data was improperly generated.

A simple typeclass is offered to allow the easy (if you speak haskell) specification of new Automata:

class Machine m a (s :: Type) where
  data L m a s -- ^ the Left side of a delta function/relation
  data R m a s -- ^ the Right side of a delta function/relation
  data S m a s -- ^ the State of the Machine
  -- | update the state (ex apply stack ops)
  action :: R m a s -> S m a s -> S m a s
  -- | build an input (ex add a peek at the top of a stack)
  mkL :: a -> S m a s -> L m a s

There are probably infinite things that are Turing Complete formulations of Turing Machines, and I hope to represent some in this library:

  • The classic tape model
  • An FSM with 1 queue
  • An FSM with 2 stacks
  • An FSM with multiple queues
  • An FSM with multiple tapes
  • Something (i'm not sure what) with addressable memory

"I just got here and have no clue what you're doing"

That's because I'm bad at explaining things.

I have a project called neurallambda where I hope to build architectures that augment traditional architectures with reasoning ability. Checkout the readme if you're skeptical, it explains why I don't think the current batch of AI has reasoning.

So for experimenting on these architectures I need data. Current datasets are largely around Natural Language, and probably GPT generated (therefore their reasoning is circumspect). It'll be easier to train on super simple perfectly trustworthy datasets, therefore why not generate them ourselves? That's where this lib comes in.

automata's People

Contributors

freckletonj avatar

Stargazers

 avatar Zmu avatar  avatar  avatar elucida avatar  avatar  avatar Liam Dyer avatar  avatar SeungheonOh avatar Weiss avatar Marco Z avatar Josh Miller avatar Jon Purdy avatar dicioccio lucas avatar 方泓睿 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.