bashlex - Python parser for bash

bashlex is a Python port of the parser used internally by GNU bash.

For the most part it's transliterated from C; the major differences are:

  1. it does not execute anything
  2. it is reentrant
  3. it generates a complete AST

Installation

$ pip install bashlex

Usage

$ python
>>> import bashlex
>>> parts = bashlex.parse('true && cat <(echo $(echo foo))')
>>> for ast in parts:
...     print(ast.dump())
ListNode(pos=(0, 31), parts=[
  CommandNode(pos=(0, 4), parts=[
    WordNode(pos=(0, 4), word='true'),
  ]),
  OperatorNode(op='&&', pos=(5, 7)),
  CommandNode(pos=(8, 31), parts=[
    WordNode(pos=(8, 11), word='cat'),
    WordNode(pos=(12, 31), word='<(echo $(echo foo))', parts=[
      ProcesssubstitutionNode(command=
        CommandNode(pos=(14, 30), parts=[
          WordNode(pos=(14, 18), word='echo'),
          WordNode(pos=(19, 30), word='$(echo foo)', parts=[
            CommandsubstitutionNode(command=
              CommandNode(pos=(21, 29), parts=[
                WordNode(pos=(21, 25), word='echo'),
                WordNode(pos=(26, 29), word='foo'),
              ]), pos=(19, 30)),
          ]),
        ]), pos=(12, 31)),
    ]),
  ]),
])

It is also possible to use only the tokenizer, which behaves similarly to shlex.split but understands more complex constructs such as command and process substitutions:

>>> list(bashlex.split('cat <(echo "a $(echo b)") | tee'))
['cat', '<(echo "a $(echo b)")', '|', 'tee']

Compared to shlex:

>>> import shlex
>>> shlex.split('cat <(echo "a $(echo b)") | tee')
['cat', '<(echo', 'a $(echo b))', '|', 'tee']
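For reference, here is the stdlib side of that comparison in runnable form. The second lexer uses shlex's punctuation_chars option (Python 3.6+), which is about as shell-aware as shlex gets: it separates operators such as | into their own tokens, but it still has no concept of a substitution as a single word, so the process substitution is broken into several pieces. This is only a sketch of shlex's behaviour, not of bashlex's tokenizer.

```python
import shlex

CMD = 'cat <(echo "a $(echo b)") | tee'

# Plain POSIX splitting: whitespace-delimited, quotes removed, so the
# closing parenthesis sticks to the quoted text.
plain = shlex.split(CMD)

# punctuation_chars=True splits runs of ();<>|& into separate tokens,
# but the process substitution still ends up in several tokens.
lexer = shlex.shlex(CMD, posix=True, punctuation_chars=True)
punct = list(lexer)

print(plain)
print(punct)
```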

The examples/ directory contains a sample script that demonstrates how to traverse the AST to do more complicated things.
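As a minimal illustration of such a traversal (a stdlib-only sketch, not the script from examples/), the code below walks a tree depth-first and collects every word. It assumes, as the dump output above suggests, that each node has a `kind` attribute, that children sit in an optional `parts` list, and that substitution nodes keep their inner command under `command`. Other node kinds may keep children under other attributes, which this sketch does not follow. The tree here is a hand-built stand-in for the dump above (the `_word` helper is hypothetical) so the code runs without bashlex installed.

```python
from types import SimpleNamespace


def iter_nodes(node):
    """Yield `node` and all of its descendants, depth-first, pre-order.

    Assumption: children live in an optional `parts` list and, for
    substitution nodes, an optional `command` attribute, as in the dump.
    Any other attribute that might hold child nodes is not followed.
    """
    yield node
    command = getattr(node, "command", None)
    if command is not None:
        yield from iter_nodes(command)
    for child in getattr(node, "parts", None) or []:
        yield from iter_nodes(child)


def collect_words(trees):
    """Return the text of every word node across all trees, in pre-order."""
    return [
        n.word
        for tree in trees
        for n in iter_nodes(tree)
        if getattr(n, "kind", None) == "word"
    ]


# Hypothetical stand-in mirroring the dump of
# 'true && cat <(echo $(echo foo))' so the sketch runs on its own.
def _word(text, parts=()):
    return SimpleNamespace(kind="word", word=text, parts=list(parts))


inner = SimpleNamespace(kind="command", parts=[_word("echo"), _word("foo")])
procsub_cmd = SimpleNamespace(
    kind="command",
    parts=[
        _word("echo"),
        _word("$(echo foo)",
              [SimpleNamespace(kind="commandsubstitution", command=inner)]),
    ],
)
tree = SimpleNamespace(
    kind="list",
    parts=[
        SimpleNamespace(kind="command", parts=[_word("true")]),
        SimpleNamespace(kind="operator", op="&&"),
        SimpleNamespace(
            kind="command",
            parts=[
                _word("cat"),
                _word("<(echo $(echo foo))",
                      [SimpleNamespace(kind="processsubstitution",
                                       command=procsub_cmd)]),
            ],
        ),
    ],
)

print(collect_words([tree]))
# prints ['true', 'cat', '<(echo $(echo foo))', 'echo', '$(echo foo)', 'echo', 'foo']
```

With bashlex installed, passing the list returned by bashlex.parse to collect_words should walk the real trees the same way, provided the attribute assumptions above hold; the script in examples/ remains the better reference for full coverage of node kinds.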

Limitations

Currently the parser has no support for:

  • arithmetic expressions $((..))
  • more complicated parameter expansions such as ${parameter#word}, which are taken literally and do not produce child nodes
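A caller that depends on the AST may want to flag commands that contain these unsupported constructs. The function below is a hypothetical pre-check, not part of bashlex: it is a plain regex scan, so it can misfire on text inside single quotes or on arithmetic with nested parentheses. The patterns (arithmetic $((...)) and ${...} forms containing an operator character such as # % / : ^ ,) are assumptions chosen to cover the examples listed above.

```python
import re

# Heuristic patterns (assumptions, not taken from bashlex):
# arithmetic expansion $((...)), shortest match up to the first "))",
# and ${...} forms containing an operator character, e.g.
# ${parameter#word}, ${#var}, ${x:-default}.
_ARITH = re.compile(r"\$\(\(.*?\)\)")
_PARAM = re.compile(r"\$\{[^}]*[#%/:^,][^}]*\}")


def unsupported_constructs(command):
    """Return (kind, text) pairs for constructs listed under Limitations."""
    found = [("arithmetic", m.group(0)) for m in _ARITH.finditer(command)]
    found += [("parameter", m.group(0)) for m in _PARAM.finditer(command)]
    return found
```

For example, unsupported_constructs('echo ${x#y}') reports the parameter expansion, while a plain ${x} or $x is not reported.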

Debugging

It can be useful to debug bashlex in conjunction with GNU bash, since bashlex is mostly a transliteration of it. Comments in the code sometimes contain line references to bash's source code, e.g. # bash/parse.y L2626.

$ git clone git://git.sv.gnu.org/bash.git
$ cd bash
$ git checkout df2c55de9c87c2ee8904280d26e80f5c48dd6434 # commit used in translating the code
$ ./configure
$ make CFLAGS=-g CFLAGS_FOR_BUILD=-g # debug info and don't optimize
$ gdb --args ./bash -c 'echo foo'

Useful things to look at when debugging bash:

  • variables yylval, shell_input_line, shell_input_line_index
  • breakpoint at yylex (the mapping from token numbers to names is in the file parser-built)
  • breakpoint at read_token_word (corresponds to bashlex/tokenizer._readtokenword)
  • xparse_dolparen, expand_word_internal (called when parsing $())

Motivation

I wrote this library for another project of mine, explainshell, which needed a new parsing backend to support complex constructs such as process and command substitutions.

Releasing a new version

Suggestion for making a release environment:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Then:

  • run make tests
  • bump version in setup.py
  • git tag the new commit
  • run python -m build
  • run twine upload dist/*
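The "bump version in setup.py" step can be scripted. The helper below is a hypothetical sketch, not something the repository ships: it assumes setup.py declares the version as a keyword argument such as version='0.18' (single or double quotes) and increments only the last numeric component.

```python
import re

# Assumption: setup.py contains something like version='0.18' or
# version = "1.2.9"; adjust the pattern if the declaration differs.
_VERSION = re.compile(r"""(version\s*=\s*['"])([0-9]+(?:\.[0-9]+)*)(['"])""")


def bump_version(setup_py_text):
    """Increment the last numeric component of the first version string.

    Returns (new_text, old_version, new_version).
    Raises ValueError if no version=... assignment is found.
    """
    match = _VERSION.search(setup_py_text)
    if match is None:
        raise ValueError("no version=... assignment found")
    old = match.group(2)
    parts = old.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    new = ".".join(parts)
    # Splice the new version in place of the old one, keeping the quotes.
    new_text = setup_py_text[:match.start(2)] + new + setup_py_text[match.end(2):]
    return new_text, old, new
```

One would read setup.py, pass its text to bump_version, write the result back, and then tag the commit; the version scheme is assumed, so check the bumped value before tagging.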

License

bashlex uses the same license as GNU bash: GNU GPL v3+.
