Giter Site home page Giter Site logo

mtawsl / py-tree-sitter Goto Github PK

View Code? Open in Web Editor NEW

This project forked from tree-sitter/py-tree-sitter

0.0 0.0 0.0 102 KB

Python bindings to the Tree-sitter parsing library

License: MIT License

Shell 1.70% Python 43.33% C 53.98% Batchfile 0.99%

py-tree-sitter's Introduction

py-tree-sitter

Build Status Build status

This module provides Python bindings to the tree-sitter parsing library.

Installation

This package currently only works with Python 3. There are no library dependencies, but you do need to have a C compiler installed.

pip3 install tree_sitter

Usage

Setup

First you'll need a Tree-sitter language implementation for each language that you want to parse. You can clone some of the existing language repos or create your own:

git clone https://github.com/tree-sitter/tree-sitter-go
git clone https://github.com/tree-sitter/tree-sitter-javascript
git clone https://github.com/tree-sitter/tree-sitter-python

Use the Language.build_library method to compile these into a library that's usable from Python. This function will return immediately if the library has already been compiled since the last time its source code was modified:

from tree_sitter import Language, Parser

Language.build_library(
  # Store the library in the `build` directory
  'build/my-languages.so',

  # Include one or more languages
  [
    'vendor/tree-sitter-go',
    'vendor/tree-sitter-javascript',
    'vendor/tree-sitter-python'
  ]
)

Load the languages into your app as Language objects:

GO_LANGUAGE = Language('build/my-languages.so', 'go')
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')
PY_LANGUAGE = Language('build/my-languages.so', 'python')

Basic Parsing

Create a Parser and configure it to use one of the languages:

parser = Parser()
parser.set_language(PY_LANGUAGE)

Parse some source code:

tree = parser.parse(bytes("""
def foo():
    if bar:
        baz()
""", "utf8"))

Inspect the resulting Tree:

root_node = tree.root_node
assert root_node.type == 'module'
assert root_node.start_point == (1, 0)
assert root_node.end_point == (3, 13)

function_node = root_node.children[0]
assert function_node.type == 'function_definition'
assert function_node.child_by_field_name('name').type == 'identifier'

function_name_node = function_node.children[1]
assert function_name_node.type == 'identifier'
assert function_name_node.start_point == (1, 4)
assert function_name_node.end_point == (1, 7)

assert root_node.sexp() == "(module "
    "(function_definition "
        "name: (identifier) "
        "parameters: (parameters) "
        "body: (block "
            "(if_statement "
                "condition: (identifier) "
                "consequence: (block "
                    "(expression_statement (call "
                        "function: (identifier) "
                        "arguments: (argument_list))))))))"

Walking Syntax Trees

If you need to traverse a large number of nodes efficiently, you can use a TreeCursor:

cursor = tree.walk()

assert cursor.node.type == 'module'

assert cursor.goto_first_child()
assert cursor.node.type == 'function_definition'

assert cursor.goto_first_child()
assert cursor.node.type == 'def'

# Returns `False` because the `def` node has no children
assert not cursor.goto_first_child()

assert cursor.goto_next_sibling()
assert cursor.node.type == 'identifier'

assert cursor.goto_next_sibling()
assert cursor.node.type == 'parameters'

assert cursor.goto_parent()
assert cursor.node.type == 'function_definition'

Editing

When a source file is edited, you can edit the syntax tree to keep it in sync with the source:

tree.edit(
    start_byte=5,
    old_end_byte=5,
    new_end_byte=5 + 2,
    start_point=(0, 5),
    old_end_point=(0, 5),
    new_end_point=(0, 5 + 2),
)

Then, when you're ready to incorporate the changes into a new syntax tree, you can call Parser.parse again, but pass in the old tree:

new_tree = parser.parse(new_source, tree)

This will run much faster than if you were parsing from scratch.

Pattern-matching

You can search for patterns in a syntax tree using a tree query:

query = PY_LANGUAGE.query("""
(function_definition
  name: (identifier) @function.def)

(call
  function: (identifier) @function.call)
""")

captures = query.captures(tree.root_node)
assert len(captures) == 2
assert captures[0][0] == function_name_node
assert captures[0][1] == "function.def"

py-tree-sitter's People

Contributors

maxbrunsfeld avatar yagebu avatar narpfel avatar jonafato avatar 2yz avatar kipre avatar eduardodx avatar julian avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.