
py-tree-sitter's Introduction

Python Tree-sitter


This module provides Python bindings to the tree-sitter parsing library.

Installation

The package has no library dependencies and provides pre-compiled wheels for all major platforms.

Note

If your platform is not currently supported, please submit an issue on GitHub.

pip install tree-sitter

Usage

Setup

Install languages

Tree-sitter language implementations also provide pre-compiled binary wheels. Let's take Python as an example.

pip install tree-sitter-python

Then, you can load it as a Language object:

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language(), "python")

Build from source

Warning

This method of loading languages is deprecated and will be removed in v0.22.0. You should only use it if you need languages that have not updated their bindings. Keep in mind that you will need a C compiler in this case.

First you'll need a Tree-sitter language implementation for each language that you want to parse. Clone them into a vendor directory, which is where the build step below looks for them.

git clone https://github.com/tree-sitter/tree-sitter-go
git clone https://github.com/tree-sitter/tree-sitter-javascript
git clone https://github.com/tree-sitter/tree-sitter-python

Use the Language.build_library method to compile these into a library that's usable from Python. This function will return immediately if the library has already been compiled since the last time its source code was modified:

from tree_sitter import Language, Parser

Language.build_library(
    # Store the library in the `build` directory
    "build/my-languages.so",
    # Include one or more languages
    ["vendor/tree-sitter-go", "vendor/tree-sitter-javascript", "vendor/tree-sitter-python"],
)
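The "return immediately" behaviour described above amounts to a staleness check on file timestamps. The following is a minimal sketch of that idea, not the library's actual implementation; `needs_rebuild` is a hypothetical helper name.

```python
import os


def needs_rebuild(output_path, source_paths):
    """Return True if the output is missing or older than any source file."""
    if not os.path.exists(output_path):
        return True
    output_mtime = os.path.getmtime(output_path)
    # Rebuild when at least one source was modified after the output was written
    return any(os.path.getmtime(path) > output_mtime for path in source_paths)
```

A check like this only compares modification times, so it would not notice a change in compiler or flags.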

Load the languages into your app as Language objects:

GO_LANGUAGE = Language("build/my-languages.so", "go")
JS_LANGUAGE = Language("build/my-languages.so", "javascript")
PY_LANGUAGE = Language("build/my-languages.so", "python")

Basic parsing

Create a Parser and configure it to use a language:

parser = Parser()
parser.set_language(PY_LANGUAGE)

Parse some source code:

tree = parser.parse(
    bytes(
        """
def foo():
    if bar:
        baz()
""",
        "utf8",
    )
)

If your source code lives in a data structure other than a bytes object, you can pass a "read" callable to the parse function.

The read callable receives a byte offset and a point tuple, and can use either one to read from the buffer and return source code as a bytes object. Returning an empty bytes object or None terminates parsing for that line. The bytes must encode the source as UTF-8.

For example, to use the byte offset:

src = bytes(
    """
def foo():
    if bar:
        baz()
""",
    "utf8",
)


def read_callable_byte_offset(byte_offset, point):
    return src[byte_offset : byte_offset + 1]


tree = parser.parse(read_callable_byte_offset)

And to use the point:

src_lines = ["\n", "def foo():\n", "    if bar:\n", "        baz()\n"]


def read_callable_point(byte_offset, point):
    row, column = point
    if row >= len(src_lines) or column >= len(src_lines[row]):
        return None
    return src_lines[row][column:].encode("utf8")


tree = parser.parse(read_callable_point)
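The same idea extends to text stored in pieces, for example an editor buffer split into chunks. The sketch below assumes the source is a list of UTF-8 encoded bytes chunks; `make_chunk_reader` is a hypothetical helper, and it relies only on the read callable contract described above (take a byte offset and a point, return bytes or None).

```python
from bisect import bisect_right
from itertools import accumulate


def make_chunk_reader(chunks):
    """Build a read callable over a list of UTF-8 encoded bytes chunks."""
    # Drop empty chunks: returning b"" from the callable would end parsing early
    chunks = [chunk for chunk in chunks if chunk]
    # ends[i] is the byte offset just past the end of chunks[i]
    ends = list(accumulate(len(chunk) for chunk in chunks))

    def read(byte_offset, point):
        # Find the first chunk that ends after the requested offset
        index = bisect_right(ends, byte_offset)
        if index == len(chunks):
            return None  # past the end of the document
        chunk_start = ends[index] - len(chunks[index])
        return chunks[index][byte_offset - chunk_start :]

    return read
```

The result would be passed to the parser the same way as the callables above, for example `parser.parse(make_chunk_reader(chunks))`.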

Inspect the resulting Tree:

root_node = tree.root_node
assert root_node.type == 'module'
assert root_node.start_point == (1, 0)
assert root_node.end_point == (4, 0)

function_node = root_node.children[0]
assert function_node.type == 'function_definition'
assert function_node.child_by_field_name('name').type == 'identifier'

function_name_node = function_node.children[1]
assert function_name_node.type == 'identifier'
assert function_name_node.start_point == (1, 4)
assert function_name_node.end_point == (1, 7)

function_body_node = function_node.child_by_field_name("body")

if_statement_node = function_body_node.child(0)
assert if_statement_node.type == "if_statement"

function_call_node = if_statement_node.child_by_field_name("consequence").child(0).child(0)
assert function_call_node.type == "call"

function_call_name_node = function_call_node.child_by_field_name("function")
assert function_call_name_node.type == "identifier"

function_call_args_node = function_call_node.child_by_field_name("arguments")
assert function_call_args_node.type == "argument_list"


assert root_node.sexp() == (
    "(module "
        "(function_definition "
            "name: (identifier) "
            "parameters: (parameters) "
            "body: (block "
                "(if_statement "
                    "condition: (identifier) "
                    "consequence: (block "
                        "(expression_statement (call "
                            "function: (identifier) "
                            "arguments: (argument_list))))))))"
)

Walking syntax trees

If you need to traverse a large number of nodes efficiently, you can use a TreeCursor:

cursor = tree.walk()

assert cursor.node.type == "module"

assert cursor.goto_first_child()
assert cursor.node.type == "function_definition"

assert cursor.goto_first_child()
assert cursor.node.type == "def"

# Returns `False` because the `def` node has no children
assert not cursor.goto_first_child()

assert cursor.goto_next_sibling()
assert cursor.node.type == "identifier"

assert cursor.goto_next_sibling()
assert cursor.node.type == "parameters"

assert cursor.goto_parent()
assert cursor.node.type == "function_definition"

Important

Keep in mind that the cursor can only walk into children of the node that it started from.

See examples/walk_tree.py for a complete example of iterating over every node in a tree.

Editing

When a source file is edited, you can edit the syntax tree to keep it in sync with the source:

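# Replace the two bytes at offsets 5 and 6 ("fo") with their uppercase form
new_src = src[:5] + src[5 : 5 + 2].upper() + src[5 + 2 :]

# `src` starts with a newline, so byte 5 sits at row 1, column 4
tree.edit(
    start_byte=5,
    old_end_byte=5 + 2,
    new_end_byte=5 + 2,
    start_point=(1, 4),
    old_end_point=(1, 4 + 2),
    new_end_point=(1, 4 + 2),
)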

Then, when you're ready to incorporate the changes into a new syntax tree, you can call Parser.parse again, but pass in the old tree:

new_tree = parser.parse(new_src, tree)

This will run much faster than if you were parsing from scratch.

The Tree.changed_ranges method can be called on the old tree to return the list of ranges whose syntactic structure has been changed:

for changed_range in tree.changed_ranges(new_tree):
    print("Changed range:")
    print(f"  Start point {changed_range.start_point}")
    print(f"  Start byte {changed_range.start_byte}")
    print(f"  End point {changed_range.end_point}")
    print(f"  End byte {changed_range.end_byte}")

Pattern-matching

You can search for patterns in a syntax tree using a tree query:

query = PY_LANGUAGE.query(
    """
(function_definition
  name: (identifier) @function.def
  body: (block) @function.block)

(call
  function: (identifier) @function.call
  arguments: (argument_list) @function.args)
"""
)

Captures

captures = query.captures(tree.root_node)
assert len(captures) == 4
assert captures[0][0] == function_name_node
assert captures[0][1] == "function.def"

The Query.captures() method takes optional start_point, end_point, start_byte and end_byte keyword arguments, which can be used to restrict the query's range. Only one of the ..._byte or ..._point pairs needs to be given to restrict the range. If all are omitted, the entire range of the passed node is used.
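Since the captures come back as a flat list, it is often handy to group them by capture name. This is a small sketch assuming the (node, capture_name) tuple shape that the assertions above index into; `group_captures` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict


def group_captures(captures):
    """Group (node, capture_name) tuples into {capture_name: [nodes]}."""
    grouped = defaultdict(list)
    for node, name in captures:
        grouped[name].append(node)
    # Return a plain dict so that missing names raise KeyError
    return dict(grouped)
```

With the query above, `group_captures(captures)["function.def"]` should contain the identifier node for `foo`.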

Matches

matches = query.matches(tree.root_node)
assert len(matches) == 2

# first match
assert matches[0][1]["function.def"] == function_name_node
assert matches[0][1]["function.block"] == function_body_node

# second match
assert matches[1][1]["function.call"] == function_call_name_node
assert matches[1][1]["function.args"] == function_call_args_node

The Query.matches() method takes the same optional arguments as Query.captures(). The difference between the two methods is that Query.matches() groups captures into matches, which is much more useful when the captures within a query relate to each other. Each match pairs a pattern index with a dictionary that maps each capture's name to the node that was captured.

To try out and explore the code referenced in this README, check out examples/usage.py.

py-tree-sitter's People

Contributors

2yz, akuli, amaanq, andreypopp, data-niklas, dependabot[bot], dundargoc, eduardodx, gabdug, jhandley, jonafato, julian, kipre, ledenel, ls-jad-elkik, lunixbochs, maxbrunsfeld, narpfel, northisup, observeroftime, ocastejon, pedrovhb, ralpha, rjw57, thabokani, whizsid, wstevick, yagebu


py-tree-sitter's Issues

match or eq operators in Query are not working for python Bindings

I tried to use a query with a regex match predicate, but the predicates do not seem to work. They work in the playground, but not when running through Python.
I have the following code.

i = 0
class ArgParser:
    logger = logging.getLogger("ArgParser")

I am trying to use the query below:

((attribute
    object: (identifier) @cls
   attribute: (identifier) @clsvar
 )
  (match? @cls "^cls$"))

When running through Python, it gives getLogger as @clsvar, ignoring the match condition. In the playground, however, I get the correct result. (When I remove the match condition in the playground, I get results similar to Python's, so I suspect the Python binding is not checking the condition.)

Is there any way around this?
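If the bindings in use do not evaluate the match? predicate (an assumption based on the behaviour reported above), one possible workaround is to drop the predicate from the query and apply the regex in Python. The sketch below assumes the match shape used in the README, where each match is a tuple whose second element maps capture names to nodes; `filter_matches` is a hypothetical helper.

```python
import re


def filter_matches(matches, source, capture_name, pattern):
    """Keep matches whose `capture_name` node text matches the regex `pattern`."""
    regex = re.compile(pattern)
    kept = []
    for pattern_index, captured in matches:
        node = captured.get(capture_name)
        if node is None:
            continue  # this match did not capture the name at all
        # `source` must be the same bytes object that was parsed
        text = source[node.start_byte : node.end_byte].decode("utf8")
        if regex.search(text):
            kept.append((pattern_index, captured))
    return kept
```

For the query above, `filter_matches(matches, source, "cls", "^cls$")` would keep only the attribute accesses whose object is literally `cls`.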

What's the best highlighting algorithm I should use with py-tree-sitter?

I'd like to come up with a proper highlighting algorithm to use with QScintilla, and I need a little help or advice. In QScintilla you basically apply styles to chunks of text using byte positions; here you can find a perfect explanation of how it works.

Right now I've coded a little snippet to understand the basics of tree-sitter. The goal would be to apply a monokai style using the existing Python grammar. Consider this:

import textwrap
from tree_sitter import Language, Parser

def print_node(node):
    pos_point = f"[{node.start_point},{node.end_point}]"
    pos_byte = f"({node.start_byte},{node.end_byte})"
    print(
        f"{repr(node.type):<25}{'is_named' if node.is_named else '-':<20}"
        f"{pos_point:<30}{pos_byte}"
    )


def traverse(tree):
    def _traverse(node):
        print_node(node)

        for child in node.children:
            _traverse(child)

    _traverse(tree.root_node)


if __name__ == '__main__':
    code = textwrap.dedent("""\
        # こんにちは
        def hello():
            if world:
                foo()"""
    )
    print(repr(code), len(code.encode("utf-8")))

    print('-'*80)
    for l in code.splitlines(True):
        print(repr(l), len(l.encode("utf-8")))

    print('-'*80)
    grammar_name = "python"
    parser = Parser()
    parser.set_language(PY_LANGUAGE) # <-- IMPORTANT: Set PY_LANGUAGE to use your shared library
    tree = parser.parse(bytes(code, "utf8"))
    traverse(tree)

you should get this output:

'# こんにちは\ndef hello():\n    if world:\n        foo()' 58
--------------------------------------------------------------------------------
'# こんにちは\n' 18
'def hello():\n' 13
'    if world:\n' 14
'        foo()' 13
--------------------------------------------------------------------------------
'module'                 is_named            [(0, 0),(3, 13)]              (0,58)
'comment'                is_named            [(0, 0),(0, 17)]              (0,17)
'function_definition'    is_named            [(1, 0),(3, 13)]              (18,58)
'def'                    -                   [(1, 0),(1, 3)]               (18,21)
'identifier'             is_named            [(1, 4),(1, 9)]               (22,27)
'parameters'             is_named            [(1, 9),(1, 11)]              (27,29)
'('                      -                   [(1, 9),(1, 10)]              (27,28)
')'                      -                   [(1, 10),(1, 11)]             (28,29)
':'                      -                   [(1, 11),(1, 12)]             (29,30)
'if_statement'           is_named            [(2, 4),(3, 13)]              (35,58)
'if'                     -                   [(2, 4),(2, 6)]               (35,37)
'identifier'             is_named            [(2, 7),(2, 12)]              (38,43)
':'                      -                   [(2, 12),(2, 13)]             (43,44)
'expression_statement'   is_named            [(3, 8),(3, 13)]              (53,58)
'call'                   is_named            [(3, 8),(3, 13)]              (53,58)
'identifier'             is_named            [(3, 8),(3, 11)]              (53,56)
'argument_list'          is_named            [(3, 11),(3, 13)]             (56,58)
'('                      -                   [(3, 11),(3, 12)]             (56,57)
')'                      -                   [(3, 12),(3, 13)]             (57,58)

Some information seems to be missing from that tree. How should I tweak my algorithm to print everything I need for syntax highlighting?

In any case, could you please explain the overall algorithm to use in combination with tree-sitter? In the past, when using other libraries, I've always found that syntax highlighting of large texts wasn't a trivial task... hopefully with tree-sitter this will be much easier :)

Thanks.

Ps. This is basically a question but at the end of this thread maybe we'll get some useful explanations that you can use to add to the docs... also, once I've got a QScintilla example working I could make a PR to add it as example/test if you want.
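One simple approach, sketched here under assumptions rather than as a recommended algorithm, is to collect the leaf nodes in document order and map each node type to a style name. The function uses only the node attributes printed above (type, children, start_byte, end_byte); `highlight_spans` and the style names are hypothetical, and applying the spans to QScintilla is left out.

```python
def highlight_spans(root, styles, default="default"):
    """Return (start_byte, end_byte, style) for every leaf node, in document order."""
    spans = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Push children in reverse so the leftmost child is processed first
            stack.extend(reversed(node.children))
        else:
            style = styles.get(node.type, default)
            spans.append((node.start_byte, node.end_byte, style))
    return spans
```

Bytes that fall between spans, such as the whitespace gaps visible in the output above, are not covered by any leaf node and would keep the editor's default style. Styling by leaf type alone cannot tell a function name from any other identifier; that needs the parent's type or a query.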

Tree node .parent does not appear to match expectation

Hello,

I'm using tree-sitter 0.2.0:

% /usr/bin/pip3 freeze
tree-sitter==0.2.0

See this sample code (rename to parent_test.py to run):
parent_test.py.txt

I am just DFS walking the tree from the root.

Here's a small snippet of the output (beginning) it generates:

`Using source code:

def main():
	a = 2

Visiting node: id:4530382256, node.parent_id: None, actual_parent_id:None, b'def main():\n\t\ta = 2\n\t\t'
Node has 1 children..
Traversing child: 0
Visiting node: id:4530382384, node.parent_id: 4530382512, actual_parent_id:4530382256, b'def main():\n\t\ta = 2'
Node has 5 children..
Traversing child: 0
Visiting node: id:4530382512, node.parent_id: 4530382832, actual_parent_id:4530382384, b'def'
Node has 0 children..
Finished visiting node: id:4530382512, node.parent_id: 4530382832, actual_parent_id:4530382384, b'def'
Traversing child: 1`


Look at the 2nd node visited:

Visiting node: id:4530382384, node.parent_id: 4530382512, actual_parent_id:4530382256, b'def main():\n\t\ta = 2'

The node id when using node.parent is id: 4530382512
But the actual parent of this node in this traversal was id: 4530382256

So in summary, node.parent looks incorrect.

Is this a known issue? Or is node.parent supposed to mean something different from "the parent of this node in the traversal"?

Add getting identifier node name to readme example

I am struggling to get the name held by an identifier node. I've read the introduction page and the using parsers page, but I still have no clue.

Using the example in the README:

tree = parser.parse(
    bytes(
        """
def foo():
    if bar:
        baz()
""",
        "utf8",
    )
)

I want to get the function identifier's name, which is "foo".

I've tried several things without success:

ipdb> function_name_node.is_named
True

ipdb> function_name_node.__str__()
'<Node kind=identifier, start_point=(1, 4), end_point=(1, 7)>'

Thank you, and sorry for the beginner question.
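A node records where it sits in the source rather than the text itself, so one way to get the name is to slice the original bytes. This is a minimal sketch that relies only on the start_byte and end_byte attributes shown elsewhere on this page; `node_text` is a hypothetical helper.

```python
def node_text(source, node):
    """Return the source text covered by a node, decoded from UTF-8."""
    # `source` must be the same bytes object that was passed to parser.parse
    return source[node.start_byte : node.end_byte].decode("utf8")
```

For the README example, `node_text(source, function_name_node)` should give "foo", provided `source` holds the parsed bytes.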

Walking Syntax Trees

I am currently using the following method to walk the tree:

def traverse(tree):
    def _traverse(node):
        print_node(node)

        for child in node.children:
            _traverse(child)

    _traverse(tree.root_node)

How could someone do the same using TreeCursor?
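A cursor-based version can be sketched with the three movement methods shown in the README (goto_first_child, goto_next_sibling, goto_parent). `iter_nodes` is a hypothetical generator name; it yields nodes in the same pre-order as the recursive version and tracks only its depth, without an explicit stack of nodes.

```python
def iter_nodes(cursor):
    """Yield every node under the cursor's starting node, in pre-order."""
    depth = 0
    while True:
        yield cursor.node
        # Descend first: a node's children come right after the node itself
        if cursor.goto_first_child():
            depth += 1
            continue
        # No children: move to the next sibling, climbing until one exists
        while depth > 0 and not cursor.goto_next_sibling():
            cursor.goto_parent()
            depth -= 1
        # Back at the starting node means everything has been visited
        if depth == 0:
            return
```

With a parsed tree, `for node in iter_nodes(tree.walk()): print_node(node)` should visit the same nodes as the recursive traverse.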

provide ts_tree_get_changed_ranges binding

It would be great to expose the ts_tree_get_changed_ranges function in the Python bindings:

/**
 * Compare an old edited syntax tree to a new syntax tree representing the same
 * document, returning an array of ranges whose syntactic structure has changed.
 *
 * For this to work correctly, the old syntax tree must have been edited such
 * that its ranges match up to the new tree. Generally, you'll want to call
 * this function right after calling one of the `ts_parser_parse` functions.
 * You need to pass the old tree that was passed to parse, as well as the new
 * tree that was returned from that function.
 *
 * The returned array is allocated using `malloc` and the caller is responsible
 * for freeing it using `free`. The length of the array will be written to the
 * given `length` pointer.
 */
TSRange *ts_tree_get_changed_ranges(
  const TSTree *old_tree,
  const TSTree *new_tree,
  uint32_t *length
);

https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h

Possible memory leak when using captures

I run py-tree-sitter on a Windows system in an almost clean 32-bit Python 3.6 venv.
While parsing a LOT of files and running the same query on each of the resulting trees I found myself out of memory rather quickly. I tried to narrow down the source of my leak:

query = PY_LANG.query(""" some query string with captures """)
for root, _, files in os.walk("some/repos/"):
    for fn in files:
        if fn.endswith(".py"):
            with open(os.path.join(root, fn), mode="rb") as fd:
                tree = parser.parse(fd.read())
            query.captures(tree.root_node)

After playing around with the above code a little, query.captures(tree.root_node) seems to be the cause, since other tree operations and even plain parsing work perfectly fine. Other repositories I know that use the Python tree-sitter bindings don't really use queries, so I wondered whether this is known or intended behaviour and I'm just using it wrongly, or whether it is a genuine bug.
I tried looking at the C bindings, but unfortunately I'm quite inexperienced with C, so I couldn't find the exact cause.

Support memoryview inputs in addition to bytes inputs

A memoryview object exposes the C level buffer interface as a Python object which can then be passed around like any other object. https://docs.python.org/3/c-api/memoryview.html

I have a pyarrow table of strings where each string contains the content of a source code file.
Pyarrow exposes each row as pyarrow.Buffer, which is compatible with the Python memoryview interface.

It would be nice if tree-sitter could be run directly on the memoryview, instead of requiring users to copy the data into a new bytes object.

Example:

    language = tree_sitter.Language(lib_path, 'python')
    parser = tree_sitter.Parser()
    parser.set_language(language)
    parser.parse(memoryview(tbl['content'][0].as_buffer()))  # TypeError: First argument to parse must be bytes
    parser.parse(tbl['content'][0].as_buffer().to_pybytes())  # Works, but performs unnecessary copy.

Test Failing on Windows 10 (64 bit) and Python 64 Bit (3.6 & 3.7)

It crashes on the line parser.set_language(PYTHON).
I built and installed using a 32-bit Python installation (3.7), and it worked smoothly. However, as soon as I started using 64-bit Python, it failed. (I even copied the 32-bit language library over, but as expected, loading the library failed.)
During debugging, I found that it was crashing at ts_language_version (in binding.c). Further debugging revealed that the language_id is not passed correctly when passing a Language via set_language. (I printed the language_id under both 32-bit and 64-bit Python: for 32-bit it printed values like 6 or 7, while for 64-bit it was some garbage value. I am not sure how the object is passed to the bindings, but I printed it and observed this.)

I am not sure if it matters, but while building the library for the languages, I saw that the compiler used was "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\cl.exe". I also see a folder named Hostx64 at ...\14.23.28105\bin. However, I could not figure out how to force it to be used.

Any pointers would be helpful. Is there an additional setting I need, or am I missing something?
(I run "python setup.py build", then "python setup.py install", and then "python setup.py test", so that it uses the freshly built tree-sitter.)

Thanks.

P.S. It worked perfectly on Ubuntu 18.0 64 bit

Obtain argument type from AST

Can tree-sitter provide the argument types of a called function? I can use the query mechanism to obtain the method_invocation node, but I have no idea how to find the argument types of the invoked function.

In short: can we obtain argument types from the AST?

Thanks.

Document that a C compiler is required for installation

pip install tree_sitter
...snipped...
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
    error: command 'cl.exe' failed: No such file or directory

I think it should be included in the docs that Visual Studio is a dependency for installing tree_sitter

get an ast with identifier values

This is an awesome project.

I'm not familiar with this tool. Currently all identifiers are shown as the same word. I want to get an AST with identifier values, but I don't know how to do it.
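The printed tree shows node types only, since the values live in the source bytes. A rough sketch of an outline that adds the text of each leaf node follows, assuming the type, children, start_byte and end_byte node attributes used elsewhere on this page; `dump` is a hypothetical helper.

```python
def dump(node, source, depth=0):
    """Return an indented outline of the tree; leaf nodes also show their text."""
    line = "  " * depth + node.type
    if not node.children:
        # Leaves such as identifiers carry the actual values in the source
        text = source[node.start_byte : node.end_byte].decode("utf8")
        line += f" {text!r}"
    lines = [line]
    for child in node.children:
        lines.extend(dump(child, source, depth + 1))
    return lines
```

Calling `print("\n".join(dump(tree.root_node, source)))` would print the outline, where `source` is the bytes object that was parsed. The function is recursive, so very deep trees may need the cursor-based traversal instead.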

No definition found for "Parser" or nodes.

I've installed py-tree-sitter, but something seems wrong; when trying to import Parser, I seem to be told it has no definition (which, among other things, means that in VSCode, autocomplete doesn't work for any of its functions, and it is not recognized as a class).

Going to the definition of Language shows that in the __init__.py file, the from tree_sitter.binding lines also do not work properly (no highlighting, and they have no definition).

The binding.c file also does not seem to exist in the tree_sitter folder in my virtual environment (instead there is a file whose name starts with binding.cpython), and I believe this is probably the cause of the problem. I should note that I am not getting any errors during execution (indeed, I am able to parse code, use functions, and access nodes). The issue is only that the IDE gives me no information about these functions, which makes development more difficult: I cannot see at a glance, for example, what types nodes have without visiting GitHub and reading binding.c.

How can I fix this?

memory leak in query api

Dear Community,

I found that there is a memory leak when using the query API.
Consider this small example, where I crawled the most-starred C++ repos on GitHub into data/raw/:

# %% show memory leak
from tree_sitter import Language, Parser
from glob import glob
from tqdm import tqdm

RELATIVE_PATH_TO_PARSER = "build/my-languages.so"
LANGUAGE = Language(RELATIVE_PATH_TO_PARSER, "cpp")

parser = Parser()
parser.set_language(LANGUAGE)
files = glob("data/raw/**/*.cpp", recursive=True)
files.extend(glob("data/raw/**/*.c", recursive=True))
files.extend(glob("data/raw/**/*.cc", recursive=True))

use_query = True

query_statement = "(if_statement(condition_clause)@if_statement)"
query = LANGUAGE.query(query_statement)
for file in tqdm(files):
    code = open(file, "rb").read()
    tree = parser.parse(code)
    if use_query:
        file_captures = query.captures(tree.root_node)

When the use_query switch is on, memory usage goes well beyond 50 GB while crawling through all of the files, even though the file_captures variable is never actually used and is overwritten on every iteration. The increase in memory is very steady and not linked to the size of the file currently being read.
When the use_query switch is off, memory usage stays well below a single GB.

Manually deleting the loop variables or running the garbage collector via gc.collect() every couple of iterations does not fix the bug.

I am not a good enough C++ developer to fix it myself. I hope it can be fixed by someone else. I am very happy to provide additional info!

Cheers

Cursor tree traversal

Hi, I'm trying to create a tree traversal using the cursor. The cursor should print the nodes as the recursive method in this other issue (#5). However, the recursive implementation without cursor is very inefficient as it ends up with a huge recursion depth.

This is what I wrote so far, it prints some parts of the tree, but then it stops.

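def print_node(code, node):
    # Position info is computed for debugging but is not printed below
    pos_point = f"[{node.start_point},{node.end_point}]"
    pos_byte = f"({node.start_byte},{node.end_byte})"
    # Print the node's source text next to its type
    print(
        f"{code[node.start_byte:node.end_byte]:<25}{node.type}"
    )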

def itraverse(code, cursor):
    has_sibling = True
    while has_sibling:
        has_childs = True
        while has_childs:
            has_childs = cursor.goto_first_child()
            if not has_childs:
                print_node(code, cursor.node)
        
        
        #print_node(code, cursor.node)
        
        has_sibling = cursor.goto_next_sibling()

I think the problem is that goto_next_sibling() doesn't work the way I expected (going to the closest sibling of the subtree). Is there a way to traverse the tree without using a stack to remember the nodes that have been visited?

Python abruptly exits when set_language is called

My PC is running on Windows 10 and the bug is reproducible in both python and ipython shells, via powershell or anaconda prompt (cmd).

Before starting on creating a parser, I am trying to run the simplest possible examples with python bindings but there is a rather obscure issue I am dealing with. In the simplest context:

import os
from tree_sitter import Language, Parser
Language.build_library(os.path.join('server','tree-sitter','build','LANG.so'),
                                     [os.path.join('server','tree-sitter')])
LANGUAGE = Language(os.path.join('server','tree-sitter','build','LANG.so'), 'LANG')

parser = Parser()
parser.set_language(LANGUAGE)

for the grammar.js file:

module.exports = grammar({
  name: 'LANG',

  rules: {
    // TODO: add the actual grammar rules
    source_file: $ => 'hello'
  }
});

The above code, when run in any Python shell, exits from that shell abruptly at the last line without any exception or error output. The build_library function returns True, and the parser can be created, so I am pretty sure that the set_language call is the source of the problem.

I tried a different rules dictionary from a repo that I know works, so the grammar is not the source either. Sorry for not being able to provide more information, but I am also a bit perplexed, as I have never seen such a thing in Python before. Here's a pic if it tells you anything.

image

Thanks in advance!

Linker errors on windows & visual studio

Consider the latest versions of the py-tree-sitter, tree-sitter, and tree-sitter-python repos, and then add tree-sitter-hello with the grammar below:

tree-sitter-hello/grammar.js:

module.exports = grammar({
  name: 'the_language_name',

  rules: {
    source_file: $ => 'hello'
  }
});

test.py:

from tree_sitter import Language, Parser


def build():
    Language.build_library(
        'build/parser.pyd',
        [
            "tree-sitter-python",
            "tree-sitter-hello",
        ]
    )


def main():
    LANGUAGE = Language('build/parser.pyd', 'python')
    parser = Parser()
    parser.set_language(LANGUAGE)
    tree = parser.parse(bytes("""a = 10""", "utf8"))

if __name__ == '__main__':
    build()
    main()

If I run test.py in the Visual Studio command prompt (VS2015), I get this output:

(py364_32) d:\mcve>python test.py
parser.c
parser.c
   Creating library build/parser.lib and object build/parser.exp
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_create
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_serialize
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_deserialize
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_scan
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_destroy
build/parser.pyd : fatal error LNK1120: 5 unresolved externals
Traceback (most recent call last):
  File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 519, in link
    self.spawn([self.linker] + ld_args)
  File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 542, in spawn
    return super().spawn(cmd)
  File "d:\software\python364_32\Lib\distutils\ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "d:\software\python364_32\Lib\distutils\spawn.py", line 38, in spawn
    _spawn_nt(cmd, search_path, dry_run=dry_run)
  File "d:\software\python364_32\Lib\distutils\spawn.py", line 81, in _spawn_nt
    "command %r failed with exit status %d" % (cmd, rc))
distutils.errors.DistutilsExecError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\link.exe' failed with exit status 1120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    build()
  File "test.py", line 9, in build
    "tree-sitter-hello",
  File "D:\virtual_envs\py364_32\lib\site-packages\tree_sitter\__init__.py", line 65, in build_library
    compiler.link_shared_object(object_paths, output_path)
  File "d:\software\python364_32\Lib\distutils\ccompiler.py", line 717, in link_shared_object
    extra_preargs, extra_postargs, build_temp, target_lang)
  File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 522, in link
    raise LinkError(msg)
distutils.errors.LinkError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\link.exe' failed with exit status 1120

What causes these linker errors?

Installing and building tree_sitter fails on Windows with cl.exe

python -m pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
    100% |████████████████████████████████| 1.4MB 4.9MB/s
Installing collected packages: pip
  Found existing installation: pip 18.1
    Uninstalling pip-18.1:
      Successfully uninstalled pip-18.1
Successfully installed pip-19.0.3
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> python -m pip install --upgrade pip
Requirement already up-to-date: pip in c:\users\laura\appdata\local\programs\python\python37\lib\site-packages (19.0.3)
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> pip3 install tree_sitter
Collecting tree_sitter
  Using cached https://files.pythonhosted.org/packages/cf/c3/f1850242f8fb3676250fab00568310a2898d721c5f024a1e789e1de78ff7/tree_sitter-0.0.4.tar.gz
Installing collected packages: tree-sitter
  Running setup.py install for tree-sitter ... error
    Complete output from command c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-y3rtodf6\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-tjbljfeq\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    creating build\lib.win-amd64-3.7\tree_sitter
    copying tree_sitter\__init__.py -> build\lib.win-amd64-3.7\tree_sitter
    running build_ext
    building 'tree_sitter_binding' extension
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    creating build\temp.win-amd64-3.7\Release\tree_sitter
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib\src
    cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
    error: command 'cl.exe' failed: No such file or directory

    ----------------------------------------
Command "c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-y3rtodf6\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-tjbljfeq\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Laura\AppData\Local\Temp\pip-install-y3rtodf6\tree-sitter\
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> pip3 install tree_sitter
Collecting tree_sitter
  Using cached https://files.pythonhosted.org/packages/cf/c3/f1850242f8fb3676250fab00568310a2898d721c5f024a1e789e1de78ff7/tree_sitter-0.0.4.tar.gz
Installing collected packages: tree-sitter
  Running setup.py install for tree-sitter ... error
    Complete output from command c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-sdugcto3\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-morby4_e\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    creating build\lib.win-amd64-3.7\tree_sitter
    copying tree_sitter\__init__.py -> build\lib.win-amd64-3.7\tree_sitter
    running build_ext
    building 'tree_sitter_binding' extension
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    creating build\temp.win-amd64-3.7\Release\tree_sitter
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib
    creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib\src
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
    cl : Command line warning D9002 : ignoring unknown option '-std=c99'
    lib.c
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(47): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(47): note: consider using '%zu' in the format string
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(56): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(56): note: consider using '%zu' in the format string
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(65): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(65): note: consider using '%zu' in the format string
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./array.h(107): warning C4267: 'function': conversion from 'size_t' to 'uint32_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(54): warning C4244: '=': conversion from 'utf8proc_ssize_t' to 'uint32_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(62): warning C4244: '=': conversion from 'utf8proc_ssize_t' to 'uint32_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(312): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(520): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1582): warning C4996: 'fdopen': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _fdopen. See online help for details.
    C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt\stdio.h(2457): note: see declaration of 'fdopen'
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1652): warning C4267: 'function': conversion from 'size_t' to 'unsigned int', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1725): warning C4267: 'function': conversion from 'size_t' to 'unsigned int', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./subtree.c(186): warning C4244: 'initializing': conversion from 'TSSymbol' to 'uint8_t', possible loss of data
    c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./subtree.c(236): warning C4244: '=': conversion from 'TSSymbol' to 'uint8_t', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /Tctree_sitter/binding.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/binding.obj -std=c99
    cl : Command line warning D9002 : ignoring unknown option '-std=c99'
    binding.c
    tree_sitter/binding.c(7): error C2059: syntax error: ';'
    tree_sitter/binding.c(10): error C2059: syntax error: '}'
    tree_sitter/binding.c(13): error C2059: syntax error: ';'
    tree_sitter/binding.c(15): error C2059: syntax error: '}'
    tree_sitter/binding.c(18): error C2059: syntax error: ';'
    tree_sitter/binding.c(20): error C2059: syntax error: '}'
    tree_sitter/binding.c(41): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(41): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(41): error C2059: syntax error: ')'
    tree_sitter/binding.c(41): error C2054: expected '(' to follow 'self'
    tree_sitter/binding.c(46): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(46): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(46): error C2059: syntax error: ')'
    tree_sitter/binding.c(46): error C2054: expected '(' to follow 'self'
    tree_sitter/binding.c(63): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(63): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(63): error C2371: 'PyObject': redefinition; different basic types
    c:\users\laura\appdata\local\programs\python\python37\include\object.h(110): note: see declaration of 'PyObject'
    tree_sitter/binding.c(63): error C2143: syntax error: missing ';' before '*'
    tree_sitter/binding.c(63): error C2059: syntax error: ')'
    tree_sitter/binding.c(63): error C2054: expected '(' to follow 'args'
    tree_sitter/binding.c(70): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(70): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(70): error C2059: syntax error: 'type'
    tree_sitter/binding.c(70): error C2059: syntax error: ')'
    tree_sitter/binding.c(74): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(74): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(74): error C2059: syntax error: 'type'
    tree_sitter/binding.c(74): error C2059: syntax error: ')'
    tree_sitter/binding.c(78): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(78): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(78): error C2059: syntax error: 'type'
    tree_sitter/binding.c(78): error C2059: syntax error: ')'
    tree_sitter/binding.c(82): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(82): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(82): error C2059: syntax error: 'type'
    tree_sitter/binding.c(82): error C2059: syntax error: ')'
    tree_sitter/binding.c(86): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(86): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(86): error C2059: syntax error: 'type'
    tree_sitter/binding.c(86): error C2059: syntax error: ')'
    tree_sitter/binding.c(90): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(90): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(90): error C2059: syntax error: 'type'
    tree_sitter/binding.c(90): error C2059: syntax error: ')'
    tree_sitter/binding.c(94): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(94): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(94): error C2059: syntax error: 'type'
    tree_sitter/binding.c(94): error C2059: syntax error: ')'
    tree_sitter/binding.c(120): error C2065: 'node_sexp': undeclared identifier
    tree_sitter/binding.c(120): warning C4312: 'type cast': conversion from 'int' to 'PyCFunction' of greater size
    tree_sitter/binding.c(118): error C2099: initializer is not a constant
    tree_sitter/binding.c(118): warning C4047: 'initializing': 'PyCFunction' differs in levels of indirection from 'int'
    tree_sitter/binding.c(118): warning C4047: 'initializing': 'int' differs in levels of indirection from 'char [42]'
    tree_sitter/binding.c(128): error C2065: 'node_get_type': undeclared identifier
    tree_sitter/binding.c(128): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(129): error C2065: 'node_get_is_named': undeclared identifier
    tree_sitter/binding.c(129): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(130): error C2065: 'node_get_start_byte': undeclared identifier
    tree_sitter/binding.c(130): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(131): error C2065: 'node_get_end_byte': undeclared identifier
    tree_sitter/binding.c(131): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(132): error C2065: 'node_get_start_point': undeclared identifier
    tree_sitter/binding.c(132): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(133): error C2065: 'node_get_end_point': undeclared identifier
    tree_sitter/binding.c(133): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(134): error C2065: 'node_get_children': undeclared identifier
    tree_sitter/binding.c(134): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(128): error C2099: initializer is not a constant
    tree_sitter/binding.c(128): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [16]'
    tree_sitter/binding.c(129): error C2099: initializer is not a constant
    tree_sitter/binding.c(129): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [21]'
    tree_sitter/binding.c(130): error C2099: initializer is not a constant
    tree_sitter/binding.c(130): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [22]'
    tree_sitter/binding.c(131): error C2099: initializer is not a constant
    tree_sitter/binding.c(131): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [20]'
    tree_sitter/binding.c(132): error C2099: initializer is not a constant
    tree_sitter/binding.c(132): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [23]'
    tree_sitter/binding.c(133): error C2099: initializer is not a constant
    tree_sitter/binding.c(133): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [21]'
    tree_sitter/binding.c(134): error C2099: initializer is not a constant
    tree_sitter/binding.c(134): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [20]'
    tree_sitter/binding.c(142): error C2065: 'Node': undeclared identifier
    tree_sitter/binding.c(145): error C2065: 'node_dealloc': undeclared identifier
    tree_sitter/binding.c(145): warning C4312: 'type cast': conversion from 'int' to 'destructor' of greater size
    tree_sitter/binding.c(146): error C2065: 'node_repr': undeclared identifier
    tree_sitter/binding.c(146): warning C4312: 'type cast': conversion from 'int' to 'reprfunc' of greater size
    tree_sitter/binding.c(138): error C2099: initializer is not a constant
    tree_sitter/binding.c(138): warning C4047: 'initializing': 'setattrofunc' differs in levels of indirection from 'unsigned long'
    tree_sitter/binding.c(138): warning C4133: 'initializing': incompatible types - from 'char [14]' to 'PyBufferProcs *'
    tree_sitter/binding.c(138): warning C4047: 'initializing': 'getiterfunc' differs in levels of indirection from 'PyMethodDef *'
    tree_sitter/binding.c(138): warning C4133: 'initializing': incompatible types - from 'PyGetSetDef *' to 'PyMethodDef *'
    tree_sitter/binding.c(152): error C2065: 'Node': undeclared identifier
    tree_sitter/binding.c(152): error C2297: '*': illegal, right operand has type 'int *'
    tree_sitter/binding.c(152): error C2059: syntax error: ')'
    tree_sitter/binding.c(154): error C2223: left of '->node' must point to struct/union
    tree_sitter/binding.c(155): error C2223: left of '->children' must point to struct/union
    tree_sitter/binding.c(162): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(162): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(162): error C2059: syntax error: ')'
    tree_sitter/binding.c(162): error C2054: expected '(' to follow 'self'
    tree_sitter/binding.c(167): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(167): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(167): error C2059: syntax error: 'type'
    tree_sitter/binding.c(167): error C2059: syntax error: ')'
    tree_sitter/binding.c(176): error C2065: 'tree_get_root_node': undeclared identifier
    tree_sitter/binding.c(176): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
    tree_sitter/binding.c(176): error C2099: initializer is not a constant
    tree_sitter/binding.c(176): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [10]'
    tree_sitter/binding.c(184): error C2065: 'Tree': undeclared identifier
    tree_sitter/binding.c(187): error C2065: 'tree_dealloc': undeclared identifier
    tree_sitter/binding.c(187): warning C4312: 'type cast': conversion from 'int' to 'destructor' of greater size
    tree_sitter/binding.c(180): error C2099: initializer is not a constant
    tree_sitter/binding.c(180): warning C4047: 'initializing': 'PyBufferProcs *' differs in levels of indirection from 'unsigned long'
    tree_sitter/binding.c(180): warning C4047: 'initializing': 'unsigned long' differs in levels of indirection from 'char [14]'
    tree_sitter/binding.c(180): warning C4047: 'initializing': 'iternextfunc' differs in levels of indirection from 'PyMethodDef *'
    tree_sitter/binding.c(180): warning C4133: 'initializing': incompatible types - from 'PyGetSetDef *' to 'PyMemberDef *'
    tree_sitter/binding.c(193): error C2065: 'Tree': undeclared identifier
    tree_sitter/binding.c(193): error C2297: '*': illegal, right operand has type 'int *'
    tree_sitter/binding.c(193): error C2059: syntax error: ')'
    tree_sitter/binding.c(194): error C2223: left of '->tree' must point to struct/union
    tree_sitter/binding.c(205): error C2065: 'Parser': undeclared identifier
    tree_sitter/binding.c(205): error C2297: '*': illegal, right operand has type 'int *'
    tree_sitter/binding.c(205): error C2059: syntax error: ')'
    tree_sitter/binding.c(206): error C2223: left of '->parser' must point to struct/union
    tree_sitter/binding.c(210): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(210): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(210): error C2059: syntax error: ')'
    tree_sitter/binding.c(210): error C2054: expected '(' to follow 'self'
    tree_sitter/binding.c(215): error C2143: syntax error: missing ')' before '*'
    tree_sitter/binding.c(215): error C2143: syntax error: missing '{' before '*'
    tree_sitter/binding.c(215): error C2371: 'PyObject': redefinition; different basic types
    c:\users\laura\appdata\local\programs\python\python37\include\object.h(110): note: see declaration of 'PyObject'
    tree_sitter/binding.c(215): fatal error C1003: error count exceeds 100; stopping compilation
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.15.26726\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-sdugcto3\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-morby4_e\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Laura\AppData\Local\Temp\pip-install-sdugcto3\tree-sitter\

I'm not sure whether tree_sitter can even be compiled on Windows like this. :(

Get the text for a node

Currently, accessing the text that corresponds to a node takes some extra work on the user's side (compared to node-tree-sitter or the wasm bindings), as mentioned elsewhere:

I looked a bit into how node-tree-sitter provides .text: #16 (comment)

As a proof of concept, I tried something similar for py-tree-sitter (though this makes no attempt to track changes) by roughly:

  • extending Tree to store source code
  • adding tree_get_text to pull out the stored source code
  • adding node_get_text, which uses tree_get_text to retrieve the stored source, slices it with start_byte and end_byte, and finally decodes the result

This seems to work in limited testing. Does this approach seem sound? (sogaiu@c4f0a27)

(I don't know whether other bindings attempt to keep the "retained" source up-to-date when the parse tree changes and I don't have any good ideas about how that might be done here.)
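The slicing step described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions: `Span` is a hypothetical stand-in for a node (only `start_byte` and `end_byte` are used), `node_text` is a hypothetical helper name, and the caller is assumed to still hold the exact bytes that were parsed. It makes no attempt to track edits to the tree.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a node: only the byte offsets are used."""
    start_byte: int
    end_byte: int


def node_text(source: bytes, node, encoding: str = "utf-8") -> str:
    """Slice the original source bytes by the node's byte range, then decode."""
    return source[node.start_byte:node.end_byte].decode(encoding)


source = "def greet():\n    return 'héllo'\n".encode("utf-8")
name = Span(start_byte=4, end_byte=9)
print(node_text(source, name))  # greet
```

Slicing the bytes before decoding matters because the offsets are byte offsets, so indexing into an already decoded string would drift as soon as the source contains multi-byte characters.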

Usage question: how to get list(field -> child nodes) from a parent node

Hi,

I have a usage question. The Node API supports get_child_by_field_id(id) and get_child_by_field_name(name), where you need to know the field id or field name. children() gives a list of child nodes.
How do I get a list of child nodes along with their field names, like [{field_name: Node}], when I don't know which fields a node can have? Or is there a way to list the field names that are possible for a parent node or a node type?

Thanks!

Getting nodes by position

It is great to see the py binding is getting better with your recent addition of tree-queries, etc. Thank you for your efforts!

It would also be useful if node searches by position could be supported, i.e., *descendant_for_*_range / first_*_child_for_byte.
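To make the request concrete, here is a rough pure-Python sketch of what a byte-range lookup does: it descends from a node to the smallest descendant whose span still covers the requested range. `FakeNode` is a hypothetical stand-in that exposes only `start_byte`, `end_byte` and `children`, and the function below is an approximation written for illustration, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing only what the search needs."""
    type: str
    start_byte: int
    end_byte: int
    children: List["FakeNode"] = field(default_factory=list)


def descendant_for_byte_range(node, start: int, end: int) -> Optional[FakeNode]:
    """Return the smallest node under `node` whose span covers [start, end]."""
    if not (node.start_byte <= start and end <= node.end_byte):
        return None  # the range is not inside this subtree at all
    current = node
    while True:
        for child in current.children:
            if child.start_byte <= start and end <= child.end_byte:
                current = child  # descend into the first child that covers the range
                break
        else:
            return current  # no child covers the range, so this is the smallest


ident = FakeNode("identifier", 0, 3)
args = FakeNode("argument_list", 3, 10)
call = FakeNode("call", 0, 10, [ident, args])
root = FakeNode("module", 0, 20, [call, FakeNode("comment", 11, 20)])
print(descendant_for_byte_range(root, 4, 6).type)  # argument_list
```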

Segfaults when accessing nodes and node lists (after tree dealloc)

When writing code of the following form I ran into segfaults with py-tree-sitter (I'll also open a PR that reproduces it):

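from typing import List
from tree_sitter import Node, Parser

def parse(parser: Parser, contents: bytes) -> List[Node]:
    tree = parser.parse(contents)  # `tree` is released when parse() returns
    return tree.root_node.children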

First I tried to find an issue with the creation of the list of child nodes or with the nodes themselves. Then I realised that this is probably due to the tree going out of scope.

I believe we need to add a reference to the tree (Python object) for every node so that the tree doesn't get destroyed while any of the nodes in the tree are still alive.
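The ownership idea can be sketched in pure Python. Everything here is hypothetical and only illustrates the reference-keeping pattern: `FakeTree` stands in for the object that owns the parsed data, and `NodeHandle` stands in for a node that stores a strong reference to it. The real fix would live in the C binding, which this sketch does not attempt to reproduce.

```python
import weakref


class FakeTree:
    """Hypothetical stand-in for the object that owns the parsed data."""


class NodeHandle:
    """Hypothetical node wrapper that keeps its owning tree alive."""

    def __init__(self, tree, label):
        self._tree = tree  # strong reference: the tree cannot die before this node
        self.label = label


def make_children(tree):
    """Build child handles that each hold on to the same tree."""
    return [NodeHandle(tree, "a"), NodeHandle(tree, "b")]


tree = FakeTree()
tree_alive = weakref.ref(tree)   # lets us observe when the tree is destroyed
children = make_children(tree)
del tree                         # the caller's own reference is gone
print(tree_alive() is not None)  # True: the nodes still hold the tree
children.clear()
print(tree_alive() is None)      # True on CPython once the last node is gone
```

With this arrangement, returning only the children from a function is safe, because the tree is destroyed only after the last node that refers to it has been released.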

libc++ conflict with libstdc++ with Linux and Anaconda

When I run Language.build_library('languages.so', ['tree-sitter-python']) on an Ubuntu 16 machine that uses an Anaconda Python distribution (which uses libstdc++) and has both libc++ and libstdc++ installed system-wide, build_library gives precedence to libc++ simply because it exists, even though libstdc++ is what my Anaconda Python needs. As a result, I get the error

/usr/bin/ld: cannot find -lc++.

I recognize this is strange since I have both installed, but perhaps the distutils compiler only pays attention to Anaconda library paths?

I fixed it on my system by giving higher precedence to libstdc++ (and can submit a PR), but would this break py-tree-sitter for others? Is there another way to fix this logic so it works? Since libc++ is 'newer' than libstdc++, perhaps the precedence of the older one makes sense?
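The precedence question can be illustrated with a small sketch. It assumes the C++ runtime is chosen by looking libraries up by name; `pick_cpp_runtime`, its `preference` order and its `lookup` parameter are hypothetical names introduced for this example, and the code is not a copy of what build_library does.

```python
from ctypes.util import find_library


def pick_cpp_runtime(preference=("stdc++", "c++"), lookup=find_library):
    """Return the first C++ runtime name in `preference` that `lookup` can find."""
    for name in preference:
        if lookup(name):
            return name
    return None  # neither runtime was found


# A fake lookup simulating a machine where both runtimes are installed.
both_installed = {"c++": "libc++.so.1", "stdc++": "libstdc++.so.6"}.get
print(pick_cpp_runtime(lookup=both_installed))  # stdc++
```

Making the order a parameter would let each environment state its own preference, which sidesteps the question of which default is least likely to break other setups.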

Not installable on macOS

I can't install this on macOS; it fails with the error message below. I tried several Python versions, and both pip3 and pipenv, to see if there is a difference.
I also updated pip3, setuptools and wheel.

I did get it working on Ubuntu 20.04 with Python 3.8.

Error message on macOS.

➜  NO pipenv install tree-sitter --python 3.8
Virtualenv already exists!
Removing existing virtualenv...
Creating a virtualenv for this project...
Pipfile: /Users/zensored/Desktop/NO/Pipfile
Using /usr/bin/python3 (3.8.2) to create virtualenv...
⠧ Creating virtual environment...created virtual environment CPython3.8.2.final.0-64 in 396ms
  creator CPython3macOsFramework(dest=/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi, clear=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/Users/zensored/Library/Application Support/virtualenv)
    added seed packages: pip==20.2.4, setuptools==50.3.2, wheel==0.35.1
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator

✔ Successfully created virtual environment! 
Virtualenv location: /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi
Installing tree-sitter...
Error:  An error occurred while installing tree-sitter!
Error text: Collecting tree-sitter
  Using cached tree_sitter-0.19.0.tar.gz (112 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Building wheels for collected packages: tree-sitter
  Building wheel for tree-sitter (PEP 517): started
  Building wheel for tree-sitter (PEP 517): finished with status 'error'
Failed to build tree-sitter

  ERROR: Command errored out with exit status 1:
   command: /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/bin/python /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /var/folders/ww/9g40xcr51854gq69rrk2smt80000gn/T/tmpzxxy5bw0
       cwd: /private/var/folders/ww/9g40xcr51854gq69rrk2smt80000gn/T/pip-install-40n0w3yk/tree-sitter
  Complete output (26 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.14.6-x86_64-3.8
  creating build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter
  copying tree_sitter/__init__.py -> build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter
  warning: build_py: byte-compiling is disabled, skipping.
  
  running build_ext
  building 'tree_sitter.binding' extension
  creating build/temp.macosx-10.14.6-x86_64-3.8
  creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter
  creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core
  creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib
  creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64 -Itree_sitter/core/lib/include -Itree_sitter/core/lib/src -I/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/include -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8 -c tree_sitter/core/lib/src/lib.c -o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src/lib.o -std=c99 -Wno-unused-variable
  clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64 -Itree_sitter/core/lib/include -Itree_sitter/core/lib/src -I/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/include -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8 -c tree_sitter/binding.c -o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.o -std=c99 -Wno-unused-variable
  clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
  clang -bundle -undefined dynamic_lookup -Wl,-headerpad,0x1000 -arch arm64 -arch x86_64 build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src/lib.o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.o -o build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.cpython-38-darwin.so
  clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
  ld: warning: -undefined dynamic_lookup is deprecated on iOS Simulator
  ld: -platform_version passed unknown platform name 'macos-simulator'
  clang: error: linker command failed with exit code 1 (use -v to see invocation)
  error: command 'clang' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for tree-sitter
ERROR: Could not build wheels for tree-sitter which use PEP 517 and cannot be installed directly

✘ Installation Failed 

macOS 11.1 (20C69)
XCode Version 12.4 (12D4e)
Python 3.8, 3.9, 3.10 (yes I tested them all)
pip 21.0.1 from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip (python 3.10)
pipenv, version 2020.11.15

C Query Not Working

I followed your documentation and added a C parser to my project:

from tree_sitter import Language, Parser

Language.build_library(
    # Store the library in the `build` directory
    'build/c.so',

    # Include one or more languages
    [
        'tree-sitter-c'
    ]
)


C_LANGUAGE = Language('build/c.so', 'c')

The parser works, but I have problems with querying; I never get a capture:

query = C_LANGUAGE.query("""
(function_definition)
""")

captures = query.captures(tree.root_node)

Example source file:

#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/sort.h>

static cpumask_var_t *alloc_node_to_cpumask(void)
{
    cpumask_var_t *masks;
    int node;

    masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL);
    if (!masks)
        return NULL;

    for (node = 0; node < nr_node_ids; node++)
    {
        if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL))
            goto out_unwind;
    }

    return masks;

out_unwind:
    while (--node >= 0)
        free_cpumask_var(masks[node]);
    kfree(masks);
    return NULL;
}

I tried many different queries and never get a result. Are C queries not supported yet or do I have to fix my queries somehow?

Incompatible Language Version for a couple of languages

I am using tree_sitter==0.19.0 and I am not able to parse swift, verilog, and agda code. I cloned the grammars and installed the newest bindings. Is there anything I can do?

Error messages

Swift:

ValueError: Incompatible Language version 10. Must be between 13 and 13

Verilog:

ValueError: Incompatible Language version 12. Must be between 13 and 13

Agda:

ValueError: Incompatible Language version 11. Must be between 13 and 13

Make TreePropertyCursor Object available

Can TreePropertyCursor be made accessible along with TreeCursor? It seems to have useful functionality.
If someone can give me pointers, I can try to add the bindings.
I saw it at https://github.com/tree-sitter/rust-tree-sitter/blob/375e6b4b59961da4d62db1dda90c99263d1abfdc/src/lib.rs.
The usage I saw was at https://github.com/maxbrunsfeld/tree-tags/blob/master/src/crawler.rs. There, the propertyMatcher is used to find the scope_type, definition, etc.
I would like to get the scope of any variable/identifier I encounter in the AST. From these examples, I got the impression that tree_sitter does some kind of bookkeeping for it.

Expose full public API into python

The file api.h contains the public tree-sitter API, right? So shouldn't it be fully exposed to Python?

That leads me to another question. The way you've created the wrapper is really nice (minimal, user-friendly, and dependency-free), but why didn't you consider using cffi, SWIG, or similar tools? Was it to avoid adding a dependency, or was there another reason?

Incompatible Language version 13. Must not be between 9 and 12

Happened on Windows 7 x64

D:\>python -V
Python 3.8.5

D:\>python test_treesitter.py
parser.c
scanner.cc
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(104): warning C4267: '=': conversion from 'size_t' to 'char', possible loss of data
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(114): warning C4244: '=': conversion from 'unsigned short' to 'char', possible loss of data
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(117): warning C4267: 'return': conversion from 'size_t' to 'unsigned int', possible loss of data
   Creating library build/python.lib and object build/python.exp
Generating code
Finished generating code
Traceback (most recent call last):
  File "test_treesitter.py", line 16, in <module>
    parser.set_language(PY_LANGUAGE)
ValueError: Incompatible Language version 13. Must not be between 9 and 12

D:\>dir build

 Directory of D:\build

05.03.2021  23:56    <DIR>          .
05.03.2021  23:56    <DIR>          ..
05.03.2021  23:56           195.072 python.dll
05.03.2021  23:56               675 python.exp
05.03.2021  23:56             1.740 python.lib

Content of test_treesitter.py

from tree_sitter import Language, Parser

Language.build_library(
    # Store the library in the `build` directory
    'build/python.dll',

    # Include one or more languages
    [
        r'D:\Repositories\eko\tree-sitter-python'
    ]
)

PY_LANGUAGE = Language('build/python.dll', 'python')

parser = Parser()
parser.set_language(PY_LANGUAGE)

tree = parser.parse(bytes("""
def foo():
    if bar:
        baz()
""", "utf8")
)

print(tree)

Missing parser.c error for typescript

Invoking build_library with typescript results in a missing src/parser.c error. Is there any documentation on which languages are supported by the Python bindings?

Incremental parsing

Hello, how can I use your parser in the way shown in the following lines?

code = 'System'
code = 'System.'
code = 'System.out'
code = 'System.out.' 
code = 'System.out.println' 
code = 'System.println('
code = 'System.println("hello")'

That is, the parser should be able to tell whether the Java syntax is still valid each time I add one or more tokens. I tried using tree-sitter, but it did not work, and I do not know if there is a trick.

The code I tested is:

from tree_sitter import Language, Parser
JAVA_LANGUAGE = Language('parser/my-languages.so', 'java') 
parser = Parser() 
parser.set_language(JAVA_LANGUAGE)
code = 'System.' #.out.println("hello");  
tree = parser.parse(bytes(code,'utf8')).root_node 
subtree= [x[0] for x in get_all_sub_trees(tree)]

When code = 'System.out.println("hello");' it works, but when code = 'System' it fails.

Thank you in advance

Parsing invalid using trivial grammar

@maxbrunsfeld Hi Max, could you please check this little test?

I'm using the latest master of tree-sitter and py-tree-sitter; both built OK. Now I'm trying
to use a simple grammar:

(py364_32) d:\mcve>mkdir tree-sitter-hello

(py364_32) d:\mcve>cd tree-sitter-hello

(py364_32) d:\mcve\tree-sitter-hello>clipout > grammar.js

(py364_32) d:\mcve\tree-sitter-hello>cat grammar.js
module.exports = grammar({
  name: 'the_language_name',

  rules: {
    // The production rules of the context-free grammar
    source_file: $ => 'hello'
  }
});

(py364_32) d:\mcve\tree-sitter-hello>tree-sitter generate

(py364_32) d:\mcve\tree-sitter-hello>cd ..

(py364_32) d:\mcve>clipout > test.py

(py364_32) d:\mcve>cat test.py
from tree_sitter import Language, Parser


def build():
    Language.build_library(
        'build/parser.pyd',
        [
            "tree-sitter-hello",
        ]
    )


def main():
    LANGUAGE = Language('build/parser.pyd', 'the_language_name')
    parser = Parser()
    parser.set_language(LANGUAGE)
    tree = parser.parse(bytes("""hello""", "utf8"))

if __name__ == '__main__':
    build()
    main()

(py364_32) d:\mcve>python test.py
parser.c
   Creating library build/parser.lib and object build/parser.exp
Generating code
Finished generating code
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    main()
  File "test.py", line 17, in main
    tree = parser.parse(bytes("""hello""", "utf8"))
ValueError: Parsing failed

What's the meaning of that "ValueError: Parsing failed"? What am I doing wrong? Do you see anything suspicious here?

Thanks in advance.

Text length calculation incorrect for non-English languages

Since there's no API to access the text right now, we do the following to obtain the text value of a node:

tree = parser.parse(source_code_bytes)

# ...

text = source_code_bytes[node.start_byte:node.end_byte]

However, this fails if the node text contains a multi-byte character.

Reproduction

from queue import Queue

from tree_sitter import Language, Parser

Language.build_library('build/my-languages.so', ['tree-sitter-javascript'])
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')


parser = Parser()
parser.set_language(JS_LANGUAGE)
code = '''var t = '大';var k=1;'''
code_bytes = bytes(code, "utf8")
tree = parser.parse(code_bytes)
root_node = tree.root_node

queue = Queue()
queue.put(root_node)

while not queue.empty():
    node = queue.get()
    for child in node.children:
        if child.type == 'string':
            print(code[child.start_byte:child.end_byte])
        queue.put(child)

Expected output:

'大'

Current output:

'大';v

If you subtract 2 from child.end_byte, it gives the expected result. It would be very convenient to have a .text method or similar to access the text, as opposed to slicing the source manually.

Please let me know if you need more information.
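
A likely cause, for what it's worth: start_byte and end_byte are byte offsets, but the reproduction slices code (a str, indexed by character) instead of code_bytes. The sketch below uses only the standard library. The offsets are located with bytes.find as a stand-in for child.start_byte and child.end_byte, which is an assumption about what the parser reports for the string node.

```python
code = "var t = '大';var k=1;"
code_bytes = code.encode("utf8")

# Stand-in for the string node's byte range (assumed to match what
# child.start_byte / child.end_byte would report for the literal).
literal = "'大'".encode("utf8")
start_byte = code_bytes.find(literal)
end_byte = start_byte + len(literal)

# Slicing the str with byte offsets overshoots, because '大' is one
# character but three bytes in UTF-8.
wrong = code[start_byte:end_byte]

# Slicing the bytes and decoding afterwards gives the node text.
right = code_bytes[start_byte:end_byte].decode("utf8")

print(repr(wrong))
print(repr(right))
```

With these inputs the str slice yields '大';v while the bytes slice yields '大', so slicing the encoded source and decoding afterwards may be all that is needed here.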

{{ at the end of an f-string string doesn't parse

{{ is used to write a literal "{" in an f-string, but it's not parsed correctly if it's at the end of the string

f"{my_var} {{"
module [0, 0] - [1, 0]
  ERROR [0, 0] - [0, 14]
    interpolation [0, 2] - [0, 10]
      identifier [0, 3] - [0, 9]

Adding a space after the {{ causes it to parse correctly:

f"{my_var} {{ "
module [0, 0] - [1, 0]
  expression_statement [0, 0] - [0, 15]
    string [0, 0] - [0, 15]
      interpolation [0, 2] - [0, 10]
        identifier [0, 3] - [0, 9]
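
For reference, CPython itself accepts this input, which suggests the ERROR node comes from the grammar rather than from the source being invalid. A quick standard-library check (the value assigned to my_var is made up for the example):

```python
import ast

my_var = "value"  # hypothetical value, only for the example

# A doubled brace inside an f-string is an escaped literal brace,
# including when it is the last thing before the closing quote.
text = f"{my_var} {{"
print(text)

# The same source parses cleanly with the standard-library parser.
tree = ast.parse('f"{my_var} {{"')
print(type(tree.body[0].value).__name__)
```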

Failed to build .so library

After installing tree-sitter and importing the library as explained in the README, I ran the following command:

>>> from tree_sitter import Language, Parser
>>> Language.build_library(
... 'build/test_tree_sitter.so',
... ['path/to/tree-sitter-python'])

Got the following error:

/usr/bin/ld: cannot find -lc++abi
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
  File "/usr/lib/python3.7/distutils/unixccompiler.py", line 215, in link
    self.spawn(linker + ld_args)
  File "/usr/lib/python3.7/distutils/ccompiler.py", line 910, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/joe/.local/lib/python3.7/site-packages/tree_sitter/__init__.py", line 72, in build_library
    compiler.link_shared_object(object_paths, output_path)
  File "/usr/lib/python3.7/distutils/ccompiler.py", line 717, in link_shared_object
    extra_preargs, extra_postargs, build_temp, target_lang)
  File "/usr/lib/python3.7/distutils/unixccompiler.py", line 217, in link
    raise LinkError(msg)
distutils.errors.LinkError: command 'cc' failed with exit status 1

Failed to load tree-sitter-python, but build_library successful

I tried to use the python parser like in the readme example, but I cannot load the built language object file...

from tree_sitter import Language, Parser
Language.build_library('lang.so', ['/home/robo/.cache/wsyntree/python/tsrepo/'])
pylang = Language('./lang.so', 'python')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    pylang = Language('./lang.so', 'python')
  File "/home/robo/.local/lib/python3.8/site-packages/tree_sitter/__init__.py", line 81, in __init__
    self.lib = cdll.LoadLibrary(library_path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./lang.so: undefined symbol: _ZSt20__throw_length_errorPKc

Since it compiled successfully, I assume it's a problem with py-tree-sitter and not tree-sitter-python (commit c4282ba and 4cca050 tested), but I might be wrong since tree-sitter-javascript worked just fine.

I tried both inside and outside a virtualenv on Pop!_OS 20.10. The same example worked without issue on Ubuntu 20.04.

Using cc/c++ 10.2.0 on 20.10 and 8.4.0 on 20.04, python 3.8 on both

Possible memory leaks in `point_new`

Hi, I'm debugging a memory growth issue in our product (which uses py-tree-sitter) and found a possible memory leak in point_new:

static PyObject *point_new(TSPoint point) {
  PyObject *row = PyLong_FromSize_t((size_t)point.row);
  PyObject *column = PyLong_FromSize_t((size_t)point.column);
  if (!row || !column) {
    Py_XDECREF(row);
    Py_XDECREF(column);
    return NULL;
  }
  return PyTuple_Pack(2, row, column);
}

PyTuple_Pack adds a reference to each of its arguments:

PyObject *
PyTuple_Pack(Py_ssize_t n, ...)
{
    Py_ssize_t i;
    PyObject *o;
    PyObject **items;
    va_list vargs;

    if (n == 0) {
        return tuple_get_empty();
    }

    va_start(vargs, n);
    PyTupleObject *result = tuple_alloc(n);
    if (result == NULL) {
        va_end(vargs);
        return NULL;
    }
    items = result->ob_item;
    for (i = 0; i < n; i++) {
        o = va_arg(vargs, PyObject *);
        Py_INCREF(o);
        items[i] = o;
    }
    va_end(vargs);
    tuple_gc_track(result);
    return (PyObject *)result;
}

After I changed point_new to the following code, our product's memory usage became stable:

static PyObject *point_new(TSPoint point)
{
  PyObject *row = PyLong_FromSize_t((size_t)point.row);
  PyObject *column = PyLong_FromSize_t((size_t)point.column);
  if (!row || !column)
  {
    Py_XDECREF(row);
    Py_XDECREF(column);
    return NULL;
  }

  PyObject *obj = PyTuple_Pack(2, row, column);
  Py_XDECREF(row);
  Py_XDECREF(column);
  return obj;
}

I'm not very familiar with Python C extensions, so I'm not sure whether it's a real memory leak.
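
The diagnosis looks consistent with how tuples hold their items: a tuple owns its own reference to each element, so the references returned by PyLong_FromSize_t still have to be released by the caller. The snippet below is only a Python-level analogy. It builds the tuple with a literal rather than with PyTuple_Pack, and it relies on CPython's sys.getrefcount, so treat it as an illustration, not as proof about the C code.

```python
import sys

# Stand-ins for the two PyLong objects created in point_new.
row, column = object(), object()

before = sys.getrefcount(row)
point = (row, column)           # the tuple takes its own reference to each item
inside = sys.getrefcount(row)
del point                       # destroying the tuple releases that reference
after = sys.getrefcount(row)

print(inside - before, after - before)
```

The count rises by one while the tuple exists and drops back once it is gone, which matches the extra Py_XDECREF calls in the patched point_new.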

unusual bug

Using python3.6 (in a clean conda env) + tree-sitter==0.0.7 in WSL, running the code below

from tree_sitter import Language, Parser

Language.build_library("build/my-languages.so", ["vendor/tree-sitter-python"])

PY_LANGUAGE = Language("build/my-languages.so", "python")

parser = Parser()
parser.set_language(PY_LANGUAGE)

tree = parser.parse(
    bytes(
        """
from mymodule import f
a = f()
""",
        "utf8",
    )
)

cursor = tree.walk()

print(cursor.node.sexp())

print(dir(cursor))
print(cursor.node.type)

print(cursor.goto_first_child())
print(cursor.node.type)

print(cursor.goto_first_child())
print(cursor.node.type)
print(dir(cursor.node)) # X 
# print(cursor.node.is_named) #  Y
print(cursor.node.sexp())
print(cursor.node)

# Returns `False` because the `def` node has no children
print(not cursor.goto_first_child())

print(cursor.goto_next_sibling())
print(cursor.node.type)

print(cursor.goto_next_sibling())
print(cursor.node.type)

gives

...
Traceback (most recent call last):
  File "bug.py", line 34, in <module>
    print(cursor.node.sexp())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 1: invalid start byte

Can you also see this behaviour?

If you comment out line ... # X or uncomment line ... # Y, there's no error. I know nothing about tree-sitter, and was just messing around (trying to see if tree-sitter follows imports, which it doesn't seem to), but I figure you'd want to know. My suspicion is that it's something related to the C binding, but it's a pretty weird bug. For example, if you delete the remainder of the code after the point where it errors, it stops erroring, which really baffles me unless this isn't synchronous.

FYI, the error comes from PyUnicode_FromString: whatever *string is, it isn't valid UTF-8.

undefined symbol: _ZSt20__throw_length_errorPKc

After building the library, I ran

 PY_LANGUAGE = Language('test_tree_sitter.so', 'python')     

and got the following error:

OSError                                   Traceback (most recent call last)
<ipython-input-4-da3a6dbada56> in <module>
----> 1 PY_LANGUAGE = Language('test_tree_sitter.so', 'python')

~/.local/lib/python3.7/site-packages/tree_sitter/__init__.py in __init__(self, library_path, name)
     79         """
     80         self.name = name
---> 81         self.lib = cdll.LoadLibrary(library_path)
     82         language_function = getattr(self.lib, "tree_sitter_%s" % name)
     83         language_function.restype = c_void_p

/usr/lib/python3.7/ctypes/__init__.py in LoadLibrary(self, name)
    432 
    433     def LoadLibrary(self, name):
--> 434         return self._dlltype(name)
    435 
    436 cdll = LibraryLoader(CDLL)

/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    354 
    355         if handle is None:
--> 356             self._handle = _dlopen(self._name, mode)
    357         else:
    358             self._handle = handle

OSError: test_tree_sitter.so: undefined symbol: _ZSt20__throw_length_errorPKc

Edit the AST and apply the changes back to text

For example, I have this piece of code:
// this is a comment
fn main() { /* this is another comment */ }

I traverse the AST to remove all of the comments, expecting to output this code:
fn main() { }
I have the edited version of the AST, but I'm not sure how to convert it back to text. What is the function to call?
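
As far as I know, tree-sitter does not offer a function that turns a tree back into text; the tree only describes byte ranges of the original source. A common workaround is to collect the byte ranges of the nodes to drop and cut them out of the source bytes. A minimal sketch, assuming the ranges come from each comment node's start_byte and end_byte. Here they are located with bytes.find so the example runs without a parser, and remove_ranges is a hypothetical helper name, not part of the library.

```python
def remove_ranges(source: bytes, ranges: list[tuple[int, int]]) -> bytes:
    """Return source with every [start, end) byte range cut out."""
    result = bytearray(source)
    # Delete from the back so earlier offsets stay valid.
    for start, end in sorted(ranges, reverse=True):
        del result[start:end]
    return bytes(result)


source = b"// this is a comment\nfn main() { /* this is another comment */ }\n"

# Stand-ins for (node.start_byte, node.end_byte) of the two comment nodes.
comments = [b"// this is a comment", b"/* this is another comment */"]
ranges = []
for text in comments:
    start = source.find(text)
    ranges.append((start, start + len(text)))

stripped = remove_ranges(source, ranges)
print(stripped.decode("utf8"))
```

Whitespace around the removed comments is left in place; parse the result again if you need a tree that matches the new text.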
