smacker / go-tree-sitter

Golang bindings for tree-sitter https://github.com/tree-sitter/tree-sitter

License: MIT License

C 99.92% Go 0.06% Makefile 0.01% C++ 0.03% Shell 0.01% JavaScript 0.01%
tree-sitter binding golang golang-bindings syntax-tree

go-tree-sitter's Introduction

go tree-sitter

Golang bindings for tree-sitter

Usage

Create a parser with a grammar:

import (
	"context"
	"fmt"

	sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/javascript"
)

parser := sitter.NewParser()
parser.SetLanguage(javascript.GetLanguage())

Parse some code:

sourceCode := []byte("let a = 1")
tree, _ := parser.ParseCtx(context.Background(), nil, sourceCode)

Inspect the syntax tree:

n := tree.RootNode()

fmt.Println(n) // (program (lexical_declaration (variable_declarator (identifier) (number))))

child := n.NamedChild(0)
fmt.Println(child.Type()) // lexical_declaration
fmt.Println(child.StartByte()) // 0
fmt.Println(child.EndByte()) // 9

Custom grammars

This repository provides grammars for many common languages out of the box.

If you need support for another language, you can keep its grammar inside your own project or publish it as a separate repository to share with the community.

See the explanation of how to create a grammar for go-tree-sitter here.

Known external grammars:

Editing

If your source code changes, you can update the syntax tree. This will take less time than the first parse.

// change 1 -> true
newText := []byte("let a = true")
tree.Edit(sitter.EditInput{
    StartIndex:  8,
    OldEndIndex: 9,
    NewEndIndex: 12,
    StartPoint: sitter.Point{
        Row:    0,
        Column: 8,
    },
    OldEndPoint: sitter.Point{
        Row:    0,
        Column: 9,
    },
    NewEndPoint: sitter.Point{
        Row:    0,
        Column: 12,
    },
})

// check that the edit marked the affected nodes as changed
assert.True(n.HasChanges())
assert.True(n.Child(0).HasChanges())
assert.False(n.Child(0).Child(0).HasChanges()) // left side of the tree didn't change
assert.True(n.Child(0).Child(1).HasChanges())

// generate new tree
newTree := parser.Parse(tree, newText)
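
The offsets and points in the edit above can be derived from the old and new text instead of being written by hand. The following is a minimal sketch using only the standard library. `Point`, `Edit`, `pointAt` and `computeEdit` are hypothetical local names that mirror the fields of `sitter.Point` and `sitter.EditInput`; they are not part of the library, and the sketch assumes the change is one contiguous region and that columns are counted in bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point and Edit are local stand-ins that mirror the fields of
// sitter.Point and sitter.EditInput; they are not the library types.
type Point struct{ Row, Column uint32 }

type Edit struct {
	StartIndex, OldEndIndex, NewEndIndex uint32
	StartPoint, OldEndPoint, NewEndPoint Point
}

// pointAt converts a byte offset into a zero-based row and a byte column.
func pointAt(text []byte, offset int) Point {
	row := bytes.Count(text[:offset], []byte("\n"))
	lineStart := bytes.LastIndexByte(text[:offset], '\n') + 1
	return Point{Row: uint32(row), Column: uint32(offset - lineStart)}
}

// computeEdit trims the common prefix and the common suffix of the two
// texts and reports what remains as a single contiguous edit.
func computeEdit(oldText, newText []byte) Edit {
	start := 0
	for start < len(oldText) && start < len(newText) && oldText[start] == newText[start] {
		start++
	}
	oldEnd, newEnd := len(oldText), len(newText)
	for oldEnd > start && newEnd > start && oldText[oldEnd-1] == newText[newEnd-1] {
		oldEnd--
		newEnd--
	}
	return Edit{
		StartIndex:  uint32(start),
		OldEndIndex: uint32(oldEnd),
		NewEndIndex: uint32(newEnd),
		StartPoint:  pointAt(oldText, start),
		OldEndPoint: pointAt(oldText, oldEnd),
		NewEndPoint: pointAt(newText, newEnd),
	}
}

func main() {
	e := computeEdit([]byte("let a = 1"), []byte("let a = true"))
	fmt.Printf("%+v\n", e)
}
```

For the `1 -> true` change this yields byte offsets 8, 9 and 12, all on row 0, which are the values used in the example above. Copying them into a `sitter.EditInput` is left to the caller.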

Predicates

You can filter the AST using predicate S-expressions.

Similar to the Rust and WebAssembly bindings, we support filtering on a few common predicates:

  • eq?, not-eq?
  • match?, not-match?

Usage example:

func main() {
	// Javascript code
	sourceCode := []byte(`
		const camelCaseConst = 1;
		const SCREAMING_SNAKE_CASE_CONST = 2;
		const lower_snake_case_const = 3;`)
	// Query with predicates
	screamingSnakeCasePattern := `(
		(identifier) @constant
		(#match? @constant "^[A-Z][A-Z_]+")
	)`

	// Parse source code
	lang := javascript.GetLanguage()
	n, _ := sitter.ParseCtx(context.Background(), sourceCode, lang)
	// Execute the query
	q, _ := sitter.NewQuery([]byte(screamingSnakeCasePattern), lang)
	qc := sitter.NewQueryCursor()
	qc.Exec(q, n)
	// Iterate over query results
	for {
		m, ok := qc.NextMatch()
		if !ok {
			break
		}
		// Apply predicates filtering
		m = qc.FilterPredicates(m, sourceCode)
		for _, c := range m.Captures {
			fmt.Println(c.Node.Content(sourceCode))
		}
	}
}

// Output of this program:
// SCREAMING_SNAKE_CASE_CONST

Development

Updating a grammar

Check if any updates for vendored files are available:

go run _automation/main.go check-updates

Update vendor files:

  • open _automation/grammars.json
  • modify reference (for tagged grammars) or revision (for grammars from a branch)
  • run go run _automation/main.go update <grammar-name>

It is also possible to update all grammars in one go using:

go run _automation/main.go update-all

go-tree-sitter's People

Contributors

adonovan, ahumenberger, broofa, dennwc, didroe, erizocosmico, ethframe, glkz, grouville, himujjal, hinshun, jamesnicolas, jbedard, jfontan, jochil, leonero, liricooli, look, mcuadros, mickgmdb, micksmix, milas, p-e-w, pkuebler, sam-ulrich1, shagabutdinov, sluongng, smacker, wesen, yuraaka

go-tree-sitter's Issues

Can not `go get` the package

I cannot install go-tree-sitter using go get:

$ go get github.com/smacker/go-tree-sitter
# github.com/smacker/go-tree-sitter
In file included from ../go/src/github.com/smacker/go-tree-sitter/bindings.go:5:
./bindings.h:4:10: fatal error: tree_sitter/runtime.h: No such file or directory
 #include "tree_sitter/runtime.h"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I copied the tree-sitter "include" folder to "/usr/include", and the compiler then found "runtime.h", but I got the following:

$ go get github.com/smacker/go-tree-sitter
# github.com/smacker/go-tree-sitter
gcc: error: ../go/src/github.com/smacker/go-tree-sitter/tree-sitter/out/Release/libruntime.a: No such file or directory
# github.com/smacker/go-tree-sitter
bindings.c:5:13: warning: conflicting types for built-in function ‘log’ [-Wbuiltin-declaration-mismatch]
 static void log(void *payload, TSLogType type, const char *msg)

Can you please help me with the installation?

parse error when return type is generic

func Abcd[T DataType](result *BaseFileResult[T]) []T {
	return nil
}

It reports some errors. It looks like they are caused by the return type []T.

using latest version:
github.com/smacker/go-tree-sitter v0.0.0-20221023091341-2009a4db91e4

Update to tree-sitter v0.15

  • update Makefile to be able to build grammars with newer runtime
  • Add support for ts_node_child_by_field_name function

Go modules support

go build runs go get, which fails because there is no C code when this project is used as a dependency.

go mod error

This is the error I'm getting when running go mod tidy:

go: found github.com/smacker/go-tree-sitter/javascript in github.com/smacker/go-tree-sitter/javascript v0.0.1
xd imports
        github.com/smacker/go-tree-sitter/javascript: ambiguous import: found package github.com/smacker/go-tree-sitter/javascript in multiple modules:
        github.com/smacker/go-tree-sitter v0.0.0-20230720070738-0d0a9f78d8f8 (/drive/go/pkg/mod/github.com/smacker/go-tree-sitter@v0.0.0-20230720070738-0d0a9f78d8f8/javascript)
        github.com/smacker/go-tree-sitter/javascript v0.0.1 (/drive/go/pkg/mod/github.com/smacker/go-tree-sitter/javascript@v0.0.1)

NextCapture return inaccurate index

I have a query with multiple captures like this:

(import_statement
	source: (string (string_fragment) @deps)
)

(call_expression
	function: (import)
	arguments: (arguments
		(string (string_fragment) @dynamic-deps)
	)
)

[
  "async"
  "await"
  ; '...'
  (spread_element)
  ; import * as blah from
  (import_statement
    (import_clause 
      (namespace_import)
    )
  )
] @need-tslib

Then I use it to query a .tsx file as follows:

		for {
			cap, idx, ok := qc.NextCapture()
			if !ok {
				break
			}

			name := q.CaptureNameForId(idx)
			switch name {
			case "deps":
				for _, c := range cap.Captures {
					i := c.Node.Content(data)
					fmt.Println("DEBUG", fileName, name, i)
				}
			case "dynamic-deps":
				for _, c := range cap.Captures {
					i := c.Node.Content(data)
					fmt.Println("DEBUG", fileName, name, i)
				}
			case "need-tslib":
			default:
				log.Fatalf("Unexpected capture name %s", name)
			}
		}

The expected result is that I would eventually get all 3 capture groups.

Actual result is

DEBUG file.ts deps rxjs
DEBUG file.ts deps ../router/router
DEBUG file.ts deps ./user
DEBUG file.ts deps async
DEBUG file.ts deps await
DEBUG file.ts deps async
DEBUG file.ts deps async

Using the same query with the tree-sitter CLI yields the expected result:

(v0.20.1) ~/work/misc/tree-sitter-typescript/tsx> tree-sitter -V
tree-sitter 0.20.7

(v0.20.1) ~/work/misc/tree-sitter-typescript/tsx> tree-sitter query typescript.scm file.ts
file.ts
  pattern: 0
    capture: 0 - deps, start: (0, 25), end: (0, 29), text: `rxjs`
  pattern: 0
    capture: 0 - deps, start: (8, 20), end: (8, 36), text: `../router/router`
  pattern: 0
    capture: 0 - deps, start: (9, 22), end: (9, 28), text: `./user`
  pattern: 2
    capture: 2 - need-tslib, start: (178, 2), end: (178, 7), text: `async`
  pattern: 2
    capture: 2 - need-tslib, start: (192, 6), end: (192, 11), text: `await`
  pattern: 2
    capture: 2 - need-tslib, start: (198, 2), end: (198, 7), text: `async`
  pattern: 2
    capture: 2 - need-tslib, start: (203, 2), end: (203, 7), text: `async`
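
One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: the second value returned by `NextCapture` may be the position of the capture inside `cap.Captures` (as with `capture_index` in the C function `ts_query_cursor_next_capture`), not the capture ID that `CaptureNameForId` expects. Every match here has exactly one capture, so that position is always 0, which maps to `deps`. The sketch below reproduces the effect with simplified stand-in types; `capture`, `match`, `captureNames` and `nameOf` are hypothetical and not library identifiers.

```go
package main

import "fmt"

// capture and match are simplified stand-ins for the binding's capture and
// match types; Index is the capture ID assigned by the query.
type capture struct {
	Index uint32
	Text  string
}

type match struct{ Captures []capture }

// captureNames lists the capture names of the query above by capture ID.
var captureNames = []string{"deps", "dynamic-deps", "need-tslib"}

// nameOf resolves the name of the capture at position pos within m by
// going through the capture's own Index field.
func nameOf(m match, pos uint32) string {
	return captureNames[m.Captures[pos].Index]
}

func main() {
	// A match of the third pattern: one capture at position 0 with capture ID 2.
	m := match{Captures: []capture{{Index: 2, Text: "async"}}}
	pos := uint32(0) // assumed meaning of the second value from NextCapture

	fmt.Println(captureNames[pos]) // deps (position misread as capture ID)
	fmt.Println(nameOf(m, pos))    // need-tslib
}
```

If that assumption holds, looking up the name via the `Index` field of `cap.Captures[idx]` would give the three expected groups.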

Question: cannot parse without language

I apologize if this isn't the right place to ask this question. If there is somewhere more appropriate this should go please let me know.

I created a new tree-sitter-poweron parser for a language used by my company. It works as expected in neovim.
When I add the required files to this repo and attempt to test, I get a "cannot parse without language" error. I'm hoping someone can point me in the right direction. If I remove the "externals" from the tree-sitter grammar and add it without the scanner.c, the test passes, but of course I need the external scanner to properly match a few tokens.

Updates are here:
https://github.com/phileagleson/go-tree-sitter/tree/tree-sitter-poweron

and the tree-sitter-poweron repo is here:
https://github.com/phileagleson/tree-sitter-poweron

Please let me know if there is any additional information that would be helpful.

Handle predicates

When playing with the queries of go-tree-sitter, I realized (not 100% sure, but 95%) that it doesn't handle predicates such as:

(
  (identifier) @constant
  (#match? @constant "^[A-Z][A-Z_]+")
)

It gets matched, but not perfectly: there are duplicates, and the constraint of the match doesn't get applied.

According to the docs:

Note - Predicates are not handled directly by the Tree-sitter C library. They are just exposed in a structured form so that higher-level code can perform the filtering. However, higher-level bindings to Tree-sitter like the Rust crate or the WebAssembly binding implement a few common predicates like #eq? and #match?.

As this binding is relying on the C libraries (from what I understand), there's a great chance that this is not implemented at all.
I don't know where to start, if it's hard or not. But this would be an awesome addition, and would love to help on that 😇

PS: tested on the cue binding

Unable to build for windows

When I try to build this library with windows as a target I'm getting several errors in iter.go:

➜ GOARCH=amd64 GOOS=windows go build .
 github.com/smacker/go-tree-sitter
./iter.go:17:18: undefined: Node
./iter.go:21:21: undefined: Node
./iter.go:25:20: undefined: Node
./iter.go:30:26: undefined: Node
./iter.go:34:20: undefined: Node
./iter.go:38:32: undefined: Node
./iter.go:43:9: undefined: Node
./iter.go:46:18: undefined: Node
./iter.go:68:40: undefined: Node

When building for Linux, the build succeeds as expected. I can see that 'Node' is defined in bindings.go, and both iter.go and bindings.go are part of the sitter package, so I'm not sure why this error occurs.

Python 2/3

Would you consider replacing the current python grammar (which, as far as I know, is a parser for Python 2) with Python 3, or at least adding both?

Query finalizer called too early / segfault when reading results

The finalizer for *Query seems to be called too early, sometimes causing a segfault when - for example - calling *QueryCursor.NextMatch.

Here's some code that reproduces the segfault:

package main

import (
	"bytes"

	sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/javascript"
)

func main() {

	source := bytes.Repeat([]byte("document.location = 'https://example.com';"), 10000)

	parser := sitter.NewParser()
	parser.SetLanguage(javascript.GetLanguage())

	tree := parser.Parse(nil, source)
	root := tree.RootNode()

	for i := 0; i < 100; i++ {
		query, _ := sitter.NewQuery(
			[]byte("(assignment_expression) @match"),
			javascript.GetLanguage(),
		)

		qc := sitter.NewQueryCursor()

		qc.Exec(query, root)

		for {
			// This call to qc.NextMatch() will sometimes cause a segfault
			match, exists := qc.NextMatch()
			if !exists || match == nil {
				break
			}
		}
	}
}

And the output:

tom@work:~/src/github.com/bf-tomnomnom/tree-sitter-crash▶ go build && ./tree-sitter-crash
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8c0 pc=0x10464af08]

runtime stack:
runtime.throw({0x104662c9b?, 0x0?})
	/usr/local/go/src/runtime/panic.go:992 +0x50
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:802 +0x1e8

goroutine 1 [syscall]:
runtime.cgocall(0x10463b62c, 0x14000056d38)
	/usr/local/go/src/runtime/cgocall.go:157 +0x54 fp=0x14000056d00 sp=0x14000056cc0 pc=0x1045b08e4
github.com/smacker/go-tree-sitter._Cfunc_ts_query_cursor_next_match(0x600001c48090, 0x14001849a10)
	_cgo_gotypes.go:1021 +0x3c fp=0x14000056d30 sp=0x14000056d00 pc=0x104637c3c
github.com/smacker/go-tree-sitter.(*QueryCursor).NextMatch.func1(0x10468eca0?, 0x0?)
	/Users/tomhudson/pkg/mod/github.com/smacker/[email protected]/bindings.go:886 +0x90 fp=0x14000056d80 sp=0x14000056d30 pc=0x104639c60
github.com/smacker/go-tree-sitter.(*QueryCursor).NextMatch(0x1400000e078)
	/Users/tomhudson/pkg/mod/github.com/smacker/[email protected]/bindings.go:886 +0x4c fp=0x14000056ea0 sp=0x14000056d80 pc=0x10463988c
main.main()
	/Users/tomhudson/src/github.com/bf-tomnomnom/tree-sitter-crash/main.go:32 +0x214 fp=0x14000056f70 sp=0x14000056ea0 pc=0x10463a6e4
runtime.main()
	/usr/local/go/src/runtime/proc.go:250 +0x250 fp=0x14000056fd0 sp=0x14000056f70 pc=0x1045df020
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1259 +0x4 fp=0x14000056fd0 sp=0x14000056fd0 pc=0x1046090c4

I was able to figure out that if the setting of *Query's finalizer here is commented out and the code is re-built, then it runs to completion without crashing.

I don't have a great deal of experience with Cgo so it's a little beyond me to figure out exactly why this is happening and how to prevent it, sorry!
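
For context, the general Go pattern for this class of problem is `runtime.KeepAlive`: a finalizer may run as soon as the garbage collector considers the Go wrapper unreachable, which can be before a function that only uses the wrapped raw value has finished. The sketch below illustrates that pattern with a hypothetical `handle` type standing in for a Go wrapper around a C object. Whether this is the actual cause in go-tree-sitter, or how it was fixed there, is not established by this report.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle is a hypothetical stand-in for a Go wrapper that owns a C object.
type handle struct{ raw int }

func newHandle(raw int) *handle {
	h := &handle{raw: raw}
	// In a real binding the finalizer would free the C object.
	runtime.SetFinalizer(h, func(h *handle) { h.raw = 0 })
	return h
}

// double extracts the raw value and then works only with that value.
func double(h *handle) int {
	raw := h.raw
	// From here on only raw is used. Without the KeepAlive call below,
	// the garbage collector would be allowed to finalize h during this work.
	result := raw * 2
	runtime.KeepAlive(h) // h is guaranteed reachable up to this point
	return result
}

func main() {
	fmt.Println(double(newHandle(21))) // 42
}
```

In the reproduction above, `query` is not referenced again after `qc.Exec(query, root)`, so a similar early-finalization scenario seems plausible.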

Cannot build library with MacOS

Description

I'm building a program using my Mac and Go and I cannot get the library to compile.

Error:

$ go run main.go
# github.com/smacker/go-tree-sitter/lua
parser.c:235:18: warning: null character(s) preserved in string literal [-Wnull-character]
# command-line-arguments
/opt/homebrew/Cellar/go/1.21.3/libexec/pkg/tool/darwin_arm64/link: running c++ failed: exit status 1
0  0x1007e0380  __assert_rtn + 72
1  0x10073ab30  mach_o::PointerFormat_DYLD_CHAINED_PTR_64_OFFSET::unauthRebaseIsVmAddr() const + 0
2  0x10073bcdc  ___ZN6mach_o13ChainedFixups11buildFixupsENSt3__14spanIKNS_5Fixup10BindTargetELm18446744073709551615EEENS2_IKNS0_17SegmentFixupsInfoELm18446744073709551615EEEyRKNS0_13PointerFormatEjb_block_invoke_2 + 212
3  0x18bebb950  _dispatch_client_callout2 + 20
4  0x18beceba0  _dispatch_apply_invoke + 176
5  0x18bebb910  _dispatch_client_callout + 20
6  0x18becd3cc  _dispatch_root_queue_drain + 864
7  0x18becda04  _dispatch_worker_thread2 + 156
8  0x18c0690d8  _pthread_wqthread + 228
ld: Assertion failed: (rebasePtr->target == low56), function writeChainEntry, file ChainedFixups.cpp, line 1218.
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Steps to reproduce

  1. Create a go module on Mac
  2. go get github.com/smacker/go-tree-sitter
  3. Use the library somehow and go run main.go

Environment

OS

MacBook M2 16GB Sonoma 14.2
Xcode 15.0
Build version 15A5195m

C compiler

Homebrew clang version 17.0.4
Target: arm64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin

Go

go version go1.21.3 darwin/arm64
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/fuleco/Library/Caches/go-build'
GOENV='/Users/fuleco/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/fuleco/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/fuleco/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.21.3/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.21.3/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/fuleco/Documents/Dev/panoptes/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/xp/f32vy1qx47l8wlb02kf574t40000gn/T/go-build3951625660=/tmp/go-build -gno-record-gcc-switches -fno-common'

This needs a proper reference to where I got most of the code. Sorry, I thought I had done proper attribution!

Most of the source changes came from https://github.com/klothoplatform/go-tree-sitter but without the changes to the struct so that it could be merged. I'll have to add proper attribution in the source before a merge would be appropriate

Originally posted by @sam-ulrich1 in #109 (comment)

Grammar update script not working

Any attempt to update a grammar without modifying grammars.json manually results in a 404 error:

downloading language=python reference=v0.20.4
incorrect response status code reference=v0.20.4 url=https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/5083d4e4a29d8672b413f19dcdcf0a016eb975d6/src/tree_sitter/parser.h language=python 0=statusCode 1=404

The issue applies to all grammars. It looks like git ls-remote returns some other hashes, not pullable from https://raw.githubusercontent.com.

How important is it to call Close()

I see that the exposed API includes Close() on almost all objects, but most documented examples and tests don't use Close() at all.

So how important is it to call Close() when using this lib? 🤔

multiple predicates in query results in query compilation error

using the following query:

(import_statement
	source: (string) @import)
(
	(call_expression
		function: (identifier) @function
		arguments: (arguments ((_) @other-imports)? ((_) @import) .))
	(#eq? @other-imports "")
	(#eq? @function "require"))

results in the following error:

wrong number of arguments to `#eq?` predicate. Expected 2, got 6 

I've tested this using the tree-sitter CLI with the JS language and it works. Note that the issue appears only when both eq? predicates are present in the second pattern.

Who is using this binding?

Not sure if this is of interest to this project, so please feel free to close the issue right away. Since "Discussions" are not enabled for this repository, I thought I'd add an issue.

We are now officially using go-tree-sitter for Symflower for Java. Here is an introduction blog post on what we are roughly doing https://symflower.com/en/company/blog/2023/parsing-code-with-tree-sitter/ (more to come). Took us a while to get there but we think that this is not just a major step up in performance (and general resource usage), but also a step up in functionality that is yet to come. We are very thankful for this project and all people involved.

As a side note: there is source code that we cannot parse anymore due to problems with either tree-sitter itself or the Java grammar. We are fixing those as we go along, and we are also committed to fixing problems in go-tree-sitter. Hence, if something pops up, we look forward to discussing solutions, implementing them, and generally improving the project where we can.

Cheers,
@ahumenberger and @zimmski

Type of idx parameter in Child() and NamedChild() methods.

The idx parameters of Child() and NamedChild() are currently of type int. Shouldn't they be of type uint32?

// Child returns the node's child at the given index, where zero represents the first child.
func (n Node) Child(idx int) *Node {
	nn := C.ts_node_child(n.c, C.uint32_t(idx))
	return n.t.cachedNode(nn)
}

// NamedChild returns the node's *named* child at the given index.
func (n Node) NamedChild(idx int) *Node {
	nn := C.ts_node_named_child(n.c, C.uint32_t(idx))
	return n.t.cachedNode(nn)
}
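
To illustrate what the question is about: converting a negative `int` to an unsigned 32-bit type wraps around instead of failing, so `Child(-1)` would ask for child number 4294967295. The sketch below shows the wrap-around and a bounds check; `toChildIndex` is a hypothetical helper for illustration, not something the library provides.

```go
package main

import (
	"fmt"
	"math"
)

// toChildIndex converts idx to uint32 and reports whether the value fits.
// Hypothetical helper, not part of go-tree-sitter.
func toChildIndex(idx int) (uint32, bool) {
	if idx < 0 || int64(idx) > math.MaxUint32 {
		return 0, false
	}
	return uint32(idx), true
}

func main() {
	idx := -1
	fmt.Println(uint32(idx)) // 4294967295: the conversion wraps around

	_, ok := toChildIndex(idx)
	fmt.Println(ok) // false
}
```

An `int` parameter is convenient for callers who loop with `int` counters; a `uint32` parameter would make out-of-range values a compile-time concern for constants but would require conversions at most call sites.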

Adding predicates clause(s) *increases* the number of results

My apologies if this isn't the right place for this. I'm trying to wrap my head around how the tree-sitter project works, and how libs like go-tree-sitter are / aren't expected to add predicate support.

Specifically: Is it expected that adding predicates to a query may increase the number of returned Match results?

In codepen#10 I have a test that runs a few variations of a simple comment query against the same 3-comment input document. I would expect to get 3 results each time since I'm specifically NOT calling FilterPredicates(). What I see, however, is the following:

query                                                                          Match count
(expression (comment) @foo)                                                    3
(expression (comment) @foo (#match? @foo "^// the") )                          3
(expression (comment) @foo) (#match? @foo "^// the")                           8
(expression (comment) @foo) (#match? @foo "^// the") (#match? @foo "water$")   13

Specific questions

  • Is this behavior expected?
  • Is it something tree-sitter does?
  • ... or is this somehow caused by the Go bindings that go-tree-sitter provides?

I'm asking because the predicate support added in #83 seems... confusing. And also incomplete. (See also #92 and #101). I'm contemplating putting up a PR that reworks this code somewhat, but I'd like to get a sense from you (@smacker) if that's something you'd be open to, and get your thoughts on how you'd like to see that (re)implemented.

ld: library not found for -lcrt0.o for python

After importing "github.com/smacker/go-tree-sitter/python", I cannot cross-compile an executable for Windows or Linux on macOS.
For Linux:

CGO_ENABLED=1 CC="x86_64-linux-musl-gcc" GOOS=linux GOARCH=amd64 CGO_LDFLAGS="-static" go build -trimpath -ldflags "-s -w -linkmode=external"

For Windows:

CGO_ENABLED=1 CC="x86_64-w64-mingw32-gcc" GOOS=windows GOARCH=amd64 CGO_LDFLAGS="-static" go build -buildmode=pie -trimpath -ldflags "-s -w"

Cross-compilation works without problems if I import only java or javascript.

How does tree.Edit work for multiple changes?

From this example in the README it is clear how to reparse after one change:

...
// from input = []byte("let a = 1") to
// change 1 -> true
newText := []byte("let a = true")
tree.Edit(sitter.EditInput{
   ...
})

// generate new tree
newTree := parser.Parse(tree, newText)

But what if I had for example input := []byte("let a = 1; let c = 3;") and my newInput would be newInput := []byte("let a = 1; let b = 2; let c = 3; let d = 4;"), how would I construct sitter.EditInput then?

Also, is there a way to insert let b = 2; without having the whole input let a = 1; let b = 2; let c = 3; apart from manipulating byte slices directly?
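
One way to think about this, offered as a sketch and not as a maintainer answer: each `sitter.EditInput` describes one contiguous change, so two insertions become two edits, each expressed in the coordinates of the text as it stands after the previous edits. The code below only computes the byte offsets for the example in the question. `editSpan` and `insertAt` are hypothetical names, the row and column points are omitted, and the assumption that `tree.Edit` may be called once per edit before a single reparse is based on how the underlying tree-sitter edit API is generally used.

```go
package main

import "fmt"

// editSpan holds the byte offsets of one edit. It is a hypothetical
// stand-in for the StartIndex, OldEndIndex and NewEndIndex fields.
type editSpan struct{ Start, OldEnd, NewEnd int }

// insertAt inserts text at byte offset at and reports the edit in the
// coordinates of buf as it was just before this insertion.
func insertAt(buf []byte, at int, text string) ([]byte, editSpan) {
	out := make([]byte, 0, len(buf)+len(text))
	out = append(out, buf[:at]...)
	out = append(out, text...)
	out = append(out, buf[at:]...)
	return out, editSpan{Start: at, OldEnd: at, NewEnd: at + len(text)}
}

func main() {
	buf := []byte("let a = 1; let c = 3;")
	var edits []editSpan
	var e editSpan

	// First insertion, right after "let a = 1;".
	buf, e = insertAt(buf, 10, " let b = 2;")
	edits = append(edits, e)

	// Second insertion at the end of the already updated text.
	buf, e = insertAt(buf, len(buf), " let d = 4;")
	edits = append(edits, e)

	fmt.Println(string(buf)) // let a = 1; let b = 2; let c = 3; let d = 4;
	fmt.Println(edits)       // [{10 10 21} {32 32 43}]
}
```

An insertion has equal start and old-end offsets because no existing bytes are replaced. The full new text is still needed for the reparse itself.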

Thank you!

Undefined symbols for architecture x86_64: "_ts_node_child", referenced from: __cgo_d6ef877dfa14_Cfunc_ts_node_child

On macOS Mojave I'm getting the following errors. The make phase completes successfully but go install does not. Any ideas?

[aat@cavern:~/.go/src/github.com/smacker/go-tree-sitter]go install
# github.com/smacker/go-tree-sitter
Undefined symbols for architecture x86_64:
  "_ts_node_child", referenced from:
      __cgo_d6ef877dfa14_Cfunc_ts_node_child in _x002.o
     (maybe you meant: __cgo_d6ef877dfa14_Cfunc_ts_node_child_count, __cgo_d6ef877dfa14_Cfunc_ts_node_child )
  "_ts_node_child_count", referenced from:
      __cgo_d6ef877dfa14_Cfunc_ts_node_child_count in _x002.o
     (maybe you meant: __cgo_d6ef877dfa14_Cfunc_ts_node_child_count)
  "_ts_node_end_byte", referenced from:
      __cgo_d6ef877dfa14_Cfunc_ts_node_end_byte in _x002.o
     (maybe you meant: __cgo_d6ef877dfa14_Cfunc_ts_node_end_byte)
  "_ts_node_has_changes", referenced from:
      __cgo_d6ef877dfa14_Cfunc_ts_node_has_changes in _x002.o
     (maybe you meant: __cgo_d6ef877dfa14_Cfunc_ts_node_has_changes)
  "_ts_node_has_error", referenced from:
      __cgo_d6ef877dfa14_Cfunc_ts_node_has_error in _x002.o
     (maybe you meant: __cgo_d6ef877dfa14_Cfunc_ts_node_has_error)
...

Make YAML package Bazel Gazelle friendly

Bazel is a popular build platform, and Gazelle is a tool that helps automatically generate Go BUILD configuration files for Bazel.

When trying to bootstrap go-tree-sitter with Gazelle, the following error occurs:

> bazel test @com_github_smacker_go_tree_sitter//...
...
ERROR: /private/var/tmp/_bazel_sngoc/eaf44201dcd7b10cb864471f7469fb36/external/com_github_smacker_go_tree_sitter/yaml/BUILD.bazel:3:11: GoCompilePkg external/com_github_smacker_go_tree_sitter/yaml/yaml.a failed: (Exit 1): builder failed: error executing command bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/go_sdk/builder compilepkg -sdk external/go_sdk -installsuffix darwin_arm64 -src external/com_github_smacker_go_tree_sitter/yaml/binding.go -src ... (remaining 33 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
external/com_github_smacker_go_tree_sitter/yaml/scanner.cc:5:10: fatal error: './schema/schema.generated.cc' file not found
#include "./schema/schema.generated.cc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
compilepkg: error running subcommand external/local_config_cc/cc_wrapper.sh: exit status 1

Taking a closer look, the generated BUILD file for the yaml package looks like this:

> cat bazel-<project>/external/com_github_smacker_go_tree_sitter/yaml/BUILD.bazel
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")

go_library(
    name = "yaml",
    srcs = [
        "binding.go",
        "parser.c",
        "parser.h",
        "scanner.cc",
    ],
    cgo = True,
    importpath = "github.com/smacker/go-tree-sitter/yaml",
    visibility = ["//visibility:public"],
    deps = ["//:go-tree-sitter"],
)

alias(
    name = "go_default_library",
    actual = ":yaml",
    visibility = ["//visibility:public"],
)

go_test(
    name = "yaml_test",
    srcs = ["binding_test.go"],
    deps = [
        ":yaml",
        "//:go-tree-sitter",
        "@com_github_stretchr_testify//assert",
    ],
)

Because the ./yaml/schema directory contains no Go file, Gazelle did not recognize it as a package that needs a BUILD file. Without a BUILD file there, it is not possible to define a dependency between the ./yaml and ./yaml/schema packages.

I suggest moving ./yaml/schema/schema.generated.cc to ./yaml/schema.generated.cc to make it a bit easier for Bazel projects to adopt this repo. Let me know what you think.

Memory leaks

While testing in Micro, I noticed that parsing and getting the root node of a tree leaks a large amount of memory. Minimal example to reproduce:

package main

import (
	"fmt"
	"sync"
	sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/javascript"
)

func main() {
	runTest()
	fmt.Println("done.")

	// Wait forever
	var lock sync.Mutex
	lock.Lock()
	lock.Lock()
}

func runTest() {
	parser := sitter.NewParser()
	parser.SetLanguage(javascript.GetLanguage())
	sourceCode := []byte("let a = 1")

	for i := 0; i < 500000; i++ {
		// Leaks 115 Megabytes
		tree := parser.Parse(sourceCode)
		// Leaks 525 Megabytes
		tree.RootNode()
	}
}

After printing done, runTest has returned and all variables related to tree-sitter are out of scope and their objects will be garbage collected. However, a whopping 640 Megabytes of memory is never freed, more than 1 kB per iteration despite the source code being only 9 bytes.

When the source code is larger, the amount of memory leaked per parse can exceed 1 MB. In Micro, I have seen more than 100 MB leak during one minute of normal use.

query result inconsistency between cli and lib

given the following query with the JS language:

(import_statement
	source: (string [
		("'" . (_) @import . "'")
		("\"" . (_) @import . "\"")]))

(
	(call_expression
		function: (identifier) @function
		arguments:
			(arguments
				((_) @other-imports)?
				((string [
					("'" . (_) @import . "'")
					("\"" . (_) @import . "\"")
				]))
				.))
	(#eq? @function "require"))

I get 0 results due to the string destructuring pattern. For example, changing the query to:

(import_statement
	source: (string) @import)

(
	(call_expression
		function: (identifier) @function
		arguments: (arguments ((_) @other-imports)? ((string) @import) .))
	(#eq? @function "require"))

results in 6 results given this input:

import * as fs from 'fs';
import 'assert';
import { isAbsolute } from 'path';
import nan from "nan";
const foo = require('buffer');
const foo2 = require("console");
require("cluster", "crypto");
require('async_hooks', 'constants');

foo('bar');

fs.doThing();

process.exit(1);

the first query works with the tree-sitter CLI

Predicates don't behave the same way as tree-sitter cli

For example:

❯ tree-sitter query /dev/stdin contexts/test.go
      (binary_expression
         left: (_) @left
         right: (_) @right)
contexts/test.go
  pattern: 0
    capture: 0 - left, start: (10, 8), end: (10, 9), text: `a`
    capture: 1 - right, start: (10, 12), end: (10, 13), text: `1`
  pattern: 0
    capture: 0 - left, start: (20, 13), end: (20, 14), text: `i`
    capture: 1 - right, start: (20, 17), end: (20, 19), text: `10`
  pattern: 0
    capture: 0 - left, start: (23, 4), end: (23, 5), text: `a`
    capture: 1 - right, start: (23, 8), end: (23, 10), text: `10`

I have three binary expressions in the test file.

Running the same query with a predicate returns both captures:

❯ tree-sitter query /dev/stdin contexts/test.go 
      (binary_expression
         left: (_) @left
         right: (_) @right
       (#eq? @right 1))
contexts/test.go
  pattern: 0
    capture: 0 - left, start: (10, 8), end: (10, 9), text: `a`
    capture: 1 - right, start: (10, 12), end: (10, 13), text: `1`

However, with go-tree-sitter, only the @right capture is returned; the @left capture is dropped because it does not appear in any predicate.

The following test added to predicates_test.go fails because it only returns one capture:

		{
			input: `1234 + 4321`,
			query: `((sum
  left: (expression (number) @left)
  right: (expression (number) @right))
  (#eq? @left 1234))`,
			expectedBefore: 2,
			expectedAfter:  2,
		},

Go printing "verbs" highlight group.

Currently, the only substring that gets a group is escape_sequence. Go has a collection of "verbs", listed in the fmt package, that should be highlighted as well in their own group, such as print_verb.

It's very difficult to disambiguate placeholders in a string without syntax highlighting.
Here the escape rune is easy to find, but not the print verb.

Add go.mod grammar

I think a pretty nice addition to the list of grammars supported by default would be tree-sitter-go-mod to parse the contents of go.mod files.

My use case is that I'm using tree-sitter to parse and analyze some Go code, and some of the things I want to implement require me to read go.mod. It would be nice to use tree-sitter for this.

FieldNameForChild returns false results

With the Ruby language selected, parsing this input:

log.info($ARGUMENTS)

gives the following syntax tree:

program [0, 0] - [1, 0]
  call [0, 0] - [0, 20]
    receiver: identifier [0, 0] - [0, 3]
    method: identifier [0, 4] - [0, 8]
    arguments: argument_list [0, 8] - [0, 20]
      global_variable [0, 9] - [0, 19]

As we can see, there are three field names: receiver, method, and arguments.

Now, iterating up to NamedChildCount of the call node with the following code:

for j := 0; j < int(node.NamedChildCount()); j++ {
	log.Println("it is "+ strconv.Itoa(j), node.FieldNameForChild(j))
}

we get these results:

it is 0 method
it is 1 arguments
it is 2

As we can see, the order is wrong: index 0 should be receiver and index 1 should be method.
The field name receiver is also missing entirely.

Upstreaming generation of bindings

Hi! Very cool project - I have to say that it's nice to see Go getting support for tree-sitter. I'm a member of the upstream tree-sitter org, and I'd like to include go bindings via the cli when creating/generating a grammar, though not by default. I plan to add C, Swift, and Go, and Swift/Go can be enabled in package.json with a "bindings" field or so (still working it out atm).

Would you be okay with me upstreaming this behavior? Obviously I'd be basically copying your binding.go/binding_test.go files pretty much.

Thanks!
Amaan

go get takes too much time

There are 2 problems:

  1. Submodules:

go get runs git submodule update --recursive internally.
tree-sitter has many submodules that aren't needed. In addition, the Go grammar downloads the go and moby repositories because of its examples (https://github.com/tree-sitter/tree-sitter-go/tree/v0.13.3/examples), and the Ruby grammar downloads the ruby_spec repo.

As a solution, we can go back to running git clone inside the Makefile, as it was before, though managing vendored dependencies will be more complex (the SHA-1s would need to be hard-coded in the Makefile).

  2. Some dependencies are huge:

For example, tree-sitter-ruby is 281921 KB.

As a solution, we can vendor the source code (1-2 files per grammar) instead of a git repository. The disadvantage is similar to the previous problem: dependency management. It could also make this repo too large in the future (the source code for the Ruby grammar is ~23 MB).

An alternative approach would be to download only the necessary files from GitHub over HTTP.
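
The HTTP alternative could look roughly like the sketch below. It assumes that files are served from raw.githubusercontent.com under repo/ref/path and that src/parser.c is one of the files a grammar needs; rawURL and download are hypothetical helper names, not part of this repository's tooling.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// rawURL builds a raw.githubusercontent.com URL for one file at a given ref
// (a tag, branch, or commit SHA).
func rawURL(repo, ref, path string) string {
	return fmt.Sprintf("https://raw.githubusercontent.com/%s/%s/%s", repo, ref, path)
}

// download fetches url and writes the response body to the file dst.
func download(url, dst string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	f, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Illustrative only: the file path is an assumption about what is needed.
	url := rawURL("tree-sitter/tree-sitter-go", "v0.13.3", "src/parser.c")
	fmt.Println(url)
	// To actually fetch the file, call: download(url, "parser.c")
}
```

Pinning ref to a tag or commit SHA keeps the version management problem mentioned above, but avoids cloning examples and submodules.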

LICENSE?

Can you clarify the license of the project?

Update javascript grammar

Hi,

I saw that javascript has two new versions whose tags start with rust-.
Since they do not start with v, the grammar was not updated.
I tried to open a PR with reference: master, but I failed to push to origin since I don't have permission.

So, can you grant me permission, or maybe update the reference yourself?

thanks!

bump to 20.7

Please bump to 20.7. It generates ABI version 14 which is not compatible with this binding.
thanks

suppress compilation warning (C#)

Please add the following line to the file csharp/parser.c to suppress a compilation warning (parser.c:527:32: warning: trigraph ignored [-Wtrigraphs]):

#pragma GCC diagnostic ignored "-Wtrigraphs"

Vendor "node-types.json" and "grammar.json"

We are creating an "AST generator" generator, which generates AST node types from the tree-sitter node-types.json file (as in https://github.com/tree-sitter/tree-sitter-java/tree/master/src). For this it would be great to have the node-types.json file next to the actual parser.c file to make sure to always have the correct version in place. And at some point we will also need the grammar.json file.

So long story short, I'd like to adapt the vendoring script such that also those JSON files are considered. Would such a change be accepted?

Question: Return string of particular node

I understand I can get the start and end bytes; however, how do I go about returning the string of a particular node? For example, if I wanted to return the string for every function in a tree? I may have missed this. Thanks!
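
One way to get a node's text from its byte offsets is to slice the original source with them. The sketch below assumes only the StartByte and EndByte accessors shown in the README; byteRange, nodeText, and span are hypothetical names, and span is a stand-in for a real node so the example runs without cgo.

```go
package main

import "fmt"

// byteRange is a hypothetical interface covering the two accessors used here;
// a node from the bindings exposes StartByte and EndByte like these.
type byteRange interface {
	StartByte() uint32
	EndByte() uint32
}

// nodeText returns the part of source covered by n, as a string.
func nodeText(n byteRange, source []byte) string {
	return string(source[n.StartByte():n.EndByte()])
}

// span is a stand-in for a real node.
type span struct{ start, end uint32 }

func (s span) StartByte() uint32 { return s.start }
func (s span) EndByte() uint32   { return s.end }

func main() {
	source := []byte("let a = 1")
	fmt.Println(nodeText(span{4, 5}, source)) // prints "a"
}
```

The source passed in must be the same byte slice that was parsed, otherwise the offsets will not line up. To collect every function, the same helper can be applied to each node of the relevant type found while walking the tree.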

Svelte Import Issue

I ran into this issue when importing "github.com/smacker/go-tree-sitter/svelte":

Error: scanner.cc:2:10: fatal error: tree_sitter/parser.h: No such file or directory
2 | #include "tree_sitter/parser.h"
| ^~~~~~~~~~~~~~~~~~~~~~

Traversing the tree

Hey thanks again for creating the go bindings for this!

I'm struggling with using the C++ parser and traversing the tree. Are there any docs or examples we should link to that illustrate how best to use the bindings in this repo?

Specifically, I want to traverse the tree looking for preprocessor conditionals (if, ifdef, elif, etc.) and extract all the macro identifiers.

Panic when parsing C++ code in tree-sitter > 0.20.0

Repo demonstrating bug here: https://github.com/micksmix/go-treesitter-crash-demo

tl;dr, I have a C++ source file that does not crash when I use go-tree-sitter/v0.0.0-20220829074436-0a7a807924f2 to parse it, but DOES crash / panic using any newer release of go-tree-sitter.

The minimal demo project above shows this, and I've copied the README here.

Usage

This demonstrates a crash in go-tree-sitter, which I suspect is due to an issue in upstream tree-sitter.

To recreate the bug:

go mod tidy
go run main.go -path ./crashers

It should reliably crash. At least it does on my M1 macbook. You should see:

[......truncated.....]
goroutine 34 [finalizer wait]:
runtime.gopark(0x10?, 0x104bed7e0?, 0x0?, 0x0?, 0x104bf5e80?)
        /opt/homebrew/Cellar/go/1.21.3/libexec/src/runtime/proc.go:398 +0xc8 fp=0x14000052580 sp=0x14000052560 pc=0x104b08f18
runtime.runfinq()
        /opt/homebrew/Cellar/go/1.21.3/libexec/src/runtime/mfinal.go:193 +0x108 fp=0x140000527d0 sp=0x14000052580 pc=0x104ae9f58
runtime.goexit()
        /opt/homebrew/Cellar/go/1.21.3/libexec/src/runtime/asm_arm64.s:1197 +0x4 fp=0x140000527d0 sp=0x140000527d0 pc=0x104b35e64
created by runtime.createfing in goroutine 1
        /opt/homebrew/Cellar/go/1.21.3/libexec/src/runtime/mfinal.go:163 +0x80

r0      0x104d90680
r1      0x104dbb200
r2      0x170268000
r3      0x0
r4      0x10
r5      0x1
r6      0x2
r7      0x0
r8      0xffffffff
r9      0x100
r10     0xffffffff
r11     0x3e
r12     0x104dbb210
r13     0x104dbb218
r14     0x0
r15     0x170000000
r16     0x170268000
r17     0x1a41ef1e0
r18     0x0
r19     0x104d90680
r20     0x104dbb200
r21     0x4
r22     0x170000000
r23     0x0
r24     0x4c
r25     0x170268000
r26     0x0
r27     0x1ff442160
r28     0x170268000
r29     0x16b32a300

To demonstrate it used to work

To see it not crash, open go.mod and un-comment the last line with comment of "does NOT crash":

// require github.com/smacker/go-tree-sitter v0.0.0-20230720070738-0d0a9f78d8f8 //crashes
// require github.com/smacker/go-tree-sitter v0.0.0-20221023091341-2009a4db91e4 //crashes... uses tree-sitter 0.20.7
require github.com/smacker/go-tree-sitter v0.0.0-20220829074436-0a7a807924f2 //does NOT crash... uses tree-sitter 0.20.0

Then re-run:

go mod tidy
go run main.go -path ./crashers
