bblfsh / bblfshd

A self-hosted server for source code parsing

Home Page: https://doc.bblf.sh

License: GNU General Public License v3.0


bblfshd's Introduction

bblfshd

This repository contains the bblfsh daemon (bblfshd), which includes the runtime that runs the drivers in containers, and bblfshctl, a CLI tool used to control the installed drivers and query the status of the daemon.

Drivers are implemented as Docker images, each with its own repository in the bblfsh organization on GitHub. For more information, see the bblfsh SDK documentation.

Getting Started

See the Getting Started guide.

Quick start

This project is now part of source{d} Engine, which provides the simplest way to get started with a single command. Visit sourced.tech/engine for more information.

Rootless mode

The recommended way to run bblfshd by itself is using Docker:

docker run -d --name bblfshd \
  -p 9432:9432 \
  -v /var/lib/bblfshd:/var/lib/bblfshd \
  -v /proc:/newproc \
  --security-opt seccomp=./bblfshd-seccomp.json \
  bblfsh/bblfshd

On macOS, use this command instead, with a Docker volume:

docker run -d --name bblfshd \
  -p 9432:9432 \
  -v bblfsh-storage:/var/lib/bblfshd \
  -v /proc:/newproc \
  --security-opt seccomp=./bblfshd-seccomp.json \
  bblfsh/bblfshd

To understand the flags -v /proc:/newproc and --security-opt seccomp=./bblfshd-seccomp.json (where bblfshd-seccomp.json is a file present in this repo), and to check further requirements, please refer to rootless.md. bblfshd is based on container technology and interacts with the kernel at a low level. By default it exposes a gRPC server on port 9432, which clients use to interact with the daemon. We also mount the path /var/lib/bblfshd/, where all the driver images and container instances are stored.
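Since the daemon is just a gRPC server, any Babelfish client can talk to it once it is running. As a minimal sketch (assuming the Go client API from gopkg.in/bblfsh/client-go.v2, the same one used in the performance question further down this page), a parse request looks like this:

package main

import (
	"fmt"

	"gopkg.in/bblfsh/client-go.v2"
)

func main() {
	// Connect to the gRPC endpoint exposed by the bblfshd container.
	client, err := bblfsh.NewClient("localhost:9432")
	if err != nil {
		panic(err)
	}

	// Ask the installed python driver to parse a small snippet.
	resp, err := client.NewParseRequest().
		Language("python").
		Content("print('hello')").
		Do()
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.UAST)
}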

Privileged mode

We advise against it, but if you prefer to run bblfshd in privileged mode to skip the configuration steps of rootless.md, you can run, on Linux:

docker run -d --name bblfshd --privileged -p 9432:9432 -v /var/lib/bblfshd:/var/lib/bblfshd bblfsh/bblfshd

or on macOS:

docker run -d --name bblfshd --privileged -p 9432:9432 -v bblfsh-storage:/var/lib/bblfshd bblfsh/bblfshd

Install drivers

Now you need to install the driver images into the daemon. You can install all the official images by running:

docker exec -it bblfshd bblfshctl driver install --all

You can check the installed versions by executing:

docker exec -it bblfshd bblfshctl driver list
+----------+-------------------------------+---------+--------+---------+-----+-------------+
| LANGUAGE |             IMAGE             | VERSION | STATUS | CREATED | GO  |   NATIVE    |
+----------+-------------------------------+---------+--------+---------+-----+-------------+
| python   | //bblfsh/python-driver:latest | v1.1.5  | beta   | 4 days  | 1.8 | 3.6.2       |
| java     | //bblfsh/java-driver:latest   | v1.1.0  | alpha  | 6 days  | 1.8 | 8.131.11-r2 |
+----------+-------------------------------+---------+--------+---------+-----+-------------+

To test a driver, you can send a parse request to the server with the bblfshctl parse command, using one of the example files contained in the Docker image:

docker exec -it bblfshd bblfshctl parse /opt/bblfsh/etc/examples/python.py

SELinux

If your system has SELinux enabled (which is the default in Fedora, Red Hat, CentOS and many others), you'll need to compile and load a policy module before running the bblfshd Docker image; otherwise, running driver containers will fail with a permission-denied message in the logs.

To do this, run these commands from the project root:

cd selinux/
sh compile.sh
semodule -i bblfshd.pp

If you were already running an instance of bblfshd, you will need to delete the container (docker rm -f bblfshd) and run it again (docker run...).

Once the module has been loaded with semodule, the change persists even across reboots. If you want to disable the module again, run semodule -d bblfshd.

Alternatively, you could set SELinux to permissive mode with:

echo 0 > /sys/fs/selinux/enforce

(doing this on production systems, which usually have SELinux enabled by default, is strongly discouraged).

Development

If you wish to work on bblfshd, you'll first need Go installed on your machine (version 1.11+ is required) and Docker. Docker is used to build and run the tests in an isolated environment.

For local development of bblfshd, first make sure Go is properly installed and that a GOPATH has been set. You will also need to add $GOPATH/bin to your $PATH.

Next, using Git, clone this repository into $GOPATH/src/github.com/bblfsh/bblfshd. All the necessary dependencies are installed automatically, so you just need to type make. This will compile the code and then run the tests. If it exits with status 0, everything is working!

Dependencies

Ensure you have ostree and development libraries for ostree installed.

You can install them from your distribution's package manager as follows, or build them from source (more on that here).

Debian, Ubuntu, and related distributions:

$ apt-get install libostree-dev

Fedora, CentOS, RHEL, and related distributions:

$ yum install -y ostree-devel

Arch and related distributions:

$ pacman -S ostree

Building From Source

Build with:

$ make build

Running Tests

Run tests with:

$ make test

Environment variables

  • BBLFSHD_MAX_DRIVER_INSTANCES - maximum number of driver instances for each language. Defaults to the number of CPUs.

  • BBLFSHD_MIN_DRIVER_INSTANCES - minimum number of driver instances that will be kept running for each language. Defaults to 1.

Enable tracing

Bblfshd supports OpenTracing, which can be used to profile requests at a high level or to trace individual requests to bblfshd and/or the language drivers.

To enable it, you can use Jaeger. The easiest way is to start the all-in-one Jaeger image:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.8

For a Docker installation of bblfshd, add the following flags:

--link jaeger:jaeger -e JAEGER_AGENT_HOST=jaeger -e JAEGER_AGENT_PORT=6831 -e JAEGER_SAMPLER_TYPE=const -e JAEGER_SAMPLER_PARAM=1

For bblfshd running locally, set the following environment variables:

JAEGER_AGENT_HOST=localhost JAEGER_AGENT_PORT=6831 JAEGER_SAMPLER_TYPE=const JAEGER_SAMPLER_PARAM=1

Run a few requests, then check the traces at http://localhost:16686.

To enable tracing in production, consult the Jaeger documentation.

License

GPLv3, see LICENSE


bblfshd's Issues

Logs: missing "container stopped" records

This is an example of a bblfsh server's logs under high load:

time="2017-09-11T10:12:06Z" level=info msg="parsing HomePageFragment2.java (28041 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing OnlineFragment.java (6132 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing PersonCenterFragment.java (134 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing RankFragment.java (7577 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SDW513VN0ZZSS3PES6N4)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing RelevantVideoFragment.java (135 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing SubareaFragment.java (2345 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing VideoInfoFragment.java (9443 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE047BJAT9SPFDAWW33Y)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing AreaItem.java (349 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing BannerItem.java (708 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing GameItem.java (499 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing OnlineVideo.java (806 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing Page.java (494 bytes)" 
time="2017-09-11T10:12:06Z" level=info msg="parsing User.java (2211 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing Video.java (3521 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing VideoItem.java (2590 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE443RDZYB2WRGX9XZA2)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing ArrayUtils.java (1166 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing CompressionTools.java (2928 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing Constants.java (3866 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing DeviceUtils.java (7551 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE8SKJ4YE9PH84NVX79P)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing DownUtil.java (4623 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SEDFX9589EGJZA0Y7S7C)" 
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SEHXZ69FWJKYBC579B59)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing FileUitl.java (3662 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing FileUtils.java (11263 bytes)" 
time="2017-09-11T10:12:07Z" level=info msg="parsing FractionalTouchDelegate.java (5312 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing HttpDownloader.java (5836 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing HttpUtil.java (15548 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing ImageUtils.java (10043 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing IntentHelper.java (3262 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing JsoupUtil.java (833 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing Logger.java (2561 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing MediaUtils.java (9612 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing MultiMemberGZIPInputStream.java (3315 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing PreferenceUtils.java (4323 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing StringUtils.java (8705 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing ToastUtils.java (2011 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing URLUtil.java (11974 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing XmlReaderHelper.java (3897 bytes)" 
time="2017-09-11T10:12:08Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG0XZ0FG1PV9DG4WKJM2)" 
time="2017-09-11T10:12:08Z" level=info msg="parsing ApplicationUtils.java (3928 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing CircleImageView.java (7305 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing CommonGestures.java (4955 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing FileUtils.java (11308 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG51GZQQN4R9SY4B798W)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing LeftSliderLayout.java (13298 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing MediaController.java (23956 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing PlayerService.java (15043 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGDF49AZFENEC93EPWYK)" 
time="2017-09-11T10:12:09Z" level=info msg="parsing PullToZoomListView.java (9210 bytes)" 
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGJ056PZ6VM9P9DB6SE3)" 
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGPG5NQZN6H0C6RBSHCY)" 
time="2017-09-11T10:12:10Z" level=info msg="parsing VP.java (2207 bytes)" 
time="2017-09-11T10:12:10Z" level=info msg="parsing VideoView.java (4721 bytes)" 
time="2017-09-11T10:12:12Z" level=info msg="parsing ScreenCaptureImageActivity.java (9608 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing NotFoundException.java (546 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing RestError.java (4171 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing NotImplementedException.java (565 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing CrudService.java (1943 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing CrudServiceImpl.java (3271 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing package-info.java (62 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing ClassUtils.java (1725 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing PostInitialize.java (811 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing package-info.java (59 bytes)" 
time="2017-09-11T10:12:13Z" level=info msg="parsing PostInitializerRunner.java (7054 bytes)" 

The messages for when containers stop/die are missing. I assume they are written at the DEBUG log level, but the level should be INFO, since "started" messages are reported at INFO. As a result, it is currently not possible to estimate the average lifetime of a container.

Besides

I also see that containers are spawned like crazy; it's been 17 hours since this torture test started, so I have the impression that the scaling algorithm has gone wild. Are there any cool-down period tunables? How can I debug why this happens? Will the DEBUG log level help?

I propose adding the reason why the container is killed or restarted - e.g. a scaling decision or a panic/segfault. It doesn't need to be verbose; I want something like:

time="2017-09-11T10:12:09Z" level=info msg="container stopped bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94) - oops"
time="2017-09-11T10:12:09Z" level=info msg="container stopped bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94) - scaling"

High load concurrent queries hang

When I execute many (3000) concurrent queries (4 threads), the Babelfish server either hangs or drops some requests without answering them. CPU load is 0%. After that, the server becomes completely unresponsive and I have to restart it.
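For reference, a minimal Go load generator along these lines (a sketch assuming the client-go v2 API shown in the performance question below; the worker and query counts mirror this report) could look like:

package main

import (
	"fmt"
	"sync"

	"gopkg.in/bblfsh/client-go.v2"
)

const (
	workers = 4    // concurrent threads, as in the report
	queries = 3000 // total number of queries
)

func main() {
	jobs := make(chan struct{})
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client, err := bblfsh.NewClient("localhost:9432")
			if err != nil {
				panic(err)
			}
			for range jobs {
				// Under load, some of these calls reportedly never return.
				_, err := client.NewParseRequest().
					Language("python").
					Content("print('hello')").
					Do()
				if err != nil {
					fmt.Println("request failed:", err)
				}
			}
		}()
	}

	for i := 0; i < queries; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()
}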

Server hangs on macOS if /var/lib/bblfshd is mounted

Any ideas on how to debug this?

If I start bblfshd like this:

docker run -d --name bblfshd --privileged -p 9432:9432 -v $(pwd)/bblfshd-data:/var/lib/bblfshd bblfsh/bblfshd

and then call "parse" method - server never respond. No matter do I do call inside or outside container docker exec -it bblfshd bblfshctl parse /opt/bblfsh/etc/examples/python.py || using dashboard.

After that I'm unable to do anything with the container: docker stop and docker kill just hang forever. Only restarting Docker itself helps.

But if I run bblfshd without the mount:

docker run -d --name bblfshd --privileged -p 9432:9432 bblfsh/bblfshd

everything works fine.

I understand that this issue description isn't very helpful, but I don't know how to collect more information.

bblfsh spawns zombie processes when it runs on Ubuntu

What happened?

I was working with bblfsh on the science-3 machine and found that too many zombie processes spawn in the system.
When I close the bblfsh container, they go away.

How can you reproduce it?

I work with bblfsh on the science-3 machine.
It will be easier to reproduce it there (I can show you how, if needed), but in general I do something like this:
run bblfsh in the usual way:
docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server

run another container; in my case it is:
docker run --rm -it -v /storage:/storage --device /dev/nvidiactl --device /dev/nvidia-uvm -v /data:/data -expose=9432 --privileged -e "LD_PRELOAD=" srcd/science bash

Maybe it is reproducible like this on any system.

then

pip3 install git+https://github.com/bblfsh/client-python
pip3 install git+https://github.com/src-d/ast2vec@develop

and then run attached script.
entry_point.py.zip

You can check the number of zombies with ps aux | grep Z | wc -l before and after.
In my case it went from 4 to 15.

Priority:

Medium

I need to get UASTs for many repos in the near future, so this is quite important.

Other info you may need:

I tried to reproduce the same result under macOS and failed.

"proto: no encoder" log messages

When I start bblfsh and parse a file, I get

time="2017-09-27T14:53:52Z" level=info msg="binding to 0.0.0.0:9432"
time="2017-09-27T14:53:52Z" level=info msg="initializing runtime at /tmp/bblfsh-runtime"
time="2017-09-27T14:53:52Z" level=info msg="setting maximum size for sending and receiving messages to 104857600"
time="2017-09-27T14:53:52Z" level=info msg="starting gRPC server"
time="2017-09-27T14:54:17Z" level=info msg="container started bblfsh/java-driver:latest (01BV1X9KXP0HJBG4WYW2MNZZZB)"
time="2017-09-27T14:54:17Z" level=info msg="parsing FileUtils.java (2248 bytes)"
proto: no encoder for Filename string [GetProperties]
proto: no encoder for Language string [GetProperties]
proto: no encoder for Content string [GetProperties]
proto: no encoder for Encoding protocol.Encoding [GetProperties]

I guess these "proto" messages signal something, even though the files are successfully parsed.

Cannot install via go get due to github.com/sirupsen/logrus namespace collision

This is related to some internal renaming. See sirupsen/logrus#570 (comment)

ubuntu@ubuntu16:~$ go version
go version go1.8.3 linux/amd64
ubuntu@ubuntu16:~$ echo $GOPATH
/home/ubuntu/go
ubuntu@ubuntu16:~$ go get -v github.com/bblfsh/server
can't load package: package github.com/bblfsh/server: case-insensitive import collision: "github.com/Sirupsen/logrus" and "github.com/sirupsen/logrus"

Apparently logrus should be imported using github.com/sirupsen/logrus

Strange behaviour when launching the bblfsh server

When you launch the bblfsh server and send requests to extract UASTs before it has fetched all the Docker images, the server returns empty responses.

How to reproduce:

  1. launch the server:
    docker run --privileged -p 9432:9432 --name bblfsh -e BBLFSH_MAX_INSTANCES_PER_DRIVER=1 bblfsh/server
  2. in a parallel console, start sending requests:
    LC_ALL=en_GB.UTF-8 python3 -m ast2vec repo2coocc --bblfsh 172.17.0.1:9432 -r https://github.com/some/repo --linguist ./enry -o test_uast_one/repo.asdf
    then you can check the response:
    python3 -m ast2vec dump test_uast_one/repo.asdf
    and the result is:
{'created_at': datetime.datetime(2017, 6, 20, 14, 29, 7, 781788),
 'dependencies': [],
 'model': 'co-occurrences',
 'uuid': 'f582e9a1-bcdb-4acd-9979-aa0ecf5d1f4f',
 'version': [1, 0, 0]}
Number of words: 0
First 10 words: []
Matrix info: number of non zero elements 0 , shape: [0, 0]

but if you wait and send the request after all the images have been loaded:

{'created_at': datetime.datetime(2017, 6, 20, 14, 31, 13, 988607),
 'dependencies': [],
 'model': 'co-occurrences',
 'uuid': '6da8f27b-7769-447b-93eb-afdfc4adfa5e',
 'version': [1, 0, 0]}
Number of words: 74
First 10 words: ['print', 'input', 'file', 'output', 'reader', 'writer', 'header', 'line', 'csv', 'sys']
Matrix info: number of non zero elements 2435 , shape: [74, 74]

Server build fails

Server version: 54c71fc
SDK version: bblfsh/sdk@df3e0da

Build log:

docker build -f Dockerfile.build -t bblfsh-server-build .
Sending build context to Docker daemon 112.9 MB
Step 1 : FROM golang:1.8-alpine
 ---> e7baf3b1a3a5
Step 2 : RUN apk add --no-cache git make musl-dev musl-utils gcc lvm2-dev btrfs-progs-dev
 ---> Using cache
 ---> fd6daebf7136
Step 3 : ENV GOPATH /go
 ---> Using cache
 ---> 48d0a86e3f57
Step 4 : WORKDIR /go/src/github.com/bblfsh/server
 ---> Using cache
 ---> 4f7d3d5252c2
Successfully built 4f7d3d5252c2
docker run --rm -v /home/dennwc/Go:/go bblfsh-server-build make build-internal
mkdir -p /go/src/github.com/bblfsh/server/build; \
for cmd in bblfsh; do \
        cd /go/src/github.com/bblfsh/server/cmd/${cmd}; \
        go build --ldflags '-X main.version=master -X main.build=06-15-2017_19_51_11' -o /go/src/github.com/bblfsh/server/build/${cmd} .; \
done;
# github.com/bblfsh/server/cmd/bblfsh
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/tmp/go-link-688671004/000001.o: In function `vsnprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:77: undefined reference to `__vsnprintf_chk'
/tmp/go-link-688671004/000001.o: In function `child_func':
/home/dennwc/Go/src/github.com/bblfsh/server/vendor/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:212: undefined reference to `__longjmp_chk'
/tmp/go-link-688671004/000001.o: In function `fprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/tmp/go-link-688671004/000001.o:/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: more undefined references to `__fprintf_chk' follow
/tmp/go-link-688671004/000001.o: In function `snprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64: undefined reference to `__snprintf_chk'
/tmp/go-link-688671004/000001.o: In function `fprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/tmp/go-link-688671004/000001.o:/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: more undefined references to `__fprintf_chk' follow
collect2: error: ld returned 1 exit status

make: *** [Makefile:118: build-internal] Error 2
Makefile:115: recipe for target 'build' failed
make: *** [build] Error 2

add driver client

Add a driver client to be used by the server.

It should implement, at least:

ParseUAST(req *protocol.ParseUASTRequest) (*protocol.ParseUASTResponse, error)

The rest of it should be something along the lines of:

type Driver struct{}
func ExecDriver(r *runtime.Runtime, desc *DriverDescriptor) (*Driver, error)
func (d *Driver) ParseUAST(req *protocol.ParseUASTRequest) (*protocol.ParseUASTResponse, error)
func (d *Driver) Close() error

add driver client pool

Create a driver client pool that controls the execution of a pool of drivers with the same image and uses it to serve concurrent requests.
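A rough sketch of what such a pool could look like, in the same pseudocode style as the previous issue and reusing its hypothetical Driver, ExecDriver, runtime and protocol types (illustrative only, not the final API):

// Pool serves concurrent ParseUAST requests using a fixed set of drivers
// that all run the same image.
type Pool struct {
	drivers chan *Driver
}

// NewPool starts n drivers for the given image descriptor.
func NewPool(r *runtime.Runtime, desc *DriverDescriptor, n int) (*Pool, error) {
	p := &Pool{drivers: make(chan *Driver, n)}
	for i := 0; i < n; i++ {
		d, err := ExecDriver(r, desc)
		if err != nil {
			return nil, err
		}
		p.drivers <- d
	}
	return p, nil
}

// ParseUAST borrows an idle driver, forwards the request, and returns
// the driver to the pool when done.
func (p *Pool) ParseUAST(req *protocol.ParseUASTRequest) (*protocol.ParseUASTResponse, error) {
	d := <-p.drivers
	defer func() { p.drivers <- d }()
	return d.ParseUAST(req)
}

// Close stops every driver; it assumes no requests are still in flight.
func (p *Pool) Close() error {
	close(p.drivers)
	for d := range p.drivers {
		if err := d.Close(); err != nil {
			return err
		}
	}
	return nil
}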

Question about performance

Are there any plans to measure how fast bblfsh and the drivers work, and to improve their performance?
I wanted to use bblfsh in my project, but it is very slow.

Currently, the difference between bblfsh and a native parser is huge.
Example of parsing itsdangerous.py 500 times:

native parser:

import ast
from datetime import datetime

st = datetime.now()

for _ in range(500):
    tree = ast.parse(open('itsdangerous.py').read())
    ast.dump(tree)

print(datetime.now() - st)

using bblfsh:

package main

import (
	"fmt"
	"io/ioutil"
	"time"

	"gopkg.in/bblfsh/client-go.v2"
)

const times = 500

func main() {
	b, err := ioutil.ReadFile("itsdangerous.py")
	if err != nil {
		panic(err)
	}
	content := string(b)

	client, err := bblfsh.NewClient("0.0.0.0:9432")
	if err != nil {
		panic(err)
	}

	st := time.Now()

	for i := 0; i < times; i++ {
		_, err = client.NewParseRequest().Language("python").Content(content).Do()
		if err != nil {
			panic(err)
		}
	}

	fmt.Println(time.Now().Sub(st))

}

Results:

$ python3 parse.py
0:00:09.097725

$ go run main.go
2m34.900917922s

9s vs 2m34s is kinda huge...

Will you be open to my help with profiling/optimization?

Concurrent queries return randomly corrupted UASTs

We use the following script to reproduce the problem:

python3 -m bblfsh  ast2vec/ast2vec/__init__.py  ast2vec/ast2vec/__main__.py  ast2vec/ast2vec/dataset.py  ast2vec/ast2vec/df.py  ast2vec/ast2vec/enry.py  ast2vec/ast2vec/id2vec.py  ast2vec/ast2vec/id_embedding.py  ast2vec/ast2vec/repo2base.py  ast2vec/ast2vec/repo2coocc.py  ast2vec/ast2vec/repo2nbow.py  ast2vec/ast2vec/swivel.py  ast2vec/ast2vec/utils/__init__.py  ast2vec/ast2vec/utils/ast2vec.py  ast2vec/ast2vec/utils/gogo_pb2.py  ast2vec/ast2vec/utils/gogo_pb2_grpc.py  ast2vec/ast2vec/utils/sdk_protocol_pb2.py  ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py  ast2vec/ast2vec/utils/uast_pb2.py  ast2vec/ast2vec/utils/uast_pb2_grpc.py  ast2vec/data/test.py  ast2vec/src/bblfsh/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/__main__.py  ast2vec/src/bblfsh/bblfsh/client.py  ast2vec/src/bblfsh/bblfsh/github/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py  ast2vec/src/bblfsh/bblfsh/launcher.py  ast2vec/src/bblfsh/bblfsh/test.py  ast2vec/src/bblfsh/setup.py > 1.txt &
python3 -m bblfsh  ast2vec/ast2vec/__init__.py  ast2vec/ast2vec/__main__.py  ast2vec/ast2vec/dataset.py  ast2vec/ast2vec/df.py  ast2vec/ast2vec/enry.py  ast2vec/ast2vec/id2vec.py  ast2vec/ast2vec/id_embedding.py  ast2vec/ast2vec/repo2base.py  ast2vec/ast2vec/repo2coocc.py  ast2vec/ast2vec/repo2nbow.py  ast2vec/ast2vec/swivel.py  ast2vec/ast2vec/utils/__init__.py  ast2vec/ast2vec/utils/ast2vec.py  ast2vec/ast2vec/utils/gogo_pb2.py  ast2vec/ast2vec/utils/gogo_pb2_grpc.py  ast2vec/ast2vec/utils/sdk_protocol_pb2.py  ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py  ast2vec/ast2vec/utils/uast_pb2.py  ast2vec/ast2vec/utils/uast_pb2_grpc.py  ast2vec/data/test.py  ast2vec/src/bblfsh/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/__main__.py  ast2vec/src/bblfsh/bblfsh/client.py  ast2vec/src/bblfsh/bblfsh/github/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py  ast2vec/src/bblfsh/bblfsh/launcher.py  ast2vec/src/bblfsh/bblfsh/test.py  ast2vec/src/bblfsh/setup.py > 2.txt &
python3 -m bblfsh  ast2vec/ast2vec/__init__.py  ast2vec/ast2vec/__main__.py  ast2vec/ast2vec/dataset.py  ast2vec/ast2vec/df.py  ast2vec/ast2vec/enry.py  ast2vec/ast2vec/id2vec.py  ast2vec/ast2vec/id_embedding.py  ast2vec/ast2vec/repo2base.py  ast2vec/ast2vec/repo2coocc.py  ast2vec/ast2vec/repo2nbow.py  ast2vec/ast2vec/swivel.py  ast2vec/ast2vec/utils/__init__.py  ast2vec/ast2vec/utils/ast2vec.py  ast2vec/ast2vec/utils/gogo_pb2.py  ast2vec/ast2vec/utils/gogo_pb2_grpc.py  ast2vec/ast2vec/utils/sdk_protocol_pb2.py  ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py  ast2vec/ast2vec/utils/uast_pb2.py  ast2vec/ast2vec/utils/uast_pb2_grpc.py  ast2vec/data/test.py  ast2vec/src/bblfsh/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/__main__.py  ast2vec/src/bblfsh/bblfsh/client.py  ast2vec/src/bblfsh/bblfsh/github/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py  ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py  ast2vec/src/bblfsh/bblfsh/launcher.py  ast2vec/src/bblfsh/bblfsh/test.py  ast2vec/src/bblfsh/setup.py > 3.txt

Please note that you need the patch to client-python: https://github.com/vmarkovtsev/bblfsh.client-python/tree/main-change

Result: 1.txt, 2.txt and 3.txt are randomly different (checked with diff). Besides, some files end up in the wrong places or are empty.

Growing number of driver processes

While getting UASTs and filtering for identifiers in the Python files of a single project using Engine, after 30 minutes I can see 350+ driver processes inside the bblfshd container.

Logs in detail:

ps

root      2169  0.0  0.0  18188  3188 pts/0    Ss   08:43   0:00 bash
root      2177  0.0  0.0      0     0 ?        Z    08:43   0:00 [runc:[1:CHILD]] <defunct>
root      2203  0.0  0.0      0     0 ?        Z    08:43   0:00 [runc:[1:CHILD]] <defunct>
root      2228  0.0  0.0      0     0 ?        Z    08:43   0:00 [runc:[1:CHILD]] <defunct>
root      2249  0.0  0.0      0     0 ?        Z    08:43   0:00 [runc:[1:CHILD]] <defunct>
root      2269  0.0  0.0      0     0 ?        Z    08:44   0:00 [runc:[1:CHILD]] <defunct>
root      2473  0.0  0.0      0     0 ?        Z    08:44   0:00 [runc:[1:CHILD]] <defunct>
root      2561  0.0  0.0      0     0 ?        Z    08:44   0:00 [runc:[1:CHILD]] <defunct>
root      2562 28.4  0.7  36036 28552 ?        Ssl  08:44   0:01 /opt/driver/bin/driver --log-level info --log-format text -
root      2572 30.7  0.6  81092 27848 ?        S    08:44   0:02 /usr/bin/python3.6 /usr/bin/python_driver

Container log

time="2017-11-16T08:48:35Z" level=info msg="python-driver version: dev-1908ca8 (build: 2017-11-14T11:31:28Z)" id=01bz207xxmc18dppgxwgywr5zs language=python
time="2017-11-16T08:48:35Z" level=info msg="server listening in /tmp/rpc.sock (unix)" id=01bz207xxmc18dppgxwgywr5zs language=python
time="2017-11-16T08:48:36Z" level=info msg="new driver instance started bblfsh/python-driver:latest (01bz207xxmc18dppgxwgywr5zs)"
time="2017-11-16T08:49:07Z" level=info msg="python-driver version: dev-1908ca8 (build: 2017-11-14T11:31:28Z)" id=01bz208xey432evff0pga9dnxr language=python
time="2017-11-16T08:49:07Z" level=info msg="server listening in /tmp/rpc.sock (unix)" id=01bz208xey432evff0pga9dnxr language=python
time="2017-11-16T12:24:51Z" level=error msg="error re-scaling pool: container is not destroyed" language=python

Apache Spark thread dump:

org.bblfsh.client.BblfshClient.filter(BblfshClient.scala:33)
tech.sourced.engine.udf.QueryXPathUDF$$anonfun$queryXPath$2.apply(QueryXPathUDF.scala:45)
tech.sourced.engine.udf.QueryXPathUDF$$anonfun$queryXPath$2.apply(QueryXPathUDF.scala:44)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
tech.sourced.engine.udf.QueryXPathUDF$.queryXPath(QueryXPathUDF.scala:44)

Steps to reproduce, using 30 concurrent clients:

// get Borges from https://github.com/src-d/borges/releases/tag/v0.8.3
echo -e "https://github.com/src-d/borges.git\nhttps://github.com/erizocosmico/borges.git\nhttps://github.com/jelmer/dulwich.git" > repos.txt
borges pack --loglevel=debug --workers=2 --to=./repos -f repos.txt

// get Apache Spark https://github.com/src-d/engine#quick-start
$SPARK_HOME/bin/spark-shell --driver-memory=4g --packages "tech.sourced:engine:0.1.7"

and then run :paste, paste the code below, and hit Ctrl+D:

import tech.sourced.engine._

val engine = Engine(spark, "repos")
val repos = engine.getRepositories
val refs = repos.getHEAD.withColumnRenamed("hash","commit_hash")

val langs = refs.getFiles.classifyLanguages
val pyTokens = langs
  .where('lang === "Python")
  .extractUASTs.queryUAST("//*[@roleIdentifier]", "uast", "result")
  .extractTokens("result", "tokens")

val tokensToWrite = pyTokens
  .join(refs, "commit_hash")
  .select('repository_id, 'name, 'commit_hash, 'file_hash, 'path, 'lang, 'tokens)

spark.conf.set("spark.sql.shuffle.partitions", "30") //instead of default 200
tokensToWrite.show

Then, if you exec into the bblfshd container, you can see the number of driver processes growing:

apt-get update && apt-get install -y procps
ps aux | wc -l

add a mechanism to select specific images for language drivers

Currently we always use bblfsh/<lang>-driver:latest, but we should be able to select specific images for some languages with environment variables. That would be particularly useful for testing.

I would say something like JAVA_DRIVER_IMAGE=myimage:foo or JAVA_DRIVER_IMAGE=docker-daemon:myimage:foo, but this is open to discussion.
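For illustration only, parsing a language=image override list of the shape used by BBLFSH_DRIVER_IMAGES in the next issue (semicolon-separated pairs) could be as simple as this Go sketch; the helper name is hypothetical:

package main

import (
	"fmt"
	"os"
	"strings"
)

// driverImageOverrides parses a value like
// "python=docker-daemon:bblfsh/python-driver:dev-4dd607b;java=myimage:foo"
// into a language -> image map (hypothetical helper, for illustration only).
func driverImageOverrides(value string) map[string]string {
	overrides := make(map[string]string)
	for _, pair := range strings.Split(value, ";") {
		// Split on the first '=' only, since image references contain ':'
		// and may not contain '=' themselves.
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 {
			continue
		}
		overrides[parts[0]] = parts[1]
	}
	return overrides
}

func main() {
	fmt.Println(driverImageOverrides(os.Getenv("BBLFSH_DRIVER_IMAGES")))
}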

Driver image override inside docker

Right now, if one tries to override the default driver images while running the server in a Docker container:

BBLFSH_DRIVER_IMAGES="python=docker-daemon:bblfsh/python-driver:dev-4dd607b;java=docker-daemon:bblfsh/java-driver:dev-45a5e8f" docker run -e BBLFSH_DRIVER_IMAGES --privileged -p 9432:9432 --name bblfsh bblfsh/server:dev-45a5e8f

time="2017-07-17T09:28:02Z" level=debug msg="binding to 0.0.0.0:9432"
time="2017-07-17T09:28:02Z" level=debug msg="initializing runtime at /tmp/bblfsh-runtime"
time="2017-07-17T09:28:02Z" level=debug msg="Overriding image for python: docker-daemon:bblfsh/python-driver:dev-4dd607b"
time="2017-07-17T09:28:02Z" level=debug msg="Overriding image for java: docker-daemon:bblfsh/java-driver:dev-45a5e8f"
time="2017-07-17T09:28:02Z" level=debug msg="starting server"
time="2017-07-17T09:28:02Z" level=debug msg="registering gRPC service"
time="2017-07-17T09:28:02Z" level=info msg="starting gRPC server"

The client will fail with

error getting driver: missing driver for language python: runtime failure: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

error getting driver: missing driver for language java: runtime failure: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]

There are no attempts to fetch a driver in the server logs.

Request hangs on the first ParseUAST call after a previous request with a FATAL error

Summary

Doing ParseUAST requests in a loop for several files will hang on ParseUAST() if the previous request returned a FATAL error, even after reconnecting.

Expected result

If we handle the error, subsequent ParseUAST requests for different files should continue, perhaps after reconnecting to the server.

Obtained result

The server log shows the source code of the next file, but the client process hangs on the next ParseUAST request.

/usr/lib/python3.6/__future__.py
/usr/lib/python3.6/__phello__.foo.py
/usr/lib/python3.6/_bootlocale.py
/usr/lib/python3.6/_collections_abc.py
/usr/lib/python3.6/_compat_pickle.py
/usr/lib/python3.6/_compression.py
/usr/lib/python3.6/_dummy_thread.py
/usr/lib/python3.6/_markupbase.py
/usr/lib/python3.6/_osx_support.py
/usr/lib/python3.6/_pydecimal.py
FATAL error with the file [ /usr/lib/python3.6/_pydecimal.py ]: [buffer size exceeded]
/usr/lib/python3.6/_pyio.py
# HANGS HERE

Reproducible: yes

How to reproduce:

  1. Download this .go file, which reproduces the problem by doing server requests for the whole Python standard library (change the path of the Python stdlib directory to adapt it to your system; the version doesn't matter): https://gist.github.com/juanjux/f0ae6ed109522265a8bead8e2522d252

     If no files crash (because SDK PR #127 has been merged), copy this file to the directory with the others (it's intentionally so huge that it'll crash even with the 4MB buffer): https://gist.github.com/juanjux/b6a51cd934368c142e6ea235811d1f33

  2. Start the server: docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server

  3. go run py2uast2pb.go

coala Collaboration

Hi,

I was just made aware of this project and it sounds super interesting. First: I'm the founder of coala.io, and we have been thinking about doing some universal AST as well but never got around to doing it properly. It's great to see that there's a dedicated project and effort around this concept!

coala is a Python-based open source code analysis framework with a dependency mechanism. The main idea here is that people can write a module that generates an AST (preferably something like your UAST) and other modules that consume it; coala handles parallelization, caching, user interaction, etc.

Given that, it'd be super cool if we could collaborate to some degree. I see you already have a Python client, and I still need to read up on this stuff, but I could see us providing your UAST for researchers and programmers to write code analyses, and maybe building a query language for the code analysis (maybe you already have something like this in mind).

I just read about this 5 minutes ago and these are just a few initial thoughts - what do you think?

Cheers!

Error reporting on ParseUAST to avoid `panic: runtime error`

Right now, resp, err := client.ParseUAST(context.TODO(), req) may return with err being nil even though the actual parsing has failed: resp.Errors is populated and resp.Status is fatal.

resp may be nil (e.g. if the server is not running), which is absolutely fine.

This is quite un-obvious behavior (at least for Go) that should either be fixed or at least be documented everywhere, including https://doc.bblf.sh/user/server-grpc-example.html#full-source-of-the-example

Otherwise API users will :finnadie: on random nil-pointer panics during further response.UAST manipulations (which do not propagate any error message).

Here is an example of a client that works around the current behavior, which boils down to:

 resp, err := client.ParseUAST(context.TODO(), req)
 if err != nil {
 	fmt.Printf("Error - ParseUAST failed, reposne is nil, error:%v for %s\n", err, f.Name)
 } else if resp == nil {
 	fmt.Printf("No error, but - ParseUAST failed, response is nil\n")
 } else if (len(resp.Errors) != 0) || (resp.Status != protocol.Ok) {
 	fmt.Printf("No error, but - ParseUAST status:%s, error:%v for %s\n", resp.Status, resp.Errors, f.Name)
 }

Package managers manifest parser / Question

Hi,

Hope you are all well !

Goal: generate a source code summary by parsing source code and package manager manifests.

I was wondering if the project would be suitable for parsing CMakeLists.txt or other dependency manager manifests (NPM, Maven, ...) in order to get a more complete overview of any project, and what the best approach to doing that would be.

Example of use case:

Thanks in advance for any insights or point of view.

Cheers,

[feature request] add filepath to debug logging

I use bblfsh and often check what is going on via

docker logs -f bblfshd

and it gives messages like:

time="2017-11-24T11:38:43Z" level=warning msg="request processed content 4047 bytes, status Error" elapsed=103.048661ms language=java

If the status is Error, I'd really like to know the file path on which it failed.

Is it possible to add this?
It could be added as a permanent field, or only in case of an error.

Server can't handle very big files because of gRPC limits: StatusCode.RESOURCE_EXHAUSTED, grpc: trying to send message larger than max

Code & repo to reproduce

# repo to reproduce
git clone https://github.com/svn2github/icu4j/ --depth 1
# bblfsh client call to reproduce error
root@science-3:~/tmp# PYTHONPATH=/media/root/storage/egor/ast2vec_rsync:/media/root/storage/egor/ast2vec_rsync/src/bblfsh GRPC_ARG_MAX_SEND_MESSAGE_LENGTH=-1 python3 -m bblfsh --disable-bblfsh-autorun -e 172.17.0.1:9432 -f icu4j/main/tests/core/src/com/ibm/icu/dev/test/bigdec/DiagBigDecimalTest.java
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 34, in <module>
    sys.exit(main())
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 30, in main
    print(client.parse_uast(args.file, args.language))
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
    response = self._stub.ParseUAST(request, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.RESOURCE_EXHAUSTED, grpc: trying to send message larger than max (4315530 vs. 4194304))>

Returned value when a language can't be detected

This is the code used in language.go:

func GetLanguage(filename string, content []byte) string {
	lang := enry.GetLanguage(filename, content)
	if lang == "" {
		lang = enry.OtherLanguage
	}

	lang = strings.ToLower(lang)
	lang = strings.Replace(lang, " ", "-", -1)
	lang = strings.Replace(lang, "+", "p", -1)
	lang = strings.Replace(lang, "#", "sharp", -1)
	return lang
}

but since enry.OtherLanguage was changed to be the string zero value "", the if statement reassigns the same value to lang.

What is the expected value this function should return when a language can't be detected?
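One possible fix is to return an explicit non-empty sentinel; this is only a sketch of that option (the "other" value is an assumption, not a decided answer, and the enry and strings imports are as in the original file):

// otherLanguage is an explicit, non-empty sentinel for undetected languages,
// so callers never have to special-case an empty string (assumed value).
const otherLanguage = "other"

func GetLanguage(filename string, content []byte) string {
	lang := enry.GetLanguage(filename, content)
	if lang == "" {
		return otherLanguage
	}

	lang = strings.ToLower(lang)
	lang = strings.Replace(lang, " ", "-", -1)
	lang = strings.Replace(lang, "+", "p", -1)
	lang = strings.Replace(lang, "#", "sharp", -1)
	return lang
}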

corrupted response from server

reproducible for https://github.com/spring/svn-spring-archive
specific file: https://github.com/spring/svn-spring-archive/blob/master/Lobby/TASServer/TASServer.java

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 129, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
^[[1;36mINFO^[[0m:repos2coocc:^[[0mhttps://github.com/shaunduncan/helga.git pending tasks: 43^[[0m
^[[1;36mINFO^[[0m:repos2coocc:^[[0mhttps://github.com/ups-nlp/nlp425.git pending tasks: 17^[[0m
^[[1;31mERROR^[[0m:repos2coocc:^[[0mError while processing ('/tmp/repo2nbow-6lu1go2q/TASServer/TASServer.java', 'Java').^[[0m
Traceback (most recent call last):
  File "/media/root/storage/egor/ast2vec_rsync/ast2vec/repo2base.py", line 97, in thread_loop
    filename, language=language, timeout=self._timeout)
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
    response = self._stub.ParseUAST(request, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>

code to reproduce:

python3 -m bblfsh -e 0.0.0.0:9432 -f TASServer.java --disable-bblfsh-autorun
ERROR:root:Exception deserializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 129, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 34, in <module>
    sys.exit(main())
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 30, in main
    print(client.parse_uast(args.file, args.language))
  File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
    response = self._stub.ParseUAST(request, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>

Set explicit versions to dependencies in glide.yaml

We should avoid having versions unset, since glide's cache does weird things in that case.
A sane approach may be:

  • latest stable version for all external dependencies and those internal dependencies with stable versions.
  • latest commit in master for internal dependencies without stable versions.

add basic gRPC server

Add a basic Server that takes a runtime path and uses it to execute drivers with default names (bblfsh/<language>-driver) given gRPC requests.
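For illustration, the default-name convention described here boils down to something like this hypothetical helper (assuming the fmt import):

// defaultDriverImage maps a language to its default driver image name
// (hypothetical helper, for illustration only).
func defaultDriverImage(language string) string {
	return fmt.Sprintf("bblfsh/%s-driver", language)
}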

README build instructions don't work

After correctly setting GOPATH, I still had issues:

package _/Users/rporres/git/bblfsh-server: unrecognized import path "_/Users/rporres/git/bblfsh-server" (import path does not begin with hostname)

This is how it worked for me (thx to @smola)

mkdir -p $GOPATH/src/github.com/bblfsh
cd $GOPATH/src/github.com/bblfsh
git clone https://github.com/bblfsh/server.git
cd server
make dependencies
make build

Tested on Ubuntu and Mac OS X.

support running tests in Docker with make

Running the Babelfish server currently requires root. This makes running tests a pain, so the Makefile should be modified to run the tests in Docker, and travis.yml should be updated if needed.

Server hangs after 10 minutes

I have a strange problem, and it is hard to give you a short example of how to reproduce the bug.
I would really like to have one, but I cannot find it.

The problem is that the bblfsh server hangs after several minutes of work on science-3. The easiest way to reproduce it on science-3 is:

docker run --rm --privileged -d -p 9434:9432 --name bblfsh_test bblfsh/server:v0.7.0
docker run --rm -it -v /storage:/storage --name bblfsh_hang_client -v /data:/data -e "LD_PRELOAD=" srcd/science bash
# next in bblfsh_hang_client container
cd /storage/konstantin
./setup_docker.sh
export PYTHONPATH='./modelforge:./ast2vec:./snippet_ranger'
rm -rf ./data/sources/matplotlib # I keep my data in matplotlib_old no worries :)
python3 ./entry_pnt.py

It will convert repos to asdf model files using ast2vec (latest develop version, src-d/ml@973707e). You need to wait 5-20 minutes and you will see something like this in the bblfsh logs:


time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_pbs.py (9073 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_ssh.py (4199 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing test.py (254 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_client_tk.spec (5756 bytes)"
time="2017-09-13T16:50:47Z" level=info msg="container started bblfsh/python-driver:latest (01BSY2CV6XY2RJX079VC9PQQKD)"
time="2017-09-13T16:50:47Z" level=error msg="driver bblfsh/python-driver:latest (01BSY2C400G44J3ZW3AV2YHZ2C) stderr: ERROR:root:Filepath: , Errors: ['Traceback (most recent call last):\\n  File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 173, in process_request\\n    self._send_response(response)\\n  File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 220, in _send_response\\n    self.outbuffer.write(json.dumps(response, ensure_ascii=False))\\n  File \"/usr/lib/python3.6/json/__init__.py\", line 238, in dumps\\n    **kw).encode(obj)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 199, in encode\\n    chunks = self.iterencode(o, _one_shot=True)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 257, in iterencode\\n    return _iterencode(o, 0)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 180, in default\\n    o.__class__.__name__)\\nTypeError: Object of type \\'complex\\' is not JSON serializable\\n']"

(Sometimes it just hangs without an error.)

and in the bblfsh_hang_client container:

WARNING:source_transformer:Failed to construct model for /storage/konstantin/data/repos/matplotlib/fish2000@imread: itemsize cannot be zero in type

This is because bblfsh hangs.
But if you restart the last command, python3 ./entry_pnt.py, it will produce the same warnings and nothing in the bblfsh logs. It may be related to gRPC problems, but I am not 100% sure.

P.S.: @fineguy and I keep trying to find a simple example without ast2vec usage, something like this: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-bblfsh_hang-py, but it seems to work OK, at least for a short time. You can also find ./entry_pnt.py in the same gist, just in case: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-entry_pnt-py

`--transport=docker-daemon` doesn't work when running the server from Docker

If I run the server like:

docker run --privileged -p 9432:9432 --rm --name bblfsh bblfsh/server --transport=docker-daemon

And then make a client request, I get this error:

python -m bblfsh --disable-bblfsh-autorun  -f test.py

status: FATAL
errors: "error getting driver: missing driver for language python: runtime failure: 
Error loading image from docker engine: Cannot connect to the Docker daemon at 
unix:///var/run/docker.sock. Is the docker daemon running?"

But it works perfectly if I run the server directly without Docker:

sudo ./bblfsh server --transport=docker-daemon

Reduce default verbosity of CLI logs

Processing thousands of repos results in 100+ MB of server logs, mostly due to the UASTs printed to stdout.
This can be avoided by not printing UASTs to stdout.

Maybe it could be hidden behind -v or -vv or some deeper log level, as it is very useful for lower-level debugging.

When the bblfshd binary runs inside a container, it crashes

When a bblfshd binary runs inside a container, it crashes.

Is this expected behavior?

How to reproduce:

I used the following Dockerfile to define a container with the bblfshd binary inside:

FROM ubuntu:16.04

WORKDIR bblfsh

RUN apt-get update && \
    apt-get install --assume-yes wget vim

ENV BBLFSH_VERSION 2.2.0
RUN wget "https://github.com/bblfsh/bblfshd/releases/download/v${BBLFSH_VERSION}/bblfshd_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    wget "https://github.com/bblfsh/bblfshd/releases/download/v${BBLFSH_VERSION}/bblfshctl_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    tar -xf "bblfshd_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    tar -xf "bblfshctl_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    mv */bblfsh* /usr/local/bin/ && \
    rm -rf bblfsh*

RUN apt-get install --assume-yes software-properties-common && \
    add-apt-repository --yes ppa:alexlarsson/flatpak && \
    apt-get update && \
    apt-get install --assume-yes libostree-1-1 tzdata

ENTRYPOINT bblfshd -log-level debug

I built the image and entered the container:
docker build --rm --tag bblfsh-image .
docker run --detach --interactive --tty --rm --name bblfsh-container bblfsh-image
docker exec --interactive --tty bblfsh-container bash

and from inside it I installed the drivers and tried to parse a Python file:

bblfshctl driver install python bblfsh/python-driver:latest;
echo "import something" > example.py
bblfshctl parse example.py

You get:

Installing python driver language from "bblfsh/python-driver:latest"... Done
Status: Fatal
Elapsed: 10.96427ms
Errors:
	- unexpected error: container_linux.go:265: starting container process caused "process_linux.go:250: running exec setns process for init caused \"exit status 34\""
[2017-11-17T21:42:48Z]  INFO bblfshd version: v2.2.0 (build: 2017-11-14T09:15:01+0000)
[2017-11-17T21:42:48Z]  INFO initializing runtime at /var/lib/bblfshd
[2017-11-17T21:42:48Z]  INFO server listening in 0.0.0.0:9432 (tcp)
[2017-11-17T21:42:48Z] DEBUG registering grpc service
[2017-11-17T21:42:48Z]  INFO control server listening in /var/run/bblfshctl.sock (unix)
...
[2017-11-17T21:43:23Z]  INFO driver python installed "bblfsh/python-driver:latest"
[2017-11-17T21:43:23Z] DEBUG detected language "python", filename "example.py"
[2017-11-17T21:43:23Z] DEBUG spawning driver instance "bblfsh/python-driver:latest" ...
nsenter: failed to unshare namespaces: Operation not permitted
WARN[0034] os: process already finished                 
[2017-11-17T21:43:23Z] ERROR error selecting pool: unexpected error: container_linux.go:265: starting container process caused "process_linux.go:250: running exec setns process for init caused \"exit status 34\""
[2017-11-17T21:43:23Z] ERROR request processed content 17 bytes, status Fatal elapsed=48.104549ms language=

QUALIFIED_IDENTIFIER is not SIMPLE_IDENTIFIER and duplication of CALL_CALLEE role

I found that QUALIFIED_IDENTIFIER is not SIMPLE_IDENTIFIER, but @vmarkovtsev says that it is supposed to be.
I also found a duplication of the CALL_CALLEE role.

How to reproduce:

from bblfsh.client import BblfshClient

filepath = "./matplotlib_example.py"
bc = BblfshClient("0.0.0.0:9432")
res = bc.parse(filepath, language='Python')
print(res)

matplotlib_example.py:

from matplotlib import pyplot as plt
plt.figure()

Output (lines 76-83):

       token: "figure"
        start_position {
          line: 2
          col: 1
        }
        roles: CALL_CALLEE
        roles: CALL_CALLEE
        roles: QUALIFIED_IDENTIFIER

The problem is with the figure token. It means that we do not take function names into account during our machine learning analysis.

Client inside a container doesn't work as expected

According to the documentation, this should be enough to make both the server and the client work with Docker:

$ docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
$ docker run -v $(pwd):/work --link bblfsh bblfsh/server bblfsh client --address=bblfsh:9432 /work/sample.py

The second command generates an error instead:

$ docker run -v $(pwd):/work --link bblfsh bblfsh/server bblfsh client --address=bblfsh:9432 /work/sample.py
time="2017-09-07T10:03:07Z" level=info msg="binding to bblfsh:9432" 
time="2017-09-07T10:03:07Z" level=info msg="initializing runtime at /tmp/bblfsh-runtime" 
listen tcp 172.17.0.2:9432: bind: cannot assign requested address
time="2017-09-07T10:03:07Z" level=error msg="exiting with error: listen tcp 172.17.0.2:9432: bind: cannot assign requested address" 

Separating the linking and the client execution parts seems to work though, so there may be some race condition there.

Reported by @dpordomingo

Server can't be launched as explained in the bblfsh documentation

egor@science-3 ~/bblfsh-dev-image $docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
Unable to find image 'bblfsh/server:latest' locally
latest: Pulling from bblfsh/server
88286f41530e: Pull complete 
878f656258fa: Pull complete 
94595a4777da: Pull complete 
Digest: sha256:628b09f1a669a851abfecf9231ecfbaa07cda13a5fb34f8c1eb6d71dd1dbc6bc
Status: Downloaded newer image for bblfsh/server:latest
time="2017-07-05T16:47:33Z" level=debug msg="binding to 0.0.0.0:9432" 
time="2017-07-05T16:47:33Z" level=debug msg="initializing runtime at /tmp/bblfsh-runtime" 
invalid image driver format 
time="2017-07-05T16:47:33Z" level=error msg="exiting with error: invalid image driver format "

@abeaumont gave a workaround for now:

export BBLFSH_DRIVER_IMAGES="a=a"; docker run -e BBLFSH_DRIVER_IMAGES --privileged -p 9432:9432 --name bblfsh bblfsh/server

but it should be fixed, or the documentation should be changed.

Log and raise gRPC message size

It's not a problem for gRPC to allow bigger message sizes; it was 100 MB by default before 1.0.0.

In the src-d/berserker use case it is not uncommon to have UASTs of tens of MB, so maybe we should try a 100 MB default instead of 4 MB?

It would also be nice to always log this parameter at runtime, for debugging purposes.
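For reference, raising both limits on a Go gRPC server is a matter of two server options from google.golang.org/grpc; this is only a sketch (the constructor name is hypothetical, and where bblfshd actually builds its server may differ):

import "google.golang.org/grpc"

const maxMessageSize = 100 * 1024 * 1024 // 100 MB instead of the 4 MB gRPC default

// newGRPCServer is a hypothetical constructor showing only the relevant options.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMessageSize),
		grpc.MaxSendMsgSize(maxMessageSize),
	)
}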
