Giter Site home page Giter Site logo

Language analyzer as a service about phan HOT 23 CLOSED

phan avatar phan commented on May 22, 2024
Language analyzer as a service

from phan.

Comments (23)

morria avatar morria commented on May 22, 2024

Hi @mindplay-dk.

We've been trying to find ways to make phan fast on very large code bases and daemonizing it is a thing that folks have been talking about.

In the immediate term we've dodged the need to do that by providing persistence via a sqlite database (which you can use via the '-s' flag). By doing one long run and saving state, we're able to do very fast subsequent runs on just the changed files. You can give this a try by running phan on itself;

time ./phan -s phan.sqlite src/**/*.php
touch ./src/Phan/Phan.php
time ./phan -s phan.sqlite src/**/*.php

The second run which only has to parse and analyze the one changed file should be much faster. For very large code bases you'd want to run it as

phan -s .phan/database -r `git diff origin/master --name-only`

which will re-analyze only the files that differ from origin/master.

We have a vim plugin for phan (which works within syntastic). Right now it takes about 600ms to run an analysis using the stored state in sqlite which isn't quite good enough. We're shooting for 200ms response times and I'm not entirely sure how we'll get there. Daemonizing may end up being the approach we'll need to take if we can't find some time to shave off otherwise.

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

It's great that you've been thinking about this :-)

Your work on caching in a local DB isn't wasted anyhow, as it'll speed up offline testing, e.g. when you're frequently running an entire build, so that effort is worthwhile either way :-)

But even 200 msec would feel high if attempting to use this to generate real-time feedback in an editor - it shouldn't scan or report errors for the entire codebase, just for the file you're currently editing, so the design would need to allow for refreshing and inspecting only a specified file.

The fact that changes in the file you're editing could lead to failed inspections in other files, is less interesting in terms of real-time feedback - updating inspections across the codebase can happen 600 msec later, no problem, the issue is real-time feedback for changes made to the file in front of you before it's even saved. For that, I think daemonizing is probably the only real way to go.

(of course, that's no necessarily part of your goals at all, but it would be nice, as I'm getting tired of waiting for a certain IDE to add certain, obvious missing inspections, which have been on their bug-tracker literally for years. I'd love to leave heavyweight IDEs behind, if I could get quality inspections and auto-completion from lightweight editors like Atom, Code, Brackets, etc.)

from phan.

jazzdan avatar jazzdan commented on May 22, 2024

Could we just shell out to php -l for real-time feedback in an editor?

from phan.

cweagans avatar cweagans commented on May 22, 2024

@mindplay-dk After you've indexed everything initially (a process which takes some time even in commercial IDEs), the time-to-feedback in an editor should largely depend on how quickly you can query information from the sqlite database, which should be reasonably quick (or at least quick enough to not get frustrated at your editor). If it's not quick enough, the sqlite database could be moved to a ramdisk.

from phan.

allan-simon avatar allan-simon commented on May 22, 2024

Note that for autocompletion of php, padawan has taken also the way of being runned has a service running in background https://github.com/mkusher/padawan.php

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

@allan-simon looks interesting - though the goal here appears to be auto-completion, not inspection?

@cweagans SQLite also supports volatile/in-memory databases - so one could imagine a simple service which runs the full inspections, but doesn't hit the disk for a database or reloads PHP scripts. (Benchmarking this option should be as simple as switching to a memory database, then running the full inspection in a loop...)

from phan.

allan-simon avatar allan-simon commented on May 22, 2024

@mindplay-dk , yup exactly, padawan is only meant for autocompletion , but they do type infering through inspection, my point was mainly to show this project is build on a service architecture, which could be an inspiration if Phan was to go that way too :)

from phan.

cweagans avatar cweagans commented on May 22, 2024

FWIW, I definitely want a choice as to whether or not Phan is running in the background. Ctags is really nice in that I can "index" my code, and then just do lookups in a file without having to have something hogging my RAM while I work with my other lightweight tools.

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

I definitely want a choice as to whether or not Phan is running in the background

@cweagans of course! I wasn't suggesting that running analysis should replace on-demand analysis - running analysis would be a nice option, but on-demand analysis is crucial for use in continuous integration servers, build tools, etc.

from phan.

cweagans avatar cweagans commented on May 22, 2024

@mindplay-dk So what gain do you see by having a process running in the background? As I see it, the only thing that needs to be running is a file watcher, and when a file changes, the normal one-off analysis can be run on that file. Is that accurate? Or are you envisioning something else?

from phan.

morria avatar morria commented on May 22, 2024

A few notes on this;

  • We're looking at using https://facebook.github.io/watchman/ to watch the code base and keep the sqlite database up to date.
  • Using the database to analyze a patch is taking about 600ms or so which isn't great for use within a text editor, so we're going to want to find some way to speed that up. Daemonizing to avoid sql queries may be something we need to do.

from phan.

cweagans avatar cweagans commented on May 22, 2024

Oh, watchman looks cool. I was looking at https://github.com/koala-framework/file-watcher yesterday, but Watchman seems like a much nicer solution!

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

@cweagans the idea is to bring the pace up to real-time, for integration with an IDE/editor. That's all.

@morria I haven't really examined the model yet, but is there any reason why it wouldn't be possible to keep the model in memory and patch it when files change?

from phan.

morria avatar morria commented on May 22, 2024

@mindplay-dk Keeping everything in memory and watching for file changes should work well.

As it stands now, we (not entirely correctly) flush elements defined in a file from the sqlite database whenever that file changes, but we do not flush those elements from memory. It should be fairly easy to do that part.

from phan.

rlerdorf avatar rlerdorf commented on May 22, 2024

Using sqlite::memory: in daemon mode would be an easy solution. But because of the data marshalling needed not quite as fast using a native persistent memory model.

from phan.

staabm avatar staabm commented on May 22, 2024

So you are discussing something like hh_client - the hhvm typechecker?

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

@morria if it's easy to keep and partially flush the model, that would be ideal - I'm going to bet it's a bit more work than just switching to sqlite::memory, but as @rlerdorf says, you'd still have the marshalling overhead.

Another issue that comes to mind, is resilience to incomplete/invalid code. I don't know that this is a feature you'd even want in your static analysis engine? It would need to be optional, since it doesn't make any sense to use it when analyzing a codebase for correctness.

I think there's two ways to approach this: (1) in the analysis engine, with actual support for partial/invalid code, or (2) make it the client's responsibility.

The latter is probably much simpler on the analysis side, and moves some of the complexity to the client IDE/editor, since the analysis engine would simply reject invalid/incorrect code. The client would query an entire file at a time, getting a full map of all the available analysis information (e.g.indexed by line-numbers) whenever the file is in a valid state, and would then hold and modify the information locally - it would locally query this data set, and modify the map in-memory, by re-mapping line numbers when the user inserts/deletes line-breaks; it would get an updated analysis only when the file is in a good/valid state. This is probably a fair trade-off between complexity and reliability.

In either case, the client needs to be able to inject a file for analysis, e.g. submitting changes that haven't been persisted to disk yet. This is a daemon concern only, and probably could be implemented at the daemon-level easily. An alternative, simplified approach, would be for the client to mirror (PHP files from) the actual workspace into a temporary folder, and run the analysis against that folder - saving the working file on every keystroke to re-trigger analysis...

from phan.

mindplay-dk avatar mindplay-dk commented on May 22, 2024

An alternative, simplified approach, would be for the client to mirror (PHP files from) the actual workspace into a temporary folder

For the record, this is what PHP Storm does - it aggressively saves files to a temporary folder on every keystroke, and runs tools like CS and MD against the copy.

from phan.

morria avatar morria commented on May 22, 2024

I'm hoping to take a look at this shortly. Thanks for all of the discussion, everyone.

from phan.

morria avatar morria commented on May 22, 2024

This hasn't been touched in a long time, so I'm going to close it.

Let me know if it should be re-opened.

from phan.

TysonAndre avatar TysonAndre commented on May 22, 2024

I implemented a POC of the daemon option in #563. It accepts a single file (or a small list of files). The command line interface and request format may change in the near future. Also, the combination of pcntl (forking processes) and opening tcp streams before forking may or may not be reliable on all systems. There's still some bugs to work out when classes are added/removed (E.g. switching git branches)

It ended up being similar to what was mentioned in #22 (comment). The startup time and the memory usage (when paused) should be roughly the same as the time needed for the parse phase alone.

  • To reduce latency, it uses --quick mode. It also skips steps such as plugins if a function/method/class/etc isn't the file being looked at at the moment (Not 100% correct, but close enough for many uses). It doesn't have support for getting files that depend on the analyzed files, either.

  • Each request takes 100ms for phan itself

  • Each request takes 300ms in a larger codebase. It's probably possible to bring this down a bit more with low effort, in two ways

    1. Using the inotify pecl (or a separate helper process running inotify) to tell the daemon when files were added or removed from folders phan cares about (instead of making phan recurse through all of the directories on every new request). ~100ms of that time was looking up files
    2. Adding an additional data structure mapping file names to lists of functions, lists of classes, etc.
      (If there's a lot of those methods when Phan loops through methods in the 'method' phase, calling $method->getContext()->getFile() to skip things if they weren't in a file we care about, on all of those methods, would add up to 100s of milliseconds.

from phan.

TysonAndre avatar TysonAndre commented on May 22, 2024

A draft of the documentation is on the wiki at https://github.com/etsy/phan/wiki/Using-Phan-Daemon-Mode

Daemon mode can be tried out if the master branch of phan (dev-master) is installed.

  • The service requires linux/unix because it requires pcntl.
  • Windows would require a linux VM to run (i.e. in a VM with a shared folder or docker image, with the addition of SSH port forwarding or changing the Phan daemon code from listening on 127.0.0.1 (only requests from localhost) to listening on more/all interfaces.

from phan.

TysonAndre avatar TysonAndre commented on May 22, 2024

The original issue has been addressed, please open new issues for related tickets (e.g. instructions for using it on windows, using a different protocol, autocompletion, etc).

from phan.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.