Comments (23)
Hi @mindplay-dk.
We've been trying to find ways to make phan fast on very large code bases and daemonizing it is a thing that folks have been talking about.
In the immediate term we've dodged the need to do that by providing persistence via a sqlite database (which you can use via the '-s' flag). By doing one long run and saving state, we're able to do very fast subsequent runs on just the changed files. You can give this a try by running phan on itself:
time ./phan -s phan.sqlite src/**/*.php
touch ./src/Phan/Phan.php
time ./phan -s phan.sqlite src/**/*.php
The second run, which only has to parse and analyze the one changed file, should be much faster. For very large code bases you'd want to run it as
phan -s .phan/database -r `git diff origin/master --name-only`
which will re-analyze only the files that differ from origin/master.
We have a vim plugin for phan (which works within syntastic). Right now it takes about 600ms to run an analysis using the stored state in sqlite which isn't quite good enough. We're shooting for 200ms response times and I'm not entirely sure how we'll get there. Daemonizing may end up being the approach we'll need to take if we can't find some time to shave off otherwise.
from phan.
It's great that you've been thinking about this :-)
Your work on caching in a local DB isn't wasted anyhow, as it'll speed up offline testing, e.g. when you're frequently running an entire build, so that effort is worthwhile either way :-)
But even 200 msec would feel high when used to generate real-time feedback in an editor - it shouldn't scan or report errors for the entire codebase, just the file you're currently editing, so the design would need to allow for refreshing and inspecting only a specified file.
The fact that changes in the file you're editing could lead to failed inspections in other files is less interesting in terms of real-time feedback - updating inspections across the codebase can happen 600 msec later, no problem; the issue is real-time feedback for changes made to the file in front of you, before it's even saved. For that, I think daemonizing is probably the only real way to go.
(of course, that's not necessarily part of your goals at all, but it would be nice, as I'm getting tired of waiting for a certain IDE to add certain, obvious missing inspections, which have been on their bug-tracker literally for years. I'd love to leave heavyweight IDEs behind, if I could get quality inspections and auto-completion from lightweight editors like Atom, Code, Brackets, etc.)
Could we just shell out to php -l for real-time feedback in an editor?
@mindplay-dk After you've indexed everything initially (a process which takes some time even in commercial IDEs), the time-to-feedback in an editor should largely depend on how quickly you can query information from the sqlite database, which should be reasonably quick (or at least quick enough to not get frustrated at your editor). If it's not quick enough, the sqlite database could be moved to a ramdisk.
Note that for PHP autocompletion, padawan has also taken the approach of being run as a background service: https://github.com/mkusher/padawan.php
@allan-simon looks interesting - though the goal here appears to be auto-completion, not inspection?
@cweagans SQLite also supports volatile/in-memory databases - so one could imagine a simple service which runs the full inspections but doesn't hit the disk for the database or reload PHP scripts. (Benchmarking this option should be as simple as switching to a memory database, then running the full inspection in a loop...)
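To illustrate the idea (sketched in Python for brevity, since its standard library bundles SQLite), an in-memory database answers the same queries through the same API without touching disk, so benchmarking the option really is just a matter of swapping the connection string. The schema and data below are hypothetical stand-ins for Phan's stored state, not its actual tables:

```python
import sqlite3

def build_db(path):
    """Create a toy 'elements' table keyed by file, standing in for
    the kind of state a static analyzer might persist."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE elements (file TEXT, name TEXT, kind TEXT)")
    conn.executemany(
        "INSERT INTO elements VALUES (?, ?, ?)",
        [("src/A.php", "A", "class"), ("src/a.php", "a", "function")],
    )
    conn.commit()
    return conn

# ":memory:" keeps the whole database in RAM -- same queries, no disk I/O.
# Passing a file path instead gives the persistent, on-disk behavior.
conn = build_db(":memory:")
rows = conn.execute(
    "SELECT name FROM elements WHERE file = ?", ("src/A.php",)
).fetchall()
```

The trade-off discussed later in the thread still applies: the data has to be marshalled in and out of SQLite on every query, which a native in-memory model would avoid.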
@mindplay-dk, yup exactly, padawan is only meant for autocompletion, but they do type inference through inspection; my point was mainly to show that this project is built on a service architecture, which could be an inspiration if Phan were to go that way too :)
FWIW, I definitely want a choice as to whether or not Phan is running in the background. Ctags is really nice in that I can "index" my code, and then just do lookups in a file without having to have something hogging my RAM while I work with my other lightweight tools.
> I definitely want a choice as to whether or not Phan is running in the background
@cweagans of course! I wasn't suggesting that a continuously running analysis should replace on-demand analysis - background analysis would be a nice option, but on-demand analysis is crucial for use in continuous integration servers, build tools, etc.
@mindplay-dk So what gain do you see by having a process running in the background? As I see it, the only thing that needs to be running is a file watcher, and when a file changes, the normal one-off analysis can be run on that file. Is that accurate? Or are you envisioning something else?
A few notes on this:
- We're looking at using https://facebook.github.io/watchman/ to watch the code base and keep the sqlite database up to date.
- Using the database to analyze a patch is taking about 600ms or so which isn't great for use within a text editor, so we're going to want to find some way to speed that up. Daemonizing to avoid sql queries may be something we need to do.
Oh, watchman looks cool. I was looking at https://github.com/koala-framework/file-watcher yesterday, but Watchman seems like a much nicer solution!
@cweagans the idea is to bring the pace up to real-time, for integration with an IDE/editor. That's all.
@morria I haven't really examined the model yet, but is there any reason why it wouldn't be possible to keep the model in memory and patch it when files change?
@mindplay-dk Keeping everything in memory and watching for file changes should work well.
As it stands now, we (not entirely correctly) flush elements defined in a file from the sqlite database whenever that file changes, but we do not flush those elements from memory. It should be fairly easy to do that part.
Using sqlite::memory: in daemon mode would be an easy solution, but because of the data marshalling needed, it wouldn't be quite as fast as a native persistent in-memory model.
So you are discussing something like hh_client - the hhvm typechecker?
@morria if it's easy to keep and partially flush the in-memory model, that would be ideal - I'm going to bet it's a bit more work than just switching to sqlite::memory:, but as @rlerdorf says, you'd still have the marshalling overhead.
Another issue that comes to mind is resilience to incomplete/invalid code. I don't know whether this is a feature you'd even want in your static analysis engine? It would need to be optional, since it doesn't make sense to use it when analyzing a codebase for correctness.
I think there are two ways to approach this: (1) in the analysis engine, with actual support for partial/invalid code, or (2) make it the client's responsibility.
The latter is probably much simpler on the analysis side, and moves some of the complexity to the client IDE/editor, since the analysis engine would simply reject invalid/incorrect code. The client would query an entire file at a time, getting a full map of all the available analysis information (e.g. indexed by line numbers) whenever the file is in a valid state, and would then hold and modify the information locally - it would query this data set locally, and modify the map in memory by re-mapping line numbers when the user inserts/deletes line-breaks; it would get an updated analysis only when the file is in a good/valid state. This is probably a fair trade-off between complexity and reliability.
In either case, the client needs to be able to inject a file for analysis, e.g. submitting changes that haven't been persisted to disk yet. This is a daemon concern only, and could probably be implemented at the daemon level easily. An alternative, simplified approach would be for the client to mirror (PHP files from) the actual workspace into a temporary folder and run the analysis against that folder - saving the working file on every keystroke to re-trigger analysis...
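The client-side re-mapping step described above is cheap: when the user inserts or deletes lines, the line-indexed map just gets shifted, and a fresh analysis is only needed once the file is valid again. A minimal sketch (Python, illustrative; the map shape is a hypothetical {line: message} dict, not any editor's actual format):

```python
def remap(diagnostics, at_line, delta):
    """Shift a {line_number: message} map after an edit.

    at_line: first line affected by the edit (1-based).
    delta:   +n for n inserted lines, -n for n deleted lines.
    Diagnostics that fall inside a deleted range are dropped.
    """
    out = {}
    for line, msg in diagnostics.items():
        if line < at_line:
            out[line] = msg           # above the edit: unchanged
        elif delta < 0 and line < at_line - delta:
            continue                  # inside a deleted range: drop
        else:
            out[line + delta] = msg   # below the edit: shifted
    return out
```

For example, inserting two lines at line 5 moves a warning on line 10 down to line 12, while a warning on line 3 stays put - no round-trip to the analysis engine needed.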
> An alternative, simplified approach, would be for the client to mirror (PHP files from) the actual workspace into a temporary folder
For the record, this is what PhpStorm does - it aggressively saves files to a temporary folder on every keystroke, and runs tools like CS and MD against the copy.
I'm hoping to take a look at this shortly. Thanks for all of the discussion, everyone.
This hasn't been touched in a long time, so I'm going to close it.
Let me know if it should be re-opened.
I implemented a POC of the daemon option in #563. It accepts a single file (or a small list of files). The command line interface and request format may change in the near future. Also, the combination of pcntl (forking processes) and opening TCP streams before forking may or may not be reliable on all systems. There are still some bugs to work out when classes are added/removed (e.g. switching git branches).
It ended up being similar to what was mentioned in #22 (comment). The startup time and the memory usage (when paused) should be roughly the same as the time needed for the parse phase alone.
- To reduce latency, it uses --quick mode. It also skips steps such as plugins if a function/method/class/etc isn't in the file being looked at at the moment (not 100% correct, but close enough for many uses). It doesn't have support for getting files that depend on the analyzed files, either.
- Each request takes 100ms for phan itself.
- Each request takes 300ms in a larger codebase. It's probably possible to bring this down a bit more with low effort, in two ways:
  - Using the inotify pecl extension (or a separate helper process running inotify) to tell the daemon when files were added or removed from folders phan cares about (instead of making phan recurse through all of the directories on every new request). ~100ms of that time was spent looking up files.
  - Adding an additional data structure mapping file names to lists of functions, lists of classes, etc. (If there are a lot of methods, then when Phan loops through methods in the 'method' phase, calling $method->getContext()->getFile() on all of those methods to skip things that weren't in a file we care about would add up to 100s of milliseconds.)
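The request/response shape of such a daemon can be sketched in a few lines. This is a Python sketch for illustration only - the actual protocol in #563 and Phan's analysis are more involved - but it shows the core idea: the daemon holds state in memory and answers per-file requests over a local TCP socket, so the per-request cost is a socket round-trip plus re-analysis of one file rather than a full startup. The ANALYSES dict is a hypothetical stand-in for the in-memory code model:

```python
import json
import socket
import socketserver
import threading

# Stand-in for the daemon's in-memory model; a real daemon would hold
# the parsed codebase here and re-analyze only the requested file.
ANALYSES = {"src/Phan/Phan.php": []}  # file -> list of issues

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One newline-terminated JSON request per connection.
        request = json.loads(self.rfile.readline())
        issues = ANALYSES.get(request["file"])
        self.wfile.write(json.dumps({"issues": issues}).encode() + b"\n")

def ask(port, path):
    """Client side: one cheap request per save/keystroke."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(json.dumps({"file": path}).encode() + b"\n")
        return json.loads(sock.makefile().readline())

# Bind to an ephemeral localhost port and serve in the background.
server = socketserver.TCPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
reply = ask(port, "src/Phan/Phan.php")
server.shutdown()
```

Listening on 127.0.0.1 keeps the daemon local-only, which matches the security note in the Windows discussion below.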
A draft of the documentation is on the wiki at https://github.com/etsy/phan/wiki/Using-Phan-Daemon-Mode
Daemon mode can be tried out if the master branch of phan (dev-master) is installed.
- The service requires Linux/Unix because it depends on pcntl.
- On Windows, the daemon would have to run in a Linux VM (e.g. a VM with a shared folder, or a Docker image), with the addition of SSH port forwarding, or changing the Phan daemon code from listening on 127.0.0.1 (only requests from localhost) to listening on more/all interfaces.
The original issue has been addressed, please open new issues for related tickets (e.g. instructions for using it on windows, using a different protocol, autocompletion, etc).