Comments (25)

kaloyan-raev commented on May 22, 2024

> $content is currently kept in memory in addition to the AST. Which means we basically keep all file contents in memory. It is only used for formatting, so I propose we re-read the content on formatting request.

File content should be kept in memory only for files for which the language server has received a didOpen notification, and discarded on the didClose notification. The language server should not read content from the file system for files that are currently open in the editor.
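A minimal sketch of that rule, written in Python purely for illustration (the server itself is PHP). The class and method names are hypothetical and only mirror the didOpen/didClose notifications mentioned above:

```python
from pathlib import Path


class DocumentStore:
    """Keeps text in memory only for documents the editor currently has open."""

    def __init__(self):
        # path -> editor buffer content, populated by didOpen only
        self._open = {}

    def did_open(self, path, text):
        # The editor buffer is now the source of truth for this file.
        self._open[path] = text

    def did_close(self, path):
        # Drop the buffer; later requests fall back to the file system.
        self._open.pop(path, None)

    def get_content(self, path):
        # Open documents are never read from disk.
        if path in self._open:
            return self._open[path]
        return Path(path).read_text(encoding="utf-8")
```

With this shape, a formatting request on a closed file costs one disk read, while an open file is always served from the buffer the editor sent.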

from php-language-server.

felixfbecker commented on May 22, 2024

That really is a lot. Suggestions?

kaloyan-raev commented on May 22, 2024

It would be nice to have a configurable memory limit, and to have the language server smart enough to fit within it, i.e. if the memory limit is tiny, the language server should reduce its memory usage by moving parts to a file system cache.

Our use case with Eclipse Che is that the PHP language server runs as an agent in a Docker container with a total memory of 1-2 GB. There are more language servers (JSON, CSS, JavaScript, etc.), the Che workspace agent, etc. I would imagine that the PHP language server in our case should be limited to 100-200 MB of memory.

mniewrzal commented on May 22, 2024

Regarding the memory limit, I think it would be nice to add a command line argument, like "--tcp" for the socket connection, and pass its value to ini_set in bin/php-language-server.php.
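As a rough illustration of such an argument (in Python, not the server's PHP), the sketch below parses a PHP-style shorthand value such as 256M into bytes. The flag name --memory-limit and both function names are assumptions made for this example; in PHP the parsed value would be handed to ini_set.

```python
import argparse

# Multipliers for PHP-style shorthand byte values (K, M, G).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_memory_limit(value):
    """Convert shorthand such as '256M' into bytes; '-1' means unlimited."""
    value = value.strip()
    if value == "-1":
        return -1
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)


def build_parser():
    parser = argparse.ArgumentParser(prog="php-language-server")
    # Hypothetical flag; unlimited by default.
    parser.add_argument("--memory-limit", type=parse_memory_limit, default=-1)
    return parser
```

For example, build_parser().parse_args(["--memory-limit=256M"]).memory_limit evaluates to 268435456 bytes.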

As for the performance problem, I will take a look at what is causing such a slowdown.

kaloyan-raev commented on May 22, 2024

We have some experience with indexing in Eclipse PDT / Zend Studio. It was impossible to hold all data in memory. Even a single regular project using a popular framework goes over 10,000 files. The nature of the Eclipse workspace allows multiple projects at the same time, and the memory usage goes really crazy.

For many years we used an H2 database to store the data. H2 is an embedded relational database and as such it has some performance limitations. In the last year we switched to Apache Lucene, a text-based search engine. It improved the performance significantly. Currently, Eclipse PDT has much better performance than we have here, even without the latest changes which caused the big slowdown.

felixfbecker commented on May 22, 2024

Moving structures to disk would help.

Some ideas to reduce RAM usage:

  • I implemented the definition map at project level so that a reference to the PhpDocument is saved, not the node. An improvement would be to discard the AST, and reparse the document upon request for definition/reference.
  • $content is currently kept in memory in addition to the AST. Which means we basically keep all file contents in memory. It is only used for formatting, so I propose we re-read the content on formatting request.

Any ideas regarding index time? We traverse all ASTs twice at index time.

felixfbecker commented on May 22, 2024

@kaloyan-raev I heard from other folks that rebuilding the Eclipse PDT index takes them ~3min.
We could use SQLite. Or go with serialize/unserialize

felixfbecker commented on May 22, 2024

@mniewrzal I don't know why it is so extreme for you... I just tried it on Symfony + dependencies, which is 7100 files:

All PHP files parsed in 147 seconds. 98 MiB allocated.

felixfbecker commented on May 22, 2024

Let me clone magento

mniewrzal commented on May 22, 2024

OK, I will also check Symfony + dependencies to compare results.

kaloyan-raev commented on May 22, 2024

> I heard from other folks that rebuilding the Eclipse PDT index takes them ~3min

It depends on the project size and on the version of Eclipse PDT. We have done lots of optimizations in the last couple of years. The switch to Apache Lucene was done in PDT 4.0, which was released in June this year.

> We could use SQLite.

This is a relational database too. H2 has much better performance than SQLite.

I would count more on a text-based search engine like Apache Lucene. I wonder what would be the Lucene equivalent in PHP?

felixfbecker commented on May 22, 2024

What do you mean by "text-based search engine"? Isn't this about persisting the index, which is mostly used for resolving definitions (which currently works through the FQN)? In that case a key/value store would give the best performance, but it must be something the user doesn't have to install manually. Actually, saving the objects in files wouldn't be too bad, because it basically is key/value.
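To make the "files as key/value" idea concrete, here is a small Python sketch (an illustration only, not the server's PHP code). It stores one JSON file per FQN, named by a hash of the FQN. The class name, the cache directory, and the shape of the stored record are all assumptions for this example:

```python
import hashlib
import json
from pathlib import Path


class FileIndex:
    """A file-backed key/value store keyed by fully qualified name (FQN)."""

    def __init__(self, cache_dir):
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, fqn):
        # Hash the FQN so namespace separators never end up in a file name.
        digest = hashlib.sha1(fqn.encode("utf-8")).hexdigest()
        return self._dir / (digest + ".json")

    def put(self, fqn, definition):
        # definition is a JSON-serializable record, e.g. file URI and position.
        self._path_for(fqn).write_text(json.dumps(definition), encoding="utf-8")

    def get(self, fqn):
        path = self._path_for(fqn)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))
```

A lookup then costs one hash and one small file read, and nothing beyond the standard library has to be installed.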

felixfbecker commented on May 22, 2024

Also, I want to note that I have always had XDebug enabled, which is noted to be a performance hog in the PHPParser docs. We should compare this to a run without XDebug and, if it has a significant impact, investigate dynamic php.ini files.

Speaking of XDebug, has anyone ever used it to profile a PHP app? I would like to know which functions take the most CPU time. For example, if it is the file IO after all, I would like to add an async file_get_contents that uses streaming and returns a promise. If it is the parser, we could parse the project divide-and-conquer style in separate processes.

Benchmarks in CI would also be nice. Could add projects like magento as git submodules to fixtures.

felixfbecker commented on May 22, 2024

@mniewrzal Just tested magento. It is true that it is dog-slow. For me the LS crashed at around the 20,000th file, which was several minutes into parsing already. But RAM usage isn't nearly as high as you experienced it, it was <600MB.

mniewrzal commented on May 22, 2024

Interesting :) Are you using a clean version from master, or do you maybe have some local changes?

felixfbecker commented on May 22, 2024

My bad, I was working on your PHP_CodeSniffer branch. You are right, I'm now at file 4500/24226, 3 min parsing, 1.3 GB RAM usage.

kaloyan-raev commented on May 22, 2024

One more optimization that can be done is to exclude some folders from indexing, especially those containing tests.

I did some file counting on the Magento project:

$ find . -type f -name '*.php' | wc -l
24225
$ find . -type f -name '*.php' | grep "[Tt]est" | wc -l
9035

About 9K out of 24K files seem to be test files. We could save up to 37% of CPU time and memory if we avoid indexing them.

The exclusion pattern should be configurable with some meaningful default.
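A hedged sketch of such a configurable exclusion, in Python for illustration only. The default pattern simply mirrors the grep "[Tt]est" from the count above, and the function names are hypothetical. Like the grep, the pattern also matches unrelated names that merely contain "test", so a real default would need more care.

```python
from fnmatch import fnmatchcase

# Hypothetical default, equivalent to the grep "[Tt]est" used above.
DEFAULT_EXCLUDES = ("*[Tt]est*",)


def should_index(relative_path, excludes=DEFAULT_EXCLUDES):
    """Return False when the path matches any exclusion pattern."""
    return not any(fnmatchcase(relative_path, pattern) for pattern in excludes)


def excluded_share(paths, excludes=DEFAULT_EXCLUDES):
    """Fraction of paths that would be skipped, e.g. 9035 / 24225 for Magento."""
    paths = list(paths)
    if not paths:
        return 0.0
    skipped = sum(1 for path in paths if not should_index(path, excludes))
    return skipped / len(paths)
```

Passing excludes=() turns the filter off, which keeps global symbol search working for tests when a user wants that.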

felixfbecker commented on May 22, 2024

Personally I want global symbol search to also work for tests though.

kaloyan-raev commented on May 22, 2024

Sure, this is why it should be configurable. Some IDEs (like Eclipse) let users configure which folders in a project are "source folders" and which are not. Such an IDE should ask the language server to index only the source folders and skip the rest.

This would be just empowering users to optimize performance when working with huge projects like Magento.

felixfbecker commented on May 22, 2024

I profiled the LS while indexing magento.

[image: profile]

[image]

mniewrzal commented on May 22, 2024

With the latest master there is a problem with indexing large PHP files, e.g. https://github.com/composer/composer/blob/master/tests/Composer/Test/Autoload/Fixtures/classmap/LargeClass.php
There is also a similar file in Magento. While parsing such a file, the LS crashed and restarted.

PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in /home/wywrzal/git/vscode-php-intellisense/vendor/nikic/php-parser/lib/PhpParser/Parser/Php7.php on line 2335

felixfbecker commented on May 22, 2024

@mniewrzal With an unlimited memory limit? These large files definitely don't need to be parsed. Can we somehow catch the error? Or implement support for ignoring files with globs, like **/Fixtures/*.*. Or we do a stat on the file first and don't parse it if it is too large. Could you open a separate issue for this? I would like to close this one.

mniewrzal commented on May 22, 2024

My bad, I didn't recompile the plugin sources. I was testing the memory limit parameter and I had a 256M limit. Now everything is working like before. But I think it would be good to think about a limit on PHP file size. This one file bumps memory consumption from 18 MiB to 542 MiB for the whole Composer project.

felixfbecker commented on May 22, 2024

Sure, a simple filesize() call should do. No real PHP file will be that large, it is only a fixture for testing. Please open another issue.
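The size check could look roughly like the following, sketched in Python, where os.path.getsize plays the role of PHP's filesize(). The 1,000,000-byte threshold and the function name are assumptions for illustration; the thread does not settle on a value.

```python
import os

# Hypothetical threshold; the thread does not specify one.
MAX_FILE_SIZE = 1_000_000


def should_parse(path, max_bytes=MAX_FILE_SIZE):
    """Skip files larger than max_bytes before handing them to the parser."""
    return os.path.getsize(path) <= max_bytes
```

Because the check only reads file metadata, an oversized fixture is rejected before any of its content is loaded into memory.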

mniewrzal commented on May 22, 2024

I think this one can be closed. Other issues can be handled in separate bugs.
