Comments (14)

tomchristie commented on June 3, 2024

> The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't all need to be fully rendered, but they do all need to be read and processed to a certain extent to determine the page title, etc., for the nav.

Okay, so I've been working on this and I've got enough to demo now...

https://github.com/mkdocs/sketch/tree/main

That's a work-in-progress of "how could mkdocs look" that properly deals with this issue.

Specifically, the mkdocs serve command doesn't require a site build at all*

I needed to do a bit of poking to make this work with the terraform example above (since it doesn't include a nav config), though once I'd done that, serve startup time was under a second.

There are other aspects that I'm looking to address as part of that work; I'm just getting things into shape so that I have a coherent body of work to start sharing here.

* Search indexes aren't in there just yet. Yes, they would require a full-site build, but we can use HTML rel=preload links to prompt loading them in the background, and likely also have per-page caching.
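Per-page caching for `mkdocs serve` could, for instance, look something like the minimal sketch below. It is not code from the sketch repository or from MkDocs. `LazyPageCache` and the `render_markdown` callable are hypothetical names, and the callable stands in for the real Markdown, plugin and template pipeline. A page is rendered only the first time it is requested. The cached HTML is then reused until the source file's modification time changes.

```python
import os
from typing import Callable, Dict, Tuple


class LazyPageCache:
    """Render pages only when requested; reuse results until the file changes.

    `render_markdown` is a hypothetical stand-in for the real rendering
    pipeline (Markdown conversion, plugins, templates).
    """

    def __init__(self, render_markdown: Callable[[str], str]):
        self._render = render_markdown
        # path -> (mtime in ns at render time, rendered HTML)
        self._cache: Dict[str, Tuple[int, str]] = {}
        self.renders = 0  # number of real renders, for illustration

    def get(self, path: str) -> str:
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # source unchanged: serve the cached HTML
        with open(path, encoding="utf-8") as f:
            html = self._render(f.read())
        self.renders += 1
        self._cache[path] = (mtime, html)
        return html
```

Keying on `st_mtime_ns` is cheap but coarse. A content hash would also catch edits that leave the timestamp unchanged, at the cost of reading the file on every request.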

from mkdocs.

pawamoy commented on June 3, 2024

Also, solid work @kamilkrzyskow 👍 Thanks for making and sharing all this!

squidfunk commented on June 3, 2024

Users have mentioned this on multiple occasions, but I'm having a hard time finding them due to GitHub's rather mediocre issue search. Here's what I could gather from a quick search:

The fact is that 30 minutes is a worst-case scenario. Even a repeated build that takes 1 minute is too slow to be useful, and --dirtyreload isn't a workable solution due to the problems stated, especially for plugin authors. Build time also doesn't depend solely on the number of pages, but also on the plugins used. Thus, we should start a discussion about how plugins and the core could better work together to employ caching and reduce build time.

Running out of memory is another problem that should be fixed, as already discussed in #2669.

pawamoy commented on June 3, 2024

> Perhaps some other sort of minify plugin needs to be released, one that uses C/Rust libraries to handle the minification process 🤔

Like this one https://github.com/monosans/mkdocs-minify-html-plugin? Could you build once with it and see if you've just saved 950 seconds or so 😛? Apparently it only minifies HTML files, though (including the CSS and JS inside them).

squidfunk commented on June 3, 2024

Ah, nice, I didn't know about the minify-html plugin! I'll check it out and probably switch to it. Offloading pure string processing to Rust makes a lot of sense.

waylan commented on June 3, 2024

> Caching for later builds with mkdocs serve won't help much, as it immediately turns off the prospective user. Also, rendering the whole docs in the background with concurrency seems like a waste of resources when I only want to check one web page.

> So I would like to see some sort of on-demand loading: serve would only process index.html and later load pages only when navigating to them. This of course breaks the final on_post_build event, as plugins expect all files to be present in the site directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile, IMO.

The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't all need to be fully rendered, but they do all need to be read and processed to a certain extent to determine the page title, etc., for the nav.
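A cheap metadata pass of this kind might look like the following sketch. It simplifies title lookup to three steps: a `title:` key in YAML front matter, then the first level-1 `#` heading, then the file name. MkDocs' real title resolution (nav config, metadata, rendered headings) is more involved, and `quick_title` and `nav_titles` are hypothetical helpers. The point is that every page is read but no Markdown is rendered.

```python
import re
from pathlib import Path
from typing import Dict

# `title: ...` inside front matter, and an ATX level-1 heading (optional closing #s).
_FRONT_MATTER_TITLE = re.compile(r"^title:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
_H1 = re.compile(r"^#[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", re.MULTILINE)


def quick_title(markdown: str, fallback: str) -> str:
    """Guess a page title without rendering the Markdown (simplified rules)."""
    if markdown.startswith("---\n"):
        end = markdown.find("\n---", 4)
        if end != -1:
            match = _FRONT_MATTER_TITLE.search(markdown[4:end])
            if match:
                return match.group(1).strip("\"'")
            markdown = markdown[end + 4:]  # skip the front matter block
    match = _H1.search(markdown)
    if match:
        return match.group(1)
    # Fall back to the file name: "getting-started" -> "Getting started".
    return fallback.replace("-", " ").replace("_", " ").capitalize()


def nav_titles(docs_dir: str) -> Dict[str, str]:
    """Read every page once and extract its title; nothing is rendered."""
    root = Path(docs_dir)
    return {
        str(path.relative_to(root)): quick_title(path.read_text(encoding="utf-8"), path.stem)
        for path in sorted(root.rglob("*.md"))
    }
```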

And then there are those scenarios where a page's content consists of the pages collection (either by means of a plugin or as a static template). In that case, to render that page (even if the nav is excluded), the entire pages collection is needed.

Ultimately, it has been the above two issues which have thus far prevented a better solution from being developed. Work out a way to address those and then we may have a workable solution.
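The two constraints can be phrased as an invalidation rule. The minimal sketch below assumes each page records whether its output embeds the whole pages collection, as the nav or a generated index page would. It also assumes an edit is classified by whether it changed the page's title. `PageInfo`, `uses_collection` and `pages_to_rerender` are hypothetical, not MkDocs features.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PageInfo:
    title: str
    uses_collection: bool  # hypothetical flag: renders the nav or lists all pages


def pages_to_rerender(pages: Dict[str, PageInfo], changed: str, new_title: str) -> Set[str]:
    """Return the pages that must be re-rendered after `changed` is edited.

    A body-only edit affects just that page. If the title changed, every
    page whose output embeds the pages collection is stale as well.
    """
    stale = {changed}
    if pages[changed].title != new_title:
        pages[changed].title = new_title
        stale |= {path for path, info in pages.items() if info.uses_collection}
    return stale
```

With a nav rendered on every page, `uses_collection` is true almost everywhere. A title change then invalidates the whole site, which is exactly the difficulty described above.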

pawamoy commented on June 3, 2024

Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially? I'm imagining some utilities to build a "pipeline" of things to run depending on whether they support concurrency or not.

Quick flowchart, which doesn't make sense but illustrates the idea:

```mermaid
flowchart TD
    p1f["plugin1.on_files"]
    p2f["plugin2.on_files"]
    p3f["plugin3.on_files"]
    p1n["plugin1.on_nav"]
    p2n["plugin2.on_nav"]
    p3n["plugin3.on_nav"]
    p1pm["plugin1.on_page_markdown"]
    p2pm["plugin2.on_page_markdown"]
    p3pm["plugin3.on_page_markdown"]
    start --> p1f & p2f
    p1f & p2f --> p3f
    p3f --> p1n & p2n & p3n
    p1n & p2n & p3n --> p1pm
    p1pm --> p2pm & p3pm
```
  • Plugin 1 and 2 on_files run concurrently, then plugin 3 on_files sequentially.
  • Plugin 1, 2 and 3 on_nav run concurrently.
  • Plugin 1 on_page_markdown runs sequentially, then plugin 2 and 3 on_page_markdown run concurrently.
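The sketch below shows one way to turn such per-hook declarations into the stages drawn above. The `concurrent_safe` flag, `Hook`, `plan_stages` and `run_stages` are hypothetical, since MkDocs has no such API. Hooks are modelled as zero-argument callables to keep it short. Consecutive safe hooks share a stage, an unsafe hook gets a stage of its own, and each stage finishes before the next begins. Hooks that transform and return data, such as `on_page_markdown`, could only share a stage if their edits were independent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple


class Hook(NamedTuple):
    name: str
    func: Callable[[], None]
    concurrent_safe: bool  # hypothetical per-hook flag declared by the plugin


def plan_stages(hooks: List[Hook]) -> List[List[Hook]]:
    """Group hooks, in registration order, into stages.

    Consecutive concurrency-safe hooks share a stage; an unsafe hook
    always gets a stage of its own.
    """
    stages: List[List[Hook]] = []
    for hook in hooks:
        if hook.concurrent_safe and stages and all(h.concurrent_safe for h in stages[-1]):
            stages[-1].append(hook)
        else:
            stages.append([hook])
    return stages


def run_stages(stages: List[List[Hook]]) -> None:
    """Run each stage to completion before starting the next one."""
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            futures = [pool.submit(hook.func) for hook in stage]
            for future in futures:
                future.result()  # wait, and re-raise any exception from the hook
```

Suppose `plugin1` and `plugin2` are safe and `plugin3` is not. Their `on_files` hooks then plan into `[[plugin1, plugin2], [plugin3]]`, matching the flowchart.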

EDIT: hmm, I suppose there's another possible layer of concurrency on the files/pages themselves. The transformation pipeline would likely be quite complex. I'm sick and have a fever today, so please be indulgent 😂
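The per-page layer can be sketched separately. Within one page, Markdown-transforming hooks must run in order, because each one consumes the previous one's output. The chains of different pages, however, are independent. The sketch below assumes hooks are plain `str -> str` functions, loosely modelled on `on_page_markdown`. `transform_page` and `transform_all` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical Markdown-transforming hook: takes and returns page Markdown.
MarkdownHook = Callable[[str], str]


def transform_page(markdown: str, hooks: List[MarkdownHook]) -> str:
    """Within one page, hooks run in order: each sees the previous output."""
    for hook in hooks:
        markdown = hook(markdown)
    return markdown


def transform_all(pages: Dict[str, str], hooks: List[MarkdownHook], workers: int = 4) -> Dict[str, str]:
    """Across pages, the chains are independent, so pages run in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zipping with the keys is safe.
        results = pool.map(lambda md: transform_page(md, hooks), pages.values())
        return dict(zip(pages.keys(), results))
```

Threads keep the example simple, but the GIL limits the speedup for CPU-bound pure-Python work. A `ProcessPoolExecutor` would avoid that, but it requires the hooks and pages to be picklable.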

humitos commented on June 3, 2024

> Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially?

This is exactly what Sphinx does. Each extension declares whether it's safe for parallel reading and/or parallel writing. See https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata
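In Sphinx, an extension's `setup()` returns a metadata dict with `parallel_read_safe` and `parallel_write_safe` keys. Sphinx only runs in parallel when every loaded extension opts in. A MkDocs analogue might look like the sketch below. The metadata dicts, `parallel_safe` and `read_pages` are hypothetical, since MkDocs plugins currently declare nothing of the sort.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def parallel_safe(plugin_metadata: Iterable[Dict[str, object]], key: str) -> bool:
    """Mirror Sphinx's rule: run in parallel only if *every* extension opts in.

    A missing key counts as "not safe".
    """
    return all(meta.get(key) is True for meta in plugin_metadata)


def read_pages(paths: List[str], read: Callable[[str], str], metadata: List[Dict[str, object]]) -> List[str]:
    """Read pages in parallel when all plugins allow it, else sequentially."""
    if parallel_safe(metadata, "parallel_read_safe"):
        with ThreadPoolExecutor() as pool:
            return list(pool.map(read, paths))
    return [read(p) for p in paths]
```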

I haven't checked how it works internally, but it's probably something to explore a little more and see if there are some ideas that can be reused.

pawamoy commented on June 3, 2024

Are there public examples of large repositories that take up to 30 minutes to build? I tried locally with 10K dummy files and ran out of memory before the site was built 😅 With 1K files, template rendering seems to be the most costly step.
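The comment doesn't say how the dummy files were produced. A script along these lines could generate a comparable benchmark project, but it is an assumption, not the setup used above. `make_dummy_site` is a hypothetical helper and the filler content is arbitrary. The `mkdocs.yml` contains only `site_name`, so MkDocs builds the nav from the file tree.

```python
from pathlib import Path


def make_dummy_site(root: str, pages: int, paragraphs: int = 5) -> Path:
    """Write a minimal MkDocs project with `pages` generated Markdown files.

    The content is arbitrary filler; real-world pages (code blocks,
    admonitions, plugins) will stress the build differently.
    """
    base = Path(root)
    docs = base / "docs"
    docs.mkdir(parents=True, exist_ok=True)
    (base / "mkdocs.yml").write_text("site_name: Dummy performance test\n", encoding="utf-8")
    (docs / "index.md").write_text("# Home\n", encoding="utf-8")
    body = "\n\n".join(f"Paragraph {i} with some *filler* text." for i in range(paragraphs))
    for n in range(pages):
        # Spread files over subdirectories so the nav is not one flat list.
        section = docs / f"section-{n // 100:03d}"
        section.mkdir(exist_ok=True)
        (section / f"page-{n:05d}.md").write_text(f"# Page {n}\n\n{body}\n", encoding="utf-8")
    return base
```

Run `mkdocs build` inside the returned directory and vary `pages` to see how build time and memory use scale.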

squidfunk commented on June 3, 2024

This is a project with 3,400 files and a very limited set of plugins, i.e., search, minify and social:
https://github.com/openfabr/terraform-provider-cdk-docs

IMHO, that's not many plugins, and the social plugin, which I wrote, employs caching, so repeated builds are much cheaper because they reuse cached images. I've built the project on my machine, an M2 MacBook Pro:

First build:

```
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO    -  Documentation built in 537.92 seconds
```

Repeated build (social plugin cached):

```
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO    -  Documentation built in 487.02 seconds
```

It's infeasible to make edits on this project without --dirtyreload, which, as mentioned, gives incorrect results. On top of that, the author has to wait more than 8 minutes until the live reload server becomes available. Add a few more plugins and a few hundred more pages and you're up to 20 minutes.

kamilkrzyskow commented on June 3, 2024

I tested the repository mentioned above on my Ryzen 3600 Windows 10 PC, with mkdocs==1.6.0 and mkdocs-material==9.5.20.
First build:

```
$ mkdocs build
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: C:\MyFiles\_git\removable\performance-test\site
... A lot of warnings about absolute paths etc., which could also impact performance due to printing to Terminal
INFO    -  Documentation built in 1335.42 seconds
```

Repeated build:
I don't dare run it again 😅

I used my performance_debug hook.
Debug YAML result: performance_debug_first.yml.txt
More info about the categories can be found in the gist's Python file. Most should be self-explanatory, though the number of files could have generated quite a bit of noise 🤔

```yaml
PLUGINS_PER_EVENTS:
  on_post_page|mkdocs_minify_plugin.plugin.MinifyPlugin: 958.97267 # The main culprit of the long build time
  on_page_context|material.plugins.search.plugin.SearchPlugin: 23.14142 # Expected given the number of files
  on_config|material.plugins.social.plugin.SocialPlugin: 10.61701 # on_config isn't expected to be affected by the number of files; is it always this slow?
  on_page_markdown|material.plugins.social.plugin.SocialPlugin: 1.20333 # magic of concurrency
  ...
  on_post_build|material.plugins.social.plugin.SocialPlugin: 0.00389 # magic of concurrency
```
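The `performance_debug` hook itself isn't reproduced here. As a generic sketch of how totals like these can be accumulated, the decorator below sums wall-clock time per `event|plugin` key. `timed`, `TIMINGS` and `report` are hypothetical names, and the sketch does not show how the hook attaches them to each plugin method.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, DefaultDict, List, Tuple

# Accumulated wall-clock seconds per "event|plugin" key, like the YAML above.
TIMINGS: DefaultDict[str, float] = defaultdict(float)


def timed(event: str, plugin: str) -> Callable:
    """Decorator that adds each call's duration to TIMINGS[event|plugin]."""
    key = f"{event}|{plugin}"

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                TIMINGS[key] += time.perf_counter() - start
        return wrapper

    return decorator


def report(top: int = 5) -> List[Tuple[str, float]]:
    """Slowest keys first, rounded to 5 decimals like the debug output."""
    ranked = sorted(TIMINGS.items(), key=lambda kv: kv[1], reverse=True)
    return [(key, round(secs, 5)) for key, secs in ranked[:top]]
```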

Currently, mkdocs serve runs the same build as mkdocs build, so the benchmark results apply there too.
The main issue is the minify plugin. A much cheaper (performance-wise) minification, of sorts, could be achieved using Jinja2 Environment settings, which I mentioned here. Another approach would be to properly enforce whitespace management inside the template files via the %- tags. Perhaps some other sort of minify plugin needs to be released, one that uses C/Rust libraries to handle the minification process 🤔
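The linked comment with the specific settings isn't preserved here. The sketch below therefore assumes the relevant `Environment` options are Jinja2's standard `trim_blocks` and `lstrip_blocks`, and compares them with explicit `{%-` whitespace control on a tiny template. Either way, only the whitespace produced by the template markup itself is removed. That limitation is why this counts only as minification "of sorts".

```python
from jinja2 import Environment

# The same loop written without any whitespace control...
PLAIN = "<ul>\n    {% for item in items %}\n    <li>{{ item }}</li>\n    {% endfor %}\n</ul>"
# ...and with explicit "{%-" markers, which strip the whitespace before the tag.
MANUAL = "<ul>\n    {%- for item in items %}\n    <li>{{ item }}</li>\n    {%- endfor %}\n</ul>"


def render(source: str, **env_options) -> str:
    env = Environment(**env_options)
    return env.from_string(source).render(items=["a", "b"])


default_html = render(PLAIN)
# trim_blocks drops the newline after a block tag; lstrip_blocks drops the
# indentation before it. Both are standard jinja2.Environment options.
trimmed_html = render(PLAIN, trim_blocks=True, lstrip_blocks=True)
manual_html = render(MANUAL)
```

Both `trimmed_html` and `manual_html` come out as `"<ul>\n    <li>a</li>\n    <li>b</li>\n</ul>"`. `default_html` keeps an indented blank line before each item and before `</ul>`.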

```yaml
MARKDOWN_PER_CLASSES:
  pymdownx.superfences.SuperFencesBlockPreprocessor: 11.09541
  markdown.treeprocessors.InlineProcessor: 10.98559
```

I'm surprised those Markdown values are so low. Last time I checked with GMC (~190 files), the same classes took ~6 seconds each. Perhaps the complexity of the Markdown or the number of code blocks has a bigger impact than I thought. Still, 3k files vs. 200 files and only a 2x time increase seems odd, hmm.
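Normalizing per file makes the discrepancy explicit. This assumes the ~3,400 files reported for this repository are all Markdown pages (some may be other assets) and takes ~6 s for the ~190-file GMC site:

$$
\frac{11.1\ \text{s}}{3400\ \text{files}} \approx 3.3\ \text{ms/file},
\qquad
\frac{6\ \text{s}}{190\ \text{files}} \approx 31.6\ \text{ms/file}
$$

So the per-file cost of `SuperFencesBlockPreprocessor` here is roughly a tenth of what it was on GMC. That fits the guess that Markdown complexity matters more than file count, though these two numbers alone can't confirm it.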

Template rendering took ~270 seconds:

```yaml
TEMPLATE_ROOTS:
  main.html|sum: 267.45968
```

This time is repeated on every re-serve without --dirtyreload.


Caching for later builds with mkdocs serve won't help much, as it immediately turns off the prospective user.
Also, rendering the whole docs in the background with concurrency seems like a waste of resources when I only want to check one web page.

So I would like to see some sort of on-demand loading: serve would only process index.html and later load pages only when navigating to them. This of course breaks the final on_post_build event, as plugins expect all files to be present in the site directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile, IMO.

I guess this would require a fork in the mkdocs serve and mkdocs build event loops? Rather risky, but it would maybe allow for more control? Just a first off-the-top-of-my-head idea ✌️

dr-br commented on June 3, 2024

I would like to be able to use parallel builds. It has been stated in #1900 that the benefit is not that high. However, I have lots of Jupyter notebooks to convert (the execute step consumes most of the time). I ended up executing all notebooks concurrently in advance.
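The comment doesn't say how the notebooks were executed in advance. One way is to launch one subprocess per notebook from a thread pool, assuming each notebook is executed in place with `jupyter nbconvert --to notebook --execute --inplace`. `execute_notebooks` is a hypothetical helper, and the command is a parameter so it can be swapped for whatever the setup actually uses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# nbconvert invocation that executes a notebook and saves the outputs in place.
NBCONVERT = ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace"]


def execute_notebooks(paths: Sequence[str], command: Sequence[str] = NBCONVERT, workers: int = 4) -> List[int]:
    """Run `command + [path]` for every notebook concurrently; return exit codes.

    Threads are enough here: each notebook runs in its own subprocess, so the
    GIL is not the bottleneck.
    """
    def run(path: str) -> int:
        return subprocess.run([*command, path], check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, paths))
```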

pawamoy commented on June 3, 2024

Nice work @tomchristie!

> Search indexes aren't in there just yet. Yes, they would require a full-site build, but we can use HTML rel=preload links to prompt loading them in the background, and likely also have per-page caching.

In the case of mkdocstrings and its cross-reference feature, rel=preload wouldn't be enough. To statically resolve a cross-reference, we must wait for all pages to be built. The only way to make cross-references work when serving pages on the fly (without building everything) would be to inject some JavaScript magic 🤔 For example, the plugin would store queryable state on the server that the client could poll continuously, until all needed pages were loaded with rel=preload and the unresolved references on the current page could be resolved 🤔 And since we don't know which pages are needed to resolve a reference, all pages would have to be preloaded anyway 🤔 (or, if not all, maybe most pages, with a priority order or something).
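The core constraint fits in a few lines. A reference can only be resolved once the page defining its target has been processed, so with a partial set of pages some references stay pending. In the sketch below, the `<autoref identifier="...">` placeholder format and both helpers are illustrative assumptions, not the actual markup or API of mkdocstrings or mkdocs-autorefs.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Hypothetical placeholder for an unresolved cross-reference (illustrative only).
REF = re.compile(r'<autoref identifier="([^"]+)">')


def build_anchor_index(pages: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each identifier to the URL of the page that defines it."""
    return {ident: f"{url}#{ident}" for url, idents in pages.items() for ident in idents}


def resolve(html: str, index: Dict[str, str]) -> Tuple[str, Set[str]]:
    """Replace the references we can resolve; report the ones we cannot yet."""
    missing: Set[str] = set()

    def substitute(match: "re.Match[str]") -> str:
        ident = match.group(1)
        if ident in index:
            return f'<a href="{index[ident]}">{ident}</a>'
        missing.add(ident)
        return match.group(0)  # leave it in place until more pages are loaded

    return REF.sub(substitute, html), missing
```

The `missing` set is what a client- or server-side mechanism would keep polling for until enough pages have been processed.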

squidfunk commented on June 3, 2024

This looks really promising! I'm really excited to see how this will work with more complex setups. I guess there are still things to be worked out (I haven't checked the implementation), but it's a great start! 👍
