Comments (14)
The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't need to all be fully rendered, but they all do need to be read and processed to a certain extent to determine the page title, etc for the nav.
Okay, so I've been working on this and I've got enough to demo now...
https://github.com/mkdocs/sketch/tree/main
That's a work-in-progress of "how could mkdocs look" that properly deals with this issue.
Specifically, the mkdocs serve command doesn't require a site build at all*
I needed to do a bit of poking to make this work with the terraform example above (since it doesn't include a nav config), though once I'd done that, serve startup time was under a second.
There are other aspects that I'm looking to address as part of that work; I'm just getting things into shape so that I've got a coherent body of work to start sharing here.
* Search indexes aren't in there just yet. Yes, they would require a full-site build, but we can use HTML rel=preload links to prompt them in the background, and likely also have per-page caching.
from mkdocs.
Also, solid work @kamilkrzyskow. Thanks for making and sharing all this!
from mkdocs.
Users have mentioned this on multiple occasions, but I'm having a hard time finding it due to GitHub's rather mediocre issue search. Here's what I could gather from a quick search:
- #1900
- #990 (not 30min, but 3-5min with 750 pages)
- squidfunk/mkdocs-material#1887 (comment)
- squidfunk/mkdocs-material#2110 (comment)
- squidfunk/mkdocs-material#2110 (comment)
The fact is that 30min is a worst-case scenario. Even a repeated build that takes 1 minute is too slow to be useful, and --dirtyreload isn't a workable solution due to the problems stated, especially for plugin authors. Build time also doesn't depend solely on the number of pages, but on the plugins used. Thus, how plugins and the core could better work together to employ caching and reduce build time is a discussion we should start.
Running out of memory is another problem that should be fixed, as already discussed in #2669.
from mkdocs.
Perhaps some sort of alternative minify plugin needs to be released which uses some C/Rust libraries to handle the minification process
Like this one https://github.com/monosans/mkdocs-minify-html-plugin? Could you build once with it and see if you just spared 950 seconds or so? It only minifies HTML files though, apparently (but still CSS and JS within them).
from mkdocs.
Ah, nice, I didn't know about the minify-html plugin! I'll check it out and probably switch to it. Offloading pure string processing to Rust makes a lot of sense.
from mkdocs.
Caching for later builds with mkdocs serve won't help much, as the slow first build immediately turns off the prospective user.
Rendering the whole docs in the background with concurrency also seems like a waste of resources when I only want to check one web page.
So I would like to see some sort of on-demand loading: serve would only process index.html and later only load pages when navigating to them. This of course breaks the last on_post_build event, as plugins expect all files to be present in the site directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile IMO.
The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't need to all be fully rendered, but they all do need to be read and processed to a certain extent to determine the page title, etc for the nav.
And then there are those scenarios where a page's content consists of the pages collection (either by means of a plugin or as a static template). In that case, to render that page (even if the nav is excluded), the entire pages collection is needed.
Ultimately, it has been the above two issues which have thus far prevented a better solution from being developed. Work out a way to address those and then we may have a workable solution.
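To make the "read and processed to a certain extent" point concrete, here is a minimal sketch of a cheap title scan that avoids rendering pages. It is not MkDocs code: the names page_title and nav_titles are hypothetical, and it assumes the title is the first level-1 heading, ignoring front matter and headings that only appear inside code blocks. Results are cached by modification time so that unchanged files are not re-read.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; none of this is MkDocs API.
_H1 = re.compile(r"^#[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)
_cache = {}  # Path -> (mtime_ns, title)


def page_title(path):
    """Return the first level-1 heading, falling back to the file stem."""
    mtime = path.stat().st_mtime_ns
    cached = _cache.get(path)
    if cached and cached[0] == mtime:
        return cached[1]  # unchanged since the last scan, skip re-reading
    match = _H1.search(path.read_text(encoding="utf-8"))
    title = match.group(1) if match else path.stem
    _cache[path] = (mtime, title)
    return title


def nav_titles(docs_dir):
    """Map each Markdown file (relative POSIX path) to its title."""
    return {
        p.relative_to(docs_dir).as_posix(): page_title(p)
        for p in sorted(docs_dir.rglob("*.md"))
    }
```

This only covers the nav case. A page whose content is built from the whole pages collection would still need whatever data that plugin or template consumes.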
from mkdocs.
Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially? I'm imagining some utilities to build a "pipeline" of things to run depending on whether they support concurrency or not.
Quick flowchart which doesn't make sense but illustrates the idea:
flowchart TD
p1f["plugin1.on_files"]
p2f["plugin2.on_files"]
p3f["plugin3.on_files"]
p1n["plugin1.on_nav"]
p2n["plugin2.on_nav"]
p3n["plugin3.on_nav"]
p1pm["plugin1.on_page_markdown"]
p2pm["plugin2.on_page_markdown"]
p3pm["plugin3.on_page_markdown"]
start --> p1f & p2f
p1f & p2f --> p3f
p3f --> p1n & p2n & p3n
p1n & p2n & p3n --> p1pm
p1pm --> p2pm & p3pm
- Plugin 1 and 2 on_files run concurrently, then plugin 3 on_files sequentially.
- Plugin 1, 2 and 3 on_nav run concurrently.
- Plugin 1 on_page_markdown runs sequentially, then plugin 2 and 3 on_page_markdown run concurrently.
EDIT: hmm I suppose there's another possible layer of concurrency on files/pages themselves. The transformation pipeline would likely be quite complex. I'm sick and have a fever today, so please be indulgent.
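A rough sketch of how such a pipeline could be assembled, under loud assumptions: the concurrent_hooks attribute is hypothetical (MkDocs plugins declare nothing like it today), and the sketch ignores the fact that real hooks pass their return value on to the next plugin. Consecutive plugins that declare a hook as safe are grouped into one stage and run in a thread pool. Everything else runs one at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


class Plugin:
    """Hypothetical base class; `concurrent_hooks` is not a real MkDocs attribute."""
    concurrent_hooks = frozenset()


def stages(plugins, hook):
    """Group consecutive plugins by whether they declare `hook` concurrency-safe."""
    keyed = groupby(plugins, key=lambda p: hook in p.concurrent_hooks)
    return [(safe, list(group)) for safe, group in keyed]


def run_hook(plugins, hook, *args):
    """Run `hook` stage by stage and return the results in plugin order."""
    results = []
    with ThreadPoolExecutor() as pool:
        for safe, group in stages(plugins, hook):
            calls = [getattr(p, hook) for p in group]
            if safe:
                # Declared independent: submit together, collect in order.
                futures = [pool.submit(call, *args) for call in calls]
                results.extend(f.result() for f in futures)
            else:
                # Order-sensitive: one at a time.
                results.extend(call(*args) for call in calls)
    return results
```

With plugins 1 and 2 declaring on_files as safe and plugin 3 not, stages() yields one concurrent stage of two plugins followed by one sequential stage, which matches the first bullet above.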
from mkdocs.
Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially?
This is exactly what Sphinx does. Each extension defines if it's safe for parallel reading and/or parallel writing. See https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata
I haven't checked how it works internally, but it's probably something to explore a little more and see if there are some ideas that can be reused.
from mkdocs.
Are there public examples of large repositories that take up to 30 minutes to build? I tried locally with 10K dummy files and ran out of memory before the site was built. With 1K files, the template rendering seems to be the most costly.
from mkdocs.
This is a project with 3,400 files and a very limited set of plugins, i.e., search, minify and social:
https://github.com/openfabr/terraform-provider-cdk-docs
IMHO, not many plugins, and the social plugin which I wrote employs caching, which means repeated builds are much cheaper due to leveraging cached images. I've built the project on my machine, an M2 MacBook Pro:
First build
INFO - Cleaning site directory
INFO - Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO - Documentation built in 537.92 seconds
Repeated build (social plugin cached)
INFO - Cleaning site directory
INFO - Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO - Documentation built in 487.02 seconds
It's infeasible to make edits on this project without --dirtyreload, which as mentioned is incorrect, plus the author has to wait for more than 8 minutes until the live reload server becomes available. Add a few more plugins and a few hundred more pages and you're up to 20 minutes.
from mkdocs.
I tested the repository mentioned above on my Ryzen 3600 Windows 10 PC, mkdocs==1.6.0, mkdocs-material==9.5.20
First build:
$ mkdocs build
INFO - Cleaning site directory
INFO - Building documentation to directory: C:\MyFiles\_git\removable\performance-test\site
... A lot of warnings about absolute paths etc., which could also impact performance due to printing to Terminal
INFO - Documentation built in 1335.42 seconds
Repeated build:
I do not dare to run it again.
I used my performance_debug hook.
Debug YAML result: performance_debug_first.yml.txt
More info about the categories can be found in the gist Python file. Most should be self-explanatory, though the number of files could have generated quite a bit of noise.
PLUGINS_PER_EVENTS:
on_post_page|mkdocs_minify_plugin.plugin.MinifyPlugin: 958.97267 # The main culprit of the long build time
on_page_context|material.plugins.search.plugin.SearchPlugin: 23.14142 # Expected given the amount of files
on_config|material.plugins.social.plugin.SocialPlugin: 10.61701 # on_config not expected being affected by amount of files, is it always this slow?
on_page_markdown|material.plugins.social.plugin.SocialPlugin: 1.20333 # magic of concurrency
...
on_post_build|material.plugins.social.plugin.SocialPlugin: 0.00389 # magic of concurrency
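For readers who want to collect similar numbers without the gist, here is a minimal sketch of the general technique. It is not the performance_debug hook itself: the timed decorator and the report helper are hypothetical names, and the "event|plugin" label format is only borrowed from the output above. Each call's wall-clock time is accumulated per label.

```python
import time
from collections import defaultdict
from functools import wraps

totals = defaultdict(float)  # "event|plugin" label -> accumulated seconds


def timed(label):
    """Decorator that adds each call's wall-clock duration to totals[label]."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Count the time even if the wrapped hook raises.
                totals[label] += time.perf_counter() - start
        return wrapper
    return decorate


def report(limit=5):
    """Return the slowest labels first, as (label, seconds) pairs."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```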
Currently mkdocs serve invokes the same build as mkdocs build, so the benchmark results apply there too.
The main issue is with the minify plugin. A much cheaper (performance-wise) minification, of sorts, could be achieved using Jinja2 Environment settings, which I mentioned here. Another approach would be proper enforcement of whitespace management inside the template files, via the %- tags. Perhaps some sort of alternative minify plugin needs to be released which uses some C/Rust libraries to handle the minification process.
MARKDOWN_PER_CLASSES:
pymdownx.superfences.SuperFencesBlockPreprocessor: 11.09541
markdown.treeprocessors.InlineProcessor: 10.98559
I'm surprised those markdown values are so low, as last time I checked with GMC (~190 files) the same classes had ~6 seconds each. Perhaps the complexity of the Markdown or the amount of Code Blocks has a bigger impact than I thought. But still 3k files vs 200 files and only a x2 time increase seems odd hmm
Template rendering took ~270 seconds:
TEMPLATE_ROOTS:
main.html|sum: 267.45968
This time is repeated on each re-serve without --dirtyreload.
Caching for later builds with mkdocs serve won't help much, as the slow first build immediately turns off the prospective user.
Rendering the whole docs in the background with concurrency also seems like a waste of resources when I only want to check one web page.
So I would like to see some sort of on-demand loading: serve would only process index.html and later only load pages when navigating to them. This of course breaks the last on_post_build event, as plugins expect all files to be present in the site directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile IMO.
I guess this would require forking the mkdocs serve and mkdocs build event loops? Rather risky, but it might allow for more control. Just a first off-the-top-of-my-head idea.
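The on-demand idea can be sketched as a small render-on-request cache. This is an assumption-laden illustration, not how mkdocs serve works: LazySite is a hypothetical class, render stands in for whatever turns Markdown into HTML, and the sketch sidesteps both the nav and the on_post_build problems discussed in this thread.

```python
from pathlib import Path


class LazySite:
    """Render a page on first request and reuse it until the source changes."""

    def __init__(self, docs_dir, render):
        self.docs_dir = Path(docs_dir)
        self.render = render  # hypothetical callable: Markdown text -> HTML
        self.cache = {}       # source path -> (mtime_ns, html)
        self.renders = 0      # how many times render() actually ran

    def get(self, src_path):
        """Return HTML for `src_path` (relative to docs_dir, e.g. "index.md")."""
        src = self.docs_dir / src_path
        mtime = src.stat().st_mtime_ns
        hit = self.cache.get(src_path)
        if hit and hit[0] == mtime:
            return hit[1]  # source unchanged, serve the cached HTML
        html = self.render(src.read_text(encoding="utf-8"))
        self.renders += 1
        self.cache[src_path] = (mtime, html)
        return html
```

A dev server would call get() from its request handler, so only the pages someone actually visits are ever rendered.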
from mkdocs.
I would like to be able to use parallel builds. It has been stated in #1900 that the benefit is not so high. However, I have lots of Jupyter notebooks to convert (the execute step consumes most of the time). I ended up executing all notebooks concurrently in advance.
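One way to pre-execute notebooks concurrently is sketched below. The comment above doesn't say how it was actually done, so this is an assumption: it shells out to jupyter nbconvert with --execute and --inplace, and the helper names nbconvert_command and execute_all are made up for illustration. Threads are enough here because each job is a separate subprocess.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nbconvert_command(notebook):
    """Command line that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(notebook)]


def execute_all(notebooks, workers=4, run=subprocess.run):
    """Execute notebooks concurrently, one subprocess per notebook."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(run, nbconvert_command(nb), check=True)
                for nb in notebooks]
        for job in jobs:
            job.result()  # re-raise the first failure, if any
```

After this step the site build only has to convert already-executed notebooks, which is the cheap part.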
from mkdocs.
Nice work @tomchristie!
- search indexes aren't in there just yet. yes they would require a full-site build, but we can use HTML rel=preload links to prompt them in the background, and likely also have per-page caching.
In the case of mkdocstrings and its cross-reference ability, rel=preload wouldn't be enough. To statically resolve a cross-reference, we must wait for all pages to have been built. The only way to make cross-references work when serving pages on the fly (without building everything) would be to inject some JavaScript magic. For example, the plugin would store queryable state in the server, which the client could continuously request until all needed pages were loaded with rel=preload and the unresolved references on the current page could be resolved. And since we don't know which pages are needed to resolve a reference, all pages would have to be pre-loaded anyway (or, if not all, maybe most pages, with a priority order or something).
from mkdocs.
This looks really promising! Really excited to see how this will work with more complex setups. I guess there are still things to be worked out (haven't checked the implementation), but it's a great start!
from mkdocs.