
Comments (27)

slorber avatar slorber commented on May 5, 2024 3

🤯 Didn't expect it to have such an impact.

In the end, this perf regression was a good thing 😄

from docusaurus.

andrewgbell avatar andrewgbell commented on May 5, 2024 2

Just to add, we've recently rolled back from 3.1 to 3.0.1 for this exact issue (we also have a large site). Normally it would take approx 45 mins to build, and with 3.1 it moves to just over 2 hours.

However, maybe of interest: when we initially rolled back we also updated our package-lock.json, and noticed the build times stayed the same (close to 2 hours). Reverting to the original package-lock.json we used when originally on 3.0.1, prior to our 3.1 upgrade, the build went back to 45 mins.

I've just tried it again: when using 3.0.1 and building without a package-lock.json so as to pull the latest dependencies, the build time more than doubles.

As an aside, onBrokenAnchors: "ignore" made no difference for us (and we also fixed all the broken anchors).


ravilach avatar ravilach commented on May 5, 2024 1

First off, huge fan of Docusaurus. Wanted to comment along. This might be tangential, but we also saw a roughly 2x increase in build times upgrading from Docusaurus 3.0.1 to 3.1, and ended up downgrading back to 3.0.1. We use our own CI solution, Harness CI Enterprise.

3.0.1 Builds: 8-9 mins
3.1 Builds: 17-20 mins.

We'd like to dig in a little further if anyone on the Docusaurus project side can weigh in on the broken anchors feature [https://github.com/facebook/docusaurus/pull/9528]. If that feature has to build a list of all the anchors, that step could take time on larger sites. We tried configuring onBrokenMarkdownLinks to "ignore", but I believe the check still runs and just doesn't produce or throw the output. Could "ignore" be made to skip the check entirely?

The big increase comes between Server Compile and the "done" hook.

[success] [webpackbar] Server: Compiled successfully in 7.36m
[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.
Done in 983.84s.

Node Build Version: 18.19.0

Thanks for a great project!


anaclumos avatar anaclumos commented on May 5, 2024 1

Hey

No we didn't change anything recently that could lead to such a significant difference.

But your report is not clear enough.

What was the version of Docusaurus you used before exactly?

Was using 3.0.1.

How long did it take to build previously?

It took under 30 minutes.

Can you replicate this only on your computer, or also on CI such as GitHub Actions?

My Docusaurus site is pretty big and doesn't fit on CI machines. RAM usage used to spike to 14GB during the sealing process, and all CI machines crashed at that point.

Are we even sure it's Docusaurus fault? Your log shows that Done in 12618.98s.. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.

I am sure. Every other script finishes in under 1 minute, and it's only the Docusaurus build step that hangs.

How comes you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?

I am using v18.17.1. Where did you get this information, may I ask?


slorber avatar slorber commented on May 5, 2024 1

@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.

It would be super helpful for me to be able to see/run that upgrade myself and study the package-lock.json diff.

Can someone share a site / branch that builds faster on 3.0.1, where I could reproduce the build-time regression by upgrading?


Does the onBrokenAnchor always run but does not display results if ignore. If does not run, can weed that out.

@ravilach I'd recommend trying both onBrokenLinks: "ignore" and onBrokenAnchors: "ignore", because we only "bypass" the broken link checker if both are ignored atm.

I'll try to optimize that better in the future, but in the meantime the code looks like this:

  if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
    return;
  }

  const brokenLinks = getBrokenLinks({
    routes,
    collectedLinks: normalizeCollectedLinks(collectedLinks),
  });

  reportBrokenLinks({brokenLinks, onBrokenLinks, onBrokenAnchors});
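To make the early-return condition above concrete, here is a minimal docusaurus.config.js fragment with both options set to "ignore" (the two option names are real config fields discussed in this thread; the title/url values are placeholders):

```javascript
// docusaurus.config.js — illustrative fragment, not a full config.
// With BOTH options set to 'ignore', the early return in handleBrokenLinks
// is taken and the broken link/anchor checker is skipped entirely.
module.exports = {
  title: 'My Site',           // placeholder
  url: 'https://example.com', // placeholder
  baseUrl: '/',
  onBrokenLinks: 'ignore',
  onBrokenAnchors: 'ignore',
};
```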

Note: is it possible that you encountered longer build times only due to cache eviction?

We use Webpack with persistent caching, and rebuilds are supposed to be faster.

It may be possible that your site builds longer simply because the caches were empty?

In this case I suggest trying to run docusaurus clear && docusaurus build on your "fast branch" and see if it becomes slower to build.


andrewgbell avatar andrewgbell commented on May 5, 2024 1

node_modules/@docusaurus/core/lib/server/brokenLinks.js

@slorber Yes, that worked great! Replaced the file and ran it with:

onBrokenLinks: "warn",
onBrokenAnchors: "warn",
onBrokenMarkdownLinks: "throw",

And it built just as quick as earlier. Thanks for all your help with this!


slorber avatar slorber commented on May 5, 2024 1

Awesome news then 🎉 thanks for reporting


slorber avatar slorber commented on May 5, 2024

Hey

No we didn't change anything recently that could lead to such a significant difference.

But your report is not clear enough.

What was the version of Docusaurus you used before exactly?

How long did it take to build previously?

Can you replicate this only on your computer, or also on CI such as GitHub Actions?

What was the upgrade PR?

Are we even sure it's Docusaurus's fault? Your log shows Done in 12618.98s. Please show us the time it takes to run only the Docusaurus build command, building just one language for example, and nothing else.

How come you are reporting Node.js 16.4 while Docusaurus v3.0 requires Node 18?


ravilach avatar ravilach commented on May 5, 2024

Hey
No we didn't change anything recently that could lead to such a significant difference.
But your report is not clear enough.
What was the version of Docusaurus you used before exactly?

Was using 3.0.1.

How long did it take to build previously?

It took under 30 minutes.

Can you replicate this only on your computer, or also on CI such as GitHub Actions?

My Docusaurus site is pretty big and doesn't fit on CI machines. RAM usage used to spike to 14GB the sealing process, and all CI machines crashed at this point.

Are we even sure it's Docusaurus fault? Your log shows that Done in 12618.98s.. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.

I am sure. Every other script finishes under 1 minute, and the it's only the Docusaurus build step that hangs.

How comes you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?

I am using v18.17.1. Where did you get this information, may I ask?

+1 on the sealing process, where resource usage/time also spikes for us. Was anything added to that process from 3.0.1 -> 3.1, e.g. the broken anchors check? Thanks!


OzakIOne avatar OzakIOne commented on May 5, 2024

If it's coming from the broken anchors check, you may try setting onBrokenAnchors to "ignore" in your docusaurus.config file.

Maybe you can disable it in your CI but still keep a build somewhere that you run manually / every so often to check for broken links / anchors.


ravilach avatar ravilach commented on May 5, 2024

If it's coming from the brokenAnchor you may try to put onBrokenAnchors in docusaurus.config file to ignore

Maybe you can disable it in your CI but still have a build process somewhere that you run manually / every few times to check for broken links / anchors

Thanks @OzakIOne, it's a great feature. Curious: we noticed the same behavior with ignore. Does the onBrokenAnchors check always run but just not display results when set to "ignore"? If it doesn't run, we can rule that out.


slorber avatar slorber commented on May 5, 2024

@anaclumos I tried using your repo before the upgrade (https://github.com/anaclumos/extracranial/tree/f144432acdfff55d741a1dbc568ae0b51dd052fe) but the usage of Bun package manager makes it inconvenient to troubleshoot.

First, when I run bun install on your repo with the latest Bun version, it seems to resolve newer versions within the Docusaurus dependency ranges and modify your bun.lockb file:

(screenshot: bun.lockb shown as modified after running bun install)

Then, the binary format of the lockfile makes it super inconvenient to inspect and diff.

Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.


andrewgbell avatar andrewgbell commented on May 5, 2024

@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.

It would be super helpful for me to be able see/run that upgrade myself and study the package-lock.json diff.

Can someone share a site / branch that build faster in 3.0.1, and where I could reproduce the build time regression by upgrading.

Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json files from both runs if that's of any use, and potentially the build log files if there's anything in particular you need?


slorber avatar slorber commented on May 5, 2024

Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json's from both runs if any use and potentially the build log files if there's anything particular you need?

@andrewgbell I'd have to run this locally myself, partially upgrading some libs in a bisecting way to find out which transitive dep causes the problem. I doubt seeing a diff will be enough to identify the problem, unfortunately; I need to run the code.


ravilach avatar ravilach commented on May 5, 2024

@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.

It would be super helpful for me to be able see/run that upgrade myself and study the package-lock.json diff.

Can someone share a site / branch that build faster in 3.0.1, and where I could reproduce the build time regression by upgrading.

Does the onBrokenAnchor always run but does not display results if ignore. If does not run, can weed that out.

@ravilach I'd recommend trying both onBrokenLinks: "ignore" and onBrokenAnchors: "ignore", because we only "bypass" the broken link checker if both are ignored atm.

I'll try to optimize that better in the future, but in the meantime the code looks like this:

  if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
    return;
  }

  const brokenLinks = getBrokenLinks({
    routes,
    collectedLinks: normalizeCollectedLinks(collectedLinks),
  });

  reportBrokenLinks({brokenLinks, onBrokenLinks, onBrokenAnchors});

Note: is it possible that you encountered longer build times only due to cache eviction?

We use Webpack with persistent caching and on rebuilds it's supposed to rebuild faster.

It may be possible that your site builds longer simply because the caches were empty?

In this case I suggest trying to run docusaurus clear && docusaurus build on your "fast branch" and see if it becomes slower to build.

Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json's from both runs if any use and potentially the build log files if there's anything particular you need?

@andrewgbell I'd have to run this locally myself, partially upgrading some libs in a dichotomic way to find out which transitive dep cause the problem. I doubt seeing a diff will be enough to identify the problem unfortunately, I need to run the code.

Ours is Open Source: https://github.com/harness/developer-hub if that helps. Currently on DS 3.0.1.

Here is the yarn.lock from the 3.1 upgrade: https://github.com/harness/developer-hub/blob/7b5fbafc4036f61d30e094362a67204cc573cf7a/yarn.lock


andrewgbell avatar andrewgbell commented on May 5, 2024

@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.
It would be super helpful for me to be able see/run that upgrade myself and study the package-lock.json diff.
Can someone share a site / branch that build faster in 3.0.1, and where I could reproduce the build time regression by upgrading.

Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json's from both runs if any use and potentially the build log files if there's anything particular you need?

@slorber If you need another repo, let me know as I can invite you into our org.


slorber avatar slorber commented on May 5, 2024

Still investigating your site @ravilach, but it looks like there are 2 problems:

  • the broken link checker now using node new URL() is much slower (edit: it's not that; it's the matchRoutes calls)
  • a transitive dependency is causing longer build times (I suspect related to postcss or css-loader)

Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?

yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
  • onBrokenLinks: "ignore"
  • onBrokenAnchors: "ignore"


ravilach avatar ravilach commented on May 5, 2024

Still investigating your site @ravilach, but it looks like there are 2 problems:

* the broken link checker now using node `new URL()` is much slower

* a transitive dependency is causing longer build times (I suspect related to postcss or css-loader)

Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?

yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
* `onBrokenLinks: "ignore"`

* `onBrokenAnchors: "ignore"`

Thanks @slorber, much appreciated!


anaclumos avatar anaclumos commented on May 5, 2024

Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.

I migrated to pnpm.


andrewgbell avatar andrewgbell commented on May 5, 2024

Still investigating your site @ravilach, but it looks like there are 2 problems:

  • the broken link checker now using node new URL() is much slower
  • a transitive dependency is causing longer build times (I suspect related to postcss or css-loader)

Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?

yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
  • onBrokenLinks: "ignore"
  • onBrokenAnchors: "ignore"

Hi, I've added:

  onBrokenLinks: "ignore",
  onBrokenAnchors: "ignore",
  onBrokenMarkdownLinks: "throw",

alongside running

npm upgrade @docusaurus/core @docusaurus/cssnano-preset @docusaurus/plugin-client-redirects @docusaurus/plugin-debug @docusaurus/plugin-google-analytics @docusaurus/plugin-google-gtag @docusaurus/plugin-sitemap @docusaurus/preset-classic @docusaurus/theme-classic @docusaurus/theme-mermaid @docusaurus/theme-search-algolia @docusaurus/module-type-aliases @docusaurus/tsconfig

And build time dropped back to expected (in fact a few minutes quicker, approx 40 mins). I then tried removing
onBrokenAnchors: "ignore",

However, build time went back up to over 2 hours.

I've also tried adding these ignores again

  onBrokenLinks: "ignore",
  onBrokenAnchors: "ignore",
  onBrokenMarkdownLinks: "throw",

but this time upgrading the whole package-lock.json again. As of today, it runs about 10% slower than the run above (about 45 mins), which is a huge improvement on where we were last week, so I'm not sure if a dependency has been updated since.

So it looks like you're correct on the two issues, though the broken links and anchors check seems to have a far greater impact.



slorber avatar slorber commented on May 5, 2024

Thanks for reporting @andrewgbell

I've submitted a PR that should optimize things, likely faster than before: #9778

So far it seems to work on @ravilach site.


Could you give it a test by building locally with this modified file?

node_modules/@docusaurus/core/lib/server/brokenLinks.js

"use strict";
/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */
Object.defineProperty(exports, "__esModule", { value: true });
exports.handleBrokenLinks = void 0;
const tslib_1 = require("tslib");
const lodash_1 = tslib_1.__importDefault(require("lodash"));
const logger_1 = tslib_1.__importDefault(require("@docusaurus/logger"));
const react_router_config_1 = require("react-router-config");
const utils_1 = require("@docusaurus/utils");
const utils_2 = require("./utils");
function matchRoutes(routeConfig, pathname) {
    // @ts-expect-error: React router types RouteConfig with an actual React
    // component, but we load route components with string paths.
    // We don't actually access component here, so it's fine.
    return (0, react_router_config_1.matchRoutes)(routeConfig, pathname);
}
function createBrokenLinksHelper({ collectedLinks, routes, }) {
    const validPathnames = new Set(collectedLinks.keys());
    // Matching against the route array can be expensive
    // If the route is already in the valid pathnames,
    // we can avoid matching against it as an optimization
    const remainingRoutes = routes
        .filter((route) => !validPathnames.has(route.path));
    function isPathnameMatchingAnyRoute(pathname) {
        if (matchRoutes(remainingRoutes, pathname).length > 0) {
            // IMPORTANT: this is an optimization here
            // See https://github.com/facebook/docusaurus/issues/9754
            // Large Docusaurus sites have many routes!
            // We try to minimize calls to a possibly expensive matchRoutes function
            validPathnames.add(pathname);
            return true;
        }
        return false;
    }
    function isPathBrokenLink(linkPath) {
        const pathnames = [linkPath.pathname, decodeURI(linkPath.pathname)];
        if (pathnames.some((p) => validPathnames.has(p))) {
            return false;
        }
        if (pathnames.some(isPathnameMatchingAnyRoute)) {
            return false;
        }
        return true;
    }
    function isAnchorBrokenLink(linkPath) {
        const { pathname, hash } = linkPath;
        // Link has no hash: it can't be a broken anchor link
        if (hash === undefined) {
            return false;
        }
        // Link has empty hash ("#", "/page#"...): we do not report it as broken
        // Empty hashes are used for various weird reasons, by us and other users...
        // See for example: https://github.com/facebook/docusaurus/pull/6003
        if (hash === '') {
            return false;
        }
        const targetPage = collectedLinks.get(pathname) || collectedLinks.get(decodeURI(pathname));
        // link with anchor to a page that does not exist (or did not collect any
        // link/anchor) is considered as a broken anchor
        if (!targetPage) {
            return true;
        }
        // it's a not broken anchor if the anchor exists on the target page
        if (targetPage.anchors.has(hash) ||
            targetPage.anchors.has(decodeURIComponent(hash))) {
            return false;
        }
        return true;
    }
    return {
        collectedLinks,
        isPathBrokenLink,
        isAnchorBrokenLink,
    };
}
function getBrokenLinksForPage({ pagePath, helper, }) {
    const pageData = helper.collectedLinks.get(pagePath);
    const brokenLinks = [];
    pageData.links.forEach((link) => {
        const linkPath = (0, utils_1.parseURLPath)(link, pagePath);
        if (helper.isPathBrokenLink(linkPath)) {
            brokenLinks.push({
                link,
                resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
                anchor: false,
            });
        }
        else if (helper.isAnchorBrokenLink(linkPath)) {
            brokenLinks.push({
                link,
                resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
                anchor: true,
            });
        }
    });
    return brokenLinks;
}
/**
 * The route defs can be recursive, and have a parent match-all route. We don't
 * want to match broken links like /docs/brokenLink against /docs/*. For this
 * reason, we only consider the "final routes" that do not have subroutes.
 * We also need to remove the match-all 404 route
 */
function filterIntermediateRoutes(routesInput) {
    const routesWithout404 = routesInput.filter((route) => route.path !== '*');
    return (0, utils_2.getAllFinalRoutes)(routesWithout404);
}
function getBrokenLinks({ collectedLinks, routes, }) {
    const filteredRoutes = filterIntermediateRoutes(routes);
    const helper = createBrokenLinksHelper({
        collectedLinks,
        routes: filteredRoutes,
    });
    const result = {};
    collectedLinks.forEach((_unused, pagePath) => {
        try {
            result[pagePath] = getBrokenLinksForPage({
                pagePath,
                helper,
            });
        }
        catch (e) {
            throw new Error(`Unable to get broken links for page ${pagePath}.`, {
                cause: e,
            });
        }
    });
    return result;
}
function brokenLinkMessage(brokenLink) {
    const showResolvedLink = brokenLink.link !== brokenLink.resolvedLink;
    return `${brokenLink.link}${showResolvedLink ? ` (resolved as: ${brokenLink.resolvedLink})` : ''}`;
}
function createBrokenLinksMessage(pagePath, brokenLinks) {
    const type = brokenLinks[0]?.anchor === true ? 'anchor' : 'link';
    const anchorMessage = brokenLinks.length > 0
        ? `- Broken ${type} on source page path = ${pagePath}:
   -> linking to ${brokenLinks
            .map(brokenLinkMessage)
            .join('\n   -> linking to ')}`
        : '';
    return `${anchorMessage}`;
}
function createBrokenAnchorsMessage(brokenAnchors) {
    if (Object.keys(brokenAnchors).length === 0) {
        return undefined;
    }
    return `Docusaurus found broken anchors!

Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken anchors found:
${Object.entries(brokenAnchors)
        .map(([pagePath, brokenLinks]) => createBrokenLinksMessage(pagePath, brokenLinks))
        .join('\n')}
`;
}
function createBrokenPathsMessage(brokenPathsMap) {
    if (Object.keys(brokenPathsMap).length === 0) {
        return undefined;
    }
    /**
     * If there's a broken link appearing very often, it is probably a broken link
     * on the layout. Add an additional message in such case to help user figure
     * this out. See https://github.com/facebook/docusaurus/issues/3567#issuecomment-706973805
     */
    function getLayoutBrokenLinksHelpMessage() {
        const flatList = Object.entries(brokenPathsMap).flatMap(([pagePage, brokenLinks]) => brokenLinks.map((brokenLink) => ({ pagePage, brokenLink })));
        const countedBrokenLinks = lodash_1.default.countBy(flatList, (item) => item.brokenLink.link);
        const FrequencyThreshold = 5; // Is this a good value?
        const frequentLinks = Object.entries(countedBrokenLinks)
            .filter(([, count]) => count >= FrequencyThreshold)
            .map(([link]) => link);
        if (frequentLinks.length === 0) {
            return '';
        }
        return logger_1.default.interpolate `

It looks like some of the broken links we found appear in many pages of your site.
Maybe those broken links appear on all pages through your site layout?
We recommend that you check your theme configuration for such links (particularly, theme navbar and footer).
Frequent broken links are linking to:${frequentLinks}`;
    }
    return `Docusaurus found broken links!

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.${getLayoutBrokenLinksHelpMessage()}

Exhaustive list of all broken links found:
${Object.entries(brokenPathsMap)
        .map(([pagePath, brokenPaths]) => createBrokenLinksMessage(pagePath, brokenPaths))
        .join('\n')}
`;
}
function splitBrokenLinks(brokenLinks) {
    const brokenPaths = {};
    const brokenAnchors = {};
    Object.entries(brokenLinks).forEach(([pathname, pageBrokenLinks]) => {
        const [anchorBrokenLinks, pathBrokenLinks] = lodash_1.default.partition(pageBrokenLinks, (link) => link.anchor);
        if (pathBrokenLinks.length > 0) {
            brokenPaths[pathname] = pathBrokenLinks;
        }
        if (anchorBrokenLinks.length > 0) {
            brokenAnchors[pathname] = anchorBrokenLinks;
        }
    });
    return { brokenPaths, brokenAnchors };
}
function reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors, }) {
    // We need to split the broken links reporting in 2 for better granularity
    // This is because we need to report broken path/anchors independently
    // For v3.x retro-compatibility, we can't throw by default for broken anchors
    // TODO Docusaurus v4: make onBrokenAnchors throw by default?
    const { brokenPaths, brokenAnchors } = splitBrokenLinks(brokenLinks);
    const pathErrorMessage = createBrokenPathsMessage(brokenPaths);
    if (pathErrorMessage) {
        logger_1.default.report(onBrokenLinks)(pathErrorMessage);
    }
    const anchorErrorMessage = createBrokenAnchorsMessage(brokenAnchors);
    if (anchorErrorMessage) {
        logger_1.default.report(onBrokenAnchors)(anchorErrorMessage);
    }
}
// Users might use the useBrokenLinks() API in weird unexpected ways
// JS users might call "collectLink(undefined)" for example
// TS users might call "collectAnchor('#hash')" with/without #
// We clean/normalize the collected data to avoid obscure errors being thrown
// We also use optimized data structures for a faster algorithm
function normalizeCollectedLinks(collectedLinks) {
    const result = new Map();
    Object.entries(collectedLinks).forEach(([pathname, pageCollectedData]) => {
        result.set(pathname, {
            links: new Set(pageCollectedData.links.filter(lodash_1.default.isString)),
            anchors: new Set(pageCollectedData.anchors
                .filter(lodash_1.default.isString)
                .map((anchor) => (anchor.startsWith('#') ? anchor.slice(1) : anchor))),
        });
    });
    return result;
}
async function handleBrokenLinks({ collectedLinks, onBrokenLinks, onBrokenAnchors, routes, }) {
    if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
        return;
    }
    const brokenLinks = getBrokenLinks({
        routes,
        collectedLinks: normalizeCollectedLinks(collectedLinks),
    });
    reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors });
}
exports.handleBrokenLinks = handleBrokenLinks;
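The core of the fix can be illustrated in isolation: positive matches are memoized into a Set, so a possibly expensive matcher runs at most once per distinct pathname. A standalone sketch (not Docusaurus code; all names here are made up for illustration):

```javascript
// Standalone illustration of the validPathnames optimization above:
// cache positive results so the slow matcher runs at most once per pathname.
function makeChecker(expensiveMatch) {
  const known = new Set(); // pathnames already proven valid
  let expensiveCalls = 0;  // how often the slow path actually ran
  return {
    matches(pathname) {
      if (known.has(pathname)) return true; // O(1) cache hit
      expensiveCalls += 1;
      if (expensiveMatch(pathname)) {
        known.add(pathname); // remember the positive result
        return true;
      }
      return false;
    },
    get expensiveCalls() {
      return expensiveCalls;
    },
  };
}

// Three checks of the same pathname trigger only one expensive match.
const checker = makeChecker((p) => p.startsWith('/docs/'));
checker.matches('/docs/intro');
checker.matches('/docs/intro');
checker.matches('/docs/intro');
console.log(checker.expensiveCalls); // 1
```

On a large site where many collected links point at the same handful of routes, this turns most matchRoutes calls into set lookups.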


slorber avatar slorber commented on May 5, 2024

Thanks @andrewgbell

Do you also see an improvement? On @ravilach's site (which I simplified a bit: just 1 docs plugin instance instead of 5), I see a significant improvement in the time to handle broken links and in total build time.

3.0
handleBrokenLinks: 3:28.361 (m:ss.mmm)
✨  Done in 636.47s.

3.1 before
handleBrokenLinks: 6:32.570 (m:ss.mmm)
✨  Done in 785.73s.

3.1 after optimizations
handleBrokenLinks: 694.907ms
✨  Done in 361.92s.


andrewgbell avatar andrewgbell commented on May 5, 2024

Hi @slorber, sorry, yes. I'd been comparing 3.1 with the optimisations against 3.1 with broken links ignored, so hadn't spotted it.

But looking again we get:

3.0 build time with handleBrokenLinks - 54 mins

3.1 (without fix) build time with handleBrokenLinks - 137 mins

3.1 (with fix) build time with handleBrokenLinks - 41 mins

So very significant! Thanks!


anaclumos avatar anaclumos commented on May 5, 2024

Just updated. It's even faster than before!! Thank you so much 😃


slorber avatar slorber commented on May 5, 2024

Awesome news @anaclumos

Do you mind sharing numbers? How much faster is it?


anaclumos avatar anaclumos commented on May 5, 2024

It used to take around 20 minutes. Now it finishes around 11 minutes.

