Giter Site home page Giter Site logo

Comments (19)

XhmikosR avatar XhmikosR commented on June 18, 2024 2

Alright, I successfully managed to set a CLI solution but that scans only one locale.

So, I figure we need to use the API, right? Not very familiar with it but it seems linkinator returns a Promise so my first attempt doesn't work.

If you have any time, here's my WIP branch with the API solution nodejs/nodejs.org@master...XhmikosR:master-xmr-linkinator-2

First, I want to at least land the English locale CLI scanning on nodejs/nodejs.org#2565 and then see how we can improve things with using the API directly.

from linkinator.

github-actions avatar github-actions commented on June 18, 2024 1

🎉 This issue has been resolved in version 2.11.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

from linkinator.

nkuehn avatar nkuehn commented on June 18, 2024 1

Thanks, using the server root works now!

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

Ahh, that's not how --recurse works :) The --recurse flag just signals that we are going to follow links from the top level scanned page, and go another level deeper. Everything in linkinator assumes the link checking is happening in the context of either a.) a url you've pointed it to or b.) a directory with an index.html that sprawls out. It doesn't really know anything about filesystem.

To get this to work today - you'd have to run linkinator for each directory. Something like:

for d in */ ; do
    npx linkinator --recurse "$d"
done

from linkinator.

XhmikosR avatar XhmikosR commented on June 18, 2024

The thing is that if I do that then linkinator cannot find the root. So I'm stuck :/

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

I suppose I'm a little confused. What do you mean by not being able to "find the root"? Effectively - linkinator expects that it starts from a single crawling point, and can follow links to all other links it wants to be checked.

Are you saying that the subdirectories throughout your structure also have independent roots that need to be crawled?

from linkinator.

XhmikosR avatar XhmikosR commented on June 18, 2024

If I do linkinator build/en --recurse, then linkinator throws errors because /static/ isn't in the en folder, it's one folder up.

Maybe adding a root option could help in such cases.

I'm not sure how I can make linkinator work in this case. Here's a WIP branch https://github.com/XhmikosR/nodejs.org/tree/master-xmr-linkinator

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

Thank you - I get it now! All linkinator does when you point it at a directory is naively spin up a web server from the CWD to serve requests. You could do something very similar:

npx http-server .
for d in */ ; do
    npx linkinator --recurse http://localhost:8080/$d
done

from linkinator.

XhmikosR avatar XhmikosR commented on June 18, 2024

Thanks for the reply!

The problem with this approach is that it's not cross-platform, unfortunately. Also, http-server is broken BTW. I personally use the serve package, but it seems I'm having some issues in this case.

That being said, I'll see what I can cook up. I still believe a new option for linkinator to load files from is useful though.

Also a thing I notice from a quick test is that linkinator throws for the link dns-prefetch links.

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

Using the API seems like the right thing to do here for sure. Linkinator does return a promise from the check method :) There's a simple example here:
https://github.com/JustinBeckwith/linkinator/#simple-example

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

Just wanted to check in @XhmikosR - did you end up getting this rolling with the API? I was thinking alternatively - you could create an index page at the root level (mostly for testing purposes) that has a link to all of the per-language root pages, and then scan that page.

from linkinator.

XhmikosR avatar XhmikosR commented on June 18, 2024

@JustinBeckwith I started a patch one day nodejs/nodejs.org@1cdc841 but never got to tweak it :/

Do not hesitate to PR if you have some time :) We definitely get some false positives, but my plan is to just keep a list of whitelisted domains, which will be better than nothing.

from linkinator.

nkuehn avatar nkuehn commented on June 18, 2024

Hi, I have a similar setup with microsites under the root that link to each other. I am now throwing an index.html with nothing but <a href="/foo"></a> (or multiple of these) into the root but it's not really nice. In production the webserver does a HTTP level redirect to the actual entry page, but that's not happening in the local webserver emulation here (how should it know my custom redirect...).

It would be great if the use case would be covered by the CLI, too since it's pretty good already with colored output, grouping etc.

Conceptually the issue is about being able to separate the first entrypoint from the scope of the recursion and the root path of the local webserver.

So what about an optional flag --entrypoint that allows to e.g. call

linkinator ./output-dir --entrypoint /home

With the effect that the local webserver is started on ./output-dir but the crawler starts at http://localhost:0000/home instead of defaulting to http://localhost:0000/home?

(could be a plural form too, taking multiple entrypoints)

Does that make sense? What do you think?

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

I think I may have finally cracked this nut 😆 The latest version of linkinator (2.6 and up) now supports passing multiple locations, and it supports globbing. So you should be able to do something like this:

$ linkinator '*/index.html' 

This will do a top level scan (you can of course use --recurse if you want to follow links). The path config flag accepts a string or a string[], so this should work from the CLI or the API. Now when you pass multiple paths like this, it will default the server root to process.cwd(). Just to be safe, I added a flag to override that too:

$ linkinator --server-root . "**/index.html"

@XhmikosR @nkuehn want to take this for a spin and let me know how it goes?

from linkinator.

nkuehn avatar nkuehn commented on June 18, 2024

@JustinBeckwith sounds like a nice solution, I'll give it a try. Thanks for the update!

Is the --recurse crawling shared among the matches of the glob or would linkinator individually recurse (and then maybe check some parts twice if the matches link to each other)?

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

Any time we check a link, regardless of the top level root path, it will be added to a cache and not checked again. So no double checking :) On the matching logic - the --recurse flag will match the root path given on the CLI to it's own tree of requests. So if a request starts at jbeckwith.com/page/2, during the scan of that tree it will only match links that start with jbeckwith.com/page/2.

from linkinator.

nkuehn avatar nkuehn commented on June 18, 2024

Thanks for the clarification! It's a bit embarassing but I think I am not understanding what the --server-root option really does, I assume I am just projecting what I hope it does.

Given my file layout

...
src
linkinator.config.json
public/
public/website1/
public/website1/index.html
public/website1/fooContent/index.html
public/
public/website2/
public/website2/index.html
public/website2/fooContent/index.html
...

when I call npx linkinator public/*/index.html --server-root public, I am getting

[404] http://localhost:5433/public/website1/index.html
[404] http://localhost:5433/public/website2/index.html
...

but had hoped for

[200] http://localhost:5433/website1/index.html
[200] http://localhost:5433/website2/index.html
...

I can work around it by temporarily copying linkinator.config.json into the public build output directory and then run linkinator from that directory without the --server-root option but since you built it I would prefer using the server root.

from linkinator.

JustinBeckwith avatar JustinBeckwith commented on June 18, 2024

The paths you pass into linkinator will be relative to the server-root. So you could say:

npx linkinator public/*/index.html --server-root .

Or you could say:

npx linkinator */index.html --server-root public

from linkinator.

nkuehn avatar nkuehn commented on June 18, 2024

The latter you propose was what I initially tried, but what I get is

$ npx linkinator */index.html --server-root public
[404] http://localhost:5738/public/index.html
[404] http://localhost:5738/websites/index.html

the version output is 2.7.0. "websites" accidentally also contains an index.html so the glob catches it. To me the output points towards the glob not being relative to the passed server root

from linkinator.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.