Comments (19)
Alright, I successfully managed to set a CLI solution but that scans only one locale.
So, I figure we need to use the API, right? Not very familiar with it but it seems linkinator returns a Promise so my first attempt doesn't work.
If you have any time, here's my WIP branch with the API solution nodejs/nodejs.org@master...XhmikosR:master-xmr-linkinator-2
First, I want to at least land the English locale CLI scanning on nodejs/nodejs.org#2565 and then see how we can improve things with using the API directly.
from linkinator.
🎉 This issue has been resolved in version 2.11.1 🎉
The release is available on:
Your semantic-release bot 📦🚀
from linkinator.
Thanks, using the server root works now!
from linkinator.
Ahh, that's not how --recurse
works :) The --recurse
flag just signals that we are going to follow links from the top level scanned page, and go another level deeper. Everything in linkinator
assumes the link checking is happening in the context of either a.) a url you've pointed it to or b.) a directory with an index.html
that sprawls out. It doesn't really know anything about filesystem.
To get this to work today - you'd have to run linkinator
for each directory. Something like:
for d in */ ; do
npx linkinator --recurse "$d"
done
from linkinator.
The thing is that if I do that then linkinator cannot find the root. So I'm stuck :/
from linkinator.
I suppose I'm a little confused. What do you mean by not being able to "find the root"? Effectively - linkinator expects that it starts from a single crawling point, and can follow links to all other links it wants to be checked.
Are you saying that the subdirectories throughout your structure also have independent roots that need to be crawled?
from linkinator.
If I do linkinator build/en --recurse
, then linkinator throws errors because /static/
isn't in the en
folder, it's one folder up.
Maybe adding a root option could help in such cases.
I'm not sure how I can make linkinator work in this case. Here's a WIP branch https://github.com/XhmikosR/nodejs.org/tree/master-xmr-linkinator
from linkinator.
Thank you - I get it now! All linkinator does when you point it at a directory is naively spin up a web server from the CWD to serve requests. You could do something very similar:
npx http-server .
for d in */ ; do
npx linkinator --recurse http://localhost:8080/$d
done
from linkinator.
Thanks for the reply!
The problem with this approach is that it's not cross-platform, unfortunately. Also, http-server is broken BTW. I personally use the serve package, but it seems I'm having some issues in this case.
That being said, I'll see what I can cook up. I still believe a new option for linkinator to load files from is useful though.
Also a thing I notice from a quick test is that linkinator throws for the link dns-prefetch links.
from linkinator.
Using the API seems like the right thing to do here for sure. Linkinator does return a promise from the check
method :) There's a simple example here:
https://github.com/JustinBeckwith/linkinator/#simple-example
from linkinator.
Just wanted to check in @XhmikosR - did you end up getting this rolling with the API? I was thinking alternatively - you could create an index page at the root level (mostly for testing purposes) that has a link to all of the per-language root pages, and then scan that page.
from linkinator.
@JustinBeckwith I started a patch one day nodejs/nodejs.org@1cdc841 but never got to tweak it :/
Do not hesitate to PR if you have some time :) We definitely get some false positives, but my plan is to just keep a list of whitelisted domains, which will be better than nothing.
from linkinator.
Hi, I have a similar setup with microsites under the root that link to each other. I am now throwing an index.html with nothing but <a href="/foo"></a>
(or multiple of these) into the root but it's not really nice. In production the webserver does a HTTP level redirect to the actual entry page, but that's not happening in the local webserver emulation here (how should it know my custom redirect...).
It would be great if the use case would be covered by the CLI, too since it's pretty good already with colored output, grouping etc.
Conceptually the issue is about being able to separate the first entrypoint from the scope of the recursion and the root path of the local webserver.
So what about an optional flag --entrypoint
that allows to e.g. call
linkinator ./output-dir --entrypoint /home
With the effect that the local webserver is started on ./output-dir
but the crawler starts at http://localhost:0000/home
instead of defaulting to http://localhost:0000/home
?
(could be a plural form too, taking multiple entrypoints)
Does that make sense? What do you think?
from linkinator.
I think I may have finally cracked this nut 😆 The latest version of linkinator (2.6 and up) now supports passing multiple locations, and it supports globbing. So you should be able to do something like this:
$ linkinator '*/index.html'
This will do a top level scan (you can of course use --recurse
if you want to follow links). The path
config flag accepts a string or a string[], so this should work from the CLI or the API. Now when you pass multiple paths like this, it will default the server root to process.cwd()
. Just to be safe, I added a flag to override that too:
$ linkinator --server-root . "**/index.html"
@XhmikosR @nkuehn want to take this for a spin and let me know how it goes?
from linkinator.
@JustinBeckwith sounds like a nice solution, I'll give it a try. Thanks for the update!
Is the --recurse
crawling shared among the matches of the glob or would linkinator individually recurse (and then maybe check some parts twice if the matches link to each other)?
from linkinator.
Any time we check a link, regardless of the top level root path, it will be added to a cache and not checked again. So no double checking :) On the matching logic - the --recurse
flag will match the root path given on the CLI to it's own tree of requests. So if a request starts at jbeckwith.com/page/2
, during the scan of that tree it will only match links that start with jbeckwith.com/page/2
.
from linkinator.
Thanks for the clarification! It's a bit embarassing but I think I am not understanding what the --server-root
option really does, I assume I am just projecting what I hope it does.
Given my file layout
...
src
linkinator.config.json
public/
public/website1/
public/website1/index.html
public/website1/fooContent/index.html
public/
public/website2/
public/website2/index.html
public/website2/fooContent/index.html
...
when I call npx linkinator public/*/index.html --server-root public
, I am getting
[404] http://localhost:5433/public/website1/index.html
[404] http://localhost:5433/public/website2/index.html
...
but had hoped for
[200] http://localhost:5433/website1/index.html
[200] http://localhost:5433/website2/index.html
...
I can work around it by temporarily copying linkinator.config.json
into the public
build output directory and then run linkinator from that directory without the --server-root
option but since you built it I would prefer using the server root.
from linkinator.
The paths you pass into linkinator
will be relative to the server-root. So you could say:
npx linkinator public/*/index.html --server-root .
Or you could say:
npx linkinator */index.html --server-root public
from linkinator.
The latter you propose was what I initially tried, but what I get is
$ npx linkinator */index.html --server-root public
[404] http://localhost:5738/public/index.html
[404] http://localhost:5738/websites/index.html
the version output is 2.7.0. "websites" accidentally also contains an index.html
so the glob catches it. To me the output points towards the glob not being relative to the passed server root
from linkinator.
Related Issues (20)
- linkinator and curl disagree on link status HOT 1
- Support custom base path for local files
- Retry a gaxios request with new url HOT 2
- Excluding external URLs HOT 2
- Feature Request: concurrency per host HOT 3
- bug: path without trailing slash gets 404 error for relative links on page HOT 2
- npm audit fix causing unexpected token error HOT 1
- Request: Rate Limit
- How to run linkinator for "PUG" files HOT 3
- I have a website that needs authentication with username and password. Is it possible to authenticate a website and then run linkinator? HOT 2
- Skip links starting with http doesn't work
- local server: handling extensionless links
- meta http-equiv="refresh" redirects not being handled
- Same broken link in multiple files is being mentioned only once HOT 1
- Recursively crawling local directory for Markdown files fails HOT 1
- Debug output for CSV failureDetails column appears incorrect HOT 1
- Request: Allow Passing Custom User-Agent Through Config
- Limit --recurse to a certain url
- Add a limit for the number of scanned URLs
- `--silent` CLI option is not documented
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from linkinator.