accessible-pipeline's Issues

Introduce an asynchronous, streaming runCore

The current runCore function buffers all results in memory until everything is done. Only at the end does it hand the results to the caller.

This is fine for small workloads, but it can be prohibitive for larger ones, especially given the large size of axe results.

Another issue with buffering is that streaming workloads (i.e. the CLI "reporter" API) need some hard-coded support for reporting progress. You can see this in the current implementation as a custom pino logger. While I like the current setup (unix-style piping is cool!), I think we could unify this use-case with a streaming API at the runCore level.

The current runCore API (that buffers) can be built on top of a streaming one, by collecting all of the results, and flushing at the end.

Potential next steps

Something along these lines could work:

  • Rename the current runCore to runCoreStreaming, which outputs a stream or async iterator. Instead of appending to results and returning, yield each result.
  • Add a new runCore function that calls runCoreStreaming and collects the results, returning them once the stream is done.
  • Finally, consider swapping runCore to be runCoreStreaming, and renaming the return-based runCore to runCoreBuffered or similar. (Considered; will stick with runCore for now.)
  • Add docs for this behaviour.
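
The split could look something along these lines. This is a minimal sketch: the CrawlResult shape and the url-list signatures are placeholders, not the real runCore API.

```typescript
// Minimal sketch of the proposed split; CrawlResult and the url-list
// signature are placeholders, not the real runCore API.
type CrawlResult = {url: string; violations: unknown[]};

// Streaming variant: yield each page's result as soon as it is ready.
async function* runCoreStreaming(urls: string[]): AsyncGenerator<CrawlResult> {
  for (const url of urls) {
    // The real implementation would visit the page and run axe here.
    yield {url, violations: []};
  }
}

// Buffered variant, built on top: collect everything, return at the end.
async function runCore(urls: string[]): Promise<CrawlResult[]> {
  const results: CrawlResult[] = [];
  for await (const result of runCoreStreaming(urls)) {
    results.push(result);
  }
  return results;
}
```

This way the buffered API is a thin wrapper, and streaming consumers (like the CLI reporter) can iterate directly.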

Future work

This mostly concerns the CLI use case:

  • Remove the current "streaming API" logging. Move its events to the CLI module, or a module on top of core.
  • Consider whether to still use sub-processing in the CLI, or whether a function call would suffice.

Announce the default `pageLimit` to the user

The default page limit is set to 20, to avoid accidentally bombing a site. However, this is not surfaced adequately, and can confuse the user, who sees only a few pages crawled.

Possible next steps

I think two things could be done:

  • Surface this reasoning in the documentation
  • Furthermore, if the user has not specified an alternative to the default, print a warning-level "Will run with the default option of...". This will have to be added to the streaming API as well.
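
The warning could be a check at option-resolution time, along these lines (DEFAULT_PAGE_LIMIT and resolvePageLimit are illustrative names, not existing code):

```typescript
// Sketch of announcing the default pageLimit; names are illustrative.
const DEFAULT_PAGE_LIMIT = 20;

function resolvePageLimit(userLimit?: number): number {
  if (userLimit === undefined) {
    // Warning-level, so it shows up without verbose logging enabled.
    console.warn(
      `Will run with the default pageLimit of ${DEFAULT_PAGE_LIMIT}; ` +
        `pass an explicit pageLimit to change this.`
    );
    return DEFAULT_PAGE_LIMIT;
  }
  return userLimit;
}
```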

Surface page errors in the reporter, report file

At the moment, we retry on error up to --maxRetries. Then, if the page still fails, we skip it and remove it from the "to visit" list. This is fine for not crashing the process, but can be opaque to users.

We should have a way to surface these errors:

  • Add streamingSendErrorProcessing to the reporter API
  • Find a way to report these to the report.json. Ideally, I'd like to keep the report as close to AxeResults as possible. We could make it so that the type is Result = Ok AxeResult | Err Error. It has the benefit of forcing users to account for error cases in their processors, but we might need something different.
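
The Ok/Err idea could be modelled as a discriminated union in TypeScript. A sketch, with the AxeResult shape simplified:

```typescript
// Sketch of a Result union for per-page outcomes; shapes are simplified.
type AxeResult = {url: string; violations: unknown[]};

type PageResult =
  | {kind: 'ok'; value: AxeResult}
  | {kind: 'err'; url: string; error: Error};

// Processors are forced to handle both cases when switching on `kind`.
function describeResult(result: PageResult): string {
  switch (result.kind) {
    case 'ok':
      return `${result.value.url}: ${result.value.violations.length} violations`;
    case 'err':
      return `${result.url}: failed (${result.error.message})`;
  }
}
```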

Automatically generate documentation file using tsdoc

Microsoft has tsdoc, which allows extracting comments from the source. It is built with TypeScript in mind.

I'd love a future where we can pull out the documentation and put it into a markdown file.

Potential next steps

  • Try out tsdoc, and parse the files, starting at the index
  • Generate a markdown document for it. There might be a library to do this, but it might also be a script in this repository
  • Integrate a build-docs step in the package.json

Possible bug in ignoring extensions

Spotted in live use by a coworker:

A bunch of .pdfs were crawled when they shouldn't have been, according to the parameters.

How to investigate

We should write tests for this use case.
We'd start by setting up fixtures under test/fixtures/ignoring-extensions.

We'd have the following files as fixtures:

  • index.html
  • document.pdf
  • image.jpg
  • about.html

index.html links to all the other files.

Then, we should verify that:

  • With .pdf as the ignore parameter, runCore only crawls index.html, about.html, image.jpg
  • With both .pdf and .jpg as ignore parameters, runCore only crawls index.html and about.html

Perhaps we should also verify that:

  • The CLI passes the ignore parameter correctly. Possibly a unit test with zero/one/two ignores specified. This could be done as an integration test as well, though it might be harder to verify.
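
To make this testable in isolation, the crawl decision could live in a pure helper along these lines (a sketch; the real check inside runCore may differ):

```typescript
// Sketch of the "should visit" check for ignored extensions.
function shouldVisit(url: string, ignoreExtensions: string[]): boolean {
  const pathname = new URL(url).pathname;
  return !ignoreExtensions.some(extension => pathname.endsWith(extension));
}
```

A pure function like this can be unit-tested without spinning up fixtures at all, leaving the fixture-based tests to cover the wiring.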

Add option to skip query parameters

This is similar to the option to skip fragment links.

Websites often have a bunch of "equivalent" URLs that differ only in query parameters, used either for cache-busting or for toggling functionality. Those could be made skippable via --ignoreQueryParams.

Possible steps

  • Create a test case, with fixtures under test/fixtures/skip-query-parameters.
  • Add the functionality to runCore, under Options. Implement this check in the decision-making parts ("should visit page" etc.).
  • Add the functionality to the CLI, similarly to Options.
  • Document the parameter's existence in the docs.

This list seems long, so we could take them one at a time! If you need help setting up the test cases, I'd love to help :)
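
The normalisation itself could be small. A sketch, assuming --ignoreQueryParams maps to a boolean option:

```typescript
// Sketch of normalising URLs when --ignoreQueryParams is set.
function normalizeUrl(url: string, ignoreQueryParams: boolean): string {
  if (!ignoreQueryParams) {
    return url;
  }
  const parsed = new URL(url);
  // Dropping the search string makes cache-busted URLs compare equal.
  parsed.search = '';
  return parsed.toString();
}
```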

Add TypeDoc documentation generation stage to CI

Set up CI to generate documentation every time a branch is about to merge. I don't know exactly how it should be done; maybe it is possible to plug in a "docs generation" commit pre-merge to master. Up for discussion.

Add a section with best practices

This came up when talking with James.

Some of the options are good to have, like skipFragmentLinks and ignoreExtensions. They help cut down a lot of duplicate (or irrelevant) links, which makes runs faster, and eases the load on the site.

Ideally, I'd like to have a section called "Best practices", with a section on "Tuning the pages visited":

  • Run with a low limit (e.g. 20, the default), inspect the state.json to see the queue
  • See if fragment links are valid or not
  • Check for any other extensions
  • Add those options to your script

Add a documentation section about running in CI

The aim of the module is to run in CI. We should consider how to document that.

Things like:

  1. Installing
  2. Setting log level
  3. Setting the --ci flag in the CLI
  4. Using the CLI vs the runCore options
  5. Options for reporting
  6. Guides to various CI providers

As a more contained example, we could have a guide with 1. through 4., leaving 5. and 6. as future work.

Use page.$ instead of cheerio

Back when I started this project, I could not find a way to query the DOM from Puppeteer. Thus, in order to run selectors (to find the anchors), I used cheerio. This means loading more data, running another parser, and so on. It seems brittle, and I think we can move away from it.

Possible next steps

  • Read the documentation on page.$()
  • Use the page selector in place of cheerio
  • Uninstall cheerio and its types
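
The replacement could look roughly like this. PageLike is a structural stand-in for Puppeteer's Page so the sketch stays self-contained; the real page.$$eval is more generically typed:

```typescript
// Sketch of replacing cheerio with Puppeteer's own page.$$eval.
// PageLike stands in for puppeteer.Page here.
type PageLike = {
  $$eval: (
    selector: string,
    fn: (elements: {href: string}[]) => string[]
  ) => Promise<string[]>;
};

async function getAnchorHrefs(page: PageLike): Promise<string[]> {
  // $$eval runs the callback inside the page context, so no extra
  // HTML parser (cheerio) is needed.
  return page.$$eval('a[href]', anchors => anchors.map(a => a.href));
}
```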

Error in command "accessible-pipeline run https://example.com"

Running "accessible-pipeline run https://example.com" results in the following output:

Error: spawn ./dist/cli.js ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:270:12)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn ./dist/cli.js',
  path: './dist/cli.js',
  spawnargs: [ 'run', 'https://example.com', '--ci', '--streaming' ]
}

Feature: Add the ability to log in to pages

Some flows might need the user to be logged in (via a Cookie, typically?) to get to the page we need to test.

It would be nice to have the option to do that!

I am not 100% settled on how we will achieve this.
My first thought was to add an option to read a cookie from a file.
However, that seems overly specific.

Another option would be to have users pass in a Puppeteer Page object themselves. They could log in beforehand, using selectors.

A middle ground would be to add an onBeforeAssert hook, that lets the user run scripts with the Page context. Would that be better? Worse? Horrible? Let's find out! :)
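
A sketch of what the hook could look like. onBeforeAssert is the proposed name, not an existing option, and Page is reduced to the single method the example needs:

```typescript
// Sketch of an onBeforeAssert hook; not an existing option.
// Page is reduced to the single method used in the example.
type Page = {
  setCookie: (cookie: {name: string; value: string; domain: string}) => Promise<void>;
};

interface CrawlOptions {
  onBeforeAssert?: (page: Page) => Promise<void>;
}

// Called once per page, before axe runs.
async function runBeforeAssert(page: Page, options: CrawlOptions): Promise<void> {
  if (options.onBeforeAssert) {
    await options.onBeforeAssert(page);
  }
}

// A user could then log in by setting a session cookie:
const options: CrawlOptions = {
  onBeforeAssert: async page => {
    await page.setCookie({name: 'session', value: 'secret', domain: 'example.com'});
  },
};
```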

Prior art

BackstopJS has a cookies setting that allows loading cookies from a file. Maybe that approach would work here as well?

TypeDoc improvements

There are some refinements to be done after #28 is merged.
Namely:

  1. Find out why source files are treated as "External". There are several issues open about this in the TypeDoc repo, but none has a working solution, at least I haven't found one there. Currently I just manipulate the Handlebars templates to get rid of the "External" label, which works OK but is still weird. From the discussions in TypeDoc's repo, I understand that most people have trouble with the Internal vs. External distinction, so maybe we can just ignore it and keep hiding the "External" label via Handlebars.
  2. Refine small details: I didn't change much about the default way of generating markdown documents, so it might have extra noise or slightly hard-to-read formatting. If something looks out of place, it can be changed in /typedocTheme/.

This is quite an abstract "make the docs look less like crap" issue, so any improvements are welcome; the two points above are just the ones I noticed myself while setting up the docs.

Improving the Development Experience

A few things came up during our Helsinki Hacktoberfest workshop that I'd like to work on myself.
They are mostly about the first-time usage, as well as the development loop.

In rough order:

  • Include the type-checking in CI
  • Mention the type-checking watch-mode in the Development section of the README
  • Simplify run-cli to use ts-node cli.ts, rather than the built file. Keep production: run-cli
  • Mention that running production: run-cli requires npm run build, and chmod +x dist/cli.js
  • Add a smoke test for building, in CI
  • Add an end-to-end CLI test (exits 0?)
  • Pull out the infrastructure for tests, so that new files can use them

Add option to screenshot DOM node

It might be nice to screenshot a DOM node for people to investigate issues.
However, I'm not sure how much complexity this adds!

Possible next steps

Puppeteer offers this as an API.

The main question revolves around how the user stores these.
Since the main results are stored in JSON, it is not practical to store the image binary (unless we base64 it, which would be huge).

Thus, an alternative would be to have the user provide a callback, letting them do whatever they want with the screenshot.

Something like this:

  • We'd need an Option like screenshotFailingNodes: (screenshot: Buffer) => void
  • If specified, and there are failures, screenshot the node
  • Call the callback with the resulting Buffer
  • Add relevant docs and an explanation of the rationale. This will probably be the important part!
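
Put together, the option could look like this. A sketch: screenshotFailingNodes is the proposed option, and PageLike is a structural stand-in for Puppeteer's Page (only page.$ is used here):

```typescript
// Sketch of the proposed screenshotFailingNodes option.
// PageLike stands in for puppeteer.Page; only page.$ is used here.
type PageLike = {
  $: (selector: string) => Promise<{screenshot: () => Promise<Buffer>} | null>;
};

type ScreenshotCallback = (screenshot: Buffer, selector: string) => void;

// For each failing node, take a screenshot and hand the Buffer to the
// user's callback; storage is entirely up to them.
async function screenshotFailingNodes(
  page: PageLike,
  failingSelectors: string[],
  callback: ScreenshotCallback
): Promise<void> {
  for (const selector of failingSelectors) {
    const handle = await page.$(selector);
    if (handle) {
      callback(await handle.screenshot(), selector);
    }
  }
}
```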

Add a document with different use-cases

Something that we mentioned in the Helsinki Hacktoberfest meetup.
There are different use-cases for different projects.

For example, when would you run this on CI? Would you have a hard pass/fail?
If a site has a lot of issues, perhaps it's best to ship metrics to some other service, to track progress?

I'm sure there are other scenarios that I'm missing!

Add options to cut down report file size

axe logs a lot of information, even for passes. It also stores a lot of HTML information that is probably only valuable for violations (and sometimes not even then; the streaming reporter does not use it). This can lead to a large JSON file, which can crash the reporter when running view.

A possible solution is to:

  • Remove these fields with a trimReportJSON function, taking parameters for categories (passes, violations, etc).
  • Add types for these "reduced" files, for example AxeResultsReduced
  • Surface those to the reporter and the docs for writing your own
  • Make these configurable e.g. --removeReportsFor passes,pending, with defaults

A future question is whether to remove the html parts for the violations key as well. They are not super useful in my experience, and they seem the most prone to breaking!
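
A sketch of what trimReportJSON could look like. The category names follow axe-core's top-level result keys (passes, violations, incomplete, inapplicable), which may not match the final option names exactly:

```typescript
// Sketch of trimReportJSON; categories follow axe-core's result keys.
type AxeResults = {
  passes: unknown[];
  violations: unknown[];
  incomplete: unknown[];
  inapplicable: unknown[];
};

type Category = keyof AxeResults;

// Empty out the selected categories so the report on disk stays small.
function trimReportJSON(results: AxeResults, remove: Category[]): AxeResults {
  const trimmed = {...results};
  for (const category of remove) {
    trimmed[category] = [];
  }
  return trimmed;
}
```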
