accessible-pipeline's Issues

Introduce an asynchronous, streaming runCore

The current runCore function buffers all results in memory until everything is done. Only at the end does it hand the results to the caller.

This is fine for small workloads, but it can be prohibitive for larger ones, especially given the large size of axe results.

Another issue with buffering is that streaming workloads (i.e. the CLI "reporter" API) need some hard-coded support for reporting progress. You can see this in the current implementation as a custom pino logger. While I like the current setup (unix-style piping is cool!), I think we could unify this use-case with a streaming API at the runCore level.

The current runCore API (that buffers) can be built on top of a streaming one, by collecting all of the results, and flushing at the end.

Potential next steps

Something along these lines could work:

  • Rename the current runCore to runCoreStreaming, which outputs a stream or async iterator. Instead of appending to results and returning, yield each result.
  • Add a new runCore function that calls runCoreStreaming and collects the results, returning them once the stream is done.
  • Finally, consider swapping runCore to be runCoreStreaming, and renaming the return-based runCore to runCoreBuffered or similar. (Considered; will stick with runCore for now.)
  • Add docs for this behaviour.
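
The split could look something along these lines. This is a minimal sketch: the CrawlResult shape and the url-list signatures are placeholders, not the real runCore API.

```typescript
// Minimal sketch of the proposed split; CrawlResult and the url-list
// signature are placeholders, not the real runCore API.
type CrawlResult = {url: string; violations: unknown[]};

// Streaming variant: yield each page's result as soon as it is ready.
async function* runCoreStreaming(urls: string[]): AsyncGenerator<CrawlResult> {
  for (const url of urls) {
    // The real implementation would visit the page and run axe here.
    yield {url, violations: []};
  }
}

// Buffered variant, built on top: collect everything, return at the end.
async function runCore(urls: string[]): Promise<CrawlResult[]> {
  const results: CrawlResult[] = [];
  for await (const result of runCoreStreaming(urls)) {
    results.push(result);
  }
  return results;
}
```

This way the buffered API is a thin wrapper, and streaming consumers (like the CLI reporter) can iterate directly.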

Future work

This mostly concerns the CLI use case:

  • Remove the current "streaming API" logging. Move its events to the CLI module, or a module on top of core.
  • Consider whether to still use sub-processing in the CLI, or whether a function call would suffice.

Announce the default `pageLimit` to the user

The default page limit is set to 20, to avoid accidentally bombing a site. However, this is not surfaced adequately, and can confuse the user, who sees only a few pages crawled.

Possible next steps

I think two things could be done:

  • Surface this reasoning in the documentation
  • Furthermore, if the user has not specified an alternative to the default, print a warning-level "Will run with the default option of...". This will have to be added to the streaming API as well.
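
The warning could be a check at option-resolution time, along these lines (DEFAULT_PAGE_LIMIT and resolvePageLimit are illustrative names, not existing code):

```typescript
// Sketch of announcing the default pageLimit; names are illustrative.
const DEFAULT_PAGE_LIMIT = 20;

function resolvePageLimit(userLimit?: number): number {
  if (userLimit === undefined) {
    // Warning-level, so it shows up without verbose logging enabled.
    console.warn(
      `Will run with the default pageLimit of ${DEFAULT_PAGE_LIMIT}; ` +
        `pass an explicit pageLimit to change this.`
    );
    return DEFAULT_PAGE_LIMIT;
  }
  return userLimit;
}
```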

Surface page errors in the reporter, report file

At the moment, we retry on error up to --maxRetries. Then, if the page still fails, we skip it and remove it from the "to visit" list. This is fine for not crashing the process, but can be opaque to users.

We should have a way to surface these errors:

  • Add streamingSendErrorProcessing to the reporter API
  • Find a way to report these to the report.json. Ideally, I'd like to keep the report as close to AxeResults as possible. We could make it so that the type is Result = Ok AxeResult | Err Error. It has the benefit of forcing users to account for error cases in their processors, but we might need something different.
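
The Ok/Err idea could be modelled as a discriminated union in TypeScript. A sketch, with the AxeResult shape simplified:

```typescript
// Sketch of a Result union for per-page outcomes; shapes are simplified.
type AxeResult = {url: string; violations: unknown[]};

type PageResult =
  | {kind: 'ok'; value: AxeResult}
  | {kind: 'err'; url: string; error: Error};

// Processors are forced to handle both cases when switching on `kind`.
function describeResult(result: PageResult): string {
  switch (result.kind) {
    case 'ok':
      return `${result.value.url}: ${result.value.violations.length} violations`;
    case 'err':
      return `${result.url}: failed (${result.error.message})`;
  }
}
```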

Automatically generate documentation file using tsdoc

Microsoft has tsdoc, which allows extracting comments from the source. It is built with TypeScript in mind.

I'd love a future where we can pull out the documentation and put it into a markdown file.

Potential next steps

  • Try out tsdoc, and parse the files, starting at the index
  • Generate a markdown document for it. There might be a library to do this, but it might also be a script in this repository
  • Integrate a build-docs step in the package.json

Possible bug in ignoring extensions

Spotted in live use by a coworker:

A bunch of .pdfs were crawled when they shouldn't have been, according to the parameters.

How to investigate

We should write tests for this use case.
We'd start by setting up fixtures under test/fixtures/ignoring-extensions.

We'd have the following files as fixtures:

  • index.html
  • document.pdf
  • image.jpg
  • about.html

index.html links to all the other files.

Then, we should verify that:

  • With .pdf as the ignore parameter, runCore only crawls index.html, about.html, image.jpg
  • With both .pdf and .jpg as ignore parameters, runCore only crawls index.html and about.html

Perhaps we should also verify that:

  • The CLI passes the ignore parameter correctly. Possibly a unit test with zero/one/two ignores specified. This could be done as an integration test as well, though it might be harder to verify.
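
To make this testable in isolation, the crawl decision could live in a pure helper along these lines (a sketch; the real check inside runCore may differ):

```typescript
// Sketch of the "should visit" check for ignored extensions.
function shouldVisit(url: string, ignoreExtensions: string[]): boolean {
  const pathname = new URL(url).pathname;
  return !ignoreExtensions.some(extension => pathname.endsWith(extension));
}
```

A pure function like this can be unit-tested without spinning up fixtures at all, leaving the fixture-based tests to cover the wiring.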

Add option to skip query parameters

This is similar to the option to skip fragment links.

Websites often have a bunch of "equivalent" URLs that differ only in query parameters, used either for cache-busting or for toggling functionality. Those could be made skippable via --ignoreQueryParams.

Possible steps

  • Create a test case, with fixtures under test/fixtures/skip-query-parameters.
  • Add the functionality to runCore, under Options. Implement this check in the decision-making parts ("should visit page" etc.).
  • Add the functionality to the CLI, similarly to Options.
  • Document the parameter's existence in the docs.

This list seems long, so we could take them one at a time! If you need help setting up the test cases, I'd love to help :)
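
The normalisation itself could be small. A sketch, assuming --ignoreQueryParams maps to a boolean option:

```typescript
// Sketch of normalising URLs when --ignoreQueryParams is set.
function normalizeUrl(url: string, ignoreQueryParams: boolean): string {
  if (!ignoreQueryParams) {
    return url;
  }
  const parsed = new URL(url);
  // Dropping the search string makes cache-busted URLs compare equal.
  parsed.search = '';
  return parsed.toString();
}
```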

Add TypeDoc documentation generation stage to CI

Set up CI to generate documentation every time a branch is about to merge. I don't know exactly how it should be done; maybe it is possible to plug in a "docs generation" commit pre-merge to master. Up for discussion.

Add a section with best practices

This came up when talking with James.

Some of the options are good to have, like skipFragmentLinks and ignoreExtensions. They help cut down a lot of duplicate (or irrelevant) links, which makes runs faster, and eases the load on the site.

Ideally, I'd like to have a section called "Best practices", with a section on "Tuning the pages visited":

  • Run with a low limit (e.g. 20, the default), inspect the state.json to see the queue
  • See if fragment links are valid or not
  • Check for any other extensions
  • Add those options to your script

Add a documentation section about running in CI

The aim of the module is to run in CI. We should consider how to document that.

Things like:

  1. Installing
  2. Setting log level
  3. Setting the --ci flag in the CLI
  4. Using the CLI vs the runCore options
  5. Options for reporting
  6. Guides to various CI providers

As a more contained example, we could have a guide with 1. through 4., leaving 5. and 6. as future work.

Use page.$ instead of cheerio

Back when I started this project, I could not find a way to query the DOM from Puppeteer. Thus, in order to run selectors (to find the anchors), I used cheerio. This means loading more data, running another parser, and so on. It seems brittle, and I think we can move away from it.

Possible next steps

  • Read the documentation on page.$()
  • Use the page selector in place of cheerio
  • Uninstall cheerio and its types
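
The replacement could look roughly like this. PageLike is a structural stand-in for Puppeteer's Page so the sketch stays self-contained; the real page.$$eval is more generically typed:

```typescript
// Sketch of replacing cheerio with Puppeteer's own page.$$eval.
// PageLike stands in for puppeteer.Page here.
type PageLike = {
  $$eval: (
    selector: string,
    fn: (elements: {href: string}[]) => string[]
  ) => Promise<string[]>;
};

async function getAnchorHrefs(page: PageLike): Promise<string[]> {
  // $$eval runs the callback inside the page context, so no extra
  // HTML parser (cheerio) is needed.
  return page.$$eval('a[href]', anchors => anchors.map(a => a.href));
}
```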

Error in command "accessible-pipeline run https://example.com"

Running "accessible-pipeline run https://example.com" results in the following output:

Error: spawn ./dist/cli.js ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:270:12)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn ./dist/cli.js',
  path: './dist/cli.js',
  spawnargs: [ 'run', 'https://example.com', '--ci', '--streaming' ]
}

Feature: Add the ability to log in to pages

Some flows might need the user to be logged in (via a Cookie, typically?) to get to the page we need to test.

It would be nice to have the option to do that!

I am not 100% settled on how we will achieve this.
My first thought was to add an option to read a cookie from a file.
However, that seems overly specific.

Another option would be to have users pass in a Puppeteer Page object themselves. They could log in beforehand, using selectors.

A middle ground would be to add an onBeforeAssert hook, that lets the user run scripts with the Page context. Would that be better? Worse? Horrible? Let's find out! :)
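
A sketch of what the hook could look like. onBeforeAssert is the proposed name, not an existing option, and Page is reduced to the single method the example needs:

```typescript
// Sketch of an onBeforeAssert hook; not an existing option.
// Page is reduced to the single method used in the example.
type Page = {
  setCookie: (cookie: {name: string; value: string; domain: string}) => Promise<void>;
};

interface CrawlOptions {
  onBeforeAssert?: (page: Page) => Promise<void>;
}

// Called once per page, before axe runs.
async function runBeforeAssert(page: Page, options: CrawlOptions): Promise<void> {
  if (options.onBeforeAssert) {
    await options.onBeforeAssert(page);
  }
}

// A user could then log in by setting a session cookie:
const options: CrawlOptions = {
  onBeforeAssert: async page => {
    await page.setCookie({name: 'session', value: 'secret', domain: 'example.com'});
  },
};
```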

Prior art

BackstopJS has a cookies setting that allows loading cookies from a file. Maybe that approach would work here as well?

TypeDoc improvements

There are some refinements to be done after #28 is merged.
Namely:

  1. Find out why source files are treated as "External". There are several issues open about this in the TypeDoc repo, but none has a working solution, at least I haven't found one there. Currently I just manipulate the Handlebars templates to get rid of the "External" label, which works OK but is still weird. From the discussions in TypeDoc's repo, I understand that most people have trouble with the Internal vs. External distinction, so maybe we can just ignore it and keep hiding the "External" label via Handlebars.
  2. Refine small details: I didn't change much about the default way of generating markdown documents, so it might have extra noise or slightly hard-to-read formatting. If something looks out of place, it can be changed in /typedocTheme/.

This is quite an abstract "make the docs look less like crap" issue, so any improvements are welcome; the two points above are just the ones I noticed myself while setting up the docs.

Improving the Development Experience

A few things came up during our Helsinki Hacktoberfest workshop that I'd like to work on myself.
They are mostly about the first-time usage, as well as the development loop.

In rough order:

  • Include the type-checking in CI
  • Mention the type-checking watch-mode in the Development section of the README
  • Simplify run-cli to use ts-node cli.ts, rather than the built file. Keep production: run-cli
  • Mention that running production: run-cli requires npm run build, and chmod +x dist/cli.js
  • Add a smoke test for building, in CI
  • Add an end-to-end CLI test (exits 0?)
  • Pull out the infrastructure for tests, so that new files can use them

Add option to screenshot DOM node

It might be nice to screenshot a DOM node for people to investigate issues.
However, I'm not sure how much complexity this adds!

Possible next steps

Puppeteer offers this as an API.

The main question revolves around how the user stores these.
Since the main results are stored in JSON, it is not practical to store the image binary (unless we base64 it, which would be huge).

Thus, an alternative would be to have the user provide a callback, letting them do whatever they want with the screenshot.

Something like this:

  • We'd need an Option like screenshotFailingNodes: (screenshot: Buffer) => void
  • If specified, and there are failures, screenshot the node
  • Call the callback with the resulting Buffer
  • Add relevant docs and an explanation of the rationale. This will probably be the important part!
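
Put together, the option could look like this. A sketch: screenshotFailingNodes is the proposed option, and PageLike is a structural stand-in for Puppeteer's Page (only page.$ is used here):

```typescript
// Sketch of the proposed screenshotFailingNodes option.
// PageLike stands in for puppeteer.Page; only page.$ is used here.
type PageLike = {
  $: (selector: string) => Promise<{screenshot: () => Promise<Buffer>} | null>;
};

type ScreenshotCallback = (screenshot: Buffer, selector: string) => void;

// For each failing node, take a screenshot and hand the Buffer to the
// user's callback; storage is entirely up to them.
async function screenshotFailingNodes(
  page: PageLike,
  failingSelectors: string[],
  callback: ScreenshotCallback
): Promise<void> {
  for (const selector of failingSelectors) {
    const handle = await page.$(selector);
    if (handle) {
      callback(await handle.screenshot(), selector);
    }
  }
}
```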

Add a document with different use-cases

Something that we mentioned in the Helsinki Hacktoberfest meetup.
There are different use-cases for different projects.

For example, when would you run this on CI? Would you have a hard pass/fail?
If a site has a lot of issues, perhaps it's best to ship metrics to some other service, to track progress?

I'm sure there are other scenarios that I'm missing!

Add options to cut down report file size

axe logs a lot of information, even for passes. It also stores a lot of HTML information that is probably only valuable for violations (and sometimes not even then; the streaming reporter does not use it). This can lead to a large JSON file, which can crash the reporter when running view.

A possible solution is to:

  • Remove these fields with a trimReportJSON function, taking parameters for categories (passes, violations, etc).
  • Add types for these "reduced" files, for example AxeResultsReduced
  • Surface those to the reporter and the docs for writing your own
  • Make these configurable e.g. --removeReportsFor passes,pending, with defaults

A future question is whether to remove the html parts for the violations key as well. They are not super useful in my experience, and they seem the most prone to breaking!
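
A sketch of what trimReportJSON could look like. The category names follow axe-core's top-level result keys (passes, violations, incomplete, inapplicable), which may not match the final option names exactly:

```typescript
// Sketch of trimReportJSON; categories follow axe-core's result keys.
type AxeResults = {
  passes: unknown[];
  violations: unknown[];
  incomplete: unknown[];
  inapplicable: unknown[];
};

type Category = keyof AxeResults;

// Empty out the selected categories so the report on disk stays small.
function trimReportJSON(results: AxeResults, remove: Category[]): AxeResults {
  const trimmed = {...results};
  for (const category of remove) {
    trimmed[category] = [];
  }
  return trimmed;
}
```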
