webhintio / hint
💡 A hinting engine for the web
Home Page: https://webhint.io/
License: Apache License 2.0
Discussion moved from #87 (comment):

Continuing the discussion about the encoding, there are a couple of things we can do.

HTML5 defaults to `utf8`, but previous versions used `ISO-8859-1`, and Node only supports a handful of encodings directly. jsdom uses iconv-lite to do text transformations. I don't know how popular that encoding (or any other) is in non-western cultures.

We could:
- Accept the PR as it is, with a known issues section in the documentation linking to an issue to fix it.
- Use iconv-lite to add support for the same encodings, and maybe look into contributing back for the most popular missing ones. We will need to see what happens with the jsdom collector, because we are using request to get the initial HTML, which by default uses `utf8` and only supports the same encodings Node does.

I'm not sure what percentage of the web uses non-`utf8` encodings, but we should check it and add support if it is significant, if we want sonar to be successful.
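To make the trade-off concrete, here is a minimal sketch (hypothetical helper names `getCharset` and `decodeBody`) that decodes a response body with Node's built-in `Buffer` encodings when it can and returns `null` when a library such as iconv-lite would be needed. It assumes the charset comes from the `Content-Type` header. Note that browsers decode the `iso-8859-1` label as windows-1252, so mapping it to Node's `latin1` is only an approximation for bytes 0x80 to 0x9F.

```typescript
// Node Buffer encodings that can decode a body without extra dependencies.
type BuiltInEncoding = 'utf8' | 'latin1' | 'ascii' | 'utf16le';

// Assumption: charset labels not listed here would need iconv-lite (or similar).
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const BUILT_IN_ENCODINGS = new Map<string, BuiltInEncoding>([
    ['utf-8', 'utf8'],
    ['utf8', 'utf8'],
    ['iso-8859-1', 'latin1'],
    ['latin1', 'latin1'],
    ['us-ascii', 'ascii'],
    ['utf-16le', 'utf16le']
]);

// Extract the charset parameter from a Content-Type header value, falling back
// to UTF-8 (an assumption: browsers may instead sniff or use a locale default).
const getCharset = (contentType?: string): string => {
    const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');

    return match ? match[1].toLowerCase() : 'utf-8';
};

// Decode natively when possible; null tells the caller a library is needed.
const decodeBody = (body: Buffer, contentType?: string): string | null => {
    const encoding = BUILT_IN_ENCODINGS.get(getCharset(contentType));

    return encoding ? body.toString(encoding) : null;
};
```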
Right now the value passed after the event name is `undefined` or `null` (`emitAsync('fetch::end', null, networkData);`). I think the reason is that we want to put there the element that triggered the request, but I don't think that's the way to go.

We are analyzing that resource and we need a way to identify it exactly so we can group errors related to it. Also, some events will not have a source tag.

The initiator should come in the `networkData` object and be optional.
Try running #84 with `npm run site -- http://edge.ms` and you will see that issues are not well grouped; it makes more sense to have the URL directly there instead of looking for it under `request` or `response`.
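A sketch of the proposed shape, assuming hypothetical names (`NetworkData.resource`, `report`, `groupByResource`) rather than the project's actual types: the analyzed URL travels as a top-level property, the initiator is optional, and problems can be grouped by URL without looking under `request` or `response`.

```typescript
// Hypothetical shape: the analyzed URL is a top-level property, and the
// initiator (e.g. the element that triggered the request) is optional.
interface NetworkData {
    resource: string;
    initiator?: string;
    request: { headers: { [name: string]: string } };
    response: { statusCode: number; headers: { [name: string]: string } };
}

interface Problem {
    resource: string;
    message: string;
}

// Rules report against the URL directly instead of digging into request/response.
const report = (data: NetworkData, message: string): Problem => ({ resource: data.resource, message });

// Group problems by the resource they refer to, preserving report order.
const groupByResource = (problems: Problem[]): Map<string, string[]> => {
    const groups = new Map<string, string[]>();

    for (const { resource, message } of problems) {
        const messages = groups.get(resource) || [];

        messages.push(message);
        groups.set(resource, messages);
    }

    return groups;
};
```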
We need to:
- `<!doctype html><html><head>` in the same line

Look into how SSL Server Test can be integrated.
See also:
Epic to track all the places where we need to add tests (everywhere). The goal should be to have 90%+ code coverage.
`"DENY"` on pages that allow a user to make a state-changing operation (e.g.: login pages, pages that contain one-click purchase links, checkout or bank-transfer confirmation pages, pages that make permanent configuration changes, etc.)

See also:
Right now the CDP sends a redirect event, but it is not very complete. I'm thinking it should be something like:

```jsonc
{
    "source": "string", // The original URL that initiated the request
    "hops": ["string"] // All the hops we've done so far
}
```

We could also add the `hops` property to the `fetch::end` and `targetfetch::end` events and not trigger another event. What do you think, @alrra?
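A minimal sketch of how the `source`/`hops` object could be built, assuming the collector keeps a map from each redirecting URL to its `Location` header value (that map and the `collectHops` name are hypothetical). The same object could be attached to `fetch::end` and `targetfetch::end` or sent as its own event.

```typescript
// Proposed payload: where the request started and every hop followed so far.
interface RedirectInfo {
    source: string;
    hops: string[];
}

// Assumption: the collector records each 3xx response as URL -> Location value.
// Relative Location values are resolved against the URL that returned them.
const collectHops = (source: string, redirects: Map<string, string>): RedirectInfo => {
    const hops: string[] = [];
    let current = source;
    let location = redirects.get(current);

    while (location !== undefined) {
        const next = new URL(location, current).href;

        // Stop on redirect loops instead of following them forever.
        if (next === source || hops.indexOf(next) !== -1) {
            break;
        }

        hops.push(next);
        current = next;
        location = redirects.get(current);
    }

    return { source, hops };
};
```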
Right now we are doing:
- `export const name = ...` in `types.ts`
- `module.exports = { ... }` in `jsdom.ts`

We should pick a syntax and stick to it throughout the project.
We should do several things:
- `interfaces.ts`. All are interfaces IIRC.
- `interfaces.ts` as a way to aggregate all and export them if it makes sense.
- `any` as possible, probably by adding more interfaces.

We are not currently testing that `fetch::error` is properly emitted (and `CDP` doesn't emit it). We need to fix this.
- `Strict-Transport-Security` is sent for resources served over HTTPS.
- `max-age` value is small (less than 10886400 seconds, i.e. 18 weeks, should be an error; no configuration possible).

Also inform users about https://hstspreload.org/? However, make it clear that:
- `subdomains` in the long run.

See also:
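For the `max-age` check in the HSTS list above, a minimal sketch with hypothetical `parseSts` and `checkSts` helpers. The 10886400-second threshold comes from the issue; the parsing is deliberately simple (it does not reject duplicate directives, for example).

```typescript
// Minimum accepted max-age from the proposal: 10886400 seconds (18 weeks).
const MIN_MAX_AGE = 10886400;

interface StsDirectives {
    maxAge?: number;
    includeSubDomains: boolean;
    preload: boolean;
}

// Parse a Strict-Transport-Security value; directive names are
// case-insensitive and max-age may be quoted.
const parseSts = (value: string): StsDirectives => {
    const result: StsDirectives = { includeSubDomains: false, preload: false };

    for (const directive of value.split(';')) {
        const [rawName, rawValue] = directive.split('=');
        const directiveName = rawName.trim().toLowerCase();

        if (directiveName === 'max-age' && rawValue !== undefined) {
            result.maxAge = parseInt(rawValue.trim().replace(/"/g, ''), 10);
        } else if (directiveName === 'includesubdomains') {
            result.includeSubDomains = true;
        } else if (directiveName === 'preload') {
            result.preload = true;
        }
    }

    return result;
};

// Hypothetical rule logic: returns an error message, or null if the header is fine.
const checkSts = (servedOverHttps: boolean, header?: string): string | null => {
    if (!servedOverHttps) {
        return null; // browsers ignore the header over plain HTTP
    }
    if (header === undefined) {
        return 'Strict-Transport-Security header was not sent';
    }

    const { maxAge } = parseSts(header);

    if (maxAge === undefined || isNaN(maxAge)) {
        return "'max-age' is missing or invalid";
    }
    if (maxAge < MIN_MAX_AGE) {
        return `'max-age' should be at least ${MIN_MAX_AGE} seconds`;
    }

    return null;
};
```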
Not all rules (especially ones that do network-related checks) make sense with local files, so currently those types of rules need extra checks to work properly.

Wouldn't it be better if rules specified (e.g.: via a property in `meta`) whether or not they work with local files?
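One way the proposal could look, assuming a hypothetical `worksWithLocalFiles` boolean in each rule's `meta` (not an existing property): the engine filters rules up front instead of each rule adding its own checks.

```typescript
// Hypothetical metadata: `worksWithLocalFiles` is the proposed new property.
interface RuleMeta {
    docs: { category: string; description: string };
    worksWithLocalFiles: boolean;
}

interface Rule {
    id: string;
    meta: RuleMeta;
}

// Skip rules that declare they need the network when the target is a local file.
const selectRules = (rules: Rule[], target: URL): Rule[] => {
    const isLocalFile = target.protocol === 'file:';

    return rules.filter((rule) => !isLocalFile || rule.meta.worksWithLocalFiles);
};
```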
Make collectors:
We use `rule-runner` to test our rules. Right now, the biggest problem is that we need to create a bit of infrastructure around it to test the rules (some mocks, relying on `jsdom`, etc.).

We should refactor `rule-runner` in such a way that:
- `collectors` (although this should be configurable because of the limitations with Travis)

We should probably have a special web server we can control via the configuration in the rules. The port should be random so we can spin up several at the same time (tests are run in parallel). It should accept text to return as HTML, as well as a folder for static resources, plus a way to configure some of the response headers.
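A minimal sketch of such a server using Node's `http` module, assuming a hypothetical route config (a string for HTML, or an object with `statusCode`, `content`, and `headers`). Serving a folder of static resources is left out to keep it short. Listening on port 0 is how the random port is obtained.

```typescript
import * as http from 'http';

// Hypothetical route config: a string is returned as HTML with a 200, while an
// object allows an explicit status code, body, and extra response headers.
export type RouteConfig = string | {
    statusCode?: number;
    content?: string;
    headers?: { [name: string]: string };
};

export interface ResolvedResponse {
    statusCode: number;
    content: string;
    headers: { [name: string]: string };
}

// Pure function so routing can be unit tested without opening a socket.
export const resolveResponse = (config: { [path: string]: RouteConfig }, path: string): ResolvedResponse => {
    const route = Object.prototype.hasOwnProperty.call(config, path) ? config[path] : undefined;

    if (route === undefined) {
        return { statusCode: 404, content: 'Not found', headers: {} };
    }
    if (typeof route === 'string') {
        return { statusCode: 200, content: route, headers: { 'Content-Type': 'text/html' } };
    }

    return {
        statusCode: route.statusCode || 200,
        content: route.content || '',
        headers: route.headers || {}
    };
};

// Port 0 lets the OS pick a free port, so several servers can run in
// parallel; the real port is read back once the server is listening.
export const startServer = (config: { [path: string]: RouteConfig }): Promise<{ server: http.Server; port: number }> =>
    new Promise((resolve) => {
        const server = http.createServer((req, res) => {
            const { statusCode, content, headers } = resolveResponse(config, (req.url || '/').split('?')[0]);

            res.writeHead(statusCode, headers);
            res.end(content);
        });

        server.listen(0, () => {
            const address = server.address();

            resolve({ server, port: address !== null && typeof address === 'object' ? address.port : 0 });
        });
    });
```

In a test, awaiting `startServer` would give the `server` and `port` to point the collector at, with `server.close()` called once the rule has run.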
Add a new collector that supports the Chrome Debugging Protocol.
Since the Zopfli output (for the `gzip` option) is valid `gzip` content, there doesn't seem to be a straightforward and foolproof way to identify files compressed with Zopfli.

From an email discussion with @lvandeve:

> There is no way to tell for sure. Adding information to the output to indicate zopfli, would actually add bits to the output so such thing is not done :) Any compressor can set the FLG, MTIME, and so on to anything it wants, and users of zopfli can also change the MTIME bytes that zopfli had output to an actual time.
>
> One heuristic to tell that it was compressed with zopfli or another dense deflate compressor is to compress it with regular gzip -9 (which is fast), and compare that the size of the file to test is for example more than 3% smaller.
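The heuristic from that email could look like the following sketch. It assumes Node's `zlib` at level 9 is a reasonable stand-in for the `gzip -9` command line tool, and `isProbablyDenseDeflate` is a hypothetical name. A `true` result only says that a denser deflate compressor (Zopfli or similar) was likely used.

```typescript
import { gunzipSync, gzipSync } from 'zlib';

// Recompress the original content with regular gzip at level 9 and check
// whether the file we were served is more than `threshold` (3%) smaller.
export const isProbablyDenseDeflate = (gzipped: Buffer, threshold = 0.03): boolean => {
    const original = gunzipSync(gzipped);
    const regular = gzipSync(original, { level: 9 });

    return gzipped.length < regular.length * (1 - threshold);
};
```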
gzip

A `gzip` member header has the following structure:

```
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
```

where:
- `ID1` = `1f` and `ID2` = `8b`: these are the magic numbers that uniquely identify the content as being `gzip`.
- `CM` = `8`: this denotes the deflate compression method, which is customarily used by `gzip`.
- `FLG` and `MTIME` are usually non-zero values.
- `XFL` will be either `0`, `2`, or `4`:
  - `0`: default, the compressor used intermediate levels of compression (when any of the `-2` ... `-8` options are used).
  - `2`: the compressor used maximum compression, slowest algorithm (when the `-9` or `--best` option is used).
  - `4`: the compressor used the fastest algorithm (when the `-1` or `--fast` option is used).

Zopfli
One thing that Zopfli does is set `FLG` and `MTIME` to zero, `XFL` to `2`, and `OS` to `3`, so files compressed with Zopfli will most likely start with `1f8b 0800 0000 0000 0203`, unless things are changed by the user (which in general doesn't seem very likely to happen).

Now, regular `gzip` output might also start with that, even though the chance of doing so is smaller:
- `XFL` set to `2`.
- `gzip` will have non-zero values for `MTIME` and `FLG`.

So, if a file doesn't start with `1f8b 0800 0000 0000 0203`, it's a good (not perfect) indication that Zopfli wasn't used, and it's a fast check compared to compressing files and comparing file sizes. However, if a file does start with that, it can be either Zopfli or `gzip`, and we cannot really make assumptions here.
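A sketch of the fast header pre-check described above (hypothetical names `parseGzipHeader` and `mayBeZopfli`), based on the fixed 10-byte member header from RFC 1952, where multi-byte fields such as `MTIME` are stored least-significant byte first. As explained, a `true` result is inconclusive.

```typescript
// First 10 bytes Zopfli usually writes: ID1 ID2 CM FLG MTIME(4) XFL OS.
const ZOPFLI_HEADER = [0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x03];

interface GzipHeader {
    cm: number;
    flg: number;
    mtime: number;
    xfl: number;
    os: number;
}

// Parse the fixed 10-byte gzip member header; null if the magic numbers are absent.
const parseGzipHeader = (data: Buffer): GzipHeader | null => {
    if (data.length < 10 || data[0] !== 0x1f || data[1] !== 0x8b) {
        return null;
    }

    return {
        cm: data[2],
        flg: data[3],
        mtime: data.readUInt32LE(4), // MTIME is stored little-endian
        xfl: data[8],
        os: data[9]
    };
};

// Fast pre-check: false means Zopfli was (very likely) not used;
// true is inconclusive, since regular gzip output can match too.
const mayBeZopfli = (data: Buffer): boolean =>
    data.length >= ZOPFLI_HEADER.length && ZOPFLI_HEADER.every((byte, i) => data[i] === byte);
```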
The documentation will contain a lot of links, so we should have an automatic process to detect broken ones.
TODO:
Look into how the Nu HTML Checker can be integrated.
We should avoid using `any` in TypeScript as much as possible.
This is part of #75, but at a larger scale.
`Set-Cookie` header is sent with the `Secure` and `HttpOnly` attributes if the page is served over HTTPS.

See also:
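A minimal sketch for the cookie check, with a hypothetical `missingCookieAttributes` helper. It inspects one `Set-Cookie` value; Node's `http` module exposes `set-cookie` as an array of strings, so a rule would call it once per cookie, and only for pages served over HTTPS.

```typescript
// Return which of Secure/HttpOnly are missing from one Set-Cookie value.
// The first segment is the cookie's name=value pair, so it is skipped;
// attribute names are compared case-insensitively.
const missingCookieAttributes = (setCookie: string): string[] => {
    const attributes = setCookie
        .split(';')
        .slice(1)
        .map((part) => part.split('=')[0].trim().toLowerCase());

    return ['Secure', 'HttpOnly'].filter((required) => attributes.indexOf(required.toLowerCase()) === -1);
};
```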
Check if something is served compressed using `gzip`:

Make a request with the `Accept-Encoding: gzip` header, then verify that the response is served with the `Content-Encoding: gzip` header and that the body of the response starts with `1f 8b`.
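The `gzip` check above as a small sketch, assuming a hypothetical response object with lower-cased header names and the raw (not yet decompressed) body.

```typescript
// Assumed response shape: lower-cased header names and the raw body bytes.
interface RawResponse {
    headers: { [name: string]: string | undefined };
    body: Buffer;
}

// After requesting with `Accept-Encoding: gzip`, verify that the server both
// labeled and actually encoded the body as gzip (magic number 1f 8b).
const isServedGzipped = (response: RawResponse): boolean => {
    const encoding = (response.headers['content-encoding'] || '').trim().toLowerCase();

    return encoding === 'gzip' &&
        response.body.length >= 2 &&
        response.body[0] === 0x1f &&
        response.body[1] === 0x8b;
};
```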
What should be compressed with `gzip`:

| File type | Commonly used file extension(s) | Commonly used media type(s) |
|---|---|---|
| Atom | `.atom` | `application/atom+xml` |
| App Cache Manifest | `.appcache` | `text/cache-manifest` |
| BMP | `.bmp` | `image/bmp` |
| CSS | `.css` | `text/css` |
| Cursor images | `.cur` | `image/x-icon`, `image/vnd.microsoft.icon` |
| Embedded OpenType font | `.eot` | `application/vnd.ms-fontobject` |
| Favicon | `.ico` | `image/x-icon`, `image/vnd.microsoft.icon` |
| HTML | `.html`, `.htm`, ... | `text/html`, `application/xhtml+xml` |
| HTML Components | `.htc` | `text/x-component` |
| JavaScript | `.js` | `application/javascript`, `text/javascript` |
| JSON | `.json`, ... | `application/json`, `application/<something>+json` |
| OpenType font | `.otf` | `font/opentype` |
| RDF | `.rdf` | `application/rdf+xml` |
| RSS | `.rss` | `application/rss+xml` |
| Source Maps | `.map` | `application/json` |
| SVG | `.svg` | `image/svg+xml` |
| TrueType font | `.ttc`, `.ttf` | `application/x-font-ttf` |
| TXT | `.txt` | `text/plain` |
| vCard | `.vcard`, `.vcf` | `text/vcard` |
| VTT | `.vtt` | `text/vtt` |
| XML | `.xml`, ... | `application/xml`, `text/xml`, `application/<something>+xml` |
| Web App Manifest | `.webmanifest`, `.json` | `application/manifest+json` |
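A sketch of how the table could be turned into a lookup (the `shouldBeCompressed` name is hypothetical). Parameters such as `charset` are ignored, and the `application/<something>+json` and `+xml` rows become a pattern.

```typescript
// Media types from the table above.
const COMPRESSIBLE_TYPES = new Set([
    'application/atom+xml', 'text/cache-manifest', 'image/bmp', 'text/css',
    'image/x-icon', 'image/vnd.microsoft.icon', 'application/vnd.ms-fontobject',
    'text/html', 'application/xhtml+xml', 'text/x-component',
    'application/javascript', 'text/javascript', 'application/json',
    'font/opentype', 'application/rdf+xml', 'application/rss+xml',
    'image/svg+xml', 'application/x-font-ttf', 'text/plain', 'text/vcard',
    'text/vtt', 'application/xml', 'text/xml', 'application/manifest+json'
]);

// True if a response with this Content-Type should be served compressed.
const shouldBeCompressed = (contentType: string): boolean => {
    const mediaType = contentType.split(';')[0].trim().toLowerCase();

    // The table also covers any `application/<something>+json` or `+xml` type.
    return COMPRESSIBLE_TYPES.has(mediaType) || /^application\/[^/]+\+(json|xml)$/.test(mediaType);
};
```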
Notes:
- `WOFF` fonts should not be compressed (see: h5bp/server-configs-apache#42).
- `SVGZ` files should be served with the `Content-Encoding: gzip` header, as they are already compressed.

Check if something is served compressed using Zopfli:
Same as with `gzip`, except that we need to detect Zopfli.

What should be compressed with Zopfli:

Same as with `gzip`.
Note(s):
- `WOFF` fonts already use deflate compression internally (which can be produced with Zopfli).

Check if something is served compressed using Brotli:
Make a request with the `Accept-Encoding: br` header, then verify that the response is served with the `Content-Encoding: br` header (no magic numbers?).

Note: Brotli-compressed responses should be served only over HTTPS.
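Since Brotli output has no magic number, one option (an assumption, not something settled in the issue) is to check `Content-Encoding` and then try to decode the body with Node's built-in Brotli support (`zlib.brotliDecompressSync`, available since Node 11.7). Successful decoding is strong evidence, not proof; `isServedWithBrotli` is a hypothetical name.

```typescript
import { brotliCompressSync, brotliDecompressSync } from 'zlib';

// After checking Content-Encoding, try to decode the body and treat a
// decoding error as "not actually Brotli".
export const isServedWithBrotli = (headers: { [name: string]: string | undefined }, body: Buffer): boolean => {
    if ((headers['content-encoding'] || '').trim().toLowerCase() !== 'br') {
        return false;
    }

    try {
        brotliDecompressSync(body);

        return true;
    } catch {
        return false;
    }
};

// Example: a body that really is Brotli-encoded passes the check.
console.log(isServedWithBrotli({ 'content-encoding': 'br' }, brotliCompressSync(Buffer.from('<p>hello</p>'))));
```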
What should be compressed with Brotli:

Same as with `gzip`.

Note: This is just a starting point; we will probably split this into more specific issues / rules.
- `<script>` or `<link>` elements don't have the `integrity` attribute
- `<script>` or `<link>` elements doesn’t match the associated `integrity` value

See also:
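For the second check, a minimal sketch of computing and comparing integrity values (`<algorithm>-<base64 digest>`) with Node's `crypto` module; the helper names are hypothetical. It simplifies the Subresource Integrity matching rules: any listed value with a supported algorithm may match, whereas the spec only considers the strongest algorithm present.

```typescript
import { createHash } from 'crypto';

type SriAlgorithm = 'sha256' | 'sha384' | 'sha512';

const isSriAlgorithm = (value: string): value is SriAlgorithm =>
    value === 'sha256' || value === 'sha384' || value === 'sha512';

// Build an integrity value ("<algorithm>-<base64 digest>") for some content.
export const computeIntegrity = (content: Buffer | string, algorithm: SriAlgorithm = 'sha384'): string =>
    `${algorithm}-${createHash(algorithm).update(content).digest('base64')}`;

// Simplified check: the attribute may hold several space-separated values
// (any "?options" suffix is dropped) and the content matches if any
// supported value does.
export const matchesIntegrity = (content: Buffer | string, integrity: string): boolean =>
    integrity
        .trim()
        .split(/\s+/)
        .map((token) => token.split('?')[0])
        .some((token) => {
            const algorithm = token.split('-')[0];

            return isSriAlgorithm(algorithm) && computeIntegrity(content, algorithm) === token;
        });
```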
By default, the collectors will not request certain resources (e.g.: manifest file, all font files, etc.).
Since quite a few rules will need to analyze those requests/resources, it wouldn't make sense to have every rule add custom code to request them, so we should add a helper to do that and notify the subscribed rules.
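As one small piece of such a helper, a sketch (hypothetical names `findSourceMapUrl` and `shouldRequest`) that finds the source map referenced by a script and makes sure each extra resource is requested only once, however many subscribed rules want it. It takes the last `sourceMappingURL` comment (also accepting the legacy `//@` form) and resolves it against the script URL.

```typescript
// Find the last `//# sourceMappingURL=...` comment (or the legacy `//@` form)
// in a script and resolve it against the script's own URL.
const findSourceMapUrl = (script: string, scriptUrl: string): string | null => {
    const pattern = /\/\/[#@]\s*sourceMappingURL=(\S+)\s*$/gm;
    let last: string | null = null;
    let match: RegExpExecArray | null;

    while ((match = pattern.exec(script)) !== null) {
        last = match[1];
    }

    return last === null ? null : new URL(last, scriptUrl).href;
};

// Hypothetical de-duplication: each extra resource is requested only once,
// however many subscribed rules are interested in it.
const requestedUrls = new Set<string>();

const shouldRequest = (url: string): boolean => {
    if (requestedUrls.has(url)) {
        return false;
    }
    requestedUrls.add(url);

    return true;
};
```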
What we might want to request (incomplete list):
- (`<link rel="manifest" href="site.webmanifest">`)
- `rss` and `atom` files (`<link rel="alternate" ... href="...">`)
- (`@font-face` rules)
- (`//# sourceMappingURL=example.js.map`)

Or test if it works with just the following config:
```js
'/path': {
    statusCode: 301,
    content: '/'
}
```
TODO: determine exactly what should be checked and how.

- Check if the meta tag is not included completely within the first 1024 bytes of the document (done by the markup validator, see: #28).
- Check if the meta tag is not specified as early as possible (before any content that could be controlled by an attacker, such as a `<title>` element), so as to avoid a potential encoding-related security issue in Internet Explorer (Note: was this only an issue with IE6?).
- Check if non-`utf-8` encodings are used.
- Check if things like `utf8` are used (even though this is valid nowadays, as the specifications and browsers alias `utf8` to `utf-8`, that wasn't the case in the past).
- Check if the short version is used (i.e.: `<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">` => `<meta charset="utf-8">`).
- Other?
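A regex-based sketch of the first, third, fourth, and fifth checks (hypothetical `checkCharsetMeta`). A real rule would use the parsed DOM; this version only recognizes `<meta charset=...>` and `<meta http-equiv="Content-Type" content=...>` with the attributes in that order. The 1024 limit is measured in bytes, not characters.

```typescript
interface CharsetReport {
    found: boolean;
    withinFirst1024Bytes: boolean;
    isUtf8: boolean;                // utf-8 or the utf8 alias
    usesPreferredSpelling: boolean; // "utf-8" rather than "utf8"
    usesShortForm: boolean;         // <meta charset> rather than http-equiv
}

const checkCharsetMeta = (html: string): CharsetReport => {
    const shortForm = /<meta\s+charset\s*=\s*["']?([^"'\s/>]+)[^>]*>/i.exec(html);
    const longForm = /<meta\s+http-equiv\s*=\s*["']?content-type["']?\s+content\s*=\s*["'][^"']*charset=([^"'\s;]+)[^>]*>/i.exec(html);
    const match = shortForm || longForm;

    if (match === null) {
        return { found: false, withinFirst1024Bytes: false, isUtf8: false, usesPreferredSpelling: false, usesShortForm: false };
    }

    const value = match[1].toLowerCase();
    // Measure the UTF-8 byte length of everything up to the end of the tag.
    const endInBytes = Buffer.byteLength(html.slice(0, match.index + match[0].length), 'utf8');

    return {
        found: true,
        withinFirst1024Bytes: endInBytes <= 1024,
        isUtf8: value === 'utf-8' || value === 'utf8',
        usesPreferredSpelling: value === 'utf-8',
        usesShortForm: match === shortForm
    };
};
```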
`nosniff`.

See also:
Servers, frameworks, and server-side languages (e.g.: ASP.NET, PHP) often set, by default, HTTP headers with values that contain information about them: their name, version number, etc.

Sending those types of HTTP headers does not provide any value to users, contributes to header bloat, and just gives potential attackers more information about the technology stack being used.

List of headers:
- `Server`
- `X-AspNet-Version`
- `X-AspNetMvc-Version`
- `X-Powered-By`
- `X-Runtime`
- `X-Version`
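Detecting those headers is a case-insensitive lookup. A minimal sketch with a hypothetical `findDisclosingHeaders` helper that returns the header names as the server sent them:

```typescript
// Headers from the list above, lower-cased because header names are case-insensitive.
const DISCLOSING_HEADERS = ['server', 'x-aspnet-version', 'x-aspnetmvc-version', 'x-powered-by', 'x-runtime', 'x-version'];

// Return the names (as sent) of response headers that disclose the stack.
const findDisclosingHeaders = (headers: { [name: string]: string }): string[] =>
    Object.keys(headers).filter((headerName) => DISCLOSING_HEADERS.indexOf(headerName.toLowerCase()) !== -1);
```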
Although the project started as plain JavaScript, we are going to migrate to TypeScript to help with the documentation and development process (intellisense, type checking, etc.).
This issue covers (if needed):
Sinon has released a new major version since we started writing the tests for this project. We need to update as soon as possible, while we don't use it that much, so we don't get left behind.
`1; mode=block`. (?)

See also:
- Check if the tag is included and has `width=device-width`.
- Check for values that create a bad user experience, such as `user-scalable=no`.
- Check for values that are ignored, such as `user-scalable`, `minimum-scale`, and `maximum-scale`.
From https://webkit.org/blog/7367/new-interaction-behaviors-in-ios-10/:
"Now, we ignore the user-scalable, min-scale and max-scale settings. If you have content that disabled zoom, please test it on iOS 10, and understand that many users will be zooming now."
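A sketch of the three viewport checks (hypothetical helper names). It uses the actual viewport property names `minimum-scale` and `maximum-scale` and treats a missing tag as `undefined`.

```typescript
// Parse a viewport `content` value such as "width=device-width, initial-scale=1"
// into lower-cased key/value pairs.
const parseViewport = (content: string): Map<string, string> => {
    const properties = new Map<string, string>();

    for (const pair of content.split(/[,;]/)) {
        const [key, value = ''] = pair.split('=').map((part) => part.trim().toLowerCase());

        if (key) {
            properties.set(key, value);
        }
    }

    return properties;
};

// `undefined` means the page has no viewport meta tag at all.
const checkViewport = (content: string | undefined): string[] => {
    if (content === undefined) {
        return ['viewport meta tag is missing'];
    }

    const properties = parseViewport(content);
    const problems: string[] = [];

    if (properties.get('width') !== 'device-width') {
        problems.push("'width' should be 'device-width'");
    }
    if (properties.get('user-scalable') === 'no') {
        problems.push("'user-scalable=no' prevents users from zooming");
    }
    for (const ignored of ['user-scalable', 'minimum-scale', 'maximum-scale']) {
        if (properties.has(ignored)) {
            problems.push(`'${ignored}' is ignored by Safari on iOS 10+`);
        }
    }

    return problems;
};
```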
See also:
Yes, we want to group them, and I think it should be via the `meta` object. We can use `meta.docs.category` or any other property we want. I don't think `extends` is a good idea, because that should be used to extend a configuration set. We could add an option in `.sonarrc` to enable all rules within a category (that also works with the command line). It could be something like:

```json
"categories": ["webapp", "security"]
```

That can be mixed with the rules (rules will have higher priority, so if we enable a category but disable a rule, all the rules for that category except that one will be enabled).

Problems with this approach:
- How to set up the severity (e.g.: `error`); it could be an object instead of just an array of strings
- How to set up the extended configuration (maybe it just enables the default values and then the user needs to configure further via rules)
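A sketch of the proposed precedence, under assumptions the issue leaves open: categories come from `meta.docs.category`, category-enabled rules default to `error` (the severity question above is unresolved), and entries under `rules` always win. `resolveRules` and the config shape are hypothetical.

```typescript
type Severity = 'off' | 'warning' | 'error';

interface RuleInfo {
    name: string;
    category: string; // e.g. taken from meta.docs.category
}

// Hypothetical config: categories enable all their rules, and explicit
// `rules` entries override that (so a category can be on with one rule off).
interface CategoryConfig {
    categories?: string[];
    rules?: { [name: string]: Severity };
}

const resolveRules = (allRules: RuleInfo[], config: CategoryConfig): Map<string, Severity> => {
    const enabledCategories = new Set(config.categories || []);
    const result = new Map<string, Severity>();

    for (const rule of allRules) {
        if (enabledCategories.has(rule.category)) {
            result.set(rule.name, 'error'); // assumed default severity for category-enabled rules
        }
    }

    const explicit = config.rules || {};

    for (const ruleName of Object.keys(explicit)) {
        result.set(ruleName, explicit[ruleName]); // rules have higher priority than categories
    }

    return result;
};
```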
`X-WebKit-CSP`, `X-Content-Security-Policy`).

TODO: Look into what other checks we can add for this (e.g.: validate the content of the header, `upgrade-insecure-requests`).

See also:
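A possible starting point for those checks, assuming the goal is to flag the prefixed headers listed above and to look for `upgrade-insecure-requests` in the standard `Content-Security-Policy` header. `checkCsp` is a hypothetical name, and it does not validate directive values.

```typescript
// Prefixed headers from the list above, lower-cased for comparison.
const PREFIXED_CSP_HEADERS = ['x-webkit-csp', 'x-content-security-policy'];

interface CspReport {
    prefixedHeaders: string[];
    hasCsp: boolean;
    upgradesInsecureRequests: boolean;
}

const checkCsp = (headers: { [name: string]: string }): CspReport => {
    const headerNames = Object.keys(headers);
    const cspName = headerNames.find((headerName) => headerName.toLowerCase() === 'content-security-policy');
    // Directive names are the first token of each ';'-separated part.
    const directives: string[] = cspName === undefined ? [] :
        headers[cspName].split(';').map((directive) => directive.trim().split(/\s+/)[0].toLowerCase());

    return {
        prefixedHeaders: headerNames.filter((headerName) => PREFIXED_CSP_HEADERS.indexOf(headerName.toLowerCase()) !== -1),
        hasCsp: cspName !== undefined,
        upgradesInsecureRequests: directives.indexOf('upgrade-insecure-requests') !== -1
    };
};
```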
- Check if the `<link>` tag (e.g. `<link rel="apple-touch-icon" href="apple-touch-icon.png">`) is specified.

  Note: In the past, people usually just had the `apple-touch-icon` in the root of the site, but that is no longer considered a good practice and can create a lot of issues (see: h5bp/html5-boilerplate#1622).
- Check if a `180×180px` image is not used.
- Check if multiple images of different sizes are used.

  Note: Usually iOS devices get upgraded pretty quickly, with most people being on the latest two versions of iOS, so specifying multiple sizes of the `apple-touch-icon` just adds to the weight of the page without any real benefit. One `180×180px` `apple-touch-icon` is nowadays enough for all cases, as Safari will scale it down automatically if needed (see also: h5bp/html5-boilerplate#1367).
- Check if the image has a transparent background.
- Other?
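For the size and transparency checks, a sketch that reads the PNG header directly (hypothetical names; assumes the icon is a PNG). Width and height come from the `IHDR` chunk; an alpha colour type or a `tRNS` chunk only means the image may be transparent, since pixels are not inspected.

```typescript
interface PngInfo {
    width: number;
    height: number;
    hasAlpha: boolean;
}

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Read the size from IHDR (always the first chunk) and treat an alpha colour
// type (4 = grey + alpha, 6 = RGBA) or a tRNS chunk as "may be transparent".
const readPngInfo = (data: Buffer): PngInfo | null => {
    if (data.length < 33 || !PNG_SIGNATURE.every((byte, i) => data[i] === byte)) {
        return null;
    }

    const colorType = data[25];
    let hasAlpha = colorType === 4 || colorType === 6;
    let offset = 8;

    // Walk the chunks (length, type, data, CRC); tRNS must appear before IDAT.
    while (!hasAlpha && offset + 8 <= data.length) {
        const chunkLength = data.readUInt32BE(offset);
        const chunkType = data.toString('ascii', offset + 4, offset + 8);

        if (chunkType === 'IDAT' || chunkType === 'IEND') {
            break;
        }
        hasAlpha = chunkType === 'tRNS';
        offset += 12 + chunkLength;
    }

    return { width: data.readUInt32BE(16), height: data.readUInt32BE(20), hasAlpha };
};

// Problems for the 180×180 and transparency checks listed above.
const checkAppleTouchIcon = (icon: PngInfo): string[] => {
    const problems: string[] = [];

    if (icon.width !== 180 || icon.height !== 180) {
        problems.push('apple-touch-icon should be 180x180px');
    }
    if (icon.hasAlpha) {
        problems.push('apple-touch-icon may have a transparent background');
    }

    return problems;
};
```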