devongarde / ssc
static site checker (an opinionated HTML nitpicker)
License: Other
Static Site Checker (an opinionated HTML nitpicker)
version 0.1.60
(c) 2020-2024 dylan harris
see LICENCE.txt and LICENSE.txt for copyright & licence notice

https://ssc.lu/
https://github.com/devongarde/ssc

ssc analyses static X/HTML snippets, files and sites:
- HTML living standard, Jan 2005 to Jan 2024
- HTML Tags/1.0/+/2.0/3.0/3.2/4.00/4.01/5.0/5.1/5.2/5.3-draft
- CSS 1/2.0/2.1/2.2-draft, 2007-2023 snapshots, more
- SVG 1.0/1.1/1.2 Tiny/1.2 Full/2.0/2.x-draft
- MathML 1/2/3/4-draft
- XHTML 1.0/1.1/2.0/5.x
- finds broken links
- server side includes, mostly
- many ontologies

with opinions on:
- standard english where dialect is required
- legal but slovenly HTML
- abhorrent rudeness such as AUTOPLAY on <VIDEO>

It does NOT:
- analyse or understand scripts
- analyse or understand XML or derivatives, except as noted above

It can output:
- 'repaired' HTML (not XHTML)
- HTML with resolved server side includes
- JSONs of ontological content
- website statistical information
- deduplicated websites

ssc -h                 for a usage summary.
ssc -f config_file     analyse site using preprepared configuration
ssc directory          analyse website based in directory

To build & run:
1. Follow the build instructions in build.txt
2. Gleefully run ssc. It will misbehave if you are insufficiently gleeful.

This is an alpha version of ssc. It is incomplete. What is complete needs refining. The developer needs coffee. It may contain unexpected features.

If you encounter one, please help improve ssc by collecting the following information (where relevant) and forwarding it to the developer:
- version of ssc;
- precise version of the operating system;
- hardware architecture and system information;
- detailed description of the problem;
- detailed description of the steps to recreate it;
- copy of output file/s and relevant logs;
- copy of pages/website being analysed;
- precise command used;
- configuration file/s used, if any;
- any ndx file or other pre-existing file used during the run;
- any known workarounds, fixes or solutions;
- a video of a dance interpretation of the issue.

Email everything to [email protected] (if the collected files are more than small, please use a public fileserver and email the link). Do NOT send anything confidential. Furthermore, unless you state otherwise, we reserve the right to publish some or all of the information sent in future versions of ssc, usually in the test suite.

If you have a fix, you are invited to submit a pull request on github, at https://github.com/devongarde/ssc . Thank you.

SSC can be run in a CGI environment. This is intended for use with OpenBSD's native httpd web server (https://man.openbsd.org/httpd.8). You are reminded that SSC is not production software. Do NOT expose it to untrusted data sources, such as those found on the open web.

Notes on names:
- recipe: a nod to Vernor Vinge's "A Fire Upon the Deep";
- tea: without tea, nothing works; then there's builders' tea;
- sauce: makes the dull tasty; identifies linguistically weak pedants;
- toast: toasts code; i liked burnt toast;
- heater: i'm not stopping now;
- unii: my preferred plural of unix: to my ears, both unixes and unices sound like they sing castrato.
- andor: and/or sans ancienne; land of Gift (aber nicht das Gift)

SEE ALSO
build.txt          notes on building ssc
gen.txt            a model man page
usage.txt          how to use ssc
releasenotes.txt   fishless chips
LICENCE.txt        ssc licence information
LICENSE.txt        formal GPL 3 licence
more licences      licences for borrowed external content

Background

I have a website, arts & ego, at https://dylanharris.org/. It has approaching 60G of original content. It contains hand coded HTMLs 2 to 5. It is a complete mess. Despite a long search, I could not find any tools to properly identify its flaws. Anything I did find was at most cursory. Then came the cow flu*. Hence ssc is a covid project that grew out of hand.

* corvid means crow, thus covid means cow**.
** by the laws of sympathetic spelling.

Unabashed Opportunism

If you appreciate modernist poetry or abstract photography, click on books at https://dylanharris.org/ for gen.

REMINDER

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

dylan harris
[email protected]
December 2023
Given the poor quality of the myriad RDF specifications, I am not implementing it. I will change my mind if an RDF specification that specifies RDF comes along. Until then, I will complete the implementation of RDFa only.
I now thoroughly understand why WhatWG created microdata!
Set up a container to allow people to test ssc easily.
A CentOS 8 guest under Docker will avoid licence issues.
The top should contain the executables for download, plus one folder (recipe) and the README.
CSS is an integral part of modern HTML usage, including on static sites. CSS is integrated directly into HTML 5 via SVG 1.1.
Thus any half-decent static site checker would check that static site's CSS. For that reason, a CSS checker would be useful.
Note that ssc already parses CSS files, although only to pick up class names.
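As a rough illustration of what "only to pick up class names" might involve, here is a minimal Python sketch. It is not ssc's parser: the `class_names` function is hypothetical, and the regular expressions only cope with simple, unnested rules (no @media blocks, no escaped characters).

```python
import re

def class_names(css):
    """Collect class names used in the selectors of a simple stylesheet."""
    css = re.sub(r"/\*.*?\*/", " ", css, flags=re.S)   # drop comments
    selectors = re.sub(r"\{[^{}]*\}", " ", css)          # drop declaration blocks
    return set(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", selectors))

sample = "p.note, .warning > a { margin: .5em; background: url(x.png) } /* .ghost */"
print(sorted(class_names(sample)))  # ['note', 'warning']
```

Stripping declaration blocks first keeps values such as `.5em` and `x.png` from being mistaken for class selectors. A real CSS checker would need a proper tokeniser.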
Complete facility to export microformat data to JSON files.
The current living standard requires the current draft of MathML 4.
Create a CMake configuration to allow ssc to be built on systems that support CMake and clang or gcc.
Improve XHTML 5 analysis.
Implement SVG 1.2 Full analysis, in the context of the published SVG 1.2 Tiny.
These attributes' values are currently ignored. They should be verified.
Some MathML2 attributes only apply to certain content, and that is not obvious. Add some comments where confusion may be possible.
In server side includes, using #set and #echo
#set Y = "Z"
#set X = "Y"
#echo X
what should appear is Y, but what actually appears is Z. A double substitution incorrectly occurs. This be a bug.
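The difference between the expected and the observed behaviour can be sketched with a toy model in Python. This is an illustration only, not ssc's code: the `run` function and the tuple form of the directives are hypothetical, and no real SSI syntax is parsed.

```python
# Illustrative only: a toy model of SSI #set / #echo, not ssc's code.
def run(directives, double_substitution=False):
    """Evaluate (command, ...) tuples and return the list of echoed values."""
    variables = {}
    output = []
    for directive in directives:
        if directive[0] == "set":
            _, name, value = directive
            variables[name] = value            # store the literal value
        elif directive[0] == "echo":
            value = variables.get(directive[1], "")
            if double_substitution:
                # The bug: the value is looked up again as a variable name.
                value = variables.get(value, value)
            output.append(value)
    return output

script = [("set", "Y", "Z"), ("set", "X", "Y"), ("echo", "X")]
print(run(script))                             # ['Y'], the expected result
print(run(script, double_substitution=True))   # ['Z'], the reported result
```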
Amend ssc to better integrate with the command line, including unix shells and windows cmd / powershell, within the context of ssc being cross-platform.
Notes:
Decide whether or not to check SMIL --- and, at least, be aware of it.
There are insufficient tests everywhere. Many tests steal examples from standards, but do not test all permitted enum values, for example. Expand test coverage accordingly.
Furthermore, a number of supposedly supported versions of HTML etc. do not have proper test coverage, admittedly because those versions were never properly supported in reality. All the same, add tests to improve the quality of the corresponding ssc analysis.
Some sites will have been written using the WhatWG standard of the time they were written, or of earlier times. Implement ssc analysis of earlier such standards, and enable the user to select the appropriate standard by rough date.
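One way selection by rough date could work is to pick the latest snapshot dated on or before the date the user gives. The Python sketch below assumes that approach. The snapshot list is hypothetical (only the 2005, July 2020 and 2024 dates echo figures mentioned in this document), and `snapshot_for` is an invented name.

```python
import bisect
from datetime import date

# Hypothetical snapshot dates, sorted ascending; the real set is ssc's to define.
SNAPSHOTS = [date(2005, 1, 1), date(2012, 1, 1), date(2020, 7, 1), date(2024, 1, 1)]

def snapshot_for(written):
    """Return the latest snapshot dated on or before `written`."""
    index = bisect.bisect_right(SNAPSHOTS, written)
    if index == 0:
        return SNAPSHOTS[0]   # older than every snapshot: fall back to the earliest
    return SNAPSHOTS[index - 1]

print(snapshot_for(date(2021, 3, 15)))  # 2020-07-01
```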
Add configuration options to allow those using private microdata to use ssc to verify it.
Performance is bad.
Two tactics should be applied to improve it.
ssc uses an INI style configuration file because it is essential that a command line tool's configuration file be obvious and easily human editable from the command line. In those respects, as good as JSON and XML are for data interchange, they are unsuitable as configuration files.
However, INI file format has serious weaknesses particularly when complex structures must be expressed. OpenBSD solved this problem with a number of utilities, such as pf, httpd, smtpd. If a more powerful configuration format is required, use that format.
If programmatic configuration is necessary, also support JSON.
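To illustrate the trade-off, the Python sketch below reads a small INI fragment and emits the same settings as JSON. The section and key names are invented for the example and are not ssc's real configuration options.

```python
import configparser
import json

# Hypothetical settings: these section and key names are invented for
# illustration and are not ssc's real configuration options.
INI_TEXT = """
[general]
verbose = true
root = /var/www/site
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Flat key/value settings read naturally from INI.
settings = {section: dict(parser[section]) for section in parser.sections()}

# The same data as JSON, for programmatic producers and consumers.
as_json = json.dumps(settings, sort_keys=True)
print(as_json)
```

The INI form is easy to edit by hand in a terminal. The JSON form is easier for another program to generate, but it is less pleasant to maintain by hand.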
Rather than posting convenient binaries, sort out keys and subscriptions to produce signed versions as appropriate for each platform. Similarly, sort out PGP keys for here, and do it properly this time.
The WhatWG HTML standard includes microformat data declared using microdata attributes (itemscope etc.). Update ssc to support analysis of this kind of thing.
The current MathML 1 tests are inadequate. Add some more, at least the examples in the spec.
There is currently code to support webmentions. Add tests and tart it, or if it won't be tarted, drop it.
The current ssc coverage of unicode is flawed, and only works by good fortune. Make sure it works properly, and add tests to cover it.
Implement SVG 1.2 Tiny analysis
Implement RDF in HTML analysis
It should build release code!
Implement MathML 3 analysis
The current living standard requires it
Testing is automated, but build and release is not. Do the latter. Presume a hypervisor is to hand, and allow configuration files to control it.
NOTE: I built a set of scripts to do this for something else a few years ago. I'll probably simply update those scripts. They'll require a unix environment, and almost certainly presume a mac. I will ensure, though, that they run under Linux and Microsoft's WSL, although I won't set up configuration for them. I will certainly not export virtual guests.
Currently, enum types do not properly use internal HTML versions, and in consequence a number of types are declared when only one type with proper version management is needed. Sort this out.
The current namespace analysis is too rigid, for example ssc complains when namespaced elements are used (it should shut up). Update and improve the analysis.
It would be useful to spell check text in an HTML file.
ssc 0.0.58 only analyses schema.org microdata when it is declared using WhatWG microdata attributes (itemscope etc.). schema.org itself supports that AND RDF. Add support for schema.org microdata analysis using RDF in ssc.
Currently, there is a silly conflict between some microformat vocabularies / properties and rel values. Refactor microformat code to sort this out.
Remove hard-coded boost directories from build files (both makefiles and visual studio files)
For example, under Windows 10:
*** /project/web/live/and/change/18-07.shtml
407 ** ** inv sym,
==> "invsym" is no child of "i" [no_such_folder]
the linked file exists, so this appears to be a bug
Update schema.org code for version 11.
Otherwise some valid itemprop values will fail validation.
For example:
*** /project/web/live/foto/090410b/image-1.shtml
222 **
contentLocation is a property of CreativeWork. WebPage inherits from CreativeWork. Thus this error report is false.
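The check that would avoid this false report is a walk up the type hierarchy. The Python sketch below shows the idea under simplifying assumptions: the tables are a tiny hand-written fragment of schema.org, each type has a single parent here (schema.org permits several), and `has_property` is a hypothetical name rather than anything in ssc.

```python
# A hand-written fragment of the schema.org hierarchy, for illustration only.
PARENTS = {"WebPage": "CreativeWork", "CreativeWork": "Thing", "Thing": None}
PROPERTIES = {"CreativeWork": {"contentLocation"}, "Thing": {"name"}}

def has_property(type_name, prop):
    """True if `prop` is declared on `type_name` or on any of its ancestors."""
    while type_name is not None:
        if prop in PROPERTIES.get(type_name, set()):
            return True
        type_name = PARENTS.get(type_name)   # climb to the parent type
    return False

print(has_property("WebPage", "contentLocation"))  # True: inherited from CreativeWork
print(has_property("Thing", "contentLocation"))    # False: not declared that high up
```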
Given the software can interpret SSI (except formulae), and given it can embed missing closures and even sometimes missing elements, provide a mechanism to output the "corrected" HTML. Also look at outputting HTML from XHTML. Don't consider XHTML from HTML; leave that to tidy.
Parse javascript, at the very least to allow correct analysis of the surrounding HTML.
There's no point in writing a full blown analysis of javascript, there should be dozens of good ones out there. However, some simple analysis could be useful.
There are a number of types whose analysis code is almost, but not quite, the same as each other. Sort this mess out.
Implement MathML 2 analysis
Implement SVG 2.0 analysis
ssc 0.0.58 analysis covers a July 2020 version of the WhatWG HTML standard. Bring ssc up-to-date to cover a recent version of the standard.
ssc should not analyse STYLE, but it currently complains if, for example, double quotes are used (something required by CSS). Fix this.
Currently ssc only allows additional values in a few enum types. Users should be able to add configuration to extend any enum type so they can verify HTML using private extensions.
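A minimal Python sketch of how user-supplied values might be merged into built-in enum sets follows. Every name in it is hypothetical. Neither the `extended` function nor the way the extra values are supplied is taken from ssc.

```python
# Hypothetical names throughout: neither this table nor the extension
# mechanism shown here is taken from ssc.
BUILT_IN = {"rel": {"alternate", "author", "stylesheet"}}

def extended(built_in, user_extra):
    """Merge user-supplied values into copies of the built-in enum sets."""
    merged = {name: set(values) for name, values in built_in.items()}
    for name, values in user_extra.items():
        merged.setdefault(name, set()).update(values)
    return merged

enums = extended(BUILT_IN, {"rel": {"x-private"}})
print("x-private" in enums["rel"])      # True: the private value is now accepted
print("x-private" in BUILT_IN["rel"])   # False: the built-in table is untouched
```

Copying the built-in sets before merging keeps the defaults intact, so a run without the user's configuration still validates strictly.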
Improve incorrectness analysis by improving reporting where the various standards require spelling that conflicts with standard English, so that people are warned when they fail to use dialect English spelling.
Implement recent draft MathML 4 analysis