static site checker (an opinionated HTML nitpicker)

License: Other

C 5.07% C++ 37.14% HTML 54.51% CSS 2.65% JavaScript 0.04% CMake 0.13% Batchfile 0.22% Shell 0.09% Ruby 0.08% Nit 0.06% Makefile 0.01% Tcl 0.01%
html svg xhtml microformats mathml schema-org microdata utility command-line-tool

ssc's Introduction

Static Site Checker
(an opinionated HTML nitpicker)
version 0.1.60
(c) 2020-2024 dylan harris
see LICENCE.txt and LICENSE.txt for copyright & licence notice
https://ssc.lu/
https://github.com/devongarde/ssc



ssc analyses static X/HTML snippets, files and sites:
- HTML living standard, Jan 2005 to Jan 2024
- HTML Tags/1.0/+/2.0/3.0/3.2/4.00/4.01/5.0/5.1/5.2/5.3-draft
- CSS 1/2.0/2.1/2.2-draft, 2007-2023 snapshots, more
- SVG 1.0/1.1/1.2 Tiny/1.2 Full/2.0/2.x-draft
- MathML 1/2/3/4-draft
- XHTML 1.0/1.1/2.0/5.x
- finds broken links
- server side includes, mostly
- many ontologies

with opinions on:
- standard english where dialect is required
- legal but slovenly HTML
- abhorrent rudeness such as AUTOPLAY on <VIDEO>

It does NOT:
- analyse or understand scripts
- analyse or understand XML or derivatives, except as noted above

It can output:
- 'repaired' HTML (not XHTML)
- HTML with resolved server side includes
- JSONs of ontological content
- website statistical information
- deduplicated websites



ssc -h
for a usage summary.

ssc -f config_file
analyse site using preprepared configuration

ssc directory
analyse website based in directory



To build & run:
1. Follow the build instructions in build.txt
2. Gleefully run ssc. It will misbehave if you are insufficiently
   gleeful.



This is an alpha version of ssc. It is incomplete. What is complete
needs refining. The developer needs coffee.

It may contain unexpected features. If you encounter one, please help
improve ssc by collecting the following information (where relevant)
and forwarding it to the developer:
- version of ssc;
- precise version of the operating system;
- hardware architecture and system information;
- detailed description of the problem;
- detailed description of the steps to recreate it;
- copy of output file/s and relevant logs;
- copy of pages/website being analysed;
- precise command used;
- configuration file/s used, if any;
- any ndx file or other pre-existing file used during the run;
- any known workarounds, fixes or solutions;
- a video of a dance interpretation of the issue.
Email everything to [email protected] (if the collected files are more than
small, please use a public fileserver and email the link). Do NOT send
anything confidential. Furthermore, unless you state otherwise, we
reserve the right to publish some or all of the information sent in
future versions of ssc, usually in the test suite. If you have a fix,
you are invited to submit a pull request on github, at
https://github.com/devongarde/ssc . Thank you.



SSC can be run in a CGI environment. This is intended for use with
OpenBSD's native httpd web server (https://man.openbsd.org/httpd.8).
You are reminded that SSC is not production software. Do NOT expose it
to untrusted data sources, such as those found on the open web.



Notes on names:
- recipe: a nod to Vernor Vinge's "A Fire Upon the Deep";
- tea: without tea, nothing works; then there's builders' tea;
- sauce: makes the dull tasty; identifies linguistically weak pedants;
- toast: toasts code; i liked burnt toast;
- heater: i'm not stopping now;
- unii: my preferred plural of unix: to my ears, both unixes and unices
        sound like they sing castrato.
- andor: and/or sans ancienne; land of Gift (aber nicht das Gift)



SEE ALSO
build.txt        notes on building ssc
gen.txt          a model man page
usage.txt        how to use ssc
releasenotes.txt fishless chips
LICENCE.txt      ssc licence information
LICENSE.txt      formal GPL 3 licence
more licences    licences for borrowed external content



Background
I have a website, arts & ego, at https://dylanharris.org/. It has
approaching 60G of original content. It contains hand coded HTMLs 2
to 5. It is a complete mess. Despite a long search, I could not find
any tools to properly identify its flaws. Anything I did find was at
most cursory.

Then came the cow flu*.

Hence ssc is a covid project that grew out of hand.

* corvid means crow, thus covid means cow**.
** by the laws of sympathetic spelling.



Unabashed Opportunism
If you appreciate modernist poetry or abstract photography, click on
books at https://dylanharris.org/ for gen.



REMINDER
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.



dylan harris
[email protected]
December 2023

ssc's Issues

RDFa

Given the poor quality of the myriad RDF specifications, I am not implementing RDF. I will change my mind if an RDF specification that specifies RDF comes along. Until then, I will complete the implementation of RDFa only.

I now thoroughly understand why WhatWG created microdata!

Set up test container

Set up a container to allow people to test ssc easily.

A Centos 8 guest under docker will avoid licence issues.

reorganise repository

The top should contain the executables for download, plus one folder (recipe) and the README.

Parse CSS

CSS is an integral part of modern HTML usage, including on static sites. CSS is integrated directly into HTML 5 via SVG 1.1.

Thus any half-decent static site checker would check that static site's CSS. For that reason, a CSS checker would be useful.

Note that ssc already parses CSS files, although only to pick up class names.

MathML 4 (draft)

The current living standard requires the current draft of MathML 4.

CMAKE

Create a CMake configuration to allow ssc to be built on systems that support CMake and clang or gcc.

XHTML 5

Improve XHTML 5 analysis.

SVG 1.2 Full

Implement SVG 1.2 Full analysis, in the context of the published 1.2 Tiny.

MathML 2 mspace attributes

Some MathML2 attributes only apply to certain content, and that is not obvious. Add some comments where confusion may be possible.

SSI double variable substitution

In server side includes, using #set and #echo

#set Y = "Z"
#set X = "Y"

#echo X

what should appear is Y, but what actually appears is Z. A double substitution incorrectly occurs. This be a bug.
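The expected behaviour can be sketched in a few lines of Python. This is a toy interpreter for the two directives as written above, not ssc's own code; the `run` function and the `DIRECTIVE` pattern are hypothetical names invented for the illustration. The point is that #echo looks a name up exactly once.

```python
import re

# Hypothetical mini-interpreter for two SSI-like directives; not ssc's code.
DIRECTIVE = re.compile(r'#(set|echo)\s+(\w+)(?:\s*=\s*"([^"]*)")?')

def run(lines):
    """Execute '#set NAME = "VALUE"' and '#echo NAME' lines; return echoed text."""
    variables = {}
    output = []
    for line in lines:
        match = DIRECTIVE.fullmatch(line.strip())
        if match is None:
            continue  # ignore blank lines and anything unrecognised
        command, name, value = match.groups()
        if command == "set":
            # Store the literal string; never treat it as another variable name.
            variables[name] = value if value is not None else ""
        else:
            # Look the name up exactly once, so X yields "Y" rather than "Z".
            output.append(variables.get(name, ""))
    return output

print(run(['#set Y = "Z"', '#set X = "Y"', '', '#echo X']))  # prints ['Y']
```

A double substitution would amount to looking up the result of the first lookup again, which is what the bug report describes.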

Better integration with command line environment

Amend ssc to better integrate with the command line, including unix shells and windows cmd / powershell, within the context of ssc being cross-platform.

Notes:

  • OpenBSD generally offers the best experience; try to achieve that level of quality;
  • CLIG is useful as a general background read for context although it gets things wrong (for example, configuration files must be easily human readable and editable, which is why ssc uses INI format & not JSON / XML; if ssc requires a more powerful format, it will switch to PF-style configuration files);
  • whatever CLIG says, well written man pages are essential from the (unix) command line mostly because the alternatives are, in comparison, difficult to use and a pain to dig up.

SMIL

Decide whether or not to check SMIL --- and, at least, be aware of it.

Test coverage

There are insufficient tests everywhere. Many tests steal examples from standards, but do not test all permitted enum values, for example. Expand test coverage accordingly.

Furthermore, a number of supposedly supported versions of HTML etc. do not have proper test coverage, admittedly because those versions were never properly supported in reality. All the same, add tests to improve the quality of the corresponding ssc analysis.

Older WhatWG standards

Some sites will have been written using the WhatWG standard of the time they were written, or of earlier times. Implement ssc analysis of earlier such standards, and enable the user to select the appropriate standard by rough date.
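One possible shape for "select by rough date" is a sorted list of snapshot dates and a binary search. The sketch below is only that: the `SNAPSHOTS` list and the `snapshot_for` function are hypothetical, and the dates are placeholders, not the set of snapshots ssc actually knows about.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical snapshot dates, sorted ascending; placeholders for illustration.
SNAPSHOTS = [date(2005, 1, 1), date(2012, 7, 1), date(2020, 7, 1), date(2024, 1, 1)]

def snapshot_for(page_date):
    """Return the latest snapshot dated on or before page_date, else the earliest."""
    index = bisect_right(SNAPSHOTS, page_date)
    return SNAPSHOTS[max(index - 1, 0)]
```

For example, `snapshot_for(date(2021, 3, 15))` picks the July 2020 entry, and a date before the first snapshot falls back to the earliest one.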

performance

Performance is bad.

Two tactics should be applied to improve it.

  • multithreading
  • performance analysis

Switch to PF-style configuration file

ssc uses an INI style configuration file because it is essential that a command line tool's configuration file be obvious and easily human editable from the command line. In those respects, as good as JSON and XML are for data interchange, they are unsuitable as configuration files.

However, INI file format has serious weaknesses particularly when complex structures must be expressed. OpenBSD solved this problem with a number of utilities, such as pf, httpd, smtpd. If a more powerful configuration format is required, use that format.

If programmatic configuration is necessary, also support JSON.

signed installers

Rather than posting convenient binaries, sort out keys and subscriptions to produce signed versions as appropriate for each platform. Similarly, sort out PGP keys for here, and do it properly this time.

Integrate microformat with microdata

The WhatWG HTML standard includes microformat data declared using microdata attributes (itemscope etc.). Update ssc to support analysis of this kind of thing.

Add MathML 1 tests

The current MathML 1 tests are inadequate. Add some more, at least the examples in the spec.

Webmentions

There is currently code to support webmentions. Add tests and tart it up, or if it won't be tarted up, drop it.

UNICODE

The current ssc coverage of unicode is flawed, and only works by good fortune. Make sure it works properly, and add tests to cover it.

RDF

Implement RDF in HTML analysis

SVG 2.0

The current living standard requires it

automate build process

Testing is automated, but build and release is not. Do the latter. Presume a hypervisor is to hand, and allow configuration files to control it.

NOTE: I built a set of scripts to do this for something else a few years ago. I'll probably simply update those scripts. They'll require a unix environment, and almost certainly presume a mac. I will ensure, though, that they run under Linux and Microsoft's WSL, although I won't set up configuration for them. I will certainly not export virtual guests.

Refactor enum types

Currently, enum types do not properly use internal HTML versions, and in consequence a number of types are declared when only one type with proper version management is needed. Sort this out.

namespaces

The current namespace analysis is too rigid: for example, ssc complains when namespaced elements are used (it should shut up). Update and improve the analysis.

Spellchecker

it would be useful to spell check text in an HTML file

schema.org in RDF

ssc 0.0.58 only analyses schema.org microdata when it is declared using WhatWG microdata attributes (itemscope etc.). schema.org itself supports that AND RDF. Add support for schema.org microdata analysis using RDF in ssc.

Refactor microformat analysis

Currently, there is a silly conflict between some microformat vocabularies / properties and rel values. Refactor microformat code to sort this out.

valid schema.org property contentLocation marked as invalid

For example:

*** /project/web/live/foto/090410b/image-1.shtml

222 **

**
==> "contentLocation" cannot have sub-values, and/or is not a property of "WebPage" [bad_property]

contentLocation is a property of CreativeWork. WebPage inherits from CreativeWork. Thus this error report is false.
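The check that should pass here is an inherited-property lookup. A minimal sketch follows, assuming a hand-written excerpt of the schema.org hierarchy; the `PARENTS` and `OWN_PROPERTIES` tables and the `has_property` function are hypothetical and do not reflect how ssc stores its vocabularies.

```python
# Tiny hand-written excerpt of the schema.org hierarchy, for illustration only.
PARENTS = {
    "Thing": [],
    "CreativeWork": ["Thing"],
    "WebPage": ["CreativeWork"],
}
OWN_PROPERTIES = {
    "Thing": {"name", "url"},
    "CreativeWork": {"contentLocation", "author"},
    "WebPage": {"breadcrumb"},
}

def has_property(item_type, prop):
    """Return True if prop is declared on item_type or on any of its ancestors."""
    pending = [item_type]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # guard against revisiting a type reachable by two routes
        seen.add(current)
        if prop in OWN_PROPERTIES.get(current, set()):
            return True
        pending.extend(PARENTS.get(current, []))
    return False
```

With these tables, `has_property("WebPage", "contentLocation")` is true because the walk reaches CreativeWork, whereas a lookup that only consults WebPage's own properties would reproduce the false report.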

Export "fixed" HTML

Given the software can interpret SSI (except formulae), and given it can embed missing closures and even sometimes missing elements, provide a mechanism to output the "corrected" HTML. Also look at outputting HTML from XHTML. Don't consider XHTML from HTML; leave that to tidy.

Parse Javascript

Parse javascript, at the very least to allow correct analysis of the surrounding HTML.

There's no point in writing a full blown analysis of javascript; there should be dozens of good ones out there. However, some simple analysis could be useful.

Refactor type_master

There are a number of types whose analysis code is almost, but not quite, the same as each other. Sort this mess out.

SVG 2.0

Implement SVG 2.0 analysis

Current WhatWG standard

ssc 0.0.58 analysis covers a July 2020 version of the WhatWG HTML standard. Bring ssc up-to-date to cover a recent version of the standard.

Suppress STYLE issues

ssc should not analyse STYLE, but it currently complains if, for example, double quotes are used (something required by CSS). Fix this.

Allow extra identifiers in any ENUM type

Currently ssc only allows additional values in a few enum types. Users should be able to add configuration to extend any enum type so they can verify HTML using private extensions.
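As a rough illustration of the idea, user-supplied values could be merged into a registry of built-in enums before validation. Everything named below (`BUILT_IN`, `extend_enums`, `is_valid`, and the shape of the extras) is an assumption made for the sketch; it says nothing about ssc's internal types or its configuration keys.

```python
# Hypothetical enum registry; the two enums are real HTML attribute value sets,
# but the structure and names are illustrative, not ssc's.
BUILT_IN = {
    "dir": {"ltr", "rtl", "auto"},
    "preload": {"none", "metadata", "auto"},
}

def extend_enums(built_in, extras):
    """Return a new registry with user-supplied values merged into any enum."""
    merged = {name: set(values) for name, values in built_in.items()}
    for name, values in extras.items():
        # Lowercase extras so that validation can be case-insensitive.
        merged.setdefault(name, set()).update(v.lower() for v in values)
    return merged

def is_valid(registry, enum_name, value):
    """Check a value case-insensitively against the named enum."""
    return value.lower() in registry.get(enum_name, set())
```

Because `extend_enums` copies the built-in sets, a private extension such as `{"dir": ["ttb"]}` affects only the merged registry and leaves the defaults untouched.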

Incorrectness

Improve incorrectness analysis by improving reporting where the various standards require spelling that conflicts with standard English, so that people are warned when they fail to use dialect English spelling.

MathML 4

Implement recent draft MathML 4 analysis
