lifthrasiir / roadroller
Roadroller: Flattens Your JavaScript Demo
Home Page: https://lifthrasiir.github.io/roadroller/
License: Other
`-O1` has been the default in the CLI since 2.0.0, but the online demo doesn't optimize until the button is pressed. Ideally the demo should respond instantly after a textarea update using the default parameters (`-O0`), then optimize in the background to produce the `-O1` equivalent, so that the demo behaves more or less identically to the CLI. This obviously requires the web worker (#5).
The current Roadroller algorithm takes time linear in the input length, so a length-reducing preprocessor is desirable as long as it doesn't interfere with context modelling. In fact the current JS preprocessor is one such preprocessor (it essentially replaces "text" models, where the context unit is a word or token, with more general bytewise models). But it doesn't work for non-JS inputs, so we need more general solutions.

In particular LZP is a promising candidate because of its simplicity. I had even hoped that an LZP-only mode would be possible for shorter inputs, but that doesn't seem to be the case at the moment (the PNG bootstrap works better).
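For reference, the core of LZP is small. Here is a toy byte-level sketch; the 16-bit context hash, the flag-per-byte coding, and all names are illustrative, not a proposed Roadroller design:

```javascript
// Toy LZP: hash the recent context into a table of last-seen positions.
// When the table predicts the next byte, emit a 1-byte hit flag instead of
// the byte itself; on a miss emit a 0 flag followed by the literal.
function lzpEncode(bytes) {
  const table = new Int32Array(1 << 16).fill(-1);
  const out = [], hist = [];
  let ctx = 0;
  for (const b of bytes) {
    const pred = table[ctx];
    if (pred >= 0) {
      if (hist[pred] === b) out.push(1); // prediction hit
      else out.push(0, b);               // miss: flag plus literal
    } else {
      out.push(b);                        // context never seen: plain literal
    }
    table[ctx] = hist.length;
    hist.push(b);
    ctx = ((ctx << 8) ^ b) & 0xffff;      // rolling 16-bit context hash
  }
  return out;
}

function lzpDecode(coded) {
  const table = new Int32Array(1 << 16).fill(-1);
  const hist = [];
  let ctx = 0, i = 0;
  while (i < coded.length) {
    const pred = table[ctx];
    let b;
    if (pred >= 0) {
      b = coded[i++] ? hist[pred] : coded[i++]; // flag, then maybe a literal
    } else {
      b = coded[i++];
    }
    table[ctx] = hist.length;
    hist.push(b);
    ctx = ((ctx << 8) ^ b) & 0xffff;
  }
  return hist;
}
```

Since both sides update the table identically, the transform is losslessly reversible; a real preprocessor would pack the flags into bits or run lengths rather than whole bytes.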
See demo.html line 424: `dynamicModels` is declared as a constant, but line 426 assigns `dynamicModels = undefined`.
Context mixing commonly uses the logistic function f(x) = 1/(1+exp(-x)) as an activation function, but it is not the only possibility. Since Math.log/Math.exp are quite expensive and their accuracy is implementation-defined (although all known browsers use the same approximation), it is worthwhile to see whether an alternative activation function can be used.

I've personally tried two possibilities, g(x) = x/sqrt(1+x^2) and h(x) = x/(1+abs(x)). (Note that IEEE 754-2008 requires sqrt to be correctly rounded.) After scaling they are roughly similar to the original logistic function:
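A quick numeric comparison illustrates the similarity; the rescaling of g and h from (-1, 1) to (0, 1) below is mine, purely for illustration:

```javascript
// f is the logistic function; g and h are the two alternatives, rescaled
// to (0, 1) so that all three can be compared directly.
const f = x => 1 / (1 + Math.exp(-x));
const g = x => (x / Math.sqrt(1 + x * x) + 1) / 2;
const h = x => (x / (1 + Math.abs(x)) + 1) / 2;
for (const x of [-4, -1, 0, 1, 4]) {
  console.log(x, f(x).toFixed(3), g(x).toFixed(3), h(x).toFixed(3));
}
```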
g(x) turned out to be a drop-in replacement for the logistic function, while h(x) required an adjustment to the learning rate (by a factor of about 2–10). As an example, the following is the required change against 3f60e44 for g(x).
```diff
 for (let i = 0; i < this.models.length; ++i) {
     const weight = this.weights[i];
-    const prob = this.models[i].predict(context) * 2 + 1;
-    const stretchedProb = Math.log(prob / ((2 << this.precision) - prob));
+    const prob = (this.models[i].predict(context) * 2 + 1) / (1 << this.precision) - 1;
+    const stretchedProb = prob / Math.sqrt(1 - prob * prob);
     this.stretchedProbs[i] = stretchedProb;
     total += weight * stretchedProb;
 }
 // since CM is the last model and predictions are not stored,
 // we can just compute the external probability directly.
-const mixedProb = (2 << this.precision) / (1 + Math.exp(-total));
+const mixedProb = (total / Math.sqrt(1 + total * total) + 1) * (1 << this.precision);
 if (mixedProb >= (2 << this.precision)) {
     throw new Error('LogisticMixModel.predict: weighted average overflow');
 }
```
```diff
     // calculate the mixed prediction q
     `x=u.map((j,i)=>(` +
-        `y=p[j]*2+1,` +
+        `y=(p[j]*4+2)/M-1,` +
         // stretch(prob), needed for updates
-        `y=Math.log(y/(M-y)),` +
-        `q-=w[i]*y,` +
+        `y/=Math.sqrt(1-y*y),` +
+        `q+=w[i]*y,` +
         // premultiply with learning rate
         `y${learningRateNum == 1 ? '' : '*'+learningRateNum}/${learningRateDenom}` +
     `)),` +
     // q: squash(sum of weighted preds) followed by adjustment
-    `q=M/(1+Math.exp(q))|1,` +
+    `q=(q/Math.sqrt(1+q*q)+1)*M/2|1,` +
     // decode the bit b
     `b=t%M<q,` +
     `t=(b?q:M-q)*(t>>${precision + 1})+t%M-!b*q` +
```
Unfortunately neither was enough to replace the logistic function: g(x) was 1–400 bytes larger and h(x) was 1000+ bytes larger for most samples I've tried. This is really unfortunate, as g(x) was significantly faster than f(x) in V8 (by about 10%). Any other suggestions are welcome.
Although Node.js supports ES modules, many modern build processes still rely on CJS exports. Maybe we should support multiple environments with something like Rollup:
```js
// rollup.config.js
export default [
  {
    input: './index.js',
    output: { file: 'dist/index.js', format: 'esm' },
  },
  {
    input: './index.js',
    output: { file: 'dist/index.cjs', format: 'cjs' },
  },
];
```

```js
// package.json
{
  "main": "dist/index.cjs",
  "module": "dist/index.js"
}
```
A syntax error (or possibly unsupported future syntax like #23) currently throws a fixed exception. It should instead point to the context and line number. The column number would also be good to have, but accurately calculating the column number is much harder than it seems (tab expansion, surrogate pairs, East Asian width etc.).
`PackerOptions.disableWasm` is already recognized but missing in the d.ts. We also need an equivalent option in the CLI, probably `--wasm|--no-wasm` (with no shorthands).
`-OO` currently starts a new search at the previous best parameters, but running `-O2` multiple times independently might give a better result. There may be multiple reasons: linear parameters can always get stuck in local minima, and SA might not be sufficient to escape from the current local minimum. For that reason it was suggested that `-O3` should just run `-O2` (or, equivalently and more efficiently, only the simulated annealing part) 10 times and keep the best among them.
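The suggestion amounts to a restart loop. A minimal sketch, assuming a hypothetical `runOnce` that performs one independent `-O2`-style search and resolves to `{ size, params }`:

```javascript
// Run the search n times independently and keep the smallest result.
// runOnce is a placeholder for one full simulated-annealing pass.
async function optimizeWithRestarts(runOnce, n = 10) {
  let best = null;
  for (let i = 0; i < n; ++i) {
    const result = await runOnce();
    if (!best || result.size < best.size) best = result;
  }
  return best;
}
```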
This is hinted in some commented codes, but we need to be able to put multiple inputs into a single compressed stream:
The exact design is not yet finished, but it almost surely involves an additional code in the evaluated JS input and special identifiers like $ROADROLLER$INPUT_NAME
.
Currently each wasm compressor instance holds its own memory, so we can't blindly increase the number of instances. Instead we can make the memory shared (and update the marking mechanism accordingly), so that we control the memory cache ourselves. This can reduce the peak memory used during compression.
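For illustration, instances can import one `WebAssembly.Memory` object instead of each exporting their own; this assumes the module is compiled to import its memory, and the names and sizes are illustrative:

```javascript
// One memory object shared by every compressor instance. The instances then
// read and write the same buffer, so the cache can be managed in one place.
const memory = new WebAssembly.Memory({ initial: 256, maximum: 4096 }); // 16 MB up to 256 MB
const makeInstance = module =>
  new WebAssembly.Instance(module, { env: { memory } });
```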
The current wasm compressor produces an array of bitwise probabilities (count = `input.length * inBits`) that is fed to `ANSEncoder`, but the optimization process only needs the number of bytes used, so this can be massively simplified. The length of the first line is `-log_2(PROD_i { predictions[i] / 2^(precision + 1) })` plus the coding overhead, while the length of the second line is currently independent of the exact content of the first line.

I've experimentally implemented this and found that while it is indeed faster, the estimate was slightly off for an unknown reason, probably because the coding overhead is not as insignificant as assumed. This can be a problem at the later stages of the optimization, where each improvement is as small as a single byte. Whether this is relevant or not hasn't yet been investigated.
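The simplification boils down to summing self-information instead of actually coding. A sketch, where the names are mine and the final `ceil` is exactly where the ignored coding overhead hides:

```javascript
// predictions[i] is assumed to be the scaled probability (out of
// 2^(precision+1)) assigned to the bit that actually occurred.
function estimateBytes(predictions, precision) {
  let bits = 0;
  for (const p of predictions) {
    bits -= Math.log2(p / 2 ** (precision + 1)); // self-information of each bit
  }
  return Math.ceil(bits / 8); // ignores the coding overhead entirely
}
```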
I thought it wasn't much of an issue, but it is clearly causing a UX problem. U+007F seems okay, but U+001F doesn't render in many environments, so an alternative would be appreciated.
Maybe I can use nvm or similar to test locally, or set up the CI instance to make sure of that.
Hello greatest coder in South Korea (unified Korea?)
Anyway, I love your work on this. It's crazy. Good. Good crazy. And crazy good too!!! :p ;) xx ;p
Can you please make a parcel plugin? You probably won't, because it's too much work I guess and you don't need to.
I can hack this by running roadroller on my scripts after bundling with parcel...but I would just love to be able to throw it in.
OMG that would be great.
Thank you kindly sir, you are the greatest programmer in the Koreas (i believe)!!! :p :) xx ;p
cris
Thanks for your awesome and inspiring work! I am working on a somewhat similar effort (https://github.com/eyaler/ztml) and wanted to compare compression performance with Roadroller. I ran some initial benchmarks which you can see in the readme; however, my use case is one where everything has to be in a single JS file (no following gzip), which is not fair to Roadroller. I found the commented-out `--uncompressed-only` ... not sure if this is something you are planning to implement, or otherwise do you have any suggestions on how to change your code for this goal?
The online demo currently does the compression and optimization in the main thread. While it does yield whenever possible, the slowdown is noticeable. It is currently implemented that way because otherwise we would need to split (or possibly duplicate) non-worker and worker code; it can be done today with some build-level hacks, but that is annoying and I prefer to do it with a proper multiple input mechanism instead (#8).
Hello, this is incredible.
I checked the license, and "any decoder code is put in the public domain." Is that to be taken to mean that if I compress a web app bundle with Roadroller then the web app code will be public domain?
I would love to use this to package up resources, but I couldn't if it would change the license of what I applied it to.
This is the approach used by base-122 and optimal if you are using UTF-8. It recognizes the forbidden code units (in this case `\r`, `$`, `` ` ``, `\` and possibly `\0`) and encodes each of them, together with the following code unit, into a two-byte UTF-8 sequence:

```
0xxxxxxx           for allowed code units
110xxxyy 10yyyyyz  for forbidden code units xxx (indexed) and the following code unit yyyyyyy
```

It should be noted that two-byte UTF-8 sequences have up to two spare bits to spare (the z above). We may need a change to ANSEncoder to leverage this spare bit, so it is not as trivial.
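A sketch of the pair encoding under the layout above; the forbidden set, the 3-bit index, and leaving z as 0 are all assumptions for illustration:

```javascript
// Pack a forbidden code unit (by its index) and the following 7-bit code unit
// into a two-byte UTF-8 sequence per the 110xxxyy 10yyyyyz layout.
const FORBIDDEN = ['\r', '$', '`', '\\', '\0'].map(c => c.charCodeAt(0));
function packPair(forbidden, next) {
  const idx = FORBIDDEN.indexOf(forbidden);        // 3 bits (xxx)
  const b1 = 0b11000000 | (idx << 2) | (next >> 5);      // 110xxx + top 2 bits of next
  const b2 = 0b10000000 | ((next & 0b11111) << 1);       // 10 + low 5 bits of next + z=0
  return [b1, b2];
}
```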
The Roadroller compression library can make use of shared presets for better compression, but it is unclear how to leverage presets in the generated decoder. I have considered two possibilities:
Line 301 in demo.html: `sel === sel` is always true. What does it mean?
xem's int2binary2html gives three approaches to encoding binary data. The second is based on single-byte encoding (#31) and the third is not as efficient as the current six-bit encoding (documented in #29), but the first is based on XHR/fetch, which is compatible across many settings.

The modus operandi is as follows: the HTML contains a hidden chunk of binary data (which can be interpreted as U+FFFD by browsers). JS then reads the HTML file itself as an ArrayBuffer and extracts the required bytes. The decoder would then look like this:

```js
with(new XMLHttpRequest)open('GET','#'),responseType='arraybuffer',send(),onload=r=>{...} // XHR
fetch('#').then(r=>r.arrayBuffer()).then(r=>{r=new Uint8Array(r);...}) // fetch
```
```html
<canvas id=a><script>/* Roadroller output from a JS code using a */</script>
```

This won't work, because `a` (among others) gets assigned during the decompression and shadows `window.a`.
The problem is that this might be okay depending on the input, and we definitely want to exploit that whenever possible. There are ~16 variables used by Roadroller, carefully chosen so that the decoder gets compressed more, and changing any of them can have undesirable effects. One possible resolution is to put everything into a giant function:
```js
eval(((A='....', t=1234, a, c, f, l, ...) => {
    /* code */
})())
```
But I'm not sure how to make it compatible with multiple inputs (#8).
For now you should avoid using the following ids in your HTML, either outside `<script>` or written with `document.write`:

A M a c e f l m o p r t u w x y

If you can't avoid using them there are two other workarounds:

1. Prepend `window.` to all uses of such variables.
2. Add `delete VARIABLE;` for all affected variables at the beginning of your code.

Rollup is pretty cool for bundling ES modules. If you want to minify the generated code, you need to use a plugin like "uglify" or whatever. I was thinking that a Roadroller Rollup plugin may be useful and would be pretty cool.
Also -- Happy New Year @lifthrasiir! and good to see you are working on many new compression innovations! :)
Currently optimization is fully probabilistic, which can be an issue for automated build processes. For now I'm not particularly keen on seeded optimization, because there has been a great variance across multiple optimization runs (for example, a particularly lucky run can save 10–100 additional bytes), but it is a possibility.
Roadroller strongly depends on DEFLATE's Huffman coding because JS string literals are not efficient in terms of information entropy (~6.96 bits per byte). The first line specifically exploits this by using the minimum amount of literal bytes, but a stock zlib doesn't fully recognize this difference in the symbol distribution and normally combines the two lines into one block. Zopfli-like tools do recognize this, but the user has to use those tools to benefit from it.

Maybe we can solve this UX problem by having Roadroller directly produce an optimal ZIP/gzip file itself. This is not a small task because:

- … the `<script>` tag.
- While Roadroller somehow has a working implementation of zlib (`-5`), the optimal size can only be reached with Zopfli or a similar tool, so Roadroller would have to depend on it.
The current model assumes that every opening quote is paired with the subsequent closing quote, which is not the case for escaped quotes. We probably should transform string literals so that every escaped quote is replaced with hex escapes.
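One possible transform, sketched below; a real version must only rewrite inside string literals, which a regex alone cannot decide, so this assumes the literal has already been isolated:

```javascript
// Replace escaped quotes inside an already-isolated string literal with hex
// escapes, so the quote model only ever sees a quote as a delimiter.
const fixQuotes = lit =>
  lit.replace(/\\(['"])/g,
    (_, q) => '\\x' + q.charCodeAt(0).toString(16).padStart(2, '0'));
```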
I think `(α=α*997+(ο[λ-δ]|0)|0)` should be `α=α*997+(ο[λ-δ]|0)`.
Hello @lifthrasiir
I ran

```sh
$ npx roadroller src/bang.js
```

on this file: https://github.com/i5ik/BANG/blob/799c5ed70bf0585b11024a1597f21fe5691ead74/src/bang.js

and received this output:

```
Error: invalid JS code in the input
```

The file `src/bang.js` runs correctly in the browser (as you can see). Above, I ran Roadroller on Node v14.17.6. Below, I used nvm to try Node v16.9.1, re-installed roadroller@latest and ran with `-v`, and got the same error as well as:

```
Actual memory usage: 145.9 MB (out of 150 MB)
Error: invalid JS code in the input
```

I tried with more memory (`-M`) and had:

```
Actual memory usage: 600.8 MB (out of 1000 MB)
Error: invalid JS code in the input
```
Just leaving this report here for information.
Thank you again for this library!
The current decompressor is optimized for size and is very slow (<50 KB/s). I've been asked about the possibility of using Roadroller for larger inputs, possibly around the single-digit megabyte range, and while Roadroller is generally not a good choice for them, I've heard of several (rarer) use cases where Roadroller can actually help.
Possible improvements:
Current:

```js
for(;e<size;)for(x=1;x<128;x=x*2+U)
```

Suggested:

```js
for(;x=e<size;)for(;x<128;x+=x+U)
```
Currently compression parameters have to be manually copied from the search result, but it would be great to have an official way to save them. This would require a sort of configure-from-file option, or we could combine it with the cache output file so that using the same cache parameter automatically reads and writes the cache. It would be best to have the same feature in the API as well, but I'm not sure I can make it work without sacrificing non-Node.js uses.
It is technically possible to abbreviate (i.e. convert into unused single bytes) non-identifier tokens in the JS preprocessor, but I'm not very sure about the outcome. For example if the optimal number of abbreviations is close to the maximum, some previously abbreviated tokens may have to be unabbreviated and that can result in worse compression.
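To make the idea concrete, here is a toy version of such an abbreviation pass; tokenization and finding the pool of unused single bytes are assumed to be handled elsewhere:

```javascript
// Map the most frequent multi-character tokens to unused single-byte
// characters from `pool`, then re-join the token stream.
function abbreviate(tokens, pool) {
  const freq = new Map();
  for (const t of tokens) if (t.length > 1) freq.set(t, (freq.get(t) || 0) + 1);
  // assign the most frequent tokens first; extra tokens stay unabbreviated,
  // which is exactly the situation the text worries about
  const ranked = [...freq].sort((a, b) => b[1] - a[1]).slice(0, pool.length);
  const map = new Map(ranked.map(([t], i) => [t, pool[i]]));
  return tokens.map(t => map.get(t) || t).join('');
}
```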
The current `-O1` targets 1–2 minutes and currently performs about 300 compressions, which is prohibitively slow for interactive use. I'd like to have a faster version that takes less than 10 seconds, so that it can be inserted into the regular build process. If this works well we can possibly make it the default as well, like `-O1`.
Is it possible to use this compressor while adhering to CSP unsafe-eval policies? (i.e. avoid the use of eval(), new Function() etc.)
This only breaks the optimizer, which is why I never realized it (I only tested a one-shot compression).

Roadroller as a library should be usable in the web environment, so we can't blindly import `perf_hooks`. It is more troubling that we can't exactly use `await import('perf_hooks')` either, because Node.js 14 doesn't have top-level await. I think `Packer.optimize` should just have a local `performance` variable which is initialized from either `window.performance` or `(await import('perf_hooks')).performance`, whichever is available.
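Since `Packer.optimize` is already async, the fallback can live inside it. A minimal sketch, where the helper name is mine:

```javascript
// Resolve a performance object without a top-level await: use the global one
// when it exists (browsers, recent Node.js), otherwise import perf_hooks.
async function getPerformance() {
  if (typeof performance !== 'undefined') return performance;
  return (await import('perf_hooks')).performance;
}
```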
The normal JS string literal is limited in its information density, but if we have control over the character encoding we can overcome this limitation. xem's int2binary2html used ISO/IEC 8859-1 (Latin-1, `l1`) for example, which requires 88 bytes to decode:

```js
b=i.charCodeAt(),b>>8?"€ ‚ƒ„…†‡ˆ‰Š‹Œ Ž ‘’“”•–—˜™š›œ žŸ".indexOf(i)+128:b
// note that the string literal above is also encoded in Latin-1, so it is one byte per character.
```

After some testing I concluded that ISO/IEC 8859-5 (`cyrillic`) is a better choice, which can be decoded in 44 bytes of code thanks to its regular assignment:

```js
b=i.charCodeAt(),b>>8?b%3683-864:b-167?b:253
```

Unlike 8859-1 we need to declare the character encoding for 8859-5, but `<meta charset=cyrillic>` is only 23 bytes long, so it is still better than 8859-1.
A critical problem with this approach is that the server-side header is preferred over the client-side metadata, so it is very sensitive to the server setting. In the case of js13kGames the server originally didn't declare the character encoding but it now does, breaking past entries that have used this technique.
In the output, I noticed that it's using the `new` keyword:

```js
new Uint16Array(12<<22).fill(1<<15),new Uint8Array(12<<22),0,0,0)
```

Correct me if I'm wrong, but I don't think it's necessary, and removing it would save a few bytes!

```js
Uint16Array(12<<22).fill(1<<15),Uint8Array(12<<22),0,0,0)
```
Deno is yet another JavaScript runtime.
Deno users mainly import scripts hosted in deno.land rather than NPM.
Here's how to publish it: https://deno.land/x?query=roadroller
This issue lists some wild ideas I had in mind but scrapped for various reasons. Maybe some of them might be useful in other projects.
It turns out that we can put the source map comment into `eval`ed code, and devtools do recognize it. This should not be present in the final build, so I think the resulting code should make it obvious that a source map is in use, like this:

```js
eval(Function(/* compressed data here */)(...)+'\n//# sourceMappingURL=foo.js.map')
```

We would have to preserve the existing `sourceMappingURL` comment and update it (and also `sourceURL`) if the JS preprocessing is in use.
Try this in the online demo:

```js
const f = (t) => 1 - --t;
console.log(f(1));
```

That generates:

```js
eval(Function("[M='EGLYWeQK{NHDV^vz}BAr`{yaBxjDGZic_CHz]FFSD{|c'"
,...']charCodeAtUinyxp',"for(;e<38;c[e++]=x-128)for(x=1;x<128;n=p.map((i,y)=>(t=r[i]*2+1,t=Math.log(t/(h-t)),A-=a[y]*t,t/500)),A=~-h/(1+Math.exp(A))|1,U=o%h<A,o=o%h+(U?A:h-A)*(o>>17)-!U*A,p.map((i,y)=>(t=r[i]+=(U*h/2-r[i]<<13)/((C[i]+=C[i]<5)+1/20)>>13,a[y]+=n[y]*(U-A/h))),x=x*2+U)for(p='010202103203210431053105410642065206541'.split(A=0).map((i,y)=>(t=0,[...i].map((i,y)=>(t=t*997+(c[e-i]|0)|0)),h*32-1&t*997+x)*12+y);o<h*32;o=o*64|M.charCodeAt(d++)&63);return String.fromCharCode(...c)")([],[],1<<17,[0,0,0,0,0,0,0,0,0,0,0,0],new Uint16Array(51e6).fill(1<<15),new Uint8Array(51e6),0,0,0))
```

If you replace `eval` with `console.log`, the result is:

```js
const f=(t)=>1---t;
console.log(f(1));
```

which is invalid: `Invalid left-hand side expression in postfix operation`. The white space between the minus `-` and the decrement `--` is significant.
```sh
npx roadroller -M 1024 -O2 -D -a write -t text input_file.txt -o output_file.js
```

I get:

```
file:///D:/Users/user/node_modules/roadroller/wasm.mjs:677
    new Uint8Array(memory.buffer, M_INPUT).set(input);
    ^

RangeError: Start offset -1879046368 is outside the bounds of the buffer
    at new Uint8Array (<anonymous>)
    at file:///D:/Users/user/node_modules/roadroller/wasm.mjs:677:9
    at compressWithDefaultModel (file:///D:/Users/user/node_modules/roadroller/index.mjs:571:45)
    at Function.doPack (file:///D:/Users/user/node_modules/roadroller/index.mjs:971:75)
    at calculateSize (file:///D:/Users/user/node_modules/roadroller/index.mjs:1363:35)
    at updateBest (file:///D:/Users/user/node_modules/roadroller/index.mjs:1386:26)
    at updateBestAndReportProgress (file:///D:/Users/user/node_modules/roadroller/index.mjs:1397:56)
    at file:///D:/Users/user/node_modules/roadroller/index.mjs:1524:26
    at evaluate (file:///D:/Users/user/node_modules/roadroller/index.mjs:1436:31)
    at search (file:///D:/Users/user/node_modules/roadroller/index.mjs:1446:51)
```

If I use a lower `-M` or run with `-O1` it works fine.
Currently Roadroller gives you a JS code which decompresses into the original input when run. Normally you then have to put it into a template that is not compressed; this is trivial in the API but not in the CLI, so we need CLI options to prepend and append arbitrary uncompressed data to the output. Depending on the design there may be several options.
- `-t asis`: Any input with this type would be printed as is, with an implied `-a write`. This fits well with multiple inputs (#8), but there is a possible ambiguity. For example, `-t asis one.html -t js a.js -t asis two.html -t js b.js -t asis three.html` suggests that `a.js` and `b.js` are separated by `two.html`, but this might not be desirable or even possible depending on the implementation.
- `-t before|after`: These options ensure that the uncompressed input goes either before or after the compressed data. I'm not sure how to handle multiples of them, though.
- `-B|--before` and `-A|--after`: Similar to the previous, but a bit clearer because they are technically not inputs but output transformations (the CLI has an implicit convention that lowercased options relate to inputs and uppercased options relate to outputs).
- `-T|--template`: "Before" and "after" inputs would be related, so a single option might be more appropriate. It is also free from the multiple-uncompressed-inputs problem (we can simply forbid multiple `-T` options). A marker for where the compressed JS goes would have to be chosen, however.

Also, unlike most compressed inputs, uncompressed inputs can be as short as `<script>` and `</script>`, so ideally it should be possible to give them directly on the command line.
Roadroller currently uses simulated annealing for sparseSelectors (which have a large multidimensional space) and a modified binary search for all other (linear) parameters. This works relatively well, but there might be a better way to optimize them all together.

It was observed that sparseSelectors search results vary wildly across runs, so we want to reduce the chance of running into "unhelpful" neighbors. I've already tested a form of tabu search in the past, but it didn't work well: sparseSelectors have a large number of possible neighbors, so we could only mark a small portion of them over the course of the search.

The modified binary search is based on the assumption that the size function to be minimized is convex and has a relatively large "bowl" across the entire range. Both are only approximately true, and sometimes the search misses better parameters. The biggest reason is that we don't have a derivative of the size function available, so we can only guess it from a limited number of evaluations.
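For contrast, the convexity assumption can be illustrated with a plain ternary search over an integer range; this is a generic sketch, not Roadroller's actual search, and it finds the minimum only when the size function really is convex:

```javascript
// Minimize f over the integers in [lo, hi], assuming f is convex there.
// If the "bowl" assumption fails, this can return a local minimum only.
function ternarySearchMin(f, lo, hi) {
  while (hi - lo > 2) {
    const m1 = lo + Math.floor((hi - lo) / 3);
    const m2 = hi - Math.floor((hi - lo) / 3);
    if (f(m1) < f(m2)) hi = m2; else lo = m1;
  }
  // finish with a linear scan over the tiny remaining range
  let best = lo;
  for (let x = lo + 1; x <= hi; ++x) if (f(x) < f(best)) best = x;
  return best;
}
```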
In any case, we want a metaheuristic algorithm that:
The current quote modelling is all or nothing, but we could model some quotes (like `'`) and ignore others (like `` ` ``). Or we could give different context hashes to different quotes.
The current output format is fixed to the six-bit code, which is optimized for js13kGames submissions. There are numerous other possibilities though, and I'd like to reserve the `-F|--output-format` option in the CLI for this purpose. For the API it would probably be a new method to be used instead of the current `Packer.makeDecoder`.

The default would remain the six-bit coding, the explicit option being `-F6`. The first digit denotes `outBits`, the number of bits per coded unit. There may be following characters if there are multiple output formats with the same `outBits` or the format requires manual intervention (therefore we would probably never have `-F8`).
Each possible future format would go into its own issue. In this issue I document the current six-bit coding.

`-F6`: Six-bit coding

This is the default and the only possibility as of 2.1.0. The compressed data is encoded into a template literal, where a code unit k is encoded as a code point k or k + 0x40 so that no escape sequence is required. The latter is preferred since it coincides with alphabets, but the former is desirable or even required when k is one of the following:

- `\`
- `(` (the first line has to include this character when `-D` is not in use)
- `=` (the first line has to include this character when `-D` is in use)

This format is designed to be "compressed" again with the Huffman tree. This is ensured by making every code unit correspond to a (mostly consecutive) set of 64 characters and minimizing the number of characters not in the set (we can't completely get rid of them because `` ` `` has to be somewhere in that line). The Huffman tree needs to encode the characters not in the set, so the actual coding rate is slightly below 8 bits/char even after DEFLATE, but the optimal tree should still achieve at least 8 × (384/385) ≈ 7.979 bits/char.