kamil-kielczewski / small-jsfuck

Generate small jsf code

small-jsfuck's Issues

Check and change digit jsf representations

After some small statistical experiments it looks like, for average code converted to base4, the most common digits are 1, 0 and 2, and the least common is 3.

Check this on popular code files (like jQuery etc.). If true, then switch the digit representation to:

0 --> 0       jsf:   +(+[])
1 --> false   jsf:     +![]
2 --> true    jsf:    +!![]
3 --> 1       jsf: +(+!![])

This means the bootstrap needs an extra .replace(/1/g,3) (because previously 0 and 1 had the same representation as in jsf). It must run first, otherwise the 1s produced from "false" would also be turned into 3s:

.replace(/1/g,3).replace(/true/g,2).replace(/false/g,1)

Check how much smaller a typical 1 kB of code becomes after this change.
Check how much the bootstrap code grows (if only by a few kB and the gain from point 1 is good, then the change is worth making).
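A minimal JavaScript sketch of the proposed mapping, useful for measuring point 1 and for checking the replace order. The helper names (toBase4, encodeTokens, toJsf, decodeTokens) are hypothetical, and the fixed width of 8 base4 digits per UTF-16 code unit is an assumption, not necessarily what small-jsfuck does:

```javascript
// Proposed digit -> token mapping from this issue.
const DIGIT_TOKENS = { 0: "0", 1: "false", 2: "true", 3: "1" };
// jsf fragment that appends each token to a string built as [] + ... + ...
const TOKEN_JSF = { "0": "+(+[])", "false": "+![]", "true": "+!![]", "1": "+(+!![])" };

// Assumption: each UTF-16 code unit is written as exactly 8 base4 digits (4^8 = 65536).
function toBase4(text) {
  return text.split("").map(ch => ch.charCodeAt(0).toString(4).padStart(8, "0")).join("");
}

// The string the jsf payload evaluates to, e.g. "0000false00false" for "A".
function encodeTokens(text) {
  return toBase4(text).split("").map(d => DIGIT_TOKENS[d]).join("");
}

// The jsf payload itself: the leading [] forces string concatenation.
function toJsf(text) {
  return "[]" + toBase4(text).split("").map(d => TOKEN_JSF[DIGIT_TOKENS[d]]).join("");
}

// Bootstrap side: /1/g must run first, otherwise the 1s produced from "false" become 3s.
function decodeTokens(tokens) {
  const digits = tokens.replace(/1/g, 3).replace(/true/g, 2).replace(/false/g, 1);
  return (digits.match(/.{8}/g) || [])
    .map(group => String.fromCharCode(parseInt(group, 4)))
    .join("");
}

console.log(toJsf("A").length); // 46: six "+(+[])", two "+![]" and the leading "[]"
```

Running toJsf on a few real files and comparing the lengths with other mappings would give the numbers asked for in the first check.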

Alternative coding for base16

Similar to #6, check an alternative coding for base16.
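A sketch of one possible base16 coding, by analogy with the fixed-width base8 idea in #6: every UTF-16 code unit becomes exactly 4 hex digits, and the decoder re-inserts \u before each group. The function names are hypothetical, and whether this beats base8 is exactly what the issue asks to check:

```javascript
// Every UTF-16 code unit -> exactly 4 hex digits (padded with leading 0s).
function encodeHex4(text) {
  return text.split("").map(ch => ch.charCodeAt(0).toString(16).padStart(4, "0")).join("");
}

// Decoder: put "\u" before every 4-digit group and let eval read the escapes.
function decodeHex4(hex) {
  if (hex === "") return "";
  return (0, eval)("'\\u" + hex.match(/..../g).join("\\u") + "'");
}
```

Because it works on UTF-16 code units, an emoji simply becomes two groups (a surrogate pair), which the decoder joins back together.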

Procedure for generate text statistics

Create a reusable function which prepares encoded text statistics:

the text will be encoded to base4, base8 or base9 (maybe base16)
the procedure counts the popularity of each baseX character and returns the output as an array of baseX characters sorted from most to least popular

This is already done for base8 in this fiddle
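The fiddle is not reproduced here. Below is a hypothetical digitStats function showing one way to do it. It converts each UTF-16 code unit with toString(base) without padding (padding would inflate the count of 0) and returns all digits of the base, most popular first, with ties broken by digit order:

```javascript
// Hypothetical helper: count how often each digit of the given base appears when every
// UTF-16 code unit of the text is written with toString(base), without padding.
function digitStats(text, base) {
  const counts = new Map();
  for (let i = 0; i < base; i++) counts.set(i.toString(base), 0); // list unused digits too
  for (const ch of text.split("")) {
    for (const d of ch.charCodeAt(0).toString(base)) counts.set(d, counts.get(d) + 1);
  }
  // Most popular first; ties keep numeric digit order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || parseInt(a[0], base) - parseInt(b[0], base))
    .map(([digit]) => digit);
}
```

For example, "a" is 1201 in base4, so digitStats("aaa", 4) ranks 1 first, then 0 and 2, and 3 last.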

Research and modify jsf into a reusable library

Find a library convention for writing a JS library.

Alternative coding base8

Instead of the #4 approach, we can try the following modification: we assume that each character has a fixed 3-digit octal representation (padded with 0s - details here).

So we can change '\141\154\145\162\164\50\61\51' to '\141\154\145\162\164\050\061\051', and now we don't need to encode the backslashes at all, because the decoder re-inserts them before every 3 digits (so the base8 code representation now uses 8 digits, not 9 like before).

eval(eval("'\\"+ "141154145162164050061051".match(/.../g).join("\\")+"'"))

We need to perform an investigation similar to the one for #4.

Bootstrap: we can use the above approach with the current small-jsfuck algorithm, but with toString(8) and parseInt(x,8).
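A plain-JavaScript sketch of both sides of the fixed-width base8 idea, with hypothetical function names. It assumes only character codes 0-255 are encoded, since \377 is the largest octal escape. decodeViaEscapes is the eval-based decoder shown above; decodeViaParseInt is the parseInt(x,8) bootstrap variant:

```javascript
// Every character -> exactly 3 octal digits, so the decoder can re-insert the backslashes.
function encodeOctal3(text) {
  return text.split("").map(ch => {
    const code = ch.charCodeAt(0);
    if (code > 255) throw new RangeError(`char code ${code} does not fit in 3 octal digits`);
    return code.toString(8).padStart(3, "0");
  }).join("");
}

// Decoder from this issue: join the 3-digit groups with backslashes and let eval read the
// legacy octal escapes (indirect eval runs as non-strict code, where they are allowed).
function decodeViaEscapes(digits) {
  return (0, eval)("'\\" + digits.match(/.../g).join("\\") + "'");
}

// Bootstrap variant: parseInt(x, 8) per group instead of relying on escape parsing.
function decodeViaParseInt(digits) {
  return digits.match(/.../g).map(d => String.fromCharCode(parseInt(d, 8))).join("");
}
```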

Add a tool to rebuild the deconverter quickly

Write a tool which makes building the deconverter faster. This tool can be written on jsfiddle.net (but the final code should be copied here).

eval testing

I was testing the decoding on https://enkhee-osiris.github.io/Decoder-JSFuck/ and it worked fine when eval was unchecked. Does small-jsfuck use eval?

Thanks for doing this fork. I would really like to run jsfuck on a large JS file, but without compression the output is over 70 MB.

Alternative coding base9

Again we go back to the idea from #4, but now the goal is not the best compression ratio but a shorter decoder, to be used for small inputs. There are 3 versions to check (the last one has the best compression ratio):

eval(eval("'91419154914591629164950961951'".replace(/9/g, "\\")))

eval(eval("'false141false154false145false162false164false50false61false51'".split("false").join("\\")))

eval(eval("'\\"+"141154145162164050061051".match(/.../g).join("\\")+"'"))

So we deliberately don't use the shortest available jsf representation for each digit; each digit is represented directly by its jsf number code. This allows a shorter decoder (at the cost of a lower "compression" ratio).
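An encoder matching the first variant, sketched in plain JavaScript with hypothetical names: each character code is written in octal without padding and prefixed with the digit 9, which the decoder turns back into a backslash. It assumes character codes 0-255 only, because larger codes do not form a single octal escape:

```javascript
// Variant 1: "9" + unpadded octal code per character. 9 never occurs in octal digits,
// so it can safely stand in for the backslash.
function encodeBase9(text) {
  return text.split("").map(ch => {
    const code = ch.charCodeAt(0);
    if (code > 255) throw new RangeError(`char code ${code} is above \\377`);
    return "9" + code.toString(8);
  }).join("");
}

// Decoder from this issue, without the outer eval (returns the source instead of running it).
function decodeBase9(digits) {
  return (0, eval)("'" + digits.replace(/9/g, "\\") + "'");
}
```

The unpadded escapes are unambiguous because each one is always followed by a backslash or by the closing quote.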

Add option to deconvert with non-deprecated functions

(un)escape - gate for all

Having the following characters: 123456789 aceflnmoprstu (each of which can be obtained individually), we are able to get the other lower- and upper-case letters and some other characters without using deprecated methods like italics. Deprecated methods were used in jsfuck for size optimization. To avoid them we can use the escape and unescape methods, a technique based on this question and answer. For example, for the letter C (whose hexadecimal escape code is 43) we can do it as follows (we show 5 steps of the formula's evolution towards jsf):

step1:  unescape("%43")
step2:  unescape(escape(" ")[0]+43)
step3:  unescape(escape((NaN+[]["flat"])[11])[0]+43)
step4:  Function("return unescape")()(Function("return escape")()(" ")[0]+43)
step5:  []["flat"]["constructor"]("return unescape")()([]["flat"]["constructor"]("return escape")()((NaN+[]["flat"])[11])[0]+43)

Using this approach we have access to: !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ + tilde character and more. Using this technique we don't need String.fromCharCode (which in the old approach forced the use of deprecated methods like 'italics' or 'fontcolor' etc.).
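The steps above generalize to any character with a code below 256. Here is a hypothetical generator for the step 5 style expression. As in step 5, strings like "flat" and "constructor" are still plain literals that jsf would encode separately. The hex code is passed as a string so that codes like 0a work. It also assumes the engine prints native functions as "function flat() { [native code] }" (as V8 does), so that index 11 is a space:

```javascript
// Hypothetical generator: unescape("%" + hex) where "%" comes from escape(" ")[0],
// and both functions are reached through Function ([]["flat"]["constructor"]).
function unescapeFormula(ch) {
  const code = ch.charCodeAt(0);
  if (code > 255) throw new RangeError("only codes 0..255 fit in a %XX escape");
  const hex = code.toString(16).padStart(2, "0");
  const fn = name => `[]["flat"]["constructor"]("return ${name}")()`;
  const percent = `${fn("escape")}((NaN+[]["flat"])[11])[0]`;
  return `${fn("unescape")}(${percent}+"${hex}")`;
}
```

For "C" this produces the step 5 formula, only with "43" quoted.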

For the letter "C" we can also use the following shortcut, discovered by Siguza, which is based on escape only:

step1: Function("return escape")()(",")[2]
step2: []["flat"]["constructor"]("return escape")()([[]]["concat"]([[]])+[])[2]

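Add a "checkbox" which allows compiling with a deconverter based on the above non-deprecated functions.

List of all characters with codes 0-255 (the padStart keeps codes below 16 valid, e.g. %0a instead of %a):

[...Array(256)].map((x,i)=> unescape(`%${i.toString(16).padStart(2,"0")}`))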

Alternative coding

Check the octal coding idea proposed by jsfuck author aemkei here:

I like the idea of encoding the characters into numbers in a bootstrap to save space.

Have you thought about using octal sequences? This would save some space per character:

E.g.:

eval(eval("'91419154914591629164950961951'".replace(/9/g, "\\")))
The bootstrap code is ~25k but maybe we can save some bytes by replacing the quotes or backslashes.

'\141\154\145\162\164\50\61\51' in the Chrome console gives "alert(1)"

Check whether this works with emoji and Chinese characters.
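Octal escapes stop at \377 (255), so on their own they cannot represent Chinese characters or emoji. For example, '\405' is read as '\40' (a space) followed by "5". One possible workaround is sketched below. It uses octal escapes up to 255 and \uXXXX escapes above that. The function name is hypothetical, and the cost in bootstrap size is not measured here:

```javascript
// Hypothetical workaround: octal escape for codes <= 255, \uXXXX for anything larger.
// Working on UTF-16 code units means an emoji becomes two \u escapes (a surrogate pair).
function toEscapedLiteral(text) {
  const body = text.split("").map(ch => {
    const code = ch.charCodeAt(0);
    return code <= 255
      ? "\\" + code.toString(8)
      : "\\u" + code.toString(16).padStart(4, "0");
  }).join("");
  return "'" + body + "'";
}
```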

Emoji and Chinese letters

Currently emoji are encoded incorrectly - check why and find a solution (but if it increases the bootstrap code, then add a proper "checkbox" to enable it). If this causes "big problems", then mark it as "wontfix" (because emoji escape sequences are supported in JS and work, e.g. alert("\u{1f601}")).

TMP

this issue is only for temporary work...

Design the architecture of a future release of the compiler

In a future release the user will:

put in their code
select whether to compile the code using the deprecated (small), partially deprecated (medium) or non-deprecated (biggest) version of the decoder
small-jsf, depending on the code size, will choose the method (base4, base8 or base9 - and maybe base16)
small-jsf will use the text statistics from #10 to use the shortest jsf representation for the most popular baseX characters
show the output code to the user and allow downloading it as a .js file

The compiler should be written in such a way that replacing the jsf representation for the decoder is automatic and easy (this is done for base8 here).
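One possible shape for the "choose the method depending on the code size" step. It assumes each encoder is a function returning its jsf payload and each decoder has a known, fixed jsf length; both assumptions and all names are hypothetical. Small inputs favor short decoders (like the base9 variants), large inputs favor better ratios, and trying every candidate covers both cases without hardcoded size thresholds:

```javascript
// Hypothetical compiler step: try every available encoder and keep the one whose
// payload plus decoder (bootstrap) is shortest for this particular input.
function chooseEncoding(code, candidates) {
  let best = null;
  for (const [name, { encode, decoderLength }] of Object.entries(candidates)) {
    const total = encode(code).length + decoderLength;
    if (best === null || total < best.total) best = { name, total };
  }
  return best; // { name, total } or null when no candidates are given
}
```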

Optimisation - statistical analysis of input code

Optimisation: let's analyse which characters are most used in the input code and, based on that, prepare the proper base4-to-jsf map (see #1). In this case the decompressor should be constructed dynamically (or we can hardcode all 4! = 24 permutations). But the main decisions are which digit, 1 or 0, is better for the shortest representations, and in which order the two longer representations go to 2 and 3, so we need 4 decompressor variants (this can also play an important role if base8 turns out to be shorter, #6).
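Here is a sketch of the dynamic variant, assuming the four append fragments from #1 and hypothetical names. For fixed fragment lengths, the total payload length is minimized by giving the most frequent digit the shortest fragment, the next one the next shortest, and so on. A sort is therefore enough, and no search over all 4! permutations is needed. The decoder then has to be generated to match the chosen map:

```javascript
// Fragments that append "false", "true", "0" and "1" to a string, shortest first (4, 5, 6, 8 chars).
const FRAGMENTS = ["+![]", "+!![]", "+(+[])", "+(+!![])"];

// Hypothetical: count base4 digits of the input (no padding; a padded encoding would
// count its leading 0s too) and give the shortest fragment to the most frequent digit.
function buildDigitMap(text) {
  const counts = [0, 0, 0, 0];
  for (const ch of text.split("")) {
    for (const d of ch.charCodeAt(0).toString(4)) counts[Number(d)]++;
  }
  const order = [0, 1, 2, 3].sort((a, b) => counts[b] - counts[a] || a - b);
  const map = {};
  order.forEach((digit, rank) => { map[digit] = FRAGMENTS[rank]; });
  return map;
}
```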