subjs's Introduction

SubJS interpreter

Its purpose is to support the analysis of obfuscated (likely malicious) JavaScript code by:

  • parsing and interpreting a language that is a subset of JavaScript
  • providing a mechanism that interrupts execution whenever the next level of obfuscated code is about to be executed

The program can detect execution of code from a string object. When this happens, the code (the argument of this evaluation) is saved and interpretation stops. The user can then resume interpretation.
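The interrupt-and-resume behaviour described above can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not SubJS's implementation: interpreted statements are stood in for by plain functions, and the names createSession, EvalInterrupt and evalFromString are hypothetical. When a step tries to evaluate a string, the string is captured instead of executed and the session stops; calling resume() continues with the next step.

```javascript
// Hypothetical sketch, not the SubJS API: interpreted statements are modelled
// as plain functions that receive an `evalFromString` hook.
class EvalInterrupt extends Error {
  constructor(code) {
    super("interpretation stopped: code evaluated from a string");
    this.code = code; // the captured next-stage payload
  }
}

function createSession(steps) {
  const captured = []; // every payload seen so far, in order
  let pc = 0;          // index of the next step to run
  // Instead of executing the string, raise an interrupt that carries it.
  const evalFromString = (code) => { throw new EvalInterrupt(String(code)); };

  function resume() {
    while (pc < steps.length) {
      const step = steps[pc++]; // advance first, so resume() skips the interrupted step
      try {
        step(evalFromString);
      } catch (e) {
        if (!(e instanceof EvalInterrupt)) throw e; // genuine errors propagate
        captured.push(e.code);
        return { status: "interrupted", code: e.code };
      }
    }
    return { status: "finished" };
  }

  return { resume, captured };
}

// Usage: the first step assembles its payload by string concatenation,
// a common obfuscation pattern.
const session = createSession([
  (evalFromString) => evalFromString("ale" + "rt('stage 2')"),
  (evalFromString) => evalFromString("var x = 1"),
]);
session.resume(); // { status: "interrupted", code: "alert('stage 2')" }
session.resume(); // { status: "interrupted", code: "var x = 1" }
session.resume(); // { status: "finished" }
```

In this simplification the rest of an interrupted step is abandoned; a real interpreter would keep its position inside the statement so that resuming continues exactly where evaluation stopped.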

Interpretable language specification

Lexical units

Non-keyword atoms:

Symbol    Description
ident     variable, class or function name
constant  constants such as "string", 100 or 123.0021

Keyword atoms:

No. Symbol
0 '++'
1 '--'
2 '.'
3 ','
4 ';'
5 '__unused_symbol'
6 '$'
7 function
8 class
9 '('
10 ')'
11 '['
12 ']'
13 '}'
14 '{'
15 var
16 let
17 const
18 new
19 '!'
20 '^'
21 '&'
22 '*'
23 '-'
24 '+'
25 '/'
26 '|'
27 return
28 if
29 else
30 continue
31 break
32 do
33 while
34 for
35 '='
36 '+='
37 '-='
38 '*='
39 '/='
40 '^='
41 '&='
42 '|='
43 try
44 catch
45 finally
46 '=='
47 '!='
48 '&&'
49 '||'
50 '<'
51 '<='
52 '>'
53 '>='
54 '%'
55 '%='
56 '?'
57 ':'
58 '<<'
59 '>>'
60 '>>>'

Syntax

Starting symbol

The starting symbol is Program.

Program
    = { Element }
    | epsilon

Element
    = function Identifier '(' ParameterList ')' CompoundStatement
    | Statement

ParameterList
    = ident {',' ident}
    | epsilon

CompoundStatement
    = '{' Statements '}'

Statements
    = empty
    | Statement Statements

Statement
    = ';'
    | if Condition Statement
    | if Condition Statement else Statement
    | while Condition Statement
    | ForParen ';' ExpressionOpt ';' ExpressionOpt ')' Statement
    | ForBegin ';' ExpressionOpt ';' ExpressionOpt ')' Statement
    | ForBegin in Expression ')' Statement
    | break ';'
    | continue ';'
    | with '(' Expression ')' Statement
    | return ExpressionOpt ';'
    | CompoundStatement
    | VariablesOrExpression ';'

Condition
    = '(' Expression ')'

ForParen
    = for '('

ForBegin
    = ForParen VariablesOrExpression

VariableType
    = var
    | let
    | const

VariablesOrExpression
    = VariableType Variables
    | Expression

Variables
    = Variable
    | Variable , Variables

Variable
    = ident
    | ident = AssignmentExpression

ExpressionOpt
    = empty
    | Expression

OneLineCommaOperator

TODO: decide whether this rule is needed. It should contain at least one ','.

OneLineCommaOperatorSeparator
    = epsilon
    | ';'
    | '$'
OneLineCommaOperator
    = Expression ',' OneLineCommaOperatorNext
OneLineCommaOperatorNext
    = Expression ',' OneLineCommaOperatorNext
    | Expression

Expression

Expression
    = AssignmentExpression
    | AssignmentExpression "," Expression

AssignmentExpression
    = ConditionalExpression
    | ConditionalExpression AssignmentOperator AssignmentExpression

ConditionalExpression
    = OrExpression
    | OrExpression "?" AssignmentExpression ":" AssignmentExpression

OrExpression
    = AndExpression
    | AndExpression "||" OrExpression

AndExpression
    = BitwiseOrExpression
    | BitwiseOrExpression "&&" AndExpression

BitwiseOrExpression
    = BitwiseXorExpression
    | BitwiseXorExpression "|" BitwiseOrExpression

BitwiseXorExpression
    = BitwiseAndExpression
    | BitwiseAndExpression "^" BitwiseXorExpression

BitwiseAndExpression
    = EqualityExpression
    | EqualityExpression "&" BitwiseAndExpression

EqualityExpression
    = RelationalExpression
    | RelationalExpression EqualityOperator EqualityExpression

RelationalExpression
    = ShiftExpression
    | RelationalExpression RelationalOperator ShiftExpression

ShiftExpression
    = AdditiveExpression
    | AdditiveExpression ShiftOperator ShiftExpression

AdditiveExpression
    = MultiplicativeExpression
    | MultiplicativeExpression AdditiveOperator AdditiveExpression

MultiplicativeExpression
    = UnaryExpression
    | UnaryExpression MultiplicativeOperator MultiplicativeExpression

UnaryExpression
    = MemberExpression
    | UnaryOperator UnaryExpression
    | '-' UnaryExpression
    | IncrementOperator MemberExpression // TODO
    | MemberExpression IncrementOperator
    | new Constructor // TODO
    | delete MemberExpression

// TODO
Constructor
    = this . ConstructorCall
    | ConstructorCall

ConstructorCall
    = ident
    | ident ( ArgumentListOpt )
    | ident . ConstructorCall

MemberExpression
    = PrimaryExpression {
        '.' PrimaryExpression
        | '[' Expression ']'
        | '(' ArgumentListOpt ')'
    }

ArgumentListOpt
    = epsilon
    | ArgumentList

ArgumentList
    = AssignmentExpression
    | AssignmentExpression ',' ArgumentList

PrimaryExpression
    = '(' Expression ')'
    | ident
    | constant
    | '[' ArrayExpression ']'
    | false
    | true
    | null
    | this

ArrayExpression
    = { AssignmentExpression ',' | ',' | epsilon}

AssignmentOperator
    = '='
    | '+='
    | '-='
    | '*='
    | '/='
    | '^='
    | '&='
    | '|='
    | '%='

EqualityOperator
    = '=='
    | '!='

RelationalOperator
    = '<'
    | '<='
    | '>'
    | '>='

ShiftOperator
    = '<<'
    | '>>'
    | '>>>'

AdditiveOperator
    = '+'
    | '-'

MultiplicativeOperator
    = '*'
    | '/'

UnaryOperator
    = '--'
    | '++'

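The right-recursive rules for AdditiveExpression and MultiplicativeExpression, read literally, group 10 - 4 - 3 as 10 - (4 - 3), whereas JavaScript groups it as (10 - 4) - 3. Recursive-descent parsers commonly implement such rules with a loop instead, which yields left associativity. The sketch below evaluates a small numeric slice of the grammar (AdditiveExpression, MultiplicativeExpression, '-' UnaryExpression and '(' Expression ')') in that style. evaluate is a hypothetical helper, not part of SubJS, and it says nothing about how SubJS itself groups these operators.

```javascript
// Hypothetical sketch: recursive-descent evaluator for a numeric slice of the
// grammar above. Loops replace right recursion to get JS-style grouping.
function evaluate(src) {
  const tokens = src.match(/\d+(?:\.\d+)?|[-+*\/()]/g) || [];
  // Reject anything the token pattern skipped (letters, unknown symbols).
  if (tokens.join("") !== src.replace(/\s+/g, "")) {
    throw new SyntaxError("unsupported character");
  }
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // AdditiveExpression: MultiplicativeExpression { ('+' | '-') MultiplicativeExpression }
  function additive() {
    let value = multiplicative();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = multiplicative();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // MultiplicativeExpression: UnaryExpression { ('*' | '/') UnaryExpression }
  function multiplicative() {
    let value = unary();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = unary();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // UnaryExpression: '-' UnaryExpression | PrimaryExpression
  function unary() {
    if (peek() === "-") {
      next();
      return -unary();
    }
    return primary();
  }

  // PrimaryExpression: '(' Expression ')' | constant
  function primary() {
    const tok = next();
    if (tok === "(") {
      const value = additive();
      if (next() !== ")") throw new SyntaxError("expected ')'");
      return value;
    }
    if (tok !== undefined && /^\d/.test(tok)) return Number(tok);
    throw new SyntaxError(`unexpected token ${tok}`);
  }

  const result = additive();
  if (pos !== tokens.length) throw new SyntaxError("trailing input");
  return result;
}

evaluate("10 - 4 - 3");    // 3, not 9
evaluate("2 + 3 * 4");     // 14
evaluate("-(1 + 2) * 2");  // -6
```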
Try, Catch, Finally

TODO

Try
    = try '{' Program '}' Catch
Catch
    = catch '(' ident ')' '{' Program '}' Catch
    | Finally
    | epsilon
Finally
    = finally '{' Program '}'

Building

Compilation:

scons

Clean:

scons -c

Running

For usage, see:

./SubJs -h

Architecture

The three most important classes are:

  1. Lexer
  2. Parser
  3. Interpreter

Lexer

The Lexer is constructed from a string object containing the code.

The Lexer generates three types of tokens:

  • symbol (e.g. a variable name)
  • constant (e.g. 123 or "text")
  • keyword (e.g. "for", "if", "+=" ...)
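A lexer producing these three token kinds can be sketched as follows. This is an illustration under stated assumptions, not SubJS's Lexer: the keyword set is only a subset of the keyword-atom table above, string constants have no escape sequences, and tokenize is a hypothetical name. Operators are matched longest-first so that '>>>' is not split into '>>' and '>', and line breaks are skipped like any other whitespace, since the language has no end-of-line token.

```javascript
// Hypothetical sketch: a lexer emitting "symbol", "constant" and "keyword" tokens.
// Word keywords (a subset of the keyword-atom table).
const KEYWORD_WORDS = new Set([
  "function", "class", "var", "let", "const", "new", "return", "if", "else",
  "continue", "break", "do", "while", "for", "try", "catch", "finally",
]);

// Punctuator keywords, ordered longest-first so the longest match wins.
const PUNCTUATORS = [
  ">>>",
  "++", "--", "+=", "-=", "*=", "/=", "^=", "&=", "|=", "%=", "==", "!=",
  "&&", "||", "<=", ">=", "<<", ">>",
  ".", ",", ";", "$", "(", ")", "[", "]", "{", "}", "!", "^", "&", "*",
  "-", "+", "/", "|", "=", "<", ">", "%", "?", ":",
];

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const rest = src.slice(i);
    const ws = /^\s+/.exec(rest);
    if (ws) { i += ws[0].length; continue; } // newlines are ordinary whitespace

    let kind, text, m;
    if ((m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(rest))) {
      text = m[0];
      kind = KEYWORD_WORDS.has(text) ? "keyword" : "symbol";
    } else if ((m = /^(\d+(\.\d+)?|"[^"]*"|'[^']*')/.exec(rest))) {
      text = m[0]; // number or quoted string
      kind = "constant";
    } else {
      text = PUNCTUATORS.find((op) => rest.startsWith(op));
      if (text === undefined) throw new SyntaxError(`unexpected character at ${i}`);
      kind = "keyword";
    }
    tokens.push({ kind, text });
    i += text.length;
  }
  return tokens;
}

tokenize("a >>> 123.0021");
// [{kind: "symbol", text: "a"}, {kind: "keyword", text: ">>>"},
//  {kind: "constant", text: "123.0021"}]
```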

Parser

The Parser is constructed from a Lexer object and provides methods to access the generated syntax tree. It is an LL parser (it reads the input left to right and produces a leftmost derivation).
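An LL parser picks each production by looking at upcoming tokens. For the Statement rule in the grammar above, a single lookahead token is enough to choose the alternative in almost every case. The sketch below illustrates only that choice; it is not SubJS's Parser, and chooseStatement as well as labels such as EmptyStatement or WhileStatement are hypothetical names for the grammar's alternatives.

```javascript
// Hypothetical sketch: LL(1) choice of a Statement alternative from the
// lookahead token. Labels name the alternatives of the Statement rule.
const STATEMENT_BY_FIRST_TOKEN = new Map([
  [";", "EmptyStatement"],
  ["if", "IfStatement"],        // the optional 'else' is checked after the inner Statement
  ["while", "WhileStatement"],
  ["for", "ForStatement"],      // ';' vs 'in' is decided after ForBegin is parsed
  ["break", "BreakStatement"],
  ["continue", "ContinueStatement"],
  ["with", "WithStatement"],
  ["return", "ReturnStatement"],
  ["{", "CompoundStatement"],   // no PrimaryExpression starts with '{', so no conflict
]);

function chooseStatement(lookahead) {
  // Any other token (var, let, const, an identifier, a constant, '(' ...)
  // must begin the VariablesOrExpression ';' alternative.
  return STATEMENT_BY_FIRST_TOKEN.get(lookahead) ?? "VariablesOrExpression";
}

chooseStatement("while"); // "WhileStatement"
chooseStatement("var");   // "VariablesOrExpression"
```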

Interpreter

The Interpreter is constructed from a Parser object. The full range of its capabilities will be defined at a later stage of the project.

Examples

Interpretable code examples

Examples - Declaration:

var a
var a,b,c,d
var a=1,b,c='test',d

Examples - OneLineCommaOperator:

Prints 3

abc=2, abc++, print(abc)

Examples - Get:

Prints 22

print([1,22,3,4][0,2,'test',1])

Examples - Get2:

Prints "test-join"

a = ["test", "join"]
a.join("-")
print(a)

Examples - FunctionExpression:

Prints "abc"

function fun(){
    return 'abc'
}
fun['call']()

Prints "abc"

(1,2,3, (function fun(){print('abc')}))()

Syntax Error

function a(){print("abc")} .abc

Syntax Ok

1, function a(){print("abc")} .abc

Examples - IfStatement

Prints 2

if (1,2,3,0) {print(1)} else {print(2)}

Prints 1

if (1,2,3,1) {print(1)} else {print(2)}
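The comma-operator examples above (OneLineCommaOperator, Get, FunctionExpression and IfStatement) behave the same way in standard JavaScript, which can help when comparing SubJS output with Node.js or a browser. In this sketch print() is a hypothetical stand-in that records its argument instead of printing it.

```javascript
// Standard JavaScript, not SubJS: print() is replaced by a stand-in that
// records its argument so the results can be inspected.
const log = [];
const print = (v) => log.push(v);
let abc;

// OneLineCommaOperator: operands are evaluated left to right.
abc = 2, abc++, print(abc);                          // records 3

// Get: the index expression (0, 2, "test", 1) evaluates to 1.
print([1, 22, 3, 4][0, 2, "test", 1]);               // records 22

// FunctionExpression: the comma expression yields the function, which is called.
(1, 2, 3, (function fun() { print("abc"); }))();     // records "abc"

// IfStatement: the condition (1, 2, 3, 0) evaluates to 0, which is falsy.
if (1, 2, 3, 0) { print(1); } else { print(2); }     // records 2
```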

Complex examples

Prints "abc"

aaa = "aaa"
ccc = "ccc"
function abc(param){
    print("abc")
}
abc["aaaccc"] = abc
abc[aaa + (function fff(){return "ccc";}())](123);
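The complex example relies on a property name assembled at runtime, a common obfuscation trick. The same code runs in standard JavaScript; the comments below trace why it prints "abc". As before, print() is a hypothetical recording stand-in.

```javascript
// Standard JavaScript, not SubJS: print() is a recording stand-in.
const log = [];
const print = (v) => log.push(v);

const aaa = "aaa";
const ccc = "ccc"; // declared but unused, as in the original example
function abc(param) {
  print("abc");
}
abc["aaaccc"] = abc; // functions are objects, so they can hold properties

// The property name is built at runtime: "aaa" + "ccc" === "aaaccc",
// so this looks up abc.aaaccc (which is abc itself) and calls it with 123.
abc[aaa + (function fff() { return "ccc"; }())](123);
```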

Differences from JavaScript

continue, break and return statements outside a loop or function are not a syntax error, but they cause an error during interpretation.

The unary operators '--' and '++' work only in postfix form.

Functions can be used in OneLineCommaOperator (see syntax).

There is no end-of-line token. For example, this code:

a = 1 b = 1

works in this interpreter, but is a syntax error in JavaScript. Not treating line breaks specially simplifies the syntax. Consider these examples:

a
=
1
a
[0]

Both are valid JavaScript. In the second example, [0] is actually the get(0) (member access) operator applied to a, not an array literal.
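Both points can be checked in standard JavaScript (not SubJS). The helper parses is hypothetical; it relies on the Function constructor, which compiles a function body without running it, so a SyntaxError means the code does not parse.

```javascript
// Standard JavaScript, not SubJS: line breaks affect how code is parsed.
function parses(code) {
  try {
    new Function(code); // compiles the body; does not execute it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

parses("a = 1 b = 1");  // false: no line break, so no semicolon can be inserted
parses("a = 1\nb = 1"); // true: a semicolon is inserted at the line break

// A '[' on the next line continues the previous expression: a[0], not a new array.
const readFirst = new Function("var a = ['first']\nvar r = a\n[0]\nreturn r");
readFirst(); // "first"
```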

Examples of unsupported JS syntax

Object literals (maps), such as:

a = {'abc':1, 'def':2}

Function expressions without '{' '}' (not valid in standard JavaScript either)

function fun()print('abc');

If statements without '{' '}'

if(1)print(1);

Unsupported keywords:

  • case
  • class
  • debugger
  • default
  • delete
  • export
  • extends
  • import
  • in
  • instanceof
  • super
  • switch
  • this
  • throw
  • typeof
  • void
  • with
  • yield

subjs's People

Contributors

msypetkowski
