opcodevm

A vector-oriented bytecode engine.

Can be applied to:

  • files
    • column-oriented databases
  • network traffic
    • IDS
    • firewalling
  • HTTP handler
    • RTB

Issues

  • 'analysis' tool where all opcode implementations are tested and the fastest is picked for future runs
  • as well as the init() function in a plugin, need a cleanup() hook too (OpenCL leaves crap everywhere)
  • support more than the two-deep ('accelerated' and 'regular') op chains; might want to cycle through them to deal with alignment bits (maybe better to just guarantee alignment though?)
  • need to check in each op for any alignment needs, as after an offset change things might be misaligned
  • implement scatter gather vector support (needed for datagram payloads)
  • 10 SQL Tricks That You Didn’t Think Were Possible - operations that I need to be able to do
  • 32 bit support, though 4bn records would still be a limit
  • input sources
  • think about a slower low latency option suitable for real time streaming data (NAPI-esque)
  • actual client/server, rather than hard coded files and programs
  • add a PIPELINE environment variable to add instruction pipelining to be used where there is SMT support
    • as an instruction is working through the dataset, the next instruction is being simultaneously processed
    • I suspect trailing the leading instruction by an L1 cache line will be needed, plus keeping locality between those threads
    • insert a leading instruction to the program that uses __builtin_prefetch()/Software Prefetching
    • for the non-SMT case, can we use -fprefetch-loop-arrays or __builtin_prefetch() trivially without complicating the code with a pile of conditionals?
  • need to add libhwloc
    • INSTANCES to have an affinity per core
    • PIPELINE to have an affinity where each shared CPU thread is pinned to the same core
  • more codes
    • need an internal data store to aggregate data into
    • to handle packet oriented data, maybe keep the thought of co-routine like behaviour resumption
  • figure out something better than -m{arch,tune}=native for CFLAGS
  • compile only the ops that will work for the target, for example do not cook x86_64 on ARM kit
  • fix variance
  • {Net,Open}BSD and Mac OS X support
    • remove GNU'isms
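
The software prefetching idea in the PIPELINE item above could look roughly like the following. This is only a sketch: it assumes GCC or Clang (for __builtin_prefetch()), and the function name sum_with_prefetch and the PREFETCH_AHEAD distance are hypothetical and untuned, not part of opcodevm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sum a column while hinting the cache about data we will touch soon. */
uint64_t sum_with_prefetch(const uint32_t *col, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Arguments: address, 0 = read access, 0 = low temporal locality.
         * The bounds check keeps the pointer arithmetic inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&col[i + PREFETCH_AHEAD], 0, 0);
        total += col[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; whether it helps at all would need measuring with the profiling tool.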

Preflight

Debian

apt-get install ocl-icd-opencl-dev opencl-headers libpcap-dev

Build

Simply type:

make

The following environment variables are available:

  • NDEBUG: optimised build
  • NPROT: disable address protection, helpful for ASM reading
  • NOSTRIP: do not strip the binary (default when not using NDEBUG)

Usage

time env NODISP=1 NOCL=1 ./opcodevm

The following environment variables are available:

  • NODISP: do not display the results
  • NOARCH: skip arch specific jets
  • NOCL: skip CL specific jets (recommended as this is slow!)
  • INSTANCES (default: 1): engine parallelism (0 sets to getconf _NPROCESSORS_ONLN)
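
As an illustration of the INSTANCES rule above, a value of 0 can be resolved to the number of online processors with sysconf(). This is a minimal sketch; the helper name resolve_instances and the fallback to 1 on bad input are assumptions, not the engine's actual code.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: turn the INSTANCES setting into a worker count. */
long resolve_instances(const char *value)
{
    long n = 1;                          /* documented default */

    if (value != NULL && *value != '\0')
        n = strtol(value, NULL, 10);

    if (n == 0)                          /* 0 means "all online processors" */
        n = sysconf(_SC_NPROCESSORS_ONLN);

    return n < 1 ? 1 : n;                /* assumed fallback on error or negative input */
}
```

It would typically be called as resolve_instances(getenv("INSTANCES")).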

Profiling Opcode Implementations

The following pins the task to the first CPU and prints out the three minimum CPU cycle runs (PERF_COUNT_HW_REF_CPU_CYCLES), followed by the average and its variance, and finally by the maximum cycle time:

taskset 1 ./utils/profile code/bswap.so code/bswap/c.so

N.B. the 'noop' result gives an indication of the magnitude of the overhead of the profiling itself

The following environment variables are available:

  • CYCLES (default: 1000): number of runs
  • BESTOF (default: 3): print best of X minimums
  • LENGTH (default: half of _SC_LEVEL2_CACHE_SIZE): workset size
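
The figures the profiler reports (the BESTOF minimums, the average, its variance and the maximum) can be derived from the CYCLES samples roughly as below. This is a sketch under assumptions: the names cycle_stats and summarise are hypothetical, and population variance is used because the README does not say which estimator the tool applies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct cycle_stats {
    double   mean;
    double   variance;   /* population variance (assumed estimator) */
    uint64_t max;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts samples ascending in place, so samples[0..BESTOF-1] are the minimums. */
struct cycle_stats summarise(uint64_t *samples, size_t n)
{
    struct cycle_stats s = { 0.0, 0.0, 0 };

    if (n == 0)
        return s;

    qsort(samples, n, sizeof(*samples), cmp_u64);

    for (size_t i = 0; i < n; i++)
        s.mean += (double)samples[i];
    s.mean /= (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - s.mean;
        s.variance += d * d;
    }
    s.variance /= (double)n;

    s.max = samples[n - 1];
    return s;
}
```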

Sample Data

HistData Example

mkdir -p store
cut -d, -f2 DAT_ASCII_EURUSD_T_201603.csv | perl -ne 'print pack "f>", $_' > store/test
for I in $(seq 1 100); do cat store/test >> store/test2; done; mv store/test2 store/test

Engine

Notation

<a>         vector
[a]         array
 a          immediate

References:

I           immediate
C           column
M           memory (scratch)
S           store
G           global

Two dimension targets:

OC_Tab      (a)  <-  (b)

Three dimension targets:

OC_Tabc     (a)  <-  (b) op  (c)

OC_TCMM     C<a> <- M[b] op M[c]
OC_TCMI     C<a> <- M[b] op c
OC_TMIC     M[a] <-   b  op C<c>
...

Notes:

  • a can be equal to b and/or c
  • OC_TCxx/OC_TCx, where the destination is a column, makes the instruction suitable for pipelining, but at the cost of RAM (including L2 CPU cache!)

Registers

C<>         column, map to file/buffer
M[]         memory (scratch), zero'd per stride (window used for pipelining)
G[]         global, map to trie/bloom/sketch/...
S[]         store, pointers to C<> or M[]

Notes:

  • need to solve commutativity, as we process the columns in strides and roll up
  • C<>/G[] can be used read-only (MAP_PRIVATE) or read-write
  • C<>/G[] when backed by a file can be used as a cache
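
Mapping a file-backed C<> could be sketched with mmap() as follows. The helper name map_column is hypothetical, and pairing read-write access with MAP_SHARED is an assumption; the notes above only name MAP_PRIVATE for the read-only case.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file as a column; returns NULL on failure. */
void *map_column(const char *path, size_t *len, int writable)
{
    int fd = open(path, writable ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* Read-only columns use a private mapping; writable ones are
     * assumed to want their changes written back to the file. */
    void *p = mmap(NULL, (size_t)st.st_size,
                   writable ? PROT_READ | PROT_WRITE : PROT_READ,
                   writable ? MAP_SHARED : MAP_PRIVATE, fd, 0);
    close(fd);                           /* the mapping outlives the descriptor */

    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

The caller releases the column with munmap(p, len) when done.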

Operations

map         G[]  <- {file,zero'd trie,bloom,sketch,...}
map         C<>  <- {file,zero'd buffer}

alias       S[]  <- [CM]

fetch       S    <- G[]
store       G[]  <- S

load        [CM] <- [CMI]

operate     [CM] <- [CMI] op [CMI]

Opcodes

Map and Alias

Handled out-of-band as part of engine initialisation.

Fetch

TODO

Store

TODO

Load

TODO

ALU

Operations:

OC_ALU+OC_ADD+OC_Tabc     (a) <- (b) +  (c)

OC_MUL                    (a) <- (b) *  (c)
OC_DIV                    (a) <- (b) /  (c)
OC_AND                    (a) <- (b) &  (c)
OC_OR                     (a) <- (b) |  (c)
OC_SHF                    (a) <- (b) >> (c)    # (c) when negative is left shift
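
The OC_SHF rule (a negative (c) shifts left) applied across a vector with an immediate count might be sketched like this. The names shf and shf_vector, the 32-bit element width and the choice to return 0 for over-wide shifts are assumptions for illustration, not the engine's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* (b) >> (c), where a negative (c) shifts left instead. */
uint32_t shf(uint32_t b, int c)
{
    if (c <= -32 || c >= 32)
        return 0;                        /* assumed result; avoids undefined over-wide shifts */
    return c < 0 ? b << -c : b >> c;
}

/* (a) <- (b) >> c over n elements; a may alias b. */
void shf_vector(uint32_t *a, const uint32_t *b, int c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = shf(b[i], c);
}
```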

Misc

Suitable for buffer C<> types where the payload can be a packet, so letting you extract words of length d:

OC_MISC+OC_BUF+OC_Tabc   {C<a>,M[a]} <- (b)[(c):d]
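
One way to read (b)[(c):d] is "take d bytes starting at byte offset (c) of the payload". The sketch below follows that reading and additionally assumes network byte order and a width of at most four bytes; the README does not spell these details out, and the name buf_extract is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract `width` bytes at `off` as a big-endian word.
 * Returns 0 on success, -1 if the span falls outside the buffer. */
int buf_extract(const unsigned char *buf, size_t len,
                size_t off, size_t width, uint32_t *out)
{
    if (width == 0 || width > 4 || off > len || width > len - off)
        return -1;

    uint32_t word = 0;
    for (size_t i = 0; i < width; i++)
        word = (word << 8) | buf[off + i];

    *out = word;
    return 0;
}
```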

Not exposed (internally used when loading in data from C<>):

OC_MISC+OC_BSWP          C<a>        <- bswap(C<a>)
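
A plain C version of the byte swap over a column could look like the following; it is presumably what a 'regular' (non-accelerated) implementation amounts to, but this is a sketch rather than the contents of code/bswap/c.so. It assumes 32-bit elements and GCC or Clang for __builtin_bswap32(); the name bswap32_column is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* C<a> <- bswap(C<a>): reverse the byte order of every element in place. */
void bswap32_column(uint32_t *col, size_t n)
{
    for (size_t i = 0; i < n; i++)
        col[i] = __builtin_bswap32(col[i]);
}
```

On a little-endian host this converts big-endian data, such as the "f>" packed sample data, into host order.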

Reading Material
