fedemengo / d2bist

A CLI utility to handle data as binary strings

Home Page: https://pkg.go.dev/github.com/fedemengo/d2bist

License: The Unlicense

Go 99.75% Makefile 0.25%
binary-strings kolmogorov-complexity entropy

d2bist's Introduction

d2bist

CLI utility to handle data as binary strings

The motivation is to have a quick way to visualize programs as binary strings while playing around with algorithmic information theory (AIT).

  • Convert data to a binary string of 0s and 1s
  • Statistical analysis of the distribution of 0s and 1s
    • Counts of bit strings of variable length (0, 00, 000, 0000, 1, 11, 111, 1111 and so on)
  • Visualize a binary string as an image
  • Support online compression and decompression
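
The first item above, converting data to a string of 0s and 1s, can be sketched as follows. This is a minimal illustration, not d2bist's actual code: the function name `toBitString` is hypothetical, and the most-significant-bit-first order is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// toBitString renders every byte of data as eight '0'/'1' characters.
// Assumption of this sketch: bits are written most significant bit first.
func toBitString(data []byte) string {
	var sb strings.Builder
	sb.Grow(len(data) * 8)
	for _, b := range data {
		for i := 7; i >= 0; i-- {
			if (b>>uint(i))&1 == 1 {
				sb.WriteByte('1')
			} else {
				sb.WriteByte('0')
			}
		}
	}
	return sb.String()
}

func main() {
	// 'H' is 0x48 and 'i' is 0x69.
	fmt.Println(toBitString([]byte("Hi"))) // 0100100001101001
}
```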

Examples

Analyze the first 1M bits of PI and generate an image

curl https://gist.githubusercontent.com/fedemengo/bb99f9cb5a8491092e3d749a7b5910fa/raw/5b0fd1d3ba5f4f4cda41bfad02d598a4ca276ae6/pi_b2_1M_mathematica 2>/dev/null | tail -c1000000 | d2bist encode -s -png pi >/dev/null

bits: 1000000

0: 500279 - 0.50028 %
1: 499721 - 0.49972 %

00: 250481 - 0.25048 %
01: 249798 - 0.24980 %
10: 249797 - 0.24980 %
11: 249923 - 0.24992 %

000: 125313 - 0.12531 %
001: 125168 - 0.12517 %
010: 124661 - 0.12466 %
011: 125136 - 0.12514 %
100: 125167 - 0.12517 %
101: 124630 - 0.12463 %
110: 125136 - 0.12514 %
111: 124787 - 0.12479 %

Plot the entropy of PI calculated on chunks of 65536 bits with symbols of 64 bits

Plot the entropy of d2bist calculated on chunks of 40960 bits with symbols of 4096 bits
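
Both entropy plots above split the input into fixed-size chunks and measure the entropy of each chunk over fixed-width symbols. The sketch below shows one way to compute those values. It assumes non-overlapping symbols and Shannon entropy in bits per symbol; the name `chunkEntropy` is hypothetical and the code is not taken from d2bist.

```go
package main

import (
	"fmt"
	"math"
)

// chunkEntropy splits bits into consecutive chunks of chunkLen characters,
// cuts each chunk into non-overlapping symbols of symLen characters, and
// returns the Shannon entropy (bits per symbol) of every chunk.
// Trailing characters that do not fill a whole chunk or symbol are ignored.
func chunkEntropy(bits string, chunkLen, symLen int) []float64 {
	if chunkLen <= 0 || symLen <= 0 || symLen > chunkLen {
		return nil
	}
	var out []float64
	for start := 0; start+chunkLen <= len(bits); start += chunkLen {
		chunk := bits[start : start+chunkLen]
		counts := make(map[string]int)
		total := 0
		for i := 0; i+symLen <= len(chunk); i += symLen {
			counts[chunk[i:i+symLen]]++
			total++
		}
		h := 0.0
		for _, c := range counts {
			p := float64(c) / float64(total)
			h -= p * math.Log2(p)
		}
		out = append(out, h)
	}
	return out
}

func main() {
	// Two chunks of 8 bits with 2-bit symbols: "00011011" holds four
	// distinct symbols, "00000000" holds a single repeated symbol.
	fmt.Println(chunkEntropy("0001101100000000", 8, 2))
}
```

With the parameters from the first plot this would be called as `chunkEntropy(bits, 65536, 64)`, giving one entropy value per chunk to plot.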

Check out other usage examples

d2bist's Issues

Compare command

Compare two pieces of data to check:

  • similarity (how should it be defined?)
  • mutual entropy

Truncation due to non-integer byte size

The problem is the following

dd if=/dev/urandom bs=1 count=2 2>/dev/null | d2bist -rcap 14 decode | d2bist decode -s
10010000

bits: 8

0: 6 - 0.75000 %
1: 2 - 0.25000 %

  • Reading 2 blocks of 1 byte each (2 bytes, 16 bits)
  • Capping the input data to 14 bits
  • Only $8 \cdot \bigg\lfloor \dfrac{|data|}{8}\bigg\rfloor$ bits are written to the output (8 bits in this case)

This works correctly when outputting a string of bits

> dd if=/dev/urandom bs=1 count=2 2>/dev/null | d2bist -rcap 14 decode -str | d2bist encode -s
11100100001000

bits: 14

0: 9 - 0.64286 %
1: 5 - 0.35714 %

So it's a matter of fixing the output of byte data. A quick workaround is to output $8 \cdot \bigg\lceil \dfrac{|data|}{8}\bigg\rceil$ bits of data, but ideally I want to be able to tell whether I'm receiving

  • $N$ bytes of data containing $\text{bitsCount} = 8N$ bits of data, or
  • $N$ bytes of data containing $8(N-1) < \text{bitsCount} < 8N$ bits of data
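
The difference between the floor and ceiling behaviour can be reproduced with a small sketch. It is only an illustration under stated assumptions: `packBits` is a hypothetical helper, bits are packed most significant bit first, and returning the meaningful bit count next to the bytes is just one possible way to let a receiver tell the two cases apart, not how d2bist does it.

```go
package main

import "fmt"

// packBits packs a string of '0'/'1' characters into bytes, most significant
// bit first. With padLast == false only whole bytes are emitted (the
// truncating behaviour); with padLast == true the final partial byte is
// zero-padded. The second return value is the number of meaningful bits kept.
func packBits(bits string, padLast bool) ([]byte, int) {
	n := len(bits) / 8 // floor(|data| / 8)
	if padLast {
		n = (len(bits) + 7) / 8 // ceil(|data| / 8)
	}
	out := make([]byte, n)
	kept := 0
	for i := 0; i < len(bits) && i/8 < n; i++ {
		if bits[i] == '1' {
			out[i/8] |= 1 << uint(7-i%8)
		}
		kept++
	}
	return out, kept
}

func main() {
	bits := "11100100001000" // the 14-bit string from the example above
	floorBytes, floorBits := packBits(bits, false)
	ceilBytes, ceilBits := packBits(bits, true)
	fmt.Printf("floor: %d byte(s), %d bits kept\n", len(floorBytes), floorBits)
	fmt.Printf("ceil:  %d byte(s), %d bits kept\n", len(ceilBytes), ceilBits)
}
```

For the 14-bit input the floor variant keeps 8 bits in 1 byte, while the ceiling variant keeps all 14 bits in 2 bytes with the last 2 bits of the second byte as padding.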

Calculate arbitrary k-mer

Right now the $k$-mer calculation is limited to a hardcoded $k$. Add CLI flags instead to:

  • Set k= (maybe also k<=?)
  • Order the output by count or by $k$-mer lexicographic order
  • Cap the number of $k$-mers to the top $N$
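
The three requested options could look roughly like the sketch below. All names (`kmerCount`, `topKmers`) and the parameter layout are hypothetical. Counting overlapping windows is an assumption, although it is consistent with the sample output earlier, where the 2-bit counts sum to 999999 for 1000000 bits.

```go
package main

import (
	"fmt"
	"sort"
)

// kmerCount pairs a k-mer with the number of times it occurs.
type kmerCount struct {
	Kmer  string
	Count int
}

// topKmers counts every overlapping substring of length k in bits and
// returns at most topN entries. With byCount == true the entries are sorted
// by descending count (ties broken lexicographically); otherwise they are in
// lexicographic order. topN <= 0 means no cap.
func topKmers(bits string, k, topN int, byCount bool) []kmerCount {
	if k <= 0 || k > len(bits) {
		return nil
	}
	counts := make(map[string]int)
	for i := 0; i+k <= len(bits); i++ {
		counts[bits[i:i+k]]++
	}
	out := make([]kmerCount, 0, len(counts))
	for kmer, c := range counts {
		out = append(out, kmerCount{Kmer: kmer, Count: c})
	}
	sort.Slice(out, func(a, b int) bool {
		if byCount && out[a].Count != out[b].Count {
			return out[a].Count > out[b].Count
		}
		return out[a].Kmer < out[b].Kmer
	})
	if topN > 0 && topN < len(out) {
		out = out[:topN]
	}
	return out
}

func main() {
	// Top 3 2-mers of a short string, ordered by count.
	for _, kc := range topKmers("0010110100", 2, 3, true) {
		fmt.Printf("%s: %d\n", kc.Kmer, kc.Count)
	}
}
```

Breaking count ties lexicographically keeps the output deterministic, which matters because Go map iteration order is random.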
