jsonance - WIP / library for analyzing JSON for metadata

jsonance, rhymes with "resonance", is a ... [todo]

j := new(options)
j := open(previousAnalysis, options)

r := j.reader()

r2 := r.prevReader() // Navigate to past

b := j.createBranch(branchName, options)

b2 := b.fork(options)

b.close()

b.summary()

batch.setOpaque(opaqueKey, opaqueVal)

a := j.analyze(doc)

batch.add(vbucketId, seq, key, a)
batch.delete(vbucketId, seq, key)
batch.commit()
batch.close()

j.analysis()

from cbdatasource (or any data source)...

get-opaque
  j.reader.getOpaque()

rollbackEx(vbucketID uint16, vbucketUUID uint64, rollbackSeq uint64) error

onSnapshotStart(vbucketID uint16, snapStartSeq, snapEndSeq uint64, snapType uint32)

set-opaque(vbucketID uint16, []byte)
  b.setOpaque

get-opaque(vbucketID uint16) ([]byte, lastSeq, err)
  b.getOpaque

DataUpdate(vbucketID uint16, key []byte, seq uint64, r *gomemcached.MCRequest)
  b.onMutation

DataDelete(vbucketID uint16, key []byte, seq uint64, r *gomemcached.MCRequest)

analysis thoughts

want the first time a field shows up
want the first time a "fingerprint" (multi-field schema) shows up
want the last time a field shows up
want the last time a fingerprint shows up

what does "first time" / "last time" mean?

  idea: treat the vbId->seqNum pairs as a vector clock
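
A minimal sketch of what that comparison could look like, assuming a clock is just a map from vbucket ID to the highest sequence number seen. The names VClock, Ordering and Compare are hypothetical and not part of jsonance. Under this reading, "first time" is only well defined when one clock is strictly before the other; concurrent clocks have no order.

```go
package main

import "fmt"

// VClock maps a vbucket ID to the highest sequence number seen for it.
// Hypothetical type for illustration only.
type VClock map[uint16]uint64

// Ordering is the result of comparing two vector clocks.
type Ordering int

const (
	Equal Ordering = iota
	Before
	After
	Concurrent
)

// Compare reports how a relates to b. A vbucket that is missing from a
// clock is treated as sequence number 0 (an assumption of this sketch).
func Compare(a, b VClock) Ordering {
	aLess, bLess := false, false
	for vb, sa := range a {
		sb := b[vb]
		if sa < sb {
			aLess = true
		} else if sa > sb {
			bLess = true
		}
	}
	// Entries that exist only in b also count against a.
	for vb, sb := range b {
		if _, ok := a[vb]; !ok && sb > 0 {
			aLess = true
		}
	}
	switch {
	case aLess && bLess:
		return Concurrent
	case aLess:
		return Before
	case bLess:
		return After
	}
	return Equal
}

func main() {
	first := VClock{0: 10, 1: 4}
	later := VClock{0: 12, 1: 4}
	other := VClock{0: 9, 1: 7}
	fmt.Println(Compare(first, later) == Before)     // true
	fmt.Println(Compare(first, other) == Concurrent) // true
}
```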

do missing fields mean it's a different fingerprint?

are brand new additional field(s) associated like inheritance relationship?

  ABCD "contains-a" / has-a ABC?

  ABC --> ABCD  ---+--> ABCDE
      --> ABCE  --/

  assumption / heuristic: most fields are additive

  when ABC shows up...

    A ==> t1
    B ==> t1
    C ==> t1

    t1: [A,B,C], parents: nil

  when ABCD shows up...

    t2: [A,B,C,D], parents: t1

    A ==> t2, t1
    B ==> t2, t1
    C ==> t2, t1
    D ==> t2

  when ABCE shows up...

    t3: [A,B,C,E], parents: t1

    A ==> t3, t2, t1
    B ==> t3, t2, t1
    C ==> t3, t2, t1
    D ==>     t2
    E ==> t3

  when ABCDE shows up

    t4: [A,B,C,D,E], parents: t2, t3

    A ==> t4, t3, t2, t1
    B ==> t4, t3, t2, t1
    C ==> t4, t3, t2, t1
    D ==> t4,     t2
    E ==> t4, t3

  when ABX shows up

    t5: [A,B,X], parents: nil

    A ==> t5, t4, t3, t2, t1
    B ==> t5, t4, t3, t2, t1
    C ==>     t4, t3, t2, t1
    D ==>     t4,     t2
    E ==>     t4, t3
    X ==> t5
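
The walkthrough above can be sketched in Go roughly as follows. This is one possible reading, not the library's implementation: Tracker, Fingerprint and Observe are hypothetical names, "parents" is taken to mean the maximal earlier fingerprints that are proper subsets of the new one, and the per-field time lists are stored oldest first (the notes list them newest first). Field names are assumed to be unique within one call.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Fingerprint is one distinct set of field names, stamped with the
// logical time (t1, t2, ...) at which it was first seen.
type Fingerprint struct {
	Time    int
	Fields  []string // sorted
	Parents []int    // times of the maximal earlier proper subsets
}

// Tracker records fingerprints and, per field, the times it appeared in.
type Tracker struct {
	Prints  []*Fingerprint
	ByField map[string][]int // field name -> times, oldest first
	seen    map[string]bool
}

func NewTracker() *Tracker {
	return &Tracker{ByField: map[string][]int{}, seen: map[string]bool{}}
}

// subset reports whether every element of a is in b (both sorted, unique).
func subset(a, b []string) bool {
	i := 0
	for _, f := range b {
		if i < len(a) && a[i] == f {
			i++
		}
	}
	return i == len(a)
}

// Observe registers a field set. It returns nil if the set was seen before.
func (t *Tracker) Observe(fields ...string) *Fingerprint {
	fs := append([]string(nil), fields...)
	sort.Strings(fs)
	key := strings.Join(fs, "\x00")
	if t.seen[key] {
		return nil
	}
	t.seen[key] = true
	fp := &Fingerprint{Time: len(t.Prints) + 1, Fields: fs}

	// Earlier fingerprints that are subsets of the new one. They are
	// proper subsets because the new set has not been seen before.
	var subs []*Fingerprint
	for _, p := range t.Prints {
		if subset(p.Fields, fs) {
			subs = append(subs, p)
		}
	}
	// Keep only the maximal ones, so ABCDE gets ABCD and ABCE but not ABC.
	for _, p := range subs {
		maximal := true
		for _, q := range subs {
			if p != q && subset(p.Fields, q.Fields) {
				maximal = false
				break
			}
		}
		if maximal {
			fp.Parents = append(fp.Parents, p.Time)
		}
	}
	for _, f := range fs {
		t.ByField[f] = append(t.ByField[f], fp.Time)
	}
	t.Prints = append(t.Prints, fp)
	return fp
}

func main() {
	t := NewTracker()
	for _, s := range []string{"ABC", "ABCD", "ABCE", "ABCDE", "ABX"} {
		fp := t.Observe(strings.Split(s, "")...)
		fmt.Printf("t%d: %v, parents: %v\n", fp.Time, fp.Fields, fp.Parents)
	}
	fmt.Println("D ==>", t.ByField["D"])
}
```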

generate short fieldId's?

what about UUID's degenerate case of a nested map?
or date-time fields degenerate case?

histograms for array lengths?

what about type fields (type: beer, type: brewery)?

pseudocode ideas

inputs:
  data map[string]interface{}
  rev rev

kvs := processData(data, rev)

sigs := constructSigs(kvs, rev) // Short for signatures.

mergeSigs(sigsState, sigs) // Track aggregates and superset-of matches of sigs.

example:

  processData({ "title": "star wars", "genre": "sci-fi" }, "rev-123") =>
    [
      {
        "name": "title",
        "path": "",
        "type": "string",   // "string", "number", "object", "array", "null", "boolean"
        "typeEx": null,     // "datetime" (rfcXxxx?), "int", "float"
        "val": "star wars", ==> track aggregates of min, max, count, lenMin, lenMax, lenTot
        "rev": "rev-123",   ==> latch on existence, first write wins, like a min
      },
      {
        "name": "genre",
        "path": "",
        "type": "string",
        "typeEx": null,
        "val": "sci-fi",
        "rev": "rev-123",
      }
    ]
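
A rough Go sketch of processData under several assumptions that are not settled in these notes: the KV struct and the classify and flatten helpers are hypothetical, "datetime" is detected with RFC 3339 only, nested object paths are built as "/parent", and keys are emitted in sorted order so the output is deterministic (so "genre" comes before "title").

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// KV is one flattened field occurrence. TypeEx is "" when no refinement
// applies (the notes use null for that case).
type KV struct {
	Name, Path, Type, TypeEx string
	Val                      interface{}
	Rev                      string
}

// classify maps a value decoded by encoding/json to a type and typeEx.
func classify(v interface{}) (typ, typeEx string) {
	switch x := v.(type) {
	case nil:
		return "null", ""
	case bool:
		return "boolean", ""
	case string:
		if _, err := time.Parse(time.RFC3339, x); err == nil {
			return "string", "datetime"
		}
		return "string", ""
	case float64:
		if x == math.Trunc(x) {
			return "number", "int"
		}
		return "number", "float"
	case map[string]interface{}:
		return "object", ""
	case []interface{}:
		return "array", ""
	}
	return "unknown", ""
}

// flatten appends one KV per field, recursing into nested objects.
func flatten(out []KV, obj map[string]interface{}, path, rev string) []KV {
	names := make([]string, 0, len(obj))
	for name := range obj {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		v := obj[name]
		typ, typeEx := classify(v)
		out = append(out, KV{Name: name, Path: path, Type: typ,
			TypeEx: typeEx, Val: v, Rev: rev})
		if child, ok := v.(map[string]interface{}); ok {
			out = flatten(out, child, path+"/"+name, rev)
		}
	}
	return out
}

func processData(data map[string]interface{}, rev string) []KV {
	return flatten(nil, data, "", rev)
}

func main() {
	doc := map[string]interface{}{"title": "star wars", "genre": "sci-fi"}
	for _, kv := range processData(doc, "rev-123") {
		fmt.Printf("%+v\n", kv)
	}
}
```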

sigs is roughly...
  several kinds of sigs, each with a...
    unique hash after...
      group by path+name
      group by path+name+type
      group by path+name+type+typeEx

 what about null's?
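
One way the three groupings could turn into hashes, as a sketch only. Field, Sigs and constructSigs here are hypothetical shapes, SHA-1 truncated to 8 bytes is an arbitrary choice, and entries are sorted before hashing so field order in the document does not change the signature. Nulls are not treated specially in this sketch.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Field is the part of a processData record that matters for signatures.
type Field struct{ Path, Name, Type, TypeEx string }

// Sigs holds one hash per grouping, from coarsest to finest.
type Sigs struct {
	ByName   string // path+name
	ByType   string // path+name+type
	ByTypeEx string // path+name+type+typeEx
}

// hashLines sorts the entries and hashes them, so order does not matter.
func hashLines(lines []string) string {
	sort.Strings(lines)
	sum := sha1.Sum([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:8])
}

func constructSigs(fields []Field) Sigs {
	var a, b, c []string
	for _, f := range fields {
		base := f.Path + "/" + f.Name
		a = append(a, base)
		b = append(b, base+":"+f.Type)
		c = append(c, base+":"+f.Type+":"+f.TypeEx)
	}
	return Sigs{ByName: hashLines(a), ByType: hashLines(b), ByTypeEx: hashLines(c)}
}

func main() {
	s := constructSigs([]Field{
		{"", "title", "string", ""},
		{"", "genre", "string", ""},
	})
	fmt.Printf("%+v\n", s)
}
```

Two documents with the same field names but different value types would then share ByName and differ in ByType, which is roughly the "superset-of matches" idea at a coarser grain.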

example analysis

source: {
  sourceName: "..."
},
branches: {
  "": {
  },
  "20180829-234123": {
    parent: ""
    opaque: {
    }
  }
}

example PINDEX_META...

{
  "name": "bs0_5ea163404f446bb6_13aa53f3",
  "uuid": "ad2b4749569cafe4",
  "indexType": "fulltext-index",
  "indexName": "bs0",
  "indexUUID": "5ea163404f446bb6",
  "indexParams": "{\"doc_config\":{\"mode\":\"type_field\",\"type_field\":\"type\"},\"mapping\":{\"default_analyzer\":\"standard\",\"default_datetime_parser\":\"dateTimeOptional\",\"default_field\":\"_all\",\"default_mapping\":{\"dynamic\":false,\"enabled\":true,\"properties\":{\"description\":{\"dynamic\":false,\"enabled\":true,\"fields\":[{\"analyzer\":\"\",\"include_in_all\":false,\"include_term_vectors\":false,\"index\":true,\"name\":\"description\",\"store\":false,\"type\":\"text\"}]}}},\"default_type\":\"_default\",\"index_dynamic\":false,\"store_dynamic\":false},\"store\":{\"kvStoreName\":\"mossStore\"}}",
  "sourceType": "couchbase",
  "sourceName": "beer-sample",
  "sourceUUID": "8f6e4f2e74d953213609fdd59396f6a9",
  "sourceParams": "{}",
  "sourcePartitions": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170"
}

sourcePoint: "2934"
sourcePoints: {
  "2934": { parent: "2933" }
}

verbiage / trying to name the historic data points...

  srcRev snapshots points versions sha (a'la git sha) token savepoints rollback ref id tag generation / genTag ancestry / ancestor tag birth record fingerprint lineage point pedigree descent source context

population populace colony settlers

// ParseFailOverLog parses a byte array to an array of [vbucketUUID,
// seqNum] pairs.
func ParseFailOverLog(body []byte) ([][]uint64, error) {
    flog := make([][]uint64, len(body)/16)
    for i, j := 0, 0; i < len(body); i += 16 {
        uuid := binary.BigEndian.Uint64(body[i : i+8])
        seqn := binary.BigEndian.Uint64(body[i+8 : i+16])
        flog[j] = []uint64{uuid, seqn}
        j++
    }
    return flog, nil
}

failOverLog... vbID => vbUUID => seqNum
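
A self-contained round-trip sketch of the same 16-bytes-per-entry layout, kept per vbucket to match the vbID => vbUUID => seqNum shape. Both functions are hypothetical: encodeFailOverLog is an inverse written only for this example, and parseFailOverLogChecked is a variant of the parser that adds a length check and uses fixed-size pairs, neither of which is in the function quoted above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeFailOverLog packs [vbucketUUID, seqNum] pairs as big-endian
// uint64s, 16 bytes per entry. Hypothetical helper for this example.
func encodeFailOverLog(entries [][2]uint64) []byte {
	body := make([]byte, 16*len(entries))
	for i, e := range entries {
		binary.BigEndian.PutUint64(body[i*16:], e[0])
		binary.BigEndian.PutUint64(body[i*16+8:], e[1])
	}
	return body
}

// parseFailOverLogChecked decodes the same layout and rejects bodies
// whose length is not a multiple of 16 (an added assumption).
func parseFailOverLogChecked(body []byte) ([][2]uint64, error) {
	if len(body)%16 != 0 {
		return nil, errors.New("failover log length is not a multiple of 16")
	}
	flog := make([][2]uint64, 0, len(body)/16)
	for i := 0; i < len(body); i += 16 {
		uuid := binary.BigEndian.Uint64(body[i : i+8])
		seqn := binary.BigEndian.Uint64(body[i+8 : i+16])
		flog = append(flog, [2]uint64{uuid, seqn})
	}
	return flog, nil
}

func main() {
	// vbID => list of [vbUUID, seqNum] pairs.
	logs := map[uint16][][2]uint64{}

	body := encodeFailOverLog([][2]uint64{{0xABCD, 100}, {0x1234, 0}})
	flog, err := parseFailOverLogChecked(body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	logs[7] = flog
	for _, e := range logs[7] {
		fmt.Printf("vb 7: uuid=%x seq=%d\n", e[0], e[1])
	}
}
```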

MISON parser http://www.vldb.org/pvldb/vol10/p1118-li.pdf

  • fast json parser
  • speculative locations of fields, both logical vs physical locations
  • SIMD popcnt
  • projections pushed down to json parser
