
Comments (5)

slowkow commented on May 22, 2024

Here is R code for reading the .kana format:

read_kana <- function(filename) {
  f <- file(filename, "rb")
  on.exit(close(f))
  # Skip the first 16 bytes (two uint64s: embedded/linked flag and version).
  head <- readBin(f, "raw", n = 16)
  # Next 8 bytes: size of the gzipped JSON blob, as a little-endian 64-bit integer.
  n_bytes <- readBin(f, "integer", n = 1, size = 8, endian = "little")
  body <- readBin(f, "raw", n = n_bytes)
  json_string <- memDecompress(body, type = "gzip", asChar = TRUE)
  jsonlite::fromJSON(json_string)
}

data <- read_kana("My_Analysis_Title.kana")
names(data)
#  [1] "inputs"                     "quality_control_metrics"
#  [3] "quality_control_thresholds" "quality_control_filtered"
#  [5] "normalization"              "feature_selection"
#  [7] "pca"                        "neighbor_index"
#  [9] "snn_find_neighbors"         "snn_build_graph"
# [11] "snn_cluster_graph"          "choose_clustering"
# [13] "marker_detection"           "custom_marker_management"
# [15] "tsne"                       "umap"
x <- data$umap$contents$x$`_TypedArray_values`
y <- data$umap$contents$y$`_TypedArray_values`
length(x)
# [1] 5050
length(y)
# [1] 5050

from kana.

LTLA commented on May 22, 2024

This is yet to be documented and is subject to change, but I'll give you the rundown.

First 8 bytes are an unsigned 64-bit little-endian integer specifying the format. Currently this is just used to denote whether the data files are embedded (0) or linked (1).

Next 8 bytes are another unsigned 64-bit integer specifying the format version. You can ignore this for now.

Next 8 bytes are another unsigned 64-bit integer specifying the size of the blob containing a gzipped JSON with the analysis parameters and results. Let's call this value n.

Next n bytes contain the gzipped JSON. If you decompress it, you'll have one property per analysis step, where each value is itself a dictionary with parameters (the step's parameters) and contents (the step's contents, usually the results).

Remaining bytes contain the embedded input files. If you already have the files somewhere, you can just ignore this section, but if you don't, you can use the offsets and sizes in the inputs of the JSON to cut out the files.

tl;dr Ignore the first 16 bytes, convert the next 8 bytes to an integer, and then use that to cut out the gzipped JSON.
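The layout above can be sketched in Python. Here a tiny synthetic blob is built in memory so the parse can be demonstrated end to end; the JSON content, the trailing bytes, and the function name are illustrative, not the real kana schema:

```python
import struct
import gzip
import json
import io

def parse_kana_header(f):
    # Three little-endian uint64s: embedded/linked flag, format version,
    # and the size n of the gzipped JSON blob that follows.
    embedded_flag, version, n_bytes = struct.unpack('<QQQ', f.read(24))
    return embedded_flag, version, n_bytes

# Build a tiny synthetic blob (illustrative content only).
payload = gzip.compress(json.dumps({"umap": {"parameters": {}, "contents": {}}}).encode())
blob = struct.pack('<QQQ', 0, 1, len(payload)) + payload + b"EMBEDDED-FILES"

f = io.BytesIO(blob)
flag, version, n = parse_kana_header(f)
state = json.loads(gzip.decompress(f.read(n)))
rest = f.read()  # remaining bytes: the embedded input files when flag == 0
```

When the flag is 0, those remaining bytes would be carved up using the offsets and sizes recorded under the inputs property of the JSON.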

At some point we may provide R/Python utilities to interpret these files and populate the corresponding data structures, e.g., SingleCellExperiment objects. Right now, these files are just intended for saving/transfer of analyses within kana.


slowkow commented on May 22, 2024

Here is Python code for reading the .kana file format:

import struct
import zlib
import json

def read_kana(filename):
    with open(filename, "rb") as file:
        # Skip the first 16 bytes (format flag and version).
        head = file.read(16)
        # Here < indicates little-endian, and Q means we want to unpack an
        # unsigned long long (8 bytes).
        n_bytes, = struct.unpack('<Q', file.read(8))
        gzipped_json = file.read(n_bytes)
    # wbits = 15 + 32 autodetects gzip or zlib data.
    data = json.loads(zlib.decompress(gzipped_json, 15 + 32))
    return data

data = read_kana("My_Analysis_Title.kana")
data['umap']['parameters']
# {'num_epochs': 500, 'num_neighbors': 15, 'min_dist': 0.01, 'animate': False}
for key in data.keys():
    print(key)
# inputs
# quality_control_metrics
# quality_control_thresholds
# quality_control_filtered
# normalization
# feature_selection
# pca
# neighbor_index
# snn_find_neighbors
# snn_build_graph
# snn_cluster_graph
# choose_clustering
# marker_detection
# custom_marker_management
# tsne
# umap
x = data['umap']['contents']['x']['_TypedArray_values']
y = data['umap']['contents']['y']['_TypedArray_values']
len(x)
# 5050
len(y)
# 5050
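If you want the embedding as a table, the two coordinate arrays can be zipped into rows. A minimal sketch with stand-in values (the column names are my own invention):

```python
import csv
import io

# Stand-ins for data['umap']['contents']['x']['_TypedArray_values']
# and the matching y array.
x = [0.1, 0.2, 0.3]
y = [1.0, 1.1, 1.2]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["umap_1", "umap_2"])  # hypothetical column names
writer.writerows(zip(x, y))            # one row per cell
csv_text = buf.getvalue()
```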

Please take it and use it as you wish!


bmaranville commented on May 22, 2024

If you wanted to use hdf5 as an output format as well, h5wasm supports writing hdf5 files in the browser. It looks like it would be pretty straightforward to expose the H5Ocopy function from the hdf5 C API, which would allow one to pack source hdf5 files into the output hdf5 file alongside the analysis.


jkanche commented on May 22, 2024

We now have a separate repo to track changes in the versioning of the .kana file. It's HDF5-based, unlike our first version, and it's documented here - https://github.com/LTLA/kanaval

