
Comments (5)

slowkow commented on May 22, 2024

Here is R code for reading the .kana format:

read_kana <- function(filename) {
  f <- file(filename, "rb")
  on.exit(close(f))
  # Skip the first 16 bytes (two uint64s: embedded/linked flag and version).
  head <- readBin(f, "raw", n = 16)
  # Next 8 bytes: size of the gzipped JSON blob, as a little-endian 64-bit integer.
  n_bytes <- readBin(f, "integer", n = 1, size = 8, endian = "little")
  body <- readBin(f, "raw", n = n_bytes)
  json_string <- memDecompress(body, type = "gzip", asChar = TRUE)
  jsonlite::fromJSON(json_string)
}

data <- read_kana("My_Analysis_Title.kana")
names(data)
#  [1] "inputs"                     "quality_control_metrics"
#  [3] "quality_control_thresholds" "quality_control_filtered"
#  [5] "normalization"              "feature_selection"
#  [7] "pca"                        "neighbor_index"
#  [9] "snn_find_neighbors"         "snn_build_graph"
# [11] "snn_cluster_graph"          "choose_clustering"
# [13] "marker_detection"           "custom_marker_management"
# [15] "tsne"                       "umap"
x <- data$umap$contents$x$`_TypedArray_values`
y <- data$umap$contents$y$`_TypedArray_values`
length(x)
# [1] 5050
length(y)
# [1] 5050

from kana.

LTLA commented on May 22, 2024

This is yet to be documented and is subject to change, but I'll give you the rundown.

First 8 bytes are an unsigned 64-bit little-endian integer specifying the format. Currently this is just used to denote whether the data files are embedded (0) or linked (1).

Next 8 bytes are another unsigned 64-bit integer specifying the format version. You can ignore this for now.

Next 8 bytes are another unsigned 64-bit integer specifying the size of the blob containing a gzipped JSON with the analysis parameters and results. Let's call this value n.

Next n bytes contain the gzipped JSON. If you decompress it, you'll have one property per analysis step, where each value is itself a dictionary with parameters (the step's parameters) and contents (the step's contents, usually the results).

Remaining bytes contain the embedded input files. If you already have the files somewhere, you can just ignore this section, but if you don't, you can use the offsets and sizes in the inputs of the JSON to cut out the files.

tl;dr Ignore the first 16 bytes, convert the next 8 bytes to an integer, and then use that to cut out the gzipped JSON.
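The layout above can be sketched in Python. Here a tiny synthetic blob is built in memory so the parse can be demonstrated end to end; the JSON content, the trailing bytes, and the function name are illustrative, not the real kana schema:

```python
import struct
import gzip
import json
import io

def parse_kana_header(f):
    # Three little-endian uint64s: embedded/linked flag, format version,
    # and the size n of the gzipped JSON blob that follows.
    embedded_flag, version, n_bytes = struct.unpack('<QQQ', f.read(24))
    return embedded_flag, version, n_bytes

# Build a tiny synthetic blob (illustrative content only).
payload = gzip.compress(json.dumps({"umap": {"parameters": {}, "contents": {}}}).encode())
blob = struct.pack('<QQQ', 0, 1, len(payload)) + payload + b"EMBEDDED-FILES"

f = io.BytesIO(blob)
flag, version, n = parse_kana_header(f)
state = json.loads(gzip.decompress(f.read(n)))
rest = f.read()  # remaining bytes: the embedded input files when flag == 0
```

When the flag is 0, those remaining bytes would be carved up using the offsets and sizes recorded under the inputs property of the JSON.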

At some point we may provide R/Python utilities to interpret these files and populate the corresponding data structures, e.g., SingleCellExperiment objects. Right now, these files are just intended for saving/transfer of analyses within kana.


slowkow commented on May 22, 2024

Here is Python code for reading the .kana file format:

import struct
import zlib
import json

def read_kana(filename):
    with open(filename, "rb") as file:
        # Skip the first 16 bytes (format flag and version).
        head = file.read(16)
        # Here < indicates little-endian, and Q means we want to unpack an
        # unsigned long long (8 bytes).
        n_bytes, = struct.unpack('<Q', file.read(8))
        gzipped_json = file.read(n_bytes)
    # wbits = 15 + 32 autodetects gzip or zlib data.
    data = json.loads(zlib.decompress(gzipped_json, 15 + 32))
    return data

data = read_kana("My_Analysis_Title.kana")
data['umap']['parameters']
# {'num_epochs': 500, 'num_neighbors': 15, 'min_dist': 0.01, 'animate': False}
for key in data.keys():
    print(key)
# inputs
# quality_control_metrics
# quality_control_thresholds
# quality_control_filtered
# normalization
# feature_selection
# pca
# neighbor_index
# snn_find_neighbors
# snn_build_graph
# snn_cluster_graph
# choose_clustering
# marker_detection
# custom_marker_management
# tsne
# umap
x = data['umap']['contents']['x']['_TypedArray_values']
y = data['umap']['contents']['y']['_TypedArray_values']
len(x)
# 5050
len(y)
# 5050
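If you want the embedding as a table, the two coordinate arrays can be zipped into rows. A minimal sketch with stand-in values (the column names are my own invention):

```python
import csv
import io

# Stand-ins for data['umap']['contents']['x']['_TypedArray_values']
# and the matching y array.
x = [0.1, 0.2, 0.3]
y = [1.0, 1.1, 1.2]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["umap_1", "umap_2"])  # hypothetical column names
writer.writerows(zip(x, y))            # one row per cell
csv_text = buf.getvalue()
```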

Please take it and use it as you wish!


bmaranville commented on May 22, 2024

If you wanted to use hdf5 as an output format as well, h5wasm supports writing hdf5 files in the browser. It looks like it would be pretty straightforward to expose the H5Ocopy function from the hdf5 C API, which would allow one to pack source hdf5 files into the output hdf5 file alongside the analysis.


jkanche commented on May 22, 2024

We now have a separate repo to track changes in the versioning of the .kana file. It's HDF5-based, unlike our first version, and it's documented here - https://github.com/LTLA/kanaval

