Comments (5)
Here is R code for reading the .kana format:
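read_kana <- function(filename) {
  f <- file(filename, "rb")
  on.exit(close(f))  # release the connection even if an error occurs
  # Skip the first 16 bytes (format type and format version).
  head <- readBin(f, "raw", n = 16)
  # Next 8 bytes: size of the gzipped JSON blob, little-endian.
  n_bytes <- readBin(f, "integer", n = 1, size = 8, endian = "little")
  body <- readBin(f, "raw", n = n_bytes)
  json_string <- memDecompress(body, asChar = TRUE)
  jsonlite::fromJSON(json_string)  # returned as the function's value
}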
data <- read_kana("My_Analysis_Title.kana")
names(data)
# [1] "inputs" "quality_control_metrics"
# [3] "quality_control_thresholds" "quality_control_filtered"
# [5] "normalization" "feature_selection"
# [7] "pca" "neighbor_index"
# [9] "snn_find_neighbors" "snn_build_graph"
# [11] "snn_cluster_graph" "choose_clustering"
# [13] "marker_detection" "custom_marker_management"
# [15] "tsne" "umap"
x <- data$umap$contents$x$`_TypedArray_values`
y <- data$umap$contents$y$`_TypedArray_values`
length(x)
# [1] 5050
length(y)
# [1] 5050
from kana.
This is yet to be documented and is subject to change, but I'll give you the rundown.
The first 8 bytes hold a little-endian 64-bit unsigned integer specifying the format type. Currently this only denotes whether the data files are embedded (0) or linked (1).
The next 8 bytes are another 64-bit unsigned integer specifying the format version. You can ignore this for now.
The next 8 bytes are another 64-bit unsigned integer giving the size of the blob that contains a gzipped JSON with the analysis parameters and results. Let's call this value n.
The next n bytes contain the gzipped JSON. If you unzip this, you'll have one property per step in the analysis, where each value is itself a dictionary with parameters (a dictionary of parameters) and contents (the contents, usually the results).
The remaining bytes contain the embedded input files. If you already have the files somewhere, you can just ignore this section, but if you don't, you can use the offsets and sizes in the inputs property of the JSON to cut out the files.
tl;dr Ignore the first 16 bytes, convert the next 8 bytes to an integer, and then use that to cut out the gzipped JSON.
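The layout above can be sketched in Python as follows. This is only an illustration under stated assumptions, not an official reader: the names `parse_kana` and `cut_file` are made up for this example, and `cut_file` assumes that offsets are counted from the start of the remaining bytes, which the description above does not confirm.

```python
import json
import struct
import zlib

# Three little-endian unsigned 64-bit integers: format type, version, JSON size.
HEADER = struct.Struct("<QQQ")


def parse_kana(buf):
    """Split the raw bytes of a .kana file into the parts described above.

    Hypothetical helper for illustration only.
    """
    if len(buf) < HEADER.size:
        raise ValueError("buffer is shorter than the 24-byte header")
    fmt, version, n = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    end = start + n
    if end > len(buf):
        raise ValueError("declared JSON size runs past the end of the buffer")
    # wbits = 15 + 32 lets zlib auto-detect gzip or zlib framing.
    state = json.loads(zlib.decompress(buf[start:end], 15 + 32))
    return {
        "embedded": fmt == 0,   # 0 = embedded files, 1 = linked files
        "version": version,
        "state": state,         # one property per analysis step
        "payload": buf[end:],   # remaining bytes (embedded input files)
    }


def cut_file(payload, offset, size):
    """Slice one embedded file out of the remaining bytes.

    ASSUMPTION: offset is relative to the start of the payload.
    """
    return payload[offset:offset + size]
```

Keeping the remaining bytes separate from the parsed JSON means the embedded files can be ignored entirely when the originals are already available elsewhere.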
At some point we may provide R/Python utilities to interpret these files and populate the corresponding data structures, e.g., SingleCellExperiment objects. Right now, these files are just intended for saving/transfer of analyses within kana.
Here is Python code for reading the .kana file format:
import struct
import zlib
import json
def read_kana(filename):
    with open(filename, "rb") as file:
        # Skip the first 16 bytes (format type and format version).
        head = file.read(16)
        # Here < indicates little-endian, and Q means we want to unpack an
        # unsigned long long (8 bytes).
        n_bytes, = struct.unpack('<Q', file.read(8))
        gzipped_json = file.read(n_bytes)
    # 15 + 32 should autodetect gzip data or zlib data.
    data = json.loads(zlib.decompress(gzipped_json, 15 + 32))
    return data
data = read_kana("My_Analysis_Title.kana")
data['umap']['parameters']
# {'num_epochs': 500, 'num_neighbors': 15, 'min_dist': 0.01, 'animate': False}
for key in data.keys():
    print(key)
# inputs
# quality_control_metrics
# quality_control_thresholds
# quality_control_filtered
# normalization
# feature_selection
# pca
# neighbor_index
# snn_find_neighbors
# snn_build_graph
# snn_cluster_graph
# choose_clustering
# marker_detection
# custom_marker_management
# tsne
# umap
x = data['umap']['contents']['x']['_TypedArray_values']
y = data['umap']['contents']['y']['_TypedArray_values']
len(x)
# 5050
len(y)
# 5050
Please take it and use it as you wish!
If you wanted to use hdf5 as an output format as well, h5wasm supports writing hdf5 files in the browser. It looks like it would be pretty straightforward to expose the H5Ocopy function from the hdf5 C API, which would allow one to pack source hdf5 files into the output hdf5 file alongside the analysis.
We now have a separate repo to track changes in the versioning of the .kana file. It's HDF5-based, unlike our first version, and it's documented here: https://github.com/LTLA/kanaval