kanaverse / kana

Single cell analysis in the browser

License: MIT License

JavaScript 96.90% HTML 0.24% CSS 2.82% Dockerfile 0.04%

bioinformatics cite-seq exploratory-data-analysis interactive-analysis interactive-visualizations rna-seq single-cell webassembly

kana's People

zhihua-chen michaelschulzgsh ets-reactnative5 jpcartailler genostack ygao61 jkanche manzhaohui catgirl69 petehaitch llewelld zhangguangxin1234

kana's Issues

aesthetics

Animations for t-SNE/UMAP (#39)
sliders histogram changes to gradient (#50)
save tsne to gallery (#49)
change qc plot ticks to normal text (#50)
send dims in the data payloads (#50)
filter markers set default to 0 for lfc and delta-d (#50)
Fancier would be to save and restore the state from gallery
change the pong game to Conway's Game of Life

Fix the t-SNE/UMAP message passing

Consolidate code inside each worker file to keep things more understandable.
Bump out the delay for non-animated runs.
Add a dedicated response for animation restart.

gene panel without marker genes

@hcorrada

What is the format of the .kana file?

When I click "Export" and then "Download to file" I get a binary file with the .kana extension.

Suppose a user wants to export the results and then import them into Python or R.

How can we do that?

redo embedding visualizations using epiviz.gl

PR: #97

Feature request: Re-analysing a user-specified subset of data

It would be great if it were possible to select a subset of cells and re-analyse that subset.
E.g., In a dataset of PBMCs, select all 'B cells' (based on the cluster annotations) and re-analyse to look for subclusters within the B cell population.
Is this something planned for a future release, or would it be feasible to implement?

serialization format for storing analysis

Converter from TypedArray to JSON arrays (handled by #34).
Figure out IndexedDB/inline file format (#57)
transfer serialized buffer to main thread (handled by #34).
Adapt options to be able to load state from a file or from IndexedDB (#57)
Fix bugs with serialization code (handled by #34).
Fix bugs with unserialization code. (#44)
UI changes to save/load various formats (#43)
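The first item above asks for a converter from TypedArrays to JSON arrays. A minimal sketch, assuming plain JSON is acceptable for the results being stored; the `$typed` tag and all helper names are hypothetical and are not kana's actual .kana format. Note that JSON cannot represent NaN or Infinity, which become `null`.

```javascript
// Constructors we know how to rebuild after JSON parsing (BigInt arrays are excluded
// because JSON cannot hold BigInt values).
const TYPED_ARRAYS = {
  Int8Array, Uint8Array, Uint8ClampedArray, Int16Array, Uint16Array,
  Int32Array, Uint32Array, Float32Array, Float64Array,
};

// JSON.stringify replacer: tag each TypedArray with its constructor name.
function typedArrayReplacer(key, value) {
  if (ArrayBuffer.isView(value) && TYPED_ARRAYS[value.constructor.name]) {
    return { $typed: value.constructor.name, values: Array.from(value) };
  }
  return value;
}

// JSON.parse reviver: turn tagged objects back into TypedArrays of the same type.
function typedArrayReviver(key, value) {
  if (value !== null && typeof value === "object" && typeof value.$typed === "string") {
    const Ctor = TYPED_ARRAYS[value.$typed];
    if (Ctor) return Ctor.from(value.values);
  }
  return value;
}

function serializeState(state) {
  return JSON.stringify(state, typedArrayReplacer);
}

function unserializeState(json) {
  return JSON.parse(json, typedArrayReviver);
}
```

Plain JSON is easy to read from Python or R, but it is much larger than a binary buffer, so it is probably only suitable for small results.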

animate fails after loading a saved analysis (.kana or indexeddb)

boxplot of gene expression across clusters/annotation

Could you provide a sample dataset?

Could I please ask if you could provide a data file that is known to work with this app?

I tried GSE117963_10X_whole_aorta_filtered_gene_bc_matrices_h5.h5 from this URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117963

The app seems to be doing nothing. In the side bar on the right, this is what I see:

Generating nearest neighbor graph to compute clusters....

After 10 minutes, the same message is still displayed and the app is still doing nothing.

When I click the wrench icon ("What's happening?"), this is what I see:

critical vs non-critical error messages

This would help decide what shows up in the UI. If it's critical, there's nothing to do other than reload the app.

non-critical error messages are mostly notifications
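One way to carry the critical/non-critical distinction from the worker to the UI is a flag on each error message. A minimal sketch; the message shape and both function names are hypothetical.

```javascript
// Hypothetical message shape: { type: "error", critical: boolean, message: string }.
function makeError(message, critical) {
  return { type: "error", critical: Boolean(critical), message: String(message) };
}

// Decide what the UI should do with an error coming from the worker.
function handleWorkerError(msg) {
  if (msg.critical) {
    // Nothing can be recovered: tell the user to reload the app.
    return { action: "reload", text: `${msg.message} Please reload the app.` };
  }
  // Non-critical errors are shown as a notification and the app carries on.
  return { action: "notify", text: msg.message };
}
```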

reset analysis when starting a new analysis on an existing one

Possibly a separate issue, but the app freaks out if you try to start a new analysis on top of an already-present analysis without refreshing in between.

mean-variance trend to gallery plots

panning the dimensionality plot is sensitive to canvas width/height

White screen bug with H5AD and HDF5 format

Hi,

I have noticed a bug when adding input files through the '10x HDF5 matrix' or the 'H5AD' option. If the Add button is pressed before selecting a file then the web page will go completely blank and requires a refresh. This also happens if there is already another .h5ad file loaded.

reset app

support any arbitrary reduced dims

mostly for kana-lite, for read-only access, since files may contain dimensions other than t-SNE or UMAP (probably also not named the same)

change worker response format

{
  "type": "reducedDim",
  "resp": {
    "x": [...],
    "y": [...],
    "name": "<DIM NAME>"
  }
}

Update the app to use this generic format everywhere
DimPlot component also needs to dynamically generate entries based on the names
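The generic reducedDim format can be produced on the worker side and consumed on the UI side roughly as follows. A minimal sketch; `reducedDimResponse`, `updateEmbeddings` and `dimPlotEntries` are hypothetical names, not existing kana functions.

```javascript
// Worker side: build a response in the generic format.
function reducedDimResponse(name, x, y) {
  if (x.length !== y.length) {
    throw new Error("x and y must have the same length");
  }
  return { type: "reducedDim", resp: { x, y, name } };
}

// UI side: keep the latest coordinates for each embedding, keyed by name.
function updateEmbeddings(embeddings, message) {
  if (message.type !== "reducedDim") return embeddings;
  return { ...embeddings, [message.resp.name]: message.resp };
}

// DimPlot entries come from whatever names the worker has sent.
function dimPlotEntries(embeddings) {
  return Object.keys(embeddings).sort();
}
```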

switching between umap and tsne - data needs to be reindexed

Smoothing kernels for the QC metrics are wack

According to kana:

But according to R:

There shouldn't be so much density at 14-ish. This is confusing because I was worrying that the entire bulk of cells was being filtered out, but it turned out not to be so.

Also the filter threshold is at 15.3 here and that doesn't look like it on the plot.
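For comparing kana's curves with R, it helps to have a reference density computed the way R does by default. A minimal sketch of a Gaussian kernel density estimate using Silverman's rule of thumb, mirroring R's `bw.nrd0()`; it evaluates the kernel sum directly rather than using R's FFT-based approximation, and the function names are hypothetical.

```javascript
// Sample quantile with linear interpolation (R's default, type 7).
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5), as in bw.nrd0().
function bandwidthNrd0(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1));
  const sorted = Float64Array.from(values).sort(); // numeric sort
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  let lo = Math.min(sd, iqr / 1.34);
  if (!lo) lo = sd || Math.abs(values[0]) || 1; // same fallbacks as bw.nrd0()
  return 0.9 * lo * Math.pow(n, -0.2);
}

// Gaussian KDE evaluated at each point of `grid`.
function gaussianKde(values, grid, bw = bandwidthNrd0(values)) {
  const norm = 1 / (values.length * bw * Math.sqrt(2 * Math.PI));
  return grid.map((g) => {
    let total = 0;
    for (const v of values) {
      const z = (g - v) / bw;
      total += Math.exp(-0.5 * z * z);
    }
    return total * norm;
  });
}
```

If the bandwidth used for the QC plots is much larger than this, the curve will spread density into regions with few cells.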

Feature request: Accept a URL to an input file

Suppose we have a URL like this:

https://www.jkanche.com/kana/?10xh5=https://example.com/myfile.h5

It would be great if we could click that link and have Kana automatically retrieve the file and load it into the app.

This feature facilitates sharing with colleagues.

This feature also unlocks the possibility to create a light-weight Chrome extension (similar to BioJupies) that adds a "Kana button" to the web page when the extension detects a filename like *.h5 on the page. In one click, the file can be automatically loaded in a new Kana page and ready for analysis.

This feature also enables accepting a URL for a file in the input form instead of choosing a local file on the user's disk.
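Reading the file URL from the page link and downloading it could look roughly like this. A minimal sketch, assuming the `10xh5` query parameter from the example link; `getRemoteInput` and `fetchRemoteInput` are hypothetical. The remote server must allow cross-origin requests (CORS) for the browser to fetch the file.

```javascript
// Read the remote file URL from a page URL such as
// https://www.jkanche.com/kana/?10xh5=https://example.com/myfile.h5
function getRemoteInput(pageUrl, param = "10xh5") {
  const value = new URL(pageUrl).searchParams.get(param);
  if (value === null) return null;
  const remote = new URL(value); // throws on malformed URLs
  if (remote.protocol !== "https:" && remote.protocol !== "http:") {
    throw new Error(`Unsupported protocol: ${remote.protocol}`);
  }
  return remote.href;
}

// Download the file and wrap it in a File so it can follow the same code path
// as a file chosen from the user's disk (needs fetch and File, e.g. a browser).
async function fetchRemoteInput(remoteUrl) {
  const res = await fetch(remoteUrl);
  if (!res.ok) throw new Error(`Failed to fetch ${remoteUrl}: ${res.status}`);
  const blob = await res.blob();
  const name = new URL(remoteUrl).pathname.split("/").pop() || "download";
  return new File([blob], name);
}
```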

distributed analysis

Using Cloudflare Workers?

React Best Practice

useState

never modify state directly; use the 'setState' function instead (e.g. Analysis/index.js line 52)
define state as const [state, setState] = useState(value)
if you don't plan on changing a value, define it as a plain const rather than as state (e.g. Header/index.js line 28)

props

destructure props to only pass what you need; it makes the code cleaner and easier to follow (e.g. const Component = ({prop1, prop2}) instead of const Component = (props))

other

Stats/index.js: remove <></> and {}
Spinners: should the README be in this folder? Should AppToaster.js be index.js?
Plots: organize plots into folders
inconsistent capitalization of .css file names

show gene used for gradient and provide option to clear

reconfigure header

it's getting too busy up there; collapse logs + export + info into a single dropdown

Pulling the Wasm binaries

You can now pull in the scran.js build artifacts as part of a CI job, most easily with:

curl -L https://github.com/jkanche/scran.js/releases/download/latest-web/scran.js > scran.js

Note that these artifacts are continually updated, so the "X days ago" isn't necessarily reflective of when the action last ran.

Modularize & Use comlink for web worker communication

Modularize worker code (#69)
move public/scran to src/
make sure webpack and react's build system still works with worker code
use Comlink - https://github.com/GoogleChromeLabs/comlink

sometimes there's a race condition somewhere

dynamic benchmarks

Currently, the PBMC 68K dataset (30K genes x 68K cells) takes ~6 minutes on a laptop with 8 cores and 16 GB of RAM.

Ask the user's permission to store these metrics:

dataset dimensions (num of genes and cells)
Time for the analysis to complete
configuration - cores & memory

Helps us better guesstimate analysis times for different datasets
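Once metrics are collected, a first guess at the runtime could scale the observed rate by the number of cells. A minimal sketch under a crude, hypothetical assumption that runtime grows linearly with the number of cells; the record fields and function names are made up for illustration. In browsers, `navigator.hardwareConcurrency` gives the number of logical cores, while `navigator.deviceMemory` is only available in Chromium-based browsers and is deliberately approximate.

```javascript
// Hypothetical record of one finished analysis; only kept if the user agreed.
function makeBenchmarkRecord(consented, { genes, cells, seconds, cores, memoryGB }) {
  if (!consented) return null;
  return { genes, cells, seconds, cores, memoryGB };
}

// Crude estimate: median seconds-per-cell across records, times the number of cells.
function estimateSeconds(cells, records) {
  const rates = records
    .filter((r) => r && r.cells > 0)
    .map((r) => r.seconds / r.cells)
    .sort((a, b) => a - b);
  if (rates.length === 0) return null;
  const mid = Math.floor(rates.length / 2);
  const rate = rates.length % 2 === 1 ? rates[mid] : (rates[mid - 1] + rates[mid]) / 2;
  return cells * rate;
}
```

Seeded with the PBMC 68K run above (about 360 seconds for 68K cells), this would predict roughly 3 minutes for 34K cells; the number of genes and the hardware are ignored until there is data to model them.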

deployment links incorrect

Hi,

both links on the github page lead to a 404, but the link on twitter is correct/works.

cheers

Gene names and symbols

Figure out gene names/symbols (#52)
main thread owns gene names (#38).
handle gzipped files (#34).
hardcode human and mouse mitochondrial genes (#40 -> #48)
- UI to either use the hardcoded list or provide a regex
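Supporting both a default and a user-supplied regex could look like this. A minimal sketch; it swaps the hardcoded gene list for a hypothetical default prefix regex (human mitochondrial genes start with "MT-", mouse with "mt-"), and `mitoSubset` is a made-up name.

```javascript
// Hypothetical default: match the "MT-" / "mt-" prefix, case-insensitively.
const DEFAULT_MITO_REGEX = /^mt-/i;

// Return the indices of genes whose symbol matches the pattern.
// `pattern` can be a RegExp or a user-supplied string from the UI.
function mitoSubset(geneSymbols, pattern = DEFAULT_MITO_REGEX) {
  // Rebuilding with only the "i" flag drops any "g" flag, whose lastIndex
  // state would make repeated test() calls unreliable.
  const re = new RegExp(pattern, "i");
  const out = [];
  geneSymbols.forEach((g, i) => {
    if (re.test(g)) out.push(i);
  });
  return out;
}
```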

refactor app context to lower components

makes app more responsive, also helps with #47

firefox dimplot compatibility

frikin firefox!!!

https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas

Firefox 44: partial support.
From version 44, this feature is behind the gfx.offscreencanvas.enabled preference. To change preferences in Firefox, visit about:config.
See bug 1390089.

firefox 44 was ages ago...
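Rather than checking browser versions, the DimPlot could feature-detect OffscreenCanvas and fall back to drawing on the main thread. A minimal sketch; `supportsOffscreenCanvas` is hypothetical, and it does not check whether a WebGL context can be created on the OffscreenCanvas.

```javascript
// Feature-detect OffscreenCanvas. `scope` defaults to globalThis so the
// check also works inside a worker (where HTMLCanvasElement is undefined).
function supportsOffscreenCanvas(scope = globalThis) {
  if (typeof scope.OffscreenCanvas !== "function") return false;
  const Canvas = scope.HTMLCanvasElement;
  // On the main thread, handing a visible canvas to a worker also needs
  // transferControlToOffscreen().
  return !Canvas || typeof Canvas.prototype.transferControlToOffscreen === "function";
}
```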

redo gallery section

mostly UI

Error when performing batch correction on single H5AD dataset

Mock up an H5AD in R:

suppressPackageStartupMessages(library(scater))
suppressPackageStartupMessages(library(zellkonverter))

set.seed(1000)
sce <- mockSCE()
dim(sce)
#> [1] 2000  200
# Will use `Treatment` as a fake batch variable.
table(sce$Treatment)
#> 
#> treat1 treat2 
#>    100    100

writeH5AD(sce, file = "mockSCE.h5ad")
#> ℹ Using the 'counts' assay as the X matrix

Loading that into https://www.jkanche.com/kana/, selecting Treatment as batch variable, then hitting 'Analyze' yields for me:

CC: @dunstone-a

Local instance fails to start, code: 'ERR_OSSL_EVP_UNSUPPORTED'

Awesome app! yarn installation works but attempts to start it fail. Tried npm run start and yarn start. The error from the latter command:

Starting the development server...

/Users/som846993/Downloads/kana/node_modules/react-scripts/scripts/start.js:19
  throw err;
  ^

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:67:19)
    at Object.createHash (node:crypto:130:10)
    at module.exports (/Users/som846993/Downloads/kana/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:417:16)
    at /Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:452:10
    at /Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:323:13
    at /Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:367:11
    at /Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:233:18
    at context.callback (/Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:111:13)
    at /Users/som846993/Downloads/kana/node_modules/react-scripts/node_modules/babel-loader/lib/index.js:59:103 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v17.1.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

save all plots on the interface

I don't think the custom selection indices are sorted

From printing the payload.selection received by the worker on a computeCustomMarkers signal.

Safest to sort them on the bakana side, provided that this has no ill effects on the kana side.
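Sorting on the bakana side is cheap if the indices are copied into a typed array first. A minimal sketch; `sortSelection` and `isSortedAscending` are hypothetical names.

```javascript
function isSortedAscending(indices) {
  for (let i = 1; i < indices.length; i++) {
    if (indices[i] < indices[i - 1]) return false;
  }
  return true;
}

// Copy into an Int32Array: its sort() is numeric, whereas a plain Array's
// sort() compares elements as strings (so [10, 9] would stay [10, 9]).
function sortSelection(selection) {
  const sorted = Int32Array.from(selection);
  return isSortedAscending(sorted) ? sorted : sorted.sort();
}
```

Because it works on a copy, the kana side keeps its original (unsorted) selection untouched.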

Current DAG attempt

  diff(options) {
    // Could do something smarter later. For now, any change to the input
    // files means that everything must be rerun (signalled by returning 0).
    const oldFiles = this.state.files;
    if (oldFiles !== options.files) {
      if (oldFiles.length !== options.files.length) {
        return 0;
      }
      for (const [idx, m] of oldFiles.entries()) {
        if (JSON.stringify(m[0]) !== JSON.stringify(options.files[idx][0])) {
          return 0;
        }
      }
    }

    // diff_param(op) is assumed to return true if the parameters of step `op` changed.
    // reneighbor stays true unless the neighbor index (and everything upstream) is unchanged.
    var reneighbor = true;
    var linear_ops = [ 
      "qc",
      "fSelection",
      "pca",
      "build_neighbor_index",
      "snn_find_neighbors",
      "snn_build_graph",
      "snn_cluster_graph",
      "markerGene"
    ];
    var linear_rerun = [];

    for (const [idx, op] of linear_ops.entries()) {
      if (diff_param(op)) {
        // Rerun this step and everything downstream of it.
        linear_rerun = linear_ops.slice(idx);
        break;
      } else if (op == "build_neighbor_index") {
        reneighbor = false;
      }
    }

    var tsne_ops = [
      "tsne_init",
      "tsne_run"
    ];
    // A rebuilt neighbor index invalidates the whole t-SNE.
    var tsne_rerun = reneighbor ? tsne_ops.slice() : [];

    if (!reneighbor) {
      for (const [idx, op] of tsne_ops.entries()) {
        if (diff_param(op)) {
          tsne_rerun = tsne_ops.slice(idx);
          break;
        }
      }
    }

    var umap_ops = [
      "umap_find_neighbors",
      "umap_init",
      "umap_run"
    ];
    // Likewise, a rebuilt neighbor index invalidates the whole UMAP.
    var umap_rerun = reneighbor ? umap_ops.slice() : [];

    if (!reneighbor) {
      for (const [idx, op] of umap_ops.entries()) {
        if (diff_param(op)) {
          umap_rerun = umap_ops.slice(idx);
          break;
        }
      }
    }

    return { "linear": linear_rerun, "tsne": tsne_rerun, "umap": umap_rerun };
  }

Decided not to attempt to generalize it. Rather, we have a linear process that always runs on the parent worker. If any tasks are allocated to tsne or umap, they will run on a child worker.

The tricky part is how to pass an Embind C++ object to another worker. It seems that the objects have a $$.ptr property that gives the pointer offset of each object in the SharedArrayBuffer.
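A raw pointer is only meaningful to code running against the same Wasm memory, so this approach assumes the child worker shares the parent's heap (for example, a SharedArrayBuffer-backed memory in a threaded build). A minimal sketch of the data side of the idea, assuming the pointer refers to a contiguous array of doubles; `viewFromPointer` is hypothetical, and the parent would send only `{ ptr, length }` through postMessage.

```javascript
// Rebuild a Float64Array view from a pointer (byte offset) into a shared heap.
function viewFromPointer(sharedBuffer, ptr, length) {
  if (!(sharedBuffer instanceof SharedArrayBuffer)) {
    throw new TypeError("expected a SharedArrayBuffer");
  }
  if (ptr % Float64Array.BYTES_PER_ELEMENT !== 0) {
    throw new RangeError("pointer is not 8-byte aligned");
  }
  // No copy: writes through this view are visible to every worker sharing the buffer.
  return new Float64Array(sharedBuffer, ptr, length);
}
```

Calling methods on the C++ object itself would still need an Embind wrapper created by a Module instance that uses that same memory.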

gray out gallery plots during reanalysis

Full ADT Support

On preflight, send an ADT-aware response (kanaverse/bakana#20)

New Steps

ADT specific steps
- adt_qualitycontrol doesn't have mt-* proportions
- adt_normalization (kanaverse/bakana#20)
- adt_pca (same as normal pca response) -> ADT PCA plot

Existing Steps

cell_filtering step (this will give us # of cells)
PCA and batch correction are split - no msg
- PCA Step: change correction to mnn changes pca block to weights
combined embedding (ADT + RNA) - no msg
markers are now multi-modal (defaults to RNA)

Parameters

bakana docs for ADT Steps - https://github.com/LTLA/bakana/tree/adt-again/src/adt
Existing steps split into new steps - combined embeddings and batch correction

Visualize cluster hierarchy across resolutions

Repo for pre-saved analysis for various datasets

Aesthetic fiddling

Switch analysis modal to a left-side panel that uses the entire height of the page. #93
Highlight the header that leads to the currently selected popover. #93

Support input formats

Get HDF5 to work (https://github.com/jkanche/kana/pull/31)
UI support for choosing various formats (#46)
Figure out how users can choose between different input formats (#57)
Autofill state when loading existing analysis (#57)
validating input files based on extensions (https://github.com/jkanche/kana/pull/36)
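Validating by extension also gives a natural place to reject a missing file instead of crashing. A minimal sketch; the format-to-extension mapping and `validateExtension` are hypothetical, not kana's actual rules.

```javascript
// Hypothetical mapping from input format to accepted file extensions.
const FORMAT_EXTENSIONS = {
  "10x HDF5 matrix": [".h5", ".hdf5"],
  "H5AD": [".h5ad"],
  "Matrix Market file": [".mtx", ".mtx.gz"],
};

function validateExtension(fileName, format) {
  if (!fileName) return { ok: false, reason: "no file selected" };
  const allowed = FORMAT_EXTENSIONS[format];
  if (!allowed) return { ok: false, reason: `unknown format: ${format}` };
  const lower = fileName.toLowerCase();
  return allowed.some((ext) => lower.endsWith(ext))
    ? { ok: true }
    : { ok: false, reason: `expected one of ${allowed.join(", ")}` };
}
```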

Multiple datasets not working

I tried loading 2 datasets, both in '10x HDF5 matrix' format (from CellRanger v7), and got the following error:

08:56:46: (downloadsdb) store initialized
08:56:46: (kanadb) store initialized
08:56:47: analysis state created
08:56:47: bakana initialized
08:58:24: preflight_input finished
08:58:26: preflight_input finished
08:58:31: preflight_input finished
08:59:11: preflight_input finished
08:59:19: preflight_input finished
08:59:26: --- Analyis started---
08:59:26: Error: cannot assign undefined parameter to 'inputs.sample_factor'

The same error occurs if I use the 'Matrix Market file' format of these datasets.
These two datasets both contain GEX and ADT assays.
Each dataset works fine when loaded and analysed individually.

Can I provide any other info that will help figure out what's going wrong?

Integration of multiple datasets

Dear all,

Thanks to the authors for this great web page. I am a real beginner in coding and single-cell data analysis.
Is there any option to extend this tool to integrate several datasets while preserving the origin of each one?
Thanks a lot and best wishes, Michael

UMAP

I believe this snippet should work as intended:

    <script src="scran.js"></script>
    <script type="text/javascript">
    Module.onRuntimeInitialized = function () {
        var npoints = 1000;
        var ndim = 50;

        // Fill the input matrix (ndim doubles per point, 8 bytes each) with random values.
        var inptr = Module._malloc(npoints * ndim * 8);
        var inputs = new Float64Array(Module.HEAPF64.buffer, inptr, npoints * ndim);
        for (var i = 0; i < inputs.length; i++) {
            inputs[i] = Math.random();
        }

        // Space for the 2-D embedding.
        var outptr = Module._malloc(npoints * 2 * 8);

        // 15 neighbors, 500 epochs, 0.01 min dist, true for approximate NN search
        var umap = Module.initialize_umap(inptr, ndim, npoints, 15, 500, 0.01, true, outptr);

        // Running epochs in clumps of 1000 milliseconds. Turn this down to get updates faster.
        Module.run_umap(umap, 1000, outptr);
        Module.run_umap(umap, 1000, outptr);
        Module.run_umap(umap, 1000, outptr);

        // Create the view only now: if the Wasm memory grew during the calls above,
        // a view created earlier on the old buffer would be detached.
        var outputs = new Float64Array(Module.HEAPF64.buffer, outptr, npoints * 2);
        console.log(outputs);

        umap.delete();
        Module._free(outptr);
        Module._free(inptr);
    }
    </script>

Gene symbols or ID for cell annotation

Hi, thanks for this amazing application, which allows us to quickly produce results in a browser. I have a question about cell annotation using references. I understand that the software attempts to "guess" whether the first column of the "genes.tsv" file contains Ensembl IDs or symbols, then uses that to decide whether to use symbols or IDs from the references. However, the first column does not necessarily contain symbols or Ensembl IDs. For example, when looking at novel isoforms, the first column is sometimes used to denote a new isoform with a name like "ID.XXX", which can throw off the guess.

For example, when I name the first column as "PB.12.10--WASH7P--novel_in_catalog", the result is different from just naming it "WASH7P". I will however need to name it the former since I'm trying to decipher which isoform it is.
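The issue describes a guess between Ensembl IDs and symbols, plus a request to force the second column. A minimal sketch of how such a guess and override could work; the regex, the 50% threshold, and the function names are assumptions, not kana's actual logic.

```javascript
// Ensembl gene IDs look like ENSG00000141510 (human) or ENSMUSG00000059552 (mouse),
// optionally followed by a version suffix such as ".5".
const ENSEMBL_GENE = /^ENS[A-Z]*G\d{11}(\.\d+)?$/;

function looksLikeEnsembl(ids, threshold = 0.5) {
  if (ids.length === 0) return false;
  const hits = ids.filter((id) => ENSEMBL_GENE.test(id)).length;
  return hits / ids.length >= threshold;
}

// Pick the identifiers used for reference-based annotation; `forceSecond`
// models the requested override (hypothetical option).
function chooseAnnotationIds(firstCol, secondCol, forceSecond = false) {
  if (forceSecond && secondCol) return { ids: secondCol, type: "symbol" };
  return { ids: firstCol, type: looksLikeEnsembl(firstCol) ? "ensembl" : "symbol" };
}
```

With a name like "PB.12.10--WASH7P--novel_in_catalog", this guess falls back to "symbol" and then fails to match reference symbols, which is why an explicit override is useful.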

When the first column uses unconventional names, is it possible to force the software to use the second column (which usually contains symbols/gene names) for cell annotation?

Following on from this, are there any other parts of the software where unconventional names in the first column can cause issues?

Thank you!

clean up documentation

build system for hosting on GitHub Pages (jkanche@34f4aa1, jkanche@a3ff49a, jkanche@c17d018)
update documentation
refactor