kanaverse / kana

Single cell analysis in the browser

Home Page: https://kanaverse.org/kana/

License: MIT License

Languages: JavaScript 96.90%, HTML 0.24%, CSS 2.82%, Dockerfile 0.04%

Topics: bioinformatics, cite-seq, exploratory-data-analysis, interactive-analysis, interactive-visualizations, rna-seq, single-cell, webassembly

kana's People

Contributors

jkanche, llewelld, ltla, petehaitch

kana's Issues

aesthetics

  • Animations for t-SNE/UMAP (#39)
  • sliders histogram changes to gradient (#50)
  • save tsne to gallery (#49)
  • change qc plot ticks to normal text (#50)
  • send dims in the data payloads (#50)
  • filter markers set default to 0 for lfc and delta-d (#50)
  • Fancier: save and restore the state from the gallery
  • change the pong game to Conway's Game of Life

Fix the t-SNE/UMAP message passing

  • Consolidate code inside each worker file to keep things more understandable.
  • Bump out the delay for non-animated runs.
  • Add a dedicated response for animation restart.

What is the format of the .kana file?

When I click "Export" and then "Download to file" I get a binary file with the .kana extension.

Suppose a user wants to export the results and then import them into Python or R.

How can we do that?

Feature request: Re-analysing a user-specified subset of data

It would be great if it were possible to select a subset of cells and re-analyse that subset.
E.g., in a dataset of PBMCs, select all 'B cells' (based on the cluster annotations) and re-analyse to look for subclusters within the B cell population.
Is this something planned for a future release or that would be feasible to implement?
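
At its simplest, the selection step amounts to collecting the indices of the cells in the chosen clusters, which could then be passed to a re-analysis. A minimal sketch; the function name and the cluster labels are made up for illustration and do not reflect kana's internals:

```javascript
// Hypothetical helper: return the indices of cells whose cluster label is in `keep`.
function selectCells(clusters, keep) {
  const wanted = new Set(keep);
  const indices = [];
  clusters.forEach((label, i) => {
    if (wanted.has(label)) indices.push(i);
  });
  return indices;
}

// Made-up per-cell cluster annotations.
const clusterLabels = ["B cells", "T cells", "B cells", "Monocytes", "B cells"];
const subset = selectCells(clusterLabels, ["B cells"]);
console.log(subset);
```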

serialization format for storing analysis

  • Converter from TypedArray to JSON arrays (handled by #34).
  • Figure out IndexedDB/inline file format (#57)
  • transfer serialized buffer to main thread (handled by #34).
  • Adapt options to be able to load state or from IndexedDB (#57)
  • Fix bugs with serialization code (handled by #34).
  • Fix bugs with unserialization code. (#44)
  • UI changes to save/load various formats (#43)
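
The first item, converting a TypedArray to a JSON array and back, can be sketched as follows. The `{ kind, values }` wrapper and both function names are assumptions made for illustration, not kana's actual file format; the `kind` tag records the constructor name so the array can be rebuilt with the right type on load.

```javascript
// Supported array types, keyed by constructor name (an illustrative subset).
const TYPED_ARRAY_KINDS = new Map([
  ["Int32Array", Int32Array],
  ["Uint8Array", Uint8Array],
  ["Float32Array", Float32Array],
  ["Float64Array", Float64Array]
]);

// Convert a TypedArray into a plain object that JSON.stringify can handle.
function typedArrayToJson(arr) {
  const kind = arr.constructor.name;
  if (!TYPED_ARRAY_KINDS.has(kind)) {
    throw new Error(`unsupported array type: ${kind}`);
  }
  return { kind, values: Array.from(arr) };
}

// Rebuild the TypedArray from the wrapper produced above.
function jsonToTypedArray(obj) {
  const ctor = TYPED_ARRAY_KINDS.get(obj.kind);
  if (!ctor) {
    throw new Error(`unsupported array type: ${obj.kind}`);
  }
  return ctor.from(obj.values);
}

const original = new Float64Array([0.5, 1.25, -3]);
const text = JSON.stringify(typedArrayToJson(original));
const restored = jsonToTypedArray(JSON.parse(text));
console.log(restored instanceof Float64Array, restored.length);
```

Note that JSON has no representation for NaN or Infinity (JSON.stringify writes them as null), so a real format would need to handle those values separately.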

Could you provide a sample dataset?

Could I please ask if you could provide a data file that is known to work with this app?

I tried GSE117963_10X_whole_aorta_filtered_gene_bc_matrices_h5.h5 from this URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117963

The app seems to be doing nothing. In the side bar on the right, this is what I see:

Generating nearest neighbor graph to compute clusters....

After 10 minutes, the same message is still displayed and the app is still doing nothing.

When I click the wrench icon ("What's happening?"), this is what I see:

[image]

critical vs non-critical error messages

This would help with what shows up on the UI. If it's critical, there's nothing to do other than reload the app.

Non-critical error messages are mostly notifications.
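
One way the UI could act on this distinction is sketched below. The message shape (`critical`, `reason`) and the returned `action` values are hypothetical; they only illustrate routing critical errors to a reload prompt and everything else to a notification.

```javascript
// Hypothetical error message shape: { critical: boolean, reason: string }.
// Decide what the UI should do with an error reported by a worker.
function describeError(msg) {
  if (msg.critical) {
    // Nothing can be recovered, so the only option is to reload.
    return { action: "reload", text: `Fatal error: ${msg.reason}. Please reload the app.` };
  }
  // Non-critical errors are just surfaced as notifications.
  return { action: "notify", text: msg.reason };
}

console.log(describeError({ critical: false, reason: "could not guess the gene ID type" }));
```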

White screen bug with H5AD and HDF5 format

Hi,

I have noticed a bug when adding input files through the '10x HDF5 matrix' or the 'H5AD' option. If the Add button is pressed before selecting a file then the web page will go completely blank and requires a refresh. This also happens if there is already another .h5ad file loaded.

support any arbitrary reduced dims

Mostly for kana-lite, for read-only access, since files may contain dimensions other than t-SNE or UMAP (and probably not named the same).

  • change worker response format
{
  "type": "reducedDim",
  "resp": {
    "x": [...],
    "y": [...],
    "name": "<DIM NAME>"
  }
}
  • Manage all app changes to this generic format
  • DimPlot component also needs to dynamically generate entries based on the names
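
A minimal sketch of how the main thread might consume this generic response and expose the available embeddings by name. The registry, the function names and the "PHATE" example are assumptions for illustration only:

```javascript
// Registry of embeddings keyed by name; a Map preserves insertion order.
const embeddings = new Map();

// Handle one worker message of the proposed generic shape.
// Returns false for messages that are not reduced dimensions.
function handleReducedDim(msg) {
  if (msg.type !== "reducedDim") return false;
  const { x, y, name } = msg.resp;
  if (x.length !== y.length) {
    throw new Error(`"${name}": x and y differ in length`);
  }
  embeddings.set(name, { x, y });
  return true;
}

// Names that a plot selector could list dynamically.
function embeddingNames() {
  return Array.from(embeddings.keys());
}

handleReducedDim({ type: "reducedDim", resp: { x: [0, 1], y: [2, 3], name: "TSNE" } });
handleReducedDim({ type: "reducedDim", resp: { x: [4, 5], y: [6, 7], name: "PHATE" } });
console.log(embeddingNames());
```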

Smoothing kernels for the QC metrics are wack

According to kana:

Screenshot from 2022-06-15 21-12-14

But according to R:

Screenshot from 2022-06-15 21-13-01

There shouldn't be so much density at 14-ish. This is confusing because I was worrying that the entire bulk of cells was being filtered out, but it turned out not to be so.

Also the filter threshold is at 15.3 here and that doesn't look like it on the plot.

Feature request: Accept a URL to an input file

Suppose we have a URL like this:

https://www.jkanche.com/kana/?10xh5=https://example.com/myfile.h5

It would be great if we could click that link and have Kana automatically retrieve the file and load it into the app.
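
Reading such a link could look like the sketch below, which uses the standard URL API. The parameter name `10xh5` is taken from the example link; the function name and the restriction to http(s) targets are assumptions.

```javascript
// Parse a kana link and return the remote file URL, or null if it is absent or invalid.
function extractInputUrl(pageUrl, param = "10xh5") {
  const value = new URL(pageUrl).searchParams.get(param);
  if (value === null) return null;
  try {
    const target = new URL(value);
    // Only accept web URLs; reject schemes such as javascript: or data:.
    const ok = target.protocol === "https:" || target.protocol === "http:";
    return ok ? target.href : null;
  } catch (err) {
    return null; // not an absolute URL
  }
}

const link = "https://www.jkanche.com/kana/?10xh5=https://example.com/myfile.h5";
console.log(extractInputUrl(link));
```

The app would then fetch the returned URL. Since the request is cross-origin, the remote server would need to send the appropriate CORS headers for the browser to allow it.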

This feature facilitates sharing with colleagues.

This feature also unlocks the possibility to create a light-weight Chrome extension (similar to BioJupies) that adds a "Kana button" to the web page when the extension detects a filename like *.h5 on the page. In one click, the file can be automatically loaded in a new Kana page and ready for analysis.

This feature also enables accepting a URL for a file in the input form instead of choosing a local file on the user's disk.

React Best Practice

useState

  • never modify state directly; use the 'setState' function instead (e.g. Analysis/index.js line 52)
  • define state as: const [state, setState] = useState(value)
  • if a value never changes, define it as a plain const rather than as state (e.g. Header/index.js line 28)

props

  • destructure props to pass only what you need; this makes code cleaner and easier to follow (e.g. const Component = ({prop1, prop2}) instead of const Component = (props) )

other

  • Stats/index.js : remove <></> and {}
  • Spinners : should README be in this folder? AppToaster.js should be index.js?
  • Plots : organize plots in folders
  • inconsistent caps for .css files

reconfigure header

It's getting too busy up there; collapse logs + export + info into a single dropdown.

Pulling the Wasm binaries

You can now pull in the scran.js build artifacts as part of a CI job, most easily with:

curl -L https://github.com/jkanche/scran.js/releases/download/latest-web/scran.js > scran.js

Note that these artifacts are continually updated, so the "X days ago" isn't necessarily reflective of when the action last ran.

dynamic benchmarks

Currently, the PBMC 68K dataset (30K genes x 68K cells) takes ~6 minutes on a laptop with 8 cores and 16 GB of memory.

Ask the user's permission to store these metrics:

  • dataset dimensions (num of genes and cells)
  • Time for the analysis to complete
  • configuration - cores & memory

Helps us better guesstimate analysis times for different datasets
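
The record being proposed might look like the sketch below. All field names are hypothetical; the example values are the PBMC 68K figures quoted above.

```javascript
// Build the record that would be stored, but only if the user has opted in.
function buildBenchmarkRecord(consent, { genes, cells, elapsedMs, cores, memoryGb }) {
  if (!consent) return null; // never record anything without permission
  return {
    genes,
    cells,
    seconds: Math.round(elapsedMs / 1000),
    cores,
    memoryGb
  };
}

const record = buildBenchmarkRecord(true, {
  genes: 30000,
  cells: 68000,
  elapsedMs: 6 * 60 * 1000,
  cores: 8,
  memoryGb: 16
});
console.log(record);
```

In a browser, `navigator.hardwareConcurrency` reports the number of logical cores; memory is harder to obtain, since `navigator.deviceMemory` is not available in every browser and only gives a coarse value.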

Gene names and symbols

  • Figure out gene names/symbols (#52)
  • main thread owns gene names (#38).
  • handle gzipped files (#34).
  • hardcode human and mouse mitochondrial genes (#40 -> #48)
    • UI to support either using the hardcoded list or providing a regex
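
The last item, choosing between a hardcoded list and a user-supplied regex, could be sketched like this. The function name is hypothetical and the gene list is a tiny illustrative subset of human and mouse mitochondrial gene symbols, not the list used by kana:

```javascript
// Illustrative subset only; a real list would cover all mitochondrial genes.
const HARDCODED_MITO = new Set([
  "MT-ND1", "MT-CO1", "MT-ATP6", // human symbols
  "mt-Nd1", "mt-Co1", "mt-Atp6"  // mouse symbols
]);

// Return the indices of mitochondrial genes. If `pattern` is given it is used
// instead of the hardcoded set. Avoid the `g` flag, which makes test() stateful.
function findMitoGenes(symbols, pattern = null) {
  const isMito = pattern === null
    ? (s) => HARDCODED_MITO.has(s)
    : (s) => pattern.test(s);
  const hits = [];
  symbols.forEach((s, i) => {
    if (typeof s === "string" && isMito(s)) hits.push(i);
  });
  return hits;
}

console.log(findMitoGenes(["ACTB", "MT-CO1", "mt-Nd1", "GAPDH"]));
console.log(findMitoGenes(["ACTB", "MT-ND2", "Mtor"], /^mt-/i));
```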

Error when performing batch correction on single H5AD dataset

Mock up an H5AD in R:

suppressPackageStartupMessages(library(scater))
suppressPackageStartupMessages(library(zellkonverter))

set.seed(1000)
sce <- mockSCE()
dim(sce)
#> [1] 2000  200
# Will use `Treatment` as a fake batch variable.
table(sce$Treatment)
#> 
#> treat1 treat2 
#>    100    100

writeH5AD(sce, file = "mockSCE.h5ad")
#> ℹ Using the 'counts' assay as the X matrix

Loading that into https://www.jkanche.com/kana/, selecting Treatment as batch variable, then hitting 'Analyze' yields for me:
Screen Shot 2022-04-20 at 2 46 53 pm

CC: @dunstone-a

Local instance fails to start, code: 'ERR_OSSL_EVP_UNSUPPORTED'

Awesome app! yarn installation works but attempts to start it fail. Tried npm run start and yarn start. The error from the latter command:

Starting the development server...

/Users/som846993/Downloads/kana/node_modules/react-scripts/scripts/start.js:19
  throw err;
  ^

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:67:19)
    at Object.createHash (node:crypto:130:10)
    at module.exports (/Users/som846993/Downloads/kana/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:417:16)
    at /Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:452:10
    at /Users/som846993/Downloads/kana/node_modules/webpack/lib/NormalModule.js:323:13
    at /Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:367:11
    at /Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:233:18
    at context.callback (/Users/som846993/Downloads/kana/node_modules/loader-runner/lib/LoaderRunner.js:111:13)
    at /Users/som846993/Downloads/kana/node_modules/react-scripts/node_modules/babel-loader/lib/index.js:59:103 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v17.1.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Current DAG attempt

  diff(options) {
    var self = this;
    // could do something smarter later

    // Count mismatching input files; declared up front so it is always defined.
    var mismatch = 0;
    if (this.state.files != options.files) {
      for (const [idx, m] of this.state.files.entries()) {
        if (JSON.stringify(m[0]) != JSON.stringify(options.files[idx][0])) {
          mismatch++;
          break;
        }
      }
    }

    // Any change in the input files short-circuits with 0.
    if (mismatch > 0) {
      return 0;
    }

    var reneighbor = true;
    var linear_ops = [ 
      "qc",
      "fSelection",
      "pca",
      "build_neighbor_index",
      "snn_find_neighbors",
      "snn_build_graph",
      "snn_cluster_graph",
      "markerGene"
    ];
    var linear_rerun = [];

    for (const [idx, op] of linear_ops.entries()) {
      if (diff_param(op)) {
        linear_rerun = linear_ops.slice(idx);
        break;
      } else if (op == "build_neighbor_index") {
        reneighbor = false;
      }
    }

    var tsne_ops = [
      "tsne_init",
      "tsne_run"
    ];
    var tsne_rerun = [];

    if (!reneighbor) {
      for (const [idx, op] of tsne_ops.entries()) {
        if (diff_param(op)) {
          tsne_rerun = tsne_ops.slice(idx);
          break;
        }
      }
    }

    var umap_ops = [
      "umap_find_neighbors",
      "umap_init",
      "umap_run"
    ];
    var umap_rerun = [];

    if (!reneighbor) {
      for (const [idx, op] of umap_ops.entries()) {
        if (diff_param(op)) {
          umap_rerun = umap_ops.slice(idx);
          // Stop at the first changed step, as in the loops above.
          break;
        }
      }
    }

    return { "linear": linear_rerun, "tsne": tsne_rerun, "umap": umap_rerun };
  }

Decided not to attempt to generalize it. Rather, we have a linear process that always runs on the parent worker. If any tasks are allocated to tsne or umap, they will run on a child worker.
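
The same idea can be written more compactly. The sketch below is not the project's implementation: `planRerun`, the parameter objects and their field names are hypothetical, and it assumes that rebuilding the neighbor index means both embeddings rerun in full, which the snippet above leaves to the caller.

```javascript
// Ordered steps for each branch, copied from the snippet above.
const LINEAR_OPS = [
  "qc", "fSelection", "pca", "build_neighbor_index",
  "snn_find_neighbors", "snn_build_graph", "snn_cluster_graph", "markerGene"
];
const TSNE_OPS = ["tsne_init", "tsne_run"];
const UMAP_OPS = ["umap_find_neighbors", "umap_init", "umap_run"];

// Return the first changed step and everything downstream of it.
function stepsToRerun(ops, changed) {
  const first = ops.findIndex((op) => changed(op));
  return first === -1 ? [] : ops.slice(first);
}

// `oldParams` and `newParams` map step names to parameter objects.
function planRerun(oldParams, newParams) {
  const changed = (op) =>
    JSON.stringify(oldParams[op]) !== JSON.stringify(newParams[op]);
  const linear = stepsToRerun(LINEAR_OPS, changed);
  // Assumption: a rebuilt neighbor index invalidates both embeddings entirely.
  const reneighbor = linear.includes("build_neighbor_index");
  return {
    linear,
    tsne: reneighbor ? TSNE_OPS.slice() : stepsToRerun(TSNE_OPS, changed),
    umap: reneighbor ? UMAP_OPS.slice() : stepsToRerun(UMAP_OPS, changed)
  };
}

// Only a t-SNE parameter changes here (parameter names are made up).
const plan = planRerun(
  { pca: { num_pcs: 10 }, tsne_run: { iterations: 500 } },
  { pca: { num_pcs: 10 }, tsne_run: { iterations: 1000 } }
);
console.log(plan);
```

Keeping the three branches as plain ordered lists matches the decision not to generalize: the linear list runs on the parent worker, and the t-SNE and UMAP lists can each be handed to a child worker.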

The tricky part is: how do we pass an embound C++ object to another worker? It seems that the objects have a .$$.ptr property that we could read to get the pointer offset of each object in the SharedArrayBuffer.

Full ADT Support

New Steps

  • ADT specific steps
    • adt_qualitycontrol doesn't have mt-* proportions
    • adt_normalization (kanaverse/bakana#20)
    • adt_pca (same as normal pca response) -> ADT PCA plot

Existing Steps

  • cell_filtering step (this will give us # of cells)
  • PCA and batch correction are split - no msg
    • PCA Step: change correction to mnn changes pca block to weights
  • combined embedding (ADT + RNA) - no msg
  • markers now have multi-modal (defaults to RNA)

Parameters

Aesthetic fiddling

  • Switch analysis modal to a left-side panel that uses the entire height of the page. #93
  • Highlight the header that leads to the currently selected popover. #93

Multiple datasets not working

I tried loading 2 datasets, both in '10x HDF5 matrix' format (from CellRanger v7) and get the following error:

08:56:46: (downloadsdb) store initialized
08:56:46: (kanadb) store initialized
08:56:47: analysis state created
08:56:47: bakana initialized
08:58:24: preflight_input finished
08:58:26: preflight_input finished
08:58:31: preflight_input finished
08:59:11: preflight_input finished
08:59:19: preflight_input finished
08:59:26: --- Analyis started---
08:59:26: Error: cannot assign undefined parameter to 'inputs.sample_factor'

The same error occurs if I use the 'Matrix Market file' format of these datasets.
These two datasets both contain GEX and ADT assays.
Each dataset works fine when loaded and analysed individually.

Can I provide any other info that will help figure out what's going wrong?

Integration of multiple datasets

Dear all,

  1. Thanks to the authors for this great web page. I am a real beginner in coding and scData analysis.
  2. Would there be any option to extend this tool to integrate several datasets while preserving the origin of each one?
  3. Thanks a lot and best wishes, Michael

UMAP

I believe this snippet should work as intended:

    <script src="scran.js"></script>
    <script type="text/javascript">
    Module.onRuntimeInitialized = function () {
        var npoints = 1000;
        var ndim = 50;

        var inptr = Module._malloc(npoints * ndim * 8); 
        var inputs = new Float64Array(Module.HEAPF64.buffer, inptr, npoints * ndim);
        inputs.forEach(function(x, i) {
            this[i] = Math.random();
        }, inputs);

        var outptr = Module._malloc(npoints * 2 * 8);
        var outputs = new Float64Array(Module.HEAPF64.buffer, outptr, npoints * 2);
        
        // 15 neighbors, 500 epochs, 0.01 min dist, true for approximate NN search
        var umap = Module.initialize_umap(inptr, ndim, npoints, 15, 500, 0.01, true, outptr);
 
        // Running epochs in clumps of 1000 milliseconds. Turn this down to get updates faster.
        Module.run_umap(umap, 1000, outptr);
        Module.run_umap(umap, 1000, outptr);
        Module.run_umap(umap, 1000, outptr);
        console.log(outputs);

        umap.delete();
        Module._free(outptr);
        Module._free(inptr);
    }
    </script>

Gene symbols or ID for cell annotation

Hi, thanks for this amazing application that allows us to quickly produce results in a browser. I have a question regarding cell annotation using a reference. I understand that the software attempts to "guess" whether the first column of the "genes.tsv" file contains Ensembl IDs or symbols, then uses that to decide whether to use symbols or IDs from the references. However, the first column may not necessarily be a symbol or an Ensembl ID. For example, when looking at novel isoforms, the first column is sometimes used to denote a new isoform, named "ID.XXX", and this can throw off the guessing.

For example, when I name the first column as "PB.12.10--WASH7P--novel_in_catalog", the result is different from just naming it "WASH7P". I will however need to name it the former since I'm trying to decipher which isoform it is.

In the case of unconventional naming of first columns, is it possible to force the software to use the second column (which is usually symbol/gene names) for cell annotation?

Following on from this question, is there any other part of the software where unconventional naming of the first column can cause an issue?

Thank you!
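
To illustrate the kind of guessing described in this question and how an override could work, here is a minimal sketch. It is not kana's implementation: the function names, the majority-vote rule and the `forcedColumn` option are assumptions. The only fixed fact used is that Ensembl gene IDs look like ENSG or ENSMUSG followed by 11 digits.

```javascript
// Ensembl gene IDs: "ENS", an optional species code, "G", 11 digits, optional version.
const ENSEMBL_GENE = /^ENS[A-Z]*G\d{11}(\.\d+)?$/;

// Guess the ID type of a column by majority vote over its non-empty entries.
function guessIdType(ids) {
  const nonEmpty = ids.filter((id) => typeof id === "string" && id.length > 0);
  if (nonEmpty.length === 0) return "unknown";
  const hits = nonEmpty.filter((id) => ENSEMBL_GENE.test(id)).length;
  return hits * 2 >= nonEmpty.length ? "ensembl" : "symbol";
}

// Pick the column used for annotation. By default the first column is used;
// `forcedColumn` lets the user point at another one, e.g. the symbol column.
function pickAnnotationIds(columns, forcedColumn = null) {
  const index = forcedColumn === null ? 0 : forcedColumn;
  const ids = columns[index];
  return { index, type: guessIdType(ids), ids };
}

const geneColumns = [
  ["PB.12.10--WASH7P--novel_in_catalog"], // custom isoform IDs
  ["WASH7P"]                              // gene symbols
];
console.log(pickAnnotationIds(geneColumns));    // first column, treated as symbols
console.log(pickAnnotationIds(geneColumns, 1)); // second column forced by the user
```

With such a heuristic, a custom isoform ID would be classified as a symbol but would match nothing in a reference, which is consistent with the different results reported above.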
