chanzuckerberg / cellxgene

An interactive explorer for single-cell transcriptomics data

Home Page: https://chanzuckerberg.github.io/cellxgene/

License: MIT License

JavaScript 68.39% HTML 0.37% CSS 0.24% Python 30.06% Dockerfile 0.02% Makefile 0.72% Shell 0.10% AppleScript 0.10%
scientific visualization scrna-seq transcriptomics dataviz

cellxgene's Introduction

CZ CELLxGENE Annotate (pronounced "cell-by-gene") is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas. It uses modern web development techniques to provide fast visualizations of at least 1 million cells, so that biologists and computational researchers can explore their data.

Whether you need to visualize one thousand cells or one million, CELLxGENE Annotate helps you gain insight into your single-cell data.

Getting started

The comprehensive guide to CZ CELLxGENE Annotate

The CZ CELLxGENE Annotate documentation is your one-stop shop for information about CELLxGENE Annotate.

Quick start

To install CELLxGENE Annotate you need Python 3.6+. We recommend installing Annotate into a conda or virtual environment.

Install the package.

pip install cellxgene

Launch Annotate with an example anndata file

cellxgene launch https://cellxgene-example-data.czi.technology/pbmc3k.h5ad

To explore more datasets already formatted for Annotate, check out the Demo data or see Preparing your data to learn more about formatting your own data for CELLxGENE Annotate.

Supported browsers

CELLxGENE Annotate currently supports the following browsers:

  • Google Chrome 61+
  • Edge 15+
  • Firefox 60+

Please file an issue if you would like us to add support for an unsupported browser.

Finding help

We'd love to hear from you! For questions, suggestions, or accolades, join the #cellxgene-users channel on the CZI Science Community Slack and say "hi!".

To report errors or bugs, file an issue on GitHub.

Developing with CZ CELLxGENE Annotate

Contributing

We warmly welcome contributions from the community! Please see our contributing guide and don't hesitate to open an issue or send a pull request to improve CELLxGENE Annotate. Please see the dev_docs for pull request suggestions, unit test details, local documentation preview, and other development specifics.

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to [email protected].

Reuse

This project was started with the sole goal of empowering the scientific community to explore and understand their data. As such, we encourage other scientific tool builders in academia or industry to adopt the patterns, tools, and code from this project. All code is freely available for reuse under the MIT license.

Before extending CELLxGENE Annotate, we encourage you to reach out to us with ideas or questions. It might be possible that an extension could be directly contributed, which would make it available for a wider audience, or that it's on our roadmap and under active development.

See the CELLxGENE extensions section of our documentation for examples of community use and CELLxGENE extensions.

Security

If you believe you have found a security issue, we would appreciate notification. Please send email to [email protected].

Inspiration

We've been heavily inspired by several other related single-cell visualization projects, including the UCSC Cell Browser, Cytoscape, Xena, ASAP, GenePattern, and many others. We hope to explore collaborations where useful as this community works together on improving interactive visualization for single-cell data.

We were inspired by Mike Bostock and the crossfilter team for the design of our filtering implementation.

We have been working closely with the scanpy team to integrate with their awesome analysis tools. Special thanks to Alex Wolf, Fabian Theis, and the rest of the team for their help during development and for providing an example dataset.

We are eager to explore integrations with other computational backends such as Seurat or Bioconductor.

cellxgene's People

Contributors

ambrosejcarr, ashin-czi, atarashansky, atolopko-czi, bento007, bkmartinjr, blrnw3, colinmegill, csweaver, dependabot[bot], ebezzi, fionagriffin, freeman-lab, ihnorton, jakeyheath, lesliecodes, maniarathi, mattcai, mckinsel, mdunitz, millenniumfalconmechanic, mweiden, neuromusic, prete, roaga, seve, sidneymbell, signechambers1, snyk-bot, tihuan

cellxgene's Issues

splat when doing differential expression compute on large data set

Using the pbmc33k data set, I computed differential expression where both cell sets contained all cells. This generated an error:

POST http://pbmc33k.cxg.czi.technology/api/v0.1/expression 400 (BAD REQUEST)

d3.js:127 Uncaught (in promise) TypeError: Cannot read property 'length' of undefined
    at Object.extent (d3.js:127)
    at t.maybeSetupScalesAndDrawAxes ((index):1)
    at t.componentWillReceiveProps ((index):1)
    at c.updateComponent ((index):1)
    at c.receiveComponent ((index):1)
    at Object.receiveComponent ((index):1)
    at c._updateRenderedComponent ((index):1)
    at c._performComponentUpdate ((index):1)
    at c.updateComponent ((index):1)
    at c.performUpdateIfNecessary ((index):1)
    at Object.performUpdateIfNecessary ((index):1)
    at a ((index):1)
    at r.perform ((index):1)
    at o.perform ((index):1)
    at o.perform ((index):1)
    at Object.w [as flushBatchedUpdates] ((index):1)
    at r.closeAll ((index):1)
    at r.perform ((index):1)
    at Object.batchedUpdates ((index):1)
    at Object.e [as enqueueUpdate] ((index):1)
    at r ((index):1)
    at Object.enqueueSetState ((index):1)
    at i.r.setState ((index):1)
    at i.onStateChange ((index):1)
    at Object.notify ((index):1)
    at e.notifyNestedSubs ((index):1)
    at i.onStateChange ((index):1)
    at p ((index):1)
    at (index):1
    at (index):1
    at (index):1
    at (index):1
    at dispatch ((index):1)
    at (index):1
    at <anonymous>

gene expression count returning unexpected cell name

On the T. muris endpoint, the single-gene expression count request returns cells that were not present in the original /cells response.

See command line example below. There are 2446 of these "new" cell names returned by a POST to /expression, of which P3.D042103.3_11_M.1 is but one example.

# curl -s -d '{"genelist":["Anxa5"]}'  -H "Content-Type: application/json" -X POST http://tabulamuris.cxg.czi.technology/api/v0.1/expression | grep P3.D042103.3_11_M.1
        "cellname": "P3.D042103.3_11_M.1",
# curl -s http://tabulamuris.cxg.czi.technology/api/v0.1/cells | grep P3.D042103.3_11_M.1
#

Note: added code to the front-end to defend against this: see src/actions/index.js:cleanupExpressionResponse()
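
A minimal sketch of this kind of front-end defence, assuming each row of the /expression response carries a "cellname" key as in the output above. The helper name dropUnknownCells and the row shape are hypothetical; this is not the project's cleanupExpressionResponse().

```javascript
// Hypothetical helper (not the project's cleanupExpressionResponse):
// split expression rows into those whose cell name appeared in the
// /cells response and those that did not.
function dropUnknownCells(expressionRows, knownCellNames) {
  const known = new Set(knownCellNames);
  const kept = [];
  const dropped = [];
  for (const row of expressionRows) {
    (known.has(row.cellname) ? kept : dropped).push(row);
  }
  return { kept, dropped };
}

// Usage with made-up rows; only the "cellname" key is taken from the report.
const result = dropUnknownCells(
  [{ cellname: "known-cell" }, { cellname: "P3.D042103.3_11_M.1" }],
  ["known-cell"]
);
console.log(result.kept.length, result.dropped.length); // 1 1
```

Returning the dropped rows as well makes it easy to log how many unexpected names the back-end sent.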

Regraph needs full filter

When regraphing a second time, cxg needs to send the original filter plus the new filter, so that the REST API receives the full subset to filter on. The REST API keeps no memory of previous filters.
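
One way the client could carry the full filter forward is sketched below. It assumes a filter is an object mapping a metadata field to the list of values still selected, and that /cells accepts repeated query parameters (as in EM2Cluster=0&EM2Cluster=2). The function names and the "tissue" field are hypothetical.

```javascript
// Combine the filter used for the previous regraph with the new one.
// A field present in both filters keeps only the values allowed by both.
function combineFilters(previous, next) {
  const combined = {};
  const fields = new Set([...Object.keys(previous), ...Object.keys(next)]);
  for (const field of fields) {
    const a = previous[field];
    const b = next[field];
    if (a && b) {
      const allowed = new Set(b);
      combined[field] = a.filter((value) => allowed.has(value));
    } else {
      combined[field] = [...(a || b)];
    }
  }
  return combined;
}

// Encode a filter as repeated query parameters.
// Caveat: a field whose list is empty emits no parameter at all, so the
// caller should treat an empty intersection as "no cells" before sending.
function toQueryString(filter) {
  const params = new URLSearchParams();
  for (const [field, values] of Object.entries(filter)) {
    for (const value of values) params.append(field, value);
  }
  return params.toString();
}

const first = { EM2Cluster: ["0", "2", "3"] };
const second = { EM2Cluster: ["2", "3", "4"], tissue: ["Lung"] };
console.log(toQueryString(combineFilters(first, second)));
// EM2Cluster=2&EM2Cluster=3&tissue=Lung
```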

front-end interactive performance improvement ideas - tracking issue

Tracking issue for interactive performance improvements in the front-end. Please add any ideas you may have to the issue. These are primarily "big" changes that would require some substantial refactoring.

Starter list:

  • we could ~ halve the memory used by dense metadata storage if it was organized so that the metadata field names were stored only once (right now they are stored for each cell). Sparse metadata would need to continue to be stored on a per-cell basis. Eg,
{
   cells: {
      metadata: {
           fieldname1: [ ... values of all cells as an array, eg, 'foo', 'bar', 'baz' ],
           fieldname2: [ ... same ... ],
           ...
      }
   }
}
  • REST API is verbose and not tightly encoded, and download time will be a problem as data set sizes grow. There appear to be some low hanging improvements we could make.
  • Most of the selection/deselection computation on categorical metadata boils down to set operations. Are there improvements if we start modelling the selection set as a collection, rather than with annotations on the individual cells?
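
The first idea in the list, storing each metadata field name once, could look roughly like the sketch below. It assumes dense metadata, meaning every cell object carries the same fields as the first one. The function name toColumnar and the sample values are hypothetical.

```javascript
// Convert per-cell metadata objects into one array per field, so that each
// field name is stored once instead of once per cell.
// Assumes dense metadata: all cells share the fields of the first cell.
function toColumnar(cells) {
  const fieldNames = Object.keys(cells[0] || {});
  const metadata = {};
  for (const name of fieldNames) {
    metadata[name] = new Array(cells.length);
  }
  cells.forEach((cell, index) => {
    for (const name of fieldNames) {
      metadata[name][index] = cell[name];
    }
  });
  return { cells: { metadata } };
}

const perCell = [
  { CellName: "c0", EM2Cluster: "0" },
  { CellName: "c1", EM2Cluster: "1" },
];
console.log(JSON.stringify(toColumnar(perCell)));
// {"cells":{"metadata":{"CellName":["c0","c1"],"EM2Cluster":["0","1"]}}}
```

With this layout a cell is identified by its array index, which also fits the set-based selection idea: a selection can be a collection of indices rather than a flag on each cell object.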

excessive GPU utilization

Current webgl code runs the GPU constantly if the application is an active tab. This will eat laptop batteries for lunch and generally thrashes my laptop.

We should look at ways to only render when necessary.
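
A common pattern for rendering only when necessary is to redraw on demand: state changes call an invalidate function, and at most one frame is scheduled until it has been drawn. The sketch below is an assumption about how that could be wired up, not the project's rendering code. The scheduler is passed in so the logic can be exercised outside a browser.

```javascript
// Returns an invalidate() function. Calling it any number of times before
// the next frame results in exactly one draw() call.
function createRenderScheduler(draw, requestFrame) {
  let framePending = false;
  return function invalidate() {
    if (framePending) return; // a frame is already scheduled; coalesce
    framePending = true;
    requestFrame(() => {
      framePending = false;
      draw();
    });
  };
}

// In a browser this might be used as (drawScene is hypothetical):
//   const invalidate = createRenderScheduler(drawScene,
//     (callback) => requestAnimationFrame(callback));
//   // call invalidate() after selection, color, or layout changes
```

When nothing changes, no frames are requested and the GPU stays idle.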

categorical group check boxes state

The group check-boxes for categorical metadata don't quite work right and can show conflicting states:

Pick any category group, eg., EM2Cluster:

  1. test 1 - deselect the group, and then select a few of the options --> the group-level checkbox is unselected
  2. test 2 - select the group, then deselect a few options --> the group-level checkbox is selected

This is confusing: the group-level checkbox can be in either state when only some options are selected. If it is only present to provide group-level actions, perhaps it should be replaced by separate [all] and [clear] actions?
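
An alternative to separate actions is to derive the group checkbox state from the options, with a third "some" state for the mixed case. The sketch below assumes the option selections are available as an object of booleans; the function name is hypothetical.

```javascript
// Derive the group-level state from the per-option selections.
// Returns "all", "none", or "some" (mixed).
function groupCheckboxState(optionSelected) {
  const values = Object.values(optionSelected);
  const selectedCount = values.filter(Boolean).length;
  if (selectedCount === 0) return "none";
  if (selectedCount === values.length) return "all";
  return "some";
}

console.log(groupCheckboxState({ 0: true, 1: false, 2: true })); // some
```

In HTML, the "some" state can be shown by setting the checkbox element's indeterminate property to true, so both tests above would display the same mixed state.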

graph.js: excessive state bound

src/components/graph/graph.js: the Graph component connects to more state than it uses. When the component stabilizes, it would be good to clean this up.

Examples: ranges

Voronoi cluster selection

An invisible Voronoi overlay painted on the SVG layer can catch mouse events. We can create an overlay using cluster centroids to let people click on clusters. This will help when there are 100 clusters and 20 of them are different shades of green.

https://bl.ocks.org/mbostock/8033015
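
A point lies inside the Voronoi cell of whichever centroid is nearest to it, so the same hit test can be sketched without building the diagram, by a brute-force nearest-centroid search. This is a swapped-in simplification, not the overlay approach from the linked example; the centroid shape { id, x, y } is an assumption.

```javascript
// Return the centroid closest to (x, y), or null if there are none.
// Equivalent to asking which Voronoi cell contains the point.
function nearestCluster(centroids, x, y) {
  let best = null;
  let bestDistance = Infinity;
  for (const centroid of centroids) {
    const dx = centroid.x - x;
    const dy = centroid.y - y;
    const distance = dx * dx + dy * dy; // squared distance is enough to compare
    if (distance < bestDistance) {
      bestDistance = distance;
      best = centroid;
    }
  }
  return best;
}

const centroids = [
  { id: "cluster-0", x: 0, y: 0 },
  { id: "cluster-1", x: 10, y: 0 },
];
console.log(nearestCluster(centroids, 7, 1).id); // cluster-1
```

The linear scan is cheap for around 100 clusters; a precomputed overlay would mainly help by letting the browser do the hit testing.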

Local Client: Tracking Issue

Migrate from a hosted backend to a local one. Users can pip install cellxgene and then point it at their directory locally to launch a webserver.

500 error from REST API

Load the T. Muris endpoint. Select a set of cells (a few thousand) and put into Selection 1. Select a large number of cells (25,000 or larger - eg, most of the cells) and put in Selection 2.

Click Compute differential expression

Results in a 500 HTTP response from the back-end, on the /diffexpression endpoint. Seems to be triggered by very large number of cells in the selection.

use _.get()

There are quite a few places in the code that look like:

  const vertices =
    state.cells.cells && state.cells.cells.data.graph
      ? state.cells.cells.data.graph
      : null;

This can be improved by using _.get(). The above code becomes:

  const vertices = _.get(state, 'cells.cells.data.graph', null);
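
For reference, lodash's signature is _.get(object, path, defaultValue). The dependency-free sketch below shows the lookup behaviour being relied on; getPath is a hypothetical helper that handles only dot-separated paths, whereas lodash also accepts array paths and bracket notation.

```javascript
// Walk a dot-separated path, returning defaultValue if any step is missing.
function getPath(object, path, defaultValue) {
  let current = object;
  for (const key of path.split(".")) {
    if (current === null || current === undefined) return defaultValue;
    current = current[key];
  }
  return current === undefined ? defaultValue : current;
}

const state = { cells: { cells: { data: { graph: [[0.1, 0.2]] } } } };
console.log(getPath(state, "cells.cells.data.graph", null)); // [ [ 0.1, 0.2 ] ]
console.log(getPath({ cells: {} }, "cells.cells.data.graph", null)); // null
```

Two behavioural differences from the ternary are worth keeping in mind: the ternary throws if state.cells.cells exists but has no data property, and it returns null for any falsy graph value, while a get-style lookup falls back to the default only when the resolved value is undefined.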

status indicator needed during costly operations

During high latency operations (eg, calculation of differential expression), the UI would benefit from a busy status indicator (eg, a 'computing....' footer or some such signal). Given that we render each canvas asynchronously, may need one per canvas (eg, a red dot in the corner when it is being re-rendered)
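
One possible shape for this is a small tracker that counts in-flight operations and reports when the count moves between zero and non-zero. The sketch below is an assumption, not existing project code; one tracker could be created per canvas, with onChange toggling that canvas's indicator.

```javascript
// Wrap async operations so that onChange(true) fires when the first one
// starts and onChange(false) fires when the last one settles.
function createBusyTracker(onChange) {
  let pending = 0;
  return async function track(operation) {
    pending += 1;
    if (pending === 1) onChange(true);
    try {
      return await operation();
    } finally {
      pending -= 1;
      if (pending === 0) onChange(false);
    }
  };
}

// Usage sketch (showSpinner and computeDiffExp are hypothetical):
//   const track = createBusyTracker(showSpinner);
//   const result = await track(() => computeDiffExp(set1, set2));
```

Because the counter is decremented in a finally block, the indicator is also cleared when an operation fails.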

REST API issues / potential areas of improvement - tracking issue

This is a tracking thread for ideas/debate about improvements to the cellxgene REST API. Please add your own thoughts and/or reactions.

Starter list of issues/ideas:

  • Semantics of /cells response object:

    1. API mixes together external (submitter) cell ID and the API-internal data structure cell id. These should be separate concepts, and the API internals should use a much more compact "ID".
    2. cellids are encoded into external metadata (with name 'CellName' - this pollutes the namespace)
  • /cells OTA performance:

    1. don't transmit cellids more than once per cell (currently /cells transmits each ID three times)
    2. API internal (response) data structure should use a much smaller encoding for a cellid (eg, a number, or make it implicit)
    3. Current metadata encoding works best for sparse metadata, but wastes a lot of bandwidth if metadata is not sparse (the metadata field names are re-transmitted for each cell). We could cut response size dramatically by using different data structures for sparse and dense metadata.
  • /initialize:

    1. better documentation of the schema model used for schema field in /initialize. Ie, what are the types, what are their characteristics, etc.
    2. schema sub-object should only contain data model info (include is a UI hint, and should not be in the data schema).
  • naming is inconsistent, eg, /cells uses CellName, and /expression uses cellname

  • it would be very helpful if all endpoints, when receiving an unqualified request, returned cells
    in the same order and with the same dimensionality, so that cellname / cellid doesn't have to be
    explicitly used to link the two (position in the array is sufficient)

  • For regraphing, there is a lot of overhead in using /cells, as it returns all metadata as well as the new graph. We should have /metadata, and /graph, and get rid of /cells. If we need the grouping information (options), that could be a separate endpoint as well. Much more flexible for the front-end.

Note on API chattiness (OTA bandwidth): this will only be an issue when the back-ends are much faster. Currently, most of the "download" time is actually waiting for the back-end to respond (time-to-first-byte). This is true for both the EM2 and ScanPy back-ends.
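
The ordering point in the list above can be illustrated by comparing the two ways a client links /cells and /expression results. The sketch assumes /cells rows carry CellName and /expression rows carry cellname, as noted in the naming item; both function names are hypothetical.

```javascript
// Join by name: required today, because response order is not guaranteed.
// Builds a lookup table keyed by cell name.
function joinByName(cells, expressionRows) {
  const byName = new Map(expressionRows.map((row) => [row.cellname, row]));
  return cells.map((cell) => ({
    cell,
    expression: byName.get(cell.CellName) || null,
  }));
}

// Join by position: possible only if every endpoint returns cells in the
// same order and with the same length. No names need to be transmitted.
function joinByPosition(cells, expressionRows) {
  if (cells.length !== expressionRows.length) {
    throw new Error("responses differ in length; cannot join by position");
  }
  return cells.map((cell, index) => ({ cell, expression: expressionRows[index] }));
}
```

The positional join needs no per-cell identifier at all, which is what would allow the cell IDs to become implicit in the response encoding.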

posting junk to /diffexpression causes crash in server

Rather than posting a list of CellNames, I accidentally posted the stuff below, and the server crashed (returned a 502). The example below has been edited to keep it short.

{"celllist1":[
{"CellName":"AAAGAGACGGCATT","EM2Cluster":"0","cluster_id":"1","tSNE_1":"-3.52069103297318","tSNE_2":"-3.46894020526272","__cellIndex__":15,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.4337592343704319,"__y__":0.43125663054081603},
{"CellName":"AAATCAACCCTATT","EM2Cluster":"1","cluster_id":"5","tSNE_1":"-2.69271065101789","tSNE_2":"1.95420307068623","__cellIndex__":26,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.444891742821794,"__y__":0.5189382985520843},
{"CellName":"AAATCCCTCCACAA","EM2Cluster":"0","cluster_id":"1","tSNE_1":"0.840431446555436","tSNE_2":"-0.0234178421436222","__cellIndex__":30,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.49239617060571467,"__y__":0.48696401878339307}, 
... ], 
"celllist2": [ ... ], 
num_genes: 7}
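
The check that would have rejected this payload is sketched below. It assumes the endpoint expects celllist1 and celllist2 to be arrays of cell name strings and num_genes to be a positive integer; the last constraint is a guess. It is written in JavaScript to match the front-end code, while the actual fix belongs in the server, which could run an equivalent check and answer with a 400 rather than crashing.

```javascript
// Return a list of problems with a /diffexpression request body.
// An empty list means the body has the assumed shape.
function validateDiffExpRequest(body) {
  if (body === null || typeof body !== "object") {
    return ["body must be an object"];
  }
  const errors = [];
  for (const key of ["celllist1", "celllist2"]) {
    const list = body[key];
    if (!Array.isArray(list)) {
      errors.push(`${key} must be an array`);
    } else if (!list.every((name) => typeof name === "string")) {
      errors.push(`${key} must contain only cell name strings`);
    }
  }
  if (!Number.isInteger(body.num_genes) || body.num_genes < 1) {
    errors.push("num_genes must be a positive integer"); // assumed constraint
  }
  return errors;
}

console.log(
  validateDiffExpRequest({
    celllist1: [{ CellName: "AAAGAGACGGCATT" }],
    celllist2: ["AAATCAACCCTATT"],
    num_genes: 7,
  })
); // [ 'celllist1 must contain only cell name strings' ]
```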

server 500 error when regraphing

To reproduce:

  1. load http://tabulamuris.cxg.czi.technology/
  2. Open the categorical metadata selector for EM2Cluster
  3. Deselect category option 1
  4. Click regraph button
  5. console shows 500 error from server
GET http://tabulamuris.cxg.czi.technology/api/v0.1/cells?EM2Cluster=0&EM2Cluster=2&EM2Cluster=3&EM2Cluster=4&EM2Cluster=5&EM2Cluster=6&EM2Cluster=7&EM2Cluster=8&EM2Cluster=9&EM2Cluster=10&EM2Cluster=11&EM2Cluster=12&EM2Cluster=13&EM2Cluster=14&EM2Cluster=15&EM2Cluster=16&EM2Cluster=17&EM2Cluster=18&EM2Cluster=19&EM2Cluster=20&EM2Cluster=21&EM2Cluster=22&EM2Cluster=23&EM2Cluster=24&EM2Cluster=25&EM2Cluster=26&EM2Cluster=27&EM2Cluster=28&EM2Cluster=29&EM2Cluster=30&EM2Cluster=31&EM2Cluster=32&EM2Cluster=33&EM2Cluster=34&EM2Cluster=35&EM2Cluster=36&EM2Cluster=37&EM2Cluster=38&EM2Cluster=39&EM2Cluster=40&EM2Cluster=41&EM2Cluster=42&EM2Cluster=43&EM2Cluster=44&EM2Cluster=45&EM2Cluster=46&EM2Cluster=47&EM2Cluster=48&EM2Cluster=49&EM2Cluster=50&EM2Cluster=51&EM2Cluster=52&EM2Cluster=53&EM2Cluster=54&EM2Cluster=55&EM2Cluster=56&EM2Cluster=57&EM2Cluster=58&EM2Cluster=59&EM2Cluster=60&EM2Cluster=61&EM2Cluster=62&EM2Cluster=63&EM2Cluster=64&EM2Cluster=65&EM2Cluster=66&EM2Cluster=67&EM2Cluster=68&EM2Cluster=69&EM2Cluster=70&EM2Cluster=71&EM2Cluster=72&EM2Cluster=73&EM2Cluster=74&EM2Cluster=75&EM2Cluster=76&EM2Cluster=77&EM2Cluster=78&EM2Cluster=79&EM2Cluster=80&EM2Cluster=81&EM2Cluster=82&EM2Cluster=83&EM2Cluster=84&EM2Cluster=85&EM2Cluster=86&EM2Cluster=87&EM2Cluster=88&EM2Cluster=89&EM2Cluster=90&EM2Cluster=91&EM2Cluster=92&EM2Cluster=93&EM2Cluster=94&EM2Cluster=95&EM2Cluster=96&EM2Cluster=97&EM2Cluster=98&EM2Cluster=99&EM2Cluster=100&EM2Cluster=101&EM2Cluster=NoCluster 
500 (INTERNAL SERVER ERROR) 

One Million Cells: Tracking Issue

App is "usable" on a modern laptop (eg, macbook pro) with 1M cells loaded, eg, speed and interactive performance are reasonable. On more typical data sets (eg, 250K cells), speed & interactive performance should be excellent (no jank, etc.)

Backend Concurrency: Tracking Issue

A single back-end, running on a modest AWS instance, can handle interactive serving for 10+ users. Larger instances can survive 100+ concurrent users.

Medium post on cellxgene

Or perhaps multiple Medium posts:

  • general announcement
  • technical design: architectural & technological approaches/patterns

Memory error on GET

$ curl http://tabulamuris.cxg.czi.technology/api/v0.1/expression

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/__init__.py", line 39, in decorator
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 595, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/swagger.py", line 219, in inner
    return f(self, *args, **kwargs)
  File "/app/application.py", line 1028, in get
    data = parse_exp_data(limit=40, unexpressed_genes=unexpressed_genes)
  File "/app/application.py", line 294, in parse_exp_data
    expression = get_expression(cells, genes)
  File "/app/application.py", line 351, in get_expression
    raise error
  File "/app/application.py", line 345, in get_expression
    expression = e.getDenseExpressionMatrix("AllGenes", cellset)
MemoryError: std::bad_alloc

diff expression scatter plot disappears

To reproduce:

  1. select two cell sets and compute differential expression
  2. click on Expression tab, and select an X and Y gene.
    -- you should now see diffexp scatterplot --
  3. Click Metadata tab
  4. Click Expression tab

At this point, the expression tab shows up, missing the scatterplot. It just doesn't get re-rendered.
