theislab / cellrank_reproducibility_preprint

Code to reproduce results from the CellRank preprint

Home Page: https://cellrank.org

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 99.95% Python 0.05% Shell 0.01% R 0.01%
fate machine-learning mapping reproducibility reproducible-research reproducible-science scrna-seq

cellrank_reproducibility_preprint's People

Contributors

marius1311, michalk8

cellrank_reproducibility_preprint's Issues

CoC

My remarks:

  • For the paths, instead of a get_paths function that returns a dictionary, let's create one file where the paths are defined as constants. I think it will be more readable, and we would only import the paths we need, e.g.
from path import CACHE_DIR, FIG_DIR
  • If we go for the approach above, let's also define some naming convention for these constants (e.g. directories ending with _DIR, data-related stuff prefixed with DATA_, caching with CACHE_, etc.)
  • environment.yml or requirements.txt: I'd provide a conda environment.yaml named cellrank-reproduciblity with all the correct package versions (if something requires a different version, a separate yaml file would live in that directory)
  • I'd also create a small skeleton .ipynb as a basis for all notebooks. It should contain e.g. the imports of the default packages (scanpy, cellrank, etc.), a cell printing the versions, and the sections you mention (sections that are not needed, like Plot results in preprocessing notebooks, will be removed when the notebook is filled in)
  • initials in the notebooks: it's sometimes hard for me to distinguish ml and mk; maybe we could use different aliases, capitalize them, or move them to the front, before the date
  • same for the dates: YYYYMMDD is not a friendly format to read (at least for me), I'd include dashes, as in YYYY-MM-DD.
  • relative paths: do you mean relative to this repo's root or relative to the position of the file/notebook? I assume you mean the latter
  • I'd also open one issue per figure (or per figure dependency) and do regular PRs
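
The constants-file idea from the first two remarks could look roughly like the sketch below. Everything here is an assumption for illustration: the directory names, the per-dataset constants, and the choice of the current working directory as the root are hypothetical, and only the names CACHE_DIR and FIG_DIR appear in the remarks above.

```python
# Sketch of a single module holding all paths as constants.
# Hypothetical layout: the real directory names are not specified in this issue.
from pathlib import Path

# Assumption: paths are resolved against the current working directory.
ROOT_DIR = Path.cwd()

# Naming convention from the remarks: directories end with _DIR,
# data-related constants start with DATA_, caching ones with CACHE_.
DATA_DIR = ROOT_DIR / "data"
CACHE_DIR = ROOT_DIR / "cache"
FIG_DIR = ROOT_DIR / "figures"

# Hypothetical per-dataset entries following the same convention.
DATA_PANCREAS_DIR = DATA_DIR / "pancreas"
CACHE_PANCREAS_DIR = CACHE_DIR / "pancreas"
```

A notebook would then import only what it needs, as in the import line quoted in the first remark.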

Clean up and add the pancreas notebooks

This concerns main figures 2 and 3 as well as a number of supplemental figures. Add the Palantir pseudotime to the dataset on figshare, and add the MAGIC-imputed data to figshare as an extra array.

Test if the pipeline works

TODOs:

  • add .gitkeep files to the directories!
  • test downloading the Morris data
  • merge the pickles/csvs into 1
  • test loading/preprocessing of the Morris data
  • test runtime benchmark
  • test memory benchmark
  • test robustness benchmark

Repo size

For some reason, it's huge... Did any of us commit any data inside?
Inspecting this, it's git objects (245M ./.git/objects).
And 99M ./notebooks

I suggest we prune this once everything is done - I can do a test run on my private fork to see if we can prune the git objects.
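
To re-check these numbers before and after pruning, a small helper along the following lines might be enough. This is only a sketch: the function name dir_size_mb is made up, and it reports decimal megabytes, so the values will not match the figures quoted above exactly if those were reported in other units.

```python
# Sketch: sum up file sizes below a directory to see what dominates the repo size.
from pathlib import Path


def dir_size_mb(path):
    """Return the total size of all regular files under `path` in MB (10**6 bytes)."""
    total = sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())
    return total / 1e6


if __name__ == "__main__":
    # The two directories mentioned in this issue; skipped if they do not exist.
    for sub in (".git/objects", "notebooks"):
        if Path(sub).exists():
            print(f"{sub}: {dir_size_mb(sub):.1f} MB")
```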

Caching

I haven't yet added this to the README. I'm still going to need scachepy in my notebooks because I don't want to re-compute velocities and my stochastic kernel each time I have to re-generate a figure. I suggest we have a caching directory that mirrors the structure of the data directory. We won't share the actual cached files because they are too large but I will place .gitkeep files so that we have the same folder structure. What are your thoughts on this?
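
One way to set up such a mirrored cache tree is sketched below. The function name and the directory arguments are hypothetical; the sketch only recreates the folder structure and drops an empty .gitkeep file into each folder, without copying any data or cached files.

```python
# Sketch: mirror the folder structure of a data directory into a cache directory
# and place an empty .gitkeep in every mirrored folder so git tracks the structure.
from pathlib import Path


def mirror_with_gitkeep(data_dir, cache_dir):
    """Recreate the subdirectories of `data_dir` under `cache_dir`.

    Returns the list of .gitkeep paths that were created or touched.
    """
    data_dir, cache_dir = Path(data_dir), Path(cache_dir)
    keep_files = []
    # Include the root itself, then every nested directory.
    sources = [data_dir, *sorted(p for p in data_dir.rglob("*") if p.is_dir())]
    for src in sources:
        dst = cache_dir / src.relative_to(data_dir)
        dst.mkdir(parents=True, exist_ok=True)
        keep = dst / ".gitkeep"
        keep.touch()
        keep_files.append(keep)
    return keep_files
```

The cached files themselves would still need to be excluded from version control separately, for example through ignore rules.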

Print all relevant versions

@michalk8 let's keep in mind to print the versions of important packages like FateID, STEMNET or Palantir in the benchmark notebooks, as these are not included in cr.logging.print_versions()
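
For packages installed in the Python environment, the extra versions could be collected with the standard library, as in this sketch. The helper name collect_versions and the package list are assumptions for illustration; as far as I know, FateID and STEMNET are distributed as R packages, so their versions would have to be reported from R rather than through this mechanism.

```python
# Sketch: report versions of Python distributions that a notebook depends on.
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # Hypothetical list; adjust to whatever each benchmark notebook imports.
    for name, ver in collect_versions(["palantir", "scanpy", "cellrank"]).items():
        print(f"{name}=={ver}")
```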

Concept figure

Move the notebook to compute the concept figure, clean it up and prep.

Scattered to do's

I'm collecting to do's from various notebooks here:

  • in the main uncertainty notebook, insert links to the stochastic MC notebook and also to the robustness notebook
  • make sure supplemental gene trends are saved to the same directory.
  • remove the code of conduct again.
