
pooled-cell-painting-image-processing's Introduction

pooled-cell-painting-image-processing

pooled-cell-painting-image-processing's People

Contributors

bethac07, erinweisbart, rsenft1

pooled-cell-painting-image-processing's Issues

Add checks so things don't launch in multiple copies

This is a bigger potential problem for workflow steps that are triggered by steps producing many files rather than one:

If the final several jobs of a step all finish at about the same time, several of them may try to trigger the next step. For example, when I just ran step 6 -> 7, step 7 was triggered 3 times; frankly, this could have been much worse.

We'd need some sort of state variable that tells it not to run more than once, but because of the latency of things like making queues, I'm not sure that's the best approach. We could do that plus limit concurrency, but that would be sloooooow for steps that have thousands of triggers. Maybe an SNS or SQS message? Need to ponder this.
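A minimal sketch of the "state variable" idea, assuming each plate launches each step exactly once: whichever finishing job claims a per-plate, per-step key first launches the next step, and later claimants back off. The in-memory `TriggerLock`, the key format, and the `launch` callback are all hypothetical. In AWS, the atomic claim could be a DynamoDB `PutItem` with a `ConditionExpression` of `attribute_not_exists(...)`, or, closer to the SQS idea above, a message to a FIFO queue whose `MessageDeduplicationId` is that same key (SQS drops duplicates sent within its 5-minute deduplication interval).

```python
import threading
from typing import Callable


class TriggerLock:
    """Hypothetical in-memory stand-in for an atomic "put if absent" store.

    In a deployment, a DynamoDB conditional write or a FIFO-queue
    deduplication ID could play this role; a set guarded by a mutex keeps
    the sketch self-contained.
    """

    def __init__(self) -> None:
        self._claimed: set = set()
        self._mutex = threading.Lock()

    def claim(self, key: str) -> bool:
        """Return True only for the first caller to claim `key`."""
        with self._mutex:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


def maybe_trigger_next_step(
    lock: TriggerLock,
    plate: str,
    next_step: int,
    launch: Callable[[str, int], None],
) -> bool:
    """Launch `next_step` for `plate` at most once, however many jobs call this."""
    key = f"{plate}:step{next_step}"  # hypothetical key format
    if lock.claim(key):
        launch(plate, next_step)
        return True
    return False  # another finishing job already triggered it
```

If the last three jobs of step 6 all call `maybe_trigger_next_step(lock, "Plate1", 7, launch)`, only the first call launches step 7; the other two return `False`.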

Write make_pipelines for each step

Auto-generate CellProfiler pipelines from the information entered in the metadata.

  • 1_CP_Illum_Correction_Calculation
  • 2_CP_Illum_Correction_Application
  • 3_Segmentation check
  • 5_BC_Illum_Correction_Calculation
  • 6_BC_Illum_Correction_Application
  • 7_BC_Preprocessing
  • 9_Analysis
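One way make_pipelines could work is plain template substitution: keep one pipeline template per step with placeholders, and fill them from the metadata. This is a sketch under assumptions: the `@@name` placeholder syntax, the metadata keys, and the template text are hypothetical and are not CellProfiler syntax. The delimiter is changed from `string.Template`'s default `$` because pipeline files can contain regular expressions (for example in metadata-extraction settings), and a bare `$` would collide with the default syntax.

```python
from string import Template
from typing import Dict


class PipelineTemplate(Template):
    # "@@" instead of "$", so regex anchors like "$" in a pipeline pass through.
    delimiter = "@@"


def make_pipeline(template_text: str, metadata: Dict[str, str]) -> str:
    """Fill @@placeholders in one step's pipeline template from the metadata."""
    # substitute() raises KeyError for any placeholder missing from the
    # metadata, so an incomplete metadata file fails loudly.
    return PipelineTemplate(template_text).substitute(metadata)


def make_pipelines(templates: Dict[str, str], metadata: Dict[str, str]) -> Dict[str, str]:
    """Render every step's template, keyed by step name (e.g. "7_BC_Preprocessing")."""
    return {step: make_pipeline(text, metadata) for step, text in templates.items()}
```

Failing on a missing key (rather than using `safe_substitute`, which leaves unknown placeholders in place) seems safer here, since a half-filled pipeline would otherwise only fail once it reaches the cluster.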

Refactor - everything writes to its own folder

In some cases, such as outputs that feed into the stitching script, we may want a structure like Plate/Well/Site or Plate-Well/Site, but everything should end up in a site-specific folder.
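A small helper that builds the per-site output location for either layout mentioned above. This is a sketch: the `Site_<n>` folder name and the `root` prefix are assumptions. `PurePosixPath` is used so the result always uses forward slashes, matching the "/" delimiter conventionally used in S3 keys, regardless of the OS the code runs on.

```python
from pathlib import PurePosixPath


def site_output_dir(root: str, plate: str, well: str, site: int,
                    layout: str = "Plate/Well/Site") -> PurePosixPath:
    """Return the site-specific folder (S3 key prefix) for one step's output."""
    site_dir = f"Site_{site}"  # hypothetical folder naming
    if layout == "Plate/Well/Site":
        return PurePosixPath(root, plate, well, site_dir)
    if layout == "Plate-Well/Site":
        # Flatter layout that may suit the stitching script better.
        return PurePosixPath(root, f"{plate}-{well}", site_dir)
    raise ValueError(f"unknown layout: {layout!r}")
```

Either way, every output ends in its own site folder, so a step can list or clean up one site without touching its neighbors.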

Refactor - move to Bray et al names

To make mining against other arrayed datasets easier, where possible we should find-and-replace DAPI with DNA, ConA with ER, etc., so that channels are named by compartment rather than by dye.

(This is easy to do if you have the whole repo open in an editor like VSCode, so let me know if it would be helpful for me to do it. I'm holding off for the moment because I'm not sure whether you have active work you'd like to push first, and I don't want to break mergeability.)
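The find-and-replace can also be scripted. A minimal sketch, assuming a case-sensitive substring replacement is what's wanted; the mapping contains only the two pairs named in the issue, and the "etc." entries would still need to be filled in. A single regex pass with an alternation is used instead of chained `str.replace` calls, so a replacement can never be re-matched by a later rule.

```python
import re
from typing import Dict

# Only the two mappings stated in the issue; extend with the remaining dyes.
DYE_TO_COMPARTMENT: Dict[str, str] = {"DAPI": "DNA", "ConA": "ER"}

_PATTERN = re.compile("|".join(re.escape(dye) for dye in DYE_TO_COMPARTMENT))


def to_compartment_names(text: str) -> str:
    """Replace dye names with compartment names in one pass over `text`."""
    return _PATTERN.sub(lambda match: DYE_TO_COMPARTMENT[match.group(0)], text)
```

Because this is a plain substring match, it also rewrites names like `OrigDAPI` or `MeanIntensity_DAPI`, which is usually what a repo-wide rename wants; lowercase variants such as `dapi` are left alone.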

Handle small-round well stitches

It so happens that in our historical use cases, round == "too big to stitch without quartering", but this is not universally true, especially for other plate formats (12-well, 24-well, etc.). We probably want to explicitly pass whether or not to quarter alongside round, because we only want to quarter if we absolutely have to.
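A sketch of passing the quartering decision explicitly while keeping old configurations working. The field names are hypothetical. Leaving `quarter` unset falls back to the historical rule (round implies quartering), while any explicit value, such as `quarter=False` for a small round well, overrides it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StitchSettings:
    """Hypothetical stitching settings; field names are illustrative."""

    round_well: bool
    # Explicit override. None means "not specified", which falls back to the
    # historical behavior where round implied quartering.
    quarter: Optional[bool] = None

    def should_quarter(self) -> bool:
        if self.quarter is not None:
            return self.quarter
        return self.round_well  # legacy rule: round == "too big to stitch without quartering"
```

An `Optional[bool]` keeps existing configs valid without edits, while new plate formats can opt out of quartering explicitly.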

Enhancement - refactor to use Step Functions instead of individual Lambda triggers?

https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html

Basically, rather than having an S3 trigger for each Lambda function, we would have the S3 uploads trigger Step Functions instead, and let Step Functions handle Lambda execution, e.g. a) a shutdown Lambda for the previous step and b) a startup Lambda for the next step.

Con: it's one more thing to set up and maintain.
Pro: we separate the logic and the workflow from the actual execution. That presumably makes it easier to adjust the logic if/when we need to, and Step Functions graphs the workflow for us automatically.
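To make the proposal concrete, here is a sketch of a two-state handoff (shut down the previous step, then start the next one) written as an Amazon States Language definition built from Python. The `arn:aws:states:::lambda:invoke` integration with `FunctionName` and `Payload.$` is the standard way to call Lambda from a Task state; the function and state names are hypothetical. S3 event notifications can't target a state machine directly, so one route would be an EventBridge rule on the S3 upload that starts an execution.

```python
import json
from typing import Optional


def lambda_task(function_name: str, next_state: Optional[str] = None) -> dict:
    """A Task state that invokes one Lambda, passing the execution input through."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def handoff_state_machine(prev_step: int, next_step: int) -> dict:
    """Shut down step `prev_step`, then start step `next_step`."""
    shutdown, startup = f"Shutdown{prev_step}", f"Start{next_step}"
    return {
        "Comment": f"Handoff from step {prev_step} to step {next_step}",
        "StartAt": shutdown,
        "States": {
            shutdown: lambda_task(f"shutdown-step-{prev_step}", next_state=startup),
            startup: lambda_task(f"start-step-{next_step}"),
        },
    }


# JSON that could be pasted into the Step Functions console or deployed via IaC.
definition = json.dumps(handoff_state_machine(6, 7), indent=2)
```

Keeping the definition as data is part of the "pro" above: changing the order of the handoff means editing the state machine, not the Lambda code.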

Keep or drop FIFO queue checking?

There's a component of the Lambda function that checks the FIFO queue to prevent the Lambda from launching duplicate infrastructure. Currently, I need to bypass this part of the code to run a Lambda function manually. Should this check be incorporated into the version of the Lambda functions we have, or should we just provide instructions/code for creating the FIFO queue so the check also works with manual triggering?
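A possible middle ground is to keep the check in the Lambda code but make it skippable per invocation, so manual runs neither need the code edited nor require a FIFO queue to exist. This is a sketch: `is_duplicate` stands in for the existing FIFO-queue check, and the `"manual"` event field is a hypothetical convention.

```python
from typing import Callable, Optional


def run_step(
    event: dict,
    launch: Callable[[dict], None],
    is_duplicate: Optional[Callable[[dict], bool]] = None,
) -> str:
    """Hypothetical core of a step-launching Lambda.

    `is_duplicate` wraps the duplicate-infrastructure check. Setting
    event["manual"] = True, or passing no check at all, skips it.
    """
    manual = bool(event.get("manual", False))
    if not manual and is_duplicate is not None and is_duplicate(event):
        return "skipped: duplicate trigger"
    launch(event)
    return "launched"
```

A manual test invocation would then just include `"manual": true` in its event payload, while S3-triggered runs keep the protection.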

Map handoffs and checkpoints for genome scale runs of image analysis workflow

(Moved here from https://github.com/broadinstitute/pooled-cell-painting-analysis/issues/83)

Right now, the workflow has several stages and/or proposed stages:

  1. CellPainting illumination correction calculation
  2. CellPainting illumination correction
  3. CellPainting segmentation pipeline (see #81)
  4. CellPainting stitching and splitting into tiles
  5. Barcode calling illumination correction calculation
  6. Barcode illumination correction application (see #82) and alignment.
  7. Barcode color compensation and barcode calling "sanity check".
  8. Barcode stitching and rescaling and splitting into tiles
  9. Final profiling + barcoding pipeline

This creates 2 pre-setup steps and at least 6 handoffs. Right now, each handoff is manual, with a manual quality check at each. For each one, we need to a) decide how we're going to handle files and b) decide whether and how we will determine success (A quantitative cutoff? If so, how and where do we check it? Manual visual inspection of something? Same questions.), or whether it can just proceed via something like an AWS Lambda trigger.

  • Pre-1-2

  • Pre-5-6

  • 1 to 2

  • 2 to 3

  • 2 to 4

  • 3 to 9 - MANUAL

  • 4 to 9 - MANUAL

  • 5 to 6

  • 6 to 7

  • 7 to 8

  • 8 to 9 - MANUAL
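The handoff list above can be encoded as data, so trigger code can look up what may launch automatically after a step finishes and what must wait for a human. A sketch, assuming every handoff not marked MANUAL above is a candidate for automation (the pre-setup steps are left out, since they are triggered by pipeline upload rather than by another step):

```python
from typing import List, Tuple

# (from_step, to_step, needs_manual_check), transcribed from the list above.
HANDOFFS: List[Tuple[int, int, bool]] = [
    (1, 2, False), (2, 3, False), (2, 4, False),
    (3, 9, True), (4, 9, True),
    (5, 6, False), (6, 7, False), (7, 8, False),
    (8, 9, True),
]


def next_steps(finished_step: int) -> Tuple[List[int], List[int]]:
    """Split the steps downstream of `finished_step` into (auto, awaiting manual check)."""
    auto: List[int] = []
    manual: List[int] = []
    for src, dst, needs_manual in HANDOFFS:
        if src == finished_step:
            (manual if needs_manual else auto).append(dst)
    return auto, manual


def upstream(step: int) -> List[int]:
    """Steps that must all be done (and checked) before `step` can run."""
    return sorted({src for src, dst, _ in HANDOFFS if dst == step})
```

For example, `upstream(9)` makes explicit that the final profiling + barcoding pipeline depends on three separately checked handoffs (3, 4 and 8).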

OPTIONAL BUT REALLY NICE

  • Post 7 (maybe to feed 8)
  • 4-to-8 alignment check
  • 9 - setup and run
  • 9 - cleanup

Misc notes

  • For pre-1-2 and pre-5-6, trigger by upload of the pipeline
  • No check is needed between 1 and 2 or between 5 and 6; OK to set auto-triggers
  • 2 to 3 - The illum correction application pipeline, which will be grouped by Plate and Well (EDITED 2020-05-18; originally grouped by Plate), should also calculate thresholds for the cell segmentation stain and export them as a CSV. Then write a Lambda function that either (best case scenario) edits the pipeline file with a new threshold max (50th? 75th? percentile) and runs a segmentation check pipeline, which runs on every 10th image and checks segmentation with and without a max, or (worst case scenario) emails you to run that pipeline. No auto-feed from 3 to 9; it should be checked manually. Ideally, 3 is triggered by either creation of the post-2 summary CSV OR upload of a new pipeline - CONFIRMED 2020-05-18
  • 2 to 4 - automatic, triggered by Lambda (specifically, by creation of the post-2 summary CSV).
  • 4 to 9 - should save a 10x-downsampled version of the stitched image. No auto-trigger of 9; it needs to be checked manually.
  • 5 to 6 - because of all the alignments, step 6 is SLOW. It needs to be batched as small as is practical, but while we are still using one file per well, batching to single files is less than ideal, because loading all the files for a well takes 15-25 minutes for the first site from that well and mere seconds for each site after that.
  • 6 to 7 - ideally, 7 should be triggered either by 6 completing OR by upload of a new pipeline - DONE June 11th
  • 7 to 8 - in an ideal world, auto-trigger if % perfect is greater than a threshold value (70%?); if that isn't doable, auto-trigger stitching, then a human manually checks any plates that may need to be adjusted.
  • 8 to 9 - should save a 10x-downsampled version of the stitched image. No auto-trigger of 9; it needs to be checked manually.
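The numeric parts of the 2 to 3 and 7 to 8 notes could look like the sketch below: summarize the per-image thresholds from the post-2 CSV into one threshold max at a chosen percentile, pick every 10th image for the segmentation check, and gate 7 to 8 on % perfect. Assumptions: the CSV column is called `Threshold` (hypothetical; the real export would define it), the percentile uses linear interpolation between sorted values, and the 70% cutoff is the tentative value from the note.

```python
import csv
import io
import math
from typing import Iterable, List, Sequence


def percentile(values: Iterable[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no threshold values to summarize")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def threshold_max_from_csv(csv_text: str, column: str = "Threshold", q: float = 75.0) -> float:
    """Collapse per-image segmentation-stain thresholds into one threshold max."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return percentile((float(row[column]) for row in rows), q)


def check_subset(image_names: Sequence[str], every: int = 10) -> List[str]:
    """Every `every`-th image, for the segmentation check pipeline."""
    return list(image_names[::every])


def should_auto_stitch(pct_perfect: float, cutoff: float = 70.0) -> bool:
    """7 to 8 gate: auto-trigger stitching only if % perfect exceeds the cutoff."""
    return pct_perfect > cutoff
```

Switching between the 50th and 75th percentile is then a single argument (`q=50.0` or `q=75.0`), which makes it easy to compare both on the segmentation check subset before settling on one.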
