broadinstitute / cytominer_scripts

Scripts for processing morphological profiling data using cytominer

License: MIT License

Shell 3.51% R 96.49%

carpenter-lab morphology image-profiling

cytominer_scripts's Issues

Change default aggregation to be mean instead of median

This will make it consistent with cytotools::aggregate
https://github.com/cytomining/cytotools/blob/master/R/aggregate.R#L17
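To illustrate what changing the default means in practice, here is a minimal Python sketch (not the R code in this repository). The `aggregate` helper, its `operation` parameter, and the feature name are hypothetical; the point is only that the summary function becomes a parameter whose default is the mean.

```python
from statistics import mean, median

# Hypothetical mapping from an operation name to a summary function.
OPERATIONS = {"mean": mean, "median": median}


def aggregate(rows, strata, variables, operation="mean"):
    """Summarize `variables` within each group defined by `strata`."""
    summarize = OPERATIONS[operation]
    groups = {}
    for row in rows:
        key = tuple(row[s] for s in strata)
        groups.setdefault(key, []).append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(strata, key))
        for v in variables:
            out[v] = summarize([m[v] for m in members])
        result.append(out)
    return result


# Three cells from one well; the third is an outlier.
cells = [
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 10.0},
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 12.0},
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 50.0},
]
by_mean = aggregate(cells, ["Metadata_Well"], ["Cells_AreaShape_Area"])
by_median = aggregate(
    cells, ["Metadata_Well"], ["Cells_AreaShape_Area"], operation="median"
)
```

With an outlier in the well, the two defaults give noticeably different profiles, so the switch is worth announcing to users who rely on the old behaviour.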

Specify compartments to aggregate via commandline parameter

`sample.R` fails with integer metadata

The error I receive:

Error in bind_rows_(x, .id) :
  Column `Metadata_diff_day` can't be converted from integer to character
Calls: %>% ... withVisible -> <Anonymous> -> bind_rows -> bind_rows_ -> .Call
Execution halted

The code causing the problems:

https://github.com/broadinstitute/cytominer_scripts/blob/master/sample.R#L80-L82

How to fix

I think all metadata columns should be coerced to character before the rows are bound. I will try to file a PR.
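The idea can be sketched as follows, in Python rather than the R used by `sample.R`. It assumes metadata columns are identified by a `Metadata_` prefix; the helper name and the sample values are hypothetical.

```python
def coerce_metadata_to_str(rows, prefix="Metadata_"):
    """Return copies of the rows with every metadata column converted to str."""
    return [
        {k: (str(v) if k.startswith(prefix) else v) for k, v in row.items()}
        for row in rows
    ]


# One source stores the metadata as an integer, another as text.
plate_a = [{"Metadata_diff_day": 3, "Cells_AreaShape_Area": 10.5}]
plate_b = [{"Metadata_diff_day": "day3", "Cells_AreaShape_Area": 11.0}]

# After coercion both sources agree on the column type and can be combined.
combined = coerce_metadata_to_str(plate_a) + coerce_metadata_to_str(plate_b)
```

Only the metadata columns are touched; feature columns keep their numeric type.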

Should backend creation use first image set as a "template"?

Trying to make a backend with process.sh, I ran into problems because my first image had failed QC and my pipeline was set to skip the remaining steps when that happens. As a result, that folder contained only a small subset of the usual data: just the Image.csv (with fewer columns) and no object CSVs. For all images, the backend therefore added only the common columns, which led to understandable errors when it went to aggregate the object tables at the end and could not find any.

I hacked around it by deleting all the folders before the first well that had actually been plated with cells, but it seems to me that:

- when creating the backend we want to pull in all the data that's there; we can deal with NaNs from missing data later
- there's a reasonable chance of this happening again, since well A01 (and other edge/corner wells) is often left unplated in smaller experiments, yet in a large fraction of those cases the whole plate is imaged anyway

Feel free to disagree though, that's why I phrased it as a question.
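One way to avoid depending on the first image set is to build the schema from the union of columns across all files and fill the gaps with missing values. The Python sketch below shows only that idea; it is not how the backend is actually created, and the helper names and the `Count_Cells` column are hypothetical.

```python
import csv
import io


def union_of_columns(csv_texts):
    """Collect every column name seen in any file, in first-seen order."""
    columns = []
    for text in csv_texts:
        header = next(csv.reader(io.StringIO(text)))
        for name in header:
            if name not in columns:
                columns.append(name)
    return columns


def read_with_schema(csv_text, columns):
    """Read rows, filling columns that are absent from this file with None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{c: row.get(c) for c in columns} for row in reader]


# The first image set failed QC and has fewer columns than the second.
failed_qc = "ImageNumber,Metadata_Well\n1,A01\n"
passed_qc = "ImageNumber,Metadata_Well,Count_Cells\n1,B02,57\n"

schema = union_of_columns([failed_qc, passed_qc])
rows = read_with_schema(failed_qc, schema) + read_with_schema(passed_qc, schema)
```

The cost is an extra pass over the headers before ingestion, which is small compared with the two-hour backend creation shown in the log.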

The end of the error string is below. I doubt it's helpful, but just in case:

(builtins.OSError) /home/ubuntu/efs/{redacted}/workspace/software/cytominer_scripts/.4b21aa7e-6e45-11e7-8ea1-0e60212e428a:1: expected 218 columns but found 568 - extras ignored
 [SQL: 'sqlite3 -nullvalue \'\' -separator , -cmd .import "/home/ubuntu/efs/{redacted}/workspace/software/cytominer_scripts/.4b21aa7e-6e45-11e7-8ea1-0e60212e428a" "Image" /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite']
[Fri Jul 21 16:41:41 UTC 2017] Looking up AU00027623.sqlite on permanent store
[Fri Jul 21 16:41:41 UTC 2017] /home/ubuntu/bucket/projects/{redacted}/workspace/backend/2017_07_12_Batch1/AU00027623/AU00027623.sqlite not found
[Fri Jul 21 16:41:41 UTC 2017] Creating /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite

real    127m40.736s
user    62m50.242s
sys     4m23.562s
[Fri Jul 21 18:49:21 UTC 2017] Indexing /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite
Error: near line 3: no such table: main.Cells
Error: near line 4: no such table: main.Cytoplasm
Error: near line 5: no such table: main.Nuclei

real    0m0.054s
user    0m0.009s
sys     0m0.023s
[Fri Jul 21 18:49:22 UTC 2017] Aggregating /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite
Error in rsqlite_send_query(conn@ptr, statement) : no such table: cells
Calls: %>% ... initialize -> initialize -> rsqlite_send_query -> .Call
Execution halted

real    0m0.993s
user    0m0.603s
sys     0m0.056s
[Fri Jul 21 18:49:23 UTC 2017] /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.csv not created / does not exist. Exiting.

Change cytominer::select to cytominer::variable_select

Store profiles in a directory different from backend

The backend directory should store only the SQLite file (or whatever format is being used to store single-cell data).

Create option to explicitly specify all paths

Currently, most scripts assume a specific folder structure. This keeps the options compact (in most cases only batchname and plate_id need to be specified), but it also makes the scripts inflexible. Keep the current options, but also add options to specify paths explicitly.

See the http://docopt.org docs to make sure we do it the right way.

These are the scripts that need to be updated:

- select.R
- sample.R
- preselect.R
- normalize.R
- compare_plates.R
- collapse.R
- audit.R
- annotate.R
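The pattern of keeping the compact options while allowing an explicit override can be sketched like this. The sketch uses Python's `argparse` as a stand-in, not docopt and not the scripts' real interface; the derived path layout and the `.csv` file name are assumptions for illustration.

```python
import argparse


def parse_args(argv):
    """Derive the input path from batch/plate unless --input is given."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("--batch_id")
    parser.add_argument("--plate_id")
    parser.add_argument("--input", help="explicit path; overrides the derived one")
    args = parser.parse_args(argv)
    if args.input is None:
        if args.batch_id is None or args.plate_id is None:
            parser.error("give --input, or both --batch_id and --plate_id")
        # Assumed layout: ../../backend/<batch>/<plate>/<plate>.csv
        args.input = (
            f"../../backend/{args.batch_id}/{args.plate_id}/{args.plate_id}.csv"
        )
    return args


derived = parse_args(["--batch_id", "batch1", "--plate_id", "plate1"])
explicit = parse_args(["--input", "/data/custom/profiles.csv"])
```

Existing invocations keep working unchanged, and the explicit option is consulted first whenever it is supplied.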

Implement normalize.quantiles as preprocessing

`preselect.R` Don't assume multiple identical plates

preselect.R assumes that replicates can be found by looking for two plates with an identical "Metadata_Plate_Map_Name" and then matching wells across those identical plates.

In some experiments, however, each plate may be unique, and replicates may be found at a different location on the same (or even another) plate. An optional flag that lets the user specify a different way of identifying replicates would be helpful.
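A minimal Python sketch of the requested behaviour, assuming replicates are defined by a user-supplied list of metadata columns. The `group_replicates` helper and the `Metadata_treatment` column are hypothetical; the default grouping mirrors the plate-map-plus-well assumption described above.

```python
from collections import defaultdict


def group_replicates(rows, replicate_by):
    """Group profiles that share the same values in the `replicate_by` columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in replicate_by)].append(row)
    return dict(groups)


# Every plate map is unique, and the same treatment sits in different wells.
profiles = [
    {"Metadata_Plate_Map_Name": "M1", "Metadata_Well": "A01",
     "Metadata_treatment": "drugX"},
    {"Metadata_Plate_Map_Name": "M1", "Metadata_Well": "B07",
     "Metadata_treatment": "drugX"},
    {"Metadata_Plate_Map_Name": "M2", "Metadata_Well": "C03",
     "Metadata_treatment": "drugX"},
]

# Current assumption: no two rows match, so no replicates are found.
default = group_replicates(profiles, ["Metadata_Plate_Map_Name", "Metadata_Well"])
# User-specified grouping: all three rows are replicates of one treatment.
custom = group_replicates(profiles, ["Metadata_treatment"])
```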

`aggregate` should handle QC columns

Specify option to filter out images that fail QC

Specify isolated vs. colony definition

In #30 we introduce an additional aggregate option: --sc_type.

Currently, the definitions are based on a specific cell painting variable (Cells_Neighbors_NumberOfNeighbors_Adjacent). They are defined more specifically in broadinstitute/cmQTL#9.

It would be great if these were not hardcoded! (The change is probably beyond the scope of #30, since to my knowledge no other projects require this flag.)

Implement QC functions for profiling

The purpose of this issue is to create a list. Once we settle on a list, we will close the issue and create an issue per QC item. We also need to decide where to implement this – here or in http://github.com/CellProfiler/cytominer

- Plot illumination correction functions
- Plot salient features on a plate map to see if there are any trends
  - Cell count
  - IntegratedIntensity
  - PercentMaximal
- Show excluded wells on a plate map
- Check for rotation of the plate layout
  - Plate map with cell counts
  - Cluster all the wells across plates

Resolve `evaluation nested too deeply` issues

@bethac07 reported this

ubuntu@ip-10-0-3-243:~/efs/2019_06_04_Cardiomyocytes_AnantChopra_Bayer/workspace/software/cytominer_scripts$ ./preselect.R \
>   --batch_id ${BATCH_ID} \
>   --input ../../parameters/${BATCH_ID}/sample/${BATCH_ID}_normalized_sample.feather \
>   --operations correlation_threshold
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted

This was reported for preselect, but it can occur in other functions that use feather.

The issue appears to be related to wesm/feather#372.

Update the packages mentioned here and the problem should go away: wesm/feather#372 (comment)

`preselect.R` for replicate_correlation does not use subset rows

Specifically in line 103.

When the operation is replicate_correlation, subset is ignored even when subset != NULL.

Rather than pulling from sample, replicate_correlation pulls from df. (Lines of interest)

Replace `row_number` with `dense_rank`

tidyverse/dplyr#2988
Replace row_number
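For readers unfamiliar with the two ranking functions, the difference shows up only when values tie. The Python sketch below re-implements the two behaviours for illustration (it is not dplyr code): `row_number` gives tied values distinct ranks in order of appearance, while `dense_rank` gives them the same rank and leaves no gaps.

```python
def row_number(values):
    """Rank 1..n; tied values get distinct ranks in order of appearance."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def dense_rank(values):
    """Tied values share a rank, and consecutive ranks have no gaps."""
    position = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [position[v] for v in values]


scores = [0.2, 0.5, 0.2, 0.9]
unique_ranks = row_number(scores)  # ties broken by position
shared_ranks = dense_rank(scores)  # ties share a rank
```

Any code that filters on a rank threshold will keep a different number of rows depending on which function is used, so the replacement should be checked wherever ties are possible.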

`annotate.R` fails when plate name has underscores in it

It seems to ignore everything after the first underscore, so if my backend is in ../../backend/batch/Experiment1_Day1_1/plate, annotate fails because it's looking in ../../backend/batch/Experiment1/plate. It seems to write out to the correct place, and the steps after that seem to work ok IIRC.

Confusing Error in `preselect.R`

I am performing a replicate_correlation variable selection with preselect.R.

The error I receive is:

INFO [2019-05-17 15:44:25] Subsetting using Metadata_Well != 'dummy'
INFO [2019-05-17 15:44:25] Performing replicate_correlation...
Joining, by = c("Metadata_Plate_Map_Name", "Metadata_Well")
Error in grouped_df_impl(data, unname(vars), drop) :
  Column `variable` is unknown
Calls: %>% ... group_by.data.frame -> grouped_df -> grouped_df_impl -> .Call
Execution halted

I believe the error is generated in this call to cytominer::replicate_correlation

In cytominer::replicate_correlation, perhaps the error is happening here. Either way, this is something that I need to look into and fix.