broadinstitute / cytominer_scripts
Scripts for processing morphological profiling data using cytominer
License: MIT License
This will make it consistent with cytotools::aggregate
https://github.com/cytomining/cytotools/blob/master/R/aggregate.R#L17
Error in bind_rows_(x, .id) :
Column `Metadata_diff_day` can't be converted from integer to character
Calls: %>% ... withVisible -> <Anonymous> -> bind_rows -> bind_rows_ -> .Call
Execution halted
https://github.com/broadinstitute/cytominer_scripts/blob/master/sample.R#L80-L82
I think all metadata columns should be coerced to character. I will try to file a PR.
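A minimal sketch of the proposed fix, assuming tables are plain lists of dicts. The helper names bind_rows and coerce_metadata_to_str are hypothetical and only mimic the strictness of dplyr::bind_rows (it refuses to combine, e.g., an integer column from one plate with a character column from another). Casting every Metadata_* value to a string first makes the plates bindable; measurement columns are left untouched.

```python
def bind_rows(*tables):
    """Concatenate row lists, raising if a column holds values of different types."""
    column_types = {}
    combined = []
    for table in tables:
        for row in table:
            for column, value in row.items():
                if value is None:  # missing values are compatible with any type
                    continue
                expected = column_types.setdefault(column, type(value))
                if expected is not type(value):
                    raise TypeError(
                        f"Column `{column}` can't be converted from "
                        f"{expected.__name__} to {type(value).__name__}"
                    )
            combined.append(row)
    return combined


def coerce_metadata_to_str(rows, prefix="Metadata_"):
    """Return copies of rows with every non-missing Metadata_* value cast to str."""
    return [
        {
            column: str(value) if column.startswith(prefix) and value is not None else value
            for column, value in row.items()
        }
        for row in rows
    ]


# One plate's file was read with Metadata_diff_day as an integer, the other as text.
plate_a = [{"Metadata_Plate": "P1", "Metadata_diff_day": 3, "Cells_AreaShape_Area": 120.5}]
plate_b = [{"Metadata_Plate": "P2", "Metadata_diff_day": "3", "Cells_AreaShape_Area": 98.0}]

try:
    bind_rows(plate_a, plate_b)
except TypeError as err:
    print(err)  # Column `Metadata_diff_day` can't be converted from int to str

combined = bind_rows(coerce_metadata_to_str(plate_a), coerce_metadata_to_str(plate_b))
print(len(combined))  # 2
```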
While trying to make a backend with process.sh, I had issues because my first image had failed QC and my pipeline was set to skip the remaining steps when that happened. As a result, only a small subset of the usual data was present in that folder: only Image.csv (with fewer columns) and no object CSVs. The ingest step therefore added only those common columns for all images, which understandably led to errors when it went to aggregate the object tables at the end and could not find any.
I hacked around it by deleting all the folders before the first well that had actually been plated with cells, but it seems to me that
Feel free to disagree though, that's why I phrased it as a question.
The end of the error output is below. I doubt it's helpful, but just in case:
(builtins.OSError) /home/ubuntu/efs/{redacted}/workspace/software/cytominer_scripts/.4b21aa7e-6e45-11e7-8ea1-0e60212e428a:1: expected 218 columns but found 568 - extras ignored
[SQL: 'sqlite3 -nullvalue \'\' -separator , -cmd .import "/home/ubuntu/efs/{redacted}/workspace/software/cytominer_scripts/.4b21aa7e-6e45-11e7-8ea1-0e60212e428a" "Image" /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite']
[Fri Jul 21 16:41:41 UTC 2017] Looking up AU00027623.sqlite on permanent store
[Fri Jul 21 16:41:41 UTC 2017] /home/ubuntu/bucket/projects/{redacted}/workspace/backend/2017_07_12_Batch1/AU00027623/AU00027623.sqlite not found
[Fri Jul 21 16:41:41 UTC 2017] Creating /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite
real 127m40.736s
user 62m50.242s
sys 4m23.562s
[Fri Jul 21 18:49:21 UTC 2017] Indexing /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite
Error: near line 3: no such table: main.Cells
Error: near line 4: no such table: main.Cytoplasm
Error: near line 5: no such table: main.Nuclei
real 0m0.054s
user 0m0.009s
sys 0m0.023s
[Fri Jul 21 18:49:22 UTC 2017] Aggregating /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.sqlite
Error in rsqlite_send_query(conn@ptr, statement) : no such table: cells
Calls: %>% ... initialize -> initialize -> rsqlite_send_query -> .Call
Execution halted
real 0m0.993s
user 0m0.603s
sys 0m0.056s
[Fri Jul 21 18:49:23 UTC 2017] /home/ubuntu/ebs_tmp/2017_07_12_Batch1/AU00027623//AU00027623.csv not created / does not exist. Exiting.
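One possible workaround, sketched under assumptions: instead of taking the table schemas from whichever folder comes first, pick the first folder that contains every expected CSV. Image.csv is named in the report; the object CSV names Cells.csv, Cytoplasm.csv and Nuclei.csv are assumed from the table names in the log above, and pick_reference_folder is a hypothetical helper, not something process.sh or cytominer-database provides.

```python
import os
import tempfile

# Assumed file names (see lead-in); adjust to the pipeline's actual outputs.
EXPECTED_CSVS = ("Image.csv", "Cells.csv", "Cytoplasm.csv", "Nuclei.csv")


def pick_reference_folder(folders, expected=EXPECTED_CSVS):
    """Return the first folder (in sorted order) that contains every expected CSV.

    Defining the schemas from this folder avoids building them from an image
    that failed QC and produced only a partial Image.csv. Returns None if no
    folder is complete.
    """
    for folder in sorted(folders):
        if all(os.path.isfile(os.path.join(folder, name)) for name in expected):
            return folder
    return None


# Usage: a failed-QC folder holding only Image.csv, followed by a complete one.
with tempfile.TemporaryDirectory() as root:
    failed = os.path.join(root, "A01")
    complete = os.path.join(root, "A02")
    os.makedirs(failed)
    os.makedirs(complete)
    open(os.path.join(failed, "Image.csv"), "w").close()
    for name in EXPECTED_CSVS:
        open(os.path.join(complete, name), "w").close()
    print(pick_reference_folder([failed, complete]) == complete)  # True
```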
backend should store only the SQLite file (or whatever format is being used to store single-cell data).
Currently, most scripts assume a specific folder structure, which is great for keeping the options compact (in most cases only the batch name and plate_id need to be specified). But this makes them inflexible. Keep the current options, but also add the option to specify paths explicitly.
See the http://docopt.org docs to make sure we do it the right way.
These are the scripts that need to be updated:
select.R
sample.R
preselect.R
normalize.R
compare_plates.R
collapse.R
audit.R
annotate.R
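A sketch of the intended behavior: an explicit path wins, otherwise the path is derived from the batch and plate IDs. The scripts themselves use docopt; argparse is swapped in here only to keep the example in the Python standard library. The option names and the backend/&lt;batch_id&gt;/&lt;plate_id&gt;/&lt;plate_id&gt;.csv layout are assumptions, the latter modeled on the paths in the log above.

```python
import argparse
import os


def build_parser():
    # Hypothetical option names, for illustration only.
    parser = argparse.ArgumentParser(description="Resolve a plate's input file")
    parser.add_argument("-b", "--batch_id", help="batch name (default folder layout)")
    parser.add_argument("-p", "--plate_id", help="plate ID (default folder layout)")
    parser.add_argument("-i", "--input", help="explicit path; overrides the layout")
    return parser


def resolve_input(args, backend_root="../../backend"):
    """Prefer an explicit --input; otherwise fall back to the batch/plate layout."""
    if args.input:
        return args.input
    if not (args.batch_id and args.plate_id):
        raise ValueError("specify --input, or both --batch_id and --plate_id")
    # Assumed layout: <backend_root>/<batch_id>/<plate_id>/<plate_id>.csv
    return os.path.join(backend_root, args.batch_id, args.plate_id, f"{args.plate_id}.csv")


parser = build_parser()
print(resolve_input(parser.parse_args(["-b", "2017_07_12_Batch1", "-p", "AU00027623"])))
print(resolve_input(parser.parse_args(["-i", "/data/custom/plate.csv"])))
```

Keeping the layout as the fallback preserves the compact invocation for the common case while letting unusual folder structures opt out.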
preselect.R assumes that replicates can be found by looking for two plates that have an identical "Metadata_Plate_Map_Name", and then treats replicates as matching wells across these identical plates.
In some experiments, however, each plate may be unique, and replicates may sit in a different location on the same plate (or even on another plate). An optional flag that lets the user specify a different replicate definition would be helpful.
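A minimal sketch of such a flag, assuming profiles are dicts of metadata values: replicate groups are formed from a configurable tuple of metadata columns, defaulting to the current plate-map-plus-well behavior. Metadata_compound is a hypothetical column used only to show an alternative grouping.

```python
from collections import defaultdict

# Default mirrors the current behavior: same plate map + same well = replicates.
DEFAULT_REPLICATE_KEYS = ("Metadata_Plate_Map_Name", "Metadata_Well")


def group_replicates(profiles, keys=DEFAULT_REPLICATE_KEYS):
    """Group profiles (dicts) by the given metadata columns.

    Only groups with at least two members are returned, since a singleton has
    no replicate to compare against.
    """
    groups = defaultdict(list)
    for profile in profiles:
        groups[tuple(profile[key] for key in keys)].append(profile)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Every plate is unique, but the same (hypothetical) compound appears twice.
profiles = [
    {"Metadata_Plate_Map_Name": "map1", "Metadata_Well": "A01", "Metadata_compound": "cmpd_x"},
    {"Metadata_Plate_Map_Name": "map2", "Metadata_Well": "B07", "Metadata_compound": "cmpd_x"},
]
print(group_replicates(profiles))  # {}
print(len(group_replicates(profiles, keys=("Metadata_compound",))))  # 1
```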
Specify option to filter out images that fail QC
In #30 we introduce an additional aggregate option: --sc_type.
Currently, the definitions are based on a specific Cell Painting variable (Cells_Neighbors_NumberOfNeighbors_Adjacent), as defined more specifically in broadinstitute/cmQTL#9.
It would be great if these were not hardcoded! (The change is probably beyond the scope of #30, as, to my knowledge, no other projects require this flag.)
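A sketch of what "not hardcoded" could look like: both the feature and the classification rules become parameters. The category names "isolated" and "colony" and their cut-offs are placeholders invented for this illustration; they are NOT the definitions from broadinstitute/cmQTL#9.

```python
# Placeholder rules (hypothetical), kept separate from the code that applies them.
DEFAULT_FEATURE = "Cells_Neighbors_NumberOfNeighbors_Adjacent"
DEFAULT_RULES = {
    "isolated": lambda n: n == 0,
    "colony": lambda n: n > 0,
}


def assign_sc_type(cells, feature=DEFAULT_FEATURE, rules=None):
    """Label each cell with the first rule (in insertion order) its feature value satisfies."""
    rules = DEFAULT_RULES if rules is None else rules
    labels = []
    for cell in cells:
        value = cell[feature]
        labels.append(next((name for name, predicate in rules.items() if predicate(value)), None))
    return labels


cells = [
    {DEFAULT_FEATURE: 0, "Cells_AreaShape_Area": 850.0},
    {DEFAULT_FEATURE: 4, "Cells_AreaShape_Area": 1320.0},
]
print(assign_sc_type(cells))  # ['isolated', 'colony']
print(assign_sc_type(cells, feature="Cells_AreaShape_Area",
                     rules={"large": lambda a: a > 1000}))  # [None, 'large']
```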
The purpose of this issue is to create a list. Once we settle on a list, we will close the issue and create an issue per QC item. We also need to decide where to implement this – here or in http://github.com/CellProfiler/cytominer
@bethac07 reported this
ubuntu@ip-10-0-3-243:~/efs/2019_06_04_Cardiomyocytes_AnantChopra_Bayer/workspace/software/cytominer_scripts$ ./preselect.R \
> --batch_id ${BATCH_ID} \
> --input ../../parameters/${BATCH_ID}/sample/${BATCH_ID}_normalized_sample.feather \
> --operations correlation_threshold
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
This was reported for preselect, but it can occur in other functions that use feather.
The issue appears to be related to wesm/feather#372.
Updating the packages mentioned there should make the problem go away: wesm/feather#372 (comment)
Specifically, in line 103, operation <- replicate_correlation ignores subset != NULL: rather than pulling from sample, replicate_correlation pulls from df. (Lines of interest)
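The intended behavior, sketched in Python with hypothetical names (this is not preselect.R's code): when a subset is given, the operation must receive the subsetted rows, not the full table. The subset condition mirrors the one that appears in a log later on this page, Metadata_Well != 'dummy'.

```python
def run_operation(df, operation, subset=None):
    """Apply `operation` to the rows selected by `subset`, or to all rows if no subset is given."""
    sample = df if subset is None else [row for row in df if subset(row)]
    return operation(sample)  # pass `sample`, not `df`, so the subset is honored


def not_dummy(row):
    return row["Metadata_Well"] != "dummy"


rows = [{"Metadata_Well": "dummy"}, {"Metadata_Well": "A01"}, {"Metadata_Well": "A02"}]
print(run_operation(rows, len))             # 3
print(run_operation(rows, len, not_dummy))  # 2
```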
tidyverse/dplyr#2988
Replace row_number
It seems to ignore everything after the first underscore: if my backend is in ../../backend/batch/Experiment1_Day1_1/plate, annotate fails because it's looking in ../../backend/batch/Experiment1/plate. It seems to write out to the correct place, and the steps after that seem to work OK, IIRC.
I am performing replicate_correlation variable selection with preselect.R.
The error I receive is:
INFO [2019-05-17 15:44:25] Subsetting using Metadata_Well != 'dummy'
INFO [2019-05-17 15:44:25] Performing replicate_correlation...
Joining, by = c("Metadata_Plate_Map_Name", "Metadata_Well")
Error in grouped_df_impl(data, unname(vars), drop) :
Column `variable` is unknown
Calls: %>% ... group_by.data.frame -> grouped_df -> grouped_df_impl -> .Call
Execution halted
I believe the error is generated in this call to cytominer::replicate_correlation.
Within cytominer::replicate_correlation, perhaps the error is happening here. Either way, this is something that I need to look into and fix.
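For reference while debugging, here is a simplified two-plate sketch of what a per-feature replicate correlation computes: wells are inner-joined on the same keys the log reports (Metadata_Plate_Map_Name, Metadata_Well), then a Pearson correlation is taken for each feature. This is not cytominer's implementation (which handles more than two replicates and aggregates the results); the feature names f1 and f2 are hypothetical.

```python
import math

# Join keys taken from the log above: Joining, by = c("Metadata_Plate_Map_Name", "Metadata_Well")
JOIN_KEYS = ("Metadata_Plate_Map_Name", "Metadata_Well")


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(plate_a, plate_b, features, keys=JOIN_KEYS):
    """Per-feature correlation between two replicate plates, matched on `keys`."""
    index_b = {tuple(row[k] for k in keys): row for row in plate_b}
    pairs = []
    for row in plate_a:
        key = tuple(row[k] for k in keys)
        if key in index_b:  # inner join: keep wells present on both plates
            pairs.append((row, index_b[key]))
    return {
        feature: pearson([a[feature] for a, _ in pairs], [b[feature] for _, b in pairs])
        for feature in features
    }


def well(name, f1, f2):
    return {"Metadata_Plate_Map_Name": "map1", "Metadata_Well": name, "f1": f1, "f2": f2}


plate_a = [well("A01", 1, 1), well("A02", 2, 2), well("A03", 3, 3)]
plate_b = [well("A01", 2, 6), well("A02", 4, 4), well("A03", 6, 2)]
print(replicate_correlation(plate_a, plate_b, ["f1", "f2"]))  # {'f1': 1.0, 'f2': -1.0}
```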
Remove the Image_ prefix to be compatible with cytomining/cytominer-database#86
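A minimal sketch of the rename, assuming it applies to any column name starting with Image_ (which columns cytominer-database#86 actually affects is not stated here). The helper name is hypothetical; it refuses renames that would collide with an existing column rather than silently overwriting it.

```python
def strip_image_prefix(columns, prefix="Image_"):
    """Drop a leading `prefix` from column names, refusing to create duplicate names."""
    renamed = [c[len(prefix):] if c.startswith(prefix) else c for c in columns]
    if len(set(renamed)) != len(renamed):
        raise ValueError("stripping the prefix would create duplicate column names")
    return renamed


print(strip_image_prefix(["Image_Metadata_Plate", "Image_Metadata_Well", "Cells_AreaShape_Area"]))
# ['Metadata_Plate', 'Metadata_Well', 'Cells_AreaShape_Area']
```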
The null threshold quantile should be treated as a hyperparameter.
It should be a quick fix to add a command-line argument with 0.95 as the default.
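A sketch of the proposed argument, assuming the threshold is the chosen quantile of a null distribution of correlation values. The flag name --null_threshold_quantile and the null values are hypothetical, and the quantile uses linear interpolation between order statistics (the same convention as R's default quantile type 7).

```python
import argparse
import math


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; 0.95 is the default suggested above.
    parser.add_argument("--null_threshold_quantile", type=float, default=0.95)
    return parser


def empirical_quantile(values, q):
    """Quantile with linear interpolation between order statistics (0 <= q <= 1)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


args = build_parser().parse_args([])
null_correlations = [i / 10 for i in range(11)]  # made-up null values 0.0, 0.1, ..., 1.0
threshold = empirical_quantile(null_correlations, args.null_threshold_quantile)
print(round(threshold, 3))  # 0.95
```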