I'm excited to use this tool, but it's been a struggle to get it to work for me. I think the issue is my input files because I can run the example dataset.
I am working with a Seurat object. I've exported the markers, UMAP dimensions and cluster calls as tab-separated text. I call Comet from the command line and it looks like it's running; it certainly is according to top. It runs for about 3 hours and an output folder is created, but the folder is empty. It doesn't throw any specific errors, but it does show the following at run time:
Creating discrete expression matrix...
Insufficient floating point precision for calculating or reporting the exact XL-mHG test statistic; the true value is too small. Using "0" instead.(The XL-mHG p-value will also be reported as "0".)
Insufficient floating point precision for calculating or reporting the exact XL-mHG test statistic; the true value is too small. Using "0" instead.(The XL-mHG p-value will also be reported as "0".)
I am not sure what the problem is. Is there a stderr or log file to see what's going on?
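In the meantime, here is a minimal sketch of how the run could be wrapped so that everything printed to stdout and stderr lands in one file, together with the exit status. `run_logged` and `comet_run.log` are hypothetical names of mine, not part of Comet, and I am only assuming that Comet prints its diagnostics to one of those two streams.

```shell
#!/bin/bash
# Hypothetical helper (not part of Comet): run any command, send its stdout
# and stderr to one log file, and append the exit status to that file.
run_logged() {
  log=$1
  shift
  "$@" > "$log" 2>&1
  rc=$?
  echo "exit status: $rc" >> "$log"
  return "$rc"
}

# Example with the same arguments as my run (commented out so the snippet
# can be sourced without Comet installed):
# run_logged comet_run.log Comet markers2.txt vis.txt cluster.txt -C 16 -K 4 -Count true output/
```

A non-zero exit status at the end of the log would at least show whether the run ended abnormally, even when nothing else is printed.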
Also, relatedly, the docs would benefit greatly from a tutorial showing how to get the input files out from a Seurat object, since that is such a common procedure.
Here is my code to get the input files out from Seurat and to the command line.
library(Seurat)  # GetAssayData(), Embeddings(), Idents()
library(here)    # here()
# expression matrix
matrix_cometsc <- GetAssayData(so) # so is Seurat Object
write.table(as.matrix(matrix_cometsc), file=here("data", "COMETSC", "markers.txt"), row.names=TRUE, col.names=TRUE, sep = "\t", quote = FALSE)
#UMAP embeddings
umap_cometsc <- Embeddings(so, reduction = "umap")
write.table(umap_cometsc, file=here("data", "COMETSC", "vis.txt"), row.names=TRUE, col.names=FALSE, sep = "\t", quote = FALSE)
#cluster IDs
cluster_cometsc <- noquote(as.matrix(Idents(so)))
write.table(cluster_cometsc, file=here("data", "COMETSC", "cluster.txt"), row.names=TRUE, col.names=FALSE, sep = "\t", quote = FALSE)
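Part of the issue is with the marker (matrix) file. With row.names=TRUE and col.names=TRUE, write.table does not write a header entry for the row-names column, so the header row is missing the leading tab above the row names. I had to add it manually like this: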
sed '1s/.*/\t&/' markers.txt > markers2.txt
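A quick way to confirm the header problem, and that the fix worked, is to count the tab-separated fields on every line. This is only a sketch: `field_counts` is a hypothetical helper name, and the tiny two-cell demo files are made up for illustration rather than taken from my data.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct tab-separated field counts found
# in a file. One number means every line has the same width; two numbers
# mean the header and the data rows disagree.
field_counts() {
  awk -F'\t' '{ print NF }' "$1" | sort -nu | paste -sd' ' -
}

# Made-up demo files: the first mimics the raw write.table output (header
# one field short), the second has the leading tab added.
demo=$(mktemp -d)
printf 'cellA\tcellB\ngene1\t0\t1\n'   > "$demo/markers.txt"
printf '\tcellA\tcellB\ngene1\t0\t1\n' > "$demo/markers2.txt"

field_counts "$demo/markers.txt"    # prints: 2 3
field_counts "$demo/markers2.txt"   # prints: 3
```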
Also, my command to Comet is the following:
#! /bin/bash
source ~/comet/bin/activate
Comet markers2.txt vis.txt cluster.txt -C 16 -K 4 -Count true output/
And for reference, here is a sample of markers2.txt, with the tabs indicated by ^I:
^ID1_TTCAGGATCAAGCCAT^ID1_GTGGAGATCTGCTTAT^ID1_GCACGGTCACTCAGAT^ID1_TATACCTGTCTTACTT
MIR1302-2HG^I0^I0.0766241526725224^I0^I0
FAM138A^I0^I0^I0^I0
OR4F5^I0^I0^I0^I0
AL627309.1^I0.103146952196364^I0.0766241526725224^I0.0823802232731239^I0.0918193591402592
AL627309.3^I0^I0^I0^I0
AL627309.2^I0^I0^I0^I0
AL627309.4^I0^I0^I0^I0
AL732372.1^I0^I0^I0^I0
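One more sanity check I can run on my side is whether the cell barcodes in the matrix header are the same set as the first column of vis.txt and cluster.txt. I am assuming, without having confirmed it in the docs, that Comet expects the three files to describe the same cells. The function names below are hypothetical, and the sketch relies on the layout produced by my export code: barcodes in the header of the matrix, and in column 1 (no header) of the other two files.

```shell
#!/bin/bash
# Cell names from the header row of the matrix (the empty leading field
# left by the first tab is dropped).
cells_in_header() {
  head -n 1 "$1" | tr '\t' '\n' | sed '/^$/d' | sort
}

# Cell names from the first column of a headerless tab-separated file.
cells_in_col1() {
  cut -f1 "$1" | sort
}

# Succeeds only if both files name exactly the same set of cells.
same_cells() {
  [ "$(cells_in_header "$1")" = "$(cells_in_col1 "$2")" ]
}

# Example (commented out so the snippet can be sourced without the files):
# same_cells markers2.txt vis.txt     && echo "vis.txt matches"
# same_cells markers2.txt cluster.txt && echo "cluster.txt matches"
```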