Giter Site home page Giter Site logo

proseg output support about voyager HOT 10 OPEN

pachterlab avatar pachterlab commented on September 22, 2024
proseg output support

from voyager.

Comments (10)

lambdamoses avatar lambdamoses commented on September 22, 2024

The output does not match the standard output format of any commercial technology, unless you use Xenium Ranger to convert it and then use readXenium. The output includes the gene count matrix and cell metadata as csv files when the options are enabled. Use data.table to read it if it's large. Then reformat the gene count matrix data frame into a dgCMatrix with cell IDs as column names and gene IDs or symbols as row names. The cell segmentation output is GeoJSON, which can be read with sf::st_read(), which will give you an sf data frame. Then call the SpatialFeatureExperiment() constructor. See the SFE vignette for an example of calling the constructor: https://pachterlab.github.io/SpatialFeatureExperiment/articles/SFE.html#object-construction

from voyager.

cstrlln avatar cstrlln commented on September 22, 2024

Thanks I gave it a try and haven't been able to create the SFE object.

Here is what I did:

library(SpatialFeatureExperiment)


cellmeta <- data.table::fread('data/cell-metadata.csv.gz')

expcounts <- data.table::fread('data/expected-counts.csv.gz')

segmentation <- geojsonsf::geojson_sf('data/cell-polygons.geojson')


###

mat <- as.matrix(expcounts)
mat <- as(t(mat), "dgCMatrix")

sfe <- SpatialFeatureExperiment(list(counts = mat), colData = cellmeta,
                                colGeometries = list(foo = segmentation))

This is the error I get:

> sfe <- SpatialFeatureExperiment(list(counts = mat), colData = cellmeta,               
+                                 colGeometries = list(foo = segmentation))
Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented,  : 
  Loop 0 is not valid: Edge 34 crosses edge 36                          
In addition: Warning messages:                                          
1: In st_is_longlat(x) :                                                
  bounding box has potentially an invalid value range for longlat data  
2: In st_is_longlat(x) :                                                
  bounding box has potentially an invalid value range for longlat data 

This is the error I get if I try without the geojson file:

> sfe <- SpatialFeatureExperiment(list(counts = mat), colData = cellmeta)
Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent 

I checked for dimnames and dim and it looks fine:

> str(dimnames(mat))                                                      
List of 2
 $ : chr [1:1193] "Uba52" "Carmn" "Eif5a" "Ccnd1" ...
 $ : chr [1:2061] "1" "2" "3" "4" ...
> dim(mat)
[1] 1193 2061

I have tried a couple of things but can't get it to work, would it be possible to send you the files so that you can take a look?

from voyager.

lambdamoses avatar lambdamoses commented on September 22, 2024

Regarding this error:

Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented,  : 
  Loop 0 is not valid: Edge 34 crosses edge 36                          
In addition: Warning messages:                                          
1: In st_is_longlat(x) :                                                
  bounding box has potentially an invalid value range for longlat data  
2: In st_is_longlat(x) :                                                
  bounding box has potentially an invalid value range for longlat data 

Did sf::st_read("data/cell-polygons.geojson") not work? An annoying thing about using geospatial packages is that they often assume a coordinate reference system (CRS), which is irrelevant to the histological space. Basically CRS's are different ways the 3D globe is projected to 2D maps. It seems that in your case, geojsonsf gave your data a CRS (must be 4326) which makes sf think that your values are longitudes and latitudes. You can do st_crs(segmentation) <- NA to remove the CRS. st_read doesn't give you a CRS by default.

Regarding the error:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent 

You haven't supplied the spatialCoords or spatialCoordsNames argument. The default for spatialCoordsNames is c("x", "y"). If spatialCoords is not specified, then spatialCoordsNames should be found among the columns of colData. I suppose I should update that example in the vignette for a more general case.

What's the output of traceback() after you get this error? I don't know where it comes from. It might be caused by say the length of the column names or row names in dn doesn't match the number of columns or rows in the array. It can be when the number of rows in segmentation or number of rows in cellmeta doesn't match the number of columns in mat. From your output, it seems that you didn't assign the cell IDs to the column names of mat. Assuming that columns in mat already match the rows in cellmeta and both cellmeta and segmentation have a column cell_id for the cell ID. If they don't match, you can match them by segmentation <- segmentation[match(segmentation$cell_id, cellmeta$cell_id),].

Another possible cause is that segmentation has geometry type POLYGON but some cells have multiple polygons, causing segmentation to have more rows than cellmeta. If that's the case, then use segmentation2 <- sf::aggregate(segmentation, list(cell_id = segmentation$cell_id), unique) to convert to MULTIPOLYGON where each cell has one geometry. I think I'll add a function to the SFE package to deal with scenarios like this.

See the Geocomputation with R book to learn more about operations on sf data frames and geometries.

from voyager.

cstrlln avatar cstrlln commented on September 22, 2024

Thanks I'll try this, I guess I just need to understand the object construction better.

Are there any columns that are looked for that have to have harcoded values, like cell_id?

from voyager.

lambdamoses avatar lambdamoses commented on September 22, 2024

The column name "cell_id" is just an example. Any name can be used, but there should be a column for cell IDs to match the cell metadata to the gene count matrix and to the cell segmentation polygons, unless you are already very sure that they are all in the same order in the cell metadata, gene count matrix, and polygons.

from voyager.

cstrlln avatar cstrlln commented on September 22, 2024

It does seem that there is a slight missmatch between the cell id in the raw cellmeta and the names in the count matrix and segmentation.

The rownames of expcounts (genes in columns and cells in rows in raw file) are a cell name starting in "1":

> head(rownames(expcounts))
[1] "1" "2" "3" "4" "5" "6"

Whereas in cellmeta the 'cell' value starts at '0'

> head(cellmeta)                                                                  
    cell centroid_x centroid_y centroid_z   fov cluster volume population
   <int>      <num>      <num>      <num> <int>   <int>  <num>      <int>
1:     0  353.22656   293.0781   4.765625    99       1  320.0         27
2:     1  213.03372   403.7463   4.519795    99       1  852.5        186
3:     2   77.85435   142.6783   6.038043    99       0 1150.0        508
4:     3  319.43000   233.6000   4.500000    99       1  500.0         80
5:     4  114.46059   110.4803   6.422414    99       0  507.5        197
6:     5  165.97500   464.0200   5.937500    99       1  500.0        122

And the values in segmentation cell column are also starting at '0', so I'll try to match them all.
Also, sf::st_read didnt work I think because I was trying to read the .gz file directly(?), thats why I moved to geojson_sf and you are correct that it added a CRS. Also the geojson file I used is made of multipolygons:

> head(segmentation)                                                              
Simple feature collection with 6 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 68 ymin: 104 xmax: 359 ymax: 470
Geodetic CRS:  WGS 84
  cell                       geometry
1    0 MULTIPOLYGON (((349 287, 34...
2    1 MULTIPOLYGON (((204 402, 20...
3    2 MULTIPOLYGON (((70 138, 70 ...
4    3 MULTIPOLYGON (((314 233, 31...
5    4 MULTIPOLYGON (((110 111, 11...
6    5 MULTIPOLYGON (((161 463, 16...

Thanks for the feedback I'll try again tomorrow with your suggestions.

from voyager.

lambdamoses avatar lambdamoses commented on September 22, 2024

Also, sf::st_read didnt work I think because I was trying to read the .gz file directly(?)

I think it's probably because of the gz. Do cellmeta and segmentation have the same number of rows, which should be the same as the number of columns in the gene count matrix? Also, when the rows are cells, the gene count matrix csv file might have an unnamed column for cell ID which are row names of the matrix. When it's unnamed, data.table reads it into R as the first column with name V1 and whatever row names you get are just the default which is 1:nrow(df).

from voyager.

alikhuseynov avatar alikhuseynov commented on September 22, 2024

@cstrlln did it work for you to make the SFE object?

from voyager.

cstrlln avatar cstrlln commented on September 22, 2024

@alikhuseynov I got side tracked trying another tool, so didn't test the suggestions yet but plan to try in the next couple weeks. I really think it might be the cell names though.

from voyager.

alikhuseynov avatar alikhuseynov commented on September 22, 2024

@alikhuseynov I got side tracked trying another tool, so didn't test the suggestions yet but plan to try in the next couple weeks. I really think it might be the cell names though.

ok, if you send us the output files or deposit them somewhere where we can download them, I will try to help making that SFE object.

from voyager.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.