Comments (4)

theroggy commented on June 15, 2024

You can pass the -explodecollections parameter to ogr2ogr to convert multi-part geometries to single part in the output. However, this won't update the area obviously.

I wonder though, could you explain why you want to explode the geometries?
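For reference, a minimal sketch of how that flag might be passed, written as a Python helper that only builds the argument list. The file names are hypothetical placeholders, and reading the CSV/WKT source may need extra driver options that are not shown here. Note that ogr2ogr takes the destination before the source.

```python
import shlex


def explode_command(src, dst):
    """Build an ogr2ogr call that writes each part of a multi-part geometry as its own feature."""
    # ogr2ogr expects the destination dataset first, then the source dataset.
    return ["ogr2ogr", "-explodecollections", dst, src]


# Hypothetical file names, for illustration only.
cmd = explode_command("buildings_in.gpkg", "buildings_exploded.gpkg")
print(shlex.join(cmd))
# To actually run it (requires GDAL to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Any area attribute is copied through unchanged by this call, which is the limitation mentioned above.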

from open-buildings.

cholmes commented on June 15, 2024

> You can pass the -explodecollections parameter to ogr2ogr to convert multi-part geometries to single part in the output. However, this won't update the area obviously.

Ah, good to know. But yeah, this feels like it needs a bit more customization than what you can do with GDAL out of the box.

> I wonder though, could you explain why you want to explode the geometries?

It's really just for this particular Google buildings dataset. It's distributed in CSV with WKT, and some small percent of the geometries are multipolygons (certainly less than 1%, perhaps even less than 0.1%?). The data set was clearly made by computer vision people who don't understand geospatial, and in my experience a number of tools work better if you have all of one geometry type. Yes, shapefile munges them together, so most 'can deal', but it feels far cleaner to have exactly one geometry type. Especially with these buildings, it makes sense to me that each building would be one row.

But as I mentioned in #53 (comment) it'd be much nicer to just have a clean library that compares read and write performance from any major format to another. I'd not even include 'csv' in that, and it wouldn't need to do any exploding of geometries.

theroggy commented on June 15, 2024

> > You can pass the -explodecollections parameter to ogr2ogr to convert multi-part geometries to single part in the output. However, this won't update the area obviously.

> Ah, good to know. But yeah, this feels like it needs a bit more customization than what you can do with GDAL out of the box.

It depends... you don't need to do customizations but without them the area will have to be recalculated for all rows... which is a bit less efficient with that low a percentage of exploded rows.
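To make the trade-off concrete, here is a minimal standard-library sketch of the customized route: explode each WKT geometry and recompute the area only for rows that actually split, leaving the stored area of every other row alone. The column names `geometry` and `area` are assumptions, the WKT handling covers only plain 2D POLYGON and MULTIPOLYGON text, and the shoelace area is planar in coordinate units, so real lon/lat data would need a projected or geodesic area instead.

```python
import re

# Body of one polygon: one or more parenthesised rings.
POLYGON_BODY = re.compile(r"\((\([^()]+\)(?:\s*,\s*\([^()]+\))*)\)")
RING = re.compile(r"\(([^()]+)\)")


def explode_wkt(wkt):
    """Return one POLYGON WKT string per part of a POLYGON or MULTIPOLYGON."""
    return [f"POLYGON ({body})" for body in POLYGON_BODY.findall(wkt)]


def ring_area(ring):
    """Planar shoelace area of a closed ring given as (x, y) tuples."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(polygon_wkt):
    """Area of the exterior ring minus the area of any holes."""
    rings = [
        [tuple(float(v) for v in point.split()[:2]) for point in ring_text.split(",")]
        for ring_text in RING.findall(polygon_wkt)
    ]
    return ring_area(rings[0]) - sum(ring_area(r) for r in rings[1:])


def explode_rows(rows):
    """Explode multi-part rows; recompute 'area' only for rows that split."""
    out = []
    for row in rows:
        parts = explode_wkt(row["geometry"])
        if len(parts) == 1:
            # Single part: keep the stored area, just normalise the type.
            out.append({**row, "geometry": parts[0]})
            continue
        for part in parts:
            out.append({**row, "geometry": part, "area": polygon_area(part)})
    return out


# Hypothetical sample rows, for illustration only.
sample = [
    {"id": 1, "geometry": "POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))", "area": 4.0},
    {"id": 2, "geometry": "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), "
                          "((10 10, 13 10, 13 12, 10 12, 10 10)))", "area": 7.0},
]
for row in explode_rows(sample):
    print(row["id"], row["geometry"], row["area"])
```

With well under 1% of rows being multipolygons, almost every row takes the cheap branch, which is the efficiency gain over recalculating the area for all rows in a second pass.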

> > I wonder though, could you explain why you want to explode the geometries?

> It's really just for this particular Google buildings dataset. It's distributed in CSV with WKT, and some small percent of the geometries are multipolygons (certainly less than 1%, perhaps even less than 0.1%?). The data set was clearly made by computer vision people who don't understand geospatial, and in my experience a number of tools work better if you have all of one geometry type. Yes, shapefile munges them together, so most 'can deal', but it feels far cleaner to have exactly one geometry type. Especially with these buildings, it makes sense to me that each building would be one row.

OK. I always do the other way around: if there is a mixture, I convert everything to multipolygon so it can be stored in one table/file.
FYI: pyogrio automatically converts all geometries to MultiPolygons if you save a GeoDataFrame with both Polygons and MultiPolygons.
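As an illustration of that opposite direction, the sketch below promotes plain 2D POLYGON WKT to MULTIPOLYGON so every row shares one geometry type. It is a standard-library sketch of the idea only, not how pyogrio implements it, and the function name is hypothetical.

```python
def to_multipolygon(wkt):
    """Wrap a 2D POLYGON WKT string as a single-part MULTIPOLYGON."""
    text = wkt.strip()
    upper = text.upper()
    if upper.startswith("MULTIPOLYGON"):
        return text  # already the target type
    if upper.startswith("POLYGON"):
        body = text[len("POLYGON"):].strip()
        if body.upper() == "EMPTY":
            return "MULTIPOLYGON EMPTY"
        # A multipolygon body is a parenthesised list of polygon bodies.
        return f"MULTIPOLYGON ({body})"
    raise ValueError(f"unsupported geometry type: {text[:20]!r}")


print(to_multipolygon("POLYGON ((0 0, 1 0, 1 1, 0 0))"))
```

Unlike exploding, this keeps one row per input row, so any stored area stays valid.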

> But as I mentioned in #53 (comment) it'd be much nicer to just have a clean library that compares read and write performance from any major format to another. I'd not even include 'csv' in that, and it wouldn't need to do any exploding of geometries.

I'm not sure I'll get to it, at least not in the short term, but if you're interested, you can find some other benchmarks involving geo operations I did in the past here: https://github.com/geofileops/geobenchmark

cholmes commented on June 15, 2024

> It depends... you don't need to do customizations but without them the area will have to be recalculated for all rows... which is a bit less efficient with that low a percentage of exploded rows.

Yeah, I just meant you can't do an easy one-liner from ogr2ogr that does it all in one. And agreed, a second run just to recalculate area won't make the comparison great. I think it's fine for it to not explode rows, the other two options just enabled this all in one pass.

> OK. I always do the other way around: if there is a mixture, I convert everything to multipolygon so it can be stored in one table/file.

Yeah, that's the practical way to do things, given the state of geospatial data formats (shapefile still being widely used) and the state of the tools. With this I was working towards distributing data in a 'better' way, and it just strikes me it's better to be able to differentiate between multi-polygons and polygons. If this was 'facilities' that could include multiple buildings in each then a multipolygon makes sense. If it's supposed to be every building, but some are squeezed into a single geometry then that makes less sense.

> FYI: pyogrio automatically converts all geometries to MultiPolygons if you save a GeoDataFrame with both Polygons and MultiPolygons.

Cool - good to know. I forget what tool I was working with but there was one that was barfing if you just threw this dataset at it.

> I'm not sure I'll get to it, at least not in the short term, but if you're interested, you can find some other benchmarks involving geo operations I did in the past here: https://github.com/geofileops/geobenchmark

Oh nice! I'll check it out.
