
Comments (6)

felixpalmer commented on May 20, 2024

I agree on point 1.

My feeling on 2 is that the primary_column should be restricted to be a top-level column, for a couple of reasons:

  • As we are designing a geo-format, it feels natural that the geographic information is available at the top level.
  • We are at the 0.1 stage of the spec, so I think it is best to have this restriction and review it later. It will make it easier to make headway with implementations.
  • How does this interact with https://github.com/geopandas/geo-arrow-spec/? We don't want to add flexibility to the GeoParquet spec that makes it hard to implement in the linked GeoArrow spec.

from geoparquet.

cholmes commented on May 20, 2024

Thanks for the great feedback!

For 1. I think the column path makes good sense.

For 2. I lean towards restricting the primary geometry column to be top-level, so that conversion to GeoJSON / Shapefile is clear and straightforward to implement. And I suppose making primary_column optional makes sense, but I feel like it'd be good to have something nudging people towards defining it if possible. But I certainly see the usefulness of allowing big Parquet datasets that just have a nested geospatial value to be compliant without making them say 'this is a geo file'.


cholmes commented on May 20, 2024

Call 11/7

For the first version (1.0.0) we want to limit geometry columns to the top level. Very few geospatial packages would be able to understand nested geometry columns. But if someone has a use case for nested geometry columns, we can potentially add support in the future.

And the repetition must be optional or required (not repeated).

We need to update the spec's description of geometry columns to state explicitly that we don't support geometry columns inside groups and that the repetition must be required or optional.
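As a rough illustration of the rule from the call, here is a minimal sketch of a check a writer or validator could run. It does not use any real Parquet library: the `Field` class and the `check_geometry_columns` function are hypothetical stand-ins for whatever schema representation an implementation actually has, and the column names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Parquet schema field."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: list = field(default_factory=list)  # non-empty for a group


def check_geometry_columns(top_level_fields, geometry_column_names):
    """Return a list of problems; an empty list means the rule is satisfied.

    Rule sketched here: every geometry column must be a top-level,
    non-group column whose repetition is required or optional.
    """
    top_level = {f.name: f for f in top_level_fields}
    problems = []
    for name in geometry_column_names:
        candidate = top_level.get(name)
        if candidate is None:
            # Covers nested paths such as "route.geometry" as well as typos.
            problems.append(f"{name}: not a top-level column")
        elif candidate.children:
            problems.append(f"{name}: is a group, not a leaf column")
        elif candidate.repetition not in ("required", "optional"):
            problems.append(f"{name}: repetition is {candidate.repetition}")
    return problems


# Made-up schema: one valid geometry column, one repeated column,
# and one geometry nested inside a group.
schema = [
    Field("id", "required"),
    Field("geometry", "optional"),
    Field("floors", "repeated"),
    Field("route", "optional", [Field("geometry", "required")]),
]

print(check_geometry_columns(schema, ["geometry"]))
print(check_geometry_columns(schema, ["floors", "route.geometry"]))
```

The check only looks at names declared as geometry columns, so nested or repeated columns that are not declared as geometry are left alone.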


mentin commented on May 20, 2024

I think it is the right decision for v1.

But I also wonder if there are many geospatial packages that support multiple geometry columns? I would think most that don't support nesting / repetition would also ignore all the columns besides "primary_column", and then nesting / repetition of additional geometry columns should not matter :).

We do have several customers who use repeated geometry columns. Typically, the primary geometry column is a top-level required column, and it is broken into parts, which are stored as nested and/or repeated columns. What I remember:

  • a building and individual floors as repeated geometry,
  • a linestring path and a repeated struct containing the vertices of that linestring, with additional data columns (think of the M/Z dimension on steroids, where you can have many columns of arbitrary types for each vertex).

In these cases the primary geometry column is non-nested and non-repeated, but there are other columns that are nested inside a repeated struct.
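To make the building example concrete, here is a minimal sketch of one such row and of a reader that only understands the primary geometry column. Everything in it is an assumption for illustration: the row is a plain Python dict rather than real Parquet data, the WKT strings and column names are made up, and `flatten_row` is a hypothetical helper, not part of any library.

```python
def flatten_row(row, primary_column):
    """Keep the primary geometry and scalar attributes; skip nested or repeated values."""
    properties = {
        key: value
        for key, value in row.items()
        if key != primary_column and not isinstance(value, (list, dict))
    }
    return {"geometry": row[primary_column], "properties": properties}


# Made-up row: the primary geometry is top level and not repeated,
# while the per-floor geometries live inside a repeated struct.
building = {
    "name": "Example Tower",
    "geometry": "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))",
    "floors": [
        {"level": 1, "geometry": "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))"},
        {"level": 2, "geometry": "POLYGON ((1 1, 9 1, 9 9, 1 9, 1 1))"},
    ],
}

feature = flatten_row(building, "geometry")
print(feature)
```

A reader like this never looks inside `floors`, so the nesting and repetition of the secondary geometries does not affect it.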


tschaub commented on May 20, 2024

Yeah, I can imagine this will be something that is revisited. From a writer's perspective, given that Parquet is capable of representing repeated and group fields, it is somewhat odd that a "geo" extension would restrict that. I guess we are anticipating the needs of readers in adding this restriction - but it may turn out to be unnecessarily restrictive.


jorisvandenbossche commented on May 20, 2024

> But I also wonder if there are many geospatial packages that support multiple geometry columns?

GeoPandas supports this, and it seems R sf does as well (https://cran.r-project.org/web/packages/sf/vignettes/sf6.html#how-does-sf-deal-with-secondary-geometry-columns). PostGIS supports this as well (https://gis.stackexchange.com/questions/176263/can-a-postgis-table-or-view-have-two-geometry-columns).
I know that GDAL also supports this in its OGR data model and C API, but whether it is actually supported depends on the bindings to GDAL (I know that the Python bindings right now will only return a single (first) geometry column).

I can certainly see the use case of repeated (list/array type) geometry columns. I also assume that databases (like BigQuery) that have both a proper array type and geometry/geography type will typically not limit combining those two in a repeated geometry type?

