Comments (2)
Some advantages and disadvantages I can think of for the different options for specifying this:
{"encoding": "geoarrow.point", ...}
The advantage is that the encoding value fully describes the GeoArrow type. A disadvantage is that this adds a whole series of possible values for the "encoding" key, which makes handling this key a bit more complex (although in Python terms it would just be col["encoding"].startswith("geoarrow") instead of col["encoding"] == "geoarrow").
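Roughly, a reader under this first option could look like the sketch below. The function names are hypothetical, and it matches on "geoarrow." (with the dot) rather than "geoarrow" so that an unrelated value that merely starts with those letters isn't picked up.

```python
from typing import Optional


def is_geoarrow(col: dict) -> bool:
    # Option 1: every "encoding" value with the "geoarrow." prefix is GeoArrow-encoded.
    return col["encoding"].startswith("geoarrow.")


def geoarrow_type_option1(col: dict) -> Optional[str]:
    # The encoding value itself is the full GeoArrow type name: no second lookup.
    return col["encoding"] if is_geoarrow(col) else None


print(geoarrow_type_option1({"encoding": "geoarrow.point"}))  # geoarrow.point
print(geoarrow_type_option1({"encoding": "WKB"}))  # None
```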
{"encoding": "geoarrow", "extension_name": "geoarrow.point"}
The advantage is that this adds only a single new "encoding" value. But you still need to check the value of the other key to get the actual type.
If we go with this, I would rather use a different key than "extension_name". "Extension" is a rather Arrow-specific term, and although the encoding itself is also called "geoarrow", it can still be implemented by Parquet implementations or systems that have nothing to do with Arrow. We could use a more generic key such as "geoarrow_type" instead.
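For comparison, the second option needs a two-step lookup. This sketch uses the generic "geoarrow_type" key floated above; that key name is only a suggestion, not something defined in the spec.

```python
from typing import Optional


def geoarrow_type_option2(col: dict) -> Optional[str]:
    # Option 2: only one new "encoding" value to recognise...
    if col["encoding"] != "geoarrow":
        return None
    # ...but the actual type lives in a second key ("geoarrow_type" is hypothetical).
    return col["geoarrow_type"]


print(geoarrow_type_option2({"encoding": "geoarrow", "geoarrow_type": "geoarrow.point"}))
```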
{"encoding": "geoarrow", ..., "geometry_types": ["Point"]}
This has the same advantage of adding only a single "encoding" value, with the additional advantage of not having to add a custom key that is only needed for GeoArrow-encoded data (as in the previous option). The clear disadvantage is that you need to transform and combine the two keys manually to get the actual GeoArrow type name.
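To make the "transform and combine" step concrete, here is a sketch of what a reader might have to do under the third option. The lookup table and the handling of mixed lists are my own assumptions; in particular, dropping a " Z" suffix relies on my understanding that GeoArrow encodes the dimension in the storage layout rather than in the extension name.

```python
from typing import List, Optional

# Hypothetical mapping from GeoParquet "geometry_types" names to GeoArrow names.
_GEOARROW_NAMES = {
    "Point": "geoarrow.point",
    "LineString": "geoarrow.linestring",
    "Polygon": "geoarrow.polygon",
    "MultiPoint": "geoarrow.multipoint",
    "MultiLineString": "geoarrow.multilinestring",
    "MultiPolygon": "geoarrow.multipolygon",
}


def geoarrow_type_option3(col: dict) -> Optional[str]:
    if col["encoding"] != "geoarrow":
        return None
    types: List[str] = col.get("geometry_types", [])
    if len(types) != 1:
        # Empty or mixed lists (e.g. ["Polygon", "MultiPolygon"]) have no single
        # obvious GeoArrow name; the spec would need an explicit rule here.
        return None
    # Strip a dimension suffix such as " Z" ("Point Z" -> "Point").
    base = types[0].split(" ")[0]
    return _GEOARROW_NAMES.get(base)


print(geoarrow_type_option3({"encoding": "geoarrow", "geometry_types": ["Point Z"]}))
```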
I don't like that last option because there are GeoArrow extension types for WKT and WKB. Even if they aren't necessarily allowed or encouraged in this spec, I don't think we can guarantee that there is one canonical extension name per combination of geometry types, and functionally the extension name is what a reader implementation requires.
I do think we should probably require "encoding": "WKB" for those cases and disallow "encoding": "geoarrow.wkb", because otherwise there would be two ways to specify the same thing. While this requires some name mapping in GeoArrow-aware writers, it ensures that all existing readers will still work fine for files using WKB.
(I think this also makes the option of using "encoding": "geoarrow" combined with geometry_types a possibility, although still not necessarily a preferred one.)
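The writer-side name mapping could be as small as the sketch below. It assumes the first option for all non-WKB types (the "encoding" value equals the GeoArrow extension name); both the function name and that assumption are just for illustration.

```python
# "geoarrow.wkb" is written as the existing "WKB" encoding, so current readers keep
# working and "geoarrow.wkb" never appears in the file metadata.
_ENCODING_OVERRIDES = {"geoarrow.wkb": "WKB"}


def encoding_for_extension(extension_name: str) -> str:
    # Assumption: other types use option 1 (encoding == extension name).
    return _ENCODING_OVERRIDES.get(extension_name, extension_name)


print(encoding_for_extension("geoarrow.wkb"))    # WKB
print(encoding_for_extension("geoarrow.point"))  # geoarrow.point
```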
The second consideration is which GeoArrow memory layouts to allow.
I think it's best to list the options that are allowed. We can always expand that list later if GeoArrow grows more options.
(For the example you gave, is there a reason you only listed "geoarrow.multipolygon" and not "geoarrow.polygon"?)
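An explicit allow-list also keeps validation trivial. A sketch, assuming option 1 naming and including both polygon and multipolygon; the exact set is hypothetical and would be whatever the spec ends up listing. Note that "geoarrow.wkb" is rejected, per the WKB proposal in this thread.

```python
# Hypothetical allow-list; expanding it later is a one-line change.
ALLOWED_ENCODINGS = frozenset({
    "WKB",
    "geoarrow.point",
    "geoarrow.linestring",
    "geoarrow.polygon",
    "geoarrow.multipoint",
    "geoarrow.multilinestring",
    "geoarrow.multipolygon",
})


def validate_encoding(encoding: str) -> None:
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported geometry encoding: {encoding!r}")


validate_encoding("geoarrow.polygon")  # passes silently
```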
For the interleaved vs. separated layout: I think it is clear that the separated layout has the most benefit in combination with Parquet, because you get per-column min/max statistics for the x and y coordinates for free (and maybe better compression and faster reading). But I am not sure we should allow only that layout. There are certainly cases where you don't care about this and just need the fastest possible way to store and re-read a bunch of data. And if your target system needs interleaved data (like shapely/geopandas), storing it interleaved might be the fastest option (although I should verify this in practice!)
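A small illustration of why the statistics only help in the separated case, using plain Python lists as stand-ins for the Arrow buffers (no Parquet library involved). The min/max computed here mimics what Parquet records per leaf column.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def to_interleaved(coords: List[Coord]) -> List[float]:
    # One buffer [x0, y0, x1, y1, ...]: Parquet sees a single leaf column.
    out: List[float] = []
    for x, y in coords:
        out.extend((x, y))
    return out


def to_separated(coords: List[Coord]) -> Tuple[List[float], List[float]]:
    # Two buffers [x0, x1, ...] and [y0, y1, ...]: Parquet sees two leaf columns.
    return [x for x, _ in coords], [y for _, y in coords]


coords = [(1.0, 10.0), (3.0, 20.0), (2.0, 15.0)]

inter = to_interleaved(coords)
print(min(inter), max(inter))  # 1.0 20.0 (x and y mixed together, not a bbox)

xs, ys = to_separated(coords)
print((min(xs), min(ys), max(xs), max(ys)))  # (1.0, 10.0, 3.0, 20.0), a real bbox
```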
For the actual specification update, we should probably detail, for each of the GeoArrow types, which Parquet type it maps to.
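As a rough starting point for that table, here's how I understand the nesting. Each geometry level in the GeoArrow layouts adds one list, and, as far as I know, Parquet has no fixed-size list logical type, so I'd expect interleaved coordinates to end up as a plain LIST of DOUBLE. The notation, the helper name, and the 2D-only (x/y) assumption are all just for illustration.

```python
# Number of list levels around the coordinates for each GeoArrow geometry type.
NESTING_DEPTH = {
    "geoarrow.point": 0,
    "geoarrow.linestring": 1,
    "geoarrow.multipoint": 1,
    "geoarrow.polygon": 2,
    "geoarrow.multilinestring": 2,
    "geoarrow.multipolygon": 3,
}


def describe_parquet_layout(geoarrow_type: str, layout: str) -> str:
    if layout == "separated":
        # struct<x, y>: one DOUBLE leaf column per dimension.
        desc = "group(x: DOUBLE, y: DOUBLE)"
    elif layout == "interleaved":
        # fixed_size_list<double>[2]: stored as a variable-length LIST in Parquet.
        desc = "LIST(DOUBLE)"
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    for _ in range(NESTING_DEPTH[geoarrow_type]):
        desc = f"LIST({desc})"
    return desc


print(describe_parquet_layout("geoarrow.polygon", "separated"))
# LIST(LIST(group(x: DOUBLE, y: DOUBLE)))
```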