Comments (19)
Yes, that's true. But on the other hand if it takes too long to wait for it in Apache, the community will just come up with their own definition.
Microsoft Planetary Computer is currently using application/x-parquet, which I'll also adopt for now.
from geoparquet.
If Apache Parquet used application/apache.parquet would it be appropriate to use application/apache.parquet; application=geoparquet, since geoparquet files are valid parquet files? Maybe it'd be weird to have application in the media type twice.
Not necessarily a helpful answer, but having followed a bit the discussions regarding COG (and not sure we managed to reach an endorsed conclusion), I find that understanding IANA rules for MIME types tends to require dedicated expertise. For example https://www.rfc-editor.org/rfc/rfc6838.html#page-13 mentions "Media types MAY elect to use one or more media type parameters[...] the names, values, and meanings of any parameters MUST be fully specified when a media type is registered in the standards tree". So I guess that a provision for allowing application=geoparquet should be made at the time application/apache.parquet is registered (unless other rules just ban it or make it already possible...)
from geoparquet.
I agree that having a parameter is better than the geo+ thing. So a parameter would be the best option. I have no strong preference on which to use, so profile=geo is fine.
from geoparquet.
I'd hope that in STAC people use the https://github.com/stac-extensions/table extension.
STAC Browser could also try to simply infer the "Show on map" from the table:primary_geometry.
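A minimal sketch of that inference, assuming the STAC asset and item properties are plain dicts and that table:primary_geometry may appear on either one. The function name and the lookup order are hypothetical, not something STAC Browser is known to implement:

```python
from typing import Any, Mapping, Optional


def has_primary_geometry(
    asset: Mapping[str, Any],
    item_properties: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Return True if a table:primary_geometry hint names a geometry column.

    Assumption: the hint may sit on the asset or on the item properties,
    so both are checked, asset first.
    """
    for source in (asset, item_properties or {}):
        value = source.get("table:primary_geometry")
        if isinstance(value, str) and value:
            return True
    return False
```

A client could offer "Show on map" when this returns True and fall back to "Download" otherwise.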
So I'm not feeling as strongly anymore. So let's start with application/vnd.apache.parquet, indeed.
from geoparquet.
I suppose having a registered MIME type for Parquet itself (the issue you linked) would be a required first step (since a geoparquet file is technically just a parquet file).
from geoparquet.
It would be good to establish a convention that differentiates between parquet and geoparquet. I'm happy to update the media types in the Planetary Computer as needed.
I'm not familiar with the various components of a media type, but COG uses image/tiff; application=geotiff; profile=cloud-optimized. If Apache Parquet used application/apache.parquet would it be appropriate to use application/apache.parquet; application=geoparquet, since geoparquet files are valid parquet files? Maybe it'd be weird to have application in the media type twice.
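To make the components concrete: a media type string is a type/subtype pair followed by optional name=value parameters separated by semicolons. The sketch below splits one apart. The function name is hypothetical, and the parser is deliberately simplified (it assumes no quoted values containing semicolons):

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a media type into its type/subtype and its parameters.

    Simplified: does not handle quoted parameter values that contain ';'.
    """
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params: Dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, val = raw.partition("=")
        # Parameter names are case-insensitive, so normalise them.
        params[name.strip().lower()] = val.strip().strip('"')
    return essence.lower(), params


cog = parse_media_type("image/tiff; application=geotiff; profile=cloud-optimized")
geo = parse_media_type("application/apache.parquet; application=geoparquet")
```

In the second case the word application appears once as the top-level type and once as a parameter name, which is the repetition the comment above finds odd. The two occupy different positions in the string, so they do not collide when parsed.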
from geoparquet.
OGC contacted IANA to ask about adding parameters such as profile or application (to GeoTiff in this case) and they said that you can't simply "add" them if the original type is already registered. So you'd need to discuss that with Apache upfront or otherwise register your own, e.g. application/geo+apache.parquet (although I'm not sure they would use a . in the name...)
from geoparquet.
Moving to beta.2. I think we should also try to get in touch with the core Parquet people and see if we can help them register something, even if it's just application/vnd.apache.parquet.
There was also general consensus on a recent call that we don't want a geo-specific parquet mimetype, we'd just use the general parquet one, and users would rely on the presence of geo metadata in the parquet file (or they could also guess that it'd be likely if there are shapefiles / geopackages of the same data). Happy for arguments on why we should have our own special geo one, and sounds like the time to do so is when apache applies for theirs. But I think we don't really want geospatial systems that distribute a non-geo parquet and a geoparquet. And we also hope that eventually all parquet readers would at least know to identify the standard geo data.
from geoparquet.
The same reasons why we want a COG- and GeoTiff-specific media type over just using "image/tiff" also apply here. If you need to read the file anyway (partially), then you could also just omit media types completely.
Example: Without specific media types, STAC Browser would try to visualize a 100GB non-GeoTiff TIFF on a map, which is not a nice user experience and would likely crash your browser. But by knowing it's a COG I can visualize just those and reject all the others.
So +1 on having at least a geo-specific parameter...
from geoparquet.
I don't feel that strongly about this either way, except that an optional parameter (like application/parquet; profile=geo or whatever) seems better to me than the geo+parquet, since then a generic parquet reader would have a better chance of actually opening it.
But I do think the time is 'now' to decide what we want, since as pointed out above the only real chance IANA seems to give for optional parameters is on registration. We can likely help the Apache people with the process, since OGC has experience working with IANA, and we can also just point them to the form to do a vnd registration - you just fill out https://www.iana.org/form/media-types. It does seem like there's precedent for a 'project steering committee' to submit for the official IANA types, with Apache Arrow, Thrift and Node all being submitted by the steering committees or a member on them. But @ogcscotts can likely help navigate the process / talk to the right people.
So seems like we should determine which direction we want to go, and if we want an optional parameter we should determine what we'd like, before engaging with them.
from geoparquet.
As @jorisvandenbossche mentioned in the meeting today, there's currently progress on Parquet getting a MIME/Media type: https://issues.apache.org/jira/browse/PARQUET-1889
from geoparquet.
Great, so application/vnd.apache.parquet is the intention, so I'll update my implementations to support both application/vnd.apache.parquet and application/x-parquet for now. Do we still intend to add a geo profile or so for GeoParquet?
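Supporting both spellings during the transition could look roughly like this. It is only a sketch: the constant and function names are hypothetical, and any parameters after the first semicolon are ignored on purpose so that a later profile parameter would not break the match:

```python
# Media types named in this thread: the intended registration and the
# unregistered x- type currently used by Microsoft Planetary Computer.
PARQUET_MEDIA_TYPES = frozenset(
    {"application/vnd.apache.parquet", "application/x-parquet"}
)


def is_parquet_media_type(value: str) -> bool:
    """Return True if the type/subtype part is one of the Parquet types."""
    essence = value.split(";", 1)[0].strip().lower()
    return essence in PARQUET_MEDIA_TYPES
```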
from geoparquet.
If @cholmes is right in his above comment, we'd have to register a geo profile along with the current parquet registration?
from geoparquet.
Yes we would, if we go the official route. But we also never did with COG, and everyone just agreed on a de-facto standard of image/tiff; application=geotiff; profile=cloud-optimized
from geoparquet.
If @cholmes is right in his above comment, we'd have to register a geo profile along with the current parquet registration?
Yeah, if we want something listed in the official IANA then we'd need to do it. Like @m-mohr points out we can just add something on. With COG we wanted to get something registered but it basically wasn't an option.
So now is the time to try to advocate for it, if we want to. But I think we were leaning away from that, as mentioned in #115 (comment). I can try to bring it up at the next meeting, but if someone feels like we should push for a 'geo' profile then it'd be good to make the case here. I can't think of the use case where it's really essential, and it seems simpler for the file itself to be the place to figure out if it's geo or not. And then not risk it being declared geo but not actually. And I don't see a case where it'd be good to have a non-geo parquet and a geoparquet version of the same file.
from geoparquet.
Same reason as for why COGs have a profile: A client can just detect easily what it is and whether it can render it without actually loading (parts of) the file. Think STAC Browser (and ol-stac, stac-layer, ...) for example... @cholmes
from geoparquet.
If a Parquet file is served with a media type that indicates that it is GeoParquet, a client cannot blindly try to render it, for example (the same is true for COG, despite what others may believe). Before deciding what to do with the contents of a Parquet file, a client would need to read the footer - this is true for geo and non-geo Parquet. After reading the footer, you can see if it has the geo metadata.
@m-mohr - can you provide more specific examples of what a client like STAC Browser would do if it knew that a Parquet file was GeoParquet? If the answer is that it would display the geo-specific metadata, then this is going to require reading the footer of the file - which you can safely do for a non-geo Parquet file as well (and you might want to do anyway to show the user something about the data).
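For illustration, the footer check described above could be sketched as follows. GeoParquet stores its metadata as JSON under the "geo" key of the Parquet file's key-value metadata. The function names are hypothetical, the validity test is a simplification that only looks for the primary_column and columns fields, and the reader helper assumes pyarrow is installed:

```python
import json
from typing import Mapping, Optional


def has_geo_metadata(key_value_metadata: Optional[Mapping[bytes, bytes]]) -> bool:
    """Return True if the footer key-value metadata carries a usable "geo" entry."""
    if not key_value_metadata:
        return False
    raw = key_value_metadata.get(b"geo")
    if raw is None:
        return False
    try:
        geo = json.loads(raw)
    except ValueError:  # also covers invalid UTF-8 and malformed JSON
        return False
    # Simplified check: only two of the required top-level fields are tested.
    return isinstance(geo, dict) and "primary_column" in geo and "columns" in geo


def read_footer_metadata(path: str) -> Optional[Mapping[bytes, bytes]]:
    """Read only the Parquet footer and return its key-value metadata."""
    import pyarrow.parquet as pq  # imported lazily; assumes pyarrow is available

    return pq.read_metadata(path).metadata
```

The same footer read works on a non-geo Parquet file, where has_geo_metadata simply returns False.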
from geoparquet.
It's all about giving users the nicest user experience without loading a whole lot of headers upfront. Thinking more about it, it might be more relevant for COGs than it is for GeoParquet files though.
Example:
Having a STAC file with 12 COGs, I'd like to display a default COG for visualization purposes. If I'd just have image/tiff or the geotiff equivalent as media type, I'd need to read all headers of 12 COG files to know which I might be able to render. Having a specific media type I know upfront whether I can or can not render the 12 COG files. I know there are edge cases, but by default STAC Browser indeed tries to blindly render it, which works in many cases. But it could also be just to differentiate the buttons: "Download" or "Show on map". It's a small UX improvement.
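A rough sketch of that button decision, using only the media type string and no file reads. The function names and the dict-shaped assets are assumptions for illustration, not STAC Browser's actual code:

```python
from typing import Any, Mapping, Optional


def asset_action(media_type: str) -> str:
    """Pick a button label from the media type alone."""
    essence, *raw_params = [part.strip() for part in media_type.split(";")]
    params = {}
    for raw in raw_params:
        name, _, val = raw.partition("=")
        params[name.strip().lower()] = val.strip()
    # De-facto COG type from this thread:
    # image/tiff; application=geotiff; profile=cloud-optimized
    if essence.lower() == "image/tiff" and params.get("profile") == "cloud-optimized":
        return "Show on map"
    return "Download"


def default_asset(assets: Mapping[str, Mapping[str, Any]]) -> Optional[str]:
    """Return the key of the first asset that can be shown on a map, if any."""
    for key, asset in assets.items():
        if asset_action(asset.get("type", "")) == "Show on map":
            return key
    return None


assets = {
    "thumbnail": {"href": "t.png", "type": "image/png"},
    "raw": {"href": "raw.tif", "type": "image/tiff"},
    "visual": {
        "href": "visual.tif",
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
    },
}
chosen = default_asset(assets)
```

Here the plain image/tiff asset gets "Download" and the COG asset is picked as the default, without opening any of the files.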
from geoparquet.
Looks like there's now a parquet media type, see #115
Search 'parquet' on https://www.iana.org/assignments/media-types/media-types.xhtml
application/vnd.apache.parquet
I think they could have gotten application/parquet pretty easily, but this does seem consistent with the other apache ones.
I'm going to go ahead and make a PR without a 'geo' parquet media type - we can revisit and add it later if there is a lot of value.
I do wonder if there's a 'hint' we could give in STAC, for 'show on map'. I also do think it's not crazy to try to blindly render, as most parquet in STAC will likely be geoparquet.
from geoparquet.