
Media type (geoparquet issue, CLOSED)

m-mohr commented on June 1, 2024
Media type

from geoparquet.

Comments (19)

m-mohr commented on June 1, 2024

Yes, that's true. But on the other hand if it takes too long to wait for it in Apache, the community will just come up with their own definition.

Microsoft Planetary Computer is currently using application/x-parquet, which I'll also adopt for now.

rouault commented on June 1, 2024

> If Apache Parquet used application/apache.parquet would it be appropriate to use application/apache.parquet; application=geoparquet, since geoparquet files are valid parquet files? Maybe it'd be weird to have application in the media type twice.

Not necessarily a helpful answer, but having followed the discussions regarding COG a bit (and I'm not sure we managed to reach an endorsed conclusion), I find that understanding IANA rules for MIME types tends to require dedicated expertise. For example https://www.rfc-editor.org/rfc/rfc6838.html#page-13 mentions "Media types MAY elect to use one or more media type parameters[...] the names, values, and meanings of any parameters MUST be fully specified when a media type is registered in the standards tree". So I guess that a provision for allowing application=geoparquet should be made at the time application/apache.parquet is registered (unless other rules just ban it or make it already possible...)

m-mohr commented on June 1, 2024

I agree that having a parameter is better than the geo+ thing, so a parameter would be the best option. I have no strong preference on which one to use, so profile=geo is fine.

m-mohr commented on June 1, 2024

I'd hope that in STAC people use the https://github.com/stac-extensions/table extension.
STAC Browser could also try to simply infer the "Show on map" from the table:primary_geometry.
So I'm not feeling as strongly anymore. So let's start with application/vnd.apache.parquet, indeed.
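
A minimal sketch of the inference mentioned above, assuming the STAC Item is already loaded as a plain dict and that table:primary_geometry sits under the Item's properties. The function name and the example Item are hypothetical, and this is not how STAC Browser is actually implemented.

```python
def can_show_on_map(item: dict) -> bool:
    """Guess whether a table-like STAC Item can be mapped.

    Assumption: a non-empty "table:primary_geometry" property means the
    table has a geometry column that a client could try to render.
    """
    properties = item.get("properties", {})
    return bool(properties.get("table:primary_geometry"))


# Hypothetical Item fragment, for illustration only.
example_item = {
    "properties": {"table:primary_geometry": "geometry"},
    "assets": {"data": {"type": "application/vnd.apache.parquet"}},
}
print(can_show_on_map(example_item))
```

With this approach the media type stays generic and the geo hint comes from the STAC metadata instead.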

jorisvandenbossche commented on June 1, 2024

I suppose having a registered MIME type for Parquet itself (the issue you linked) would be a required first step (since a geoparquet file is technically just a parquet file).

TomAugspurger commented on June 1, 2024

It would be good to establish a convention that differentiates between parquet and geoparquet. I'm happy to update the media types in the Planetary Computer as needed.

I'm not familiar with the various components of a media type, but COG uses image/tiff; application=geotiff; profile=cloud-optimized. If Apache Parquet used application/apache.parquet would it be appropriate to use application/apache.parquet; application=geoparquet, since geoparquet files are valid parquet files? Maybe it'd be weird to have application in the media type twice.
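
For readers unfamiliar with the parts of a media type: the text before the first semicolon is the type/subtype, and everything after it is a list of name=value parameters. The sketch below splits the COG string quoted above into those parts. The helper name is hypothetical, and the naive split on ";" assumes parameter values contain no quoted semicolons.

```python
def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a media type string into (type/subtype, parameters)."""
    base, *raw_params = (part.strip() for part in value.split(";"))
    params: dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, val = raw.partition("=")
        # Parameter names are case-insensitive; values may be quoted.
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


base, params = parse_media_type(
    "image/tiff; application=geotiff; profile=cloud-optimized"
)
print(base)    # the generic type a plain TIFF reader would match on
print(params)  # the optional hints a geo-aware client could inspect
```

A client that only understands the base type can ignore the parameters, which is the argument made in this thread for a parameter over a separate geo-specific type.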

m-mohr commented on June 1, 2024

OGC contacted IANA to ask about adding parameters such as profile or application (to GeoTiff in this case) and they said that you can't simply "add" them if the original type is already registered. So you'd need to discuss that with Apache upfront or otherwise register your own, e.g. application/geo+apache.parquet (although I'm not sure they would use a . in the name...)

cholmes commented on June 1, 2024

Moving to beta.2. I think we should also try to get in touch with the core Parquet people and see if we can help them register something, even if it's just application/vnd.apache.parquet.

There was also general consensus on a recent call that we don't want a geo-specific parquet mimetype, we'd just use the general parquet one, and users would rely on the presence of geo metadata in the parquet file (or they could also guess that it'd be likely if there are shapefiles / geopackages of the same data). Happy for arguments on why we should have our own special geo one, and sounds like the time to do so is when apache applies for theirs. But I think we don't really want geospatial systems that distribute a non-geo parquet and a geoparquet. And we also hope that eventually all parquet readers would at least know to identify the standard geo data.

m-mohr commented on June 1, 2024

The same reasons why we want a COG- and GeoTiff-specific media type over just using "image/tiff" also apply here. If you need to (partially) read the file anyway, then you could also just omit media types completely.
Example: Without specific media types, STAC Browser would try to visualize a 100GB non-GeoTiff TIFF on a map, which is not a nice user experience and would likely crash your browser. But by knowing it's a COG I can visualize just those and reject all the others.
So +1 on having at least a geo-specific parameter...

cholmes commented on June 1, 2024

I don't feel that strongly about this either way, except that an optional parameter (like application/parquet; profile=geo or whatever) seems better to me than the geo+parquet, since then a generic parquet reader would have a better chance of actually opening it.

But I do think the time is 'now' to decide what we want, since as pointed out above the only real chance IANA seems to give for optional parameters is on registration. We can likely help the Apache people with the process, since OGC has experience working with IANA, and we can also just point them to the form to do a vnd registration - you just fill out https://www.iana.org/form/media-types. It does seem like there's precedent for a 'project steering committee' to submit for the official IANA types, with Apache Arrow, Thrift and Node all being submitted by the steering committees or a member on them. But @ogcscotts can likely help navigate the process / talk to the right people.

So seems like we should determine which direction we want to go, and if we want an optional parameter we should determine what we'd like, before engaging with them.

kylebarron commented on June 1, 2024

As @jorisvandenbossche mentioned in the meeting today, there's currently progress on Parquet getting a MIME/Media type: https://issues.apache.org/jira/browse/PARQUET-1889

m-mohr commented on June 1, 2024

Great, so application/vnd.apache.parquet is the intention, so I'll update my implementations to support both application/vnd.apache.parquet and application/x-parquet for now. Do we still intend to add a geo profile or so for GeoParquet?

kylebarron commented on June 1, 2024

If @cholmes is right in his above comment, we'd have to register a geo profile along with the current parquet registration?

m-mohr commented on June 1, 2024

Yes we would, if we go the official route. But we also never did with COG, and everyone just agreed on a de-facto standard of image/tiff; application=geotiff; profile=cloud-optimized

cholmes commented on June 1, 2024

> If @cholmes is right in his above comment, we'd have to register a geo profile along with the current parquet registration?

Yeah, if we want something listed in the official IANA then we'd need to do it. Like @m-mohr points out we can just add something on. With COG we wanted to get something registered but it basically wasn't an option.

So now is the time to try to advocate for it, if we want to. But I think we were leaning away from that, as mentioned in #115 (comment). I can try to bring it up at the next meeting, but if someone feels like we should push for a 'geo' profile then it'd be good to make the case here. I can't think of a use case where it's really essential, and it seems simpler for the file itself to be the place to figure out if it's geo or not. That also avoids the risk of a file being declared geo when it actually isn't. And I don't see a case where it'd be good to have a non-geo parquet and a geoparquet version of the same file.

m-mohr commented on June 1, 2024

Same reason as for why COGs have a profile: A client can just detect easily what it is and whether it can render it without actually loading (parts of) the file. Think STAC Browser (and ol-stac, stac-layer, ...) for example... @cholmes

tschaub commented on June 1, 2024

If a Parquet file is served with a media type that indicates that it is GeoParquet, a client cannot blindly try to render it, for example (the same is true for COG, despite what others may believe). Before deciding what to do with the contents of a Parquet file, a client would need to read the footer - this is true for geo and non-geo Parquet. After reading the footer, you can see if it has the geo metadata.

@m-mohr - can you provide more specific examples of what a client like STAC Browser would do if it knew that a Parquet file was GeoParquet? If the answer is that it would display the geo-specific metadata, then this is going to require reading the footer of the file - which you can safely do for a non-geo Parquet file as well (and you might want to do anyway to show the user something about the data).
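
To illustrate why reading the footer is cheap: a Parquet file ends with a 4-byte little-endian footer length followed by the 4-byte magic PAR1, so a client can fetch the last 8 bytes, compute the footer's byte range, and request only that range. The sketch below covers just that step. The function name is hypothetical, and decoding the Thrift-encoded footer to look for the "geo" metadata key is left to a Parquet library.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_byte_range(tail: bytes, file_size: int) -> tuple[int, int]:
    """Return the (start, end) offsets of the footer metadata, end exclusive.

    `tail` must be the last 8 bytes of the file: a 4-byte little-endian
    footer length followed by the 4-byte magic.
    """
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a plain Parquet file (bad trailing magic)")
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = file_size - 8
    start = end - footer_len
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the file")
    return start, end


# Fake file layout for illustration: magic, 10 data bytes, 5 footer bytes.
fake_file = PARQUET_MAGIC + b"x" * 10 + b"F" * 5 + struct.pack("<I", 5) + PARQUET_MAGIC
start, end = footer_byte_range(fake_file[-8:], len(fake_file))
print(fake_file[start:end])
```

Over HTTP this amounts to two small range requests, and it works the same for geo and non-geo Parquet files.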

m-mohr commented on June 1, 2024

It's all about giving users the nicest user experience without loading a whole lot of headers upfront. Thinking more about it, it might be more relevant for COGs than it is for GeoParquet files though.

Example:
Having a STAC file with 12 COGs, I'd like to display a default COG for visualization purposes. If I'd just have image/tiff or the geotiff equivalent as media type, I'd need to read all headers of 12 COG files to know which I might be able to render. Having a specific media type I know upfront whether I can or can not render the 12 COG files. I know there are edge cases, but by default STAC Browser indeed tries to blindly render it, which works in many cases. But it could also be just to differentiate the buttons: "Download" or "Show on map". It's a small UX improvement.
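
A rough sketch of that button logic, assuming each asset is a dict with a "type" field and that the de-facto COG media type quoted in this thread is the only one treated as renderable. The function names and asset keys are hypothetical, and this is not STAC Browser's actual code.

```python
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def media_type_key(media_type: str) -> tuple[str, frozenset[str]]:
    """Normalize a media type so spacing, case and parameter order don't matter."""
    base, *params = ("".join(part.split()).lower() for part in media_type.split(";"))
    return base, frozenset(p for p in params if p)


def button_label(asset: dict) -> str:
    """Pick a button from the declared media type alone, without opening the file."""
    declared = media_type_key(asset.get("type", ""))
    return "Show on map" if declared == media_type_key(COG_MEDIA_TYPE) else "Download"


# Hypothetical assets, for illustration only.
assets = {
    "visual": {"type": "image/tiff; application=geotiff; profile=cloud-optimized"},
    "raw": {"type": "image/tiff"},
    "thumbnail": {"type": "image/png"},
}
labels = {name: button_label(asset) for name, asset in assets.items()}
print(labels)
```

The decision uses only the declared type, so a mislabeled asset would still get the wrong button, which is the edge case raised earlier in the thread.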

cholmes commented on June 1, 2024

Looks like there's now a parquet media type, see #115

Search 'parquet' on https://www.iana.org/assignments/media-types/media-types.xhtml

application/vnd.apache.parquet

I think they could have gotten application/parquet pretty easily, but this does seem consistent with the other apache ones.

I'm going to go ahead and make a PR without a 'geo' parquet media type - we can revisit and add it later if there is a lot of value.

I do wonder if there's a 'hint' we could give in STAC, for 'show on map'. I also do think it's not crazy to try to blindly render, as most parquet in STAC will likely be geoparquet.
