
Comments (4)

ericphanson commented on June 16, 2024

Just below, they suggest namespacing the value passed there: https://arrow.apache.org/docs/format/Columnar.html#extension-types. That bit reads to me as if it is not a placeholder, and the customization is in the value (not the key).

from arrow-julia.

DrChainsaw commented on June 16, 2024

But the metadata is a dict, so the namespacing they suggest would be pointless if applied only to the values: two entries under the same key cannot coexist, and the later one overwrites the earlier one.


ericphanson commented on June 16, 2024

Maybe we can check against another implementation by seeing the metadata produced by an extension type there, e.g. following https://arrow.apache.org/docs/python/generated/pyarrow.ExtensionType.html#pyarrow.ExtensionType.extension_name.

It doesn't seem clear whether a column can have more than one extension type, though. It could be that there's only one key on purpose, so that different implementations can share that key to define an extension to the arrow spec overall (e.g. if we all agree what a foo is, we define an extension name for that, serialize that metadata, and then read it in as a foo when possible). Which maybe then means your suggestion is that arrow-julia shouldn't be using "extension types" specifically for metadata that is only used by that implementation, and should use other keys for that.
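To make the "foo" idea concrete, here is a minimal sketch in plain Python (no Arrow library involved). The key `ARROW:extension:name` is the reserved key named in the columnar format spec; everything else (`REGISTRY`, `register`, `read_column`, and the names `example.foo` and `other.bar`) is hypothetical and only illustrates one shared key whose namespaced value selects a type, with a fallback to the storage values when the reader does not know the name.

```python
# Reserved key from the Arrow columnar format spec.
EXTENSION_NAME_KEY = "ARROW:extension:name"

# Hypothetical registry: namespaced extension name -> converter function.
REGISTRY = {}


def register(name, convert):
    """Associate a namespaced extension name with a converter."""
    REGISTRY[name] = convert


def read_column(storage_values, field_metadata):
    """Convert storage values if the extension name is known, else pass them through."""
    name = field_metadata.get(EXTENSION_NAME_KEY)
    convert = REGISTRY.get(name)
    if convert is None:
        # Unknown or absent extension: fall back to the plain storage values.
        return list(storage_values)
    return [convert(v) for v in storage_values]


# All readers that agree on what "example.foo" means register the same name.
register("example.foo", lambda v: ("foo", v))

known = read_column([1, 2], {EXTENSION_NAME_KEY: "example.foo"})
unknown = read_column([1, 2], {EXTENSION_NAME_KEY: "other.bar"})
plain = read_column([1, 2], {})
```

Under this reading, the namespace lives in the value, so two independent projects can both use the same key without stepping on each other's type names.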

Would appreciate any feedback from someone who understands the spec better.


DrChainsaw commented on June 16, 2024

After searching the arrow repo for arrow:extension, it seems like you might be right. Here it is defined as a const in the cpp code, for example, and I could not find any trace of it being manipulated or changed anywhere (which would be quite a strange thing to do as well).

The c code seems to accept the metadata as a vector of pairs though, so it would in theory allow for multiple identical keys, but Python, Java and Julia use dicts, so there is no way to express that through any of them.
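A small plain-Python illustration of that difference, assuming metadata is modelled either as a list of key/value pairs or as a dict (the values `example.foo` and `example.bar` are made up for the example):

```python
# A list of pairs can carry the same key twice.
pairs = [
    ("ARROW:extension:name", "example.foo"),
    ("ARROW:extension:name", "example.bar"),
]

# Converting to a dict keeps only one entry per key; in Python the last one wins.
as_dict = dict(pairs)

print(len(pairs))    # number of entries in the pair representation
print(len(as_dict))  # number of entries that survive in the dict
```

So even if one representation tolerates duplicate keys, they cannot round-trip through a dict-based reader.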

It is a bit unclear to me what the ExtensionType stuff does in arrow and what one gets for buying into it. However, most things point to it being a mechanism for your foo example.

I'm closing the issue now since my initial understanding of the extension metadata was incorrect and I don't think any action is needed here. Just reopen or open another issue if there is a point about late conversion stuff (e.g. String <-> Symbol) not fitting into the definition of extensions.

