julianpeeters commented on June 17, 2024

Hi,

Thanks again for poking at this.

The expectation appears to be that null fields are written (see the Avro docs), and I don't see a way to get Avro to write fields selectively. If I've missed something, please feel free to elaborate on why you would expect the null fields to be omitted.

Cheers,
Julian

from avro-scala-macro-annotations.

sutram commented on June 17, 2024

The use case for this request is that we use Avro for over-the-wire data transport, and it is a little more network-efficient if we can exclude null fields instead of keeping them in the payload. This is especially true when the Avro record has a lot of null fields.

But you are right: I don't see a way to exclude null fields using Avro either, so the logic for doing it would have to be written as part of converting the case class to an Avro record.

julianpeeters commented on June 17, 2024

tl;dr:
Good idea, but out of scope. I strongly encourage the experiment, and I'm curious about the resulting system.

The good news: the penalty for an optional null is only 1 byte (the union branch index, which records that the null branch of the union was written) plus the computation to write and read that byte.
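To make the 1-byte figure concrete: in Avro's binary encoding a union value is written as its zero-based branch index (a zigzag-encoded variable-length long) followed by the value, and null itself encodes as zero bytes. The sketch below hand-rolls that encoding for a hypothetical `["null", "int"]` field in plain Scala, without the Avro library; `UnionCost` and its helpers are illustrative names, not Avro's API.

```scala
import scala.collection.mutable.ArrayBuffer

object UnionCost {
  // Zigzag mapping used by Avro for int/long values: 0 -> 0, -1 -> 1, 1 -> 2, ...
  def zigzag(n: Long): Long = (n << 1) ^ (n >> 63)

  // Variable-length encoding: 7 payload bits per byte, high bit set while more bytes follow.
  def varint(v: Long): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    var x = v
    while ((x & ~0x7FL) != 0L) {
      out += ((x & 0x7FL) | 0x80L).toByte
      x = x >>> 7
    }
    out += x.toByte
    out.toArray
  }

  def encodeLong(n: Long): Array[Byte] = varint(zigzag(n))

  // Hypothetical encoder for an Option[Int] field whose schema is ["null", "int"].
  def encodeOptionalInt(o: Option[Int]): Array[Byte] = o match {
    case None    => encodeLong(0)                         // branch 0 (null); the null value adds no bytes
    case Some(i) => encodeLong(1) ++ encodeLong(i.toLong) // branch 1 (int), then the int itself
  }

  def main(args: Array[String]): Unit = {
    println(encodeOptionalInt(None).length)       // 1
    println(encodeOptionalInt(Some(5)).length)    // 2
    println(encodeOptionalInt(Some(1000)).length) // 3
  }
}
```

The cost sits in the branch index rather than in a null marker: for a two-branch union either index fits in a single byte, so a present value pays the same 1 byte on top of its own encoding.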

The bad news: I like your idea, but I can think of only two ways to implement it (although neither requires macros):

A) Pre-filter the data into case classes without the Options (duh)
B) Generate the record's possible subschemas, set up a schema registry (of some sort), wrap the messages in (key, avro), and manage schemas on a per-record basis (records that had a field with a value of None would be written with a schema that lacked that field). NOTE: THIS BUG NEEDS TO BE FIXED IN ORDER FOR THIS APPROACH TO WORK.

A major problem with "B" is that using a registry obviously slows things down; further, each optional field doubles the number of schemas that need to be stored in the registry. As few as 10 optional fields (2^10 = 1024 subschemas) would swamp the Kafka schema registry, for example, whose default max is set at 1000 for performance reasons.
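The doubling in plan B can be checked in a few lines: each optional field is either present or absent in a given subschema, so n optional fields yield 2^n field layouts. The sketch below enumerates them for a hypothetical field list (`FieldSpec` is an illustrative type, not part of any library) and finds the smallest n whose layout count exceeds a limit such as the 1000 mentioned above.

```scala
object Subschemas {
  // Hypothetical description of a record field: its name and whether it is optional.
  final case class FieldSpec(name: String, optional: Boolean)

  // Every field layout a writer might use, preserving the original field order.
  def layouts(fields: List[FieldSpec]): List[List[String]] =
    fields.foldLeft(List(List.empty[String])) { (acc, f) =>
      if (f.optional) acc.flatMap(s => List(s, s :+ f.name)) // layout without and with the field
      else acc.map(_ :+ f.name)                               // required fields are always present
    }

  // Smallest number of optional fields whose 2^n layouts exceed `limit`.
  def optionalFieldsToExceed(limit: Long): Int =
    Iterator.from(0).find(n => (1L << n) > limit).get

  def main(args: Array[String]): Unit = {
    val myRecord = List(FieldSpec("i", optional = true), FieldSpec("j", optional = false))
    println(layouts(myRecord))            // List(List(j), List(i, j))
    val ten = (1 to 10).map(k => FieldSpec(s"f$k", optional = true)).toList
    println(layouts(ten).size)            // 1024
    println(optionalFieldsToExceed(1000)) // 10
  }
}
```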

Upon further inspection, plan B won't work in Avro. A class's fields and the writer's schema must evolve in tandem. Take for example

MyRecord(i=None, j=0)

Without altering Avro itself, the datum to be written is {"i": null, "j": 0}
If one tries to write that datum with a schema that excludes the first field,

{"type":"record","name":"MyRecord","namespace":"com.example","doc":"Auto-generated schema","fields":[{"name":"j","type":"int"}]}

then the DatumWriter thinks there's only one field, and therefore only tries to get the first field from the datum. Since the datum holds a null at that position, the DatumWriter chokes, expecting an int.
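The failure comes from positional lookup: the writer walks the schema's fields in order and fetches field k from slot k of the datum. The toy model below reproduces that with MyRecord's datum laid out as [null, 0]. It is a hypothetical sketch that mimics the lookup; it does not call Avro's `DatumWriter`, and the type names `"int"` and `"optional-int"` are stand-ins chosen for illustration.

```scala
object PositionalWrite {
  // Hypothetical toy schema: ordered (fieldName, typeName) pairs.
  // "int" accepts an Int; "optional-int" stands in for the union ["null", "int"].
  type Schema = List[(String, String)]

  // Field k of the schema is read from slot k of the datum, as with positional record access.
  def write(schema: Schema, datum: IndexedSeq[Any]): Either[String, List[String]] = {
    val results = schema.zipWithIndex.map { case ((name, tpe), pos) =>
      (tpe, datum(pos)) match {
        case ("int", v: Int)          => Right(s"$name=$v")
        case ("optional-int", null)   => Right(s"$name=null")
        case ("optional-int", v: Int) => Right(s"$name=$v")
        case (t, v)                   => Left(s"field '$name' expects $t but slot $pos holds $v")
      }
    }
    results.collectFirst { case Left(err) => err }.toLeft(results.collect { case Right(ok) => ok })
  }

  def main(args: Array[String]): Unit = {
    val datum = Vector[Any](null, 0) // MyRecord(i = None, j = 0) as {"i": null, "j": 0}
    println(write(List("i" -> "optional-int", "j" -> "int"), datum)) // Right(List(i=null, j=0))
    println(write(List("j" -> "int"), datum)) // Left(field 'j' expects int but slot 0 holds null)
  }
}
```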

I wish I remembered where I read something to the effect that the encoding/evolution strategies of Thrift and Protobuf are better than Avro's for this kind of thing (but this kind of thing only :) ).

Cheers

FelixGV commented on June 17, 2024

This is an interesting topic...

Scala encourages Options because (or maybe that's just my flawed understanding) it considers null (and the accompanying null checks) to be an anti-pattern. That being said, Scala actually has first-class nulls as well, presumably for smoother interop with Java. The end result is that in Scala, an Option[x] can actually be one of three things: None, Some(x) or Some(null).

There is an argument to be made that Some(null) is an anti-pattern and that avro-scala-macro should just convert it to None. But since Scala actually supports the three different states for any given Option, it's entirely possible that some people might want to represent all of these states...

Oh well!

-F
On Wed, Jun 10, 2015 at 21:33 Julian Peeters wrote:

Closed #15.
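For a reference-typed field, the three states Felix lists, and the option of converting Some(null) to None, can be sketched as follows. `OptionStates` and `normalize` are hypothetical names, not part of avro-scala-macro-annotations; the one standard-library behaviour relied on is that `Option.apply` returns `None` for a null argument, whereas `Some.apply` does not.

```scala
object OptionStates {
  // The three states of an Option over a reference type such as String.
  val absent: Option[String]      = None
  val present: Option[String]     = Some("x")
  val presentNull: Option[String] = Some(null)

  // Collapse Some(null) into None; None and Some(nonNull) pass through unchanged.
  def normalize[A](o: Option[A]): Option[A] = o.flatMap(a => Option(a))

  def main(args: Array[String]): Unit = {
    println(presentNull.isDefined)  // true
    println(normalize(presentNull)) // None
    println(normalize(present))     // Some(x)
  }
}
```

Applying such a normalization maps cleanly onto Avro's null branch, but it discards the distinction between None and Some(null), which is exactly the trade-off described above.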


julianpeeters commented on June 17, 2024

Hi Felix,

I get the same impression that null is thought of as an anti-pattern, and it looks like Doug Cutting is in that camp (http://apache-avro.679487.n3.nabble.com/why-avro-has-a-special-type-NULL-tc1469368.html).

Can Some(null) really exist as an Option[AnythingButNull]?

When I try I get:
an expression of type Null is ineligible for implicit conversion
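The answer appears to depend on whether the type parameter is a reference type. A minimal sketch, assuming Scala 2.x semantics without explicit nulls; the commented-out line is the kind of declaration that triggers the error quoted above, because Null is not a subtype of value types such as Int.

```scala
object SomeNullByType {
  val str: Option[String] = Some(null)              // compiles: String is a reference type
  val boxed: Option[java.lang.Integer] = Some(null) // compiles: java.lang.Integer is a reference type
  // val prim: Option[Int] = Some(null)             // rejected: Int is a value type, so null is not an Int

  def main(args: Array[String]): Unit = {
    println(str)   // Some(null)
    println(boxed) // Some(null)
  }
}
```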

FelixGV commented on June 17, 2024

Thanks for the link. It does seem Doug is in that camp, indeed.

I'm pretty sure I've already pattern-matched against nulls, but perhaps I never pattern-matched against Some(null)... I'm not sure now. I haven't used Scala in more than a year, so my memory is fuzzy :3

-F
(In reply to Julian Peeters' comment of Thu, Jun 11, 2015 at 00:33.)

julianpeeters commented on June 17, 2024

I don't doubt that you have! From what I understand, there are some other contexts where Some(null) does work... ah Scala.

julianpeeters commented on June 17, 2024

Relevant: FieldAssembler (https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaBuilder.FieldAssembler.html)

Nullable = type: [int, null], default: int
Optional = type: [null, int], default: null
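The two shapes differ only in branch order, and the order matters because the Avro 1.7.x specification requires a union field's default to match the union's first branch. The toy model below is hypothetical and does not use `SchemaBuilder`: it checks that rule for both shapes and decodes a written branch index back into an `Option[Int]`, which works the same way for either ordering.

```scala
object UnionShapes {
  sealed trait Branch
  case object NullB extends Branch
  case object IntB  extends Branch

  sealed trait Default
  case object NoDefault   extends Default
  case object NullDefault extends Default
  final case class IntDefault(value: Int) extends Default

  final case class UnionField(name: String, branches: List[Branch], default: Default)

  // The default, if any, must match the first branch of the union.
  def defaultIsValid(f: UnionField): Boolean = (f.branches.headOption, f.default) match {
    case (_, NoDefault)              => true
    case (Some(NullB), NullDefault)  => true
    case (Some(IntB), IntDefault(_)) => true
    case _                           => false
  }

  // Decode a written (branchIndex, value) pair; both orderings yield an Option[Int].
  def read(f: UnionField, branchIndex: Int, value: Any): Option[Int] =
    f.branches(branchIndex) match {
      case NullB => None
      case IntB  => Some(value.asInstanceOf[Int])
    }

  val nullable = UnionField("a", List(IntB, NullB), IntDefault(0)) // roughly like nullableInt("a", 0)
  val optional = UnionField("b", List(NullB, IntB), NullDefault)   // roughly like optionalInt("b")

  def main(args: Array[String]): Unit = {
    println(defaultIsValid(nullable)) // true
    println(defaultIsValid(optional)) // true
    println(defaultIsValid(UnionField("c", List(IntB, NullB), NullDefault))) // false
    println(read(nullable, 1, null))  // None
    println(read(optional, 1, 7))     // Some(7)
  }
}
```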

FelixGV commented on June 17, 2024

Ah yeah... Avro can only have a default of the first type in a union... Not sure if there is really any practical implication to that or if it's just an implementation quirk of Avro.

IOW, both a nullable and an optional should probably translate to an Option in Scala.
(In reply to Julian Peeters' comment of Fri, Jun 12, 2015 at 16:42.)
