
avro-scala-macro-annotations

Herein lie assorted macro annotations for working with Avro in Scala:

  1. @AvroTypeProvider("path/to/schema") - Convert Avro Schemas to Scala case class definitions for use in your favorite Scala Avro runtime.

  2. @AvroRecord - Use Scala case classes to represent your Avro SpecificRecords, serializable by the Apache Avro runtime (a port of Avro-Scala-Compiler-Plugin).

Macros are an experimental feature of Scala. Avrohugger is a more traditional alternative.

Get the dependency:

For Scala 2.11.x and 2.12.x (for Scala 2.10.x please use version 0.4.9 with sbt 0.13.8+):

    libraryDependencies += "com.julianpeeters" % "avro-scala-macro-annotations_2.11" % "0.11.1"

Macro annotations are only available in Scala 2.10.x, 2.11.x, and 2.12.x with the macro paradise plugin. They may be included in official Scala in 2.13 (see the official docs). To use the plugin, add the following to your build.sbt:

    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)

In your IDE of choice you may have to explicitly load this compiler plugin. In Eclipse, for example, you can do so by providing the full path to the plugin jar in the Xplugin field, found in the advanced Scala compiler preferences; the jar should be in a path like ~/.ivy2/cache/org.scalamacros/paradise_2.10.4/jars/paradise_2.10.4-2.0.1.jar.
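Putting the pieces together, a minimal build.sbt for Scala 2.11 might look like the following sketch (the Scala version and resolver are illustrative assumptions; adjust the artifact suffix, or use %%, for other Scala versions):

    // build.sbt — a minimal sketch combining the dependency and the plugin above
    scalaVersion := "2.11.12"

    resolvers += Resolver.sonatypeRepo("releases")

    // the macro annotations themselves
    libraryDependencies += "com.julianpeeters" % "avro-scala-macro-annotations_2.11" % "0.11.1"

    // macro annotations require the macro paradise compiler plugin on 2.10/2.11/2.12
    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)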

Usage:

Use the annotations separately, or together like this:

        package sample

        import com.julianpeeters.avro.annotations._

        @AvroTypeProvider("data/input.avro")
        @AvroRecord
        case class MyRecord()

First the fields are added automatically from an Avro Schema in a file, then the methods necessary for de/serialization are generated for you, all at compile time. Please see warnings below.

Supported data types:

int

float

long

double

boolean

string

null

array*

map

record

union**

*Arrays are represented by List[T], where T is any other supported type.

**Optional fields of type [null, T] are represented by Option[T].

The remaining Avro types, fixed, enum, and union (beyond nullable fields), are not yet supported.
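For example, a pair of annotated case classes covering several of the supported types might look like this sketch (hypothetical record names; the fields are vars because @AvroRecord requires them):

        import com.julianpeeters.avro.annotations._

        @AvroRecord
        case class Inner(var count: Int)      // a nested record with an int field

        @AvroRecord
        case class Outer(
          var name: String,                   // string
          var score: Double,                  // double
          var flags: List[Boolean],           // array  -> List[T]
          var inner: Inner,                   // record
          var note: Option[String]            // ["null", "string"] -> Option[T]
        )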

1) Avro-Type-Provider

If your use-case is "data-first" and you're using an Avro runtime library that allows you to use Scala case classes to represent your Avro records, then you are probably a little weary of transcribing Avro Schemas into their Scala case class equivalents.

Annotate an "empty" case class, and its members will be generated automatically at compile time using the data found in the Schema of a given file:

given the schema automatically found in input.avro or input.avsc:

        {"type":"record","name":"MyRecord","namespace":"tutorial","doc":"Auto-generated schema","fields":[{"name":"x","type":{"type":"record","name":"Rec","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]},"doc":"Auto-Generated Field","default":{"i":4}}]}}

annotated empty case classes:

        import com.julianpeeters.avro.annotations._

        @AvroTypeProvider("data/input.avro")
        case class Rec()

        @AvroTypeProvider("data/input.avro")
        case class MyRecord()

expand to:

       package tutorial

       import com.julianpeeters.avro.annotations._

       @AvroTypeProvider("data/input.avro")
       case class Rec(i: Int)

       @AvroTypeProvider("data/input.avro")
       case class MyRecord(x: Rec = Rec(4))
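Since the schema supplies a default value for x, the expanded MyRecord can then be constructed with or without arguments:

        val r1 = MyRecord()        // uses the default from the schema, i.e. MyRecord(Rec(4))
        val r2 = MyRecord(Rec(7))  // or pass a value explicitly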

Please note:

  1. The datafile must be available at compile time.

  2. The filepath must be a String literal.

  3. The name of the empty case class must match the record name exactly (peek at the schema in the file, if needed).

  4. The order of class definition must be such that the classes that represent the most-nested records are expanded first.

  5. A class that is doubly annotated with @AvroTypeProvider and @AvroRecord will be updated with vars instead of vals.

2) Avro-Record:

Implements SpecificRecord at compile time so you can use Scala case classes to represent Avro records (like Scalavro or Salat-Avro, but for the Apache Avro runtime so that it runs on your cluster). Since Avro-Scala-Compiler-Plugin doesn't work with Scala 2.10+ and the compiler still stumps me, I ported the serialization essentials over to use Scala Macro Annotations instead.

Now you can annotate a case class that you'd like to have serve as your Avro record:

        package sample

        @AvroRecord
        case class A(var i: Int)

        @AvroRecord
        case class B(var a: Option[A] = None)

each expands to implement SpecificRecord, adding put, get, and getSchema methods, and a static lazy val SCHEMA$ holding the schema (shown here for B):

        {"type":"record","name":"B","namespace":"sample","doc":"Auto-generated schema","fields":[{"name":"a","type":["null",{"type":"record","name":"A","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]}],"doc":"Auto-Generated Field",default: null}]}

Use the expanded class as you would a code-gen'd class with any SpecificRecord API. e.g.:

        //Writing avros
        val datumWriter = new SpecificDatumWriter[B](B.SCHEMA$)
        val dataFileWriter = new DataFileWriter[B](datumWriter)


        //Reading avros
        val userDatumReader = new SpecificDatumReader[B](B.SCHEMA$)
        val dataFileReader = new DataFileReader[B](file, userDatumReader)
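A fuller round trip through an Avro data file might look like the following sketch (using the B record above; the file path is arbitrary, and the schema is passed explicitly, see note 1 below):

        import org.apache.avro.file.{DataFileReader, DataFileWriter}
        import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}
        import java.io.File

        val record = B(Some(A(4)))
        val file = new File("data/b.avro")

        // writing
        val datumWriter = new SpecificDatumWriter[B](B.SCHEMA$)
        val dataFileWriter = new DataFileWriter[B](datumWriter)
        dataFileWriter.create(B.SCHEMA$, file)
        dataFileWriter.append(record)
        dataFileWriter.close()

        // reading
        val datumReader = new SpecificDatumReader[B](B.SCHEMA$)
        val dataFileReader = new DataFileReader[B](file, datumReader)
        val sameRecord = dataFileReader.next()
        dataFileReader.close()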

Please note:

  1. If your framework is one that relies on reflection to get the Schema, it will fail since Scala fields are private. Therefore preempt it by passing in a Schema to DatumReaders and DatumWriters (as in the Avro example above).

  2. Fields must be vars in order to be compatible with the SpecificRecord API.

  3. A class that is doubly annotated with @AvroTypeProvider and @AvroRecord will automatically be updated with vars instead of vals.

  4. An annotatee may extend a trait (to become a mixin after expansion) but not a class, since SpecificRecordBase will need to occupy that position; see the sketch below.
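For instance, mixing in a hypothetical marker trait:

        trait Event

        @AvroRecord
        case class Clicked(var ts: Long) extends Event  // a trait mixin is fine; extending a class is not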

Contributors

davidmweber, i-am-the-slime, jarrodu, julianpeeters, rjharmon, xeno-by

Issues

Supported collection types

When I use Seq or Array, I get the following error:

java.lang.UnsupportedOperationException: Could not generate schema. Cannot support yet: Seq[String]

With List, it seems to work. Please document which collection types are supported, and if possible support Seq as well.

Can generated getSchema method always return same Schema instance?

When using @AvroRecord, the generated getSchema method returns a new instance of Schema on every call. This could be inefficient in general, and caused this specific issue for me recently:

https://groups.google.com/d/msg/confluent-platform/gkmtn2FO4Ug/IIsp8tZHT0QJ

Would it be possible to have every instance of the same case class return the same Schema instance? I think ending up with something like this would work:

@AvroRecord
case class MyMessage(var a: String, var b: Int) {
  //generated getSchema method
  def getSchema: Schema = MyMessage.schema
}

//generated companion object
object MyMessage {
  lazy val schema = new Schema.Parser().parse(${generateSchema(name.toString, namespace, indexedFields).toString})
}

This seems to be essentially what the Avro Java code generator produces (a static final Schema object). I could try taking a run at this, but I've never done anything with Scala macros before so might take awhile. Mainly just wanted to open this issue to get thoughts from others.

Adding default values in the schema?

Hi,
Is adding default values supported for the @AvroRecord annotation? Even if I set a default value for a field within a case class, it doesn't show up as a default value in the schema. Given this partial JSON:

 {"name": "platform_deviceid", "type": "string"}

It would be great if I set a default value for the case class field like this:

var platform_deviceid: String = "" 

that turns into:

 {"name": "platform_deviceid", "type": "string", "default": ""} 

Using default values is useful during schema evolution (for example, when deleting the field shown above in future iterations of the schema).

Null valued field exclusion

Hi,
Given the following case class:

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

If I set the value of quality to None, I would expect that the value simply does not appear in the resulting JSON. Instead, it seems to appear like this:

{"quality" : null}

Is it possible to remove it from the resulting JSON?

I assume this will be a little complicated because if this is the only field (or all fields are null) in the record then most likely you will also have to recursively remove the parent record from the JSON.

Thanks.

Binary Avro representation fails

Hi,
I am trying to use the @AvroRecord annotation to create a binary Avro representation of my Scala case classes. It is failing with a null pointer exception and I can't figure out why.

Here is the code:

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.{SpecificDatumWriter, SpecificData}
import org.apache.avro.io.EncoderFactory

import java.io.ByteArrayOutputStream

@AvroRecord
case class Platform (
  var deviceId: Option[String],
  var deviceType: Option[String]
)

@AvroRecord
case class Quality (
  var bitrate: Option[Int],
  var previousBitRate: Option[Int]
)

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

@AvroRecord
case class AtomicSlice (
  var timestamp_received: Long,
  var platform: Option[Platform],
  var playback: Option[Playback]
)

object ASMain {

  def main(args: Array[String]) = {

    val platform = Platform(deviceId=scala.None, deviceType=Some("abc"))

    val as = AtomicSlice(
      timestamp_received=1234567890000L,
      platform=Some(platform),
      playback=scala.None)

    println(AtomicSlice.SCHEMA$)

    val sw = new SpecificDatumWriter[AtomicSlice]

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    sw.write(as, encoder)
    encoder.flush
    val ba = out.toByteArray
    out.close


  }
}

Here is the exception:

[error] (run-main-0) java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:87)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at com.twc.needle.domain.ASMain$.main(AtomicSlice.scala:51)
        at com.twc.needle.domain.ASMain.main(AtomicSlice.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)

Requirement of var for records

Is there any plan to eliminate the need to use var for records? I do not mind annotations or macro usage, but using vars for otherwise immutable data structures like events used for eventsourcing and such feels wrong :/

Error reading an array of schemas

In order to avoid repeating the definition of a record multiple times and to be able to reuse it across a schema, I tried defining them like this:

[
{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
},
{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "first", "type": "Bar"}
        {"name": "second", "type": "Bar"}
    ]
}
]

When I tried to use this with @AvroTypeProvider I got:
java.lang.RuntimeException: no record type found in the union from path/to/schema.avsc

The problem is that https://github.com/julianpeeters/avro-scala-macro-annotations/blob/master/macros/src/main/scala/avro/scala/macro/annotations/provider/FileParser.scala#L26 should check if x.getType == RECORD, not x itself.

That issue aside, what I wanted to suggest is that instead of returning the first schema from the union that has a record type, you could maybe return either all of them or at least the last one. That's because if I put Bar after Foo I get a parse error, as Bar is not recognised. But left as in the example, the parsed Schema object corresponding to Foo will contain multiple occurrences of Bar, so the existing lookup mechanisms should work if the last schema of the union were returned.

Type macro to Unit

Is it possible for the macros to not return a value? The sbt console is full of "discarded non-Unit value" warnings pointing at @AvroRecord.

How to use it properly?

When I looked at the examples, the case classes never take any explicit parameters.
But when I tried to use it in my code,

@AvroTypeProvider("src/main/avro/outputs/myclass.avro")
@AvroRecord
case class MyClass()

GCalendar(value1, value2)
I got too many arguments for method apply

Did I miss something? Could anyone please help?

Compile time typecheck fails on doubly+ nested types

A case class with a field of type List[Option[Int]] fails to expand.

This used to work when types were converted to Strings for the type matcher, but after moving to TypeRefs (for type safety), c.typecheck returns List[Option[...]]

How can I typecheck the whole type? Can I typecheck recursively? I've added the question on SO.

Serializing with generic writer and deserializing with specific reader

Is that possible? I'm rewriting some stuff using your library and came across this issue: when using generics for serializing and then your lib for deserializing with a specific reader, it's not grabbing an int union, but just putting a null value.

So, in this test you can see what's the deal: I have a base schema which I am expanding with your lib; I took it and added a gender field, so I can serialize using generics and then deserialize using the base one on the case class. Below is the code:

val schema2 = new Schema.Parser().parse("""{
      "namespace": "example.avro",
      "type": "record",
      "name": "Person",
      "fields": [
         {"name": "name", "type": "string"},
         {"name": "age", "type": ["int", "null"]},
         {"name": "gender", "type": ["string", "null"]}
      ]
    }""")

    val work = new GenericRecordBuilder(schema2)
      .set("name", "Jeff")
      .set("age", 23)
      .set("gender", "M")
      .build

    val writer = new GenericDatumWriter[GenericRecord](schema2)
    val baos = new ByteArrayOutputStream
    val encoder = EncoderFactory.get().binaryEncoder(baos, null);
    writer.write(work, encoder)
    encoder.flush
    println(work)

    val datumReader = new SpecificDatumReader[Person](Person.SCHEMA$);
    val binDecoder = DecoderFactory.get().binaryDecoder(baos.toByteArray, null);
    val gwork2 = datumReader.read(null, binDecoder);

    println(gwork2)
    work.get("name").toString should be(gwork2.get("name").toString)
    work.get("age") should be(gwork2.get("age"))

The outputs of the two println calls are:

{"name": "Jeff", "age": 23, "gender": "M"}
{"name": "Jeff", "age": null}

I checked namespaces and everything seems ok, what else could I be missing here?

Support AVDL files

It would be really cool to support AVDL files. JSON schema files don't allow importing other files, so they are not such a good fit for larger projects.

Namespaced case class and Namespace-less schema: Avro fails to resolve if record is part of a union

Records in namespace-less schemas are most naturally represented by case classes in the default package (i.e. no package). That's not very useful, so it's nice that Avro can resolve the record just fine if the case class is in a package when there is no namespace in the schema. However, reading and writing fail for records whose fields are unions of record types.

The issues seem to be due to the mismatch between a) the expected and actual schemas, and b) the full names of records vs specific classes. Avro tries to resolve the record found in the union but no class matches the full name.

Thus, I believe this is an Avro issue, but so far no response on the users mailing list:
http://apache-avro.679487.n3.nabble.com/Issues-reading-and-writing-namespace-less-schemas-from-namespaced-Specific-Records-tc4032092.html

Compile fails with Scala 2.10

My code compiles fine with Scala 2.11.6 but fails with Scala 2.10

Given the following code:

package com.twc.needle.domain

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.SpecificDatumWriter
import org.apache.avro.io.EncoderFactory
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericData.Record
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

import java.io.ByteArrayOutputStream

@AvroRecord
case class Platform (
  var deviceId: Option[String],
  var deviceType: Option[String]
)

@AvroRecord
case class Quality (
  var bitrate: Option[Int],
  var previousBitrate: Option[Int]
)

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

@AvroRecord
case class AtomicSlice (
  var timestamp_received: Long,
  var platform: Option[Platform],
  var playback: Option[Playback]
)

object ASMain {

  def main(args: Array[String]) = {

    val platform = Platform(deviceId=None, deviceType=Some("abc"))

    val as = AtomicSlice(
      timestamp_received=1234567890000L,
      platform=Some(platform),
      playback=None)

    println(as.toString)
    println(AtomicSlice.SCHEMA$)

    val sw = new SpecificDatumWriter[AtomicSlice](AtomicSlice.SCHEMA$)

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    sw.write(as, encoder)
    encoder.flush
    val ba = out.toByteArray
    out.close

    val reader = new GenericDatumReader[Record](AtomicSlice.SCHEMA$)

    val decoder = DecoderFactory.get().binaryDecoder(ba, null)
    val decoded = reader.read(null, decoder)

    println(decoded.toString)

  }
}

Here is the build.sbt

name := "domain-model"

version := "1.0"

//scalaVersion := "2.10.5"

resolvers += Resolver.sonatypeRepo("releases")

addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0-M5" cross CrossVersion.full)

libraryDependencies ++= Seq(
  "com.julianpeeters" % "avro-scala-macro-annotations_2.10" % "0.4"
)

Here is the error:

[info] Compiling 1 Scala source to /Users/mahesh/TWC/MaprVagrant/domain-model/target/scala-2.10/classes...
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:49: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     println(AtomicSlice.SCHEMA$)
[error]                         ^
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:51: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     val sw = new SpecificDatumWriter[AtomicSlice](AtomicSlice.SCHEMA$)
[error]                                                               ^
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:60: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     val reader = new GenericDatumReader[Record](AtomicSlice.SCHEMA$)
[error]                                                             ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 2 s, completed Jun 5, 2015 9:51:13 AM
[mahesh@trishul:~/TWC/MaprVagrant/domain-model]$

Can't parse plain text JSON schemas

How do you create the schemas in your tests? They are somehow binary. When using a plain text file with a JSON schema inside I get an exception saying: Not a data file.
My guess is that the schemas in your tests are actually serialised instances of some Avro schema object. It would be great if you could outline how to go from a plain-text file to the type of schema you are using.
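(For reference, a binary data file like the ones the annotation reads can be produced from a plain-text schema with the standard Avro API; a sketch, where the file names and the single "name" field are assumptions:)

import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import java.io.File

// parse the plain-text schema, build one record, and write a binary .avro data file
val schema = new Schema.Parser().parse(new File("user.avsc"))
val record = new GenericData.Record(schema)
record.put("name", "Ben")

val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
writer.create(schema, new File("user.avro"))
writer.append(record)
writer.close()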

Add a License file

Could you put a license on this code? Something like Apache would be great.

Support enums

Avro's SpecificData requires a Java enum. Is it even possible to annotate and expand top-level Java definitions?

What would it take to add a fixed field to this?

I am guessing you'd want to create another macro, do some checking to make sure the class has a single field (ByteBuffer or Array[Byte]), inherit from SpecificFixed, and add an annotation like @FixedSize(num) to the class.

Any ideas how to do this? I am having a harder time with this macro stuff than I thought =(

logger didn't work out so well...

Added a logger to AvroTypeProvider so that IDE users could more easily see their directory's file path, but loggers don't appear to be silenceable when a macro is published:

http://stackoverflow.com/questions/32330081/how-can-i-change-the-log-level-of-a-scala-macro

With so much attention being given to scala.meta rather than supporting macros, I don't expect that this issue will be addressed soon.

So, if the logger gets too annoying, just let me know and I'll rip it out.

Support for nesting in a more hierarchical manner

This is possibly out of the scope of this library but it would be nice to handle nested annotations as such:

{
  "type": "record",
  "name": "TestMessage",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "message", "type": "string"},
    {
      "name": "metaData",
      "type": "com.julianpeeters.example.Metadata"
    }
  ]
}

And the nested message:

{
  "type": "record",
  "name": "MetaData",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "source", "type": "string"},
    {"name": "timestamp", "type": "string"}
  ]
}

This would definitely improve the usability of the library in situations where large data models are represented. Would be glad to take up this work with some guidance on where to start looking. Perhaps a plug-in approach would be best (lots of features I'd like to use in conjunction with Avro as Scala macros take off; validation, type providers, etc)?

What am I doing wrong?

Sorry if this is not the appropriate forum, but I have some questions surrounding your library that I'm really struggling with - I'm new to Avro and Scala so forgive me if I'm missing the obvious.

I have a simple Avro schema, Asof.avsc, generated by another system, which I need to supply Avro files in. My assumption from reading your docs would be to use something along the lines of what's shown in AvroTypeProviderExample. But when I do that, I get compile errors.

Asof.avsc:

{
  "type" : "record",
  "name" : "Asof",
  "namespace" : "risk",
  "fields" : [ {
    "name" : "value",
    "type" : "string"
  } ]
}

my code:

package risk

import com.julianpeeters.avro.annotations._
import org.apache.avro.specific._
import org.apache.avro.generic._
import org.apache.avro.file._
import java.io.File


@AvroTypeProvider("src/main/avro/Asof.avsc")
@AvroRecord
case class Asof()



object AvroConverter extends App {
  println(Asof)
  val record = Asof("20160912")//compile error too many arguments to method apply


  val file = File.createTempFile("AsofTest", "avro")
  file.deleteOnExit()


  val userDatumWriter = new SpecificDatumWriter[Asof]
  val dataFileWriter = new DataFileWriter[Asof](userDatumWriter)
  dataFileWriter.create(record.getSchema(), file);//compile error cannot resolve symbol getSchema
  dataFileWriter.append(record);
  dataFileWriter.close();


  val schema = new DataFileReader(file, new GenericDatumReader[GenericRecord]).getSchema
  val userDatumReader = new SpecificDatumReader[Asof](schema)
  val dataFileReader = new DataFileReader[Asof](file, userDatumReader)
  val sameRecord = dataFileReader.next()


  println("deserialized record is the same as a new record based on the schema in the file?: " + (sameRecord == record) )


}

I'm using Scala 2.10 and the appropriate version of your lib for it - can you please advise what I am doing wrong here?

Issues reading and writing evolved records with the old schemas

Say we want to add a field to a record, then write more records with the old schema.

In Java this is possible by using the new record with the old schema, as long as you used that exact schema for reading (or else use the new schema provided it has default values that can fill in for the lack of the added field).

But something goes wrong when doing the same with records from this library.

Reference case class:

 case class MyRecord(i: Int)
 val rec = MyRecord(0)

rec gets encoded as a 0 in the byte array.

"Evolved" case class (the nullable field j has been added):

case class MyRecord(i: Int, j: Option[Int])
val rec = MyRecord(0, None)

rec gets encoded as 2 bytes: one to represent the value of the int, the other to specify which branch of the union is present (in this case the null, so no further bytes are written).

Writing a new, "evolved" record with the old schema seems to work correctly: rec appears to be converted into a 0 in the byte array, just as it did before the new field was added, and just as Java handles it.

But what works in Java fails when reading. We get {"i": 0, "j": 1} instead of {"i": 0}. A default value for the int within the option slipped in.

Will add a fix and some tests.

More complete example:

package com.example

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.SpecificDatumWriter
import org.apache.avro.specific.SpecificDatumReader
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

import java.io.ByteArrayOutputStream


@AvroRecord
case class MyRecord(var i: Int, var j: Option[Int])

object Main {

  def main(args: Array[String]) = {
     val myRecord = MyRecord(i=1, j=null)
     val subSchema = new org.apache.avro.Schema.Parser().parse("""{"type":"record","name":"MyRecord","namespace":"com.twc.needle.domain","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]}
""")
   val writer = new SpecificDatumWriter[MyRecord](subSchema)

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(myRecord, encoder)


    encoder.flush
    val ba = out.toByteArray
    out.close

    val reader = new SpecificDatumReader[MyRecord](subSchema)
    val decoder = DecoderFactory.get().binaryDecoder(ba, null)
    val decoded = reader.read(null, decoder)


    println(decoded.toString)
  }
}

Unable to compile with nested classes, referenced from different files

I noticed some very weird behaviour when an annotated class contains another annotated class (both stored in the same file), and these classes are imported/used in multiple other .scala files.

Commenting out one of the usages solves the compilation issue. So far it doesn't look very easy to come up with a minimal example...

Any ideas?

[error] Schemas.scala:81: exception during macro expansion:
[error] java.lang.UnsupportedOperationException: Cannot support yet: UserEx
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.com$julianpeeters$avro$annotations$AvroRecordMacro$$createSchema$1(AvroRecordMacro.scala:108)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$$anonfun$2.apply(AvroRecordMacro.scala:111)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$$anonfun$2.apply(AvroRecordMacro.scala:111)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.immutable.List.foreach(List.scala:318)
[error]     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.generateSchema$1(AvroRecordMacro.scala:111)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.impl(AvroRecordMacro.scala:261)
[error]   @AvroRecord
[error]    ^

Efficient serialisation of avro annotated case-classes

It looks quite natural to operate on the same case classes for simple data processing. But for operations which need the objects to be serialized (e.g. groupBy, or a shuffle in Scalding/Spark), currently with Chill+Kryo this incurs a huge overhead - some 1KB+ for every record, just because of the SCHEMA$ field.

so what should be the preferred way to (de)serialise these records?

Of course, the simplest option is to write a custom serializer for every data type :(, but it would be nice to have a generic helper to do so.
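(As a point of reference, a generic helper can be written directly against the Avro API, passing the schema explicitly so only the field data ends up in the bytes; a sketch, not a Chill integration:)

import org.apache.avro.Schema
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter, SpecificRecordBase}
import java.io.ByteArrayOutputStream

// serialize a record with a given schema to Avro binary (no schema in the payload)
def toBytes[T <: SpecificRecordBase](record: T, schema: Schema): Array[Byte] = {
  val out = new ByteArrayOutputStream()
  val encoder = EncoderFactory.get().binaryEncoder(out, null)
  new SpecificDatumWriter[T](schema).write(record, encoder)
  encoder.flush()
  out.toByteArray
}

// deserialize Avro binary back into the record type, using the same schema
def fromBytes[T <: SpecificRecordBase](bytes: Array[Byte], schema: Schema): T = {
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  new SpecificDatumReader[T](schema).read(null.asInstanceOf[T], decoder)
}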

One possibility could be to use CaseClass.unapply and apply (which don't include SCHEMA$), and then somehow feed this into Chill, something along these lines:
https://github.com/twitter/chill/blob/42da58580409885afda0f3139835d82329009ca2/chill-bijection/src/test/scala/com/twitter/chill/CustomSerializationSpec.scala#L45-L63
but I can't yet find how to make this less verbose and simpler.

One could try to use Chill-Avro serialization, but apparently it doesn't work with annotations:

// https://github.com/twitter/chill/blob/911de385658aa012121620add4889a364d408d2f/chill-avro/src/main/scala/com/twitter/chill/avro/AvroSerializer.scala
import com.twitter.chill.avro.AvroSerializer
import com.julianpeeters.avro.annotations.AvroRecord

@AvroRecord
case class Test(var test: List[String])

AvroSerializer.SpecificRecordBinarySerializer[Test]

org.apache.avro.AvroRuntimeException: java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData can not access a member of class Test with modifiers "private final"
at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:250)
at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
at org.apache.avro.specific.SpecificDatumWriter.<init>(SpecificDatumWriter.java:33)
at com.twitter.bijection.avro.SpecificAvroCodecs$.toBinary(AvroCodecs.scala:106)
at com.twitter.chill.avro.AvroSerializer$.SpecificRecordBinarySerializer(AvroSerializer.scala:37)
... 43 elided
Caused by: java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData can not access a member of class Test with modifiers "private final"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:109)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:261)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:253)
at java.lang.reflect.Field.get(Field.java:376)
at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:240)
... 47 more
