
avro-scala-macro-annotations's Issues

Can generated getSchema method always return same Schema instance?

When using @AvroRecord, the generated getSchema method returns a new instance of Schema on every call. This could be inefficient in general, and caused this specific issue for me recently:

https://groups.google.com/d/msg/confluent-platform/gkmtn2FO4Ug/IIsp8tZHT0QJ

Would it be possible to have every instance of the same case class return the same Schema instance? I think ending up with something like this would work:

@AvroRecord
case class MyMessage(var a: String, var b: Int) {
  //generated getSchema method
  def getSchema: Schema = MyMessage.schema
}

//generated companion object
object MyMessage {
  lazy val schema = new Schema.Parser().parse(${generateSchema(name.toString, namespace, indexedFields).toString})
}

This seems to be essentially what the Avro Java code generator produces (a static final Schema object). I could try taking a run at this, but I've never done anything with Scala macros before so it might take a while. Mainly I just wanted to open this issue to get thoughts from others.
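For concreteness, here is a hand-written sketch of what the proposed expansion could look like, with the macro splice replaced by a literal schema string (the schema shown is illustrative only):

import org.apache.avro.Schema

case class MyMessage(var a: String, var b: Int) {
  // every instance delegates to the companion, so the same Schema object is always returned
  def getSchema: Schema = MyMessage.schema
}

object MyMessage {
  // parsed once, on first access
  lazy val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "MyMessage", "namespace": "example",
      | "fields": [{"name": "a", "type": "string"}, {"name": "b", "type": "int"}]}""".stripMargin)
}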

Namespaced case class and Namespace-less schema: Avro fails to resolve if record is part of a union

Records in namespace-less schemas are most naturally represented by case classes in the default package (i.e. no package). That's not very useful, so it's nice that Avro can resolve the record just fine when the case class is in a package and the schema has no namespace; however, reading and writing fail for records whose fields are unions of record types.

The issues seem to be due to the mismatch between a) the expected and actual schemas, and b) the full names of records vs specific classes. Avro tries to resolve the record found in the union but no class matches the full name.

Thus, I believe this is an Avro issue, but so far no response on the users mailing list:
http://apache-avro.679487.n3.nabble.com/Issues-reading-and-writing-namespace-less-schemas-from-namespaced-Specific-Records-tc4032092.html

Add a License file

Could you put a license on this code? Something like Apache would be great.

Requirement of var for records

Is there any plan to eliminate the need to use var for records? I do not mind annotations or macro usage, but using vars for otherwise immutable data structures, like events used for event sourcing, feels wrong :/

What am I doing wrong?

Sorry if this is not the appropriate forum, but I have some questions surrounding your library that I'm really struggling with - I'm new to Avro and Scala so forgive me if I'm missing the obvious.

I have a simple Avro schema, Asof.avsc, generated by another system, so that I can supply them with Avro files in that schema. My assumption from reading your docs was that I should use something along the lines of what's shown in AvroTypeProviderExample, but when I do that I get compile errors.

Asof.avsc:

{
  "type" : "record",
  "name" : "Asof",
  "namespace" : "risk",
  "fields" : [ {
    "name" : "value",
    "type" : "string"
  } ]
}

my code:

package risk

import com.julianpeeters.avro.annotations._
import org.apache.avro.specific._
import org.apache.avro.generic._
import org.apache.avro.file._
import java.io.File


@AvroTypeProvider("src/main/avro/Asof.avsc")
@AvroRecord
case class Asof()

object AvroConverter extends App {
  println(Asof)
  val record = Asof("20160912") // compile error: too many arguments to method apply

  val file = File.createTempFile("AsofTest", "avro")
  file.deleteOnExit()

  val userDatumWriter = new SpecificDatumWriter[Asof]
  val dataFileWriter = new DataFileWriter[Asof](userDatumWriter)
  dataFileWriter.create(record.getSchema(), file) // compile error: cannot resolve symbol getSchema
  dataFileWriter.append(record)
  dataFileWriter.close()

  val schema = new DataFileReader(file, new GenericDatumReader[GenericRecord]).getSchema
  val userDatumReader = new SpecificDatumReader[Asof](schema)
  val dataFileReader = new DataFileReader[Asof](file, userDatumReader)
  val sameRecord = dataFileReader.next()

  println("deserialized record is the same as a new record based on the schema in the file?: " + (sameRecord == record))
}

I'm using Scala 2.10 and the corresponding version of your lib - can you please advise what I am doing wrong here?

Compile time typecheck fails on doubly+ nested types

A case class with a field of type List[Option[Int]] fails to expand.

This used to work when types were converted to Strings for the type matcher, but after moving to TypeRefs (for type safety), c.typecheck returns List[Option[...]].

How can I typecheck the whole type? Can I typecheck recursively? I've posted the question on SO.
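As a small sketch of recursive inspection, using runtime reflection for readability (the same pattern should apply with the macro context's universe; this is an assumed approach, not the library's actual matcher):

import scala.reflect.runtime.universe._

// flatten a type and all of its nested type arguments,
// e.g. List[Option[Int]] yields List[Option[Int]], Option[Int], Int
def flatten(tpe: Type): List[Type] = tpe match {
  case TypeRef(_, _, args) => tpe :: args.flatMap(flatten)
  case _                   => List(tpe)
}

println(flatten(typeOf[List[Option[Int]]]))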

Compile fails with Scala 2.10

My code compiles fine with Scala 2.11.6 but fails with Scala 2.10

Given the following code:

package com.twc.needle.domain

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.SpecificDatumWriter
import org.apache.avro.io.EncoderFactory
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericData.Record
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

import java.io.ByteArrayOutputStream

@AvroRecord
case class Platform (
  var deviceId: Option[String],
  var deviceType: Option[String]
)

@AvroRecord
case class Quality (
  var bitrate: Option[Int],
  var previousBitrate: Option[Int]
)

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

@AvroRecord
case class AtomicSlice (
  var timestamp_received: Long,
  var platform: Option[Platform],
  var playback: Option[Playback]
)

object ASMain {

  def main(args: Array[String]) = {

    val platform = Platform(deviceId=None, deviceType=Some("abc"))

    val as = AtomicSlice(
      timestamp_received=1234567890000L,
      platform=Some(platform),
      playback=None)

    println(as.toString)
    println(AtomicSlice.SCHEMA$)

    val sw = new SpecificDatumWriter[AtomicSlice](AtomicSlice.SCHEMA$)

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    sw.write(as, encoder)
    encoder.flush
    val ba = out.toByteArray
    out.close

    val reader = new GenericDatumReader[Record](AtomicSlice.SCHEMA$)

    val decoder = DecoderFactory.get().binaryDecoder(ba, null)
    val decoded = reader.read(null, decoder)

    println(decoded.toString)

  }
}

Here is the build.sbt

name := "domain-model"

version := "1.0"

//scalaVersion := "2.10.5"

resolvers += Resolver.sonatypeRepo("releases")

addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0-M5" cross CrossVersion.full)

libraryDependencies ++= Seq(
  "com.julianpeeters" % "avro-scala-macro-annotations_2.10" % "0.4"
)

Here is the error:

[info] Compiling 1 Scala source to /Users/mahesh/TWC/MaprVagrant/domain-model/target/scala-2.10/classes...
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:49: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     println(AtomicSlice.SCHEMA$)
[error]                         ^
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:51: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     val sw = new SpecificDatumWriter[AtomicSlice](AtomicSlice.SCHEMA$)
[error]                                                               ^
[error] /Users/mahesh/TWC/MaprVagrant/domain-model/src/main/scala/com/twc/needle/domain/AtomicSlice.scala:60: value SCHEMA$ is not a member of object com.twc.needle.domain.AtomicSlice
[error]     val reader = new GenericDatumReader[Record](AtomicSlice.SCHEMA$)
[error]                                                             ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 2 s, completed Jun 5, 2015 9:51:13 AM
[mahesh@trishul:~/TWC/MaprVagrant/domain-model]$

Type macro to Unit

Is it possible for the macros not to return a value? The sbt console is full of "discarded non-Unit value" [warn] messages at every @AvroRecord use.

What would it take to add a fixed field to this?

I am guessing you'd want to create another macro that checks the class has a single field (ByteBuffer or Array[Byte]), makes it inherit from SpecificFixed, and adds a @FixedSize(num) annotation to the class.

Any ideas how to do this? I am having a harder time with this macro stuff than I thought =(
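So the usage being proposed might look something like this (entirely hypothetical; neither annotation exists in the library yet):

// hypothetical: a single byte-array field, expanded to extend SpecificFixed
@AvroRecord
@FixedSize(16)
case class MD5(var bytes: Array[Byte])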

Support enums

Avro's SpecificData requires a Java enum. Is it even possible to annotate and expand top-level Java definitions?

Binary Avro representation fails

Hi,
I am trying to use the @AvroRecord annotation to create a binary Avro representation of my Scala case classes. It is failing with a null pointer exception and I can't figure out why.

Here is the code:

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.{SpecificDatumWriter, SpecificData}
import org.apache.avro.io.EncoderFactory

import java.io.ByteArrayOutputStream

@AvroRecord
case class Platform (
  var deviceId: Option[String],
  var deviceType: Option[String]
)

@AvroRecord
case class Quality (
  var bitrate: Option[Int],
  var previousBitRate: Option[Int]
)

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

@AvroRecord
case class AtomicSlice (
  var timestamp_received: Long,
  var platform: Option[Platform],
  var playback: Option[Playback]
)

object ASMain {

  def main(args: Array[String]) = {

    val platform = Platform(deviceId=scala.None, deviceType=Some("abc"))

    val as = AtomicSlice(
      timestamp_received=1234567890000L,
      platform=Some(platform),
      playback=scala.None)

    println(AtomicSlice.SCHEMA$)

    val sw = new SpecificDatumWriter[AtomicSlice] // note: no schema is passed to the writer here

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    sw.write(as, encoder)
    encoder.flush
    val ba = out.toByteArray
    out.close


  }
}

Here is the exception:

[error] (run-main-0) java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:87)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at com.twc.needle.domain.ASMain$.main(AtomicSlice.scala:51)
        at com.twc.needle.domain.ASMain.main(AtomicSlice.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)

logger didn't work out so well...

Added a logger to AvroTypeProvider so that IDE users could more easily see their directory's file path, but loggers don't appear to be silenceable when a macro is published:

http://stackoverflow.com/questions/32330081/how-can-i-change-the-log-level-of-a-scala-macro

With so much attention being given to scala.meta rather than supporting macros, I don't expect that this issue will be addressed soon.

So, if the logger gets too annoying, just let me know and I'll rip it out.

Support AVDL files

It would be really cool to support AVDL files. JSON schema files don't allow importing other files, so they are not such a good fit for larger projects.
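For reference, a minimal IDL sketch showing the import feature that JSON schemas lack (protocol and file names are illustrative):

@namespace("com.example")
protocol ExampleProtocol {
  import idl "Common.avdl";   // pulls in records defined elsewhere

  record Message {
    string body;
  }
}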

Serializing with generic writer and deserializing with specific reader

Is that possible? I'm rewriting some code using your library and came across this issue: when serializing with generics and then deserializing with your lib's specific reader, an int union field isn't picked up; it just ends up as a null value.

So, in the test below you can see what's going on. I have a base schema which I am expanding with your lib; I took it and added a gender field, so I can serialize using generics and then deserialize using the base one via the case class.
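For reference, the specific-record side is presumably an annotated case class generated from the base schema, something like this (the path is illustrative; the actual definition isn't shown in the issue):

@AvroTypeProvider("src/main/avro/Person.avsc") // base schema: name and age only
@AvroRecord
case class Person()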

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord, GenericRecordBuilder}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.SpecificDatumReader
import java.io.ByteArrayOutputStream

val schema2 = new Schema.Parser().parse("""{
      "namespace": "example.avro",
      "type": "record",
      "name": "Person",
      "fields": [
         {"name": "name", "type": "string"},
         {"name": "age", "type": ["int", "null"]},
         {"name": "gender", "type": ["string", "null"]}
      ]
    }""")

val work = new GenericRecordBuilder(schema2)
  .set("name", "Jeff")
  .set("age", 23)
  .set("gender", "M")
  .build

val writer = new GenericDatumWriter[GenericRecord](schema2)
val baos = new ByteArrayOutputStream
val encoder = EncoderFactory.get().binaryEncoder(baos, null)
writer.write(work, encoder)
encoder.flush
println(work)

val datumReader = new SpecificDatumReader[Person](Person.SCHEMA$)
val binDecoder = DecoderFactory.get().binaryDecoder(baos.toByteArray, null)
val gwork2 = datumReader.read(null, binDecoder)

println(gwork2)
// assumes ScalaTest Matchers are in scope, as in the original test
work.get("name").toString should be(gwork2.get("name").toString)
work.get("age") should be(gwork2.get("age"))

The outputs of the two println calls are:

{"name": "Jeff", "age": 23, "gender": "M"}
{"name": "Jeff", "age": null}

I checked namespaces and everything seems ok, what else could I be missing here?

Can't parse plain text JSON schemas

How do you create the schemas in your tests? They are somehow binary. When using a plain text file with a JSON schema inside, I get an exception saying: Not a data file.
My guess is that the schemas in your tests are actually serialised instances of some Avro schema object. It would be great if you could outline how to go from a plain-text file to the kind of schema file you are using.
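If it helps, a plain .avsc is just JSON for Schema.Parser, whereas a binary Avro data file embeds its schema in the file header; a minimal sketch of producing such a file (schema and path are illustrative):

import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
import java.io.File

object WriteDataFile extends App {
  val schema = new Schema.Parser().parse(
    """{"type": "record", "name": "Example", "fields": [{"name": "value", "type": "string"}]}""")
  val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
  writer.create(schema, new File("Example.avro")) // the header embeds the schema, even with no records appended
  writer.close()
}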

Supported collection types

When I use Seq or Array, I get the following error:

java.lang.UnsupportedOperationException: Could not generate schema. Cannot support yet: Seq[String]

With List, it seems to work. Please document which collection types are supported, and if possible support Seq as well.
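To make the current behaviour concrete (based on the error above):

@AvroRecord
case class WorksFine(var xs: List[String])   // List is supported

// fails at macro expansion with "Cannot support yet: Seq[String]":
// @AvroRecord
// case class Fails(var xs: Seq[String])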

Unable to compile with nested classes, referenced from different files

I noticed some very weird behaviour when an annotated class contains another annotated class (both stored in the same file), and these classes are imported/used in multiple other .scala files.

Commenting out one of the usages solves the compilation issue. So far it doesn't look very easy to come up with a minimal example...

Any ideas?
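The shape of the setup is roughly this (class names are illustrative, except UserEx which appears in the trace below):

// Schemas.scala -- both annotated classes live in one file
@AvroRecord
case class UserEx(var name: String)

@AvroRecord
case class EventEx(var user: UserEx)

// UsageA.scala and UsageB.scala both import and use EventEx;
// with two (or more) usage sites the expansion fails as shown below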

[error] Schemas.scala:81: exception during macro expansion:
[error] java.lang.UnsupportedOperationException: Cannot support yet: UserEx
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.com$julianpeeters$avro$annotations$AvroRecordMacro$$createSchema$1(AvroRecordMacro.scala:108)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$$anonfun$2.apply(AvroRecordMacro.scala:111)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$$anonfun$2.apply(AvroRecordMacro.scala:111)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.immutable.List.foreach(List.scala:318)
[error]     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.generateSchema$1(AvroRecordMacro.scala:111)
[error]     at com.julianpeeters.avro.annotations.AvroRecordMacro$.impl(AvroRecordMacro.scala:261)
[error]   @AvroRecord
[error]    ^

How to use it properly?

When I looked at the examples, the case classes never take any explicit parameters.
But when I tried to use it in my code,

@AvroTypeProvider("src/main/avro/outputs/myclass.avro")
@AvroRecord
case class MyClass()

GCalendar(value1, value2)
I got: too many arguments for method apply

Did I miss something? Could anyone please help?

Null valued field exclusion

Hi,
Given the following case class:

@AvroRecord
case class Playback (
  var quality: Option[Quality]
)

If I set the value of quality to None, I would expect that the value simply does not appear in the resulting JSON. Instead, it appears like this:

{"quality" : null}

Is it possible to remove it from the resulting JSON?

I assume this will be a little complicated because if this is the only field (or all fields are null) in the record then most likely you will also have to recursively remove the parent record from the JSON.

Thanks.

Adding default values in the schema?

Hi,
Is adding default values supported with the @AvroRecord annotation? Even if I set a default value for a field in a case class, it doesn't show up as a default value in the schema. Given this partial JSON:

 {"name": "platform_deviceid", "type": "string"}

It would be great if I set a default value for the case class field like this:

var platform_deviceid: String = "" 

that turns into:

 {"name": "platform_deviceid", "type": "string", "default": ""} 

Using default values is useful during schema evolution (for example, when deleting the field shown above in future iterations of the schema)

Issues reading and writing evolved records with the old schemas

Say we want to add a field to a record, then write more records with the old schema.

In Java this is possible by writing the new record with the old schema, as long as you use that exact schema for reading (or else use the new schema, provided it has default values that can fill in for the missing field).

But something goes wrong when doing the same with records from this library.

Reference case class:

 case class MyRecord(i: Int)
 val rec = MyRecord(0)

rec gets encoded as a 0 in the byte array.

"Evolved" case class (the nullable field j has been added):

case class MyRecord(i: Int, j: Option[Int])
val rec = MyRecord(0, None)

rec gets encoded as 2 bytes: one to represent the value of the int, the other to select which member of the union is present (in this case null, so no further bytes are written).

Writing a new, "evolved" record with the old schema seems to work correctly: rec appears to be encoded as a 0 in the byte array, just as it was before the new field was added, and just as Java handles it.

But what works in Java fails here when reading. We get {"i": 0, "j": 1} instead of {"i": 0}. A default value for the int within the option slipped in.

Will add a fix and some tests.

More complete example:

package com.example

import com.julianpeeters.avro.annotations._

import org.apache.avro.specific.SpecificDatumWriter
import org.apache.avro.specific.SpecificDatumReader
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

import java.io.ByteArrayOutputStream


@AvroRecord
case class MyRecord(var i: Int, var j: Option[Int])

object Main {

  def main(args: Array[String]) = {
    val myRecord = MyRecord(i = 1, j = null)
    val subSchema = new org.apache.avro.Schema.Parser().parse(
      """{"type":"record","name":"MyRecord","namespace":"com.twc.needle.domain","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]}""")
    val writer = new SpecificDatumWriter[MyRecord](subSchema)

    val out = new java.io.ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(myRecord, encoder)

    encoder.flush
    val ba = out.toByteArray
    out.close

    val reader = new SpecificDatumReader[MyRecord](subSchema)
    val decoder = DecoderFactory.get().binaryDecoder(ba, null)
    val decoded = reader.read(null, decoder)

    println(decoded.toString)
  }
}

Efficient serialisation of avro annotated case-classes

It seems quite natural to operate on the same case classes for simple data processing. For operations which need the objects to be serialized (e.g. groupBy, shuffle in Scalding/Spark), the current route via Chill+Kryo incurs huge overhead - some 1 KB+ per record, just because of the SCHEMA$ field.

So what should be the preferred way to (de)serialise these records?

Of course the simplest option is to write a custom serializer for every data type :(, but it would be nice to have a generic helper for this.

One possibility could be to use the case class's unapply and apply (which don't include SCHEMA$), and then somehow feed this into Chill, something along these lines:
https://github.com/twitter/chill/blob/42da58580409885afda0f3139835d82329009ca2/chill-bijection/src/test/scala/com/twitter/chill/CustomSerializationSpec.scala#L45-L63
but I can't yet see how to make this less verbose and simpler.
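As an alternative to going through apply/unapply, here is a rough, untested sketch of a generic Kryo Serializer that just delegates to Avro binary encoding, so Kryo never walks the object graph and never sees SCHEMA$ (assumes @AvroRecord classes extend SpecificRecordBase):

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.avro.Schema
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter, SpecificRecordBase}
import java.io.ByteArrayOutputStream

// serialize/deserialize the record as its Avro binary form inside Kryo
class AvroBinarySerializer[T <: SpecificRecordBase](schema: Schema) extends Serializer[T] {
  private val datumWriter = new SpecificDatumWriter[T](schema)
  private val datumReader = new SpecificDatumReader[T](schema)

  override def write(kryo: Kryo, output: Output, record: T): Unit = {
    val baos = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(baos, null)
    datumWriter.write(record, encoder)
    encoder.flush()
    val bytes = baos.toByteArray
    output.writeInt(bytes.length, true) // length-prefix the Avro payload
    output.writeBytes(bytes)
  }

  override def read(kryo: Kryo, input: Input, clazz: Class[T]): T = {
    val bytes = input.readBytes(input.readInt(true))
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    datumReader.read(null.asInstanceOf[T], decoder)
  }
}

Registering it is still one line per type, e.g. kryo.register(classOf[Test], new AvroBinarySerializer[Test](Test.SCHEMA$)), but the helper itself is generic.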

One could try to use chill-avro serialization, but apparently it doesn't work with the annotations:

// https://github.com/twitter/chill/blob/911de385658aa012121620add4889a364d408d2f/chill-avro/src/main/scala/com/twitter/chill/avro/AvroSerializer.scala
import com.twitter.chill.avro.AvroSerializer
import com.julianpeeters.avro.annotations.AvroRecord

@AvroRecord
case class Test(var test: List[String])

AvroSerializer.SpecificRecordBinarySerializer[Test]

org.apache.avro.AvroRuntimeException: java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData can not access a member of class Test with modifiers "private final"
  at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:250)
  at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
  at org.apache.avro.specific.SpecificDatumWriter.<init>(SpecificDatumWriter.java:33)
  at com.twitter.bijection.avro.SpecificAvroCodecs$.toBinary(AvroCodecs.scala:106)
  at com.twitter.chill.avro.AvroSerializer$.SpecificRecordBinarySerializer(AvroSerializer.scala:37)
  ... 43 elided
Caused by: java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData can not access a member of class Test with modifiers "private final"
  at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:109)
  at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:261)
  at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:253)
  at java.lang.reflect.Field.get(Field.java:376)
  at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:240)
  ... 47 more

Support for nesting in a more hierarchical manner

This is possibly out of the scope of this library, but it would be nice to handle nested records declared like this:

{
  "type": "record",
  "name": "TestMessage",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "message", "type": "string"},
    {
      "name": "metaData",
      "type": "com.julianpeeters.example.Metadata"
    }
  ]
}

And the nested message:

{
  "type": "record",
  "name": "MetaData",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "source", "type": "string"},
    {"name": "timestamp", "type": "string"}
  ]
}

This would definitely improve the usability of the library in situations where large data models are represented. I'd be glad to take up this work with some guidance on where to start looking. Perhaps a plug-in approach would be best (there are lots of features I'd like to use in conjunction with Avro as Scala macros take off: validation, type providers, etc.)?
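For concreteness, the hope would be that the two schemas above could back two annotated case classes, with the nested record resolved across files (paths are illustrative; this is the desired outcome, not current behaviour):

@AvroTypeProvider("src/main/avro/MetaData.avsc")
@AvroRecord
case class MetaData()

@AvroTypeProvider("src/main/avro/TestMessage.avsc") // references com.julianpeeters.example.MetaData
@AvroRecord
case class TestMessage()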

Error reading an array of schemas

In order to avoid repeating the definition of a record multiple times, and to be able to reuse it across a schema, I tried defining the records like this:

[
{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
},
{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "first", "type": "Bar"}
        {"name": "second", "type": "Bar"}
    ]
}
]

When I tried to use this with @AvroTypeProvider I got:
java.lang.RuntimeException: no record type found in the union from path/to/schema.avsc

The problem is that https://github.com/julianpeeters/avro-scala-macro-annotations/blob/master/macros/src/main/scala/avro/scala/macro/annotations/provider/FileParser.scala#L26 should check if x.getType == RECORD, not x itself.

That issue aside, what I wanted to suggest is that instead of returning the first schema in the union that has a record type, you could maybe return either all of them or at least the last one. That's because if I put Bar after Foo I get a parse error, as Bar is not recognised. But left as in the example, the parsed Schema object corresponding to Foo will contain multiple occurrences of Bar, so the existing finding mechanism should work if the last schema of the union were returned.
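A sketch of the suggested check, which also covers returning all of the record schemas rather than just the first (this is one reading of the fix, not the library's actual code):

import org.apache.avro.Schema
import scala.collection.JavaConverters._

// keep every schema in the union whose *type* is RECORD, rather than comparing the schema itself
def recordSchemas(union: Schema): List[Schema] =
  union.getTypes.asScala.filter(_.getType == Schema.Type.RECORD).toList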
