#avro-scala-macro-annotations

##Herein lie assorted macro annotations for working with Avro in Scala:

  1. @AvroTypeProvider("path/to/schema") - Automatically convert Avro Schemas to Scala case class definitions for use in your favorite Scala Avro runtime.

  2. @AvroRecord - Use Scala classes to represent your Avro records, serializable by the Apache Avro runtime (a port of Avro-Scala-Compiler-Plugin).

Get the dependency for 2.11.x (for Scala 2.10.5 please use version 0.4):

    libraryDependencies += "com.julianpeeters" % "avro-scala-macro-annotations_2.11" % "0.5"

Macro annotations are only available in Scala 2.10.x and 2.11.x with the macro paradise plugin. Their inclusion in official Scala might happen in Scala 2.12 (see the official docs). To use the plugin, add the following to build.sbt:

    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0-M5" cross CrossVersion.full)
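
Putting the two together, a minimal build.sbt for Scala 2.11 might look like the following sketch (the scalaVersion setting is an assumption; adjust it to your project):

    scalaVersion := "2.11.7"

    libraryDependencies += "com.julianpeeters" % "avro-scala-macro-annotations_2.11" % "0.5"

    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0-M5" cross CrossVersion.full)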

Use the annotations separately, or together like this:

    package sample
    
    import com.julianpeeters.avro.annotations._
     
    @AvroTypeProvider("data/input.avro")
    @AvroRecord
    case class MyRecord()

First the fields are added automatically from an Avro Schema in a file, then the methods necessary for de/serialization are generated for you, all at compile time.

##1) Avro-Type-Provider

If your use-case is "data-first" and you're using an Avro runtime library that allows you to use Scala case classes to represent your Avro records, then you are probably a little weary of transcribing Avro Schemas into their Scala case class equivalents.

Annotate an "empty" case class, and its members will be generated automatically at compile time using the data found in the Schema of a given file:

given the schema automatically found in input.avro or input.avsc:

    {"type":"record","name":"MyRecord","namespace":"tutorial","doc":"Auto-generated schema","fields":[{"name":"x","type":{"type":"record","name":"Rec","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]},"doc":"Auto-Generated Field"}]}}

annotated empty case classes:

    import com.julianpeeters.avro.annotations._

    @AvroTypeProvider("data/input.avro")
    case class Rec()
     
    @AvroTypeProvider("data/input.avro")
    case class MyRecord()

expand to:

    package tutorial

    import com.julianpeeters.avro.annotations._

    @AvroTypeProvider("data/input.avro")
    case class Rec(i: Int)
     
    @AvroTypeProvider("data/input.avro")
    case class MyRecord(x: Rec)

####Please note:

  1. The datafile must be available at compile time (a sketch of one way to produce such a file follows this list).

  2. The filepath must be a String literal.

  3. The name of the empty case class must match the record name exactly (peek at the schema in the file, if needed).

  4. The order of class definition must be such that the classes that represent the most-nested records are expanded first.

  5. A class that is doubly annotated with @AvroTypeProvider and @AvroRecord will be updated with vars instead of vals.
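
Regarding note 1, here is a hypothetical, stand-alone way to produce a data/input.avro containing the schema above, using only the Apache Avro generic API (the object name WriteSampleAvro and the sample values are made up for illustration; a plain input.avsc file containing just the schema JSON works as well):

    import java.io.File
    import org.apache.avro.Schema
    import org.apache.avro.file.DataFileWriter
    import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

    // Hypothetical helper: writes one record to data/input.avro so the schema
    // is available when the annotated classes are compiled.
    object WriteSampleAvro extends App {
      val schema = new Schema.Parser().parse(
        """{"type":"record","name":"MyRecord","namespace":"tutorial","fields":[{"name":"x","type":{"type":"record","name":"Rec","fields":[{"name":"i","type":"int"}]}}]}""")

      val rec = new GenericData.Record(schema.getField("x").schema())
      rec.put("i", 1)
      val record = new GenericData.Record(schema)
      record.put("x", rec)

      val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
      writer.create(schema, new File("data/input.avro"))
      writer.append(record)
      writer.close()
    }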

##2) Avro-Record

Implements SpecificRecord at compile time so you can use Scala case classes to represent Avro records (like Scalavro or Salat-Avro, but for the Apache Avro runtime so that it runs on your cluster). Since Avro-Scala-Compiler-Plugin doesn't work with Scala 2.10+ and the compiler still stumps me, I ported the serialization essentials over to use Scala Macro Annotations instead.

Now you can annotate a case class that you'd like to have serve as your Avro record:

    package sample

    @AvroRecord
    case class A(var i: Int)

    @AvroRecord
    case class B(var a: Option[A])

expands to implement SpecificRecord with put, get, and getSchema methods, with the schema:

    {"type":"record","name":"B","namespace":"sample","doc":"Auto-generated schema","fields":[{"name":"a","type":["null",{"type":"record","name":"A","doc":"Auto-generated schema","fields":[{"name":"i","type":"int","doc":"Auto-Generated Field"}]}],"doc":"Auto-Generated Field"}]}}

Use the expanded class as you would a code-gen'd class with any SpecificRecord API, e.g.:

    import org.apache.avro.file.{DataFileReader, DataFileWriter}
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
    import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

    //Writing avros - no reflection
    val datumWriter = new SpecificDatumWriter[B]
    val dataFileWriter = new DataFileWriter[B](datumWriter)

    //Reading avros - no reflection allowed since Scala fields are always private,
    //so read the schema from the file itself and pass it to the reader
    val schema = new DataFileReader(file, new GenericDatumReader[GenericRecord]).getSchema
    val userDatumReader = new SpecificDatumReader[B](schema)
    val dataFileReader = new DataFileReader[B](file, userDatumReader)
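
A hypothetical continuation of these fragments, assuming file is a java.io.File (e.g. new java.io.File("data/b.avro")) and that the writing block runs before the reader is constructed:

    // Write a record (hypothetical sample value)
    val b = B(Some(A(1)))
    dataFileWriter.create(b.getSchema, file)
    dataFileWriter.append(b)
    dataFileWriter.close()

    // Read the records back
    while (dataFileReader.hasNext) {
      println(dataFileReader.next())
    }
    dataFileReader.close()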

####Please note:

  1. If your framework is one that relies on reflection to get the Schema, it will fail since Scala fields are private. Therefore preempt it by passing in a Schema and using no-argument constructors when necessary (as in the Avro example above).

  2. Fields must be vars in order to be compatible with the SpecificRecord API.

  3. Works with the following Avro datatypes: int, float, long, double, boolean, string, null, array, and record. Nullable fields are represented by Option, and array by List (see the sketch after this list). map, fixed, enum, and union (beyond nullable fields) are not yet supported.

  4. A class that is doubly annotated with @AvroTypeProvider and @AvroRecord will automatically be updated with vars instead of vals.

  5. *For Scalding Only: Provide a null argument (e.g. @AvroRecord(null)) to force the omission of a namespace in the generated schema. This must be done in order to read files with no namespace in the schema into case classes.

  6. *For Scala 2.10.5: The order of class definition must be such that the classes that represent the most-nested records are defined and annotated first.
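
To make the Option and List mappings concrete, a minimal hypothetical example (the class and field names are made up):

    import com.julianpeeters.avro.annotations._

    // Option[...] becomes a ["null", ...] union in the schema,
    // List[...] becomes an Avro array
    @AvroRecord
    case class C(var maybeName: Option[String], var counts: List[Int])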
