Avro schema generation and serialization / deserialization for Scala

License: Apache License 2.0


Introduction


This is a community project: PRs will be accepted, and releases are published by the maintainer.

Avro4s is a schema/class generation and serializing/deserializing library for Avro written in Scala. The objective is to allow seamless use of Avro with Scala, without the need to write boilerplate conversions yourself and without the runtime overhead of reflection. Hence, this is a macro-based library that generates code for use with Avro at compile time.

The features of the library are:

  • Schema generation from classes at compile time
  • Boilerplate free serialization of Scala types into Avro types
  • Boilerplate free deserialization of Avro types to Scala types

Versioning

The master branch contains version 5.0.x, which is designed for Scala 3. PRs are welcome. This version may have minor breaking changes compared to the previous major release in order to support the new features of Scala 3.

The previous major version is 4.0.x located at branch release/4.0.x and is released for Scala 2.12 and Scala 2.13. This version is in support mode only. Bug reports are welcome and bug fixes will be released. No new features will be added.

Please raise PRs using branch names scala2/* and scala3/* depending on which version of Scala your work is targeting.

Schemas

Unlike JSON, Avro is a schema-based format. You'll find yourself wanting to generate schemas frequently, and writing these by hand or through the Java-based SchemaBuilder classes can be tedious for complex domain models. Avro4s allows us to generate schemas directly from case classes at compile time via macros. This gives you the convenience of generated code without the annoyance of running a code-generation step, while avoiding the performance penalty of runtime reflection.

Let's define some classes.

case class Ingredient(name: String, sugar: Double, fat: Double)
case class Pizza(name: String, ingredients: Seq[Ingredient], vegetarian: Boolean, vegan: Boolean, calories: Int)

To generate an Avro Schema, we need to use the AvroSchema object passing in the target type as a type parameter. This will return an org.apache.avro.Schema instance.

import com.sksamuel.avro4s.AvroSchema
val schema = AvroSchema[Pizza]

Where the generated schema is as follows:

{
   "type":"record",
   "name":"Pizza",
   "namespace":"com.sksamuel",
   "fields":[
      {
         "name":"name",
         "type":"string"
      },
      {
         "name":"ingredients",
         "type":{
            "type":"array",
            "items":{
               "type":"record",
               "name":"Ingredient",
               "fields":[
                  {
                     "name":"name",
                     "type":"string"
                  },
                  {
                     "name":"sugar",
                     "type":"double"
                  },
                  {
                     "name":"fat",
                     "type":"double"
                  }
               ]
            }
         }
      },
      {
         "name":"vegetarian",
         "type":"boolean"
      },
      {
         "name":"vegan",
         "type":"boolean"
      },
      {
         "name":"calories",
         "type":"int"
      }
   ]
}

You can see that the schema generator handles nested case classes, sequences, primitives, etc. For a full list of supported object types, see the table later.

Overriding class name and namespace

Avro schemas for complex types (RECORDS) contain a name and a namespace. By default, these are the name of the class and the enclosing package name, but it is possible to customize these using the annotations AvroName and AvroNamespace.

For example, the following class:

package com.sksamuel
case class Foo(a: String)

Would normally have a schema like this:

{
  "type":"record",
  "name":"Foo",
  "namespace":"com.sksamuel",
  "fields":[
    {
      "name":"a",
      "type":"string"
    }
  ]
}

However we can override the name and/or the namespace like this:

package com.sksamuel

@AvroName("Wibble")
@AvroNamespace("com.other")
case class Foo(a: String)

And then the generated schema looks like this:

{
  "type":"record",
  "name":"Wibble",
  "namespace":"com.other",
  "fields":[
    {
      "name":"a",
      "type":"string"
    }
  ]
}

Note: It is possible, but not necessary, to use both AvroName and AvroNamespace. You can just use either of them if you wish.

Overriding a field name

The AvroName annotation can also be used to override field names. This is useful when the record instances you are generating or reading need to have field names different from the Scala case class fields, for example when reading data generated by another system or another language.

Given the following class.

package com.sksamuel
case class Foo(a: String, @AvroName("z") b : String)

Then the generated schema would look like this:

{
  "type":"record",
  "name":"Foo",
  "namespace":"com.sksamuel",
  "fields":[
    {
      "name":"a",
      "type":"string"
    },
    {
      "name":"z",
      "type":"string"
    }    
  ]
}

Notice that the second field is z and not b.

Note: @AvroName does not add an alternative name for the field, but an override. If you wish to have alternatives then you want to use @AvroAlias.
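
The alias mechanism can be sketched as follows; this is an illustrative example (the class Foo is hypothetical), assuming @AvroAlias attaches a standard Avro alias to the annotated field:

```scala
import com.sksamuel.avro4s.{AvroAlias, AvroSchema}

// hypothetical class: "b" used to be written under the name "z",
// so we keep the new name and record the old one as an alias
case class Foo(a: String, @AvroAlias("z") b: String)

val schema = AvroSchema[Foo]
```

A reader resolving data written with the old field name can then match it to b via the alias.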

Adding properties and docs to a Schema

Avro allows a doc field, and arbitrary key/values to be added to generated schemas. Avro4s supports this through the use of AvroDoc and AvroProp annotations.

These annotations work on both complex and simple types - in other words, on both classes and fields. For example:

package com.sksamuel
@AvroDoc("hello, is it me you're looking for?")
case class Foo(@AvroDoc("I am a string") str: String, @AvroDoc("I am a long") long: Long, int: Int)

Would result in the following schema:

{  
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "doc":"hello, is it me you're looking for?",
  "fields": [  
    {  
      "name": "str",
      "type": "string",
      "doc" : "I am a string"
    },
    {  
      "name": "long",
      "type": "long",
      "doc" : "I am a long"
    },
    {  
      "name": "int",
      "type": "int"
    }
  ]
}

An example of properties:

package com.sksamuel
@AvroProp("jack", "bruce")
case class Annotated(@AvroProp("richard", "ashcroft") str: String, @AvroProp("kate", "bush") long: Long, int: Int)

Would generate this schema:

{
  "type": "record",
  "name": "Annotated",
  "namespace": "com.sksamuel",
  "fields": [
    {
      "name": "str",
      "type": "string",
      "richard": "ashcroft"
    },
    {
      "name": "long",
      "type": "long",
      "kate": "bush"
    },
    {
      "name": "int",
      "type": "int"
    }
  ],
  "jack": "bruce"
}

Overriding a Schema

Behind the scenes, AvroSchema uses an implicit SchemaFor. This is the core typeclass which generates an Avro schema for a given Java or Scala type. There are SchemaFor instances for all the common JDK and Scala SDK types, as well as macros that generate instances for case classes.

In order to override how a schema is generated for a particular type, you need to bring an implicit SchemaFor for that type into scope. As an example, let's say you wanted all integers to be encoded as Schema.Type.STRING rather than the standard Schema.Type.INT.

To do this, we just introduce a new instance of SchemaFor and put it in scope when we generate the schema.

import com.sksamuel.avro4s.{AvroSchema, SchemaFor}
import org.apache.avro.SchemaBuilder

implicit val intOverride: SchemaFor[Int] = SchemaFor[Int](SchemaBuilder.builder.stringType)

case class Foo(a: Int)
val schema = AvroSchema[Foo]

Note: If you create an override like this, be aware that schemas in Avro are mutable, so don't share the values that the typeclasses return.
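
A minimal sketch of one way to respect this caveat (an assumption on my part, not a requirement of the library): expose the override as an implicit def, so each summon builds a fresh underlying Schema rather than sharing one mutable value.

```scala
import com.sksamuel.avro4s.SchemaFor
import org.apache.avro.{Schema, SchemaBuilder}

// each call builds a brand-new Schema, so no two call sites share a mutable value
implicit def intOverride: SchemaFor[Int] = SchemaFor[Int](SchemaBuilder.builder.stringType)

val a = intOverride.schema
val b = intOverride.schema
// a and b are structurally equal but are distinct objects
```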

Transient Fields

Avro4s does not support the @transient annotation to mark a field as ignored, but instead provides its own @AvroTransient annotation to do the same job. Any field marked with this will be excluded from the generated schema.

package com.sksamuel
case class Foo(a: String, @AvroTransient b: String)

Would result in the following schema:

{  
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "fields": [  
    {  
      "name": "a",
      "type": "string"
    }
  ]
}

Field Mapping

If you are dealing with Avro data generated in other languages then it's quite likely the field names will reflect the style of that language. For example, Java may prefer camelCaseFieldNames but other languages may use snake_case_field_names or PascalStyleFieldNames. By default the name of the field in the case class is what will be used, and you've seen earlier that you can override a specific field with @AvroName, but doing this for every single field would be insane.

So, avro4s provides a FieldMapper for this. You simply bring into scope an instance of FieldMapper that will convert the Scala field names into the target field names.

For example, let's take a Scala case class and generate a schema using snake case.

package com.sksamuel
case class Foo(userName: String, emailAddress: String)
implicit val snake: FieldMapper = SnakeCase
val schema = AvroSchema[Foo]

Would generate the following schema:

{
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "fields": [
    {
      "name": "user_name",
      "type": "string"
    },
    {
      "name": "email_address",
      "type": "string"
    }
  ]
}

You can also define your own field mapper:

package com.sksamuel
case class Foo(userName: String, emailAddress: String)
implicit val short: FieldMapper = {
  case "userName"     => "user"
  case "emailAddress" => "email"
}
val schema = AvroSchema[Foo]

Would generate the following schema:

{
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "fields": [
    {
      "name": "user",
      "type": "string"
    },
    {
      "name": "email",
      "type": "string"
    }
  ]
}

Field Defaults

Avro4s will take into account default values on fields. For example, the class case class Wibble(s: String = "foo") would generate the following schema:

{
  "type": "record",
  "name": "Wibble",
  "namespace": "com.sksamuel.avro4s.schema",
  "fields": [
    {
      "name": "s",
      "type": "string",
      "default" : "foo"
    }
  ]
}

However if you wish the Scala default to be ignored, you can annotate the field with @AvroNoDefault. So the class case class Wibble(@AvroNoDefault s: String = "foo") would generate:

{
  "type": "record",
  "name": "Wibble",
  "namespace": "com.sksamuel.avro4s.schema",
  "fields": [
    {
      "name": "s",
      "type": "string"
    }
  ]
}

Enums and Enum Defaults

AVRO Enums from Scala Enums, Java Enums, and Sealed Traits

Avro4s maps scala enums, java enums, and scala sealed traits to the AVRO enum type. For example, the following scala enum:

object Colours extends Enumeration {
  val Red, Amber, Green = Value
}

when referenced in a case class:

case class Car(colour: Colours.Value)

results in the following AVRO schema (e.g. using val schema = AvroSchema[Car]):

{
  "type" : "record",
  "name" : "Car",
  "fields" : [ {
    "name" : "colour",
    "type" : {
      "type" : "enum",
      "name" : "Colours",
      "symbols" : [ "Red", "Amber", "Green" ]
    }
  } ]
}

Avro4s will also convert a Java enum such as:

public enum Wine {
    Malbec, Shiraz, CabSav, Merlot
}

into an AVRO enum type:

{
  "type": "enum",
  "name": "Wine",
  "symbols": [ "Malbec", "Shiraz", "CabSav", "Merlot" ]
}

And likewise, avro4s will convert a sealed trait such as:

sealed trait Animal
@AvroSortPriority(0) case object Cat extends Animal
@AvroSortPriority(-1) case object Dog extends Animal

into the following AVRO enum schema:

{
  "type" : "enum",
  "name" : "Animal",
  "symbols" : [ "Cat", "Dog" ]
}

With the @AvroSortPriority annotation, elements are sorted in descending order of the specified priority (the element with the highest priority comes first).

According to the Avro specification, when an element is not found, the first compatible element defined in the union is used. For this reason, the order of elements should not be changed when compatibility is important; add new elements at the end.

An alternative solution is to use the @AvroUnionPosition annotation, which takes a number; elements are sorted by it in ascending order:

  sealed trait Fruit
  @AvroUnionPosition(0)
  case object Unknown extends Fruit
  @AvroUnionPosition(1)
  case class Orange(size: Int) extends Fruit
  @AvroUnionPosition(2)
  case class Mango(size: Int) extends Fruit

This will generate the following AVRO schema:

[
    {
        "type" : "record",
        "name" : "Unknown",
        "fields" : [ ]
    },
    {
        "type" : "record",
        "name" : "Orange",
        "fields" : [ {
            "name" : "size",
            "type" : "int"
        } ]
    },
    {
        "type" : "record",
        "name" : "Mango",
        "fields" : [ {
            "name" : "size",
            "type" : "int"
        } ]
    }
]

Field Defaults vs. Enum Defaults

As with any AVRO field, you can specify an enum field's default value as follows:

case class Car(colour: Colours.Value = Colours.Red)

resulting in the following AVRO schema:

{
  "type" : "record",
  "name" : "Car",
  "fields" : [ {
    "name" : "colour",
    "type" : {
      "type" : "enum",
      "name" : "Colours",
      "symbols" : [ "Red", "Amber", "Green" ]
    },
    "default": "Red"
  } ]
}

One benefit of providing a field default is that the writer can later remove the field without breaking existing readers. In the Car example, if the writer doesn't provide a value for the colour field, the reader will default the colour to Red.

But what if the writer would like to extend the Colour enumeration to include the colour Orange:

object Colours extends Enumeration {
  val Red, Amber, Green, Orange = Value
}

resulting in the following AVRO schema?

{
  "type" : "record",
  "name" : "Car",
  "fields" : [ {
    "name" : "colour",
    "type" : {
      "type" : "enum",
      "name" : "Colours",
      "symbols" : [ "Red", "Amber", "Green", "Orange" ]
    },
    "default": "Red"
  } ]
}

If a writer creates an Orange Car:

Car(colour = Colours.Orange)

readers using the older schema (the one without the new Orange value) will fail with a backwards-compatibility error. That is, readers using the previous version of the Car schema don't know the colour Orange and therefore can't read the new Car record.

To enable writers to extend enums in a backwards-compatible way, AVRO allows you to specify a default enum value as part of the enum type's definition:

{
  "type" : "enum",
  "name" : "Colours",
  "symbols" : [ "Red", "Amber", "Green" ],
  "default": "Amber"
}

Note that an enum's default isn't the same as an enum field's default, as shown below, where the enum default is Amber and the field's default is Red:

{
  "type" : "record",
  "name" : "Car",
  "fields" : [ {
    "name" : "colour",
    "type" : {
      "type" : "enum",
      "name" : "Colours",
      "symbols" : [ "Red", "Amber", "Green" ],
      "default": "Amber"
    },
    "default": "Red"
  } ]
}

Note that the field's default and the enum's default need not be the same value.

The field's default answers the question:

  • What value should the reader use if the writer didn't specify the field's value?

In the schema example above, the answer is Red.

The enum's default value answers the question:

  • What value should the reader use if the writer specifies an enum value that the reader doesn't recognize?

In the example above, the answer is Amber.

In summary, as long as the default enum value was specified in previous versions of the enum's schema, a writer can add new enum values without breaking older readers. For example, we can add the colour Orange to the Colours enum's list of symbols without breaking older readers:

{
  "type" : "record",
  "name" : "Car",
  "fields" : [ {
    "name" : "colour",
    "type" : {
      "type" : "enum",
      "name" : "Colours",
      "symbols" : [ "Red", "Amber", "Green", "Orange" ],
      "default": "Amber"
    },
    "default": "Red"
  } ]
}

Specifically, given Amber as the enum's default, an older AVRO reader that receives an Orange Car will default the Car's colour to Amber, the enum's default.

The following sections describe how to define enum defaults through avro4s for scala enums, java enums, and sealed traits.

Defining Enum Defaults for Scala Enums

For scala enums such as:

object Colours extends Enumeration {
   val Red, Amber, Green = Value
}

avro4s gives you two options:

  1. You can define an implicit SchemaFor using the ScalaEnumSchemaFor[E].apply(default: E) method where the method's default argument is one of the enum's values or ...
  2. You can use the @AvroEnumDefault annotation to declare the default enum value.

For example, to create an implicit SchemaFor for a Scala enum with a default enum value, use the ScalaEnumSchemaFor[E].apply(default: E) method as follows:

implicit val schemaForColours: SchemaFor[Colours.Value] = ScalaEnumSchemaFor[Colours.Value](default = Colours.Amber)

resulting in the following AVRO schema:

{
  "type" : "enum",
  "name" : "Colours",
  "symbols" : [ "Red", "Amber", "Green" ],
  "default": "Amber"
}

Or, to declare the default enum value, you can use the @AvroEnumDefault annotation as follows:

@AvroEnumDefault(Colours.Amber)
object Colours extends Enumeration {
   val Red, Amber, Green = Value
}

resulting in the same AVRO schema:

{
  "type" : "enum",
  "name" : "Colours",
  "symbols" : [ "Red", "Amber", "Green" ],
  "default": "Amber"
}

You can also use the following avro4s annotations to change a scala enum's name, namespace, and to add additional properties:

  • @AvroName
  • @AvroNamespace
  • @AvroProp

For example:

@AvroName("MyColours")
@AvroNamespace("my.namespace")
@AvroEnumDefault(Colours.Green)
@AvroProp("hello", "world")
object Colours extends Enumeration {
  val Red, Amber, Green = Value
}

resulting in the following AVRO schema:

{
  "type" : "enum",
  "name" : "MyColours",
  "namespace" : "my.namespace",
  "symbols" : [ "Red", "Amber", "Green" ],
  "default": "Green",
  "hello" : "world"
}

Note that if you're using an enum from, for example, a third-party library without access to the source code, you may not be able to use the @AvroEnumDefault annotation, in which case you'll need to use the ScalaEnumSchemaFor[E].apply(default: E) method instead.

Defining Enum Defaults for Java Enums

For java enums such as:

public enum Wine {
  Malbec, 
  Shiraz, 
  CabSav, 
  Merlot
}

avro4s gives you two options to define an enum's default value:

  1. You can define an implicit SchemaFor using the JavaEnumSchemaFor[E].apply(default: E) method where the method's default argument is one of the enum's values or ...
  2. You can use the @AvroJavaEnumDefault annotation to declare the default enum value.

For example, to create an implicit SchemaFor for an enum with a default enum value, use the JavaEnumSchemaFor[E].apply(default: E) method as follows:

implicit val schemaForWine: SchemaFor[Wine] = JavaEnumSchemaFor[Wine](default = Wine.Merlot)

Or, to declare the default enum value, use the @AvroJavaEnumDefault annotation as follows:

public enum Wine {
  Malbec, 
  Shiraz, 
  @AvroJavaEnumDefault CabSav,
  Merlot
}

Avro4s also supports the following java annotations for java enums:

  • @AvroJavaName
  • @AvroJavaNamespace
  • @AvroJavaProp

Putting it all together, you can define a java enum using avro4s's annotations as follows:

@AvroJavaName("MyWine")
@AvroJavaNamespace("my.namespace")
@AvroJavaProp(key = "hello", value = "world")
public enum Wine {
  Malbec, 
  Shiraz, 
  @AvroJavaEnumDefault CabSav,
  Merlot
}

resulting in the following AVRO schema:

{
  "type": "enum",
  "name": "MyWine",
  "namespace": "my.namespace",
  "symbols": [
    "Malbec",
    "Shiraz",
    "CabSav",
    "Merlot"
  ],
  "default": "CabSav",
  "hello": "world"
}

Defining Enum Defaults for Sealed Traits

For sealed traits, you can define the trait's default enum using the @AvroEnumDefault annotation as follows:

@AvroEnumDefault(Dog)
sealed trait Animal
@AvroSortPriority(0) case object Cat extends Animal
@AvroSortPriority(-1) case object Dog extends Animal

resulting in the following AVRO schema:

{
  "type" : "enum",
  "name" : "Animal",
  "symbols" : [ "Cat", "Dog" ],
  "default" : "Dog"
}

Avro Fixed

Avro supports the idea of fixed-length byte arrays. To use these, we can either override the schema generated for a type to return Schema.Type.FIXED - this works for types like String or UUID - or annotate a field with @AvroFixed(size). For example:

package com.sksamuel
case class Foo(@AvroFixed(7) mystring: String)
val schema = AvroSchema[Foo]

Will generate the following schema:

{
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "fields": [
    {
      "name": "mystring",
      "type": {
        "type": "fixed",
        "name": "mystring",
        "size": 7
      }
    }
  ]
}

If you have a value type that you always want to be represented as fixed, then rather than annotate every single location it is used, you can annotate the value type itself.

package com.sksamuel

@AvroFixed(4)
case class FixedA(bytes: Array[Byte]) extends AnyVal

case class Foo(a: FixedA)
val schema = AvroSchema[Foo]

And this would generate:

{
  "type": "record",
  "name": "Foo",
  "namespace": "com.sksamuel",
  "fields": [
    {
      "name": "a",
      "type": {
        "type": "fixed",
        "name": "FixedA",
        "size": 4
      }
    }
  ]
}

Finally, these annotated value types can be used as top level schemas too:

package com.sksamuel

@AvroFixed(6)
case class FixedA(bytes: Array[Byte]) extends AnyVal
val schema = AvroSchema[FixedA]

And this would generate:

{
  "type": "fixed",
  "name": "FixedA",
  "namespace": "com.sksamuel",
  "size": 6
}

Controlling order of types in generated union schemas

The order of types in a union is significant in Avro, e.g. the schemas type: ["int", "float"] and type: ["float", "int"] are different. This can cause problems when generating schemas for sealed trait hierarchies. Ideally we would generate schemas using the source code declaration order of the types. So for example:

sealed trait Animal
case class Dog(howFriendly: Float) extends Animal
case class Fish(remembersYou: Boolean) extends Animal

Should generate a schema where the order of types in the union is Dog, Fish. Unfortunately, the SchemaFor macro can sometimes lose track of the declaration order - especially with larger hierarchies. In any situation where this is happening, you can use the @AvroSortPriority annotation to explicitly control the order in which the types appear. @AvroSortPriority takes a single float argument: the priority given to that type, where a higher priority means closer to the beginning of the union. For example:

sealed trait Animal
@AvroSortPriority(1)
case class Dog(howFriendly: Float) extends Animal
@AvroSortPriority(2)
case class Fish(remembersYou: Boolean) extends Animal

Would output the types in the union as Fish,Dog.
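
The resulting order can be verified directly on the generated union schema; a small sketch using the classes above:

```scala
import com.sksamuel.avro4s.{AvroSchema, AvroSortPriority}
import scala.jdk.CollectionConverters._

sealed trait Animal
@AvroSortPriority(1) case class Dog(howFriendly: Float) extends Animal
@AvroSortPriority(2) case class Fish(remembersYou: Boolean) extends Animal

val union = AvroSchema[Animal]
// higher priority sorts first, so Fish precedes Dog
val names = union.getTypes.asScala.map(_.getName).toList
```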

Recursive Schemas

Avro4s supports recursive schemas. Customizing them requires some thought, so if you can, stick with the out-of-the-box schemas and customize via annotations.

Customizing Recursive Schemas

The simplest way to customize schemas for recursive types is to provide custom SchemaFor instances for all types that form the recursion. Given for example the following recursive Tree type,

sealed trait Tree[+T]
case class Branch[+T](left: Tree[T], right: Tree[T]) extends Tree[T]
case class Leaf[+T](value: T) extends Tree[T]

it is easy to customize recursive schemas by providing a SchemaFor for both Tree and Branch:

import scala.collection.JavaConverters._
import com.sksamuel.avro4s.{AvroSchema, SchemaFor}
import org.apache.avro.Schema

val leafSchema = AvroSchema[Leaf[Int]]
val branchSchema = Schema.createRecord("CustomBranch", "custom schema", "custom", false)
val treeSchema = Schema.createUnion(leafSchema, branchSchema)
branchSchema.setFields(Seq(new Schema.Field("left", treeSchema), new Schema.Field("right", treeSchema)).asJava)

val treeSchemaFor: SchemaFor[Tree[Int]] = SchemaFor(treeSchema)
val branchSchemaFor: SchemaFor[Branch[Int]] = SchemaFor(branchSchema)

If you want to customize the schema for one type that is part of a type recursion (e.g., Branch[Int]) while using generated schemas, this can be done as follows (sticking with the above example):

// 1. Use implicit def here so that this SchemaFor gets summoned for Branch[Int] in steps 6. and 10. below
// 2. Implement a ResolvableSchemaFor instead of SchemaFor directly so that SchemaFor creation can be deferred
implicit def branchSchemaFor: SchemaFor[Branch[Int]] = new ResolvableSchemaFor[Branch[Int]] {
  def schemaFor(env: DefinitionEnvironment[SchemaFor], update: SchemaUpdate): SchemaFor[Branch[Int]] =
    // 3. first, check whether SchemaFor[Branch[Int]] is already defined and return that if it is 
    env.get[Branch[Int]].getOrElse {
      // 4. otherwise, create an incomplete SchemaFor (it initially lacks fields)
      val record: SchemaFor[Branch[Int]] = SchemaFor(Schema.createRecord("CustomBranch", "custom schema", "custom", false))
      // 5. extend the definition environment with the created SchemaFor[Branch[Int]]
      val nextEnv = env.updated(record)
      // 6. summon a schema for Tree[Int] (using the Branch[Int] from step 1. through implicits)
      // 7. resolve the schema to get a finalized schema for Tree[Int]
      val treeSchema = SchemaFor[Tree[Int]].resolveSchemaFor(nextEnv, NoUpdate).schema
      // 8. close the reference cycle between Branch[Int] and Tree[Int]
      val fields = Seq(new Schema.Field("left", treeSchema), new Schema.Field("right", treeSchema))
      record.schema.setFields(fields.asJava)
      // 9. return the final SchemaFor[Branch[Int]]
      record
    }
}

// 10. summon Schema for tree and kick off encoder resolution.
val treeSchema = AvroSchema[Tree[Int]]

Input / Output

Serializing

Avro4s allows us to easily serialize case classes using an instance of AvroOutputStream, which we write to and close just like any regular output stream. An AvroOutputStream can be created from a File, Path, or by wrapping another OutputStream. When we create one, we specify the type of objects that we will be serializing and provide a writer schema. For example, to serialize instances of our Pizza class:

import java.io.File
import com.sksamuel.avro4s.AvroOutputStream

val pepperoni = Pizza("pepperoni", Seq(Ingredient("pepperoni", 12, 4.4), Ingredient("onions", 1, 0.4)), false, false, 598)
val hawaiian = Pizza("hawaiian", Seq(Ingredient("ham", 1.5, 5.6), Ingredient("pineapple", 5.2, 0.2)), false, false, 391)

val schema = AvroSchema[Pizza]

val os = AvroOutputStream.data[Pizza].to(new File("pizzas.avro")).build()
os.write(Seq(pepperoni, hawaiian))
os.flush()
os.close()

Deserializing

We can easily deserialize a file back into case classes. Given the pizzas.avro file we generated in the previous section on serialization, we will read this back in using the AvroInputStream class. We first create an instance of the input stream specifying the types we will read back, the source file, and then build it using a reader schema.

Once the input stream is created, we can invoke iterator, which will return a lazy iterator that reads the data in the file on demand.

In this example, we'll load all data at once from the iterator via toSet.

import com.sksamuel.avro4s.AvroInputStream

val schema = AvroSchema[Pizza]

val is = AvroInputStream.data[Pizza].from(new File("pizzas.avro")).build(schema)
val pizzas = is.iterator.toSet
is.close()

println(pizzas.mkString("\n"))

Will print out:

Pizza(pepperoni,List(Ingredient(pepperoni,12.0,4.4), Ingredient(onions,1.0,0.4)),false,false,598)
Pizza(hawaiian,List(Ingredient(ham,1.5,5.6), Ingredient(pineapple,5.2,0.2)),false,false,391)

Binary and JSON Formats

You can serialize as binary or JSON by specifying the format when creating the input or output stream. In the earlier examples we used data, which is considered the "default" for Avro.

To use json or binary, you can do the following:

AvroOutputStream.binary.to(...).build(...)
AvroOutputStream.json.to(...).build(...)

AvroInputStream.binary.from(...).build(...)
AvroInputStream.json.from(...).build(...)

Note: Binary serialization does not include the schema in the output.
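
To make the builders concrete, here is a minimal binary round-trip sketch (the Snack class is invented for brevity); because binary output carries no schema, the reader is built with the writer's schema:

```scala
import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.{AvroInputStream, AvroOutputStream, AvroSchema}

case class Snack(name: String, calories: Int)

val schema = AvroSchema[Snack]

// write a record in the schema-less binary format
val baos = new ByteArrayOutputStream()
val out = AvroOutputStream.binary[Snack].to(baos).build()
out.write(Snack("apple", 52))
out.close()

// read it back, supplying the writer schema explicitly
val in = AvroInputStream.binary[Snack].from(baos.toByteArray).build(schema)
val snacks = in.iterator.toList
in.close()
```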

Avro Records

In Avro there are two container interfaces designed for complex types - GenericRecord, which is the most commonly used, along with the lesser-used SpecificRecord. These record types are used with a schema of type Schema.Type.RECORD.

To interface with the Avro Java API or with third party frameworks like Kafka it is sometimes desirable to convert between your case classes and these records, rather than using the input/output streams that avro4s provides.

To perform conversions, use the RecordFormat typeclass which converts to/from case classes and Avro records.

Note: In Avro, GenericRecord and SpecificRecord don't have a common Record interface (just a Container interface which simply provides for a schema without any methods for accessing values), so avro4s has defined a Record trait, which is the union of the GenericRecord and SpecificRecord interfaces. This allows avro4s to generate records which implement both interfaces at the same time.

To convert from a class into a record:

import com.sksamuel.avro4s.{AvroSchema, FromRecord, RecordFormat, ToRecord}
import org.apache.avro.Schema

case class Composer(name: String, birthplace: String, compositions: Seq[String])
val ennio = Composer("ennio morricone", "rome", Seq("legend of 1900", "ecstasy of gold"))
val schema: Schema = AvroSchema[Composer]
implicit val toRecord: ToRecord[Composer] = ToRecord.apply[Composer](schema)
implicit val fromRecord: FromRecord[Composer] = FromRecord.apply[Composer](schema)
val format: RecordFormat[Composer] = RecordFormat.apply[Composer](schema)
// record is a type that implements both GenericRecord and SpecificRecord
val record = format.to(ennio)

And to go from a record back into a type:

// given some record from earlier
val record = ...
val format = RecordFormat[Composer]
val ennio = format.from(record)

Usage as a Kafka Serde

The com.sksamuel.avro4s.kafka.GenericSerde class can be used as a Kafka Serde to serialize/deserialize case classes into Avro records with Avro4s. Note that this class is not integrated with the schema registry.

  import java.util.Properties
  import org.apache.kafka.clients.CommonClientConfigs
  import org.apache.kafka.clients.producer.KafkaProducer
  import com.sksamuel.avro4s.BinaryFormat
  import com.sksamuel.avro4s.kafka.GenericSerde

  case class TheKafkaKey(id: String)
  case class TheKafkaValue(name: String, location: String)

  val producerProps = new Properties()
  producerProps.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "...")

  // GenericSerde implements Serializer and Deserializer as well as Serde,
  // so instances can be passed directly to the producer constructor
  val producer = new KafkaProducer[TheKafkaKey, TheKafkaValue](
    producerProps,
    new GenericSerde[TheKafkaKey](BinaryFormat),
    new GenericSerde[TheKafkaValue](BinaryFormat))

Type Mappings

Avro4s defines two typeclasses, Encoder and Decoder, which do the work of mapping between Scala values and Avro-compatible values. Avro has no understanding of Scala types, or anything outside of its built-in set of supported types, so all values must be converted to something that is compatible with Avro. There are built-in encoders and decoders for all the common JDK and Scala SDK types, including macro-generated instances for case classes.

For example a java.sql.Timestamp is usually encoded as a Long, and a java.util.UUID is encoded as a String.

Decoders do the same work, but in reverse. They take an Avro value, such as null, and return a Scala value, such as None.

Some values can be mapped in multiple ways depending on how the schema was generated. For example a String, which is usually encoded as org.apache.avro.util.Utf8 could also be encoded as an array of bytes if the generated schema for that field was Schema.Type.BYTES. Therefore some encoders will take into account the schema passed to them when choosing the avro compatible type. In the schemas section you saw how you could influence which schema is generated for types.

Built in Type Mappings

// the standard library types below (Array, List, Seq, Option, Either, ...) need no imports;
// the coproduct types come from shapeless
import shapeless.{:+:, CNil}

The following table shows how types used in your code will be mapped / encoded in the generated Avro schemas and files. If a type can be mapped in multiple ways, it is listed more than once.

| Scala Type | Schema Type | Logical Type | Encoded Type |
|------------|-------------|--------------|--------------|
| String | STRING | | Utf8 |
| String | FIXED | | GenericFixed |
| String | BYTES | | ByteBuffer |
| Boolean | BOOLEAN | | java.lang.Boolean |
| Long | LONG | | java.lang.Long |
| Int | INT | | java.lang.Integer |
| Short | INT | | java.lang.Integer |
| Byte | INT | | java.lang.Integer |
| Double | DOUBLE | | java.lang.Double |
| Float | FLOAT | | java.lang.Float |
| UUID | STRING | uuid | Utf8 |
| LocalDate | INT | date | java.lang.Integer |
| LocalTime | INT | time-millis | java.lang.Integer |
| LocalDateTime | LONG | timestamp-nanos | java.lang.Long |
| java.sql.Date | INT | date | java.lang.Integer |
| Instant | LONG | timestamp-millis | java.lang.Long |
| Timestamp | LONG | timestamp-millis | java.lang.Long |
| BigDecimal | BYTES | decimal<8,2> | ByteBuffer |
| BigDecimal | FIXED | decimal<8,2> | GenericFixed |
| BigDecimal | STRING | decimal<8,2> | String |
| Option[T] | UNION<null,T> | | null, T |
| Array[Byte] | BYTES | | ByteBuffer |
| Array[Byte] | FIXED | | GenericFixed |
| ByteBuffer | BYTES | | ByteBuffer |
| Seq[Byte] | BYTES | | ByteBuffer |
| List[Byte] | BYTES | | ByteBuffer |
| Vector[Byte] | BYTES | | ByteBuffer |
| Array[T] | ARRAY | | Array[T] |
| Vector[T] | ARRAY | | Array[T] |
| Seq[T] | ARRAY | | Array[T] |
| List[T] | ARRAY | | Array[T] |
| Set[T] | ARRAY | | Array[T] |
| sealed trait of case classes | UNION<A,B,..> | | A, B, ... |
| sealed trait of case objects | ENUM<A,B,..> | | GenericEnumSymbol |
| Map[String, V] | MAP | | java.util.Map[String, V] |
| Either[A,B] | UNION<A,B> | | A, B |
| A :+: B :+: C :+: CNil | UNION<A,B,C> | | A, B, ... |
| case class T | RECORD | | GenericRecord with SpecificRecord |
| Scala enumeration | ENUM | | GenericEnumSymbol |
| Java enumeration | ENUM | | GenericEnumSymbol |
| Scala tuples | RECORD | | GenericRecord with SpecificRecord |
| Option[Either[A,B]] | UNION<null,A,B> | | null, A, B |
| option of sealed trait of case classes | UNION<null,A,B,..> | | null, A, B, ... |
| option of sealed trait of case objects | UNION<null,A,B,..> | | null, GenericEnumSymbol |
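As a quick sketch of the union rows above (assuming avro4s 4.x on the classpath), an optional either collapses into a single three-way union rather than nested unions:

```scala
import com.sksamuel.avro4s.AvroSchema
import org.apache.avro.Schema

case class Wrapper(field: Option[Either[String, Int]])

val schema: Schema = AvroSchema[Wrapper]
// per the table: the field is a UNION of null, string and int
val fieldSchema: Schema = schema.getField("field").schema()
```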

To select the encoding when multiple encoded types exist, create a new Encoder with a corresponding SchemaFor instance via withSchema. For example, a string encoder that uses target type BYTES can be created like this:

val stringSchemaFor = SchemaFor[String](Schema.create(Schema.Type.BYTES))
val stringEncoder = Encoder[String].withSchema(stringSchemaFor)

Custom Type Mappings

It is easy to add custom type mappings: bring a custom implicit Encoder[T] and/or Decoder[T] into scope.

For example, to create a custom type mapping for a type Foo which writes out the contents in upper case, but always reads the contents in lower case, we can do the following:

case class Foo(a: String, b: String)

implicit object FooEncoder extends Encoder[Foo] {

  override val schemaFor = SchemaFor[Foo]

  override def encode(foo: Foo) = {
    val record = new GenericData.Record(schema)
    record.put("a", foo.a.toUpperCase)
    record.put("b", foo.b.toUpperCase)
    record
  }
}

implicit object FooDecoder extends Decoder[Foo] {

  override val schemaFor = SchemaFor[Foo]

  override def decode(value: Any) = {
    val record = value.asInstanceOf[GenericRecord]
    Foo(record.get("a").toString.toLowerCase, record.get("b").toString.toLowerCase)
  }
}

Another example is changing the way we serialize LocalDateTime, storing these dates as ISO strings. In this case we are writing out a String rather than the default Long, so we must also change the schema type. Therefore we must add an implicit SchemaFor as well as the encoder and decoder.

import java.time.format.DateTimeFormatter

implicit val LocalDateTimeSchemaFor = SchemaFor[LocalDateTime](Schema.create(Schema.Type.STRING))

implicit object DateTimeEncoder extends Encoder[LocalDateTime] {

  override val schemaFor = LocalDateTimeSchemaFor

  override def encode(value: LocalDateTime) =
    value.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
}

implicit object DateTimeDecoder extends Decoder[LocalDateTime] {

  override val schemaFor = LocalDateTimeSchemaFor

  override def decode(value: Any) =
    LocalDateTime.parse(value.toString, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
}

These typeclasses must be implicit and in scope when you use AvroSchema or RecordFormat.

Coproducts

Avro supports generalised unions: unions of more than two types. To represent these in Scala, we use shapeless.:+:, such that A :+: B :+: C :+: CNil represents a value that is an A OR a B OR a C. See the shapeless documentation on coproducts for more on how to use them.
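For example (a sketch assuming avro4s 4.x with shapeless on the classpath, with a hypothetical Measurement type), a coproduct field becomes a union schema:

```scala
import com.sksamuel.avro4s.AvroSchema
import shapeless.{:+:, CNil}

// the value can be an Int OR a String OR a Boolean
case class Measurement(value: Int :+: String :+: Boolean :+: CNil)

val schema = AvroSchema[Measurement]
// the "value" field is a three-way UNION of int, string and boolean
```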

Sealed hierarchies

Scala sealed traits/classes are supported both for schema generation and for conversions to/from GenericRecord. Generally, sealed hierarchies are encoded as unions, in the same way as coproducts. Under the hood, shapeless Generic is used to derive a Coproduct representation for the sealed hierarchy.

When all descendants of a sealed trait/class are singleton objects, an optimized, enum-based encoding is used instead.
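A sketch of the enum-based encoding (assuming avro4s 4.x on the classpath, with a hypothetical Suit hierarchy):

```scala
import com.sksamuel.avro4s.AvroSchema
import org.apache.avro.Schema

// every descendant is a singleton object, so the optimized encoding applies
sealed trait Suit
case object Spades extends Suit
case object Hearts extends Suit

val schema = AvroSchema[Suit]
// an ENUM with symbols Spades and Hearts, not a union of empty records
```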

Decimal scale, precision and rounding mode

In order to customize the scale and precision used by BigDecimal schema generators, bring an implicit ScalePrecision instance into scope before using AvroSchema.

import com.sksamuel.avro4s.ScalePrecision

case class MyDecimal(d: BigDecimal)

implicit val sp = ScalePrecision(4, 20)
val schema = AvroSchema[MyDecimal]
{
  "type":"record",
  "name":"MyDecimal",
  "namespace":"com.foo",
  "fields":[{
    "name":"d",
    "type":{
      "type":"bytes",
      "logicalType":"decimal",
      "scale":"4",
      "precision":"20"
    }
  }]
}

When encoding values, it may be necessary to round them if they must be converted to the scale used by the schema. By default the rounding mode is RoundingMode.UNNECESSARY, which will throw an exception if rounding is required. To change this, bring an implicit RoundingMode value into scope before the Encoder is generated.

case class MyDecimal(d: BigDecimal)

implicit val sp = ScalePrecision(4, 20)
val schema = AvroSchema[MyDecimal]

implicit val roundingMode = RoundingMode.HALF_UP
val encoder = Encoder[MyDecimal]

Type Parameters

When serializing a class with one or more type parameters, the avro name used in a schema is the name of the raw type, plus the actual type parameters. In other words, it would be of the form rawtype__typeparam1_typeparam2_..._typeparamN. So for example, the schema for a type Event[Foo] would have the avro name event__foo.

You can disable this by annotating the class with @AvroErasedName which uses the JVM erased name - in other words, it drops type parameter information. So the aforementioned Event[Foo] would be simply event.
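A sketch of the difference (assuming avro4s 4.x on the classpath, with hypothetical Event and RawEvent types; the exact casing of the generated names may vary by version):

```scala
import com.sksamuel.avro4s.{AvroSchema, AvroErasedName}

case class Foo(a: String)
case class Event[T](payload: T)

@AvroErasedName
case class RawEvent[T](payload: T)

val parameterised = AvroSchema[Event[Foo]].getName // includes the type parameter
val erased = AvroSchema[RawEvent[Foo]].getName     // just the raw type name
```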

Selective Customisation

You can selectively customise the way Avro4s generates certain parts of your hierarchy, thanks to implicit precedence. Suppose you have the following classes:

case class Product(name: String, price: Price, litres: BigDecimal)
case class Price(currency: String, amount: BigDecimal)

And you want to selectively use different scale/precision for the price and litres quantities. You can do this by defining the implicits in the corresponding companion objects.

object Price {
  implicit val sp = ScalePrecision(2, 10)
  implicit val schema = SchemaFor[Price]
}

object Product {
  implicit val sp = ScalePrecision(4, 8)
  implicit val schema = SchemaFor[Product]
}

This will result in a schema where both BigDecimal quantities have their own separate scale and precision.

Cats Support

If you use cats in your domain objects, then Avro4s provides a cats module with schemas, encoders and decoders for some cats types. Just add import com.sksamuel.avro4s.cats._ before calling into the macros.

case class Foo(list: NonEmptyList[String], vector: NonEmptyVector[Boolean])
val schema = AvroSchema[Foo]

Refined Support

If you use refined in your domain objects, then Avro4s provides a refined module with schemas, encoders and decoders for refined types. Just add import com.sksamuel.avro4s.refined._ before calling into the macros.

case class Foo(nonEmptyStr: String Refined NonEmpty)
val schema = AvroSchema[Foo]

Mapping Recursive Types

Avro4s supports encoders and decoders for recursive types. Customizing them is possible, but involved. As with customizing SchemaFor instances for recursive types, the simplest way to customize encoders and decoders is to provide a custom encoder and decoder for all types that form the recursion.

If that isn't possible, you can customize encoders / decoders for one single type and participate in creating a cyclic graph of encoders / decoders. To give an example, consider the following recursive type for trees.

sealed trait Tree[+T]
case class Branch[+T](left: Tree[T], right: Tree[T]) extends Tree[T]
case class Leaf[+T](value: T) extends Tree[T]

For this, a custom Branch[Int] encoder can be defined as follows.

// 1. use implicit def so that Encoder.apply[Tree[Int]] in step 7. and 10. below picks this resolvable encoder for branches.
// 2. implement a ResolvableEncoder instead of Encoder directly so that encoder creation can be deferred
implicit def branchEncoder: Encoder[Branch[Int]] = new ResolvableEncoder[Branch[Int]] {

def encoder(env: DefinitionEnvironment[Encoder], update: SchemaUpdate): Encoder[Branch[Int]] =
  // 3. lookup in the definition environment whether we already have created an encoder for branch.
  env.get[Branch[Int]].getOrElse {

    // 4. use var here to first create an acyclic graph and close it later.
    var treeEncoder: Encoder[Tree[Int]] = null

    // 5. create a partially initialized encoder for branches (it lacks a value for treeEncoder on creation).
    val encoder = new Encoder[Branch[Int]] {
      val schemaFor: SchemaFor[Branch[Int]] = SchemaFor[Branch[Int]]

      def encode(value: Branch[Int]): AnyRef =
        ImmutableRecord(schema, Seq(treeEncoder.encode(value.left), treeEncoder.encode(value.right)))
    }

    // 6. extend the definition environment with the newly created encoder so that subsequent lookups (step 3.) can return it
    val nextEnv = env.updated(encoder)

    // 7. resolve the tree encoder with the extended environment; the extended env will be passed back to the lookup
    //    performed in step 3. above.
    // 9. complete the initialization by closing the reference cycle: the branch encoder and tree encoder now 
    //    reference each other.
    treeEncoder = Encoder.apply[Tree[Int]].resolveEncoder(nextEnv, NoUpdate)
    encoder
  }
}

// 10. summon encoder for tree and kick off encoder resolution.
val toRecord = ToRecord[Tree[Int]]

Why is this so complicated? Glad you asked! It turns out that caring about performance, providing customization via annotations, and using Magnolia for automatic typeclass derivation (which is great in itself) are three constraints that aren't easy to combine. This design is the best we came up with; if you have a better one, please contribute it!

Using avro4s in your project

Gradle

implementation 'com.sksamuel.avro4s:avro4s-core_2.12:xxx'

SBT

libraryDependencies += "com.sksamuel.avro4s" %% "avro4s-core" % "xxx"

Maven

<dependency>
    <groupId>com.sksamuel.avro4s</groupId>
    <artifactId>avro4s-core_2.12</artifactId>
    <version>xxx</version>
</dependency>

Check the latest released version on Maven Central

Contributions

Contributions to avro4s are always welcome. Good ways to contribute include:

  • Raising bugs and feature requests
  • Fixing bugs and enhancing the DSL
  • Improving the performance of avro4s
  • Adding to the documentation

avro4s's Issues

Unused import on AvroOutputStream usage

Copy pasted from my SO question:

I have small test class like this:

package test.avro

object Test extends App {
  import java.io.ByteArrayOutputStream
  import com.sksamuel.avro4s.AvroOutputStream

  case class Composer(name: String, birthplace: String, compositions: Seq[String])
  val ennio = Composer("ennio morricone", "rome", Seq("legend of 1900", "ecstasy of gold"))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Composer](baos)
  output.write(ennio)
  output.close()
  print(baos.toString("UTF-8"))
}

With relevant settings:

scalaVersion := "2.11.8"
libraryDependencies += "com.sksamuel.avro4s" %% "avro4s-core" % "1.6.1"

When I try to compile it I receive following error message:

[error] [path on my drive...]/src/main/scala/test/avro/Test.scala:1: Unused import
[error] package pl.combosolutions.jobstream.common
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

I saw that a similar error was reported on the avro4s issue tracker, but with an implicit error, not unused imports. However, that was in version 1.5.0 - I am using version 1.6.1 (and tried several versions in between to check that it's not some random regression). Changing the avro4s import to import com.sksamuel.avro4s._ didn't help either.

On the other hand error message is similar to this one. I use Scala 2.11.8, but just in case I checked whether changing to 2.11.7 would help (it didn't).

What else can I try to figure out the source of such weird behavior? Is it something that I missed or a bug? If so, where should I file it? I suspect it is something with the ToRecord trait macros, but I cannot tell for sure.

Removal of "-Ywarn-unused-import" makes things work again - should I assume it is a bug in the library?

Modernize the build

The SBT docs recommend using .sbt files in favor of the "old" project/*.scala method. Also, I think the build could be cleaned up by creating AutoPlugins for the global settings and for publishing. I'll issue a PR with these changes.

Sealed trait doesn't get encoded as an Avro enum

The Type Mappings table in the README suggests that I can use a sealed trait hierarchy to represent an enum, but it doesn't seem to work. Am I doing something wrong?

MyEnum.scala:

package foo

import com.sksamuel.avro4s._

sealed trait MyEnum
case object Wow extends MyEnum
case object Yeah extends MyEnum

case class Container(myEnum: MyEnum)

REPL:

import com.sksamuel.avro4s._

scala> AvroSchema[foo.MyEnum]
res1: org.apache.avro.Schema = {"type":"record","name":"MyEnum","namespace":"foo","fields":[]}

scala> AvroSchema[foo.Container]
res2: org.apache.avro.Schema = {"type":"record","name":"Container","namespace":"foo","fields":[{"name":"myEnum","type":{"type":"record","name":"MyEnum","fields":[]}}]}

I would expect the sealed trait to be encoded in the schema as:

{
  "type": "enum",
  "symbols": [ "Wow", "Yeah" ]
}

I thought this might be a symptom of SI-7046, but I can see that the shapeless Coproduct is derived just fine:

scala> Generic[foo.MyEnum]
res4: shapeless.Generic[foo.MyEnum]{type Repr = shapeless.:+:[foo.Wow.type,shapeless.:+:[foo.Yeah.type,shapeless.CNil]]} = anon$macro$3$1@57f23793

Avro4s should support Scala Enumerations

Hello,

I wrote the following code

val is = AvroInputStream[MyType](body)

"body" is value that i got from a callback, and the type is array of byte (body: Array[Byte])

But this throws a compile time error

could not find implicit value for parameter builder:
Error:(48, 60) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[MyType]
implicit val is = AvroInputStream[MyType](body)
^
not enough arguments for method apply: (implicit evidence$1: com.sksamuel.avro4s.SchemaFor[MyType], implicit evidence$2: com.sksamuel.avro4s.FromRecord[MyType])com.sksamuel.avro4s.AvroInputStream[MyType] in object AvroInputStream.
Unspecified value parameter evidence$2.
implicit val is = AvroInputStream[MyType](body)

Support generating Avro partitioning files (for Kite)

Scenario. We need to define an Avro schema, and an Avro Partitioning Strategy as well

i.e.

app_logging.avsc

{
  "type": "record",
  "name": "app_logging",
  "namespace": "my.namespace.com",
  "doc": "An application log",
  "fields": [
      {
          "name": "level",
          "type": "string",
          "doc" : "Level"
      }, {
          "name": "application_name",
          "type": "string",
          "doc" : "Name of the application"
      }
    ]
}

Partitioning strategy:
app_logging_partition.json

[
  {"type": "identity", "source": "application_name", "name": "component_name"}
]

Can't create schema for generic type

Hi,

I'm a fan of your library, having looked around at the others yours is cleanest and most well thought through. However I've found a limitation that may be a bug - not sure.

I'm trying to nest case classes like this;

case class Message[T](payload: T, identity: String = UUID.randomUUID().toString)
case class MyRecord(key: String, str1: String, str2: String, int1: Int) 

AvroSchema[Message[MyRecord]]

However I get the compiler error;
Could not find implicit SchemaFor[Message[MyRecord]] AvroSchema[Message[MyRecord]]

I need to avoid having a SchemaFor defined for every T.

Perhaps there's something I'm doing wrong? I've tried several ways to specify a SchemaFor, but not having a lot of luck.

Really keen to use your library on an up-coming project - please help!

Regards,
Ryan.

Remove generated dirs on sbt clean ?

When compiling a project having avro4s as a dependency - classes are generated at folders:
avro4s-core and avro4s-generator

Can they be automatically removed on sbt clean ?

Add scalajs support

I'm wondering if you could add support for Scala.js. The restriction is you can't do reflection, so it has to be macros and I think you'll need to use a library like https://github.com/mtth/avsc on the JS side. It might be a good fit with autowire.

Current solution I'm looking at is upickle (arity < 22) + circe (for arity > 22, but it makes the build extremely slow with shapeless and all), and boopickle for binary (not sure how compatible it's with all the browsers, and it's hard to debug since it doesn't have a human-readable codec). ScalaPB is also an alternative, but you need to write an IDL boilerplate.

Could not find implicit value for parameter builder: Shapeless.Lazy

Hello,

I wrote the following code

val schema = AvroSchema[SalesRecord]
val output = AvroOutputStream[SalesRecord](new File(outputLocation))
output.write(salesList)
output.flush
output.close

But this throws a compile time error

could not find implicit value for parameter builder: shapeless.Lazy[com.sksamuel.avro4s.AvroSchema[...]]

not enough arguments for method apply: ....
Unspecified value parameter builder.

Value class does not deserialize json

Thank you for implementing #50

Unfortunately there appears to be a defect in AvroInputStream for both binary and json

package com.sksamuel.avro4s.examples


import com.sksamuel.avro4s.AvroSchema
import org.scalatest.{Matchers, WordSpec}

case class Wrapper(val underlying: Int) extends AnyVal
case class Test(i:Int,w:Wrapper)

class ValueClassExamples extends WordSpec with Matchers {
  import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
  import com.sksamuel.avro4s.{AvroInputStream, AvroOutputStream}

  "AvroStream json serialization" should {

    "round trip the value class " in {
      val baos = new ByteArrayOutputStream()
      val output = AvroOutputStream.json[Test](baos)
      output.write(Test(2,Wrapper(10)))
      output.close()

      val json = baos.toString("UTF-8")

      json shouldBe ("""{"i":2,"w":10}""") // as expected

      val in = new ByteArrayInputStream(json.getBytes("UTF-8"))
      val input = AvroInputStream.json[Test](in) //  <=== fails
      val result = input.iterator.toSeq
      result shouldBe Vector("")
    }
  }

}

The stack trace is

10 (of class java.lang.Integer)
scala.MatchError: 10 (of class java.lang.Integer)
    at com.sksamuel.avro4s.LowPriorityFromValue$$anon$1.apply(FromRecord.scala:22)

Deserialization seems failed

object AvroExample extends App {
  val pepperoni = Pizza("pepperoni", Seq(Ingredient("pepperoni", 12, 4.4), Ingredient("onions", 1, 0.4)), false, false, 98)
  val hawaiian = Pizza("hawaiian", Seq(Ingredient("ham", 1.5, 5.6), Ingredient("pineapple", 5.2, 0.2)), false, false, 91)

  val os = AvroOutputStream[Pizza](new File("pizzas.avro"))
  os.write(pepperoni)
  os.close()

  val is = AvroInputStream[Pizza](new File("pizzas.avro"))
  println(is.iterator.toList)
  is.close()
}

I followed the example and got an empty List... it seems deserialization failed

Read via kafka-avro-console-consumer

I write to topic using avro4s with GenericRecord java api. Then, when I try to read topic via kafka-avro-console-consumer (which is part of confluent platform) I will get following error:

Processed a total of 1 messages
[2016-11-08 09:42:47,646] ERROR Unknown error when running consumer:  (kafka.tools.ConsoleConsumer$:103)
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

Using a producer with the plain Java API does not produce the error above. Is it a bug or did I just misunderstand the usage of the avro4s library?

The problem of new schema creation for each record

Hi Samuel,

I have a problem when using ToRecord to convert case class objects to GenericRecord for Kafka: since a new schema is generated for each object (see https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/ToRecord.scala#L170), Kafka throws an exception like this: https://groups.google.com/forum/#!msg/confluent-platform/gkmtn2FO4Ug/IIsp8tZHT0QJ

Could we have a way to pass a pre-defined schema for a case class to ToRecord, instead of generating it each time?

BigDecimal scale and precision support

Hello again,

I'm looking for support for setting the scale and precision of an Avro BigDecimal. This is possible via implicits and a new type that contains the scale and precision.

I will have a contribution for it soon.

fails with sealed abstract class

this test fails - when I've got a sealed base class

package com.sksamuel.avro4s

import java.io.File
import org.scalatest.{Matchers, WordSpec}

class EdgeCasesTest extends WordSpec with Matchers {

  val instance = Enterprise(EntityId("Dead"), Person("Julius Caesar", 2112))
  val go = GoPiece(GoBase.StringInstance("hello"), GoBase.EmptyInstance,
    GoBase.IntInstance(42), GoBase.EmptyInstance)

  val sealedAC = Wrapper(SealedAbstractClass1("a"), SealedAbstractClass3)

  "Avro4s" should {
    "support extends AnyVal" in {
      val tempFile = File.createTempFile("enterprise", ".avro")

      val writer = AvroOutputStream[Enterprise](tempFile)
      writer.write(instance)
      writer.close()

      val reader = AvroInputStream[Enterprise](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (instance)
    }

    "support sealed case object instances 1" in {
      val tempFile = File.createTempFile("go", ".avro")

      val writer = AvroOutputStream[GoPiece](tempFile)
      writer.write(go)
      writer.close()

      val reader = AvroInputStream[GoPiece](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (go)
    }

    "support sealed case object instances 2" in {
      val tempFile = File.createTempFile("sac", ".avro")

      val writer = AvroOutputStream[Wrapper](tempFile)
      writer.write(sealedAC)
      writer.close()

      val reader = AvroInputStream[Wrapper](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (sealedAC)
    }
  }
}

case class Enterprise(id: EntityId, user: Person)
case class Person(name: String, age: Int)
case class EntityId(v: String) extends AnyVal

sealed abstract class SealedAbstractClass {}
case class SealedAbstractClass1(v: String) extends SealedAbstractClass
case class SealedAbstractClass2(v: Int) extends SealedAbstractClass
case object SealedAbstractClass3 extends SealedAbstractClass

case class Wrapper(a: SealedAbstractClass, b: SealedAbstractClass)

sealed abstract class GoBase {}
object GoBase {
  implicit class StringInstance(v: String) extends GoBase
  implicit class IntInstance(v: Int) extends GoBase
  case object EmptyInstance extends GoBase
}
case class GoPiece(a: GoBase, b: GoBase, c: GoBase, d: GoBase)

Incompatible migration with Option

For the case class:
Person(name: String)
I have the following schema:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    }
  ]
}

I'd like to add an optional field "nickname":
Person(name: String, nickname: Option[String] = None)
That must result in the schema:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "nickname",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

But unfortunately, the schema generated is missing the default to null:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "nickname",
      "type": [
        "null",
        "string"
      ]
    }
  ]
}

And so it's not compatible.

Using Generic Type Serialisation

Hi,

Im trying to write a Generic serialization method for Avro4s :

  def toBinary[T](event: T)(implicit schemaFor: SchemaFor[T],  toRecord: ToRecord[T]) = {
    val baos = new ByteArrayOutputStream()
    val output = AvroOutputStream.binary[T](baos)
    output.write(event)
    output.close()
    baos.toByteArray
  }

But I get the following error:

could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[T]

I also try this:

def toBinary[T](event: T)(implicit schema: AvroSchema[T],  toRecord: ToRecord[T]) = {
    val baos = new ByteArrayOutputStream()
    val output = AvroOutputStream.binary[T](baos)
    output.write(event)
    output.close()
    baos.toByteArray
  }

and I get the following error:

org.apache.avro.reflect.AvroSchema does not take type parameters
[error]   def toBinary[T](event: T)(implicit schema: AvroSchema[T],  toRecord: ToRecord[T]) = {

Is there anyway I can generalise the serialisation?

JSON serialize fails - could not find implicit evidence

Am I doing something daft? From the example:

object AvroTest extends App {
  import com.sksamuel.avro4s.AvroOutputStream

  case class Composer(name: String, birthplace: String, compositions: Seq[String])
  val ennio = Composer("ennio morricone", "rome", Seq("legend of 1900", "ecstasy of gold"))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Composer](baos)
  output.write(ennio)
  output.close()
  println(baos.toString("UTF-8"))
}

gives

Error:(14, 47) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[AvroTest.Composer]
  val output = AvroOutputStream.json[Composer](baos)
                                              ^

Error:(14, 47) not enough arguments for method json: (implicit evidence$5: com.sksamuel.avro4s.SchemaFor[AvroTest.Composer], implicit evidence$6: com.sksamuel.avro4s.ToRecord[AvroTest.Composer])com.sksamuel.avro4s.AvroJsonOutputStream[AvroTest.Composer].
Unspecified value parameter evidence$6.
  val output = AvroOutputStream.json[Composer](baos)
                                              ^

using sbt:

 sbt compile
[info] Loading project definition from /Users/nick/workspace/scala/avro4s-example/project
[info] Set current project to avro4s-example (in build file:/Users/nick/workspace/scala/avro4s-example/)
[info] Updating {file:/Users/nick/workspace/scala/avro4s-example/}avro4s-example...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/nick/workspace/scala/avro4s-example/target/scala-2.11/classes...
[error] /Users/nick/workspace/scala/avro4s-example/src/main/scala-2.11/AvroTest.scala:12: could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[AvroTest.Composer]
[error]   val output = AvroOutputStream.json[Composer](baos)
[error]                                               ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 5 s, completed 05 Jul 2016 4:28:07 PM

value schema in trait ToSchema of type org.apache.avro.Schema is not defined

Hello!

I get this build error when defining a custom type mapping with 1.5.1.

Error:(30, 19) object creation impossible, since value schema in trait ToSchema of type org.apache.avro.Schema is not defined
  implicit object InstantToSchema extends ToSchema[Instant] {

Here is the mapping code:

  implicit object InstantToSchema extends ToSchema[Instant] {
    override def apply(): Schema = Schema.create(Schema.Type.STRING)
  }

  implicit object InstantToValue extends ToValue[Instant] {
    override def apply(value: Instant): String = value.toString
  }

  implicit object InstantFromValue extends FromValue[Instant] {
    override def apply(value: Any, field: Field): Instant = Instant.parse(value.toString)
  }

This does not happen in 1.4.3 with the same code.

Cheers!

Primitive type schemas

Hi @sksamuel

When I used to do println(AvroSchema[Double]), it would return "double". Now it seems to do {"type":"record","name":"Double","namespace":"scala","fields":[]}.

Not sure if I'm doing something wrong here?

ADT

Given (defined in a different module, nesting the case class in a companion object UserEvent also doesn't work)

sealed trait UserEvent
case class Registered(id: UUID, name: String, email: String, password: String) extends UserEvent
case class EmailUpdated(id: UUID, email: String) extends UserEvent
case class PasswordUpdated(id: UUID, newPassword: String) extends UserEvent
case class LoggedIn(id: UUID, newPassword: String) extends UserEvent

Running

  val schema = AvroSchema[UserEvent]

  println(schema.toString(true))

Gives

{
  "type" : "record",
  "name" : "UserEvent",
  "namespace" : "statements.example",
  "fields" : [ {
    "name" : "newPassword",
    "type" : [ "null", "string" ]
  }, {
    "name" : "name",
    "type" : [ "null", "string" ]
  }, {
    "name" : "email",
    "type" : [ "null", "string" ]
  }, {
    "name" : "id",
    "type" : [ "string" ]
  }, {
    "name" : "password",
    "type" : [ "null", "string" ]
  } ]
}

I would expect an enum here. It should also include a discriminator field (like type or something).

As a side note, can't we use the shapeless type class derivation trick to create product and coproduct instances for SchemaFor[T]?

WDYT?

How to use Avro4s to create generic KafkaDecoder for Spark streaming

I would like to use your great work but I'm stuck, with my limited knowledge of Scala, on the following problem. I know you possibly don't know Kafka and Spark, but it is more about reflection and context bounds.

RecordFormat[T] requires fromRecord and toRecord implicit values. Spark KafkaDirectStream takes a Kafka Decoder type and creates an instance of it via reflection, looking for a single-argument constructor. When fromRecord and toRecord are provided in the decoder implementation via context bounds or as implicit values in a separate parameter list, the compiler adds them to the constructor as additional arguments, breaking the required contract.

Does anybody know how to provide these implicits in KafkaAvroGenericDecoder shown below so that RecordFormat is satisfied without breaking Spark code?

The goal is to let avro4s generate the decoding from an Avro GenericRecord to a case class, using the Confluent Schema Registry.

The Spark code to satisfy (VD is the decoder type, V is the resulting case class, and Decoder is an interface defined by Kafka):

val valueDecoder = classTag[VD].runtimeClass.getConstructor(classOf[VerifiableProperties])
  .newInstance(props)
  .asInstanceOf[Decoder[V]]

The code I currently have, context bounds removed:

class KafkaAvroGenericDecoder[T]
  extends AbstractKafkaAvroDeserializer
  with Decoder[T] {

  def this(schemaRegistry: SchemaRegistryClient) = {
    this()
    this.schemaRegistry = schemaRegistry
  }

  def this(schemaRegistry: SchemaRegistryClient, props: VerifiableProperties) = {
    this(schemaRegistry)
    configure(deserializerConfig(props))
  }

  /** Constructor used by Kafka consumer */
  def this(props: VerifiableProperties) = {
    this()
    configure(new KafkaAvroDeserializerConfig(props.props))
  }

  override def fromBytes(bytes: Array[Byte]): T = {
    val record = deserialize(bytes).asInstanceOf[GenericRecord]
    val format = RecordFormat[T]
    format.from(record)
  }
}

object Kafka {
  def createStream[V : ClassTag](topics: Set[String])
    (implicit ssc: StreamingContext): InputDStream[(Array[Byte], V)] = {

    KafkaUtils.createDirectStream[Array[Byte], V, DefaultDecoder, KafkaAvroGenericDecoder[V]](
      ssc, session.kafkaConf, topics)

    // Or this one for test. Change return type to InputDStream[Array[Byte]]
    // createTestStream[V, KafkaAvroGenericDecoder[V]](ssc):
  }

  // just to print out list of constructors for decoder and try to create its instance
  private def createTestStream[V, VD : ClassTag](ssc: StreamingContext): InputDStream[Array[Byte]] = {
     val props = new Properties()
     props.put("bootstrap.servers", "localhost:9092")
     props.put("group.id", "test")
     props.put("schema.registry.url", "http://localhost:8989")
     props.put("partition.assignment.strategy", "range")

     classTag[VD].runtimeClass.getConstructors.foreach { x =>
       println("CONSTRUCTOR INFO: " + x)
     }

     val valueDecoder = classTag[VD].runtimeClass.getConstructor(classOf[VerifiableProperties])
       .newInstance(new VerifiableProperties(props))
       .asInstanceOf[Decoder[V]]

     val seq = Seq("k1".getBytes, "k2".getBytes, "k3".getBytes)
     val rdd = ssc.sparkContext.makeRDD(seq)

     new ConstantInputDStream(ssc, rdd)
  }
}

// user code
object Streamz {
  def run()(implicit ssc: StreamingContext): Unit = {
    val topics = Set("test")
    val stream = Kafka.createStream[MyFancyEvent](topics)
    val result = stream.count()
    result.print()
  }
}

With context bounds:

class KafkaAvroGenericDecoder[T : FromRecord : ToRecord]
  extends AbstractKafkaAvroDeserializer
  with Decoder[T] { ... }

object Kafka {
  def createStream[V : FromRecord : ToRecord : ClassTag](topics: Set[String]) { ... }
}

Many thanks for any advice,
Vladimir

Add capability for scala ~> avro schema

Let's say i have a scala case class

package mypackage
case class MyClass(
  foo: Seq[Boolean]
)

And I want to quickly generate the appropriate Avro schema for the class above, and do that at run time, so that an external UI can be built around it to allow easy (Scala case class) ~> (avro-schema) conversions.

RecordFormat and polymorphic types

Hello!

I have a case class and RecordFormat.to usage like so:

// with type mappings for Instant in scope
case class Trade(tid: Long, price: BigDecimal, volume: BigDecimal, timestamp: Instant, tpe: String)

val trades = List(Trade(....))
val listFormat = RecordFormat[List[Trade]]

val listRecord = listFormat.to(trades)

which has compilation error:

could not find implicit value for parameter toRecord: com.sksamuel.avro4s.ToRecord[List[co.coinsmith.kafka.cryptocoin.Trade]]
[error]   val listFormat = RecordFormat[List[Trade]]

Is this code snippet supposed to work? I believe so.
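
A common workaround, assuming ToRecord is only derivable for top-level case classes (which is what the missing-implicit error suggests), is to wrap the list in a single-field record (the Trades wrapper below is hypothetical):

```scala
import com.sksamuel.avro4s.RecordFormat

object ListRecordDemo {
  case class Trade(tid: Long, price: BigDecimal) // simplified for the sketch

  // RecordFormat needs a record (case class) at the top level,
  // so wrap the List rather than asking for RecordFormat[List[Trade]]
  case class Trades(trades: List[Trade])

  val format = RecordFormat[Trades]

  def toRecord(ts: List[Trade]) = format.to(Trades(ts))
}
```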

Using Generic type doesn't populate the Record correctly

I am trying to use a generic type parameter on a function that parses my Avro record using a schema. However, when I use the generic it fails to set the data on the case class; when I use the explicit class on the function it works as expected.

I've attached a working example of what I'm talking about.
MyApp.txt

Could you explain why this doesn't work, if it should work and is it possible to fix or get around it ?

Support default values in Avro schema generated from a case class that has default parameter values

Scala has supported default parameter values since 2.8. However, those values get lost in translation when generating a case class's Avro schema. Here is an illustrative example:

scala> import com.sksamuel.avro4s.AvroSchema
import com.sksamuel.avro4s.AvroSchema

scala> case class Ingredient(name: String = "flour")
defined class Ingredient

scala> val schema = AvroSchema[Ingredient]
schema: org.apache.avro.Schema = {"type":"record","name":"Ingredient","namespace":"$iw","fields":[{"name":"name","type":"string"}]}

Note that the default value "flour" for parameter name doesn't appear in the Avro schema generated for case class Ingredient. It would be nice if the generated schema were, instead,

{"type":"record","name":"Ingredient","namespace":"$iw","fields":[{"name":"name","type":"string","default":"flour"}]}

Since avro4s requires a later version (2.11.8, currently) than 2.10, I don't think there would be anything blocking such an enhancement.
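
For what it's worth, the default is already materialised by the compiler in the synthetic apply method, so the information a schema macro would need does exist at compile time. A plain-Scala check (no avro4s involved):

```scala
object DefaultsDemo {
  case class Ingredient(name: String = "flour")

  // the compiler substitutes the default into the generated apply,
  // so an omitted argument and the explicit value are indistinguishable
  def defaultName: String = Ingredient().name
}
```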

AvroFixed Annotation Usage

I'm trying to create a schema that has some Fixed types, but I can't figure out how to do it.

I saw that there is an AvroFixed annotation and the README mentions that String can be a String or Fixed avro type.

This is what I tried:

@ case class FixedTest( @AvroFixed(4) value: String)

defined class FixedTest

@ val fixedSchema = AvroSchema[FixedTest]

fixedSchema: org.apache.avro.Schema = 
{"type":"record","name":"FixedTest","namespace":"$sess.cmd12","fields":[{"name":"value","type":"string"}]}

However the output doesn't have the fixed types. I looked a bit through the code, but I don't see anything similar to the other annotations where they check if the AvroFixed annotation exists.

I took a look and tried to take a stab at implementing it, but I'm not sure how the Fixed type should be modelled: should it have an implicit ToSchema somewhere (I think that would mean it needs its own type, at which point no annotation would be needed?), or should it just be overridden at the end, similar to how the other annotations are handled?

    val field = new Schema.Field(name, schema, doc(annos), defaultNode)
    // check if the AvroFixed annotation exists, and if it does, create a new schema or change this one here?
    aliases(annos).foreach(field.addAlias)
    addProps(annos, field.addProp)

If this is already done and I'm just going about this wrong let me know.
Any advice would be much appreciated!
Kalvin
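
For reference, the plain Avro Java API can build a fixed schema directly via Schema.createFixed, which is roughly the schema the @AvroFixed(4) annotation would be expected to produce (a sketch of the target output, not the avro4s implementation):

```scala
import org.apache.avro.Schema

object FixedSketch {
  // createFixed(name, doc, namespace, size) yields
  // {"type":"fixed","name":"value","namespace":"example","size":4}
  val fixed: Schema = Schema.createFixed("value", null, "example", 4)
}
```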

Compatibility issue Serializing/Deserializing messages generated with avro4s and other languages

I have two separate applications that interact using a queue in the middle; they communicate by sending messages to each other, serialized and deserialized with Avro.
The queue is managed by a message broker (RabbitMQ).

The scenario is really simple:

  • one application creates a message (based on a schema), serializes it, and puts it on the queue.
  • on the other side, another program picks up the message, deserializes the Avro message, and reads the content.

This is my schema (really simple)

{
    "type": "record",
    "name": "Surface",
    "fields": [
    {
        "name": "surfaceId",
        "type": "string"
    }]
}

case class Surface(surfaceId: String)

I serialize the message using the avro4s library and send it to the message broker.
I use this code to serialize the message:

First scenario: two applications made in Scala

    val surface = Surface(surfaceId = "5423g343423")
    val stream = new ByteArrayOutputStream()
    val os = AvroOutputStream[Surface](stream, true)
    os.write(surface)
    os.flush()

On the other side another application grabs the message from the queue and deserializes it with the avro4s library. body is an Array[Byte]:

        val is = AvroInputStream[Surface](body)
        val result = is.iterator.toSet
        println(result.mkString("\n"))
        is.close()

This configuration works perfectly.

Second scenario: one application made in Scala serializes the message and, on the other side, another application made in Java deserializes it and reads the content.
When I try to deserialize the message using the official Apache Avro Java library (https://avro.apache.org/docs/1.7.7/gettingstartedjava.html), I get an error and am not able to deserialize the content of the message.

Third scenario: one application made in Scala serializes the message and, on the other side, another application made in Python deserializes it and reads the content.
I get an error when I try to deserialize the message (same as with Java).

Fourth scenario: one application made in Python serializes the message and, on the other side, another application made in Scala deserializes it and reads the content.
When I try to deserialize the message using avro4s I get an error.

Fifth scenario: one application made in Python serializes the message and, on the other side, another application made in Java deserializes it and reads the content.
It works perfectly!

Sixth scenario: one application made in Python serializes the message and, on the other side, another application made in JavaScript deserializes it and reads the content.
It works perfectly!

I suspect there is a compatibility issue serializing and deserializing messages between avro4s and other libraries.

Or maybe I'm doing something wrong...

not available for 2.10.x

The only file that needs to be changed is AvroSchema.scala - some of the reflection code is different.

      // in Build.scala
    libraryDependencies ++= {
      if (scalaVersion.value.contains("2.11")) {
        Seq(
          "org.scala-lang"        % "scala-reflect" % scalaVersion.value,
          "com.chuusai"           %% "shapeless" % ShapelessVersion,
          "org.apache.avro"       % "avro" % AvroVersion,
          "org.slf4j"             % "slf4j-api" % Slf4jVersion,
          "log4j"                 % "log4j" % Log4jVersion % "test",
          "org.slf4j"             % "log4j-over-slf4j" % Slf4jVersion % "test",
          "org.scalatest"         %% "scalatest" % ScalatestVersion % "test"
        )
      } else Nil
    },
    libraryDependencies ++= {
      if (scalaVersion.value.contains("2.10")) {
        Seq(
          "org.scala-lang"        %  "scala-reflect"    % scalaVersion.value,
          "com.chuusai"           %% "shapeless"        % "2.2.5",
          "org.apache.avro"       %  "avro"             % AvroVersion,
          "org.slf4j"             %  "slf4j-api"        % Slf4jVersion,
          compilerPlugin("org.scalamacros" % "paradise_2.10.6" % "2.1.0"),
          "log4j"                 %  "log4j"            % Log4jVersion % "test",
          "org.slf4j"             %  "log4j-over-slf4j" % Slf4jVersion % "test",
          "org.scalatest"         %% "scalatest"        % "2.2.6"      % "test"
        )
      } else Nil
    },
    ...

  lazy val core = Project("avro4s-core", file("avro4s-core"))
    .settings(rootSettings: _*)
    .settings(publish := {})
    .settings(name := "avro4s-core")
    .settings(unmanagedSourceDirectories in Compile ++= {
      if (scalaVersion.value startsWith "2.10.")
        Seq(baseDirectory.value / "src"/ "main" / "scala-2.10")
      else if (scalaVersion.value startsWith "2.11.")
        Seq(baseDirectory.value / "src"/ "main" / "scala-2.11")
      else Nil
    })

enums don't work in unions or arrays

Hi there.

This may be related to #60 but I'm not sure; thought I'd report it anyway. It may also be two separate bugs; let me know if you'd like me to file it as two separate issues.

I can't seem to get an enum to work inside either a union or an array.

For the array case, I can create a wrapper record with one field that holds the array; I haven't tried the wrapper with the union.

I start with the avsc file and generate the Scala code. The Scala case classes look fine, but blow up when trying to serialize using the data codec.

Order by context level avro schemas

Currently, when transforming JSON into Avro, e.g. using https://www.landoop.com/tools/avro-scala-generator, the order is type -> name -> namespace:

{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.test.avro",

It would be more formal, and give you a top-down understanding, if avro4s ordered Avro schemas in contextual order, i.e.

{
  "namespace" : "com.test.avro",
  "name" : "MyClass",
  "type" : "record",

Getting warning of local val never used, using AvroInputStream.data

When using AvroInputStream.data[SomeType](bytes) I'm getting a warning,

local val in value x$1 is never used

I use -Xfatal-warnings as a scalac option, which means this turns into an error.
Am I doing something wrong here, or do you ignore warnings?

The below code triggers the warning:

package my.pckg

import com.sksamuel.avro4s._
case class Bla(name: String)
object Avro {
  def fromAvro(bytes: Array[Byte]) = {
    AvroInputStream.data[Bla](bytes)
  }
}

Example deserialization code from readme.md doesn't compile as is

Hi,

The example deserialization code on the main page :

import java.io.File
import com.sksamuel.avro4s.AvroInputStream

val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))
val pizzas = is.iterator.toSet
is.close()
println(pizzas.mkString("\n"))

doesn't compile for me (using Scala 2.11.8 & avro4s 1.6.2):

Error:(13, 45) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[scalapt.core.Pizza]
        val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))
Error:(13, 45) not enough arguments for method data: (implicit evidence$23: com.sksamuel.avro4s.SchemaFor[scalapt.core.Pizza], implicit evidence$24: com.sksamuel.avro4s.FromRecord[scalapt.core.Pizza])com.sksamuel.avro4s.AvroDataInputStream[scalapt.core.Pizza].
Unspecified value parameter evidence$24.
        val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))

Am I missing an implicit import?

Avro schema embedded along with binary

Hi,
I was trying to use your library for publishing Kafka messages with Avro. The problem I ran into is that the consumer side cannot parse the message because the schema is embedded with the binary content. It looks like an object container file. Is there a way around it?

Thank you,
Arsen
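
For reference, the format described here is the Avro object container format, which embeds the writer schema in the payload. The binary variants used in other reports in this thread write only the datum bytes, which is what most Kafka setups (with the schema shared out of band) expect. A sketch assuming the avro4s 1.x API shown elsewhere in these issues:

```scala
import java.io.ByteArrayOutputStream

import com.sksamuel.avro4s.AvroOutputStream

object BinarySketch {
  case class Surface(surfaceId: String)

  // AvroOutputStream.binary omits the schema header, unlike the default
  // data (object container) stream; the consumer must already know the schema
  def toBytes(s: Surface): Array[Byte] = {
    val baos = new ByteArrayOutputStream()
    val out = AvroOutputStream.binary[Surface](baos)
    out.write(s)
    out.close()
    baos.toByteArray
  }
}
```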

Efficiently support value classes

This example

class Wrapper(val underlying: Int) extends AnyVal
case class Test(i: Int, w: Wrapper)

AvroSchema[Test].toString(true)

Generates a complex record representation

{
  "type" : "record",
  "name" : "Test",
  "fields" : [ {
    "name" : "i",
    "type" : "int"
  }, {
    "name" : "w",
    "type" : {
      "type" : "record",
      "name" : "Wrapper",
      "fields" : [ {
        "name" : "underlying",
        "type" : "int"
      } ]
    }
  } ]
}

It would be best if the schema simply represented w by the underlying int type.

Value classes are very useful, providing type-safety and domain operations without any run-time cost.

Deserialising ADT

Hi,
how do I map the following to Avro?

sealed trait A
case class B(id: String) extends A
case class C(id: String) extends A
case class D(a: A)



def toBinary[T: SchemaFor : ToRecord](event: T): Array[Byte] = {
  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.binary[T](baos)
  output.write(event)
  output.close()
  baos.toByteArray
}

def fromBinary[T: SchemaFor : FromRecord](bytes: Array[Byte]) = {
  val in = new ByteArrayInputStream(bytes)
  val input = AvroInputStream.binary[T](in)
  val result = input.iterator.toSeq // materialise before closing the stream
  input.close()
  result
}

val d = D(B("1"))

val b = toBinary(d)

fromBinary(b)

When deserializing there is an error:

could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[

How can we deserialize ADTs? Do we need a packing object?

But the packing object packs instances, not types:

implicit val BasePack = Pack[Base](A -> 0, B -> 1)

Can't use custom type mappings for value classes

It seems like custom ToValue and FromValue instances for value classes are not being picked up by the ToRecord macro.

Example:

import java.io.ByteArrayOutputStream

import com.sksamuel.avro4s._
import org.apache.avro.Schema
import org.apache.avro.Schema.Field

object Main extends App {
  final case class Item(valueString: String) extends AnyVal {
    def value: Int = valueString.toInt
  }

  object Item {
    implicit object ItemToSchema extends ToSchema[Item] {
      override val schema: Schema = ToSchema.IntToSchema()
    }

    implicit object ItemToValue extends ToValue[Item] {
      override def apply(item: Item): Int = item.value
    }

    implicit object ItemFromValue extends FromValue[Item] {
      override def apply(value: Any, field: Field): Item =
        Item.apply(FromValue.IntFromValue(value, field).toString)
    }
  }

  final case class Group(item: Item)

  val group = Group(Item("123"))

  println(ToRecord[Group].apply(group))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Group](baos)
  output.write(group)
  output.close()

  println(baos.toString("UTF-8"))
}

Gives:

{"item": "123"}
[error] (run-main-2c) java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:117)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
	at com.sksamuel.avro4s.AvroJsonOutputStream.write(AvroOutputStream.scala:83)
...

(The ToSchema is picked up, but the ToValue and FromValue are not.)

Making Item a normal case class yields the expected results:

{"item": 123}
{"item":123}

Scala Enumerations inside collection

@A1kmm observed that an invalid schema was generated for collections that contain Scala Enumerations, resulting in an NPE on deserialization, e.g.

object MyEnum extends Enumeration {
  val AnEnum = Value
}
val s = Set[MyEnum.Value]()

NPE happens here:
https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/FromRecord.scala#L77

Called from here (note no second parameter to apply, so default of null is used): https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/FromRecord.scala#L95

Akka persistence - Added parameter

Hi,
I chose this library to serialize persistent actors in Akka.
I use it like here: http://giampaolotrapasso.com/akka-persistence-using-avro-serialization/ but a little smarter. Basically I have a problem with schema evolution:
If I add a new parameter to the case class, the persistent actor is not restored and throws this:

java.util.NoSuchElementException: head of empty stream

My code:
Serializer class:

class CardEventSerializer extends SerializerWithStringManifest {

  override def identifier: Int = 1000100

  override def manifest(o: AnyRef): String = o.getClass.getName

  final val TestManifest = classOf[Test].getName

  override def toBinary(o: AnyRef): Array[Byte] = {

    o match{
      case t: Test => toBinaryByClass(t, o)
      case _ => throw new IllegalArgumentException(s"Unknown event to serialize. $o")
    }
  }

  def toBinaryByClass[T: SchemaFor : ToRecord](c: T, o:AnyRef): Array[Byte] = {

    val os = new ByteArrayOutputStream
    val output = AvroOutputStream.binary[T](os)

    output.write(o.asInstanceOf[T])
    output.flush()
    output.close()
    os.toByteArray
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    manifest match {
      case TestManifest => fromBinaryByClass[Test](bytes)
      case _ => throw new IllegalArgumentException(s"Unable to handle manifest $manifest")
    }
  }

  def fromBinaryByClass[T: SchemaFor : FromRecord](bytes: Array[Byte]): T = {

    val is = new ByteArrayInputStream(bytes)
    val input = AvroInputStream.binary[T](is)
    val result = input.iterator.toSeq.head // read before closing the stream
    input.close()
    result
  }
}

My case class:

sealed trait TestEvents {
}
case class Test(p1: Long, p2: String, p3: Long) extends TestEvents

Now I persist the actor...
And then try to extend the class like this:

case class Test(p1: Long, p2: String, p3: Long, p4: Int = 0) extends TestEvents

And it crashes...

The schema is generated fine; I printed it, and the default value is present in the schema.
If I delete the added parameter, everything works fine. I tried both binary and json.

So, my question is: is there an option to handle schema evolution?

Many thanks

Scala Type DateTime is not supported.

If I have org.joda.time.DateTime in my case class declaration, I see an error while invoking AvroInputStream. Please find the error below. It would be great if avro4s had a type mapping for DateTime.

my case class : a.b.c.d.e.pkg01.TransactionLine
Error:(90, 78) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[a.b.c.d.e.pkg01.TransactionLine]
val is: AvroInputStream[TransactionLine] = AvroInputStream[TransactionLine](new File("/tmp/TransactionLine.avro"))
