
avrohugger's Introduction

avrohugger

Schema-to-case-class code generation for working with Avro in Scala.

  • avrohugger-core: Generate source code at runtime for evaluation at a later step.
  • avrohugger-filesorter: Sort schema files for proper compilation order.
  • avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.

Alternative Distributions:

  • sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin (sketched below this list).
  • Maven: avrohugger-maven-plugin - Generate source code at compile time with a maven plugin.
  • Mill: mill-avro - Generate source code at compile time with a Mill plugin.
  • Gradle: gradle-avrohugger-plugin - Generate source code at compile time with a gradle plugin.
  • mu-rpc: mu-scala - Generate rpc models, messages, clients, and servers.
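For example, a minimal sbt setup might look like the following sketch. The plugin version is illustrative, and the task key avroScalaGenerateSpecific is an assumption based on sbt-avrohugger's conventions; consult that plugin's README for the current wiring.

// project/plugins.sbt
addSbtPlugin("com.julianpeeters" % "sbt-avrohugger" % "2.8.3") // version is illustrative

// build.sbt: generate SpecificRecord case classes from src/main/avro at compile time
Compile / sourceGenerators += (Compile / avroScalaGenerateSpecific).taskValue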

Generates Scala case classes in various formats:
  • Standard: vanilla case classes (for use with Apache Avro's GenericRecord API, etc.)

  • SpecificRecord: case classes that implement SpecificRecordBase and therefore have mutable var fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).

Supports generating case classes with arbitrary fields of the following datatypes:
| Avro | Standard | SpecificRecord | Notes |
| --- | --- | --- | --- |
| INT | Int | Int | See Logical Types: date |
| LONG | Long | Long | See Logical Types: timestamp-millis |
| FLOAT | Float | Float | |
| DOUBLE | Double | Double | |
| STRING | String | String | |
| BOOLEAN | Boolean | Boolean | |
| NULL | Null | Null | |
| MAP | Map | Map | |
| ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | Java Enum, EnumAsScalaString | See Customizable Type Mapping |
| BYTES | Array[Byte], BigDecimal | Array[Byte], BigDecimal | See Logical Types: decimal |
| FIXED | case class, case class + schema | case class extending SpecificFixed | See Logical Types: decimal |
| ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | See Customizable Type Mapping |
| UNION | Option, Either, Shapeless Coproduct | Option, Either, Shapeless Coproduct | See Customizable Type Mapping |
| RECORD | case class, case class + schema | case class extending SpecificRecordBase | See Customizable Type Mapping |
| PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping |
| Date | java.time.LocalDate, java.sql.Date, Int | java.time.LocalDate, java.sql.Date, Int | See Customizable Type Mapping |
| TimeMillis | java.time.LocalTime, Int | java.time.LocalTime, Int | See Customizable Type Mapping |
| TimeMicros | java.time.LocalTime, Long | java.time.LocalTime, Long | See Customizable Type Mapping |
| TimestampMillis | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| TimestampMicros | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| LocalTimestampMillis | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| LocalTimestampMicros | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| UUID | java.util.UUID | java.util.UUID | See Customizable Type Mapping |
| Decimal | BigDecimal | BigDecimal | See Customizable Type Mapping |
Logical Types Support:

NOTE: Currently logical types are only supported for Standard and SpecificRecord formats

  • date: Annotates Avro int schemas to generate java.time.LocalDate or java.sql.Date (See Customizable Type Mapping). Examples: avdl, avsc.
  • decimal: Annotates Avro bytes and fixed schemas to generate BigDecimal. Examples: avdl, avsc.
  • timestamp-millis: Annotates Avro long schemas to generate java.time.Instant, java.sql.Timestamp, or Long (See Customizable Type Mapping). Examples: avdl, avsc.
  • uuid: Annotates Avro string schemas and IDLs to generate java.util.UUID (See Customizable Type Mapping). Example: avsc.
  • time-millis: Annotates Avro int schemas to generate java.time.LocalTime, java.sql.Time, or Int (See Customizable Type Mapping).
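For illustration, here is a sketch that generates a case class from a schema string using the timestamp-millis logical type (the schema is a made-up example; with the default mapping the field should come out as java.time.Instant):

import avrohugger.Generator
import avrohugger.format.Standard

// Hypothetical schema exercising the timestamp-millis logical type.
val schemaString =
  """{
    |  "type": "record",
    |  "name": "Event",
    |  "namespace": "com.example",
    |  "fields": [
    |    {"name": "occurredAt", "type": {"type": "long", "logicalType": "timestamp-millis"}}
    |  ]
    |}""".stripMargin

val generator = new Generator(Standard)
// Each returned string is the source code of one generated definition.
generator.stringToStrings(schemaString).foreach(println)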
Protocol Support:
  • The records defined in .avdl, .avpr, and JSON protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.

  • For SpecificRecord, if the protocol contains messages then an RPC trait is generated (instead of generating an ADT, or ignoring the message definitions).

Doc Support:
  • .avdl: Comments that begin with /** are used as the documentation string for the type or field definition that follows the comment.

  • .avsc, .avpr, and .avro: Docs in Avro schemas are used to define a case class's ScalaDoc.

  • .scala: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).

Usage

  • Library for Scala 2.12, 2.13, and 3
  • Parses schemas and IDLs with Avro version 1.11
  • Generates code compatible with Scala 2.12, 2.13, and 3

avrohugger-core

Get the dependency with:
"com.julianpeeters" %% "avrohugger-core" % "2.8.3"
Description:

Instantiate a Generator with the Standard or SpecificRecord source format. Then use

tToFile(input: T, outputDir: String): Unit

or

tToStrings(input: T): List[String]

where T can be File, Schema, or String (e.g. fileToFile, stringToStrings).

Example
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"

where an input File can be .avro, .avsc, .avpr, or .avdl,

and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.
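An org.apache.avro.Schema input works the same way; a sketch (the schema itself is a made-up example):

import avrohugger.Generator
import avrohugger.format.Standard
import org.apache.avro.Schema

// Parse an Avro schema directly, then generate source strings in memory.
val schema: Schema = new Schema.Parser().parse(
  """{"type": "record", "name": "User", "namespace": "com.example",
    | "fields": [{"name": "name", "type": "string"}]}""".stripMargin)

val generator = new Generator(Standard)
val sources: List[String] = generator.schemaToStrings(schema)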

Customizable Type Mapping:

To customize which Scala types are generated for Avro types, use the following (e.g. for customizing the SpecificRecord format):

import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector

val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)
  • record can be assigned to ScalaCaseClass and ScalaCaseClassWithSchema (with the schema in a companion object)
  • array can be assigned to ScalaSeq, ScalaArray, ScalaList, and ScalaVector
  • enum can be assigned to JavaEnum, ScalaCaseObjectEnum, EnumAsScalaString, and ScalaEnumeration
  • fixed can be assigned to ScalaCaseClassWrapper and ScalaCaseClassWrapperWithSchema (with the schema in a companion object)
  • union can be assigned to OptionShapelessCoproduct, OptionEitherShapelessCoproduct, or OptionalShapelessCoproduct
  • int, long, float, double can be assigned to ScalaInt, ScalaLong, ScalaFloat, ScalaDouble
  • protocol can be assigned to ScalaADT and NoTypeGenerated
  • decimal can be assigned to e.g. ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN)) and ScalaBigDecimalWithPrecision(None) (via Shapeless Tagged Types)
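Several assignments can be combined in one copy. A sketch (the copy-parameter names enum and decimal are assumptions inferred from the list above; array appears in the example earlier):

import avrohugger.Generator
import avrohugger.format.Standard
import avrohugger.types.{EnumAsScalaString, ScalaBigDecimal, ScalaList}

// Copy the format's default types, overriding a few mappings at once.
val customTypes = Standard.defaultTypes.copy(
  array = ScalaList,                                                  // ARRAY -> List
  enum = EnumAsScalaString,                                           // ENUM  -> String
  decimal = ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN))  // decimal -> BigDecimal
)
val generator = new Generator(Standard, avroScalaCustomTypes = Some(customTypes))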

Specifically for unions:

| Field Type ⬇️ / Behaviour ➡️ | OptionShapelessCoproduct | OptionEitherShapelessCoproduct | OptionalShapelessCoproduct |
| --- | --- | --- | --- |
| [{"type": "map", "values": "string"}] | Map[String, String] | Map[String, String] | Map[String, String] :+: CNil |
| ["null", "double"] | Option[Double] | Option[Double] | Option[Double :+: CNil] |
| ["int", "string"] | Int :+: String :+: CNil | Either[Int, String] | Int :+: String :+: CNil |
| ["null", "int", "string"] | Option[Int :+: String :+: CNil] | Option[Either[Int, String]] | Option[Int :+: String :+: CNil] |
| ["boolean", "int", "string"] | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil |
| ["null", "boolean", "int", "string"] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] |
Customizable Namespace Mapping:

Namespaces can be reassigned by instantiating a Generator with a custom namespace map:

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))

Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names, please speak up!

Wildcard matching on a namespace prefix is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))

avrohugger-filesorter

Get the dependency with:
"com.julianpeeters" %% "avrohugger-filesorter" % "2.8.3"
Description:

To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter, respectively.

Example:
import avrohugger.filesorter.AvscFileSorter
import java.io.File

val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc"))
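The ** syntax above is sbt's path-finder API. Outside sbt, a plain-JVM sketch (assuming sortSchemaFiles accepts a List[File]):

import avrohugger.filesorter.AvscFileSorter
import java.io.File

// Collect the .avsc files in a directory, then sort them into dependency order.
val srcDir = new File("src/main/avro")
val avscFiles: List[File] =
  Option(srcDir.listFiles((_, name: String) => name.endsWith(".avsc")))
    .map(_.toList)
    .getOrElse(Nil)
val sorted: List[File] = AvscFileSorter.sortSchemaFiles(avscFiles)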

avrohugger-tools

Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30MB!) and use it like the avro-tools jar. Usage: [-string] (schema|protocol|datafile) input... outputdir:

  • generate generates Scala case class definitions:

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate schema user.avsc .

  • generate-specific generates definitions that extend Avro's SpecificRecordBase:

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate-specific schema user.avsc .

Warnings

  1. If your framework is one that relies on reflection to get the Schema, it will fail, since Scala fields are private. Preempt this by passing a Schema to DatumReaders and DatumWriters explicitly (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema); see the sketch after this list).

  2. For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.

  3. SpecificRecord requires that enum be represented as JavaEnum.
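A sketch of warning 1's workaround (MyRecord stands for a hypothetical generated SpecificRecord class; its companion is assumed to expose the schema, e.g. as SCHEMA$):

import org.apache.avro.Schema
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

// Pass the schema explicitly so Avro never needs to reflect on private fields.
val schema: Schema = MyRecord.SCHEMA$
val sdw = new SpecificDatumWriter[MyRecord](schema)
val sdr = new SpecificDatumReader[MyRecord](schema, schema) // writer schema, reader schema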

Testing

To test for regressions, please run sbt:avrohugger> + test.

To test that generated code can be de/serialized as expected, please run:

  1. sbt:avrohugger> + publishLocal
  2. then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
  3. finally run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests

Credits

Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.

Contributors:

Marius Soutier
Brian London
alancnet
Matt Coffin
Ryan Koval
Simonas Gelazevicius
Paul Snively
Marco Stefani
Andrew Gustafson
Kostya Golikov
Plínio Pantaleão
Sietse de Kaper
Martin Mauch
Leon Poon
Paul Pearcy
Matt Allen
C-zito
Tim Chan
Saket
Daniel Davis
Zach Cox
Diego E. Alonso Blas
Fede Fernández
Rob Landers
Simon Petty
Andreas Drobisch
Timo Schmid
Dmytro Orlov
Stefano Galarraga
Lars Albertsson
Eugene Platonov
Jerome Wacongne
Jon Morra
Raúl Raja Martínez
Kaur Matas
Chris Albright
Francisco Díaz
Bobby Rauchenberg
Leonard Ehrenfried
François Sarradin
niqdev
rsitze-mmai
Julien BENOIT
Adam Drakeford
Carlos Silva
ismail Benammar
mcenkar
Luca Tronchin
LydiaSkuse
Algimantas Milašius
Leonard Ehrenfried
Massimo Siani
Konstantin
natefitzgerald
Victor
steve-e
Criticism is appreciated.
Fork away, just make sure the tests pass before sending a pull request.


avrohugger's Issues

Handle `{}` as a default value in IDL

The {} in ME last_meta_event = {}; results in new ME(, , ), which is incorrect. Check the behavior of Apache Avro Java compilation and conform to that as the de facto standard, whether it throws an error or treats it as a 0-arg ctor to generate new ME("asd", "sdf", None).

Example Idl:

@namespace("com.google.analytics")
protocol MEProtocol {
record ME {
    string type = "asd";
    string service = "sdf";
    union { null, string } service_timezone = null;
}
record Visit {
    ME first_meta_event = {"type": "asd", "service": "sdf", "service_timezone": null };
    ME last_meta_event = {};
  }
}

Consolidate json dependencies

Apache Avro uses Jackson internally, so Jackson was used in Avrohugger to handle default values.

When the avrohugger.input.filesorter package was introduced, it used spray-json.

We should probably choose one and depend on a single JSON library.

remove comment on the code returned from .*toStrings

Strings that get written to files should retain the comment.

Strings that get returned should have the comment removed (at a stage immediately before it's output).

(The comment is a warning that doesn't apply outside of writing/overwriting files.)

Ability to add mixins to generated case classes

I've been playing around with adding this and think it would be a nice-to-have. I have a simple first-pass implementation that adds a list of traits to all records. This makes it possible for the case classes to honor other interfaces, enabling generic handling at a more specific type than SpecificRecordBase.

What I really want though, is the ability to target a specific set of records with certain traits. Not quite sure the best approach, perhaps by namespace or filenames/locations.

I can share the current code if anyone is interested, opened this ticket to see if others had interest.

Generate AnyVals

Just an idea - if a record contains a single primitive value, it could extend AnyVal for better runtime performance.
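A sketch of the idea (hypothetical generated output):

// A record with a single primitive field could be emitted as a value class,
// avoiding the wrapper allocation at runtime.
case class UserId(id: Long) extends AnyVal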

Decimals

Decimals are quite well specified in the Avro spec (http://avro.apache.org/docs/1.7.7/spec.html#Decimal), and I need a BigDecimal at some point in the processing - ideally in the generated case class. I understand that this isn't supported at present.

If indeed it isn't, is it anywhere on the roadmap? If not, how difficult/easy do you think it would be to retrofit? Is that something you would be happy to accept, with some hand-holding, as a contribution?

Schema evolution

I was wondering how schema evolution can be handled. For example, will I be able to read an earlier serialized version (written with an older Avro Schema) into a new case class (generated with a later version of the same Schema)?

AVDL imports don't handle transitive deps / wrong file order

I've encountered a problem with transitive AVDL imports that don't work because of the file order.

This works:

A.avdl
B.avdl -> import idl "A.avdl";
C.avdl -> import idl "B.avdl";

This doesn't:

A.avdl -> import idl "B.avdl";
B.avdl -> import idl "C.avdl";
C.avdl

Error message is NoSuchElementException: key not found.

Protocols extend Product which fails on Enumerations

I have this code generated which obviously fails because Enumeration is not a Product.

sealed trait TrackingProtocol extends Product with Serializable

final object Source extends Enumeration with TrackingProtocol {
  type Source = Value
  val Frontend, Backend = Value
}

Schema objects hard to get and reuse

  1. Move schema val into a companion object (and generate a companion object!).

  2. Have getSchema method reference the static schema in the companion object, rather than instantiate a new Schema object each time.
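A sketch of the proposal (the record and schema string are made-up examples):

import org.apache.avro.Schema

case class MyRecord(var id: String)

object MyRecord {
  // Parsed once and reused, instead of instantiating a new Schema on every getSchema call.
  val SCHEMA$ : Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "MyRecord", "fields": [{"name": "id", "type": "string"}]}""")
}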

Support Scala 2.10 and Schemas with more than 22 parameters

For backwards-compatibility with Scala 2.10 (Hadoop distros sadly come with Spark on Scala 2.10), do not write out case classes but simply classes when a schema has more than 22 parameters and Scala 2.10 is used.

I'll try to submit a PR soonish.

sbt-avrohugger should be able to call the right version depending on the Scala version used.

Improve error messages around importing schemas in IDLs

Exception in thread "main" java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at scala.collection.convert.Wrappers$JConcurrentMapWrapper.get(Wrappers.scala:350)
at avrohugger.format.abstractions.Importer$class.checkNamespace$1(Importer.scala:56)
at avrohugger.format.abstractions.Importer$$anonfun$getRecordImports$2.apply(Importer.scala:66)
at avrohugger.format.abstractions.Importer$$anonfun$getRecordImports$2.apply(Importer.scala:66)
at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
at avrohugger.format.abstractions.Importer$class.getRecordImports(Importer.scala:66)
at avrohugger.format.specific.SpecificImporter$.getRecordImports(SpecificImporter.scala:20)
at avrohugger.format.specific.SpecificImporter$.getImports(SpecificImporter.scala:32)
at avrohugger.format.specific.SpecificScalaTreehugger$.asScalaCodeString(SpecificScalaTreehugger.scala:36)
at avrohugger.format.abstractions.SourceFormat$class.getScalaCompilationUnit(SourceFormat.scala:147)
at avrohugger.format.SpecificRecord$.getScalaCompilationUnit(SpecificRecord.scala:19)
at avrohugger.format.SpecificRecord$.asCompilationUnits(SpecificRecord.scala:86)
at avrohugger.format.SpecificRecord$.compile(SpecificRecord.scala:164)
at avrohugger.FileGenerator$$anonfun$schemaToFile$1.apply(FileGenerator.scala:35)
at avrohugger.FileGenerator$$anonfun$schemaToFile$1.apply(FileGenerator.scala:32)
at scala.collection.immutable.List.foreach(List.scala:318)
at avrohugger.FileGenerator$.schemaToFile(FileGenerator.scala:32)
at avrohugger.FileGenerator$$anonfun$fileToFile$1.apply(FileGenerator.scala:83)
at avrohugger.FileGenerator$$anonfun$fileToFile$1.apply(FileGenerator.scala:81)
at scala.collection.immutable.List.foreach(List.scala:318)
at avrohugger.FileGenerator$.fileToFile(FileGenerator.scala:81)
at avrohugger.Generator.fileToFile(Generator.scala:77)
at test.Testing$.main(Testing.scala:21)
at test.Testing.main(Testing.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

Feature: Use @switch to ensure that pattern matches in generated code are using switch

Example:

import scala.annotation.switch

case class Foo(var id: String, var flags: Array[Boolean]) extends org.apache.avro.specific.SpecificRecordBase {
  //...
  def get(field$: Int): AnyRef = {
    (field$: @switch) match {
      case pos if pos == 0 => {
        id
      }.asInstanceOf[AnyRef]
      case pos if pos == 1 => {
        scala.collection.JavaConversions.bufferAsJavaList({
          flags map { x =>
            x
          }
        }.toBuffer)
      }.asInstanceOf[AnyRef]
      case _ => new org.apache.avro.AvroRuntimeException("Bad index")
    }
  }
  //...
}

Protocols should generate sealed class hierarchies

Currently protocol support is paltry, and simply generates separate files for the classes defined therein, whereas Java codegen generates an interface.

Seems natural to represent the interface as a sealed trait, and generate the classes in a single compilation unit.

Creating an enum within a protocol fails due to (unnecessary?) base types of the protocol trait

Creating an enum within a protocol generates code that does not compile.

sealed trait MyProtocol extends Product with Serializable

final object MyEnum extends Enumeration with MyProtocol {
  type MyEnum = Value
  val A, B = Value
}
Error:(6, 14) object creation impossible, since:
it has 3 unimplemented members.
/** As seen from object MyEnum, the missing signatures are as follows.
 *  For convenience, these are usable as stub implementations.
 */
  // Members declared in scala.Equals
  def canEqual(that: Any): Boolean = ???

  // Members declared in scala.Product
  def productArity: Int = ???
  def productElement(n: Int): Any = ???
final object MyEnum extends Enumeration with MyProtocol {
             ^

It would be a nice option to be able to generate an enum as a sealed trait with case objects.

Scala Toolbox regression in 2.11.8 can't parse code strings wherein a package is defined

Revert to 2.11.7 for now because:

case class Test1() is fine,

but

package test

case class Test2()

fails with the error:

ToolBoxError: : reflective compilation has failed: 
[error] 
[error] illegal start of definition  (StringInputParser.scala:65)
[error] avrohugger.input.parsers.StringInputParser$$anonfun$1.apply(StringInputParser.scala:65)
[error] avrohugger.input.parsers.StringInputParser$$anonfun$1.apply(StringInputParser.scala:64)
[error] avrohugger.input.parsers.StringInputParser.tryCaseClass$1(StringInputParser.scala:64)
[error] avrohugger.input.parsers.StringInputParser.tryIDL$1(StringInputParser.scala:47)
[error] avrohugger.input.parsers.StringInputParser.tryProtocol$1(StringInputParser.scala:38)
[error] avrohugger.input.parsers.StringInputParser.trySchema$1(StringInputParser.scala:29)
[error] avrohugger.input.parsers.StringInputParser.getSchemas(StringInputParser.scala:77)
[error] avrohugger.Generator.stringToStrings(Generator.scala:121)
[error] specific.SchemaGenSpec$$anonfun$1$$anonfun$apply$44.apply(SchemaGenSpec.scala:821)
[error] specific.SchemaGenSpec$$anonfun$1$$anonfun$apply$44.apply(SchemaGenSpec.scala:805)

Prevent records named Product

When a record is called Product (which should be fairly common), there will be compiler errors because Product is part of the Scala standard library. It should either not be possible to use Product as a name, or the name must be prefixed, or some other workaround applied.

Support deeply nested array types

Given:

{
  "namespace": "com.acme",
  "name":      "a",
  "type":      "record",
  "fields": [
    { "name": "words", "type":  { "type": "array", "items": "string" }}
  ]
}
{
  "namespace": "com.acme",
  "name":      "b",
  "type":      "record",
  "fields": [
    { "name": "as", "type":  { "type": "array", "items": "com.acme.a" }}
  ]
}
{
  "namespace": "com.acme",
  "name":      "c",
  "type":      "record",
  "fields": [
    { "name": "bs",  "type":  { "type": "array", "items": "com.acme.b" }}
  ]
}

I get the following error:

[error]  not enough arguments for method apply: (words: Array[String])com.acme.model.a in object a.
[error] Unspecified value parameter words.
[error]           a()
[error]            ^

Revisit default values for empty constructor

Hi,

The required empty constructor initializes values to "non-empty" values, i.e. Option to Some, List to one element, Int to 1, Long to 1L, and so on. Could we change this to empty values: None, Nil, 0, 0L, ...?

Cheers

  • Marius

Generated Scala file imports old Scavro package

When generating Scavro case classes, the generated Scala file contains:

import com.oysterbooks.scavro.{AvroMetadata, AvroReader, AvroSerializeable}

This seems to be an outdated Scavro package name, and will not compile with the current Scavro release "org.oedura" %% "scavro" % "1.0.1".

When this is changed to the following, it compiles successfully:

import org.oedura.scavro.{AvroMetadata, AvroReader, AvroSerializeable}

Support for @JsonCreator()

Would it be possible to add support for @JsonCreator() in the code generation of specific records? This would allow, for example, Jackson to map JSON to the generated classes while honoring optionals.

The resulting code would look like:


...
import com.fasterxml.jackson.annotation.JsonCreator
...

case class MyPropertyRecord @JsonCreator() (var propertyName: String, var tenantName: String) extends org.apache.avro.specific.SpecificRecordBase {

...

Remove final from objects

IntelliJ is telling me 'final' modifier is redundant for objects, so I guess we could remove that from the generator.

Generate Scala 2.12 compatible classes

Although the README says generated classes are Scala 2.12 compatible, I got this error when using generated case classes in a Scala 2.12 project: java.lang.NoClassDefFoundError: scala/Product$class.
The avrohugger dependency is only available for Scala 2.11, so when I build my JAR I can't use the Scala 2.12 compiler in the same project. I want to use the same project to generate Scala source code and compile and build a JAR of classes.

Conflicting class names when the same name appears in different namespaces

Hi, I'm trying to use avrohugger but I ran into a compile error caused by the same class name appearing in different namespaces. I think the generated class should use the full path rather than import syntax.
How can I fix this problem in the current version?

Thanks for your great work.
Cheers,

  • generated class
/** MACHINE-GENERATED FROM AVRO SCHEMA. DO NOT EDIT DIRECTLY */
package root.user_dim

import root.user_dim.user_properties.Value

import root.user_dim.user_properties.value.Value
// the name of Value class is conflicting in this class
case class User_Properties(key: Option[String], value: Option[Value])
  • ideally generated class
package root.user_dim
// using full path
case class User_Properties(key: Option[String], value: Option[root.user_dim.user_properties.Value])
  • given example.avsc:
{
  "type" : "record",
  "name" : "Root",
  "fields" : [ {
        "name" : "user_properties",
        "type" : {
          "type" : "array",
          "items" : {
            "type" : "record",
            "name" : "User_Properties",
            "namespace" : "root.user_dim",
            "fields" : [ {
              "name" : "value",
              "type" : [ {
                "type" : "record",
                "name" : "Value",
                "namespace" : "root.user_dim.user_properties",
                "fields" : [ {
                  "name" : "value",
                  "type" : [ {
                    "type" : "record",
                    "name" : "Value",
                    "namespace" : "root.user_dim.user_properties.value",
                    "fields" : [ {
                      "name" : "string_value",
                      "type" : [ "string", "null" ]
                    }]
                  }, "null" ]
                }]
              }, "null" ]
            } ]
          }
        }
      } ]
}

Classes are not entirely overwritten

It seems that generated class files are not deleted and rewritten; instead, their content is overwritten in place. So if a new class definition is shorter than the previous one, you can still see leftovers from the previous class, and you get compile errors.

Expected a single top-level record, found a union of more than one type

Having a union of schemas at the top level seems not to be supported. The example below is arbitrary and could be worked around, but it's a recurring pattern.

[
  {
    "namespace": "bom.aaa.types",
    "type": "record",
    "name": "EmailFactor",
    "fields": [
      { "name": "email", "type": "string" }
    ]
  },
  {
    "namespace": "bom.aaa.types",
    "type": "record",
    "name": "PhoneFactor",
    "fields": [
      { "name": "phone", "type": "string" }
    ]
  },
  {
    "namespace": "bom.aaa.types",
    "type": "record",
    "name": "TwoFactorAuthentication",
    "fields": [
      { "name": "factor", "type": ["bom.aaa.types.EmailFactor", "bom.aaa.types.PhoneFactor"] },
      { "name": "tokenId", "type": "string" }
    ]
  }
]

In this case, the following error results:
fpatton:aaa fpatton$ sbt avro:generate
[info] Loading global plugins from /Users/fpatton/.sbt/0.13/plugins
[info] Loading project definition from /Users/fpatton/src/arena/git/aaa/project
[info] Set current project to aaa (in build file:/Users/fpatton/src/arena/git/aaa/)
[info] Compiling AVSC /Users/fpatton/src/arena/git/aaa/src/main/avro/bom/aaa/types/GeneralContext.avsc
[info] Compiling AVSC /Users/fpatton/src/arena/git/aaa/src/main/avro/bom/aaa/types/UserLogEntity.avsc
[info] Compiling AVSC /Users/fpatton/src/arena/git/aaa/src/main/avro/bom/aaa/types/User.avsc
[info] Compiling AVSC /Users/fpatton/src/arena/git/aaa/src/main/avro/bom/aaa/types/TwoFactorAuthentication.avsc
java.lang.RuntimeException: Expected a single top-level record, found a union of more than one type: List({"type":"record","name":"EmailFactor","namespace":"bom.aaa.types","fields":[{"name":"email","type":"string"}]}, {"type":"record","name":"PhoneFactor","namespace":"bom.aaa.types","fields":[{"name":"phone","type":"string"}]}, {"type":"record","name":"TwoFactorAuthentication","namespace":"bom.aaa.types","fields":[{"name":"factor","type":[{"type":"record","name":"EmailFactor","fields":[{"name":"email","type":"string"}]},{"type":"record","name":"PhoneFactor","fields":[{"name":"phone","type":"string"}]}]},{"name":"tokenId","type":"string"}]})
at scala.sys.package$.error(package.scala:27)
at avrohugger.input.parsers.FileInputParser$$anonfun$getSchemas$1.apply(FileInputParser.scala:37)
at avrohugger.input.parsers.FileInputParser$$anonfun$getSchemas$1.apply(FileInputParser.scala:32)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at avrohugger.input.parsers.FileInputParser.getSchemas(FileInputParser.scala:32)
at avrohugger.Generator.fileToFile(Generator.scala:53)
at sbtavrohugger.FileWriter$$anonfun$generateCaseClasses$2.apply(FileWriter.scala:24)
at sbtavrohugger.FileWriter$$anonfun$generateCaseClasses$2.apply(FileWriter.scala:22)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:60)
at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
at sbtavrohugger.FileWriter$.generateCaseClasses(FileWriter.scala:22)
at sbtavrohugger.formats.standard.GeneratorTask$$anonfun$caseClassGeneratorTask$1$$anonfun$1.apply(GeneratorTask.scala:36)
at sbtavrohugger.formats.standard.GeneratorTask$$anonfun$caseClassGeneratorTask$1$$anonfun$1.apply(GeneratorTask.scala:34)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:249)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:245)
at sbt.Difference.apply(Tracked.scala:224)
at sbt.Difference.apply(Tracked.scala:206)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:245)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:244)
at sbt.Difference.apply(Tracked.scala:224)
at sbt.Difference.apply(Tracked.scala:200)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:244)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:242)
at sbtavrohugger.formats.standard.GeneratorTask$$anonfun$caseClassGeneratorTask$1.apply(GeneratorTask.scala:38)
at sbtavrohugger.formats.standard.GeneratorTask$$anonfun$caseClassGeneratorTask$1.apply(GeneratorTask.scala:31)
at scala.Function6$$anonfun$tupled$1.apply(Function6.scala:35)
at scala.Function6$$anonfun$tupled$1.apply(Function6.scala:34)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
error Expected a single top-level record, found a union of more than one type: List({"type":"record","name":"EmailFactor","namespace":"bom.aaa.types","fields":[{"name":"email","type":"string"}]}, {"type":"record","name":"PhoneFactor","namespace":"bom.aaa.types","fields":[{"name":"phone","type":"string"}]}, {"type":"record","name":"TwoFactorAuthentication","namespace":"bom.aaa.types","fields":[{"name":"factor","type":[{"type":"record","name":"EmailFactor","fields":[{"name":"email","type":"string"}]},{"type":"record","name":"PhoneFactor","fields":[{"name":"phone","type":"string"}]}]},{"name":"tokenId","type":"string"}]})
[error] Total time: 0 s, completed Nov 17, 2015 4:31:33 PM

Generating generic model classes results in StackOverflow error

I will investigate this a bit more later on, just pasting the stack trace for now.

This happens after generating the records. The project mainly contains records and some helper classes. It is the Scala.js part of a shared model project; the JVM part (using SpecificRecord) compiles just fine.

[info] Compiling 89 Scala sources to /Users/.../Workspace/.../model/.js/target/scala-2.11/classes...
java.lang.StackOverflowError
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at scala.tools.nsc.typechecker.Typers$Typer.silent(Typers.scala:680)
    at scala.tools.nsc.typechecker.Typers$Typer.normalTypedApply$1(Typers.scala:4523)
    at scala.tools.nsc.typechecker.Typers$Typer.typedApply$1(Typers.scala:4579)
        ...

datetime support

Hi,
Do you intend to support logical types such as time-millis?
If not, where would be the right place inside your code to add this support?
Thanks!

Default maps are broken

Looks like I missed a test case... This record will reproduce:

record DefaultMap {
  map<string> test = {"Hello" : "world", "Merry" : "Christmas"};
}

The problematic code is in DefaultValueMatcher.scala:

    case Schema.Type.MAP => {
      val kvps = LIST(node.getFields.toList.map(e => LIT(e.getKey) ANY_-> fromJsonNode(e.getValue, schema.getValueType)))
      MAKE_MAP(kvps)
    }

Submitting a PR in a minute...

Bubble up error messages

Currently there's no way of knowing what the problem with your schema is; you just don't see any generated classes. So it'd be nice to see Avro exceptions in the sbt log output.

Support for Unions with more than two items

Consider the following schema

 {
         "name":"transformer",
         "type":[
             "null",
             {
                "name":"Transformer",
                "type":"record",
                "fields":[
                    {
                        "name":"builtIn",
                        "type": { "name": "BuiltIn", "type": "enum", "symbols": ["SNOWPLOW_TO_NESTED_JSON"] }
                    }
                ]
             },
             {
                 "name":"JavascriptTransformer",
                 "type":"record",
                 "fields":[
                     {
                         "name":"javascript",
                         "type": "string"
                     }
                 ]
              }
         ]
      }

This throws the runtime exception "unions not yet supported beyond nullable fields"
from here.

The union in the above schema does have a null, but it has more than two branches. However, the Avro spec does permit multiple named records in a union.

See the discussion here: snowplow/kinesis-tee#9

Please advise.

Giving a field the name "field" causes generated specific record get/set variable shadowing and return of the field index instead of field value

"field" is a valid name for a record field but it is shadowed by the argument 'field:Int' for getting/setting by field id. Because of the cast to AnyRef the compiler doesn't catch this and you get a ClassCastException of integer to char sequence.

Generated method parameter should be renamed to something illegal in avro or something like "____field".

Example:
avro:

...
"fields": [
  {
    "name": "field",
    "type": "string"
  }
]
...

generated specific case class:

case class MyCaseClass(var field: String, ...
  def get(field: Int): AnyRef = {
    field match {
      case pos if pos == 0 => {
        field // returning the method's argument cast to AnyRef. Thankfully this was a String in the record definition and not an Int - imagine the subtle bugs on that one!
      }.asInstanceOf[AnyRef]
      ...

Location of problem:
DEF("get", AnyRefClass) withParams(PARAM("field", IntClass)) on line 28 of GetGenerator

Output avro enum as Scala String

Spark does not have a native representation of enums. This makes using avrohugger with Spark very difficult when the avro type has enums in it. The easiest way around this is to add the ability for avrohugger to read in enum types from the avro schema and generate these types as String. While I recognize this is not an accurate mapping of the supplied type, this will make loading typed data into Spark a LOT easier.

Generate Case-classes for Avro Specific API

  • what's the status of this? did you get any closer, have any hints or met some obstacles?
  • If that was relatively simple, I may try to implement the extending of SpecificRecordBase - our current use-case would be reading/writing Parquet-Avro in Scalding (it works fine generating the quite nasty Java-like SpecificRecord classes, but case-classes would be much nicer)

Use JavaConverters instead of JavaConversions

Starting from Scala 2.12.0, classes generated by avrohugger produce a warning (or an error if -Xfatal-warnings is enabled, which is not unusual):

object JavaConversions in package collection is deprecated (since 2.12.0): use JavaConverters
[error]             scala.collection.JavaConversions.mapAsScalaMap(map).toMap map { kvp =>

I can send a PR sometime this week
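For reference, a sketch of the JavaConverters equivalent of the deprecated call (map here is a stand-in java.util.Map):

import scala.collection.JavaConverters._

val map = new java.util.HashMap[String, String]()
// Before: scala.collection.JavaConversions.mapAsScalaMap(map).toMap
val scalaMap: Map[String, String] = map.asScala.toMap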

Generated SpecificRecord classes with Lists fail when reading using Avro's SpecificDatumReader

I have an Avro schema that includes a list field. The SpecificRecord code that avrohugger generates includes a get implementation as follows:

  def get(field: Int): AnyRef = {
    field match {
      case pos if pos == 0 => {
        java.util.Arrays.asList(({
          tests map { x =>
            x
          }
        }: _*))
      }.asInstanceOf[AnyRef]
      case _ => new org.apache.avro.AvroRuntimeException("Bad index")
    }
  }

org.apache.avro.specific.SpecificDatumReader explicitly attempts to re-use the specific record objects when reading subsequent records, and for List fields it attempts to call .clear() on the list it read on the previous object. The list returned by java.util.Arrays.asList does not support .clear, as shown by the following stack trace of a spark job:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.UnsupportedOperationException
    at java.util.AbstractList.remove(AbstractList.java:161)
    at java.util.AbstractList$Itr.remove(AbstractList.java:374)
    at java.util.AbstractList.removeRange(AbstractList.java:571)
    at java.util.AbstractList.clear(AbstractList.java:234)
    at org.apache.avro.generic.GenericDatumReader.newArray(GenericDatumReader.java:330)
    at org.apache.avro.reflect.ReflectDatumReader.newArray(ReflectDatumReader.java:77)
    at org.apache.avro.reflect.ReflectDatumReader.readArray(ReflectDatumReader.java:119)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:230)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:230)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:66)
    at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:32)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1108)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)

It looks like the fix is for the generated code to create a new ArrayList rather than use Arrays.asList; I was hoping to submit a PR, but I couldn't figure out how to make this change. I'm happy to create a standalone project that demonstrates this issue if that would be helpful.

I'm using java 1.7, avro 1.7.7 and sbt-avrohugger 0.6.1.
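A sketch of the suggested fix: build a mutable ArrayList instead of the fixed-size view returned by Arrays.asList (the helper name is hypothetical):

// java.util.Arrays.asList returns a fixed-size list whose clear() throws
// UnsupportedOperationException; an ArrayList supports clear() for re-use.
def toMutableJavaList[T](elems: Seq[T]): java.util.List[T] = {
  val javaList = new java.util.ArrayList[T](elems.size)
  elems.foreach(javaList.add)
  javaList
}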

Polish overrides

  1. Consider adding javaSource so Java files (e.g. Specific enums) can get custom output directories that make sense.

  2. Override settings may leave a permanent output file in the custom scalaSource that survives clean, since it hides in the custom, non-target output dir created during the test.
