
Comments (8)

pomadchin commented on July 2, 2024

Hey @vinkaga, I will find time to try something out later this week. However, for a more general solution, I think it is possible to do a semiautomatic derivation of TypedEncoders for POJO classes via https://github.com/limansky/beanpuree

from frameless.

pomadchin commented on July 2, 2024

Hey @vinkaga! You'd need to convert each Row into a MyTopPojo yourself. This is not the best way to do it, but you can use Spark machinery to handle it:

import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import frameless.{TypedDataset, TypedExpressionEncoder}

// I haven't tested this code, though
// can be derived via the Spark built-in machinery as well
val encoder: ExpressionEncoder[MyTopPojo] = TypedExpressionEncoder[MyTopPojo].asInstanceOf[ExpressionEncoder[MyTopPojo]]

// be careful with these functions though, they are pretty expensive
// serializer Row => InternalRow
lazy val toInternalRow = RowEncoder(encoder.schema).createSerializer()
// deserializer InternalRow => MyTopPojo
lazy val fromInternalRow = encoder.createDeserializer()

val dsr: Dataset[MyTopPojo] =
  session.read.format("avro").load("../00000.avro").map(row => fromInternalRow(toInternalRow(row).copy()))
val typed: TypedDataset[MyTopPojo] = TypedDataset.create(dsr)


vinkaga commented on July 2, 2024

Here's what I tried:

val encoder: ExpressionEncoder[MyTopPojo] = ExpressionEncoder[MyTopPojo]
lazy val serializer = RowEncoder(encoder.schema).createSerializer()
lazy val deserializer = encoder.createDeserializer()
val dsr: Dataset[MyTopPojo] = spark.read.format("avro")
    .load("s3://...")
    .map(row => deserializer(serializer(row).copy()))
val typed: TypedDataset[MyTopPojo] = TypedDataset.create(dsr)

I got this error on the map line:

No implicit arguments of type: Encoder[MyTopPojo]

And this one on the last line:

No implicit arguments of type: TypedEncoder[MyTopPojo]


pomadchin commented on July 2, 2024

@vinkaga oh, that is a POJO, not a case class; I misread. You would need to manually define the TypedEncoder for your type.

I'm using this helper to deal with it (actually, maybe it is worth moving it into frameless).

Usage example:

public class MyTopPojo {

    public String name;
    public int age;

    public MyTopPojo(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

implicit val myTopPojoTypedEncoder: TypedEncoder[MyTopPojo] =
  ManualTypedEncoder.newInstance[MyTopPojo](
    fields = List(
      RecordEncoderField(0, "name", TypedEncoder[String]),
      RecordEncoderField(1, "age", TypedEncoder[Int])
    )
  )
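
A side note on the POJO shape above, relevant to the bean-based derivation mentioned earlier in this thread (beanpuree) and to Spark's own `Encoders.bean`: these approaches generally rely on JavaBean conventions, meaning a no-argument constructor plus getter/setter pairs, whereas the class above exposes public fields and only a two-argument constructor. Below is a minimal sketch of a bean-shaped equivalent. `MyTopPojoBean` is a hypothetical name used for illustration, not something from the thread, and whether a class like this is enough for a given derivation library is an assumption to verify.

```java
import java.io.Serializable;

// Hypothetical bean-shaped counterpart of MyTopPojo (name is illustrative only).
public class MyTopPojoBean implements Serializable {

    private String name;
    private int age;

    // No-argument constructor, so the class can be instantiated reflectively.
    public MyTopPojoBean() {}

    // Getter/setter pairs expose "name" and "age" as bean properties.
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

With a class of this shape, `Encoders.bean(MyTopPojoBean.class)` should yield a plain Spark `Encoder`; that is separate from the frameless `TypedEncoder` discussed here, which still has to be provided or derived.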


vinkaga commented on July 2, 2024

@pomadchin, MyTopPojo is a hierarchical POJO several levels deep. Is it possible to have the encoder built without extensive manual work?


pomadchin commented on July 2, 2024

@vinkaga I actually noticed that in your example the encoder is not declared implicit. Try:

implicit val encoder: ExpressionEncoder[MyTopPojo] = ExpressionEncoder[MyTopPojo]
lazy val serializer = RowEncoder(encoder.schema).createSerializer()
lazy val deserializer = encoder.createDeserializer()
val dsr: Dataset[MyTopPojo] = spark.read.format("avro")
    .load("s3://...")
    .map(row => deserializer(serializer(row).copy()))


vinkaga commented on July 2, 2024

@pomadchin, with that change, I am still getting the errors. Not sure how to proceed further.


cchantep commented on July 2, 2024

I think this can be closed for now.

