Comments (8)
Hey @vinkaga, I will find time to try something out later this week. However, for a more general solution, I think it is possible to do a semi-automatic derivation of TypedEncoders for POJO classes via https://github.com/limansky/beanpuree
from frameless.
Hey @vinkaga! You'd need to convert Row into MyTopPojo yourself. This is not the best way to do it, but you can use Spark machinery to handle it:
// haven't tested this code though
// the encoder can be derived via Spark's built-in machinery as well
val encoder: ExpressionEncoder[MyTopPojo] = TypedExpressionEncoder[MyTopPojo].asInstanceOf[ExpressionEncoder[MyTopPojo]]
// be careful with these functions though, they are pretty expensive
// serializer: Row => InternalRow
lazy val serializer = RowEncoder(encoder.schema).createSerializer()
// deserializer: InternalRow => MyTopPojo
lazy val deserializer = encoder.createDeserializer()
val dsr: Dataset[MyTopPojo] =
  session.read.format("avro").load("../00000.avro").map(row => deserializer(serializer(row).copy()))
val typed: TypedDataset[MyTopPojo] = TypedDataset.create(dsr)
from frameless.
Here's what I tried:
val encoder: ExpressionEncoder[MyTopPojo] = ExpressionEncoder[MyTopPojo]
lazy val serializer = RowEncoder(encoder.schema).createSerializer()
lazy val deserializer = encoder.createDeserializer()
val dsr: Dataset[MyTopPojo] = spark.read.format("avro")
  .load("s3://...")
  .map(row => deserializer(serializer(row).copy()))
val typed: TypedDataset[MyTopPojo] = TypedDataset.create(dsr)
I got an error on the map line:
No implicit arguments of type: Encoder[MyTopPojo]
And on the last line:
No implicit arguments of type: TypedEncoder[MyTopPojo]
from frameless.
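Both messages are implicit-resolution failures: Dataset.map asks for an implicit Encoder of the result type, and TypedDataset.create asks for an implicit TypedEncoder. A minimal sketch of the same mechanism follows, using a stand-in type class rather than Spark itself; Enc, Demo, Pojo and needsEnc are hypothetical names invented for illustration only.

```scala
// Stand-in for Spark's Encoder type class; hypothetical, for illustration only.
trait Enc[A] { def describe: String }

object Demo {
  // Stand-in for a method such as Dataset.map that demands an implicit instance.
  def needsEnc[A](value: A)(implicit enc: Enc[A]): String = enc.describe

  // Plain class, not a case class, so nothing is derived for it automatically.
  final class Pojo(val name: String)

  // Without this implicit val, needsEnc(new Pojo("x")) does not compile:
  // the compiler reports that no implicit Enc[Pojo] is available.
  implicit val pojoEnc: Enc[Pojo] = new Enc[Pojo] { def describe = "Enc[Pojo]" }

  // Resolves pojoEnc from the enclosing scope.
  val result: String = needsEnc(new Pojo("x"))
}
```

The value has to be both in scope and marked implicit; a plain val of the right type is not picked up.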
@vinkaga oh, that is a POJO, not a case class; I misread it. You would need to manually define the TypedEncoder for your type.
I'm using this helper to deal with it (actually, maybe it is worth moving it into frameless).
Usage example:
public class MyTopPojo {
    public String name;
    public int age;
    public MyTopPojo(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
implicit val myTopPojoTypedEncoder: TypedEncoder[MyTopPojo] =
  ManualTypedEncoder.newInstance[MyTopPojo](
    fields = List(
      RecordEncoderField(0, "name", TypedEncoder[String]),
      RecordEncoderField(1, "age", TypedEncoder[Int])
    )
  )
from frameless.
@pomadchin, MyTopPojo is a hierarchical POJO several levels deep. Is it possible to have the encoder built without extensive manual work?
from frameless.
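One option on the plain Spark side (not frameless) is Encoders.bean, which builds an Encoder by reflection and follows nested beans. The sketch below assumes the classes follow JavaBean conventions (public no-arg constructor, getters and setters), which the MyTopPojo shown above does not as written. Address, Person and BeanEncoderSketch are hypothetical names, and the result is an ordinary Dataset; this does not produce the TypedEncoder that TypedDataset.create requires.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import scala.beans.BeanProperty

// Hypothetical nested beans; @BeanProperty generates the getter and setter
// that Encoders.bean looks for on each var.
class Address extends Serializable {
  @BeanProperty var city: String = _
}

class Person extends Serializable {
  @BeanProperty var name: String = _
  @BeanProperty var age: Int = 0
  @BeanProperty var address: Address = _
}

object BeanEncoderSketch {
  // Spark derives the schema, including the nested Address struct, by reflection.
  implicit val personEncoder: Encoder[Person] = Encoders.bean(classOf[Person])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("bean-encoder-sketch")
      .getOrCreate()

    val address = new Address
    address.setCity("Oslo")
    val person = new Person
    person.setName("Ann")
    person.setAge(30)
    person.setAddress(address)

    // createDataset picks up personEncoder implicitly.
    val ds: Dataset[Person] = spark.createDataset(Seq(person))
    ds.printSchema()
    spark.stop()
  }
}
```

With such an encoder in implicit scope, a DataFrame read from Avro could be converted with .as[Person], provided the column names and types line up with the bean properties.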
@vinkaga I actually noticed that in your example the encoder is not declared implicit, so map cannot find it. Try:
implicit val encoder: ExpressionEncoder[MyTopPojo] = ExpressionEncoder[MyTopPojo]
lazy val serializer = RowEncoder(encoder.schema).createSerializer()
lazy val deserializer = encoder.createDeserializer()
val dsr: Dataset[MyTopPojo] = spark.read.format("avro")
  .load("s3://...")
  .map(row => deserializer(serializer(row).copy()))
from frameless.
@pomadchin, with that, I am still getting the errors. Not sure how to proceed further.
from frameless.
I think this can be closed for now.
from frameless.