
Comments (5)

cerveada commented on August 15, 2024

Abris detects the writer schema of each record and converts the values to the newer schema version you specify (the reader schema), for example by using the default value for a field that was added in the newer schema. You define the reader schema when you call Abris.


talperetz1 commented on August 15, 2024

Hi @cerveada, thanks for the response.
I didn't quite understand how it works.
I set my Abris config, and as I understand it, that happens only once, when the query is first started:

val abrisConfigBase = AbrisConfig
  .fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andRecordNameStrategy(schemaName, "")
  .usingSchemaRegistry(schemaRegistryUrl)

Now let's say that when I started the query and set the abrisConfig, the latest version was 1 (with two columns, x and y),
and then I read from Kafka:
spark.readStream.......

After a while my schema evolved, and it now has a new optional column (x, y, z).
So when I read the next batch, I have some messages with the new version (x, y, z) and some messages with the older version (x, y) that were still in Kafka and not yet consumed.
Now, when I use Abris to create a DataFrame, I use my Abris config, which was created when the latest version was 1 (x, y), so the new messages that contain column z will be missing this column. That's OK; I understand that Spark must be restarted for the newer schema to be downloaded.
But my question is: when I read from Kafka and receive the value column in Avro binary format, how can I know that I received some messages with schema version A and some with schema version B?


cerveada commented on August 15, 2024

Well, Abris detects the writer schema ID here:

There is no mechanism to give this information to the user. Abris just tries to convert the data to the reader schema, and the conversion succeeds if the two schemas are compatible.

Some ideas:

  • You can read the id yourself and detect the changes.

  • There is also an error handler in Abris that will allow you to react to failed records, but that may be too late for you.
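
As a rough illustration of the first idea, here is a minimal sketch in plain Scala of reading the writer schema ID from the raw Kafka value. It assumes the Confluent wire format (one magic byte `0x00`, then a 4-byte big-endian schema ID, then the Avro payload). The names `SchemaIdSketch`, `writerSchemaId`, and `distinctSchemaIds` are made up for this example and are not part of Abris. Note that this ID is the Schema Registry's schema ID, which is not necessarily the same number as the subject version.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of Abris.
object SchemaIdSketch {
  // Assumed Confluent framing: magic byte 0x00 + 4-byte schema ID + Avro payload.
  private val MagicByte: Byte = 0x0
  private val HeaderSize = 5

  /** Returns the writer schema ID, or None if the bytes do not look Confluent-framed. */
  def writerSchemaId(value: Array[Byte]): Option[Int] =
    if (value == null || value.length < HeaderSize || value(0) != MagicByte) None
    else Some(ByteBuffer.wrap(value, 1, 4).getInt) // ByteBuffer reads big-endian by default

  /** Distinct writer schema IDs seen in a batch of raw message values. */
  def distinctSchemaIds(values: Seq[Array[Byte]]): Set[Int] =
    values.flatMap(writerSchemaId).toSet
}
```

In a Spark job, a function like `writerSchemaId` could probably be wrapped in a UDF and applied to the binary `value` column before the Abris conversion, so that each row carries its writer schema ID alongside the decoded data.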


talperetz1 commented on August 15, 2024

Sorry, but what exactly is this buffer?


cerveada commented on August 15, 2024

Look into the linked source code to see more.

The schema ID is part of the Confluent Avro format:
https://github.com/AbsaOSS/ABRiS#confluent-avro-format

