Retries (kafka-net issue, open)

jroland commented on June 9, 2024
Retries

from kafka-net.

Comments (9)

Jroland commented on June 9, 2024

I agree, Timothy. This is an area of the driver that has zero tests and is not well fleshed out. There is a little TODO comment in the BrokerRouter class which basically says this. This is the next bit that I am working on. I appreciate the comment; let me know if you have any other thoughts on areas to concentrate on.

Jroland commented on June 9, 2024

Here is the relevant documentation which describes this:
The client does not need to keep polling to see if the cluster has changed; it can fetch metadata once when it is instantiated and cache that metadata until it receives an error indicating that the metadata is out of date. This error can come in two forms: (1) a socket error indicating the client cannot communicate with a particular broker, (2) an error code in the response to a request indicating that this broker no longer hosts the partition for which data was requested.
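
As a rough illustration of the caching rule quoted above, here is a minimal Python sketch (not kafka-net code). The `MetadataCache` class, its method names, and the `fetch` callable are hypothetical, and treating error code 6 (NotLeaderForPartition) as the "no longer hosts the partition" signal is an assumption based on the error codes listed later in this thread.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed to be the "broker no longer hosts the partition" code (NotLeaderForPartition).
NOT_LEADER_FOR_PARTITION = 6

TopicPartition = Tuple[str, int]


class MetadataCache:
    """Caches partition leaders and refetches only after an invalidating error."""

    def __init__(self, fetch: Callable[[], Dict[TopicPartition, int]]):
        self._fetch = fetch  # hypothetical callable that asks a broker for metadata
        self._leaders: Optional[Dict[TopicPartition, int]] = None

    def leader_for(self, topic: str, partition: int) -> int:
        if self._leaders is None:  # first use, or the cache was invalidated
            self._leaders = self._fetch()
        return self._leaders[(topic, partition)]

    def on_socket_error(self) -> None:
        # Case (1): a broker is unreachable, so the cluster layout may have changed.
        self._leaders = None

    def on_response_error(self, error_code: int) -> None:
        # Case (2): the broker reports that it no longer hosts the partition.
        if error_code == NOT_LEADER_FOR_PARTITION:
            self._leaders = None
```

The cache never polls: `fetch` runs on first use and again only after one of the two invalidating events.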

Jroland commented on June 9, 2024

Also the documentation states:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataResponse

Leader
The node id for the kafka broker currently acting as leader for this partition. If no leader exists because we are in the middle of a leader election this id will be -1.

We need to handle retries for all of the cases listed here.

Jroland commented on June 9, 2024

Relevant Error Codes
LeaderNotAvailable
5
This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes.

NotLeaderForPartition
6
This error is thrown if the client attempts to send messages to a replica that is not the leader for some partition. It indicates that the client's metadata is out of date.

And possibly
ConsumerCoordinatorNotAvailableCode
15
The broker returns this error code for consumer metadata requests or offset commit requests if the offsets topic has not yet been created.
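
One way to carry these codes around in client code is a small enum plus a lookup helper. The Python sketch below is illustrative only: the `KafkaError` and `is_retriable` names are hypothetical, and grouping all three codes as retriable is an assumption drawn from this discussion, not something the protocol guide prescribes.

```python
from enum import IntEnum


class KafkaError(IntEnum):
    """Subset of Kafka protocol error codes quoted in this thread."""
    NO_ERROR = 0
    LEADER_NOT_AVAILABLE = 5
    NOT_LEADER_FOR_PARTITION = 6
    CONSUMER_COORDINATOR_NOT_AVAILABLE = 15


# Codes this sketch treats as "wait and try again"; the grouping is an assumption.
RETRIABLE = frozenset({
    KafkaError.LEADER_NOT_AVAILABLE,
    KafkaError.NOT_LEADER_FOR_PARTITION,
    KafkaError.CONSUMER_COORDINATOR_NOT_AVAILABLE,
})


def is_retriable(code: int) -> bool:
    """Return True when the raw error code is one of the retriable codes above."""
    return code in RETRIABLE
```

Because `IntEnum` members compare equal to plain integers, `is_retriable` can be called directly with the raw code read from a response.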

tnachen commented on June 9, 2024

Yes, you need to handle soft failures like LeaderNotAvailable, etc. There are hard failures too, such as MessageTooLarge or UnknownTopicAndPartition, that a retry cannot recover from at all, so clients should only retry on soft failures. Currently we also do exponential backoff on retries, especially on timeouts, to avoid hammering the brokers and causing slow performance.
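
The soft/hard split with exponential backoff could look roughly like the Python sketch below. It is not taken from any existing client: `send_with_retry`, `PermanentFailure`, the `send` callable, and the default attempt count and delay are all hypothetical, and the set of soft failures is limited to the two codes quoted earlier in this thread.

```python
import time
from typing import Callable

NO_ERROR = 0
SOFT_FAILURES = {5, 6}  # LeaderNotAvailable, NotLeaderForPartition


class PermanentFailure(Exception):
    """Raised for hard failures and for soft failures that exhaust their retries."""


def send_with_retry(
    send: Callable[[], int],  # hypothetical: performs the request, returns an error code
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry soft failures with exponential backoff; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        code = send()
        if code == NO_ERROR:
            return attempt
        if code not in SOFT_FAILURES:
            # Hard failure: retrying cannot help, so give up immediately.
            raise PermanentFailure(f"hard failure, error code {code}")
        if attempt < max_attempts:
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise PermanentFailure(f"still failing after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; a production version would probably also cap the delay and add jitter.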

vchekan commented on June 9, 2024

Just a note. "LeaderNotAvailable" is returned for auto-created topics.
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L664
So it is a bug in kafka-net to accept all responses as successful ones:
https://github.com/Jroland/kafka-net/blob/master/src/kafka-net/BrokerRouter.cs#L251
Only those with error=0 should be accepted:
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/common/ErrorMapping.scala#L32
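
A hedged Python sketch of the check being asked for: keep only partition entries whose error code is 0, and additionally skip entries whose leader is -1 (the "election in progress" value quoted earlier in this thread). `PartitionMetadata` and `split_usable` are hypothetical, simplified stand-ins and do not mirror kafka-net's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

NO_ERROR = 0
NO_LEADER = -1  # leader id reported while a leader election is in progress


@dataclass(frozen=True)
class PartitionMetadata:
    """Hypothetical, simplified view of one partition entry in a metadata response."""
    topic: str
    partition: int
    error_code: int
    leader: int


def split_usable(
    entries: Iterable[PartitionMetadata],
) -> Tuple[Dict[Tuple[str, int], int], List[PartitionMetadata]]:
    """Return (leaders of usable partitions, entries that must not be routed to)."""
    leaders: Dict[Tuple[str, int], int] = {}
    rejected: List[PartitionMetadata] = []
    for entry in entries:
        if entry.error_code == NO_ERROR and entry.leader != NO_LEADER:
            leaders[(entry.topic, entry.partition)] = entry.leader
        else:
            rejected.append(entry)
    return leaders, rejected
```

The rejected list gives the caller something concrete to retry against or report, rather than silently caching a bad route.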

vchekan commented on June 9, 2024

Hi @Jroland,
Have you started working on this issue? If not yet, I'd like to give it a try.

Here are the requirements I'd like to satisfy.

Requirements:

  1. The driver should distinguish intermittent errors from permanent ones.
  2. The driver should attempt to recover from intermittent errors.
  3. Recovery should be limited by a timeout.

API:

  4. The client should be notified when an intermittent error happens.
  5. The client should be notified when a permanent error happens.
  6. The client should be notified when an intermittent error times out and becomes a permanent one.
  7. When an error notification happens, the client should be provided with the messages that failed.
  8. Upon shutdown, a flush of the internal buffer should be attempted. Flush time should be limited by a timeout.
  9. After the flush attempt, if there are messages left in the buffer, the client should be notified and given the list of those messages.

What do you think?
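
To make the proposed API a little more concrete, here is a minimal Python sketch of requirements 4, 6, 8 and 9. Everything in it is hypothetical (the `BufferedProducer` class, the `send` and `notify` callables, the event names), and it is not a design anyone in this thread has agreed on. Permanent errors (requirement 5) are left out, and a real driver would back off between attempts instead of retrying in a tight loop.

```python
import time
from typing import Callable, List

Handler = Callable[[str, List[bytes]], None]  # (event name, affected messages)


class BufferedProducer:
    """Hypothetical producer that reports failed messages through one callback."""

    def __init__(self, send: Callable[[bytes], bool], notify: Handler,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send      # hypothetical transport: True on success, False on a soft failure
        self._notify = notify
        self._clock = clock
        self._buffer: List[bytes] = []

    def enqueue(self, message: bytes) -> None:
        self._buffer.append(message)

    def flush(self, timeout: float) -> None:
        """Try to drain the buffer; stop retrying once `timeout` seconds have elapsed."""
        deadline = self._clock() + timeout
        while self._buffer and self._clock() < deadline:
            message = self._buffer[0]
            if self._send(message):
                self._buffer.pop(0)
            else:
                self._notify("intermittent", [message])  # requirement 4

    def close(self, timeout: float) -> None:
        """Flush once on shutdown and hand back whatever could not be delivered."""
        self.flush(timeout)  # requirement 8
        if self._buffer:
            leftover, self._buffer = self._buffer, []
            self._notify("timed_out", leftover)  # requirements 6 and 9
```

Routing every notification through a single callback that always carries the affected messages is one way to satisfy requirement 7 without a separate lookup step.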

tnachen commented on June 9, 2024

Also, one thing to consider is that some failures (probably most of the soft failures) imply that a metadata update is required.

Jroland commented on June 9, 2024

Hey @vchekan, the only thing I have started to do with this issue is bubble up the disconnection event from the kafka socket class so the BrokerRouter is aware that we have lost connection to a broker. Otherwise I do not have any of the logic implemented for gracefully handling metadata changes.

So go ahead and start putting something together for handling this. It is by far the most urgent issue for the driver at the moment. My advice on this one is to start small and implement it in testable sections; otherwise it might be hard to pull into the main branch.
