Comments (9)
I agree, Timothy. This is an area of the driver that has zero tests and is not well fleshed out. There is a small TODO comment in the BrokerRouter class that says basically this. It is the next piece I am working on. I appreciate the comment; let me know if you have any other thoughts on areas to concentrate on.
from kafka-net.
Here is the relevant documentation which describes this:
The client does not need to keep polling to see if the cluster has changed; it can fetch metadata once when it is instantiated and cache that metadata until it receives an error indicating that the metadata is out of date. This error can come in two forms: (1) a socket error indicating the client cannot communicate with a particular broker, (2) an error code in the response to a request indicating that this broker no longer hosts the partition for which data was requested.
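As a rough illustration of the caching behaviour described in that passage, here is a minimal Python sketch. It is not kafka-net code: `MetadataCache`, `fetch_metadata`, and the shape of the returned mapping are hypothetical names chosen for the example. Metadata is fetched lazily, reused on every lookup, and fetched again only after the caller marks it stale.

```python
class MetadataCache:
    """Caches a (topic, partition) -> leader node id mapping until invalidated."""

    def __init__(self, fetch_metadata):
        # fetch_metadata is a hypothetical callable returning
        # {(topic, partition): leader_node_id}.
        self._fetch = fetch_metadata
        self._leaders = None

    def leader_for(self, topic, partition):
        # Fetch only when there is no cached copy.
        if self._leaders is None:
            self._leaders = self._fetch()
        # -1 mirrors the protocol's "no leader" value for unknown partitions.
        return self._leaders.get((topic, partition), -1)

    def invalidate(self):
        # Call after a socket error or a "not leader" style error code.
        self._leaders = None
```

A caller would invoke `invalidate()` when it sees either of the two error forms above, so that the next `leader_for` call triggers a fresh fetch.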
Also the documentation states:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataResponse
Leader
The node id for the kafka broker currently acting as leader for this partition. If no leader exists because we are in the middle of a leader election this id will be -1.
We need to handle retries for all of the cases listed here.
Relevant Error Codes
LeaderNotAvailable (5): This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes.
NotLeaderForPartition (6): This error is thrown if the client attempts to send messages to a replica that is not the leader for some partition. It indicates that the client's metadata is out of date.
And possibly:
ConsumerCoordinatorNotAvailableCode (15): The broker returns this error code for consumer metadata requests or offset commit requests if the offsets topic has not yet been created.
Yes, you need to handle soft failures like LeaderNotAvailable, etc. There are hard failures too, such as MessageTooLarge or UnknownTopicAndPartition, that a retry cannot recover from at all. So clients should only retry on soft failures. Currently we also do exponential backoff on retries, especially on timeouts, to avoid hammering the brokers and causing slow performance.
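The retry policy described above could be sketched roughly as follows, in Python. This is an illustration under assumptions, not kafka-net's implementation: `send_with_backoff`, `PermanentSendError`, and the `send` callable are hypothetical, only error codes 5 and 6 from this thread are treated as soft, and every other non-zero code is assumed to be hard.

```python
import time

NO_ERROR = 0
# Soft (retriable) error codes named in this thread.
SOFT_ERRORS = {5: "LeaderNotAvailable", 6: "NotLeaderForPartition"}


class PermanentSendError(Exception):
    """Raised on a hard error code or when retries are exhausted."""


def send_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() until it returns NO_ERROR, doubling the delay after each soft failure.

    send is a hypothetical callable that returns a Kafka error code.
    Returns the number of attempts used.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        code = send()
        if code == NO_ERROR:
            return attempt
        if code not in SOFT_ERRORS:
            # Assumption: anything not known to be soft is hard, so do not retry.
            raise PermanentSendError(f"hard error code {code}")
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2  # exponential backoff
    raise PermanentSendError(f"gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable without real delays. A production version would probably also cap the delay and add jitter.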
Just a note. "LeaderNotAvailable" is returned for auto-created topics.
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L664
So it is a bug for kafka-net to accept all responses as successful:
https://github.com/Jroland/kafka-net/blob/master/src/kafka-net/BrokerRouter.cs#L251
Only those with error=0 should be accepted:
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/common/ErrorMapping.scala#L32
Hi @Jroland,
Have you started working on this issue? If not yet, I'd like to give it a try.
Here are the requirements I'd like to satisfy.
Requirements:
1. The driver should distinguish intermittent errors from permanent ones.
2. The driver should attempt to recover from intermittent errors.
3. Recovery should be limited by a timeout.
API:
4. The client should be notified when an intermittent error happens.
5. The client should be notified when a permanent error happens.
6. The client should be notified when an intermittent error times out and becomes a permanent one.
7. When an error notification happens, the client should be provided with the messages that failed.
8. Upon shutdown, a flush of the internal buffer should be attempted. Flush time should be limited by a timeout.
9. After the flush attempt, if there are messages left in the buffer, the client should be notified and given the list of remaining messages.
What do you think?
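To make the notification points (4 to 7) concrete, here is one possible shape for them as a small Python sketch. All names (`FailureTracker`, the callbacks, `recovery_timeout`) are hypothetical and not part of kafka-net. It only shows how an intermittent error can be promoted to a permanent one once a recovery timeout has elapsed, with the failed messages handed to the client each time.

```python
import time
from typing import Optional


class FailureTracker:
    """Notifies the client of intermittent and permanent failures."""

    def __init__(self, recovery_timeout, on_intermittent, on_permanent,
                 clock=time.monotonic):
        self._timeout = recovery_timeout
        self._on_intermittent = on_intermittent  # called with the failed messages
        self._on_permanent = on_permanent        # called with the failed messages
        self._clock = clock
        self._first_failure: Optional[float] = None

    def soft_failure(self, messages):
        now = self._clock()
        if self._first_failure is None:
            self._first_failure = now
        if now - self._first_failure >= self._timeout:
            # The intermittent error outlived the timeout: now permanent.
            self._first_failure = None
            self._on_permanent(messages)
        else:
            self._on_intermittent(messages)

    def hard_failure(self, messages):
        self._first_failure = None
        self._on_permanent(messages)

    def success(self):
        # Recovery succeeded, so the timeout window starts over.
        self._first_failure = None
```

The shutdown flush in points 8 and 9 is not covered here. It could reuse the same permanent-failure callback to hand back whatever remains in the buffer.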
Also, one thing to consider is that some failures (probably most of the soft failures) imply that a metadata update is required.
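A small Python sketch of that check, assuming the two triggers quoted earlier in the thread (a socket error, or error code 5 or 6). The function name and the exact set of codes are illustrative assumptions rather than kafka-net behaviour.

```python
LEADER_NOT_AVAILABLE = 5
NOT_LEADER_FOR_PARTITION = 6
# Assumption: only these two codes signal stale metadata in this sketch.
STALE_METADATA_CODES = {LEADER_NOT_AVAILABLE, NOT_LEADER_FOR_PARTITION}


def needs_metadata_refresh(error_code=0, socket_error=False):
    """Return True when a failure suggests the cached metadata is out of date."""
    return socket_error or error_code in STALE_METADATA_CODES
```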
Hey @vchekan, the only thing I have started to do with this issue is bubble up the disconnection event from the kafka socket class so the BrokerRouter is aware that we have lost connection to a broker. Otherwise I do not have any of the logic implemented for gracefully handling metadata changes.
So go ahead and start putting something together for handling this. It is by far the most urgent issue for the driver at the moment. My advice on this one is to start small and implement it in testable sections. It might be hard to pull it into the main branch otherwise.