
Comments (12)

dibbhatt commented on June 2, 2024

I haven't seen this before. Have you been able to solve it, or are you still having the same issue?

To fix the ProcessedOffsetManager issue with Spark 2.0, there is a pull request you can refer to:

https://github.com/dibbhatt/kafka-spark-consumer/pull/32/files

I haven't merged it, as it would break Spark 1.6. But you can try these changes and let me know if they work for you.

I need to find a way to handle this for both Spark 2.0 and earlier versions of Spark.
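
For readers unfamiliar with what ProcessedOffsetManager is for: as described later in this thread, the consumer resumes from the last processed offset on a restart, and that is the bookkeeping this class commits. A minimal toy sketch of that pattern (a hypothetical `OffsetTracker` class for illustration only, not this library's API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of processed-offset bookkeeping (hypothetical class, NOT
// the library's API): track the highest processed offset per partition so a
// restarted consumer can resume where it left off.
public class OffsetTracker {
    private final Map<Integer, Long> processed = new HashMap<>();

    // Record that the message at `offset` of `partition` was processed.
    public void markProcessed(int partition, long offset) {
        processed.merge(partition, offset, Math::max);
    }

    // Offset a restarted consumer should resume from (last processed + 1),
    // or -1 if nothing has been processed for that partition yet.
    public long resumeOffset(int partition) {
        Long last = processed.get(partition);
        return last == null ? -1 : last + 1;
    }
}
```

The real class also has to persist this state (e.g. to ZooKeeper) and do so in a way that compiles against both Spark API generations, which is what the pull request above is wrestling with.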

from kafka-spark-consumer.

vroy007 commented on June 2, 2024

Hi~
I made the same changes as that pull request, but I still have the same problem: the program can't deserialize the data model.
In detail, my producer program is:

```java
String topicName = "TEST-TOPIC";
Properties props = new Properties();
props.put("bootstrap.servers", "x.x.x.x.88:9092, x.x.x.89:9092, x.x.x.90:9092");
props.put("producer.type", "sync");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<String, String>(props);

for (int i = 0; i < 10; i++) {
    producer.send(new ProducerRecord<String, String>(topicName,
            Integer.toString(i), Integer.toString(i)));
}

System.out.println("Message sent successfully");
producer.close();
```

I have tested by running ./kafka-console-consumer.sh, and it can consume the messages from the producer.

But when I run the SampleConsumer, it can't read the messages. In SampleConsumer, I used the following code to test:

```java
unionStreams.foreachRDD(new VoidFunction<JavaRDD<MessageAndMetadata>>() {
    @Override
    public void call(JavaRDD<MessageAndMetadata> rdd) throws Exception {
        List<MessageAndMetadata> rddList = rdd.collect();
        System.out.println(" Number of records in this batch " + rddList.size());
        for (MessageAndMetadata model : rddList) {
            System.out.println("model key = " + new String(model.getKey(), "UTF-8")
                    + ", value = " + new String(model.getPayload(), "UTF-8"));
        }
    }
});
```

The result is: "Number of records in this batch 0", i.e. rddList.size() == 0.
I wonder if compatibility problems are preventing the data from being deserialized.

Looking forward to your answer, thanks~


dibbhatt commented on June 2, 2024

This consumer by default starts consuming from the latest Kafka offset the first time it runs. On any subsequent run, it resumes from the last processed offset. So, if you publish, say, 100 messages to Kafka and the latest offset in Kafka is 101, a consumer starting for the first time begins at offset 101. If your publisher is not running and Kafka is not receiving any messages, the consumer won't pull anything. I guess that is what is happening here.

You can start the consumer first and then start the producer. Since the consumer waits for new messages from the latest offset, any newly produced messages will be consumed.
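
The behavior described above can be sketched with a toy model (plain Java, no Kafka involved; the `log`/`position` names are illustrative stand-ins, not the library's API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "start from latest offset" semantics: a consumer that begins
// at the log-end offset sees nothing until new messages are produced.
public class LatestOffsetDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // 100 messages already sit in the topic before the consumer starts.
        for (int i = 0; i < 100; i++) log.add("msg-" + i);

        // The consumer starts at the latest (log-end) offset.
        int position = log.size();

        // First poll: nothing to read, like the empty batch in this issue.
        System.out.println("batch 1 size = "
                + log.subList(position, log.size()).size()); // 0

        // The producer publishes 10 more messages while the consumer runs.
        for (int i = 100; i < 110; i++) log.add("msg-" + i);

        // The next poll picks up only the newly produced messages.
        System.out.println("batch 2 size = "
                + log.subList(position, log.size()).size()); // 10
    }
}
```

This is why starting the producer before the consumer's first run yields empty batches: everything already in the topic is behind the starting position.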


vroy007 commented on June 2, 2024

Yeah, I'm sure I start the Consumer first. Once the Consumer is running, I start the Producer to publish ten messages at a time. I have tried this several times, but the rddList's size is always zero.

When I run the SampleConsumer from IntelliJ IDEA on my Spark cluster's master node, the logs show the error "java.lang.ClassNotFoundException: consumer.kafka.client.KafkaReceiver". The error causes the Spark job to fail. I guess that may be why the DStream can't read the messages.


dibbhatt commented on June 2, 2024

How many partitions does your topic have? Can you share the Receiver properties you used? You can also share the Executor log from when you do the spark-submit.


vroy007 commented on June 2, 2024

3 partitions.

The Consumer properties are the defaults; I will share the Broker properties next Monday.

The full Executor logs from the spark-submit are shown in my first comment on this issue. If that is not clear, I will share all the logs again.

Thanks


vroy007 commented on June 2, 2024

It works!!! Thank you so much.


dibbhatt commented on June 2, 2024

Good. What was the issue?


Srinivasan-pp commented on June 2, 2024

Hi,

I started my Spark shell with the package dibbhatt:kafka-spark-consumer:1.0.6 and tried to execute:

scala> import consumer.kafka.ProcessedOffsetManager;

It throws: error: object ProcessedOffsetManager is not a member of package consumer.kafka


Srinivasan-pp commented on June 2, 2024

According to the discussion above, the same issue was faced and solved for Spark 2.0. I would just like to know whether the same fix has been applied for earlier versions as well.


dibbhatt commented on June 2, 2024

You can git clone the repo and build it against Spark 1.6. You may need to make some minor changes to get it working; it's very easy to do. Do let me know if you need any help.


dibbhatt commented on June 2, 2024

Hi @vroy007, if you don't mind, can you tell me the name of your organization, and whether this consumer is running in production there? I need it just for my reference.

