I am trying to write a Kafka consumer in Java, but the consumer.poll(5000) call returns a null value no matter what. Here is the code:
package com.apache.kafka.consumer;

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.log4j.Logger;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class Consumer {
    public static void main(String[] args) throws Exception {
        final Logger logger = Logger.getLogger(Consumer.class);
        // Kafka consumer configuration settings
        String topicName = "mytopic";
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.offset.reset", "earliest");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("partition.assignment.strategy", "range");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        // Kafka Consumer subscribes list of topics here.
        consumer.subscribe("sampletopic");
        while (true) {
            Map<String, ConsumerRecords<String, String>> records = consumer.poll(0);
            for (ConsumerRecords<String, String> record : records.values()) {
                System.out.println(records);
            }
        }
    }
}
Please help!!!
I have already created the topic and added some data to it, and ZooKeeper and Kafka are both running fine. I don't know why the poll() method is returning null.
The call to poll() needs to be in a loop; that's why the literature calls it the poll loop.
If it's returning null, you're either polling too early and exiting main before any records arrive, or there is no data in the topic.
See the usage examples here https://kafka.apache.org/22/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "test");
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
Notice the loop ^
I have a Cloudera Quickstart VM. I have installed the Kafka parcels using Cloudera Manager and it works fine inside the VM using the console-based consumer and producer.
But when I try to use a Java-based consumer, it does not produce or consume messages. I can list the topics.
But I cannot consume messages.
Following is my code.
package kafka_consumer;

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class mclass {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.0.75.1:9092");
        // Just a user-defined string to identify the consumer group
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
        // Enable auto offset commit
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // List of topics to subscribe to
            consumer.subscribe(Arrays.asList("second_topic"));
            for (String k_topic : consumer.listTopics().keySet()) {
                System.out.println(k_topic);
            }
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("Offset = %d\n", record.offset());
                        System.out.printf("Key = %s\n", record.key());
                        System.out.printf("Value = %s\n", record.value());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
And the following is the output of the code: the console producer is producing messages, but the consumer is not able to receive them.
PS: I can telnet to the IP and port of the Kafka broker, and I can even list the topics. The consumer runs constantly without crashing, but no messages are consumed.
I'm developing a simple Java application with Spark Streaming.
I configured a Kafka JDBC connector (Postgres to topic) and I want to read it with a Spark Streaming consumer.
I'm able to read the topic correctly with:
./kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --property print.key=true --from-beginning --topic postgres-ip_audit
getting these results:
null
{"id":1557,"ip":{"string":"90.228.176.138"},"create_ts":{"long":1554819937582}}
When I use my Java application with this config:
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "groupStreamId");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);
I get results like this:
�179.20.119.53�����Z
Can someone point me to how to fix my issue?
I also tried using a ByteArrayDeserializer and converting the byte[] into a String, but I always get bad character results.
You can deserialize Avro messages using io.confluent.kafka.serializers.KafkaAvroDeserializer, with a Schema Registry in place to manage the record schemas.
Here is a sample code snippet:
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import io.confluent.kafka.serializers.KafkaAvroDecoder;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class SparkStreaming {
    public static void main(String... args) {
        SparkConf conf = new SparkConf();
        conf.setMaster("local[2]");
        conf.setAppName("Spark Streaming Test Java");

        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(10));

        processStream(ssc, sc);

        ssc.start();
        ssc.awaitTermination();
    }

    private static void processStream(JavaStreamingContext ssc, JavaSparkContext sc) {
        System.out.println("--> Processing stream");

        Map<String, String> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put("group.id", "spark");
        props.put("specific.avro.reader", "true");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Set<String> topicsSet = new HashSet<>(Collections.singletonList("test"));

        JavaPairInputDStream<String, Object> stream = KafkaUtils.createDirectStream(ssc, String.class, Object.class,
                StringDecoder.class, KafkaAvroDecoder.class, props, topicsSet);

        stream.foreachRDD(rdd -> {
            rdd.foreachPartition(iterator -> {
                while (iterator.hasNext()) {
                    Tuple2<String, Object> next = iterator.next();
                    Model model = (Model) next._2();
                    System.out.println(next._1() + " --> " + model);
                }
            });
        });
    }
}
Complete sample application is available in this github repo
You provided a StringDeserializer, however you are sending values serialized with Avro, so you need to deserialize them accordingly. Using Spark 2.4.0 (and the dependency org.apache.spark:spark-avro_2.12:2.4.1) you can achieve it with the from_avro function:
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.col

// `from_avro` requires the Avro schema in JSON string format.
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("path/to/your/schema.avsc")))

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

val output = df
  .select(from_avro(col("value"), jsonFormatSchema).as("user"))
  .where("user.favorite_color == \"red\"")
If you need to use a Schema Registry (like you did with kafka-avro-console-consumer), it's not possible out of the box and you need to write a lot of code. I'd recommend using this lib: https://github.com/AbsaOSS/ABRiS. However, it's only compatible with Spark 2.3.0.
I am using Spring Kafka for the first time and I am not able to use the Acknowledgement.acknowledge() method for manual commit in my consumer code. Please let me know if anything is missing in my consumer configuration or listener code, or whether there is another way to acknowledge the offset based on a condition.
Here I'm looking for a solution where, if the offset is not committed/acknowledged manually, the consumer should pick up the same message/offset again.
Configuration
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.AbstractMessageListenerContainer.AckMode;

@EnableKafka
@Configuration
public class ConsumerConfig {

    @Value(value = "${kafka.bootstrapAddress}")
    private String bootstrapAddress;

    @Value(value = "${kafka.groupId}")
    private String groupId;

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> containerFactory() {
        Map<String, Object> props = new HashMap<String, Object>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 100);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<String, String>(props));
        factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
        factory.getContainerProperties().setSyncCommits(true);
        return factory;
    }
}
Listener
private static int value = 1;

@KafkaListener(id = "baz", topics = "${message.topic.name}", containerFactory = "containerFactory")
public void listenPEN_RE(@Payload String message,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
        @Header(KafkaHeaders.OFFSET) int offsets,
        Acknowledgment acknowledgment) {
    if (value % 2 == 0) {
        acknowledgment.acknowledge();
    }
    value++;
}
Set the enable-auto-commit property to false:
propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
Set the ack-mode to MANUAL_IMMEDIATE:
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
Then, in your consumer/listener code, you can commit the offset manually, like this:
@KafkaListener(topics = "testKafka")
public void receive(ConsumerRecord<?, ?> consumerRecord, Acknowledgment acknowledgment) {
    System.out.println("Received message: ");
    System.out.println(consumerRecord.value().toString());
    acknowledgment.acknowledge();
}
Update: I created a small POC for this. Check it out here, might help you.
You need to do the following
1) Set enable-auto-commit property to false
consumerConfigProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
2) Set the ack mode to MANUAL_IMMEDIATE
factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
3) For processed records you need to call acknowledgment.acknowledge();
4) For failed records, call acknowledgment.nack(10);
Note: the nack method takes a long parameter which is the sleep time and it should be less than max.poll.interval.ms
Below is a sample code
@KafkaListener(id = "baz", topics = "${message.topic.name}", containerFactory = "containerFactory")
public void listenPEN_RE(@Payload String message,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
        @Header(KafkaHeaders.OFFSET) int offsets,
        Acknowledgment acknowledgment) {
    if (value % 2 == 0) {
        acknowledgment.acknowledge();
    } else {
        acknowledgment.nack(10); // sleep time should be less than max.poll.interval.ms
    }
    value++;
}
You can do the following:
1. Store the current record's offset to a file or DB.
2. Implement your Kafka listener class with ConsumerSeekAware.
3. Call registerSeekCallback as given below:
@Override
public void registerSeekCallback(ConsumerSeekCallback callback) {
    callback.seek(topic, partition, offset);
}
So when the consumer goes down or a new consumer is assigned, it starts reading from the offset stored in your DB. A fuller sketch follows below.
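For reference, here is a minimal sketch of steps 2 and 3 combined, assuming Spring Kafka's ConsumerSeekAware interface; loadOffsetFromStore() is a hypothetical helper that reads the offset persisted in step 1. The actual seek happens in onPartitionsAssigned(), once the partitions are known:
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.ConsumerSeekAware;

public class SeekingListener implements ConsumerSeekAware {

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        // Called once per consumer thread; keep the callback if you need to seek later.
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // Rewind each assigned partition to the offset persisted in your file/DB.
        for (TopicPartition tp : assignments.keySet()) {
            callback.seek(tp.topic(), tp.partition(), loadOffsetFromStore(tp));
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // Not needed for this example.
    }

    @KafkaListener(id = "seeking", topics = "${message.topic.name}")
    public void listen(String message) {
        // Process the message, then persist its offset to your store.
    }

    private long loadOffsetFromStore(TopicPartition tp) {
        return 0L; // hypothetical: look up the last processed offset for tp in your file/DB
    }
}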
That doesn't work that way in Apache Kafka.
For the currently running consumer we never need to worry about committing offsets; we need them persisted only for new consumers in the same consumer group. The current one tracks its offset in memory (and, I guess, somewhere on the broker).
If you need to re-fetch the same message in the same consumer, maybe on the next poll round, you should consider using the seek() functionality: https://docs.spring.io/spring-kafka/docs/2.0.1.RELEASE/reference/html/_reference.html#seek
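For illustration, a minimal sketch of that seek() approach inside a listener, assuming Spring Kafka 2.0+ (which allows a Consumer<?, ?> parameter on @KafkaListener methods); processedSuccessfully() is a hypothetical helper:
@KafkaListener(topics = "${message.topic.name}")
public void listen(ConsumerRecord<String, String> record, Consumer<?, ?> consumer) {
    if (!processedSuccessfully(record)) {
        // Rewind this partition to the current record's offset,
        // so the same message is redelivered on the next poll.
        consumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
    }
}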
I have a small nitpick with the OP's configuration: if you are using @KafkaListener with a ConcurrentKafkaListenerContainerFactory, don't forget to make the state thread-safe. Using private static int value = 1; isn't going to help; use an AtomicInteger, for example:
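A minimal sketch of the listener above with the static int swapped for a java.util.concurrent.atomic.AtomicInteger, keeping the same even/odd logic:
private final AtomicInteger value = new AtomicInteger(1);

@KafkaListener(id = "baz", topics = "${message.topic.name}", containerFactory = "containerFactory")
public void listenPEN_RE(@Payload String message, Acknowledgment acknowledgment) {
    // getAndIncrement() reads and bumps the counter in one atomic step
    if (value.getAndIncrement() % 2 == 0) {
        acknowledgment.acknowledge();
    }
}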
Is there a way to start a consumer from a specific offset using the initial properties that we pass?
I know there is props.put("auto.offset.reset", "earliest"), but that gets me to the beginning.
However, I want to go back, and my scenarios are as follows:
Specify an offset where I want to start at
Specify the time where I want to start
And I want to do that using the initial properties as the preferred option.
If that is not possible, then using some other mechanism.
Attaching my SimpleConsumer code for reference:
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) throws Exception {
        String topicName = "test3";
        Properties props = new Properties();
        String groupId = "single";

        // Kafka consumer configuration settings
        props.put("bootstrap.servers", "mymachine:9092");
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList(topicName));
        System.out.println("Starting the _NON-BATCH_ consumer ::: Topic=" + topicName + " GroupId=" + groupId);

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s (offset:%d, key:%s, partition = %s, topic = %s)", record.value(), record.offset(), record.key(), record.partition(), record.topic());
                System.out.println();
            }
        }
    }
}
For scenario 1, you can use KafkaConsumer.seek(TopicPartition, offset) to specify the offset from which you read.
For scenario 2, Kafka 0.10.1.0 offers the KafkaConsumer.offsetsForTimes method, which lets you look up the offsets for the given partitions by timestamp and then invoke seek() to retrieve the desired messages. A sketch of both scenarios is shown below.
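For illustration, a rough sketch of both scenarios, meant to slot in before the poll loop of the SimpleConsumer above (the partition, offset, and timestamp values are placeholders; it uses assign() instead of subscribe() so the partition is known up front, and it needs imports for TopicPartition, OffsetAndTimestamp, HashMap, and Map):
TopicPartition tp = new TopicPartition("test3", 0);
consumer.assign(Arrays.asList(tp));

// Scenario 1: start from a specific offset
consumer.seek(tp, 42L);

// Scenario 2: start from a specific time (Kafka 0.10.1.0+)
Map<TopicPartition, Long> timestamps = new HashMap<>();
timestamps.put(tp, System.currentTimeMillis() - 3600 * 1000L); // e.g. one hour ago
Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(timestamps);
OffsetAndTimestamp oat = offsets.get(tp);
if (oat != null) {
    consumer.seek(tp, oat.offset());
}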
I'm using KafkaProducer with Java. This is an example of the code:
package com.mypackage.kafka.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerToTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 100; i++) {
            System.out.println(i);
            producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
        }
        producer.close();
    }
}
I have a problem: if my Kafka broker is down and I run my code, it stops at the first send and waits until Kafka is up again.
How can I detect connection errors, or use send in a way that doesn't stop all execution?
The producer will try to reconnect until request.timeout.ms elapses; it will not wait for the broker indefinitely. Please refer to the description of request.timeout.ms below:
The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses, the client will resend the request if necessary, or fail the request if retries are exhausted.
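For illustration, a rough sketch of bounding the wait and detecting failures with the send() callback. The timeout values are arbitrary, and max.block.ms (which caps how long send() itself can block while waiting for metadata) is an extra setting beyond the one described above:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("request.timeout.ms", "5000"); // fail in-flight requests after 5s
props.put("max.block.ms", "5000");       // don't let send() block longer than 5s on metadata

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
try {
    producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
        if (exception != null) {
            // Asynchronous failure (request timed out, retries exhausted, ...)
            System.err.println("Send failed: " + exception.getMessage());
        } else {
            System.out.println("Sent to partition " + metadata.partition() + " at offset " + metadata.offset());
        }
    });
} catch (Exception e) {
    // send() itself throws if it blocks longer than max.block.ms (e.g. no broker reachable for metadata)
    System.err.println("Could not queue record: " + e.getMessage());
}
producer.close();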