I have an implementation of a KafkaConsumer in Java, and currently it never exits the .poll method. When I drill down into the source code in debug mode, I've found that it is getting stuck in the while loop in AbstractCoordinator.ensureCoordinatorKnown(), as the coordinator is never found.
The future returned from sendGroupMetadataRequest() in the loop fails the first time with org.apache.kafka.clients.consumer.internals.SendFailedException, and then fails every subsequent time with org.apache.kafka.common.errors.GroupCoordinatorNotAvailableException: The group coordinator is not available. Does anyone know why this might happen?
If I use the console producer/consumer I am able to send and receive messages successfully; the problem only occurs with my implementation of the KafkaConsumer. Additionally, the consumer does work on two of my servers, so I know it is not the implementation of the consumer that is at fault.
Here are the properties my consumer is created with:
Properties props = new Properties();
props.put("bootstrap.servers", "myserver:9000);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("group.id", groupId);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
Edit:
The topic is definitely created before the consumer starts.
Edit 2:
I deleted all of the brokers in my cluster and recreated them, and now I'm failing at a different point. In AbstractCoordinator.ensureActiveGroup(), while trying to rejoin, the future returned from performGroupJoin() repeatedly fails with org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not the correct coordinator for this group. Still not sure what is going on.
Edit 3:
I deleted the brokers and recreated them with a different id, and now the .poll() method is returning and successfully consuming messages. I'd still like to know why it failed in the first place, though, so I can make sure it doesn't happen again.
Deleting the brokers and creating new ones fixed the problem. Still not sure what went wrong with the brokers, though.
Related
I have a fairly simple Kafka setup - 1 producer, 1 topic, 10 partitions, 10 KafkaConsumers all with the same group ID, all running on a single machine. When I process a file, the producer quickly creates 3269 messages, which the consumers happily start consuming. Everything runs fine for a while, but at a certain point the consumers start consuming duplicates - LOTS of duplicates. In fact, it looks like they just start consuming the message queue over again. If I let it run for a long time, the database will start receiving the same data entries 6 or more times. After doing some tests with logging, it looks like the consumers are re-consuming the same messages with the same unique message names.
As far as I can tell, no rebalancing is happening. Consumers are not dying or being added; it's the same 10 consumers consuming the same 3269 messages over and over until I kill the process. If I just let it go, the consumers will write tens of thousands of records, massively increasing the amount of data that really should be going into the database.
I'm fairly new to Kafka, but I'm kind of at a loss for why this is happening. I know Kafka doesn't guarantee exactly-once processing, and I'm ok with a couple duplicates here and there. I have code to prevent persisting the same records again. However, I'm not sure why the consumers would re-consume the queue over and over. I know that Kafka messages aren't deleted after they are consumed, but if all the consumers are in the same group, the offsets should prevent this, right? I understand a little bit about how offsets work, but as far as I know, they shouldn't be getting reset if there is no re-balancing, right? And the messages aren't timing out as far as I can tell. Is there a way for me to get my consumers to consume everything in the queue once-ish and then wait for more messages without re-consuming the same stuff forever?
Here are the properties I pass to the producer and consumers:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("group.id", "MyGroup");
props.put("num.partitions", 10);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
MyIngester ingester = new MyIngester(args[0], props);
To me this seems to be an issue with acknowledging receipt of the messages, i.e. with committing offsets.
Try the following properties:
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "100");
I have a single-instance Java application that uses a KTable from Kafka Streams. Until recently I could retrieve all the data through the KTable, but suddenly some of the messages seemed to vanish. There should be ~33k messages with unique keys in there.
When I retrieve messages by key, some of them are missing. I use a ReadOnlyKeyValueStore to retrieve messages:
final ReadOnlyKeyValueStore<GenericRecord, GenericRecord> store = ((KafkaStreams)streams).store(storeName, QueryableStoreTypes.keyValueStore());
store.get(key);
These are the configuration settings I pass to KafkaStreams:
final Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_SERVER_CONFIG, serverId);
config.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
config.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
Kafka: 0.10.2.0-cp1
Confluent: 3.2.0
Investigation brought me to some very worrying insights. Using the REST Proxy I read the partitions manually and found that some offsets return an error.
Request:
/topics/{topic}/partitions/{partition}/messages?offset={offset}
Response:
{
  "error_code": 50002,
  "message": "Kafka error: Fetch response contains an error code: 1"
}
However, no client, neither the Java client nor the command-line tools, returns any error. They just skip over the missing messages, resulting in missing data in the KTables. Everything was fine, and then without notice it seems that somehow some of the messages got corrupted.
I have two brokers, and all the topics have a replication factor of 2 and are fully replicated. Both brokers, queried separately, return the same result. Restarting the brokers makes no difference.
What could possibly be the cause?
How to detect this case in a client?
By default the Kafka broker config cleanup.policy is set to delete, which removes old log segments once the retention limits are reached. Set it to compact to keep the latest message for each key. See the documentation on log compaction.
Deletion of old messages does not change the offsets of the remaining messages; it only raises the minimum (earliest) available offset of the partition, so trying to retrieve a message below that minimum causes an error. That is the vague "error code: 1" (offset out of range) you saw from the REST Proxy. The Kafka Streams client simply starts reading messages from the minimum offset, so there is no error; the only visible effect is missing data in the KTables.
While the application is running, all data might still appear to be available thanks to the local caches, even after the messages have been deleted from Kafka itself. It will vanish after a cleanup of the local state.
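One way to detect this from a client, sketched below under the assumption that you can point a plain consumer at the same cluster (the bootstrap server, topic and partition are placeholders), is to ask the broker for the earliest available offset and compare it with the offset you expect to still be readable:
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties checkProps = new Properties();
checkProps.put("bootstrap.servers", "localhost:9092"); // placeholder
checkProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
checkProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<byte[], byte[]> checker = new KafkaConsumer<>(checkProps)) {
    TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic/partition
    Map<TopicPartition, Long> earliest = checker.beginningOffsets(Collections.singletonList(tp));
    // If the earliest offset is greater than 0, the broker has already deleted
    // records from the head of this partition under the delete cleanup policy.
    if (earliest.get(tp) > 0) {
        System.out.println(tp + " now starts at offset " + earliest.get(tp) + "; older records are gone.");
    }
}
If beginningOffsets() returns something higher than the oldest offset you expect, the broker has already deleted those records.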
I'm completely new to Kafka and I'm having some trouble using the KafkaProducer.
The send method of the producer blocks for exactly 1 minute and then the application proceeds without an exception. This is obviously some timeout, but no exception is thrown.
I also can't see anything meaningful in the logs.
The servers seem to be set up correctly: if I use the bin/kafka-console-consumer and producer applications I can send and receive messages correctly. Also, the code seems to work to some extent.
If I write to a topic which does not exist yet, I can see the new entry in the /tmp/kafka-logs folder and also in the console output of the KafkaServer.
Here is the code I use:
Properties props = ResourceUtils.loadProperties("kafka.properties");
Producer<String, String> producer = new KafkaProducer<>(props);
for (String line : lines)
{
    producer.send(new ProducerRecord<>("topic", Id, line));
    producer.flush();
}
producer.close();
The properties in the kafka.properties file:
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
acks=all
retries=0
batch.size=16384
linger.ms=1
buffer.memory=33554432
So producer.send blocks for 1 minute and then continues. In the end nothing is stored in Kafka, but the new topic is created.
Thank you for any help!
Try setting bootstrap.servers to 127.0.0.1:9092.
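In the kafka.properties from the question that would be the following change (just a sketch; it only helps if the broker actually listens on, or advertises, the loopback address):
bootstrap.servers=127.0.0.1:9092
If that fixes it, the usual underlying cause is that the host name the broker advertises to clients cannot be resolved or reached from the client machine.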
I have started my zookeeper and Kafka server.
I started my Kafka producer which sends 10 messages with topic 'xxx'. Then stopped my Kafka producer.
Now I started my Kafka consumer and subscribed with topic 'xxx'. My consumer consumes those 10 messages sent by my Kafka producer, which is not running now.
I need my Kafka consumer to consume only the messages that are produced while it is running, not messages that were produced before it started.
Is there any way to achieve this ?
These are the settings in my consumer properties:
props.put("bootstrap.servers", "localhost:9092");
String consumeGroup = "cg1";
props.put("group.id", consumeGroup);
props.put("enable.auto.commit", "true");
props.put("auto.offset.reset", "earliest");
props.put("auto.commit.interval.ms", "100");
props.put("heartbeat.interval.ms", "3000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
Set the following property:
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
It tells the consumer to read only the latest messages, that is, the messages which were published after the consumer started.
Please create a new topic and keep the property ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "latest". Make sure not to commit the offset,
i.e. we should not use commitSync().
By default, consumers start consuming records from the last committed offset of each assigned partition. If a committed offset is not available, the offset reset strategy ConsumerConfig#AUTO_OFFSET_RESET_CONFIG configured for the KafkaConsumer is used to set the start offset to the earliest or latest offset on the partition.
I think in your case you are either committing the offset, or there is already a committed offset available for the given group and topic.
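A minimal sketch of that setup (the bootstrap server and group id are placeholders): auto-commit disabled, nothing committed manually, and auto.offset.reset set to "latest", so a consumer with a fresh group id starts at the end of the log and only sees records produced after it subscribed.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties latestProps = new Properties();
latestProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
latestProps.put(ConsumerConfig.GROUP_ID_CONFIG, "cg-fresh");        // a group with no committed offsets
latestProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // never commit, automatically or manually
latestProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest"); // no committed offset -> start at the end
latestProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
latestProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(latestProps)) {
    consumer.subscribe(Collections.singletonList("xxx"));
    while (true) {
        // Only records published after this consumer joined are returned.
        ConsumerRecords<String, byte[]> records = consumer.poll(1000);
        for (ConsumerRecord<String, byte[]> record : records) {
            System.out.println(record.offset() + ": " + record.value().length + " bytes");
        }
    }
}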
I am using Kafka producer 0.8.2 and I am trying to send a single message to the topic in a way that the message is sent immediately. I have a console consumer to observe whether the message arrives. I notice that the message is not sent immediately, unless, of course, I run producer.close() immediately after sending, which isn't what I would like to do.
What is the correct producer configuration setting to achieve this? I'm using the following (I'm aware that it looks like a mess of different configurations/versions, but I simply cannot find anything in the documentation that works as I would expect):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokersStr);
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put("producer.type", "sync");
props.put("batch.num.messages", "1");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1);
props.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, true);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
I found a solution, which seems reasonable, and involves running get() on the Future returned by the Producer's send() command. I changed the send command from:
producer.send(record);
to the following:
producer.send(record).get();
It would be nice to hear from the more experienced Kafka users if there are any issues with that approach? Also, I would be interested to learn if there is a configuration setting for the Producer to achieve the same thing (that is, send a single message immediately without running get() of the Future).
Old post, but I struggled way too much with this not to post here.
I stumbled upon the same behavior trying to run the Kafka examples, and this .get() was the only thing that got the messages to Kafka. The Javadoc for KafkaProducer.send(…) states that the method is asynchronous. In my test code, the send was therefore still queued while my code continued to run, reached the end of the run, and terminated before the message inside the Future was actually sent.
So this .get() just blocks on the Future until it completes, which removes the benefit of the asynchronous send. A cleaner way to do it could be to wait a bit with a Thread.sleep(…) right after the .send(…), depending on your use case.
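For comparison, here is a sketch of the non-blocking alternative, reusing the producer from the question (topic, key and value are placeholders): pass a Callback to send() so you are notified when the record is actually delivered, instead of blocking on the Future with get().
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

producer.send(new ProducerRecord<>("topic", "key", "value"), new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        // Called asynchronously once the broker acknowledges (or rejects) the record.
        if (exception != null) {
            exception.printStackTrace();
        } else {
            System.out.println("Delivered at offset " + metadata.offset());
        }
    }
});
// producer.close() still blocks until outstanding sends complete;
// newer clients (0.9+) also offer producer.flush() for the same purpose.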