I'm currently trying to set up a consumer to consume messages from a topic. My log says it subscribes to the topic successfully
[Consumer clientId=consumer-1, groupId=consumer-group] Subscribed to topic(s): MY-TOPIC
and it clearly shows it is a part of a group, but when I go to the control center I can't find that group; however, I can find the topic that I am subscribed to. It isn't consuming the records from the topic either, which I attribute to not being a part of a valid group. I know it is polling the correct topic, and I know there are records on the topic, as I am constantly putting them on.
Here is my start method
@PostConstruct public void start()
{
// check if the config indicates whether to start the daemon or not
if (!parseBoolean(maskBlank(shouldStartConsumer, "true")))
{
System.err.println("CONSUMER DISABLED");
logger.warn("consumer not starting -- see value of " + PROP_EXTRACTOR_START_CONSUMER);
return;
}
System.err.println("STARTING CONSUMER");
Consumer<String, String> consumer = this.createConsumer(kafkaTopicName,
StringDeserializer.class, StringDeserializer.class);
Thread daemon = new Thread(() -> {
while (true)
{
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
if (records.count() > 0) // IS ALWAYS 0, Poll doesn't return records
{
printRecord(records);
records.iterator().forEachRemaining(r -> {
System.err.println("received record: " + r);
});
}
else { logger.debug("KafkaTopicConsumer::consumeMessage -- No messages found in topic {}", kafkaTopicName); }
}
});
daemon.setName(kafkaTopicName);
daemon.setDaemon(true);
daemon.start();
}
Note: the createConsumer method just adds all of my config settings and is where I subscribe to my topic.
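For reference, a minimal sketch of what such a createConsumer helper might look like (this is not the question's actual code; the kafkaBootstrapServers and kafkaGroupId field names are assumptions):
// Hedged sketch only -- the real createConsumer is not shown in the question.
// kafkaBootstrapServers and kafkaGroupId are hypothetical fields.
private Consumer<String, String> createConsumer(String topic,
                                                Class<?> keyDeserializer,
                                                Class<?> valueDeserializer)
{
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaGroupId); // the group only appears in tooling after the first poll() joins it
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDeserializer.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializer.getName());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    Consumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList(topic)); // subscribe, not assign, so the group coordinator manages partitions
    return consumer;
}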
I have a feeling it has something to do with the thread... I can post some of my config if that would help as well; just leave a comment. Thanks.
Related
I have been learning Kafka recently, and my consumer can't consume any records unless I specify the --partition 0 parameter. In other words, I can NOT consume records like:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic
but it works like this:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic --partition 0
THE MAIN PROBLEM IS: when I moved to Java code, my KafkaConsumer class can't fetch records, and I need to know how to specify the partition number in the Java KafkaConsumer?!
My current Java code is:
public class ConsumerDemo {
public static void main(String[] args) {
Logger logger = LoggerFactory.getLogger(ConsumerDemo.class.getName());
String bootstrapServer = "127.0.0.10:9092";
String groupId = "my-kafka-java-app";
String topic = "first-topic";
// create consumer configs
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
//properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, partition);
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// create consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
// subscribe consumer to our topic
consumer.subscribe(Collections.singleton(topic)); // means subscription to one topic
// poll for new data
while(true){
//consumer.poll(100); old way
ConsumerRecords<String, String> records =
consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records){
logger.info("Key: " + record.key() + ", Value: "+ record.value() );
logger.info("Partition: " + record.partition() + ", Offset: "+ record.offset());
}
}
}
}
After a lot of inspection, my solution turned out to be using consumer.assign and consumer.seek instead of consumer.subscribe, and without specifying the groupId. But I feel there should be a more optimal solution.
The Java code becomes:
// create consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
// subscribe consumer to our topic
//consumer.subscribe(Collections.singleton(topic)); //means subscription to one topic
// using assign and Seek, are mostly used to replay data or fetch a specific msg
TopicPartition partitionToReadFrom = new TopicPartition(topic, 0);
long offsetToReadFrom = 15L;
// assign
consumer.assign(Arrays.asList(partitionToReadFrom));
// seek: for a specific offset to read from
consumer.seek(partitionToReadFrom, offsetToReadFrom);
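A poll loop after assign()/seek() has the same shape as in the subscribe() version above, e.g.:
// poll loop after assign()/seek() -- same shape as with subscribe()
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        logger.info("Key: " + record.key() + ", Value: " + record.value());
        logger.info("Partition: " + record.partition() + ", Offset: " + record.offset());
    }
}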
The way you are doing it is correct. You don't need to specify the partition when subscribing to a topic. Maybe your consumer group has already consumed all messages in the topic and has committed the latest offsets.
Make sure new messages are being produced when you run your application, or create a new consumer group to consume from the beginning (if you keep ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "earliest").
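For example, keeping the rest of the code unchanged, a fresh group id together with auto.offset.reset=earliest re-reads the topic from the beginning (the group id below is a made-up example):
// "my-kafka-java-app-v2" is a hypothetical, never-used group id;
// with no committed offsets it falls back to auto.offset.reset
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "my-kafka-java-app-v2");
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");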
As the name implies, the ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG property configures the partition assignment strategy; it does not pin the consumer to a fixed partition the way the command-line flag does.
The default strategy is the RangeAssignor, which can be changed, for example to the StickyAssignor, as follows:
properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StickyAssignor.class.getName());
You can read more in the Kafka Client-side Assignment Proposal.
In our application we use a Kafka consumer to determine whether to send an email.
There was an issue the other day where the consumer was timing out on its partition before it could read and process all of its records. As a result, it looped back to the start of the partition, never finished the set of records it had received, and new data generated after the loop started never got processed.
My team suggested that we could tell Kafka to commit after each message is read, but I can't figure out how to do that from Spring Kafka.
The application uses spring-kafka 2.1.6, and the consumer code roughly resembles this:
@KafkaListener(topics = "${kafka.topic}", groupId = "${kafka.groupId}")
public void consume(String message, @Header("kafka_offset") int offSet) {
try{
EmailData data = objectMapper.readValue(message, EmailData.class);
if(isEligableForEmail(data)){
emailHandler.sendEmail(data);
}
} catch (Exception e) {
log.error("Error: "+e.getMessage(), e);
}
}
Note: the sendEmail function uses CompletableFutures, as it has to call a different API before sending out an email.
Configuration: (snippet of the yaml file for the consumer, and a part of the producer)
consumer:
  max.poll.interval.ms: 3600000
producer:
  retries: 0
  batch-size: 100000
  acks: 0
  buffer-memory: 33554432
  request.timeout.ms: 60000
  linger.ms: 10
  max.block.ms: 5000
If you want manual acknowledgment, you can add an Acknowledgment parameter to the listener method:
@KafkaListener(topics = "${kafka.topic}", groupId = "${kafka.groupId}")
public void consume(String message, Acknowledgment ack, @Header("kafka_offset") int offSet) {
When using manual AckMode, you can also provide the listener with the Acknowledgment. The following example also shows how to use a different container factory.
Example from the docs for the @KafkaListener annotation:
@KafkaListener(id = "cat", topics = "myTopic",
containerFactory = "kafkaManualAckListenerContainerFactory")
public void listen(String data, Acknowledgment ack) {
...
ack.acknowledge();
}
Set the container's ackMode property to AckMode.RECORD to commit the offset after each record.
You should also consider reducing max.poll.records or increasing max.poll.interval.ms in the Kafka consumer properties.
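For completeness, here is a sketch of what a kafkaManualAckListenerContainerFactory bean could look like. The exact location of the AckMode enum depends on the spring-kafka version (in 2.1.x it lives on AbstractMessageListenerContainer; it later moved to ContainerProperties):
// Sketch of a listener container factory with manual, per-record acknowledgment.
// Remember to set enable.auto.commit=false on the consumer when using manual ack modes.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaManualAckListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // MANUAL_IMMEDIATE commits as soon as ack.acknowledge() is called;
    // AckMode.RECORD instead lets the container commit after every record without an Acknowledgment argument.
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
    return factory;
}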
We are trying to implement Kafka as our message broker solution. We are deploying our Spring Boot microservices in IBM Bluemix, whose internal message broker implementation is Kafka version 0.10. Since my experience is more on the JMS/ActiveMQ end, I was wondering what the ideal way to handle system-level errors in the Java consumers would be.
Here is how we have implemented it currently
Consumer properties
enable.auto.commit=false
auto.offset.reset=latest
We are using the default properties for
max.partition.fetch.bytes
session.timeout.ms
Kafka Consumer
We are spinning up 3 threads per topic, all having the same groupId, i.e. one KafkaConsumer instance per thread. We have only one partition as of now. The consumer code looks like this in the constructor of the thread class:
kafkaConsumer = new KafkaConsumer<String, String>(properties);
final List<String> topicList = new ArrayList<String>();
topicList.add(properties.getTopic());
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {
@Override
public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
}
@Override
public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
try {
logger.info("Partitions assigned, consumer seeking to end.");
for (final TopicPartition partition : partitions) {
final long position = kafkaConsumer.position(partition);
logger.info("current Position: " + position);
logger.info("Seeking to end...");
kafkaConsumer.seekToEnd(Arrays.asList(partition));
logger.info("Seek from the current position: " + kafkaConsumer.position(partition));
kafkaConsumer.seek(partition, position);
}
logger.info("Consumer can now begin consuming messages.");
} catch (final Exception e) {
logger.error("Consumer can now begin consuming messages.");
}
}
});
The actual reading happens in the run method of the thread
try {
// Poll on the Kafka consumer every second.
final ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
// Iterate through all the messages received and print their
// content.
for (final TopicPartition partition : records.partitions()) {
final List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
logger.info("consumer is alive and is processing "+ partitionRecords.size() +" records");
for (final ConsumerRecord<String, String> record : partitionRecords) {
logger.info("processing topic "+ record.topic()+" for key "+record.key()+" on offset "+ record.offset());
final Class<? extends Event> resourceClass = eventProcessors.getResourceClass();
final Object obj = converter.convertToObject(record.value(), resourceClass);
if (obj != null) {
logger.info("Event: " + obj + " acquired by " + Thread.currentThread().getName());
final CommsEvent event = resourceClass.cast(converter.convertToObject(record.value(), resourceClass));
final MessageResults results = eventProcessors.processEvent(event);
if ("Success".equals(results.getStatus())) {
// commit the processed message which changes
// the offset
kafkaConsumer.commitSync();
logger.info("Message processed sucessfully");
} else {
kafkaConsumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
logger.error("Error processing message : {} with error : {},resetting offset to {} ", obj,results.getError().getMessage(),record.offset());
break;
}
}
}
}
// TODO add return
} catch (final Exception e) {
logger.error("Consumer has failed with exception: " + e, e);
shutdown();
}
You will notice the EventProcessor, which is a service class that processes each record and in most cases commits the record to the database. If the processor throws an error (a system exception or a ValidationException), we do not commit but programmatically seek back to that offset, so that a subsequent poll will return from that offset for that group id.
The doubt now is: is this the right approach? If we get an error and we reset the offset, then until the error is fixed no other message is processed. This might work for system errors like not being able to connect to the DB, but if the problem is only with that one event, we won't be able to process any other record. We thought of the concept of an error topic: when we get an error, the consumer publishes that event to the error topic and in the meantime keeps processing subsequent events. But it looks like we are trying to bring the design concepts of JMS (due to my previous experience) into Kafka, and there may be a better way to solve error handling in Kafka. Also, reprocessing from the error topic may change the sequence of messages, which we don't want in some scenarios.
Please let me know how anyone has handled this scenario in their projects following the Kafka standards.
-Tatha
if the problem is only with that event and not others to process this one record we wont be able to process any other record
That's correct, and your suggestion to use an error topic seems a reasonable one.
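A minimal sketch of that pattern, reusing the variables from the question's loop; errorProducer (a pre-configured KafkaProducer&lt;String, String&gt;) and the "MY-TOPIC.errors" topic name are assumptions, not part of the original code:
// On failure, forward the raw record to an error topic and commit,
// so one bad event does not block the rest of the partition.
// errorProducer and the "MY-TOPIC.errors" topic name are hypothetical.
final MessageResults results = eventProcessors.processEvent(event);
if ("Success".equals(results.getStatus())) {
    kafkaConsumer.commitSync();
} else {
    errorProducer.send(new ProducerRecord<>("MY-TOPIC.errors", record.key(), record.value()));
    kafkaConsumer.commitSync(); // commit anyway; the failed event now lives on the error topic
}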
I also noticed that, with your handling of onPartitionsAssigned, you essentially do not use the consumer's committed offset, since you always seek to the end.
If you want to restart from the last successfully committed offset, you should not perform a seek.
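In code terms, the rebalance listener from the question can simply leave onPartitionsAssigned empty if the goal is to resume from committed offsets (sketch):
// Resuming from committed offsets: no seek needed on assignment.
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(final Collection<TopicPartition> partitions) { }

    @Override
    public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
        // empty: the consumer resumes from the last committed offset,
        // or falls back to auto.offset.reset if nothing was ever committed
    }
});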
Finally, I'd like to point out (though it seems you already know this) that having 3 consumers in the same group subscribed to a single partition means that 2 out of 3 will be idle.
HTH
Edo
I am creating consumers (a consumer group with a single consumer in it):
Properties properties = new Properties();
properties.put("zookeeper.connect","localhost:2181");
properties.put("auto.offset.reset", "largest");
properties.put("group.id", groupId);
properties.put("auto.commit.enable", "true");
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
ConsumerConnector consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector.createMessageStreams(topicCountMap);
consumerMap.entrySet().stream().forEach(
streams -> {
streams.getValue().stream().forEach(
stream -> {
KafkaBasicConsumer customConsumer = new KafkaBasicConsumer();
try {
Future<?> consumerFuture = kafkaConsumerExecutor.submit(customConsumer);
kafkaConsumersFuture.put(groupId, consumerFuture);
} catch (Exception e) {
logger.error("---- Got error : "+ e.getMessage());
logger.error("Exception : ", e);
}
}
);
}
);
I have subscribed 2 consumers to the same topic.
I am unsubscribing the consumer by storing its future object and then invoking
consumerFuture.cancel(Boolean.TRUE);
Now I subscribe the same consumer again with the above code, and it gets successfully registered.
However, when the publisher now publishes, the newly subscribed consumer does not get messages, whereas the other consumer that was already registered does.
I am also checking the consumers' offsets; they are updated when the producer publishes, but the consumers are not getting messages.
Before producing :
Group Topic Pid Offset logSize Lag
A T1 0 94 94 1
Group Topic Pid Offset logSize Lag
B T1 0 94 94 1
After producing :
Group Topic Pid Offset logSize Lag
A T1 0 95 97 2
Group Topic Pid Offset logSize Lag
B T1 0 94 97 2
I am not able to figure out whether this is an issue on the producer side (not enough partitions) or whether I have created the consumer in an incorrect way.
Also, I am not able to figure out what the logSize and Lag columns mean here.
Let me know if anyone can help or need more details.
I found the solution to my problem; thanks @nautilus for the reminder to update.
My main intent was to provide endpoints to subscribe and unsubscribe a consumer in Kafka.
Since Kafka provides only subscribing and not unsubscribing (it is only possible manually), I had to write a layer over the Kafka implementation.
I stored the consumer object in a static map keyed by group id (since my consumer group can have only one consumer).
The problem was that I was not closing the consumer when unsubscribing, and the old consumer with the same group id was preventing the new one from getting messages.
private static Map<String, ConsumerConnector> kafkaConsumersFuture = new HashMap<>();
Based on some parameter, find the group id, then:
kafkaConsumersFuture.put(groupId, consumerConnector);
And while unsubscribing I did:
ConsumerConnector consumerConnector = kafkaConsumersFuture.get(groupId);
if(consumerConnector!=null) {
consumerConnector.shutdown();
kafkaConsumersFuture.remove(groupId);
}
EDIT: I am seeing the same exact behavior with the Kafka 0.9 consumer API as well.
I have a simple Kafka 8.2.2 producer with the automatic topic creation property set to true. It will create a new topic when an event with a non-existent topic is sent, but the event that creates that topic never ends up in Kafka, and the RecordMetadata returned has no errors.
public void receiveEvent(@RequestBody EventWrapper events) throws InterruptedException, ExecutionException, TimeoutException {
log.info("Sending " + events.getEvents().size() + " Events ");
for (Event event : events.getEvents()) {
log.info("Sending Event - " + event);
ProducerRecord<String, String> record = new ProducerRecord<>(event.getTopic(), event.getData());
Future<RecordMetadata> ack = eventProducer.send(record);
log.info("ACK - " + ack.get());
}
log.info("SENT!");
}
I have a program that polls for new topics (I wasn't happy with the dynamic/regex topic code in Kafka 8); it finds the new topic and subscribes, and it does see subsequent events, but never that first event.
I also tried the kafka-console-consumer script and it sees the exact same thing: the first event is never seen, then after that events start flowing.
Ideas?
It turns out there is a property you can set: props.put("auto.offset.reset", "earliest");
After setting this, the consumer does receive the first event put on the topic.
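For reference, a minimal consumer setup (using the 0.9+ consumer API) showing where that property goes; the bootstrap address and group id below are assumptions, not values from the question:
// Hedged sketch -- "localhost:9092" and "new-topic-watcher" are made-up values.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "new-topic-watcher");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest"); // without this, a group with no committed offsets starts at "latest" and misses the first event
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);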