How not to lose messages from Kafka when database is offline - java

I am developing a microservice which consumes messages from Kafka, processes them, and stores the output in MongoDB.
I am new to Kafka and I have run into a problem with losing messages.
The scenario is pretty simple:
MongoDB is offline; the microservice receives a message, tries to save the output to Mongo, gets an error saying Mongo is offline, and the message is lost.
My question is: is there any mechanism in Kafka that stops sending messages in that case? Should I manually commit the offset in Kafka? What are the best practices for handling errors in Kafka consumers?

For this kind of scenario you should commit the offset manually, and only if your message processing succeeds. You commit it as below. Note, however, that messages are subject to the topic's retention period, so they are automatically deleted from the Kafka broker once that period elapses.
consumer.commitSync();
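A minimal sketch of that pattern with a plain KafkaConsumer, assuming auto-commit is disabled; the topic name, group id, and saveToMongo helper are placeholders for your own setup, not part of any API:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "mongo-writer");          // hypothetical group id
props.put("enable.auto.commit", "false");       // commit only after a successful write
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("my-topic"));   // placeholder topic
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        try {
            for (ConsumerRecord<String, String> record : records) {
                saveToMongo(record.value());    // hypothetical MongoDB write
            }
            consumer.commitSync();              // offsets advance only if every write succeeded
        } catch (RuntimeException e) {
            // Mongo is down: do not commit. The in-memory position has still
            // advanced past the polled records, so seek back to the last
            // committed offsets (or restart the consumer) to get them redelivered.
        }
    }
}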

Rather than committing manually, I think you should use Kafka Streams and Kafka Connect. Managing a transaction across two systems (Apache Kafka and MongoDB) is not easy, so better to use already developed and tested tools (you can read more about Kafka Connect here: https://kafka.apache.org/documentation/#connect, https://docs.confluent.io/current/connect/index.html)
Your scenario might be something like this (see the sketch after this list):
Process your message using Kafka Streams and send the result to a new topic (Kafka Streams supports exactly-once semantics)
Use Kafka Connect (a sink connector) to save the data in MongoDB: https://www.confluent.io/connector/kafka-connect-mongodb-sink/
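As a rough sketch of the first step (the application id and topic names are placeholders), a Kafka Streams topology with exactly-once processing might look like this:
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-processor");  // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
       .mapValues(value -> value.toUpperCase())   // placeholder for your processing logic
       .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

new KafkaStreams(builder.build(), props).start();
The MongoDB sink connector then consumes output-topic and writes to Mongo, so your microservice never has to coordinate the two systems itself.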

One way you can do this is by using the pause and resume methods on MessageListenerContainer (but you have to use Spring Kafka 2.1.x or later); see the spring-kafka docs:
@KafkaListener Lifecycle Management
The listener containers created for @KafkaListener annotations are not beans in the application context. Instead, they are registered with an infrastructure bean of type KafkaListenerEndpointRegistry. This bean is automatically declared by the framework and manages the containers' lifecycles; it will auto-start any containers that have autoStartup set to true.
So autowire the KafkaListenerEndpointRegistry in your application:
@Autowired
private KafkaListenerEndpointRegistry registry;
Get the MessageListenerContainer from the registry (spring-kafka docs):
public MessageListenerContainer getListenerContainer(java.lang.String id)
Return the MessageListenerContainer with the specified id or null if no such container exists.
Parameters:
id - the id of the container
On the MessageListenerContainer you can use the pause or resume methods (spring-kafka docs); a combined sketch follows the method docs below.
default void pause()
Pause this container before the next poll().
default void resume()
Resume this container, if paused, after the next poll().
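Putting it together, a hedged sketch; the listener id, topic, and the mongoService helper (including its isAvailable() health check) are assumptions for illustration, not part of the spring-kafka API:
@Autowired
private KafkaListenerEndpointRegistry registry;

// "mongoListener" is an arbitrary id chosen for this sketch
@KafkaListener(id = "mongoListener", topics = "my-topic")
public void listen(String message) {
    try {
        mongoService.save(message);            // hypothetical MongoDB write
    } catch (RuntimeException e) {             // e.g. a Mongo connectivity error
        // stop fetching more records before the next poll()
        registry.getListenerContainer("mongoListener").pause();
        throw e;                               // let the container's error handler see the failure
    }
}

// Periodically check Mongo and resume consumption once it is back
@Scheduled(fixedDelay = 10_000)
public void resumeIfMongoIsBack() {
    MessageListenerContainer container = registry.getListenerContainer("mongoListener");
    if (container != null && mongoService.isAvailable()) {  // hypothetical health check
        container.resume();                    // no-op unless the container is paused
    }
}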

Related

Performance settings for ActiveMQ producer using Apache Camel in Spring boot framework

We have a Spring Boot application and we are using Apache Camel as a framework for message processing. We are trying to optimize our application settings so that enqueueing messages on the ActiveMQ queue is fast; the messages are received by Logstash, the consumer on the other end of the queue.
The documentation is scattered across many places and there are too many configuration options available.
For example, the Camel link for Spring Boot specifies 102 options, and the ActiveMQ Apache Camel link details even more.
This is what we have currently configured:
Application.properties:
################################################
# Spring Active MQ
################################################
spring.activemq.broker-url=tcp://localhost:61616
spring.activemq.packages.trust-all=true
spring.activemq.user=admin
spring.activemq.password=admin
Apache Camel
.to("activemq:queue:"dataQueue"?messageConverter=#queueMessageConverter");
Problem:
1 - We suspect that we have to use a pooled connection factory rather than the default Spring JmsTemplate bean, which is somehow picked up automatically.
2 - We also want the process to be asynchronous. We just want to put the message on the queue and don't want to wait for any ACK from ActiveMQ, or do any retry or similar.
3 - We want to wait and retry only if the queue is full.
4 - Where should we set the size settings for ActiveMQ? Also, ActiveMQ puts messages in the dead letter queue when no consumer is available; we want to override that behaviour and keep the messages in the original queue. (Does this have to be configured in ActiveMQ rather than in our app/Apache Camel?)
Update
Here is how we have solved it for now, after some more investigation and based on the feedback. Note: this does not involve retrying; for that we will try the option suggested in the answer.
For Seda queues:
producer:
.to("seda:somequeue?waitForTaskToComplete=Never");
consumer:
.from("seda:somequeue?concurrentConsumers=20");
Active MQ:
.to("activemq:queue:dataQueue?disableReplyTo=true);
Application.Properties:
# Enable the pooled connection factory
spring.activemq.pool.enabled=true
spring.activemq.pool.blockIfFull=true
spring.activemq.pool.max-connections=50
Yes, you need to use a PooledConnectionFactory, especially with Camel + Spring Boot; alternatively, look at the camel-sjms component. The culprit is Spring's JmsTemplate, which has super high latency.
Send NON_PERSISTENT with AUTO_ACKNOWLEDGE, and also turn on useAsyncSend on the connection factory.
You need to catch javax.jms.ResourceAllocationException in your route to do retries when Producer Flow Control kicks in (i.e. the queue or broker is full).
ActiveMQ does sizing based on bytes, not message count. See the SystemUsage settings in the Producer Flow Control docs and the Per-Destination Policies for limiting queue size based on bytes.
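A hedged sketch of those points combined in one Camel route; the broker URL, pool size, and retry counts are assumptions, while the endpoint options are standard Camel JMS options:
import javax.jms.ResourceAllocationException;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.pool.PooledConnectionFactory;
import org.apache.camel.builder.RouteBuilder;

public class AsyncActiveMqRoute extends RouteBuilder {

    // Expose this as the connection factory of the activemq component
    // (e.g. as a Spring bean) instead of the default JmsTemplate setup.
    public static PooledConnectionFactory pooledConnectionFactory() {
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        cf.setUseAsyncSend(true);               // don't block waiting for broker ACKs
        PooledConnectionFactory pooled = new PooledConnectionFactory();
        pooled.setConnectionFactory(cf);
        pooled.setMaxConnections(50);
        return pooled;
    }

    @Override
    public void configure() {
        // Retry only when Producer Flow Control kicks in (queue/broker full)
        onException(ResourceAllocationException.class)
            .maximumRedeliveries(5)
            .redeliveryDelay(1000);

        from("direct:enqueue")
            .to("activemq:queue:dataQueue"
                + "?deliveryPersistent=false"             // NON_PERSISTENT
                + "&acknowledgementModeName=AUTO_ACKNOWLEDGE"
                + "&disableReplyTo=true");
    }
}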

Spring Integration Kafka MessageChannel thread?

I have a Spring Integration flow bean (configured via the Java DSL) which processes messages from a Kafka message channel bound with Spring Cloud Stream.
The source of the Kafka messages is an external application, so what I really want to understand is which thread or threads will actually process those messages.
Is it a single dedicated thread created with the application, or is there a thread pool created and configured automatically by Cloud Stream, or something else?
And can I manage it somehow?
The Kafka message channel binder uses one thread by default; if you increase the binding's consumer.concurrency property (example below), you will get that number of threads. You need at least as many partitions as the concurrency setting, because a partition can only be consumed by one consumer.
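For example, with a binding named input (the binding name is an assumption), this would give four consumer threads, so the topic needs at least four partitions:
spring.cloud.stream.bindings.input.consumer.concurrency=4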

How to clear topics in tests with Spring Kafka

I'm writing a unit test with Spring Kafka 2.4 to prove that my Spring Boot setup is correct. I'm validating that SeekToCurrentBatchErrorHandler works as expected, which requires sending an incorrect message that should be retried. Unfortunately this incorrect message breaks other tests, because it will be retried forever.
Because of the above I'd like to ensure that each test is correctly isolated. I either need to:
Delete and recreate the Kafka topic with AdminClient
Seek to the end of the existing Kafka topic and commit new offsets
I was trying option 2 with the Consumer.seekToEnd() method, however Spring Kafka hides the created consumers behind a few layers of internal framework classes. I'm also not 100% sure if this method can be called from the test thread, which is different from the listener thread.
What is the recommended way to clear topics in tests with Spring Kafka?
Best practice is to use unique topic names in each test to provide complete isolation; you could also stop the container(s), create a new Consumer with the same group.id, and perform the seeks there (sketched below).
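A minimal sketch of that second approach; the topic, group id, and broker address are placeholders, and registry is the autowired KafkaListenerEndpointRegistry:
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.listener.MessageListenerContainer;

void seekToEndOfTopic() {
    // Stop the listener containers so the test consumer can take over the group
    registry.getListenerContainers().forEach(MessageListenerContainer::stop);

    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-test-group");   // same group.id as the listener
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singletonList("my-topic"));  // placeholder topic
        consumer.poll(Duration.ofSeconds(1));                       // join the group, get assignments
        consumer.seekToEnd(consumer.assignment());
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            // position() resolves the lazy seekToEnd() to a concrete offset
            offsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));
        }
        consumer.commitSync(offsets);                               // persist end offsets for the group
    }

    registry.getListenerContainers().forEach(MessageListenerContainer::start);
}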

How to make persistent JMS messages with a Java Spring Boot application?

I am trying to make a queue with ActiveMQ and Spring Boot using this link, and it looks fine. What I am unable to do is make this queue persistent after the application goes down. I think that SimpleJmsListenerContainerFactory should be durable to achieve that, but when I set factory.setSubscriptionDurable(true) and factory.setClientId("someid") I am unable to receive messages any more. I would be grateful for any suggestions.
I guess you are embedding the broker in your application. While this is ok for integration tests and proof of concepts, you should consider having a broker somewhere in your infrastructure and connect to it. If you choose that, refer to the ActiveMQ documentation and you should be fine.
If you insist on embedding it, you need to provide a brokerUrl that enables message persistence.
Having said that, it looks like you have confused durable subscriptions with message persistence. The latter is achieved by having a broker that actually stores the contents of the queue somewhere, so that if the broker is stopped and restarted it can restore them. The former lets a subscriber receive messages that were published while it was not active.
You can enable persistence of messages using ActiveMQConnectionFactory.
As mentioned in the Spring Boot link you provided, this ActiveMQConnectionFactory is created automatically by Spring Boot, so you can instead declare the bean manually in your application configuration and set various properties on it:
ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=true");
Here is the link http://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection.html
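A sketch of declaring that bean manually in a Spring Boot configuration class; broker.persistent=true is what turns persistence on for the embedded broker, and the data directory name is an assumption:
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JmsConfig {

    @Bean
    public ActiveMQConnectionFactory connectionFactory() {
        // The embedded broker stores messages on disk, so queue contents
        // survive an application restart.
        return new ActiveMQConnectionFactory(
                "vm://localhost?broker.persistent=true&broker.dataDirectory=activemq-data");
    }
}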

Is Apache Kafka able to handle transactions?

We plan to use Kafka as a central component in our data warehouse, given that the producer is able to handle transactions (in short: rollbacks and commits).
When googling Kafka + transactions I find a lot of theoretical thoughts about how Kafka could possibly handle transactions, but at the moment I do not see any function in the Java API that supports commits and rollbacks for the producer.
Has anybody had some experience with transactions and Kafka and can give me a hint?
I think what you are looking for is basically called transactional messaging in Kafka, where producers are capable of creating a session (aka a transactional session) and sending messages within it. Hence a producer can choose to either commit or abort the transaction.
Please read the Transactional Messaging wiki for details.
Actually, as of version 0.11.0.0 transactions are supported. See Guarantee unique global transaction for Kafka Producers.
No; Kafka does not support transactions.
You can get certainty that a message has been produced to a partition, but once produced you are not able to roll back that message.
Since version 0.11.0 Apache Kafka supports transactions: https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
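Since the question asks specifically about commits and rollbacks in the Java API, here is roughly what the transactional producer added in 0.11 looks like; the broker address, topic names, and transactional id are placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "my-transactional-id");  // required to enable transactions
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("topic-a", "key", "value"));
    producer.send(new ProducerRecord<>("topic-b", "key", "value"));
    producer.commitTransaction();        // both records become visible atomically
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    producer.close();                    // fatal errors: the producer must be closed
} catch (KafkaException e) {
    producer.abortTransaction();         // the "rollback" the question asks about
}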
