We plan to use Kafka as a central component in our data warehouse, provided that the producer is able to handle transactions (in short: commits and rollbacks).
When googling Kafka + transactions I find a lot of theoretical discussion about how Kafka could handle transactions, but at the moment I do not see any function in the Java API that supports commits and rollbacks for the producer.
Has anybody had experience with transactions and Kafka and can give me a hint?
I think what you are looking for is called transactional messaging in Kafka, where producers are capable of creating a session (aka a transactional session) and sending messages within it. The producer can then choose to either commit or abort the transaction.
[Source]: Please read the wiki for details
Actually, as of version 0.11.0.0 transactions are supported. See Guarantee unique global transaction for Kafka Producers.
No; Kafka does not support transactions.
You can get certainty that a message has been produced to a partition, but once produced you are not able to roll back that message.
Since version 0.11.0 Apache Kafka supports transactions: https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
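To make this concrete, here is a minimal sketch of the producer-side transactional API introduced in 0.11; the broker address, topic name and transactional.id are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "my-tx-producer-1"); // must be unique per producer instance

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions(); // registers the transactional.id with the broker

try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
    producer.commitTransaction(); // the "commit" the question asks about
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    producer.close(); // fatal: this producer instance cannot recover
} catch (KafkaException e) {
    producer.abortTransaction(); // the "rollback" the question asks about
}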
I'm writing a unit test with Spring Kafka 2.4 to prove that my Spring Boot setup is correct. I'm validating that SeekToCurrentBatchErrorHandler works as expected, which requires sending an incorrect message that should be retried. Unfortunately, this incorrect message breaks other tests because it will be retried forever.
Because of the above I'd like to ensure that each test is correctly isolated. I need to either:
Delete and recreate the Kafka topic with AdminClient
Seek to the end of the existing Kafka topic and commit new offsets
I was trying option 2 with the Consumer.seekToEnd() method, however Spring Kafka hides the created consumers behind a few layers of internal framework classes. I'm also not 100% sure if this method can be called from the test thread, which is different from the listener thread.
What is the recommended way to clear topics in tests with Spring Kafka?
Best practice is to use unique topic names in each test to provide complete isolation; alternatively, you could stop the container(s), create a new Consumer with the same group.id, and perform the seeks there.
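For the second approach, a rough sketch (assuming the containers are already stopped, and with placeholder broker, topic and group names) could look like this:

import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-test-group"); // same group.id as the listener
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    List<TopicPartition> partitions = consumer.partitionsFor("my-topic").stream()
            .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
            .collect(Collectors.toList());
    consumer.assign(partitions); // manual assignment avoids a rebalance
    consumer.seekToEnd(partitions);
    for (TopicPartition tp : partitions) {
        consumer.position(tp); // seekToEnd is lazy; position() forces it to resolve
    }
    consumer.commitSync(); // persist the end offsets for the group
}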
I am developing a microservice which consumes messages from Kafka, then processes these messages and stores the output in MongoDB.
I am new to Kafka and I have encountered a problem with losing messages.
The scenario is pretty simple:
If MongoDB is offline, the microservice receives a message, tries to save the output to Mongo, gets an error saying Mongo is offline, and the message is lost.
My question: is there any mechanism in Kafka that stops sending messages in that case? Should I manually commit offsets in Kafka? What are best practices for handling errors in Kafka consumers?
For this kind of scenario you should manually commit the offset. Commit the offset only if your message processing is successful, as below. Note, however, that messages have a TTL (broker retention), so they are automatically deleted from the Kafka broker after that period elapses.
consumer.commitSync();
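A minimal sketch of that pattern, with auto-commit disabled and a hypothetical saveToMongo helper standing in for the MongoDB write:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder
props.put("group.id", "mongo-writer"); // placeholder
props.put("enable.auto.commit", "false"); // we take over commit responsibility
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("events")); // placeholder topic

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    try {
        for (ConsumerRecord<String, String> record : records) {
            saveToMongo(record.value()); // hypothetical helper; throws if Mongo is down
        }
        consumer.commitSync(); // commit only after the whole batch is persisted
    } catch (Exception e) {
        // do not commit; rewind to the last committed offset so the batch is re-read
        for (TopicPartition tp : consumer.assignment()) {
            OffsetAndMetadata committed = consumer.committed(tp);
            consumer.seek(tp, committed != null ? committed.offset() : 0);
        }
    }
}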
I think rather than committing manually, you should use Kafka Streams and Kafka Connect. Managing a transaction between two systems (Apache Kafka and MongoDB) might not be so easy, so better to use already developed and tested tools. (You can read more about Kafka Connect here: https://kafka.apache.org/documentation/#connect, https://docs.confluent.io/current/connect/index.html)
Your scenario might be something like this:
Process your message using Kafka Streams and send the result to a new topic (Kafka Streams supports exactly-once semantics); a sketch follows after this list
Use Kafka Connect (Sink connector) to save data in MongoDB https://www.confluent.io/connector/kafka-connect-mongodb-sink/
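A sketch of the first step, assuming a hypothetical process function and placeholder topic names (the Kafka Connect MongoDB sink would then consume output-topic):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-processor"); // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE); // exactly-once

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("input-topic")
       .mapValues(value -> process(value)) // hypothetical processing step
       .to("output-topic"); // the Connect sink reads from here and writes to MongoDB

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();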
One way you can do this is by using the pause and resume methods on MessageListenerContainer (but you have to use Spring Kafka >= 2.1.x). spring-kafka-docs
@KafkaListener Lifecycle Management
The listener containers created for #KafkaListener annotations are not beans in the application context. Instead, they are registered with an infrastructure bean of type KafkaListenerEndpointRegistry. This bean is automatically declared by the framework and manages the containers' lifecycles; it will auto-start any containers that have autoStartup set to true.
So autowire the KafkaListenerEndpointRegistry bean in your application:
@Autowired
private KafkaListenerEndpointRegistry registry;
Get the MessageListenerContainer from the registry. spring-kafka-docs
public MessageListenerContainer getListenerContainer(java.lang.String id)
Return the MessageListenerContainer with the specified id or null if no such container exists.
Parameters:
id - the id of the container
On the MessageListenerContainer you can use the pause and resume methods. spring-kafka-docs
default void pause()
Pause this container before the next poll().
default void resume()
Resume this container, if paused, after the next poll().
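Putting the pieces together, a small sketch (the listener id here is whatever you set in the id attribute of your @KafkaListener annotation):

@Autowired
private KafkaListenerEndpointRegistry registry;

public void pauseListener(String listenerId) {
    MessageListenerContainer container = registry.getListenerContainer(listenerId);
    if (container != null) {
        container.pause(); // takes effect before the next poll()
    }
}

public void resumeListener(String listenerId) {
    MessageListenerContainer container = registry.getListenerContainer(listenerId);
    if (container != null) {
        container.resume(); // takes effect after the next poll()
    }
}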
My project connects to a database using Hibernate, getting its connections from a connection pool on JBoss. I want to replace some of the reads/writes to tables with publishes/consumes from queues. I built a working example that uses OracleAQ; however, I am connecting to the DB using:
AQjmsFactory.getQueueConnectionFactory followed by createQueueConnection,
then using createQueueSession to get a (JMS) QueueSession on which I can call createProducer and createConsumer.
So I know how to do what I want using a jms.QueueSession. But using Hibernate, I get a hibernate.Session, which doesn't have those methods.
I don't want to open a new connection every time I perform an action on a queue, which is what I am doing now in my working example. Is there a way to perform queue operations from a hibernate.Session? Or only with SQL queries?
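For reference, the connect-per-operation pattern the question describes looks roughly like this (the JDBC URL and credentials are placeholders):

QueueConnectionFactory factory =
        AQjmsFactory.getQueueConnectionFactory("jdbc:oracle:thin:@host:1521:SID", new Properties());
QueueConnection connection = factory.createQueueConnection("user", "password");
QueueSession session = connection.createQueueSession(true, Session.SESSION_TRANSACTED);
// session.createProducer(...) / session.createConsumer(...) as in the working example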
I think you're confusing a JMS (message queue) session with a Hibernate (database) session. The Hibernate framework doesn't have any overlap with JMS, so it can't be used to do both things.
You'll need 2 different sessions for this to work:
A Hibernate Session (org.hibernate.Session) for DB work
A JMS Session (javax.jms.Session) to do JMS/queue work
Depending on your use case, you may also want an XA transaction manager to do a proper two-phase commit across both sessions and maintain transactional integrity.
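In code, the two sessions are obtained completely independently; without an XA transaction manager each one commits on its own (sessionFactory and queueConnection here are assumed to be set up elsewhere):

org.hibernate.Session dbSession = sessionFactory.openSession(); // Hibernate, for table access
javax.jms.QueueSession jmsSession =
        queueConnection.createQueueSession(true, Session.SESSION_TRANSACTED); // JMS, for queue access

dbSession.beginTransaction();
// ... entity reads/writes ...
dbSession.getTransaction().commit(); // commits the DB work only

// ... sends/receives on jmsSession ...
jmsSession.commit(); // commits the queue work only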
I was also looking for some "sane" way to use a JMS connection to manipulate database data. There isn't one. Dean is right: you have to use two different connections to the same data and have a distributed XA transaction between them.
This solution opens up a world of problems never seen before; in real life, distributed transactions can be genuinely non-trivial. Surprisingly, in some situations Oracle can detect that two connections point into the same database, and the two-phase commit can then be bypassed, even when using XA.
I am working on a stand-alone application that uses both JMS and Hibernate.
The documentation suggests JTA has to be used if I want to have transactions across both resources.
However, right now with a @Transactional-annotated DAO method (and HibernateTransactionManager), this already seems to work. When I call send() on the JmsTemplate, the message is not immediately sent; rather, the JMS session is committed along with the Hibernate session as the method returns.
I didn't know how this was possible without the JtaTransactionManager, so I checked the source code. It turns out both the Hibernate wrapper and JmsTemplate register their sessions with TransactionSynchronizationManager, and the JMS session is committed when the Hibernate session commits.
What's the difference between this and a JTA transaction? Can I use this to replace the latter?
In short, no: you can't get support for two-phase commit without a JtaTransactionManager and XA-aware datasources.
What you are witnessing is the coordination of two local transactions, supporting one-phase commit only, roughly performing this sequence of events:
Start JMS Transaction
Read JMS message
Start JDBC Transaction
Write to database
Commit JDBC Transaction
Commit/Acknowledge JMS
The JMS transaction is started first, wrapping the nested JDBC transaction, so that the JMS queue will roll back if the Hibernate/JDBC commit fails. Your JMS listener container should be set up not to use acknowledge="auto" and instead wait for the Hibernate transaction to complete before sending the acknowledgement.
If you only have these two resources, the issue you have to consider is: what if Hibernate succeeds in persisting, but you then get an exception before you can acknowledge the JMS server? This is not a big issue, as the JMS message is not lost and you will read it again.
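A possible Spring setup for that arrangement (the connection factory, listener and transaction manager are assumed to be defined elsewhere):

DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
container.setConnectionFactory(connectionFactory); // your JMS ConnectionFactory
container.setDestinationName("inbound.queue"); // placeholder queue name
container.setMessageListener(messageListener);
container.setSessionTransacted(true); // local JMS transaction instead of acknowledge="auto"
container.setTransactionManager(hibernateTransactionManager); // synchronizes JMS commit with the DB TX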
However
You must write your MessageListener to handle duplicate messages from the server
You must also handle a message that cannot be processed due to bad data, which would otherwise end up in an infinite loop of attempts to consume it. In this case the server may be configured to move the message to a "dead message queue", or you deal with this yourself in the MessageListener (see the sketch below)
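A sketch of a listener that covers both points; the duplicate check uses an in-memory set for brevity (a database table is more realistic), and process and sendToDeadMessageQueue are hypothetical helpers:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

public class IdempotentListener implements MessageListener {
    // in-memory for brevity; a database table survives restarts and scales out
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    @Override
    public void onMessage(Message message) {
        try {
            if (!processedIds.add(message.getJMSMessageID())) {
                return; // duplicate redelivery after a lost acknowledgement: skip it
            }
            process(message); // hypothetical business logic
        } catch (JMSException e) {
            throw new RuntimeException(e); // rollback, so the message is redelivered
        } catch (RuntimeException badData) {
            // a message that can never be processed: divert it instead of looping forever
            sendToDeadMessageQueue(message); // hypothetical helper
        }
    }
}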
Other options and further reading
If your JMS server does not support XA (global) transactions this is pretty much your only solution.
If the JMS server does support XA transactions but JDBC doesn't, then you can use a JtaTransactionManager with the LastResourceCommitOptimisation. There are open-source JTA transaction managers you can use, such as JOTM.
This JavaWorld article goes into more detail on your problem space.
Although this has been answered in detail by Brad, I would like to address a very specific part of your query:
I didn't know how this is possible without the JtaTransactionManager
From the Spring documentation:
When a JTA environment is detected, Spring’s JtaTransactionManager will be used to manage transactions
https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-jta.html
I have a general question about Apache Camel: I wasn't able to find out whether the aggregator is transacted. If it is, how are the transactions implemented, and how fast is the aggregation?
Sending the messages into the aggregator can run in a transaction.
You would need a persistent store with the aggregator to make the outgoing messages act as a transaction. See the documentation about persistence:
http://camel.apache.org/aggregator2
For example, there is JDBC-based and HawtDB (file-based) persistence support out of the box. It's pluggable, so you can also build your own.
Camel in Action, chapters 8 and 9, covers this in much more detail.
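As an illustration, a transacted route feeding a JDBC-backed aggregator might look like this inside a RouteBuilder.configure() (repository name, datasource, endpoints and completion size are placeholders):

JdbcAggregationRepository repo =
        new JdbcAggregationRepository(txManager, "my_aggregation", dataSource);

from("jms:queue:orders")
    .transacted() // sending into the aggregator runs in a transaction
    .aggregate(header("orderId"), new UseLatestAggregationStrategy())
        .completionSize(10)
        .aggregationRepository(repo) // the persistent store makes the output transactional
    .to("jms:queue:orders.aggregated");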