How to rollback all steps in Spring Batch - java

I created a job which splits my file into small chunks, and these chunks are read in separate steps. For example, steps 1-3 finish without any errors and their records are committed to the database, but if step 4 fails I need to roll back all records from the previous steps. Is it possible to roll them back somehow?
Or is there perhaps a way to commit all records only when the last step has finished correctly? (But that is a problem with large files.)

Don't play with transactions while using Spring Batch; because of its transactional nature, manually managing transactions is a really bad idea.
See Transaction Management in Spring Batch or Spring Batch - One transaction over whole Job for further explanation.

Not just Spring: in any framework, if you need to perform an atomic operation across multiple reads/writes to a data source, all those calls generally need to be wrapped in a transaction and then either committed or rolled back at the end. Understanding how JTA works goes a long way toward understanding how to use frameworks that handle transactions; more information on JTA can be found here.
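To illustrate the general pattern, here is a minimal sketch of several writes to one data source grouped into a single atomic unit with Spring's declarative transaction support; the service, repository, and method names are hypothetical:

import java.math.BigDecimal;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransferService {

    private final AccountRepository accounts; // hypothetical repository

    public TransferService(AccountRepository accounts) {
        this.accounts = accounts;
    }

    // Both writes happen in one transaction: either both commit,
    // or a runtime exception rolls both back.
    @Transactional
    public void transfer(long fromId, long toId, BigDecimal amount) {
        accounts.debit(fromId, amount);
        accounts.credit(toId, amount);
    }
}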

Related

Spring Chained Transaction Manager versus Atomikos

Hi, I have distributed transactions and I have to manage them somehow.
In the Spring ecosystem, ChainedTransactionManager can do that; on the other hand, the Spring documentation says Atomikos can be used for distributed transactions:
https://docs.spring.io/spring-boot/docs/2.1.6.RELEASE/reference/html/boot-features-jta.html
Which one should I use? I would prefer to stay within the Spring libraries, but is Atomikos much more than the Spring transaction manager? If someone has used them both, can you compare the pros and cons?
Using Atomikos is the better overall solution. The ChainedTransactionManager is something you can use in some cases. The assumptions it makes are stated in the Javadoc:
PlatformTransactionManager implementation that orchestrates transaction creation, commits and rollbacks to a list of delegates. Using this implementation assumes that errors causing a transaction rollback will usually happen before the transaction completion or during the commit of the most inner PlatformTransactionManager.
The configured instances will start transactions in the order given and commit/rollback in reverse order, which means the PlatformTransactionManager most likely to break the transaction should be the last in the list configured. A PlatformTransactionManager throwing an exception during commit will automatically cause the remaining transaction managers to roll back instead of committing.
The chance of committing one transaction and the other one failing still remains with ChainedTransactionManager.
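For reference, a minimal sketch of how a ChainedTransactionManager might be wired over two delegate transaction managers (the bean names are assumptions); per the Javadoc above, the delegate most likely to fail belongs last:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ChainedTxConfig {

    // Transactions start in the order given and commit/roll back in reverse,
    // so the delegate most likely to fail should be listed last.
    @Bean
    public PlatformTransactionManager transactionManager(
            PlatformTransactionManager db1TxManager,   // hypothetical delegate
            PlatformTransactionManager db2TxManager) { // hypothetical delegate
        return new ChainedTransactionManager(db1TxManager, db2TxManager);
    }
}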
Using Atomikos gives you a real distributed transaction: all or nothing on both databases. But this also has consequences that can affect the support of the application, for example when the TX is fully committed on one DB and only prepared on the other, and at that point the application crashes. You'll need to ensure that your application can recover from this scenario. Usually the TX would be fully committed on the second DB when the app is restarted, but that might not happen.
So which one is the right one? It depends.

How to remove non-transactional database logging after running integration tests?

I'm working on a Java/Spring app that requires audit logs to be written to a database. All services are currently marked as @Transactional, so if there is a failure, the changes are rolled back.
But audit logging is the exception to this: it should always succeed. So I have been considering marking the AuditLogService as either Propagation.NOT_SUPPORTED or Propagation.REQUIRES_NEW.
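For illustration, a minimal sketch of the REQUIRES_NEW variant (the repository and entity names are hypothetical):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AuditLogService {

    private final AuditLogRepository repository; // hypothetical repository

    public AuditLogService(AuditLogRepository repository) {
        this.repository = repository;
    }

    // REQUIRES_NEW suspends the caller's transaction and commits the audit
    // entry on its own, so it survives even if the caller rolls back.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void log(String message) {
        repository.save(new AuditLogEntry(message)); // hypothetical entity
    }
}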
The question is how to craft the integration tests. Ideally these should not leave log entries in the database, and I would prefer not to have to delete them manually at the end of each test. Is there perhaps a way of marking a test as transactional that would include all transactions, even ones started via Propagation.REQUIRES_NEW?
I ended up doing exactly what I said I didn't want to do and deleting all the operational data at the end of each test. (This actually worked better as the tests were no longer run in an overarching transaction, which masked some bugs, e.g. relating to Hibernate lazy loading.)
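A minimal sketch of that cleanup, assuming JUnit 5 and a JdbcTemplate in the test (the table names are made up):

import org.junit.jupiter.api.AfterEach;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.jdbc.core.JdbcTemplate;

@SpringBootTest
class AuditLogIntegrationTest {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // Wipe operational data after each test instead of relying on an
    // enclosing test transaction being rolled back.
    @AfterEach
    void cleanDatabase() {
        jdbcTemplate.update("DELETE FROM audit_log"); // hypothetical tables
        jdbcTemplate.update("DELETE FROM orders");
    }
}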

Risk of data contamination due to in memory processing - JAVA

I am developing a Java application based on the Spring framework.
It:
Connects to a MySQL database
Gets data from MySQLTable1 into POJOs
Manipulates (updates, deletes) it in memory
Inserts it into a Netezza database table
The above four steps are performed for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before writing it to the Netezza table.
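Roughly like this, as a minimal sketch (the MyRecord POJO, the jdbcTemplate field, and the bind variables are assumptions):

// Read each row for one client into a POJO via JdbcTemplate and a RowMapper.
List<MyRecord> rows = jdbcTemplate.query(
        "SELECT COL1, COL2, COL3 FROM MySQLTable1 WHERE CLIENTID = ? AND COL4 = ?",
        new Object[] { clientId, condition },
        (rs, rowNum) -> new MyRecord(       // hypothetical POJO
                rs.getString("COL1"),
                rs.getString("COL2"),
                rs.getString("COL3")));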
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently, but the SELECT will be unique;
I mean data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transactions using the Spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transactions using the Spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to have a consistent state in the result table: if an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before it failed. Regarding the transaction manager, you don't especially need the Spring transaction manager, but since you are using Spring it might be a good option.
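For example, a minimal sketch of wrapping the Netezza inserts in one transaction with Spring's TransactionTemplate (the transactionManager, netezzaJdbcTemplate, records, and target table are all assumptions):

// All inserts commit together; any runtime exception rolls them all back.
TransactionTemplate txTemplate = new TransactionTemplate(transactionManager);
txTemplate.executeWithoutResult(status -> {
    for (MyRecord record : records) {
        netezzaJdbcTemplate.update(
                "INSERT INTO NetezzaTable1 (COL1, COL2, COL3) VALUES (?, ?, ?)",
                record.getCol1(), record.getCol2(), record.getCol3());
    }
});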
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure your application (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be done without the framework, and it might be overkill if you have a really small application. But in the end, if you need those features, you'll probably want to use it.
Spring Batch is an ETL framework, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would arise if two clients with the same ID were created, but that is not the case.
Is there a need to implement database transactions using the Spring data transaction manager?
There is no mandatory need to do that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need to do this either, although it would help a lot, especially with paging. How will you handle queries that return thousands of rows? Without a framework, this needs to be handled manually.
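As a rough illustration, here is a minimal sketch of a chunk-oriented step in the Spring Batch 4 style (the reader/writer beans, the MyRecord type, the step name, and the chunk size are all assumptions):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CopyStepConfig {

    // Reads, writes, and commits 100 items per transaction; paging and
    // restartability come from the framework rather than hand-rolled code.
    @Bean
    public Step copyStep(StepBuilderFactory steps,
                         ItemReader<MyRecord> mysqlReader,    // hypothetical beans
                         ItemWriter<MyRecord> netezzaWriter) {
        return steps.get("copyStep")
                .<MyRecord, MyRecord>chunk(100)
                .reader(mysqlReader)
                .writer(netezzaWriter)
                .build();
    }
}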

Debugging JUnit Tests in a Hibernate driven Spring Application

Maybe this has been answered already, but I did not find any suggestions out there...
My project is a Spring utility project, the database I use is MySQL, and for persistence I'm using Hibernate in combination with c3p0 connection pooling. I'm on Spring 3.2 and Hibernate 3.5.
So here is what I want to do:
I want to debug a JUnit test, step over some persistence functions (save, update, etc.) and then check the entries manually in the database via SQL. Because the JUnit tests always run in a transaction, I cannot check the entries in the database: a rollback happens every time a test finishes, so a commit never occurs.
Is there a way to fake transaction existence, or to bypass transactions during JUnit tests?
Perhaps you can flush the transaction in Hibernate during your debugging session and force Spring/Hibernate to write to the database.
Or you can turn off transactions for your debugging session.
Rather than fake transaction existence, the best approach to looking at the database while the transaction is taking place is to query with an isolation level that allows dirty reads. The mechanism for doing this varies from database to database, and in MySQL you can use
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
prior to querying.
Clearly you will also need to force Hibernate to flush writes to the database during your test, and set your breakpoint after the flush.
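A minimal sketch of the flush-then-inspect flow under these assumptions (a Hibernate 3.x SessionFactory injected into the test, and a hypothetical entity and table):

// In the test, write through Hibernate and flush so the SQL is actually sent.
Session session = sessionFactory.getCurrentSession();
session.save(new MyEntity("some value")); // hypothetical entity
session.flush(); // INSERT hits MySQL here; set the breakpoint after this line

// Meanwhile, in a separate MySQL client:
//   SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
//   SELECT * FROM my_entity; -- sees the uncommitted row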

How to group a database write and spreadsheet write in a single "transaction"

I have a Java program that writes results to both a DB (SQL Server) and a spreadsheet (POI), and it would be best if neither is written to if there's an error with either.
It would be a lot worse if the spreadsheet were produced and then an error happened while saving to the DB, so I'm doing the DB write first. Even so, I'm wondering if someone knows of a way to guarantee that they both succeed or fail as a unit.
Thanks!
Consider Apache Commons Transaction, which has a file transaction component.
If you could wrap both the database call and the file write in a larger encompassing transaction, you might have what you're looking for.
More at http://commons.apache.org/transaction/file/index.html
Use XA transactions to include file-system operations and database operations in the same transaction (hence making them atomic). SQL Server already supports XA; use XADisk for file-system transactions enabled with XA.
We currently use the following approach for sending emails in a transaction: the emails are written to a database table within the transaction, and a helper thread takes them out and sends them asynchronously. The sending can be retried a few times, which makes us quite sure that emails go out if they possibly can. The same goes for calling the report server, the fax server, etc.
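A minimal sketch of that outbox-style approach, assuming a JdbcTemplate and made-up table and column names:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class EmailOutbox {

    private final JdbcTemplate jdbcTemplate;

    public EmailOutbox(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Called from within the business transaction: the email row commits
    // or rolls back together with the business data.
    @Transactional
    public void enqueue(String recipient, String body) {
        jdbcTemplate.update(
                "INSERT INTO outgoing_email (recipient, body, sent) VALUES (?, ?, FALSE)",
                recipient, body);
    }

    // Helper thread outside the business transaction: send and mark as sent,
    // retrying on the next run if sending fails.
    @Scheduled(fixedDelay = 60_000)
    public void drainOutbox() {
        // SELECT unsent rows, hand them to the mail server, then
        // UPDATE outgoing_email SET sent = TRUE WHERE id = ?
    }
}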
