Given I have a simple task: process some piece of data and append it to a file. It's OK if I don't hit exceptions, but they may happen. If something goes wrong, I would like to remove all the changes from the file.
Also, maybe I have set some variables during the processing, and I would like to restore their previous state too.
Also, maybe I work with a database that doesn't support transactions (to the best of my knowledge, MongoDB does not), so I would like to roll back the changes in the DB somehow.
Yes, I can fix the issue with my file manually, just by backing up the file and then restoring it. But in general it looks like I need a transaction framework.
I don't want to pull in the Spring monster for this. It's too much. And I don't have an EJB container to manage EJBs. I have a simple stand-alone Java application, but it needs transaction support.
Do I have any options other than plugging in Spring or EJB?
If you don't want to use Spring, try implementing a simple two-phase commit mechanism: see the Two-Phase Commit Protocol.
I am no Java expert, but this sounds simple.
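A minimal sketch of the idea; the Participant interface and coordinator below are illustrative names, not an existing API. Each resource (the file, the in-memory variables, the database) stages its change in prepare() and only makes it permanent if everyone votes yes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative participant contract: each resource (file, in-memory state,
// database) stages its change in prepare() and finalizes it in commit().
interface Participant {
    boolean prepare();   // phase 1: stage the change, vote yes/no
    void commit();       // phase 2: make the staged change permanent
    void rollback();     // undo the staged change
}

class TwoPhaseCoordinator {
    static void run(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // one "no" vote: undo everyone who already prepared
                prepared.forEach(Participant::rollback);
                return;
            }
        }
        // all voted yes: make everything permanent
        prepared.forEach(Participant::commit);
    }
}
```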
In fact, I would not use transactions in an ACID-compliant database for this, since it doesn't sound like the right tool.
Instead, I would write to a temporary file and, once all your records have been written, merge it with the original file. That way, if some records cannot be written for whatever reason, you just drop the temporary file; the final swap of the merged file for the original can be done as a single rename, which the OS's file system performs atomically in most cases.
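A minimal sketch of that pattern using java.nio.file (the merge logic and names are illustrative); note that ATOMIC_MOVE works on most local filesystems but may throw AtomicMoveNotSupportedException on others:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class AtomicAppend {
    // Append records by writing a merged copy to a temp file first, then
    // swapping it in. If anything fails before the final move, the original
    // file is untouched. Assumes `target` has a parent directory.
    static void appendAtomically(Path target, List<String> newRecords) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), "merge-", ".tmp");
        try {
            List<String> merged = new ArrayList<>(
                    Files.exists(target) ? Files.readAllLines(target) : List.of());
            merged.addAll(newRecords);
            Files.write(tmp, merged);
            // The swap is a single rename, atomic on most local filesystems;
            // elsewhere it may throw AtomicMoveNotSupportedException.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                       StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);   // "rollback" is just dropping the temp file
            throw e;
        }
    }
}
```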
I'm currently working on an admin panel for some Java programs I've written. The Java programs are currently controlled by their own cfg files. My idea for updating the configs via the control panel is to change values in the panel, save those changes to a MySQL database, and then write a "config updater" which fetches the database entries every couple of seconds and writes the changes to the cfg files. I'm just wondering about efficiency: is this a good approach, or are there better and more efficient ways?
It really depends on what you actually want to achieve.
If your program reads the cfg files on demand, meaning that you can change them and the results take effect immediately, you would just fetch the values on demand from the DB as well.
If the cfg files are more or less static, you might consider using a .properties file instead (or even a Config class containing only static final fields), or, if you want to stick with the database, you could use the Singleton approach and just read the values into the Singleton after startup.
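A minimal sketch of the singleton approach, loading a .properties file once at startup (the file name and keys are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Loads config once at startup; callers read values through the single instance.
public final class AppConfig {
    private static final AppConfig INSTANCE = new AppConfig();
    private final Properties props = new Properties();

    private AppConfig() {
        // "app.properties" is an illustrative name on the classpath
        try (InputStream in = AppConfig.class.getResourceAsStream("/app.properties")) {
            if (in != null) props.load(in);
        } catch (IOException e) {
            throw new IllegalStateException("Could not load config", e);
        }
    }

    public static AppConfig get() { return INSTANCE; }

    public String value(String key, String fallback) {
        return props.getProperty(key, fallback);
    }
}
```

Callers would then read values with something like AppConfig.get().value("server.host", "localhost").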
In the end it comes down to opinion and use case. If the config should be configurable by the customer / end user, it might be better to use a database anyway. But as a developer, I frankly don't care as long as it's documented where to configure it.
We are going to use Spring Batch in a project that needs to read, convert, and write big amounts of data. So far, everything is fine.
But there is a non-functional requirement that says we can't create DB objects using English words, so the original meta-data schema used by Spring Batch will not be approved by the client's DBA unless we translate it.
In the docs, I don't see any way to configure or extend the API to achieve this, so it seems we'll have to customize the source code to make it work with the equivalent, translated model. Is that a correct/feasible assumption, or am I missing something?
That is an unusual requirement. However, in order to completely rename the tables and columns in the batch schema, you'll need to re-implement the JDBC based repository DAOs to use your own SQL.
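One partial exception: the table prefix alone (the BATCH_ part) is configurable without custom DAOs. A sketch, assuming Spring Batch's JobRepositoryFactoryBean (the LOTE_ prefix is an illustrative translation):

```java
import javax.sql.DataSource;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class BatchRepositoryConfig {

    // Renames only the BATCH_ prefix of the meta-data tables; a full
    // translation of table and column names still requires custom DAOs.
    @Bean
    public JobRepository jobRepository(DataSource dataSource,
                                       PlatformTransactionManager txManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(txManager);
        factory.setTablePrefix("LOTE_"); // illustrative translated prefix
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```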
I am developing a Java application based on the Spring framework.
It:
1. Connects to a MySQL database
2. Gets data from MySQLTable1 into POJOs
3. Manipulates it (update, delete) in memory
4. Inserts the result into a Netezza database table
The above four steps are performed for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data, like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
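For illustration, a sketch of that read with JdbcTemplate and a hypothetical Row POJO (the names are assumptions, not from the original code):

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical POJO for one row of MySQLTable1 (Java 16+ record for brevity)
record Row(String col1, String col2, String col3) {}

class ClientReader {
    private final JdbcTemplate jdbc;

    ClientReader(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    // Each application instance reads only its own client's rows
    List<Row> readFor(String clientId) {
        return jdbc.query(
            "SELECT COL1, COL2, COL3 FROM MySQLTable1 WHERE CLIENTID = ? AND COL4 = 'CONDITION'",
            (rs, rowNum) -> new Row(rs.getString("COL1"),
                                    rs.getString("COL2"),
                                    rs.getString("COL3")),
            clientId);
    }
}
```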
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can run concurrently, but each SELECT will be unique;
I mean the data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are never working on the same data (client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to be in a consistent state in the result table: if an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before the failure. As for the transaction manager, you don't specifically need the Spring transaction manager, but since you are already using Spring it is a good option.
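A minimal sketch of that rollback behavior with Spring's TransactionTemplate, reusing the hypothetical Row POJO from the earlier sketch (the table and column names are illustrative); if the callback throws, every insert inside it is rolled back:

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

class NetezzaWriter {
    private final JdbcTemplate netezzaJdbc;
    private final TransactionTemplate tx;

    NetezzaWriter(JdbcTemplate netezzaJdbc, PlatformTransactionManager txManager) {
        this.netezzaJdbc = netezzaJdbc;
        this.tx = new TransactionTemplate(txManager);
    }

    // All inserts commit together; any exception rolls the whole batch back.
    void writeAll(List<Row> rows) {
        tx.executeWithoutResult(status -> {
            for (Row r : rows) {
                netezzaJdbc.update(
                    "INSERT INTO NetezzaTable (COL1, COL2, COL3) VALUES (?, ?, ?)",
                    r.col1(), r.col2(), r.col3());
            }
        });
    }
}
```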
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure it (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be built without the framework, and it might be overkill for a really small application. But in the end, if you need those features, you'll probably want to use it...
Spring Batch is an ETL framework, so it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by accidentally reading or writing the same data. The risk would arise if two clients with the same ID were created, but that is not the case here.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need, although it would help a lot, especially with paging: how will you handle queries that return thousands of rows? Without a framework, that has to be handled manually (see the sketch below).
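For the paging aspect, a sketch of Spring Batch's JdbcPagingItemReader fetching the rows in fixed-size pages instead of loading everything into memory at once (the reader name and page size are illustrative; Row is the hypothetical POJO from above):

```java
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;

class PagedClientReader {
    // Reads the client's rows 1000 at a time; Spring Batch issues the
    // paged queries and tracks position for restartability.
    static JdbcPagingItemReader<Row> reader(DataSource dataSource, String clientId) {
        return new JdbcPagingItemReaderBuilder<Row>()
                .name("clientRowReader")
                .dataSource(dataSource)
                .selectClause("SELECT COL1, COL2, COL3")
                .fromClause("FROM MySQLTable1")
                .whereClause("WHERE CLIENTID = :clientId AND COL4 = 'CONDITION'")
                .parameterValues(Map.of("clientId", clientId))
                .sortKeys(Map.of("COL1", Order.ASCENDING))  // paging needs a sort key
                .pageSize(1000)
                .rowMapper((rs, i) -> new Row(rs.getString("COL1"),
                                              rs.getString("COL2"),
                                              rs.getString("COL3")))
                .build();
    }
}
```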
We have a utility Spring MVC application that doesn't use a database; it is just a SOAP/REST wrapper. We would like to store an arbitrary message for display to users that persists between deployments. The application must be able to both read and write this data. Are there any best practices for this?
Multiple options.
Write something to the file system - great for persistence, a little slow. The primary drawback is that it would probably have to be a shared file system, as any kind of clustering wouldn't deal well with local files; then you get into file-locking issues. Very easy implementation.
Embedded DB - similar benefits and pitfalls to writing to the file system, but probably deals better with locking/transactional issues. Somewhat more difficult implementation.
Distributed cache, like Memcached - a bit faster than a file, though not much, and it deals with the clustering and locking issues. However, it's not persistent: fairly reliable across a short webapp restart, but definitely not 100%. More difficult implementation, plus you need another server.
Why not use an embedded database? Options are:
H2
HSQL
Derby
Just include the jar file in the webapp's classpath (e.g. WEB-INF/lib) and configure the JDBC URL as normal.
Perfect for demos, and easy to substitute when you want to switch to a bigger database server.
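A minimal sketch with H2 (the file path, table, and MERGE upsert are illustrative); pointing the JDBC URL at a local file keeps the message across redeployments:

```java
import java.sql.*;

class MessageStore {
    // File-based H2 database; the data lives in ./data/app.mv.db on disk,
    // so it persists across application deployments.
    private static final String URL = "jdbc:h2:./data/app";

    void save(String message) throws SQLException {
        try (Connection c = DriverManager.getConnection(URL);
             Statement s = c.createStatement()) {
            s.execute("CREATE TABLE IF NOT EXISTS message(id INT PRIMARY KEY, body VARCHAR(4000))");
            try (PreparedStatement p = c.prepareStatement(
                    "MERGE INTO message(id, body) VALUES (1, ?)")) {  // H2 upsert by primary key
                p.setString(1, message);
                p.execute();
            }
        }
    }

    String load() throws SQLException {
        try (Connection c = DriverManager.getConnection(URL);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT body FROM message WHERE id = 1")) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}
```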
I would simply store that in a file on the file system. It's possible to use an embedded database or something like that, but for one message, a file will be fine.
I'd recommend you store the file outside of the application directory.
It might be alongside (next to) it, but don't store it inside your "webapps/" directory or anything like that, since that is typically wiped when the application is redeployed.
You'll probably also need to manage concurrency. A global (static) read/write lock should do fine.
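A minimal sketch of that combination (the file path is illustrative): one static ReentrantReadWriteLock guards all reads and writes of the message file:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class MessageFile {
    // One global lock for the whole webapp: many concurrent readers,
    // exclusive writers.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static final Path FILE = Paths.get("../app-data/message.txt"); // illustrative

    static String read() throws IOException {
        LOCK.readLock().lock();
        try {
            return Files.exists(FILE) ? Files.readString(FILE) : "";
        } finally {
            LOCK.readLock().unlock();
        }
    }

    static void write(String message) throws IOException {
        LOCK.writeLock().lock();
        try {
            Files.createDirectories(FILE.getParent());
            Files.writeString(FILE, message);
        } finally {
            LOCK.writeLock().unlock();
        }
    }
}
```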
I would use JNDI. Why over-complicate?
Perhaps what I'm trying to explain here doesn't make any sense, so I'd like to apologize in advance. Anyway, I will try.
I'm trying to read from a file, perform some database operations, and move the content to another file. I was wondering if it is possible to perform all these operations atomically in Java, so that if anything goes wrong along the way, the complete sequence is rolled back and we return to the starting point.
Thanks in advance for your help.
Take a look at Apache Commons Transaction. It has the capability to manage files transactionally.
An archived article detailed its use with the file system.
update
Be aware that the status on the front page says:
We have decided to move the project to dormant as we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. Although there are other useful parts (as multi level locking including deadlock detection) the transactional file system is the main reason people use this library for. As it simply can not be made fully transactional, it does not work as advertised.
There is no standard transactional file API; however, I believe there is an Apache project that implements what you want.
http://commons.apache.org/transaction/file/index.html
The transactional file package provides you with code that allows you to have atomic read and write operations on any file system. The file resource manager offers you the possibility to isolate a number of operations on a set of files in a transaction. Using the locks package it is able to offer you full ACID transactions including serializability. Of course to make this work all access to the managed files must be done by this manager. Direct access to the file system can not be monitored by the manager.
update
As quoted in the answer above, the project has since been moved to dormant because transactional file access cannot be implemented reliably on top of an ordinary file system.
As XADisk supports XA transactions over file systems, it should solve your problem. It can participate in XA transactions along with databases and other XA resources.
In case your application is not running in a JCA-capable environment, you can also use a standalone transaction manager like Atomikos and carry out XA transactions involving both files (using XADisk) and a database.
update
The project's home page no longer exists, and the last release on Maven was in 2013.
No, at least not with a simple call. File systems in general (and Java file operations in particular) do not support rollback.
You could, however, emulate it. A common way would be to first rename the file to mark it as "in processing", for example by appending a suffix.
Then process it, then change the file. If anything goes wrong, just roll back the DB, rename the file(s) with suffixes back to their original names, and you're set.
As a bonus, on some file systems a rename is even atomic, so you'd be safe even with concurrent updates (I don't know whether this is relevant for you). I do not know offhand whether file renaming is atomic from Java, though; you'd need to check.
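A sketch of that emulation (the suffix and class name are illustrative); java.nio.file.Files.move can request an atomic rename, though support depends on the underlying file system:

```java
import java.io.IOException;
import java.nio.file.*;

class RenameGuard {
    // Marks the file as "in processing" by renaming it with a suffix;
    // if processing fails, the rename is reversed (the DB is rolled back
    // separately), so the original file name reappears.
    static void process(Path file) throws IOException {
        Path working = file.resolveSibling(file.getFileName() + ".processing");
        Files.move(file, working, StandardCopyOption.ATOMIC_MOVE);
        try {
            // ... do the DB work and modify `working` here ...
            Files.move(working, file, StandardCopyOption.ATOMIC_MOVE); // done: restore name
        } catch (IOException | RuntimeException e) {
            Files.move(working, file, StandardCopyOption.ATOMIC_MOVE); // rollback the rename
            throw e;
        }
    }
}
```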
You can coordinate a distributed transaction using two-phase commit. However, this is fairly complex to implement, and an approach I've often seen taken instead is single-phase commit: build a stack of transactions and then commit them all in quick succession, raising an error if one of the commit attempts fails while the others succeed.
If you choose to implement two-phase commit, you'd need write-ahead logging for each participant in the transaction: you log each action before taking it, which allows you to roll back any changes if the transaction fails. For example, you'd need this to reverse any changes made to files (as sleske mentions).
Narayana (formerly called JBossTS) provides its own implementation of transactional file I/O.