Perhaps what I'm trying to explain here doesn't make any sense, so I'd like to apologize in advance. Anyway, I will try.
I'm trying to read from a file, perform some database operations, and move the content to another file. I was wondering whether it is possible to perform all these operations atomically in Java, so that if anything goes wrong in the sequence of actions, the complete sequence is rolled back and we return to the starting point.
Thanks in advance for your help.
Take a look at Apache Commons Transaction. It has the capability to manage files transactionally.
An archived article detailed its use with the file system.
update
Be aware that the status on the front page says:
We have decided to move the project to dormant as we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. Although there are other useful parts (as multi level locking including deadlock detection) the transactional file system is the main reason people use this library for. As it simply can not be made fully transactional, it does not work as advertised.
There is no standard transactional file API; however, I believe there is an Apache project that implements what you want.
http://commons.apache.org/transaction/file/index.html
The transactional file package provides you with code that allows you to have atomic read and write operations on any file system. The file resource manager offers you the possibility to isolate a number of operations on a set of files in a transaction. Using the locks package it is able to offer you full ACID transactions including serializability. Of course to make this work all access to the managed files must be done by this manager. Direct access to the file system can not be monitored by the manager.
update
Be aware that the status on the front page says:
We have decided to move the project to dormant as we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. Although there are other useful parts (as multi level locking including deadlock detection) the transactional file system is the main reason people use this library for. As it simply can not be made fully transactional, it does not work as advertised.
As XADisk supports XA transactions over file systems, it should solve your problem. It can participate in XA transactions along with databases and other XA resources.
If your application is not running in a JCA-capable environment, you can also use a standalone transaction manager like Atomikos and carry out XA transactions involving both files (using XADisk) and a database.
update
The project's home page no longer exists, and the last release on Maven was in 2013.
No, at least not with a simple call. Filesystems in general (and Java filesystem operations in particular) do not support a "rollback".
You could, however, emulate this. A common way is to first rename the file so that it is marked as being "in processing", for example by appending a suffix.
Then process it and write your changes. If anything goes wrong, just roll back the DB, rename the suffixed file(s) back to their original names, and you're set.
As a bonus, on some filesystems a rename is even atomic, so you'd be safe even with concurrent updates (I don't know whether this is relevant for you). I'm not sure offhand whether file renaming is atomic in Java, though; you'd need to check.
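As a rough sketch of the rename-and-restore idea (all class and file names here are invented for illustration): java.nio.file.Files.move with StandardCopyOption.ATOMIC_MOVE asks the filesystem for an atomic rename and throws if it cannot provide one, so a concurrent worker can never claim a file that is already being processed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameRollbackDemo {

    // Mark a file as "in processing" by renaming it with a suffix.
    // With ATOMIC_MOVE the rename either fully happens or fully fails,
    // so two concurrent workers cannot both claim the same file.
    static Path markInProcessing(Path file) throws IOException {
        Path marked = file.resolveSibling(file.getFileName() + ".processing");
        return Files.move(file, marked, StandardCopyOption.ATOMIC_MOVE);
    }

    // "Rollback": restore the original name if anything went wrong.
    static Path restore(Path marked) throws IOException {
        String name = marked.getFileName().toString();
        Path original = marked.resolveSibling(
                name.substring(0, name.length() - ".processing".length()));
        return Files.move(marked, original, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("demo");
        Path input = Files.writeString(dir.resolve("data.txt"), "payload");
        Path marked = markInProcessing(input);
        try {
            throw new IllegalStateException("simulated processing failure");
        } catch (IllegalStateException e) {
            restore(marked);  // undo the rename; the DB rollback would go here too
        }
        System.out.println(Files.exists(dir.resolve("data.txt")));  // prints "true"
    }
}
```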
You can coordinate a distributed transaction using two-phase commit. However, this is fairly complex to implement, and an approach I've often seen taken instead is single-phase commit: build a stack of transactions, commit them all in quick succession, and raise an error if one of the commit attempts fails while the others succeed.
If you choose to implement two-phase commit, you'd need write-ahead logging for each participant in the transaction, where you log actions before taking them, allowing you to roll back any changes if the transaction fails. For example, you'd need this in order to reverse any changes made to files (as sleske mentions).
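The write-ahead idea for a file participant can be sketched as follows (all names invented): record the undo information, here just the file's previous length, before changing the file, and restore it if the rest of the transaction fails. A crash-safe version would persist the undo record to a journal file first; this in-memory variant only survives failures within the same process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UndoLogDemo {

    /**
     * Append data to a file, then run the rest of the transaction.
     * If anything throws, truncate the file back to its logged length,
     * undoing the append, and rethrow.
     */
    static void appendWithUndo(Path file, byte[] data, Runnable restOfTx) throws IOException {
        // "Write-ahead": capture the undo information before touching the file.
        long undoSize = Files.exists(file) ? Files.size(file) : 0;
        try {
            Files.write(file, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            restOfTx.run();  // remaining work of the transaction; may throw
        } catch (RuntimeException | IOException e) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.truncate(undoSize);  // roll the file back to its pre-append length
            }
            throw e;
        }
    }
}
```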
The Narayana project (formerly called JBossTS) provides its own implementation of transactional file I/O.
Related
The Cache API doc says that several methods, such as purge() or flush(), operate depending on the persistence storage configured.
Unfortunately, I can't find how to configure one.
Is it really possible?
Older versions of cache2k had persistence support baked in. It worked, but it never reached a level that I would fully trust for production.
The actual issue was the clear() operation, which had a quite complex implementation. clear() should be fast, regardless of how long the storage implementation needs to remove the data. So my idea was to switch to a write-back scheme, where operations get queued and executed once the storage is available again. But implementing a partial write-back scheme just for clear() is quite some over-engineering...
For the moment I have dropped persistence from the feature set, since I don't want it to hold up a 1.0 version that has a stabilized API and already provides a lot of useful features.
As you can see from the roadmap on the cache2k homepage, the current plan is to first add bulk and async features and then get back to storage. The storage interface will probably need to look totally different once the async capabilities are done.
Inside the current cache2k implementation there are still the interfaces where the storage will be hooked in, so what has already been achieved is not completely abandoned. flush() and purge() are remnants of this, so I will remove those two methods for the 1.0 version to avoid confusion.
BTW: since I saw your question on Guava: cache2k has support for a CacheWriter, which is the counterpart of CacheLoader. With the cache loader and writer you can read from and write to a storage yourself, but that is not identical to storage support inside the cache itself. For example, cache.contains(...) would check the storage, but it does not consult the cache loader, at least according to JSR 107 and in every cache implementation I know of.
Suppose I have a simple task: process some piece of data and append it to a file. It's OK if I don't get exceptions, but they may happen. If something goes wrong, I would like to remove all the changes from the file.
Also, I may have set some variables during the processing, and I would like to restore their previous state too.
Also, I may be working with a database that doesn't support transactions (to the best of my knowledge, MongoDB does not), so I would like to roll the changes back in the DB somehow.
Yes, I could fix the issue with my file manually just by backing it up and then restoring it, but in general it looks like I need a transaction framework.
I don't want to pull in the Spring monster for this; it's too much. And I don't have an EJB container to manage EJBs. I have a simple standalone Java application, but it needs transaction support.
Do I have some other options instead of plugging Spring or EJB?
If you don't want to use Spring, try implementing a simple two-phase commit mechanism: Two-Phase Commit Protocol
I am no Java expert but this sounds simple.
In fact, I would not use transactions in an ACID-compliant database for this, since it doesn't sound like the right tool.
Instead, I would write to a temporary file and, once your records have been written, merge it with the original file. That way, if some records cannot be written for whatever reason, you just drop the new file; merging and saving the new file will be atomic from the program's point of view and within the OS's file system.
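The temp-file approach might be sketched like this (names are made up; note that Files.move with ATOMIC_MOVE onto an existing target is platform-dependent, though on typical POSIX systems it behaves as an atomic replace):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileReplace {

    /**
     * Write the new content to a temp file in the same directory,
     * then swap it in; readers never observe a half-written file.
     */
    static void replaceAtomically(Path target, String newContent) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), "out", ".tmp");
        try {
            Files.writeString(tmp, newContent);
            // ATOMIC_MOVE behavior on an existing target is platform-specific;
            // on POSIX filesystems this is an atomic rename over the old file.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);  // "rollback": the original file is untouched
            throw e;
        }
    }
}
```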
We have a utility spring-mvc application that doesn't use a database, it is just a soap/rest wrapper. We would like to store an arbitrary message for display to users that persists between deployments. The application must be able to both read and write this data. Are there any best practices for this?
Multiple options.
Write something to the file system - Great for persistence; a little slow. The primary drawback is that it would probably have to be a shared file system, as any kind of clustering wouldn't deal well with local files, and then you get into file-locking issues. Very easy implementation.
Embedded DB - Similar benefits and pitfalls to writing to the file system directly, but probably deals better with locking and transactional issues. Somewhat more difficult implementation.
Distributed cache, like Memcached - A bit faster than a file, though not much. Deals with the clustering and locking issues, but it's not persistent: fairly reliable across a short webapp restart, but definitely not 100%. More difficult implementation, plus you need another server.
Why not use an embedded database? Options are:
H2
HSQL
Derby
Just include the jar file on the webapp's classpath and configure the JDBC URL as normal.
Perfect for demos, and easy to substitute when you want to switch to a bigger database server.
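For instance, file-based embedded JDBC URLs typically look like this (the `./data/appdb` paths are placeholders):

```
jdbc:h2:./data/appdb                    H2
jdbc:hsqldb:file:./data/appdb           HSQLDB
jdbc:derby:./data/appdb;create=true     Derby (creates the DB if absent)
```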
I would simply store it in a file on the filesystem. It's possible to use an embedded database, or something like that, but for one message a file will be fine.
I'd recommend you store the file outside of the application directory.
It might be alongside (next to) it, but don't go storing it inside your "webapps/" directory, or anything like that.
You'll probably also need to manage concurrency. A global (static) read/write lock should do fine.
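A minimal sketch of that idea (class and file names invented): a message file guarded by a static ReentrantReadWriteLock, so concurrent reads don't block each other while writes get exclusive access.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MessageStore {
    // Global lock: one per JVM, shared by all instances.
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();

    private final Path file;

    public MessageStore(Path file) {
        this.file = file;
    }

    public String read() throws IOException {
        LOCK.readLock().lock();   // many concurrent readers are fine
        try {
            return Files.exists(file) ? Files.readString(file) : "";
        } finally {
            LOCK.readLock().unlock();
        }
    }

    public void write(String message) throws IOException {
        LOCK.writeLock().lock();  // writers get exclusive access
        try {
            Files.writeString(file, message);
        } finally {
            LOCK.writeLock().unlock();
        }
    }
}
```

Note this only guards against concurrent access within one JVM; a clustered deployment would need file locking or a shared store, as discussed above.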
I would use JNDI. Why over-complicate?
I store a file’s attributes (size, update time, …) in a database, so the problem is how to manage one transaction spanning both the database and the file.
In a Java EE environment, JTA is only able to manage the database transaction.
If updating the database succeeds but the file operation fails, should I write a file-rollback method for this? Moreover, file operations inside an EJB container violate the EJB spec.
What’s your opinion?
Access to external resources such as a file system should ideally go through a JCA connector. Though there are several posts around discussing this, I never found a ready-to-use JCA connector for transactional access to the file system, so I started to write one:
Have a look at: JCA connector: a file system adapter. It's fairly basic, but manages commit/rollback of files.
Regarding other projects:
I don't know the exact status of Commons Transaction; it seems dead to me.
Have a look at JBoss Transactional File I/O, looks promising.
Also have a look at Filera: File resource adapter, but I don't think it's transactional.
Note that as soon as you have more than one transactional participant, the app server really needs to use distributed transactions, and things get more complicated. You must not underestimate this complexity (e.g. databases have a different timeout mechanism for distributed transactions).
Another lightweight approach to consider is an SFSB that writes to the file system. If you implement the SessionSynchronization interface, you get beforeCompletion and afterCompletion callbacks. The latter indicates whether the transaction was committed or rolled back, and you can do cleanup if necessary. That way you can implement transactional behavior.
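The callback pattern can be sketched in plain Java, outside any container (all names invented; in a real EJB the afterCompletion(boolean) callback comes from the SessionSynchronization interface and the container invokes it): writes go to a staging copy during the transaction, and afterCompletion either swaps it in or throws it away.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Plain-Java sketch of the SessionSynchronization idea: stage file writes
 * during the transaction, then apply or discard them in afterCompletion.
 */
public class TxFileWriter {
    private final Path target;
    private final Path staging;

    public TxFileWriter(Path target) throws IOException {
        this.target = target;
        this.staging = Files.createTempFile(target.getParent(), "stage", ".tmp");
    }

    /** Called during the transaction: writes only touch the staging copy. */
    public void write(String content) throws IOException {
        Files.writeString(staging, content);
    }

    /** Mirrors SessionSynchronization.afterCompletion(boolean committed). */
    public void afterCompletion(boolean committed) throws IOException {
        if (committed) {
            Files.move(staging, target, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.deleteIfExists(staging);  // rollback: the target was never touched
        }
    }
}
```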
JTA is not only for databases; it can be used along with any other resource, as long as that resource supports XA transactions. For example, XADisk enables one such integration of file systems with XA transactions. Hence it can also solve the file-system/database consistency problem you have been trying to solve.
Hope that helps.
Nitin
Manually. You'll probably need to write compensating transactions for this.
Maybe have a look at commons-transaction for transactional file access. Refer to:
Transactional File System in Java
An XA Filesystem
An XA Filesystem, update
In any case, you'll have to write files outside the EJB container or interact with a JCA connector, as pointed out by @ewernli.
I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by db4o is really good. It allows you to use Hibernate to back onto an RDBMS - I don't think it supports JDBC, though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you already have a model and DAO layer for your codebase, you can just create your own sync framework; it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing requires some knowledge of what has been synced already. You can either do it at runtime, by getting the lists of uuids from TableInA and TableInB and working out which entries are new, or you can keep a table of items that need to be synced (populated by a trigger on insert/update in TableInA) and run from that. Your tool can be a TimerTask so the databases are kept in sync at the time granularity you desire.
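The runtime diffing step might look like this sketch, with in-memory maps standing in for rows loaded over JDBC (all names, including the metadata column names, are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SyncSketch {

    /**
     * Return the rows present in source but not yet in destination, keyed by uuid.
     * In real code both maps would be populated from the tables via JDBC queries.
     */
    static Map<String, Map<String, Object>> newEntries(
            Map<String, Map<String, Object>> source,
            Map<String, Map<String, Object>> dest) {
        Map<String, Map<String, Object>> missing = new LinkedHashMap<>(source);
        missing.keySet().removeAll(dest.keySet());
        return missing;
    }

    /** Strip database-specific metadata before inserting into the other database. */
    static Map<String, Object> withoutMetadata(Map<String, Object> row) {
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.keySet().removeAll(Set.of("uuid", "last_modified"));
        return copy;
    }
}
```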
However, there is probably a tool out there that does all of this without the implementation faff, and each implementation would differ based on business needs anyway. In addition, there will be replication tools at the database level.
True synchronization requires some data that I hope your database schema has (you can read the SyncML docs to see how they proceed). Sync4J won't help you much; it's really high-level and XML-oriented. If you don't foresee any conflicts (which would mean really easy synchronization), you could try a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog post gives a good summary of the available solution routes:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature-complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.