Lightweight In-Process Distributed Transaction Manager for Java? - java

I'm trying to extend Clojure to add durability to refs in a way that allows users to choose which data store instances to persist to. That requires distributed transactions. Are there any really lightweight, in-process distributed transaction managers, supporting XA, for Java? If not, and I have to roll my own, are there any good resources explaining what a distributed transaction coordinator has to support? Specifically, I'm having trouble understanding what the semantics of the 3 parts of an XID are really supposed to be. As an initial implementation, I'm using BDB JE.

I know these two:
Bitronix: This is the one we are using currently, it seems to work OK and it is easy to configure.
Atomikos: We have tried this, but it is a little harder to configure than Bitronix, and it has some hardcoded dependencies to java.util.logging which we did not want. It should more feature-complete than Bitronix as it is an open source version of a commercially supported product.

http://www.atomikos.com should do what you are looking for...

Related

Migrating CORBA Application to Modern Java technologies (Rest/SOAP/EJB)

I have a requirement to migrate a legacy CORBA system to any latest java technology. The main problem I am facing is to provide long lived transaction(db) in the proposed system. Currently the client(Swing App) retain the CORBA service object and perform multiple db txn before actually committing/rolling back all the txn. Service layer keep the state of connection object through out to complete transaction.
I wanted to reproduce this mechanism in my new system(REST/WS) so that either Swing client/Web(future) can work in the same as is.
eg:
try {
service1.updateXXData(); // --> insert in to table XX
service2.updateUUData() //--> insert in to table UU
service1.updateZZData(); // --> insert in to table ZZ
service2.updateAAData(); // --> insert in to table AA
service1.commit(); // con.commmit();
service2.commit(); // con.commmit();
}
exception(){
service1.rollback(); // con.rollback();
service2.rollback(); // con.rollback();
}
Now I wanted to migrate CORBA to any modern technolgy, but still I am at large to find a solution for this. ( the concern is client do not want to make any change to service layer or db layer) , they just wanted to remove CORBA.
couple of options available for me are
Migrate CORBA to RMI --> so that changes required to current system are minimal, but transaction management,connection pooling, retaining state need to do my self.
Migrate CORBA to Stateful EJB --> Compare RMI more changes required, but better since I can use container managed connection pooling, maintain state in a better way.
Migrate CORBA to Stateful Webservice(SOAP) --> More futuristic, but lot of changes required - How ever I can convert IDL to WSDL, and delegate the call to implementation layer
Migrate CORBA to REST --> Most desired if possible - but the amount of time required to migrate is huge , Code changes would require from UI layer to service layer.
Thank you very much in advance
The order in which I would choose the options, from best to worst, would be 4, 3, 2, and 1, however I'd avoid stateful beans or services if humanly possible to do so.
I'll go over the implementation details of what you'll have to do in detail.
For any of these solutions, you'll have to use XA-compliant data sources and transactions so you can guarantee ACID compliance, preferably from an application server so you don't have to generate the transaction yourself. This should be an improvement from your existing application as it almost certainly can't guarantee that, but be advised that in my experience, people put loads of hacks in to essentially reinvent JTA, so watch out for that.
For 4, you'll want to use container-managed transactions with XA. You might do this by injecting a #PersistenceContext backed by a JTA connection. Yes, this costs a ton of time, testing, and effort, but it has two bonuses: First, moving to the web will be a lot easier, and it sounds like that time is coming. Second, those that come after you are more likely to be well-versed in newer web service technologies than bare CORBA and RMI.
For 3, you'll also want to use container-managed transactions with XA. SOAP would not be my first choice as it uses very verbose messages and REST is more popular, but it could be done. If it's stateful, though, you'll have to use bean-managed transactions instead and then hang on to resources across web service calls. This is dangerous, as it could potentially deadlock the whole system.
For 2, you can go two ways, either using container-managed transactions with XA by using a stateless session facade for a stateful EJB. You can use a client JAR for your EJB and package that with the Swing app. Using the stateless facade is preferable, as it will reduce the load on your application server. Keep in mind that you can generate web services from stateless EJB beans too, essentially turning this into #3.
For 1... well, good luck. It is possible to use RMI to interface with EJB's, and generate your own stub and tie, though this is not recommended, and for very good reason. This hasn't been a popular practice for years, may require the stubs and ties to be regenerated periodically, and may require an understanding of the low-level functions of the app server. Even here, you'll want XA transactions. You don't want to handle the transaction management yourself, if possible.
Ultimately, as I'm sure everyone will agree, the choice is yours on what to do, and there's no "right" or "wrong" way, despite the opinions stated above. If it were me (and it's not), I'd ask two important questions of myself and my customer:
Is this for a contract or temporary engagement, and if so what is the term? Do I get first pick at another contract for this same system later when they want additional updates? (In other words, how much money am I going to get out of this vs. how much time am I spending? If it's going to be a long term, then I would go with 4 or 3, otherwise 3 or 2 would be better.)
Why get rid of CORBA? "Because it's old" is an honest answer, but what's the impetus of getting rid of the "old hotness?" Do they plan on expanding usage of this system in the future? Is there some license about to expire and they just want to keep the lights on? Is it because they don't want to dump this on some younger programmer who might not know how to deal with low-level stuff like this? What do you want the system to do in two years, five years, or longer?
(OK, so that's more than two questions :D)

How to manage transactions over multiple databases

I have a web application which receives requests to save orders in a database. I want to write to 2 different databases - one Cassandra instance and one PostgreSQL instance. I am using plain Java and JDBC (with apache DBUtis) with a lightweight web application library at the front.
What I am unsure about is how to implement transactionality across the two databases, i.e. if a write to one of the databases fails, then rollback the other write and put an error message in the error log.
Are there any mechanisms in Java to implement this? I know of such a thing as two phase commit, is that what I would be looking for here? Are there any alternatives?
Both Cassandra & PostgreSQL support linearizability and compare-and-set (CAS), so you can implement transactions on the client side.
If you want Serializable Isolation level then you should take a look on the Percolator's transactions. The Percolator's transactions are quite known in the industry and have been used in the Amazon's DynamoDB transaction library, in the CockroachDB database and in the Google's Pecolator system itself. A step-by-step visualization of the Percolator's transactions may help you to understand it.
If you expect contention and can deal with Read Committed isolation level then RAMP transactions by Peter Bailis may suit you. I also created a step-by-step RAMP visualization.
The third approach is to use compensating transactions also known as the saga pattern. It was described in the late 80s in the Sagas paper but became more actual with the raise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.

Is it possible to build a web application as a generic product rather than a one-time customized solution?

There is a business problem that needs to be solved. The obvious solution is an enterprise web application - a locally hosted website that provides the desired functionality.
I want to build this web application, but build it such that -
Its more of a product than a one-time solution; such that it can be customized for different clients
It is possible to provide 'fixes' for this web application, so that bugs can be removed and enhancements added with minimum impact on operations
The web app should be capable of working with different databases and existing authentication systems
Is this even possible? Is it a common enough approach that there is a known way of going about this? Would it be better to use an application framework like Spring or try and keep dependencies on frameworks to minimal?
Also, any links or references to books that will guide me will be greatly appreciated.
Thanks in advance StackOverflow!
(I feel like I dont know all what I need to know before embarking on this project, please feel free to point out things I haven't and should consider)
Developing software, esp. for re-usability, requires analyzing which parts/functions are common between use cases and which aren't, drawing the line between re-usable (library) and customized/specialized code.
If you know what use cases you expect or want to support in the future this can be feasible.
If you don't, you should not start trying to generalize arbitrary functionality in the first place, because you cannot know what you will be needing in the future.
Java provides some good abstractions of various functionalities, like universal DB support via JDBC.
If you didn't already, have a look at application servers like JBoss or Glassfish. They provide plenty of basic functionality for web applications, support very loose coupling between components, and are highly configurable. To switch from one DBMS to another, for instance, it is enough to alter a single line of configuration (given the supported SQL is similar enough). Deploying applications or parts can often be done on the fly ("hot deployment") without even stopping the server.
Plus: There is a vast amout of supporting libraries and frameworks out there to help you standardize your application design.
I have been working for a while on a webapp that can be deployed in multiple locations: it is designed to be instantiated on many hosts. It's entirely possible to do this, but it is difficult. Writing the code so that it can work this way takes a great deal of care.
The key to doing it is to make all your dependencies on things explicit and all your configuration driven by properties that can be set during installation. Spring makes this quite a lot easier! In particular, the org.springframework.web.context.support.ServletContextPropertyPlaceholderConfigurer class allows you to use the servlet context as a source of values that you can then inject into your beans (e.g., via #Value annotations). It's far harder to do all that yourself. Here's (a simplified version of) what I use:
<bean class="org.springframework.web.context.support.ServletContextPropertyPlaceholderConfigurer">
<property name="contextOverride" value="true" />
<property name="location" value="/WEB-INF/default.properties" />
</bean>
This merges the servlet context's properties on top of the ones you provide as defaults inside your webapp (definitely a good practice if most things aren't going to need to be modified most of the time) and then uses them to define properties. I then apply a configuration property (e.g., foo.bar) to a bean property using a placeholder, like this:
#Value("${foo.bar}")
public void setFoobar(String foobar) { ... }
Things to configure that way include the database configuration, absolute locations of files holding things that can't be packaged inside the webapp, etc. You'll have to use your skill and knowledge of the application domain to work out what things need to be listed.
Other key principles are to keep as much as possible inside the webapp (so reducing the opportunity for the deployer to mess it up), to be very careful about documenting everything, and to try it with multiple servlet containers. Remember, the person deploying your webapp does not have access to the contents of your thoughts: you have to write it down and tell them exactly what to do. (Too many instructions are at the level of “click this, click that, magic happens” but those are poor instructions since the exact method will vary over time: saying why will help far more because its more portable.)
We are currently developing a product that can be deployed internally for multiple clients and also as a public portal solution. Here is our experience.
As others have pointed out, there are different factors to keep in mind.
Security
Security that is associated with your product, and how you would manage the product functional requirements to external security roles.
Security, authentication and authorization should not be as part of the base product. Once authorized the roles need to be mapped to product roles for achieving said functionality.
Images and logos, that require customization.
Internationalization.
For working with multiple databases, assuming a product has typically two different views, persistence and querying. Our experience was to use hibernate to support multiple databases, but theoretically we have used only two databases in the past. db2 and mysql.
Testing for multiple databases for every release of your product is a pain. Your test cases goes 3 fold or atleast once in a while to support multiple databases.
Using custom databases and functions are a big no, you can use some general functions but custom database specific functions in your query are going to be a pain and have to be very diligent to avoid them.
Supported browsers in your product.
Licenses of the third party jars may not be compatible / acceptable to all institutions so you have to watch out for that carefully.
As much as possible, enable properties or configuration to customize all variables.
Caching strategy and properties initialization strategies.
A framework helps the team to keep on the same page, rather than an internal framework. There are many advantages to use a well established framework like Spring for performance and other consideration.
Cheers!

An alternative to Hibernate or TopLink? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Is there a viable alternative to Hibernate? Preferably something that doesn't base itself on JPA.
Our problem is that we are building a complex (as in, many objects refer to each other) stateful RIA system. It seems as Hibernate is designed to be used mainly on one-off applications - JSF and the like.
The problem is mainly that of lazy loading. Since there can be several HTTP requests between the initialization and actually loading lazy collections, a session per transaction is out of the question. A long-lived session (one per application) doesn't work well either, because once a transaction hits a snag and throws an exception, the whole session is invalidated, thus the lazy loaded objects break. Then there's all kinds of stuff that just don't work for us (like implicit data persisting of data from outside an initialized transaction).
My poor explanations aside, the bottom line is that Hibernate does magic we don't like. It seems like TopLink isn't any better, it also being written on top of EJB.
So, a stateless persistence layer (or even bright-enough object-oriented database abstraction layer) is what we would need the most.
Any thoughts, or am I asking for something that doesn't exist?
Edit: I'm sorry for my ambiguous terminology, and thank you all for your corrections and insightful answers. Those who corrected me, you are all correct, I meant JPA, not EJB.
If you're after another JPA provider (Hibernate is one of these) then take a look at EclipseLink. It's far more fully-featured than the JPA 1.0 reference implementation of TopLink Essentials. In fact, EclipseLink will be the JPA 2.0 reference implementation shipped with Glassfish V3 Final.
JPA is good because you can use it both inside and outside a container. I've written Swing clients that use JPA to good effect. It doesn't have the same stigma and XML baggage that EJB 2.0/2.1 came with.
If you're after an even lighter weight solution then look no further than ibatis, which I consider to be my persistence technology of choice for the Java platform. It's lightweight, relies on SQL (it's amazing how much time ORM users spend trying to make their ORM produce good SQL) and does 90-95% of what JPA does (including lazy loading of related entities if you want).
Just to correct a couple of points:
JPA is the peristence layer of EJB, not built on EJB;
Any decent JPA provider has a whole lot of caching going on and it can be hard to figure it all out (this would be a good example of "Why is Simplicity So Complex?"). Unless you're doing something you haven't indicatd, exceptions shouldn't be an issue for your managed objects. Runtime exceptions typically rollback transactions (if you use Spring's transaction management and who doesn't do that?). The provider will maintain cached copies of loaded or persisted objects. This can be problematic if you want to update outside of the entity manager (requiring an explicit cache flush or use of EntityManager.refresh()).
As mentioned, JPA <> EJB, they're not even related. EJB 3 happens to leverage JPA, but that's about it. We have a bunch of stuff using JPA that doesn't even come close to running EJB.
Your problem is not the technology, it's your design.
Or, I should say, your design is not an easy fit on pretty much ANY modern framework.
Specifically, you're trying to keep transactions alive over several HTTP requests.
Naturally, most every common idiom is that each request is in itself one or more transactions, rather than each request being a portion of a larger transaction.
There is also obvious confusion when you used the term "stateless" and "transaction" in the same discussion, as transactions are inherently stateful.
Your big issue is simply managing your transactions manually.
If you transaction is occurring over several HTTP requests, AND those HTTP requests happen to be running "very quicky", right after one another, then you shouldn't really be having any real problem, save that you WILL have to ensure that your HTTP requests are using the same DB connection in order to leverage the Databases transaction facility.
That is, in simple terms, you get a connection to the DB, stuff it in the session, and make sure that for the duration of the transaction, all of your HTTP requests go through not only that same session, but in such a way that the actual Connection is still valid. Specifically, I don't believe there is an off the shelf JDBC connection that will actually survive failover or load balancing from one machine to another.
So, simply, if you want to use DB transactions, you need to ensure that your using the same DB Connection.
Now, if your long running transaction has "user interactions" within it, i.e. you start the DB transaction and wait for the user to "do something", then, quite simply, that design is all wrong. You DO NOT want to do that, as long lived transactions, especially in interactive environments, are just simply Bad. Like "Crossing The Streams" Bad. Don't do it. Batch transactions are different, but interactive long lived transactions are Bad.
You want to keep your interactive transactions as short lived as practical.
Now, if you can NOT ensure you will be able to use the same DB connection for your transaction, then, congratulations, you get to implement your own transactions. That means you get to design your system and data flows as if you have no transactional capability on the back end.
That essentially means that you will need to come up with your own mechanism to "commit" your data.
A good way to do this would be where you build up your data incrementally into a single "transaction" document, then feed that document to a "save" routine that does much of the real work. Like, you could store a row in the database, and flag it as "unsaved". You do that with all of your rows, and finally call a routine that runs through all of the data you just stored, and marks it all as "saved" in a single transaction mini-batch process.
Meanwhile, all of your other SQL "ignores" data that is not "saved". Throw in some time stamps and have a reaper process scavenging (if you really want to bother -- it may well be actually cheaper to just leave dead rows in the DB, depends on volume), these dead "unsaved" rows, as these are "uncomitted" transactions.
It's not as bad as it sounds. If you truly want a stateless environment, which is what it sounds like to me, then you'll need to do something like this.
Mind, in all of this the persistence tech really has nothing to do with it. The problem is how you use your transactions, rather than the tech so much.
I think you should have a look at apache cayenne which is a very good alternative to "big" frameworks. With its decent modeler, the learning curve is shorten by a good documentation.
I've looked at SimpleORM last year, and was very impressed by its lightweight no-magic design. Now there seems to be a version 3, but I don't have any experience with that one.
Ebean ORM (http://www.avaje.org)
It is a simpler more intuitive ORM to use.
Uses JPA Annotations for Mapping (#Entity, #OneToMany etc)
Sessionless API - No Hibernate Session or JPA Entity Manager
Lazy loading just works
Partial Object support for greater performance
Automatic Query tuning via "Autofetch"
Spring Integration
Large Query Support
Great support for Batch processing
Background fetching
DDL Generation
You can use raw SQL if you like (as good as Ibatis)
LGPL licence
Rob.
BEA Kodo (formerlly Solarmetric Kodo) is another alternative. It supports JPA, JDO, and EJ3. It is highly configurable and can support agressive pre-fetching, detaching/attaching of objects, etc.
Though, from what you've described, Toplink should be able to handle your problems. Mostly, it sounds like you need to be able to attach/detach objects from the persistence layer as requests start and end.
Just for reference, why the OP's design is his biggest problem: spanning transactions across multiple user requests means you can have as many open transactions at a given time as there are users connected to your app - a transaction keeps the connection busy until it is committed/rolled back. With thousand of simultaneously connected users, this can potentially mean thousands of connections. Most databases don't support this.
Neither Hibernate nor Toplink (EclipseLink) is based on EJB, they are both POJO persistancy frameworks (ORM).
I agree with the previous answer: iBatis is a good alternative to ORM frameworks: full control over sql, with a good caching mechanism.
One other option is Torque, I am not saying it is better than any of the options mentioned above but just that it is another option to look at.
It is getting quite old now but may fit some of your requirements.
Torque
When I was myself looking for a replacement to Hibernate I stumbled upon DataNucleus Access Platform, which is an Apache2-licensed ORM. It isn't just ORM as it provides persistence and retrieval of data also in other datasources than RDBMS, like LDAP, DB4O and XML. I don't have any usage experience, but it looks interesting.
Consider breaking your paradigm completely with something like tox. If you need Java classes you could load the XML result into JDOM.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto a RDBMS - I don't think it supports JDBC tho (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copy data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.

Categories

Resources