Distributed Processing: C++ equivalent of JTA

I'm developing a mission-critical solution where data integrity is paramount and performance a close second. If data gets stuffed up, it's gonna be cata$trophic.
So, I'm looking for the C/C++ version of JTA (Java Transaction API). Does anyone know of any C or C++ libraries that support distributed transactions? And yes, I've googled it ... unsuccessfully.
I'd hate to be told that there isn't one and I'd need to implement the protocol specified by Distributed TP: The XA Specification.
Please help!
Edit (responding to kervin): If I need to insert records across multiple database servers and I need to commit them atomically, products like Oracle will have solutions for it. If I've written my own message queue server and I want to commit messages to multiple servers atomically, I'll need something like JTA to make sure that I don't stuff up the atomicity of the transaction.
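To make the requirement concrete, here is a minimal sketch of the two-phase commit idea that XA-style transaction managers are built around, written in Java only for brevity. The `Participant` interface, `InMemoryQueue`, and `Coordinator` are hypothetical names invented for this illustration, not part of JTA or any XA library, and the sketch deliberately leaves out what makes a real coordinator hard: durable logging of the decision, timeouts, and recovery after a crash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant contract, loosely modelled on the prepare/commit/rollback phases. */
interface Participant {
    boolean prepare();   // phase 1: vote; true means "ready to commit"
    void commit();       // phase 2: make the prepared work visible
    void rollback();     // phase 2: discard the prepared work
}

/** In-memory stand-in for one message queue server (illustrative only). */
class InMemoryQueue implements Participant {
    private final String pendingMessage;
    private final boolean canPrepare;
    final List<String> committed = new ArrayList<>();

    InMemoryQueue(String pendingMessage, boolean canPrepare) {
        this.pendingMessage = pendingMessage;
        this.canPrepare = canPrepare;
    }

    @Override public boolean prepare() { return canPrepare; }
    @Override public void commit() { committed.add(pendingMessage); }
    @Override public void rollback() { /* nothing was published, so nothing to undo */ }
}

class Coordinator {
    /** Returns true if every participant committed, false if the transaction was rolled back. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        // Phase 1: ask every participant to vote.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A "no" vote aborts the transaction; undo everyone who already voted "yes".
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
            prepared.add(p);
        }
        // Phase 2: all voted "yes", so tell everyone to commit.
        for (Participant p : prepared) {
            p.commit();
        }
        return true;
    }
}
```

With two queues that both vote yes, `Coordinator.run` commits the message to both; if either votes no, neither queue ends up with the message.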

Encina, DCE-RPC, TUXEDO, possibly CORBA (though I hesitate to suggest using CORBA), MTS (again, hmm).
These are the kind of things you want for distributed transaction processing.
Encina used to have a lot of good documentation for its DCE-based system.

There are hundreds. Seriously.
As far as general areas go, check out Service Oriented Architecture; most of the new products are coming out in that area, e.g. Rogue Wave HydraSCA.
I would start with plain Rogue Wave Suite, then see if I needed an Enterprise Service Bus after looking at that design.
That probably depends a lot on your design requirements and budget.

Oracle Tuxedo is the 800-pound gorilla in this space and was actually the basis for much of the XA specification. It provides distributed transaction management and can handle hundreds of thousands of requests per second.
For more information: http://www.oracle.com/tuxedo
Also, if you like SCA (Service Component Architecture), there is an add-on product for Tuxedo called SALT that provides an SCA container for programming in C++, Python, Ruby, and PHP.

Related

Share data between Java EE servers

What products/projects could help me with the following scenario?
More than one server (same location)
Some state should be shared between servers (for instance, whether a scheduled task is running and on which server).
The obvious answer could of course be databases, but we are using Seam and there doesn't seem to be a good way to nest transactions inside a Seam bean, so I need to find a way where I don't have to go crazy over configuration (I tried to use EJBs but persistence.xml wasn't pretty afterwards). So I need another way around this problem until Seam supports nested transactions.
This is basically the same scenario as I have if you need more details: https://community.jboss.org/thread/182126.
Any ideas?

Sounds like you need to do distributed job management.
The reality is that in the Java EE world, you are going to end up having to do Queues, as in MoM [Message-oriented Middleware]. Seam will work with JMS, and you can have publish and subscribe queues.
Where you might want to take a look for an alternative is at Akka. It gives you the ability to distribute jobs across machines using an Actor/Agent model that is transparent. That's to say your agents can cooperate with each other whether they are on the same instance or across the network from each other, and you are not writing a ton of code to make that happen, or having to special handle things up and down the message chain.
The other thing Akka has going for it is the notion of Supervision, aka Go Ahead and Fail, or Let it Crash. This is the idea (followed by the Telcos for years), that systems will fail and you should design for it and have a means of making things resilient.
Finally, the state of other options job wise in the Java world is dismal. Have used Seam for years. It's great, but they decided to just support Quartz for jobs, which is useless.
Akka is built on Netty, too, which does some pretty crazy stuff in terms of concurrency and performance.
[Not a TypeSafe employee, btw…]

building a high scale java app, what stack would you use?

If you needed to build a highly scalable web application using Java, what framework would you use and why?
I'm just reading Thinking in Java, Head First Servlets and Manning's Spring framework book, but really I want to focus on highly scalable architectures etc.
Would you use Tomcat, Hibernate, Ehcache?
(Just assume you have to design for scale; I'm not looking for the 'worry about it when you get traffic' type of responses.)

The answer depends on what we mean by "scalable". A lot depends on your application, not on the framework you choose to implement it with.
No matter what framework you choose, the fact is that the hardware you deploy it on will have an upper limit on the number of simultaneous requests it'll be able to handle. If you want to handle more traffic, you'll have to throw more hardware at it and include load balancing, etc.
The part that's pertinent in that case has to do with shared state. If you have a lot of shared state, you'll have to make sure that it's thread safe, "sticky" when it needs to be, replicated throughout a cluster, etc. All that has to do with the app server you deploy it to and the way you design your app, not the framework.
Tomcat's not a "framework", it's a servlet/JSP engine. It's got clustering capabilities, but so do most other Java EE app servers. You can use Tomcat if you've already chosen Spring, because it implies that you don't have EJBs. Jetty, Resin, WebLogic, JBoss, GlassFish - any of them will do.
Spring is a good choice if you already know it well. I think following the Spring idiom will make it more likely that your app is layered and architecturally sound, but that's not the deciding factor when it comes to scalability.
Hibernate will make your development life easier, but the scalability of your database depends a great deal on the schema, indexes, etc. Hibernate isn't a guarantee.
"Scalable" is one of those catch-all terms (like "lightweight") that is easy to toss off but encompasses many considerations. I'm not sure that a simple choice of framework will solve the issue once and for all.

I would check out Apache Mina. From the home page:
"Apache MINA is a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract · event-driven · asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO."
It has an HTTP engine AsyncWeb built on top of it.
A less radical suggestion (!) is Jetty - a servlet container geared towards performance and a small footprint.

The two keywords I would mainly focus on are Asynchronous and Stateless. Or at least "as stateless as possible". Of course you need state, but maybe, instead of going for a full-fledged RDBMS, have a look at document-centered datastores.
Have a look at Akka concerning async, and CouchDB or MongoDB as datastores...

Frameworks are more geared towards speeding up development, not performance. There will be some overhead with any framework because of use cases it handles that you don't need. Granted, the overhead may be low, and most frameworks will point you towards patterns that have been proven to scale, but those patterns can be used without the framework as well.
So I would design your architecture assuming 'bare metal', i.e. pure servlets (yes, you could go even lower level, but I'm assuming you don't want to write your own http socket layer), straight JDBC, etc. Then go back and figure out which frameworks best fit your architecture, speed up your development, and don't add too much overhead. Tomcat versus other containers, Hibernate versus other ORMs, Struts versus other web frameworks - none of that matters if you make the wrong decisions about the key performance bottlenecks.
However, a better approach might be to choose a framework that optimizes for development time and then find the bottlenecks and address those as they occur. Otherwise, you could spin your wheels optimizing prematurely for cases that never occur. But that probably falls in the category of 'worry about it when you get traffic'.

All popular modern frameworks (and "stacks") are well-written and don't pose any threat to performance and scaling, if used correctly. So focus on what stack will be best for your requirements, rather than starting with the scalability upfront.
If you have a particular requirement, then you can ask a question about it and get recommendations about what's best for handling it.

There is no framework that is magically going to make your web service scalable.
The key to scalability is replicating the functionality that is (or would otherwise be) a bottleneck. If you are serious about making your service scalable, you need to start with a good understanding of the characteristics of your application, and hence an idea of where the bottlenecks are likely to be:
Is it a read-only service or do user requests cause primary data to change?
Do you have / need sessions, or is the system RESTful?
Are the requests normal HTTP requests with HTML responses, or are you doing AJAX or callbacks or something?
Are user requests computation intensive, I/O intensive, rendering intensive?
How big/complicated is your backend database?
What are the availability requirements?
Then you need to decide how scalable you want it to be. Do you need to support hundreds, thousands, millions of simultaneous users? (Different degrees of scalability require different architectures, and different implementation approaches.)
Once you have figured these things out, then you decide whether there is an existing framework that can cope with the level of traffic that you need to support. If not, you need to design your own system architecture to be scalable in the problem areas.

If you are able to work with a commercial system, then I'd suggest taking a look at Jazz Foundation at http://jazz.net. It's the base for IBM Rational's new products. The project is led by the guys that developed Eclipse within IBM before it was open-sourced. It has pluggable DB layer as well as supporting multiple App Servers. It's designed to handle clustering and multi-site type deployments. It has nice capabilities like OAuth support and License management.

In addition to the above:
Take a good look at JMS (Java Message Service). This is a much underrated technology. There are vendor solutions such as TIBCO EMS, Oracle etc. But there are also free stacks such as ActiveMQ.
JMS will allow you to build synchronous and asynchronous solutions using queues. You can choose to have persistent or non-persistent queues.

As others have already replied, scalability isn't about what framework you use. Sure, it is nice to squeeze out as much performance as possible from each node, but what you ideally want is that by adding another node you scale your app in a linear fashion.
The application should be architected in distinct layers so it is possible to add more power to different layers of the application without a rewrite, and also to add caching at different layers. Caching is key to achieving speed.
One example of layers for a big webapp:
Load balancers (TCP level)
Caching reverse proxies
CDN for static content
Front end webservers
Appservers (business logic of the app)
Persistent storage (RDBMS, key/value, document)
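As a rough illustration of caching at the appserver layer, here is a minimal sketch of a small in-process LRU cache placed in front of a slower lookup such as a database call. `ReadThroughCache` and its `loads` counter are names made up for this example; the sketch is not thread-safe and treats a `null` result as "not cached", so a real deployment would more likely use an existing cache library or a shared cache tier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Small LRU cache in front of a slower loader (illustrative; not thread-safe). */
class ReadThroughCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> entries;
    int loads = 0; // how often the slow path ran; exposed only for illustration

    ReadThroughCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            // Cache miss: fall through to the slow layer and remember the result.
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }
}
```

The same shape applies at other layers in the list: each one answers what it can locally and only passes misses down to the layer below.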

Java Process Servers Good Idea or Not?

Just want to shout out to the community to see what people's thoughts are on Java process servers in general.
IBM in particular tends to make a lot of noise about WebSphere Process Server. I can see the idea behind process servers if you're working in a web service world, but in practice are they really effective or are they just overkill?
BPEL is another closely linked technology that tends to get a lot of hype from IBM but I am yet to see an implementation in real life.
General thoughts welcome.

Some projects/companies do have complex business processes that involve many services, applications, human interactions for which using a BPM engine, its connectors, its modeling tools can be justified. But this is clearly not for everybody.
Now, to use IBM Process Server, you'll need a license, you'll need an app server to deploy it (at random, WebSphere), some (IBM) machines, maybe some expensive connectors, some licenses for the modeling tools, etc. So I'm not surprised that IBM makes noise about it (even if I don't really have the same feeling); selling such a solution must be a good deal for them (not even mentioning the consulting they will add to the bill).
And BPEL, which is a standardized language to describe flows as sequences of services consuming or producing XML messages, i.e. a generalization of BPM through XML and Web Services, is another building block that lets vendors push SOA a bit further, feeding the marketing soup. So, again, there is nothing surprising in the fact that software vendors try to promote it.
Conceptually, I don't think that BPM, BPEL, etc are bad ideas. But as I said, they are not for everybody. If they don't solve anything for you, then using them would be a bad idea. But this does not necessarily invalidate them as concepts.

IBM has multiple offerings now in this space.
The acquired Lombardi product and the heritage WPS are now merged as IBM Business Process Manager. There is also FileNet BPM, available from IBM, which is targeted at document-centric BPM solutions.
The Lombardi stack effectively uses BPMN while WPS uses BPEL as the orchestration mechanism.
The IBM/Oracle camp had chosen the BPEL path while the others like Appian, Lombardi, Pega etc had come in from using BPMN as the execution model for the business process.
Both of them are widely used and have a meaningful reason to exist.
HTH
Manglu

Should I invest in GraniteDS for Flex + Java development?

I'm new to Flex development, and RIAs in general. I've got a CRUD-style Java + Spring + Hibernate service on top of which I'm writing a Flex UI. Currently I'm using BlazeDS. This is an internal application running on a local network.
It's become apparent to me that the way RIAs work is more similar to a desktop application than a web application in that we load up the entire model and work with it directly on the client (or at least the portion that we're interested in). This doesn't really jibe well with BlazeDS because really it only supports remoting and not data management, thus it can become a lot of extra work to make sure that clients are in sync and to avoid reloading the model, which can be large (especially since lazy loading is not possible).
So it feels like what I'm left with is a situation where I have to treat my Flex application more like a regular old web application where I do a lot of fine grained loading of data.
LiveCycle is too expensive. The free version of WebOrb for Java really only does remoting.
Enter GraniteDS. As far as I can determine, it's the only free solution out there that has many of the data management features of LiveCycle. I've started to go through its documentation a bit and suddenly feel like it's yet another quagmire of framework that I'll have to learn just to get an application running.
So my question(s) to the StackOverflow audience is:
1) Do you recommend GraniteDS, especially if my current Java stack is Spring + Hibernate?
2) At what point do you feel like it starts to pay off? That is, at what level of application complexity do you feel that using GraniteDS really starts to make development that much better? In what ways?

If you're committed to Spring and don't want to introduce Seam then I don't think that Granite DS will give you much beyond Blaze DS. There is a useful utility that ensures only a single instance of any one entity exists in the client at any one time but it's actually pretty easy to do that with a few instances of Dictionary with weak references and some post-processing applied to the server calls. A lot of the other features are Seam-specific as alluded to here in the docs:
http://www.graniteds.org/confluence/display/DOC/6.+Tide+Data+Framework
Generally, the Tide approach is to minimize the amount of code needed to make things work between the client and the server. Its principles are very similar to the ones of JBoss Seam, which is the main reason why the first integration of Tide has been done with this framework. Integrations with Spring and EJB 3 are also available but are a little more limited.
I do however think that Granite's approach to data management is a big improvement over Livecycle's because they are indeed quite different. From the Granite docs:
All client/server interactions are done exclusively by method calls on services exposed by the server, and thus respect transaction boundaries and security defined by the remote services.
This is different to how LiveCycle DS uses "managed collections", where you invoke fill() to grab large swathes of data and then invoke commit() methods to persist changes en masse. This treats the backend like a raw data access API and starts to get complicated (or simply falls apart entirely) when you have fine-grained security requirements. Therefore I think Granite's approach is far more workable.
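For what it's worth, the "single instance of any one entity" trick mentioned at the top of this answer can be sketched as a small identity map holding weak references. The answer describes doing this on the Flex client with Dictionary; the version below is a Java analogue for illustration only, and `Customer` and `IdentityMap` are hypothetical names rather than anything from GraniteDS or BlazeDS.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity as it might arrive from a remote service call. */
class Customer {
    final long id;
    String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Keeps at most one live instance per entity id without preventing garbage collection. */
class IdentityMap {
    private final Map<Long, WeakReference<Customer>> byId = new HashMap<>();

    /**
     * Post-processing step for server results: returns the canonical instance for
     * incoming.id, copying the fresh state into it when one is already held.
     */
    Customer canonicalize(Customer incoming) {
        WeakReference<Customer> ref = byId.get(incoming.id);
        Customer existing = (ref == null) ? null : ref.get();
        if (existing == null) {
            // First sighting, or the old instance was collected: adopt the new one.
            byId.put(incoming.id, new WeakReference<>(incoming));
            return incoming;
        }
        // Already known: merge the new state into the instance the UI is bound to.
        existing.name = incoming.name;
        return existing;
    }
}
```

Running every server result through `canonicalize` means views bound to an entity see updates without having to be rebound to a new object.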

All data management features (serialization of JPA detached entities, client entity caching, data paging...) work with Spring.
GraniteDS does not mandate anything, you only need Seam if you want to use Seam on the server.

Actually, the free version of WebORB for Java does do data management. I've recently posted a comparison between WebORB for Java, LiveCycle DS, BlazeDS and GraniteDS. You can view this comparison chart here: http://bit.ly/d7RVnJ I'd be interested in your comments and feedback as we want this to be the most comprehensive feature comparison on the web.
Cheers,
Kathleen

Have you looked at the spring-blazeDS integration project?

GraniteDS with Seam Framework, Hibernate and MySql is a very nice combination. What I do is create the database, use seamgen to generate hibernate entities then work from there.

Propagation of Oracle Transactions Between C++ and Java

We have an existing C++ application that we are going to gradually replace with a new Java-based system. Until we have completely reimplemented everything in Java we expect the C++ and Java to have to communicate with each other (RMI, SOAP, messaging, etc - we haven't decided).
Now my manager thinks we'll need the Java and C++ sides to participate in the same Oracle DB transaction. This is related to, but different from, the usual distributed transaction problem of having a single process co-ordinate 2 transactional resources, such as a DB and a message queue.
I think propagating a transaction across processes is a terrible idea from a performance and stability point-of-view, but I am still going to be asked for a solution.
I am familiar with XA transactions and I've done some work with the JBoss Transaction Manager, but my googling hasn't turned up anything good on propagating an XA transaction between 2 processes.
We are using Spring on the Java side and their documentation explicitly states they do not provide any help with transaction propagation.
We are not planning on using a traditional Java EE server (for example: IBM WebSphere), which may have support for propagation (not that I can find any definitive documentation).
Any help or pointers on solutions is greatly appreciated.

There is an example on Laurent Schneider's blog of using the DBMS_XA package inside Oracle to permit multiple sessions to work in the same transaction. So it would be possible to have Java and C++ sessions participating in the same transaction without needing any sort of additional coordinator.
Alternately, you might consider using Workspace Manager. That was originally designed to support extremely long-running transactions (e.g. manipulating lots of spatial data for a proposed development). Essentially, you can create a workspace, which in your case would be roughly equivalent to a named transaction. Both the Java and C++ code could enter that workspace (from separate sessions) and both could manipulate and commit data in that workspace. When the transaction was complete, you could then merge the workspace to the LIVE workspace, which is equivalent to doing a commit in a normal transaction.
That said, I would strongly agree with your initial assessment that coordinating transactions between processes is very likely to be a bad idea from a performance, stability, simplicity, and maintenance standpoint. On the other hand, it may well be a legitimate business requirement depending on how the C++ code is going to be retired (i.e. whether it is possible to replace code in such a way that transactions can be either exclusively Java or exclusively C++).

I have been using Hazelcast messaging and distributed memory locks to solve some of these concerns; however, using such a tool would require that you redesign your software in those parts where you touch the same data. Hazelcast has both C++ and Java client docs.
Oracle also has a similar product called Oracle Coherence that may help you; see locking in the dev guide.
Also, the database contains an MQ system called Oracle Streams Advanced Queuing (transactional persistent queues) that might help you in some situations. Oracle AQ integrates well with Oracle triggers.
Additionally there is the Database Change Notification that may help you update caches or notify processes of updates, this can be used together with the Optimistic Offline Lock pattern.
See also Software transactional memory
Apache Zookeeper can also help you with distributed locking.
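The Optimistic Offline Lock pattern mentioned above boils down to a version check at write time. Here is a minimal in-memory sketch; `VersionedRow` and `VersionedStore` are hypothetical names for this illustration. Against a database, the same idea is usually expressed as an UPDATE whose WHERE clause includes the version that was read, followed by a check of the affected row count.

```java
import java.util.HashMap;
import java.util.Map;

/** Immutable snapshot of a row together with the version it was read at. */
class VersionedRow {
    final String value;
    final long version;

    VersionedRow(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** In-memory stand-in for a table with a version column (illustrative only). */
class VersionedStore {
    private final Map<String, VersionedRow> rows = new HashMap<>();

    synchronized void insert(String key, String value) {
        rows.put(key, new VersionedRow(value, 1));
    }

    synchronized VersionedRow read(String key) {
        return rows.get(key);
    }

    /** Succeeds only if nobody has changed the row since expectedVersion was read. */
    synchronized boolean update(String key, String newValue, long expectedVersion) {
        VersionedRow current = rows.get(key);
        if (current == null || current.version != expectedVersion) {
            return false; // stale read: the caller should re-read and retry or report a conflict
        }
        rows.put(key, new VersionedRow(newValue, expectedVersion + 1));
        return true;
    }
}
```

If the Java and C++ processes both read version 1 and both try to write, the first update wins and the second gets `false`, so no lock has to be held between the read and the write.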

I believe JBoss Transaction Manager supports 2PC transaction propagation across web service calls. You could, I suppose, integrate your systems that way, but the performance would stink.
