Propagation of Oracle Transactions Between C++ and Java

We have an existing C++ application that we are going to gradually replace with a new Java-based system. Until we have completely reimplemented everything in Java, we expect the C++ and Java sides to have to communicate with each other (RMI, SOAP, messaging, etc. - we haven't decided).
Now my manager thinks we'll need the Java and C++ sides to participate in the same Oracle DB transaction. This is related to, but different from, the usual distributed transaction problem of having a single process coordinate two transactional resources, such as a DB and a message queue.
I think propagating a transaction across processes is a terrible idea from a performance and stability point of view, but I am still going to be asked for a solution.
I am familiar with XA transactions and I've done some work with the JBoss Transaction Manager, but my googling hasn't turned up anything good on propagating an XA transaction between 2 processes.
We are using Spring on the Java side and their documentation explicitly states they do not provide any help with transaction propagation.
We are not planning on using a traditional Java EE server (for example: IBM Websphere), which may have support for propagation (not that I can find any definitive documentation).
Any help or pointers on solutions is greatly appreciated.

There is an example on Laurent Schneider's blog of using the DBMS_XA package inside Oracle to permit multiple sessions to work in the same transaction. So it would be possible to have Java and C++ sessions participating in the same transaction without needing any sort of additional coordinator.
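A minimal sketch of that approach from JDBC, loosely following the pattern in the blog post (the XID value 42, connection details, and table are all invented; the sessions need EXECUTE on DBMS_XA, plus further grants if they run as different users):

    import java.sql.*;

    // Two sessions sharing one Oracle transaction via DBMS_XA.
    public class SharedTxSketch {
        static final int XA_OK = 0; // DBMS_XA.XA_OK

        static void xa(Connection con, String call) throws SQLException {
            try (CallableStatement cs = con.prepareCall("BEGIN ? := " + call + "; END;")) {
                cs.registerOutParameter(1, Types.INTEGER);
                cs.execute();
                if (cs.getInt(1) != XA_OK) throw new SQLException("XA rc=" + cs.getInt(1));
            }
        }

        public static void main(String[] args) throws Exception {
            Connection s1 = DriverManager.getConnection("jdbc:oracle:thin:@//db:1521/ORCL", "app", "secret");
            Connection s2 = DriverManager.getConnection("jdbc:oracle:thin:@//db:1521/ORCL", "app", "secret");

            // Session 1 (this could just as well be the C++ process using OCI):
            xa(s1, "DBMS_XA.XA_START(DBMS_XA_XID(42), DBMS_XA.TMNOFLAGS)");
            s1.createStatement().executeUpdate("INSERT INTO orders(id) VALUES (1)");
            xa(s1, "DBMS_XA.XA_END(DBMS_XA_XID(42), DBMS_XA.TMSUSPEND)"); // detach, don't commit

            // Session 2 picks up the same transaction by its XID:
            xa(s2, "DBMS_XA.XA_START(DBMS_XA_XID(42), DBMS_XA.TMRESUME)");
            s2.createStatement().executeUpdate("UPDATE orders SET status = 'DONE' WHERE id = 1");
            xa(s2, "DBMS_XA.XA_END(DBMS_XA_XID(42), DBMS_XA.TMSUCCESS)");
            xa(s2, "DBMS_XA.XA_COMMIT(DBMS_XA_XID(42), TRUE)"); // one-phase commit

            s1.close();
            s2.close();
        }
    }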
Alternately, you might consider using Workspace Manager. That was originally designed to support extremely long-running transactions (e.g. manipulating lots of spatial data for a proposed development). Essentially, you can create a workspace, which in your case would be roughly equivalent to a named transaction. Both the Java and C++ code could enter that workspace (from separate sessions) and both could manipulate and commit data in that workspace. When the transaction was complete, you could then merge the workspace to the LIVE workspace, which is equivalent to doing a commit in a normal transaction.
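Roughly, the flow looks like the sketch below (workspace, table, and connection details are invented; the table must be version-enabled once up front with DBMS_WM.EnableVersioning):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // One session's side of the Workspace Manager flow; the C++ session
    // would GotoWorkspace('BATCH_42') from its own connection and commit
    // its part before anyone merges.
    public class WorkspaceSketch {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//db:1521/ORCL", "app", "secret");
                 Statement st = con.createStatement()) {
                con.setAutoCommit(false);
                st.execute("BEGIN DBMS_WM.CreateWorkspace('BATCH_42'); END;");
                st.execute("BEGIN DBMS_WM.GotoWorkspace('BATCH_42'); END;");
                st.executeUpdate("UPDATE orders SET status = 'DONE' WHERE id = 1");
                con.commit(); // visible inside the workspace only; LIVE is untouched
                // ... later, when every participant is done:
                st.execute("BEGIN DBMS_WM.GotoWorkspace('LIVE'); END;");
                st.execute("BEGIN DBMS_WM.MergeWorkspace('BATCH_42'); END;"); // publish to LIVE
            }
        }
    }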
That said, I would strongly agree with your initial assessment that coordinating transactions between processes is very likely to be a bad idea from a performance, stability, simplicity, and maintenance standpoint. On the other hand, it may well be a legitimate business requirement, depending on how the C++ code is going to be retired (i.e. whether it is possible to replace code in such a way that transactions can be either exclusively Java or exclusively C++).

I have been using Hazelcast messaging and distributed memory locks to solve some of these concerns; however, using such a tool would require that you redesign your software in those parts where you touch the same data. Hazelcast provides both C++ and Java client libraries (see their docs).
Oracle also has a similar product called Oracle Coherence that may help you; see locking in the dev guide.
Also, the database contains an MQ system called Oracle Streams Advanced Queuing (transactional persistent queues) that might help you in some situations. Oracle AQ integrates well with Oracle triggers.
Additionally, there is Database Change Notification, which may help you update caches or notify processes of updates; this can be used together with the Optimistic Offline Lock pattern, sketched below.
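For reference, the Optimistic Offline Lock pattern boils down to a version column that every writer checks; a minimal JDBC sketch (table and column names invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Read the row (and its version) up front, then make the UPDATE
    // conditional on the version being unchanged.
    public class OptimisticLockSketch {
        public static void updateOrder(Connection con, long id, String status,
                                       int expectedVersion) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE orders SET status = ?, version = version + 1 "
                    + "WHERE id = ? AND version = ?")) {
                ps.setString(1, status);
                ps.setLong(2, id);
                ps.setInt(3, expectedVersion);
                if (ps.executeUpdate() == 0) {
                    // Zero rows touched: someone changed the row since we read it.
                    throw new IllegalStateException("Concurrent modification of order " + id);
                }
            }
        }
    }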
See also Software transactional memory
Apache Zookeeper can also help you with distributed locking.
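ZooKeeper ships both Java and C client libraries, so both halves of the system can contend for the same lock. A minimal sketch on the Java side using the Curator recipes (connection string and lock path invented):

    import java.util.concurrent.TimeUnit;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ZkLockSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            InterProcessMutex lock = new InterProcessMutex(client, "/locks/order-1");
            if (lock.acquire(5, TimeUnit.SECONDS)) { // both JVM and C clients can contend
                try {
                    // ... touch the shared data here ...
                } finally {
                    lock.release();
                }
            }
            client.close();
        }
    }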

I believe the JBoss Transaction Manager supports 2PC transaction propagation across web service calls. You could, I suppose, integrate your systems that way, but the performance would stink.

Related

Second level cache for Java web app and its alternatives

Between page transitions in my web app, I use a Session object to store my objects.
I've heard there's a program called memcached, but there's no compiled version of it on the site; besides, some people think it has real disadvantages.
Now I wanna ask you.
What are alternatives, pros and cons of different approaches?
Is memcached painful for sysadmins to install? Is it difficult to embed it into the existing infrastructure from the perspective of a sysadmin?
What about using a database to hold temporary data between web app transitions?
Is it a normal practice?
What about using a database to hold temporary data between web app transitions? Is it a normal practice?
Databases do indeed have a cache already. A well-designed application should try to leverage it to reduce disk IO.
The database cache works at the data level. That's why other caching mechanisms can be used to address different levels. At the Java level, you can use Hibernate's 2nd-level cache, which can cache entities and query results. This can notably reduce the network IO between the app server and the database.
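For illustration, enabling the 2nd-level cache is mostly configuration plus marking the entities; a sketch assuming Ehcache as the provider (property and class names vary by Hibernate version):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // In hibernate.cfg.xml (names as of Hibernate 3.3 + Ehcache):
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.use_query_cache        = true
    //   hibernate.cache.region.factory_class   = net.sf.ehcache.hibernate.EhCacheRegionFactory

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // entity cached across sessions
    public class Product {
        @Id
        private Long id;
        private String name;
        // getters/setters omitted
    }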
Then you may want to address horizontal scalability, that is, adding servers to manage the load. In this case, the 2nd-level cache needs to be distributed across the nodes. This exists (see JBoss Cache), but can get slightly complicated to manage.
Distributed caches tend to work better if they use a simpler key/value scheme. That's what memcached is, but there are also other similar solutions. The biggest problem with distributed caches is the invalidation of outdated entries -- which can itself turn into a performance bottleneck.
Don't think that you can use a distributed cache as-is to make your performance problems vanish. Designing a scalable distributed architecture requires experience and is always a matter of trade-offs between what to optimize and what not to.
To come back to your question: for a regular application, there is IMHO no need for a distributed cache. Decent disk IO and network IO usually lead to decent performance.
EDIT
For non-persistent objects, you have several options:
The HttpSession. Objects need to implement Serializable. The exact way the session is managed depends on the container. In a cluster, the session is usually replicated to at least one other node, so that if one node crashes you still have a copy. Session affinity then routes requests to the server that has the session in memory. (See the sketch after this list.)
Distributed cache. A system like memcached may indeed make sense, but I don't know the details.
Database. You could of course dump any Serializable object in the database in a BLOB. Can be an option if the web servers are not as reliable as the database server.
Again, for a regular application, I would try to go as far as possible with the HttpSession.
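A minimal sketch of the HttpSession option (ShoppingCart is an invented example class; it must be Serializable so the container can replicate or passivate it):

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class CartServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            HttpSession session = req.getSession(true);
            ShoppingCart cart = (ShoppingCart) session.getAttribute("cart");
            if (cart == null) {
                cart = new ShoppingCart();
            }
            cart.add(req.getParameter("item"));
            // Re-set the attribute after mutation so a replicating
            // container notices the change.
            session.setAttribute("cart", cart);
            resp.sendRedirect("cart.jsp");
        }
    }

    class ShoppingCart implements Serializable {
        private final List<String> items = new ArrayList<String>();
        void add(String item) { items.add(item); }
    }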
How about Ehcache? It's an easy-to-use, pure-Java solution that plugs straight into Hibernate. As far as I remember, it's also supported by the major containers.
It's quite painless in my experience.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
This page should have everything that you need (hopefully!)

How to Call Java Code from MySQL?

I found an article from 2008 discussing how to call Java code from MySQL. There were a lot of caveats and disclaimers because the process involved working with an experimental branch of MySQL.
For a project I have in mind, it would be very useful to be able to access Java libraries within MySQL, analogous to Oracle's Java Stored Procedures. Does this capability now exist as a standard feature of MySQL? If not, what open source RDBMSs support something similar to Oracle's Java Stored Procedures?
PostgreSQL supports pluggable procedure languages, and a project exists to extend PostgreSQL with PL/Java as the language.
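With PL/Java, the function body is an ordinary static method that you then map to a SQL function; a sketch (class and function names invented, deployment via PL/Java's sqlj.install_jar / sqlj.set_classpath functions):

    // Compiled into a jar, loaded with sqlj.install_jar / sqlj.set_classpath,
    // then exposed to SQL roughly like:
    //   CREATE FUNCTION mol_weight(formula varchar)
    //     RETURNS double precision
    //     AS 'org.example.Chem.molWeight'
    //     LANGUAGE java;
    package org.example;

    public class Chem {
        // Called from SQL; must be public static with SQL-mappable types.
        public static double molWeight(String formula) {
            // Real parsing would go here; trivial placeholder:
            return formula == null ? 0.0 : formula.length();
        }
    }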
I don't recommend putting too much code in the RDBMS. Tools to develop, test, and debug code in the application layer are better than tools for code in the RDBMS.
Also many developers don't understand that code inside the RDBMS should obey transaction isolation. They try to send emails from triggers and so forth. I think code with side effects should be in the application layer, so you don't create phantom effects (e.g. an email may notify of a database change, even though the change was rolled back).
If you can use HSQLDB then you can call Java methods directly from SQL: http://hsqldb.org/doc/2.0/guide/sqlroutines-chapt.html#N1240C
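For example (class and routine names invented; syntax per the HSQLDB guide linked above):

    // Java side: any visible static method with SQL-mappable types.
    package org.example;

    public class StringUtil {
        public static String reverse(String s) {
            return s == null ? null : new StringBuilder(s).reverse().toString();
        }
    }

    // SQL side (HSQLDB SQL/JRT):
    //   CREATE FUNCTION reverse_str(s VARCHAR(1024)) RETURNS VARCHAR(1024)
    //     LANGUAGE JAVA DETERMINISTIC NO SQL
    //     EXTERNAL NAME 'CLASSPATH:org.example.StringUtil.reverse';
    //   SELECT reverse_str(name) FROM customers;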
I fully agree with Bill, but I can imagine business rules being stored (not processed) in the database. I'm thinking of Drools here. The engine would be in the application, but the rules could be in the database with a management front-end.
Such a beast would be interesting for scenarios where not only the parameters change, but also the formulas can change.
It is difficult to give good advice based on the limited information that you have provided so far. However:
... the example involves a graph-based data type (chemical structures) that can't be matched to a query using built-in MySQL functions. The Java library would convert the query and the contents of a text field into an in-memory object that can be matched. Keeping this logic in the DB layer would, for example, keep joins within the database, which seems like where they belong. That's the idea, at least.
I don't think I would use database-side Java in MySQL for this. Instead, I think I would consider the following options:
Use an object-relational mapping such as JDO or JPA (for example using Hibernate) to deal with the mapping between your graph-based data model and what the database provides. You don't necessarily have to use an RDBMS as the backend, but that is probably the best place to start ... unless you've already found that this is a performance issue.
Take another look at your data model and data access patterns. See if you can figure out some transformation that allows your application's main queries to be implemented as (efficient) table joins without resorting to server-side application logic.
If you do need to use server-side application logic (for performance reasons!), stick with the mechanisms supported by your RDBMS. For example, in Oracle you'd use PL/SQL, and in PostgreSQL you have a number of options. Be prepared to switch to a different RDBMS that better suits your application requirements.
I (personally) would avoid depending on an experimental branch of some database:
Consider what happens if the experimental branch is not merged back into the main branch. You would be stuck with your code base depending on a branch that is not supported, and is likely to stop being maintained and fizzle out.
Using a (currently) unsupported RDBMS branch will be an impediment to other folks who might want to use your software.
Now obviously, if the long term viability of your software is not a primary concern, you could choose to ignore this advice. But it probably matters to someone; e.g. your research supervisor.
I realise that this is quite an old article, but it bears updating. The ability to call Java from a database trigger is part of the "SQL Routines and Types for the Java Programming Language" (SQL/JRT) standard.
Read more about this on Wikipedia at https://en.wikipedia.org/wiki/SQL/JRT.
Amongst the compliant database engines are:
HyperSQL: http://hsqldb.org/
Oracle: https://www.oracle.com/database/

Should I invest in GraniteDS for Flex + Java development?

I'm new to Flex development, and RIAs in general. I've got a CRUD-style Java + Spring + Hibernate service on top of which I'm writing a Flex UI. Currently I'm using BlazeDS. This is an internal application running on a local network.
It's become apparent to me that the way RIAs work is more similar to a desktop application than a web application, in that we load up the entire model (or at least the portion we're interested in) and work with it directly on the client. This doesn't really jibe well with BlazeDS, because it only supports remoting and not data management; it can thus become a lot of extra work to make sure that clients stay in sync and to avoid reloading the model, which can be large (especially since lazy loading is not possible).
So it feels like what I'm left with is a situation where I have to treat my Flex application more like a regular old web application where I do a lot of fine-grained loading of data.
LiveCycle is too expensive. The free version of WebORB for Java really only does remoting.
Enter GraniteDS. As far as I can determine, it's the only free solution out there that has many of the data management features of LiveCycle. I've started to go through its documentation a bit and suddenly feel like it's yet another quagmire of framework that I'll have to learn just to get an application running.
So my question(s) to the StackOverflow audience is:
1) Do you recommend GraniteDS, especially if my current Java stack is Spring + Hibernate?
2) At what point do you feel like it starts to pay off? That is, at what level of application complexity do you feel that using GraniteDS really starts to make development that much better? In what ways?
If you're committed to Spring and don't want to introduce Seam, then I don't think that GraniteDS will give you much beyond BlazeDS. There is a useful utility that ensures only a single instance of any one entity exists in the client at any one time, but it's actually pretty easy to do that with a few instances of Dictionary with weak references and some post-processing applied to the server calls. A lot of the other features are Seam-specific, as alluded to here in the docs:
http://www.graniteds.org/confluence/display/DOC/6.+Tide+Data+Framework
Generally, the Tide approach is to minimize the amount of code needed to make things work between the client and the server. Its principles are very similar to the ones of JBoss Seam, which is the main reason why the first integration of Tide has been done with this framework. Integrations with Spring and EJB 3 are also available but are a little more limited.
I do however think that Granite's approach to data management is a big improvement over LiveCycle's, because they are indeed quite different. From the Granite docs:
All client/server interactions are done exclusively by method calls on services exposed by the server, and thus respect transaction boundaries and security defined by the remote services.
This is different to how LiveCycle DS uses "managed collections", where you invoke fill() to grab large swathes of data and then invoke commit() methods to persist the changes en masse. This treats the backend like a raw data access API and starts to get complicated (or simply falls apart entirely) when you have fine-grained security requirements. Therefore I think Granite's approach is far more workable.
All data management features (serialization of JPA detached entities, client entity caching, data paging...) work with Spring.
GraniteDS does not mandate anything, you only need Seam if you want to use Seam on the server.
Actually, the free version of WebORB for Java does do data management. I've recently posted a comparison between WebORB for Java, LiveCycle DS, BlazeDS and GraniteDS. You can view this comparison chart here: http://bit.ly/d7RVnJ I'd be interested in your comments and feedback as we want this to be the most comprehensive feature comparison on the web.
Cheers,
Kathleen
Have you looked at the Spring BlazeDS Integration project?
GraniteDS with the Seam Framework, Hibernate, and MySQL is a very nice combination. What I do is create the database, use seamgen to generate Hibernate entities, then work from there.

Terracotta and Hibernate Search

Does anyone have experience with using Terracotta with Hibernate Search to satisfy application Queries?
If so:
1. What magnitude of "object updates" can it handle? (How's the performance?)
2. What kind of performance do the queries have?
3. Is it possible to use Terracotta Hibernate Search without even having a backing database, to satisfy all "queries" in memory?
I am Terracotta's CTO. I spent some time last month looking at Hibernate Search. It is not built in a way to be clustered transparently by Terracotta. Here's why in a nutshell: Hibernate has a custom-built JMS replication of Lucene indexes across JVMs.
The basic idea in Search is that talking to local disk under Lucene works really well, whereas fragmenting or partitioning Lucene indexes across the network introduces so much latency as to make Lucene seem bad when it is not Lucene's fault at all. To that end, Hibernate Search doesn't rely on JBossCache or any in-memory partitioning / caching schemes, and instead relies on JMS and each JVM's local disk in order to provide up-to-date indexing across a cluster with simultaneous low latency. Then, the beauty of Hibernate Search is that standard Hibernate queries and more can be launched through Hibernate at these natural-language indexes on each machine.
At Terracotta it turns out we had a similar idea to Emmanuel's and built a SearchableMap product on top of Compass. Each machine gets its own Compass store, and the store is configured to spill to disk locally. Terracotta is used to create a multi-master writing capability where any JVM can add to the index, and the delta is sent through Terracotta to be replayed / reapplied locally to each disk. It works just like Hibernate Search, but with DSO as the networking protocol in place of JMS, and with Compass interfaces instead of the nice Hibernate ones.
I think we will support Hibernate Search, with help from JBoss (they would need to factor out the JMS implementation as pluggable), by the end of the year.
Now to your questions directly:
1. Object updates per second in Hibernate or SearchableMap should be quite high, because both send only deltas. In Hibernate's case it is a function of the JMS provider. In Terracotta it is scalable just by adding more Terracotta servers to the array.
2. Query performance in both is very fast. Local-memory performance in most cases. And if you need to page in from disk, it turns out most OSes do a good job and can respond to queries way faster than any network-based clustering can.
3. It will be, I think, once we get JBoss to factor out their JMS assumptions, etc.
Cheers,
--Ari
Since people on the Hibernate forums keep referring to this post, I feel the need to point out that while Ari's comments were correct at the beginning of 2009, we have been developing and improving a lot since then.
Hibernate Search provides a set of backend channels out of the box, like the already-mentioned JMS-based one and a more recent addition using JGroups, but we have also made it pretty easy to plug in alternative implementations or override parts of them.
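For reference, switching channels is essentially a one-property affair; a sketch using the property names from the Hibernate Search 4.x documentation:

    import java.util.Properties;

    public class SearchBackendConfig {
        // "lucene" is the local default; "jms" and "jgroupsMaster" /
        // "jgroupsSlave" are the clustered channels mentioned above.
        static Properties searchProperties() {
            Properties p = new Properties();
            p.setProperty("hibernate.search.default.directory_provider", "filesystem");
            p.setProperty("hibernate.search.default.indexBase", "/var/lucene/indexes");
            p.setProperty("hibernate.search.default.worker.backend", "jms"); // or "jgroupsMaster"
            return p;
        }
    }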
In addition to using a custom backend, since version 4 it's possible to replace the whole strategy: instead of changing only the backend implementation, you can use an IndexManager, which follows a different design and doesn't use a backend at all. At this time we have only two IndexManagers, but we're working on more alternatives; again, the idea is to provide nice implementations for the most common use cases.
It does have an Infinispan-based backend for very quick distribution of the index across different nodes, and it should be straightforward to contribute one based on Terracotta or any other clustering technology. More solutions are coming.

Distributed Processing: C++ equivalent of JTA

I'm developing a mission-critical solution where data integrity is paramount and performance a close second. If data gets stuffed up, it's gonna be cata$trophic.
So, I'm looking for the C/C++ version of JTA (Java Transaction API). Does anyone know of any C or C++ libraries that supports distributed transactions? And yes, I've googled it ... unsuccessfully.
I'd hate to be told that there isn't one and I'd need to implement the protocol specified by Distributed TP: The XA Specification.
Please help!
Edit (responding to kervin): If I need to insert records across multiple database servers and I need to commit them atomically, products like Oracle will have solutions for it. If I've written my own message queue server and I want to commit messages to multiple servers atomically, I'll need something like JTA to make sure that I don't stuff up the atomicity of the transaction.
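For reference, what I'm after is the contract that JTA's XAResource exposes in Java, which mirrors the xa_start / xa_end / xa_prepare / xa_commit functions of the XA specification. A sketch of the two-phase dance any C++ library would have to drive (resource setup and XID construction elided; a real transaction manager would also use distinct branch qualifiers per resource and handle XA_RDONLY votes, heuristics, and recovery):

    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    public class TwoPhaseSketch {
        // res1/res2 would come from XA-capable drivers,
        // e.g. XAConnection.getXAResource().
        static void run(XAResource res1, XAResource res2, Xid xid) throws Exception {
            res1.start(xid, XAResource.TMNOFLAGS);
            res2.start(xid, XAResource.TMNOFLAGS);
            // ... do work against both resources ...
            res1.end(xid, XAResource.TMSUCCESS);
            res2.end(xid, XAResource.TMSUCCESS);
            // Phase 1: everyone votes.
            if (res1.prepare(xid) == XAResource.XA_OK
                    && res2.prepare(xid) == XAResource.XA_OK) {
                // Phase 2: commit everywhere.
                res1.commit(xid, false);
                res2.commit(xid, false);
            } else {
                res1.rollback(xid);
                res2.rollback(xid);
            }
        }
    }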
Encina, DCE-RPC, TUXEDO, possibly CORBA (though I hesitate to suggest using CORBA), MTS (again, hmm).
These are the kind of things you want for distributed transaction processing.
Encina used to have a lot of good documentation for its DCE-based system.
There are hundreds. Seriously.
As far as general areas go, check out Service-Oriented Architecture; most of the new products are coming out in that area, e.g. RogueWave HydraSCA.
I would start with plain Rogue Wave Suite, then see if I needed an Enterprise Service Bus after looking at that design.
That probably depends a lot on your design requirements and budget.
Oracle Tuxedo is the 800-pound gorilla in this space and was actually the basis for much of the XA specification. It provides distributed transaction management and can handle hundreds of thousands of requests per second.
For more information: http://www.oracle.com/tuxedo
Also, if you like SCA (Service Component Architecture), there is an add-on product for Tuxedo called SALT that provides an SCA container for programming in C++, Python, Ruby, and PHP.
