How to access a PreparedStatement in a multi-threaded environment? - java

I'm using plain JDBC database access in a multithreaded environment.
An exception I recently got when working with PreparedStatements (the Oracle flavour) made me aware of the fact that they are not thread-safe.
There is, of course, always the possibility to use ThreadLocal variables (or to synchronize access to the statement), but is there a more clever way to access a database from multiple threads?
Edit: To simplify the problem: I'm accessing the database read-only, so parallel transactions are of no concern to me.

Putting the PreparedStatement into a ThreadLocal alone will not solve the problem - the Connection must be put into a ThreadLocal as well. But then you must make sure that the connection is also released properly, even when exceptions are thrown.
And what about transactions? How do you make sure that one transaction does not contain work from independent threads?
The best way would be to adopt some patterns of EJB containers - there, the infrastructure takes care of resource and transaction management and connection pooling. But retrofitting existing code into EJB or even Spring correctly is not an easy task.
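A minimal sketch of the per-thread approach, assuming a plain DriverManager setup (the URL, credentials and query are placeholders, not from the original post):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PerThreadDb {

    // One Connection per thread; any PreparedStatement must be created
    // on that same connection, so it is per-thread as well.
    private static final ThreadLocal<Connection> CONN = new ThreadLocal<Connection>() {
        @Override
        protected Connection initialValue() {
            try {
                return DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/db", "user", "pw");
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
    };

    public static String findName(long id) throws SQLException {
        PreparedStatement ps = CONN.get().prepareStatement("SELECT name FROM users WHERE id = ?");
        try {
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString(1) : null;
        } finally {
            ps.close(); // release the statement even when an exception is thrown
        }
    }
}

Note that nothing here ever closes the per-thread connections - exactly the lifecycle problem this answer warns about.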

Just use the ThreadLocal - it's the simplest way.
You could also delegate all database accesses to a single thread (through a blocking queue), which also eliminates races. This additionally allows for easier batching of many statements (though it only works for updates; a query requires the pending updates to be flushed first).
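A rough sketch of that single-writer idea (the task shape and SQL are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DbWriterThread extends Thread {
    private final BlockingQueue<long[]> tasks = new LinkedBlockingQueue<long[]>();
    private final Connection connection; // owned exclusively by this thread

    public DbWriterThread(Connection connection) {
        this.connection = connection;
    }

    // Called from any thread; the update is queued, not executed here.
    public void submit(long id, long newValue) throws InterruptedException {
        tasks.put(new long[] { id, newValue });
    }

    @Override
    public void run() {
        try {
            PreparedStatement ps = connection.prepareStatement("UPDATE counters SET value = ? WHERE id = ?");
            while (!Thread.currentThread().isInterrupted()) {
                long[] task = tasks.take(); // blocks until work arrives
                ps.setLong(1, task[1]);
                ps.setLong(2, task[0]);
                ps.executeUpdate();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Readers that need up-to-date data would first have to drain this queue, which is the flushing limitation mentioned above.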

Related

Manual locking of DB necessary even if I use Hibernate/PostgreSQL/JDBC?

Sorry to ask this in case it has been answered before, but I heard (from a potential fellow noob) that Hibernate has/had some kind of connection pool manager that also handles locking of the database. Now I read this was abolished in Hibernate 3, so I, as a noob, am very confused about what to use.
I have a PostgreSQL db with multiple clients that each use at most one db connection at any given time. I use JDBC but want to move to Hibernate.
So in case two concurrent update operations occur, I don't know whether this is handled correctly by the DBMS. I thought about locking a db table manually whenever someone operates on it, but there must be a better way.
I only use simple, single SQL statements, sometimes prepared statements. No big updates, just single-row updates.
Do you have any idea how this, generally, is to be solved? Is this even a problem?
This is too general for a truly useful answer, and I should really just close-vote it. But I'll try to help.
The connection pool has nothing to do with locking. The two are unrelated topics.
I think you're vaguely trying to refer to the optimistic concurrency control in Hibernate. This is an alternative strategy to normal row locking, with a different set of advantages and disadvantages.
See the Hibernate documentation for more information, and the Wikipedia article on optimistic concurrency control.
I also wrote a recent blog entry on this topic that may be useful.
Above all else, though, there's no substitute for actually understanding concurrency in the application and database. I very strongly recommend reading the PostgreSQL documentation chapter on concurrency control in detail.
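For a concrete taste of Hibernate's optimistic strategy, here is a minimal sketch using a version column (the entity is invented; this assumes annotation mappings):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {
    @Id
    private Long id;

    private long balance;

    // Hibernate increments this on every update and adds "AND version = ?"
    // to the UPDATE's WHERE clause; if another transaction changed the row
    // first, a StaleObjectStateException is thrown instead of silently
    // overwriting the concurrent update.
    @Version
    private int version;
}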

Converting sessions in Hibernate to plain JDBC connections

I am shifting back from Hibernate to plain JDBC in order to overcome the overhead incurred in using Hibernate. I want to know how to deal with the sessions associated with Hibernate. How should I convert back to plain JDBC so that all my sessions are replaced with JDBC connections? And please let me know if I am wrong in thinking that replacing a session with a connection converts the code back to plain JDBC, as I am not well versed in these concepts and don't know whether I am going about this the right way.
I have used Hibernate extensively in high-performance tasks, including batch insertion of millions of records. Your problem is not with Hibernate, but with the way you are using it.
Above all, do not use Hibernate as a persistent state manager; use it as a thin layer above the raw SQL and you won't complain about performance.
Always prefer StatelessSession - it works for everything you need except save operations (see the sketch after this list);
never use lazy fetching; use explicit joins for everything;
never fetch whole objects; use SELECT to fetch exactly what you need;
fetch as much as possible in a single statement; avoid n+1 selects at all costs;
for large result sets, never use list(); use iterate() or scroll().
The list goes on, but this is what I have come up with at this moment.
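A brief sketch of the StatelessSession-plus-scroll style recommended above (the query and entity name are illustrative; this assumes Hibernate 3's API):

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;

public class ReportRunner {
    public void run(SessionFactory sessionFactory) {
        // StatelessSession: no first-level cache, no dirty checking,
        // no lazy loading - a thin layer over JDBC.
        StatelessSession session = sessionFactory.openStatelessSession();
        try {
            ScrollableResults rows = session
                    .createQuery("select o.id, o.total from Order o")
                    .scroll(ScrollMode.FORWARD_ONLY); // stream, don't materialize a whole list
            while (rows.next()) {
                Long id = (Long) rows.get(0);
                // process one row at a time...
            }
            rows.close();
        } finally {
            session.close();
        }
    }
}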
As for your direct question, it depends on the application. If it is a Spring application, you will certainly want to use its declarative transaction management: a few lines of XML config, and you have an open DataSource in your DAO code ready to be used, with no management on your part.
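As a sketch of what the Spring route buys you (class names are illustrative; this assumes JdbcTemplate and annotation-driven transactions are configured):

import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class UserDao {
    private final JdbcTemplate jdbc;

    public UserDao(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource); // DataSource injected by Spring
    }

    @Transactional // Spring opens, commits or rolls back the transaction
    public void rename(long id, String name) {
        jdbc.update("UPDATE users SET name = ? WHERE id = ?", name, id);
    }
}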
If you are doing something more raw, then by all means use a connection pool library, such as the great BoneCP. You acquire connections from it and later return them to it, again with no explicit management.
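A quick sketch of the BoneCP usage pattern (URL and credentials are placeholders):

import java.sql.Connection;
import com.jolbox.bonecp.BoneCP;
import com.jolbox.bonecp.BoneCPConfig;

public class PoolExample {
    public static void main(String[] args) throws Exception {
        BoneCPConfig config = new BoneCPConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost/mydb"); // placeholder URL
        config.setUsername("user");
        config.setPassword("secret");

        BoneCP pool = new BoneCP(config);

        Connection connection = pool.getConnection(); // acquire from the pool
        try {
            // use the connection...
        } finally {
            connection.close(); // returns the connection to the pool, does not close it
        }

        pool.shutdown(); // on application exit
    }
}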
Lastly, if you really want a bare-bones, unsafe and non-scalable approach, then you can create connections directly from the JDBC driver. This approach is really only for schoolwork and it is not recommended even in the smallest of production-worthy projects.
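For completeness, the bare-bones variant is just this (driver URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class RawJdbc {
    public static Connection open() throws Exception {
        // Every call opens a brand-new physical connection: expensive,
        // with no pooling, no recovery, and no limit on open connections.
        return DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "user", "secret");
    }
}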
A Hibernate session is much more than a JDBC connection. It contains multiple such connections (usually managed via a JDBC connection pool which recycles Connection instances), a bunch of entities which are attached to and managed by said session, and other things as well (caching, etc.).
Removing Hibernate and doing everything with the JDBC API alone implies more than just replacing Hibernate Session instances with one or more JDBC connections and duplicating the Hibernate code in analogous JDBC API calls. If you only did that, you'd simply do a lot of work for nothing: you'd lose all of Hibernate's advantages (less verbose code, a higher level of abstraction, etc.) and gain nothing of JDBC's advantages (less heap memory used, fewer method calls (yes, even with Hibernate's Javassist magic, this still counts towards performance in some cases), finer-grained control of the database interactions, etc.).
My advice is to first really look into the problems your app has (apparently due to Hibernate) and, at least for the major ones, see whether you can optimize them without getting rid of Hibernate. Yes, Hibernate can become heavy and memory-hungry, but more often than not the performance issue comes from improper use of the framework. Are you sure you're fetching all the necessary associated entities in one query, or are you making Hibernate perform hidden joins or pseudo-joins in the background? Are you doing your data operations on the database side, or is some of that done in Java code after a more-generic-than-necessary Hibernate query is executed to fetch the data? Etc.
If you really need to get rid of Hibernate (maybe you need some very specific feature of your database which is not standard SQL and which Hibernate doesn't let you access, like MySQL's ability to import big amounts of data via a custom flat-file format), then make sure that whatever you're replacing it with (plain JDBC, or maybe some other ORM like EclipseLink) can tackle the issue and solve it in a more performant way. Doing a small POC to test this before you start overhauling your code can save you a ton of time.
While I strongly urge you to heed the advice of Marko and Shivan, you could use Hibernate to manage your connections/sessions/transactions and execute your SQL queries without much overhead being generated.
A quick Google search yielded this on executing SQL from a Hibernate session:
http://www.informit.com/guides/content.aspx?g=java&seqNum=575
While I agree with both of the earlier answers, if you truly want to go down the road of executing straight SQL, I would look into this option for two reasons.
1) Your sessions are already in place. If you don't have Hibernate load up all of your entities, I don't see how Hibernate would generate that much overhead.
2) If the problem is speed, and not overhead (which I have run into before), you can use this approach to quickly execute native SQL in your problem areas and keep all of Hibernate's ORM goodies in place.
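In Hibernate 3 terms that looks roughly like the following (the orders table and query are invented):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class NativeSqlExample {
    @SuppressWarnings("unchecked")
    public List<Object[]> loadBigOrders(SessionFactory sessionFactory) {
        Session session = sessionFactory.getCurrentSession();
        // Straight SQL through the existing session: connection handling
        // and transactions stay with Hibernate; each row comes back as Object[].
        return session.createSQLQuery("SELECT id, total FROM orders WHERE total > 1000")
                      .list();
    }
}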
All of that being said, I would also urge you to dig into the documentation for Hibernate. I have used Hibernate for several high-performance solutions with great success. While the nuances can be hard to grapple with in the beginning, the benefits of using Hibernate (or at least something that adheres to the JPA standard) far outweigh the cost of not doing so down the road, scalability-wise.

How to keep object instances synchronized

I am working with an object that serves as a database in my application. However, I need to have redundant copies of this database, so on init I create multiple (say 5) copies of the same object. (I am using Java for this, so any pointers to pre-existing libraries could be helpful as well.)
The object is a server that listens on a port for request for the information it is holding. This information may be updated by other entities via the same or a different port at any time.
My question is as follows:
Would a lock strategy work in this case? That is, every time an update is made in any instance, that instance contacts all other instances and passes on the update. During this time, all requests (read or update) from other entities are queued.
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck. What do you guys say? Is there a better way of doing this distributed synchronization?
What you're describing is a distributed cache. The big player in that space is currently Coherence though I believe JBoss Cache is catching up.
As for rolling your own: having seen the complexity in what superficially sounds like quite a simple problem, I wouldn't recommend it in a commercial setting, though it'd be a fun home project.
Are you talking about a distributed cache? Have you looked at ehcache?
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck.
It would be creating its own bottleneck. You'd be better off using an in-memory database like HSQLDB or an embedded database like SQLite.
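For reference, a minimal in-memory HSQLDB setup (the database name mydb and the table are arbitrary):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InMemoryDb {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver"); // needed only on pre-JDBC4 driver versions
        // "mem:" keeps the whole database in the JVM's heap; HSQLDB handles
        // locking and concurrent access internally.
        Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:mydb", "SA", "");
        Statement st = conn.createStatement();
        st.execute("CREATE TABLE entries (id INT PRIMARY KEY, payload VARCHAR(255))");
        st.execute("INSERT INTO entries VALUES (1, 'hello')");
        st.close();
        conn.close();
    }
}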
There is a lot more to distributed synchronization than it's possible to mention in a single answer. You have to worry about two-phase commits, network partitions, etc. I would advise you to look into an existing distributed DB solution combined with an n-tier Java EE architecture that includes load balancing.

How can I configure hibernate to use context-specific connection information?

I'm writing a Java SE (Note, not Java EE) application using Hibernate, and I need to provide a different connection to Hibernate for each thread of execution. These connections must be pooled, and each one has at the very least different authentication and, possibly, a different JDBC URL. The connections will be re-used (as can be inferred from the pooling requirement).
What parts of Hibernate/C3P0/et al do I have to override? Can this be accomplished with those tools, or do I need to write my own pooling data source?
I think the best course of action would be to create a SessionFactory for each data source, with possibly pooled connections - that's what eqbridges suggested in his answer.
Now, Hibernate does have a ConnectionProvider hook, so I suppose you could write an implementation that returns Connections to different data sources, depending on the current thread of execution and some additional parameters. Theoretically, you could then have one SessionFactory instance which uses different connections to different databases, supplied by your custom ConnectionProvider implementation. But one SessionFactory holds quite a bit of data, and that data is then used by Hibernate internally when opening a Session for a unit of work. Plus, there's a second-level cache associated with it as well.
Unfortunately, how the factory and the Sessions you open from it would behave in the face of such a provider is anybody's guess. It feels like a hack to me, and I doubt it was ever considered a viable use case for a SessionFactory. It could possibly lead to all kinds of possibly very subtle bugs or data corruption.
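To make the idea concrete anyway, a hypothetical routing provider could look like this, assuming Hibernate 3's org.hibernate.connection.ConnectionProvider contract (the per-thread DataSource binding is invented for illustration):

import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;
import javax.sql.DataSource;
import org.hibernate.HibernateException;
import org.hibernate.connection.ConnectionProvider;

public class ThreadRoutingConnectionProvider implements ConnectionProvider {
    // Hypothetical: each thread registers the DataSource it wants to use.
    private static final ThreadLocal<DataSource> CURRENT = new ThreadLocal<DataSource>();

    public static void bind(DataSource ds) { CURRENT.set(ds); }

    public void configure(Properties props) throws HibernateException { }

    public Connection getConnection() throws SQLException {
        return CURRENT.get().getConnection(); // route by current thread
    }

    public void closeConnection(Connection conn) throws SQLException {
        conn.close(); // returns the connection to the underlying pool
    }

    public void close() throws HibernateException { }

    public boolean supportsAggressiveRelease() {
        return false;
    }
}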
On another note, be sure to exactly measure the cost of creating multiple SessionFactories - it may not be as high as you think. Be sure to compare that with the cost of simply opening the needed JDBC connections. I don't know what kind of results you might get, but I think you should be sure about performance before you resort to more hackish solutions.
You have two questions here:
Connections are not thread-safe, so each thread must have its own connection. Since you're working with Hibernate, what your application sees is actually a Session obtained from a SessionFactory. To utilize this, call the SessionFactory#getCurrentSession() method and configure the current session context in hibernate.cfg.xml:
<property name="current_session_context_class">thread</property>
If you've properly configured connection pooling (using c3p0 or whatever pooling mechanism you favor) in hibernate.cfg.xml, then each thread will get a connection from that pool.
To maintain multiple data sources that the application may need to work with, you need to configure a separate SessionFactory for each JDBC URL you'd like to access. In your application you'll need some means of selecting which SessionFactory to use (e.g. a "client ID"); with this you can manage the SessionFactory instances in a Map or some such data structure (in a Java EE app you'd get a reference from JNDI).
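A sketch of that registry idea (the config file names and clientId keys are illustrative; this assumes one Hibernate config per data source):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class SessionFactoryRegistry {
    private static final Map<String, SessionFactory> FACTORIES =
            new ConcurrentHashMap<String, SessionFactory>();

    static {
        // One factory per data source, each built from its own config file
        // (different JDBC URL and credentials per file).
        FACTORIES.put("clientA", new Configuration().configure("clientA.cfg.xml").buildSessionFactory());
        FACTORIES.put("clientB", new Configuration().configure("clientB.cfg.xml").buildSessionFactory());
    }

    public static Session currentSession(String clientId) {
        // With current_session_context_class=thread, this session is
        // bound to the calling thread.
        return FACTORIES.get(clientId).getCurrentSession();
    }
}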
To summarize (and generalize): a SessionFactory is essentially a huge wrapper around a DataSource (and its attendant connection pool). It is read-only (and hence thread-safe), heavyweight and static, constructed once, and knows everything it needs to about a given DataSource.
A Session, on the other hand, is essentially a lightweight wrapper around a Connection. It is not thread-safe, often short-lived, and intended to be used and then thrown away.
Hope this helps!

Sharing a Java synchronized block across a cluster, or using a global lock?

I have some code that I want only one thread at a time to access. I know how to accomplish this using synchronized blocks or methods, but will that work in a clustered environment?
The target environment is WebSphere 6.0, with 2 nodes in the cluster.
I have a feeling that synchronized won't work, since each instance of the application on each node will have its own JVM, right?
What I am trying to do here is perform some updates to database records when the system is booted. It will look for any database records that are older than the version of the code, and perform specific tasks to update them. I only want one node to perform these upgrades, since I want to be sure that each work item is upgraded only once. Performance of these upgrades is not a big concern: they happen only at application startup, and they only really do anything when the code has changed since the last startup.
The database is DB2v9, and I am accessing it directly via JNDI (no ORM layer).
It has been suggested that a global lock might be the way to go here, but I'm not sure how to do that.
Does anyone have any pointers in this arena?
Thanks!
Yes, you are correct in that synchronized blocks won't work across a cluster. The reason is, as you stated, that each node has its own JVM.
There are ways, however, to get synchronized blocks to work in a cluster as they would work in a single-node environment. The easiest way is to use a product like Terracotta, which will handle the coordination of threads between different JVMs so that normal concurrency controls can be used across the cluster. There are many articles explaining how this works, like Introduction to OpenTerracotta.
There are other solutions, of course. It mostly depends on what you really want to achieve here. I wouldn't use database locks for synchronization if you need to scale, as the DB doesn't. But I really urge you to find a ready-made solution, because messing around with cluster synchronization is messy business :)
You are correct that synchronization across processes will not work using the Java synchronization constructs. Fortunately, your problem really isn't one of code synchronization, but rather of synchronizing interactions with the database.
The right way to deal with this problem is with database level locks. Presumably you have some table that contains a db schema version, so you should make sure to lock that table for the duration of the startup/upgrade process.
The precise SQL/DB calls involved would probably be clearer if you specified your database type (DB2?) and access method (raw SQL, JPA, etc.).
Update (8/4/2009 2:39PM): I suggest the LOCK TABLE statement on some table holding the version # of the schema. This will serialize access to that table, preventing two instances from running through the upgrade code at once.
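Sketched against DB2 via plain JDBC (the schema_version table name is assumed):

import java.sql.Connection;
import java.sql.Statement;

public class StartupUpgrade {
    public void upgradeIfNeeded(Connection conn) throws Exception {
        conn.setAutoCommit(false);
        Statement st = conn.createStatement();
        try {
            // DB2: exclusive lock, held until commit/rollback. A second
            // node blocks here until the first node's upgrade finishes.
            st.execute("LOCK TABLE schema_version IN EXCLUSIVE MODE");

            // Re-check the version *after* acquiring the lock; if another
            // node already upgraded, there is nothing left to do.
            // ... read version, run upgrade statements if stale ...

            conn.commit(); // releases the table lock
        } catch (Exception e) {
            conn.rollback();
            throw e;
        } finally {
            st.close();
        }
    }
}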
You can use an in-memory data grid like http://www.hazelcast.com/ for this too. It provides distributed data structures that support locking.
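A minimal sketch, assuming a Hazelcast version that exposes cluster-wide locks via getLock (the lock name is arbitrary):

import java.util.concurrent.locks.Lock;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ClusterUpgradeGuard {
    public void runOnce(Runnable upgrade) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // A cluster-wide lock: only one node in the whole cluster
        // can hold "schema-upgrade" at a time.
        Lock lock = hz.getLock("schema-upgrade");
        lock.lock();
        try {
            upgrade.run();
        } finally {
            lock.unlock();
        }
    }
}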
Since you are talking about 2 machines, you don't even have shared memory so there is nothing to synchronize.
We do something similar with our database, achieved by adding record versioning to the table. This is what you should do:
Add a column for the record/row version.
Go through the logic to check whether the record needs to be updated.
When you update a record, make sure the record version in the DB is the same as the one you read.
Bump up the version every time you write to the database.
You should only have one server updating the database if you follow these rules.
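A sketch of those rules in plain JDBC (table and column names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;

public class VersionedUpdate {
    // Returns true only if we updated the row we actually read:
    // the WHERE clause rejects rows another node changed first.
    public boolean update(Connection conn, long id, String data, int versionWeRead) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "UPDATE records SET data = ?, version = version + 1 WHERE id = ? AND version = ?");
        try {
            ps.setString(1, data);
            ps.setLong(2, id);
            ps.setInt(3, versionWeRead);
            return ps.executeUpdate() == 1; // 0 means someone else won the race
        } finally {
            ps.close();
        }
    }
}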
Couldn't you simply lock the table (or the entire db) for updates? When the first node obtains the lock, all other nodes would be unable to write; subsequent nodes would wait, and by the time the lock is released the records would already be updated, so no further update would be required.
