I'm writing a Java SE (Note, not Java EE) application using Hibernate, and I need to provide a different connection to Hibernate for each thread of execution. These connections must be pooled, and each one has at the very least different authentication and, possibly, a different JDBC URL. The connections will be re-used (as can be inferred from the pooling requirement).
What parts of Hibernate/C3P0/et al do I have to override? Can this be accomplished with those tools, or do I need to write my own pooling data source?
I think the best course of action would be creating a SessionFactory for each data source, each with its own (possibly pooled) connections - that's what eqbridges suggested in his answer.
Now, Hibernate does have a ConnectionProvider hook, so I suppose you could write an implementation that returns Connections to different data sources depending on the current thread of execution and some additional parameters. Theoretically, you could then have one SessionFactory instance using different connections to different databases, supplied by your custom ConnectionProvider implementation. But one SessionFactory holds quite a bit of data, and that data is used by Hibernate internally when opening a Session for a unit of work. Plus, there's a second-level cache associated with it as well.
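Purely for illustration, here is a rough sketch of the kind of thread-bound routing such a provider would delegate to. The ConnectionProvider SPI itself differs between Hibernate versions, so this deliberately omits it; all class, method, and key names below are made up:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

// Hypothetical helper a custom ConnectionProvider could delegate to: each
// thread binds a key, and connections come from the pooled DataSource
// registered under that key.
public class ThreadLocalDataSourceRouter {

    private static final ThreadLocal<String> CURRENT_KEY = new ThreadLocal<>();
    private final Map<String, DataSource> pools = new ConcurrentHashMap<>();

    public void register(String key, DataSource pooledDataSource) {
        pools.put(key, pooledDataSource);
    }

    public static void bind(String key) {
        CURRENT_KEY.set(key);
    }

    public static void unbind() {
        CURRENT_KEY.remove();
    }

    // A custom ConnectionProvider's getConnection() would call this.
    public Connection getConnection() throws SQLException {
        DataSource ds = pools.get(CURRENT_KEY.get());
        if (ds == null) {
            throw new SQLException("No DataSource bound to this thread");
        }
        return ds.getConnection();
    }
}
```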
Unfortunately, how the factory and the Sessions you open from it will behave in the face of such a provider is anybody's guess. It feels like a hack to me, and I doubt it was ever considered a viable use case for a SessionFactory. It could lead to all kinds of possibly very subtle bugs or data corruption.
On another note, be sure to measure the exact cost of creating multiple SessionFactories - it may not be as high as you think. Be sure to compare it with the cost of simply opening the needed JDBC connections. I don't know what kind of results you might get, but I think you should be sure about the performance before resorting to more hackish solutions.
You have two questions here:
Connections are not thread safe, so each thread must have its own connection. Since you're working with Hibernate, what your application sees is actually a Session obtained from a SessionFactory. To utilize this, you call the SessionFactory#getCurrentSession() method, and configure the current session context in hibernate.cfg.xml:
<property name="current_session_context_class">thread</property>
If you've properly configured connection pooling (using c3p0 or whatever pooling mechanism you favor) in hibernate.cfg.xml, then each thread will get a connection from that pool.
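With that setting, a typical per-thread unit of work looks roughly like this (a minimal sketch; what you do inside the transaction is up to you):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class WorkerTask implements Runnable {

    private final SessionFactory sessionFactory;

    public WorkerTask(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Override
    public void run() {
        // With the "thread" session context, getCurrentSession() binds the
        // Session to the calling thread, and commit()/rollback() closes it.
        Session session = sessionFactory.getCurrentSession();
        session.beginTransaction();
        try {
            // ... load/save entities here ...
            session.getTransaction().commit();
        } catch (RuntimeException e) {
            session.getTransaction().rollback();
            throw e;
        }
    }
}
```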
To work with the multiple data sources the application may need, you need to configure a separate SessionFactory for each JDBC URL you'd like to access. In your application you'll need some means of selecting which SessionFactory to use (e.g. a "client ID"); with that key you can manage the SessionFactory instances in a Map or some such data structure (in a Java EE app you'd get a reference from JNDI).
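For example, a simple registry keyed by client ID might look like this (a sketch; the per-client configuration resources are an assumption about how you keep each client's JDBC URL and credentials):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Illustrative registry: one SessionFactory per client/JDBC URL, built once
// at startup and looked up by client ID afterwards.
public class SessionFactoryRegistry {

    private final Map<String, SessionFactory> factories = new ConcurrentHashMap<>();

    public void register(String clientId, String configResource) {
        // e.g. "client-a.cfg.xml" holding that client's JDBC URL and credentials
        SessionFactory sf = new Configuration().configure(configResource).buildSessionFactory();
        factories.put(clientId, sf);
    }

    public SessionFactory lookup(String clientId) {
        SessionFactory sf = factories.get(clientId);
        if (sf == null) {
            throw new IllegalArgumentException("No SessionFactory registered for client: " + clientId);
        }
        return sf;
    }
}
```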
To summarize (and generalize): a SessionFactory is essentially a huge wrapper around a DataSource (and its attendant connection pool). It is read-only (and hence thread-safe), heavyweight and static, constructed once, and knows everything it needs to about a given DataSource.
A Session, on the other hand, is essentially a lightweight wrapper around a Connection. It is not thread-safe, often short-lived, and intended to be used and then thrown away.
Hope this helps!
Suppose that we have a businessLogic() method that does two things: it writes some information to a local cache and saves the same information to the DB using JDBC, so that the contents of the cache and the DB are always the same.
I know we can use Spring's JDBC DataSourceTransactionManager to automatically roll back the DB in case of an exception. However, how can we define a custom transaction manager that also rolls back the content of the cache in this case, so that the contents of the cache and the DB stay in sync?
Thanks all.
Gab's answer is right, except for the parts that aren't.
XA is indeed the standard way to coordinate update of multiple resources... except that where the cache is local i.e. in-process, it's not necessarily a resource.
A cache doesn't exactly 'implement JTA', it acts in one of two roles in the XA protocol, according to how it's deployed. It can be an XAResource, but that's usually only required where its lifecycle is distinct from that of the client process. For in-process use, it's more likely to be a Synchronization.
The key difference between these roles is: XAResource is fault-tolerant, but Synchronization is not. For a volatile cache that's in-memory with the client process, it's sufficient to rebuild the cache after a crash by querying the db. For a cache that's out of process, a client crash after the db tx commit but before the cache update would leave the cache out of sync, at least until it expired or was manually refreshed.
Depending on the cache implementation, there is no guarantee it will pick the right mode automatically. See the configuration reference for your chosen implementation e.g. https://infinispan.org/docs/stable/user_guide/user_guide.html#tx_sync_enlist
Spring isn't actually a JTA XA transaction manager either, though it does provide an abstraction layer over them. It's possible to use Spring to drive a database in non-XA mode, but then you have no standard hook for the cache Synchronizations and you need a proprietary interface instead. Or you can have the database do pseudo-XA via a one-phase resource adapter. Full-on 2PC is probably overkill for your use case.
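To make the Synchronization idea concrete, here is a rough sketch using Spring's TransactionSynchronizationManager (an assumption that you are inside a Spring-managed transaction; on Spring versions before 5.3 you would extend TransactionSynchronizationAdapter instead, since the interface methods only gained default implementations in 5.3). The cache write is deferred to afterCommit(), so a rollback never leaves the cache ahead of the database:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

public class CacheAfterCommitWriter {

    private final ConcurrentMap<String, Object> localCache = new ConcurrentHashMap<>();

    // Call this from code running inside a Spring-managed transaction.
    public void putAfterCommit(final String key, final Object value) {
        if (TransactionSynchronizationManager.isSynchronizationActive()) {
            TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
                @Override
                public void afterCommit() {
                    // Only reached if the surrounding transaction committed.
                    localCache.put(key, value);
                }
            });
        } else {
            // No transaction in progress: update the cache immediately.
            localCache.put(key, value);
        }
    }
}
```

This keeps the cache from ever running ahead of the database; if the process crashes between the commit and the cache write, the missing entry is simply rebuilt from the database, as described above.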
First of all, I believe that transaction management for the cache is redundant here. I advise you to update the cache only after the database-level transaction has committed successfully.
Most caching scenarios are perfectly acceptable if there is a small window between the update of an entity in the database and the update of its cached state.
If your case rules out any possibility of an outdated cache, then you probably have to avoid caching altogether or use something purpose-built for it, perhaps the same database that holds your original data, with transaction support. Otherwise you will have problems trying to maintain consistency between two different systems: the database level and the cache level. Most of the time the best you can achieve is eventual consistency, meaning you will still have windows of inconsistent state and only eventually will the data become consistent.
The standard way to deal with a transaction distributed among multiple resources is to use XA.
You must then access your database using an XA datasource and use a cache implementation that supports JTA, e.g. Ehcache.
I'm not very familiar with Spring Boot, but the transaction manager should manage transaction synchronization across both resources out of the box with the appropriate configuration (no need to override anything).
I am fairly new to Spring JDBC and I am now going to retrieve objects from the database which have associations to other objects (one-to-many, one-to-one...). I wonder what the proper way of doing this is. I have read this answer, Spring Framework JDBC DAO with agrgegation/composition, which basically recommends using an ORM framework; I won't do that because of performance concerns, and I find Spring JDBC quite pleasant to work with.
The original poster of that question showed an example of calling one repository/DAO method inside another DAO/repository class. That would have been my guess of how to do it too, but from what I understand you then use two different connections, and the number could grow if other repositories are involved as well. Is this bad even when using the connection pooling provided by GlassFish?
I am not sure I understand the answer given to that question either, nor whether it is the proper way of doing it.
Spring JDBC always uses the same connection within the scope of a transaction, so you should not worry about the number of connections; you only need to ensure that loading the object happens within a single transaction.
See DataSourceUtils.doGetConnection() if you are interested in how connections are retrieved from the data source.
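For example (a sketch; the table names are made up, and it assumes a DataSourceTransactionManager with annotation-driven transactions is configured), both queries below run on the same pooled connection because they execute inside a single transaction:

```java
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderQueryService {

    private final JdbcTemplate jdbc;

    public OrderQueryService(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource);
    }

    @Transactional(readOnly = true)
    public Map<String, Object> loadOrderWithItems(long orderId) {
        // Both calls go through DataSourceUtils and therefore share the
        // transaction-bound Connection; no second connection is checked out.
        Map<String, Object> order = jdbc.queryForMap(
                "SELECT * FROM orders WHERE id = ?", orderId);
        List<Map<String, Object>> items = jdbc.queryForList(
                "SELECT * FROM order_items WHERE order_id = ?", orderId);
        order.put("items", items);
        return order;
    }
}
```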
I'm using plain-JDBC-Database-Access in a multithreaded environment.
An exception I recently got when working with PreparedStatements (the Oracle flavour) made me aware of the fact that they are not thread-safe.
There is, of course, always the possibility to use ThreadLocal-Variables (or synchronize access to the statement), but is there a more clever way to access a database in a multithreaded way?
Edit: To simplify the problem, I'm accessing the database read-only so parallel transactions are no concern to me.
Putting the PreparedStatement into a ThreadLocal will not solve the problem on its own - the Connection must be put into a ThreadLocal as well. But then you must make sure that the connection is also released properly, even when exceptions are thrown.
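If you do go the ThreadLocal route, the release handling has to look something like this (an illustrative sketch, not a complete framework):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Each thread gets its own Connection; the thread must call release()
// in a finally block when its work is done.
public class ThreadLocalConnectionHolder {

    private final DataSource dataSource;
    private final ThreadLocal<Connection> current = new ThreadLocal<>();

    public ThreadLocalConnectionHolder(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public Connection get() throws SQLException {
        Connection c = current.get();
        if (c == null || c.isClosed()) {
            c = dataSource.getConnection();
            current.set(c);
        }
        return c;
    }

    public void release() {
        Connection c = current.get();
        current.remove();
        if (c != null) {
            try {
                c.close(); // returns the connection to the pool
            } catch (SQLException ignored) {
                // nothing sensible to do here
            }
        }
    }
}
```

Every caller then has to wrap its work in try { ... } finally { holder.release(); }, which is exactly the bookkeeping that is easy to get wrong.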
And what about transactions? How do you make sure that one transaction does not contain work from independent threads?
The best way would be to adopt some patterns from EJB containers - there the infrastructure takes care of resource and transaction management and connection pooling. But retrofitting existing code onto EJB, or even Spring, correctly is not an easy task.
Just use the ThreadLocal; it's the simplest way.
You could also delegate all database access to a single thread (through a blocking queue), which will also eliminate races. This allows for easier batching of many statements as well (though that only works for updates; a query requires the pending updates to be flushed first).
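A sketch of that idea using a single-threaded executor (which is just a blocking queue plus one worker thread under the hood); the table and column names are made up:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

// All JDBC work is funneled through one thread, so connections and
// statements are never shared between threads.
public class SingleThreadDbGateway implements AutoCloseable {

    private final ExecutorService dbThread = Executors.newSingleThreadExecutor();
    private final DataSource dataSource;

    public SingleThreadDbGateway(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public Future<String> findNameById(final long id) {
        return dbThread.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                try (Connection c = dataSource.getConnection();
                     PreparedStatement ps = c.prepareStatement(
                             "SELECT name FROM users WHERE id = ?")) {
                    ps.setLong(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        return rs.next() ? rs.getString(1) : null;
                    }
                }
            }
        });
    }

    @Override
    public void close() {
        dbThread.shutdown();
    }
}
```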
If we have a web application which has:
heavy UI (Spring MVC + jQuery with JSON)
Hibernate with JPA annotations as the domain model
Spring-provided DAO support classes extended to code the DAO layer
JBoss as the app server with Oracle as the backend
DataSource (JNDI) based connection pooling (not XA, rather a local data source)
access to multiple data sources (dealing with multiple DBs)
behaviorally, a lot of data retrieval (70%) with updates being 30%
What would be the best practices for the following, so as to consume DB connections effectively and ensure there is no leakage in connection usage?
Would it be better to opt for HibernateTemplate-based DAOs?
What kind of transaction manager would be suggested, and should we go for AOP-based transaction management?
Where should sessions be instantiated and where should they be closed, to consume connections from the connection pool effectively?
It is true that we need to handle transactions from the service layer, but what happens to the sessions - would they be kept open for a long time (we are not using any OpenSessionInViewFilter)?
Which layer is better for handling checked exceptions (business exceptions) and runtime exceptions?
Sorry for the somewhat lengthy question; however, I see that this is a common query and I tried to consolidate it. Appreciate your patience and guidance. Thanks for your help.
This sounds like a pretty typical Spring/Hibernate application, so I would recommend following current best practices, which I recently outlined in another answer. Specifically:
Do not extend Spring DAO support classes or use HibernateTemplate. Use the @Repository annotation combined with component scanning, and directly inject the SessionFactory into your DAO.
Use Spring's HibernateTransactionManager, and definitely use declarative transaction management via @Transactional as your default approach (a brief sketch follows at the end of this answer).
Let Spring manage that. It will open sessions just in time for transactions by default, but prefer the open session in view pattern enabled by Spring's OpenSessionInViewFilter.
See #3.
Always handle exceptions where they should be handled--in other words, this is a design decision. Note, however, that the Spring transaction framework by default rolls back on unchecked exceptions, but not checked, to match the behavior of the EJB spec. Make sure to set the proper rollback rules (see previous link) anywhere you use checked exceptions.
Additionally, use a connection pool, obviously. Apache Commons DBCP is a great choice. "Not much leakage in connection usage" isn't enough. You have to have zero connection leakage. Depending on Spring to manage your resources will help ensure this. As for any other performance issues, don't try to optimize prematurely. Wait until you see where your problem areas are, and then figure out the best way to solve each one individually. Since your bottlenecks will most likely be database-related, check out the performance chapter of the Hibernate reference to get an idea what you're up against. It covers the important concepts of caching and fetching strategies.
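Putting points 1 and 2 together, a minimal sketch (bean names are placeholders, and the lookup is kept generic so no particular entity is assumed):

```java
import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// GenericDao.java
@Repository
public class GenericDao {

    private final SessionFactory sessionFactory;

    @Autowired
    public GenericDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public Object find(Class<?> entityType, Long id) {
        // getCurrentSession() joins the transaction started by the service layer.
        return sessionFactory.getCurrentSession().get(entityType, id);
    }
}

// QueryService.java
@Service
public class QueryService {

    @Autowired
    private GenericDao genericDao;

    @Transactional(readOnly = true)
    public Object load(Class<?> entityType, Long id) {
        return genericDao.find(entityType, id);
    }
}
```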
Use the JPA EntityManager directly in your DAOs. Be sure not to mark it as Extended.
Prefer <tx:annotation-driven /> and @Transactional - only on the service layer.
The transaction manager also opens and closes sessions (if one doesn't already exist in the thread). Here it is good to know that sessions follow a session-per-request pattern: each request (= thread) has a separate session instance. But a database connection is created only if one is needed, so even if there is a transaction manager around all methods, needless connections won't be opened.
Read-only transactions - use @Transactional(readOnly=true) in cases where there is only data retrieval.
Caching - utilize the Hibernate second-level cache to keep entities in memory (instead of fetching them from the database each time).
Avoid OpenSessionInView and lazy collections. This is subjective, but in my opinion all objects that leave the service layer must be initialized. For small collections (e.g. a list of roles) you can use eager fetching. For bigger collections use HQL queries.
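A brief sketch of the first two points (bean names are placeholders; the DAO is kept generic so no particular entity is assumed):

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// JpaGenericDao.java
@Repository
public class JpaGenericDao {

    // Default, transaction-scoped persistence context (not EXTENDED).
    @PersistenceContext
    private EntityManager entityManager;

    public <T> T find(Class<T> entityType, Object id) {
        return entityManager.find(entityType, id);
    }
}

// ReadService.java: transactions are declared only on the service layer.
@Service
public class ReadService {

    @Autowired
    private JpaGenericDao dao;

    @Transactional(readOnly = true)
    public <T> T load(Class<T> entityType, Object id) {
        return dao.find(entityType, id);
    }
}
```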
Our design has one jvm that is a jboss/webapp (read/write) that is used to maintain the data via hibernate (using jpa) to the db. The model has 10-15 persistent classes with 3-5 levels of depth in the relationships.
We then have a separate jvm that is the server using this data. As it is running continuously we just have one long db session (read only).
There is currently no intra-jvm cache involved - so we manually signal one jvm from the other.
Now when the webapp changes some data, it signals the server to reload the changed data. What we have found is that we need to tell hibernate to purge the data and then reload it. Just doing a fetch/merge with the db does not do the job - mainly in respect of the objects several layers down the hierarchy.
Any thoughts on whether there is anything fundamentally wrong with this design, or whether anyone doing something similar has had better luck working with Hibernate on the reloads?
Thanks,
Chris
A Hibernate Session loads all data it reads from the DB into what is called the first-level cache. Once a row is loaded from the DB, any subsequent fetch for a row with the same PK will return the data from this cache. Furthermore, Hibernate guarantees reference equality for objects with the same PK within a single Session.
From what I understand, your read-only server application never closes its Hibernate Session. So when the DB gets updated by the read-write application, the Session on the read-only server is unaware of the change. Effectively, your read-only application is loading an in-memory copy of the database and using that copy, which goes stale in due course.
The simplest and best course of action I can suggest is to close and open Sessions as needed. This sidesteps the whole problem. Hibernate Sessions are intended to be a window for a short-lived interaction with the DB. I agree that there is a performance gain from not reloading the object graph again and again, but you need to measure it and convince yourself that it is worth the pain.
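For example, a reload in the read-only server could be as simple as this (a sketch; the lookup is kept generic so no particular entity is assumed):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;

// On each "data changed" signal, read with a fresh, short-lived Session
// instead of keeping one Session (and its first-level cache) open forever.
public class Reloader {

    private final SessionFactory sessionFactory;

    public Reloader(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public Object reload(Class<?> entityType, long id) {
        Session session = sessionFactory.openSession();
        try {
            // A new Session has an empty first-level cache, so this read
            // always reflects the current state of the database.
            return session.get(entityType, id);
        } finally {
            session.close();
        }
    }
}
```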
Another option is to close and reopen the Session periodically. This ensures that the read-only application works with data not older than a given time interval. But there definitely is a window where the read-only application works with stale data (although the design guarantees that it gets the up-to-date data eventually). This might be permissible in many applications - you need to evaluate your situation.
The third option is to use a second-level cache implementation together with short-lived Sessions. Various caching packages work with Hibernate, each with its relative merits and demerits.
Chris, I'm a little confused about your circumstances. If I understand correctly, you have both a web app (read/write) and a standalone application (read-only?) using Hibernate to access a shared database. The changes you make with the web app aren't visible to the standalone app. Is that right?
If so, have you considered using a different second-level cache implementation? I'm wondering if you might be able to use a clustered cache that is shared by both the web application and the standalone application. I believe that SwarmCache, which is integrated with Hibernate, will allow this, but I haven't tried it myself.
In general, though, you should know that the contents of a given cache will never be aware of activity by another application (that's why I suggest having both apps share a cache). Good luck!
From my point of view, you should change your underlying Hibernate second-level cache to one that supports clustered mode, such as JBoss Cache or SwarmCache. The former has better support for data synchronization (replication and invalidation) and also supports JTA.
Then you will be able to configure cache synchronization between the webapp and the server. Also look at the isolation level if you use JBoss Cache; I believe you should use READ_COMMITTED mode if you want the server to see new data from the same session.
The most common practice is to have a container-managed EntityManager so that two or more applications in the same container (i.e. GlassFish, Tomcat, WebSphere) can share the same caches.
But if you don't use an application container, because you use Play! for instance, then I would build some web services in the primary application to read and write consistently through the cache.
I think using stale data is an open door for disaster. Just as Singletons become Multitons, read-only applications often turn out to write sometimes.
Belt and braces :)