Hibernate performance [closed] - java

I have a question that may sound stupid: is Hibernate fast? I am using it in a system that will issue a really large number of queries against the database, and I am getting worried about performance.
And a second question, somewhat off topic: which is better - many simple queries (each against a single table) or somewhat fewer queries with several JOINs?
Thanks in advance,
Artem

From our experience Hibernate can be very fast. However, there can be many pitfalls that are specific to ORM tools including:
Incorrect lazy loading, where too much or too little data is selected
N+1 select problems, where iterating over collections is slow
Mapping collections as List rather than Set, which forces ordering information (an index column) to be stored in the table; prefer Set where order does not matter
Batch actions, where it's often best to fall back to direct SQL (a minimal sketch of the Hibernate-side batching pattern follows below)
The Improving Performance page in the Hibernate docs is a good place to start to learn about these issues and other methods to improve performance.
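To make the batching point concrete, here is a hedged sketch of the batch-insert pattern described in that Improving Performance chapter; the Customer entity and the batch size of 50 are hypothetical, and hibernate.jdbc.batch_size is assumed to be set to the same value in the configuration:
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Sketch only: flush and clear the session in step with the JDBC batch
// size so the first-level cache does not grow without bound.
void bulkInsert(SessionFactory sessionFactory) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    for (int i = 0; i < 100_000; i++) {
        session.save(new Customer("name-" + i));   // hypothetical entity
        if (i % 50 == 0) {
            session.flush();   // push the current batch to the database
            session.clear();   // detach the inserted objects
        }
    }
    tx.commit();
    session.close();
}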

First of all, there are many things you can do to speed up Hibernate. Check out these High-Performance Hibernate Tips article for a comprehensive list of things you can do to speed up your data access layer.
With "many" queries, you are meeting with the typical N+1 query problem. You load an Entity with Hibernate, which has related objects. With LAZY joins, you'll get a single query for every record. Each query goes through the network to your database server, and it returns the data. This all takes time (opening connection, network latency, lookup, data throughput, etc.).
For a single query (with joins) the lookup and the amount of data transferred are larger than for each of the individual queries, but you pay for opening the connection and the network latency only once. With 100 or more queries, each individual lookup and data transfer is small, but you pay the per-query overhead (connection handling and network latency) 100 times.
A single query that takes 20ms vs. 100 queries that take 1ms each? You do the math ;)
And the data can grow into thousands of records. The single query will only get slightly slower, but thousands of queries instead of hundreds means ten times more round trips, so with multiple queries performance degrades much more quickly.
When using HQL to retrieve the data, you can add FETCH to a JOIN so that the related data is loaded by the same query.
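For illustration, a hedged HQL sketch (assuming an open Session in scope; the Order/OrderLine entity names are hypothetical):
// Without "fetch", iterating o.getLines() would trigger one extra query
// per Order (the N+1 problem); with "join fetch" the lines come back in
// the same SQL statement.
List<Order> orders = session.createQuery(
        "select distinct o from Order o join fetch o.lines where o.year = :year",
        Order.class)
    .setParameter("year", 2015)
    .getResultList();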
For more info related to this topic, check out this Hibernate Performance Tuning Tutorial.

Hibernate can be fast. Designing a good schema, tuning your queries, and getting good performance is something of an art form. Remember, under the covers it's all SQL anyway, so anything you can do with SQL you can do with Hibernate.
Typically, in an advanced application, the Hibernate mapping/schema is only the initial step of writing your persistence layer. That step gives you a lot of functionality. But the next step is to write custom queries using HQL that allow you to fetch only the data you need.
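As a hedged example of that second step, an HQL projection query fetches only the columns you need instead of whole entities (the Customer entity and CustomerSummary DTO are hypothetical, and an open Session is assumed):
// Hypothetical projection: only two columns travel over the wire, and
// Hibernate skips full entity instantiation and dirty checking.
List<CustomerSummary> summaries = session.createQuery(
        "select new com.example.CustomerSummary(c.id, c.name) "
      + "from Customer c where c.active = true",
        CustomerSummary.class)
    .getResultList();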

Yes, it can be fast.
In the past I have seen several cases where people thought "aaaa, it's this stupid ORM that kills all the performance of our nice application".
In ALL of those cases, after profiling, we found other reasons for the problem (bad hashCode implementations for collection elements, regexes from hell, a database design made by the Mad Hatter, etc.).
It really can do the job in most common cases. If you are migrating huge and complex data sets, it will be a poor competitor to plain, well-optimized SQL (but I hope that's not your case - I personally hate data migration with a passion :)

This is not the first question about it, but I couldn't find an appropriate answer in my previous ones (perhaps it was for another "forum"). So, I'll answer once again :-)
I like to answer this in a somewhat provocative way (nothing personal!): do you think you'll be able to come up with a solution that is better than Hibernate? That involves not only the basic problems, like mapping database columns to Java properties and eager vs. lazy loading (which is the actual concern in your question), but also caching (first- and second-level), connection pooling, bytecode enhancement, batching, ...
That said, Hibernate is a complex piece of software, which requires some learning to use properly and fine-tune. So, I'd say that it's better to spend your time learning Hibernate than writing your own ORM.

Hibernate can be reasonably fast if you know how to use it that way. However, the polepos.org performance tests show that Hibernate can slow down applications by orders of magnitude.
If you want an ORM that is lighter and faster, I can recommend fjorm.

"... a system that will issue a really large number of queries against the database ..."
If you are still in the design/development phase, do not optimize preemptively.
Hibernate is a very well made piece of software, so don't let fear of performance issues drive your design up front. I would suggest waiting until your project is more mature, then looking into performance issues and analysing where direct JDBC usage is necessary.

It's usually fast enough, and can be faster than a custom JDBC-based solution. But as with all tools, you have to use it correctly.
Fast doesn't mean anything. You have to define maximum response time, minimum throughput, etc., then measure if the solution meets the requirements, and tune the app to meet them if it doesn't.
Regarding joins vs. multiple queries, it all depends. Usually, joins are obviously faster, since they require only one inter-process/network roundtrip.

Related

How to work with large database tables in java without suffering performance problems [closed]

We have a table of vocabulary items that we use to search text documents. The java program that uses this table currently reads it from a database, stores it in memory and then searches documents for individual items in the table. The table is brought into memory for performance reasons. This has worked for many years but the table has grown quite large over time and now we are starting to see Java Heap Space errors.
There is a brute force approach to solving this problem which is to upgrade to a larger server, install more memory, and then allocate more memory to the Java heap. But I'm wondering if there are better solutions. I don't think an embedded database will work for our purposes because the tables are constantly being updated and the application is hosted on multiple sites suggesting a maintenance nightmare. But, I'm uncertain about what other techniques are out there that might help in this situation.
Some more details: there are currently over a million vocabulary items (think of these items as short text strings, not individual words). The documents are read from a directory by our application, and then each document is analyzed to determine whether any of the vocabulary is present in the document. If it is, we note which items are present and store them in a database. The vocabulary itself is stored and maintained in an MS SQL relational database that we have been growing for years. Since all vocabulary items must be analyzed for each document, repeatedly reading from the database is inefficient. And the number of documents that need to be analyzed each day can, at some of our installations, be quite large (on the order of 100K documents a day). The documents are typically 2 to 3 pages long, although we occasionally see documents as large as 100 pages.
In the hopes of making your application more performant, you're taking all the data out of a database that is designed with efficient data operations in mind and putting it into your application's memory. This works fine for small data sets, but as those data sets grow, you are eventually going to run out of resources in the application to handle the entire dataset.
The solution is to use a database, or at least a data tier, that's appropriate for your use case. Let your data tier do the heavy lifting instead of replicating the data set into your application. Databases are incredible, and their ability to crunch through huge amounts of data is often underrated. You don't always get blazing fast performance for free (you might have to think hard about indexes and models), but few are the use cases where java code is going to be able to pull an entire data set down and process it more efficiently than a database can.
You don't say much about which database technologies you're using, but most relational databases offer a lot of useful tools for full-text searching. I've seen well-designed relational databases perform text searches very effectively. But if you're constrained by your database technology, or your table really is so big that a relational database text search isn't feasible, you should put your data into a searchable cache such as Elasticsearch. If you model and index your data effectively, you can build a very performant text-search platform that will scale reliably. Tom's suggestion of Lucene is another good one. There are a lot of cloud technologies that can help with this kind of thing too: S3 + Athena comes to mind, if you're into AWS.
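To sketch the "let the database do it" idea in the MS SQL context: this hedged example assumes the document text is also stored in a documents table with a full-text index on its body column (table, column and method names are hypothetical).
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Each vocabulary term becomes one indexed CONTAINS lookup instead of a
// scan in Java. Assumes a SQL Server full-text index on documents.body.
boolean documentContains(Connection connection, long documentId, String term) throws SQLException {
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT 1 FROM documents WHERE id = ? AND CONTAINS(body, ?)")) {
        ps.setLong(1, documentId);
        ps.setString(2, "\"" + term + "\"");   // quoted phrase search
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next();
        }
    }
}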
I'd look at http://lucene.apache.org it should be a good fit for what you've described.
I was having the same issue with a table of more than a million rows, and a client who wanted to export all that data. My solution was very simple: I followed this question. But there was a small issue: with more than 100k records I ran into heap space errors. So I just read the data in chunks, with my queries using WITH (NOLOCK) (I know this can return some inconsistent data, but I needed it because the export was blocking the DB without that hint). I hope this approach helps you.
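A hedged sketch of that chunking idea, in keyset style over a hypothetical numeric id column so no single query ever materialises the whole table (assumes an open Hibernate Session; an offset-based variant with setFirstResult would work too, and the NOLOCK hint above would only apply to a native SQL variant):
// Read the table in fixed-size chunks ordered by id. Entity, property and
// export() names are hypothetical.
long lastId = 0;
final int chunkSize = 10_000;
List<Record> chunk;
do {
    chunk = session.createQuery(
            "from Record r where r.id > :lastId order by r.id", Record.class)
        .setParameter("lastId", lastId)
        .setMaxResults(chunkSize)
        .getResultList();
    for (Record r : chunk) {
        export(r);
    }
    if (!chunk.isEmpty()) {
        lastId = chunk.get(chunk.size() - 1).getId();
    }
    session.clear();   // drop references so processed rows can be collected
} while (chunk.size() == chunkSize);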
When you had a small table, you probably implemented an approach of looping over the words in the table and for each one looking it up in the document to be processed.
Now the table has grown to the point where you have trouble loading it all in memory. I expect that the processing of each document has also slowed down due to having more words to look up in each document.
If you flip the processing around, you have more opportunities to optimize this process. In particular, to process a document you first identify the set of words in the document (e.g., by adding each word to a Set). Then you loop over each document word and look it up in the table. The simplest implementation simply does a database query for each word.
To optimize this, without loading the whole table in memory, you will want to implement an in-memory cache of some kind. Your database server will actually automatically implement this for you when you query the database for each word; the efficacy of this approach will depend on the hardware and configuration of your database server as well as the other queries that are competing with your word look-ups.
You can also implement an in-memory cache of the most-used portion of the table. You will limit the size of the cache based on how much memory you can afford to give it. All words that you look up that are not in the cache need to be checked by querying the database. Your cache might use a least-recently-used eviction strategy so that you keep the most common words in the cache.
While you could cache only the words that actually exist in the table, you might achieve better performance if you cache the result of each lookup, positive or negative. That way your cache ends up holding the most common words that show up in the documents, each with a boolean value that indicates whether the word is or is not in the table.
There are several really good open source in-memory caching implementations available in Java, which will minimize the amount of code you need to write to implement a caching solution.
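A hedged sketch of that lookup-result cache using plain java.util.LinkedHashMap as a small LRU (the size limit is arbitrary, and VocabularyDao is a hypothetical single-row lookup); libraries such as Guava or Caffeine offer more robust equivalents:
import java.util.LinkedHashMap;
import java.util.Map;

// Caches the boolean "is this word in the vocabulary table?" answer for
// the most recently used words; misses fall through to a database lookup.
class VocabularyCache {
    private static final int MAX_ENTRIES = 100_000;   // tune to available heap

    private final Map<String, Boolean> cache =
        new LinkedHashMap<String, Boolean>(1024, 0.75f, true) {   // access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > MAX_ENTRIES;   // evict the least recently used
            }
        };

    private final VocabularyDao dao;   // hypothetical DAO with existsInTable(word)

    VocabularyCache(VocabularyDao dao) {
        this.dao = dao;
    }

    synchronized boolean contains(String word) {
        Boolean hit = cache.get(word);
        if (hit == null) {
            hit = dao.existsInTable(word);   // single-row query on a miss
            cache.put(word, hit);
        }
        return hit;
    }
}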

Calculation on query vs programmatically

I'm working on Java EE projects using Hibernate as the ORM, and I have reached a phase where I have to perform some mathematical calculations on my classes, like SUM, COUNT, addition and division.
I have two solutions:
Select my classes and apply those operations programmatically in my code
Do the calculations in my named queries
In terms of performance and speed, which one is better?
And thank you.
If you are going to load the same entities that you want to do the aggregation on from the database in the same transaction, then the performance will be better if you do the calculation in Java.
It saves you one round-trip to the database, because in that case you already have the entities in memory.
Other benefits are:
Easier to unit-test the calculation because you can stick to a Java-based unit testing framework
Keeps the logic in one language
Will also work for collections of entities that haven't been persisted yet
But if you're not going to load the same set of entities that you want to do the calculation on, then you will get a performance improvement in almost any situation if you let the database do the calculation. The more entities are involved, the bigger the performance benefit.
Imagine doing a summation over all line items in this year's orders, perhaps several million of them.
It should be clear that having to load all these entities into the memory of the Java process across a TCP connection (even if it is within the same machine) first will take more time, and more memory, than letting the database perform the calculation.
And if your mapping requires additional queries per entity, then Hibernate would have at least one extra round-trip to the database for every entity, in which case the performance benefits of calculating things in SQL on the database would be even bigger.
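For the database-side case, a hedged HQL sketch (entity and property names hypothetical, open Session assumed); only the single aggregated value crosses the wire:
// Sum several million line items without loading a single entity into
// the Java process.
Long total = session.createQuery(
        "select sum(li.quantity) from OrderLine li where li.order.year = :year",
        Long.class)
    .setParameter("year", 2015)
    .getSingleResult();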
Are these calculations on the entities (or data)? If yes, then you can indeed go for queries (or, even faster, use SQL queries instead of HQL). From a performance perspective, IMO, stored procedures shine, but people don't use them very often with Hibernate.
Also, if you have some frequently repeated calculation, try using caching in your application.

Hibernate pagination: ScrollableResults vs. setMaxResults() + setFirstResult()

I've been using ORM frameworks for a while, but I am rather new to Hibernate.
Suppose you have a query (whether it is a Query or a Criteria does not matter) that retrieves a large result set and that you want to paginate through it. Would you rather use the setMaxResults() and setFirstResult() combination, or a ScrollableResults?
Which is the better approach regarding performance (execution time AND memory consumption)?
If you are implementing a web application that serves separate pages of results in separate request-response cycles, there's no way you can use ScrollableResults to any advantage: use setFirstResult/setMaxResults. However, this can be a real performance killer, depending on the exact query and the total size of the result, especially if the poor DB must sort the whole result set every time just to work out which rows are the 100th to 110th.
We had the same question the other day, and settled for setMaxResults(..) and setFirstResult(..) (a minimal sketch follows below). There are two problems with ScrollableResults:
It may execute one query for each call to next() if your JDBC driver or database does not handle it properly. This was the case for us (MySQL).
It is Hibernate-specific, rather than part of the JPA standard.
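For completeness, a hedged sketch of the setFirstResult/setMaxResults combination, which Hibernate translates into the database's own paging syntax (LIMIT/OFFSET, ROW_NUMBER, etc.); the Message entity is hypothetical and an open Session is assumed:
// Page 3 of the result list, 20 rows per page. The caveat above still
// applies: large offsets can force the database to sort and skip many rows.
int pageNumber = 3;
int pageSize = 20;
List<Message> page = session.createQuery(
        "from Message m order by m.createdAt desc", Message.class)
    .setFirstResult((pageNumber - 1) * pageSize)
    .setMaxResults(pageSize)
    .getResultList();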

Fast way to get results in hibernate?

I currently have Hibernate set up in my project. It works well for most things. However, today I needed a query to return a couple hundred thousand rows from a table, roughly 2/3 of the total rows in the table. The problem is that the query is taking ~7 minutes. Using straight JDBC and executing what I assumed was an identical query, it takes less than 20 seconds. Because of this I assume I am doing something completely wrong. I'll list some code below.
DetachedCriteria criteria = DetachedCriteria.forClass(MyObject.class);
criteria.add(Restrictions.eq("booleanFlag", false));
List<MyObject> list = getHibernateTemplate().findByCriteria(criteria);
Any ideas on why it would be slow and/or what I could do to change it?
You have probably answered your own question already, use straight JDBC.
Hibernate is creating at best an instance of some Object for every row, or worse, multiple Object instances for each row. Hibernate has some really degenerate code generation and instantiation behavior that can be difficult to control, especially with large data sets, and even worse if you have any of the caching options enabled.
Hibernate is not suited for large results sets, and processing hundreds of thousands of rows as objects isn't very performance oriented either.
Raw JDBC is just that: raw types for rows and columns. Orders of magnitude less data.
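A hedged sketch of that raw JDBC route: a forward-only, read-only statement with a fetch-size hint streams the rows instead of building an object per row (table, column and process() names are hypothetical):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Streams hundreds of thousands of rows without keeping them all in memory.
// setFetchSize is only a hint; some drivers (e.g. MySQL) need special
// settings before they will truly stream.
void readFlagged(Connection connection) throws SQLException {
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT id, name FROM my_object WHERE boolean_flag = 0",
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        ps.setFetchSize(1_000);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                process(rs.getLong("id"), rs.getString("name"));
            }
        }
    }
}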
I'm not sure Hibernate is the right thing to use if you need to pull hundreds of thousands of records. The query execution time might be under 20 seconds, but the fetch time will be huge and consume a lot of memory. After you get all those records, how do you output them? It's far more data than you could display to a user. Hibernate isn't really a good solution for doing data-warehouse-style data crunching.
You probably have several references to other classes in your MyObject class, and in your mapping you set eager loading or something like that. It's very hard to find the issue from the code you posted, because the code itself looks OK.
It would probably be better for you to use Hibernate Profiler - http://hibernateprofiler.com/ . It will show you all the problems with your mappings, configurations and queries.

Recommend a fast & scalable persistent Map - Java [closed]

I need a disk backed Map structure to use in a Java app. It must have the following criteria:
Capable of storing millions of records (even billions)
Fast lookup - the majority of operations on the Map will simply be to see if a key already exists. This and (1) above are the most important criteria. There should be an effective in-memory caching mechanism for frequently used keys.
Persistent, but does not need to be transactional and can live with some failure, i.e. happy to sync with disk periodically.
Capable of storing simple primitive types - but I don't need to store serialised objects.
It does not need to be distributed, i.e. will run all on one machine.
Simple to set up & free to use.
No relational queries required
Record keys will be strings or longs. As described above, reads will be much more frequent than writes, and the majority of reads will simply be to check if a key exists (i.e. there will be no need to read the key's associated data). Each record will be updated once only and records are not deleted.
I currently use Bdb JE but am seeking other options.
Update
Have since improved query performance on my existing BDB setup by reducing the dependency on secondary keys. Some queries required a join on two secondary keys and by combining them into a composite key I removed a level of indirection in the lookup which speeds things up nicely.
JDBM3 does exactly what you are looking for. It is a library of disk backed maps with really simple API and high performance.
UPDATE
This project has now evolved into MapDB http://www.mapdb.org
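A hedged sketch against what I believe is the MapDB 3.x API (DBMaker.fileDB / hashMap with serializers); treat the method names as assumptions and check the current documentation before relying on them:
import java.util.concurrent.ConcurrentMap;
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

// Disk-backed map; most operations here are containsKey, which matches the
// "check if a key already exists" access pattern described in the question.
DB db = DBMaker.fileDB("keys.db").make();
ConcurrentMap<String, Long> map = db
        .hashMap("keys", Serializer.STRING, Serializer.LONG)
        .createOrOpen();

map.put("some-key", 1L);
boolean exists = map.containsKey("some-key");

db.close();   // flushes everything to disk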
You may want to look into OrientDB.
You can try Chronicle Map from http://openhft.net/products/chronicle-map/
Chronicle Map is a high-performance, off-heap, key-value, in-memory, persisted data store. It works like a standard Java Map.
I'd likely use a local database. Like say Bdb JE or HSQLDB. May I ask what is wrong with this approach? You must have some reason to be looking for alternatives.
In response to comments:
As the problem is performance, and I guess you are already using JDBC to handle this, it might be worth trying HSQLDB and reading the chapter on Memory and Disk Use.
As of today I would either use MapDB (file based/backed, sync or async) or Hazelcast. With the latter you will have to implement your own persistence, e.g. backed by an RDBMS, by implementing a Java interface. OpenHFT Chronicle might be another option. I am not sure how persistence works there since I have never used it, but they claim to have it. OpenHFT is completely off-heap and allows partial updates of objects (of primitives) without (de)serialization, which might be a performance benefit.
NOTE: If you need your map disk-based because of memory issues, the easiest option is MapDB. Hazelcast could be used as a cache (distributed or not) that allows you to evict elements from the heap after a time or size limit. OpenHFT is off-heap and could be considered if you only need persistence across JVM restarts.
I've found Tokyo Cabinet to be a simple persistent Hash/Map, and fast to set-up and use.
This abbreviated example, taken from the docs, shows how simple it is to save and retrieve data from a persistent map:
import tokyocabinet.HDB;   // Tokyo Cabinet Java binding (package name per its Java API docs)

// create the object
HDB hdb = new HDB();
// open (or create) the database file
hdb.open("casket.tch", HDB.OWRITER | HDB.OCREAT);
// add an item
hdb.put("foo", "hop");
// read it back
String value = hdb.get("foo");
hdb.close();
SQLite does this. I wrote a wrapper for using it from Java: http://zentus.com/sqlitejdbc
As I mentioned in a comment, I have successfully used SQLite with gigabytes of data and tables of hundreds of millions of rows. If you think out the indexing properly, it's very fast.
The only pain is the JDBC interface. Compared to a simple HashMap, it is clunky. I often end up writing a JDBC-wrapper for the specific project, which can add up to a lot of boilerplate code.
JBoss (tree) Cache is a great option. You can use it standalone from JBoss. Very robust, performant, and flexible.
I think Hibernate Shards may easily fulfill all your requirements.
