Grails 1.3.1: Improved Query Caching - java

http://www.grails.org/1.3.1+Release+Notes
Improved Query Caching
The findAll query method now supports taking advantage of the 2nd level cache.
Book.findAll("from Book as b where b.author=:author", [author:'Dan Brown'], [cache: true])
What are the advantages and disadvantages of using the 2nd level cache?
I'm developing a web server for an iPhone application, so I have a lot of parallel connections, DB queries, etc.

Generally, the 2nd level cache holds application data previously retrieved from the database. The advantage is that you can make big savings by avoiding database calls for the same data. Whether the 2nd level cache will be efficient or not depends on how your app works with the data and on how much of that data you can hold in memory. Probably the only major disadvantage is that the cache needs to be invalidated when data is updated in the database. When that happens from your application, some frameworks can handle it automatically (e.g. a write-through cache), but if the database is changed externally you can only rely on cache expiration.
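For illustration, here is a minimal sketch of how an entity can be made eligible for Hibernate's second-level cache (Grails uses Hibernate underneath; in a Grails domain class the equivalent is a cache directive in the mapping block). The Book fields shown are just an example, not the asker's actual model:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Sketch: marks the entity as cacheable in the second-level cache.
// READ_ONLY is the safest strategy for data that never changes;
// READ_WRITE handles updates but with more locking overhead.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class Book {
    @Id
    private Long id;
    private String author;
    private String title;
    // getters and setters omitted for brevity
}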

Related

Using Hazelcast / Redis for DB backed cache requirement

I am developing a distributed Java application that needs to check a list of blacklisted user ids on each request.
If a request fails some eligibility rules, the system should add the userid (a parameter of the request) to the blacklist.
I am trying to find a proper caching solution for the blacklist implementation. My requirements are:
querying the blacklist should be very fast
the blacklist persistence technology should be scalable
all blacklist data should also be persisted in an RDBMS for failover / reloading purposes.
There are two possible solutions:
Option 1: I can use Redis for storing the blacklist data. Whenever a request fails the eligibility rules I can add the userid to the Redis cache easily.
- advantages: extremely fast queries, easy to implement
- disadvantages: relying on Redis persistence; although it works, Redis is a cache solution by design, not a persistence layer.
Option 2: I can use Redis for storing the blacklist data while also maintaining blacklist tables in the RDBMS. Whenever a request fails the eligibility rules I can add the userid to the Redis cache and the RDBMS table together.
- advantages: extremely fast queries, the ability (possibility) to reload the Redis cache from the DB
- disadvantages: there is a consistency issue between Redis and the DB table.
Option 3: I can use Hazelcast as the Hibernate L2 cache, and when I add any userid to the blacklist it is added both to the cache and to the DB.
I have questions about option 3:
Is the Hazelcast L2 cache suitable for holding such a list of blacklisted users?
Does Hibernate manage the consistency issue between the cache and the DB?
When the application is restarted, how is the L2 cache reloaded?
And one last question:
- Do you have any other suggestions for such a use case?
Edit:
There will be 100M records in the blacklist, and I have a couple of similar blacklists.
Read performance is important: I need to check the existence of a key in a blacklist within ~100 ms.
Ygok,
Still waiting for clarification on the query requirements, but I can assume it is a lookup by key (since you mention Redis, and Redis doesn't have a query language; Hazelcast does have a Distributed Query / Predicate API).
Lookup by key is an extremely fast operation with Hazelcast.
In option 2 you need to maintain data consistency between your RDBMS and the Redis cache yourself. Using a Hazelcast MapLoader / MapStore you can implement the write-through / read-through cache concepts instead. All you need to do is put the entry into the cache, and Hazelcast persists it to the RDBMS immediately or with a configured delay (with batching).
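A rough sketch of what such a MapStore could look like, assuming a hypothetical BlacklistDao over the RDBMS table (the MapStore package moved between Hazelcast 3.x and 4.x, so adjust the import to your version):

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import com.hazelcast.core.MapStore;   // com.hazelcast.map.MapStore in Hazelcast 4.x and later

// Sketch of a write-through blacklist store. BlacklistDao is a hypothetical
// DAO over the RDBMS table; only the MapStore wiring matters here.
public class BlacklistMapStore implements MapStore<String, Boolean> {

    // Hypothetical persistence layer for the blacklist table.
    public interface BlacklistDao {
        boolean exists(String userId);
        void insert(String userId);
        void delete(String userId);
        Iterable<String> allIds();
    }

    private final BlacklistDao dao;

    public BlacklistMapStore(BlacklistDao dao) {
        this.dao = dao;
    }

    @Override
    public void store(String userId, Boolean blacklisted) {
        dao.insert(userId);                        // persist the new entry to the RDBMS
    }

    @Override
    public void storeAll(Map<String, Boolean> entries) {
        entries.keySet().forEach(dao::insert);     // batching would be better in practice
    }

    @Override
    public void delete(String userId) {
        dao.delete(userId);
    }

    @Override
    public void deleteAll(Collection<String> userIds) {
        userIds.forEach(dao::delete);
    }

    @Override
    public Boolean load(String userId) {
        return dao.exists(userId) ? Boolean.TRUE : null;   // null means "not in the map"
    }

    @Override
    public Map<String, Boolean> loadAll(Collection<String> userIds) {
        Map<String, Boolean> found = new HashMap<>();
        for (String id : userIds) {
            if (dao.exists(id)) {
                found.put(id, Boolean.TRUE);
            }
        }
        return found;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        return dao.allIds();                       // keys used to pre-populate the map on startup
    }
}

The store is then registered on the map via MapStoreConfig (or the map-store XML element), optionally with a write-delay for write-behind batching, and a plain map.containsKey(userId) answers the blacklist check.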
In terms of performance, please feel free to familiarize yourself with a recent Hazelcast / Redis benchmark.
Let me know if you have any questions.
I had a similar question before. First of all: how much data do you want to store, and how much memory are you willing to spend? How many queries per second do you need? What does the data structure look like? Is it only a userId as the key?

Hazelcast queries were not very fast in my testing (you can test this yourself), but it can hold a large amount of data in memory. Hazelcast uses Java default serialization, which costs a lot of memory and IO.

Hazelcast provides a Hibernate L2 cache where the cached data is stored in Hazelcast (query cache only), so restarting your application does not affect the cache.

Redis provides persistence for its in-memory data (dump/RDB and AOF); a bit of data may be lost when the server crashes, but it is very fast.

If you don't want to lose any data, store it on multiple MySQL servers (split the data by userId across servers, but consider the problems you'll face when adding a new server later). At the same time, you can add a local cache (e.g. Ehcache or Google's Guava CacheBuilder) with an expiry time, which can improve performance; see the sketch below.
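As an illustration of that last point, a minimal sketch of a bounded local cache with an expiry in front of the remote store, using Guava's CacheBuilder; the class name and the isBlacklistedRemotely lookup are hypothetical stand-ins for your Redis/RDBMS call:

import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Sketch of a small local cache in front of the remote store, with an expiry
// so stale blacklist entries eventually fall out of memory.
public class LocalBlacklistCache {

    private final LoadingCache<String, Boolean> cache = CacheBuilder.newBuilder()
            .maximumSize(100_000)                   // bound local memory use
            .expireAfterWrite(5, TimeUnit.MINUTES)  // accept slightly stale answers
            .build(new CacheLoader<String, Boolean>() {
                @Override
                public Boolean load(String userId) {
                    return isBlacklistedRemotely(userId);
                }
            });

    public boolean isBlacklisted(String userId) {
        return cache.getUnchecked(userId);
    }

    // Placeholder for the Redis / RDBMS lookup sitting behind the local cache.
    private boolean isBlacklistedRemotely(String userId) {
        return false;
    }
}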
It's possible to maintain consistency between the Redis cache and an RDBMS using the Redisson framework. It provides write-through and read-through strategies for the Map object via the MapWriter and MapLoader interfaces, which are what you would need in your case.
Please read this documentation section
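For orientation, a rough sketch of how that MapLoader / MapWriter wiring could look; the exact package names and MapOptions methods vary between Redisson versions, and BlacklistDao is a hypothetical DAO over the RDBMS table:

import java.util.Collection;
import java.util.Map;
import org.redisson.api.MapOptions;
import org.redisson.api.RMap;
import org.redisson.api.RedissonClient;
import org.redisson.api.map.MapLoader;
import org.redisson.api.map.MapWriter;

// Sketch only: wires a Redis-backed map to the RDBMS so writes go through to
// the database and misses are loaded from it.
public class RedissonBlacklist {

    // Hypothetical persistence layer for the blacklist table.
    public interface BlacklistDao {
        boolean exists(String userId);
        void insert(String userId);
        void delete(String userId);
        Iterable<String> allIds();
    }

    public RMap<String, Boolean> blacklistMap(RedissonClient redisson, BlacklistDao dao) {
        MapLoader<String, Boolean> loader = new MapLoader<String, Boolean>() {
            @Override
            public Boolean load(String userId) {
                return dao.exists(userId) ? Boolean.TRUE : null;    // read-through on cache miss
            }
            @Override
            public Iterable<String> loadAllKeys() {
                return dao.allIds();
            }
        };

        MapWriter<String, Boolean> writer = new MapWriter<String, Boolean>() {
            @Override
            public void write(Map<String, Boolean> entries) {
                entries.keySet().forEach(dao::insert);              // write-through to the RDBMS
            }
            @Override
            public void delete(Collection<String> userIds) {
                userIds.forEach(dao::delete);
            }
        };

        MapOptions<String, Boolean> options = MapOptions.<String, Boolean>defaults()
                .loader(loader)
                .writer(writer)
                .writeMode(MapOptions.WriteMode.WRITE_THROUGH);

        return redisson.getMap("blacklist", options);
    }
}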

caching readonly data for java application

I have a database which has around 150K records with a primary key on the table. The data size for each record is less than 1 kB. Constructing a POJO from a DB record takes about 1-2 seconds (there is some business logic that takes too much time). This is read-only data, hence I'm planning to cache it. What I'm thinking of doing is: load the data in subsets (200 records each time) and create a thread that constructs the POJOs and keeps them in a hashtable. While the cache is being loaded (when I start the application) the user will see a wait sign. Since holding all the data in a Hashtable is an issue, I'll actually store the processed data in another DB table (marshalling the POJOs to XML).
I use a third-party API to load the data from the database. Once I load a record I have to load the associations for that record, and then the associations of those associations found at the top level. It's like loading a family tree.
I can't use Hibernate or any ORM framework because I'm using a third-party API to load the data, which is shipped with the database itself (it's a product). Moreover, I don't think loading the data once is a big issue.
If there is a possibility to fine tune the business logic I wouldn't have asked this question here.
Caching the data on demand is an option, but I'm trying to see if I can do anything better.
Please suggest a better idea if you are aware of one. Thank you.
Please suggest a better idea if you are aware of one.
Yes, fix the business logic so that it doesn't take 1 to 2 seconds per record. That's a ridiculously long time.
Before you do that, profile your application to make sure that it is really the business logic that is causing the slow record loading, and not something else. (For example, it could be a pathological data structure, or a database issue.)
Once you've fixed the root cause of the slow record loading, it is still a good idea to cache the read-only records, but you probably don't need to preload the cache. Instead, just load the records on demand.
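As a minimal sketch of loading on demand (no framework, just the JDK), assuming a hypothetical RecordPojo and a buildFromDatabase step that stands in for the expensive load plus business logic:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal on-demand cache sketch: build the expensive POJO only when a key
// is first requested, then reuse it on subsequent requests.
public class RecordCache {

    // Hypothetical record type; stands in for whatever your third-party API returns.
    public static class RecordPojo {
        final Long id;
        RecordPojo(Long id) { this.id = id; }
    }

    private final Map<Long, RecordPojo> cache = new ConcurrentHashMap<>();

    public RecordPojo get(Long id) {
        return cache.computeIfAbsent(id, this::buildFromDatabase);
    }

    private RecordPojo buildFromDatabase(Long id) {
        // expensive load + business logic would go here
        return new RecordPojo(id);
    }
}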
It sounds like you are reinventing the wheel. I'd be looking to use Hibernate. Apart from simplifying the code that accesses the database, Hibernate has built-in caching and lazy loading of data, so it only creates objects as you request them. Ergo, a lot of what you describe above is already in place, and you can concentrate on sorting out your business logic. I suspect that once you solve the business logic performance issue, there will be no need for such a complicated caching system and the Hibernate defaults will be sufficient.
As maximdim said in a comment, preloading the whole thing will take a lot of time. If your system is not very strange, the user won't need all the data at once. Just cache on demand instead. I would also recommend using an established caching solution, such as EHCache, which has persistence via its DiskStore; the only issue is that whatever you cache in this case has to be Serializable. Since you can marshal it as XML, I'm betting you can serialize it too, which should be faster.
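A minimal Ehcache 2.x sketch of that idea; the cache name and size are made up, and the DiskStore / persistence settings would normally be configured in ehcache.xml rather than shown here:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

// Sketch of basic Ehcache 2.x usage. Values must be Serializable if they
// are ever to be written out to the DiskStore.
public class RecordCacheExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache records = new Cache(new CacheConfiguration("records", 10_000).eternal(true));
        manager.addCache(records);

        records.put(new Element(42L, "some processed record"));   // key, Serializable value

        Element hit = records.get(42L);
        if (hit != null) {
            System.out.println(hit.getObjectValue());
        }

        manager.shutdown();
    }
}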
In a past project, we had to query a very busy, very sluggish service running on an off-site mainframe in order to assemble one of the entities. Average response times from our app were dominated by this query. Since the data we retrieved was mostly read-only, caching with EHCache solved our problems.
jdbm has a nice, persistent map implementation (http://code.google.com/p/jdbm2/) that may help you with local caching; it would certainly be a lot faster than serializing your POJOs to XML and writing them back into a SQL database.
If your data is truly read-only, then I'd think that the best solution would be to treat the source database as an input queue that feeds your app database. Create a background process (heck, a service would be better), and have it monitor the source database and keep your app database synced.

Server side caching for Java/Java EE application

Here is my situation: I have a Java EE single-page application. All client-server communication is AJAX based, with JSON used as the data exchange format. One of my requests takes around 1 minute to calculate the data required by the client. This data is also huge (could be > 20 MB), so it is not possible to pass all of it to the JavaScript side in one go. For this reason I am only passing a few records to the client and using a grid with a paging option to display the data.
Now when the user clicks the next page button, I need to get more data. My question is: how do I cache the data on the server side? I need this data for only one user at a time. Would you recommend caching all the data on the first request, using the session id as the key?
Any other suggestions ?
I am assuming you are using a DB backend for that. I'd use limits to return small chunks of data; most DB vendors have a solution for this. That would make your queries faster, and most JS frameworks with grid-type components support paginating results (ExtJS for example).
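For example, a limit/offset page query might look roughly like this (table and column names are hypothetical, and the exact LIMIT syntax depends on your database):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of page-by-page retrieval with LIMIT/OFFSET (MySQL/PostgreSQL style).
public class PagedQuery {

    public List<String> fetchPage(Connection conn, int pageNumber, int pageSize) throws SQLException {
        String sql = "SELECT name FROM records ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, pageNumber * pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> page = new ArrayList<>();
                while (rs.next()) {
                    page.add(rs.getString("name"));
                }
                return page;
            }
        }
    }
}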
If you are fetching data from a 3rd party and passing it on (with or without modifications), I'd still stick to the database and use this workflow: pull the data from the 3rd party, save it in the DB, and have your widget request the small chunks the customers need.
Hope this helps.
The cheapest (and not entirely ineffective) way of caching data in a Java EE web application is to use the Session object, like you intend to do. It's ineffective in the sense that it requires the developer to ensure the cache does not leak memory, so it is up to the developer to null out the reference to the object once it is no longer needed.
However, even if you do implement the poor man's cache, caching 20 MB of data is not advisable, as it does not scale well. The scalability question arises when multiple users use the same functionality of the application, in which case 20 MB per user is a lot of data.
You're better off returning paginated "datasets" in the form of JSON, based on the ValueList design pattern. Each request for a page of data results in partial retrieval, which is then sent down the wire to the client. That way you never have to cache the complete results of the query execution, and you can still return partial datasets. Whether to cache at all is entirely up to you; usually caching is done for large datasets that are used time and again.
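A minimal sketch of such a paginated result holder (field names are illustrative only), which would then be serialized to JSON for the grid:

import java.util.List;

// Simple "value list" style page holder: only this page's rows plus enough
// metadata for the grid to render its paging controls.
public class PageResult<T> {
    private final List<T> rows;     // just the records for this page
    private final int offset;       // where this page starts in the full result
    private final long totalRows;   // lets the client compute the number of pages

    public PageResult(List<T> rows, int offset, long totalRows) {
        this.rows = rows;
        this.offset = offset;
        this.totalRows = totalRows;
    }

    public List<T> getRows()   { return rows; }
    public int getOffset()     { return offset; }
    public long getTotalRows() { return totalRows; }
}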

How much session data is too much?

We are running into unusually high memory usage issues, and I observed that in many places in our code we are pulling hundreds of records from the DB, packing them into custom data objects, adding them to an ArrayList and storing that in the session. I wish to know what the recommended upper limit is for storing data in the session. Just a good practice / bad practice kind of thing.
I am using JRockit 1.5 and 1.6 GB of RAM. I did profiling with JProbe and found that some parts of the app have a very heavy memory footprint. Most of this data is being put into the session to be used later.
That depends entirely on how many sessions are typically present (which in turn depends on how many users you have, how long they stay on the site, and the session timeout) and how much RAM your server has.
But first of all: have you actually used a memory profiler to tell you that your "high memory usage" is caused by session data, or are you just guessing?
If the only problem you have is "high memory usage" on a production machine (i.e. it can handle the production load but is not performing as well as you'd like), the easiest solution is to get more RAM for the server - much quicker and cheaper than redesigning the app.
But caching entire result sets in the session is bad for a different reason as well: what if the data changes in the DB and the user expects to see that change? If you're going to cache, use one of the existing systems that do this at the DB request level - they'll allow you to cache results between users and they have facilities for cache invalidation.
If you're storing data in the session to improve performance, consider using true caching instead, since a cache is application-wide whereas the session is per-user, which results in unnecessary duplication of otherwise similar objects.
If, however, you're storing them for the user to edit these objects (which I doubt, since hundreds of objects is way too much), try minimizing the amount of data stored or research optimistic concurrency control.
I'd say this heavily depends on the number of active sessions you expect. If you're writing an intranet application with < 20 users, it's certainly no problem to put a few MB in the session. However, if you're expecting 5000 live sessions, for instance, each MB of data stored per session accounts for 5 GB of RAM.
However, I'd generally recommend not storing any data from the DB in the session. Just fetch it from the DB for every request. If performance is an issue, use an application-wide cache (e.g. Hibernate's 2nd level cache).
What kind of data is it? Is it really needed per session or could it be cached at application level? Do you really need all the columns or only a subset? How often is it being accessed? What pages does it need to be available on? And so on.
It may make much more sense to retrieve the records from the DB when you really need to. Storing hundreds of records in session is never a good strategy.
I'd say try to store the minimum amount of data that will be enough to recreate the necessary environment in a subsequent request. If you're storing in memory to avoid a database round-trip, then a true caching solution such as Memcache might be helpful.
If you're storing these sessions in memory instead of a database, then the round-trip is saved, and requests will be served faster as long as the memory load is low and there's no paging. Once the number of clients goes up and paging begins, most clients will see a huge degradation in response times. These two variables are inversely related.
It's better to measure the latency to your database server; in most cases it is low enough for the database to be considered a viable means of storage instead of keeping everything in memory.
Try to split the data you are currently storing in the session into user-specific and static data. Then implement caching for all the static parts. This will give you a lot of reuse application-wide and still allow you to cache the specific data a user is working on.
You could also create a per-user mini SQLite database and connect to it, store the data the user is accessing in it, retrieve records from it while the user is requesting them, and delete the SQLite database after the user disconnects.

web application session cache

I want to cache data for a user session in a web application built on Struts. What is the best way to do it? Currently we store certain information from the DB in Java objects in the user's session. Works fine till now, but people are now concerned about memory usage etc.
Any thought on how best to get around this problem.
Works fine till now, but people are now concerned about memory usage etc.
Being "concerned" is relatively meaningless - do they have any concrete reason for it? Statistics that show how much memory session objects are taking up? Along the same line: do you have concrete reasons for wanting to cache the data in the user session? Have you profiled the app and determined that fetching this data from the DB for each request is slowing down your app significantly?
Don't guess. Measure.
It's usually bad practice to store whole objects in the user session, for this reason. You should probably store just the keys in the session and re-query the database when you need the objects again. This is a trade-off, but usually an acceptable one for most use cases. Querying the database by key between requests is usually acceptable, rather than storing objects in the session.
If you must have them in the session, consider using something like an LRUMap (from Apache Commons Collections).
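A small sketch of that approach with Commons Collections' LRUMap (the package differs between Collections 3.x and 4.x); the key and value types are illustrative:

import java.util.Map;
import org.apache.commons.collections4.map.LRUMap;  // org.apache.commons.collections.map.LRUMap in Collections 3.x

// Sketch: a small bounded map kept in the session, so at most maxSize entries
// per user are ever held in memory. Note LRUMap is not synchronized by itself.
public class SessionCacheExample {
    public static void main(String[] args) {
        Map<Long, String> perUserCache = new LRUMap<>(100);  // keep only the 100 most recently used entries

        perUserCache.put(1L, "first record");
        perUserCache.put(2L, "second record");

        System.out.println(perUserCache.get(1L));
        // session.setAttribute("recordCache", perUserCache);  // stored per user in the HttpSession
    }
}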
