Java Multithreaded Caching with Single Updater Thread

I have a web service with ~1k request threads running simultaneously on average. These threads access data from a cache (currently Ehcache.) When an entry in the cache expires, the thread that hits the expired entry tries to get the new value from the DB, while the other threads trying to hit the same entry block, i.e. I use the BlockingEhCache decorator. Instead of having the other threads wait on the "fetching thread," I would like them to use the "stale" value corresponding to the "missed" key. Are there any third-party Ehcache decorators for this purpose? Do you know of any other caching solutions that have this behavior? Other suggestions?

I don't know EHCache well enough to give specific recommendations for solving your problem with it, so I'll outline what I would do without EHCache.
Let's assume all the threads are accessing this cache through a service interface, called FooService, and a service bean called SimpleFooService. The service will have the methods required to get the needed (cached) data. This way you're hiding the fact that it's cached from the frontend (the HTTP request objects).
Instead of simply storing the data to be cached in a property of the service, we'll make a special object for it. Let's call it FooCacheManager. It will store the cache in a property (let's say it's of type Map) and have getters to access it. It will also have a special method called reload(), which will load the data from the DB (by calling service methods or through the DAO) and replace the content of the cache saved in the property.
The trick here is as follows:
Declare the cache property in FooCacheManager as an AtomicReference (introduced in Java 1.5). This guarantees thread safety for both reading and assigning it. Your read/write actions will never collide or read a half-written value.
The reload() will first load the data into a temporary map, and then, when it's finished, assign the new map to the property saved in FooCacheManager. Since the property is an AtomicReference, the assignment is atomic, so you're essentially swapping the map in an instant without any need for locking.
TTL implementation - have FooCacheManager implement the Quartz Job interface, effectively making it a Quartz job. In the job's execute() method, have it run reload(). In the Spring XML, define this job to run every xx minutes (your TTL), which can also be defined in a property file if you use PropertyPlaceholderConfigurer.
This method is effective since the reading threads:
Don't block for read
Don't call isExpired() on every read, which would be 1k calls per second.
The writing thread also doesn't block when writing the data.
If this wasn't clear, I can add example code.
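A minimal sketch of this approach, with the class and method names taken from the description above (the DB load is stubbed out here; a real implementation would call the service or DAO):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the cache manager described above; all names are illustrative.
public class FooCacheManager {

    // AtomicReference guarantees readers see either the old map or the
    // new map, never a partially assigned reference.
    private final AtomicReference<Map<String, Object>> cache =
            new AtomicReference<>(Collections.<String, Object>emptyMap());

    // Readers never block and never check expiry.
    public Object get(String key) {
        return cache.get().get(key);
    }

    // Called by the scheduled (e.g. Quartz) job every TTL interval.
    public void reload() {
        // Build the replacement map off to the side ...
        Map<String, Object> fresh = new HashMap<>(loadFromDb());
        // ... then swap it in atomically; no reader ever sees a half-filled map.
        cache.set(fresh);
    }

    // Stand-in for the real service/DAO call.
    protected Map<String, Object> loadFromDb() {
        Map<String, Object> data = new HashMap<>();
        data.put("example", "value");
        return data;
    }
}
```

A Quartz job (or any scheduler) simply calls reload() every TTL interval; reading threads keep calling get() without ever blocking.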

Since Ehcache removes stale data, a different approach can be to refresh data with a probability that increases as the expiration time approaches, and is 0 if the expiration time is "sufficiently" far away.
So, if thread 1 needs some data element, it might refresh it even though the data is not old yet.
In the meantime, if thread 2 needs the same data, it can use the existing data (while the refresh has not finished yet). It is possible that thread 2 tries to do a refresh too.
If you are working with references (the updater thread loads the object and then simply changes the reference in the cache), then no separate synchronization is required for get and set operations on the cache.
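A sketch of such a probability function (the linear ramp and the window length are arbitrary assumptions; any function that rises toward 1 near expiry would do):

```java
import java.util.concurrent.ThreadLocalRandom;

// Probabilistic early refresh: probability is 0 while expiry is far away,
// then ramps up linearly to 1 as the entry approaches its expiration time.
public class RefreshGate {

    private final long refreshWindowMillis; // how long before expiry we may start refreshing

    public RefreshGate(long refreshWindowMillis) {
        this.refreshWindowMillis = refreshWindowMillis;
    }

    public double refreshProbability(long nowMillis, long expiresAtMillis) {
        long remaining = expiresAtMillis - nowMillis;
        if (remaining >= refreshWindowMillis) {
            return 0.0; // "sufficiently" far from expiry: never refresh
        }
        if (remaining <= 0) {
            return 1.0; // already expired: always refresh
        }
        return 1.0 - (double) remaining / refreshWindowMillis;
    }

    // Each reading thread rolls the dice; a few may refresh early, most won't.
    public boolean shouldRefresh(long nowMillis, long expiresAtMillis) {
        return ThreadLocalRandom.current().nextDouble()
                < refreshProbability(nowMillis, expiresAtMillis);
    }
}
```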

Related

How to know that no operation is running on a ConcurrentHashMap (i.e. it is idle) in Java?

I have a situation where, whenever my ConcurrentHashMap is updated, I need to clear an existing file and write the entire data into the file again. Clearing the file and rewriting the data on every update causes high latency. So I am thinking that whenever my hashmap is idle, i.e. no update operation is going on, I will write the entire data into the file; otherwise I will wait until the hashmap is idle.
Basically, I will be deleting Strings continuously from the map, and writing to the file after every single deletion is a very costly operation. So is there a way to know that no deletion operation is going on in the ConcurrentHashMap?
So is there a way to know that no deletion operation is going on the ConcurrentHashMap?
Short answer: no, there isn't a way.
But even if there was, you would still run into problems. For example, suppose new updates arrived immediately after you started clearing / writing.
I think the solution is to use two maps and a queue.
When an update request happens:
perform the update on the concurrent hashmap
add the request to the queue
In a background thread:
pull requests from the queue, and perform updates on the second (shadow) hashmap
periodically or based on some other criteria, cease pulling requests and flush the shadow hashmap to the file.
The primary hashmap is always updated quickly, and is always up to date. Operations updating and using the primary hashmap do not get (significantly) blocked.
The queue provides request buffering while the shadow hashmap is being written.
The second hashmap is only accessed by one thread, so it doesn't need to be concurrent. Therefore it will be faster.
The state of the file will typically be a little behind the primary hashmap. But that is inevitable. The only way to avoid that is to block updates to the primary map ... which is what you are trying to avoid.
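A minimal sketch of the two-map-plus-queue design described above (the batch-size flush criterion and all names are illustrative; a real flusher would write the shadow map out to the file):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Primary map is updated immediately; every update is also queued so a single
// background thread can replay it onto a shadow map and flush that to the file.
public class ShadowedMap {

    private final Map<String, String> primary = new ConcurrentHashMap<>();
    private final Map<String, String> shadow = new HashMap<>(); // one thread only, no locking
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();

    public void put(String key, String value) {
        primary.put(key, value);              // fast, never blocks on I/O
        queue.add(new String[] {key, value});
    }

    public void remove(String key) {
        primary.remove(key);
        queue.add(new String[] {key, null});  // null value encodes "removed"
    }

    // Run from a background thread: replay a batch of updates, then flush.
    public void drainAndFlush(int batchSize, Consumer<Map<String, String>> flusher)
            throws InterruptedException {
        for (int i = 0; i < batchSize; i++) {
            String[] update = queue.take();
            if (update[1] == null) {
                shadow.remove(update[0]);
            } else {
                shadow.put(update[0], update[1]);
            }
        }
        flusher.accept(shadow); // e.g. rewrite the file from the shadow map
    }
}
```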
Another way to approach this would be to make writing to the file faster. I suspect that the reason it is slow is because your current design requires you to clear and rewrite the file each time. Another approach would be to write only the changes to the file. This means you may have more work to do on restart ... assuming the purpose of the file is to record the map state so that you can restart.
Sounds like you would need to make use of encapsulation by wrapping the ConcurrentHashMap in a class, possibly with add/remove methods backed by a Queue. Look at the java.util.concurrent package for other options.
The idea would be to use a Queue. Every access to the map would go through the wrapper's add/remove methods, which also append the operation to the queue. An infinite thread loop would then consume the queue, and whenever the queue is empty it can do the file persisting.

How to record web requests in a concurrent environment?

We have a web application that receives some million requests per day. We audit the request counts and response statuses using an interceptor, which in turn calls a class annotated with Spring's @Async annotation; this class basically adds them to a map and persists the map after a configured interval. As we have a fixed set of APIs, we maintain a ConcurrentHashMap with the API name as key and its count and response status object as value. So for every request to an API we check whether it exists in our map; if it exists we fetch the object against it, otherwise we create an object and put it in the map. For example:
class Audit {
    void audit(String apiName) {
        CounterObject counterObject = null;
        if (APIMap.containsKey(apiName)) {
            // fetch existing object
            counterObject = APIMap.get(apiName);
        } else {
            // create new object and put it into the map
            counterObject = new CounterObject();
            APIMap.put(apiName, counterObject);
        }
        // increment count, note response status and perform
        // other operations on the CounterObject received
    }
}
Then we perform some calculations on the received object (whether from the map or newly created) and update the counters.
We aggregate the map values for a specific interval and commit them to the database.
This works fine with fewer hits, but under high load we face some issues, like:
1. The first thread gets the object and updates the count, but before it finishes, a second thread comes in and reads a value that is not the latest one. By the time the first thread has made its changes and committed the value, the second thread updates the value it read previously. Since both threads operate on the same key, the counter is overwritten by whichever thread writes last.
2. I don't want to put the synchronized keyword on the block that updates the counter. Even though the processing is async and the user gets a response before we check the API name in the map, the application resources consumed under high load will still be higher if synchronized is used, which can result in late responses or, in the worst case, a deadlock.
Can anyone suggest a solution that can update the counters concurrently without having to use the synchronized keyword?
Note: I am already using ConcurrentHashMap, but because locks are acquired and released so quickly under high load by multiple threads, the counters mismatch.
In your case you are right to look at a solution without locking (or at least with very local locking). And as long as you do simple operations you should be able to pull this off.
First of all you have to make sure you only ever create one CounterObject per key, instead of having multiple threads each create their own, with the last one overwriting the earlier objects.
ConcurrentHashMap has a very useful method for this: putIfAbsent. It stores the object if there is none yet for that key, and returns the value previously associated with the key, or null if there was none. That null return is easy to trip over, so it works as follows:
CounterObject fresh = new CounterObject();
CounterObject counter = APIMap.putIfAbsent("key", fresh);
if (counter == null) {
    counter = fresh; // our new object won the race and is now in the map
}
counter.countStuff();
The downside of the above is that you always create a new CounterObject even when one already exists, which might be expensive. If that is the case you can use the Java 8 computeIfAbsent, which will only call a lambda to create the object if there is nothing associated with the key.
Finally you have to make sure your CounterObject is thread-safe, preferably without locking/synchronization (although if you have very many CounterObjects, locking on one of them will be less bad than locking the full map, because fewer threads will try to lock the same object at the same time).
In order to make CounterObject safe without locking, you can look into classes such as AtomicInteger which can do many simple operations without locking.
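A sketch combining computeIfAbsent with lock-free counters (LongAdder is a relative of AtomicInteger that holds up better under heavy contention; all names here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Lock-free audit counters: one CounterObject per API name, created at most
// once via computeIfAbsent and incremented without any synchronized block.
public class Audit {

    public static class CounterObject {
        final LongAdder hits = new LongAdder();
        final LongAdder errors = new LongAdder();
    }

    private final ConcurrentMap<String, CounterObject> apiMap = new ConcurrentHashMap<>();

    public void record(String apiName, int responseStatus) {
        // computeIfAbsent creates the CounterObject at most once per key,
        // even when many threads race on the same API name.
        CounterObject counter = apiMap.computeIfAbsent(apiName, k -> new CounterObject());
        counter.hits.increment();       // atomic, no lost updates
        if (responseStatus >= 400) {
            counter.errors.increment();
        }
    }

    public long hits(String apiName) {
        CounterObject c = apiMap.get(apiName);
        return c == null ? 0 : c.hits.sum();
    }
}
```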
Note that whenever I say locking here it means either with an explicit lock class or by using synchronize.
The reason for the counter mismatch is that the check-and-put sequence in the Audit class is not atomic on ConcurrentHashMap. You need to use the putIfAbsent method, which performs the check and put atomically. Refer to the ConcurrentHashMap javadoc for putIfAbsent.

Refreshing Caches while under load with Spring/EHCache

I have a caching issue on a Spring multi-threaded web service with a database backend and EHCache-based caching. The service has many clients all requesting the same object again and again, with dozens of requests per second. There are only a couple of objects that are requested that frequently, with a large number of other objects being requested infrequently. The objects can change every couple of minutes, so the cache's TTL is set to a minute. Loading an object from the database is slow and takes at least several seconds.
At first I used a naive implementation to get the object:
Check whether the object is in the cache.
If yes, return it from the cache.
If not, load it from the database, put it in the cache and return it.
This was working well when initially testing it locally. But performance testing on a faster server showed some pretty bad load spikes every time one of the more frequently requested objects expires in the cache. When this happens, for the next 10 seconds all requests for that object would result in database loads, until the first thread finished the database load and put the new object into the cache. The result was a short but very high load on the database, and a lot of users who need to wait for the request to finish.
My current implementation improves the database load by tracking which objects are currently being loaded:
Check whether the object is cached.
If yes, return it from the cache.
If not, check whether the object is currently being loaded.
If yes, wait for the other thread's load to complete, get the new object from the cache and return it.
If no, put the object into the list of loading objects, put it into the cache when finished and return it.
With this implementation, even when the object expires, there is only one database operation. And, because of the lower database load, it will also finish sooner. But it still means that all users who request the object during the object load need to wait.
What I would really want is that only the first thread waits for the database load, and all others just return the 'expired' object while the object is being loaded. Response time is more important for me than the fact that the object is a few seconds too old.
Alternatively, I could refresh the cache asynchronously when I notice that an object will expire in a few seconds. That's closer to EHCache's single-TTL model and would mean that no one needs to wait for the database load.
My real question is: before I re-invent the wheel, is there any existing framework that already implements something like this (in a Spring/EHCache environment)? Or maybe support for this already exists somewhere in Spring/EHCache and I just can't find the right option?
There are two Ehcache-provided constructs that could help you:
Refresh ahead
Scheduled refresh
Both require you to change the way you interact with your cache as they require a CacheLoader to be configured.
Unfortunately, I can't find online documentation that shows example for the second option.
It allows refreshing cache entries on a Quartz schedule. It can also refresh only a subset of the keys, based on a key generator.
Have a look at classes in package net.sf.ehcache.constructs.scheduledrefresh
Your design is flawed, since the second thread can't get any "expired" object from the cache because there is none (as per step #2: return it immediately when the object is in the cache).
Workarounds:
10 seconds to load a single object is way too long. Check your SQL and try to optimize it.
Cache objects longer and run update threads which query for new states of objects in the database. That means thread #1 just triggers some background work which eventually refreshes the object in the cache. Drawback: The cache must be big enough to keep most of the objects in memory at all times. Otherwise the "load object for the first time" will be too visible.
Display the web page without loading the objects and load them with AJAX requests in the background. Update the web page as objects become available. Depending on how useful your site is when not everything is ready at once, this might be good balance between responsiveness and accuracy.
Improve loading of objects. Create "view" tables which contain all the data necessary to display a single object in each row. Update these rows when you make changes to the "real" (normalized) objects. The "view cache" is populated from this table only. That makes loading objects very fast at the expense of changes to the data model. See "Command-query separation" for an extreme solution.
Try to denormalize your data model a bit to reduce the number of joins necessary to load a single object. Alternatively, cache some objects which you would normally join and do the filtering/aggregation on the web server.
When updating an object, trigger a refresh of the cache. Chances are that someone will want to see this object, soon. This approach works best when people manually edit the objects and least, when changes are randomly triggered by outside systems (news tickers, stock quotes, etc).
If you only need a lot of joins to display all the details, try to load the overview and then use a second cache for details which you can then load in a second thread. Together with AJAX, you can display an overview of the object quickly which will buy you some goodwill to wait for the details.
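The behavior the question asks for - only the first thread refreshes while everyone else keeps getting the stale value - can be sketched with a per-entry AtomicBoolean guard. This is a hand-rolled illustration, not an Ehcache feature; the loader function stands in for the slow database read, and the refresh runs synchronously here where a real version would hand it off to a background executor:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Serve the stale value immediately; only one thread past expiry reloads.
public class StaleWhileRefreshCache<K, V> {

    private static final class Entry<V> {
        volatile V value;
        volatile long expiresAt;
        final AtomicBoolean refreshing = new AtomicBoolean(false);

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the slow database load
    private final long ttlMillis;

    public StaleWhileRefreshCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            // Very first load: there is no stale value yet, so we must wait.
            V v = loader.apply(key);
            Entry<V> prev = map.putIfAbsent(
                    key, new Entry<>(v, System.currentTimeMillis() + ttlMillis));
            return prev == null ? v : prev.value;
        }
        if (System.currentTimeMillis() > e.expiresAt
                && e.refreshing.compareAndSet(false, true)) {
            // Only the first thread past expiry reloads; a real version would
            // submit this to an executor and return the stale value right away.
            try {
                e.value = loader.apply(key);
                e.expiresAt = System.currentTimeMillis() + ttlMillis;
            } finally {
                e.refreshing.set(false);
            }
        }
        return e.value; // everyone else gets the (possibly stale) value immediately
    }
}
```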

How do I synchronize cache list access

I've got the following problem (one important restriction - I cannot use external jars/libraries, only the Java primitives that come with a regular install):
Objects of class X are stored long-term in an SQL DB. Objects are cached for performance's sake (the cache needs to be written; I intend to base it on LinkedHashMap).
get(key):
check if object is in cache and not in use - return it.
if object is in use - sleep till it's available.
if object is not in cache - read it from DB.
putInCache(object):
update object in cache (if it's not there, add it).
if the cache is exhausted, this will trigger a saveToDB operation by the cache and remove the least recently used item from the cache.
saveToDB(object):
write the object to DB (without removing it from the cache) and mark the object as "not changed".
There are multiple threads calling get. A thread can change the object it received from get (and the object will be marked as "changed") - when it's finished it will call putInCache.
There is one dedicated thread that goes over the cached objects, and when it encounters a "changed" object it will trigger saveToDB (the object will be marked as in use while the DB access is going on).
How would you recommend to ensure thread safety ?
Basically I'm looking for the right Java classes that will enable:
1. get to synchronize its access to each object in the cache, so that it can check whether the object is there and, if so, whether it's in use or free for grabbing. If it's in use, the caller should sleep until it's available.
2. the dedicated thread should not lock the cache while calling saveToDB, but should still make sure the whole cache is examined and no starvation is caused (the cache might change while saveToDB is running).
Just to clarify: I'm only interested in the locking/synchronization solutions - things like the cache triggering and DB access can be assumed as given.
Here is an approach:
use an ExecutorService to handle DB requests;
use Futures for your map values;
use a ConcurrentHashMap as a map implementation.
The Future should do the get from the DB; it will use the ExecutorService.
When you need to manipulate one object, synchronize on this future's .get(), which will be the object.
Also, google for "Java concurrency in practice", and buy the book ;)
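A sketch of the Future-per-key idea - essentially the Memoizer pattern from that book. For brevity this version runs the load on the calling thread rather than submitting it to an ExecutorService; the loader function is a stand-in for the DB read:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// One Future per key: the first thread to miss runs the load, and every
// other thread asking for the same key blocks on the same Future.
public class FutureCache<K, V> {

    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for the DB read

    public FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) throws InterruptedException, ExecutionException {
        Future<V> f = cache.get(key);
        if (f == null) {
            FutureTask<V> task = new FutureTask<>(() -> loader.apply(key));
            f = cache.putIfAbsent(key, task);
            if (f == null) { // we won the race: run the load ourselves
                f = task;
                task.run();
            }
        }
        return f.get(); // other threads block here until the load completes
    }
}
```

Synchronizing on the object returned by get() then serializes only the threads mutating that particular object, without locking the whole map.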

Variable value changed simultaneously by different threads

I have many concurrently running HTTP request-serving threads. They create an object (? extends Object) for every request and save it in a list.
Advise me on a good data structure to implement this list.
I can't use ArrayList, since it is not thread-safe.
I don't like to use Vector - since it's synchronized, it will make other threads wait while one of the HTTP threads is saving its object.
I also tried LinkedList, but there is data loss due to concurrent updates.
Your variable would need to be atomic so that it can safely be updated by multiple threads (see java.util.concurrent.atomic). You could also use an AtomicInteger to keep track of the number of times the variable is updated.
But are you sure you want to do this without explicitly controlling the update to a variable?
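For the many-writers-appending pattern in the question, a lock-free queue from java.util.concurrent is a natural fit. A sketch (names are illustrative), with an AtomicInteger tracking how many objects have been added:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free collection point for objects created by request threads.
// ConcurrentLinkedQueue never makes one writer wait for another,
// unlike a synchronized Vector.
public class RequestLog {

    private final Queue<Object> entries = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger(); // total additions

    public void add(Object requestResult) {
        entries.add(requestResult); // CAS-based append, no data loss
        count.incrementAndGet();
    }

    public int size() {
        return count.get();
    }
}
```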
