Refreshing Caches while under load with Spring/EHCache - java

I have a caching issue on a Spring multi-threaded web service with a database backend and EHCache-based caching. The service has many clients all requesting the same object again and again, with dozens of requests per second. Only a couple of objects are requested that frequently, with a large number of other objects being requested infrequently. The objects can change every couple of minutes, so the cache's TTL is set to a minute. Loading an object from the database is slow and takes at least several seconds.
At first I used a naive implementation to get the object:
Check whether the object is in the cache.
If yes, return it from the cache.
If not, load it from the database, put it in the cache and return it.
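The naive flow might be sketched in plain Java like this (the map-based cache, the Entry record and the loadFromDb callback are illustrative stand-ins, not the actual EHCache API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NaiveCache {
    // Illustrative stand-in for the cache: value plus the time it was cached.
    record Entry(Object value, long cachedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    NaiveCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    Object get(String key, java.util.function.Function<String, Object> loadFromDb) {
        Entry e = cache.get(key);
        if (e != null && System.currentTimeMillis() - e.cachedAt() < ttlMillis) {
            return e.value();                       // steps 1+2: found and fresh, return it
        }
        Object fresh = loadFromDb.apply(key);       // step 3: slow database load
        cache.put(key, new Entry(fresh, System.currentTimeMillis()));
        return fresh;
    }
}
```

The problem described below falls directly out of this: between expiry and the first completed reload, every caller takes the `loadFromDb` branch.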
This was working well when initially testing it locally. But performance testing on a faster server showed some pretty bad load spikes every time one of the more frequently requested objects expires in the cache. When this happens, for the next 10 seconds all requests for that object would result in database loads, until the first thread finished the database load and put the new object into the cache. The result was a short but very high load on the database, and a lot of users who need to wait for the request to finish.
My current implementation reduces the database load by tracking which objects are currently being loaded:
Check whether the object is cached.
If yes, return it from the cache.
If not, check whether the object is currently being loaded.
If yes, wait for the other thread's load to complete, get the new object from the cache and return it.
If no, put the object into the list of loading objects, put it into the cache when finished and return it.
With this implementation, even when the object expires, there is only one database operation. And, because of the lower database load, it will also finish sooner. But it still means that all users who request the object during the object load need to wait.
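The "track what is currently loading" bookkeeping can be collapsed into one atomic step by keeping a FutureTask per key: the first thread in runs the load, everyone else waits on the same future. A sketch (names illustrative, not a drop-in for EHCache):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class SingleFlightLoader<K, V> {
    // At most one FutureTask per key: the thread that installs it runs the load,
    // every other thread waits on the same future instead of hitting the database.
    private final ConcurrentHashMap<K, FutureTask<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loadFromDb) {
        FutureTask<V> task = new FutureTask<>(() -> loadFromDb.apply(key));
        FutureTask<V> existing = inFlight.putIfAbsent(key, task);
        if (existing == null) {
            try {
                task.run();              // this thread does the single database load
                return task.get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                inFlight.remove(key);    // let the next expiry trigger a fresh load
            }
        }
        try {
            return existing.get();       // other threads just wait for the result
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```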
What I would really want is that only the first thread waits for the database load, and all others just return the 'expired' object while the object is being loaded. Response time is more important for me than the fact that the object is a few seconds too old.
Alternatively, I could refresh the cache asynchronously when I notice that an object will expire in a few seconds. That's closer to EHCache's single-TTL model and would mean that no one needs to wait for the database load.
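The desired "serve stale while one thread refreshes" behaviour might look like this in plain Java (a sketch only; the ConcurrentHashMap-based cache and all names are illustrative, not EHCache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class StaleWhileRefreshCache<K, V> {
    record Entry<V>(V value, long cachedAt) {}

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Map<K, Boolean> refreshing = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);    // don't keep the JVM alive for background refreshes
        return t;
    });
    private final long ttlMillis;

    StaleWhileRefreshCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    V get(K key, Function<K, V> loadFromDb) {
        Entry<V> e = cache.get(key);
        if (e == null) {                      // first-ever request: nothing stale to serve
            V v = loadFromDb.apply(key);
            cache.put(key, new Entry<>(v, System.currentTimeMillis()));
            return v;
        }
        boolean expired = System.currentTimeMillis() - e.cachedAt() >= ttlMillis;
        if (expired && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            pool.submit(() -> {               // exactly one background refresh per expiry
                try {
                    cache.put(key, new Entry<>(loadFromDb.apply(key), System.currentTimeMillis()));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return e.value();                     // everyone, including the trigger, gets the stale value
    }
}
```

Note that entries are never evicted here, so in this form even the triggering thread doesn't wait, which is slightly stronger than "only the first thread waits".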
My real question is: before I re-invent the wheel, is there any existing framework that already implements something like this (in a Spring/EHCache environment)? Or maybe support for this already exists somewhere in Spring/EHCache and I just can't find the right option?

There are two Ehcache-provided constructs that could help you:
Refresh ahead
Scheduled refresh
Both require you to change the way you interact with your cache as they require a CacheLoader to be configured.
Unfortunately, I can't find online documentation that shows an example of the second option.
It lets you refresh cache entries on a Quartz schedule. It can also refresh only a subset of the keys, based on a key generator.
Have a look at the classes in the package net.sf.ehcache.constructs.scheduledrefresh

Your design is flawed, since the second thread can't get any "expired" object from the cache: there is none (as per step 2, an object is returned immediately only while it is still in the cache).
Workarounds:
10 seconds to load a single object is way too long. Check your SQL and try to optimize it.
Cache objects longer and run update threads which query for new states of objects in the database. That means thread #1 just triggers some background work which eventually refreshes the object in the cache. Drawback: The cache must be big enough to keep most of the objects in memory at all times. Otherwise the "load object for the first time" will be too visible.
Display the web page without loading the objects and load them with AJAX requests in the background. Update the web page as objects become available. Depending on how useful your site is when not everything is ready at once, this might be good balance between responsiveness and accuracy.
Improve loading of objects. Create "view" tables which contain all the data necessary to display a single object in each row. Update these rows when you make changes to the "real" (normalized) objects. The "view cache" is populated from this table only. That makes loading objects very fast at the expense of changes to the data model. See "Command-query separation" for an extreme solution.
Try to denormalize your data model a bit to reduce the number of joins necessary to load a single object. Alternatively, cache some objects which you would normally join and do the filtering/aggregation on the web server.
When updating an object, trigger a refresh of the cache. Chances are that someone will want to see this object soon. This approach works best when people manually edit the objects, and worst when changes are randomly triggered by outside systems (news tickers, stock quotes, etc).
If you only need a lot of joins to display all the details, try to load the overview and then use a second cache for details which you can then load in a second thread. Together with AJAX, you can display an overview of the object quickly which will buy you some goodwill to wait for the details.

Related

Springboot Scheduled Trigger on Entity Existence

I have a function in Spring Boot that checks every tenth of a second (100 ms) for any existing entity in a JPA SQL database, processes the requested actions, and deletes the entity once the actions have been completed. The problem is that checking the DB every 100 ms is very memory-intensive, expensive, and wasteful, and has caused crashes before (because I am running this on a free server, and it maxes out the memory). I was wondering, is there a method similar to @Scheduled that triggers the method if a DB table holds any rows (basically, if exampleRepository.findAll() returns a non-empty result, then my method runs)?
Thanks guys!
Well, there are a couple of options you could try.
If your "processing" is just another database action, inserting/updating another table, then why don't you give database triggers a try?
You might benefit from an event-driven architecture, use a message queue or expose an API in your service which will consume the data directly. Schedulers are generally the last resort in such cases.
The out-of-memory issue might not be because of limited memory but because of the way you are retrieving the data from the database. Instead of loading all the data at once, use smaller chunks and do batch processing.
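The event-driven option above could be as simple as a bounded queue between the code that currently inserts the entity and the worker that processes it. A stdlib sketch, not tied to Spring (all names are made up):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventDrivenWorker {
    // Instead of polling the table every 100 ms, the code that used to insert the
    // entity offers it to a queue; the worker blocks until work actually exists.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    void submit(String entityId) {        // called where the entity used to be inserted
        pending.offer(entityId);
    }

    String takeAndProcess() {
        try {
            String id = pending.take();   // blocks; no busy polling, no repeated findAll()
            // ... process the requested actions, then delete the entity ...
            return id;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

If the producer runs in a different process, a message broker plays the role of the queue, but the shape is the same.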

Way to improve Rest Webservice performance which call other API

I have a webservice ABC
ABC Operations:
A. Call XYZ web service
B. Store response in db
C. return result
Overall ABC Response time = 18 sec
XYZ Response Time = 8 sec.
Only ABC Response time = 18-8 = 10 sec
I want to minimize response time of ABC service.
How can this be done?
A few things I thought of:
1. Send a partial request and get a partial response = but that's not possible in my case.
2. Return the response and perform the DB write asynchronously. (Can this be done in a reliable manner?)
3. Is there any way to improve the DB write operation?
If it is possible to "perform db in asynchronous manner", i.e. if you can respond to the caller before the DB write completes, then you can use the 'write behind' pattern to perform the DB writes asynchronously.
The write behind pattern looks like this: queue each data change, let this queue be subject to a configurable duration (aka the “write behind delay”) and a maximum size. When data changes, it is added to the write-behind queue (if it is not already in the queue) and it is written to the underlying store whenever one of the following conditions is met:
The write behind delay expires
The queue exceeds a configurable size
The system enters shutdown mode and you want to ensure that no data is lost
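The three flush conditions above can be sketched in plain Java (illustrative names only; a production write-behind implementation such as Ehcache's also handles retries, ordering and coalescing):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class WriteBehindQueue<T> {
    private final BlockingQueue<T> queue;
    private final int maxBatch;
    private final Consumer<List<T>> writeToDb;
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    WriteBehindQueue(int capacity, int maxBatch, long delayMillis, Consumer<List<T>> writeToDb) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.maxBatch = maxBatch;
        this.writeToDb = writeToDb;
        // condition 1: the write-behind delay expires
        flusher.scheduleWithFixedDelay(this::flush, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
    }

    void add(T change) {
        queue.offer(change);                     // sketch: silently drops when full
        if (queue.size() >= maxBatch) flush();   // condition 2: queue exceeds its size
    }

    synchronized void flush() {                  // also call this on shutdown (condition 3)
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) writeToDb.accept(batch);
    }
}
```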
There is plenty of prior art in this space. For example, Spring’s Cache Abstraction allows you to add a caching layer and it supports JSR-107 compliant caches such as Ehcache 3.x which provides a write behind cache writer. Spring’s caching service is an abstraction not an implementation, the idea being that it will look after the caching logic for you while you continue to provide the store and the code to interact with the store.
You should also look at whatever else is happening inside ABC, other than the call to XYZ, if the DB call accounts for all of those extra 10s then ‘write behind’ will save you ~10s but if there are other activities happening in those 10s then you’ll need to address those separately. The key point here is to profile the calls inside ABC so that you can identify exactly where time is spent and then prioritise each phase according to factors such as (a) how long that phase takes; (b) how easily that time can be reduced.
If you move to a ‘write behind’ approach then the elapsed time of the DB is no longer an issue for your caller but it might still be an issue within ABC since long write times could cause the queue of ‘write behind’ instructions to build up. In that case, you would profile the DB call to understand why it is taking so long. Common candidates include: attempting to write large data items (e.g. a large denormalised data item), attempting to write into a table/store which is heavily indexed.
As far as I know, you can follow these options based on your requirements:
Think of caching the results of the XYZ response and storing them in the database, so that you can minimise the calls.
There is a possibility of failures in option 2, but you can still handle it by writing the failure cases to an error log and processing them later.
The DB write operation can be improved with proper indexing, normalisation, etc.

How do I cache diffs in data for arbitrary time differences (java web service)

I have a Java webservice which queries a DB to return data to users. DB queries are expensive, so I have a cron job which runs every 60 seconds to cache the current data in memcached.
Data elements 'close' after a time meaning they aren't returned by "get current data" requests. So these requests can utilize the cached data.
Clients use a feature called 'since' to get all the data that has changed since a particular timestamp (the last request's timestamp). This would also return any data that closed after that timestamp.
How can I effectively store the diffs/since data? Accessing the DB for every since request is too slow (and won't scale well), but because clients could request any since time, it makes it difficult to generate an all-purpose cache.
I tried having the cron job also build a 'since' cache. It would do 'since' requests to fetch everything that changed since the last update, and I attempted to force clients to request the timestamps which matched the cron job's 'since' requests. But the cron job's runtime is inconsistent, and neither the client nor the cron job runs exactly every 60 seconds, so the small differences add up. Eventually some data closes, but the cache or the client misses it.
I'm not even sure what to search for to solve this.
I'd be tempted to stick a time-expiring cache (e.g. ehcache with timeToLive set) in front of the database, and have whatever process updates the database also put the data directly into the cache (resetting or removing an existing matching element). The webservice then just hits the cache (which is incredibly fast) for everything except its initial connection, filtering out the few elements that are too old and sending the rest on to the client. Gradually the old data gets dropped from the cache as its time-to-live passes. Then just make sure the cache gets pre-populated when the service starts up.
Does your data have any time-stamping? We were having similar issues with caching here at my company, and time-stamping resolved them. You can use a "valid-upto" timestamp with your data, so that both your cache and your clients know until when the data is valid.
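A "valid-upto" stamp could be as simple as this (illustrative sketch; the class and field names are made up):

```java
class StampedValue<V> {
    // The data carries its own expiry, so both the cache and the client
    // can decide for themselves whether it is still valid.
    final V value;
    final long validUntilMillis;

    StampedValue(V value, long validUntilMillis) {
        this.value = value;
        this.validUntilMillis = validUntilMillis;
    }

    boolean isValidAt(long nowMillis) {
        return nowMillis < validUntilMillis;
    }
}
```

The nice property is that the 'since' logic no longer depends on the cron job and the client agreeing on wall-clock schedules: validity travels with each element.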

How to iterate over db records correctly with hibernate

I want to iterate over records in the database and update them. However, since the updating both takes some time and is prone to errors, I need to a) not keep the DB waiting (as it would be, e.g., with a ScrollableResults) and b) commit after each update.
Second thing is that this is done in multiple threads, so I need to ensure that if thread A is taking care of a record, thread B is getting another one.
How can I implement this sensibly with hibernate?
To give a better idea, the following code would be executed by several threads, where all threads share a single instance of the RecordIterator:
Iterator<Record> iter = db.getRecordIterator();
while (iter.hasNext()) {
    Record rec = iter.next();
    // do something lengthy here
    db.save(rec);
}
So my question is how to implement the RecordIterator. If I perform a query on every next(), how do I ensure that I don't return the same record twice? If I don't, which query should I use to return detached objects? Is there a flaw in the general approach (e.g. use one RecordIterator per thread and let the db somehow handle synchronization)? Additional info: there are way too many records to keep them locally (e.g. in a set of treated records).
Update: Because the overall process takes some time, it can happen that the status of Records changes. Due to that the ordering of the result of a query can change. I guess to solve this problem I have to mark records in the database once I return them for processing...
Hmmm, what about pushing your objects from a reader thread into some bounded blocking queue, and letting your updater threads read from that queue?
In your reader, do some paging with setFirstResult/setMaxResults. E.g. if you have 1000 elements maximum in your queue, fill them up 500 at a time. When the queue is full, the next push will automatically wait until the updaters take the next elements.
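That reader/updater split might look like this (a sketch; the Page.fetch callback stands in for a Hibernate query using setFirstResult/setMaxResults):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PagedReader<T> {
    private final BlockingQueue<T> queue = new ArrayBlockingQueue<>(1000);

    interface Page<T> { List<T> fetch(int first, int max); }  // wraps setFirstResult/setMaxResults

    // Reader thread: pull 500 records per query, block when the queue is full.
    void readAll(Page<T> source) {
        int first = 0;
        List<T> batch;
        try {
            while (!(batch = source.fetch(first, 500)).isEmpty()) {
                for (T rec : batch) queue.put(rec);   // waits while the updaters are behind
                first += batch.size();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    T next() {                         // called by each updater thread
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Since take() hands each record to exactly one updater, no record is processed twice, which addresses the synchronization part of the question.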
My suggestion would be, since you're sharing an instance of the master iterator, is to run all of your threads using a shared Hibernate transaction, with one load at the beginning and a big save at the end. You load all of your data into a single 'Set' which you can iterate over using your threads (be careful of locking, so you might want to split off a section for each thread, or somehow manage the shared resource so that you don't overlap).
The beauty of the Hibernate solution is that the records aren't immediately saved to the database, since you're using a transaction, and are stored in hibernate's cache. Then at the end they'd all be written back to the database at once. This would save on those expensive database writes you're worried about, plus it gives you an actual object to work with on each iteration, instead of just a database row.
I see in your update that the status of the records may change during processing, and this could always cause a problem. If this is a constantly running process or long running, then my advice using a hibernate solution would be to work in smaller sets, and yes, add a flag to mark records that have been updated, so that when you move to the next set you can pick up ones that haven't been touched.

Java Multithreaded Caching with Single Updater Thread

I have a web service that has ~1k request threads running simultaneously on average. These threads access data from a cache (currently on ehcache.) When the entries in the cache expire, the thread that hits the expired entry tries getting the new value from the DB, while the other threads also trying to hit this entry block, i.e. I use the BlockingEhCache decorator. Instead of having the other threads waiting on the "fetching thread," I would like the other threads to use the "stale" value corresponding to the "missed" key. Is there any 3rd party developed ehcache decorators for this purpose? Do you know of any other caching solutions that have this behavior? Other suggestions?
I don't know EHCache good enough to give specific recommendations for it to solve your problem, so I'll outline what I would do, without EHCache.
Let's assume all the threads are accessing this cache using a Service interface, called FooService, and a service bean called SimpleFooService. The service will have the methods required to get the data needed (which is also cached). This way you're hiding the fact that it's cached from the frontend (HTTP request objects).
Instead of simply storing the data to be cached in a property in the service, we'll make a special object for it. Let's call it FooCacheManager. It will store the cache in a property in FooCacheManager (let's say it's of type Map). It will have getters to get the cache. It will also have a special method called reload(), which will load the data from the DB (by calling a service method to get the data, or through the DAO) and replace the content of the cache (saved in a property).
The trick here is as follows:
Declare the cache property in FooCacheManager as an AtomicReference (introduced in Java 1.5). This guarantees thread safety both when you read it and when you assign to it. Your read/write actions will never collide or read a half-written value.
The reload() will first load the data into a temporary map, and then, when it's finished, assign the new map to the property saved in FooCacheManager. Since the property is an AtomicReference, the assignment is atomic, so it basically swaps in the new map in an instant without any need for locking.
TTL implementation - have FooCacheManager implement the Quartz Job interface, making it effectively a Quartz job. In the job's execute method, have it run reload(). In the Spring XML, define this job to run every xx minutes (your TTL), which can also be defined in a property file if you use PropertyPlaceholderConfigurer.
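Putting the three steps together in plain Java (a sketch only; a ScheduledExecutorService stands in here for the Quartz job, and all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class FooCacheManager {
    // Readers only ever see a fully built map; reload() swaps it in atomically.
    private final AtomicReference<Map<String, Object>> cache = new AtomicReference<>(Map.of());
    private final Supplier<Map<String, Object>> loadAllFromDb;

    FooCacheManager(Supplier<Map<String, Object>> loadAllFromDb) {
        this.loadAllFromDb = loadAllFromDb;
    }

    Object get(String key) {
        return cache.get().get(key);          // no locking, no isExpired() per read
    }

    void reload() {
        Map<String, Object> fresh = loadAllFromDb.get();  // build the new map off to the side
        cache.set(fresh);                                  // atomic swap
    }

    // Stand-in for the Quartz job: trigger reload() every TTL interval.
    ScheduledExecutorService scheduleEvery(long ttlMinutes) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::reload, ttlMinutes, ttlMinutes, TimeUnit.MINUTES);
        return ses;
    }
}
```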
This method is effective since the reading threads:
Don't block for read
Don't call isExpired() on every read (which happens ~1k times per second).
Also the writing thread doesn't block when writing the data.
If this wasn't clear, I can add example code.
Since ehcache removes stale data, a different approach can be to refresh data with a probability that increases as the expiration time approaches, and is 0 if the expiration time is "sufficiently" far away.
So, if thread 1 needs some data element, it might refresh it, even though data is not old yet.
In the meantime, if thread 2 needs the same data, it might use the existing data (while the refresh has not finished yet). It is possible that thread 2 tries to do a refresh too.
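The probability ramp could be computed like this (a sketch; the linear ramp and the window size are just one possible choice):

```java
import java.util.concurrent.ThreadLocalRandom;

class EarlyRefreshPolicy {
    // Probability of refreshing grows linearly from 0 (entry younger than
    // ttl - window) up to 1 (entry at or past its TTL).
    private final long ttlMillis;
    private final long windowMillis;

    EarlyRefreshPolicy(long ttlMillis, long windowMillis) {
        this.ttlMillis = ttlMillis;
        this.windowMillis = windowMillis;
    }

    double refreshProbability(long ageMillis) {
        long start = ttlMillis - windowMillis;
        if (ageMillis <= start) return 0.0;          // "sufficiently" far from expiry
        if (ageMillis >= ttlMillis) return 1.0;
        return (ageMillis - start) / (double) windowMillis;
    }

    boolean shouldRefresh(long ageMillis) {
        return ThreadLocalRandom.current().nextDouble() < refreshProbability(ageMillis);
    }
}
```

Because each reader rolls the dice independently, refreshes are spread out over the window instead of all landing on the expiry instant, which smooths the load spike; occasional duplicate refreshes are the accepted cost.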
If you are working with references (the updater thread loads the object and then simply changes the reference in the cache), then no separate synchronization is required for get and set operations on the cache.
