I've got the following problem (one important restriction: I cannot use external JARs/libraries, only the classes that come with a regular Java install):
Objects of class X are stored long term in an SQL DB. Objects are cached for performance's sake (the cache still needs to be written; I intend to base it on LinkedHashMap).
get(key):
check if object is in cache and not in use - return it.
if object is in use - sleep till it's available.
if object is not in cache - read it from DB.
putInCache(object):
update object in cache (if it's not there, add it).
if the cache is exhausted, this triggers a saveToDB operation by the cache and removes the least recently used item from the cache.
saveToDB(object):
write object to DB (not removed from cache) and mark object as "not changed".
There are multiple threads calling get. A thread can change the object it received from get (and the object will be marked as "changed") - when it's finished it will call putInCache.
There is one dedicated thread that goes over the cache objects and when it encounters a "changed" object it will trigger saveToDB (object will be marked as used while DB access is going on).
How would you recommend to ensure thread safety ?
Basically I'm looking for the right Java classes that will enable:
1. get to synchronize its access to each object in the cache, so that it can check whether the object is there and, if so, whether it is in use or free for grabbing. If it is in use, get should sleep until the object is available.
2. the dedicated thread to avoid locking the cache while calling saveToDB, while still making sure the whole cache is examined and no starvation is caused (the cache might change while saveToDB is running).
Just to clarify: I'm only interested in the locking/synchronization solutions. Things like the cache triggering and DB access can be assumed as given.
Here is an approach:
use an ExecutorService to handle DB requests;
use Futures for your map values;
use a ConcurrentHashMap as a map implementation.
The Future should get from the DB; it will use the ExecutorService.
When you need to make manipulations on one object, synchronize on this future's .get() which will be the object.
Also, google for "Java concurrency in practice", and buy the book ;)
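The approach above could be sketched roughly as follows. This is only a minimal sketch under assumptions: FutureCache and its loader function are hypothetical names standing in for the real cache and the "read object from DB" call, and eviction and saveToDB are left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical cache: every key maps to a Future, so each value is loaded at most once.
class FutureCache<K, V> {
    private final ConcurrentMap<K, Future<V>> map = new ConcurrentHashMap<>();
    private final ExecutorService dbExecutor; // runs the (assumed) DB reads
    private final Function<K, V> loader;      // stand-in for "read object from DB"

    FutureCache(ExecutorService dbExecutor, Function<K, V> loader) {
        this.dbExecutor = dbExecutor;
        this.loader = loader;
    }

    V get(K key) {
        Future<V> future = map.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<>(() -> loader.apply(key));
            future = map.putIfAbsent(key, task);
            if (future == null) {          // this thread won the race: it schedules the load
                future = task;
                dbExecutor.execute(task);
            }
        }
        try {
            return future.get();           // blocks until the load has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A caller that wants exclusive access to one object could then write synchronized (cache.get(key)) { ... }, which is the per-object locking the answer describes.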
We have a web application which receives some million requests per day. We audit the request counts and response status using an interceptor, which in turn calls a class annotated with Spring's @Async annotation. This class basically adds them to a map and persists the map after a configured interval. As we have a fixed set of APIs, we maintain a ConcurrentHashMap with the API name as key and an object holding its count and response status as value. So for every request for an API we check whether it exists in our map; if it exists we fetch the object stored against it, otherwise we create an object and put it in the map. For example:
class Audit {
    void audit(String apiname) {
        CounterObject counterObject = null;
        if (APIMap.containsKey(apiname)) {
            // fetch existing object
            counterObject = APIMap.get(apiname);
        } else {
            // create new object and put it in the map
            counterObject = new CounterObject();
            APIMap.put(apiname, counterObject);
        }
        // Increment count, note response status and perform other operations on the CounterObject received
    }
}
Then we perform some calculation on the received object (whether from map or newly created) and update counters.
We aggregate the map values for a specific interval and commit them to the database.
This works fine under low traffic, but under high load we face some issues, such as:
1. The first thread gets the object and is about to update the count, but before it does, a second thread comes in and reads a value which is not the latest one. The first thread then makes its changes and commits the value, but the second thread updates the value it read earlier and writes that back. Because both threads operate on the same key, the counter is overwritten by whichever thread writes last.
2. I don't want to put the synchronized keyword on the block that updates the counter. Even though the processing is async and the user gets the response before we check apiname in the map, the application resources consumed will still be higher under high load if synchronized is used, which can result in late responses or, in the worst case, a deadlock.
Can anyone suggest a solution which can update the counters concurrently without having to use the synchronized keyword?
Note: I am already using ConcurrentHashMap, but because the lock is acquired and released so quickly by multiple threads at high load, the counters mismatch.
In your case you are right to look at a solution without locking (or at least with very local locking). And as long as you do simple operations you should be able to pull this off.
First of all, you have to make sure you only create one new CounterObject per key, instead of having multiple threads each create their own, with the last one overwriting the earlier objects.
ConcurrentHashMap has a very useful function for this: putIfAbsent. It will store the object only if there is none for the key yet. Note that it returns the value previously associated with the key, or null if there was none, so on the first call you have to fall back to the object you just passed in. It works as follows:
CounterObject fresh = new CounterObject();
CounterObject existing = APIMap.putIfAbsent("key", fresh);
CounterObject counter = (existing != null) ? existing : fresh;
counter.countStuff();
The downside of the above is that you always create a new CounterObject, which might be expensive. If that is the case you can use the Java 8 computeIfAbsent which will only call a lambda to create the object if there is nothing associated with the key.
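The computeIfAbsent variant could look something like the sketch below. LazyCounterMap and factoryCalls are hypothetical names, and an AtomicInteger stands in for the real CounterObject; the factoryCalls field exists only to make the "created once per key" behaviour observable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around the API map from the question.
class LazyCounterMap {
    // Counts how often the factory lambda actually ran.
    final AtomicInteger factoryCalls = new AtomicInteger();
    private final ConcurrentMap<String, AtomicInteger> apiMap = new ConcurrentHashMap<>();

    AtomicInteger counterFor(String apiName) {
        // The lambda only runs when apiName has no mapping yet;
        // for ConcurrentHashMap the whole call is atomic per key.
        return apiMap.computeIfAbsent(apiName, key -> {
            factoryCalls.incrementAndGet();
            return new AtomicInteger();
        });
    }
}
```

Unlike putIfAbsent, computeIfAbsent returns the value that is in the map after the call, so the result can be used directly.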
Finally, you have to make sure your CounterObject is thread-safe, preferably without locking/synchronization (although if you have very many CounterObjects, locking on one will be less bad than locking the full map, because fewer threads will try to lock the same object at the same time).
In order to make CounterObject safe without locking, you can look into classes such as AtomicInteger which can do many simple operations without locking.
Note that whenever I say locking here, it means either using an explicit lock class or using synchronized.
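As an illustration of a counter object built on atomic classes, here is a minimal sketch. AtomicCounterObject and its fields are hypothetical; the question does not show what the real CounterObject tracks, so "hits" and "failures" (HTTP status 400 and above) are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free replacement for the question's CounterObject.
class AtomicCounterObject {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();

    // Safe to call from many threads at once; no synchronized block needed.
    void record(int httpStatus) {
        hits.incrementAndGet();
        if (httpStatus >= 400) {
            failures.incrementAndGet();
        }
    }

    long hits() { return hits.get(); }

    long failures() { return failures.get(); }

    // For the periodic persist step: atomically read the count and reset it to zero.
    long drainHits() { return hits.getAndSet(0); }
}
```

Each counter is individually atomic, but the pair (hits, failures) is not updated as one unit, so a reader may briefly see one incremented before the other.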
The reason for the counter mismatch is that the check-and-put operation in the Audit class is not atomic on ConcurrentHashMap. You need to use the putIfAbsent method, which performs the check and put atomically. Refer to the ConcurrentHashMap javadoc for the putIfAbsent method.
Background:
I have a large thread pool in Java in which each process has some internal state.
I would like to gather some global information about the states. To do that I have an associative, commutative aggregation function (e.g. sum, though mine needs to be pluggable).
The solution needs to have a fixed memory consumption and be lock-free, in the best case not disturbing the pool at all. So no thread should need to acquire a lock (or enter a synchronized area) when writing to the data structure. The aggregated value is only read after the threads are done, so I don't need an accurate value all the time. Simply collecting all values and aggregating them after the pool is done might lead to memory problems.
The values are going to be more complex datatypes, so I cannot use AtomicInteger etc.
My general idea for the solution:
Have a lock-free collection where all threads put their updates. I don't even need the order of the events.
If it gets too big, run the aggregation function on it (compacting it) while the threads continue filling it.
My question:
Is there a data structure that allows for something like that or do I need to implement it from scratch? I couldn't find anything that directly matches my problem. If I have to implement from scratch what would be a good non-blocking collection class to start from?
If the updates are infrequent (relatively speaking) and the aggregation function is fast, I would recommend aggregating every time:
State myState;
AtomicReference<State> combinedState;
State original;
State newCombined;
do
{
    original = combinedState.get();
    newCombined = Aggregate(original, myState);
} while (!combinedState.compareAndSet(original, newCombined));
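Since Java 8 the same compare-and-set retry loop is available as AtomicReference.accumulateAndGet, which makes it easy to plug in the aggregation function. The following is a sketch under assumptions: LockFreeAggregator is a hypothetical name, and the state type S is assumed to be immutable, with a side-effect-free aggregation function, because the function may be re-applied when a concurrent update wins.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BinaryOperator;

// Hypothetical holder for the combined state; S must be immutable.
final class LockFreeAggregator<S> {
    private final AtomicReference<S> combined;
    private final BinaryOperator<S> aggregate; // pluggable, associative and commutative

    LockFreeAggregator(S identity, BinaryOperator<S> aggregate) {
        this.combined = new AtomicReference<>(identity);
        this.aggregate = aggregate;
    }

    // Retries internally via compare-and-set until the update succeeds; never blocks.
    void add(S threadState) {
        combined.accumulateAndGet(threadState, aggregate);
    }

    S result() {
        return combined.get();
    }
}
```

Memory use stays fixed because only the single combined value is retained, at the cost of contention on one reference when updates are frequent.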
I don't quite understand the question but I would, at first sight, suggest an IdentityHashMap where keys are (references to) your thread objects and values are where your thread objects write their statistics.
An IdentityHashMap only relies on reference equality, as such there would never be any conflict between two thread objects; you could pass a reference to that map to each thread (which would then call .get(this) on the map to get a reference to the collecting data structure), which would then collect the data it wants. Otherwise you could just pass a reference to the collecting data structure to the thread object.
Such a map is inherently thread safe for your use case, as long as you create the key/value pair for each thread before starting the thread, and because no thread object will ever modify the map anyway, since they won't have a reference to it. With some management smartness you can even remove entries from this map once the thread is done with its work, even though the map itself is not thread-safe.
When all is done, you have a map whose values contains all the data collected.
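A rough sketch of this per-thread idea, using the simpler variant where each thread is handed its own collecting structure directly. PerThreadStats and runAndSum are hypothetical names, a long[1] slot stands in for the real statistics object, and sum stands in for the aggregation function.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class PerThreadStats {
    // Starts `workers` threads, each writing only to its own slot, then sums the slots.
    static long runAndSum(int workers, int incrementsPerWorker) {
        Map<Thread, long[]> stats = new IdentityHashMap<>();
        for (int i = 0; i < workers; i++) {
            long[] slot = new long[1];
            Thread t = new Thread(() -> {
                for (int n = 0; n < incrementsPerWorker; n++) {
                    slot[0]++; // no sharing: only this thread touches this slot
                }
            });
            stats.put(t, slot); // the map is fully built before any thread starts
        }
        for (Thread t : stats.keySet()) {
            t.start();
        }
        long total = 0;
        for (Map.Entry<Thread, long[]> e : stats.entrySet()) {
            try {
                e.getKey().join(); // join makes the thread's writes visible here
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
            total += e.getValue()[0];
        }
        return total;
    }
}
```

Memory here grows with the number of threads rather than the number of updates, which fits the fixed-memory requirement only if the thread count is bounded.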
Hope this helps... Reading the question again, in any case...
I have many concurrently running HTTP request-serving threads. Each will create an Object (? extends Object) for every request and save the object in a list.
Please advise me on a good data structure to implement this list.
I can't use ArrayList since it is not thread safe.
I don't want to use Vector: since it's synchronized, it will make other threads wait while one of the HTTP threads is saving its object.
I also tried LinkedList, but there is data loss due to concurrent updates.
Your variable would need to be atomic so that it can safely be updated by multiple threads (see java.util.concurrent.atomic). You could also use an AtomicInteger to keep track of the number of times the variable is updated.
But are you sure you want do this without explicitly controlling the update to a variable?
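For the list in the question, one standard-library option is ConcurrentLinkedQueue, a non-blocking collection. This is not something the answer above proposes; it is a sketch under the assumption that insertion order and index access do not matter to the caller. RequestLog is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical holder for the per-request objects.
class RequestLog<T> {
    private final Queue<T> items = new ConcurrentLinkedQueue<>();

    // Non-blocking append: request threads never wait on each other here.
    void add(T item) {
        items.add(item);
    }

    // Copies the current contents; elements added during the copy may or may not be included.
    List<T> snapshot() {
        return new ArrayList<>(items);
    }
}
```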
I have an instance of an object which performs a very complex operation.
So the first time, I create an instance and save it in my own custom cache.
From then on, whichever thread comes along and finds a ready-made object already present in the cache takes it from the cache, for the sake of performance.
I am worried about what happens if two threads have the same instance. Is there a chance that the two threads can corrupt each other?
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances= Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());
There are many possible solutions:
Use an existing caching solution like Ehcache
Use the Spring framework, which has an easy way to cache the results of a method with a simple @Cacheable annotation
Use one of the thread-safe maps like ConcurrentHashMap
If you know all keys in advance, you can use a lazy init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is the simplest to set up, but it has no simple way to say "initialize the value of a key once".
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get( key );
if( value == null ) {
value = computeNewValue( key );
cache.put( key, value );
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
Assuming you are talking about singletons, simply use the "initialization-on-demand holder idiom" to make sure your "check" works across all JVMs. This will also make sure that all threads which request the same object concurrently wait until the initialization is over and are given back only a valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
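For reference, the holder idiom looks roughly like this. ExpensiveService is a hypothetical stand-in for the object that is costly to build; the idiom relies on the JVM initializing the nested class lazily and exactly once.

```java
// Hypothetical expensive-to-create singleton.
class ExpensiveService {
    private ExpensiveService() {
        // the expensive setup would go here
    }

    // Not initialized until getInstance() first touches it;
    // class initialization is thread-safe by the language specification.
    private static class Holder {
        static final ExpensiveService INSTANCE = new ExpensiveService();
    }

    static ExpensiveService getInstance() {
        return Holder.INSTANCE;
    }
}
```

This only guarantees safe, one-time construction. If the instance holds mutable state, access to that state still has to be made thread-safe separately.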
OK, if I understand your problem correctly, you are worried that 2 threads changing the state of the shared object will corrupt each other.
The short answer is yes they will.
If the object is expensive to create but is only needed in a read-only manner, I suggest you make it immutable. This way you get the benefit of fast access and thread safety at the same time.
If the state should be writable but you don't actually need threads to see each other's updates, you can simply load the object once into an immutable cache and return copies to anyone who asks for the object.
Finally, if your object needs to be writable and shared (for reasons other than it just being expensive to create), then, my friend, you need to handle thread safety. I don't know your case, but you should take a look at the synchronized keyword, Locks and the Java 5 concurrency features, and the atomic types. I am sure one of them will satisfy your need, and I sincerely wish that your case is one of the first 2 :)
If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Otherwise, I can't recommend the Google Guava library enough; in particular, look at the MapMaker class.
I have a web service that has ~1k request threads running simultaneously on average. These threads access data from a cache (currently Ehcache). When an entry in the cache expires, the thread that hits the expired entry tries to get the new value from the DB, while the other threads trying to hit this entry block, i.e. I use the BlockingEhCache decorator. Instead of having the other threads wait on the "fetching thread," I would like them to use the "stale" value corresponding to the "missed" key. Are there any third-party Ehcache decorators for this purpose? Do you know of any other caching solutions that have this behavior? Other suggestions?
I don't know Ehcache well enough to give specific recommendations for solving your problem with it, so I'll outline what I would do without Ehcache.
Let's assume all the threads are accessing this cache using a service interface called FooService and a service bean called SimpleFooService. The service will have the methods required to get the data needed (which is also cached). This way you're hiding the fact that it's cached from the frontend (HTTP request objects).
Instead of simply storing the data to be cached in a property in the service, we'll make a special object for it. Let's call it FooCacheManager. It will store the cache in a property in FooCacheManager (let's say it's of type Map). It will have getters to get the cache. It will also have a special method called reload(), which will load the data from the DB (by calling service methods to get the data, or through the DAO) and replace the content of the cache (saved in a property).
The trick here is as follows:
Declare the cache property in FooCacheManager as an AtomicReference (a class introduced in Java 1.5). This guarantees thread safety when you read it and also when you assign to it. Your read/write actions will never collide or read a half-written value.
The reload() method will first load the data into a temporary map, and then, when it's finished, it will assign the new map to the property saved in FooCacheManager. Since the property is an AtomicReference, the assignment is atomic, so it's basically swapping the map in an instant without any need for locking.
TTL implementation: have FooCacheManager implement Quartz's Job interface, effectively making it a Quartz job. In the execute method of the job, have it run reload(). In the Spring XML, define this job to run every xx minutes (your TTL), which can also be defined in a property file if you use PropertyPlaceholderConfigurer.
This method is effective since the reading threads:
Don't block for read
Don't call isExpired() on every read, which is 1k / second.
Also the writing thread doesn't block when writing the data.
If this wasn't clear, I can add example code.
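A minimal sketch of the swap described above, leaving out Spring and Quartz. The loader Supplier is a hypothetical stand-in for the DAO call, and the String key/value types are assumptions; a scheduler of your choice would call reload() at the TTL interval.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class FooCacheManager {
    private final AtomicReference<Map<String, String>> cache =
            new AtomicReference<>(Collections.emptyMap());
    private final Supplier<Map<String, String>> loader; // stand-in for the DB/DAO call

    FooCacheManager(Supplier<Map<String, String>> loader) {
        this.loader = loader;
        reload(); // initial fill
    }

    // Readers never block: they see either the old map or the new one, never a mix.
    String get(String key) {
        return cache.get().get(key);
    }

    // Build the replacement off to the side, then publish it with one atomic assignment.
    void reload() {
        Map<String, String> fresh = new HashMap<>(loader.get());
        cache.set(Collections.unmodifiableMap(fresh));
    }
}
```

While reload() is running, readers keep getting the previous ("stale") map, which is the behaviour the question asks for.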
Since ehcache removes stale data, a different approach can be to refresh data with a probability that increases as expiration time approaches, and is 0 if expiration time is "sufficiently" far.
So, if thread 1 needs some data element, it might refresh it, even though data is not old yet.
In the meantime, thread 2 needs same data, it might use the existing data (while refresh thread has not finished yet). It is possible thread 2 might try to do a refresh too.
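One way to express "a probability that increases as expiration time approaches" is a linear ramp inside a refresh window. This is only a sketch: the answer does not specify the shape of the curve, so the linear ramp, the EarlyRefresh class and the windowMs parameter are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

class EarlyRefresh {
    // 0 while expiry is at least windowMs away, rising linearly to 1 at expiry.
    static double refreshProbability(long nowMs, long expiresAtMs, long windowMs) {
        long remaining = expiresAtMs - nowMs;
        if (remaining >= windowMs) {
            return 0.0; // expiry is "sufficiently" far away
        }
        if (remaining <= 0) {
            return 1.0; // already expired
        }
        return 1.0 - (double) remaining / windowMs;
    }

    // Each reading thread rolls the dice independently, so usually only a few threads refresh.
    static boolean shouldRefresh(long nowMs, long expiresAtMs, long windowMs) {
        return ThreadLocalRandom.current().nextDouble()
                < refreshProbability(nowMs, expiresAtMs, windowMs);
    }
}
```

A thread for which shouldRefresh returns false simply uses the existing cached value.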
If you are working with references (the updater thread loads the object and then simply changes the reference in the cache), then no separate synchronization is required for get and set operations on the cache.