Situation:
Need a cache of an expensive-to-create and non-thread-safe external resource
The resource requires explicit clean up
The termination of each thread cannot be hooked, but that of the application can
The code also runs in a servlet container, so caches that keep a strong reference from outside the web application (e.g. ThreadLocal) cannot be used directly (see edit below).
Thus, to use a ThreadLocal, it can only hold WeakReferences to the resource, and a separate collection of strong references has to be kept. The code quickly gets very complicated and still creates a memory leak (as the strong reference is never removed after thread death).
ConcurrentHashMap seems to be a good fit, but it also suffers from the memory leak.
What other alternatives are there? A synchronised WeakHashMap??
(Hopefully the solution can also be automatically initialised using a given Supplier just like ThreadLocal.withInitial())
Edit:
Just to prove the class loader leak is a thing, create a minimal WAR project with:
public class Test {
    public static ThreadLocal<Test> test = ThreadLocal.withInitial(Test::new);
}
index.jsp:
<%= Test.test.get() %>
Visit the page, shut down Tomcat, and you get:
Aug 21, 2015 5:56:11 PM org.apache.catalina.loader.WebappClassLoaderBase checkThreadLocalMapForLeaks
SEVERE: The web application [test] created a ThreadLocal with key of type [java.lang.ThreadLocal$SuppliedThreadLocal] (value [java.lang.ThreadLocal$SuppliedThreadLocal@54e69987]) and a value of type [test.Test] (value [test.Test@2a98020a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
That seems to be the typical “weak key, strong value referencing the key” problem. If you make the value weak, it can be collected even while the key is still reachable; if you make it strong, the key is strongly reachable as well. This can’t be solved without direct support by the JVM.
Thankfully there is a class which offers that (though it’s not emphasized in its documentation):
java.lang.ClassValue:
Lazily associate a computed value with (potentially) every type. For example, if a dynamic language needs to construct a message dispatch table for each class encountered at a message send call site, it can use a ClassValue to cache information needed to perform the message send quickly, for each class encountered.
While this documentation doesn’t say that the values may refer to the Class key, its intended use case of holding dispatch tables for a class implies that it is typical to have values with back-references.
Let’s demonstrate it with a small test class:
import java.io.PrintStream;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.ref.WeakReference;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class ClassValueTest extends ClassValue<Method> {
    @Override
    protected Method computeValue(Class<?> type) {
        System.out.println("computeValue");
        return Arrays.stream(type.getDeclaredMethods())
            .filter(m -> Modifier.isPublic(m.getModifiers()))
            .findFirst().orElse(null);
    }

    public static void main(String... arg) throws Throwable {
        // create a collectible class:
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType noArg = MethodType.methodType(void.class);
        MethodHandle println = l.findVirtual(
            PrintStream.class, "println", MethodType.methodType(void.class, String.class));
        Runnable r = (Runnable)LambdaMetafactory.metafactory(l, "run",
            println.type().changeReturnType(Runnable.class), noArg, println, noArg)
            .getTarget().invokeExact(System.out, "hello world");
        r.run();

        WeakReference<Class<?>> ref = new WeakReference<>(r.getClass());
        ClassValueTest test = new ClassValueTest();
        // compute and get
        System.out.println(test.get(r.getClass()));
        // verify that the value is cached, should not compute
        System.out.println(test.get(r.getClass()));
        // allow freeing
        r = null;
        System.gc();
        if (ref.get() == null) System.out.println("collected");
        // ensure that it is not our cache instance that has been collected
        System.out.println(test.get(String.class));
    }
}
On my machine it printed:
hello world
computeValue
public void ClassValueTest$$Lambda$1/789451787.run()
public void ClassValueTest$$Lambda$1/789451787.run()
collected
computeValue
public boolean java.lang.String.equals(java.lang.Object)
To explain, this test creates an anonymous class, just like lambda expressions produce, which can be garbage collected. Then it uses the ClassValueTest instance to cache a Method object of that Class. Since Method instances have a reference to their declaring class, we have the situation of a value referring to its key here.
Still, once the class is not used anymore, it gets collected, which implies that the associated value has been collected too. So it is immune to back-references from the value to the key.
The last test, using another class, just ensures that we are not a victim of eager garbage collection as described here, since we are still using the cache instance itself.
This class associates a single value with a class, not a value per thread, but it should be possible to combine ClassValue with ThreadLocal to get the desired result.
I'd propose to get rid of the ThreadLocal and WeakReference stuff altogether because, as you say, resources are not bound to specific threads; they just cannot be accessed from several threads simultaneously.
Instead, have a global cache, Map<Key, Collection<Resource>>. The cache contains only resources that are free for use at the moment.
Threads first request an available resource from the cache. If one is present (this, of course, must be synchronized, as the cache is global), an arbitrary resource is removed from the collection for that key and handed to the thread. Otherwise, a new one for that key is built and handed to the thread.
When a thread finishes using a resource, it returns it to the cache, i.e. adds it to the collection mapped to the resource key. From there it can be reused by the same thread, or even by a different thread.
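A minimal sketch of this check-out/check-in scheme; the factory and cleanup callbacks stand in for the expensive creation and explicit clean-up of the real resource type, which isn't shown in the question:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

public class ResourcePool<K, R> {
    private final Map<K, Deque<R>> free = new HashMap<>();
    private final Function<K, R> factory;   // expensive creation
    private final Consumer<R> cleanup;      // explicit clean-up

    public ResourcePool(Function<K, R> factory, Consumer<R> cleanup) {
        this.factory = factory;
        this.cleanup = cleanup;
    }

    // Hand out a free resource for this key, or build a new one.
    public synchronized R acquire(K key) {
        Deque<R> available = free.get(key);
        return (available != null && !available.isEmpty())
                ? available.pop()
                : factory.apply(key);
    }

    // Return the resource so it can be reused, possibly by another thread.
    public synchronized void release(K key, R resource) {
        free.computeIfAbsent(key, k -> new ArrayDeque<>()).push(resource);
    }

    // Called once on application shutdown.
    public synchronized void shutdown() {
        free.values().forEach(deque -> deque.forEach(cleanup));
        free.clear();
    }
}

A thread would call acquire() before using the resource, release() in a finally block afterwards, and the application would call shutdown() once when it terminates.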
Advantages:
Cache is global, trivial to shut down all allocated resources when application quits.
Hardly any potential for memory leaks, code should be pretty concise.
Threads can share resources (provided they need the same resource at different time), potentially decreasing demand.
Disadvantages:
Requires synchronization (but likely cheap and not difficult to code).
Maybe some others, depending on what exactly you do.
I am not sure about the problem you are talking about. Please take a look at: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Some Questions:
How is the resource referenced?
What is the interface to the resource?
What data should be cached at all?
What is a "non-thread-safe resource"?
How often is the resource retrieved?
How long is the access to one resource, what level of concurrency is there?
Is one thread using the resource many times and this is the reason for the intended caching?
Are many threads using the same resource (instance)?
Can there be many instances of the same resource type, since the actual instance is not thread safe?
How many resources do you have?
Is it many resource instances of the same type or different types?
Maybe you can try to remove the words ThreadLocal, WeakReference, ConcurrentHashMap from your question?
Some (wild) guess:
From what I can read between the lines, it seems to me that this is a straightforward use case for a Java cache. E.g. you can use a Google Guava cache and add a removal listener for the explicit cleanup.
Since the resource is not thread safe you need to implement a locking mechanism. This can be done by putting a lock object into the cached object.
If you need more concurrency, create more resources of the same type and augment the cache key with the hash of the thread modulo the level of concurrency you like to have.
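For illustration, a hypothetical sketch of that key augmentation (the stripe count and key format are assumptions):

// Up to 'stripes' independent instances per logical key, so up to
// 'stripes' threads can work on separate copies concurrently.
static String stripedKey(String resourceKey, int stripes) {
    int stripe = Math.floorMod(Thread.currentThread().hashCode(), stripes);
    return resourceKey + "#" + stripe;
}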
While researching the weak concurrent map idea, I found that it's implemented in Guava's Cache.
I used the current thread as the weak key, and a CacheLoader is supplied to create the resource automatically for each new thread.
A removal listener is also added, so that each thread's resource will be automatically cleaned up after the Thread object is GC'ed or when I call the invalidateAll() method during shut-down.
Most of the configuration above can also be done in a one liner (with lambdas).
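A minimal sketch of that configuration, where Resource with its static create() and an instance close() method is a placeholder for the real external resource:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;

public class PerThreadResourceCache {

    private final LoadingCache<Thread, Resource> perThread;

    public PerThreadResourceCache() {
        // Clean up a resource when its entry is evicted or invalidated.
        RemovalListener<Thread, Resource> cleanup = notification -> {
            Resource resource = notification.getValue();
            if (resource != null) {
                resource.close();
            }
        };
        perThread = CacheBuilder.newBuilder()
                .weakKeys()                  // the entry can be collected once its Thread is GC'ed
                .removalListener(cleanup)
                .build(new CacheLoader<Thread, Resource>() {
                    @Override
                    public Resource load(Thread thread) {
                        return Resource.create();   // expensive per-thread creation (placeholder)
                    }
                });
    }

    public Resource get() {
        return perThread.getUnchecked(Thread.currentThread());
    }

    public void shutdown() {
        perThread.invalidateAll();   // runs the removal listener for the remaining entries
    }
}

Threads call get() to obtain their own instance; shutdown() is hooked to the application's termination.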
Related
As the title states, I'm trying to troubleshoot an issue where some threads that read data from a singleton get a null value. My investigation into our logs reads as though it's a concurrency issue.
The Singleton is defined as follows:
@Singleton
public class StaticDatabaseEntries {

    private static final Map<String, Thing> databaseEntries = new HashMap<>();

    @Lock(LockType.READ)
    public Thing getThing(String index) {
        return databaseEntries.get(index);
    }
}
At first I was under the impression that only one element within the data was corrupted, as access to the same item repeatedly returned null. Further debug entries show that the issue appears isolated to a specific thread. It's as though once whatever induces the null return occurs on a thread, it continues to do so, but only on the affected thread.
An earlier version of this class did not apply LockType.READ, so per the specification LockType.WRITE is assumed. I deployed an update with the correct lock to enable concurrent reads. This did not improve the situation.
The data is loaded into the HashMap from a database upon deployment and remains unchanged for the duration. Since the class isn't tagged with @Startup, the application instead uses a context listener to trigger loading the entries from the database.
With threads primarily performing read activity, I don't believe a switch to ConcurrentHashMap is beneficial. I am considering removing the static final portion, as it seems unnecessary when the container is managing concurrent access and the singleton lifecycle. I have experienced side effects when the container cannot subclass/proxy things that are marked final in EJBs.
The other possibility I've considered is that there is some manner of bug in the container software. This is running on an older Java 1.7 and JBoss EAP 6. Worst case, I'll have to forgo the singleton pattern and instead load the entries from the database on demand.
In general: if you work with threads, even read activity can cause problems if you are calling a method on an object that isn't thread-safe. This is exactly what often leads to hard-to-detect errors in larger projects.
HashMap isn't thread-safe!
You should switch to ConcurrentHashMap.
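If it helps, a sketch of the question's singleton with that change applied (Thing is the question's own placeholder type):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;

@Singleton
public class StaticDatabaseEntries {

    // ConcurrentHashMap tolerates concurrent reads and writes without
    // corrupting its internal structure the way a plain HashMap can.
    private static final Map<String, Thing> databaseEntries = new ConcurrentHashMap<>();

    @Lock(LockType.READ)
    public Thing getThing(String index) {
        return databaseEntries.get(index);
    }
}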
For further details, have a look at this article: https://www.baeldung.com/java-concurrent-map
I am applying an object pool because we are facing issues caused by a huge number of object creations. For this I need a clear() method to restore every object to its default state, as if it were newly created. Can this be done without manually calling each setter to restore the defaults? Doing it manually risks a bug if a developer forgets to update clear() when changing the bean class.
There is definitely no ready-made solution for your case in Java. If you need to return a pool worker to its initial state, you have to do it yourself, i.e. you need to write a method which puts all the relevant object fields back to their initial values.
Meanwhile, I would say that avoiding the huge number of object creations is unnecessary until you see a significant amount of time spent on it in a profiler. I would also say that, given the complex structure of your classes (I guess they are complex, since you do not want to write this clear method yourself), clearing these objects will take time comparable to creating them. Pooling in Java is in most cases used for really expensive objects, such as threads or DB connections.
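As a minimal sketch of such a hand-written reset, assuming a hypothetical pooled bean:

import java.util.ArrayList;
import java.util.List;

public class PooledWorker {
    private String name;
    private int retries;
    private final List<String> pendingTasks = new ArrayList<>();

    // Must be kept in sync with the fields by hand; Java has no built-in
    // "restore to freshly constructed state" operation.
    public void reset() {
        name = null;
        retries = 0;
        pendingTasks.clear();
    }
}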
First of all, all unwanted objects automatically get collected by the garbage collector.
Another option you have is to request a garbage collection by using:
System.gc();
OR
Runtime.getRuntime().gc();
You can override the finalize() method in your bean class:
@Override
protected void finalize() throws Throwable {
    // Keep some resource closing operations here
    super.finalize();
}
I have a search box in the header of my web application and use autocomplete to help users find books by author's name or book's title. On user input, the oninput() function calls the FindBooks servlet through Ajax. The FindBooks servlet calls the static method suggest() of class SuggestionBook, which returns an array of books matching the input string.
FindBooks.java
JSONArray books = SuggestionBook.suggest(inputString);
out.print(books);
SuggestionBook.java
private static boolean isTernaryEmpty = true;
private static Ternary ternary;

private static void fillTernary() {
    // fills ternary search tree with data.
    isTernaryEmpty = false;
}

public static JSONArray suggest(String searchString) {
    if (isTernaryEmpty) {
        fillTernary();
    }
    return ternary.find(searchString);
}
I have used static methods in class SuggestionBook.java to avoid loading the data for each session of the application, so it is loaded only once and can then be shared by different sessions. But since there is only one copy of the static method and all sessions use that same method to fetch data: do they have to wait while some other session is using the method, or can it be accessed simultaneously by all sessions? If yes, does it affect the performance of the application? If no, how is this concurrent access of a single copy managed by the JVM? Lastly, as I understand it, the data will stay in memory as long as class SuggestionBook is not garbage collected. Is it the right approach to use data structures as class variables rather than instance variables, as they will block available memory for a longer time?
Do they have to wait while some other session is using the method, or can it be accessed simultaneously by all sessions?
No they don't have to wait and yes they can be accessed simultaneously.
Accessing the same object from multiple sessions simultaneously can be a problem, but does not have to be. If, for example, two sessions perform simultaneous access to an object without changing its state, that is fine. If they do change the state and the state transition involves unstable intermediate states, a problem could arise.
If two threads are running the same method at the same time they will both have their code pointers pointing at that method and have their own copies of arguments and local variables on their stacks. They will only interfere with each other if the things on their stacks point to the same objects on the heap.
If yes, does it affect the performance of the application? If no, how is this concurrent access of a single copy managed by the JVM?
Memory in Java is split into two kinds: the heap and the stacks. The heap is where all the objects live, and the stacks are where the threads do their work. Each thread has its own stack and can't access the other threads' stacks. Each thread also has a pointer into the code which points to the bit of code it is currently running. When a thread starts running a new method, it saves the arguments and local variables of that method on its own stack. Some of these values might be pointers to objects on the heap.
Lastly, as I understand it, the data will stay in memory as long as class SuggestionBook is not garbage collected. Is it the right approach to use data structures as class variables rather than instance variables, as they will block available memory for a longer time?
Since you're using a servlet, a single instance of the servlet is created once on the webapp's startup and shared among all requests. Static or not, every class/instance variable is going to be shared among all requests/sessions.
There will be only a single instance of the servlet, so an instance variable will act like a static variable. Rather than requiring people to know about the single instance (since many do not), making the variable static rather than an instance variable removes any confusion about its usage; the intent of the variable is clearer and less likely to be misunderstood. So yeah, it is not a bad approach from a usability standpoint.
You can make the suggest method synchronized and it will work, as only the first call will fill the data into the tree and the subsequent calls just read it.
But if you synchronize the suggest method, every call to suggest will be serialized, which is unnecessary once the first call has filled the tree.
Solution 1) Create a static initializer block and initialize the tree in it. That way the tree is initialized as soon as the class is loaded.
Solution 2) Make the fillTernary method synchronized, and inside the method initialize the tree only if it is not initialized yet, i.e. if (isTernaryEmpty). Please note that the if condition is required in both methods (in suggest and again inside fillTernary) to prevent multiple threads from initializing the tree at the same time.
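A minimal sketch of Solution 2, reusing the question's field names and Ternary type; loadData() is a hypothetical helper that builds the tree, and isTernaryEmpty is made volatile so the unsynchronized fast path in suggest also sees the fully built tree:

private static volatile boolean isTernaryEmpty = true;
private static Ternary ternary;

private static synchronized void fillTernary() {
    if (isTernaryEmpty) {            // re-check under the lock
        ternary = loadData();        // hypothetical: builds the ternary search tree
        isTernaryEmpty = false;      // volatile write publishes the tree
    }
}

public static JSONArray suggest(String searchString) {
    if (isTernaryEmpty) {            // cheap unsynchronized check
        fillTernary();               // only the first caller(s) pay for synchronization
    }
    return ternary.find(searchString);
}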
Suppose that I do some initialization in a Spring singleton bean's @PostConstruct (simplified code):
@Service
class SomeService {

    public Data someData; // not final, not volatile

    public SomeService() { }

    @PostConstruct
    public void init() {
        someData = new Data(....);
    }
}
Should I worry about someData visibility to other beans and mark it volatile?
(suppose that I cannot initialize it in constructor)
And a second scenario: what if I overwrite the value in @PostConstruct (after, for example, explicit initialization or initialization in the constructor), so the write in @PostConstruct will not be the first write to this attribute?
The Spring framework is not tied into the Java programming language; it is just a framework. Therefore, in general, you need to mark a non-final field that is accessed by different threads as volatile. At the end of the day, a Spring bean is nothing more than a Java object and all language rules apply.
final fields receive special treatment in the Java programming language. Aleksey Shipilëv, the Oracle performance guy, wrote a great article on this matter. In short, when a constructor initializes a final field, the generated assembly for setting the field value adds an additional memory barrier that ensures the field is seen correctly by any thread.
For a non-final field, no such memory barrier is created. Thus, in general, it is perfectly possible that the @PostConstruct-annotated method initializes the field and this value is not seen by another thread or, even worse, is seen while the constructor has only partially executed.
Does this mean that you always need to mark non-final fields as volatile?
In short, yes. If a field can be accessed by different threads, you do. Don't make the same mistake I did when I only thought about the matter for a few seconds (thanks to Jk1 for the correction) and reason purely in terms of your Java code's execution sequence. You might think that your Spring application context is bootstrapped in a single thread, so the bootstrapping thread will not have issues with the non-volatile field. Thus, you might think that everything is in order as long as you do not expose the application context to another thread until it is fully initialized, i.e. until the annotated method has been called. Thinking like this, you could assume that the other threads do not have a chance to cache the wrong field value as long as you do not alter the field after this bootstrap.
In contrast, the compiled code is allowed to reorder instructions, i.e. even if the @PostConstruct-annotated method is called before the related bean is exposed to another thread in your Java code, this happens-before relationship is not necessarily retained in the compiled code at runtime. Thus, another thread might always read and cache the non-volatile field while it is either not yet initialized at all or only partially initialized. This can introduce subtle bugs, and the Spring documentation unfortunately does not mention this caveat. Such details of the JMM are a reason why I personally prefer final fields and constructor injection.
Update: According to this answer in another question, there are scenarios where not marking the field as volatile would still produce valid results. I investigated this a little further, and the Spring framework does in fact guarantee a certain amount of happens-before safety out of the box. Have a look at the JLS on happens-before relationships, where it clearly states:
An unlock on a monitor happens-before every subsequent lock on that monitor.
The Spring framework makes use of this. All beans are stored in a single map, and Spring acquires a specific monitor each time a bean is registered or retrieved from this map. As a result, the same monitor is unlocked after registering the fully initialized bean and it is locked before retrieving the same bean from another thread. This forces this other thread to respect the happens-before relationship that is reflected by the execution order of your Java code. Thus, if you bootstrap your bean once, all threads that access the fully initialized bean will see this state as long as they access the bean in a canonical manner (i.e. explicit retrieval by querying the application context or auto-wiring). This makes, for example, setter injection or the use of a @PostConstruct method safe even without declaring a field volatile. As a matter of fact, you should therefore avoid volatile fields, as they introduce a run-time overhead for each read, which can get painful when accessing a field in loops, and because the keyword signals a wrong intention. (By the way, to my knowledge, the Akka framework applies a similar strategy, where Akka, unlike Spring, drops a few lines on the problem.)
This guarantee is however only given for the retrieval of the bean after its bootstrap. If you change the non-volatile field after its bootstrap, or if you leak the bean reference during its initialization, this guarantee no longer applies.
Check out this older blog entry, which describes this feature in further detail. Apparently, this feature is not documented, something even the Spring people are aware of (but have not done anything about for a long time).
Should I worry about someData write visibility to other beans and mark it volatile?
I see no reason why you should not. The Spring framework provides no additional thread-safety guarantees when calling @PostConstruct, so the usual visibility issues may still happen. A common approach would be to declare someData final, but if you want to modify the field several times it obviously won't fit.
It should not really matter if it's the first write to the field or not. According to the Java Memory Model, reordering/visibility issues apply in both cases. The only exception is made for final fields, which can be written safely the first time, but later assignments (e.g. via reflection) are not guaranteed to be visible.
volatile, however, can guarantee the necessary visibility from the other threads. It also prevents unwanted exposure of a partly constructed Data object: due to reordering issues, the someData reference may otherwise be assigned before all necessary object creation operations are completed, including constructor operations and default value assignments.
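In the question's terms that boils down to something like the following sketch (Data is the question's placeholder type; its constructor arguments are omitted):

@Service
class SomeService {

    // volatile: once another thread sees this reference, it also sees
    // the fully constructed Data instance assigned in init().
    public volatile Data someData;

    @PostConstruct
    public void init() {
        someData = new Data();   // constructor arguments omitted
    }
}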
Update: According to comprehensive research by @raphw, Spring stores singleton beans in a monitor-guarded map. This is actually true, as we can see from the source code of org.springframework.beans.factory.support.DefaultSingletonBeanRegistry:
public Object getSingleton(String beanName, ObjectFactory singletonFactory) {
    Assert.notNull(beanName, "'beanName' must not be null");
    synchronized (this.singletonObjects) {
        Object singletonObject = this.singletonObjects.get(beanName);
        ...
        return (singletonObject != NULL_OBJECT ? singletonObject : null);
    }
}
This may provide you with thread-safety properties for @PostConstruct, but I would not consider it a sufficient guarantee, for a number of reasons:
It affects only singleton-scoped beans, providing no guarantees for beans of other scopes: request, session, global session, accidentally exposed prototype scope, custom user scopes (yes, you can create one yourself).
It ensures the write to someData is protected, but it gives no guarantees to the reader thread. One can construct an equivalent, but simplified, example where the data write is monitor-guarded and the reader thread is not affected by any happens-before relationship, so it can read outdated data:
public class Entity {

    public Object data;

    public synchronized void setData(Object data) {
        this.data = data;
    }
}
Last, but not least: this internal monitor we're talking about is an implementation detail. Being undocumented, it is not guaranteed to stay forever and may be changed without further notice.
Side note: all of the above is true for beans that are subject to multithreaded access. For prototype-scoped beans it is not really the case, unless they are exposed to several threads explicitly, e.g. by injection into a singleton-scoped bean.
I have an instance of an object which performs a very complex operation.
So the first time, I create an instance and save it in my own custom cache.
From then on, whatever thread comes, if it finds that a ready-made object is already present in the cache, it takes it from the cache for better performance.
I was worried about what happens if two threads get the same instance. Is there a chance that the two threads can corrupt each other?
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances = Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());
There are many possible solutions:
Use an existing caching solution like EHcache
Use the Spring framework, which provides an easy way to cache the results of a method with a simple @Cacheable annotation
Use one of the synchronized maps like ConcurrentHashMap
If you know all keys in advance, you can use a lazy init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is the most simple to set up, but it has no simple way to say "initialize the value of a key only once".
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get(key);
if (value == null) {
    value = computeNewValue(key);
    cache.put(key, value);
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
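One way to close that gap with a synchronizedMap is to hold the map's own lock around the whole check-then-act sequence; a sketch using the types from the question:

synchronized (cache) {                            // same lock the wrapper uses internally
    SoftReference<CacheEntry<ClassA>> value = cache.get(key);
    if (value == null) {
        value = computeNewValue(key);             // hypothetical, as in the snippet above
        cache.put(key, value);
    }
}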
Assuming you are talking about singletons, simply use the "initialization-on-demand holder idiom" to make sure your "check" works across all JVMs. This will also make sure that all threads requesting the same object concurrently wait until the initialization is over and are given back only a valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
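A minimal sketch of that idiom, with a hypothetical ExpensiveResource standing in for your expensive object:

public final class ResourceHolder {

    private ResourceHolder() { }

    private static class Holder {
        // The JVM guarantees this initializer runs exactly once, on first use of
        // Holder, and that every thread sees the fully constructed instance.
        static final ExpensiveResource INSTANCE = new ExpensiveResource();
    }

    public static ExpensiveResource getInstance() {
        return Holder.INSTANCE;
    }
}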
OK, if I understand your problem correctly, you are worried that two threads changing the state of the shared object will corrupt each other.
The short answer is yes they will.
If the object is expensive to create but is needed in a read-only manner, I suggest you make it immutable; this way you get the benefit of fast access while being thread-safe at the same time.
If the state should be writable but you don't actually need threads to see each other's updates, you can simply load the object once into an immutable cache and just return copies to anyone who asks for the object.
Finally, if your object needs to be writable and shared (for reasons other than it just being expensive to create), then, my friend, you need to handle thread safety. I don't know your case, but you should take a look at the synchronized keyword, locks, the Java 5 concurrency features, and atomic types. I am sure one of them will satisfy your need, and I sincerely hope that your case is one of the first two :)
If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Otherwise I can't recommend the Google Guava library enough; in particular, look at the MapMaker class.