I have a search box in the header of my web application and use autocomplete to help users find books by author name or book title. On user input, the oninput() function calls the servlet FindBooks through AJAX. The FindBooks servlet calls the static method suggest() of class SuggestionBook, which returns an array of books matching the input string.
FindBooks.java
JSONArray books = SuggestionBook.suggest(inputString);
out.print(books);
SuggestionBook.java
private static boolean isTernaryEmpty = true;
private static Ternary ternary;
private static void fillTernary() {
// fills ternary search tree with data.
isTernaryEmpty = false;
}
public static JSONArray suggest(String searchString) {
if(isTernaryEmpty) {
fillTernary();
}
return ternary.find(searchString);
}
I have used static methods in class SuggestionBook so as to avoid loading the data for each session of the application. It will be loaded only once and can then be shared by different sessions. But since there is only one copy of the static method and all sessions use that same method to fetch data, do they have to wait while some other session is using the method, or can it be accessed simultaneously by all sessions? If yes, does it affect the performance of the application? If no, how is this concurrent access to a single copy managed by the JVM? Lastly, as per my understanding, the data will stay in memory as long as class SuggestionBook is not garbage collected. Is it the right approach to use data structures as class variables rather than instance variables, given that they will occupy memory for a longer time?
Do they have to wait while some other session is using the method or it can be accessed simultaneously by all sessions ?
No, they don't have to wait, and yes, the method can be accessed simultaneously.
Accessing the same object from multiple sessions simultaneously can be a
problem but does not have to be. If for example two sessions perform
simultaneous access to an object without changing its state that would be
fine. If they do change the state and the state transition involves unstable
intermediate states a problem could arise.
If two threads are running the same method at the same time they will both have their code pointers pointing at that method and have their own copies of arguments and local variables on their stacks. They will only interfere with each other if the things on their stacks point to the same objects on the heap.
If Yes, does it affect the performance of the application. If No, how this concurrent access of a single copy is managed by JVM ?
Memory in Java is split up into two kinds: the heap and the stacks. The heap is where all the objects live and the stacks are where the threads do their work. Each thread has its own stack and can't access other threads' stacks. Each thread also has a pointer into the code which points to the bit of code it is currently running. When a thread starts running a new method, it saves the arguments and local variables of that method on its own stack. Some of these values might be pointers to objects on the heap.
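To make the stack/heap point concrete, here is a minimal sketch (the class and method names are made up for illustration, they are not from the question). Several threads run the same static method at once; because it only touches its argument and local variables, each call works on its own stack frame and no locking is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StackDemo {
    // Uses only its argument and locals, so concurrent calls cannot interfere.
    static int sumUpTo(int n) {
        int total = 0; // one copy per call, on the calling thread's stack
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Runs sumUpTo(n) on several threads at once and collects the results.
    static List<Integer> runConcurrently(int threads, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = () -> sumUpTo(n);
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // waits for that task to finish
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 100)); // prints [5050, 5050, 5050, 5050]
    }
}
```

Trouble only starts when the method reaches a shared object on the heap (for example a static field) and modifies it.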
Lastly, as per my understanding data will stay in the memory as long as class SuggestionBook is not garbage collected. Is it a right approach to use data structures as class variables than instance variables, as they will block available memory for longer time.
Since you're using a servlet, a single instance of the servlet is created once at webapp startup and shared among all requests. Static or not, every class or instance variable is going to be shared among all requests/sessions.
There will be only a single instance of the servlet, so an instance variable will act like a static variable. Making the variable static rather than an instance variable therefore removes any confusion, since it does not require people to know about the single instance (and many people do not). The intent of the variable is clearer and less likely to be misunderstood. So yes, it is not a bad approach from a usability standpoint.
You can make the suggest method synchronized and it will work, since only the first call fills the tree and subsequent calls just read it.
But if you synchronize suggest, every thread that calls it has to acquire the lock, which is unnecessary once the first call has filled the tree.
Solution 1) Create a static block and initialize the tree in it. That way the tree is initialized as soon as the class is loaded.
Solution 2) Make the fillTernary method synchronized and, inside the method, initialize the tree only if it is not yet initialized, i.e. if(isTernaryEmpty). Please note that the if condition is required in both methods: the check in suggest keeps later calls off the lock, and the check in fillTernary prevents several threads from initializing the tree one after another. For the unlocked check in suggest to be reliable, isTernaryEmpty should also be declared volatile.
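Both solutions can be sketched roughly as follows. This is only an illustration under assumptions: a TreeSet stands in for the Ternary class (which is not shown in the question), and SampleData, EagerSuggestions and LazySuggestions are hypothetical names. In the lazy variant the reference is volatile so that a thread which sees a non-null index also sees its contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical data and lookup helper shared by both sketches.
class SampleData {
    static List<String> titles() {
        return Arrays.asList("dracula", "dune", "emma");
    }

    static List<String> withPrefix(TreeSet<String> index, String prefix) {
        // Every string s with prefix <= s < prefix + '\uffff' starts with prefix.
        return new ArrayList<>(index.subSet(prefix, prefix + Character.MAX_VALUE));
    }
}

// Solution 1: fill the index in a static block. The JVM runs it exactly once.
class EagerSuggestions {
    private static final TreeSet<String> INDEX = new TreeSet<>();

    static {
        INDEX.addAll(SampleData.titles());
    }

    static List<String> suggest(String prefix) {
        return SampleData.withPrefix(INDEX, prefix); // read-only, no lock needed
    }
}

// Solution 2: fill lazily, with the check repeated inside the synchronized method.
class LazySuggestions {
    private static volatile TreeSet<String> index;
    static int fillCount = 0; // only changed under the class lock; for observation

    private static synchronized void fill() {
        if (index == null) { // second check, under the lock
            TreeSet<String> fresh = new TreeSet<>(SampleData.titles());
            fillCount++;
            index = fresh; // publish only once fully built
        }
    }

    static List<String> suggest(String prefix) {
        if (index == null) { // first check keeps later calls off the lock
            fill();
        }
        return SampleData.withPrefix(index, prefix);
    }

    public static void main(String[] args) {
        System.out.println(suggest("d")); // prints [dracula, dune]
    }
}
```

If the data is always needed, the static block is the simpler of the two; the lazy variant only pays off when loading is expensive and may never be required.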
I am writing a multithreaded webcrawler, where there is one WebCrawler object which uses an ExecutorService to process WebPages and extract anchors from each page. I have a method defined in the WebCrawler class which can be called by WebPages to add extracted sublinks to the WebCrawler's Set of nextPagestoVisit, and the method currently looks like this:
public synchronized void addSublinks(Set<WebPage> sublinks) {
this.nextPagestoVisit.addAll(sublinks);
}
Currently I am using a synchronized method. However, I am considering other possible options.
Making the Set a synchronizedSet:
public Set<WebPage> nextPagestoVisit = Collections.synchronizedSet(new HashSet<WebPage>());
Making the Set volatile:
public volatile Set<WebPage> nextPagestoVisit = new HashSet<WebPage>();
Are both of these two alternatives sufficient on their own? (I am assuming that the synchronized method approach is sufficient). Or would I have to combine them with other safety measures? If they all work, which one would be the best approach? If one or both do not work, please provide a short explanation of why (ie. what kind of scenario would cause problems). Thanks
Edit: To be clear, my goal is to ensure that if two WebPages both try to add their sublinks at the same time, one write will not be overwritten by the other (ie. all sublinks will successfully be added to the Set).
Making the variable that holds the set volatile will do nothing for you. For a start, it only affects the reference (the "pointer") to the set, not the set itself. All it guarantees is that an assignment of a new reference to that variable is seen by all threads. It does nothing for the contents of the Set.
Making the Set a synchronizedSet does what you want, as would synchronized blocks or Semaphores. However, both of those add more boilerplate than just using synchronizedSet and are an additional vector for bugs.
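As a rough sketch of the synchronizedSet option (CrawlerFrontier and demo are invented names, and plain strings stand in for WebPage): every call on the wrapper, including addAll, locks the wrapper itself, so concurrent additions do not lose elements.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class CrawlerFrontier {
    // The wrapper synchronizes every method call on itself.
    private final Set<String> nextPagesToVisit =
            Collections.synchronizedSet(new HashSet<String>());

    void addSublinks(Set<String> sublinks) {
        nextPagesToVisit.addAll(sublinks); // one locked call, no extra synchronized needed
    }

    int size() {
        return nextPagesToVisit.size();
    }

    // Starts `workers` threads that each add `perWorker` distinct links,
    // waits for them, and returns the final size of the set.
    static int demo(int workers, int perWorker) {
        CrawlerFrontier frontier = new CrawlerFrontier();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                Set<String> links = new HashSet<>();
                for (int i = 0; i < perWorker; i++) {
                    links.add("page-" + id + "-" + i);
                }
                frontier.addSublinks(links);
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return frontier.size();
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 1000)); // prints 8000
    }
}
```

One caveat: iterating over a synchronizedSet still requires a manual synchronized block on the set, because iteration consists of many separate calls.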
I am not sure that you know what the volatile keyword actually does. It does not ensure mutual exclusion. Quoting from here :
"Using volatile, on the other hand, forces all accesses (read or write) to the volatile variable to occur to main memory, effectively keeping the volatile variable out of CPU caches. This can be useful for some actions where it is simply required that visibility of the variable be correct and order of accesses is not important."
You do have however several alternatives:
Using a synchronized block
synchronized (lock) { // lock: any object shared by all the threads involved
//synchronized code
}
Using alternatives like semaphores
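Semaphore semaphore = new Semaphore(1); // one permit: at most one thread at a time
semaphore.acquire(); // may throw InterruptedException
...
semaphore.release();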
Again, note that you say you are trying to achieve synchronized access. If all you need is to ensure that every thread always sees the freshest value of the variable, then volatile is a fairly simple solution.
Situation:
Need a cache of an expensive-to-create and non-thread-safe external resource
The resource requires explicit clean up
The termination of each thread cannot be hooked, but that of the application can
The code also runs in a Servlet container, so caches that cause a strong reference from the system class loader (e.g. ThreadLocal) cannot be directly used (see edit below).
Thus to use a ThreadLocal, it can only hold WeakReferences to the resource, and a separate collection of strong references has to be kept. The code quickly gets very complicated and creates a memory leak (as the strong reference is never removed after thread death).
ConcurrentHashMap seems to be a good fit, but it also suffers from the memory leak.
What other alternatives are there? A synchronised WeakHashMap??
(Hopefully the solution can also be automatically initialised using a given Supplier just like ThreadLocal.withInitial())
Edit:
Just to prove the class loader leak is a thing. Create a minimal WAR project with:
public class Test {
public static ThreadLocal<Test> test = ThreadLocal.withInitial(Test::new);
}
index.jsp:
<%= Test.test.get() %>
Visit the page and shutdown the Tomcat and you get:
Aug 21, 2015 5:56:11 PM org.apache.catalina.loader.WebappClassLoaderBase checkThreadLocalMapForLeaks
SEVERE: The web application [test] created a ThreadLocal with key of type [java.lang.ThreadLocal.SuppliedThreadLocal] (value [java.lang.ThreadLocal$SuppliedThreadLocal@54e69987]) and a value of type [test.Test] (value [test.Test@2a98020a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
That seems to be the typical “weak key, strong value referencing the key” problem. If you make the value weak, it can be collected even if the key is reachable, if you make it strong, the key is strongly reachable as well. This can’t be solved without a direct support by the JVM.
Thankfully there is a class which offers that (though it’s not emphasized in its documentation):
java.lang.ClassValue:
Lazily associate a computed value with (potentially) every type. For example, if a dynamic language needs to construct a message dispatch table for each class encountered at a message send call site, it can use a ClassValue to cache information needed to perform the message send quickly, for each class encountered.
While this documentation doesn’t say that the values may refer to the Class key, its intended use case of holding dispatch tables for a class implies that it is typical to have values with back-references.
Let’s demonstrate it with a small test class:
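import java.io.PrintStream;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.ref.WeakReference;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
public class ClassValueTest extends ClassValue<Method> {
@Override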
protected Method computeValue(Class<?> type) {
System.out.println("computeValue");
return Arrays.stream(type.getDeclaredMethods())
.filter(m->Modifier.isPublic(m.getModifiers()))
.findFirst().orElse(null);
}
public static void main(String... arg) throws Throwable {
// create a collectible class:
MethodHandles.Lookup l=MethodHandles.lookup();
MethodType noArg = MethodType.methodType(void.class);
MethodHandle println = l.findVirtual(
PrintStream.class, "println", MethodType.methodType(void.class, String.class));
Runnable r=(Runnable)LambdaMetafactory.metafactory(l, "run",
println.type().changeReturnType(Runnable.class), noArg, println, noArg)
.getTarget().invokeExact(System.out, "hello world");
r.run();
WeakReference<Class<?>> ref=new WeakReference<>(r.getClass());
ClassValueTest test=new ClassValueTest();
// compute and get
System.out.println(test.get(r.getClass()));
// verify that the value is cached, should not compute
System.out.println(test.get(r.getClass()));
// allow freeing
r=null;
System.gc();
if(ref.get()==null) System.out.println("collected");
// ensure that it is not our cache instance that has been collected
System.out.println(test.get(String.class));
}
}
On my machine it printed:
hello world
computeValue
public void ClassValueTest$$Lambda$1/789451787.run()
public void ClassValueTest$$Lambda$1/789451787.run()
collected
computeValue
public boolean java.lang.String.equals(java.lang.Object)
To explain, this test creates an anonymous class, just like lambda expressions produce, which can be garbage collected. Then it uses the ClassValueTest instance to cache a Method object of that Class. Since Method instances have a reference to their declaring class, we have the situation of a value referring to its key here.
Still, after the class is not used anymore, it gets collected, which implies that the associated value has been collected too. So it's immune to back-references from the value to the key.
The last test using another class just ensures that we are not a victim of eager garbage collection as described here as we are still using the cache instance itself.
This class associates a single value with a class, not a value per thread, but it should be possible to combine ClassValue with ThreadLocal to get the desired result.
I'd propose to get rid of ThreadLocal and WeakReference stuff altogether, because, as you say, resources are not bound to specific threads, they just cannot be accessed from several threads simultaneously.
Instead, have a global cache, Map <Key, Collection <Resource>>. Cache contains only resources that are free for use at the moment.
Threads would first request an available resource from the cache. If present (this, of course, should be synchronized, as the cache is global), arbitrary resource is removed from the collection for that key and given to the thread. Otherwise, a new one for that key is built and also given to the thread.
When a thread finishes using a resource, it should return it to the cache, i.e. add to the collection mapped to resource key. From there it can be used by the same thread again, or even by a different thread.
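A minimal sketch of such a pool, assuming the resource can be built from its key by a factory function and cleaned up by a callback (ResourcePool and all its member names are hypothetical, not taken from the question):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class ResourcePool<K, R> {
    private final Map<K, Deque<R>> free = new HashMap<>(); // idle resources only
    private final List<R> all = new ArrayList<>();          // everything created, for shutdown
    private final Function<K, R> factory;
    private final Consumer<R> cleanup;

    ResourcePool(Function<K, R> factory, Consumer<R> cleanup) {
        this.factory = factory;
        this.cleanup = cleanup;
    }

    // Hands out an idle resource for the key, or builds a new one.
    synchronized R acquire(K key) {
        Deque<R> idle = free.get(key);
        if (idle != null && !idle.isEmpty()) {
            return idle.pop();
        }
        R created = factory.apply(key); // note: built while holding the lock
        all.add(created);
        return created;
    }

    // Returns a resource so that any thread may use it next.
    synchronized void release(K key, R resource) {
        free.computeIfAbsent(key, k -> new ArrayDeque<>()).push(resource);
    }

    // Explicit clean-up of every resource ever created, e.g. at application exit.
    synchronized void shutdown() {
        for (R r : all) {
            cleanup.accept(r);
        }
        all.clear();
        free.clear();
    }

    synchronized int created() {
        return all.size();
    }

    public static void main(String[] args) {
        ResourcePool<String, StringBuilder> pool = new ResourcePool<String, StringBuilder>(
                key -> new StringBuilder(key), sb -> sb.setLength(0));
        StringBuilder first = pool.acquire("db");
        pool.release("db", first);
        System.out.println(pool.acquire("db") == first); // prints true: the resource is reused
        pool.shutdown();
    }
}
```

Callers should release in a finally block. If creating a resource is slow, building it outside the lock would avoid stalling other threads; the sketch keeps it inside for brevity.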
Advantages:
Cache is global, trivial to shut down all allocated resources when application quits.
Hardly any potential for memory leaks, code should be pretty concise.
Threads can share resources (provided they need the same resource at different time), potentially decreasing demand.
Disadvantages:
Requires synchronization (but likely cheap and not difficult to code).
Maybe some others, depending on what exactly you do.
I am not sure about the problem you are talking about. Please take a look at: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Some Questions:
How is the resource referenced?
What is the interface to the resource?
What data should be cached at all?
What is a "non-thread safe resource"
How often is the resource retrieved?
How long is the access to one resource, what level of concurrency is there?
Is one thread using the resource many times and this is the reason for the intended caching?
Are many threads using the same resource (instance)?
Can there be many instances of the same resource type, since the actual instance is not thread safe?
How many resources you have?
Is it many resource instances of the same type or different types?
Maybe you can try to remove the words ThreadLocal, WeakReference, ConcurrentHashMap from your question?
Some (wild) guess:
From what I can read between the lines, it seems to me that it is a straightforward use case for a Java cache. E.g. you can use the Google Guava cache and add a removal listener for the explicit cleanup.
Since the resource is not thread safe you need to implement a locking mechanism. This can be done by putting a lock object into the cached object.
If you need more concurrency, create more resources of the same type and augment the cache key with the hash of the thread modulo the level of concurrency you like to have.
While researching the weak concurrent map idea, I found that it's implemented in Guava's Cache.
I used the current thread as the weak key, and a CacheLoader is supplied to automatically create the resource for each new thread.
A removal listener is also added, so that each thread's resource will be automatically cleaned up after the Thread object is GC'ed or when I call the invalidateAll() method during shut-down.
Most of the configuration above can also be done in a one liner (with lambdas).
Can I ask you to explain to me how threads and synchronisation work in Java?
I want to write a high-performance application. Inside this application, I read data from files into some nested classes, which are basically a thin shell around HashMap.
After the data reading is finished, I start threads which need to go through the data and perform different checks on it. However, threads never change the data!
If I can guarantee (or at least try to guarantee;) that my threads never change the data, can I use them calling non-synchronised methods of objects containing data?
If multiple threads access the non-synchronised method, which does not change any class field, but has some internal variables, is it safe?
artificial example:
public class Data{
// this hash map is filled before I start threads
protected Map<Integer, Spike> allSpikes = new HashMap<Integer, Spike>();
public Map<Integer, Spike> returnBigSpikes(){
Map<Integer, Spike> bigSpikes = new HashMap<Integer, Spike>();
for (Integer i: allSpikes.keySet()){
if (allSpikes.get(i).spikeSize > 100){
bigSpikes.put(i,allSpikes.get(i));
}
}
return bigSpikes;
}
}
Is it safe to call a NON-synchronised method returnBigSpikes() from threads?
I understand now that such use-cases are potentially very dangerous, because it's hard to control, that data (e.g., returned bigSpikes) will not be modified. But I have already implemented and tested it like this and want to know if I can use results of my application now, and change the architecture later...
What happens if I make the methods synchronised? Will the application be slowed down to the performance of 1 CPU? If so, how can I design it correctly and keep the performance?
(I read about 20-40 GB of data (log messages) into main memory and then run threads which need to go through all the data to find some correlation in it; each thread gets only a part of the messages to analyse, but for the analysis the thread has to compare each message from its part with many other messages from the data; that's why I first decided to allow threads to read data without synchronisation).
Thank You very much in advance.
If allSpikes is populated before all the threads start, you could make sure it isn't changed later by saving it as an unmodifiable map.
Assuming Spike is immutable, your method would then be perfectly safe to use concurrently.
In general, if you have a bunch of threads where you can guarantee that only one thread will modify a resource and the rest will only read that resource, then access to that resource doesn't need to be synchronised. In your example, each time the method returnBigSpikes() is invoked it creates a new local copy of bigSpikes hashmap, so although you're creating a hashmap it is unique to each thread, so no sync'ing problems there.
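Here is a small sketch of that situation, with plain Integer values standing in for Spike (SpikeData and demo are illustrative names). The map is fully built before any thread is started, which is what makes the unsynchronised reads safe: Thread.start() guarantees that the new thread sees everything written before it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class SpikeData {
    // Filled once in the constructor, then only read.
    private final Map<Integer, Integer> allSpikes;

    SpikeData(Map<Integer, Integer> spikes) {
        this.allSpikes = Collections.unmodifiableMap(new HashMap<>(spikes));
    }

    // No synchronisation: reads shared data, writes only to a local map.
    Map<Integer, Integer> bigSpikes(int threshold) {
        Map<Integer, Integer> result = new HashMap<>(); // private to the calling thread
        for (Map.Entry<Integer, Integer> e : allSpikes.entrySet()) {
            if (e.getValue() > threshold) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Lets several reader threads scan the same data; returns how many
    // of them found the expected 99 entries (values 901..999).
    static int demo(int readers) {
        Map<Integer, Integer> raw = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            raw.put(i, i);
        }
        SpikeData data = new SpikeData(raw); // built before any thread starts
        AtomicInteger agreeing = new AtomicInteger();
        Thread[] threads = new Thread[readers];
        for (int t = 0; t < readers; t++) {
            threads[t] = new Thread(() -> {
                if (data.bigSpikes(900).size() == 99) {
                    agreeing.incrementAndGet();
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return agreeing.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(8)); // prints 8
    }
}
```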
As long as everything is practically immutable (e.g. using the final keyword) and you use an unmodifiableMap, everything is fine.
I would suggest the following UnmodifiableData:
public class UnmodifiableData {
final Map<Integer,Spike> bigSpikes;
public UnmodifiableData(Map<Integer,Spike> bigSpikes) {
this.bigSpikes = Collections.unmodifiableMap(new HashMap<>(bigSpikes));
}
....
}
Your plan should work fine. You do not need to synchronize reads, only writes.
If, however, in the future you wish to cache bigSpikes so that all threads get the same map then you need to be more careful about synchronisation.
If you use ConcurrentHashMap, it will do all the synchronization work for you. It's better than wrapping synchronization around an ordinary HashMap.
Since allSpikes is initialized before you start threads it's safe. Concurrency problems appear only when a thread writes to a resource and others read from it.
I have an instance of an object which performs a very complex operation.
So the first time, I create an instance and save it in my own custom cache.
From then on, whichever thread comes along and finds that a ready-made object is already present in the cache takes it from the cache, for the sake of performance.
I am worried about what happens if two threads have the same instance. Is there a chance that the two threads can corrupt each other?
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances= Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());
There are many possible solutions:
Use an existing caching solution like EHcache
Use the Spring framework, which has an easy way to cache the results of a method with a simple @Cacheable annotation
Use one of the synchronized maps like ConcurrentHashMap
If you know all keys in advance, you can use a lazy init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is the simplest to set up, and it has a simple way to say "initialize the value of a key once" (putIfAbsent(), or computeIfAbsent() since Java 8).
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get( key );
if( value == null ) {
value = computeNewValue( key );
cache.put( key, value );
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
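One way to close that gap on Java 8 or later is ConcurrentHashMap.computeIfAbsent, which performs the check and the insert as a single atomic step per key. The sketch below is a simplification under assumptions: it stores plain strong values rather than SoftReferences, and OnceCache and computeNewValue are stand-in names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class OnceCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger computations = new AtomicInteger(); // for observation only

    // Stand-in for the expensive computation.
    private String computeNewValue(String key) {
        computations.incrementAndGet();
        return key.toUpperCase();
    }

    String get(String key) {
        // Atomic per key: the function is applied at most once, and
        // concurrent callers for the same key wait for the result.
        return cache.computeIfAbsent(key, this::computeNewValue);
    }

    // Lets several threads ask for the same key; returns the number of computations.
    static int demo(int threads) {
        OnceCache cache = new OnceCache();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> cache.get("book"));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.computations.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(8)); // prints 1
    }
}
```

The function passed to computeIfAbsent should be short and must not touch other keys of the same map.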
Assuming you are talking about singletons, simply use the "initialization-on-demand holder idiom" to make sure your "check" works across all JVMs. This will also make sure that all threads requesting the same object concurrently wait until the initialization is over and are given back only a valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
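For reference, the holder idiom looks roughly like this (ExpensiveService is a placeholder name for the costly object). The nested class is not initialized until getInstance() first touches it, and the JVM's class-initialization lock makes every concurrent caller wait for that one construction.

```java
class ExpensiveService {
    static int constructions = 0; // for observation only

    private ExpensiveService() {
        constructions++; // imagine the very complex setup here
    }

    // Initialized lazily and exactly once, on first access to Holder.INSTANCE.
    private static final class Holder {
        static final ExpensiveService INSTANCE = new ExpensiveService();
    }

    static ExpensiveService getInstance() {
        return Holder.INSTANCE; // no explicit locking in user code
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```

This only guarantees safe construction and publication. If the instance has mutable state, that state still needs its own thread safety.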
OK, if I understand your problem correctly, you are worried that 2 threads changing the state of the shared object will corrupt each other.
The short answer is yes, they will.
If the object is expensive to create but is only needed in a read-only manner, I suggest you make it immutable. This way you get the benefit of fast access and thread safety at the same time.
If the state should be writable but you don't actually need threads to see each other's updates, you can simply load the object once into an immutable cache and just return copies to anyone who asks for the object.
Finally, if your object needs to be writable and shared (for reasons other than it just being expensive to create), then my friend, you need to handle thread safety. I don't know your case, but you should take a look at the synchronized keyword, Locks, the Java 5 concurrency features and the Atomic types. I am sure one of them will satisfy your need, and I sincerely wish that your case is one of the first 2 :)
If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Otherwise I can't recommend the Google Guava library enough; in particular, look at the MapMaker class.
I have a question specific to how the classloading / garbage collection works in Android. We have stumbled upon this issue a few times now, and as far as I can tell, Android behaves different here from an ordinary JVM.
The problem is this: We're currently trying to cut down on singleton classes in the app in favor of a single root factory singleton whose sole purpose is to manage other manager classes. A top level manager if you will. This makes it easy for us to replace implementations in tests without opting for a full DI solution, since all Activities and Services share the same reference to that root factory.
Here's what it looks like:
public class RootFactory {
private static volatile RootFactory instance;
@SuppressWarnings("unused")
private Context context; // I'd like to keep this for now
private volatile LanguageSupport languageSupport;
private volatile Preferences preferences;
private volatile LoginManager loginManager;
private volatile TaskManager taskManager;
private volatile PositionProvider positionManager;
private volatile SimpleDataStorage simpleDataStorage;
public static RootFactory initialize(Context context) {
instance = new RootFactory(context);
return instance;
}
private RootFactory(Context context) {
this.context = context;
}
public static RootFactory getInstance() {
return instance;
}
public LanguageSupport getLanguageSupport() {
return languageSupport;
}
public void setLanguageSupport(LanguageSupport languageSupport) {
this.languageSupport = languageSupport;
}
// ...
}
initialize is called once, in Application.onCreate, i.e. before any Activity or Service is started. Now, here is the problem: the getInstance method sometimes comes back as null -- even when invoked on the same thread! That sounds like it isn't a visibility problem; instead, the static singleton reference held at class level seems to actually have been cleared by the garbage collector. Maybe I'm jumping to conclusions here, but could this be because the Android garbage collector or class loading mechanism can actually unload classes when memory gets scarce, in which case the only reference to the singleton instance will go away? I'm not really deep into Java's memory model, but I suppose that shouldn't happen, otherwise this common way of implementing singletons wouldn't work on any JVM, right?
Any idea why this is happening exactly?
PS: one can work around this by keeping "global" references on the single application instance instead. That has proven to be reliable when one must keep an object around across the entire lifetime of an app.
UPDATE
Apparently my use of volatile here caused some confusion. My intention was to ensure that the static reference's current state is always visible to all threads accessing it. I must do that because I am both writing and reading that reference from more than one thread: In an ordinary app run just in the main application thread, but in an instrumentation test run, where objects get replaced with mocks, I write it from the instrumentation thread and read it on the UI thread. I could have as well synchronized the call to getInstance, but that's more expensive since it requires claiming an object lock. See What is an efficient way to implement a singleton pattern in Java? for a more detailed discussion around this.
Both you (@Matthias) and Mark Murphy (@CommonsWare) are correct in what you say, but the gist seems lost. (The use of volatile is correct and classes are not unloaded.)
The crux of the question is where initialize is called from.
Here is what I think is happening:
You are calling initialize from an Activity *
Android needs more memory, kills the whole Process
Android restarts the Application and the top Activity
You call getInstance which will return null, as initialize was not called
Correct me if I'm wrong.
Update:
My assumption – that initialize is called from an Activity * – seems to have been wrong in this case. However, I'll leave this answer up because that scenario is a common source of bugs.
I have never in my life seen a static data member declared volatile. I'm not even sure what that means.
Static data members will exist until the process is terminated or until you get rid of them (e.g., null out the static reference). The process may be terminated once all activities and services are proactively closed by the user (e.g., BACK button) and your code (e.g., stopService()). The process may be terminated even with live components if Android is desperately short on RAM, but this is rather unusual. The process may be terminated with a live service if Android thinks that your service has been in the background too long, though it may restart that service depending on your return value from onStartCommand().
Classes are not unloaded, period, short of the process being terminated.
To address the other of @sergui's points, activities may be destroyed, with instance state stored (albeit in RAM, not "fixed storage"), to free up RAM. Android will tend to do this before terminating active processes, though if it destroys the last activity for a process and there are no running services, that process will be a prime candidate for termination.
The only thing significantly strange about your implementation is your use of volatile.
Static references are cleared whenever the system feels like it and your application is not top-level (the user is not running it explicitly). Whenever your app is minimized and the OS wants some more memory it will either kill your app or serialize it on fixed storage for later use, but in both cases static variables are erased.
Also, whenever your app gets a Force Close error, all statics are erased as well. In my experience I saw that it's always better to use variables in the Application object than static variables.
I've seen similar strange behaviour with my own code involving disappearing static variables (I don't think this problem has anything to do with the volatile keyword). In particular this has come up when I've initialized a logging framework (ex. Crashlytics, log4j), and then after some period of activity it appears to be uninitialized. Investigation has shown this happens after the OS calls onSaveInstanceState(Bundle b).
Your static variables are held by the Classloader which is contained within your app's process. According to google:
An unusual and fundamental feature of Android is that an application
process's lifetime is not directly controlled by the application
itself. Instead, it is determined by the system through a combination
of the parts of the application that the system knows are running, how
important these things are to the user, and how much overall memory is
available in the system.
http://developer.android.com/guide/topics/processes/process-lifecycle.html
What that means for a developer is that you cannot expect static variables to remain initialized indefinitely. You need to rely on a different mechanism for persistence.
One workaround I've used to keep my logging framework initialized is for all my Activities to extend a base class where I override onCreate and check for initialization and re-initialize if necessary.
I think the official solution is to use the onSaveInstanceState(Bundle b) callback to persist anything that your Activity needs later, and then re-initialize in onCreate(Bundle b) when b != null.
Google explains it best:
http://developer.android.com/training/basics/activity-lifecycle/recreating.html