I have started learning concurrency and threads in Java. I know the basics of synchronized (i.e. what it does). Conceptually I understand that it provides mutually exclusive access to a shared resource with multiple threads in Java. But when faced with an example like the one below I am confused about whether it is a good idea to have it synchronized. I know that critical sections of the code should be synchronized and this keyword should not be overused or it effects the performance.
public static synchronized List<AClass> sortA(AClass[] aArray)
{
List<AClass> aObj = getList(aArray);
Collections.sort(aObj, new AComparator());
return aObj;
}
public static synchronized List<AClass> getList(AClass[] anArray)
{
//It converts an array to a list and returns
}
Assuming each thread passes a different array then no synchronization is needed, because the rest of the variables are local.
If instead you fire off a few threads all calling sortA and passing a reference to the same array, you'd be in trouble without synchronized, because they would interfere with eachother.
Beware, that it would seem from the example that the getList method returns a new List from an array, such that even if the threads pass the same array, you get different List objects. This is misleading. For example, using Arrays.asList creates a List backed by the given array, but the javadoc clearly states that Changes to the returned list "write through" to the array. so be careful about this.
Synchronization is usually needed when you are sharing data between multiple invocations and there is a possibility that the data would be modified resulting in inconsistency. If the data is read-only then you dont need to synchronize.
In the code snippet above, there is no data that is being shared. The methods work on the input provided and return the output. If multiple threads invoke one of your method, each invocation will have its own input and output. Hence, there is no chance of in-consistency anywhere. So, your methods in the above snippet need not be synchornized.
Synchronisation, if unnecessarily used, would sure degrade the performance due to the overheads involved and hence should be cautiously used only when required.
Your static methods don't depend on any shared state, so need not be synchronized.
There is no rule defined like when to use synchronized and when not, when you are sure that your code will not be accessed by concurrent threads then you can avoid using synchronised.
Synchronization as you have correctly figured has an impact on the throughput of your application, and can also lead to starving thread.
All get basically should be non blocking as Collections under concurrency package have implemented.
As in your example all calling thread will pass there own copy of array, getList doesn't need to be synchronized so is sortA method as all other variables are local.
Local variables live on stack and every thread has its own stack so other threads cannot interfere with it.
You need synchronization when you change the state of the Object that other threads should see in an consistent state, if your calls don't change the state of the object you don't need synchronization.
I wouldn't use synchronized on single threaded code. i.e. where there is no chance an object will be accessed by multiple threads.
This may appear obvious but ~99% of StringBuffer used in the JDK can only be used by one thread can be replaced with a StringBuilder (which is not synchronized)
Related
I have a class "A" with method "calculate()". Class A is of type singleton(Scope=Singleton).
public class A{
public void calculate(){
//perform some calculation and update DB
}
}
Now, I have a program that creates 20 thread. All threads need to access the method "calculate()".
I have multicore system. So I want the parallel processing of the threads.
In the above scenario, can i get performance? Can all threads access the method calculate at same instance of time?
Or, Since the class A is singleton so, the threads needs to be blocked waiting.
I have found similar questions in the web/Stackoverflow. But I cannot get clear answer.
Would you please help me?
Statements like "singletons need synchronization" or "singletons don't need synchronization" are overly simplistic, I'm afraid. No conclusions can be drawn only from the fact that you're dealing with the singleton pattern.
What really matters for purposes of multithreading is what is shared. If there are data that are shared by all threads performing the calculation, then you will probably need to synchronize that access. If there are critical sections of code than cannot run simultaneously between threads, then you will need to synchronize that.
The good news is that often times it will not be necessary to synchronize everything in the entire calculation. You might gain significant performance improvements from your multi-core system despite needing to synchronize part of the operation.
The bad news is that these things are very complex. Sorry. One possible reference:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1370838949&sr=8-1&keywords=java+concurrency+in+practice
That's the fundamental concept of Singleton. Only one instance of the class would be present in the system (JVM). Now, it depends on the implementation of calculate(). Is it a stateless utility method? If yes, you might not want to make it synchronized. In that case, multiple threads will be able to access it at the same instance of time. If calculate() is NOT stateless, i.e. it uses instance variables (and those instance variables will be used by multiple threads), then be careful; You have to make calculate() thread safe. You have to synchronize the method. At least you have to use a synchronize block inside the method. But, once you do so, only one thread will be able to access it (the synchronized block or the synchronized block inside the method) at any point of time.
public void calculate() {
//Some code goes here which does not need require thread safety.
synchronized(someObj) {
//Some code goes here which requires thread safety.
}
//Some code goes here which does not need require thread safety.
}
If you want to use parallel processing (if that's the primary goal), then singleton is not the design pattern that you should use.
I have found similar questions in the web/Stackoverflow. But I cannot get clear answer.
There is a good reason for that!!
It is not possible to say whether a method on a singleton does, or does not, need to be synchronized by virtue of being singleton.
Synchronization and the need for synchronization is all about state that may be shared by different threads.
If different threads share state (even serially), then synchronization is required.
If not then no synchronization is required.
The only clues that you have provided us that would help us give you a yes / no answer are this enigmatic comment:
// perform some calculation and update DB
... and the fact that the calculate() method takes no arguments.
If we infer that the calculate() method gets its input from the state of the singleton itself, then at least the part of the method (or the methods it calls) must synchronize while retrieving that state. However, that doesn't mean that the entire method call must be synchronized. The proportion of its time that the calculate method needs to hold a lock on the shared data will determine how much parallelism you can actually get ...
The updating of the database will also require some kind of synchronization. However, this should be taken care of by the JDBC connection object and the objects you get from it ... provided that you obey the rules and don't try to share a connection between multiple threads. (The database update will also present a concurrency bottleneck ... assuming that the updates apply to the same database table or tables.)
It depends on how you implement Singleton. If you use Synchronized keyword then they will wait else not.
Use Singleton with eager initialization.
Something like this:
public final class Universe {
public static Universe getInstance() {
return fINSTANCE;
}
// PRIVATE //
/**
* Single instance created upon class loading.
*/
private static final Universe fINSTANCE = new Universe();
/**
* Private constructor prevents construction outside this class.
*/
private Universe() {
//..elided
}
}
Above will perform very well in multithreaded environment. or else you can go for enum implementation of Singleton.
Check this link for various singleton implementation: http://javarevisited.blogspot.in/2012/07/why-enum-singleton-are-better-in-java.html
Multiple threads can invoke calculate() at the same time.
Those invocations won't be queued (executed serially) within that JVM unless you perform some type of concurrency control (making the method synchronized is one option).
The fact that your object is a singleton may or may not affect performance, depending on how that object's attributes (if any) are used within calculate().
Also bear in mind that since you are "updating DB", table or row level locks may also limit concurrency.
If you are worried about performance, the best bet is to test it.
I was thinking about creating a class (like String, StringBuffer etc). This can be used in single-threaded as well as in multi-threaded environments. I do not know which kind of environment a developer might be using it. Anticipating the worst-case, I can synchronize.
But,
1. Synchronization takes a performance hit.
2. Without synchronization, it is not thread-safe.
So, I have two options.
Leave the class unsynchronized - but the developer using this class needs to synchronize it whenever appropriate.
Have all synchronized methods - and take a performance hit.
I have seen that many (if not all. for eg, ArrayList over Vector) classes in Java have evolved to take the first approach. What are some of the things I need to consider before deciding on these two options for my class?
Or to put it in a different way, should I use "public synchronized void bar()" over "public void bar()" only when I know for sure that bar can be used in a multi-threaded environment and should not be run at the same time?
EDIT So, clearly I have mis-used the word "utility class" in the heading. Thanks, Jon Skeet, for pointing it out. I have removed the world "utility" from the heading.
To give an example, I was thinking about a class like, say, Counter. Counter is just as an example. There are other ways to implement Counter. But this question is about synchronization. A Counter object keeps track of how many times something has been done. But it can possibly be used in single-threaded or multi-threaded environments. So, how should I handle the problem of synchronization in Counter.
What I think of as a utility class - typically a grab-bag of vaguely-related public static methods - rarely requires any synchronization. Unless the class maintains some mutable state, you're usually absolutely fine.
Of course, if you take parameters which are themselves shared between threads and contain mutable state, you may need synchronization - but that should be for the caller to decide, usually.
If you mean something else by "utility class" it would be good to know what you mean. If it's a class with no mutable state (but perhaps immutable state, set at construction) then it's typically fine to share between threads.
If it has mutable state but isn't explicitly about threading, I would typically not put any synchronization within that class, but document that it's not thread-safe. Typically the callers would need to synchronize multiple operations using multiple objects anyway, so "method at a time" synchronization doesn't typically help.
If it's a class which is all about threading (e.g. something to manage producer/consumer queues) then I would try to make it thread-safe, but document what you mean by that. I'd encourage you not to make the methods themselves synchronize, but instead synchronize on a private final field which is only used for synchronization; that way your class will contain the only code which could possibly synchronize on that object, making it easier to reason about your locking.
You should almost certainly not make these decisions on the basis of performance. Correctness and easy of is far more important than performance in most cases.
Regarding your last question: you don't synchronize a method if it can be called from multiple threads. You synchronize a method if it uses some state that can be accessed from multiple threads.
So, even if your bar() method is called from only one thread, if it accesses an instance variable which is read and modified, through other methods, by multiple threads, then the bar() method must be synchronized (or at least, the block of code which accesses the shared variable). Synchronization is all about shared state.
EDIT:
Regarding your main problem: you could simply use the same strategy as the collection framework: make your Counter an interface, and provide a default, non-thread-safe implementation. Also provide a utility class (Counters) containing a method which returns a synchronized Counter proxy: Counters.synchronizedCounter(Counter counter);.
This way, you have the best of both worlds. Note that an important point of this design is that the synchronized counter is synchronized on itself. This makes it possible for the callers to add external synchronization in case two method calls on the counter must be done in an atomic way:
Counter synchronizedCounter = Counters.synchronizedCounter(c);
// call a and b atomically:
synchronized (synchronizedCounter) {
synchronizedCounter.a();
synchronizedCounter.b();
}
While performance of synchronization has been improved, so are other parts of VM. Therefore synchronization is still a noticeable overhead.
Particularly, synchronization actions prevent lots of optimization tricks. Even if VM can do escape analysis, VM still must withheld reordering and add memory barrier, to conform to Java Memory Model.
These days, the "performance hit" from synchronization is remarkably small. You should only be concerned about it if you have proof that synchronization is causing a performance problem.
You might need synchronization if your class has state via static fields that are references by multiple methods. In this case, it would be preferable to have instance fields and use the singleton pattern, which will convey to other programmers more clearly what the intention of the class is.
The performance penalty for single-thread access to synchronized methods is pretty much negligible on modern JVMs.
But why not create a benchmark and see for yourself?
In Java an Object itself can act as a lock for guarding its own state . This convention is used in many built in classes like Vector and other synchronized collections where every method is synchronized and thus guarded by the intrinsic lock of the object itself . Is this good or bad ? Please give reasons also .
Pros
It's simple.
You can control the lock externally.
Cons
It breaks encapuslation.
You can't change its locking behaviour without changing its implied contract.
For the most part, it doesn't matter unless you are developing an API which will be widely used. So while using synchronised(this) is not ideal, it is simple.
Well Vector, Hashtable, etc. were synchronized like this internally and we all know what happened to them...
I honestly can't find any good reason to do synchronization like this. Here are the disadvantages that I see:
There's almost always a more efficient way of ensuring thread-safety than just putting a lock on the entire method.
It slows down the code in single threaded environments because you pay the overhead of locking and unlocking without actually needing the lock.
It gives a false sense of security because although each operation is synchronized, sequences of operations are not and you can still accidentally create data races. Imagine a collection which is synchronized on each method and the following code:
if(collection.isEmpty()) {
collection.add(...);
}
Assuming the aim is to have only a single item added, the above code is not thread safe because a thread can be interrupted between the if check and the actual call to add, even though both operations are synchronized individually, so it is possible to actually get two items in the collection.
I have a method that is invoked by a scheduler every minute to get a file from ftp, process and persists its records to a DB. I need to make this thread safe so that if the method has to perform multiple files at once, it acts a in a thread safe way..
public synchronized void processData(String data){
//do processing
}
is this really going to be a thread safe method that will handle high volumes of load gracefully?
It's thread-safe as long as it doesn't use any stateful fields from the enclosing object.
In other words, if there is a class-level field that is manipulated or accessed in processData(String data) with the intention of keeping track of what's going on, then it's not thread-safe.
An example might be a class-level field called private Boolean hasConnection; If you need to check whether or not a connection exists with this field, then you don't have a thread-safe method.
If you meet this requirement, then you don't even have to add the synchronized keyword to your method. It will be, by default, thread-safe, and an unlimited number of threads may access it simultaneously.
If you do not meet this requirement, then you will need to post the whole class in order to determine whether or not it is thread-safe.
Assuming that the mysterious "process the file" operation is self-contained, the biggest thing you should worry about is your DB connection: do not make it shared, obtain a new one each time from a connection string, and use a connection pool. Do not make your method synchronized, unless you need to access shared state inside your class; otherwise, your method would not be able to make progress concurrently on multiple threads.
Please describe us what resources your method uses, and which of those resources are shared.
If you do not use common object, there is no problem.
If you do use common resources, you need to make sure these resources can be accessed in a thread-safe manner, or are not accessed by multiple threads.
Your question is about performance. In general, processData seems to be a method which will take some time to complete: you are using databases. The time required to get a lock is minimal compared to a DB Query. So no, the synchronized keyword will not give you any noticeable performance impact.
I am using JCS for caching in my application.Recently some errors are occuring where concurrent access to data in the cache results in null value i.e one thread writes to the cache and one thread reads to the cache.I want to know whether JCS inherently supports thread safe implementations when writing and reading from the cache.
I also want to know how to make my implementation thread safe.Because I have multiple classes writing to the cache,say PutData which implements Runnable is used for writing to the cache and GetData which also implements Runnable for reading from the cache,so making the method synchronized has no meaning and also making them atomic also is not meaningful since the data is not shared among the classes and I pass around the data object to the individual classes.BTW I am using a POJO serialiazed class.
Is there anyway to overcome this,or do I have to change my implementation in such a way that it forcefully completes writing and then reading,which is foolish I think.
This is more of like a producer-consumer problem except in the fact that my consumer thread here is not consuming data,but just reading data.
So synchronizing does guarantee that only one thread writes to the cache,but that does not solve my problem because the other thread accesses objects of different key.
Expecting your answers,
Thanks,
Madhu.
Recently I started using JCS and I had the problem "Is JCS thread safe?".
Well I had a look at the source code and found that the implementation has considered about thread safety.
JCS.getInstance(String region) does always return the same CompositeCache object per region key, wrapped in a new JCS object.In other words, reference of the one an only CompositeCache object is kept in a newly created wrapper JCS object. When we call methods like JCS.get(), JCS.put(), JCS.remove() etc. it will always ended up in invoking a method of the one an only CompositeCache object. So, it is singleton.
Importantly, CompositeCache object has synchronized methods for its write operations (put remove etc.) and in the inner implementation Hashtable objects have been used, which are also thread safe. So I think the JCS has taken care of thread safety at atomic levels.
What Thomas has mentioned above is true. If the cache object was synchronized, then a concurrency issue should have been avoided, which seems to be not the case as mentioned above, may be the issue is something else not really concurrency.
However, I just wanted to share the fact that, one should not plan to use the JCS by gaining an object level lock as discussed above, as the implementation seems to be thread safe, and we should let the concurrency to be handled at more atomic levels, looking for better performance.
I don't know JCS but you can synchronize on objects, so you might want to synchronize on the cache object.
Someting like this:
public void putToCache(...) {
synchronized (cache) { //cache is your actual cache instance here
//put to cache here
}
}