When Should I synchronize the methods of my class? - java

I was thinking about creating a class (like String, StringBuffer etc). It could be used in single-threaded as well as multi-threaded environments. I do not know which kind of environment a developer might use it in. Anticipating the worst case, I can synchronize.
But,
1. Synchronization takes a performance hit.
2. Without synchronization, it is not thread-safe.
So, I have two options.
Leave the class unsynchronized - but the developer using this class needs to synchronize it whenever appropriate.
Have all synchronized methods - and take a performance hit.
I have seen that many (if not all) classes in Java have evolved to take the first approach, e.g. ArrayList over Vector. What are some of the things I need to consider before deciding between these two options for my class?
Or to put it in a different way, should I use "public synchronized void bar()" over "public void bar()" only when I know for sure that bar can be used in a multi-threaded environment and should not be run at the same time?
EDIT: So, clearly I have misused the term "utility class" in the heading. Thanks, Jon Skeet, for pointing it out. I have removed the word "utility" from the heading.
To give an example, I was thinking about a class like, say, Counter. Counter is just an example; there are other ways to implement it, but this question is about synchronization. A Counter object keeps track of how many times something has been done, and it may be used in single-threaded or multi-threaded environments. So, how should I handle the problem of synchronization in Counter?

What I think of as a utility class - typically a grab-bag of vaguely-related public static methods - rarely requires any synchronization. Unless the class maintains some mutable state, you're usually absolutely fine.
Of course, if you take parameters which are themselves shared between threads and contain mutable state, you may need synchronization - but that should be for the caller to decide, usually.
If you mean something else by "utility class" it would be good to know what you mean. If it's a class with no mutable state (but perhaps immutable state, set at construction) then it's typically fine to share between threads.
If it has mutable state but isn't explicitly about threading, I would typically not put any synchronization within that class, but document that it's not thread-safe. Typically the callers would need to synchronize multiple operations using multiple objects anyway, so "method at a time" synchronization doesn't typically help.
If it's a class which is all about threading (e.g. something to manage producer/consumer queues) then I would try to make it thread-safe, but document what you mean by that. I'd encourage you not to make the methods themselves synchronize, but instead synchronize on a private final field which is only used for synchronization; that way your class will contain the only code which could possibly synchronize on that object, making it easier to reason about your locking.
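The private-lock idiom described above can be sketched as follows. This is only a minimal illustration: the class name SafeCounter and its methods are assumptions, not part of the original answer.

```java
// Hypothetical example class; the name and methods are illustrative only.
final class SafeCounter {
    // Private lock object: no code outside this class can synchronize on it,
    // so all locking decisions are visible in this one file.
    private final Object lock = new Object();
    private long count;   // guarded by lock

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }
}
```

The trade-off is that callers cannot extend the locking from outside (for example to make two calls atomic), which is exactly what makes the class easier to reason about.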
You should almost certainly not make these decisions on the basis of performance. Correctness and ease of use are far more important than performance in most cases.

Regarding your last question: you don't synchronize a method if it can be called from multiple threads. You synchronize a method if it uses some state that can be accessed from multiple threads.
So, even if your bar() method is called from only one thread, if it accesses an instance variable which is read and modified, through other methods, by multiple threads, then the bar() method must be synchronized (or at least, the block of code which accesses the shared variable). Synchronization is all about shared state.
EDIT:
Regarding your main problem: you could simply use the same strategy as the collection framework: make your Counter an interface, and provide a default, non-thread-safe implementation. Also provide a utility class (Counters) containing a method which returns a synchronized Counter proxy: Counters.synchronizedCounter(Counter counter);.
This way, you have the best of both worlds. Note that an important point of this design is that the synchronized counter is synchronized on itself. This makes it possible for the callers to add external synchronization in case two method calls on the counter must be done in an atomic way:
    Counter synchronizedCounter = Counters.synchronizedCounter(c);
    // call a and b atomically:
    synchronized (synchronizedCounter) {
        synchronizedCounter.a();
        synchronizedCounter.b();
    }
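A rough sketch of what such a design could look like, modeled on Collections.synchronizedList. The names Counter, BasicCounter and Counters, and the increment()/get() methods, are assumptions made for illustration; the answer itself only names Counters.synchronizedCounter.

```java
// Hypothetical API: Counter, BasicCounter and Counters are illustrative names.
interface Counter {
    void increment();
    long get();
}

// Default implementation: not thread-safe, no locking overhead.
final class BasicCounter implements Counter {
    private long count;

    @Override public void increment() { count++; }
    @Override public long get() { return count; }
}

final class Counters {
    private Counters() {}

    // Returns a wrapper whose methods synchronize on the wrapper itself,
    // so callers can hold the same lock around a sequence of calls.
    static Counter synchronizedCounter(Counter delegate) {
        return new Counter() {
            @Override public synchronized void increment() { delegate.increment(); }
            @Override public synchronized long get() { return delegate.get(); }
        };
    }
}
```

For the wrapper to be safe, the wrapped BasicCounter should only ever be reached through the wrapper once it has been handed over.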

While the performance of synchronization has improved, so have other parts of the VM. Synchronization is therefore still a noticeable overhead.
In particular, synchronization actions prevent many optimization tricks. Even if the VM can do escape analysis, it must still withhold reordering and add memory barriers to conform to the Java Memory Model.

These days, the "performance hit" from synchronization is remarkably small. You should only be concerned about it if you have proof that synchronization is causing a performance problem.
You might need synchronization if your class keeps state in static fields that are referenced by multiple methods. In this case, it would be preferable to have instance fields and use the singleton pattern, which will convey to other programmers more clearly what the intention of the class is.

The performance penalty for single-thread access to synchronized methods is pretty much negligible on modern JVMs.
But why not create a benchmark and see for yourself?

Related

Performance issue: use Singleton object in multi thread environment

I have a class "A" with method "calculate()". Class A is of type singleton(Scope=Singleton).
    public class A {
        public void calculate() {
            // perform some calculation and update DB
        }
    }
Now, I have a program that creates 20 threads. All threads need to access the method calculate().
I have a multicore system, so I want the threads to be processed in parallel.
In the above scenario, can I get better performance? Can all threads access the method calculate() at the same instant?
Or, since class A is a singleton, do the threads need to block and wait?
I have found similar questions in the web/Stackoverflow. But I cannot get clear answer.
Would you please help me?
Statements like "singletons need synchronization" or "singletons don't need synchronization" are overly simplistic, I'm afraid. No conclusions can be drawn only from the fact that you're dealing with the singleton pattern.
What really matters for purposes of multithreading is what is shared. If there are data that are shared by all threads performing the calculation, then you will probably need to synchronize that access. If there are critical sections of code that cannot run simultaneously in different threads, then you will need to synchronize those.
The good news is that often times it will not be necessary to synchronize everything in the entire calculation. You might gain significant performance improvements from your multi-core system despite needing to synchronize part of the operation.
The bad news is that these things are very complex. Sorry. One possible reference:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1370838949&sr=8-1&keywords=java+concurrency+in+practice
That's the fundamental concept of a singleton: only one instance of the class is present in the system (JVM). Whether threads block depends on the implementation of calculate(). Is it a stateless utility method? If so, you probably don't want to make it synchronized, and multiple threads will be able to execute it at the same instant. If calculate() is NOT stateless, i.e. it uses instance variables (and those instance variables are used by multiple threads), then be careful: you have to make calculate() thread-safe. You have to synchronize the method, or at least use a synchronized block inside it. But once you do so, only one thread at a time will be able to execute the synchronized code (the whole method, or the synchronized block inside it).
    public void calculate() {
        // Some code goes here which does not require thread safety.
        synchronized (someObj) {
            // Some code goes here which requires thread safety.
        }
        // Some code goes here which does not require thread safety.
    }
If you want to use parallel processing (if that's the primary goal), then singleton is not the design pattern that you should use.
I have found similar questions in the web/Stackoverflow. But I cannot get clear answer.
There is a good reason for that!!
It is not possible to say whether a method on a singleton does, or does not, need to be synchronized by virtue of being singleton.
Synchronization and the need for synchronization is all about state that may be shared by different threads.
If different threads share state (even serially), then synchronization is required.
If not then no synchronization is required.
The only clues that you have provided us that would help us give you a yes / no answer are this enigmatic comment:
// perform some calculation and update DB
... and the fact that the calculate() method takes no arguments.
If we infer that the calculate() method gets its input from the state of the singleton itself, then at least part of the method (or of the methods it calls) must synchronize while retrieving that state. However, that doesn't mean that the entire method call must be synchronized. The proportion of its time that the calculate method needs to hold a lock on the shared data will determine how much parallelism you can actually get ...
The updating of the database will also require some kind of synchronization. However, this should be taken care of by the JDBC connection object and the objects you get from it ... provided that you obey the rules and don't try to share a connection between multiple threads. (The database update will also present a concurrency bottleneck ... assuming that the updates apply to the same database table or tables.)
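The idea of holding the lock only while retrieving shared state can be sketched as below. Everything specific here is an assumption for illustration (the Calculator name, the samples field, the sum-of-squares calculation); the database update is left out entirely.

```java
// Hypothetical singleton: the class name, the samples field and the
// sum-of-squares calculation are assumptions made for illustration.
final class Calculator {
    private static final Calculator INSTANCE = new Calculator();

    private final Object lock = new Object();
    private int[] samples = {1, 2, 3};   // shared state, guarded by lock

    private Calculator() {}

    static Calculator getInstance() {
        return INSTANCE;
    }

    void setSamples(int[] newSamples) {
        synchronized (lock) {
            samples = newSamples.clone();
        }
    }

    long calculate() {
        int[] snapshot;
        // Short critical section: copy the shared input, then release the lock.
        synchronized (lock) {
            snapshot = samples.clone();
        }
        // The calculation itself touches only local data, so many threads
        // can run this part in parallel.
        long sum = 0;
        for (int s : snapshot) {
            sum += (long) s * s;
        }
        return sum;
    }
}
```

The shorter the synchronized section is relative to the rest of calculate(), the more of the 20 threads can make progress at once.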
It depends on how you implement the singleton. If you use the synchronized keyword, the threads will wait; otherwise they will not.
Use Singleton with eager initialization.
Something like this:
    public final class Universe {
        public static Universe getInstance() {
            return fINSTANCE;
        }
        // PRIVATE //
        /**
         * Single instance created upon class loading.
         */
        private static final Universe fINSTANCE = new Universe();
        /**
         * Private constructor prevents construction outside this class.
         */
        private Universe() {
            //..elided
        }
    }
The above will perform very well in a multithreaded environment. Alternatively, you can use an enum implementation of the singleton.
Check this link for various singleton implementation: http://javarevisited.blogspot.in/2012/07/why-enum-singleton-are-better-in-java.html
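For comparison, a minimal sketch of the enum variant. The name UniverseEnum and the answer() method are made up for this example. The JVM creates the single INSTANCE during class initialization, which is itself thread-safe, so no explicit synchronization is needed to obtain the instance.

```java
// Hypothetical enum-based variant of the Universe singleton above.
enum UniverseEnum {
    INSTANCE;

    // Stateless method: safe to call from many threads without locking.
    int answer() {
        return 42;
    }
}
```

As with the eager version, this only makes obtaining the instance safe; any mutable state inside the singleton still needs its own protection.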
Multiple threads can invoke calculate() at the same time.
Those invocations won't be queued (executed serially) within that JVM unless you perform some type of concurrency control (making the method synchronized is one option).
The fact that your object is a singleton may or may not affect performance, depending on how that object's attributes (if any) are used within calculate().
Also bear in mind that since you are "updating DB", table or row level locks may also limit concurrency.
If you are worried about performance, the best bet is to test it.

Synchronization decision built into Java using intrinsic locks (good or bad)

In Java, an object itself can act as a lock for guarding its own state. This convention is used in many built-in classes like Vector and other synchronized collections, where every method is synchronized and thus guarded by the intrinsic lock of the object itself. Is this good or bad? Please give reasons as well.
Pros
It's simple.
You can control the lock externally.
Cons
It breaks encapsulation.
You can't change its locking behaviour without changing its implied contract.
For the most part, it doesn't matter unless you are developing an API which will be widely used. So while using synchronized (this) is not ideal, it is simple.
Well Vector, Hashtable, etc. were synchronized like this internally and we all know what happened to them...
I honestly can't find any good reason to do synchronization like this. Here are the disadvantages that I see:
There's almost always a more efficient way of ensuring thread-safety than just putting a lock on the entire method.
It slows down the code in single threaded environments because you pay the overhead of locking and unlocking without actually needing the lock.
It gives a false sense of security because although each operation is synchronized, sequences of operations are not and you can still accidentally create data races. Imagine a collection which is synchronized on each method and the following code:
    if (collection.isEmpty()) {
        collection.add(...);
    }
Assuming the aim is to have only a single item added, the above code is not thread safe because a thread can be interrupted between the if check and the actual call to add, even though both operations are synchronized individually, so it is possible to actually get two items in the collection.
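One way to close that gap, sketched under the assumption that the collection comes from Collections.synchronizedList (whose wrapper uses itself as the lock): hold the list's lock across both calls. The helper name addIfEmpty is made up for this example.

```java
import java.util.List;

final class CheckThenAct {
    private CheckThenAct() {}

    // Holds the list's own lock across both calls, so the check and the add
    // form one atomic step with respect to the list's synchronized methods.
    // Assumes syncList was created with Collections.synchronizedList(...).
    static boolean addIfEmpty(List<String> syncList, String item) {
        synchronized (syncList) {
            if (syncList.isEmpty()) {
                syncList.add(item);
                return true;
            }
            return false;
        }
    }
}
```

This works only because the wrapper locks on itself; it is the "you can control the lock externally" advantage listed under Pros above.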

Synchronization, When to or not to use?

I have started learning concurrency and threads in Java. I know the basics of synchronized (i.e. what it does). Conceptually I understand that it provides mutually exclusive access to a shared resource with multiple threads in Java. But when faced with an example like the one below I am confused about whether it is a good idea to have it synchronized. I know that critical sections of the code should be synchronized and that this keyword should not be overused, or it affects performance.
    public static synchronized List<AClass> sortA(AClass[] aArray)
    {
        List<AClass> aObj = getList(aArray);
        Collections.sort(aObj, new AComparator());
        return aObj;
    }
    public static synchronized List<AClass> getList(AClass[] anArray)
    {
        //It converts an array to a list and returns
    }
Assuming each thread passes a different array then no synchronization is needed, because the rest of the variables are local.
If instead you fire off a few threads all calling sortA and passing a reference to the same array, you'd be in trouble without synchronized, because they would interfere with each other.
Beware that the example makes it look as though getList returns a new List built from the array, so that even if the threads pass the same array, you get different List objects. This can be misleading. For example, Arrays.asList creates a List backed by the given array, and its javadoc clearly states that changes to the returned list "write through" to the array, so be careful about this.
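The difference between a view and a copy can be seen in a small sketch. The class and method names are illustrative, and Integer stands in for the AClass type from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsListDemo {
    private AsListDemo() {}

    // View backed by the array: sorting the list reorders the array too.
    static List<Integer> view(Integer[] array) {
        return Arrays.asList(array);
    }

    // Independent copy: sorting it leaves the caller's array untouched.
    static List<Integer> copy(Integer[] array) {
        return new ArrayList<>(Arrays.asList(array));
    }
}
```

If getList behaves like view, two threads sorting lists derived from the same array are really mutating the same storage; if it behaves like copy, each thread sorts its own list.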
Synchronization is usually needed when you are sharing data between multiple invocations and there is a possibility that the data will be modified, resulting in inconsistency. If the data is read-only, then you don't need to synchronize.
In the code snippet above, no data is being shared. The methods work on the input provided and return the output. If multiple threads invoke one of your methods, each invocation will have its own input and output. Hence, there is no chance of inconsistency anywhere, so your methods in the above snippet need not be synchronized.
Synchronization, if used unnecessarily, will certainly degrade performance because of the overhead involved, and hence should be used cautiously, only when required.
Your static methods don't depend on any shared state, so need not be synchronized.
There is no fixed rule for when to use synchronized and when not to. When you are sure that your code will not be accessed by concurrent threads, you can avoid using synchronized.
Synchronization, as you have correctly figured, has an impact on the throughput of your application, and can also lead to thread starvation.
Getters should generally be non-blocking, as they are in the collections of the java.util.concurrent package.
In your example, assuming every calling thread passes its own copy of the array, getList doesn't need to be synchronized, and neither does sortA, as all other variables are local.
Local variables live on the stack, and every thread has its own stack, so other threads cannot interfere with them.
You need synchronization when you change the state of an object that other threads should see in a consistent state. If your calls don't change the state of the object, you don't need synchronization.
I wouldn't use synchronized on single threaded code. i.e. where there is no chance an object will be accessed by multiple threads.
This may appear obvious, but ~99% of the StringBuffer instances used in the JDK can only be accessed by one thread, and so can be replaced with a StringBuilder (which is not synchronized).

Speed of Synchronization vs Normal

I have a class which is written for a single thread with no methods being synchronized.
    class MyClass implements MyInterface {
        // interface implementation methods, not synchronized
    }
But we also needed a synchronized version of the class. So we made a wrapper class that implements the same interface but has a constructor that takes an instance of MyClass. Any call to the methods of the synchronized class are delegated to the instance of MyClass. Here is my synchronized class..
    class SynchronizedMyClass implements MyInterface {
        // the constructor
        public SynchronizedMyClass(MyInterface i /* this is actually an instance of MyClass */)
        // interface implementation methods; all synchronized; all delegated to the MyInterface instance
    }
After all this I ran numerous test runs with both classes. The tests involve reading log files and counting the URLs in each line. The problem is that the synchronized version of the class consistently takes less time for the parsing. I am using only one thread for the tests, so there is no chance of deadlocks, race conditions, etc. Each log file contains more than 5 million lines, which means calling the methods more than 5 million times. Can anyone explain why the synchronized version of the class might be taking less time than the normal one?
First you should read about making benchmarks in Java: How do I write a correct micro-benchmark in Java?
Assuming that the benchmark is good, then here are some possible reasons:
Lock elision: If the JVM can notice that the method can only be called from one thread, it may optimize away the synchronization.
Lock coarsening: The JVM may combine multiple synchronized blocks into one block, which improves performance. Maybe the JVM is able to optimize your synchronized version of the method a bit better.
Uncontended synchronized blocks in Java are fast, so it might be hard to notice the difference (although there should still be some overhead), and the performance difference could be caused by something else. Synchronized blocks become slow when there is contention (i.e. many threads try to enter them at the same time), in which case java.util.concurrent.locks and other synchronization mechanisms might be faster.
The reason could also be something else. Maybe the JVM optimizes the methods differently. To see what is really happening, have a look at what native code the JIT generates: How to see JIT-compiled code in JVM?
As already pointed out, micro-benchmarking is not that trivial with Java.
IMO there's no reason to worry about the overhead of synchronization itself, and even if there were, I would save the optimizations for when you find out you actually have a bottleneck.
The interesting part of synchronization is how your code works in a multithreaded environment. I'd definitely focus on making sure the synchronization is used correctly in the right places.
Frankly it sounds a bit odd to need both fully synchronized and unsynchronized versions of the same class.

If more than one thread can access a field should it be marked as volatile?

Reading a few threads (common concurrency problems, volatile keyword, memory model) I'm confused about concurrency issues in Java.
I have a lot of fields that are accessed by more than one thread. Should I go through them and mark them all as volatile?
When building a class I'm not aware of whether multiple threads will access it, so surely it is unsafe to let any field not be volatile, so by my understanding there's very few cases you wouldn't use it. Is this correct?
For me this is specific to version 1.5 JVMs and later, but don't feel limited to answering about my specific setup.
Well, you've read those other questions and I presume you've read the answers already, so I'll just highlight some key points:
1. Are they going to change? If not, you don't need volatile.
2. If yes, is the value of a field related to another? If yes, go to point 4.
3. How many threads will change it? If only one, then volatile is all you need.
4. If the answer to number 2 is "yes", or more than one thread is going to write to it, then volatile alone is not enough; you'll probably need to synchronize the access.
Added:
If the field references an object, then that object will have fields of its own, and all these considerations also apply to those fields.
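The case where volatile alone is enough (an independent field written by a single thread) might look like the sketch below. The Worker class and its method names are assumptions made for illustration.

```java
// Hypothetical worker with a stop flag; names are illustrative only.
final class Worker implements Runnable {
    // Written by one thread (whoever calls stop), read by the worker thread.
    // The flag is independent of any other field, so volatile is sufficient.
    private volatile boolean running = true;

    // Touched only by the thread executing run(), so it needs no protection.
    private long iterations;

    void stop() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }

    @Override
    public void run() {
        // Without volatile, this loop might never observe the write in stop().
        while (running) {
            iterations++;
        }
    }
}
```

A typical use would be to start the worker with new Thread(worker).start() and later call worker.stop() from another thread.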
If a field is accessed by multiple threads, it should be volatile or final, or accessed only with synchronized blocks. Otherwise, assigned values may not be visible to other threads.
A class has to be specifically designed for concurrent access by multiple threads. Simply marking fields volatile or final is not sufficient for thread-safety. There are consistency issues (atomicity of changes to multiple fields), concerns about inter-thread signaling (for example using wait and notify), etc.
So, it is safest to assume that an object should be visible to only a single thread unless it is documented otherwise. Making all of your objects thread-safe isn't necessary, and is costly—in terms of software speed, but more importantly, in terms of development expense.
Instead, software should be designed so that concurrent threads interact with each other as little as possible, preferably not at all. The points where they do interact need to be clearly identified so that the proper concurrency controls can be designed.
If you have to ask, use locks. volatile can be useful in some cases, but it's very, very difficult to get right. For example:
    class Foo {
        private volatile int counter = 0;
        int Increment() {
            counter++;
            return counter;
        }
    }
If two threads run Increment() at the same time, it's possible for counter to end up as 1 rather than 2. This is because counter++ is not atomic: each thread first retrieves counter, adds one, then saves it back, and the two threads can interleave those steps. Volatile just forces the save and load to occur in a specific order relative to other statements.
Note that synchronized usually obviates the need for volatile - if all accesses to a given field are protected by the same monitor, volatile will never be needed.
Using volatile to make lockless algorithms is very, very difficult; stick to synchronized unless you have hard evidence that it's too slow already, and have done detailed analysis on the algorithm you plan to implement.
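Two possible repairs of the Foo example, as a sketch. SynchronizedFoo follows the advice above; AtomicFoo uses java.util.concurrent.atomic.AtomicInteger, an alternative the answer does not mention. Both class names are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Variant 1: the same monitor guards every access, so volatile is unnecessary.
final class SynchronizedFoo {
    private int counter = 0;

    synchronized int increment() {
        counter++;
        return counter;
    }
}

// Variant 2 (not from the answer above): a single atomic read-modify-write.
final class AtomicFoo {
    private final AtomicInteger counter = new AtomicInteger();

    int increment() {
        return counter.incrementAndGet();
    }
}
```

In both variants the increment and the returned value belong to the same atomic step, which the volatile version cannot guarantee.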
The short answer is no. Threading issues require more thought and planning than this. See this for some limitations on when volatile helps for threading and when it does not. The modification of the values has to be properly synchronized, but very typically modification requires the state of more than one variable at a time. Say, for example, you have an array element and you want to change it if it meets some criterion. The read from the array and the write to the array are different instructions, and need to be synchronized together. Volatile is not enough.
Consider also the case where the variable references a mutable object (say an array or a Collection), then interacting with that object will not be thread safe just because the reference is volatile.
