I have a class which is written for a single thread with no methods being synchronized.
class MyClass implements MyInterface {
    // interface implementation methods, not synchronized
}
But we also needed a synchronized version of the class, so we made a wrapper class that implements the same interface but has a constructor that takes an instance of MyClass. Any call to a method of the synchronized class is delegated to the instance of MyClass. Here is my synchronized class:
class SynchronizedMyClass implements MyInterface {
    private final MyInterface delegate; // actually an instance of MyClass
    // the constructor
    public SynchronizedMyClass(MyInterface delegate) {
        this.delegate = delegate;
    }
    // interface implementation methods; all synchronized; all delegate to the MyInterface instance
}
After all this I ran numerous test runs with both classes. The tests involve reading log files and counting URLs in each line. The problem is that the synchronized version of the class is consistently taking less time for the parsing. I am using only one thread for the tests, so there is no chance of deadlocks, race conditions, etc. Each log file contains more than 5 million lines, which means calling the methods more than 5 million times. Can anyone explain why the synchronized version of the class might be taking less time than the normal one?
First you should read about making benchmarks in Java: How do I write a correct micro-benchmark in Java?
Assuming that the benchmark is good, then here are some possible reasons:
Lock elision: If the JVM can notice that the method can only be called from one thread, it may optimize away the synchronization.
Lock coarsening: The JVM may combine multiple synchronized blocks into one block, which improves performance. Maybe the JVM is able to optimize your synchronized version of the method a bit better (see the sketch below).
Non-contended synchronized blocks in Java are fast, so the difference may be hard to notice (although there should still be some overhead), and the performance difference could be caused by something else. Synchronized blocks become slow when there is contention (i.e. many threads try to acquire the lock at the same time), in which case java.util.concurrent.locks and other synchronization mechanisms might be faster.
The reason could also be something else. Maybe the JVM optimizes the methods differently. To see what is really happening, have a look at what native code the JIT generates: How to see JIT-compiled code in JVM?
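As a rough illustration of the first two points, here is a hedged sketch; the class and field names are invented, and whether the JVM actually applies these optimizations depends on the JIT, which is exactly what the link above lets you inspect.

public class LockOptimizations {
    private int counter;

    // Lock elision candidate: the StringBuffer never escapes this method, so the JVM
    // may prove that no other thread can ever lock it and drop the internal
    // synchronization that StringBuffer's methods perform.
    public String elisionCandidate() {
        StringBuffer sb = new StringBuffer();
        sb.append("hello, ");
        sb.append("world");
        return sb.toString();
    }

    // Lock coarsening candidate: back-to-back blocks on the same lock may be merged
    // into a single acquire/release instead of three.
    public void coarseningCandidate() {
        synchronized (this) { counter++; }
        synchronized (this) { counter++; }
        synchronized (this) { counter++; }
    }
}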
As already pointed out, micro-benchmarking is not that trivial with Java.
IMO there's no reason to worry about the overhead of synchronization itself, and even then I would save the optimization for when you find out you actually have a bottleneck.
The interesting part of synchronization is how your code works in a multithreaded environment. I'd definitely focus on making sure the synchronization is used correctly in the right places.
Frankly it sounds a bit odd to need both fully synchronized and unsynchronized versions of the same class.
To be more specific, my question is whether the methods called only from the main thread also need to be synchronized.
For example:
@MainThread // e.g. androidx.annotation.MainThread / AnyThread on Android
class MyClass {
    private Object o = null;

    @MainThread
    MyClass() {
    }

    @MainThread
    public Object getObjectFromMainThread() {
        return this.o;
    }

    @MainThread
    public void setObjectFromMainThread(Object obj) {
        this.o = obj;
    }

    @AnyThread
    public synchronized Object getObjectFromAnyThread() {
        return this.o;
    }

    @AnyThread
    public synchronized void setObjectFromAnyThread(Object obj) {
        this.o = obj;
    }
}
The methods getObjectFromMainThread and setObjectFromMainThread, which are called only from the main thread, are not synchronized. Do they need to be synchronized as well, or is that not necessary?
The answer to your immediate question is yes, you will have to synchronize the getObjectFromMainThread and setObjectFromMainThread methods in your example. The answer to why there's this need is a mighty deep rabbit hole.
The general problem with multithreading is what happens when multiple threads access shared, mutable state. In this case, the shared, mutable state is this.o. It doesn't matter whether any of the threads involved is the main thread, it's a general problem that arises when more than one thread is in play.
The problem we're dealing with comes down to "what happens when a thread is reading the state at the same time that one or more threads are writing it?", with all its variations. This problem fans out into really intricate subproblems like each processor core having its own copy of the object in its own processor cache.
The only way of handling this is to make explicit what will happen. The synchronized mechanism is one such way. Synchronization involves a lock; when you use a synchronized method, the lock is this:
public synchronized void foo() {
    // this code uses the same lock...
}

public void bar() {
    synchronized (this) {
        // ...as this code
    }
}
Of all the program code that synchronizes on the same lock, only one thread can be executing it at any given time. That means that if (and only if) all code that interacts with this.o synchronizes on the this lock, the problems described earlier are avoided.
In your example, the presence of setObjectFromAnyThread() means that you must also synchronize setObjectFromMainThread(), otherwise the state in this.o is accessed sometimes-synchronized and sometimes-unsynchronized, which is a broken program.
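For instance, a minimal sketch of a consistently synchronized version of the question's class (method bodies simplified) could look like this:

class MyClass {
    private Object o = null;

    // Called only from the main thread, but must still synchronize,
    // because this.o is also read and written from other threads.
    public synchronized Object getObjectFromMainThread() {
        return this.o;
    }

    public synchronized void setObjectFromMainThread(Object obj) {
        this.o = obj;
    }

    public synchronized Object getObjectFromAnyThread() {
        return this.o;
    }

    public synchronized void setObjectFromAnyThread(Object obj) {
        this.o = obj;
    }
}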
Synchronization comes at a cost: because you're locking bits of code so that they run one thread at a time (and other threads are made to wait), you lose some or all of the speed-up you gained from using multi-threading in the first place. In some cases, you're better off forgetting multi-threading exists and writing a simpler single-threaded program.
Within a multi-threaded program, it's useful to limit the amount of shared, mutable state to a minimum. Any state that's not accessed by more than one thread at a time doesn't need synchronization, and is going to be easier to reason about.
The @MainThread annotation, at least as it exists in Android, indicates that the method is intended to be called on the main thread only. It doesn't do anything by itself; it's just a signal to both the programmer(s) and the compile-time tools. There is no technical protection mechanism involved at run time; it all comes down to your self-discipline and some compile-time tool support. The advantage of this lack of protection is that there's no runtime overhead.
Multi-threaded programming is complicated and easy to get wrong. The only way to get it right is to truly understand it. There's a book called Java Concurrency In Practice that's a really good explanation of both the general principles and problems of concurrency and the specifics in Java.
Is there any way I can make a Junit test to make sure that a synchronized object (in my case HashMap in synchronized block) is not accessed by two threads simultaneously? e.g. forcing two threads to try to access it and having an exception thrown.
Thanks for your help!
The best framework I've seen to help with thread testing is Thread Weaver. At the very least it offers some deterministic way of scheduling threads, and a limited (yet useful) way of trying to find race conditions.
You can even code up some more intricate thread scheduling scenarios, but those tests will inevitably be white box tests. Still, those can have their use too.
Is there any way I can make a Junit test to make sure that a synchronized object (in my case HashMap in synchronized block) is not accessed by two threads simultaneously?
I'm not sure there is a testing framework to test this, but you can certainly write some code that tries to access the protected HashMap over and over with many threads, as sketched below. Unfortunately this is very hard to do reliably since, as @Bohemian mentions, there is no way to be sure how the threads run and access the map, especially in concert.
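A hedged sketch of that brute-force approach; the guarded map class and its method names are placeholders for whatever your real code does:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HammerTheMapTest {

    // Placeholder for the class under test: all access goes through synchronized blocks.
    static class GuardedCounterMap {
        private final Map<String, Integer> map = new HashMap<>();

        void increment(String key) {
            synchronized (map) {
                map.merge(key, 1, Integer::sum);
            }
        }

        int get(String key) {
            synchronized (map) {
                return map.getOrDefault(key, 0);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCounterMap guarded = new GuardedCounterMap();
        int threads = 8;
        int incrementsPerThread = 100_000;
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // try to make all threads collide at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < incrementsPerThread; i++) {
                    guarded.increment("hits");
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        // If the synchronization is missing, this count is usually (but not always!) wrong.
        System.out.println("expected " + threads * incrementsPerThread
                + ", got " + guarded.get("hits"));
    }
}

In a JUnit test you would assert on the final count instead of printing it, but as noted above and below, a passing run never proves the absence of a race.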
e.g. forcing two threads to try to access it and having an exception thrown. Thanks for your help!
Yeah this won't happen for 2 reasons. As mentioned, there is no "forcing" of threads. You just don't have that level of control. Also, threads do not throw exceptions because of synchronization problems unless you are doing something other than synchronized(hashMap) { ... }. When a thread is holding the lock on the map, other threads will block until it releases the lock. This is hard to detect and control. If you add code to do the detection and thread control then you get into a Heisenberg situation where you will be affecting the thread behavior because of your monitoring code.
Testing proper synchronization is very difficult and often impossible to do. Reviewing the code with other developers to make sure that your HashMap is properly synchronized every time it is used may be more productive.
Lastly, if you are worried about the HashMap then you should maybe consider moving to ConcurrentHashMap or Collections.synchronizedMap(new HashMap<>()). These take care of the synchronization and protection of the actual map for you, although they don't handle race conditions if you are making multiple map calls in one operation. Btw, Hashtable is considered an old class and should not be used.
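For example, a small sketch of the difference (the map contents are invented):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapChoices {
    public static void main(String[] args) {
        // Each individual call is synchronized for you...
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());

        // ...but a check-then-act sequence is still a race: another thread can
        // insert the key between the containsKey() and the put().
        if (!syncMap.containsKey("hits")) {
            syncMap.put("hits", 0);
        }

        // ConcurrentHashMap offers atomic compound operations instead.
        ConcurrentMap<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.merge("hits", 1, Integer::sum); // atomic read-modify-write
    }
}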
Hope this helps.
Essentially, you can't, because you can't control when threads are scheduled, let alone coordinate them to test a particular behaviour.
Secondly, not all build servers are multi-threaded (I got bitten by this only a couple of days ago - cheap AWS instances have only 1 CPU), so you can't even rely on having more than one thread available to test with.
Try to refactor your code so the locking part is separated from your application and test that logic in isolation... if you can.
As I understand it, you have code similar to this:
synchronized(myHashMap) {
...
}
... which means that a thread acquires the lock provided by myHashMap when it enters the synchronized block, and all other threads that try to enter the same block have to wait, i.e. no other thread can acquire the same lock.
Is there any way I can make a Junit test to make sure that a synchronized object (in my case HashMap in synchronized block) is not accessed by two threads simultaneously?
Knowing the above, why would you do that? If you still want to try then you might want to take a look at this answer.
And last, but not least: I'd recommend against Hashtable, even though it's synchronized. Use ConcurrentHashMap instead.
I have a class "A" with method "calculate()". Class A is of type singleton(Scope=Singleton).
public class A {
    public void calculate() {
        // perform some calculation and update DB
    }
}
Now, I have a program that creates 20 threads. All the threads need to access the method calculate().
I have a multicore system, so I want the threads to be processed in parallel.
In the above scenario, can I get a performance benefit? Can all the threads access the method calculate() at the same instant of time?
Or, since class A is a singleton, do the threads need to block and wait?
I have found similar questions on the web/Stack Overflow, but I cannot get a clear answer.
Would you please help me?
Statements like "singletons need synchronization" or "singletons don't need synchronization" are overly simplistic, I'm afraid. No conclusions can be drawn only from the fact that you're dealing with the singleton pattern.
What really matters for purposes of multithreading is what is shared. If there are data that are shared by all threads performing the calculation, then you will probably need to synchronize that access. If there are critical sections of code that cannot run simultaneously between threads, then you will need to synchronize those.
The good news is that often times it will not be necessary to synchronize everything in the entire calculation. You might gain significant performance improvements from your multi-core system despite needing to synchronize part of the operation.
The bad news is that these things are very complex. Sorry. One possible reference:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1370838949&sr=8-1&keywords=java+concurrency+in+practice
That's the fundamental concept of a singleton: only one instance of the class is present in the system (JVM).
Now, it depends on the implementation of calculate(). Is it a stateless utility method? If so, you probably don't want to make it synchronized, and multiple threads will be able to execute it at the same instant of time.
If calculate() is NOT stateless, i.e. it uses instance variables (and those instance variables will be used by multiple threads), then be careful: you have to make calculate() thread-safe. You have to synchronize the method, or at least use a synchronized block inside it. But once you do so, only one thread at a time will be able to execute the synchronized method (or the synchronized block inside the method).
public void calculate() {
    // Some code goes here which does not require thread safety.
    synchronized (someObj) {
        // Some code goes here which requires thread safety.
    }
    // Some code goes here which does not require thread safety.
}
If you want to use parallel processing (if that's the primary goal), then singleton is not the design pattern that you should use.
I have found similar questions on the web/Stack Overflow, but I cannot get a clear answer.
There is a good reason for that!!
It is not possible to say whether a method on a singleton does, or does not, need to be synchronized by virtue of being singleton.
Synchronization and the need for synchronization is all about state that may be shared by different threads.
If different threads share state (even serially), then synchronization is required.
If not then no synchronization is required.
The only clues that you have provided us that would help us give you a yes / no answer are this enigmatic comment:
// perform some calculation and update DB
... and the fact that the calculate() method takes no arguments.
If we infer that the calculate() method gets its input from the state of the singleton itself, then at least part of the method (or of the methods it calls) must synchronize while retrieving that state. However, that doesn't mean that the entire method call must be synchronized. The proportion of its time that the calculate() method needs to hold a lock on the shared data will determine how much parallelism you can actually get ...
The updating of the database will also require some kind of synchronization. However, this should be taken care of by the JDBC connection object and the objects you get from it ... provided that you obey the rules and don't try to share a connection between multiple threads. (The database update will also present a concurrency bottleneck ... assuming that the updates apply to the same database table or tables.)
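To make that concrete, here is a hedged sketch; the field names, SQL, and connection handling are invented for illustration. Only the read of the shared input is synchronized, while the expensive work and the database update happen outside the lock, each thread using its own JDBC connection:

public class A {
    private double sharedInput; // shared, mutable state guarded by "this"

    public void calculate() {
        double input;
        synchronized (this) { // hold the lock only while copying the shared state
            input = this.sharedInput;
        }

        double result = expensiveComputation(input); // runs in parallel across threads

        try (java.sql.Connection conn = openConnection(); // one connection per thread
             java.sql.PreparedStatement ps =
                     conn.prepareStatement("UPDATE results SET value = ?")) {
            ps.setDouble(1, result);
            ps.executeUpdate();
        } catch (java.sql.SQLException e) {
            throw new RuntimeException(e);
        }
    }

    private double expensiveComputation(double input) {
        return Math.pow(input, 3); // placeholder for the real work
    }

    private java.sql.Connection openConnection() throws java.sql.SQLException {
        // placeholder: obtain a connection from DriverManager or a pool
        return java.sql.DriverManager.getConnection("jdbc:h2:mem:test");
    }
}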
It depends on how you implement the singleton. If you use the synchronized keyword on calculate(), then the threads will wait; otherwise they will not.
Use Singleton with eager initialization.
Something like this:
public final class Universe {

    public static Universe getInstance() {
        return fINSTANCE;
    }

    // PRIVATE //

    /**
     * Single instance created upon class loading.
     */
    private static final Universe fINSTANCE = new Universe();

    /**
     * Private constructor prevents construction outside this class.
     */
    private Universe() {
        //..elided
    }
}
The above will perform very well in a multithreaded environment. Alternatively, you can go for the enum implementation of a singleton (see the sketch below).
Check this link for various singleton implementation: http://javarevisited.blogspot.in/2012/07/why-enum-singleton-are-better-in-java.html
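For completeness, a minimal sketch of the enum approach (the method body is just a placeholder):

public enum A {
    INSTANCE;

    public void calculate() {
        // perform some calculation and update DB
    }
}

// usage: A.INSTANCE.calculate();

Note that the enum only guarantees a single instance; it does not make calculate() thread-safe by itself.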
Multiple threads can invoke calculate() at the same time.
Those invocations won't be queued (executed serially) within that JVM unless you perform some type of concurrency control (making the method synchronized is one option).
The fact that your object is a singleton may or may not affect performance, depending on how that object's attributes (if any) are used within calculate().
Also bear in mind that since you are "updating DB", table or row level locks may also limit concurrency.
If you are worried about performance, the best bet is to test it.
In Java, C, and C++, is source code guaranteed to be executed sequentially, line by line, even after compiler optimizations, within a single given thread? It seems like nothing would ever work if the system were allowed to re-order your code, but I can't seem to find any documentation guaranteeing that if I have the following in Java:
class MyClass{
String testString = "";
public MyClass(){
}
public void foo(){
testString = "foo";
}
public void bar(){
testString = "bar";
testString += "r";
}
public String getTestString(){
return testString;
}
}
class Main{
static void main(String[] args){
MyClass testClass = new MyClass();
testClass.foo();
System.out.println(class.getTestString());
testClass.bar();
System.out.println(class.getTestString());
}
}
that the output will always be
"foo"
"barr"
and never
"foo"
"rbar"
or any other possible variation that might arise if the method invocations and statements within them are not executed sequentially as specified in the source code.
This question arose regarding Java specifically, since it gives the programmer significantly less control over what the bytecode compiler and the JIT compiler or interpreter on the target system will do to the code. The main system in question for me is Android, for which I implemented my own semaphore and mutex locking mechanisms (i.e. not making much use of the built-in Java concurrency mechanisms like the 'synchronized' and 'volatile' keywords) that are more apt for my app than those provided by Java. A friend warned me, however, that because of the multiple levels of transformation Java goes through from source to machine code, unless I used Java's built-in concurrency mechanisms there was no guarantee that my semaphore and locking implementations would execute as I intended. That really boils down to whether or not there is a specified guarantee that, for any given runtime implementation, execution of code will be sequential within a single thread. So the main questions are:
In C and C++ is code execution guaranteed to be sequential despite compiler optimizations? If not, is disabling compiler optimization enough to achieve such a guarantee?
In Java is code execution guaranteed to be sequential despite potential alterations by the bytecode compiler and JIT compiler or interpreter (specifically running on Android but also for arbitrary VM implementations)?
If the answers to the above are yes as I expect, are there any programming languages/platforms/contexts for which sequential execution within a single thread is not guaranteed?
If there is only one thread, your code will have the intuitively expected results. Any optimization must preserve the functionality before the optimization. Unexpected things can happen only if you have multiple threads.
The Java Language Specification deals basically with the counterintuitive behavior that arises when there are multiple threads, and your question is about the "intra-thread semantics" as defined here:
The memory model determines what values can be read at every point in the program. The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model. When we refer to this, we say that the program obeys intra-thread semantics. Intra-thread semantics are the semantics for single-threaded programs, and allow the complete prediction of the behavior of a thread based on the values seen by read actions within the thread. To determine if the actions of thread t in an execution are legal, we simply evaluate the implementation of thread t as it would be performed in a single-threaded context, as defined in the rest of this specification.
http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
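To illustrate why that distinction matters, here is a hedged sketch with invented field names: within the writer thread the two assignments always appear to happen in order, but without volatile or synchronization a second thread is allowed to observe them in a different order.

public class ReorderingVisibility {
    private int data = 0;
    private boolean ready = false; // making this volatile restores ordering for readers

    // Runs on thread 1: within this thread, data is always 42 before ready is true.
    public void writer() {
        data = 42;
        ready = true;
    }

    // Runs on thread 2: without synchronization or volatile, the JIT/CPU may let this
    // thread see ready == true while data still reads 0.
    public void reader() {
        if (ready) {
            System.out.println(data); // may legally print 0
        }
    }
}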
Most importantly, the JIT / compiler reorders instructions to optimize code execution only in ways that do not change the results observable within a single given thread.
In Java, if you place your code within a synchronized method / block, the memory model additionally guarantees that those reorderings cannot be observed by other threads that synchronize on the same lock.
You can find more answers in this article http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#otherlanguages
Even in non-threaded code, most compilers are allowed to re-order the code to get the best speed, but the result of the re-ordering is not allowed to affect the observable result. In C/C++ this is called the as-if rule.
So within a function the compiler is allowed to re-order for optimization as long as the result is not affected. So there is a slight difference between "source code" and "generated code".
In threaded code, will a single thread execute a function? That depends on how you define a thread. What is a thread, exactly? A set of stack frames/registers and a current point of execution.
I see no problem with moving a thread between cores (though I can see why a runtime would not want to do this, I don't see it as impossible). So you can't assume the thread will not jump cores. But you can consider a "thread of execution" to be the current state, without reference to any particular hardware.
What you can guarantee is that a "thread of execution" will execute a piece of "generated code" from start to end. It will not leave out any "generated code", and it will execute the "generated code" in order. If it is unscheduled and re-scheduled, it will continue exactly from where it left off.
Your example would be executed sequentially, but it's more complicated than this if operands and other expressions come into play. The general rules are described here:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.7
Note the last section on the evaluation order for specific expressions:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.7.5
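As a small hedged sketch of what those rules guarantee (the method names are invented), operands of an expression are evaluated left to right:

public class EvalOrder {
    static int left()  { System.out.println("left");  return 1; }
    static int right() { System.out.println("right"); return 2; }

    public static void main(String[] args) {
        // JLS 15.7: the left-hand operand is fully evaluated before the right-hand one,
        // so this always prints "left", then "right", then 3.
        System.out.println(left() + right());
    }
}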
Yes, the code you provided above will be executed sequentially. Your code would be executed sequentially on one thread even if there were 50 processors.
Btw, pretty sure you can't call a variable class. Even if you can, don't.
I think the simple answer is that compiler developers follow the mantra "thou shalt not modify the behavior of a single-threaded program"; beyond that, I think you are on your own.
"preshing on programming" has some of the best explanations of the issues with out of order execution I have ever seen. If someone knows of a better set of articles I would to know. Here are two of them but if you really want to dig into it you should check out more material on the site:
Memory Ordering at Compile Time
Weak vs. Strong Memory Models
I was thinking about creating a class (like String, StringBuffer, etc.) that can be used in single-threaded as well as multi-threaded environments. I do not know which kind of environment a developer might use it in. Anticipating the worst case, I can synchronize.
But,
1. Synchronization takes a performance hit.
2. Without synchronization, it is not thread-safe.
So, I have two options.
Leave the class unsynchronized - but the developer using this class needs to synchronize it whenever appropriate.
Have all synchronized methods - and take a performance hit.
I have seen that many (if not all; e.g., ArrayList over Vector) classes in Java have evolved to take the first approach. What are some of the things I need to consider before deciding between these two options for my class?
Or to put it a different way, should I use "public synchronized void bar()" over "public void bar()" only when I know for sure that bar can be used in a multi-threaded environment and must not be run by more than one thread at a time?
EDIT So, clearly I have mis-used the term "utility class" in the heading. Thanks, Jon Skeet, for pointing it out. I have removed the word "utility" from the heading.
To give an example, I was thinking about a class like, say, Counter. Counter is just an example; there are other ways to implement Counter, but this question is about synchronization. A Counter object keeps track of how many times something has been done. But it can be used in single-threaded or multi-threaded environments. So, how should I handle the problem of synchronization in Counter?
What I think of as a utility class - typically a grab-bag of vaguely-related public static methods - rarely requires any synchronization. Unless the class maintains some mutable state, you're usually absolutely fine.
Of course, if you take parameters which are themselves shared between threads and contain mutable state, you may need synchronization - but that should be for the caller to decide, usually.
If you mean something else by "utility class" it would be good to know what you mean. If it's a class with no mutable state (but perhaps immutable state, set at construction) then it's typically fine to share between threads.
If it has mutable state but isn't explicitly about threading, I would typically not put any synchronization within that class, but document that it's not thread-safe. Typically the callers would need to synchronize multiple operations using multiple objects anyway, so "method at a time" synchronization doesn't typically help.
If it's a class which is all about threading (e.g. something to manage producer/consumer queues) then I would try to make it thread-safe, but document what you mean by that. I'd encourage you not to make the methods themselves synchronize, but instead synchronize on a private final field which is only used for synchronization; that way your class will contain the only code which could possibly synchronize on that object, making it easier to reason about your locking.
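A minimal sketch of that private-lock-field idiom (the queue class is invented for illustration):

import java.util.ArrayDeque;
import java.util.Deque;

public final class ProducerConsumerQueue<T> {
    private final Object lock = new Object(); // only this class can synchronize on it
    private final Deque<T> items = new ArrayDeque<>();

    public void put(T item) {
        synchronized (lock) {
            items.addLast(item);
            lock.notifyAll();
        }
    }

    public T take() throws InterruptedException {
        synchronized (lock) {
            while (items.isEmpty()) {
                lock.wait();
            }
            return items.removeFirst();
        }
    }
}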
You should almost certainly not make these decisions on the basis of performance. Correctness and ease of use are far more important than performance in most cases.
Regarding your last question: you don't synchronize a method if it can be called from multiple threads. You synchronize a method if it uses some state that can be accessed from multiple threads.
So, even if your bar() method is called from only one thread, if it accesses an instance variable which is read and modified, through other methods, by multiple threads, then the bar() method must be synchronized (or at least, the block of code which accesses the shared variable). Synchronization is all about shared state.
EDIT:
Regarding your main problem: you could simply use the same strategy as the collection framework: make your Counter an interface, and provide a default, non-thread-safe implementation. Also provide a utility class (Counters) containing a method which returns a synchronized Counter proxy: Counters.synchronizedCounter(Counter counter);.
This way, you have the best of both worlds. Note that an important point of this design is that the synchronized counter is synchronized on itself. This makes it possible for the callers to add external synchronization in case two method calls on the counter must be done in an atomic way:
Counter synchronizedCounter = Counters.synchronizedCounter(c);
// call a and b atomically:
synchronized (synchronizedCounter) {
synchronizedCounter.a();
synchronizedCounter.b();
}
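A hedged sketch of what Counters.synchronizedCounter could look like, using the same placeholder methods a() and b() as above; the synchronized instance methods of the returned wrapper lock on the wrapper itself, which is what makes the external synchronized block work:

interface Counter {
    void a();
    void b();
}

final class Counters {
    private Counters() { }

    // Returns a proxy whose methods all lock on the proxy instance itself.
    static Counter synchronizedCounter(final Counter delegate) {
        return new Counter() {
            @Override public synchronized void a() { delegate.a(); }
            @Override public synchronized void b() { delegate.b(); }
        };
    }
}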
While the performance of synchronization has improved, so have other parts of the VM. Therefore synchronization still carries a noticeable overhead.
In particular, synchronization prevents a lot of optimization tricks. Even if the VM can do escape analysis, it must still refrain from reordering and must add memory barriers, to conform to the Java Memory Model.
These days, the "performance hit" from synchronization is remarkably small. You should only be concerned about it if you have proof that synchronization is causing a performance problem.
You might need synchronization if your class has state in static fields that are referenced by multiple methods. In that case, it would be preferable to have instance fields and use the singleton pattern, which will convey the intention of the class to other programmers more clearly.
The performance penalty for single-thread access to synchronized methods is pretty much negligible on modern JVMs.
But why not create a benchmark and see for yourself?
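For instance, a hedged sketch using JMH (requires the org.openjdk.jmh dependency; the two counter classes are placeholders for whatever you actually want to compare):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class CounterBenchmark {

    // Identical logic, with and without synchronization.
    static class PlainCounter {
        private long count;
        void increment() { count++; }
    }

    static class SynchronizedCounter {
        private long count;
        synchronized void increment() { count++; }
    }

    private final PlainCounter plain = new PlainCounter();
    private final SynchronizedCounter sync = new SynchronizedCounter();

    @Benchmark
    public void plainIncrement() {
        plain.increment();
    }

    @Benchmark
    public void synchronizedIncrement() {
        sync.increment();
    }
}

Run it with the JMH runner (e.g. via the Maven archetype); with a single uncontended thread the two results are often surprisingly close, which is exactly the point made above.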