Detecting a race condition using FindBugs or another analysis tool - Java

The bean below is not thread-safe: the method addIfNotExist is not synchronized, so the same term can be added twice because of a race condition. I annotated the class with the JCIP annotation @ThreadSafe, hoping FindBugs would detect that the implementation is not thread-safe and flag it as an error, but it does not. Are there any tools that identify this type of error in a code base?
Methods addIfNotExist and isExist should be synchronized to make this bean thread-safe. Should the isExist method also be synchronized?
package com.test;

import java.util.ArrayList;
import java.util.Collection;

import net.jcip.annotations.GuardedBy;
import net.jcip.annotations.ThreadSafe;

@ThreadSafe
public class Dictionary {

    @GuardedBy("this")
    public Collection<String> terms = new ArrayList<String>();

    public void addIfNotExist(final String input) {
        if (!this.terms.contains(input)) {
            this.terms.add(input);
        }
    }

    public boolean isExist(final String input) {
        return this.terms.contains(input);
    }

    public void remove(final String input) {
        this.terms.remove(input);
    }
}

It is tremendously difficult to write safe multi-threaded code that has any degree of complexity: this type of locking (using monitors) is fraught with intermittent race conditions, deadlocks and livelock issues that often evade detection right up until promotion into production systems. If you can, consider using message passing, software transactional memory or persistent data structures instead.
FindBugs (or indeed any static analysis tool) can only go so far in detecting non-thread-safe code: by their very definition, race conditions are timing-sensitive - they require many execution runs to manifest - and static analysis fails in this respect because it doesn't run any code and only looks for common code signatures. The best things, IMHO, for detecting these issues are:
A second pair of eyes - rigorous code reviews with peers who are familiar with the code go a long way in finding bugs that are not immediately obvious to the original author.
Continuous integration and exhaustive automated tests that exercise the multi-threaded behaviour on a variety of hardware, combined with ruthlessly investigating any 'intermittent' test failures.
In answer to the second question, yes, all methods that reference terms should be guarded by the same synchronization monitor, regardless of whether it is a write or a read operation. Consider what happens if thread A calls remove("BOB") whilst thread B is calling isExist("BOB") when you don't have synchronization: thread A will be compacting the array list while thread B is attempting to traverse it.
At best, you will not be able to determine the result of isExist("BOB"), but it is entirely possible that B could intermittently throw an IndexOutOfBoundsException, since the size of the list could have changed (i.e. shrunk) while it was being traversed.
Synchronize, and while you still can't be sure of the order in which the calls are made (due to the non-deterministic nature of scheduling), at least you are guaranteed that operations on terms are atomic - that is, terms is not being altered by another thread while the current thread is operating on it.
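To make that concrete, here is a minimal sketch of the fully guarded class, reusing the imports from the question. The class name SynchronizedDictionary and making the field private are my own choices for illustration, not part of the original code:
import java.util.ArrayList;
import java.util.Collection;

import net.jcip.annotations.GuardedBy;
import net.jcip.annotations.ThreadSafe;

// Every access to terms is guarded by the same monitor (this), so the
// check-then-act in addIfNotExist can no longer interleave with other threads.
@ThreadSafe
public class SynchronizedDictionary {

    @GuardedBy("this")
    private final Collection<String> terms = new ArrayList<String>();

    public synchronized void addIfNotExist(final String input) {
        if (!this.terms.contains(input)) {
            this.terms.add(input);
        }
    }

    public synchronized boolean isExist(final String input) {
        return this.terms.contains(input);
    }

    public synchronized void remove(final String input) {
        this.terms.remove(input);
    }
}
With all three methods synchronized on the same monitor, the check-then-act sequence in addIfNotExist is atomic with respect to isExist and remove.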

This is something that you can use at run-time (during automated unit tests or integration tests or what have you) to help find threading problems: IBM ConTest (Concurrency Testing)
ConTest description:
"The ConTest technology is innovative and counter-intuitive. Specifically, ConTest systematically and transparently schedules the execution of program threads such that program scenarios which are likely to contain race conditions, deadlocks and other intermittemt bugs - collectively called synchronization problems - are forced to appear with high frequency. In doing so, ConTest dramtically improves the quality of testing and reduces development expense, as bugs are found earlier in the testing process. "

To find such incorrectly synchronized code blocks, I use the following algorithm:
Record the thread for every field modification, using instrumentation. If a field is modified by more than one thread without synchronization, I have found a data race.
I implemented this algorithm in http://vmlens.com, a dynamic tool that finds data races in Java programs.
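A much simplified, conceptual sketch of the bookkeeping such an approach implies; this ignores the bytecode instrumentation and the lock tracking a real tool like vmlens needs, and the class and method names are invented for illustration:
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: records which threads wrote a given field.
// A real detector would be driven by instrumentation and would also check
// which locks were held during each write.
public class WriteTracker {

    // field identifier (e.g. "com.test.Dictionary.terms") -> ids of writing threads
    private static final Map<String, Set<Long>> writers = new ConcurrentHashMap<>();

    // Called (by instrumented code) before every field write.
    public static void recordWrite(String fieldId) {
        writers.computeIfAbsent(fieldId, k -> ConcurrentHashMap.newKeySet())
               .add(Thread.currentThread().getId());
    }

    // If more than one thread wrote the field (and no common lock was held,
    // which this sketch does not verify), report a potential data race.
    public static boolean isSuspicious(String fieldId) {
        Set<Long> threads = writers.get(fieldId);
        return threads != null && threads.size() > 1;
    }
}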

Related

Why is this multithreading code broken?

Why is the following multithreading related example code broken?
public void method1() {
    synchronized (intVariable) {
    }
    synchronized (stringVariable) {
    }
}

public void method2() {
    synchronized (stringVariable) {
    }
    synchronized (intVariable) {
    }
}
The above two methods are from the same class, where stringVariable and intVariable are instance variables.
I thought it would not cause any problems, at least not thread deadlocks. Is there any other reason why this code is broken?
Either you didn't understand the problem, or you are right that this wouldn't cause a deadlock.
Perhaps he was looking for something more obscure, like:
you can't lock an int field.
locking a String object is a very bad idea because you don't know how it is shared.
But I doubt it. In any case, he should have clarified the question and your answer, because then he might have learnt something, if only how to make the question clearer next time.
If you, as an interviewer, have a set of screening questions, you should make sure they are covered before you even bring in a candidate. A questionnaire to give to HR or an agent can be useful, and a phone interview is often a good first step. As a candidate, I sometimes ask for a phone interview just to see if it is worth my time going to a face-to-face (e.g. if I have serious doubts it's worth it).
Not only are you trying to convince them you are a good fit for them, but they are trying to convince you they are a good fit for you. It appears they failed both technically, in explaining the problem to you, and in how they handled it HR-wise, so I would count yourself lucky you didn't waste any more time with them.
BTW: Most big companies are diverse, and working for one team can be very different from working for another. It would be unfair to characterise a company based on one experience.
The problem is, assuming that both variables have a reference type (otherwise you couldn’t synchronize on them), that synchronizing on a variable whose contents could change is broken.
The first read of the variable is done without synchronization, and whatever reference the thread happens to see (which could be a completely outdated value) is used to synchronize on; this does not prevent other threads from synchronizing on a different value of that variable, as it will be a completely different object.
Since String and Integer are immutable, each change of the variable's value implies changing the reference contained in the variable, allowing another thread to enter a synchronized block guarded by the new object while the thread performing the change is still inside the block guarded by the old one.
And due to legal reordering of operations it might even appear as if the second thread performs actions inside the synchronized block before the first thread performs the write. Just recall that the read of the reference to use for synchronization is not synchronized. So it’s like having no synchronization at all.
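As a small illustration of this failure mode (the class and field here are invented, not from the question): because the guarded field is reassigned inside the block, two threads can end up holding two different monitors and execute the "protected" section concurrently.
// Broken: synchronizing on a field that the block itself reassigns.
public class BrokenCounter {

    private Integer count = 0;   // Integer is immutable, so every update replaces the reference

    public void increment() {
        synchronized (count) {   // the reference is read without synchronization
            count = count + 1;   // reassigns the field, so the next caller may lock
        }                        // a different Integer object and enter concurrently
    }
}
The usual fix is to synchronize on something that never changes, for example a dedicated private final Object lock = new Object(); (or simply on this), instead of on the mutable field.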

Under what conditions will writes to non-volatile variables be unseen by other threads? Can I force such conditions for experimental purposes?

I've recently been reading a lot here on SO and elsewhere about threaded memory management, in particular the use of the volatile keyword. I'm beginning to feel reasonably confident with the concept; however, in order to fully appreciate the effect it has, I would like to try to run some experiments which illustrate it.
Here is my setup: I have a producer thread (it reads audio data from the microphone, related to my previous question, but the actual data doesn't matter) which passes data on as byte[] to a separate consumer thread. The way in which the data is shared between threads is the primary variable in my experiment: I have tried an ArrayBlockingQueue; I have tried a shared volatile byte[] reference (with an array = array self-reference as recommended in this blog post); and I have also tried a normal non-volatile byte[] with no self-reference. Both threads also write the data to disk as they go along.
My hope was to find that, after running for some length of time, the non-volatile byte[] version would show discrepancies between the data that the producer attempted to share and the data that the consumer read, due to some memory writes not being visible in time, while the other two versions would have exactly the same data logged by each thread because of the precautions taken to ensure publication of memory writes. As it happens, however, I find 100% accuracy whichever method I use.
I can already think of a few possibilities as to why this occurred, but my main question is: under what conditions are writes to a non-volatile variable unseen by another thread, which as far as I understand is the whole point of volatile? And can I force these conditions for experimental purposes?
My thoughts so far are:
Maybe the two threads are running on the same core and share the same cache, so memory writes are visible immediately?
Maybe CPU load is a factor? Perhaps I need many threads all doing different things before I see any problem?
Maybe I need to wait longer: perhaps such problems are very rare?
Could anyone either suggest how I could design such an experiment or explain why my idea is flawed?
Many thanks.
You won't be able to easily observe the effects of missing memory barriers in your code on x86, because it has a fairly strong memory model. But that does not mean the same code would not break on a different architecture. On x86 you generally need to play with the JIT compiler and help it make an optimisation that would not be allowed with a volatile variable, for example variable hoisting.
The code below, on my machine with HotSpot 7u25 server, never ends if the variable is non-volatile but stops promptly if it is volatile. You might need to change the sleep delay depending on your machine.
public class Test {

    static /* volatile */ boolean done = false;

    public static void main(String[] args) throws Exception {
        Runnable waiter = new Runnable() {
            @Override
            public void run() {
                while (!done);
                System.out.println("Exited loop");
            }
        };
        new Thread(waiter).start();
        Thread.sleep(100); // wait for JIT compilation
        done = true;
        System.out.println("done is true");
    }
}

Concurrently accessing different members of the same object in Java

I am familiar with many of the mechanisms and idioms surrounding concurrency in Java. Where I am confused is with a simple concept: concurrent access of different members of the same object.
I have a set of variables which can be accessed by two threads, in this case concerning graphical information within a game engine. I need to be able to modify the position of an object in one thread and read it off in another. The standard approach to this problem is to write the following code:
private int xpos;
private final Object xposAccess = new Object();

public int getXpos() {
    int result;
    synchronized (xposAccess) {
        result = xpos;
    }
    return result;
}

public void setXpos(int xpos) {
    synchronized (xposAccess) {
        this.xpos = xpos;
    }
}
However, I'm writing a real-time game engine, not a 20 questions application. I need things to work fast, especially when I access and modify them as often as I do the position of a graphical asset. I want to remove the synchronized overhead. Even better, I'd like to remove the function call overhead altogether.
private int xpos;
private int bufxpos;
...

public void finalize()
{
    bufxpos = xpos;
    ...
}
Using locks, I can make the threads wait on each other, and then call finalize() while the object is neither being accessed nor modified. After this quick buffering step, both threads are free to act on the object, with one modifying/accessing xpos and one accessing bufxpos.
I have already had success using a similar method where the information was copied into a second object, and each thread acted on a separate object. However, both members are still part of the same object in the above code, and some funny things begin to happen when both my threads access the object concurrently, even when acting on different members: unpredictable behaviour, phantom graphical objects, random errors in screen position, etc. To verify that this was indeed a concurrency issue, I ran the code for both threads in a single thread, where it executed flawlessly.
I want performance above all else, and I am considering buffering the critical data into separate objects. Are my errors caused by concurrent access to the same objects? Is there a better solution for concurrency?
EDIT: If you are doubting my valuation of performance, I should give you more context. My engine is written for Android, and I use it to draw hundreds or thousands of graphic assets. I have a single-threaded solution working, but I have seen a near doubling in performance since implementing the multi-threaded solution, despite the phantom concurrency issues and occasional uncaught exceptions.
EDIT: Thanks for the fantastic discussion about multi-threading performance. In the end, I was able to solve the problem by buffering the data while the worker threads were dormant, and then allowing them each their own set of data within the object to operate on.
If you are dealing with just individual primitives, the atomic classes such as AtomicInteger, which has operations like compareAndSet, are great. They are non-blocking, you get a good deal of atomicity, and you can fall back to blocking locks when needed.
For atomically setting and accessing variables or objects, you can leverage non-blocking locks, falling back to traditional locks.
However, the simplest step forward from where you are in your code is to use synchronized, but not on the implicit this object; instead use several different member objects, one per partition of members that need atomic access: synchronized (partition1) { /* ... */ }, synchronized (partition2) { /* ... */ }, etc., where you have members private final Object partition1 = new Object();, private final Object partition2 = new Object(); and so on.
However, if the members cannot be partitioned, then each operation must acquire more than one lock. If so, use the Lock object linked earlier, but make sure that every operation acquires the locks it needs in the same global order, otherwise your code might deadlock.
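As a minimal sketch of that partitioning idea, assuming the position fields form one partition and some rendering state forms another (the class and field names are illustrative, not taken from the question):
// Per-partition locks: readers/writers of xpos/ypos contend only with each other,
// not with threads touching the render state.
public class GraphicAsset {

    private final Object positionLock = new Object();
    private final Object renderLock   = new Object();

    private int xpos, ypos;   // guarded by positionLock
    private int color;        // guarded by renderLock

    public void moveTo(int x, int y) {
        synchronized (positionLock) {
            this.xpos = x;
            this.ypos = y;
        }
    }

    public int[] readPosition() {
        synchronized (positionLock) {
            return new int[] { xpos, ypos };
        }
    }

    public void setColor(int color) {
        synchronized (renderLock) {
            this.color = color;
        }
    }

    // If an operation needs both partitions, always take the locks in the same
    // order (here: positionLock before renderLock) to avoid deadlock.
    public void moveAndRecolor(int x, int y, int color) {
        synchronized (positionLock) {
            synchronized (renderLock) {
                this.xpos = x;
                this.ypos = y;
                this.color = color;
            }
        }
    }
}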
Update: Perhaps it is genuinely not possible to increase the performance if even volatile presents an unacceptable hit. The fundamental underlying aspect, which you cannot work around, is that mutual exclusion necessarily implies a trade-off with the substantial benefits of the memory hierarchy, i.e. caches. The fastest per-processor-core memory cache cannot hold variables that you are synchronizing on. Processor registers are arguably the fastest "cache", and even if the processor is sophisticated enough to keep the closest caches consistent, synchronization still precludes keeping values in registers. Hopefully this helps you see that this is a fundamental block to performance and there is no magic wand.
In the case of mobile platforms, the platform is deliberately designed against letting arbitrary apps run as fast as possible, because of battery-life concerns. It is not a priority to let any one app exhaust the battery in a couple of hours.
Given the first factor, the best thing to do would be to redesign your app so that it doesn't need as much mutual exclusion - for example, consider tracking x-pos inconsistently except when two objects come close to each other, say within a 10x10 box. So you have locking on a coarse grid of 10x10 boxes, and as long as an object stays within a box you track its position inconsistently. Not sure if that applies or makes sense for your app, but it is just an example to convey the spirit of an algorithm redesign rather than a search for a faster synchronization method.
I don't think that I get exactly what you mean, but generally:
Is there a better solution for concurrency?
Yes, there is:
prefer the Java Lock API over the intrinsic built-in lock;
think about using the non-blocking constructs provided in the atomic API, such as AtomicInteger, for better performance (a sketch follows below).
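For the position example from the question, a minimal sketch of the non-blocking approach with AtomicInteger; the class name is invented here, and whether this actually beats a synchronized int on a given device is something you would have to measure:
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free publication of a single int position: writes by the game-logic thread
// become visible to the render thread without a monitor.
public class AtomicPosition {

    private final AtomicInteger xpos = new AtomicInteger(0);

    public void setXpos(int value) {
        xpos.set(value);               // volatile-style write, no blocking
    }

    public int getXpos() {
        return xpos.get();             // always sees the latest completed set()
    }

    public boolean moveIfAt(int expected, int next) {
        // compareAndSet succeeds only if no other thread changed xpos in between
        return xpos.compareAndSet(expected, next);
    }
}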
I think synchronization, or any kind of locking, can be avoided here by using an immutable object for inter-thread communication. Let's say the message to be sent looks like this:
public final class ImmutableMessage {

    private final int xPos;
    // ... other fields, adhering to the rules of immutability

    public ImmutableMessage(int xPos /* , other arguments */) {
        this.xPos = xPos;
    }

    public int getXPos() { return xPos; }
}
Then somewhere in the writer thread:
sharedObject.message = new ImmutableMessage(1);
The reader thread:
ImmutableMessage message = sharedObject.message;
int xPos = message.getXPos();
The shared object (public field for the sake of simplicity):
public class SharedObject {
    public volatile ImmutableMessage message;
}
I guess things change rapidly in a real-time game engine, which might end up creating a lot of ImmutableMessage objects and may eventually degrade performance, but maybe that is balanced by the non-locking nature of this solution.
Finally, if you have a free hour for this topic, I think it's worth watching this video about the Java Memory Model by Angelika Langer.

Looking for a surprising concurrent Java program

Since I am writing a profiler focusing on concurrency aspects, I am looking for a good artificial example using synchronization mechanisms in Java. My profiler makes visible some actions related to threading; for instance:
calling notify/wait
thread changes its state
a thread contends with another thread for a monitor lock
a monitor lock has been acquired by a thread after contending for it with another
measure the execution time of each method
which thread has accessed a certain method and how often
etc.
So what I am looking for is a Java program that seems easy to understand at first glance, but whose results make you start to wonder when you execute it. I hope that my profiler will be able to detect what is going on in the background.
To clarify what I mean, here is an example: the book Java Concurrency in Practice by Brian Goetz gives "toxic" code examples which are used for learning purposes.
@NotThreadSafe
public class ListHelper<E> {

    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...

    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
This is intended to be an extension of a thread-safe class by the method putIfAbsent. Although list is synchronized, putIfAbsent uses a different lock (the ListHelper instance's monitor) to protect the state than the lock used by the methods defined on the list itself.
The profiler could display the monitor locks in use, and to the surprise of the user (or not...) the user would see that there are two monitor locks involved instead of one.
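For contrast, a sketch of how this example is usually repaired, following the general advice of locking on the same object that the synchronized wrapper uses, i.e. the list itself; treat it as an illustrative variant rather than the book's exact listing:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Client-side locking on the list itself: putIfAbsent now uses the same monitor
// as the synchronized wrapper's own methods, so the check-then-act is atomic.
public class ListHelper2<E> {

    public final List<E> list =
        Collections.synchronizedList(new ArrayList<E>());

    public boolean putIfAbsent(E x) {
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent)
                list.add(x);
            return absent;
        }
    }
}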
I don't like this example very much, but I wouldn't ask if I already had a bunch of good examples.
I found out my question is similar to these: What is the most frequent concurrency issue you've encountered in Java? and Java concurrency bug patterns.
But they refer only to broken concurrent programs. I am also looking for thread-safe implementations where it is still not obvious that they are thread-safe.
Have a look at the list of FindBugs bug descriptions, specifically those belonging to the category Multithreaded correctness (right table column).
Each of these bug descriptions contains references explaining why a particular idiom is bad and how it can be solved.
I'd go back in time, like maybe seven years or more, and find some open source code from the era before java.util.concurrent. Just about anything that rolled its own concurrency is going to have some subtle bugs in it, 'cause concurrency is hard to get right.
How about this?
class ObjectReference {

    private volatile Object obj = null;

    public void set(Object obj) {
        if (obj == null) {
            throw new IllegalArgumentException();
        }
        this.obj = obj;
        synchronized (this) {
            notifyAll();
        }
    }

    /**
     * This method never returns null
     */
    public Object waitAndGet() throws InterruptedException {
        if (obj != null) {
            return obj;
        }
        synchronized (this) {
            wait();
            return obj;
        }
    }
}
You could get null from waitAndGet() actually. See — Do spurious wakeups actually happen?
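To illustrate the point, a hedged sketch of the usual repair, written as a drop-in replacement for the waitAndGet method above: re-check the condition in a loop and do the check while holding the monitor, so that neither a spurious wakeup nor a notification arriving between the null check and wait() can leave the caller with null.
// Corrected variant: the guard condition is re-tested after every wakeup,
// and the unsynchronized "fast path" race is removed by checking obj
// while holding the monitor.
public Object waitAndGet() throws InterruptedException {
    synchronized (this) {
        while (obj == null) {
            wait();   // may wake spuriously; the loop re-checks obj
        }
        return obj;
    }
}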
The dining philosophers problem is a classical concurrency example. This link has one possible solution, and more can be found around the web.
As described in the first link, this example illustrates quite a few of the common concurrency problems. Please let your profiler show how many it can track!
See The Java Specialists' Newsletter for a consistent stream of small Java puzzles, many of which should fit your testing needs.
I would recommend looking around for (or asking the authors about) the IBM ConTest benchmark suite, as it contains a number of Java concurrency bugs (unfortunately not large open-source programs). The good thing about this benchmark is that the bugs are already documented (type and location).
If you want to find more programs I would recommend taking a look at some of the research papers in the area of software testing/quality of concurrent programs. They should indicate the sample programs they've used in their studies.
If all else fails, you could try searching GitHub (or a similar service) for repositories that contain the relevant concurrency mechanisms (i.e., synchronization). You might find a large amount of Java code that way; the only problem is that the bugs are not documented (unless you look for the commits that fix them).
I think these three suggestions will supply you with enough programs to test your concurrency profiler.
Maybe Eclipse or a Tomcat deployment? Neither one is very artificial, but I could imagine wanting good tools while debugging one or the other.

Should I make all my Java code thread-safe?

I was reading some of the concurrency patterns in Brian Goetz's Java Concurrency in Practice and got confused over when the right time is to make code thread-safe.
I normally write code that's meant to run in a single thread, so I do not worry too much about thread safety, synchronization, etc. However, there is always a possibility that the same code may be reused some time later in a multi-threaded environment.
So my question is, when should one start thinking about thread safety? Should I assume the worst at the onset and always write thread-safe code from the beginning or should I revisit the code and modify for thread safety if such a need arises later ?
Are there some concurrency patterns/anti-patterns that I must always be aware of even while writing single-threaded applications so that my code doesn't break if it's later used in a multi-threaded environment ?
You should think about thread safety when your code will be used in a multi-threaded environment. There is no point in tackling the complexity if it will only ever run in a single-threaded environment.
That being said, there are simple things you can do that are good practices anyway and will help with multithreading:
As Josh Bloch says, favor immutability. Immutable classes are thread-safe almost by definition;
Only use data members or static variables where required, rather than as a convenience.
Making your code thread safe can be as simple as adding a comment that says the class was not designed for concurrent use by multiple threads. So, in that sense: yes, all of your classes should be thread safe.
However, in practice, many, many types are likely to be used by only a single thread, often only referenced as local variables. This can be true even if the program as a whole is multi-threaded. It would be a mistake to make every object safe for multi-threaded access: while the penalty may be small in each case, it is pervasive and can add up to a significant, hard-to-fix performance problem.
I advise you to obtain a copy of "Effective Java", 2nd Ed. by Joshua Bloch. That book devotes a whole chapter to concurrency, including a solid exploration of the issue of when (and when not) to synchronize. Note, for example, the title of item 67 in "Effective Java": 'Avoid excessive synchronization', which is elaborated over five pages.
As was stated previously, you need thread safety when you think your code will be used in a multithreaded environment.
Consider the approach taken by the Collections classes: you provide a thread-unsafe class that does all its work without using synchronized, and you also provide another class that wraps the unsynchronized class, exposing the same public methods but synchronizing each of them on the underlying object.
This gives your clients a choice of using the multi-threaded or the single-threaded version of your code. It may also simplify your coding by isolating all of the threading/locking logic in a separate class.
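A minimal sketch of that wrapper pattern, loosely modelled on Collections.synchronizedList; the Counter interface and the class names are invented here for illustration:
// Unsynchronized implementation plus a synchronizing wrapper, in the spirit of
// java.util.Collections.synchronizedList.
public interface Counter {
    void increment();
    int value();
}

class SimpleCounter implements Counter {        // single-threaded workhorse, no locking
    private int count;
    public void increment() { count++; }
    public int value() { return count; }
}

class SynchronizedCounter implements Counter {  // thread-safe view of any Counter
    private final Counter delegate;

    SynchronizedCounter(Counter delegate) { this.delegate = delegate; }

    public synchronized void increment() { delegate.increment(); }
    public synchronized int value() { return delegate.value(); }
}
Single-threaded clients use SimpleCounter directly; multi-threaded clients wrap it, e.g. Counter shared = new SynchronizedCounter(new SimpleCounter());.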

Categories

Resources