Looking for a surprising concurrent Java program - java

Since I am writing a profiler focusing on concurrency aspects, I am looking for a good artificial example using synchronization mechanisms in Java. My profiler makes visible some actions related to threading; for instance:
calling notify/wait
a thread changes its state
a thread contends with another thread for a monitor lock
a monitor lock has been acquired by a thread after contending for it with another
the execution time of each method
which thread has accessed a certain method and how often
etc.
So what I am looking for is a Java program which seems understandable at first glance, but whose results make you wonder once you execute it. I hope that my profiler might be able to detect what is going on in the background.
To clarify, here is an example: the book Java Concurrency in Practice by Brian Goetz gives "toxic" code examples which are used for teaching purposes.
@NotThreadSafe
public class ListHelper<E> {
    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
This is intended to extend a thread-safe class with the method putIfAbsent. The list is synchronized, but putIfAbsent uses a different lock to protect the state than the methods defined on the list do.
The profiler could display the monitor locks in use and, to the surprise of the user (or not...), the user would see that there are two monitor locks involved instead of one.
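For contrast, here is a minimal sketch of the client-side locking variant, where putIfAbsent takes the same monitor that the synchronized wrapper uses (the wrapper object itself). The class name SafeListHelper is made up for illustration; a profiler should then show a single monitor for all list accesses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical name; a sketch of client-side locking, not code from the question.
class SafeListHelper<E> {
    public final List<E> list = Collections.synchronizedList(new ArrayList<E>());

    public boolean putIfAbsent(E x) {
        // Lock on the wrapper itself, which is the monitor its own methods use.
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }
}
```

With this variant the check and the add happen under one lock, so two threads cannot both see the element as absent.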
I don't like this example very much, but I wouldn't ask, if I had already a bunch of good examples.
I found out my question is similar to this: What is the most frequent concurrency issue you've encountered in Java? and Java concurrency bug patterns.
But they refer only to broken concurrent programs. I am also looking for thread-safe implementations where it is still not obvious that they are thread-safe.

Have a look at the list of FindBugs bug descriptions, specifically those belonging to the category Multithreaded correctness (right table column).
Each of these bug descriptions contains references explaining why a particular idiom is bad and how it can be solved.

I'd go back in time, like maybe seven years or more, and find some open source code from the era before java.util.concurrent. Just about anything that rolled its own concurrency is going to have some subtle bugs in it, 'cause concurrency is hard to get right.

How about this?
class ObjectReference {
    private volatile Object obj = null;

    public void set(Object obj) {
        if (obj == null) {
            throw new IllegalArgumentException();
        }
        this.obj = obj;
        synchronized (this) {
            notifyAll();
        }
    }

    /**
     * This method never returns null
     */
    public Object waitAndGet() throws InterruptedException {
        if (obj != null) {
            return obj;
        }
        synchronized (this) {
            wait();
            return obj;
        }
    }
}
You could actually get null from waitAndGet(). See: Do spurious wakeups actually happen?
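A minimal sketch of a variant that avoids the problem, assuming the usual idiom of re-checking the condition in a loop while holding the monitor. The class name SafeObjectReference is made up. Besides spurious wakeups, this also closes the window in which set() could run between the null check and the wait(), which would leave the waiter blocked with nobody left to notify it.

```java
// Hypothetical name; a sketch of the guarded-wait idiom, not the answerer's code.
class SafeObjectReference {
    private Object obj = null; // guarded by "this"

    public synchronized void set(Object obj) {
        if (obj == null) {
            throw new IllegalArgumentException();
        }
        this.obj = obj;
        notifyAll();
    }

    /** Blocks until a value has been set; never returns null. */
    public synchronized Object waitAndGet() throws InterruptedException {
        // Re-check in a loop: wait() may return without a matching notify.
        while (obj == null) {
            wait();
        }
        return obj;
    }
}
```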

The dining philosophers problem is a classic concurrency example. This link has one possible solution, and more can be found around the web.
As described in the first link, this example illustrates quite a few of the common concurrency problems. Please let your profiler show how many it can track!

See The Java Specialists' Newsletter for a consistent stream of small Java puzzles, many of which should fit your testing needs.

I would recommend looking around (or asking the authors) for the IBM ConTest benchmark suite as it contains a number of Java concurrency bugs (unfortunately not large open-source programs). The good thing about this benchmark is that the bugs are already documented (type, and location).
If you want to find more programs I would recommend taking a look at some of the research papers in the area of software testing/quality of concurrent programs. They should indicate the sample programs they've used in their studies.
If all else fails, you could try searching GitHub (or a similar service) for repositories that contain the necessary concurrency mechanisms (i.e., synchronization). You might find a large amount of Java code that way; the only problem is that the bugs are not documented (unless you look for commits that fix them).
I think these three suggestions will supply you with enough programs to test your concurrency profiler.

Maybe Eclipse or a Tomcat deployment? Neither one is very artificial, but I could imagine wanting good tools while debugging one or the other.

Related

Why is this multithreading code broken?

Why is the following multithreading related example code broken?
public void method1() {
    synchronized (intVariable) {
    }
    synchronized (stringVariable) {
    }
}
public void method2() {
    synchronized (stringVariable) {
    }
    synchronized (intVariable) {
    }
}
The two methods above are from the same class, where stringVariable and intVariable are instance variables.
I thought it would not cause any problem, at least not a thread deadlock. Is there any other reason why this code is broken?
Either you didn't understand the problem, or you are right that this wouldn't cause a deadlock.
Perhaps he was looking for something more obscure, like:
You can't lock on an int field.
Locking on a String object is a very bad idea because you don't know how it is shared.
But I doubt it. In any case, he should have clarified the question and your answer because perhaps he might have learnt something, if only how to make the question clearer next time.
If you, as an interviewer, have a set of screening questions, you should make sure they are covered before you even bring in a candidate. A questionnaire to give to HR or an agent can be useful. A phone interview is often a good first step. As a candidate, I sometimes ask for a phone interview, just to see if it is worth my time going to a face-to-face interview (e.g. if I have serious doubts it's worth it).
Not only are you trying to convince them you are a good fit for them, but they are trying to convince you they are a good fit for you. It appears they failed both technically to explain the problem to you, and how they handled it HR wise, so I would count yourself lucky you didn't waste any more time with them.
BTW: Most big companies are diverse and working for one team can be very different to another team. It would be unfair to characterise a company based on one experience.
The problem is, assuming that both variables have a reference type (otherwise you couldn’t synchronize on them), that synchronizing on a variable whose contents could change is broken.
The first read of the variable is done without synchronization and whatever reference the thread will see (which could be a completely outdated value) is used to synchronize on, which does not prevent other threads from synchronizing on a different value of that variable as it will be a completely different object.
Since String and Integer are immutable each change of the variable’s value implies changing the reference contained in the variable, allowing another thread to enter the synchronized block while the thread performing the change is still inside that block.
And due to legal reordering of operations it might even appear as if the second thread performs actions inside the synchronized block before the first thread performs the write. Just recall that the read of the reference to use for synchronization is not synchronized. So it’s like having no synchronization at all.
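A minimal sketch of the usual way around this, assuming the intent was to protect the two fields: synchronize on a dedicated final object that never changes, so every thread locks the same monitor no matter how often the guarded value is replaced. The class and method names here are made up for illustration.

```java
// Hypothetical class; shows locking on a final object instead of a reassignable field.
class Counters {
    private final Object lock = new Object(); // never reassigned
    private Integer intVariable = 0;          // guarded by "lock"; may be replaced freely

    void increment() {
        synchronized (lock) {
            intVariable = intVariable + 1; // creates a new Integer; the lock is unaffected
        }
    }

    int get() {
        synchronized (lock) {
            return intVariable;
        }
    }

    // Runs several threads against one instance and returns the final count.
    static int runDemo(int threads, int incrementsPerThread) {
        Counters c = new Counters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```

Had increment() used synchronized (intVariable) instead, each update would swap the object being locked on, and no increments would be reliably protected.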

Multithreaded correctness - Inconsistent synchronization

What's wrong with this?
public final void setListValid(final List<ValidRes> listValidRes) {
    this.listValidRes = listValidRes;
}
Sonar complains:
Inconsistent synchronization of xxx.listValidRes; locked 50% of time
Does anyone know what I need to do?
The code given in the question has no synchronization. I assume that you synchronize on the this.listValidRes somewhere else in your code. And exactly that is what Sonar tells you: if you synchronize on a resource do so on all usages or don't do it at all and have someone else deal with it.
Basically it is a design decision:
You can choose not to synchronize and have the client bother with it. The advantage is that without synchronization it will be significantly faster. So if your class is used in a single-threaded setup, it will be better to ditch synchronization. But document it clearly as not thread-safe, or a client will use it from multiple threads and complain about weird errors...
If you choose to (or have to) synchronize, then do it on every usage of the critical resource. There are different ways to achieve this. Maybe you want to show a usage of the resource that you in fact did synchronize. Maybe I or someone else can give you some good advice on that.
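A minimal sketch of the second option, assuming the field is otherwise only reached through a getter. The class name ValidationHolder and the use of String in place of ValidRes are placeholders, since the rest of the original class is not shown. Every access to the field goes through a synchronized method, which is the consistency the warning asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class; String stands in for the unknown ValidRes type.
class ValidationHolder {
    private List<String> listValidRes = new ArrayList<>(); // guarded by "this"

    public synchronized void setListValid(final List<String> listValidRes) {
        // Defensive copy, so later changes by the caller bypass no lock.
        this.listValidRes = new ArrayList<>(listValidRes);
    }

    public synchronized List<String> getListValid() {
        // Return a copy so callers never touch the guarded list directly.
        return new ArrayList<>(listValidRes);
    }
}
```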

Asynchronous atomic array

I have a critical section in my (Java) code which basically goes like the snippet below. The messages come in from an NIO server.
void messageReceived(User user, Message message) {
    synchronized (entryLock) {
        userRegistry.updateLastMessageReceived(user, time());
        server.receive(user, message);
    }
}
However, a high percentage of my messages are not going to change the server state, really. They're merely the client saying "hello, I'm still here". I really don't want to have to make that inside the synchronization block.
I could use a synchronous map or something like that, but it's still going to incur a synchronization penalty.
What I would really like to do is to have something like a drop box, like this
void messageReceived(User user, Message message) {
    dropbox.add(new UserReceived(user, time()));
    if (message.getType() != message.TYPE_KEPT_ALIVE) {
        synchronized (entryLock) {
            server.receive(user, message);
        }
    }
}
I have a cleanup routine to automatically put clients that aren't active to sleep. So instead of synchronizing on every kept alive message to update the registry, the cleanup routine can simply compile the kept alive messages in a single synchronization block.
So naturally, recognizing a need for this, the first thing I did was start building a solution. Then I decided this was a non-trivial class, and a problem that was more than likely fairly common. So here I am.
tl;dr Is there a Java library or other solution I can use to facilitate atomically adding to a list of objects in an asynchronous manner? Collecting from the list in an asynchronous manner is not required. I just don't want to synchronize on every add to the list.
ConcurrentLinkedQueue claims to be:
This implementation employs an efficient "wait-free" algorithm based on one described in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott.
I'm not sure what the quotes on "wait-free" entail but the Concurrent* classes are good places to look for structures like you're looking for.
You might also be interested in the following: Effective Concurrency: Lock-Free Code — A False Sense of Security. It talks about how hard these things are to get right, even for experts.
Well, there are a few things you must bear in mind.
First, there is very little "synchronization cost" if there is little contention (more than one thread trying to enter the synchronized block at the same time).
Second, if there is contention, you're going to incur some cost no matter what technique you're using. Paul is right about ConcurrentLinkedQueue and the "wait-free" means that thread concurrency control is not done using locks, but still, you will always pay some price for contention. You may also want to look at ConcurrentHashMap because I'm not sure a list is what you're looking for. Using both classes is quite simple and common.
If you want to be more adventurous, you might find some non-locking synchronization primitives in java.util.concurrent.atomic.
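As a rough sketch of how the drop box from the question could look on top of ConcurrentLinkedQueue: producers add without taking a lock, and the cleanup routine drains whatever has accumulated. The names KeepAliveDropbox, UserReceived, keepAlive and drain are placeholders, and the user is reduced to a String to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical drop box; not a drop-in replacement for the question's classes.
class KeepAliveDropbox {
    // Hypothetical record of "this user was seen at this time".
    static final class UserReceived {
        final String user;
        final long time;
        UserReceived(String user, long time) { this.user = user; this.time = time; }
    }

    private final Queue<UserReceived> dropbox = new ConcurrentLinkedQueue<>();

    // Called from the I/O threads; no lock is taken here.
    void keepAlive(String user, long time) {
        dropbox.add(new UserReceived(user, time));
    }

    // Called from the cleanup routine; removes and returns everything queued so far.
    List<UserReceived> drain() {
        List<UserReceived> batch = new ArrayList<>();
        UserReceived r;
        while ((r = dropbox.poll()) != null) {
            batch.add(r);
        }
        return batch;
    }

    // Several producer threads, then one drain; returns the number of entries drained.
    static int runDemo(int threads, int perThread) {
        KeepAliveDropbox box = new KeepAliveDropbox();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String user = "user-" + i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    box.keepAlive(user, System.nanoTime());
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return box.drain().size();
    }
}
```

The cleanup routine can then apply the drained batch to the registry inside a single synchronized block, which is what the question was after.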
One thing we could do is to use a simple ArrayList for keep-alive messages:
Keep adding to this list whenever each keep-alive message comes.
The other thread would synch on a lock X and read and process keep-alives. Note that this thread is not removing from list only reading/copying.
Finally in messageReceived itself you check if the list has grown say beyond 1000, in which case you synch on the lock X and clear the list.
List<Message> keepAliveList = new ArrayList<>();

void messageReceived(User user, Message message) {
    if (message.getType() == message.TYPE_KEPT_ALIVE) {
        if (keepAliveList.size() > THRESHOLD) {
            synchronized (X) {
                processList.addAll(keepAliveList);
                keepAliveList.clear();
            }
        }
        // Note: ArrayList is not thread-safe, so this unsynchronized add
        // can race with the threads that read the list while holding X.
        keepAliveList.add(message);
    }
}

// on another thread
void checkKeepAlives() {
    synchronized (X) {
        processList.addAll(keepAliveList);
    }
    processKeepAlives(processList);
}

Java concurrency - use which technique to achieve safety?

I have a list of personId. There are two API calls to update it (add and remove):
public void add(String newPersonName) {
    if (personNameIdMap.get(newPersonName) != null) {
        myPersonId.add(personNameIdMap.get(newPersonName));
    } else {
        // get the id from Twitter and add to the list
    }
    // make an API call to Twitter
}
public void delete(String personName) {
    if (personNameIdMap.get(personName) != null) {
        myPersonId.remove(personNameIdMap.get(personName));
    } else {
        // wrong person name
    }
    // make an API call to Twitter
}
I know there can be concurrency problems. I read about 3 solutions:
synchronize the method
use Collections.synchronizedList()
use CopyOnWriteArrayList
I am not sure which one to prefer to prevent inconsistency.
1) synchronize the method
2) use Collections.synchronizedList
3) CopyOnWriteArrayList
All will work, it's a matter of what kind of performance / features you need.
Method #1 and #2 are blocking methods. If you synchronize the methods, you handle concurrency yourself. If you wrap a list in Collections.synchronizedList, it handles it for you. (IMHO #2 is safer -- just be sure to use it as the docs say, and don't let anything access the raw list that is wrapped inside the synchronizedList.)
CopyOnWriteArrayList is one of those weird things that has use in certain applications. It's a non-blocking quasi-immutable list, namely, if Thread A iterates through the list while Thread B is changing it, Thread A will iterate through a snapshot of the old list. If you need non-blocking performance, and you are rarely writing to the list, but frequently reading from it, then perhaps this is the best one to use.
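To illustrate the snapshot behaviour, here is a small single-threaded sketch (the class and method names are made up): an iterator created before a modification keeps iterating over the old contents.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class for the snapshot semantics of CopyOnWriteArrayList.
class SnapshotDemo {
    // Returns how many elements an iterator sees when the list grows after its creation.
    static int countSeenByOldIterator() {
        List<String> ids = new CopyOnWriteArrayList<>();
        ids.add("a");
        ids.add("b");
        Iterator<String> it = ids.iterator(); // snapshot of ["a", "b"]
        ids.add("c");                         // copies the backing array; the iterator is unaffected
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }
}
```

The iterator sees two elements even though the list holds three by the time it runs, and no ConcurrentModificationException is thrown.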
edit: There are at least two other options:
4) use Vector instead of ArrayList; Vector implements List and is already synchronized. However, it's generally frowned upon, as it's considered an old-school class (it has been there since Java 1.0!), and should be equivalent to #2.
5) access the List serially from only one thread. If you do this, you're guaranteed not to have any concurrency problems with the List itself. One way to do this is to use Executors.newSingleThreadExecutor and queue up tasks one-by-one to access the list. This moves the resource contention from your list to the ExecutorService; if the tasks are short, it may be fine, but if some are lengthy they may cause others to block longer than desired.
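A minimal sketch of option 5, assuming the ids are longs. SerializedIdList and its methods are made-up names. The plain ArrayList is only ever touched by the executor's single worker thread, so it needs no locking of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper; every list access is queued on one worker thread.
class SerializedIdList {
    private final List<Long> myPersonId = new ArrayList<>(); // touched only by the executor thread
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    Future<Boolean> add(long id) {
        Callable<Boolean> task = () -> myPersonId.add(id);
        return executor.submit(task);
    }

    Future<Integer> size() {
        Callable<Integer> task = myPersonId::size;
        return executor.submit(task);
    }

    void shutdown() {
        executor.shutdown();
    }

    // Several caller threads queue adds; returns the size once all of them have run.
    static int runDemo(int callers, int addsPerCaller) {
        SerializedIdList ids = new SerializedIdList();
        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < addsPerCaller; j++) {
                    ids.add(j);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            return ids.size().get(); // queued after all adds, so it runs last
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ids.shutdown();
        }
    }
}
```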
In the end you need to think about concurrency at the application level: thread-safety should be a requirement, and find out how to get the performance you need with the simplest design possible.
On a side note, you're calling personNameIdMap.get(newPersonName) twice in add() and delete(). This suffers from concurrency problems if another thread modifies personNameIdMap between the two calls in each method. You're better off doing
PersonId id = personNameIdMap.get(newPersonName);
if (id != null) {
    myPersonId.add(id);
} else {
    // something else
}
Collections.synchronizedList is the easiest to use and probably the best option. It simply wraps the underlying list with synchronized. Note that multi-step operations (eg for loop) still need to be synchronized by you.
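A short sketch of what that means in practice (class and method names are made up): single calls are locked by the wrapper, while a loop has to hold the wrapper's monitor for its whole duration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class showing manual locking around a multi-step operation.
class SynchronizedListIteration {
    private final List<Long> myPersonId = Collections.synchronizedList(new ArrayList<>());

    void add(long id) {
        myPersonId.add(id); // single call: the wrapper locks for us
    }

    long sum() {
        // Iteration is many calls, so hold the wrapper's monitor for the whole loop.
        synchronized (myPersonId) {
            long total = 0;
            for (long id : myPersonId) {
                total += id;
            }
            return total;
        }
    }
}
```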
Some quick things
Don't synchronize the method unless you really need to. It locks the entire object until the method completes, which is hardly a desirable effect.
CopyOnWriteArrayList is a very specialized list that you most likely wouldn't want, since you have an add method. It's essentially a normal ArrayList, but each time something is added the whole array is rebuilt, a very expensive task. It's thread-safe, but not really the desired result.
1) Synchronized is the old way of working with threads. Avoid it in favor of new idioms mostly expressed in the java.util.concurrent package.
2) See 1.
3) A CopyOnWriteArrayList has fast reads and slow writes. If you're making a lot of changes to it, it might start to drag on your performance.
Concurrency isn't about an isolated choice of what mechanism or type to use in a single method. You'll need to think about it from a higher level to understand all of its impacts.
Are you making changes to personNameIdMap within those methods, or any other data structures access to which should also be synchronized? If so, it may be easiest to mark the methods as synchronized; otherwise, you might consider using Collections.synchronizedList to get a synchronized view of myPersonId and then doing all list operations through that synchronized view. Note that you should not manipulate myPersonId directly in this case, but do all accesses solely through the list returned from the Collections.synchronizedList call.
Either way, you have to make sure that there can never be a situation where a read and a write or two writes could occur simultaneously to the same unsynchronized data structure. Data structures documented as thread-safe or returned from Collections.synchronizedList, Collections.synchronizedMap, etc. are exceptions to this rule, so calls to those can be put anywhere. Non-synchronized data structures can still be used safely inside methods declared to be synchronized, however, because such methods are guaranteed by the JVM to never run at the same time, and therefore there could be no concurrent reading / writing.
In your case from the code that you posted, all 3 ways are acceptable. However, there are some specific characteristics:
#3: This should have the same effect as #2 but may run faster or slower depending on the system and workload.
#1: This way is the most flexible. Only with #1 can you make the add() and delete() methods more complex. For example, if you need to read or write multiple items in the list, then you cannot use #2 or #3, because some other thread can still see the list being half updated.
Java concurrency (multi-threading):
Concurrency is the ability to run several programs or several parts of a program in parallel. If a time-consuming task can be performed asynchronously or in parallel, this improves the throughput and the interactivity of the program.
Java supports concurrent programming through threads, immutability, the executor framework (thread pools), futures, callables and the fork-join framework.

detecting race condition using findbugs or another analysis tool

The bean below is not thread-safe: the method addIfNotExist is not synchronized, so it is possible that the same term gets added twice because of a race condition. I annotated the class using the JCIP annotation @ThreadSafe, hoping FindBugs would find that the implementation is not thread-safe and flag it as an error, but it does not. Are there any tools that identify these types of errors in a code base?
Methods addIfNotExist and isExist should be synchronized to make this bean thread-safe. Should the isExist method also be synchronized?
package com.test;

import java.util.ArrayList;
import java.util.Collection;
import net.jcip.annotations.GuardedBy;
import net.jcip.annotations.ThreadSafe;

@ThreadSafe
public class Dictionary {
    @GuardedBy("this")
    public Collection<String> terms = new ArrayList<String>();

    public void addIfNotExist(final String input) {
        if (!this.terms.contains(input)) {
            this.terms.add(input);
        }
    }

    public boolean isExist(final String input) {
        return this.terms.contains(input);
    }

    public void remove(final String input) {
        this.terms.remove(input);
    }
}
It is tremendously difficult to write safe multi-threaded code that has any degree of complexity: this type of locking (using monitors) is fraught with all sorts of intermittent race conditions, deadlocks and livelock issues that often evade detection right up until promotion into production systems; if you can, consider using message passing, software transactional memory or persistent data structures instead.
FindBugs (or indeed any static analysis tools) can only go so far in detecting non-thread safe code: by their very definition, race conditions are time-sensitive - they require many execution runs to manifest, so static analysis fails in this respect because they don't run any code and only look for common code signatures. The best things IMHO to detect issues are:
A second pair of eyes - rigorous code reviews with peers who are familiar with the code - goes a long way in finding bugs that are not immediately obvious to the original author.
Continuous integration & exhaustive automated tests that exercise multi-threadedness on a variety of hardware, and ruthlessly investigate any 'intermittent' test failures.
In answer to the second question: yes, all methods that reference terms should be guarded by a synchronization monitor, regardless of whether they read or write. Consider what happens if thread A calls remove("BOB") whilst thread B is calling isExist("BOB") when you don't have synchronization: thread A will be compacting the array list while thread B will be attempting to traverse it.
At best, you will not be able to determine the result of isExist("BOB"), but it is entirely possible that B could intermittently throw an IndexOutOfBounds exception since the size of the array could have changed (i.e. shrunk) as it was being traversed.
Synchronize, and while you still can't be sure of the order in which the calls are made (due to the non-deterministic nature of scheduling), at least you will be guaranteed that operations on terms are atomic, that is, terms is not being altered by something else whilst the current thread is operating on it.
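A minimal sketch of the bean with every access to terms guarded by the same monitor. SynchronizedDictionary, size and runDemo are names added for illustration, and the field is made private so nothing can bypass the lock.

```java
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical synchronized variant of the Dictionary bean from the question.
class SynchronizedDictionary {
    private final Collection<String> terms = new ArrayList<>(); // guarded by "this"

    public synchronized void addIfNotExist(final String input) {
        if (!terms.contains(input)) {
            terms.add(input);
        }
    }

    public synchronized boolean isExist(final String input) {
        return terms.contains(input);
    }

    public synchronized void remove(final String input) {
        terms.remove(input);
    }

    public synchronized int size() {
        return terms.size();
    }

    // All threads try to add the same term; returns the resulting number of terms.
    static int runDemo(int threads) {
        SynchronizedDictionary dict = new SynchronizedDictionary();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> dict.addIfNotExist("BOB"));
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dict.size();
    }
}
```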
This is something that you can use at run-time (during automated unit tests or integration tests or what have you) to help find threading problems: IBM ConTest (Concurrency Testing)
ConTest description:
"The ConTest technology is innovative and counter-intuitive. Specifically, ConTest systematically and transparently schedules the execution of program threads such that program scenarios which are likely to contain race conditions, deadlocks and other intermittemt bugs - collectively called synchronization problems - are forced to appear with high frequency. In doing so, ConTest dramtically improves the quality of testing and reduces development expense, as bugs are found earlier in the testing process. "
To find such incorrectly synchronized code blocks, I use the following algorithm:
Record the threads for all field modifications using instrumentation. If a field is modified by more than one thread without synchronization, I have found a data race.
I implemented this algorithm inside http://vmlens.com, which is a dynamic tool to find data races inside java programs.
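To make the idea concrete, here is a toy sketch of such a recorder. It is not how vmlens is implemented (I have no knowledge of its internals); it is only a simplified illustration under the assumption that each field has one known guard object. WriteRecorder, recordWrite, hasDataRace and demo are made-up names, and a real tool would insert the recording calls via bytecode instrumentation instead of by hand.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy illustration only; all names are hypothetical.
class WriteRecorder {
    // field name -> threads that wrote the field without holding its guard
    private final Map<String, Set<Thread>> unsyncWriters = new HashMap<>();

    // Call just before a field write; "guard" is the monitor expected to protect the field.
    synchronized void recordWrite(String field, Object guard) {
        if (!Thread.holdsLock(guard)) {
            unsyncWriters.computeIfAbsent(field, k -> new HashSet<>())
                         .add(Thread.currentThread());
        }
    }

    // A field written by more than one thread without its guard is reported as a race.
    synchronized boolean hasDataRace(String field) {
        Set<Thread> writers = unsyncWriters.get(field);
        return writers != null && writers.size() > 1;
    }

    // Two threads "write" the same field, with or without the guard held.
    static boolean demo(boolean useLock) {
        WriteRecorder recorder = new WriteRecorder();
        Object guard = new Object();
        Runnable writer = () -> {
            if (useLock) {
                synchronized (guard) {
                    recorder.recordWrite("counter", guard);
                }
            } else {
                recorder.recordWrite("counter", guard);
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return recorder.hasDataRace("counter");
    }
}
```

Here demo(false) reports a race because both threads write without the guard, while demo(true) reports none.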
