Java array synchronization - java

I have a kind of async task managing class, which has an array like this:
public static int[][][] tasks;
Mostly I access the cells like this:
synchronized(tasks[A][B]) {
// Doing something with tasks[A][B][0]
}
My question is, if I do this:
synchronized(tasks[A]) {
// ...
}
will it also block threads trying to enter synchronized(tasks[A][B])?
In other words, does a synchronized access to an array also synchronizes the accsess to it's cells?
If not, how to I block the WHOLE array of tasks[A] for my thread?
Edit: the answer is NO. When someone is in a synchronized block on tasks[A] someone else can simultaniously be in a synchronized block on tasks[A][B] for example - because it's a different object. So when talking about accessing objects from one place at a time, arrays are no different: to touch object X from one place at a time you need to surround it by synchronized(X) EVERYWHERE you touch it.

No. Each array is an Object (with a monitor) unto itself; and the array tasks[A] is a distinct object from the array tasks[A][B]. The solution, if you want to synchronize all accesses to "sub" arrays of tasks[A] is simply that you must do synchronized (tasks[A]). If all accesses into the descendant objects (say, tasks[A][B]) do this, then any further synchronization is redundant.'
It appears your underlying question is actually something like "how can I safely modify the structure as well as the contents of a data structure while retaining the best concurrency possible?" If you augment your question a bit more about the problem space, we might be able to delve deeper. A three-dimensional array may not be the best solution.

int[][][] is an array of arrays of integer arrays, so your synchronized(tasks[A][B]) is synchronizing on the lowest level object, an array of integers, blocking other synchronized access to that same array.
synchronized(tasks[A]) on the other hand is synchronizing the object at the next level up - an array of integer arrays. This prevents synchronized access to that array, which means, in practice that any other code which uses synchronized(tasks[A]) will be blocked - which seems to be what you want, so long as all your acccesses to tasks synchronizes at the same level.
Note that synchronize does not lock anything! However, if two threads attempt to synchronize on the same object, one will have to wait.
It doesn't matter that you then work on another object (your array of integers).
I'm afraid I'm saying that andersoj's answer is misleading. You're doing the right thing.

Whenever I see code that grabs lots of different mutexes or monitors, my first reaction is to worry about deadlock; in your current code, do you ever lock multiple monitors in the same thread? If so, do you ensure that they are locked in a canonical ordering each time?
It would probably help if you could explain what you are trying to accomplish and how your are using / modifying this tasks array. There are surprising (or perhaps unsurprising) number of cases where the utilities in java.util.concurrent are sufficient, and using individual monitors isn't necessary. Of course, it all depends on what exactlly you are trying to do. Also, if the reason you are trying to grab so many different locks is because you are reading them very frequently, it is possible that using a single read-write lock for the entire 3d jagged-array object might be sufficient for your needs.

Related

Are simultaneous reads from an array thread-safe?

I have an array which contains integer values declared like this:
int data[] = new int[n];
Each value needs to be processed and I am splitting the work into pieces so that it can be processed by separate threads. The array will not be modified during processing.
Can all the processing threads read separate parts of the array concurrently? Or do I have to use a lock?
In other words: is this work order thread-safe?
Array is created and filled
Threads are created and started
Thread 0 reads data[0..3]
Thread 1 reads data[4..7]
Thread 2 reads data[8..n]
Reading contents of an array (or any other collection, fields of an object, etc.) by multiple threads is thread-safe provided that the data is not modified in the meantime.
If you fill the array with data to process and pass it to different threads for reading, then the data will be properly read and no data race will be possible.
Note that this will only work if you create the threads after you have filled the array. If you pass the array for processing to some already existing threads without synchronization, the contents of the array may not be read correctly. In such case the method in which the thread obtains the reference to the array should be synchronized, because a synchronized block forces memory update between threads.
On a side note: using an immutable collection may be a good idea. That way you ensure no modification is even possible. I would sugges using such wrapper. Check the java.util.concurrent.atomic package, there should be something you can use.
As long as the threads don't modify the contents in the array, it is fine to read the array from multiple threads.
If you ensure all the threads are just reading, its thread safe. Though you should not be relying on that fact alone and try to make your array immutable via a wrapper.
Sure, if you just want to read it, pass the array to the threads when you create them. There won't be any problem as long as you don't modify it.
Reading from array array is Thread-Safe Operation but if you are Modifying array than consider using class AtomicIntegerArray.
Consider populating a ConcurrendLinkedQueue and have each thread pull from it. This would ensure that there are no concurrency issues.
Your threads would each be pulling their data from the top of the queue and processing it.

In Java can I depend on reference assignment being atomic to implement copy on write?

If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(global) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these type of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
A bug was present in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the syncronized block after a swap.
You could fix this by synchronizing on the class (moving the synchronized keyword to the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
There is no "eventual consistency guarantee" in the code I posted, one way to make sure that readers do get to see the updates by writers is to use the volatile keyword.
On reflection the general problem that motivated this question was trying to implement lock free reads with locked writes in java, however my (solved) problem was with a collection, which may be unnecessarily confusing for future readers. So in case it is not obvious the code I posted works by allowing one writer at a time to perform edits to "some object" that is being read unprotected by multiple reader threads. Commits of the edit are done through an atomic operation so readers can only get the pre-edit or post-edit "object". When/if the reader thread gets the update, it cannot occur in the middle of a read as the read is occurring on the old copy of the "object". A simple solution that had probably been discovered and proved to be broken in some way prior to the availability of better concurrency support in java.
Rather than trying to roll out your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean> my guess is that it does the same thing with setting all the values to Boolean.TRUE behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
Thread A obtains obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor Thread B blocks.
Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
Thread B is now able to enter the monitor and makes a copy of the global={1} set.
Thread A decides to make another update, reads in its local global reference and tries to enter the synchronized block. Since Thread B currently holds the lock on {} there is no lock on {1} and Thread A successfully enters the monitor!
Thread A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(globalLock) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as c++, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
This is reaffirmed by Section §17.7 of the Java Language Specification
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized on all access to global, then you can forge some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global to remain immutable... luckily, there is Collections.unmodifiableSet which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet for your modified copies as well.
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global) creates an Iterator as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList, or its cousin CopyOnWriteArraySet (which implements a Set using the former).
Also suggested by DaoWen, have you considered using a ConcurrentHashMap? They guarantee that using a for loop as you've done in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator is used for enhanced for over an Iterable.
You can craft a Set from this by utilizing Collections.newSetFromMap like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is slower than a HashSet (usually--HashSets and SkipLists/Trees hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your reads could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, to the readers, the contents of the map are always internally consistent. A read can be sure there's an A for every B. It can be sure that the size() method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
Can you use the Collections.synchronizedSet method? From HashSet Javadoc http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html
Set s = Collections.synchronizedSet(new HashSet(...));
Replace the synchronized by making global volatile and you'll be alright as far as the copy-on-write goes.
Although the assignment is atomic, in other threads it is not ordered with the writes to the object referenced. There needs to be a happens-before relationship which you get with a volatile or synchronising both reads and writes.
The problem of multiple updates happening at once is separate - use a single thread or whatever you want to do there.
If you used a synchronized for both reads and writes then it'd be correct but the performance may not be great with reads needing to hand-off. A ReadWriteLock may be appropriate, but you'd still have writes blocking reads.
Another approach to the publication issue is to use final field semantics to create an object that is (in theory) safe to be published unsafely.
Of course, there are also concurrent collections available.

How to Ensure Memory Visibility in Java when passing data across threads

I have a producer consumer like pattern where some threads are creating data and periodically passing putting chunks of that data to be consumed by some other threads.
Keeping the Java Memory Model in mind, how do i ensure that the data passed to the consumer thread has full 'visibility'?
I know there are data structures in java.util.concurrent like ConcurrentLinkedQueue that are built specifically for this, but I want to do this as low level as possible without utilizing those and have full transparency on what is going on under the covers to ensure the memory visibility part.
If you want "low level" then look into volatile and synchronized.
To transfer data, you need a field somewhere available to all threads. In your case it really needs to be some sort of collection to handle multiple entries. If you made the field final, referencing, say, a ConcurrentLinkedQueue, you'd pretty much be done. The field could be made public and everyone could see it, or you could make it available with a getter.
If you use an unsynchronized queue, you have more work to do, because you have to manually synchronize all access to it, which means you have to track down all usages; not easy when there's a getter method. Not only do you need to protect the queue from simultaneous access, you must make sure interdependent calls end up in the same synchronized block. For instance:
if (!queue.isEmpty()) obj = queue.remove();
If the whole thing is not synchronized, queue is perfectly capable of telling you it is not empty, then throwing a NoSuchElementException when you try to get the next element. (ConcurrentLinkedQueue's interface is specifically designed to let you do operations like this with one method call. Take a good look at it even if you don't want to use it.)
The simple solution is to wrap the queue in another object whose methods are carefully chosen and all synchronized. The wrapped class, even if it's LinkedList or ArrayList, will now act (if you do it right) like CLQ, and it can be freely released to the rest of the program.
So you would have what is really a global field with an immutable (final) reference to a wrapper class, which contains a LinkedList (for example) and has synchronized methods that use the LinkedList to store and access data. The wrapper class, like CLQ, would be thread-safe.
Some variants on this might be desirable. It might make sense to combine the wrapper with some other high-level class in your program. It might also make sense to create and make available instances of nested classes: perhaps one that only adds to the queue and one that only removes from it. (You couldn't do this with CLQ.)
A final note: having synchronized everything, the next step is to figure out how to unsynchronize (to keep threads from waiting too much) without breaking thread safety. Work really hard on this, and you'll end up rewriting ConcurrentLinkedQueue.

Explanation for different behavior in Vector.set() and ArrayList.set()

Project background aside, I've implemented a table of custom JComboBoxes. Each row of ComboBoxes is exclusive: while each ComboBox has its own model (to allow different selections), each choice can only be selected once per row. This is done by adding a tag to the front of an item when selected and removing it again when deselected. If a user tries to select a tagged item, nothing happens.
However, this only works when using a Vector as the backing for the list of options. I can get the Vector of strings, use either set() or setElementAt(), and boom presto it works.
With an ArrayList instead of a Vector, however, this doesn't work at all. I was under the impression that ArrayLists functioned similarly in that I can retrieve an anonymous ArrayList, change its contents, and all other objects relying on the contents of that ArrayList will update accordingly, just like the Vector implementation does.
I was hoping someone could tell me why this is different, as both Vector and ArrayList implement List and supposedly should have similar behavior.
EDIT:
Thanks for the prompt responses! All answers refer to synchronization disparities between ArrayList and Vector. However, my project does not explicitly create new threads. Is it possible that this is a synchronization issue between my data and the Swing thread? I'm not good enough with threads to know...
2nd EDIT:
Thanks again everybody! The synchronization between data and Swing answers my question readily enough, though I'd still be interested in more details if there's more to it.
I suspect the difference is due to Vector being thread-safe and ArrayList not. This affects the visibility of changes to its elements to different threads. When you change an element in a Vector, the change becomes visible to other threads instantly. (This is because its methods are synchronized using locks, which create a memory barrier, effectively synchronizing the current state of the thread's memory - including the latest changes in it - with that of other threads.) However, with ArrayList such synchronization does not automatically happen, thus the changes made by one thread may become visible to other threads only later (and in arbitrary order), or not at all.
Since Swing is inherently multithreadedd, you need to ensure that data changes are visible between different (worker, UI) threads.
Vector is synchronized. It uses the synchronized keyword to ensure that all threads that access it see a consistent result. ArrayList is not synchronized. When one thread sets an element of an ArrayList there is no guarantee that another thread will see the update.
Access to Vector elements are synchronized, whereas its not for an ArrayList. If you have different threads accessing and modifying the lists, you will see different behavior between the two.
I don't have time to test this code, and your code sample is still really light (a nice fully functional sample would be more helpful - I don't want to write a full app to test this) but I'm willing to bet that if you wrapped your call to 'setSelectDeselect' (as shown in your pastebin) like this then ArrayList would work as well as Vector:
Runnable selectRunnable = new Runnable()
{
public void run()
{
setSelectDeselect(cat, itemName, selected);
}
};
SwingUtilities.invokeLater(selectRunnable);
You're updating your ArrayList in the middle of event processing. The above code will defer the update until after the event is complete. I suspect there's something else at play here that would be apparent from reviewing the rest of your code.

Why does unsynchronization make ArrayList faster and less secure?

I read the following statement:
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?
I will try to address both of your questions:
Improve speed
If the ArrayList were synchronized and multiple threads were trying to read data out of the list at the same time, the threads would have to wait to get an exclusive lock on the list. By leaving the list unsynchronized, the threads don't have to wait and the program will run faster.
Unsafe
If multiple threads are reading and writing to a list at the same time, the threads can have unstable view of the list, and this can cause instability in multi-threaded programs.
The whole point of synchronization is that it means only one thread has access to an object at any given time. Take a box of chocolates as an example. If the box is synchronized (Vector), and you get there first, no one else can take any and you get your pick. If the box is NOT synchronized (ArrayList), anyone walking by can snag a chocolate - It will disappear faster, but you may not get the ones you want.
ArrayLists are unsynchronized and
therefore faster than Vector, but less
secure in a multithreaded environment.
I would like to know why
unsynchronization can improve the
speed,and why it will be less secure?
When multiple threads are reading/writing to a shared memory location, the program might compute incorrect results due to lack of mutual exclusion and proper visibility. Hence lack of synchronization is considered "unsafe". This blog post by Jeremy Manson might provide a good introduction to the topic.
When the JVM executes a synchronized method, it makes sure that the current thread has an exclusive lock on the object on which the method is invoked. Similarly when the method finishes execution, the JVM releases the lock held by the executing thread. Synchronized methods provide mutual exclusion and visibility guarantees - and is important for "safety" (i.e. guaranteeing correctness) of the executing code. But, if only one thread is ever accessing the methods of the object, there is no safety issues to worry about. Although the JVM performance has improved over the years, uncontended synchronization (i.e. locking/unlocking of objects accessed by only one thread) still takes non-zero amount of time. For unsynchronized methods, the JVM does not pay this extra penalty - hence they are faster than their synchronized counterparts.
Vectors force their choice on you. All methods are synchronized and it is difficult to use them incorrectly. But when Vectors are used in a single-threaded context, you pay the price for the extra synchronization unnecessarily. ArrayLists leave the choice to you. When used in the multi-threaded context, it is up to you (the programmer) to correctly synchronizing the code; but when used in a single-threaded context you are guaranteed not to pay any extra synchronization overhead.
Also, when an collection is populated initially, and read subsequently ArrayLists perform better even in a multi-threaded context. For example, consider this method:
public synchronized List<String> getList() {
List<String> list = new Vector<String>();
list.add("Foo");
list.add("Bar");
return Collections.unmodifiableList(list);
}
A list is created, populated, and an immutable view of it is safely published. Looking at the code above it is clear that all subsequent uses of this list are reads and won't need any synchronization even when used by multiple threads - the object is effectively immutable. Using a Vector here incurs the synchronization overhead even for reads where it is not needed; using an ArrayList instead would perform better.
Data structures that synchronize use locks (or other synchronization constructs) to ensure that their data is always in a consistent state. Oftentimes, this requires that one or more threads wait on another thread to finish updating the structure's state, which will then reduce performance, since a wait has been introduced where before there was none.
2 threads can modify the list at the same time and add a new item or delete/modify the same item in the list at the same time because no synchronization (or lock mechanism if you prefer) exists. So imagine you delete one item of the list while somebody else is trying to work with it or you modify an item while someone uses it, it's not very secure.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html
Read the "Note that this implementation is not synchronized." paragraph, it explains a bit better.
And I forgot, considering speed, it seems quite trivial to imagine that when you try to control the access to a data, you add some mechanisms that prevent other people from accessing your data. Thus, you add some more computations so it is slower...
Non-blocking data structures will be faster than ones that bock, because of that fact. With blocking data structures, if a resources is acquired by some entity it will take time for another entity to acquire that same resource, once it becomes available.
However, this can be less secure in some instances depending on the situation. The main points of contention are during writes. If it can be guaranteed that the data contained in a data structure will not change it has been added and will only be accessed to read the value than there will not be a problem. The issues arise when there is a conflict between a write and a read, or a write and a write.

Categories

Resources