Synchronization in Java - Vector vs ArrayList

I am attempting to understand the difference between the Vector and ArrayList classes in terms of thread-safety. Vector is supposedly internally synchronized. Is it synchronized by each element, or as a whole? (I could imagine the case where multiple threads could access the vector at the same time, but multiple threads could not access the same element at the same time). If you look at the code below, getAo() is not equivalent to getV() because the synchronized keyword when used in a method signature synchronizes on the containing class object (an instance of VectorVsArrayList) to my knowledge. HOWEVER, is getAoSync() equivalent to getV()? By equivalent, I mean does the ao instance variable start behaving like a Vector object in terms of synchronization as long as all access to it goes through the getter method?
public class VectorVsArrayList {
private ArrayList<?> ao = null;
private Vector<?> v = null;
public ArrayList<?> getAoSync(){
synchronized(ao){
return ao;
}
}
public synchronized ArrayList<?> getAo() {
return ao;
}
public Vector<?> getV() {
return v;
}
}

They aren't equivalent. What you're looking for is Collections.synchronizedList which can "wrap around" any list, including ArrayList.

Short answer: No, it's not equivalent.
When you use synchronized around that return ao;, the ArrayList is only synchronized during the return instruction. This means that 2 threads cannot get the object at the exact same time, but once they have got it, they can modify it at the same time.
If 2 threads execute this code, the add() is not thread safe:
ArrayList<?> list = getAo(); // cannot be executed concurrently
list.add(something); // CAN be executed concurrently
Side note: don't use Vectors, take a look at this post to know why.
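To make the difference concrete, here is a minimal sketch (the class and helper names are made up for illustration, they are not from the question). Instead of synchronizing the getter, it wraps the list once with Collections.synchronizedList, so every add() call is locked individually:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {
    // Hypothetical helper: starts `threads` threads that each add `perThread` items,
    // waits for all of them, and returns the final size of the list.
    static int addConcurrently(List<Integer> target, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i); // safe only if `target` synchronizes add() itself
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return target.size();
    }

    public static void main(String[] args) {
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(addConcurrently(safe, 4, 10_000)); // prints 40000
    }
}
```

Passing a plain ArrayList to the same helper may lose elements or throw, because nothing protects add() once the reference has been handed out.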

To get the equivalent of Vector, you have to protect every access to every element in the collection; the getAo method only synchronizes access to the ArrayList reference.
If two threads call getAo and each thread then calls the add method on the returned list, you can have a multithreading problem (because add is not synchronized).
I recommend checking the concurrent collection classes such as CopyOnWriteArrayList:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArrayList.html
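As a small sketch of what CopyOnWriteArrayList offers (the class and method names here are illustrative only): its iterators work on a snapshot of the list, so the list can be modified while it is being traversed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {
    // Adds to the list while iterating over it and returns how many
    // elements the loop itself visited.
    static int iterateWhileAdding(List<String> list) {
        int seen = 0;
        for (String s : list) {
            list.add(s + "-copy"); // allowed: the iterator reads a snapshot
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> names = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        System.out.println(iterateWhileAdding(names)); // prints 2
        System.out.println(names);                     // prints [a, b, a-copy, b-copy]
    }
}
```

The same loop over a plain ArrayList throws ConcurrentModificationException. The price is that every write copies the backing array, so this class tends to suit lists that are read far more often than they are written.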

Related

Correct working with Collections.synchronizedList

I am not sure how to properly use the Collections.synchronizedList() implementation.
I have these two:
public synchronized static List<CurrencyBox> getOrderList() {
return Collections.synchronizedList(orderList);
}
and
public static List<CurrencyBox> getOrderList() {
return Collections.synchronizedList(orderList);
}
So as far as I understood, synchronizedList really returns the orderList and not a copy, correct?
So if I want to guarantee atomic operations, like add and remove, which of the implementations above is correct?
And does anything change with Java 9? Is this still the way to go, or do you have any other suggestion?
Thank you
Without context it's a bit hard to tell, from the snippets provided neither give you guaranteed atomic operations.
The documentation states:
Returns a synchronized (thread-safe) list backed by the specified
list. In order to guarantee serial access, it is critical that all
access to the backing list is accomplished through the returned list.
So even if you synchronize the method, the best you'll get is a guarantee that no two threads are creating the synchronized list at the same time.
You need to wrap the original orderList with Collections.synchronizedList to begin with and return the stored result of that each time.
private static List<CurrencyBox> orderList = Collections.synchronizedList(new ArrayList<CurrencyBox>());
public static List<CurrencyBox> getOrderList() {
return orderList;
}
A synchronized list only synchronizes the methods of that list.
This means a thread cannot modify the list while another thread is running one of its methods. The object is locked for the duration of each method call.
As an example, let's say two threads run addAll on your list, each with a different list as the parameter (A = A1, A2, A3 and B = B1, B2, B3).
Because the method is synchronized, you can be sure those lists won't be interleaved randomly, like A1, B1, A2, A3, B2, B3.
You don't decide when one thread hands over to the other, so you can get either A1, A2, A3, B1, B2, B3 or B1, B2, B3, A1, A2, A3.
Credit : jhamon
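The addAll example above can be sketched roughly as follows (the class and method names are invented for the illustration). Two threads each call addAll once on the same synchronized list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AddAllDemo {
    // Runs two threads that each call addAll once, then returns a copy of the result.
    static List<String> mergeConcurrently(List<String> a, List<String> b) {
        List<String> shared = Collections.synchronizedList(new ArrayList<>());
        Thread first = new Thread(() -> shared.addAll(a));
        Thread second = new Thread(() -> shared.addAll(b));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(shared);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("A1", "A2", "A3");
        List<String> b = Arrays.asList("B1", "B2", "B3");
        // Either all of A followed by all of B, or the other way round
        System.out.println(mergeConcurrently(a, b));
    }
}
```

Which of the two orders you get is up to the scheduler, but each batch stays contiguous.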

java Volatile/synchronization on arraylist

My program looks like this:
public class Main {
private static ArrayList<T> list;
public static void main(String[] args) {
new DataListener().start();
new DataUpdater().start();
}
static class DataListener extends Thread {
@Override
public void run() {
while(true){
//Reading the ArrayList and displaying the updated data
Thread.sleep(5000);
}
}
}
static class DataUpdater extends Thread{
@Override
public void run() {
//Continuously receive data and update ArrayList;
}
}
}
In order to use this ArrayList in both threads, I know two options:
1. Make the ArrayList volatile. However, I read in this article that making a variable volatile is only appropriate if "writes to the variable do not depend on its current value", and I think in this case they do (because, for example, when you do an add operation on an ArrayList, the contents of the ArrayList after the operation depend on its current contents, or don't they?). Also, the DataUpdater has to remove some elements from the list every now and then, and I also read that editing a volatile variable from different threads is not possible.
2. Make this ArrayList a synchronized variable. However, my DataUpdater will continuously update the ArrayList, so won't this block the DataListener from reading the ArrayList?
Did I misunderstand any concepts here or is there another option to make this possible?
Volatile won't help you at all. The meaning of volatile is that changes made by thread A to a shared variable are visible to thread B immediately. Usually such changes may be in some cache visible only to the thread that made them, and volatile just tells the JVM not to do any caching or optimization that will result in the value update being delayed.
So it is not a means of synchronization. It's just a means of ensuring visibility of change. Moreover, it's change to the variable, not to the object referenced by that variable. That is, if you mark list as volatile, it will only make any difference if you assign a new list to list, not if you change the content of the list!
Your other suggestion was to make the ArrayList a synchronized variable. There is a misconception here. Variables can't be synchronized. The only thing that can be synchronized is code - either an entire method or a specific block inside it. You use an object as the synchronization monitor.
The monitor is the object itself (actually, it's a logical part of the object that is the monitor), not the variable. If you assign a different object to the same variable after synchronizing on the old value, then you won't have your old monitor available.
But in any case, it's not the object that's synchronized, it's code that you decided to synchronize using that object.
You can therefore use the list as the monitor for synchronizing the operations on it. But you can not have list synchronized.
Suppose you want to synchronize your operations using the list as a monitor, you should design it so that the writer thread doesn't hold the lock all the time. That is, it just grabs it for a single read-update, insert, etc., and then releases it. Grabs it again for the next operation, then releases it. If you synchronize the whole method or the whole update loop, the other thread will never be able to read it.
In the reading thread, you should probably do something like:
List<T> listCopy;
synchronized (list) {
listCopy = new ArrayList<>(list);
}
// Use listCopy for displaying the value rather than list
This is because displaying is potentially slow - it may involve I/O, updating GUI etc. So to minimize the lock time, you just copy the values from the list, and then release the monitor so that the updating thread can do its work.
Other than that, there are many types of objects in the java.util.concurrent package etc. that are designed to help in situations like this, where one side is writing and the other is reading. Check the documentation - perhaps a ConcurrentLinkedDeque will work for you.
Indeed, neither of the two solutions is sufficient. You actually need to synchronize the complete iteration over the ArrayList, and every write access to the ArrayList:
synchronized(list) {
for (T t : list) {
...
}
}
and
synchronized(list) {
// read/add/modify the list
}
"make the ArrayList volatile."
You can't make an ArrayList volatile. You can't make any object volatile. The only things in Java that can be volatile are fields.
In your example, list is not an ArrayList.
private static ArrayList<T> list;
list is a static field of the Main class.
The volatile keyword only matters when one thread updates the field, and another thread subsequently accesses the field.
This line updates the list, but does not update the volatile field:
list.add(e);
After executing that line, the list has changed, but the field still refers to the same list object.
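For contrast, here is a sketch of the one situation where volatile does matter for a list field: when the writer assigns a new list to the field instead of mutating the existing one. The class and method names are invented, and the sketch assumes a single writer thread (with several writers, the copy-then-assign sequence would itself need a lock).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class VolatileSnapshotDemo {
    // volatile applies to this field: readers see the most recently assigned reference
    private volatile List<String> snapshot = Collections.emptyList();

    // Writer side (assumed single writer): never mutate the published list,
    // build a new one and assign it to the field.
    void publish(String value) {
        List<String> next = new ArrayList<>(snapshot);
        next.add(value);
        snapshot = Collections.unmodifiableList(next); // this assignment is the volatile write
    }

    // Reader side: grab the current reference and work with it without locking.
    List<String> current() {
        return snapshot;
    }

    public static void main(String[] args) {
        VolatileSnapshotDemo demo = new VolatileSnapshotDemo();
        demo.publish("a");
        demo.publish("b");
        System.out.println(demo.current()); // prints [a, b]
    }
}
```

This is essentially a hand-rolled copy-on-write list; it trades a full copy per update for lock-free reads.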

Why is this code not thread-safe, even when using a synchronized method?

Why is this code not thread-safe even though we are using synchronized method and hence obtaining a lock on Helper object?
class ListHelper <E> {
public List<E> list = Collections.synchronizedList(new ArrayList<E>());
public synchronized boolean putIfAbsent(E x) {
boolean absent = !list.contains(x);
if (absent)
list.add(x);
return absent;
}
}
Because the list is unlocked when contains returns, and then locked again when add is called. Something else could add the same element between the two.
If you mean to only use the list from within the helper object, it should be declared private; if you do this, the code will be thread safe, as long as all manipulations of the list go through methods that are synchronized in the helper object. It's also worth noting that as long as this is the case, you don't need to be using a Collections.synchronizedList as you're providing all necessary synchronization in your own code.
Alternatively, if you want to allow the list to be public, you need to synchronize your access on the list, rather than on your helper object. The following would be thread safe:
class ListHelper <E> {
public List<E> list = Collections.synchronizedList(new ArrayList<E>());
public boolean putIfAbsent(E x) {
synchronized (list) {
boolean absent = !list.contains(x);
if (absent)
list.add(x);
return absent;
}
}
}
The difference is that it is using the same lock as the other methods of the list, rather than a different one.
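One way to convince yourself of this is a small stress sketch like the one below. The names are hypothetical, and the element type is fixed to String to keep it short. Several threads race to insert the same value, and because putIfAbsent locks on the list itself, only one of them should report success:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    private final List<String> list = Collections.synchronizedList(new ArrayList<>());

    boolean putIfAbsent(String x) {
        synchronized (list) { // same monitor the wrapper's own methods use
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    int size() {
        return list.size();
    }

    // Hypothetical stress helper: returns {number of successful inserts, final size}.
    static int[] race(int threads) {
        PutIfAbsentDemo helper = new PutIfAbsentDemo();
        AtomicInteger wins = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                if (helper.putIfAbsent("x")) {
                    wins.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { wins.get(), helper.size() };
    }

    public static void main(String[] args) {
        int[] result = race(8);
        System.out.println(result[0] + " insert(s), size " + result[1]); // prints 1 insert(s), size 1
    }
}
```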
This code is not thread safe only because list is public.
If the list instance is private, and referenced nowhere else, this code is threadsafe. Else it is not threadsafe as multiple threads could be manipulating the list simultaneously.
If the list is not referenced elsewhere, you need not declare it as a synchronized list through the collections class, as long as all list manipulation occurs through synchronized methods and a reference to that list is never returned to anything.
When you mark a method synchronized, all threads calling that method are synchronized with the object instance said method is defined in. This is why if ListHelper internal list instance is not referenced elsewhere, and all methods are synchronized, your code would be threadsafe.
A major component of thread safety concerns more than only mutual exclusion. It is quite possible to complete an atomic update of an object's state, i.e. to effect a state transition that leaves an object in a valid state with its invariants intact, but to still leave the object vulnerable if its references are still published to untrustworthy or incompletely debugged clients.
In the example you post:
public synchronized boolean putIfAbsent(E x) {
boolean absent = !list.contains(x);
if (absent)
list.add(x);
return absent;
}
The code is thread safe, as W.M. pointed out. But we have no assurances about x itself and where it may have references still held by other code. If such references did exist, another thread can modify corresponding elements in your list, defeating your efforts to guard the invariants of objects in the list.
If you are accepting elements to this list from client code that you don't trust or don't know about, a good practice would be to make a defensive copy of x and then add that to your list. Similarly, if you will be returning an object from your list to other client code, making a defensive copy and returning that will help assure that your list remains thread safe.
Moreover, the list should be fully encapsulated in the class. By having it be public, client code anywhere can freely access the elements and make it impossible for you to protect the state of objects in the list.
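A rough sketch of the defensive-copy idea, using a made-up mutable Point type as the element (nothing here comes from the original code):

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopyDemo {
    // Hypothetical mutable element type with a copy constructor
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        Point(Point other) {
            this(other.x, other.y);
        }
    }

    // Private and never handed out, so all access goes through the methods below
    private final List<Point> points = new ArrayList<>();

    synchronized void add(Point p) {
        points.add(new Point(p)); // copy on the way in
    }

    synchronized Point get(int index) {
        return new Point(points.get(index)); // copy on the way out
    }

    public static void main(String[] args) {
        DefensiveCopyDemo holder = new DefensiveCopyDemo();
        Point mine = new Point(1, 2);
        holder.add(mine);
        mine.x = 99; // the caller mutates its own object afterwards
        System.out.println(holder.get(0).x); // prints 1
    }
}
```

Copying costs allocations, so where the element type is under your control, making it immutable is often the simpler route.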

achieving synchronized addAll to a list in java

Updated the question. Please check the second part of the question.
I need to build up a master list of book ids. I have multiple threaded tasks which brings up a subset of book ids. As soon as each task execution is completed, I need to add them to the super list of book ids. Hence I am planning to pass below aggregator class instance to all of my execution tasks and have them call the updateBookIds() method. To ensure it's thread safe, I have kept the addAll code in synchronized block.
Can anyone tell me whether this is the same as a synchronized list? Can I just say Collections.newSynchronizedList and call addAll on that list from all thread tasks? Please clarify.
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator(){
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
return bookIds;
}
public void setBookIds(List<String> bookIds) {
this.bookIds = bookIds;
}
}
Thanks,
Harish
Second Approach
So after below discussions, I am currently planning to go with below approach. Please let me know if I am doing anything wrong here:-
public class BooksManager{
private static Logger logger = LoggerFactory.getLogger();
private List<String> fetchMasterListOfBookIds(){
List<String> masterBookIds = Collections.synchronizedList(new ArrayList<String>());
List<String> libraryCodes = getAllLibraries();
ExecutorService libraryBookIdsExecutor = Executors.newFixedThreadPool(BookManagerConstants.LIBRARY_BOOK_IDS_EXECUTOR_POOL_SIZE);
for(String libraryCode : libraryCodes){
LibraryBookIdsCollectionTask libraryTask = new LibraryBookIdsCollectionTask(libraryCode, masterBookIds);
libraryBookIdsExecutor.execute(libraryTask);
}
libraryBookIdsExecutor.shutdown();
//Now the fetching of master list is complete.
//So I will just continue my processing of the master list
}
}
public class LibraryBookIdsCollectionTask implements Runnable {
private String libraryCode;
private List<String> masterBookIds;
public LibraryBookIdsCollectionTask(String libraryCode,List<String> masterBookIds){
this.libraryCode = libraryCode;
this.masterBookIds = masterBookIds;
}
public void run(){
List<String> bookids = new ArrayList<String>();//TODO get this list from iconnect call
synchronized (masterBookIds) {
masterBookIds.addAll(bookids);
}
}
}
Thanks,
Harish
Can I just say Collections.newSynchronizedList and call addAll to that list from all thread tasks?
If you're referring to Collections.synchronizedList, then yes, that would work fine. That will give you an object that implements the List interface where all of the methods from that interface are synchronized, including addAll.
Consider sticking with what you have, though, since it's arguably a cleaner design. If you pass the raw List to your tasks, then they get access to all of the methods on that interface, whereas all they really need to know is that there's an addAll method. Using your SynchronizedBookIdsAggregator keeps your tasks decoupled from design dependence on the List interface, and removes the temptation for them to call something other than addAll.
In cases like this, I tend to look for a Sink interface of some sort, but there never seems to be one around when I need it...
The code you have implemented does not create a synchronization point for someone who accesses the list via getBookIds(), which means they could see inconsistent data. Furthermore, someone who has retrieved the list via getBookIds() must perform external synchronization before accessing the list. Your question also doesn't show how you are actually using the SynchronizedBookIdsAggregator class, which leaves us with not enough information to fully answer your question.
Below would be a safer version of the class:
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator() {
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
return bookIds;
}
}
public void setBookIds(List<String> bookIds) {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
this.bookIds = bookIds;
}
}
}
As alluded to earlier, the above code still has a potential problem with some thread accessing the ArrayList after it has been retrieved by getBookIds(). Since the ArrayList itself is not synchronized, accessing it after retrieving it should be synchronized on the chosen guard object:
public class SomeOtherClass {
public void run() {
SynchronizedBookIdsAggregator aggregator = getAggregator();
List<String> bookIds = aggregator.getBookIds();
// Access to the bookIds list must happen while synchronized on the
// chosen guard object -- in this case, aggregator
synchronized(aggregator) {
// work with the bookIds list here
}
}
}
I can imagine using Collections.synchronizedList as part of the design of this aggregator, but it is not a panacea. Concurrency design really requires an understanding of the underlying concerns, more than "picking the right tool / collection for the job" (although the latter is not unimportant).
Another potential option to look at is CopyOnWriteArrayList.
As skaffman alluded to, it might be better to not allow direct access to the bookIds list at all (e.g., remove the getter and setter). If you enforce that all access to the list must run through methods written in SynchronizedBookIdsAggregator, then SynchronizedBookIdsAggregator can enforce all concurrency control of the list. As my answer above indicates, allowing consumers of the aggregator to use a "getter" to get the list creates a problem for the user of that list: to write correct code they must have knowledge of the synchronization strategy / guard object, and furthermore they must also use that knowledge to actively synchronize externally and correctly.
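A minimal sketch of what such an encapsulated aggregator could look like, assuming callers only ever need to add ids and read a copy of the current contents (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BookIdsAggregator {
    // Never exposed directly; every access goes through the synchronized methods below
    private final List<String> bookIds = new ArrayList<>();

    synchronized void addBookIds(List<String> ids) {
        bookIds.addAll(ids);
    }

    // Returns an independent copy, so callers need no knowledge of the lock
    synchronized List<String> snapshot() {
        return new ArrayList<>(bookIds);
    }

    public static void main(String[] args) {
        BookIdsAggregator aggregator = new BookIdsAggregator();
        aggregator.addBookIds(Arrays.asList("b1", "b2"));
        aggregator.addBookIds(Arrays.asList("b3"));
        System.out.println(aggregator.snapshot()); // prints [b1, b2, b3]
    }
}
```

With no getter or setter for the underlying list, the guard object (the aggregator's own monitor) stays an implementation detail.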
Regarding your second approach. What you have shown looks technically correct (good!).
But, presumably you are going to read from masterBookIds at some point, too? And you don't show or describe that part of the program! So when you start thinking about when and how you are going to read masterBookIds (i.e. the return value of fetchMasterListOfBookIds()), just remember to consider concurrency concerns there too! :)
If you make sure all tasks/worker threads have finished before you start reading masterBookIds, you shouldn't have to do anything special.
But, at least in the code you have shown, you aren't ensuring that.
Note that libraryBookIdsExecutor.shutdown() returns immediately. So if you start using the masterBookIds list immediately after fetchMasterListOfBookIds() returns, you will be reading masterBookIds while your worker threads are actively writing data to it, and this entails some extra considerations.
Maybe this is what you want -- maybe you want to read the collection while it is being written to, to show realtime results or something. But then you must consider synchronizing properly on the collection if you want to iterate over it while it is being written to.
If you would just like to make sure all writes to masterBookIds by worker threads have completed before fetchMasterListOfBookIds() returns, you could use ExecutorService.awaitTermination (in combination with .shutdown(), which you are already calling).
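A sketch of that shutdown-then-wait pattern follows. The pool size, the one-minute timeout and the fake id generation are assumptions made for the example, not part of the original code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitTerminationDemo {
    static List<String> fetchAll(List<String> libraryCodes) {
        List<String> master = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(4); // pool size is arbitrary here
        for (String code : libraryCodes) {
            // Stand-in for the real lookup: each task contributes two made-up ids
            executor.execute(() -> master.addAll(Arrays.asList(code + "-1", code + "-2")));
        }
        executor.shutdown(); // stops accepting new tasks and returns immediately
        try {
            // Block until every submitted task has finished (or the timeout expires)
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return master;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList("A", "B", "C")).size()); // prints 6
    }
}
```

How to react to a timeout (retry, shutdownNow, fail) depends on the application; throwing is just the simplest choice for a sketch.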
Collections.SynchronizedList (which is the wrapper type you'd get) would synchronize almost every method on either itself or a mutex object you pass to the constructor (or Collections.synchronizedList(...) ). Thus it would basically be the same as your approach.
All methods called through the wrapper returned by Collections.synchronizedList() are synchronized. This means that the wrapper's addAll method effectively looks something like this:
public boolean addAll(Collection<? extends E> c) { synchronized (mutex) { return list.addAll(c); } }
So, every method call for the list (using the reference returned and not the original reference) will be synchronized.
However, there is no synchronization between different method calls.
Consider following code snippet :-
List<String> l = Collections.synchronizedList(new ArrayList<String>());
l.add("Hello");
l.add("World");
If multiple threads run this same code, it is quite possible that after Thread A has added "Hello", Thread B runs and adds both "Hello" and "World" to the list, and only then does Thread A resume. The list would then contain ["Hello", "Hello", "World", "World"] instead of the expected ["Hello", "World", "Hello", "World"]. This is just an example to show that the list is not thread-safe across separate method calls. If we want the above code to produce the desired result, it should be inside a synchronized block that locks on the list (or on this).
However, with your design there is only one method call, so it is the same as using Collections.synchronizedList().
Moreover, as Mike Clark rightly pointed out, you should also synchronize getBookIds() and setBookIds(). Synchronizing on the List itself would be clearer, since it amounts to locking the list before operating on it and unlocking it afterwards, so that nothing in between can use the List.
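To illustrate the compound-action point, here is a sketch (with invented names) in which each thread adds its "Hello"/"World" pair inside one synchronized block on the list, so pairs from different threads cannot interleave:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PairAddDemo {
    static List<String> addPairs(int threads) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                synchronized (list) { // one lock held across both calls
                    list.add("Hello");
                    list.add("World");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list;
    }

    // True if the list is a sequence of adjacent ("Hello", "World") pairs.
    static boolean isPaired(List<String> list) {
        for (int i = 0; i + 1 < list.size(); i += 2) {
            if (!list.get(i).equals("Hello") || !list.get(i + 1).equals("World")) {
                return false;
            }
        }
        return list.size() % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isPaired(addPairs(8))); // prints true
    }
}
```

Without the synchronized block each add() is still safe on its own, but the pairs may come out interleaved.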

Using synchronized lists

This is my first time using the synchronized keyword, so I am still unsure of how it exactly works. I have a list that I want to be accessed by multiple threads so I do this:
players = Collections.synchronizedList(new ArrayList<Player>(maxPlayers));
Now, I want to make sure that I am not calling players.add() at the same time as players.get(), so I think I should use synchronized statements (methods A and B could be called at the same time):
public void A() {
synchronized(players) {
players.add(new Player());
}
}
public void B(String msg) {
synchronized(players) {
for(int i = 0;i<players.size();i++) {
players.get(i).out.println(msg);
}
}
}
Is this the correct procedure? If not, what should I do instead?
Provided you only access the list through the object returned by synchronizedList, access should be thread-safe, though note that you may need to use synchronized blocks for compound actions like iterating through the list or making decisions based on multiple calls into the list (for example, getting a value, making a decision, then adding a value).
So in your example A() doesn't need the synchronized block, but B() might if you don't want the list to be changed or be read by some other thread during the iteration. (In fact, by using the counter to iterate it is needed to prevent a race condition between the loop termination condition and another thread removing an item; other ways of iterating might not have this issue though).
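As a sketch of the iteration case (names are illustrative, and plain strings stand in for the Player objects): when traversing a synchronized list with an iterator or a for-each loop, hold the list's own lock for the whole traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class IterationDemo {
    // Sums the lengths of all names while holding the list's lock for the entire loop.
    static int totalLength(List<String> players) {
        int total = 0;
        synchronized (players) { // must be the wrapper returned by synchronizedList
            for (String name : players) {
                total += name.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> players =
                Collections.synchronizedList(new ArrayList<>(Arrays.asList("ann", "bob")));
        players.add("carol"); // single calls need no extra synchronized block
        System.out.println(totalLength(players)); // prints 11
    }
}
```

If the work done per element is slow (I/O, for example), copying the list inside the synchronized block and iterating over the copy outside it keeps the lock time short.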
