Returning Java Arrays vs Collections - java

I'm trying to think through some design in regards to memory allocation and multithreading in a java app and this is what I'm wondering:
I have a class that has a synchronized Collection say a list that is getting updated several times a a second but all updates are happening within the class and its own thread not from other threads. However I have many other threads that call the getCollection() method and do a foreach to iterate its contents in a read only fashion. This is what I don't know:
If another thread is iterating the synchronized colletion will the single thread that performs the updates have to wait until a point in time when no other threads are iterating?
My second question is it seems to make sense to return an array copy of the collection not the collection itself by doing .toArray but from thinking about it from a memory point of view won't that have to allocate a new array that is the size of the collection contents everytime and if getting called hundreds of times a second on a collection that has several thousand objects in it is something I don't know makes sense or not.
Also if I never return the collection itself than making the list synchronized is no longer necessary?
Would appreciate any input. Thanks! - Duncan

if another thread is iterating the
synchronized colletion will the single
thread that performs the updates have
to wait until a point in time when no
other threads are iterating?
If you're talking about synchronized (not concurrent) collections then yes.
As for the second question, it
looks like a real use case for java.util.concurrent.CopyOnWriteArrayList.

I would suggest you use the CopyOnWriteArrayList. This is thread safe and can be read accessed efficient by any number of threads. Provided you have a small number of updates this should be fine.
However, to answer your questions. If you iterator over a synchronized collection while it is being modifed, you will get a ConcurrentModificationException (COWAL doesn't get this) Your update will not be blocked by this, only your readers will have a problem.
Instead of creating a copy each time getCollection is called, youc an create a copy each time the collection is modifed (far less often) This is what COWAL does for you.
If you return a copy on demand, you will still need to synchronize the collection.

Probably the easiest way to deal with this is to keep two collections: one that is updated by the class itself, and a read-only copy in a volatile field that is returned when getCollection() is called.
The latter needs to be recreated by the process that updates the main collection when appropiate. This allows you to atomically update your collection: change several elements in one go, while hiding the intermediate states.
If your updates are infrequent and every update leaves the collection in a consistent state, then use the CopyOnWriteArrayList already suggested.

It seems that the collections is being updated frequently and #getCollection() is being called frequently. You could use CopyOnWriteArrayList but you'll creating a copy every time you modify the array. So you'll need to see how this effects performance.
Another option is to task the thread within the class to make a copy everytime #getCollection is called. This will involve #getCollection waiting for the internal class thread to complete.
If you just want #getCollection to return a recent copy and not the most up to date copy then you can have the internal thread periodically create a copy of the collection that gets returned in #getCollection. The copy will need to be volatile or be an AtomicReference.

Related

Java when to use a multi threaded approach? [duplicate]

My main thread has a private LinkedList which contains task objects for the players in my game. I then have a separate thread that runs every hour that accesses and clears that LinkedList and runs my algorithm which randomly adds new uncompleted tasks to every players LinkedList. Right now I made a getter method that is synchronized so that I dont run into any concurrency issues. This works fine but the synchronized keyword has a lot of overhead especially since its accessed a ton from the main thread while only accessed hourly from my second thread.
I am wondering if there is a way to prioritize the main thread? For example on that 2nd thread I could loop through the players then make a new LinkedList then run my algorithm and add all the tasks to that LinkedList then quickly assign the old LinkedList equal to the new one. This would slightly increase memory usage on the stack while improving main thread speed.
Basically I am trying to avoid making my main thread use synchronization when it will only be used once an hour at most and I am willing to greatly degrade the performance of the 2nd thread to keep the main threads speed. Is there a way I can use the 2nd thread to notify the 1st that it will be locking a method instead of having the 1st thread physically have to go through all of the synchronization over head steps? I feel like this would be possible since if that 2nd thread shares a cache with the main thread and it could change a boolean denoting that the main thread has to wait till that variable is changed back. The main thread would have to check that boolean every time it tries run that method and if the 2nd thread is telling it to wait the main thread will then freeze till the boolean is changed.
Of course the 2nd thread would have to specify which object and method has the lock along with a binary 0 or 1 denoting if its locked or not. Then the main thread would just need to check its shared cache for the object and the binary boolean value once it reaches that method which seems way faster than normal synchronization. Anyways this would then result in them main thread running at normal speed while the 2nd thread handles a bunch of work behind the scenes without degrading main thread performance. Does this exist if so how can I do it and if it does not exist how hard would it actually be to implement?
Premature optimization
It sounds like you are overly worried about the cost of synchronization. Doing a dozen, or a hundred, or even a thousand synchronizations once an hour is not going to impact the performance of your app by any significant amount.
If your concern has not yet been validated by careful study with a profiling tool, you’ve fallen into the common trap of premature optimization.
AtomicReference
Nevertheless, I can suggest an alternative approach.
You want to replace a list once an hour. If you do not mind letting any threads continue using the current list already accessed while you swap out for a new list, then use AtomicReference. An object of this class holds the reference to another object of a specified type.
I generally like the Atomic… classes for thread-safety work because they scream out to the reader that a concurrency problem is at hand.
AtomicReference < List < Task > > listRef = new AtomicReference<>( originalList ) ;
A different thread is able to replace that reference to the old list with a reference to the new list.
listRef.set( newList ) ;
Access by the other thread:
List< Task > list = listRef.get() ;
Note that this approach does not make thread-safe the payload, the list itself. But you claim that only a single thread will ever be manipulating the content of the list. You claim a different thread will only replace the entire list. So this AtomicReference serves the purpose of replacing the list in a thread-safe manner while making the issue of concurrency quite obvious.
volatile
Using AtomicReference accomplishes the same goal as volatile. I’m wary of volatile because (a) its use may go unnoticed by the reader, and (b) I suspect many Java programmers do not understand volatile, especially since its meaning was redefined.
For more info about why plain reference assignment is not thread-safe, see this Question.

Should I access (not change) an object from multiple threads?

My situation is that I have two threads. The 1st thread produces a number of objects which the 2nd thread does not have access to until all of them are created. After that the 2nd thread reads fields in those objects but does so concurrently with the 1st. At this point no thread is changing the values of the fields of the objects.
The objects are not synchronized. Should I synchronize them or not?
What I would recommend is to use an AtomicReference<Collection<SomeObject>>. The first thread would produce the collection of objects and do a reference.put(collection). The 2nd thread would see the objects (reference.get()) after they have been set on the AtomicReference only. Here are the javadocs for AtomicReference. You could also set your objects as an array or any type of collection such as List.
If is important to realize that after your set the collection (or array) on the AtomicReference you cannot make any changes to the collection. You can't add additional items, clear it, etc.. If you want true concurrent access to a collection of objects then you should look into ConcurrentHashMap and friends.
Should I synchronize them or not?
If the objects are not going to be mutated at all after they are put in your collection then you do not need to make them synchronized.
There's nothing wrong with reading data from multiple threads at the same time. Issues arise when you attempt to modify that data. So long as the objects are fully initialized and the values are such that the second thread receives the actual value (no issues with caching etc), there no problem with reading data from multiple threads concurrently.

Can Java LinkedList be read in multiple-threads safely?

I would like multiple threads to iterate through the elements in a LinkedList. I do not need to write into the LinkedList. Is it safe to do so? Or do I need a synchronized list to make it work?
Thank you!
They can do this safely, PROVIDED THAT:
they synchronize with (all of) the threads that have written the list BEFORE they start the iterations, and
no threads modify the list during the iterations.
The first point is necessary, because unless there is proper synchronization before you start, there is a possibility that one of the "writing" threads has unflushed changes for the list data structures in local cache memory or registers, or one of the reading threads has stale list state in its cache or registers.
(This is one of those cases where a solid understanding of the Java memory model is needed to know whether the scenario is truly thread-safe.)
Or do I need a synchronized list to make it work
You don't necessarily need to go that far. All you need to do is to ensure that there is a "happens-before" relationship at the appropriate point, and there are a variety of ways to achieve that. For instance, if the list is created and written by the writer thread, and the writer then passes the list to the reader thread objects before calling start() on them.
From the Java documentation:
Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.
In other words, if you are truly just iterating through then you're alright, just be careful.

Synchronization, When to or not to use?

I have started learning concurrency and threads in Java. I know the basics of synchronized (i.e. what it does). Conceptually I understand that it provides mutually exclusive access to a shared resource with multiple threads in Java. But when faced with an example like the one below I am confused about whether it is a good idea to have it synchronized. I know that critical sections of the code should be synchronized and this keyword should not be overused or it effects the performance.
public static synchronized List<AClass> sortA(AClass[] aArray)
{
List<AClass> aObj = getList(aArray);
Collections.sort(aObj, new AComparator());
return aObj;
}
public static synchronized List<AClass> getList(AClass[] anArray)
{
//It converts an array to a list and returns
}
Assuming each thread passes a different array then no synchronization is needed, because the rest of the variables are local.
If instead you fire off a few threads all calling sortA and passing a reference to the same array, you'd be in trouble without synchronized, because they would interfere with eachother.
Beware, that it would seem from the example that the getList method returns a new List from an array, such that even if the threads pass the same array, you get different List objects. This is misleading. For example, using Arrays.asList creates a List backed by the given array, but the javadoc clearly states that Changes to the returned list "write through" to the array. so be careful about this.
Synchronization is usually needed when you are sharing data between multiple invocations and there is a possibility that the data would be modified resulting in inconsistency. If the data is read-only then you dont need to synchronize.
In the code snippet above, there is no data that is being shared. The methods work on the input provided and return the output. If multiple threads invoke one of your method, each invocation will have its own input and output. Hence, there is no chance of in-consistency anywhere. So, your methods in the above snippet need not be synchornized.
Synchronisation, if unnecessarily used, would sure degrade the performance due to the overheads involved and hence should be cautiously used only when required.
Your static methods don't depend on any shared state, so need not be synchronized.
There is no rule defined like when to use synchronized and when not, when you are sure that your code will not be accessed by concurrent threads then you can avoid using synchronised.
Synchronization as you have correctly figured has an impact on the throughput of your application, and can also lead to starving thread.
All get basically should be non blocking as Collections under concurrency package have implemented.
As in your example all calling thread will pass there own copy of array, getList doesn't need to be synchronized so is sortA method as all other variables are local.
Local variables live on stack and every thread has its own stack so other threads cannot interfere with it.
You need synchronization when you change the state of the Object that other threads should see in an consistent state, if your calls don't change the state of the object you don't need synchronization.
I wouldn't use synchronized on single threaded code. i.e. where there is no chance an object will be accessed by multiple threads.
This may appear obvious but ~99% of StringBuffer used in the JDK can only be used by one thread can be replaced with a StringBuilder (which is not synchronized)

Multiple threads iterating over the same map

I was recently writing a concurrent program in Java and came across the dollowing dilemma: suppose you have a global data structure, which is partof regular non-synchronized, non-concurrent lib such as HashMap. Is it OK to allow multiple threads to iterate through the collection (just reading, no modifications) perhaps at different, interleaved periods i.e. thread1 might be half way thrpugh iterating when thread2 gets his iterator on the same map?
It is OK. Ability to do this is the reason to create such interface as iterator. Every thread iterating over collection has its own instance of iterator that holds its state (e.g. where you are now in your iterating process).
This allows several threads to iterate over the same collection simultaneously.
It should be fine, as long as there are no writers.
This issue is similar to the readers-writer lock, where multiple readers are allowed to read from the data, but not during the time a writer "has" the lock for it. There is no concurrency issue for multiples read at the same time. [data race can occure only when you have at least one write].
Problems only arise when you attempt concurrent modifications on a data structure.
For instance, if one thread is iterating over the content of a Map, and another thread deletes elements from that collection, you'll be heading for serious trouble.
If you do need some threads to modify that collection safely, Java provides for mechanisms to do so, namely, ConcurrentHashMap.
ConcurrentHashMap in Java?
There is also Hashtable, which has the same interface as HashMap, but is synchronized, although it's use is not advised currently (deprecated), since it's performance suffers when the number of elements becomes larger (compared to ConcurrentHashMap which doesn't need to lock the entire Collection).
If you happen to have a Collection that is not synchronized and you need to have several threads reading and writing on top of it, you can use Collections.synchronizedMap(Map) to get a synchronized version of it.
The above answers are certainly good advice. In general, when writing Java with concurrent threads, so long as you do not modify a data structure, you need not worry about multiple threads concurrently reading that structure.
Should you have a similar problem in the future, except that the global data structure could be concurrently modified, I would suggest writing a Java class that all threads use to access and modify the structure. This class could impleement its own concurrency methodology, using either synchronized methods or locks. The Java tutorial has a very good explanation of Java's concurrency mechanisms. I have personally done this and it is fairly straight forward.

Categories

Resources