achieving synchronized addAll to a list in java - java

Updated the question; please check the second part.
I need to build up a master list of book ids. I have multiple threaded tasks, each of which fetches a subset of book ids. As soon as each task completes, I need to add its results to the master list of book ids. Hence I am planning to pass an instance of the aggregator class below to all of my tasks and have them call the updateBookIds() method. To ensure it's thread safe, I have kept the addAll call in a synchronized block.
Can anyone tell me whether this is the same as a synchronized list? Can I just say Collections.newSynchronizedList and call addAll on that list from all the tasks? Please clarify.
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator(){
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
return bookIds;
}
public void setBookIds(List<String> bookIds) {
this.bookIds = bookIds;
}
}
Thanks,
Harish
Second Approach
After the discussion below, I am currently planning to go with the following approach. Please let me know if I am doing anything wrong here:
public class BooksManager{
private static Logger logger = LoggerFactory.getLogger();
private List<String> fetchMasterListOfBookIds(){
List<String> masterBookIds = Collections.synchronizedList(new ArrayList<String>());
List<String> libraryCodes = getAllLibraries();
ExecutorService libraryBookIdsExecutor = Executors.newFixedThreadPool(BookManagerConstants.LIBRARY_BOOK_IDS_EXECUTOR_POOL_SIZE);
for(String libraryCode : libraryCodes){
LibraryBookIdsCollectionTask libraryTask = new LibraryBookIdsCollectionTask(libraryCode, masterBookIds);
libraryBookIdsExecutor.execute(libraryTask);
}
libraryBookIdsExecutor.shutdown();
//Now the fetching of master list is complete.
//So I will just continue my processing of the master list
}
}
public class LibraryBookIdsCollectionTask implements Runnable {
private String libraryCode;
private List<String> masterBookIds;
public LibraryBookIdsCollectionTask(String libraryCode,List<String> masterBookIds){
this.libraryCode = libraryCode;
this.masterBookIds = masterBookIds;
}
public void run(){
List<String> bookids = new ArrayList<String>();//TODO get this list from iconnect call
synchronized (masterBookIds) {
masterBookIds.addAll(bookids);
}
}
}
Thanks,
Harish

Can I just say Collections.newSynchronizedList and call addAll to that list from all thread tasks?
If you're referring to Collections.synchronizedList, then yes, that would work fine. That will give you an object that implements the List interface where all of the methods from that interface are synchronized, including addAll.
Consider sticking with what you have, though, since it's arguably a cleaner design. If you pass the raw List to your tasks, then they get access to all of the methods on that interface, whereas all they really need to know is that there's an addAll method. Using your SynchronizedBookIdsAggregator keeps your tasks decoupled from design dependence on the List interface, and removes the temptation for them to call something other than addAll.
In cases like this, I tend to look for a Sink interface of some sort, but there never seems to be one around when I need it...

The code you have implemented does not create a synchronization point for someone who accesses the list via getBookIds(), which means they could see inconsistent data. Furthermore, someone who has retrieved the list via getBookIds() must perform external synchronization before accessing the list. Your question also doesn't show how you are actually using the SynchronizedBookIdsAggregator class, which leaves us with not enough information to fully answer your question.
Below would be a safer version of the class:
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator() {
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
return bookIds;
}
}
public void setBookIds(List<String> bookIds) {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
this.bookIds = bookIds;
}
}
}
As alluded to earlier, the above code still has a potential problem with some thread accessing the ArrayList after it has been retrieved by getBookIds(). Since the ArrayList itself is not synchronized, accessing it after retrieving it should be synchronized on the chosen guard object:
public class SomeOtherClass {
public void run() {
SynchronizedBookIdsAggregator aggregator = getAggregator();
List<String> bookIds = aggregator.getBookIds();
// Access to the bookIds list must happen while synchronized on the
// chosen guard object -- in this case, aggregator
synchronized(aggregator) {
<work with the bookIds list>
}
}
}
I can imagine using Collections.synchronizedList as part of the design of this aggregator, but it is not a panacea. Concurrency design really requires an understanding of the underlying concerns, more than "picking the right tool / collection for the job" (although the latter is not unimportant).
Another potential option to look at is CopyOnWriteArrayList.
As skaffman alluded to, it might be better to not allow direct access to the bookIds list at all (e.g., remove the getter and setter). If you enforce that all access to the list must run through methods written in SynchronizedBookIdsAggregator, then SynchronizedBookIdsAggregator can enforce all concurrency control of the list. As my answer above indicates, allowing consumers of the aggregator to use a "getter" to get the list creates a problem for the user of that list: to write correct code they must have knowledge of the synchronization strategy / guard object, and furthermore they must also use that knowledge to actively synchronize externally and correctly.
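A minimal sketch of that idea, assuming a hypothetical class name (EncapsulatedBookIdsAggregator) and a hypothetical snapshot() method that are not part of the original code. The list never leaves the class, so every access goes through the same lock:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the aggregator with no getter/setter for the live list.
class EncapsulatedBookIdsAggregator {
    // The list is private and final; nothing outside this class can reach it.
    private final List<String> bookIds = new ArrayList<>();

    // Writers add their subset while holding the aggregator's monitor.
    public synchronized void updateBookIds(List<String> ids) {
        bookIds.addAll(ids);
    }

    // Readers get an independent copy made under the same monitor,
    // so they never iterate over the list while it is being modified.
    public synchronized List<String> snapshot() {
        return new ArrayList<>(bookIds);
    }
}
```

The trade-off is that each snapshot() call copies the list, which is usually acceptable when reads are rare compared to writes.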
Regarding your second approach. What you have shown looks technically correct (good!).
But, presumably you are going to read from masterBookIds at some point, too? And you don't show or describe that part of the program! So when you start thinking about when and how you are going to read masterBookIds (i.e. the return value of fetchMasterListOfBookIds()), just remember to consider concurrency concerns there too! :)
If you make sure all tasks/worker threads have finished before you start reading masterBookIds, you shouldn't have to do anything special.
But, at least in the code you have shown, you aren't ensuring that.
Note that libraryBookIdsExecutor.shutdown() returns immediately. So if you start using the masterBookIds list immediately after fetchMasterListOfBookIds() returns, you will be reading masterBookIds while your worker threads are actively writing data to it, and this entails some extra considerations.
Maybe this is what you want -- maybe you want to read the collection while it is being written to, to show realtime results or something. But then you must consider synchronizing properly on the collection if you want to iterate over it while it is being written to.
If you would just like to make sure all writes to masterBookIds by worker threads have completed before fetchMasterListOfBookIds() returns, you could use ExecutorService.awaitTermination (in combination with .shutdown(), which you are already calling).
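As a rough sketch of the shutdown() plus awaitTermination combination: the class name MasterListFetcher, the stand-in fetchIdsFor method, the pool size of 4 and the one-minute timeout are all assumptions made for illustration, not taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MasterListFetcher {
    // Hypothetical stand-in for the real per-library lookup, which the question does not show.
    static List<String> fetchIdsFor(String libraryCode) {
        return Arrays.asList(libraryCode + "-1", libraryCode + "-2");
    }

    static List<String> fetchMasterListOfBookIds(List<String> libraryCodes) {
        List<String> masterBookIds = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService executor = Executors.newFixedThreadPool(4); // arbitrary pool size
        for (String code : libraryCodes) {
            // addAll on the synchronized wrapper is a single synchronized call
            executor.execute(() -> masterBookIds.addAll(fetchIdsFor(code)));
        }
        executor.shutdown(); // stops accepting new tasks and returns immediately
        try {
            // Block until all submitted tasks have finished, or the timeout expires.
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return masterBookIds; // all writes are complete and visible at this point
    }
}
```

Once awaitTermination has returned true, the caller can read the returned list without extra synchronization, because no worker is writing to it any more.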

Collections.SynchronizedList (which is the wrapper type you'd get) would synchronize almost every method on either itself or a mutex object you pass to the constructor (or Collections.synchronizedList(...) ). Thus it would basically be the same as your approach.

All the methods called through the wrapper returned by Collections.synchronizedList() will be synchronized. This means that the addAll method of a normal List, when called through this wrapper, will behave something like this:
public boolean addAll(Collection<? extends E> c) {
synchronized (mutex) { return list.addAll(c); }
}
So, every method call on the list (made through the returned reference and not the original reference) will be synchronized.
However, there is no synchronization between different method calls.
Consider following code snippet :-
List<String> l = Collections.synchronizedList(new ArrayList<String>());
l.add("Hello");
l.add("World");
When multiple threads run this same code, it is quite possible that after Thread A has added "Hello", Thread B runs and adds both "Hello" and "World" to the list, and only then does Thread A resume. So the list would contain ["Hello", "Hello", "World", "World"] instead of the expected ["Hello", "World", "Hello", "World"]. This is just an example to show that the list is not thread-safe across different method calls. If we want the above code to produce the desired result, it should be inside a synchronized block that locks on the list (or on this).
However, with your design there is only one method call. SO IT IS SAME AS USING Collections.synchronizedList().
Moreover, as Mike Clark rightly pointed out, you should also synchronize getBookIds() and setBookIds(). Synchronizing on the List itself would be clearer, since it is like locking the list before operating on it and unlocking it afterwards, so that nothing in between can use the List.
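The "Hello"/"World" case can be sketched as follows. The class GreetingAppender and its runConcurrently driver are hypothetical names introduced only for this illustration. The key point is that the compound operation locks on the wrapper returned by Collections.synchronizedList, which is the same object the wrapper's own methods lock on:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GreetingAppender {
    private final List<String> list = Collections.synchronizedList(new ArrayList<String>());

    // Both adds happen under one lock, so no other thread can slip in between them.
    void addGreeting() {
        synchronized (list) { // the synchronized wrapper locks on itself
            list.add("Hello");
            list.add("World");
        }
    }

    // Copy made under the same lock, safe to inspect afterwards.
    List<String> snapshot() {
        synchronized (list) {
            return new ArrayList<>(list);
        }
    }

    // Hypothetical driver: calls addGreeting() from several threads and returns the result.
    static List<String> runConcurrently(int threadCount) {
        GreetingAppender appender = new GreetingAppender();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(appender::addGreeting);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return appender.snapshot();
    }
}
```

With the synchronized block in place, the result always consists of adjacent "Hello", "World" pairs, regardless of how the threads are scheduled.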

Related

Lock List when in use

How do I lock a data structure (such as List) when someone is iterating over it?
For example, let's say I have this class with a list in it:
class A{
private List<Integer> list = new ArrayList<>();
public A() {
// initialize this.list
}
public List<Integer> getList() {
return list;
}
}
And I run this code:
public static void main(String[] args) {
A a = new A();
Thread t1 = new Thread(()->{
a.getList().forEach(System.out::println);
});
Thread t2 = new Thread(()->{
a.getList().removeIf(e->e==1);
});
t1.start();
t2.start();
}
I don't have a single block of code that uses the list, so I can't use synchronized().
I was thinking of locking the getList() method after it has been called but how can I know if the caller has finished using it so I could unlock it?
And I don't want to use CopyOnWriteArrayList because I care about performance.
after it has been called but how can I know if the caller has finished using it so I could unlock it?
That's impossible. The iterator API fundamentally doesn't require that you explicitly 'close' them, so, this is simply not something you can make happen. You have a problem here:
Iterating over the same list from multiple threads is an issue if anybody modifies that list in between. Actually, threads are immaterial; if you modify a list then interact with an iterator created before the modification, you get ConcurrentModificationException guaranteed. Involve threads, and you merely usually get a CoModEx; you may get bizarre behaviour if you haven't set up your locking properly.
Your chosen solution is "I shall lock the list.. but how do I do that? Better ask SO". But that's not the correct solution.
You have a few options:
Use a lock
It's not specifically the iteration that you need to lock, it's "whatever interacts with this list". Make an actual lock object, and define that any interaction of any kind with this list must occur in the confines of this lock.
Thread t1 = new Thread(() -> {
a.acquireLock();
try {
a.getList().forEach(System.out::println);
} finally {
a.releaseLock();
}
});
t1.start();
Where acquireLock and releaseLock are methods you write that use a ReadWriteLock to do their thing.
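One possible shape for those methods, sketched under the assumption that the class is split into separate read and write variants (the answer only names a single acquireLock/releaseLock pair). The class name GuardedList and the four method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedList {
    private final List<Integer> list = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    void acquireReadLock()  { lock.readLock().lock(); }
    void releaseReadLock()  { lock.readLock().unlock(); }

    // The write lock is exclusive: no readers or other writers while it is held.
    void acquireWriteLock() { lock.writeLock().lock(); }
    void releaseWriteLock() { lock.writeLock().unlock(); }

    // Callers must hold one of the locks for as long as they use the returned list.
    List<Integer> getList() { return list; }
}
```

A reader would wrap a.getList().forEach(...) between acquireReadLock() and releaseReadLock(), and a writer would wrap a.getList().removeIf(...) between acquireWriteLock() and releaseWriteLock(), always releasing in a finally block. This only works if every caller follows the convention; nothing in the type system enforces it.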
Use CopyOnWriteArrayList
COWList is an implementation of java.util.List with the property that it copies the backing store anytime you change anything about it. This has the benefit that any iterator you made is guaranteed to never throw ConcurrentModificationException: When you start iterating over it, you will end up iterating each value that was there as the list was when you began the iteration. Even if your code, or any other thread, starts modifying that list halfway through. The downside is, of course, that it is making lots of copies if you make lots of modifications, so this is not a good idea if the list is large and you're modifying it a lot.
Get rid of the getList() method, move the tasks into the object itself.
I don't know what a is (the object you call .getList() on), but apparently one of the things it should expose is some job that you really can't do with a getList() call. It's not just that you want the contents: you want to get the contents in a stable fashion (perhaps the class should instead have a method that gives you a copy of the list), or perhaps you want to do a thing to each element inside it (e.g. instead of getting the list and calling .forEach(System.out::println) on it, pass System.out::println to a and let it do the work). You can then focus your locks or other solutions to avoid clashes in that code, and not in callers of a.
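A small sketch of what "moving the tasks into the object" could look like. The class name IntBox and its method set are assumptions for illustration; the original class A is not shown in enough detail to know what it should really expose:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class IntBox {
    private final List<Integer> list = new ArrayList<>();

    synchronized void add(int value) {
        list.add(value);
    }

    // The caller passes in the work; the iteration itself runs under the lock.
    synchronized void forEach(Consumer<? super Integer> action) {
        list.forEach(action);
    }

    synchronized boolean removeIf(Predicate<? super Integer> filter) {
        return list.removeIf(filter);
    }

    // A stable copy for callers that need the contents outside the lock.
    synchronized List<Integer> copy() {
        return new ArrayList<>(list);
    }
}
```

The two threads from the question would then call a.forEach(System.out::println) and a.removeIf(e -> e == 1) directly. One caveat: the action passed to forEach runs while the lock is held, so it should be short and must not modify the same IntBox.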
Make a copy yourself
This doesn't actually work, even though it seems like it: Immediately clone the list after you receive it. This doesn't work, because cloning the list is itself an operation that iterates, just like .forEach(System.out::println) does, so if another thread interacts with the list while you are making your clone, it fails. Use one of the above 3 solutions instead.

Correct working with Collections.synchronizedList

I am not sure how to properly use the Collections.synchronizedList() implementation.
I have these two:
public synchronized static List<CurrencyBox> getOrderList() {
return Collections.synchronizedList(orderList);
}
and
public static List<CurrencyBox> getOrderList() {
return Collections.synchronizedList(orderList);
}
So as far as I understood, synchronizedList really returns the orderList and not a copy, correct?
So if I want to guarantee atomic operations, like add and remove, which of the implementations above is correct?
And does anything change with Java 9? Or is this still the way to go, or do you have any other suggestion?
Thank you
Without context it's a bit hard to tell, but from the snippets provided, neither gives you guaranteed atomic operations.
The documentation states:
Returns a synchronized (thread-safe) list backed by the specified
list. In order to guarantee serial access, it is critical that all
access to the backing list is accomplished through the returned list.
So even if you synchronize the method the best you'll get is a guarantee that no two objects are creating the synchronized list at the same time.
You need to wrap the original orderList with Collections.synchronizedList to begin with and return the stored result of that each time.
private static List<CurrencyBox> orderList = Collections.synchronizedList(new ArrayList<CurrencyBox>());
public static List<CurrencyBox> getOrderList() {
return orderList;
}
A synchronized list only synchronizes the methods of this list.
It means a thread won't be able to modify the list while another thread is currently running a method of this list. The object is locked while the method is executing.
As an example, let's say two threads run addAll on your list, with two different lists (A=A1,A2,A3, B=B1,B2,B3) as parameters.
As the method is synchronized, you can be sure those lists won't be merged randomly like A1,B1,A2,A3,B2,B3.
You don't decide when a thread hands over to the other thread, so you can get either A1,A2,A3,B1,B2,B3 or B1,B2,B3,A1,A2,A3.
Credit : jhamon
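The addAll example above can be tried out with a short sketch like the one below. The class name AddAllDemo and its run() method are hypothetical and exist only for this illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AddAllDemo {
    // Two threads each call addAll once on the same synchronized list.
    static List<String> run() {
        List<String> shared = Collections.synchronizedList(new ArrayList<String>());
        Thread a = new Thread(() -> shared.addAll(Arrays.asList("A1", "A2", "A3")));
        Thread b = new Thread(() -> shared.addAll(Arrays.asList("B1", "B2", "B3")));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // Each addAll ran as one synchronized call, so each block of three stays contiguous.
        return new ArrayList<>(shared);
    }
}
```

Which block comes first depends on thread scheduling, but the A and B elements are never interleaved.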

When and how should I use additional synchronization of ConcurrentHashMap?

I need to know when I should add some synchronization block to my code when using ConcurrentHashMap. Let's say I have a method like:
private static final ConcurrentMap<String, MyObjectWrapper> myObjectsCache = new ConcurrentHashMap<>(CACHE_INITIAL_CAPACITY);
public List<MyObject> aMethod(List<String> ids, boolean b) {
List<MyObject> result = new ArrayList<>(ids.size());
for (String id : ids) {
if (id == null) {
continue;
}
MyObjectWrapper myObjectWrapper = myObjectsCache.get(id);
if (myObjectWrapper == null) {
continue;
}
if (myObjectWrapper.getObject() instanceof MyObjectSub) {
((MyObjectSub) myObjectWrapper.getObject()).clearAField();
myObjectWrapper.getObject().setTime(System.currentTimeMillis());
}
result.add(myObjectWrapper.getObject());
if (b) {
final MyObject obj = new MyObject(myObjectWrapper.getObject());
addObjectToDb(obj);
}
}
return result;
}
How should I efficiently make this method concurrent?
I think that the "get" is safe, but once I get the value from the cache and update the cached object's fields, there can be problems because another thread could get the same wrapper and try to update the same underlying object. Should I add synchronization? And if so, should I synchronize from "get" to the end of the loop iteration, or the entire loop?
Maybe someone could share some more specific guidelines of proper and efficient use of ConcurrentHashMap when some more operations need to be done on the map keys/values inside loops etc...
I would be really grateful.
EDIT:
Some context for the question:
I'm currently working on refactoring some DAO classes in production code, and a few of the classes used HashMaps for caching data retrieved from the database. All methods that used the cache (for writes or reads) had their entire content inside a synchronized(cache) block (playing safe?). I don't have much experience with concurrency and I really want to use this opportunity to learn. I naively changed the HashMaps to ConcurrentHashMaps and now want to remove the synchronized blocks where they're not necessary. All caches are used for writes and reads. The presented method is based on one of the methods that I've changed, and now I'm trying to learn when and to what extent to synchronize. The method clearAField just changes the value of one of the fields of the wrapped POJO, and addObjectToDb tries to add the object to the database.
Other example would be refilling of the cache:
public void findAll() throws SQLException{
// get data from database into a list
List<Data> data=getAllDataFromDatabase();
cacheCHM.clear();
cacheCHM.putAll(data);
}
In which case I should put the clear and putAll inside a synchronized(cacheCHM) block, right?
I've tried to find and read some posts/articles about the proper and efficient usage of CHM but most deal with single operations, without loops etc.... The best I've found would be:
http://www.javamadesoeasy.com/2015/04/concurrenthashmap-in-java.html
You've not mentioned what concurrency you expect to happen within your app, so I'm going to assume you have multiple threads calling aMethod, and nothing else.
You only have a single call to the ConcurrentHashMap: myObjectsCache.get(id), and this is fine. In fact, since nothing is writing data into your cache [see assumption above], you don't even need a ConcurrentHashMap! You'd be fine with any immutable collection. You have a suspicious line at the end: addObjectToDb(obj). Does this method also affect your cache? If so, it's still safe (probably; we'd have to see the method to be certain), but you definitely need the ConcurrentHashMap.
The danger is where you change the objects, here:
myObjectWrapper.getObject().clearAField();
myObjectWrapper.getObject().setTime(System.currentTimeMillis());
It's possible for multiple threads to call these methods on the same object at the same time. Without knowing what these methods do, we can't say if this is safe or not. If these methods are both marked synchronised, or if you took care to ensure that it was safe for these methods to run concurrently then you're fine (but beware there's scope for these methods to run in different orders to what you might intuitively expect!). If you weren't so careful then there is a potential for data corruption.
A better approach to thread safety and caches is to use immutable objects. Here's what the MyObjectSub class might look like if it were immutable [not sure why you need the wrapper; I'd omit that completely if possible]:
//Provided by way of example. You should consider generating these
//using http://immutables.github.io/ or similar
public class MyImmutableObject {
//If all members are final primitives or immutable objects
//then this class is threadsafe.
final String field;
final long time;
public MyImmutableObject(String field, long time) {
this.field = field;
this.time = time;
}
public MyImmutableObject clearField() {
//Since final fields can never be changed, our only option is to
//return a copy.
return new MyImmutableObject("", this.time);
}
public MyImmutableObject setTime(long newtime) {
return new MyImmutableObject(this.field, newtime);
}
}
If your objects are immutable then thread safety is a lot simpler. Your method would look something like this:
public List<Result> typicialCacheUsage(String key) {
MyImmutableObject obj = myObjectsCache.get(key);
obj = obj.clearField();
obj = obj.setTime(System.currentTimeMillis());
//If you need to put the object back in the cache you can do this:
myObjectsCache.put(key, obj);
List<Result> res = generateResultFromObject(obj);
return res;
}
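One caveat with a separate get followed by put is that two threads can each read the same old value and then overwrite each other's result. A hedged sketch of an alternative uses ConcurrentMap.computeIfPresent, which ConcurrentHashMap performs atomically per key. The names CachedEntry, EntryCache, withTime and touch are hypothetical and only mirror the immutable class above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable value, modeled on the immutable class shown above.
final class CachedEntry {
    final String field;
    final long time;

    CachedEntry(String field, long time) {
        this.field = field;
        this.time = time;
    }

    CachedEntry clearField() { return new CachedEntry("", time); }

    CachedEntry withTime(long newTime) { return new CachedEntry(field, newTime); }
}

class EntryCache {
    private final ConcurrentMap<String, CachedEntry> cache = new ConcurrentHashMap<>();

    void put(String key, CachedEntry entry) {
        cache.put(key, entry);
    }

    // Replaces the cached value in one atomic step for this key.
    // Returns the new entry, or null if the key was not cached.
    CachedEntry touch(String key, long now) {
        return cache.computeIfPresent(key, (k, old) -> old.clearField().withTime(now));
    }
}
```

The remapping function should stay short and must not modify the map itself; slow work such as a database write belongs outside of it.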

Thread-safety simple

If I have a list of Components in a multithreaded environment, and the only operations I perform on this list are add (where I synchronize on the list) and get (the method called on each component is thread-safe), is that thread-safe?
public class Test {
private final ArrayList<Component> myContainer = new ArrayList<Component>();
public void add(Component component){
synchronized(myContainer){
myContainer.add(component);
}
}
public void useComponents()
{
for(Component c : myContainer)
c.use(); // method thread-safe
}
// no other operations on myContainer
}
In the current form, it is not thread-safe: The useComponents method could be executed by one thread. At the same time, another thread could call add, and thus modify the collection while it is being iterated over. (This modification could happen between two calls to c.use(), so the fact that the use() method is thread safe would not help you here).
Strictly speaking, this is not even restricted to multithreading: If c.use() internally called test.add(someOtherComponent) (even if it was done in the same thread!), this would throw a ConcurrentModificationException, because again, the collection is modified while being iterated over.
Thread safety (without safety against concurrent modifications) could be achieved by simply wrapping the iteration into a synchronized block:
public void useComponents()
{
synchronized (myContainer)
{
for(Component c : myContainer)
c.use(); // method thread-safe
}
}
However, this would still leave the possibility of a ConcurrentModificationException. Most likely the c.use() call will not (and should not) modify the collection that the component is contained in (otherwise, one could question the design in general).
If you wanted to allow the c.use() call to modify the collection, you could replace the collection with a CopyOnWriteArrayList:
private final List<Component> myContainer =
new CopyOnWriteArrayList<Component>();
and then you could even remove the synchronization completely. But you should be aware of the implications: The contents of the list will be copied during each modification (hence the name...). This is usually used in cases where you have a small collection that is frequently iterated over, but which is rarely modified. (Listeners in all forms are a classic example here).
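To illustrate the snapshot behaviour, here is a minimal single-threaded sketch. The class name SnapshotIterationDemo and the string elements are stand-ins for the Component objects, chosen only for this example:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIterationDemo {
    // Returns {number of elements visited, final size of the list}.
    static int[] iterateWhileAdding() {
        List<String> components = new CopyOnWriteArrayList<>();
        components.add("a");
        components.add("b");
        int visited = 0;
        for (String c : components) {
            // With a plain ArrayList this add would trigger a ConcurrentModificationException.
            // Here the iterator keeps working on the array as it was when iteration began.
            components.add(c + "-copy");
            visited++;
        }
        return new int[] { visited, components.size() };
    }
}
```

The loop visits only the two elements that existed when it started, while the list itself ends up with four.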
It looks ok, except that I am not sure about iterator behavior in useComponents() if you'll simultaneously add elements to the list.
Did you consider using CopyOnWriteArrayList instead?

Java - concurrent clear of the list

I am trying to find a good way to achieve the following API:
void add(Object o);
void processAndClear();
The class would store the objects and upon calling processAndClear would iterate through the currently stored ones, process them somehow, and then clear the store. This class should be thread safe.
The obvious approach is to use locking, but I wanted to be more "concurrent". This is the approach I would use:
class Store{
private AtomicReference<CopyOnWriteArrayList<Object>> store = new AtomicReference<>(new CopyOnWriteArrayList <>());
void add(Object o){
store.get().add(o);
}
void processAndClear(){
CopyOnWriteArrayList<Object> objects = store.get();
store.compareAndSet(objects, new CopyOnWriteArrayList<>());
for (Object object : objects) {
//do sth
}
}
}
This would allow threads that try to add objects to proceed almost immediately, without any locking or waiting for the clearing to complete. Is this more or less the correct approach?
Your above code is not thread-safe. Imagine the following:
Thread A is put on hold at add() right after store.get()
Thread B is in processAndClear(), replaces the list, processes all elements of the old one, then returns.
Thread A resumes and adds a new item to the now obsolete list that will never be processed.
Probably the easiest solution here would be to use a LinkedBlockingQueue, which would also simplify the task a lot:
class Store{
final LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();
void add(final Object o){
queue.add(o); // never blocks here because the queue is unbounded; with a bounded queue, put(o) would block until there is free space (and requires handling InterruptedException)
}
void processAndClear(){
Object element;
while ((element = queue.poll()) != null) { // does not block on empty list but returns null instead
doSomething(element);
}
}
}
Edit: How to do this with synchronized:
class Store{
final LinkedList<Object> queue = new LinkedList<>(); // has to be final for synchronized to work
void add(final Object o){
synchronized(queue) { // on the queue as this is the shared object in question
queue.add(o);
}
}
void processAndClear() {
final LinkedList<Object> elements = new LinkedList<>(); // temporary local list
synchronized(queue) { // here as well, as every access needs to be properly synchronized
elements.addAll(queue);
queue.clear();
}
for (Object e : elements) {
doSomething(e); // this is thread-safe as only this thread can access these now local elements
}
}
}
Why this is not a good idea
Although this is thread-safe, it is much slower if compared to the concurrent version. Assume that you have a system with 100 threads that frequently call add, while one thread calls processAndClear. Then the following performance bottle-necks will occur:
If one thread calls add the other 99 are put on hold in the meantime.
During the first part of processAndClear all 100 threads are put on hold.
If you assume that those 100 adding threads have nothing else to do, you can easily show, that the application runs at the same speed as a single-threaded application minus the cost for synchronization. That means: adding will effectively be slower with 100 threads than with 1. This is not the case if you use a concurrent list as in the first example.
There will however be a minor performance gain with the processing thread, as doSomething can be run on the old elements while new ones are added. But again the concurrent example could be faster, as you could have multiple threads do the processing simultaneously.
In effect, synchronized can be used as well, but you will automatically introduce performance bottlenecks, potentially causing the application to run slower than a single-threaded one and forcing you to do complicated performance tests. In addition, extending the functionality always carries a risk of introducing threading issues, as locking needs to be done manually. A concurrent list, in contrast, solves all these problems without additional code, and the code can easily be changed or extended later on.
The class would store the objects and upon calling processAndClear would iterate through the currently stored ones, process them somehow, and then clear the store.
This seems like you should use a BlockingQueue for this task. Your add(...) method would add to the queue and your consumer would call take() which blocks waiting for the next item. The BlockingQueue (ArrayBlockingQueue is a typical implementation) takes care of all of the synchronization and signaling for you.
This means that you don't need a CopyOnWriteArrayList or an AtomicReference. What you would lose is a collection that you can iterate through for reasons other than those your post currently articulates.
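If the batch-style processAndClear() API should be kept rather than a consumer blocking on take(), one variation is BlockingQueue.drainTo, which moves everything currently queued into a local collection. This is a sketch under that assumption; the class name DrainingStore and the Consumer parameter are hypothetical additions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class DrainingStore {
    // Unbounded queue, so add() does not block producers.
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    void add(Object o) {
        queue.add(o);
    }

    // Moves everything queued so far into a local batch, then processes that batch.
    // Elements added during processing stay in the queue for the next call.
    int processAndClear(Consumer<Object> processor) {
        List<Object> batch = new ArrayList<>();
        queue.drainTo(batch);
        batch.forEach(processor);
        return batch.size();
    }
}
```

Because the batch is local to the calling thread, processing it needs no further synchronization, and no element can be added to a list that has already been discarded.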
