Updating elements retrieved from a Map atomically using Java 8 parallel streams - java

I have a parallel stream in which I'm using a Map to mutate the elements.
Map<Long, List<Object1>> hashmap = foo.getMap();
itemStream.parallel()
    .filter(Objects::nonNull)
    .forEach(item -> setProperties(hashmap, item));
The method setProperties() takes the map and the item, performs a get on the map using the item, and then sets some attributes on the retrieved objects using the item's values.
What I want is for the get and the property setting to be done atomically, so that two threads can't perform a get on the same key and have their property updates interleaved.
private void setProperties(Map<Long, List<Object1>> map, MyItem item) {
    long id = item.getID();
    List<Object1> items = map.get(id);
    for (Object1 ob : items) {
        ob.setId(item.getFloorId());
        ob.setPath(item.getPath());
        ob.setTypeName(item.getTypeName());
    }
}
I'm also a bit concerned about the latency hit, and whether this sort of parallelization will really have a benefit vs the existing single-threaded approach.

Synchronising the Map, or the get from it, has no benefit, because the map is not being altered, so there's no race condition there.
You need to synchronise the updates so they happen all-at-once:
for (Object1 ob : items) {
    synchronized (ob) {
        ob.setId(item.getFloorId());
        ob.setPath(item.getPath());
        ob.setTypeName(item.getTypeName());
    }
}
This will have very little impact on performance, because these days synchronising introduces very little overhead and you will only block if the same Item is being operated on.
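Putting the pieces together, setProperties might end up looking roughly like this (a sketch using the Object1/MyItem names from the question; the null check is an addition for safety and is not in the original snippet):
private void setProperties(Map<Long, List<Object1>> map, MyItem item) {
    List<Object1> items = map.get(item.getID());
    if (items == null) {
        return; // added here for illustration: no entry for this id, nothing to update
    }
    for (Object1 ob : items) {
        // Lock each target object so its three setters run as one unit;
        // threads only contend when they touch the same Object1 instance.
        synchronized (ob) {
            ob.setId(item.getFloorId());
            ob.setPath(item.getPath());
            ob.setTypeName(item.getTypeName());
        }
    }
}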

Related

ConcurrentHashMap with synchronized

I am maintaining some legacy code and found some implementation with the synchronized keyword on a ConcurrentHashMap. It seems unnecessary to me:
public class MyClass{
    private final Map<MyObj, Map<String, List<String>>> conMap = new ConcurrentHashMap<>();
    //...
    //adding new record into conMap:
    private void addToMap(MyObj id, String name, String value){
        conMap.putIfAbsent(id, new ConcurrentHashMap<>());
        Map<String, List<String>> subMap = conMap.get(id);
        synchronized(subMap){ // <-- is it necessary?
            subMap.putIfAbsent(name, new ArrayList<>());
            subMap.get(name).add(value);
        }
    }
    //...
    public void doSomething(MyObj id){
        List<Map<String, List<String>>> mapsList = new LinkedList<>();
        for(MyObj objId: conMap.keySet()){
            if(objId.key1.equals(id.key1)){
                mapsList.add(conMap.get(objId));
            }
        }
        for(Map<String, List<String>> map: mapsList){
            synchronized(map){ // <-- is it necessary?
                if(timeout <= 0){
                    log(map.size());
                    for(List<String> value: map.values()){
                        log(id, value);
                    }
                }
                else{
                    int sum = 0;
                    for(Map.Entry<String, List<String>> val: map.entrySet()){
                        sum += val.getValue().size();
                    }
                    log(sum);
                    map.wait(timeout);
                }
            }
        }
    }
    //...
}
So, is it reasonable to use the synchronized keyword on an object that is already concurrent? Or are those two different things?
In this case:
synchronized(subMap){ // <-- is it necessary?
    subMap.putIfAbsent(name, new ArrayList<>());
    subMap.get(name).add(value);
}
the synchronized is necessary. Without it, you could have two threads simultaneously updating the same ArrayList instance. Since ArrayList is not thread-safe, the addToMap method would not be thread-safe either.
In this case:
synchronized(map){ // <-- is it necessary?
    if(/*condition*/){
        log(map.size());
        for(List<String> value: map.values()){
            log(id, value);
        }
    }
    else{
        int sum = 0;
        for(Map.Entry<String, List<String>> val: map.entrySet()){
            sum += val.getValue().size();
        }
        log(sum);
        map.wait(timeout);
    }
}
the synchronized is necessary.
In the if branch, the log method (or something called from it) will probably call ArrayList::toString, which will iterate each ArrayList. Without synchronizing at the submap level, there could be a simultaneous add by another thread (e.g. an addToMap call). That means there are memory hazards, and a ConcurrentModificationException may be possible in the toString() method.
In the else branch, the size() call is accessing a size field in each ArrayList in the submap. Without synchronizing at the submap level, there could be a simultaneous add on one of those lists. That could cause the size() method to return a stale value. In addition, you are not guaranteed to see map entries added to a submap while you are iterating it. If either of those events happens, the sum could be inaccurate. (Whether that is really an issue depends on the requirements for this method: inaccurate counts could be acceptable.)
ConcurrentHashMap synchronizes each individual method call itself, so that no other thread can access the map (and possibly break the internal data structure of the map).
Synchronized block synchronizes two or more consecutive method calls, so that no other thread can modify the data structure between the calls (and possibly break the consistency of the data, with regards to the application logic).
Note that the synchronized block only works if all access to the HashMap is performed from synchronized blocks using the same monitor object.
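To make that distinction concrete, here is a small sketch (the class and method names are made up for illustration): each call in the first method is thread-safe on its own, but the pair is not atomic; the second method folds the check-and-create into one call that ConcurrentHashMap executes atomically.
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class CompoundActionExample {
    private final ConcurrentHashMap<String, List<String>> map = new ConcurrentHashMap<>();

    // Each call below is thread-safe on its own, but the pair is not atomic:
    // two threads can both see the key as absent, each create a list, and the
    // later put() can replace a list that already holds the other thread's values.
    void racyAdd(String key, String value) {
        if (!map.containsKey(key)) {
            map.put(key, new CopyOnWriteArrayList<>());
        }
        map.get(key).add(value);
    }

    // Atomic per key: the mapping function in computeIfAbsent runs under the map's
    // internal lock for that bin, so the check-and-create pair cannot interleave.
    void atomicAdd(String key, String value) {
        map.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(value);
    }
}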
It is sort of necessary, as multiple threads may try to append to the same ArrayList at the same time. The synchronized block is protecting against that happening, as ArrayList is obviously not synchronized.
Since Java 8 we have computeIfAbsent, which means the putIfAbsent-then-get pairs they are doing can be simplified. I would write it like this, no synchronization required:
conMap.computeIfAbsent(id, k -> new ConcurrentHashMap<>())
    .computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()) // or other thread-safe list
    .add(value);
Other answers don't adequately address this bit...
for(Map<String, List<String>> map: mapsList){
    synchronized(map){ // <-- is it necessary?
        if(/*condition*/){
            ...iterate over map...
        }
        else {
            ...iterate over map...
        }
    }
}
Is it necessary? Hard to tell.
What is /*condition*/ ? Does synchronizing on map prevent some other thread A from changing the value of /*condition*/ after thread B has tested it, but before or while thread B is performing either of the two branches? If so, then the synchronized block could be very important.
How about those iterations? Does synchronizing on map prevent some other thread A from changing the contents of the map while thread B is iterating? If so, then the synchronized block could be very important.

Java: get+clear atomic for map

I would like to implement the following logic:
-the following structure is to be used
//Map<String, CopyOnWriteArrayList> keeping the pending updates
//grouped by the id of the updated object
final Map<String, List<Update>> updatesPerId = new ConcurrentHashMap<>();
-n producers will add updates to updatesPerId map (for the same id, 2 updates can be added at the same time)
-one TimerThread will run from time to time and has to process the received updates. Something like:
final Map<String, List<Update>> toBeProcessed = new HashMap<>(updatesPerId);
updatesPerId.clear();
// iterate over toBeProcessed and process them
Is there any way to make this logic thread safe without synchronizing the adding logic from producers and the logic from timerThread(consumer)? I am thinking about an atomic clear+get but it seems that ConcurrentMap does not provide something like that.
Also, I have to mention that updates should be kept by updated object id so I cannot replace the map with a queue or something else.
Any ideas?
Thanks!
You can leverage the fact that ConcurrentHashMap.compute executes atomically.
You can put into the updatesPerId like so:
updatesPerId.compute(key, (k, list) -> {
    if (list == null) list = new ArrayList<>();
    // ... add to the list
    // Return a non-null list, so the key/value pair is stored in the map.
    return list;
});
Note that this is different from using computeIfAbsent and then adding to the returned list, which would not be atomic.
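For contrast, the variant being warned against would look roughly like this (a sketch; update stands for whatever the producer wants to add):
// NOT atomic as a whole: computeIfAbsent itself runs atomically, but the add happens
// after the map's lock is released, so the consumer's compute(key, ...) could remove
// and process the list in between, and this update would be silently lost.
updatesPerId.computeIfAbsent(key, k -> new ArrayList<>()).add(update);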
Then in your thread to remove things:
for (String key : updatesPerId.keySet()) {
    updatesPerId.compute(key, (k, list) -> {
        // ... Process the contents of the list.
        // Returning null removes the key/value pair from the map.
        return null;
    });
}
So adding to the list for a key (or processing all the values for that key) might block if you happen to be adding and processing the same key at once; otherwise, it will not block.
Edit: as pointed out by @StuartMarks, it might be better to simply get everything out of the map first, and then process it later, in order to avoid blocking other threads trying to add:
Map<String, List<Update>> newMap = new HashMap<>();
for (String key : updatesPerId.keySet()) {
    newMap.put(key, updatesPerId.remove(key));
}
// ... Process entries in newMap.
I'd suggest using LinkedBlockingQueue instead of CopyOnWriteArrayList as the map value. With COWAL, adds get successively more expensive, so adding N elements results in N^2 performance. LBQ addition is O(1). Also, LBQ has drainTo which can be used effectively here. You could do this:
final Map<String, BlockingQueue<Update>> updatesPerId = new ConcurrentHashMap<>();
Producer:
updatesPerId.computeIfAbsent(id, k -> new LinkedBlockingQueue<>()).add(update);
Consumer:
updatesPerId.forEach((id, queue) -> {
    List<Update> updates = new ArrayList<>();
    queue.drainTo(updates);
    processUpdates(id, updates);
});
This is somewhat different from what you had suggested. This technique processes the updates for each id, but lets producers continue to add updates to the map while this is going on. This leaves map entries and queues in the map for each id. If the ids end up getting reused a lot, the number of map entries will plateau at a high-water mark.
If new ids are continually coming in, and old ids becoming disused, the map will grow continually, which probably isn't what you want. If this is the case you could use the technique in Andy Turner's answer.
If the consumer really needs to snapshot and clear the entire map, I think you have to use locking, which you wanted to avoid.
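If it helps, here is one way the producer and the periodic consumer from this answer might be wired up (a sketch only: the UpdateDrainer class, the start() method, and the 5-second period are invented for illustration, while Update and processUpdates are assumed from the question):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class UpdateDrainer {
    final ConcurrentHashMap<String, BlockingQueue<Update>> updatesPerId = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Producers call this from any thread.
    void add(String id, Update update) {
        updatesPerId.computeIfAbsent(id, k -> new LinkedBlockingQueue<>()).add(update);
    }

    // Stands in for the TimerThread: drain and process every few seconds.
    void start() {
        scheduler.scheduleAtFixedRate(this::drainAll, 5, 5, TimeUnit.SECONDS);
    }

    private void drainAll() {
        updatesPerId.forEach((id, queue) -> {
            List<Update> updates = new ArrayList<>();
            queue.drainTo(updates);          // atomically moves the queued updates out
            if (!updates.isEmpty()) {
                processUpdates(id, updates); // the question's processing step, assumed to exist
            }
        });
    }

    private void processUpdates(String id, List<Update> updates) { /* ... */ }
}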
Is there any way to make this logic thread safe without synchronizing the adding logic from producers and the logic from timerThread(consumer)?
In short, no - depending on what you mean by "synchronizing".
The easiest way is to wrap your Map into a class of your own.
class UpdateManager {
    Map<String,List<Update>> updates = new HashMap<>();

    public void add(Update update) {
        synchronized (updates) {
            updates.computeIfAbsent(update.getKey(), k -> new ArrayList<>()).add(update);
        }
    }

    public Map<String,List<Update>> getUpdatesAndClear() {
        synchronized (updates) {
            Map<String,List<Update>> copy = new HashMap<>(updates);
            updates.clear();
            return copy;
        }
    }
}
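A possible way to exercise this class (a sketch; the one-second period and the processUpdates handler are illustrative names, not part of the answer):
UpdateManager manager = new UpdateManager();

// Consumer: periodically swaps the whole map out under the lock, then
// processes its private copy without holding the lock.
ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
timer.scheduleAtFixedRate(() -> {
    Map<String, List<Update>> batch = manager.getUpdatesAndClear();
    batch.forEach((id, updates) -> processUpdates(id, updates)); // processUpdates: your handler
}, 1, 1, TimeUnit.SECONDS);

// Producers on any thread simply call:
// manager.add(someUpdate);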

What type of List/Map should I use for categorizing data but maintaining the order?

I have the following data objects:
MyObject {
priority (e.g. HIGH, LOW, ...)
information
}
I need to save them in the correct order so I can iterate over them when necessary.
I also sometimes need to get only the data with priority HIGH or LOW (also in the correct order).
If I use a List (e.g. ArrayList) I would have to iterate over every single data object to search for my priorities.
If I use a Map<Priority, List<Information>> I would lose the ordering of information across different priorities.
Sample data input:
LOW, "Hello1"
HIGH, "Hello2"
LOW, "World3"
HIGH, "World4"
Desired results:
printData() -> Hello1, Hello2, World3, World4
printLow() -> Hello1, World3
printHigh() -> Hello2, World4
What data structure would fulfill my requirements at the best? (Java)
If iterating over the list is really too slow, then maintain two parallel collections:
a List<Information> to iterate over all the informations in order,
and a Map<Priority, List<Information>> to iterate over the informations of a given priority.
I would only do that if I had a proven performance problem and I have proven that it was caused by the iteration over the list of all the informations. Otherwise, it's premature optimization that makes the code harder to maintain and make correct, especially if the collection is mutable.
Use a HashMap and a separate list, as below:
public enum Priority { .... }

Map<Priority, List<Information>> map = new HashMap<Priority, List<Information>>();
map.put(Priority.HIGH, new LinkedList<Information>());
map.put(Priority.MID, new LinkedList<Information>());
map.put(Priority.LOW, new LinkedList<Information>());
List<Information> infoOrderedList = new LinkedList<Information>();

public void putInfo(MyObject myObject) {
    List<Information> infoList = map.get(myObject.getPriority());
    infoList.add(myObject.getInformation());
    infoOrderedList.add(myObject.getInformation());
}

public void removeInfo(MyObject myObject) {
    List<Information> infoList = map.get(myObject.getPriority());
    infoList.remove(myObject.getInformation());
    infoOrderedList.remove(myObject.getInformation());
}
You can avoid explicit iteration by using lambdas and filtering on a List.
For instance if you want to get a list of high priority items just type:
List<MyObject> high = list.stream().filter(o -> o.priority == Priority.HIGH).collect(Collectors.toList());
Using an ArrayList you keep the insertion order.
To improve performance you may use parallelStream() instead of stream()
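Putting the sample data from the question together with this stream approach, a small self-contained sketch might look like this (the exact shape of MyObject is an assumption for illustration):
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PriorityDemo {
    enum Priority { LOW, HIGH }

    static class MyObject {
        final Priority priority;
        final String information;
        MyObject(Priority priority, String information) {
            this.priority = priority;
            this.information = information;
        }
    }

    public static void main(String[] args) {
        List<MyObject> list = Arrays.asList(
                new MyObject(Priority.LOW, "Hello1"),
                new MyObject(Priority.HIGH, "Hello2"),
                new MyObject(Priority.LOW, "World3"),
                new MyObject(Priority.HIGH, "World4"));

        // printData(): the list itself preserves insertion order.
        System.out.println(list.stream()
                .map(o -> o.information)
                .collect(Collectors.joining(", ")));   // Hello1, Hello2, World3, World4

        // printHigh(): filtering keeps the original encounter order.
        System.out.println(list.stream()
                .filter(o -> o.priority == Priority.HIGH)
                .map(o -> o.information)
                .collect(Collectors.joining(", ")));   // Hello2, World4
    }
}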

Best way to traverse List in Java?

I have a concurrent List used in a multi-threaded environment. Once the List is built, the main operation is traversing it. I am wondering which of the following 2 methods is more efficient, and what the cost of creating a new List is vs using synchronized. Or maybe there are other better ways?
List<Object> list = new CopyOnWriteArrayList<Object>();

public int[] getAllValue1() {
    List<Object> list2 = new ArrayList<Object>(list);
    int[] data = new int[list2.size()];
    int i = 0;
    for (Object obj : list2) {
        data[i++] = obj.getValue();
    }
    return data;
}

public int[] getAllValue2() {
    synchronized (list) {
        int[] data = new int[list.size()];
        int i = 0;
        for (Object obj : list) {
            data[i++] = obj.getValue();
        }
        return data;
    }
}
UPDATE
getAllValue1(): It is thread-safe, because it takes a snapshot of the CopyOnWriteArrayList, which is itself a thread-safe List. However, as sharakan points out, the cost is iterating two lists and creating a local ArrayList, which could be expensive if the original list is large.
getAllValue2(): It is also thread-safe inside the synchronized block. (Assume other functions do their synchronization properly.) The reason to put it in the synchronized block is that I want to pre-allocate the array, to make sure the .size() call is synchronized with the iteration. (The iteration itself is thread-safe, because it uses the CopyOnWriteArrayList.) However, the cost here is the opportunity cost of using the synchronized block: if there are 1 million clients calling getAllValue2(), each one has to wait.
So I guess the answer really depends on how many concurrent users need to read the data. If there aren't many concurrent users, method 2 is probably better. Otherwise, method 1 is better. Agree?
In my usage, I have a couple of concurrent clients, so method 2 is probably preferred. (BTW, my list is about 10k elements.)
getAllValue1 looks good given that you need to return an array of primitives based on a field of your objects. It'll be two iterations, but it's consistent and you won't cause any contention between reader threads. You haven't posted any profiling results, but unless your list is quite large I'd be more worried about contention in a multithreaded environment than about the cost of two complete iterations.
You could remove one iteration if you change the API. Easiest way to do that is to return a Collection instead, as follows:
public Collection<Integer> getAllValue1() {
    List<Integer> list2 = new ArrayList<Integer>(list.size());
    for (Object obj : list) {
        list2.add(obj.getValue());
    }
    return list2;
}
If you can change your API that way, that'd be an improvement.
I think the second one is more efficient. The reason is that in the first one you create another list as a local copy. That means if the original list contains a lot of data, it is going to copy all of it. If it contains millions of elements, that will be an issue.
However, there is also the list.toArray() method.
The Collections class also contains some useful stuff:
Collection synchronizedCollection = Collections.synchronizedCollection(list);
or
List synchronizedList = Collections.synchronizedList(list);
If you need the objects' values, and not the objects themselves, then go with your second snippet. Otherwise, you can replace the appropriate parts of the second snippet with the functions above and do whatever you want.
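For what it's worth, the list.toArray() route mentioned above could look something like this on the question's CopyOnWriteArrayList (a sketch; it assumes the list elements are of some type, here called MyObject, that exposes getValue()):
public int[] getAllValue3() {
    // toArray() on a CopyOnWriteArrayList returns a snapshot of the backing array,
    // so the size and contents are consistent without locking or a second list copy.
    Object[] snapshot = list.toArray();
    int[] data = new int[snapshot.length];
    for (int i = 0; i < snapshot.length; i++) {
        data[i] = ((MyObject) snapshot[i]).getValue();
    }
    return data;
}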
Edit (again):
Since you are using a copy on write array list (should've been more observant) I would get the iterator and use that to initialize your array. Since the iterator is a snapshot of the array at the time you ask for it you can synchronize on the list to get the size and then subsequently iterate without fear of ConcurrentModificationException or having the iterator change.
public int[] getAllValue1() {
    int[] data;
    synchronized (list) {
        data = new int[list.size()];
    }
    int idx = 0;
    Iterator<Object> it = list.iterator();
    while (it.hasNext() && idx < data.length) { // guard in case the list grew after size() was read
        data[idx++] = it.next().getValue();
    }
    return data;
}

Why synchronize on SynchronizedMap or SynchronizedCollections?

I am referring to the question asked here and using the author's code example. Now my question is:
Why does the author use synchronized(synchronizedMap)? Is it really necessary? synchronizedMap will always make sure that no two threads perform a read/put operation on the map at the same time, so why do we need to synchronize on that map itself?
Would really appreciate an explanation.
public class MyClass {
    private static Map<String, List<String>> synchronizedMap =
        Collections.synchronizedMap(new HashMap<String, List<String>>());

    public void doWork(String key) {
        List<String> values = null;
        while ((values = synchronizedMap.remove(key)) != null) {
            //do something with values
        }
    }

    public static void addToMap(String key, String value) {
        synchronized (synchronizedMap) {
            if (synchronizedMap.containsKey(key)) {
                synchronizedMap.get(key).add(value);
            }
            else {
                List<String> valuesList = new ArrayList<String>();
                valuesList.add(value);
                synchronizedMap.put(key, valuesList);
            }
        }
    }
}
why do we need to synchronize on that synchronizedMap itself?
You may need to synchronize on an already synchronized collection because you are performing two operations on the collection -- in your example, a containsKey() and then a put(). You are trying to protect against race conditions in the code that is calling the collection. In addition, in this case, the synchronized block also protects the ArrayList values so that multiple threads can add their values to these unsynchronized collections.
If you look at the code you linked to, they first check for the existence of the key and then put a value into the map if the key did not exist. You need to protect against 2 threads checking for a key's existence and then both of them putting into the map. The race is which one will put first and which one will overwrite the previous put.
The synchronized collection protects itself from multiple threads corrupting the map itself. It does not protect against logic race conditions around multiple calls to the map.
synchronized (synchronizedMap) {
    // test for a key in the map
    if (synchronizedMap.containsKey(key)) {
        synchronizedMap.get(key).add(value);
    } else {
        List<String> valuesList = new ArrayList<String>();
        valuesList.add(value);
        // store a value into the map
        synchronizedMap.put(key, valuesList);
    }
}
This is one of the reasons why the ConcurrentMap interface has putIfAbsent(K key, V value). That does not require two operations, so you may not need to synchronize around it.
Btw, I would rewrite the above code to be:
synchronized (synchronizedMap) {
    // test for a key in the map
    List<String> valuesList = synchronizedMap.get(key);
    if (valuesList == null) {
        valuesList = new ArrayList<String>();
        // store a value into the map
        synchronizedMap.put(key, valuesList);
    }
    valuesList.add(value);
}
Lastly, if most of the operations on the map need to be in a synchronized block anyway, you might as well not pay for the synchronizedMap and just use a HashMap always inside of synchronized blocks.
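For completeness, if switching the map and the value lists to concurrent types is an option (a sketch, not part of the linked question's code), the whole check-then-put collapses into one atomic call and the external synchronized block goes away:
private static final ConcurrentMap<String, List<String>> map = new ConcurrentHashMap<>();

public static void addToMap(String key, String value) {
    // computeIfAbsent does the check-and-create atomically, and the
    // CopyOnWriteArrayList makes the subsequent add safe without an external lock.
    map.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(value);
}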
It is not just about updating the synchronizedMap's values; it is about the sequence of operations affecting the map. There are two operations happening on the map inside the same method.
If you don't synchronize the block/method, there may be a case where Thread 1 executes the first part and Thread 2 executes the second part; your business operation may then end up with weird results (even though the individual updates to the map are synchronized).
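To make the interleaving concrete, here is a small, purely illustrative demo of the check-then-act race the answers describe (hypothetical code, not from the question; the outcome is nondeterministic, but it frequently shows lost values):
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LostUpdateDemo {
    static final Map<String, List<String>> map =
            Collections.synchronizedMap(new HashMap<String, List<String>>());

    // Check-then-act with no outer lock: two threads can both see the key as
    // absent, each put() its own list, and the later put() discards whatever
    // the other thread already added.
    static void racyAdd(String key, String value) {
        if (!map.containsKey(key)) {
            map.put(key, new ArrayList<String>());
        }
        map.get(key).add(value); // also unsafe on its own: ArrayList is not thread-safe
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 1000; i++) {
            final String v = "v" + i;
            pool.submit(() -> racyAdd("key", v));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Often prints fewer than 1000; the exact result varies from run to run.
        System.out.println(map.get("key").size());
    }
}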
