I know that for concurrency reasons I cannot update the value of a local variable in a lambda in Java 8. So this is illegal:
double d = 0;
orders.forEach(o -> {
    d += o.getTotal();
});
But what about updating an instance variable or changing the state of a local object? For example, in a Swing application I have a button and a label declared as instance variables; when I click the button I want to hide the label:
jButton1.addActionListener(e -> {
    jLabel.setVisible(false);
});
I get no compiler errors and it works fine, but... is it right to change the state of an object in a lambda? Will I have concurrency problems or something bad in the future?
Here is another example. Imagine that the following code is in the doGet method of a servlet. Will I have some problem here? If the answer is yes: why?
String key = request.getParameter("key");
Map<String, String> resultMap = new HashMap<>();
Map<String, String> map = new HashMap<>();
// Load map
map.forEach((k, v) -> {
    if (k.equals(key)) {
        resultMap.put(k, v);
    }
});
response.getWriter().print(resultMap);
What I want to know is: When is it right to mutate the state of an object instance in a lambda?
Your assumptions are incorrect.
You can only capture effectively final local variables in lambdas, because lambdas are syntactic sugar* over anonymous inner classes.
*They are actually more than only syntactic sugar, but that is not relevant here.
And anonymous inner classes can only capture effectively final local variables, hence the same holds for lambdas.
You can do anything you want with lambdas as long as the compiler allows it. Now, on to the behaviour:
If you modify state that depends on other state, in a parallel setting, then you are in trouble.
If you modify state that depends on other state, in a sequential setting, then everything is fine.
If you modify state that does not depend on anything else, then everything is fine as well.
Some examples:
class MutableNonSafeInt {
    private int i = 0;

    public void increase() {
        i++;
    }

    public int get() {
        return i;
    }
}
MutableNonSafeInt integer = new MutableNonSafeInt();
IntStream.range(0, 1000000)
.forEach(i -> integer.increase());
System.out.println(integer.get());
This will reliably print 1000000, as expected, even though every increment depends on the previous state.
Now let's parallelize the stream:
MutableNonSafeInt integer = new MutableNonSafeInt();
IntStream.range(0, 1000000)
.parallel()
.forEach(i -> integer.increase());
System.out.println(integer.get());
Now it prints different integers, like 199205 or 249165: there is no synchronization, so threads do not always see the changes other threads have made, and increments get lost.
But say that we now get rid of our dummy class and use AtomicInteger, which is thread-safe. We get the following:
AtomicInteger integer = new AtomicInteger(0);
IntStream.range(0, 1000000)
.parallel()
.forEach(i -> integer.getAndIncrement());
System.out.println(integer.get());
Now it correctly prints 1000000 again.
Synchronization is costly, however, and we have lost nearly all the benefit of parallelization here.
In general: yes, you may get concurrency problems, but only the ones you already had. Lambdafying code won't make it non-thread-safe where it was thread-safe before, or vice versa.

In the example you give, your code is (probably) thread-safe because an ActionListener is only ever called on the event-dispatching thread. Provided you have observed the Swing single-threaded rule, no other thread ever accesses jLabel, so there can be no thread interference on it. But that question is orthogonal to the use of lambdas.
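To make that "orthogonal" point concrete, here is a minimal, self-contained sketch (the class and field names are invented for illustration): a lambda that mutates an instance field compiles fine and behaves exactly like the anonymous class it replaces, in both the single-threaded and the multi-threaded sense.

```java
import java.util.function.IntConsumer;

public class LambdaStateDemo {
    private int total = 0; // instance field: a lambda may freely mutate it

    static int run() {
        LambdaStateDemo demo = new LambdaStateDemo();

        // Lambda mutating an instance field: legal, and fine single-threaded.
        IntConsumer lambda = n -> demo.total += n;

        // The anonymous class it replaces: identical thread-safety behaviour.
        IntConsumer anonymous = new IntConsumer() {
            @Override
            public void accept(int n) {
                demo.total += n;
            }
        };

        lambda.accept(2);
        anonymous.accept(3);
        return demo.total;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 5
    }
}
```

Both forms are equally safe here only because a single thread runs them; sharing `demo` across threads would be just as unsafe with the anonymous class as with the lambda.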
In case forEach is distributed to different threads/cores you might have concurrency issues. Consider using atomics or concurrent structures (like ConcurrentHashMap).
Related
I'm a bit confused about how AtomicReference getAndUpdate guarantees atomicity. Consider the following examples
example 1
AtomicReference<Set<String>> set = new AtomicReference<>(new HashSet<>());
set.getAndUpdate(current -> {
    Set<String> updated = new HashSet<>();
    updated.add("test");
    return updated;
});
example 2
AtomicReference<Set<String>> set = new AtomicReference<>(new HashSet<>());
set.getAndUpdate(current -> {
    current.add("test");
    return current;
});
In example 2, the set will be modified in the callback of the getAndUpdate. If multiple threads try to access this function at the same time, will they see the modified state or getAndUpdate prevents this by cloning the original set when passing it to the callback so that the modification happens in one thread will not be seen in other threads? If example 2 does not guarantee atomicity, why would getAndUpdate allow us to write this code?
Example 1 will guarantee atomicity, since the modification happens on a new set. But how does it differ from the below?
AtomicReference<Set<String>> set = new AtomicReference<>(new HashSet<>());
Set<String> updated = new HashSet<>();
updated.add("test");
set.set(updated);
In example 2, the set will be modified in the callback of the getAndUpdate. If multiple threads try to access this function at the same time, will they see the modified state or getAndUpdate prevents this by cloning the original set when passing it to the callback so that the modification happens in one thread will not be seen in other threads?
There's nothing preventing the modification to the set from being seen by multiple threads. Only the change to the reference is atomic, not changes to whatever the reference refers to.
If example 2 does not guarantee atomicity, why would getAndUpdate allow us to write this code?
Because it can't stop you. The compiler isn't smart enough to know that example 2 is broken code.
To minimize the risk of accidentally / unsafely modifying shared state, make sure the things stored in an atomic reference are immutable, or at least unmodifiable:
AtomicReference<Set<String>> set = new AtomicReference<>(Collections.emptySet());
set.getAndUpdate(current -> {
    Set<String> updated = new HashSet<>();
    updated.add("test");
    return Collections.unmodifiableSet(updated);
});
// or, if you're on Java 9+, Set.of gives you an immutable set directly
set.getAndUpdate(current -> Set.of("test"));
Example 1 will guarantee atomicity since the modification happens on a new set. But how does it differ from [AtomicReference.set()]?
Your Example 1 uses AtomicReference.getAndUpdate which returns the previous value, and lets you generate the new value based on the previous value. If you don't need to know what the previous value was, you can just call AtomicReference.set.
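To illustrate the copy-on-write discipline recommended above, here is a hedged sketch (the names and sizes are invented): several threads add elements via getAndUpdate without ever mutating the current set, so no update is lost. Note the AtomicReference Javadoc's caveat that the update function may be re-applied when a compare-and-set fails under contention, so it must be side-effect-free.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class GetAndUpdateDemo {
    static int run() {
        AtomicReference<Set<String>> ref =
                new AtomicReference<>(Collections.emptySet());

        int threads = 4, updatesPerThread = 500;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    final String element = id + ":" + i;
                    // Copy-on-write: never mutate `current`. The function may
                    // be re-applied on contention, so it is side-effect-free.
                    ref.getAndUpdate(current -> {
                        Set<String> next = new HashSet<>(current);
                        next.add(element);
                        return Collections.unmodifiableSet(next);
                    });
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return ref.get().size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2000: no update was lost
    }
}
```

Had the lambda mutated `current` instead (as in example 2), concurrent threads would race on the same HashSet and elements could be lost or the structure corrupted.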
I stumbled upon the following piece of code:
public static final Map<String, Set<String>> fooCacheMap = new ConcurrentHashMap<>();
this cache is accessed from rest controller method:
public void fooMethod(String fooId) {
    Set<String> fooSet = fooCacheMap.computeIfAbsent(fooId, k -> new ConcurrentSet<>());
    // operations with fooSet
}
Is ConcurrentSet really necessary, when I know for sure that the set is accessed only in this method?
As you use it in a controller, multiple threads can call your method simultaneously (e.g. multiple parallel requests). As the method is not synchronized in any way, a ConcurrentSet is probably necessary here.
Is ConcurrentSet really necessary?
Possibly, possibly not. We don't know how this code is being used.
However, assuming that it is being used in a multithreaded way (specifically: that two threads can invoke fooMethod concurrently), yes.
The atomicity in ConcurrentHashMap is only guaranteed for each invocation of computeIfAbsent. Once this completes, the lock is released, and other threads are able to invoke the method. As such, access to the return value is not atomic, and so you can get thread interference when accessing that value.
In terms of the question "do I need ConcurrentSet?": no, you can arrange for all accesses to the set to be atomic:
cacheMap.compute(fooId, (k, fooSet) -> {
    if (fooSet == null) fooSet = new HashSet<>();
    // Operations with fooSet, all inside the atomic compute() call
    return fooSet;
});
Using a concurrent map will not guarantee thread safety. Additions to the Map need to be performed in a synchronized block to ensure that two threads don't attempt to add the same key to the map. Therefore, the concurrent map is not really needed, especially because the Map itself is static and final. Furthermore, if the code modifies the Set inside the Map, which appears likely, that needs to be synchronized as well.
The correct approach for the Map is to check for the key; if it does not exist, enter a synchronized block and check the key again. This ensures the key gets added exactly once without entering a synchronized block on every call.
Set modifications should typically occur in a synchronized block as well.
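A sketch of the check/lock/re-check pattern described above (the class and method names are invented). Note that for the lock-free first read to be safe, the map itself should be a concurrent one; and on a ConcurrentHashMap a single computeIfAbsent call already achieves the same effect atomically.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FooSetCache {
    // ConcurrentHashMap, so the first unsynchronized read is safe.
    private static final Map<String, Set<String>> cacheMap = new ConcurrentHashMap<>();

    static Set<String> getFooSet(String fooId) {
        Set<String> fooSet = cacheMap.get(fooId);   // fast path, no lock
        if (fooSet == null) {
            synchronized (cacheMap) {
                fooSet = cacheMap.get(fooId);       // re-check under the lock
                if (fooSet == null) {
                    fooSet = new HashSet<>();
                    cacheMap.put(fooId, fooSet);
                }
            }
        }
        return fooSet;
    }

    public static void main(String[] args) {
        // Both calls return the same Set instance: the value is created once.
        System.out.println(getFooSet("a") == getFooSet("a")); // true
    }
}
```

Mutations of the returned HashSet would still need their own synchronization, as the answer notes.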
I'm trying to understand warnings I found in the documentation on streams. I've gotten into the habit of using forEach() as a general-purpose iterator, and that's led me to writing this type of code:
public class FooCache {
    private static Map<Integer, Integer> sortOrderCache = new ConcurrentHashMap<>();
    private static Map<Integer, String> codeNameCache = new ConcurrentHashMap<>();

    public static void populateCache() {
        List<Foo> myThings = getThings();
        myThings.forEach(thing -> {
            sortOrderCache.put(thing.getId(), thing.getSortOrder());
            codeNameCache.put(thing.getId(), thing.getCodeName());
        });
    }
}
This is a trivialized example. I understand that this code violates Oracle's warning against stateful lambdas and side effects, but I don't understand why the warning exists.
When running this code it appears to behave as expected. So how do I break it, to demonstrate why it's a bad idea?
In short, I read this:
If executed in parallel, the non-thread-safety of ArrayList would
cause incorrect results, and adding needed synchronization would cause
contention, undermining the benefit of parallelism.
But can anyone add clarity to help me understand the warning?
From the Javadoc:
Note also that attempting to access mutable state from behavioral
parameters presents you with a bad choice with respect to safety and
performance; if you do not synchronize access to that state, you have
a data race and therefore your code is broken, but if you do
synchronize access to that state, you risk having contention undermine
the parallelism you are seeking to benefit from. The best approach is
to avoid stateful behavioral parameters to stream operations entirely;
there is usually a way to restructure the stream pipeline to avoid
statefulness.
The problem here is that if you access mutable state, you lose on two sides:
Safety, because you need synchronization, which streams try to minimize.
Performance, because the required synchronization costs you (in your example, if you use a ConcurrentHashMap, this has a cost).
Now, in your example, there are several points here:
If you want a multi-threaded stream, you need to use parallelStream(), as in myThings.parallelStream(); as it stands, the forEach method provided by java.util.Collection is a simple for-each loop.
You mutate a static map from the lambda. A plain HashMap is not thread-safe; for parallel use you need a ConcurrentHashMap (which your example already uses).
In the lambda, in the case of a Stream, you must not mutate the source of your stream:
myThings.stream().forEach(thing -> myThings.remove(thing));
This may appear to work (though I suspect it will throw a ConcurrentModificationException), but this will likely not work:
myThings.parallelStream().forEach(thing -> myThings.remove(thing));
That's because the ArrayList is not thread safe.
If you use a synchronized view (Collections.synchronizedList), you would take a performance hit, because you synchronize on each access.
In your example, you would rather use:
sortOrderCache = myThings.stream()
        .collect(Collectors.toMap(
                Thing::getId, Thing::getSortOrder));
codeNameCache = myThings.stream()
        .collect(Collectors.toMap(
                Thing::getId, Thing::getCodeName));
The collector does the work you were doing by hand, and it is parallel-ready: the stream may be split across several threads, the accumulator invoked in each of them, and the partial results then merged at the end.
By the way, you might eventually drop the codeNameCache/sortOrderCache pair and simply store an id -> Thing mapping.
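A sketch of that suggestion (the Thing class here is a made-up stand-in for the question's Foo): build the id -> Thing map in one pass with a collector, so no lambda ever mutates shared state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ThingCacheDemo {
    // Minimal stand-in for the question's element type.
    static class Thing {
        final int id;
        final int sortOrder;
        final String codeName;

        Thing(int id, int sortOrder, String codeName) {
            this.id = id;
            this.sortOrder = sortOrder;
            this.codeName = codeName;
        }

        int getId() { return id; }
        String getCodeName() { return codeName; }
    }

    static String run() {
        List<Thing> myThings = Arrays.asList(
                new Thing(1, 10, "alpha"),
                new Thing(2, 20, "beta"));

        // One pass, no shared mutable state: the collector builds the map.
        Map<Integer, Thing> byId = myThings.stream()
                .collect(Collectors.toMap(Thing::getId, Function.identity()));

        return byId.get(2).getCodeName();
    }

    public static void main(String[] args) {
        System.out.println(run()); // beta
    }
}
```

The same pipeline works unchanged with parallelStream(), because toMap handles the merging of per-thread partial maps itself.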
I believe the documentation is referring to the side effects demonstrated by the code below:
List<Integer> matched = new ArrayList<>();
List<Integer> elements = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    elements.add(i);
}
elements.parallelStream()
        .forEach(e -> {
            if (e >= 100) {
                matched.add(e);
            }
        });
System.out.println(matched.size());
This code streams through the list in parallel and tries to add matching elements to another list. As the resulting list is not synchronized, executing this code may produce a java.lang.ArrayIndexOutOfBoundsException, or silently lose elements.
The fix is to let the stream build and return a new list, e.g.:
List<Integer> elements = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    elements.add(i);
}
List<Integer> matched = elements.parallelStream()
        .filter(e -> e >= 100)
        .collect(Collectors.toList());
System.out.println(matched.size());
Side effects frequently make assumptions about state and context. In parallel you are not guaranteed a specific order in which you see the elements, and multiple threads may run at the same time. Unless you code for this, it can give very subtle bugs which are very hard to track down and fix when going parallel.
The code snippet below updates a not-thread-safe map (itemsById is not thread safe) from a parallel stream's forEach block.
// Update stuff in `itemsById` by iterating over all stuff in newItemsById:
newItemsById.entrySet()
        .parallelStream()
        .unordered()
        .filter(...)
        .forEach(entry -> {
            itemsById.put(entry.getKey(), entry.getValue()); // <-- look
        });
To me, this looks not thread-safe, because the parallel stream will invoke the forEach block in many threads at the same time, and thus call itemsById.put(..) from many threads at the same time, and itemsById isn't thread-safe. (With a ConcurrentMap, however, I think the code would be safe.)
I wrote to a colleague: "Please note that the map might allocate new memory when you insert new data. That's likely not thread safe, since the collection is not thread safe. -- Whether or not writing to different keys from many threads, is thread safe, is implementation dependent, I would think. It's nothing I would choose to rely on."
He however says that the above code is thread safe. -- Is it?
(Please note: I don't think this question is too localized. With Java 8, fairly many people will write something like parallelStream()...forEach(...), so it may be good for many people to know about the thread-safety issues.)
You're right: this code is not thread-safe, and depending on the Map implementation and the race conditions it may produce any random effect: a correct result, silent loss of data, some exception, or an endless loop. You can easily check it like this:
int equal = 0;
for (int i = 0; i < 100; i++) {
    // create test input map like {0 -> 0, 1 -> 1, 2 -> 2, ...}
    Map<Integer, Integer> input = IntStream.range(0, 200).boxed()
            .collect(Collectors.toMap(x -> x, x -> x));
    Map<Integer, Integer> result = new HashMap<>();
    // write it into another HashMap in a parallel way, without key collisions
    input.entrySet().parallelStream().unordered()
            .forEach(entry -> result.put(entry.getKey(), entry.getValue()));
    if (result.equals(input)) equal++;
}
System.out.println(equal);
On my machine this code usually prints something between 20 and 40 instead of 100. If I change HashMap to TreeMap, it usually fails with a NullPointerException or becomes stuck in an infinite loop inside the TreeMap implementation.
I'm no expert on streams, but I assume there is no fancy synchronization employed here, and thus I wouldn't consider adding elements to itemsById in parallel thread-safe.
One of the things that could happen is an endless loop: if two elements happen to end up in the same bucket, the underlying list might get corrupted, with elements referring to each other in a cycle (A.next = B, B.next = A). A ConcurrentHashMap prevents that by synchronizing write access per bucket: unless the elements end up in the same bucket it does not block, and if they do, the adds happen sequentially.
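For contrast, the same parallel write pattern as in the question is safe when the target is a ConcurrentHashMap. A small sketch (assuming distinct keys, as in the question):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentPutDemo {
    static boolean run() {
        // Test input map like {0 -> 0, 1 -> 1, ...}, as in the answer above.
        Map<Integer, Integer> input = IntStream.range(0, 200).boxed()
                .collect(Collectors.toMap(x -> x, x -> x));

        // Same parallel write pattern, but the target map is a
        // ConcurrentHashMap, so concurrent put() calls are safe.
        Map<Integer, Integer> result = new ConcurrentHashMap<>();
        input.entrySet().parallelStream().unordered()
                .forEach(e -> result.put(e.getKey(), e.getValue()));

        // The terminal operation happens-before this read, so all
        // writes are visible here.
        return result.equals(input);
    }

    public static void main(String[] args) {
        System.out.println(run()); // true, every run
    }
}
```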
This code is not thread-safe.
Oracle docs state:
Operations like forEach and peek are designed for side effects; a
lambda expression that returns void, such as one that invokes
System.out.println, can do nothing but have side effects. Even so, you
should use the forEach and peek operations with care; if you use one
of these operations with a parallel stream, then the Java runtime may
invoke the lambda expression that you specified as its parameter
concurrently from multiple threads.
I found the following code snippet in luaj, and I started to wonder whether changes made to the Map after construction might not be visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, its state as of the end of construction is visible to other threads, but what about changes that happen after that?
Some might also realize that this class is so not thread-safe that calling coerce in a multi-threaded environment might even cause infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
    static final Map COERCIONS = new HashMap(); // visible to all threads after construction, since it's final

    public static LuaValue coerce(Object paramObject) {
        ...;
        if (localCoercion == null) {
            localCoercion = ...;
            COERCIONS.put(localClass, localCoercion); // visible?
        }
        return ...;
    }
    ...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
This code is indeed bad and may cause many problems (probably not an infinite loop, which is more common with TreeMap; with HashMap you're more likely to get silent data loss due to overwrites, or possibly some random exception). And you're right: it's not guaranteed that changes made in one thread will be visible to another.
Here the problem may look minor, as this Map is used for caching: silent overwrites or lagging visibility don't cause real harm (at worst, two distinct coercion instances are used for the same class, which is probably fine in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
ConcurrentHashMap's pro is that there is no coarse locking. Its con is that compound sequences of operations are not atomic; e.g. an Iterator in one thread and a call to putAll in another will allow the iterator to see some, but not all, of the values being added.
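One way to cope with that non-atomicity of compound actions is to use the map's own atomic methods instead of separate check-then-act calls. A small sketch using merge(), which ConcurrentHashMap performs atomically per key (the map contents here are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class AtomicCompoundDemo {
    static int run() {
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        // A check-then-act like `if (!containsKey(k)) put(k, 1)` is two
        // separate operations and can interleave between threads;
        // merge() performs the read-modify-write as one atomic step.
        IntStream.range(0, 10_000).parallel()
                .forEach(i -> counts.merge("k", 1, Integer::sum));

        return counts.get("k");
    }

    public static void main(String[] args) {
        System.out.println(run()); // 10000: no increment lost
    }
}
```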