I have written the code below and want to test its concurrency.
Even though I have used ConcurrentHashMap, I am not confident that the code is thread-safe, because the get and the update/replace happen on different lines.
Can anyone suggest how to test this?
ConcurrentHashMap<String, Integer> somemap = new ConcurrentHashMap<>();

boolean some_method() {
    // 'value' and 'total_count' are fields defined elsewhere in the class
    somemap.putIfAbsent("somekey", value);
    int val = somemap.get("somekey");
    if (val < total_count) {
        val++;
    }
    somemap.replace("somekey", val);
    return true;
}
Can anyone suggest how to test this?
Testing for thread safety is not reliable. The reason is that the specification for a programming language or library may promise that if your multi-threaded program obeys certain rules, then you can expect it to behave in a certain way, but you will practically never find a promise that a program which breaks the rules will fail to meet your expectations.
I personally have seen a software update go through weeks of unit testing and integration testing, and then be shipped out into the field, where it ran for half a year before a piece of "thread unsafe" code caused it to fail at a customer site.
The only way to be sure that your multi-threaded program will work, is to prove that (A) it says what you think it says, and (B) it obeys the rules.
Your example obeys the rules, but I don't think it does what you think it does. It has a race condition: two threads could both call get("somekey") at the same time and both get the same value. Then they both could compute the same new value, and they both could replace(...) it back. Two calls to some_method(), but the end result is that the value only goes up by one.
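If the goal is an atomic bounded increment, one way to close that window is to do the whole read-modify-write inside a single compute call, which ConcurrentHashMap performs atomically per key. This is only a minimal sketch, reusing the question's hypothetical value and total_count fields:
// Sketch of an atomic alternative: no other thread can interleave between
// the read and the write, because both happen inside one compute call.
somemap.compute("somekey", (key, current) -> {
    int base = (current == null) ? value : current;   // seed the entry, like putIfAbsent did
    return (base < total_count) ? base + 1 : base;    // bounded increment
});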
public void bad() {
    final ConcurrentMap<String, Integer> chm = new ConcurrentHashMap<>();
    final String key = "1";
    chm.computeIfAbsent(key, __ -> {
        chm.remove(key);
        return 1;
    });
}
Of course I understand this looks very silly; it is just a super-simplified version of some problematic code I was dealing with. I understand it makes no sense to do this, but I am trying to understand the behaviour it causes.
When running this code you get stuck in an infinite loop at line 1107, after invoking the method at line 1096 of the JDK 8 ConcurrentHashMap source: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#l1096
I am finding it very difficult to understand exactly what is happening to cause this. The same behaviour occurs when the removal is done on a separate thread that we then wait for:
public void bad2() {
    final ConcurrentMap<String, Integer> chm = new ConcurrentHashMap<>();
    final String key = "1";
    Thread worker = new Thread(() -> chm.remove("1"));
    chm.computeIfAbsent(key, __ -> {
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return 1;
    });
}
Why, in both cases, doesn't chm.remove(key) simply complete normally (returning null), after which the value of 1 is set?
Interestingly, this was addressed at some point: the first example throws java.lang.IllegalStateException: Recursive update when I run it with Java 17.
This is called a 'contract error'. This happens when javadoc explicitly tells you NOT to do thing X, and leaves it at that; it does not specify what happens if you do X, just that you shouldn't do that. In this case, X is 'update the map in your passed-in function', and the javadoc explicitly spells out: DO NOT.
so the computation should be short and simple,
and must not attempt to update any other mappings of this map.
When you perform such a contract error, anything can happen. The spec is clear: Don't. So, if you do, the spec essentially claims the ability to do anything. Hard-crash, whistle a dixie tune from the speakers, you name it.
Hence why the behaviour changed. (Ordinarily, Java does not change its behaviours without quite a big ordeal about breaking compatibility, but here the spec behaviour has not changed: the spec merely says 'do not do this; this thing does not perform according to spec if you fail to heed this warning', and 'loop endlessly' and 'throw an exception' are both just 'doing unspecified broken stuff' in this regard.)
Okay, but why does it endlessly loop?
Because ConcurrentHashMap is 'smart' and uses a retry/CAS update model. Instead of acquiring a bunch of locks up front, it just tries the operation without them, but will then check during/afterwards whether it actually succeeded, or whether, due to other threads modifying the same map at the same time in the same general area, its write got overwritten or otherwise didn't apply, in which case it'll do it again. In this case, removing the key essentially eliminates a marker, which makes CHM think it updated a thing concurrently with another thing and therefore it should try again. Forever and ever.
That's what the 'cas' in line 1656 of your linked source file (casTabAt) stands for: Compare-And-Set. This is a concurrency primitive that can be a lot faster than locks: "If the current value is X, then set it to Y. Otherwise, do not set it at all. Tell me whether you set it or not" - all that, in one atomic operation, which is speedy because CPUs tend to support it as barebones machine code. No lock acquisition required. The general principle is to check what the current value is, do some bookkeeping, then set the new value, using CAS to ensure that the 'state' you checked is still the state you're in. If not, some other thread happened to also be updating stuff, so: start over.
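As an illustration of that general principle (not the actual ConcurrentHashMap code), here is a minimal sketch of the same check/compute/compare-and-set/retry loop using AtomicInteger:
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the CAS retry pattern described above: observe the current
// state, compute the new value, then compareAndSet; if another thread changed
// the value in the meantime, the CAS fails and we simply start over.
static int incrementWithCas(AtomicInteger counter) {
    int current;
    int next;
    do {
        current = counter.get();                      // observe the current value
        next = current + 1;                           // bookkeeping based on what we observed
    } while (!counter.compareAndSet(current, next));  // lost the race? try again
    return next;
}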
That's just one implementation. Tomorrow, it can change. You cannot rely on 'it will endlessly loop', because the spec does not guarantee it, and indeed, on JDK 17, you get an exception instead.
I'm reading a blog post and trying to understand what's going on.
This is the blogpost.
it has this code:
if (validation().hasErrors())
    throw new IllegalArgumentException(validation().errorMessage());
In the validation() method we have some object initialization and calculations, so let's say it's an expensive call. Is it going to be executed twice? Or will it be optimized by the compiler into something like this?
var validation = validation();
if (validation.hasErrors())
    throw new IllegalArgumentException(validation.errorMessage());
Thanks!
The validation method will be called twice, and it will do the same work each time. First, the method is relatively big, and so it won't get inlined. Without being inlined, the compiler doesn't know what it does. Therefore, it safely assumes that the method has side effects, and so it cannot optimize away the second call.
Even if the method was inlined, and the compiler could examine it, it would see that there are in fact side effects. Calling LocalDate.now() returns a different result each time. For this reason, the code that you linked to is defective, although it's not likely to experience a problem in practice.
It's safer to capture the validation result in a local variable not for performance reasons, but for stability reasons. Imagine the odd case in which the initial validation call fails, but the second call passes. You'd then throw an exception with no message in it.
The Java to Bytecode compiler has a limited set of optimization techniques (e.g. 9*9 in the condition would turn into 81).
The real optimization happens by the JIT (Just In Time) compiler. This compiler is the result of over a decade and a half of extensive research and there is no simple answer to tell what it is capable of in every scenario.
With that being said, as a good practice, I always handle repetitive identical method calls by storing their result before approaching any loop structure where that result is needed. Example:
int[] grades = new int[500];
int countOfGrades = grades.length; // store the length once instead of re-reading it on every iteration
for (int i = 0; i < countOfGrades; i++) {
    // Some code here
}
For your code (which is only run twice), you shouldn't worry as much about such optimization. But if you're looking for the ultimate, guaranteed, optimization at the cost of a fraction of space (which is cheap), then you're better off using a variable to store any identical method result that is needed more than once:
var validation = validation();
if (validation.hasErrors())
    throw new IllegalArgumentException(validation.errorMessage());
However, I must simply question ... "these days," does it even actually matter anymore? Simply write the source-code "in the most obvious manner available," as the original programmer certainly did.
"Microseconds" really don't matter anymore. But, "clarity still does." To me, the first version of the code is frankly more understandable than the second, and "that's what matters to me most." Please don't bother to try to "out-smart" the compiler, if it results in source-code that is in any way harder to understand.
I need to modify a local variable inside a lambda expression in a JButton's ActionListener and since I'm not able to modify it directly, I came across the AtomicInteger type.
I implemented it and it works just fine but I'm not sure if this is a good practice or if it is the correct way to solve this situation.
My code is the following:
newAnchorageButton.addActionListener(e -> {
    AtomicInteger anchored = new AtomicInteger();
    anchored.set(0);
    cbSets.forEach(cbSet ->
        cbSet.forEach(cb -> {
            if (cb.isSelected())
                anchored.incrementAndGet();
        })
    );
    // more code where I use the 'anchored' variable...
});
I'm not sure if this is the right way to solve this since I've read that AtomicInteger is used mostly for concurrency-related applications and this program is single-threaded, but at the same time I can't find another way to solve this.
I could simply use two nested for-loops to go over those arrays, but I'm trying to reduce the method's cognitive complexity as much as I can according to the SonarLint VS Code extension, and keeping those for-loops theoretically increases the method's complexity and therefore hurts its readability and maintainability.
Replacing the for-loops with lambda expressions reduces the cognitive complexity but maybe I shouldn't pay that much attention to it.
While it is safe enough in single-threaded code, it would be better to count them in a functional way, like this:
long anchored = cbSets.stream() // get a stream of the sets
.flatMap(List::stream) // flatten to list of cb's
.filter(JCheckBox::isSelected) // only selected ones
.count(); // count them
Instead of mutating an accumulator, we limit the flattened stream to only the ones we're interested in and ask for the count.
More generally, though, it is always possible to sum things up or generally aggregate the values without a mutable variable. Consider:
record Country(int population) { }

int totalPopulation = countries.stream()
        .mapToInt(Country::population)
        .reduce(0, Math::addExact);
Note: we never mutate any values; instead, we combine each successive value with the preceding one, producing a new value. One could use sum() but I prefer reduce(0, Math::addExact) to avoid the possibility of overflow.
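To make the overflow point concrete, here is a small illustration with made-up numbers, reusing the Country record above: two populations that together exceed Integer.MAX_VALUE wrap around silently with sum(), while reduce(0, Math::addExact) fails fast.
List<Country> countries = List.of(new Country(2_000_000_000), new Country(2_000_000_000));

int wrapped = countries.stream()
        .mapToInt(Country::population)
        .sum();                         // silently wraps around to a negative number

int checked = countries.stream()
        .mapToInt(Country::population)
        .reduce(0, Math::addExact);     // throws ArithmeticException: integer overflow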
and keeping those for-loops theoretically increases the method's complexity and therefore hurts its readability and maintainability.
This is obvious horsepuckey. x.forEach(foo -> bar) is not 'cognitively simpler' than for (var foo : x) bar; - you can map each AST node straight over from one to the other.
If a definition is being used to define complexity which concludes that one is significantly more complex than the other, then the only correct conclusion is that the definition is silly and should be fixed or abandoned.
To make it practical: Yes, introducing AtomicInteger, whilst performance wise it won't make one iota of difference, does make the code way more complicated. AtomicInteger's simple existence in the code suggests that concurrency is relevant here. It isn't, so you'd have to add a comment to explain why you're using it. Comments are evil. (They imply the code does not speak for itself, and they cannot be tested in any way). They are often the least evil, but evil they are nonetheless.
The general 'trick' for keeping lambda-based code cognitively easily followed is to embrace the pipeline:
You write some code that 'forms' a stream. This can be as simple as list.stream(), but sometimes you do some stream joining or flatmapping a collection of collections.
You have a pipeline of operations that operate on single elements in the stream and do not refer to the whole or to any neighbour.
At the end, you reduce (using collect, reduce, max - some terminator) such that the reducing method returns what you need.
The above model (and the other answer follows it precisely) tends to result in code that is as readable/complex as the 'old style' code, and rarely (but sometimes!) more readable and significantly less complicated. Deviate from it and the result is virtually always considerably more complicated - a clear loser.
Not all for loops in Java fit the above model. If it doesn't fit, then trying to force that particular square peg into the round hole will take a lot of effort and almost always results in code that is significantly worse: either an order of magnitude slower or considerably more cognitively complicated.
It also means that it is virtually never 'worth' rewriting perfectly fine readable non-stream based code into stream based code; at best it becomes a percentage point more readable according to some personal tastes, with no significant universally agreed upon improvement.
Turn off that silly linter rule. The fact that it considers the above 'less' complex, and that it evidently determines that for (var foo : x) bar; is 'more complicated' than x.forEach(foo -> bar) is proof enough that it's hurting way more than it is helping.
I have the following to add to the two other answers:
Your code calls two general good practices into question:
Lambdas shouldn't be longer than 3-4 lines
Except in some precise cases, lambdas of stream operations should be stateless.
For #1, when a lambda gets too long, consider extracting its code into a private method, for example.
You will probably gain in readability, and you will also probably gain by better separating UI code from business logic.
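For instance, in the code from the question that could look roughly like the following sketch (handleNewAnchorage is a hypothetical name, and the counting reuses the stream pipeline from the first answer):
// Hypothetical extraction: the listener becomes a method reference, and the
// body lives in a small private method with a descriptive name.
newAnchorageButton.addActionListener(this::handleNewAnchorage);

private void handleNewAnchorage(ActionEvent e) {
    long anchored = cbSets.stream()
            .flatMap(List::stream)
            .filter(JCheckBox::isSelected)
            .count();
    // ... more code where the 'anchored' variable is used ...
}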
For #2, you are probably not concerned since you are working in a single thread at the moment, but streams can be parallelized, and they may not always execute exactly as you think they do.
For that reason, it's always better to keep the code stateless in stream pipeline operations. Otherwise you might be surprised.
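A tiny illustration (not from the question) of the kind of surprise meant here: a stateful lambda that mutates an ArrayList works in a sequential stream, but silently loses elements, or throws, once the stream is made parallel, while the stateless collector version behaves the same either way.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Stateful lambda: ArrayList is not thread-safe, so a parallel stream may lose
// elements or throw ArrayIndexOutOfBoundsException.
List<Integer> unsafe = new ArrayList<>();
IntStream.range(0, 100_000).parallel().forEach(unsafe::add);
System.out.println(unsafe.size());   // frequently prints less than 100000

// Stateless pipeline: same result whether the stream is parallel or not.
List<Integer> safe = IntStream.range(0, 100_000).parallel()
        .boxed()
        .collect(Collectors.toList());
System.out.println(safe.size());     // always 100000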
More generally, streams are very good, very concise, but sometimes it's just better to do the same with good old loops.
Don't hesitate to come back to classic loops.
When Sonar tells you that the complexity is too high, what you should really do is refactor your code: split it into smaller methods, improve the model of your objects, etc.
I've got a fairly complicated project, which heavily uses Java's multithreading. In an answer to one of my previous questions I have described an ugly hack, which is supposed to overcome inherent inability to iterate over Java's ConcurrentHashMap in parallel. Although it works, I don't like ugly hacks, and I've had a lot of trouble trying to introduce proposed proof of concept in the real system. Trying to find an alternative solution I have encountered Scala's ParHashMap, which claims to implement a foreach method, which seems to operate in parallel. Before I start learning a new language to implement a single feature I'd like to ask the following:
1) Is foreach method of Scala's ParHashMap scalable?
2) Is it simple and straightforward to call Java's code from Scala and vice versa? I'll just remind that the code is concurrent and uses generics.
3) Is there going to be a performance penalty for switching a part of codebase to Scala?
For reference, this is my previous question about parallel iteration of ConcurrentHashMap:
Scalable way to access every element of ConcurrentHashMap<Element, Boolean> exactly once
EDIT
I have implemented the proof of concept, in probably very non-idiomatic Scala, but it works just fine. AFAIK it is IMPOSSIBLE to implement a corresponding solution in Java given the current state of its standard library and any available third-party libraries.
import scala.collection.parallel.mutable.ParHashMap

class Node(value: Int, id: Int) {
  var v = value
  var i = id
  override def toString(): String = v.toString
}

object testParHashMap {
  def visit(entry: Tuple2[Int, Node]) {
    entry._2.v += 1
  }

  def main(args: Array[String]) {
    val hm = new ParHashMap[Int, Node]()
    for (i <- 1 to 10) {
      var node = new Node(0, i)
      hm.put(node.i, node)
    }
    println("========== BEFORE ==========")
    hm.foreach{ println }
    hm.foreach{ visit }
    println("========== AFTER ==========")
    hm.foreach{ println }
  }
}
I come to this with some caveats:
Though I can do some things, I consider myself relatively new to Scala.
I have only read about but never used the par stuff described here.
I have never tried to accomplish what you are trying to accomplish.
If you still care what I have to say, read on.
First, here is an academic paper describing how the parallel collections work.
On to your questions.
1) When it comes to multi-threading, Scala makes life so much easier than Java. The abstractions are just awesome. The ParHashMap you get from a par call will distribute the work to multiple threads. I can't say how that will scale for you without a better understanding of your machine, configuration, and use case, but done right (particularly with regard to side effects) it will be at least as good as a Java implementation. However, you might also want to look at Akka to have more control over everything. It sounds like that might be more suitable to your use case than simply ParHashMap.
2) It is generally simple to convert between Java and Scala collections using JavaConverters and the asJava and asScala methods. I would suggest though making sure that the public API for your method calls "looks Java" since Java is the least common denominator. Besides, in this scenario, Scala is an implementation detail, and you never want to leak those anyway. So keep the abstraction at a Java level.
3) I would guess there will actually be a performance gain with Scala--at runtime. However, you will find much slower compile time (which can be worked around. ish). This Stack Overflow post by the author of Scala is old but still relevant.
Hope that helps. That's quite a problem you got there.
Since Scala compiles to the same bytecode as Java, doing the same in both languages is very well possible, no matter the task. There are however some things which are easier to solve in Scala, but if this is worth learning a new language is a different question. Especially since Java 8 will include exactly what you ask for: simple parallel execution of functions on lists.
But even now you can do this in Java, you just need to write what Scala already has on your own.
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
//...
// toArray(new Entry[0]) yields an array whose runtime type is Entry[];
// a plain toArray() returns Object[], which cannot be cast to Entry[].
final Entry<String, String>[] elements = myMap.entrySet().toArray(new Entry[0]);
final AtomicInteger index = new AtomicInteger(elements.length);
for (int i = Runtime.getRuntime().availableProcessors(); i > 0; --i) {
    executor.submit(new Runnable() {
        public void run() {
            int myIndex;
            while ((myIndex = index.decrementAndGet()) >= 0) {
                process(elements[myIndex]);
            }
        }
    });
}
The trick is to pull those elements into a temporary array, so threads can take out elements in a thread-safe way. Obviously doing some caching here instead of re-creating the Runnables and the array each time is encouraged, because the Runnable creation might already take longer than the actual task.
It is as well possible to instead copy the elements into a (reusable) LinkedBlockingQueue, then have the threads poll/take on it instead. This however adds more overhead and is only reasonable for tasks that require at least some calculation time.
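A sketch of that queue-based variant, reusing the hypothetical myMap and process(...) from the snippet above:
final ExecutorService executor =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

// Copy the entries into a queue; each worker drains it until poll() returns null.
final BlockingQueue<Entry<String, String>> queue =
        new LinkedBlockingQueue<>(myMap.entrySet());

for (int i = Runtime.getRuntime().availableProcessors(); i > 0; --i) {
    executor.submit(new Runnable() {
        public void run() {
            Entry<String, String> entry;
            while ((entry = queue.poll()) != null) {
                process(entry);
            }
        }
    });
}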
I don't know how Scala actually works, but given the fact that it needs to run on the same JVM, it will do something similar in the background, it just happens to be easily accessible in the standard library.
Class A
class A {
    public HashMap<Integer, Double> myHashMap;

    public A() {
        myHashMap = new HashMap<>();
    }
}
class B
class B {
    private A anInstanceOfA;

    public B(A a) {
        this.anInstanceOfA = a;
    }

    void aMethod() {
        anInstanceOfA.myHashMap.get(1); // getting the hashmap value for key = 1
        // proceed to use this value, but instead of storing it in a variable
        // I use anInstanceOfA.myHashMap.get(1) each time I need that value.
    }
}
In aMethod() I use anInstanceOfA.myHashMap.get(1) to get the value for key = 1. I do that multiple times in aMethod() and I'm wondering if there is any difference in efficiency between using anInstanceOfA.myHashMap.get(1) multiple times or just assigning it to a variable and using the assigned variable multiple times.
I.e.:
void aMethod() {
    Double theValue = anInstanceOfA.myHashMap.get(1);
    // proceed to use theValue in my calculations. Is there a difference in efficiency?
}
In theory the JVM can optimise away the difference so that it is very small (compared to what the rest of the program is doing). However, I prefer to make it a local variable, as I believe it makes the code clearer (I can give it a meaningful name).
I suggest you do what you believe is simpler and clearer, unless you have measured a performance difference.
The question seems to be that you want to know if it is more expensive to call get(1) multiple times instead of just once.
The answer to this is yes. The question is if it is enough to matter. The definitive answer is to ask the JVM by profiling. You can, however, guess by looking at the get method in your chosen implementation and consider if you want to do all that work every time.
Note that there is another reason you might want to put the value in a variable, namely that you can give it a telling name, making your program easier to maintain in the future.
This seems like a micro-optimization, that really doesn't make much difference in the scheme of things.
As #peter already suggested, 'optimizing' for style/readability is a better rationale for choosing the second option over the first one. Optimizing for speed only starts making sense if you really do a lot of calls, or if the call is very expensive -- both are probably not the case in your current example.
Put it in a local variable, for multiple reasons:
It will be much faster. Reading a local variable is definitely cheaper than a HashMap lookup, probably by a factor of 10-100x.
You can give the local variable a good, meaningful name
Your code will probably be shorter / simpler overall, particularly if you use the local variable many times.
You may get bugs during future maintenance if someone modifies one of the get calls but forgets to change the others. This is a problem whenever you are duplicating code. Using a local variable minimises this risk.
In concurrent situations, the value could theoretically change if the HashMap is modified by some other code. You normally want to get the value once and work with the same value. Although if you are running into problems of this nature you should probably be looking at other solutions first (locking, concurrent collections etc.)
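For that last point, a tiny sketch of "get the value once and work with the same value", using the question's map and two hypothetical helper methods (applyDiscount and logPrice):
// Read once, then work with a consistent snapshot of the value.
Double price = anInstanceOfA.myHashMap.get(1);
if (price != null) {
    applyDiscount(price);   // both uses see the same value,
    logPrice(price);        // even if the map entry changes in the meantime
}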