Let's say I have a CPU with 2 cores. If I run a background processing service with Executors.newFixedThreadPool(4), am I correct that:
during the lifetime of the executor service, a single thread could run on different cores;
even without synchronization, if thread A runs its code on core 1 and leaves in the CPU cache some value of, say, a shared singleton, and thread B then runs its code on the same core and reads the same memory location, B will get the value left by A from the core's L1 or L2 cache, whereas if thread B uses synchronization it will read the new (latest) value from main memory. In general, if a thread leaves the value of a private field of a shared object in a core's cache, another thread running on the same core could see that cached value.
If both points above are true, and the L2 cache is shared between all cores and holds a HashMap instance shared between threads (which add new values to the map), does that mean we can skip synchronization as long as we avoid non-atomic operations (if we only want to see correct/latest values in the map)? For example, would it be correct to have a HashMap and skip synchronization when reading existing values from it:
Example
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Launcher {
public static void main(String[] args) throws Exception {
final Stats stats = new Stats();
final Random key = new Random();
ExecutorService service = Executors.newFixedThreadPool(2);
service.submit(new Runnable() {
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
String keyValue = String.valueOf(key.nextInt(10));
int value = stats.inc(keyValue);
System.out.println("[A] Key " + keyValue + " was incremented to " + value);
try {
TimeUnit.MILLISECONDS.sleep(1500);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
});
service.submit(new Runnable() {
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
int[] values = new int[10];
for (int i = 0; i< 10; i++) {
values[i] = stats.get(String.valueOf(i));
}
System.out.println("[B] " + Arrays.toString(values));
try {
TimeUnit.MILLISECONDS.sleep(1500);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
});
}
static class Stats {
private final Map<String, Number> statistics = new HashMap<String, Number>();
public int inc(String key) {
if (!statistics.containsKey(key)) {
synchronized (statistics) {
statistics.put(key, new AtomicInteger(0));
}
}
return ((AtomicInteger) statistics.get(key)).getAndIncrement();
}
public int get(String key) {
if (!statistics.containsKey(key)) {
return 0;
}
return statistics.get(key).intValue();
}
}
}
Could you point me to some good documentation on the low-level management of multithreaded code in Java?
I really do understand that we should not rely on a specific architecture/CPU/etc. I'm just curious whether the probability of the points described above is greater than 0 :)
Thanks in advance.
You shouldn't make any assumptions about threads seeing values modified by other threads unless you synchronize on the access or make the variables volatile.
Any other behaviour is unreliable and subject to change.
Remember that Java runs on the JVM, not directly on your processor, and the JVM has license to make a LOT of optimisations to your running code. So while a lot of the hardware behaviour carries over, you cannot rely on it, especially since the exact same bytecode may be optimised differently as soon as you run on a different architecture or under different conditions.
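As a minimal sketch of a version of the question's `Stats` class that relies only on documented guarantees, not on cache behaviour (assuming Java 8+), a `ConcurrentHashMap` can hold the counters directly. `computeIfAbsent` creates each counter atomically, and readers need no lock. The class name `SafeStats` and the switch from `Map<String, Number>` to `AtomicInteger` values are choices made for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeStats {
    private final ConcurrentMap<String, AtomicInteger> statistics = new ConcurrentHashMap<>();

    // Atomically creates the counter on first use, then increments it.
    // Returns the value before the increment, like getAndIncrement() in the question.
    public int inc(String key) {
        return statistics.computeIfAbsent(key, k -> new AtomicInteger()).getAndIncrement();
    }

    // No lock needed: a get() that finds the counter sees it fully constructed,
    // and AtomicInteger.get() has volatile-read semantics.
    public int get(String key) {
        AtomicInteger counter = statistics.get(key);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeStats stats = new SafeStats();
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) {
                stats.inc(String.valueOf(i % 10));
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(stats.get("3")); // prints 2000
    }
}
```

The "check, then lock, then put" sequence from the question is replaced by a single atomic map operation, so there is no window in which two threads can both decide the key is missing.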
Related
I'm trying to simulate a non-thread-safe counter class by incrementing the count in ExecutorService tasks and using CountDownLatches to wait for all threads to start and then finish before reading the value in the main thread.
The issue is that when I run it, the System.out at the end always prints 10, the correct count. I was expecting to see some other value, since the 10 threads may see different values.
My code is below. Any idea what is happening here? I'm running it on Java 17 from IntelliJ IDEA.
Counter.java
public class Counter {
private int counter = 0;
public void incrementCounter() {
counter += 1;
}
public int getCounter() {
return counter;
}
}
Main.java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main {
public static void main(String[] args) throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(10);
CountDownLatch startSignal = new CountDownLatch(10);
CountDownLatch doneSignal = new CountDownLatch(10);
Counter counter = new Counter();
for (int i=0; i<10; i++) {
executorService.submit(() -> {
try {
startSignal.countDown();
startSignal.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
counter.incrementCounter();
doneSignal.countDown();
});
}
doneSignal.await();
System.out.println("Finished: " + counter.getCounter());
executorService.shutdownNow();
}
}
It's worth remembering that just because something isn't synchronised correctly, it could still perform correctly under some circumstances, it just isn't guaranteed to do so in every situation, on every JVM, on every hardware.
In other words, there is no reverse guarantee: optimisers, for example, are free to decide that your code can be replaced, at little to no cost, with a correctly synchronised implementation.
(Whether that is what's actually happening here isn't obvious to me at first glance.)
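One plausible factor, offered as an assumption rather than a diagnosis: each task in the question increments only once, so the window for two threads to interleave inside `counter += 1` is tiny. The sketch below widens that window with many increments per task and runs an `AtomicInteger` alongside for comparison. The class name and the 100,000 figure are arbitrary choices. Even here the plain result is not guaranteed to be wrong, since the JIT may compile the loop quite differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdates {
    static int plainCounter = 0;                                     // unsynchronized, like Counter
    static final AtomicInteger atomicCounter = new AtomicInteger(); // thread-safe alternative

    public static void main(String[] args) throws InterruptedException {
        int threads = 10;
        int incrementsPerThread = 100_000; // many increments per task widen the race window
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startSignal = new CountDownLatch(threads);
        CountDownLatch doneSignal = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    startSignal.countDown();
                    startSignal.await();
                    for (int j = 0; j < incrementsPerThread; j++) {
                        plainCounter++;                  // read-modify-write: updates can be lost
                        atomicCounter.incrementAndGet(); // atomic: no update is lost
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    doneSignal.countDown(); // always signal, so main never hangs
                }
            });
        }
        doneSignal.await(); // countDown() happens-before await() returns
        pool.shutdown();
        System.out.println("plain:  " + plainCounter);        // often below 1000000, not guaranteed
        System.out.println("atomic: " + atomicCounter.get()); // always 1000000
    }
}
```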
This question is NOT about how to use a ThreadLocal. My question is about the side effect of ForkJoinPool's continuation of ForkJoinTask.compute(), which breaks the ThreadLocal contract.
In a ForkJoinTask.compute(), I pull an arbitrary static ThreadLocal.
The value is some arbitrary stateful object but not stateful beyond the end of the compute() call. In other words, I prepare the threadlocal object/state, use it, then dispose.
In principle you would put that state in the ForkJoinTask, but just assume this thread-local value lives in a third-party library I cannot change. Hence the static ThreadLocal, as it is a resource that all task instances share.
I anticipated, tested and proved that a simple ThreadLocal gets initialized only once per thread, of course. This means that, due to thread continuation beneath the ForkJoinTask.join() call, my compute() method can be called again before it has even exited. This exposes the state of the object being used by the previous compute() call, many stack frames higher.
How do you solve that undesirable exposure issue?
The only way I currently see is to ensure new threads for every compute() call, but that defeats the F/J pool continuation and could dangerously explode the thread count.
Isn't there something in the JRE core to back up the ThreadLocals changed since the first ForkJoinTask and restore the entire thread-local map, as if every task.compute() were the first to run on the thread?
Thanks.
package jdk8tests;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;
public class TestForkJoin3 {
static AtomicInteger nextId = new AtomicInteger();
static long T0 = System.currentTimeMillis();
static int NTHREADS = 5;
static final ThreadLocal<StringBuilder> myTL = ThreadLocal.withInitial( () -> new StringBuilder());
static void log(Object msg) {
System.out.format("%09.3f %-10s %s%n", 0.001*(System.currentTimeMillis()-T0), Thread.currentThread().getName(), " : "+msg);
}
public static void main(String[] args) throws Exception {
ForkJoinPool p = new ForkJoinPool(
NTHREADS,
pool -> {
int id = nextId.incrementAndGet(); //count new threads
log("new FJ thread "+ id);
ForkJoinWorkerThread t = new ForkJoinWorkerThread(pool) {/**/};
t.setName("My FJThread "+id);
return t;
},
Thread.getDefaultUncaughtExceptionHandler(),
false
);
LowercasingTask t = new LowercasingTask("ROOT", 3);
p.invoke(t);
int nt = nextId.get();
log("number of threads was "+nt);
if(nt > NTHREADS)
log(">>>>>>> more threads than prescribed <<<<<<<<");
}
//=====================
static class LowercasingTask extends RecursiveTask<String> {
String name;
int level;
public LowercasingTask(String name, int level) {
this.name = name;
this.level = level;
}
@Override
protected String compute() {
StringBuilder sbtl = myTL.get();
String initialValue = sbtl.toString();
if(!initialValue.equals(""))
log("!!!!!! BROKEN ON START!!!!!!! value = "+ initialValue);
sbtl.append(":START");
if(level>0) {
log(name+": compute level "+level);
try {Thread.sleep(10);} catch (InterruptedException e) {e.printStackTrace();}
List<LowercasingTask> tasks = new ArrayList<>();
for(int i=1; i<=9; i++) {
LowercasingTask lt = new LowercasingTask(name+"."+i, level-1);
tasks.add(lt);
lt.fork();
}
for(int i=0; i<tasks.size(); i++) { //this can lead to compensation threads due to l1.join() method running lifo task lN
//for(int i=tasks.size()-1; i>=0; i--) { //this usually has the lN.join() method running task lN, without compensation threads.
tasks.get(i).join();
}
log(name+": returning from joins");
}
sbtl.append(":END");
String val = sbtl.toString();
if(!val.equals(":START:END"))
log("!!!!!! BROKEN AT END !!!!!!! value = "+val);
sbtl.setLength(0);
return "done";
}
}
}
I don't believe so. Not in general, and especially not for ForkJoinTask, where tasks are expected to be pure functions on isolated objects.
Sometimes it is possible to reorder the task so that it forks and joins its subtasks at the beginning, before doing its own work. That way each subtask initializes and disposes of the thread-local before returning. If that is not possible, maybe you can treat the thread-local as a stack and push, clear and restore the value around each join.
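A minimal sketch of the push/clear/restore idea, assuming the third-party state is reachable through a `ThreadLocal` that the task code may `get`, `set` and `remove`. A `StringBuilder` stands in for that state, as in the question. `joinIsolated`, `STATE`, `BROKEN` and `runTree` are hypothetical names for this illustration and are not part of the ForkJoin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalIsolation {
    // Stand-in for the third-party static ThreadLocal from the question.
    static final ThreadLocal<StringBuilder> STATE = ThreadLocal.withInitial(StringBuilder::new);
    // Counts how often a task observed state left over by another task.
    static final AtomicInteger BROKEN = new AtomicInteger();

    // Push/clear/restore around join(): tasks that this thread runs while
    // "waiting" in join() start from a fresh value instead of the caller's.
    static <T> T joinIsolated(ForkJoinTask<T> task) {
        StringBuilder saved = STATE.get();
        STATE.remove(); // the next STATE.get() on this thread creates a new StringBuilder
        try {
            return task.join();
        } finally {
            STATE.set(saved); // give the caller its own state back
        }
    }

    static class Task extends RecursiveTask<Integer> {
        private final int level;

        Task(int level) {
            this.level = level;
        }

        @Override
        protected Integer compute() {
            StringBuilder sb = STATE.get();
            if (sb.length() != 0) BROKEN.incrementAndGet(); // state leaked in from another task
            sb.append(":START");
            int count = 1; // this task
            if (level > 0) {
                List<Task> subtasks = new ArrayList<>();
                for (int i = 0; i < 9; i++) {
                    Task t = new Task(level - 1);
                    subtasks.add(t);
                    t.fork();
                }
                for (Task t : subtasks) {
                    count += joinIsolated(t); // instead of t.join()
                }
            }
            sb.append(":END");
            if (!sb.toString().equals(":START:END")) BROKEN.incrementAndGet();
            sb.setLength(0); // dispose of the state, as in the question
            return count;
        }
    }

    // Runs a tree of tasks and returns how many tasks were executed.
    static int runTree(int parallelism, int depth) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new Task(depth));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int tasks = runTree(5, 3);
        System.out.println("tasks: " + tasks + ", broken: " + BROKEN.get());
    }
}
```

The cost is one extra `ThreadLocal` object per nested join on a thread. This only works if the state can be swapped out wholesale; if the library keeps additional hidden per-thread data, this sketch would not cover it.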
This is a Java concurrency question. 10 jobs need to be done, and each of them has 32 worker threads. Each worker thread increments a counter. Once the counter reaches 32, the job is done and its entry is removed from the counter map. From the console output, I expect "done" to be printed 10 times, the pool size to be 0 and the countThreadMap size to be 0.
The issues are:
Most of the time, "pool size: 0 and countThreadMap size:3" is printed out. Even though all threads are gone, 3 jobs are not finished yet.
Sometimes I see a NullPointerException at line 27. I have used ConcurrentHashMap and AtomicLong, so why do I still get concurrency exceptions?
Thanks
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;
public class Test {
final ConcurrentHashMap<Long, AtomicLong[]> countThreadMap = new ConcurrentHashMap<Long, AtomicLong[]>();
final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
final ThreadPoolExecutor tPoolExecutor = ((ThreadPoolExecutor) cachedThreadPool);
public void doJob(final Long batchIterationTime) {
for (int i = 0; i < 32; i++) {
Thread workerThread = new Thread(new Runnable() {
@Override
public void run() {
if (countThreadMap.get(batchIterationTime) == null) {
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis()); //start up time
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
System.out.println("done");
countThreadMap.remove(batchIterationTime);
}
}
});
tPoolExecutor.execute(workerThread);
}
}
public void report(){
while(tPoolExecutor.getActiveCount() != 0){
//
}
System.out.println("pool size: "+ tPoolExecutor.getActiveCount() + " and countThreadMap size:"+countThreadMap.size());
}
public static void main(String[] args) throws Exception {
Test test = new Test();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
test.doJob(batchIterationTime);
}
test.report();
System.out.println("All Jobs are done");
}
}
Let’s dig through all the thread-related programming mistakes one can make:
Thread workerThread = new Thread(new Runnable() {
…
tPoolExecutor.execute(workerThread);
You create a Thread, but instead of starting it, you submit it to an executor. It’s a historical mistake of the Java API to let Thread implement Runnable for no good reason. By now, every developer should be aware that there is no reason to treat a Thread as a Runnable. If you don’t want to start a thread manually, don’t create a Thread. Just create the Runnable and pass it to execute or submit.
I want to emphasize the latter, as it returns a Future, which gives you for free what you are attempting to implement: the information about when a task has finished. It’s even easier with invokeAll, which submits a collection of Callables and returns when all of them are done. Since you didn’t tell us anything about your actual task, it’s not clear whether your tasks can simply implement Callable (returning null) instead of Runnable.
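A minimal sketch of the invokeAll variant, assuming the real work can be wrapped in a `Callable<Void>` that returns null. The work itself is a placeholder here, and `InvokeAllDemo`/`runBatch` are hypothetical names. Note that this waits for each batch before submitting the next one; it does not submit all ten batches at once as the question does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {
    // Runs one batch of 32 tasks and returns only after all of them have completed.
    static int runBatch(ExecutorService pool) throws InterruptedException, ExecutionException {
        List<Callable<Void>> jobs = new ArrayList<>(32);
        for (int i = 0; i < 32; i++) {
            jobs.add(() -> {
                // placeholder: the real work of one worker would go here
                return null; // a Callable<Void> still has to return a value
            });
        }
        int completed = 0;
        for (Future<Void> f : pool.invokeAll(jobs)) { // blocks until every task is done
            f.get(); // rethrows a task's exception, wrapped in ExecutionException
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int batch = 0; batch < 10; batch++) {
            System.out.println("batch " + batch + ": " + runBatch(pool) + " tasks done");
        }
        pool.shutdown();
    }
}
```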
If you can’t use Callables or don’t want to wait immediately on submission, you have to remember the returned Futures and query them at a later time:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
public class Test {
static final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
public static List<Future<?>> doJob(final Long batchIterationTime) {
final Random r=new Random();
List<Future<?>> list=new ArrayList<>(32);
for (int i = 0; i < 32; i++) {
Runnable job=new Runnable() {
public void run() {
// pretend to do something
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(r.nextInt(10)));
}
};
list.add(cachedThreadPool.submit(job));
}
return list;
}
public static void main(String[] args) throws Exception {
Map<Long,List<Future<?>>> map=new HashMap<>();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
// make the keys unique: several batches may start within the same millisecond
while(map.containsKey(batchIterationTime))
batchIterationTime++;
map.put(batchIterationTime,doJob(batchIterationTime));
}
// print some statistics, if you really need
int overAllDone=0, overallPending=0;
for(Map.Entry<Long,List<Future<?>>> e: map.entrySet()) {
int done=0, pending=0;
for(Future<?> f: e.getValue()) {
if(f.isDone()) done++;
else pending++;
}
System.out.println(e.getKey()+"\t"+done+" done, "+pending+" pending");
overAllDone+=done;
overallPending+=pending;
}
System.out.println("Total\t"+overAllDone+" done, "+overallPending+" pending");
// wait for the completion of all jobs
for(List<Future<?>> l: map.values())
for(Future<?> f: l)
f.get();
System.out.println("All Jobs are done");
}
}
But note that if you don’t need the ExecutorService for subsequent tasks, it’s much easier to wait for all jobs to complete:
cachedThreadPool.shutdown();
cachedThreadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("All Jobs are done");
But regardless of how unnecessary the manual tracking of the job status is, let’s delve into your attempt, so you may avoid the mistakes in the future:
if (countThreadMap.get(batchIterationTime) == null) {
The ConcurrentMap is thread-safe, but this does not turn your concurrent code into sequential code (that would render multi-threading useless). The above line might be executed by up to all 32 threads at the same time, all finding that the key does not exist yet, so more than one thread may then put an initial value into the map.
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis());
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
That’s why this is called the “check-then-act” anti-pattern. If more than one thread executes this code, they will all put their new value, confident that this is the right thing because they checked the condition before acting. But for all threads except one, the condition has changed by the time they act, and they overwrite the value of a previous put operation.
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
Since you are modifying the AtomicLong that is already stored in the map, the put operation is useless: it puts back the very array it retrieved before. If it weren’t for the mistake described above (multiple initial values), the put operation would have no effect.
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
Again, using a ConcurrentMap doesn’t turn multi-threaded code into sequential code. While it is clear that only the last thread will update the atomic counter to 32 (when the initial race condition doesn’t materialize), it is not guaranteed that all other threads have already passed this if statement. Therefore more than one, up to all, of the threads can still be at this point of execution and see the value 32. Or…
System.out.println("done");
countThreadMap.remove(batchIterationTime);
One of the threads that have seen the value 32 might execute this remove operation. At that point, there might still be threads that haven’t executed the above if statement yet. Instead of seeing the value 32, they produce a NullPointerException, because the array that is supposed to contain the AtomicLong is no longer in the map. This is what happens, occasionally…
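If the manual counting is still wanted, a minimal sketch that avoids both races could look like this, assuming Java 8+. It replaces the `AtomicLong[]` with a single `AtomicLong` (the start-up time slot is dropped for brevity), uses distinct job ids instead of `System.currentTimeMillis()`, and submits Runnables rather than Thread objects. `BatchCounter`, `workerDone` and `runAll` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class BatchCounter {
    static final int WORKERS_PER_JOB = 32;
    final ConcurrentMap<Long, AtomicLong> countThreadMap = new ConcurrentHashMap<>();
    final AtomicInteger finishedJobs = new AtomicInteger();

    // Called by each worker when it has finished its part of a job.
    void workerDone(long jobId) {
        // computeIfAbsent is atomic: exactly one counter per job, no check-then-act race.
        long count = countThreadMap.computeIfAbsent(jobId, id -> new AtomicLong()).incrementAndGet();
        // Decide on the value returned by incrementAndGet, not on a second map lookup:
        // exactly one worker sees WORKERS_PER_JOB, and only that worker removes the entry.
        if (count == WORKERS_PER_JOB) {
            countThreadMap.remove(jobId);
            finishedJobs.incrementAndGet();
            System.out.println("done " + jobId);
        }
    }

    static BatchCounter runAll() throws InterruptedException {
        BatchCounter counter = new BatchCounter();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (long job = 0; job < 10; job++) {
            final long jobId = job; // distinct ids, unlike currentTimeMillis() in a tight loop
            for (int i = 0; i < WORKERS_PER_JOB; i++) {
                pool.execute(() -> counter.workerDone(jobId)); // a Runnable, not a Thread
            }
        }
        pool.shutdown();                              // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);   // wait instead of busy-looping on getActiveCount()
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchCounter counter = runAll();
        System.out.println("finished: " + counter.finishedJobs.get()
                + ", map size: " + counter.countThreadMap.size()); // finished: 10, map size: 0
    }
}
```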
After creating your 10 jobs, your main thread keeps running: it doesn't wait for your jobs to complete before it calls report() on the test. You try to overcome this with the while loop, but tPoolExecutor.getActiveCount() may come out as 0 before the worker tasks have executed, so countThreadMap.size() is evaluated while entries added by your jobs are still in the map.
There are a number of ways to fix this - but I will let another answer-er do that because I have to leave at the moment.
I've read through the API documentation of the java.util.concurrent.atomic package, but have obviously misunderstood something. The overview says
A small toolkit of classes that support lock-free thread-safe programming on single variables.
However, a small test application shows that the AtomicInteger class does not provide thread-safety, at least when it is shared across threads (I accept that the getAndSet / increment methods themselves are at least atomic)
Test:
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
public class AtomicIntTest
{
public static void main(String[] args) throws InterruptedException
{
AtomicInteger atomicInt = new AtomicInteger(0);
WorkerThread w1 = new WorkerThread(atomicInt);
WorkerThread w2 = new WorkerThread(atomicInt);
w1.start();
w2.start();
w2.join(); // <-- As pointed out by StuartLC and BarrySW19, this should be w1.join(). This typo allows the program to produce variable results because it does not correctly wait for *both* threads to finish before outputting a result.
w2.join();
System.out.println("Final value: " + atomicInt.get());
}
public static class WorkerThread extends Thread
{
private AtomicInteger atomicInt = null;
private Random random = new Random();
public WorkerThread(AtomicInteger atomicInt)
{
this.atomicInt = atomicInt;
}
@Override
public void run()
{
for (int i = 0; i < 500; i++)
{
this.atomicInt.incrementAndGet();
try
{
Thread.sleep(this.random.nextInt(50));
}
catch(InterruptedException e)
{
e.printStackTrace();
}
}
}
}
}
When I run this class, I consistently get results ranging from around 950 to 1000, when I would expect to always see exactly 1000.
Can you explain why do I not get consistent results when two threads access this shared AtomicInteger variable? Have I misunderstood the thread-safety guarantee?
Looks like a simple cut-and-paste error: you are joining thread "w2" twice and never "w1". As it stands, you would expect thread "w1" to still be running about half the time when you print the 'final' value.
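For comparison, here is a minimal corrected version that joins both workers before reading the value. The random sleeps are dropped only to keep it fast; they do not affect correctness. The class name `AtomicIntJoinFixed` is chosen for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIntJoinFixed {
    static int run() throws InterruptedException {
        AtomicInteger atomicInt = new AtomicInteger(0);
        Runnable work = () -> {
            for (int i = 0; i < 500; i++) {
                atomicInt.incrementAndGet(); // atomic, so no increment is lost
            }
        };
        Thread w1 = new Thread(work);
        Thread w2 = new Thread(work);
        w1.start();
        w2.start();
        w1.join(); // wait for BOTH workers before reading the result
        w2.join();
        return atomicInt.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Final value: " + run()); // prints "Final value: 1000"
    }
}
```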
I just wrote a producer-consumer demo in Scala and Java. The demo shows that Scala's performance is very poor. Is my code wrong?
Java AVG:1933534.1171935236
Scala AVG:103943.7312328648
The Scala code:
import scala.actors.Actor.actor
import scala.actors.Actor.loop
import scala.actors.Actor.react
import scala.concurrent.ops.spawn
object EventScala {
case class Event(index: Int)
def test() {
val consumer = actor {
var count = 0l
val start = System.currentTimeMillis()
loop {
react {
case Event(c) => count += 1
case "End" =>
val end = System.currentTimeMillis()
println("Scala AVG:" + count * 1000.0 / (end - start))
exit()
}
}
}
var running = true;
for (i <- 0 to 1) {
{
spawn {
while (running) {
consumer ! Event(0)
}
consumer!"End"
}
}
}
Thread.sleep(5000)
running = false
}
def main(args: Array[String]): Unit = {
test
}
}
The Java code:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
public class EventJava {
static BlockingQueue<Event> queue = new LinkedBlockingQueue<EventJava.Event>();
static volatile boolean running = true;
static volatile Event sentinel = new Event(0);
static class Event {
final int index;
public Event(int index) {
this.index = index;
}
}
static class Consumer implements Runnable {
@Override
public void run() {
long count = 0;
long start = System.currentTimeMillis();
while (true) {
try {
Event event = queue.take();
if (event == sentinel) {
long end = System.currentTimeMillis();
System.out.println("Java AVG:" + count * 1000.0
/ (end - start));
break;
}
count++;
} catch (InterruptedException e) {
}
}
}
}
static class Producer implements Runnable {
@Override
public void run() {
while (running) {
queue.add(new Event(1));
}
queue.add(sentinel);
}
}
static void test() throws InterruptedException {
ExecutorService pool = Executors.newCachedThreadPool();
pool.submit(new Consumer());
pool.execute(new Producer());
pool.execute(new Producer());
Thread.sleep(5000);
running = false;
pool.shutdown();
}
public static void main(String[] args) throws InterruptedException {
test();
}
}
You are testing two very different codes. Let's consider Java, for instance:
while (true) {
Where's the opportunity for the other "actors" to take over the thread and do some processing of their own? This "actor" is pretty much hogging the thread. If you create 100000 of them, you'll see the JVM get crushed under the weight of the competing "actors", or see some get all the processing time while others languish.
Event event = queue.take();
if (event == sentinel) {
Why are you taking the event out of the queue without checking whether it can be processed? If it couldn't be processed, you'd lose the event. If you added it back to the queue, it would end up after other events sent by the same source.
These are just two things that the Scala code does and the Java one doesn't.
Overall, this is a very unscientific test. No warmup. A low number of iterations. Very, very unscientific. Look at Google Caliper or similar tools for ideas on writing better micro-benchmarks.
Once your numbers are clear, compile the Scala code and then decompile it into Java. The answer may jump out.
I think in your case it may be the configuration of the actors. Also try Akka.
I have a machine with 4 processors. If I run your java code, I get full processor usage on a single processor (25%). That is, you're using a single thread.
If I run your scala code I get full usage of all processors, I'm getting four threads.
So I suspect that two things are happening: you're getting contention updating count, and/or count isn't being incremented correctly.
Also, the test that you're doing in the loop is a pattern match in Scala but a simple equality check in Java, though I suspect this is a minor part.
Actors are meant for small messages that result in meaningful computations, not for element-by-element data processing as above.
Your Actor code is really more comparable to an ExecutorService with multiple threads, where each message represents a new Runnable/Callable being submitted, rather than what you have in your Java code.
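To make that comparison concrete, here is a rough Java sketch in which every message becomes its own Runnable on a single-threaded executor, loosely mimicking one actor processing its mailbox one message at a time. It is only an analogy, not how scala.actors is implemented, and `TaskPerMessage`/`deliver` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskPerMessage {
    // Only ever touched by the executor's single worker thread (after the initial reset).
    private static long count = 0;

    // Each "message" is submitted as its own task, instead of one long-running
    // consumer loop draining a queue as in the Java code above.
    static long deliver(int messages) throws InterruptedException, ExecutionException {
        ExecutorService mailbox = Executors.newSingleThreadExecutor(); // one "actor"
        count = 0;
        for (int i = 0; i < messages; i++) {
            mailbox.execute(() -> count++);
        }
        // Tasks run in FIFO order on one thread, so this last task sees every increment;
        // Future.get() then makes the value visible to the caller.
        Future<Long> result = mailbox.submit(() -> count);
        long n = result.get();
        mailbox.shutdown();
        return n;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        long n = deliver(1_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " messages in " + millis + " ms");
    }
}
```

Timing this against a queue-draining loop gives a closer analogue to what the two original programs measure, though the same warmup and iteration caveats apply.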
Your benchmark is really comparing "how fast a worker thread can consume an item from a queue" vs. "how fast can scala send a message to a mailbox, notify and schedule the actor, and handle the message". It's just not the same thing, and it's not fit for the same purpose.
Regardless, Scala can use Java threads too. Scala just gives you an additional (safer, simpler, and communications-based) concurrency mechanism.
loop and react both throw exceptions for the purpose of flow control. This means that there are two tasks given to the thread pool, only one of which does actual work. Exceptions are also much more expensive than regular returns, even when the JVM successfully optimizes them down to longjmps.