Threads run in serial not parallel - java

I am trying to learn concurrency in Java, but whatever I do, 2 threads run in serial, not parallel, so I am not able to replicate common concurrency issues explained in tutorials (like thread interference and memory consistency errors). Sample code:
public class Synchronization {
static int v;
public static void main(String[] args) {
Runnable r0 = () -> {
for (int i = 0; i < 10; i++) {
Synchronization.v++;
System.out.println(v);
}
};
Runnable r1 = () -> {
for (int i = 0; i < 10; i++) {
Synchronization.v--;
System.out.println(v);
}
};
Thread t0 = new Thread(r0);
Thread t1 = new Thread(r1);
t0.start();
t1.start();
}
}
This always gives me a result starting from 1 and ending with 0 (whatever the loop length is). For example, the code above gives me this every time:
1
2
3
4
5
6
7
8
9
10
9
8
7
6
5
4
3
2
1
0
Sometimes, the second thread starts first and the results are the same but negative, so it is still running in serial.
Tried in both IntelliJ and Eclipse with identical results. CPU has 2 cores if it matters.
UPDATE: it finally became reproducible with huge loops (starting from 1_000_000), though still not every time and with only a small final discrepancy. Making the operations in the loops "heavier", such as printing the thread name, also seems to make it more reproducible. Manually adding a sleep to the thread works too, but it makes the experiment less clean, so to speak. The reason doesn't seem to be that the first loop finishes before the second starts, because I see both loops printing to the console while they keep running, and I still get 0 at the end. It looks more like a thread race for the same variable. I will dig deeper into that, thanks.

It seems that the first thread to start never gives the second one a chance in the race for the variable, or the second one simply doesn't have time to start before the first finishes (I couldn't say for sure), so the second thread will almost* always wait until the first loop has finished.
Some heavy operation in the loop will mix the results:
TimeUnit.MILLISECONDS.sleep(100);
*This is not always true, but you were lucky in your tests.

Starting a thread is a heavyweight operation, meaning that it takes some time to perform. Because of that, by the time you start the second thread, the first has already finished.
The reason it sometimes runs in "reverse order" is how the thread scheduler works. The specification gives no guarantees about thread execution order, so it is possible for the second thread to run (and finish) first.
Increase the iteration count to something meaningful, like 10000, and see what happens then.

This is called lucky timing as per Brian Goetz (Author of Java Concurrency In Practice). Since there is no synchronization to the static variable v it is clear that this class is not thread-safe.
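To make the race easier to observe without relying on lucky timing, one option is to release both threads at the same moment and use a large iteration count. The following is only a minimal sketch: the class name RaceDemo, the latch, and the side-by-side AtomicInteger are assumptions added for illustration, not part of the original code. The plain int may end up non-zero, while the atomic counter should always return to 0.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class RaceDemo {
    static int plain;                                        // unsynchronized, like v in the question
    static final AtomicInteger atomic = new AtomicInteger(); // thread-safe counterpart

    // Runs one incrementing and one decrementing thread; returns {plain, atomic} totals.
    static int[] run(int iterations) {
        plain = 0;
        atomic.set(0);
        CountDownLatch go = new CountDownLatch(1);           // releases both threads together
        Thread up = new Thread(() -> {
            await(go);
            for (int i = 0; i < iterations; i++) { plain++; atomic.incrementAndGet(); }
        });
        Thread down = new Thread(() -> {
            await(go);
            for (int i = 0; i < iterations; i++) { plain--; atomic.decrementAndGet(); }
        });
        up.start();
        down.start();
        go.countDown();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { plain, atomic.get() };
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] totals = run(1_000_000);
        System.out.println("plain counter:  " + totals[0] + " (may be non-zero)");
        System.out.println("atomic counter: " + totals[1] + " (expected to be 0)");
    }
}
```

Printing inside the loops is left out on purpose, since console output is slow and tends to hide the interleaving.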

Related

why the starvation is caused by notifyAll()?

Here is my code,
class Shared {
private static int index = 0;
public synchronized void printThread() {
try {
while(true) {
Thread.sleep(1000);
System.out.println(Thread.currentThread().getName() + ": " + index++);
notifyAll();
// notify();
wait();
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
class Example13 implements Runnable {
private Shared shared = new Shared();
@Override
public void run() {
shared.printThread();
}
}
public class tetest {
public static void main(String[] args) {
Example13 r = new Example13();
Thread t1 = new Thread(r, "Thread 1");
Thread t2 = new Thread(r, "Thread 2");
Thread t3 = new Thread(r, "Thread 3");
Thread t4 = new Thread(r, "Thread 4");
Thread t5 = new Thread(r, "Thread 5");
t1.start();
t2.start();
t3.start();
t4.start();
t5.start();
}
}
and the result is here
Thread 1: 0
Thread 5: 1
Thread 4: 2
Thread 3: 3
Thread 2: 4
Thread 3: 5
Thread 2: 6
Thread 3: 7
Thread 2: 8
Thread 3: 9
The question is, why are only two of the threads working? I'm so confused; I thought notify() randomly wakes up one of the waiting threads, but it doesn't.
Is this starvation? If so, why does it happen? I tried notify() and notifyAll(), but got the same results for both.
Can anyone help my toasted brain?
This isn't 'starvation'. Your 5 threads are all doing nothing. They all want to 'wake up' - notify() will wake up an arbitrary one. The choice is neither reliably random nor reliably ordered: the JMM does not ascribe an order to this, so one of them will wake up, but you can't rely on it being random (do not use this to generate random numbers), nor can you rely on any specific ordering behaviour.
It's not starvation (it's not: Oh no! Thread 2 and 3 are doing all the work and 4 and 5 are just hanging out doing nothing! That's bad - the system could be more efficient!) - because it doesn't matter which thread 'does the work'. A CPU core is a CPU core, it cares not which thread it ends up running.
Starvation is a different principle. Imagine instead of Thread.sleep (which means the threads aren't waiting for anything specific, other than some time to elapse), instead the threads want to print the result of some expensiv-ish math operation. If you just let 2 threads each say 'Hello!', then the impl of System.out says that it would be acceptable for the JVM to produce:
HelHellloo!!
So to prevent this, you use locks to create a 'talking stick' of sorts: A thread only gets to print if it has the talking stick. Each of 5 threads will all perform, in a loop, the following operation:
Do an expensive math operation.
Acquire the talking stick.
Print the result of the operation.
Release the talking stick.
Loop back to the top.
Now imagine that, despite the fact that the math operation is quite expensive, for whatever reason you have an excruciatingly slow terminal, and the 'print the result of the operation' job takes a really long time to finish.
Now you can run into starvation. Imagine this scenario:
Threads 1-5 all do their expensive math simultaneously.
Arbitrarily, thread 4 ends up nabbing the talking stick.
The other 4 threads soon want the talking stick too, but they have to wait; t4 has it. They do nothing now. Twiddling their thumbs (they could be calculating, but they are not!)
After the excruciatingly long time, 4 is done and releases the stick. 1, 2, 3, and 5 dogpile on like it's the All-Blacks and 2 so happens to win the scrum and crawls out of the pile with the stick. 1, 3, and 5 gnash their teeth and go back yet again to waiting for the stick, still not doing any work. Whilst 2 is busy spending the really long time printing results, 4 goes back to the top of the loop and calculates another result. It ends up doing this faster than 2 manages to print, so 4 ends up wanting the talking stick again before 2 is done.
2 is finally done and 1, 3, 4, and 5 all scrum into a pile again. 4 happens to get the stick - java makes absolutely no guarantees about fairness, any one of them can get it, there is also no guarantee of randomness or lack thereof. A JVM is not broken if 4 is destined to win this fight.
Repeat ad nauseam. 2 and 4 keep trading the stick back and forth. 1, 3, and 5 never get to talk.
The above is, as per the JMM, valid - a JVM is not broken if it behaves like this (it would be a tad weird). Any bugs filed about this behaviour would be denied: The lock isn't so-called "fair". Java has fair locks if you want them - in the java.util.concurrent.locks package. Fair locks incur some slight extra bookkeeping cost; the assumption made by the synchronized and wait/notify system is that you don't want to pay this extra cost.
A better solution to the above scenario is possibly to make a 6th thread that JUST prints, with 5 threads JUST filling a buffer; at least then the 'print' part is left to a single thread, and that might be faster. But mostly, the bottleneck in this getup is simply the printing - the code gets zero benefit from being multicore. Just having ONE thread do ONE math calculation, print it, do another, and so forth would be better. Or possibly 2 threads: whilst one prints, the other calculates a number, but there's no point in having more than one calculating thread; even a single thread can calculate faster than results can be printed. Thus in some ways this is just what the situation honestly requires: this hypothetical scenario still prints as fast as it can. Whether you need the printing to be 'fair' is a separate question, and who says that you do? It's not intrinsic to the problem description that fairness is a requirement. Maybe all the various calculations are equally useful, so it doesn't matter that one thread gets to print more than others; let's say it's bitcoin miners, generating a random number and checking if that results in a hash with the requisite 7 zeroes at the end or whatever bitcoin is up to now - who cares that one thread gets more time than another? A 'fair' system is no more likely to successfully mine a block.
Thus, 'fairness' is something you need to explicitly determine you actually need. If you do, AND starvation is an issue, use a fair lock. new ReentrantLock(true) is all you need (that boolean parameter is the fair parameter - true means you want fairness).
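As a rough illustration of the 'talking stick' with fairness switched on, here is a minimal sketch. The class FairPrinter and its methods are hypothetical names made up for this example; only ReentrantLock and its fairness constructor come from the JDK. Under contention, a fair lock favors the longest-waiting thread, so no thread should be left out indefinitely.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairPrinter {
    // true requests a fair lock: under contention the longest-waiting thread is favored.
    private final ReentrantLock stick = new ReentrantLock(true);
    private final StringBuilder out = new StringBuilder();

    // Appends one line while holding the talking stick.
    void say(String message) {
        stick.lock();
        try {
            out.append(message).append('\n');
        } finally {
            stick.unlock();   // always release, even if appending fails
        }
    }

    String transcript() {
        stick.lock();
        try {
            return out.toString();
        } finally {
            stick.unlock();
        }
    }

    boolean isFair() {
        return stick.isFair();
    }

    public static void main(String[] args) throws InterruptedException {
        FairPrinter printer = new FairPrinter();
        Thread[] threads = new Thread[5];
        for (int t = 0; t < threads.length; t++) {
            String name = "Thread " + (t + 1);
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 3; i++) printer.say(name + ": " + i);
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join();
        System.out.print(printer.transcript());
    }
}
```

The order of lines in the transcript still depends on scheduling; fairness only affects who gets the lock next among the threads already waiting for it.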

Why does the iteration speed increase over time? [JAVA]

I was playing around with loops in java, when I saw that the iteration speed keeps increasing.
Kind of seemed interesting.
Any ideas why?
Code:
import org.junit.jupiter.api.Test;
public class RandomStuffTest {
public static long iterationsPerSecond = 0;
@Test
void testIterationSpeed() {
Thread t = new Thread(()->{
try{
while (true){
System.out.println("Iterations per second: "+iterationsPerSecond);
iterationsPerSecond = 0;
Thread.sleep(1000);
}
} catch (Exception e) {
e.printStackTrace();
}
});
t.setDaemon(true);
t.start();
while (true){
for (long i = 0; i < Long.MAX_VALUE; i++) {
iterationsPerSecond++;
}
}
}
}
Output:
Iterations per second: 6111
Iterations per second: 2199824206
Iterations per second: 4539572003
Iterations per second: 6919540856
Iterations per second: 9442209284
Iterations per second: 11899448226
Iterations per second: 14313220638
Iterations per second: 16827637088
Iterations per second: 19322118707
Iterations per second: 21807781722
Iterations per second: 24256315314
Iterations per second: 26641505580
Another thing that I noticed:
The CPU usage was around 20% all the time and not really increasing...
Maybe because I was running the code as a test using Junit?
The problem is the Java Memory Model (JMM).
Every thread is allowed to have (does not have to do this) a local copy of each field. Whenever it writes or reads this field it is free to just set its local copy and sync it up with other threads' local copies much, much later.
Said differently, the JVM is free to re-order instructions, do things in parallel, and otherwise apply whatever weird stuff it wants to optimize your code, as long as certain guarantees are never broken.
One guarantee that is easy to understand: The JVM is free to reorder or parallelize 2 sequential instructions, but it must never be possible to write code that can observe this except through timing.
In other words, int x = 0; x = 5; System.out.println(x); must necessarily print 5 and never 0.
You can establish such relationships between 2 threads as well but this involves the use of volatile and/or synchronized and/or something that does this internally (most things in the java.util.concurrent package).
You didn't, so this result is meaningless. Most likely, the instruction iterationsPerSecond = 0 is having no effect; the code iterationsPerSecond++ reads 9442209284, increments by one, and writes it back - and that field got written to 0 someplace in the middle of all that, which thus accomplished nothing whatsoever.
If you want to test this properly, try a volatile variable, or better yet an AtomicLong.
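A minimal sketch of the AtomicLong variant might look like the following. The class name IterationCounter and its methods are assumptions for illustration, and the JUnit wrapper from the question is dropped to keep it self-contained. The key point is that getAndSet(0) reads and resets in one atomic step, so no increment can slip in between the read and the reset.

```java
import java.util.concurrent.atomic.AtomicLong;

class IterationCounter {
    private final AtomicLong iterations = new AtomicLong();

    // Called by the busy loop; atomic, so concurrent resets cannot be overwritten.
    void tick() {
        iterations.incrementAndGet();
    }

    // Atomically returns the current count and resets it to zero.
    long readAndReset() {
        return iterations.getAndSet(0);
    }

    public static void main(String[] args) throws InterruptedException {
        IterationCounter counter = new IterationCounter();
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) counter.tick();
        });
        worker.setDaemon(true);
        worker.start();
        for (int second = 0; second < 5; second++) {
            Thread.sleep(1000);
            System.out.println("Iterations per second: " + counter.readAndReset());
        }
        worker.interrupt();
    }
}
```

With this change the reported figure should stop growing from one second to the next, although it will be lower than the original numbers because each atomic increment is more expensive than a plain one.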
Like already indicated, the code is broken due to a data race.
The JIT can do some funny stuff with your code because of the data race:
while (true){
for (long i = 0; i < Long.MAX_VALUE; i++) {
iterationsPerSecond++;
}
}
Since it doesn't know that another thread is also messing with the iterationsPerSecond, the compiler could fold the for loop because it can calculate the outcome of the loop:
while (true){
iterationsPerSecond = Long.MAX_VALUE;
}
And it could even decide to pull the write out of the loop, since the same value is written every time (loop invariant code motion):
iterationsPerSecond = Long.MAX_VALUE;
while (true){
}
It could even decide to throw away the store, because it doesn't know there are any readers. So effectively it is a dead store and hence it can apply dead code elimination.
while (true){
}
An atomic or volatile would solve the problem because a happens-before edge is established. Using a volatile or AtomicLong.get/set is equally expensive: both have the same compiler restrictions and the same fences at the hardware level.
If you want to run microbenchmarks, I would suggest checking out JMH. It will protect you against a lot of trivial mistakes.

Thread concurrency issue even within one single command?

I am slightly surprised by what I get if I compile and run the following (horrible non-synchronized) Java SE program.
public class ThreadRace {
// this is the main class.
public static void main(String[] args) {
TestRunnable tr=new TestRunnable(); // tr is a Runnable.
Thread one=new Thread(tr,"thread_one");
Thread two=new Thread(tr,"thread_two");
one.start();
two.start(); // starting two threads both with associated object tr.
}
}
class TestRunnable implements Runnable {
int counter=0; // Both threads can see this counter.
public void run() {
for(int x=0;x<1000;x++) {
counter++;
}
// We can't get here until we've added one to counter 1000 times.
// Can we??
System.out.println("This is thread "+
Thread.currentThread().getName()+" and the counter is "+counter);
}
}
If I run "java ThreadRace" at the command line, then here is my interpretation
of what happens. Two new threads are created and started. The threads have
the same Runnable object instance tr, and so they see the same tr.counter .
Both new threads add one to this counter 1000 times, and then print the value
of the counter.
If I run this lots and lots of times, then usually I get output of the form
This is thread thread_one and the counter is 1000
This is thread thread_two and the counter is 2000
and occasionally I get output of the form
This is thread thread_one and the counter is 1204
This is thread thread_two and the counter is 2000
Note that what happened in this latter case was that thread_one finished
adding one to the counter 1000 times, but thread_two had started adding
one already, before thread_one printed out the value of the counter.
In particular, this output is still comprehensible to me.
However, very occasionally I get something like
This is thread thread_one and the counter is 1723
This is thread thread_two and the counter is 1723
As far as I can see, this "cannot happen". The only way the System.out.println() line
can be reached in either thread, is if the thread has finished counting to 1000.
So I am not bothered if one of the threads reports the counter as being some
random number between 1000 and 2000, but I cannot see how both threads can
get as far as their System.out.println() line (implying both for loops have finished,
surely?) and counter not be 2000 by the time the second statement is printed.
Is what is happening that both threads somehow attempt to do counter++ at exactly
the same time, and one overwrites the other? That is, a thread can even be
interrupted even in the middle of executing a single statement?
The "++" operator is not atomic -- it doesn't happen in one uninterruptible cycle. Think of it like this:
1. Fetch the old value
2. Add one to it
3. Store the new value back
So imagine that you get this sequence:
Thread A: Step 1
Thread B: Step 1
Thread A: Step 2
Thread B: Step 2
Thread A: Step 3
Thread B: Step 3
Both threads think they've incremented the variable, but its value has only increased by one! The second "store back" operation effectively cancels out the result of the first.
Now, truth is, when you add in multiple levels of cache, far weirder things can actually happen; but this is an easy explanation to understand. You can fix these kinds of issues by synchronizing access to the variable: either the whole run() method, or the inside of the loop using a synchronized block. As Jon suggests, you could also use some of the fancier tools in java.util.concurrent.atomic.
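The synchronized-block fix mentioned above could look roughly like this. It is a sketch only: the class name SynchronizedCounter, the private lock object, and the runTwoThreads helper are assumptions added for illustration. Because fetch, add, and store now happen while holding the lock, the two threads can no longer overwrite each other's update.

```java
class SynchronizedCounter implements Runnable {
    private final Object lock = new Object();
    private int counter = 0;

    @Override
    public void run() {
        for (int x = 0; x < 1000; x++) {
            synchronized (lock) {   // fetch, add and store now happen as one unit
                counter++;
            }
        }
    }

    // Reads under the same lock so the latest value is visible.
    int value() {
        synchronized (lock) {
            return counter;
        }
    }

    // Runs the same Runnable on two threads and returns the final count.
    static int runTwoThreads() {
        SynchronizedCounter shared = new SynchronizedCounter();
        Thread one = new Thread(shared, "thread_one");
        Thread two = new Thread(shared, "thread_two");
        one.start();
        two.start();
        try {
            one.join();
            two.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.value();
    }

    public static void main(String[] args) {
        System.out.println("Final counter: " + runTwoThreads());   // prints "Final counter: 2000"
    }
}
```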
It absolutely can happen. counter++ isn't an atomic operation. Consider it as:
int tmp = counter;
tmp++;
counter = tmp;
Now imagine two threads executing that code at about the same time:
Both read the counter
Both increment their local copy (0 to 1)
Both write 1 into counter
This sort of thing is precisely why java.util.concurrent.atomic exists. Change your code to:
import java.util.concurrent.atomic.AtomicInteger;
class TestRunnable implements Runnable {
private final AtomicInteger counter = new AtomicInteger();
public void run() {
for(int x=0;x<1000;x++) {
counter.incrementAndGet();
}
System.out.println("This is thread "+
Thread.currentThread().getName()+" and the counter is " + counter.get());
}
}
That code is safe.

Why is my threaded sort algorithm slow compared to the non-threaded version?

I just have implemented a threaded version of the merge sort. ThreadedMerge.java: http://pastebin.com/5ZEvU6BV
Since merge sort is a divide-and-conquer algorithm, I create a thread for every half of the array. But the number of available threads in the Java VM is limited, so I check that before creating threads:
if(num <= nrOfProcessors){
num += 2;
//create more threads
}else{
//continue without threading
}
However the threaded sorting takes about ~ 6000 ms while the non-threaded version is much faster with just ~ 2500 ms.
Non-Threaded: http://pastebin.com/7FdhZ4Fw
Why is the threaded version slower and how do I solve that problem?
Update: I use atomic integer now for thread counting and declared a static field for Runtime.getRuntime().availableProcessors(). The sorting takes about ~ 1400 ms now.
However, creating just one thread in the mergeSort method and letting the current thread do the rest gives no significant performance increase. Why?
Besides, when I call join on a thread and afterwards decrement the number of used threads with
num.set(num.intValue() - 1);
the sorting takes about 200 ms longer. Here is the updated version of my algorithm: http://pastebin.com/NTZq5zQp Why does this line of code make it even worse?
First off, your accesses to num are not thread-safe (check http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html )
You create as many threads as there are cores, but you block half of them with the join call.
num += 1;
ThreadedMerge tm1 = new ThreadedMerge(array, startIndex, startIndex + halfLength);
tm1.start();
sortedRightPart = mergeSort(array, startIndex + halfLength, endIndex);
try{
tm1.join();
num -= 1;
sortedLeftPart = tm1.list;
}catch(InterruptedException e){
}
This doesn't block the calling thread but uses it to sort the right part while the created thread sorts the other part. When that thread returns, the slot it took up can be used by another thread.
Hmm, you should not create a thread for every single step (they are expensive and there are lightweight alternatives).
Ideally, you should only create 4 threads if there are 4 CPUs.
So let's say you have 4 CPUs; then you create one thread at the first level (now you have 2), and at the second level each of those also creates a new thread. This gives you 4.
The reason why you only create one and not two is that you can use the thread you are currently running like:
Thread t = new Thread(...);
t.start();
// Do half of the job here
t.join(); // Wait for the other half to complete.
If you have, let's say, 5 CPUs (not a power of two), then just create 8 threads.
One simple way to do this in practice is to call the un-threaded version you already made once you reach the appropriate level. This way you avoid cluttering the merge method with if-statements etc.
The call to Runtime.getRuntime().availableProcessors() appears to be taking up a fair amount of extra time. You only need to call it once, so just move it outside of the method and define it as a static field, e.g.:
static int nrOfProcessors = Runtime.getRuntime().availableProcessors();
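Putting the advice from these answers together, a depth-limited threaded merge sort might be sketched as follows. This is not the code from the pastebin links; the class DepthLimitedMergeSort and its methods are hypothetical, and the depth is assumed to be the base-2 logarithm of the core count, rounded up. Each level spawns one extra thread and reuses the current one, so a depth of 2 yields 4 sorting threads, and 5 cores round up to 8.

```java
import java.util.Arrays;

class DepthLimitedMergeSort {
    // Sorts a[from, to) in place; spawns one extra thread per call while depth > 0.
    static void sort(int[] a, int from, int to, int depth) {
        if (to - from < 2) return;
        int mid = (from + to) >>> 1;
        if (depth > 0) {
            Thread left = new Thread(() -> sort(a, from, mid, depth - 1));
            left.start();
            sort(a, mid, to, depth - 1);      // the current thread sorts the right half itself
            try {
                left.join();                  // wait for the left half before merging
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            sort(a, from, mid, 0);            // plain, un-threaded recursion below the cutoff
            sort(a, mid, to, 0);
        }
        merge(a, from, mid, to);
    }

    // Merges the sorted runs a[from, mid) and a[mid, to) using a copy of the left run.
    private static void merge(int[] a, int from, int mid, int to) {
        int[] left = Arrays.copyOfRange(a, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        while (i < left.length) a[k++] = left[i++];
    }

    // Returns a sorted copy, choosing the depth from the number of available cores.
    static int[] sorted(int[] input) {
        int[] copy = input.clone();
        int cores = Runtime.getRuntime().availableProcessors();
        int depth = 32 - Integer.numberOfLeadingZeros(Math.max(1, cores) - 1); // ceil(log2(cores))
        sort(copy, 0, copy.length, depth);
        return copy;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 4, 1, 8};
        System.out.println(Arrays.toString(sorted(data)));   // prints [1, 1, 3, 4, 5, 8, 9]
    }
}
```

Because the thread count is fixed by the depth, no shared counter such as num is needed at all, which sidesteps the thread-safety problem with it.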

What really is to “warm up” threads on multithreading processing?

I’m dealing with multithreading in Java and, as someone pointed out to me, I noticed that threads warm up; that is, they get faster as they are repeatedly executed. I would like to understand why this happens and whether it is related to Java itself or is common behavior of every multithreaded program.
The code (by Peter Lawrey) that exemplifies it is the following:
for (int i = 0; i < 20; i++) {
ExecutorService es = Executors.newFixedThreadPool(1);
final double[] d = new double[4 * 1024];
Arrays.fill(d, 1);
final double[] d2 = new double[4 * 1024];
es.submit(new Runnable() {
@Override
public void run() {
// nothing.
}
}).get();
long start = System.nanoTime();
es.submit(new Runnable() {
@Override
public void run() {
synchronized (d) {
System.arraycopy(d, 0, d2, 0, d.length);
}
}
});
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
// get all the values in d2.
for (double x : d2) ;
long time = System.nanoTime() - start;
System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
Results:
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
I.e. it gets faster and stabilises around 50,000 ns (about 50 µs). Why is that?
If I run this code (20 repetitions), then execute something else (let's say postprocessing of the previous results and preparation for another multithreading round), and later execute the same Runnable on the same thread pool for another 20 repetitions, will it already be warmed up in any case?
In my program, I execute the Runnable in just one thread (actually one per processing core I have; it's a CPU-intensive program), then alternate with some other serial processing, many times over. It doesn’t seem to get faster as the program goes. Maybe I could find a way to warm it up…
It isn't the threads that are warming up so much as the JVM.
The JVM has what's called JIT (Just In Time) compiling. As the program is running, it analyzes what's happening in the program and optimizes it on the fly. It does this by taking the byte code that the JVM runs and converting it to native code that runs faster. It can do this in a way that is optimal for your current situation, as it does this by analyzing the actual runtime behavior. This can (not always) result in great optimization. Even more so than some programs that are compiled to native code without such knowledge.
You can read a bit more at http://en.wikipedia.org/wiki/Just-in-time_compilation
You could get a similar effect on any program as code is loaded into the CPU caches, but I believe this will be a smaller difference.
The only reasons I see that a thread execution can end up being faster are:
The memory manager can reuse already allocated object space (e.g., to let heap allocations fill up the available memory until the max memory is reached - the -Xmx option)
The working set is available in the hardware cache
Repeating operations might create operations the compiler can more easily reorder to optimize execution
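The warm-up effect is not specific to threads and can usually be seen in a single-threaded loop as well. Here is a small sketch, assuming a hypothetical class WarmUpDemo with an arbitrary CPU-bound method; the exact timings depend on the machine and JVM, so none are claimed here. Early rounds typically run slower than later ones, once the JIT has compiled the hot method.

```java
class WarmUpDemo {
    // Some CPU-bound work; returning the sum keeps the loop from being optimized away.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long result = work(1_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.printf("round %d: %,d ns (checksum %d)%n", round, elapsed, result);
        }
    }
}
```

For anything beyond a quick look, a harness such as JMH handles warm-up rounds more reliably than hand-rolled timing.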
