Consider the following two blocks:
// block one
long start = System.currentTimeMillis();
while (System.currentTimeMillis() - start < TIMEOUT) {
    if (SOME_CONDITION_IS_MET) {
        // do something
        break;
    } else {
        Thread.sleep(100);
    }
}

// block two
long start = System.currentTimeMillis();
while (System.currentTimeMillis() - start < TIMEOUT) {
    if (SOME_CONDITION_IS_MET) {
        // do something
        break;
    }
}
The difference between the two is that the first one has a Thread.sleep(), which seemingly reduces how often the while and if conditions are checked. However, is there any meaningful benefit to having this sleep, assuming the if condition doesn't involve heavy computation? Which one would you recommend for implementing a timeout?
One key difference is that the second method involves busy waiting. If SOME_CONDITION_IS_MET doesn't involve any I/O, the second approach will likely consume an entire CPU core. This is a wasteful thing to do (but could be perfectly reasonable in some -- pretty rare -- circumstances). On the flip side, the second approach has lower latency.
I agree with Boris that, in a general setting, both approaches are basically hacks. A better way would be to use proper synchronization primitives to signal the condition.
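A minimal sketch of that signalling approach, using a CountDownLatch with a timed await in place of the polling loop (conditionMet and TIMEOUT here are placeholder names, not taken from the original code):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

CountDownLatch conditionMet = new CountDownLatch(1);

// In the thread that makes the condition true:
conditionMet.countDown();

// In the waiting thread, instead of polling:
if (conditionMet.await(TIMEOUT, TimeUnit.MILLISECONDS)) {
    // do something
} else {
    // timed out without the condition being signalled
}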
I was playing around with loops in java, when I saw that the iteration speed keeps increasing.
Kind of seemed interesting.
Any ideas why?
Code:
import org.junit.jupiter.api.Test;

public class RandomStuffTest {

    public static long iterationsPerSecond = 0;

    @Test
    void testIterationSpeed() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    System.out.println("Iterations per second: " + iterationsPerSecond);
                    iterationsPerSecond = 0;
                    Thread.sleep(1000);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        t.setDaemon(true);
        t.start();

        while (true) {
            for (long i = 0; i < Long.MAX_VALUE; i++) {
                iterationsPerSecond++;
            }
        }
    }
}
Output:
Iterations per second: 6111
Iterations per second: 2199824206
Iterations per second: 4539572003
Iterations per second: 6919540856
Iterations per second: 9442209284
Iterations per second: 11899448226
Iterations per second: 14313220638
Iterations per second: 16827637088
Iterations per second: 19322118707
Iterations per second: 21807781722
Iterations per second: 24256315314
Iterations per second: 26641505580
Another thing that I noticed:
The CPU usage was around 20% all the time and not really increasing...
Maybe because I was running the code as a test using JUnit?
The problem is the Java Memory Model (JMM).
Every thread is allowed to keep (but does not have to keep) a local copy of each field. Whenever it writes or reads this field, it is free to work with just its local copy and sync it up with other threads' local copies much, much later.
Said differently, the JVM is free to re-order instructions, do things in parallel, and otherwise apply whatever weird stuff it wants to optimize your code, as long as certain guarantees are never broken.
One guarantee that is easy to understand: The JVM is free to reorder or parallelize 2 sequential instructions, but it must never be possible to write code that can observe this except through timing.
In other words, int x = 0; x = 5; System.out.println(x); must necessarily print 5 and never 0.
You can establish such relationships between 2 threads as well but this involves the use of volatile and/or synchronized and/or something that does this internally (most things in the java.util.concurrent package).
You didn't, so this result is meaningless. Most likely, the instruction iterationsPerSecond = 0 is having no effect; the code iterationsPerSecond++ reads 9442209284, increments by one, and writes it back - and that field got written to 0 someplace in the middle of all that, which thus accomplished nothing whatsoever.
If you want to test this properly, try a volatile variable, or better yet an AtomicLong.
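A minimal sketch of that fix using an AtomicLong, so the reporter thread and the incrementing loop actually observe each other's updates (rewritten as a plain main method purely for illustration):

import java.util.concurrent.atomic.AtomicLong;

public class IterationSpeed {

    static final AtomicLong iterationsPerSecond = new AtomicLong();

    public static void main(String[] args) {
        Thread reporter = new Thread(() -> {
            try {
                while (true) {
                    // getAndSet atomically reads the current count and resets it to 0
                    System.out.println("Iterations per second: " + iterationsPerSecond.getAndSet(0));
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });
        reporter.setDaemon(true);
        reporter.start();

        while (true) {
            iterationsPerSecond.incrementAndGet();
        }
    }
}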
As already indicated, the code is broken due to a data race.
The JIT can do some funny stuff with your code because of the data race:
while (true) {
    for (long i = 0; i < Long.MAX_VALUE; i++) {
        iterationsPerSecond++;
    }
}
Since it doesn't know that another thread is also messing with the iterationsPerSecond, the compiler could fold the for loop because it can calculate the outcome of the loop:
while (true) {
    iterationsPerSecond = Long.MAX_VALUE;
}
And it could even decide to hoist the write out of the loop, since the same value is written every time (loop-invariant code motion):
iterationsPerSecond = Long.MAX_VALUE;
while (true) {
}
It could even decide to throw away the store, because it doesn't know there are any readers. Effectively it is a dead store, so it can apply dead-code elimination:
while (true) {
}
An AtomicLong or a volatile field would solve the problem because a happens-before edge is established. Using a volatile field or AtomicLong.get/set is equally expensive: it imposes the same compiler restrictions and the same fences at the hardware level.
If you want to run microbenchmarks, I would suggest checking out JMH. It will protect you against a lot of trivial mistakes.
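For illustration, a minimal JMH benchmark might look like the sketch below (it assumes the jmh-core and jmh-generator-annprocess dependencies are on the classpath; the class and method names are made up for this example):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class CounterBenchmark {

    private long counter;

    @Benchmark
    public long increment() {
        // Returning the value keeps the JIT from treating the increment as a dead store
        return ++counter;
    }
}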
From time to time I have to implement the classic concurrent producer-consumer solution in the project I'm involved in. Pretty much, the problem reduces to having some collection which gets populated from multiple threads and which is consumed by several consumers.
In a nutshell, the collection is bounded, say to 10k entities.
Once the buffer size is hit, a worker task is submitted to consume these 10k entities. There is a limit on such workers, say 10, which in the worst-case scenario means I can have up to 10 workers each consuming 10k entities.
I do have to play with some locking here and there, and with some checks around buffer overflows (the case when producers generate too much data while all workers are busy processing their chunks), and thus have to discard new events to avoid OOM (not the best solution, but stability is p1 ;)).
These days I was looking at Reactor and a way to use it instead of going low-level and doing all the things described above, so the dumb question is: "can Reactor be used for this use case?"
For now, forget about overflow/discarding: how can I achieve the N consumers for a Broadcaster?
I was looking particularly at a Broadcaster with a buffer plus a thread-pooled Dispatcher:
void test() {
    final Broadcaster<String> sink = Broadcaster.create(Environment.initialize());
    Dispatcher dispatcher = Environment.newDispatcher(2048, 20, DispatcherType.WORK_QUEUE);

    sink
        .buffer(100)
        .consumeOn(dispatcher, this::log);

    for (int i = 0; i < 100000; i++) {
        sink.onNext("element " + i);
        if (i % 1000 == 0) {
            System.out.println("added elements " + i);
        }
    }
}
void log(List<String> values) {
    System.out.print("simulating slow processing....");
    System.out.println("processing: " + Arrays.toString(values.toArray()));
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
My intention here is to have the Broadcaster execute log(...) asynchronously once the buffer size is reached. However, it looks like it always executes log(...) in blocking mode: it processes 100 elements and, only once done, the next 100, and so on. How can I make it asynchronous?
thanks
vyvalyty
A possible pattern is to use flatMap with publishOn:
Flux.range(1, 1_000_000)
    .buffer(100)
    .flatMap(b -> Flux.just(b)
        .publishOn(SchedulerGroup.io())
        .doOnNext(this::log))
    .consume(...);
I am wondering how to reach a compromise between fast cancel-responsiveness and performance with my threads, whose body looks similar to this loop:
for (int i = 0; i < HUGE_NUMBER; ++i) {
    // some easy computation, like adding numbers
    // which are the result of the previous iteration of this loop
}
If the computation in the loop body is quite easy, then adding a simple check-and-react to each iteration:
if (Thread.currentThread().isInterrupted()) {
    throw new InterruptedException("Cancelled");
}
may slow down execution of the code.
Even if I change the above condition to:
if (i % 100 && Thread.currentThread().isInterrupted()) {
throw new InterruptedException("Cancelled");
}
Then the compiler cannot just precompute the values of i and check the condition only in some specific situations, since HUGE_NUMBER is a variable and can have different values.
So I'd like to ask if there's any smart way of adding such a check to the presented code, knowing that:
HUGE_NUMBER is variable and can have different values
the loop body consists of code that is easy to compute but relies on previous computations.
What I want to say is that one iteration of the loop is quite fast, but a HUGE_NUMBER of iterations can take quite a bit longer, and this is what I want to avoid.
First of all, use Thread.interrupted() instead of Thread.currentThread().isInterrupted() in that case.
You should think about whether checking the interruption flag really slows down your calculation too much! On the one hand, if the loop body is VERY simple, even a huge number of iterations (the upper limit is Integer.MAX_VALUE) will run in a few seconds. Even if checking the interruption flag results in an overhead of 20 or 30%, this will not add very much to the total runtime of your algorithm.
On the other hand, if the loop body is not that simple and therefore runs longer, testing the interruption flag will not be a noticeable overhead, I think.
Don't do tricks like if (i % 10000 == 0), as this will slow down calculation much more than a 'short' Thread.interrupted().
There is one small trick that you could use - but think twice because it makes your code more complex and less readable:
Whenever you have a loop like that:
for (int i = 0; i < max; i++) {
    // loop body using i
}
you can split up the total range of i into several intervals of size INTERVAL_SIZE:
int start = 0;
while (start < max) {
    final int next = Math.min(start + INTERVAL_SIZE, max);
    for (int i = start; i < next; i++) {
        // loop body using i
    }
    start = next;
}
Now you can add your interruption check right before or after the inner loop!
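Put together, the interval loop with the check could look like this sketch (INTERVAL_SIZE, max, and the loop body are placeholders from the snippet above):

int start = 0;
while (start < max) {
    // one cheap check per interval instead of one per iteration
    if (Thread.interrupted()) {
        throw new InterruptedException("Cancelled");
    }
    final int next = Math.min(start + INTERVAL_SIZE, max);
    for (int i = start; i < next; i++) {
        // loop body using i
    }
    start = next;
}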
I've done some tests on my system (JDK 7) using the following loop-body
if (i % 2 == 0) x++;
and Integer.MAX_VALUE / 2 iterations. The results are as follows (after warm-up):
Simple loop without any interruption checks: 1,949 ms
Simple loop with check per iteration: 2,219 ms (+14%)
Simple loop with check per 1 million-th iteration using modulo: 3,166 ms (+62%)
Simple loop with check per 1 million-th iteration using bit-mask: 2,653 ms (+36%)
Interval-loop as described above with check in outer loop: 1,972 ms (+1.1%)
So even if the loop body is as simple as above, the overhead of a per-iteration check is only 14%! It is therefore recommended not to do any tricks but simply to check the interruption flag via Thread.interrupted() in every iteration.
Make your calculation an Iterator.
Although this does not sound terribly useful, the benefit here is that you can then quite easily write filter iterators that can be surprisingly flexible. They can be added and removed simply - even through configuration if you wish. There are a number of benefits - try it.
You can then add a filtering Iterator that watches the time and checks for interrupt on a regular basis - or something even more flexible.
You can even add further filtering without compromising the original calculation by interspersing it with brittle status checks.
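A hedged sketch of such a filtering Iterator that checks the interrupt flag every so often; the class name and the CHECK_EVERY constant are made up for illustration:

import java.util.Iterator;

public class InterruptCheckingIterator<T> implements Iterator<T> {

    private static final int CHECK_EVERY = 1_000;

    private final Iterator<T> delegate;
    private int count;

    public InterruptCheckingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        // Check the interrupt status only once per CHECK_EVERY calls to keep the overhead low
        if (++count % CHECK_EVERY == 0 && Thread.currentThread().isInterrupted()) {
            throw new RuntimeException("Cancelled");
        }
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }
}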
I've been reading about non-blocking approaches for some time. Here is a piece of code for a so-called lock-free counter.
public class CasCounter {
    private SimulatedCAS value;

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        do {
            v = value.get();
        } while (v != value.compareAndSwap(v, v + 1));
        return v + 1;
    }
}
I was just wondering about this loop:
do {
    v = value.get();
} while (v != value.compareAndSwap(v, v + 1));
People say:
So it tries again, and again, until all other threads trying to change the value have done so. This is lock free as no lock is used, but not blocking free as it may have to try again (which is rare) more than once (very rare).
My question is:
How can they be so sure about that? As for me, I can't see any reason why this loop can't be infinite, unless the JVM has some special mechanism to prevent it.
The loop can be infinite (since it can generate starvation for your thread), but the likelihood for that happening is very small. In order for you to get starvation you need some other thread succeeding in changing the value that you want to update between your read and your store and for that to happen repeatedly.
It would be possible to write code to trigger starvation but for real programs it would be unlikely to happen.
Compare-and-swap is usually used when you don't think you will have write conflicts very often. Say there is a 50% chance of a "miss" when you update; then there is a 25% chance that you will miss in two loops, and less than a 0.1% chance that no update would succeed in 10 loops. For real-world examples, a 50% miss rate is very high (basically doing nothing but updating), and as the miss rate is reduced to, say, 1%, the risk of not succeeding in two tries is only 0.01%, and in 3 tries 0.0001%.
The usage is similar to the following problem:
Set a variable a to 0 and have two threads updating it with a = a + 1 a million times each, concurrently. At the end, a could have any value between 1000000 (every other update was lost due to overwrite) and 2000000 (no update was overwritten).
The closer to 2000000 you get, the more likely the CAS usage is to work, since that means that quite often the CAS would see the expected value and be able to set the new value.
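A small sketch of that lost-update scenario (class and variable names are made up for illustration):

public class LostUpdateDemo {

    static int a = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable incrementer = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                a = a + 1; // unsynchronized read-modify-write: concurrent updates can be lost
            }
        };
        Thread t1 = new Thread(incrementer);
        Thread t2 = new Thread(incrementer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("a = " + a); // typically well below 2000000
    }
}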
Edit: I think I have a satisfactory answer now. The bit that confused me was the 'v != compareAndSwap'. In the actual code, CAS returns true if the value is equal to the compared expression. Thus, even if the first thread is interrupted between the get and the CAS, the second thread will succeed in the swap and exit the method, so the first thread will be able to do the CAS.
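For reference, a sketch of the same increment written against the real java.util.concurrent AtomicInteger, whose compareAndSet returns a boolean as described above:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCasCounter {

    private final AtomicInteger value = new AtomicInteger();

    public int increment() {
        int v;
        do {
            v = value.get();
            // retry only if another thread changed the value between get() and compareAndSet()
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }
}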
Of course, it is possible that if two threads call this method an infinite number of times, one of them will not get the chance to run the CAS at all, especially if it has a lower priority, but this is one of the risks of unfair locking (the probability is very low however). As I've said, a queue mechanism would be able to solve this problem.
Sorry for the initial wrong assumptions.
Traceview shows that updatePhysics() is being called every 10ms or so and it takes about 8ms to run. The methods that I call inside updatePhysics are only running once every 5 or 6 times updatePhysics() runs, however. Is this simply a bug of Traceview, or what is going on? My game is stuttering a fair amount, so I am trying to figure out what is causing it.
Traceview is generally showing that a lot of my methods go several hundred milliseconds without being called once, even though there appears to be no reason they shouldn't be called. Ideas?
Run Method:
while (mRun)
{
    Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
    Canvas c = null;
    try
    {
        c = mSurfaceHolder.lockCanvas(null); //null
        {
            time2 = System.nanoTime() / 100000; // Get current time
            float delta = (time2 - time1) / 1000f;
            if (mMode == STATE_RUNNING) updatePhysics(delta);
            else updateMenus();
            doDraw(c);
            time1 = time2;
        }
    } finally
    {
        // do this in a finally so that if an exception is thrown
        // during the above, we don't leave the Surface in an
        // inconsistent state
        if (c != null)
        {
            mSurfaceHolder.unlockCanvasAndPost(c);
        }
    }
}
Update Physics:
private void updatePhysics(float delta)
{
updateScore(delta);
updateDifficulty();
}
EDIT: As you can see in the Traceview screenshot, updateDifficulty is often not called for several hundred ms EVEN THOUGH updatePhysics is being called regularly... It makes no sense.
Most likely somewhere in your thread you're calling Thread.sleep(0) or Thread.yield() or something like that. This will cause the thread to yield to other threads that are being scheduled, and it will often take 10ms or more before the thread gets scheduled back in. I think Traceview doesn't fully understand this and counts the time the thread is in a suspended state as being active.
Most games use a constant game loop, a real while(true) that never yields to anything.
Some other comments:
I would try the code without the try/finally block, as this will slow things down considerably. Also remove the thread-priority line; this is an OS call and could be slow, and would not add any speed in case of a bug. (It should be fine on normal priority.)
Also are you sure this is correct:
time2 = System.nanoTime()/100000; // Get current time
float delta = (time2 - time1)/1000f;
I don't see why you need to divide both the delta and the current time. Either convert the time from nanoTime() to seconds (or whatever you require) and then have a delta in seconds, or keep the time in nanoseconds and convert the delta to seconds. Now you first convert to seconds and then to 1/1000th of a second.
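A hedged sketch of the second option (keeping the timestamps in nanoseconds and converting only the delta), reusing the time1/time2 fields from the question:

time2 = System.nanoTime();                      // current time, in nanoseconds
float delta = (time2 - time1) / 1_000_000_000f; // elapsed time, in seconds
time1 = time2;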