I need a few easily implementable, CPU- and memory-intensive calculations, each running on a single CPU, that I can write in Java for a test thread scheduler.
They should be somewhat time-consuming, but more importantly resource-consuming.
Any ideas?
A few easy examples of CPU-intensive tasks:
searching for prime numbers (involves lots of BigInteger divisions)
calculating large factorials, e.g. 2000! (involves lots of BigInteger multiplications; see the sketch after this list)
many Math.tan() calculations (this is interesting because Math.tan is native, so you're using two call stacks: one for Java calls, the other for C calls.)
Multiply two matrices. The matrices should be huge and stored on the disk.
String search. Or index a huge document (detect and count the occurrences of each word or alphabetic string). For example, you can index all of the identifiers in the source code of a large software project.
Calculate pi.
Rotate a 2D matrix, or an image.
Compress some huge files.
...
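For example, here is a minimal sketch of the factorial item (the class name is made up and 2000 is an arbitrary choice):

import java.math.BigInteger;

public class FactorialBurn {
    // n! via repeated BigInteger multiplications: CPU-bound and allocation-heavy for large n.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        BigInteger f = factorial(2000);
        System.out.println("2000! has " + f.toString().length() + " digits, computed in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}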
The CPU soak test for the PDP-11 was tan(atan(tan(atan(...))) etc. Works the FPU pretty hard and also the stack and registers.
OK, this is not Java, but it is based on the Dhrystone benchmark algorithm found here. These implementations of the algorithm might give you an idea of how it is done. The link here contains sources in C/C++ and assembler for obtaining the benchmarks.
Calculate the nth term of the Fibonacci series, where n is greater than 70 (time-consuming; see the sketch below).
Calculate the factorials of large numbers (time-consuming).
Find all possible paths between two nodes in a graph (memory-consuming).
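As an illustration of the first item, a deliberately naive recursive Fibonacci keeps a single core fully busy. Note that this exponential-time version already takes seconds for n in the mid-40s, so values near 70 are impractical for it; the class name and the choice of n = 45 are made up:

public class FibBurn {
    // Deliberately naive doubly recursive Fibonacci: roughly phi^n calls, purely CPU-bound.
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int n = 45; // a few seconds of work on one core; each extra unit roughly doubles the time
        long start = System.nanoTime();
        long result = fib(n);
        System.out.println("fib(" + n + ") = " + result + " in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}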
Official RSA Challenge
Unofficial RSA Challenge - Grab some ciphertext that you want to read in plaintext. Let the computer have at it. If you use a randomized algorithm, there is a small but non-zero chance that you will succeed.
I was messing around with Thread priority in Java and used the code below. It seems to keep the CPU busy enough that the thread priority makes a difference.
@Test
public void testCreateMultipleThreadsWithDifferentPriorities() throws Exception {
    // Assumes static imports of java.lang.Math.tan/atan/cbrt and an SLF4J LOGGER field in the test class.
    class MyRunnable implements Runnable {
        @Override
        public void run() {
            for (int i = 0; i < 1_000_000; i++) {
                double d = tan(atan(tan(atan(tan(atan(tan(atan(tan(atan(123456789.123456789))))))))));
                cbrt(d);
            }
            LOGGER.debug("I am {}, and I have finished", Thread.currentThread().getName());
        }
    }

    final int NUMBER_OF_THREADS = 32;
    List<Thread> threadList = new ArrayList<>(NUMBER_OF_THREADS);
    for (int i = 1; i <= NUMBER_OF_THREADS; i++) {
        Thread t = new Thread(new MyRunnable());
        if (i == NUMBER_OF_THREADS) {
            // Last thread gets MAX_PRIORITY
            t.setPriority(Thread.MAX_PRIORITY);
            t.setName("T-" + i + "-MAX_PRIORITY");
        } else {
            // All other threads get MIN_PRIORITY
            t.setPriority(Thread.MIN_PRIORITY);
            t.setName("T-" + i);
        }
        threadList.add(t);
    }
    threadList.forEach(Thread::start);
    for (Thread t : threadList) {
        t.join();
    }
}
Related
I was playing around with loops in Java when I saw that the iteration speed keeps increasing.
Kind of seemed interesting.
Any ideas why?
Code:
import org.junit.jupiter.api.Test;

public class RandomStuffTest {

    public static long iterationsPerSecond = 0;

    @Test
    void testIterationSpeed() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    System.out.println("Iterations per second: " + iterationsPerSecond);
                    iterationsPerSecond = 0;
                    Thread.sleep(1000);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        t.setDaemon(true);
        t.start();
        while (true) {
            for (long i = 0; i < Long.MAX_VALUE; i++) {
                iterationsPerSecond++;
            }
        }
    }
}
Output:
Iterations per second: 6111
Iterations per second: 2199824206
Iterations per second: 4539572003
Iterations per second: 6919540856
Iterations per second: 9442209284
Iterations per second: 11899448226
Iterations per second: 14313220638
Iterations per second: 16827637088
Iterations per second: 19322118707
Iterations per second: 21807781722
Iterations per second: 24256315314
Iterations per second: 26641505580
Another thing that I noticed:
The CPU usage was around 20% all the time and not really increasing...
Maybe that is because I was running the code as a test using JUnit?
The problem is the Java Memory Model (JMM).
Every thread is allowed to keep (but does not have to keep) a local copy of each field. Whenever it writes or reads such a field, it is free to just use its local copy and sync it up with other threads' local copies much, much later.
Said differently, the JVM is free to re-order instructions, do things in parallel, and otherwise apply whatever weird stuff it wants to optimize your code, as long as certain guarantees are never broken.
One guarantee that is easy to understand: The JVM is free to reorder or parallelize 2 sequential instructions, but it must never be possible to write code that can observe this except through timing.
In other words, int x = 0; x = 5; System.out.println(x); must necessarily print 5 and never 0.
You can establish such relationships between 2 threads as well but this involves the use of volatile and/or synchronized and/or something that does this internally (most things in the java.util.concurrent package).
You didn't, so this result is meaningless. Most likely, the instruction iterationsPerSecond = 0 is having no effect; the code iterationsPerSecond++ reads 9442209284, increments by one, and writes it back - and that field got written to 0 someplace in the middle of all that, which thus accomplished nothing whatsoever.
If you want to test this properly, try a volatile variable, or better yet an AtomicLong.
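A sketch of the same reporter/worker pair rewritten around an AtomicLong (the structure mirrors the question's code; only the counter type changes):

import java.util.concurrent.atomic.AtomicLong;

public class IterationCounter {
    // AtomicLong establishes a happens-before edge on every access, so the reporter
    // actually sees the worker's increments and the reset to zero is not lost.
    static final AtomicLong iterationsPerSecond = new AtomicLong();

    public static void main(String[] args) {
        Thread reporter = new Thread(() -> {
            try {
                while (true) {
                    // read the current count and reset it in one atomic step
                    System.out.println("Iterations per second: " + iterationsPerSecond.getAndSet(0));
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reporter.setDaemon(true);
        reporter.start();

        while (true) {
            iterationsPerSecond.incrementAndGet();
        }
    }
}

Expect the reported numbers to be much lower than in the broken version, since every increment now really goes through the shared counter.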
Like already indicated, the code is broken due to a data race.
The JIT can do some funny stuff with your code because of the data race:
while (true) {
    for (long i = 0; i < Long.MAX_VALUE; i++) {
        iterationsPerSecond++;
    }
}
Since it doesn't know that another thread is also messing with the iterationsPerSecond, the compiler could fold the for loop because it can calculate the outcome of the loop:
while (true) {
    iterationsPerSecond = Long.MAX_VALUE;
}
And it could even decide to pull the write out of the loop, since the same value is written every time (loop-invariant code motion):
iterationsPerSecond = Long.MAX_VALUE;
while (true) {
}
It could even decide to throw away the store, because it doesn't know there are any readers. So effectively it is a dead store, and hence it can apply dead code elimination:
while (true) {
}
An AtomicLong or a volatile field would solve the problem because a happens-before edge is established. Using a volatile field or AtomicLong.get/set is equally expensive: both come with the same compiler restrictions and the same fences at the hardware level.
If you want to run microbenchmarks, I would suggest checking out JMH. It will protect you against a lot of trivial mistakes.
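For example, a minimal JMH sketch of this kind of counter measurement (it assumes the org.openjdk.jmh dependency is on the classpath and is run through JMH's generated runner; the class name is made up). JMH handles warmup, forking, and dead-code elimination, and reports the throughput itself:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Thread)
public class IncrementBenchmark {
    long counter;

    // JMH reports how many calls per second this method achieves; returning the
    // value keeps the increment from being optimized away as a dead store.
    @Benchmark
    public long increment() {
        return ++counter;
    }
}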
I was playing around with infinite streams and made this program for benchmarking. Basically, the bigger the number you provide, the faster it will finish. However, I was amazed to find that using a parallel stream resulted in drastically worse performance compared to a sequential stream. Intuitively, one would expect an infinite stream of random numbers to be generated and evaluated much faster in a multi-threaded environment, but this appears not to be the case. Why is this?
final int target = Integer.parseInt(args[0]);
if (target <= 0) {
    System.err.println("Target must be between 1 and 2147483647");
    return;
}

final long startTime, endTime;
startTime = System.currentTimeMillis();
System.out.println(
        IntStream.generate(() -> new Double(Math.random() * 2147483647).intValue())
                //.parallel()
                .filter(i -> i <= target)
                .findFirst()
                .getAsInt()
);
endTime = System.currentTimeMillis();
System.out.println("Execution time: " + (endTime - startTime) + " ms");
I totally agree with the other comments and answers, but indeed your test behaves strangely when the target is very low. On my modest laptop the parallel version is on average about 60x slower when very low targets are given. This extreme difference cannot be explained by the overhead of the parallelization in the stream APIs, so I was also amazed :-). IMO the culprit lies here:
Math.random()
Internally this call relies on a global instance of java.util.Random. In the documentation of Random it is written:
Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.
So I think that the really poor performance of the parallel execution compared to the sequential one is explained by the thread contention in random rather than any other overheads. If you use ThreadLocalRandom instead (as recommended in the documentation) then the performance difference will not be so dramatic. Another option would be to implement a more advanced number supplier.
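For illustration, here is a sketch of the question's pipeline with only the number supplier swapped for ThreadLocalRandom (the class name is made up):

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class ThreadLocalRandomStream {
    public static void main(String[] args) {
        final int target = Integer.parseInt(args[0]);

        long start = System.currentTimeMillis();
        // Each worker thread gets its own generator, so the parallel stream no longer
        // contends on the single shared Random instance behind Math.random().
        int result = IntStream
                .generate(() -> ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE))
                .parallel()
                .filter(i -> i <= target)
                .findFirst()
                .getAsInt();
        System.out.println(result);
        System.out.println("Execution time: " + (System.currentTimeMillis() - start) + " ms");
    }
}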
The cost of passing work to multiple threads is high, especially the first time you do it. This cost is fairly fixed, so even if your task is trivial, the overhead is relatively high.
One of the problems you have is that highly inefficient code is a very poor way to determine how well a solution performs. Also, how it runs the first time and how it runs after a few seconds can often be 100x different (it can be much more). I suggest using an example which is already optimal, and only then attempting to use multiple threads.
e.g.
long start = System.nanoTime();
int value = (int) (Math.random() * (target + 1L));
long time = System.nanoTime() - start;
// don't time the IO as it is sooo much slower
System.out.println(value);
System.out.println("Took " + time + " ns");
Note: this will not be efficient until the code has warmed up and been compiled, i.e. ignore the first 2-5 seconds this code is run.
Following suggestions from various answers, I think I've fixed it. I'm not sure what the exact bottleneck was but on an i5-4590T the parallel version with the following code performs faster than the sequential variant. For brevity, I've included only the relevant parts of the (refactored) code:
static IntStream getComputation() {
    return IntStream
            .generate(() -> ThreadLocalRandom.current().nextInt(2147483647));
}

static void computeSequential(int target) {
    for (int loop = 0; loop < target; loop++) {
        final int result = getComputation()
                .filter(i -> i <= target)
                .findAny()
                .getAsInt();
        System.out.println(result);
    }
}

static void computeParallel(int target) {
    IntStream.range(0, target)
            .parallel()
            .forEach(loop -> {
                final int result = getComputation()
                        .parallel()
                        .filter(i -> i <= target)
                        .findAny()
                        .getAsInt();
                System.out.println(result);
            });
}
EDIT: I should also note that I put it all in a loop to get longer running times.
I've been reading about non-blocking approaches for some time. Here is a piece of code for a so-called lock-free counter.
public class CasCounter {
    private SimulatedCAS value;

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        do {
            v = value.get();
        } while (v != value.compareAndSwap(v, v + 1));
        return v + 1;
    }
}
I was just wondering about this loop:
do {
    v = value.get();
} while (v != value.compareAndSwap(v, v + 1));
People say:
So it tries again, and again, until all other threads trying to change the value have done so. This is lock free as no lock is used, but not blocking free as it may have to try again (which is rare) more than once (very rare).
My question is:
How can they be so sure about that? As for me, I can't see any reason why this loop couldn't be infinite, unless the JVM has some special mechanism to prevent it.
The loop can be infinite (since it can produce starvation for your thread), but the likelihood of that happening is very small. In order to get starvation, you need some other thread to succeed in changing the value that you want to update between your read and your store, and for that to happen repeatedly.
It would be possible to write code to trigger starvation but for real programs it would be unlikely to happen.
The compare-and-swap is usually used when you don't think you will have write conflicts very often. Say there is a 50% chance of a "miss" when you update; then there is a 25% chance that you will miss in two loops and less than a 0.1% chance that no update would succeed in 10 loops. For real-world examples, a 50% miss rate is very high (basically doing nothing but updating), and as the miss rate is reduced, to say 1%, the risk of not succeeding in two tries is only 0.01%, and in three tries 0.0001%.
The usage is similar to the following problem:
Set a variable a to 0 and have two threads updating it with a = a+1 a million times each concurrently. At the end a could have any answer between 1000000 (every other update was lost due to overwrite) and 2000000 (no update was overwritten).
The closer to 2000000 you get the more likely the CAS usage is to work since that mean that quite often the CAS would see the expected value and be able to set with the new value.
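A minimal sketch of that experiment, comparing a plain field with a CAS-based AtomicInteger (the class and field names are made up):

import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdateDemo {
    static int plain = 0;                                     // unsynchronized: updates can be lost
    static final AtomicInteger atomic = new AtomicInteger();  // increments via a CAS retry loop

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                plain = plain + 1;          // read-modify-write, not atomic
                atomic.incrementAndGet();   // retries the CAS until it wins
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // plain typically lands somewhere between 1,000,000 and 2,000,000;
        // atomic is always exactly 2,000,000.
        System.out.println("plain  = " + plain);
        System.out.println("atomic = " + atomic.get());
    }
}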
Edit: I think I have a satisfactory answer now. The bit that confused me was the 'v != compareAndSwap'. In the actual code, CAS returns true if the value is equal to the compared expression. Thus, even if the first thread is interrupted between the get and the CAS, the second thread will succeed in the swap and exit the method, so the first thread will be able to do its CAS.
Of course, it is possible that if two threads call this method an infinite number of times, one of them will not get the chance to run the CAS at all, especially if it has a lower priority, but this is one of the risks of unfair locking (the probability is very low however). As I've said, a queue mechanism would be able to solve this problem.
Sorry for the initial wrong assumptions.
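For reference, here is the same retry loop written against the real java.util.concurrent AtomicInteger rather than the SimulatedCAS from the question; its compareAndSet returns a boolean, true when our update won and false when another thread changed the value first:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCasCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int getValue() {
        return value.get();
    }

    // Re-read and retry until our compare-and-set wins. No locks are taken,
    // but an individual call can loop while other threads keep succeeding.
    public int increment() {
        int v;
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }
}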
I'm dealing with multithreading in Java and, as someone pointed out to me, I noticed that threads warm up, that is, they get faster as they are repeatedly executed. I would like to understand why this happens and whether it is related to Java itself or whether it is common behavior of every multithreaded program.
The code (by Peter Lawrey) that exemplifies it is the following:
for (int i = 0; i < 20; i++) {
    ExecutorService es = Executors.newFixedThreadPool(1);
    final double[] d = new double[4 * 1024];
    Arrays.fill(d, 1);
    final double[] d2 = new double[4 * 1024];
    es.submit(new Runnable() {
        @Override
        public void run() {
            // nothing.
        }
    }).get();
    long start = System.nanoTime();
    es.submit(new Runnable() {
        @Override
        public void run() {
            synchronized (d) {
                System.arraycopy(d, 0, d2, 0, d.length);
            }
        }
    });
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    // get the values in d2.
    for (double x : d2) ;
    long time = System.nanoTime() - start;
    System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
Results:
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
I.e. it gets faster and stabilises at around 50,000 ns. Why is that?
If I run this code (20 repetitions), then execute something else (let's say post-processing of the previous results and preparation for another multithreading round), and later execute the same Runnable on the same thread pool for another 20 repetitions, will it already be warmed up, in any case?
In my program, I execute the Runnable in just one thread (actually one per processing core I have; it's a CPU-intensive program), then some other serial processing, alternating like this many times. It doesn't seem to get faster as the program runs. Maybe I could find a way to warm it up…
It isn't the threads that are warming up so much as the JVM.
The JVM has what's called JIT (Just In Time) compiling. As the program is running, it analyzes what's happening in the program and optimizes it on the fly. It does this by taking the byte code that the JVM runs and converting it to native code that runs faster. It can do this in a way that is optimal for your current situation, since it works by analyzing the actual runtime behavior. This can (though not always) result in greater optimization than some programs that are compiled to native code without such knowledge.
You can read a bit more at http://en.wikipedia.org/wiki/Just-in-time_compilation
You could get a similar effect on any program as code is loaded into the CPU caches, but I believe this will be a smaller difference.
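In practice that means running the workload itself a number of times before you start measuring, so the measured section executes already-compiled code. A rough sketch, with an arbitrary workload and arbitrary iteration counts:

public class WarmupSketch {
    // The workload we actually want to time.
    static double work() {
        double d = 0;
        for (int i = 0; i < 10_000; i++) {
            d += Math.tan(Math.atan(d + i));
        }
        return d;
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to profile and compile work() to native code.
        for (int i = 0; i < 1_000; i++) {
            work();
        }
        // Measured run.
        long start = System.nanoTime();
        double result = work();
        System.out.printf("warmed-up run took %,d ns (result %f)%n",
                System.nanoTime() - start, result);
    }
}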
The only reasons I see that a thread execution can end up being faster are:
The memory manager can reuse already allocated object space (e.g., letting heap allocations fill the available memory until the maximum, the -Xmx setting, is reached)
The working set is available in the hardware cache
Repeated operations might create patterns that the compiler can more easily reorder to optimize execution
Suppose that we need to sort 50,000,000 numbers, and suppose that the numbers are stored in a file. What is the most efficient algorithm for solving this problem - a parallel sorting algorithm?
How would I do it? Maybe a useful link?
I can't use a standard library algorithm, therefore I'm asking about methods and algorithms :)
OK... I read about parallel merge sort, but it's not clear to me.
Solution, the first version:
the code is located here
50 million is not particularly large. I would just read them into memory, sort them, and write them out. It should take just a few seconds. How fast do you need it to be? How complicated do you need it to be?
On my old laptop it took 28 seconds. If I had more processors, it might be a little faster, but much of the time is spent reading and writing the file (15 seconds), which wouldn't be any faster.
One of the critical factors is the size of your cache. The comparison itself is very cheap provided the data is in cache. As the L3 cache is shared, one thread is all you need to make full use of it.
public static void main(String... args) throws IOException {
    generateFile();

    long start = System.currentTimeMillis();
    int[] nums = readFile("numbers.bin");
    Arrays.sort(nums);
    writeFile("numbers2.bin", nums);
    long time = System.currentTimeMillis() - start;
    System.out.println("Took " + time / 1000 + " secs to sort " + nums.length + " numbers.");
}

private static void generateFile() throws IOException {
    Random rand = new Random();
    int[] ints = new int[50 * 1000 * 1000];
    for (int i = 0; i < ints.length; i++)
        ints[i] = rand.nextInt();
    writeFile("numbers.bin", ints);
}

private static int[] readFile(String filename) throws IOException {
    DataInputStream dis = new DataInputStream(new BufferedInputStream(new FileInputStream(filename), 64 * 1024));
    int len = dis.readInt();
    int[] ints = new int[len];
    for (int i = 0; i < len; i++)
        ints[i] = dis.readInt();
    dis.close();
    return ints;
}

private static void writeFile(String name, int[] numbers) throws IOException {
    DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(name), 64 * 1024));
    dos.writeInt(numbers.length);
    for (int number : numbers)
        dos.writeInt(number);
    dos.close();
}
Off the top of my head, merge sort seems to be the best option when it comes to parallelisation and distribution, as it uses a divide-and-conquer approach. For more information, google "parallel merge sort" and "distributed merge sort".
For a single-machine, multiple-core example, see Correctly multithreaded quicksort or mergesort algo in Java?. If you can use Java 7 fork/join, then see: "Java 7: more concurrency" and "Parallelism with Fork/Join in Java 7".
For distributing it over many machines, see Hadoop; it has a distributed merge sort implementation: see MergeSort and MergeSorter. Also of interest: Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds
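For the single-machine case, here is a rough fork/join sketch of the idea (the class names and the threshold are arbitrary, and ForkJoinPool.commonPool() needs Java 8; on Java 7 you would create a ForkJoinPool yourself):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort {
    // Below this size, fall back to the sequential sort rather than forking further.
    private static final int THRESHOLD = 1 << 16;

    static class SortTask extends RecursiveAction {
        private final int[] a;
        private final int lo, hi; // sorts a[lo..hi)

        SortTask(int[] a, int lo, int hi) {
            this.a = a;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                Arrays.sort(a, lo, hi);
                return;
            }
            int mid = (lo + hi) >>> 1;
            // Sort the two halves in parallel, then merge them.
            invokeAll(new SortTask(a, lo, mid), new SortTask(a, mid, hi));
            merge(a, lo, mid, hi);
        }
    }

    // Standard merge of two sorted halves using a temporary copy of the left half.
    private static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < tmp.length && j < hi) {
            a[k++] = (tmp[i] <= a[j]) ? tmp[i++] : a[j++];
        }
        while (i < tmp.length) {
            a[k++] = tmp[i++];
        }
    }

    public static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new SortTask(a, 0, a.length));
    }
}

On Java 8+ you can also simply call Arrays.parallelSort(nums), which runs a parallel sort-merge on the common fork/join pool.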
For sorting that many elements, your best shot is merge sort. It's usually the algorithm used by databases. Even though it is not as fast as quicksort, it uses intermediate storage, so you don't need huge amounts of memory to perform the sort.
Also, as pointed out by sje397 and Scott in the comments, merge sort is highly parallelizable.
It depends a lot on the problem domain. For instance, if all the numbers are positive ints, the best way may be to make an array of 0-MAX_INT and then just count how many times each number occurs as you read the file, and then print out each int with a non-zero count however many times it occurred. That's an O(n) "sort". There's an official name for that sort, but I forget what it is.
By the way, I got asked this question in a Google interview. From the problem constraints I came up with this solution, and it seemed to be the answer they were looking for. (I turned down the job because I didn't want to move.)
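What is being described here is a counting sort. A minimal sketch for non-negative ints below some bound (the bound is made up and must be small enough that the count array fits in memory; a full 0..Integer.MAX_VALUE range would need roughly 8 GB):

import java.util.Random;

public class CountingSortSketch {
    // Sorts values in [0, bound) in O(n + bound) time by counting occurrences.
    static int[] countingSort(int[] input, int bound) {
        int[] counts = new int[bound];
        for (int v : input) {
            counts[v]++;
        }
        int[] sorted = new int[input.length];
        int k = 0;
        for (int v = 0; v < bound; v++) {
            for (int c = 0; c < counts[v]; c++) {
                sorted[k++] = v;
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // ~400 MB of int arrays for 50 million values; give the JVM enough heap (-Xmx).
        Random rand = new Random();
        int bound = 1_000_000; // keep the count array small
        int[] nums = new int[50_000_000];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = rand.nextInt(bound);
        }
        int[] sorted = countingSort(nums, bound);
        System.out.println("first = " + sorted[0] + ", last = " + sorted[sorted.length - 1]);
    }
}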
Do not be afraid of the large number; in fact, 50,000,000 numbers is not that big. If the numbers are integers, then each number is 4 bytes in size, so the overall memory needed for this array is 50,000,000 * 4 / 1024 / 1024 = 190.7 megabytes, which is relatively small. With the math done, you can proceed to quicksort, which runs in O(n log n). Note that the built-in sort method for .NET arrays uses quicksort; I'm not sure if this is the case in Java too.
Sorting 250,000,000 integers on my machine took about 2 minutes, so go for it :)
They are not so many. If they are 10-byte extended-precision numbers, for example, that is an array of about 500 MB; it could almost fit on my phone! ;)
So I'd say go for Quicksort if it's only that.
50e6 numbers is very small nowadays, don't make things more complex than they need to be...
bash$ sort < file > sorted.file