Threadprogramming, Heap OutOfMemory

I've been programming with threads in Java for the first time, so here's a beginner's question about threads.
My code:
import java.util.ArrayList;
import java.util.List;
public class CollatzRunner implements Runnable {
private int lastNumber = 0;
private int highestCounter = 0;
private int highestValue = 0;
public void run() {
while(this.lastNumber < 1000000) {
this.lastNumber++;
Collatz c = new Collatz(lastNumber);
List<Integer> values = new ArrayList<Integer>();
while(c.hasNext()) {
values.add(c.next());
}
if(this.highestCounter < values.size()) {
this.highestCounter = values.size();
this.highestValue = values.get(0);
}
//System.out.println(Thread.currentThread().getName() + ": " + this.lastNumber);
System.out.println(Thread.currentThread().getName());
}
}
}
and:
public class CollatzSimulator {
public static void main(String[] args) {
CollatzRunner runner = new CollatzRunner();
Thread t1 = new Thread(runner, "Thread-1");
Thread t2 = new Thread(runner, "Thread-2");
Thread t3 = new Thread(runner, "Thread-3");
Thread t4 = new Thread(runner, "Thread-4");
//Thread t5 = new Thread(runner, "Thread-5");
//Thread t6 = new Thread(runner, "Thread-6");
//Thread t7 = new Thread(runner, "Thread-7");
//Thread t8 = new Thread(runner, "Thread-8");
t1.start();
t2.start();
t3.start();
t4.start();
//t5.start();
//t6.start();
//t7.start();
//t8.start();
System.out.println("bla");
}
}
When I run this code, I almost immediately get OutOfMemoryError: Java heap space. So I suspect I have a pretty major memory leak here. The problem is that I have no experience in this field, so I'm asking on this site.
What I've tried so far:
- Took a heap dump (800 MB generated in 5 seconds)
- Tried setting the Collatz instance to null after using it, to drop the reference in the hope that the garbage collector would free the heap space.
My program is just a small class Collatz that generates the Collatz sequence for a given number, and I want to use threads to generate all Collatz sequences for 0 < n < 1000000.
Thanks for any help!

I am pretty sure the problem can be found here:
List<Integer> values = new ArrayList<Integer>();
while(c.hasNext()) {
values.add(c.next());
}
From my understanding, a Collatz sequence can get very long. c.next() only computes the next member of the sequence, which uses almost no space. But collecting all sequence members into a list takes a huge amount of space, and the more threads you create, the sooner you'll hit that limit.
If you can't allocate a much larger heap, I'm pretty sure the only option is to write part of the values list out to a file as soon as it exceeds a specific length (you'll have to output it at some point anyway). However, with all that I/O, multithreading will be pretty pointless.

To determine the length of a Collatz sequence for a given starting value, do NOT store the sequence; simply count the number of values Collatz returns:
Collatz c = new Collatz(lastNumber);
int length = 0;
while (c.hasNext()) {
    // Advance the sequence. Assuming hasNext() does not advance it,
    // the loop would never end without this call.
    c.next();
    length++;
}
if (this.highestCounter < length) {
    this.highestCounter = length;
    this.highestValue = lastNumber;
}
Later
After reading about the "thread" challenge: If you have a computer with more than one core, it makes sense to try and search for the longest Collatz sequence up to 1M by running several threads (number of cores) in parallel, but not using the very same code. Add arguments startValue and increment to Collatz runner and add this for each iteration, so that the first thread of, say, four, computes 1, 5, 9,... the second 2, 6, 10,... the third 3, 7, 11,... and the fourth 4, 8, 12,....
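A minimal sketch of that strided split, under a few assumptions: the Collatz class from the question is not shown, so this uses a hypothetical helper collatzLength instead, and the names StridedCollatz, Worker and longest are made up for illustration. It counts terms rather than storing them, and it uses long for the sequence values because some starting values below one million (113383, for example) produce intermediate values that no longer fit in an int.

```java
import java.util.ArrayList;
import java.util.List;

class StridedCollatz {
    // Number of terms in the Collatz sequence starting at n, down to and including 1.
    static int collatzLength(long n) {
        int length = 1;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Each worker checks start, start + stride, start + 2 * stride, ... below limit.
    static class Worker implements Runnable {
        final int start, stride, limit;
        int bestStart = 0, bestLength = 0; // private to this worker, read after join()

        Worker(int start, int stride, int limit) {
            this.start = start;
            this.stride = stride;
            this.limit = limit;
        }

        public void run() {
            for (int n = start; n < limit; n += stride) {
                int len = collatzLength(n);
                if (len > bestLength) {
                    bestLength = len;
                    bestStart = n;
                }
            }
        }
    }

    // Returns {bestStart, bestLength} for 0 < n < limit; ties go to the smaller start.
    static int[] longest(int limit, int threadCount) throws InterruptedException {
        List<Worker> workers = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Worker w = new Worker(i + 1, threadCount, limit);
            Thread t = new Thread(w);
            workers.add(w);
            threads.add(t);
            t.start();
        }
        int[] best = {0, 0};
        for (int i = 0; i < threadCount; i++) {
            threads.get(i).join(); // after join() the worker's fields are safe to read
            Worker w = workers.get(i);
            if (w.bestLength > best[1] || (w.bestLength == best[1] && w.bestStart < best[0])) {
                best[0] = w.bestStart;
                best[1] = w.bestLength;
            }
        }
        return best;
    }
}
```

Because every worker keeps its own bestStart and bestLength, the threads share no mutable state and need no locking; the results are only combined after join().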

Related

threading: search for a value and stop all threads

I need to implement a search method that searches through haystack and returns the first index at which needle is found.
static int search(T needle, T[] haystack, int numThreads)
My question: how can I stop all threads once one of them finds a result?
For example: I am searching for 5 in the 10-element array [2, 4, 5, 6, 1, 4, 5, 8, 9, 3] with 2 threads. The first thread searches the first part [0 - 5) and the second thread searches the other part [5 - 10). If thread 2 starts first and finds a result before thread 1 does, search should return 6 and both threads should terminate.

The classic way of doing this is to simply have shared data between the threads so that they can communicate with each other. In other words, initialise some flag value to "not found" before starting the threads.
Then, when the threads are running, they process elements in the array until either their elements are exhausted, or the flag value has been set to "found".
In pseudo-code, that would be something like:
main():
    global array = createArray(size = 10000, values = random)
    global foundIndex = -1
    global mutex = createMutex()
    startThread(id = 1, func = threadFn, param = (0, 4999))
    startThread(id = 2, func = threadFn, param = (5000, 9999))
    waitFor(id = 1)
    waitFor(id = 2)
    print("Result is ", foundIndex)
threadFn(first, last):
    for index in first through last inclusive:
        if userSpecifiedCheckFound(array[index]):
            mutex.lock()
            if foundIndex == -1:
                foundIndex = index
            mutex.unlock()
            return
        mutex.lock()
        localIndex = foundIndex
        mutex.unlock()
        if localIndex != -1:
            return
You can see from that that each instance of the function will set the shared data and return if it finds a value that matches whatever criteria you're looking for. It will also return (without setting the shared data) if another thread has already set the shared data, meaning it can exit early if another thread has already found something.
Just keep in mind that the shared data, foundIndex in this case, needs to be protected from simultaneous changes lest it become corrupted. In the pseudo-code, I've shown how to do that with low-level mutual exclusion semaphores.
In Java, that means using synchronized to achieve the same effect. By way of example, the following code sets up some suitable test data so that the sixteenth cell of the twenty-cell array will satisfy the search criteria.
It then runs two threads, one on each half of the data, until it finds that cell.
public class TestProg extends Thread {
// Shared data.
static int [] sm_array = new int[20];
static int sm_foundIndex = -1;
// Each thread responsible for its own stuff.
private int m_id, m_curr, m_last;
public TestProg(int id, int first, int last) {
m_id = id;
m_curr = first;
m_last = last;
}
// Runnable: continue until someone finds it.
public void run() {
// Try all cells allotted to thread.
while (m_curr <= m_last) {
System.out.println(m_id + ": processing " + m_curr);
// If I find it first, save and exit.
if (sm_array[m_curr] != 0) {
// Lock on the class rather than on this: each thread is a separate
// instance, so synchronized(this) would not guard the static field.
synchronized(TestProg.class) {
if (sm_foundIndex == -1) {
sm_foundIndex = m_curr;
System.out.println(m_id + ": early exit, I found it");
return;
}
}
}
// If someone else finds it, just exit.
synchronized(TestProg.class) {
if (sm_foundIndex != -1) {
System.out.println(m_id + ": early exit, sibling found it");
return;
}
}
// Kludge to ensure threads run side-by-side.
try { Thread.sleep(100); } catch(Exception e) {}
m_curr++;
}
}
public static void main(String[] args) {
// Create test data.
for (int i = 0; i < 20; i++) {
sm_array[i] = 0;
}
sm_array[15] = 1;
// Create and start threads.
TestProg thread1 = new TestProg(1, 0, 9);
TestProg thread2 = new TestProg(2, 10, 19);
thread1.start();
thread2.start();
// Wait for both to finish, then print result.
try {
thread1.join();
thread2.join();
System.out.println("=> Result was " + sm_foundIndex);
} catch(Exception e) {
System.out.println("Interrupted: " + e);
}
}
}
The output of that code (although threading makes it a little non-deterministic) is:
1: processing 0
2: processing 10
1: processing 1
2: processing 11
1: processing 2
2: processing 12
1: processing 3
2: processing 13
1: processing 4
2: processing 14
1: processing 5
2: processing 15
2: early exit, I found it
1: processing 6
1: early exit, sibling found it
=> Result was 15

You could look at ExecutorCompletionService: once the first result is available, cancel all other tasks.
A CompletionService uses a supplied Executor to execute tasks and places the completed Futures on a queue, from which you can take the results in the order they are completed.
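A rough sketch of that idea, simplified to an int array; the class name FirstResultSearch and the chunking scheme are assumptions made for illustration. Each task scans one slice and returns the index it found or -1. The caller takes results in completion order and cancels whatever is still running once a real index shows up. Note that this returns an index found by whichever task finishes first, which is not necessarily the lowest matching index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class FirstResultSearch {
    // Returns an index of needle reported by the first task to find one, or -1.
    static int search(int needle, int[] haystack, int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        List<Future<Integer>> futures = new ArrayList<>();
        int chunk = (haystack.length + numThreads - 1) / numThreads; // ceiling division
        for (int t = 0; t < numThreads; t++) {
            final int from = Math.min(haystack.length, t * chunk);
            final int to = Math.min(haystack.length, from + chunk);
            futures.add(cs.submit(() -> {
                for (int i = from; i < to; i++) {
                    // cancel(true) interrupts the worker; stop scanning when that happens
                    if (Thread.currentThread().isInterrupted()) return -1;
                    if (haystack[i] == needle) return i;
                }
                return -1;
            }));
        }
        try {
            for (int done = 0; done < numThreads; done++) {
                int index = cs.take().get(); // next task to complete, in completion order
                if (index != -1) return index;
            }
            return -1;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            for (Future<Integer> f : futures) f.cancel(true); // no effect on finished tasks
            pool.shutdownNow();
        }
    }
}
```

With the array from the question and two threads, a search for 5 may return either 2 or 6 depending on which task finishes first.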

Java: Thread joins over 10,000 iterations inconsistent

Alright folks.. I'm back again (seems to be my home lately).
I'm going through the whole cave of programming YouTube vids on multi-threading. This particular one uses 2 threads that go through a for loop which adds 1 to a variable 10,000 times each. So you join them so the result is 20,000 when it's done.
public class main {
private int count = 0;
public static void main(String[] args) {
main main = new main();
main.doWork();
}
public void doWork(){
Thread t1 = new Thread(new Runnable(){
public void run(){
for (int i = 0; i < 10000; i++){
count++;
}
}
});
Thread t2 = new Thread(new Runnable(){
public void run(){
for (int i = 0; i < 10000; i++){
count++;
}
}
});
t1.start();
t2.start();
try {
t1.join();
t2.join();
} catch (InterruptedException ex) {
Logger.getLogger(main.class.getName()).log(Level.SEVERE, null, ex);
}
System.out.println("Count is: " + count);
}
}
Thing is.. when i change the iterations:
i < 10 = 20 (correct)
i < 100 = 200 (correct)
i < 1000 = 2000 (correct)
i < 10000 = 13034 (first run)
= 14516 (second run)
= ... etc..
Why won't it properly handle iterations in the tens of thousands?

You have demonstrated the classic race condition, which occurs when 2 or more threads are reading and writing to the same variable in conflicting ways. This arises because the ++ operator isn't an atomic operation -- multiple operations are occurring, and a thread could be interrupted in between operations, e.g.:
Thread t1 reads count (0), and calculates the incremented value (1), but it hasn't stored the value back to count yet.
Thread t2 reads count (still 0), and calculates the incremented value (1), but it hasn't stored the value back to count yet.
Thread t1 stores its 1 value back to count.
Thread t2 stores its 1 value back to count.
Two updates have occurred, but the net result is only an increase of 1. The set of operations which must not be interrupted is a critical section.
This appears to have happened 20,000 - 13,034 times, or 6,966 times in your first execution. You may have gotten lucky with lower bounds, but regardless of the magnitude of the bounds, the race condition can happen.
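In Java, there are several solutions:
- Place synchronized blocks around the critical sections (both count++ lines), locking on the same object in both threads. Inside the anonymous Runnable, this refers to the Runnable itself, so lock on the enclosing instance (main.this) or on a dedicated shared lock object.
- Change count to an AtomicInteger, which encapsulates such operations atomically on its own. The getAndIncrement method would replace the ++ operator here.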
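Both fixes can be sketched as follows. This is an illustration rather than the asker's program: the class name CounterFixes and its two methods are made up, and the counter lives in a local variable instead of a field so that each method is self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterFixes {
    // Fix 1: guard the increment with a lock object shared by both threads.
    static int countWithLock(int iterations) throws InterruptedException {
        final Object lock = new Object();
        final int[] count = {0};
        Runnable work = () -> {
            for (int i = 0; i < iterations; i++) {
                synchronized (lock) {
                    count[0]++; // read, add, write now happen as one critical section
                }
            }
        };
        runTwice(work);
        return count[0];
    }

    // Fix 2: let AtomicInteger perform the increment atomically.
    static int countWithAtomic(int iterations) throws InterruptedException {
        AtomicInteger count = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < iterations; i++) {
                count.getAndIncrement();
            }
        };
        runTwice(work);
        return count.get();
    }

    // Runs the same work on two threads and waits for both to finish.
    private static void runTwice(Runnable work) throws InterruptedException {
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Count is: " + countWithLock(10000));
        System.out.println("Count is: " + countWithAtomic(10000));
    }
}
```

Either variant should report 20000 for 10,000 iterations per thread, because no update can be lost between the read and the write.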

Why this piece of Java code is not running concurrently

I have written Sieve of Eratosthenes which is supposed to work in parallel, but it's not. When I increase number of threads, time of computing is not getting lower. Any ideas why?
Main class
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ConcurrentTest {
public static void main(String[] args) throws InterruptedException {
Sieve task = new Sieve();
int x = 1000000;
int threads = 4;
task.setArray(x);
Long beg = new Date().getTime();
ExecutorService exec = Executors.newCachedThreadPool();
for (int i = 0; i < threads; i++) {
exec.execute(task);
}
exec.shutdown();
Long time = 0L;
// Main thread is waiting until all threads are terminated
// ( it means that computing is done)
while (true)
if (exec.isTerminated()) {
time = new Date().getTime() - beg;
break;
}
System.out.println("Time is " + time);
}
}
Sieve class
import java.util.concurrent.ConcurrentHashMap;
public class Sieve implements Runnable {
private ConcurrentHashMap<Integer, Boolean> array =
new ConcurrentHashMap<Integer, Boolean>();
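private int x;
// Declaration missing from the original post; starting at 2 is an assumption.
private int counter = 2;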
public void run() {
while(true){
// I am getting synchronized number to check if it's prime
int n = getCounter();
// If no more numbers to check, stop loop
if( n == -1)
break;
// If HashMap contains number, we can further
if(!array.containsKey(n))continue;
for (int i = 2 * n; i <= x; i += n) {
// Compound numbers are removed from HashMap, Eg. 6, 12 and much more.
array.remove(i);
}
}
}
private synchronized int getCounter(){
if( counter < x)
return counter++;
else return -1;
}
public void setArray(int x) {
this.x = x;
for (int i = 2; i <= x; i++)
array.put(i, false);
}
}
I made some tests with different number of threads. These are results:
Nr of threads 1 Time is 1850, 1795, 1825
Nr of threads 2 Time is 1845, 1836, 1814
Nr of threads 3 Time is 1767, 1820, 1756
Nr of threads 4 Time is 1732, 1840, 2083
Nr of threads 5 Time is 1791, 1795, 1803
Nr of threads 6 Time is 1825, 1728, 1707
Nr of threads 7 Time is 1754, 1729, 1686
Nr of threads 8 Time is 1760, 1717, 1817
Nr of threads 9 Time is 1721, 1699, 1673
Nr of threads 10 Time is 1661, 1722, 1718

"When I increase number of threads, time of computing is not getting lower"
tl;dr: your problem size is too small. If you increase x to 10000000, the differences will become more obvious. They won't be what you're expecting, though.
I tried your code on an eight core machine with two slight modifications:
For timing, I used System.nanoTime() instead of getTime() on a Date.
I used the awaitTermination method of ExecutorService rather than a spinloop to check for the end of run.
I tried launching your Sieve tasks using a fixed thread pool, a cached thread pool and a fork join pool and comparing the results of different values for your thread variable.
I see the following results (in milliseconds) on my machine with x=10000000:
Thread count       =    1     2     4     8    16
Fixed thread pool  = 5451  3866  3639  3227  3120
Cached thread pool = 5434  3763  3709  3258  3078
Fork-join pool     = 6732  3670  3735  3190  3102
What these results show us is a clear benefit of changing from a single thread of execution to two threads. However, the benefit of additional threads drops off rapidly. There's an interesting plateau going from two to four threads and marginal benefits up to 16.
In addition, you can also see that the different threading mechanisms have different initial overhead: I didn't expect the Fork-Join pool to cost that much more to start than the other mechanisms.
So, as written, you shouldn't really expect a benefit past two threads for small but non-trivial problem sets.
If you'd like to increase the benefit of additional threads, you're going to need to look at your current implementation. For example, when I switched from your synchronized getCounter() to an AtomicInteger using incrementAndGet(), I eliminated the overhead of the synchronized method. The result is that all of my four thread numbers dropped on the order of 1000 milliseconds.
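To make those three changes concrete, here is a small sketch, not the code I actually benchmarked: the class ClaimCounterDemo and the method claimAll are hypothetical, and the sieve work is replaced by a placeholder so that only the plumbing remains. It uses getAndIncrement(), which matches the post-increment in the original getCounter(); incrementAndGet() would work equally well with a start value one lower.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimCounterDemo {
    // Workers claim the numbers 2..x from a shared counter; returns how many were claimed.
    static int claimAll(int x, int threads) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(2);
        AtomicInteger claimed = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        long begin = System.nanoTime(); // monotonic clock, better suited to timing than Date
        for (int i = 0; i < threads; i++) {
            exec.execute(() -> {
                // getAndIncrement() hands each number to exactly one worker, without a lock.
                while (counter.getAndIncrement() <= x) {
                    claimed.incrementAndGet(); // placeholder for the real per-number work
                }
            });
        }
        exec.shutdown();
        // Block until the workers are done instead of spinning on isTerminated().
        if (!exec.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        System.out.println("Time is " + elapsedMs + " ms");
        return claimed.get();
    }
}
```

The one-minute timeout is arbitrary; pick whatever bound suits the problem size.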

Multithreading a massive file read

I'm still in the process of wrapping my brain around how concurrency works in Java. I understand that (if you're subscribing to the OO Java 5 concurrency model) you implement a Task or Callable with a run() or call() method (respectively), and it behooves you to parallelize as much of that implemented method as possible.
But I'm still not understanding something inherent about concurrent programming in Java:
How is a Task's run() method assigned the right amount of concurrent work to be performed?
As a concrete example, what if I have an I/O-bound readMobyDick() method that reads the entire contents of Herman Melville's Moby Dick into memory from a file on the local system. And let's just say I want this readMobyDick() method to be concurrent and handled by 3 threads, where:
Thread #1 reads the first 1/3rd of the book into memory
Thread #2 reads the second 1/3rd of the book into memory
Thread #3 reads the last 1/3rd of the book into memory
Do I need to chunk Moby Dick up into three files and pass each one to its own task, or do I just call readMobyDick() from inside the implemented run() method and (somehow) the Executor knows how to break the work up amongst the threads?
I am a very visual learner, so any code examples of the right way to approach this are greatly appreciated! Thanks!

You have probably chosen by accident the absolute worst example of parallel activities!
Reading in parallel from a single mechanical disk is actually slower than reading with a single thread, because you are in fact bouncing the mechanical head to different sections of the disk as each thread gets its turn to run. This is best left as a single threaded activity.
Let's take another example, which is similar to yours but can actually offer some benefit: assume I want to search for the occurrences of a certain word in a huge list of words (this list could even have come from a disk file, but like I said, read by a single thread). Assume I can use 3 threads like in your example, each searching on 1/3rd of the huge word list and keeping a local counter of how many times the searched word appeared.
In this case you'd want to partition the list in 3 parts, pass each part to a different object whose type implements Runnable and have the search implemented in the run method.
The runtime itself has no idea how to do the partitioning or anything like that, you have to specify it yourself. There are many other partitioning strategies, each with its own strengths and weaknesses, but we can stick to the static partitioning for now.
Let's see some code:
class SearchTask implements Runnable {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public void run() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
}
public int getCounter() { return localCounter; }
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
SearchTask task1 = new SearchTask(0, 10000, words, "John");
SearchTask task2 = new SearchTask(10000, 20000, words, "John");
SearchTask task3 = new SearchTask(20000, 30000, words, "John");
// create threads for each task
Thread t1 = new Thread(task1);
Thread t2 = new Thread(task2);
Thread t3 = new Thread(task3);
// start threads
t1.start();
t2.start();
t3.start();
// wait for threads to finish
t1.join();
t2.join();
t3.join();
// collect results
int counter = 0;
counter += task1.getCounter();
counter += task2.getCounter();
counter += task3.getCounter();
This should work nicely. Note that in practical cases you would build a more generic partitioning scheme. You could alternatively use an ExecutorService and implement Callable instead of Runnable if you wish to return a result.
So an alternative example using more advanced constructs:
class SearchTask implements Callable<Integer> {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public Integer call() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
return localCounter;
}
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
tasks.add(new SearchTask(0, 10000, words, "John"));
tasks.add(new SearchTask(10000, 20000, words, "John"));
tasks.add(new SearchTask(20000, 30000, words, "John"));
// create thread pool and start tasks
ExecutorService exec = Executors.newFixedThreadPool(3);
// invokeAll blocks until every task has completed
List<Future<Integer>> results = exec.invokeAll(tasks);
// collect results
int counter = 0;
for(Future<Integer> f: results) {
counter += f.get();
}
// let the pool's threads exit
exec.shutdown();
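As for the "more generic partitioning scheme" mentioned earlier, one possible shape is sketched below. The class PartitionedCount and its count method are hypothetical names; the slice boundaries are computed from the list size and the number of parts, so the list length no longer has to be a multiple of the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class PartitionedCount {
    // Counts occurrences of token in words using `parts` equally sized slices.
    static int count(List<String> words, String token, int parts)
            throws InterruptedException, ExecutionException {
        List<Callable<Integer>> tasks = new ArrayList<>();
        int size = words.size();
        for (int p = 0; p < parts; p++) {
            // Integer arithmetic spreads any remainder across the slices.
            final int start = (int) ((long) size * p / parts);
            final int end = (int) ((long) size * (p + 1) / parts);
            tasks.add(() -> {
                int local = 0; // per-task counter, so nothing is shared between threads
                for (int i = start; i < end; i++) {
                    if (words.get(i).equals(token)) local++;
                }
                return local;
            });
        }
        ExecutorService exec = Executors.newFixedThreadPool(parts);
        try {
            int total = 0;
            for (Future<Integer> f : exec.invokeAll(tasks)) { // blocks until all tasks finish
                total += f.get();
            }
            return total;
        } finally {
            exec.shutdown();
        }
    }
}
```

The list is only read, never modified, while the tasks run, which is what makes sharing it between threads safe here.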

You picked a bad example, as Tudor was so kind to point out. Spinning disk hardware is subject to the physical constraints of moving platters and heads, and the most efficient read implementation is to read each block in order, which reduces the need to move the head or wait for the disk to align.
That said, some operating systems don't always store things contiguously on disk, and for those who remember, defragmentation could provide a disk performance boost if your OS / filesystem didn't do the job for you.
As you mentioned wanting a program that would benefit, let me suggest a simple one, matrix addition.
Assuming you made one thread per core, you can trivially divide any two matrices to be added into N (one for each thread) rows. Matrix addition (if you recall) works as such:
A + B = C
or
[ a11, a12, a13 ]   [ b11, b12, b13 ]   [ (a11+b11), (a12+b12), (a13+b13) ]
[ a21, a22, a23 ] + [ b21, b22, b23 ] = [ (a21+b21), (a22+b22), (a23+b23) ]
[ a31, a32, a33 ]   [ b31, b32, b33 ]   [ (a31+b31), (a32+b32), (a33+b33) ]
So to distribute this across N threads, we simply take the row index modulo the number of threads to get the "thread id" of the thread that will add that row.
matrix with 20 rows across 3 threads
row % 3 == 0 (for rows 0, 3, 6, 9, 12, 15, and 18)
row % 3 == 1 (for rows 1, 4, 7, 10, 13, 16, and 19)
row % 3 == 2 (for rows 2, 5, 8, 11, 14, and 17)
// row 20 doesn't exist, because we number rows from 0
Now each thread "knows" which rows it should handle, and the results "per row" can be computed trivially because the results do not cross into other thread's domain of computation.
All that is needed now is a "result" data structure which tracks when the values have been computed, and when last value is set, then the computation is complete. In this "fake" example of a matrix addition result with two threads, computing the answer with two threads takes approximately half the time.
// the following assumes that threads don't get rescheduled to different cores for
// illustrative purposes only. Real Threads are scheduled across cores due to
// availability and attempts to prevent unnecessary core migration of a running thread.
[ done, done, done ] // filled in at about the same time as row 2 (runs on core 3)
[ done, done, done ] // filled in at about the same time as row 1 (runs on core 1)
[ done, done, .... ] // filled in at about the same time as row 4 (runs on core 3)
[ done, ...., .... ] // filled in at about the same time as row 3 (runs on core 1)
More complex problems can be solved by multithreading, and different problems are solved with different techniques. I purposefully picked one of the simplest examples.
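For what it's worth, the row % N split described above might look roughly like this in Java. The class name ParallelMatrixAdd is made up for illustration, and instead of a separate "result" structure that tracks completion, this sketch simply joins the threads, which is enough to know that every row has been written.

```java
class ParallelMatrixAdd {
    // Adds a and b row by row; thread `id` handles the rows where row % threadCount == id.
    static int[][] add(int[][] a, int[][] b, int threadCount) throws InterruptedException {
        int rows = a.length;
        int[][] c = new int[rows][];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                // Stepping by threadCount visits exactly the rows with row % threadCount == id.
                for (int row = id; row < rows; row += threadCount) {
                    c[row] = new int[a[row].length];
                    for (int col = 0; col < a[row].length; col++) {
                        c[row][col] = a[row][col] + b[row][col];
                    }
                }
            });
            threads[t].start();
        }
        // Once every thread has been joined, all rows of c are complete and visible.
        for (Thread thread : threads) {
            thread.join();
        }
        return c;
    }
}
```

No locking is needed because no two threads ever write to the same row.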

"you implement a Task or Callable with a run() or call() method (respectively), and it behooves you to parallelize as much of that implemented method as possible."
A Task represents a discrete unit of work
Loading a file into memory is a discrete unit of work, so this activity can be delegated to a background thread, i.e. a background thread runs the task of loading the file.
It is a discrete unit of work since it has no other dependencies needed in order to do its job (load the file) and has discrete boundaries.
What you are asking is to divide this further into tasks, i.e. one thread loads the first third of the file while another thread loads the second third, etc.
If you were able to divide the task into further subtasks, then it would not be a task in the first place, by definition. So loading a file is a single task by itself.
To give you an example:
Let's say that you have a GUI and you need to present data from 5 different files to the user. To present them you also need to prepare some data structures to process the actual data.
All these are separate tasks.
E.g. loading the files is 5 different tasks, so it could be done by 5 different threads.
The preparation of the data structures could be done by a different thread.
The GUI, of course, runs in yet another thread.
All of these can happen concurrently.

If your system supports high-throughput I/O, here is how you can do it:
How to read a file using multiple threads in Java when a high throughput(3GB/s) file system is available
Here is the solution to read a single file with multiple threads.
Divide the file into N chunks, read each chunk in a thread, then merge them in order. Beware of lines that cross chunk boundaries. This is the basic idea as suggested by user slaks.
Benchmarking the implementation below with multiple threads on a single 20 GB file:
1 Thread : 50 seconds : 400 MB/s
2 Threads: 30 seconds : 666 MB/s
4 Threads: 20 seconds : 1GB/s
8 Threads: 60 seconds : 333 MB/s
Equivalent Java7 readAllLines() : 400 seconds : 50 MB/s
Note: This may only work on systems that are designed to support high-throughput I/O, and not on ordinary personal computers.
Here are the essential bits of the code; for complete details, follow the link.
public class FileRead implements Runnable
{
private FileChannel _channel;
private long _startLocation;
private int _size;
int _sequence_number;
public FileRead(long loc, int size, FileChannel chnl, int sequence)
{
_startLocation = loc;
_size = size;
_channel = chnl;
_sequence_number = sequence;
}
@Override
public void run()
{
System.out.println("Reading the channel: " + _startLocation + ":" + _size);
//allocate memory
ByteBuffer buff = ByteBuffer.allocate(_size);
//Read file chunk to RAM
_channel.read(buff, _startLocation);
//chunk to String
String string_chunk = new String(buff.array(), Charset.forName("UTF-8"));
System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);
}
//args[0] is path to read file
//args[1] is the size of thread pool; Need to try different values to find sweet spot
public static void main(String[] args) throws Exception
{
FileInputStream fileInputStream = new FileInputStream(args[0]);
FileChannel channel = fileInputStream.getChannel();
long remaining_size = channel.size(); //get the total number of bytes in the file
long chunk_size = remaining_size / Integer.parseInt(args[1]); //file_size/threads
//thread pool
ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(args[1]));
long start_loc = 0;//file pointer
int i = 0; //loop counter
while (remaining_size >= chunk_size)
{
//launches a new thread
executor.execute(new FileRead(start_loc, Math.toIntExact(chunk_size), channel, i));
remaining_size = remaining_size - chunk_size;
start_loc = start_loc + chunk_size;
i++;
}
//load the last remaining piece
executor.execute(new FileRead(start_loc, Math.toIntExact(remaining_size), channel, i));
//Tear Down
}
}

Why is my multi threaded sorting algorithm not faster than my single threaded mergesort

There are certain algorithms whose running time can decrease significantly when one divides up a task and gets each part done in parallel. One of these algorithms is merge sort, where a list is divided into successively smaller parts and then recombined in sorted order. I decided to do an experiment to test whether or not I could increase the speed of this sort by using multiple threads. I am running the following functions in Java on a quad-core Dell with Windows Vista.
One function (the control case) is simply recursive:
// x is an array of N elements in random order
public int[] mergeSort(int[] x) {
if (x.length == 1)
return x;
// Dividing the array in half
int[] a = new int[x.length/2];
int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
for(int i = 0; i < x.length/2; i++)
a[i] = x[i];
for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++)
b[i] = x[i+x.length/2];
// Sending them off to continue being divided
mergeSort(a);
mergeSort(b);
// Recombining the two arrays
int ia = 0, ib = 0, i = 0;
while(ia != a.length || ib != b.length) {
if (ia == a.length) {
x[i] = b[ib];
ib++;
}
else if (ib == b.length) {
x[i] = a[ia];
ia++;
}
else if (a[ia] < b[ib]) {
x[i] = a[ia];
ia++;
}
else {
x[i] = b[ib];
ib++;
}
i++;
}
return x;
}
The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called:
public class Merger extends Thread
{
int[] x;
boolean finished;
public Merger(int[] x)
{
this.x = x;
}
public void run()
{
if (x.length == 1) {
finished = true;
return;
}
// Divide the array in half
int[] a = new int[x.length/2];
int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
for(int i = 0; i < x.length/2; i++)
a[i] = x[i];
for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++)
b[i] = x[i+x.length/2];
// Begin two threads to continue to divide the array
Merger ma = new Merger(a);
ma.run();
Merger mb = new Merger(b);
mb.run();
// Wait for the two other threads to finish
while(!ma.finished || !mb.finished) ;
// Recombine the two arrays
int ia = 0, ib = 0, i = 0;
while(ia != a.length || ib != b.length) {
if (ia == a.length) {
x[i] = b[ib];
ib++;
}
else if (ib == b.length) {
x[i] = a[ia];
ia++;
}
else if (a[ia] < b[ib]) {
x[i] = a[ia];
ia++;
}
else {
x[i] = b[ib];
ib++;
}
i++;
}
finished = true;
}
}
It turns out that the function that does not use multithreading actually runs faster. Why? Do the operating system and the Java virtual machine not "communicate" effectively enough to place the different threads on different cores? Or am I missing something obvious?

The problem is not multithreading: I've written a correctly multithreaded QuickSort in Java and it outperforms the default Java sort. I did this after watching a gigantic dataset being processed with only one core of a 16-core machine working.
One of your issues (a huge one) is that you're busy looping:
// Wait for the two other threads to finish
while(!ma.finished || !mb.finished) ;
This is a HUGE no-no: it is called busy looping and it destroys performance.
(Another issue is that your code is not spawning any new threads, as has already been pointed out to you.)
You need to use another way to synchronize: one example would be a CountDownLatch.
Another thing: there's no need to spawn two new threads when you divide the workload. Spawn only one new thread, and do the other half in the current thread.
Also, you probably don't want to create more threads than there are cores available.
See my question here (asking for a good open-source multithreaded mergesort/quicksort/whatever). The one I'm using is proprietary, so I can't paste it.
Multithreaded quicksort or mergesort
I haven't implemented mergesort, only QuickSort, and I can tell you that there's no array copying going on.
What I do is this:
- pick a pivot
- exchange values as needed
- have we reached the thread limit? (depending on the number of cores)
  - yes: sort first part in this thread
  - no: spawn a new thread
- sort second part in current thread
- wait for first part to finish if it's not done yet (using a CountDownLatch).
The code spawning a new thread and creating the CountDownLatch may look like this:
final CountDownLatch cdl = new CountDownLatch( 1 );
final Thread t = new Thread( new Runnable() {
public void run() {
quicksort(a, i+1, r );
cdl.countDown();
}
});
The advantage of using synchronization facilities like the CountDownLatch is that it is very efficient and that you're not wasting time dealing with low-level Java synchronization idiosyncrasies.
In your case, the "split" may look like this (untested, it is just to give an idea):
if ( threads.getAndIncrement() < 4 ) {
final CountDownLatch innerLatch = new CountDownLatch( 1 );
final Thread t = new Merger( innerLatch, b );
t.start();
mergeSort( a );
while ( innerLatch.getCount() > 0 ) {
try {
innerLatch.await( 1000, TimeUnit.SECONDS );
} catch ( InterruptedException e ) {
// Up to you to decide what to do here
}
}
} else {
mergeSort( a );
mergeSort( b );
}
(don't forget to "countdown" the latch when each merge is done)
Here you'd replace the thread limit (4 in this sketch) with the number of available cores. You can query it with the following call (once, say to initialize some static variable at the beginning of your program: the number of cores is unlikely to change, unless you're on a machine that allows CPU hot-swapping, as some Sun systems do):
Runtime.getRuntime().availableProcessors()

As others said, this code isn't going to work because it starts no new threads. You need to call the start() method instead of the run() method to create new threads. It also has concurrency errors: the checks on the finished variable are not thread safe.
Concurrent programming can be pretty difficult if you do not understand the basics. You might read the book Java Concurrency in Practice by Brian Goetz. It explains the basics and explains constructs (such as latches) that make concurrent programs easier to build.

The overhead cost of synchronization may be comparatively large and prevent many optimizations.
Furthermore, you are creating way too many threads:
"The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called."
You would be better off with a fixed number of threads, say 4 on a quad core. This could be realized with a thread pool, following the "bag of tasks" pattern. But perhaps it would be better yet to initially divide the work into four equally large tasks and do "single-threaded" sorting on those tasks. This would then utilize the caches a lot better.
Instead of having a "busy loop" waiting for the threads to finish (stealing CPU cycles), you should have a look at Thread.join().
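Putting the suggestions from these answers together (start() instead of run(), one new thread per split, a cap tied to the number of cores, and join() instead of a busy loop), a sketch could look like the following. The class ParallelMergeSort and the depth parameter are assumptions for illustration; it keeps the array-copying style of the question rather than sorting in place, and it has not been benchmarked.

```java
import java.util.Arrays;

class ParallelMergeSort {
    // Sorts x in place. Threads are spawned only while depth > 0.
    static void sort(int[] x, int depth) {
        if (x.length < 2) return;
        int[] a = Arrays.copyOfRange(x, 0, x.length / 2);
        int[] b = Arrays.copyOfRange(x, x.length / 2, x.length);
        if (depth > 0) {
            Thread other = new Thread(() -> sort(a, depth - 1));
            other.start();      // start(), not run(): run() would execute in this thread
            sort(b, depth - 1); // the current thread handles the second half itself
            try {
                other.join();   // blocks without burning CPU, unlike a busy loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            sort(a, 0);
            sort(b, 0);
        }
        merge(a, b, x);
    }

    // Merges the sorted arrays a and b into out.
    static void merge(int[] a, int[] b, int[] out) {
        int ia = 0, ib = 0, i = 0;
        while (ia < a.length && ib < b.length) {
            out[i++] = (a[ia] <= b[ib]) ? a[ia++] : b[ib++];
        }
        while (ia < a.length) out[i++] = a[ia++];
        while (ib < b.length) out[i++] = b[ib++];
    }

    // Picks a depth so that roughly one sort runs per available core.
    static void sort(int[] x) {
        int cores = Runtime.getRuntime().availableProcessors();
        int depth = 0;
        while ((1 << depth) < cores) depth++; // depth d allows up to 2^d concurrent sorts
        sort(x, depth);
    }
}
```

Whether this beats the single-threaded version depends on the array size; for small arrays the cost of creating threads is likely to dominate.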

How many elements does the array you are sorting have? If there are too few elements, the time spent on synchronization and CPU context switching will exceed the time you save by dividing the job for parallel execution.
