Java Parallel File Processing

I have the following code:
import java.io.*;
import java.util.concurrent.*;

public class Example {

    public static void main(String args[]) {
        try {
            // Create two sample files
            FileOutputStream fos = new FileOutputStream("1.dat");
            DataOutputStream dos = new DataOutputStream(fos);
            for (int i = 0; i < 200000; i++) {
                dos.writeInt(i);
            }
            dos.close();
            FileOutputStream fos1 = new FileOutputStream("2.dat");
            DataOutputStream dos1 = new DataOutputStream(fos1);
            for (int i = 200000; i < 400000; i++) {
                dos1.writeInt(i);
            }
            dos1.close();
            Exampless.createArray(200000); // create a shared array
            Exampless ex1 = new Exampless("1.dat");
            Exampless ex2 = new Exampless("2.dat");
            // Executed in parallel to count the number of matches in the two files
            ExecutorService executor = Executors.newFixedThreadPool(2);
            long startTime = System.nanoTime();
            long endTime;
            Future<Integer> future1 = executor.submit(ex1);
            Future<Integer> future2 = executor.submit(ex2);
            int count1 = future1.get();
            int count2 = future2.get();
            endTime = System.nanoTime();
            long duration = endTime - startTime;
            System.out.println("duration with threads:" + duration);
            executor.shutdown();
            System.out.println("Matches: " + (count1 + count2));
            startTime = System.nanoTime();
            ex1.call();
            ex2.call();
            endTime = System.nanoTime();
            duration = endTime - startTime;
            System.out.println("duration without threads:" + duration);
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
class Exampless implements Callable<Integer> {

    public static int[] arr = new int[20000];
    public String _name;

    public Exampless(String name) {
        this._name = name;
    }

    static void createArray(int z) {
        // fill the shared array
        for (int i = z; i < z + 20000; i++) {
            arr[i - z] = i;
        }
    }

    public Integer call() {
        try {
            int cnt = 0;
            FileInputStream fin = new FileInputStream(_name);
            DataInputStream din = new DataInputStream(fin);
            // read the file and count the number of matches
            for (int i = 0; i < 20000; i++) {
                int c = din.readInt();
                if (c == arr[i]) {
                    cnt++;
                }
            }
            return cnt;
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
        return -1;
    }
}
Here I am trying to count the number of matches between an array and two files. Although I am running it on two threads, the code is not performing well, because:
(single-threaded: file 1 + file 2 reading time) < (multi-threaded: file 1 || file 2 reading time).
Can anyone help me solve this? (I have a 2-core CPU and the file size is approx. 1.5 GB.)

In the first case you are reading one file sequentially, byte by byte, block by block. This is as fast as disk I/O can be, provided the file is not heavily fragmented. When you are done with the first file, the disk/OS finds the beginning of the second file and continues the very efficient, linear reading of the disk.
In the second case you are constantly switching between the first and the second file, forcing the disk to seek from one place to another. This extra seeking time (approximately 10 ms) is the root of your confusion.
Oh, and you know that disk access is single-threaded, and your task is I/O bound, so there is no way splitting it across multiple threads could help, as long as you're reading from the same physical disk? Your approach could only be justified if:
each thread, besides reading from a file, was also performing some CPU-intensive or blocking operations, slower by an order of magnitude compared to the I/O
the files are on different physical drives (a different partition is not enough) or on some RAID configurations
you are using an SSD drive

As Tomasz pointed out, you will not get any benefit from multithreading the reads from disk. You may get some improvement in speed if you multithread the checks, i.e. you load the data from the files into arrays sequentially and then the threads perform the checking in parallel. But considering the small size of your files (~80 KB) and the fact that you are just comparing ints, I doubt the performance improvement will be worth the effort.
Something that will definitely improve your execution speed is to not use readInt(). Since you know you are comparing 20000 ints, you should read all 20000 ints into an array at once for each file (or at least in blocks), rather than calling readInt() 20000 times.
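As a hedged illustration of that bulk-read idea (the countMatches method is made up for this sketch; DataOutputStream.writeInt() stores ints in big-endian order, which is also ByteBuffer's default):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Read 20000 ints with one readFully() call instead of 20000 readInt() calls.
static int countMatches(String fileName, int[] arr) throws IOException {
    byte[] bytes = new byte[arr.length * 4]; // 4 bytes per int
    try (DataInputStream din = new DataInputStream(new FileInputStream(fileName))) {
        din.readFully(bytes); // one bulk read
    }
    int[] values = new int[arr.length];
    ByteBuffer.wrap(bytes).asIntBuffer().get(values); // big-endian, like writeInt()
    int cnt = 0;
    for (int i = 0; i < arr.length; i++) {
        if (values[i] == arr[i]) {
            cnt++;
        }
    }
    return cnt;
}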

Related

Given n files with their lengths, divide the total bytes equally among N threads such that any two threads differ by at most 1 byte

You are given a list of file names and their lengths in bytes.
Example:
File1: 200, File2: 500, File3: 800
You are given a number N. We want to launch N threads to read all the files in parallel, such that each thread reads approximately the same number of bytes.
You should return N lists. Each list describes the work of one thread. For example, when N=2 there are two threads, and in the example above there is a total of 1500 bytes (200 + 500 + 800). A fair way to divide is for each thread to read 750 bytes. So you will return two lists:
List 1: File1: 0-199, File2: 0-499, File3: 0-49 (total 750 bytes)
List 2: File3: 50-799 (total 750 bytes)
Implement the following method:
List<List<FileRange>> getSplits(List<File> files, int N)

class File {
    String filename;
    long length;
}

class FileRange {
    String filename;
    Long startOffset;
    Long endOffset;
}
I tried the code below, but it's not working; any help would be highly appreciated.
List<List<FileRange>> getSplits(List<File> files, int n) {
    List<List<FileRange>> al = new ArrayList<>();
    long s = files.size();
    long sum = 0;
    for (int i = 0; i < s; i++) {
        long l = files.get(i).length;
        sum += (long) l;
    }
    long div = (long) sum / n; // no of bytes per thread
    long mod = (long) sum % n;
    long[] lo = new long[(long) n];
    for (long i = 0; i < n; i++)
        lo[i] = div;
    if (mod != 0) {
        long i = 0;
        while (mod > 0) {
            lo[i] += 1;
            mod--;
            i++;
        }
    }
    long inOffset = 0;
    for (long j = 0; j < n; j++) {
        long val = lo[i];
        for (long i = 0; i < (long) files.size(); i++) {
            String ss = files.get(i).filename;
            long ll = files.get(i).length;
            if (ll < val) {
                inOffset = 0;
                val -= ll;
            } else {
                inOffset = ll - val;
                ll = val;
            }
            al.add(new ArrayList<>(new File(ss, inOffset, ll - 1)));
        }
    }
}
I'm having trouble getting the right startOffset and endOffset for the corresponding file. I tried, but I was not able to extract the values from the List and build the required return type List<List<FileRange>>.
The essence of the problem is to simultaneously walk through two lists:
the input list, which is a list of files
the output list, which is a list of threads (where each thread has a list of ranges)
I find that the easiest approach to such problems is an infinite loop that looks something like this:
while (1)
{
    move some information from the input to the output
    decide whether to advance to the next input item
    decide whether to advance to the next output item

    if we've reached (the end of the input _OR_ the end of the output)
        break

    if we advanced to the next input item
        prepare the next input item for processing

    if we advanced to the next output item
        prepare the next output item for processing
}
To keep track of the input, we need the following information
fileIndex: the index into the list of files
fileOffset: the offset of the first unassigned byte in the file, initially 0
fileRemain: the number of bytes in the file that are unassigned, initially the file size
To keep track of the output, we need
threadIndex: the index of the thread we're currently working on (which is the first index into the List<List<FileRange>> that the algorithm produces)
threadNeeds: the number of bytes that the thread still needs, initially base or base+1
Side note: I'm using base as the minimum number of bytes assigned to each thread (sum/n), and extra as the number of threads that get one extra byte (sum%n). For the example above, base = 1500/2 = 750 and extra = 0.
So now we get to the heart of the algorithm: what information to move from input to output:
if fileRemain is less than threadNeeds then the rest of the file (which may be the entire file) gets assigned to the current thread, and we move to the next file
if fileRemain is greater than threadNeeds then a portion of the file is assigned to the current thread, and we move to the next thread
if fileRemain is equal to threadNeeds then the rest of the file is assigned to the thread, and we move to the next file, and the next thread
Those three cases are easily handled by comparing fileRemain and threadNeeds, and choosing a byteCount that is the minimum of the two.
With all that in mind, here's some pseudo-code to help get you started:
base = sum/n
extra = sum%n

// initialize the input control variables
fileIndex = 0
fileOffset = 0
fileRemain = length of file 0

// initialize the output control variables
threadIndex = 0
threadNeeds = base
if (threadIndex < extra)
    threadNeeds++

while (1)
{
    // decide how many bytes can be assigned, and generate some output
    byteCount = min(fileRemain, threadNeeds)
    add (file.name, fileOffset, fileOffset+byteCount-1) to the list of ranges

    // decide whether to advance to the next input and output items
    threadNeeds -= byteCount
    fileRemain -= byteCount
    fileOffset += byteCount    // advance past the bytes just assigned

    if (threadNeeds == 0)
        threadIndex++
    if (fileRemain == 0)
        fileIndex++

    // are we done yet?
    if (threadIndex == n || fileIndex == files.size())
        break

    // if we've moved to the next input item, reinitialize the input control variables
    if (fileRemain == 0)
    {
        fileOffset = 0
        fileRemain = length of file
    }

    // if we've moved to the next output item, reinitialize the output control variables
    if (threadNeeds == 0)
    {
        threadNeeds = base
        if (threadIndex < extra)
            threadNeeds++
    }
}
Debugging tip: Reaching the end of the input, and the end of the output, should happen simultaneously. In other words, you should run out of files at exactly the same time as you run out of threads. So during development, I would check both conditions, and verify that they do, in fact, change at the same time.
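For reference, here is a hedged Java sketch that renders the pseudo-code directly (it assumes the File and FileRange classes shown in the next answer, n >= 1, a non-empty file list, and no zero-length files):

import java.util.ArrayList;
import java.util.List;

static List<List<FileRange>> getSplits(List<File> files, int n) {
    long sum = 0;
    for (File f : files) sum += f.getLength();
    long base = sum / n;   // minimum bytes per thread
    long extra = sum % n;  // the first 'extra' threads get one extra byte

    List<List<FileRange>> result = new ArrayList<>();
    for (int i = 0; i < n; i++) result.add(new ArrayList<>());

    int fileIndex = 0;
    long fileOffset = 0;
    long fileRemain = files.get(0).getLength();
    int threadIndex = 0;
    long threadNeeds = base + (threadIndex < extra ? 1 : 0);

    while (true) {
        // assign as many bytes as both the file and the thread allow
        long byteCount = Math.min(fileRemain, threadNeeds);
        result.get(threadIndex).add(new FileRange(files.get(fileIndex).getFilename(),
                fileOffset, fileOffset + byteCount - 1));
        threadNeeds -= byteCount;
        fileRemain -= byteCount;
        fileOffset += byteCount;
        if (threadNeeds == 0) threadIndex++;
        if (fileRemain == 0) fileIndex++;
        if (threadIndex == n || fileIndex == files.size()) break;
        if (fileRemain == 0) { // moved to the next file
            fileOffset = 0;
            fileRemain = files.get(fileIndex).getLength();
        }
        if (threadNeeds == 0) { // moved to the next thread
            threadNeeds = base + (threadIndex < extra ? 1 : 0);
        }
    }
    return result;
}

A dry run with the 200/500/800 example and n = 2 reproduces the two lists shown in the question.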
Here's the code solution for your problem (in Java).
The custom classes File and FileRange are as follows:
public class File {

    String filename;
    long length;

    public File(String filename, long length) {
        this.filename = filename;
        this.length = length;
    }

    public String getFilename() {
        return filename;
    }

    public void setFilename(String filename) {
        this.filename = filename;
    }

    public long getLength() {
        return length;
    }

    public void setLength(long length) {
        this.length = length;
    }
}
public class FileRange {

    String filename;
    Long startOffset;
    Long endOffset;

    public FileRange(String filename, Long startOffset, Long endOffset) {
        this.filename = filename;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public String getFilename() {
        return filename;
    }

    public void setFilename(String filename) {
        this.filename = filename;
    }

    public Long getStartOffset() {
        return startOffset;
    }

    public void setStartOffset(Long startOffset) {
        this.startOffset = startOffset;
    }

    public Long getEndOffset() {
        return endOffset;
    }

    public void setEndOffset(Long endOffset) {
        this.endOffset = endOffset;
    }
}
The main class will be as follows:
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.atomic.AtomicInteger;

public class MainClass {

    private static List<List<FileRange>> getSplits(List<File> files, int N) {
        List<List<FileRange>> results = new ArrayList<>();
        long sum = files.stream().mapToLong(File::getLength).sum(); // total bytes in all the files
        long div = sum / N;
        long mod = sum % N;
        // How many bytes each thread gets to process
        long[] threadBytes = new long[N];
        // At least 'div' bytes will be processed by each thread
        for (int i = 0; i < N; i++)
            threadBytes[i] = div;
        // The first 'mod' threads each get one of the leftover bytes
        for (int i = 0; i < mod; i++)
            threadBytes[i] += 1;
        int count = 0;
        int len = files.size();
        long[] processedBytes = new long[len]; // bytes of each file already assigned
        int fileToBeProcessed = 0;
        while (count < N && sum > 0) {
            long temp = threadBytes[count];
            sum -= temp;
            List<FileRange> internal = new ArrayList<>();
            // Start from the file to be processed - 0 in the first iteration,
            // updated in the subsequent iterations
            for (int j = fileToBeProcessed; j < len && temp > 0; j++) {
                File f = files.get(j);
                long remaining = f.getLength() - processedBytes[j];
                if (remaining <= temp) {
                    // The rest of the file fits into this thread's budget
                    internal.add(new FileRange(f.getFilename(), processedBytes[j], f.getLength() - 1));
                    processedBytes[j] = f.getLength();
                    temp -= remaining;
                    fileToBeProcessed++;
                } else {
                    // Only part of the file fits; don't advance fileToBeProcessed
                    internal.add(new FileRange(f.getFilename(), processedBytes[j], processedBytes[j] + temp - 1));
                    processedBytes[j] += temp;
                    temp = 0;
                }
            }
            results.add(internal);
            count++;
        }
        return results;
    }

    public static void main(String args[]) {
        Scanner scn = new Scanner(System.in);
        int N = scn.nextInt();
        // Inserting demo records in the list
        File f1 = new File("File 1", 200);
        File f2 = new File("File 2", 500);
        File f3 = new File("File 3", 800);
        List<File> files = new ArrayList<>();
        files.add(f1);
        files.add(f2);
        files.add(f3);
        List<List<FileRange>> results = getSplits(files, N);
        final AtomicInteger resultCount = new AtomicInteger();
        // Displaying the results
        results.forEach(result -> {
            System.out.println("List " + resultCount.incrementAndGet() + " : ");
            result.forEach(res -> {
                System.out.print(res.getFilename() + " : ");
                System.out.print(res.getStartOffset() + " - ");
                System.out.print(res.getEndOffset() + "\n");
            });
            System.out.println("---------------");
        });
    }
}
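With N = 2 and the demo records above, the program should print output along these lines, matching the split worked out at the top of this question:

List 1 :
File 1 : 0 - 199
File 2 : 0 - 499
File 3 : 0 - 49
---------------
List 2 :
File 3 : 50 - 799
---------------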
If some part is still unclear, consider a case and dry-run the program.
Say 999 bytes have to be processed by 100 threads. Then each thread gets 9 bytes, and of the remaining 99 bytes every thread except the 100th gets 1 extra byte. That way no two threads differ by more than 1 byte. Proceed with this idea and follow the code.

Same code block takes different time durations to execute

I was trying to check the time of execution with similar blocks.
Sample code and output are below:
import java.util.ArrayList;
import java.util.List;

public class Tester {

    public static void main(String[] args) {
        System.out.println("Run 1");
        List<Integer> list = new ArrayList<>();
        int i = 0;
        long st = System.currentTimeMillis();
        while (++i < 10000) {
            list.add(i);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - st));

        System.out.println("Run 2");
        int j = 0;
        List<Integer> list2 = new ArrayList<>();
        long ST = System.currentTimeMillis();
        while (++j < 10000) {
            list2.add(j);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - ST));

        System.out.println("Run 3");
        int k = 0;
        List<Integer> list3 = new ArrayList<>();
        long ST2 = System.currentTimeMillis();
        while (++k < 10000) {
            list3.add(k);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - ST2));
    }
}
Output
Run 1
Time taken :6
Run 2
Time taken :3
Run 3
Time taken :1
Why am I getting different execution times?
This is probably due to just-in-time compilation and HotSpot optimizing the ArrayList code, but you cannot be 100% sure.
Apart from that, your sample size is much too small to be significant.
a) Since the Java code is compiled to bytecode, some optimizations are applied to your code; however, this might not have anything to do with your observations.
b) Each subsequent similar operation has a better execution time until the JVM is "warmed up" for that operation, due to JVM lazy loading or CPU caching, for example.
c) If you want to try benchmarking, check out the Java Microbenchmark Harness (JMH).
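For instance, a minimal JMH sketch of the benchmark above might look like this (assuming the jmh-core and jmh-generator-annprocess dependencies are on the classpath; class and method names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)       // let the JIT warm up before measuring
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Thread)
public class ListAddBenchmark {

    @Benchmark
    public List<Integer> addTenThousand() {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i < 10000; i++) {
            list.add(i);
        }
        return list; // returning the result prevents dead-code elimination
    }
}

JMH takes care of warmup iterations, forking, and dead-code elimination, which are exactly the effects distorting the hand-rolled measurement above.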

Looping StringBuilder memory leak

For a project in my data structures class, we have to traverse a graph using both breadth first and depth first traversals. Depth first traversals, in this case, are commonly between 1 and 100,000 nodes long.
The content of each node is a string. Traversing through the graph and finding the solution haven't been a large issue. However, once the traversal is completed, it must be displayed.
To do this, I went back through each node's parent and added the node's string to a Stack. Still, at this point the code runs fine (to my knowledge). The massive slowdown comes when attempting to display these strings.
Given a stack of several tens of thousands of strings, how do I display them? Straight calling System.out.println() takes way too long. Yet using a StringBuilder iteratively somehow just eats up the memory. I've tried to clear out the StringBuilder at a rather arbitrary interval (in a rather shoddy manner), and this seemed to slightly help. Also introducing a small sleep seemed to help as well. Either way, by the time the function is finished printing, either Java has run out of memory, or is incredibly slow and unresponsive.
Is there some way I can restructure this code to output all of the strings in a relatively timely manner (~10 seconds shouldn't be asking too much), without crashing the VM?
if (DFSResult.hash().equals(myPuzzle.answerHash())) {
    System.out.println("Success");
    int depth = 0;
    while (DFSResult.parent != null) {
        depth++;
        if (in.equals("y")) dfs.push(DFSResult.hash());
        DFSResult = DFSResult.parent;
    }
    System.out.println("found at a depth of " + depth + "\n\n");
    StringBuilder s = new StringBuilder();
    int x = 0;
    boolean isSure = false;
    if (in.equals("y")) {
        System.out.println("Are you really sure you want to print out a lineage this long? (y/n)");
        try {
            if (input.readLine().toLowerCase().equals("y")) isSure = true;
        } catch (Exception e) {
        }
    }
    s = new StringBuilder();
    while (!dfs.empty() && isSure) {
        //System.out.println(dfs.pop());
        s.append(dfs.pop()).append("\n");
        x++;
        if (x % 500 == 0) { // flush the StringBuilder
            try { Thread.sleep(50); } catch (Exception e) { e.printStackTrace(); }
            System.out.println(s.toString());
            s.delete(0, s.length()); // supposed 25% efficiency increase over s = new StringBuilder()
        }
    }
    System.out.println(s.toString());
}
You don't need to use StringBuilder in your example.
Instead just do:
System.out.println(dfs.pop());
inside your loop. No sleeps needed either. Just one line inside the loop.
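With that change, the whole display section reduces to something like this (keeping the isSure guard from the question):

while (!dfs.empty() && isSure) {
    System.out.println(dfs.pop());
}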
There's no problem with System.out.println() efficiency. Please consider the following code:
public class TestMain {

    public static void main(String[] args) {
        long timeBefore = System.currentTimeMillis();
        for (int i = 0; i < 50000; i++) {
            System.out.println("Value = " + i);
        }
        long timeAfter = System.currentTimeMillis();
        System.out.println("Time elapsed (ms): " + (timeAfter - timeBefore));
    }
}
These are the last lines of output on my machine:
. . .
Value = 49994
Value = 49995
Value = 49996
Value = 49997
Value = 49998
Value = 49999
Time elapsed (ms): 538
As you can see it's super fast. The problem is not in the println() method.
Example of stack usage below:
public static void main(String[] args) {
    long timeBefore1 = System.currentTimeMillis();
    Stack<String> s = new Stack<String>();
    for (int i = 0; i < 50000; i++) {
        s.push("Value = " + i);
    }
    long timeAfter1 = System.currentTimeMillis();

    long timeBefore2 = System.currentTimeMillis();
    while (!s.isEmpty()) {
        System.out.println(s.pop());
    }
    long timeAfter2 = System.currentTimeMillis();

    System.out.println("Time spent on building stack (ms): " + (timeAfter1 - timeBefore1));
    System.out.println("Time spent on reading stack (ms): " + (timeAfter2 - timeBefore2));
}
Output:
. . .
Value = 2
Value = 1
Value = 0
Time spent on building stack (ms): 31
Time spent on reading stack (ms): 551

Multi-threaded object creation slower than in a single thread

I have what probably is a basic question. When I create 100 million Hashtables it takes approximately 6 seconds (runtime = 6 seconds per core) on my machine if I do it on a single core. If I do this multi-threaded on 12 cores (my machine has 6 cores that allow hyperthreading) it takes around 10 seconds (runtime = 112 seconds per core).
This is the code I use:
Main
public class Tests
{
    public static void main(String args[])
    {
        double start = System.currentTimeMillis();
        int nThreads = 12;
        double[] runTime = new double[nThreads];
        TestsThread[] threads = new TestsThread[nThreads];
        int totalJob = 100000000;
        int jobsize = totalJob / nThreads;
        for (int i = 0; i < threads.length; i++)
        {
            threads[i] = new TestsThread(jobsize, runTime, i);
            threads[i].start();
        }
        waitThreads(threads);
        for (int i = 0; i < runTime.length; i++)
        {
            System.out.println("Runtime thread:" + i + " = " + (runTime[i] / 1000000) + "ms");
        }
        double end = System.currentTimeMillis();
        System.out.println("Total runtime = " + (end - start) + " ms");
    }

    private static void waitThreads(TestsThread[] threads)
    {
        for (int i = 0; i < threads.length; i++)
        {
            while (threads[i].finished == false) // keep waiting until the thread is done
            {
                //System.out.println("waiting on thread:" + i);
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Thread
import java.util.HashMap;
import java.util.Map;

public class TestsThread extends Thread
{
    int jobSize = 0;
    double[] runTime;
    boolean finished;
    int threadNumber;

    TestsThread(int job, double[] runTime, int threadNumber)
    {
        this.finished = false;
        this.jobSize = job;
        this.runTime = runTime;
        this.threadNumber = threadNumber;
    }

    public void run()
    {
        double start = System.nanoTime();
        for (int l = 0; l < jobSize; l++)
        {
            double[] test = new double[65];
        }
        double end = System.nanoTime();
        double difference = end - start;
        runTime[threadNumber] += difference;
        this.finished = true;
    }
}
I do not understand why creating the objects simultaneously in multiple threads takes longer per thread than doing it serially in only 1 thread. If I remove the line where I create the Hashtable, this problem disappears. If anyone could help me with this I would be very grateful.
Update: This problem has an associated bug report and has been fixed with Java 1.7u40. And it was never an issue for Java 1.8 as Java 8 has an entirely different hash table algorithm.
Since you are not using the created objects, that operation will get optimized away. So you're only measuring the overhead of creating threads. Naturally, the more threads you start, the more overhead you get.
I have to correct my answer regarding a detail I didn't know yet: there is something special about the classes Hashtable and HashMap. They both invoke sun.misc.Hashing.randomHashSeed(this) in the constructor. In other words, their instances escape during construction, which has an impact on memory visibility. This implies that their construction, unlike, say, an ArrayList's, cannot be optimized away, and multi-threaded construction slows down due to what happens inside that method (i.e. synchronization).
As said, that's special to these classes and of course to this implementation (my setup: 1.7.0_13). For ordinary classes, the construction time goes straight to zero for such code.
Here I add a more sophisticated benchmark. Watch the difference between DO_HASH_MAP = true and DO_HASH_MAP = false (when false it creates an ArrayList instead, which has no such special behavior).
import java.util.*;
import java.util.concurrent.*;

public class AllocBench {

    static final int NUM_THREADS = 1;
    static final int NUM_OBJECTS = 100000000 / NUM_THREADS;
    static final boolean DO_HASH_MAP = true;

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);
        Callable<Long> task = new Callable<Long>() {
            public Long call() {
                return doAllocation(NUM_OBJECTS);
            }
        };
        long startTime = System.nanoTime(), cpuTime = 0;
        for (Future<Long> f : threadPool.invokeAll(Collections.nCopies(NUM_THREADS, task))) {
            cpuTime += f.get();
        }
        long time = System.nanoTime() - startTime;
        System.out.println("Number of threads: " + NUM_THREADS);
        System.out.printf("entire allocation required %.03f s%n", time * 1e-9);
        System.out.printf("time x numThreads %.03f s%n", time * 1e-9 * NUM_THREADS);
        System.out.printf("real accumulated cpu time %.03f s%n", cpuTime * 1e-9);
        threadPool.shutdown();
    }

    static long doAllocation(int numObjects) {
        long t0 = System.nanoTime();
        for (int i = 0; i < numObjects; i++)
            if (DO_HASH_MAP) new HashMap<Object, Object>(); else new ArrayList<Object>();
        return System.nanoTime() - t0;
    }
}
What about doing it on 6 cores? Hyper-threading isn't the same as having double the cores, so you might want to try the number of real cores too.
Also, the OS won't necessarily schedule each of your threads onto its own core.
Since all you are doing is measuring the time and churning memory, your bottleneck is likely to be in your L3 cache or the bus to main memory. In that case, coordinating the work between threads can produce so much overhead that it makes things worse instead of better.
This is too long for a comment, but your inner loop can be just:

long start = System.nanoTime();
for (int l = 0; l < jobSize; l++) {
    Map<String, Integer> test = new HashMap<String, Integer>();
}
// runTime is an AtomicLong for thread safety
runTime.addAndGet(System.nanoTime() - start); // time in nanoseconds

Taking the time can be as slow as creating a HashMap, so you might not be measuring what you think you are if you call the timer too often.
By the way, Hashtable is synchronized, so you might find that HashMap is faster and possibly more scalable.

Java lock/concurrency issue when searching array with multiple threads

I am new to Java and trying to write a method that finds the maximum value in a 2D array of longs.
The method searches each row in a separate thread, and the threads maintain a shared current maximum value. Whenever a thread finds a value larger than its own local maximum, it compares this value with the shared maximum and updates its local maximum and possibly the shared maximum as appropriate. I need to make sure that appropriate synchronization is implemented so that the result is correct regardless of how the computations interleave.
My code is verbose and messy, but for starters, I have this function:
static long sharedMaxOf2DArray(long[][] arr, int r) {
    MyRunnableShared[] myRunnables = new MyRunnableShared[r];
    for (int row = 0; row < r; row++) {
        MyRunnableShared rr = new MyRunnableShared(arr, row, r);
        Thread t = new Thread(rr);
        t.start();
        myRunnables[row] = rr;
    }
    return myRunnables[0].sharedMax; // should be the same as any other one (?)
}
For the adapted runnable, I have this:
public static class MyRunnableShared implements Runnable {

    long[][] theArray;
    private int row;
    private long rowMax;
    public long localMax;
    public long sharedMax;
    private static Lock sharedMaxLock = new ReentrantLock();

    MyRunnableShared(long[][] a, int r, int rm) {
        theArray = a;
        row = r;
        rowMax = rm;
    }

    public void run() {
        localMax = 0;
        for (int i = 0; i < rowMax; i++) {
            if (theArray[row][i] > localMax) {
                localMax = theArray[row][i];
                sharedMaxLock.lock();
                try {
                    if (localMax > sharedMax)
                        sharedMax = localMax;
                } finally {
                    sharedMaxLock.unlock();
                }
            }
        }
    }
}
I thought this use of a lock would be a safe way to prevent multiple threads from modifying sharedMax at the same time, but upon testing/comparing with a non-concurrent maximum-finding function on the same input, I found the results to be incorrect. I'm thinking the problem might come from the fact that I just say
...
t.start();
myRunnables[row] = rr;
...
in the sharedMaxOf2DArray function. Perhaps a given thread needs to finish before I put it in the array of myRunnables; otherwise, I will have "captured" the wrong sharedMax? Or is it something else? I'm not sure about the timing of things.
I'm not sure if this is a typo or not, but your Runnable implementation declares sharedMax as an instance variable:
public long sharedMax;
rather than a shared one:
public static long sharedMax;
In the former case, each Runnable gets its own copy and will not "see" the values of others. Changing it to the latter should help. Or, change it to:
public long[] sharedMax; // array of size 1 shared across all threads
and you can now create an array of size one outside the loop and pass it in to each Runnable to use as shared storage.
As an aside: please note that there will be tremendous lock contention since every thread checks the common sharedMax value by holding a lock for every iteration of its loop. This will likely lead to poor performance. You'd have to measure, but I'd surmise that letting each thread find the row maximum and then running a final pass to find the "max of maxes" might actually be comparable or quicker.
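For illustration, here is a hedged sketch of the size-1 array option; the long[] constructor parameter is an addition for this sketch (not part of the original code), and locking once per row rather than once per element also addresses the contention concern (uses the same Lock/ReentrantLock imports as the question):

public static class MyRunnableShared implements Runnable {

    private final long[][] theArray;
    private final int row;
    private final long[] sharedMax; // one shared slot, passed to every Runnable
    private static final Lock sharedMaxLock = new ReentrantLock();

    MyRunnableShared(long[][] a, int r, long[] sharedMax) {
        this.theArray = a;
        this.row = r;
        this.sharedMax = sharedMax;
    }

    public void run() {
        long localMax = Long.MIN_VALUE;
        for (long v : theArray[row]) {
            localMax = Math.max(localMax, v);
        }
        sharedMaxLock.lock(); // lock once per row, not once per element
        try {
            if (localMax > sharedMax[0]) {
                sharedMax[0] = localMax;
            }
        } finally {
            sharedMaxLock.unlock();
        }
    }
}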
From JavaDocs:
public interface Callable<V>
A task that returns a result and may throw an exception. Implementors define a single method with no arguments called call.
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
Well, you can use a Callable to compute the result for one 1D array and wait for completion with an ExecutorService. You can then compare the results of the Callables to fetch the maximum. The code may look like this:
Random random = new Random(System.nanoTime());
long[][] myArray = new long[5][5];
for (int i = 0; i < 5; i++) {
    myArray[i] = new long[5];
    for (int j = 0; j < 5; j++) {
        myArray[i][j] = random.nextLong();
    }
}

ExecutorService executor = Executors.newFixedThreadPool(myArray.length);
List<Future<Long>> myResults = new ArrayList<>();

// create a Callable for each 1D array in the 2D array
for (int i = 0; i < myArray.length; i++) {
    Callable<Long> callable = new SearchCallable(myArray[i]);
    Future<Long> callResult = executor.submit(callable);
    myResults.add(callResult);
}

// This will make the executor accept no new tasks
// and finish all existing tasks in the queue
executor.shutdown();

// Wait until all threads are finished
while (!executor.isTerminated()) {
}

// now compare the results and fetch the biggest one
long max = Long.MIN_VALUE; // nextLong() can return negative values, so don't start at 0
for (Future<Long> future : myResults) {
    try {
        max = Math.max(max, future.get());
    } catch (InterruptedException | ExecutionException e) {
        // something bad happened...!
        e.printStackTrace();
    }
}
System.out.println("The result is " + max);
And your Callable:
public class SearchCallable implements Callable<Long> {

    private final long[] mArray;

    public SearchCallable(final long[] pArray) {
        mArray = pArray;
    }

    @Override
    public Long call() throws Exception {
        long max = Long.MIN_VALUE; // again, the values may be negative
        for (int i = 0; i < mArray.length; i++) {
            max = Math.max(max, mArray[i]);
        }
        System.out.println("I've got the maximum " + max + ", and you guys?");
        return max;
    }
}
Your code has serious lock contention and thread safety issues. Even worse, it doesn't actually wait for any of the threads to finish before the return myRunnables[0].sharedMax which is a really bad race condition. Also, using explicit locking via ReentrantLock or even synchronized blocks is usually the wrong way of doing things unless you're implementing something low level (eg your own/new concurrent data structure)
Here's a version that uses the Future concurrent primitive and an ExecutorService to handle the thread creation. The general idea is:
Submit a number of concurrent jobs to your ExecutorService
Add the Future returned by submit(...) to a List
Loop through the list calling get() on each Future and aggregating the result
This version has the added benefit that there is no lock contention (or locking in general) between the worker threads as each just returns back the max for its slice of the array.
import java.util.concurrent.*;
import java.util.*;

public class PMax {

    public static long pmax(final long[][] arr, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> list = new ArrayList<Future<Long>>();
            for (int i = 0; i < arr.length; i++) {
                // put the sub-array in a final so the inner class can see it:
                final long[] subArr = arr[i];
                list.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long max = Long.MIN_VALUE;
                        for (int j = 0; j < subArr.length; j++) {
                            if (subArr[j] > max) {
                                max = subArr[j];
                            }
                        }
                        return max;
                    }
                }));
            }
            // find the max of each slice's max:
            long max = Long.MIN_VALUE;
            for (Future<Long> future : list) {
                long threadMax = future.get();
                System.out.println("threadMax: " + threadMax);
                if (threadMax > max) {
                    max = threadMax;
                }
            }
            return max;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String args[]) {
        int x = 1000;
        int y = 1000;
        long max = Long.MIN_VALUE;
        long[][] foo = new long[x][y];
        for (int i = 0; i < x; i++) {
            for (int j = 0; j < y; j++) {
                long r = (long) (Math.random() * 100000000);
                if (r > max) {
                    // save this to compare against pmax:
                    max = r;
                }
                foo[i][j] = r;
            }
        }
        int numThreads = 32;
        long pmax = pmax(foo, numThreads);
        System.out.println("max: " + max);
        System.out.println("pmax: " + pmax);
    }
}
Bonus: If you're calling this method repeatedly then it would probably make sense to pull the ExecutorService creation out of the method and have it be reused across calls.
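For example, a minimal sketch of that reuse (the field name is illustrative; the pool would then be shut down once at application exit instead of inside pmax):

// Created once, reused by every pmax(...) call.
private static final ExecutorService SHARED_POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());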
Well, that definitely is an issue - but without more code it is hard to tell whether it is the only one.
There is basically a race condition between the read of sharedMax for thread[0] and the modifications of sharedMax in the other threads.
Think about what happens if the scheduler decides not to let any of the new threads run for now: when you are done creating the threads, you will return the answer without it having been modified even once! (Of course there are other possible scenarios...)
You can overcome this by join()ing all threads before returning an answer.
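A minimal sketch of that fix (assuming sharedMax has also been made visible across threads, e.g. via the static field or shared array suggested in the earlier answer):

static long sharedMaxOf2DArray(long[][] arr, int r) throws InterruptedException {
    MyRunnableShared[] myRunnables = new MyRunnableShared[r];
    Thread[] threads = new Thread[r];
    for (int row = 0; row < r; row++) {
        myRunnables[row] = new MyRunnableShared(arr, row, r);
        threads[row] = new Thread(myRunnables[row]);
        threads[row].start();
    }
    for (Thread t : threads) {
        t.join(); // blocks until that worker has finished
    }
    return myRunnables[0].sharedMax; // safe now: all writers are done
}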
