I need to split an array into 5 parts (the last one probably won't be the same size as the others) and feed them to threads for processing in parallel.
My attempt is below:
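int skip = Math.max(1, (arr.length + 4) / 5); // ceiling division: at most 5 chunks, the last may be shorter
for (int i = 0; i < arr.length; i += skip) {
    int len = Math.min(skip, arr.length - i); // the final chunk may be shorter
    int[] sub = new int[len];
    System.arraycopy(arr, i, sub, 0, len); // copy from offset i, not from 0
    // Worker is your own Runnable class (not shown); barrier is assumed to be a shared CyclicBarrier
    Thread t = new Thread(new Worker(barrier, sub));
    t.start();
}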
Any suggestions for making it more functional and avoiding local arrays are welcome.
In terms of organizing your threads well, you should look into ExecutorService (there is also a simple YouTube video on ExecutorService).
As for organizing the arrays, you can either split them non-locally or create a Queue for them; for choosing one, see the question "Java Queue implementations, which one?".
These threads may work at different speeds, so a Queue is recommended; please look into them.
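As a rough illustration of both suggestions (an ExecutorService, and avoiding copied sub-arrays), here is a minimal sketch that hands each task an index range [from, to) of the shared array instead of a local copy. Summing is only a placeholder for the real per-chunk work, and the names ChunkedSum and parallelSum are hypothetical. It submits the tasks with CompletableFuture.supplyAsync on the pool so the results can be collected with join(); that is one option among several.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedSum {

    // Splits arr into at most `parts` index ranges and sums each range on a pool thread.
    // Summing is only a placeholder for whatever per-chunk processing is needed.
    public static long parallelSum(int[] arr, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            // Ceiling division: at most `parts` chunks, the last one may be shorter.
            int chunk = Math.max(1, (arr.length + parts - 1) / parts);
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (int from = 0; from < arr.length; from += chunk) {
                final int start = from;
                final int end = Math.min(from + chunk, arr.length);
                // No sub-array is copied: the task reads the shared array in [start, end).
                futures.add(CompletableFuture.supplyAsync(() -> {
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += arr[i];
                    }
                    return sum;
                }, pool));
            }
            long total = 0;
            for (CompletableFuture<Long> f : futures) {
                total += f.join(); // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] arr = new int[103];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i + 1;
        }
        System.out.println(parallelSum(arr, 5)); // prints 5356
    }
}
```

Because each task only reads its own range, no locking is needed; if tasks write results back into the array, they must not touch overlapping ranges.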
I am writing a command-line application in Java 8. There's a part that involves some computation, and I believe it could benefit from running in parallel using multiple threads. However, I don't have much experience writing multi-threaded applications, so I hope you can steer me in the right direction on how to design the parallel part of my code.
For simplicity, let's pretend the method in question receives a relatively big array of longs, and it should return a Set containing only prime numbers:
public final static boolean checkIfNumberIsPrime(long number) {
// algorithm implementation, not important here
// ...
}
// a single-threaded version
public Set<Long> extractPrimeNumbers(long[] inputArray) {
Set<Long> result = new HashSet<>();
for (long number : inputArray) {
if (checkIfNumberIsPrime(number)) {
result.add(number);
}
}
return result;
}
Now, I would like to refactor the method extractPrimeNumbers() so that it is executed by four threads in parallel and, when all of them have finished, returns the result. Off the top of my head, I have the following questions:
Which approach would be more suitable for the task: ExecutorService or Fork/Join? (each element of inputArray[] is completely independent and they can be processed in any order whatsoever)
Assuming there are 1 million elements in inputArray[], should I "ask" thread #1 to process all indexes 0..249999, thread #2 - 250000..499999, thread #3 - 500000..749999 and thread #4 - 750000..999999? Or should I rather treat each element of inputArray[] as a separate task to be queued and then executed by an applicable worker thread?
If a prime number is detected, it should be added to Set result, so the set needs to be thread-safe (synchronized). Perhaps it would be better if each thread maintained its own local result set and, only when it is finished, transferred its contents to the global result in one go?
Is Spliterator of any use here? Should one be used to partition inputArray[] somehow?
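For reference, here is a minimal sketch of the approach described in the questions above: four tasks on a fixed pool, each scanning one contiguous index range into its own local HashSet, with the results merged once at the end. The trial-division checkIfNumberIsPrime is a hypothetical stand-in for the real algorithm, and PrimeExtractor is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrimeExtractor {

    // Simple trial division; a placeholder for the real primality test.
    static boolean checkIfNumberIsPrime(long number) {
        if (number < 2) return false;
        for (long d = 2; d * d <= number; d++) {
            if (number % d == 0) return false;
        }
        return true;
    }

    // Each task scans one contiguous index range into its own local HashSet;
    // the calling thread merges the local sets after every task is done.
    static Set<Long> extractPrimeNumbers(long[] input, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (input.length + threads - 1) / threads);
            List<CompletableFuture<Set<Long>>> parts = new ArrayList<>();
            for (int from = 0; from < input.length; from += chunk) {
                final int start = from;
                final int end = Math.min(from + chunk, input.length);
                parts.add(CompletableFuture.supplyAsync(() -> {
                    Set<Long> local = new HashSet<>(); // confined to this task, so no locking
                    for (int i = start; i < end; i++) {
                        if (checkIfNumberIsPrime(input[i])) {
                            local.add(input[i]);
                        }
                    }
                    return local;
                }, pool));
            }
            Set<Long> result = new HashSet<>();
            for (CompletableFuture<Set<Long>> part : parts) {
                result.addAll(part.join()); // merged by the calling thread only
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3, 4, 5, 9, 11, 15, 17, 20};
        // prints the five primes 2, 3, 5, 11 and 17, in unspecified order
        System.out.println(extractPrimeNumbers(input, 4));
    }
}
```

Since each local set is used by only one task and the merge happens on the calling thread after join(), no synchronized set is needed.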
Parallel stream
Use none of these. Parallel streams are going to be enough to deal with this problem much more straightforwardly than any of the alternatives you list.
return Arrays.stream(inputArray) // java.util.Arrays has no parallelStream(long[]); stream() gives a LongStream
        .parallel()
        .filter(n -> checkIfNumberIsPrime(n))
        .boxed()
        .collect(Collectors.toSet());
For more info, see The Java™ Tutorials > Aggregate Operations > Parallelism.
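A self-contained version of the parallel-stream approach might look like this. The trial-division checkIfNumberIsPrime is only a placeholder (assumed, not from the question), and ParallelPrimes is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class ParallelPrimes {

    // Trial division, used only as a placeholder for the real primality test.
    static boolean checkIfNumberIsPrime(long number) {
        if (number < 2) return false;
        for (long d = 2; d * d <= number; d++) {
            if (number % d == 0) return false;
        }
        return true;
    }

    static Set<Long> extractPrimeNumbers(long[] inputArray) {
        return Arrays.stream(inputArray)       // LongStream over the array
                .parallel()                    // work is split across the common ForkJoinPool
                .filter(ParallelPrimes::checkIfNumberIsPrime)
                .boxed()                       // LongStream -> Stream<Long>
                .collect(Collectors.toSet());  // partial sets are built per subtask, then combined
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3, 4, 5, 9, 11, 15, 17, 20};
        System.out.println(extractPrimeNumbers(input).size()); // prints 5
    }
}
```

Collectors.toSet() is not a concurrent collector, so in a parallel stream each subtask fills its own set and the sets are combined afterwards; that is essentially the "local result set per thread" idea from the question, handled by the library.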
I have a method
public boolean contains(int valueToFind, List<Integer> list) {
//
}
How can I split the list into x chunks and have a new thread search each chunk for the value? If one thread finds it, I would like to stop the other threads from searching.
I see there are lots of examples of simply splitting work between threads, but how do I structure it so that once one thread returns true, all threads stop and true is returned as the answer?
I do not want to use parallel streams, for this reason (from the source):
If you do, please look at the previous example again. There is a big error. Do you see it? The problem is that all parallel streams use common fork-join thread pool, and if you submit a long-running task, you effectively block all threads in the pool. Consequently, you block all other tasks that are using parallel streams. Imagine a servlet environment, when one request calls getStockInfo() and another one countPrimes(). One will block the other one even though each of them requires different resources. What's worse, you can not specify thread pool for parallel streams; the whole class loader has to use the same one.
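If parallel streams are ruled out for that reason, one option is a dedicated pool plus a shared flag. The sketch below is a minimal version under stated assumptions: ChunkedSearch and the three-argument contains are hypothetical, the list is not modified during the search, and an AtomicBoolean serves as the "found" signal that every task checks before each element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class ChunkedSearch {

    // Searches `list` with up to `chunks` tasks on a dedicated pool (not the common ForkJoinPool).
    // A shared AtomicBoolean lets every task stop as soon as any task finds the value.
    static boolean contains(int valueToFind, List<Integer> list, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        AtomicBoolean found = new AtomicBoolean(false);
        try {
            int size = list.size();
            int chunk = Math.max(1, (size + chunks - 1) / chunks);
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            for (int from = 0; from < size; from += chunk) {
                final int start = from;
                final int end = Math.min(from + chunk, size);
                tasks.add(CompletableFuture.runAsync(() -> {
                    for (Integer x : list.subList(start, end)) {
                        if (found.get()) {
                            return; // another task already found the value
                        }
                        if (x == valueToFind) {
                            found.set(true); // tells the other tasks to stop
                            return;
                        }
                    }
                }, pool));
            }
            // Wait for all tasks; after a hit, the others exit at their next flag check.
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
            return found.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        System.out.println(contains(742, list, 4)); // prints true
        System.out.println(contains(-1, list, 4));  // prints false
    }
}
```

Because the pool belongs to this call, a slow search cannot block unrelated parallel streams in the common ForkJoinPool; the trade-off is creating a pool per call, so a long-lived pool passed in from outside may be preferable in a server.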
You could use the built-in Stream API:
//For a List
public boolean contains(int valueToFind, List<Integer> list) {
return list.parallelStream().anyMatch(Integer.valueOf(valueToFind)::equals);
}
//For an array
public boolean contains(int valueToFind, int[] arr){
return Arrays.stream(arr).parallel().anyMatch(x -> x == valueToFind);
}
Executing Streams in Parallel:
You can execute streams in serial or in parallel. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
When you create a stream, it is always a serial stream unless otherwise specified. To create a parallel stream, invoke the operation Collection.parallelStream.
I have a simple multithreading problem (in Java). I have 2 sets of 4 very large arrays and 4 threads, 1 for each array in the set. I want the threads to check, in parallel, whether the corresponding arrays in the two sets have identical values. If any value in one of the arrays does not match the value at the same index in the other set's array, then the two sets are not identical, and all threads should stop what they are doing and move on to the next 2 sets of 4 very large arrays. This process continues until all the pairs of array sets have been compared and deemed equal or not equal. I want all the threads to stop when one of them finds a mismatch. What is the correct way to implement this?
Here's one simple solution, but I don't know if it's the most efficient: Simply declare an object with a public boolean field.
public class TerminationEvent {
public volatile boolean terminated = false; // volatile so every thread sees the update
}
Before starting the threads, create a new TerminationEvent object. Use this object as a parameter when you construct the thread objects, e.g.
public class MyThread implements Runnable {
private TerminationEvent terminationEvent;
public MyThread(TerminationEvent event) {
terminationEvent = event;
}
}
The same object will be passed to every MyThread, so they will all see the same boolean.
Now, the run() method in each MyThread will have something like
if (terminationEvent.terminated) {
break;
}
in the loop, and will set terminationEvent.terminated = true; when the other threads need to stop.
(Normally I wouldn't use public fields like terminated, but you said you wanted efficiency. I think this is a bit more efficient than a getter method, but I haven't tried benchmarking anything. Note that the field should be declared volatile: without it, the Java memory model does not guarantee that the other threads ever see the write to terminated, so a worker could keep looping.)
Stopping other threads is usually done through interrupts. Thread.stop() is deprecated because it is unsafe: it unlocks all monitors held by the thread, possibly letting other threads see objects in an inconsistent state (Ref: http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html). An interrupt does not actually stop a thread; it only sets the thread's interrupted flag, which the thread is expected to check itself.
The thread should check the interrupted flag periodically (not necessarily on every iteration) before performing computations:
if (Thread.interrupted()) {
throw new InterruptedException();
}
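To see the interrupt mechanism end to end, here is a minimal sketch (InterruptDemo, longTask and cancelAndWait are hypothetical names): Future.cancel(true) interrupts the worker thread, and the task notices because it polls Thread.interrupted() every 1024 iterations and then throws InterruptedException, as in the snippet above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class InterruptDemo {

    // A never-ending task that polls the interrupted flag every 1024 iterations
    // and gives up by throwing InterruptedException.
    static Callable<Long> longTask() {
        return () -> {
            long i = 0;
            while (true) {
                if ((i & 1023) == 0 && Thread.interrupted()) {
                    throw new InterruptedException();
                }
                i++;
            }
        };
    }

    // Starts the task, cancels it with an interrupt, and reports whether the pool shut down.
    static boolean cancelAndWait() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> f = pool.submit(longTask());
            Thread.sleep(50);   // give the task time to start looping
            f.cancel(true);     // interrupts the worker thread
            pool.shutdown();
            // true only if the task saw the interrupt and returned, freeing the thread
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelAndWait()); // prints true
    }
}
```

For the question's scenario, ExecutorService.shutdownNow() typically interrupts all running workers in the same way once one of them reports a mismatch.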
Use a volatile variable to set the abort condition. In your check loop that is run by all threads, let those threads check a number N of values uninterrupted so they don't have to fetch the volatile too often, which may be costly compared to the value match test. Benchmark your solution to find the optimum for N on your target hardware.
Another way would be to use a ForkJoin approach where your result is true if a mismatch was found. Divide your array slices down to a minimum size similar to N.
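Here is a minimal sketch of the volatile-flag approach for the question's 2 sets of 4 arrays, assuming one task per array pair on a 4-thread pool. SetComparator, sameSets and the batch size N = 4096 are hypothetical choices; as suggested above, N should be tuned by benchmarking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SetComparator {

    // Number of elements compared between reads of the volatile flag (tune by benchmarking).
    static final int N = 4096;

    private volatile boolean mismatch; // written by any worker, read by all

    // Compares setA[k] with setB[k] for every k, one task per array pair.
    // Returns true if every pair of arrays holds identical values.
    boolean sameSets(int[][] setA, int[][] setB, ExecutorService pool) {
        if (setA.length != setB.length) return false;
        mismatch = false; // reset before starting the tasks for this pair of sets
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (int k = 0; k < setA.length; k++) {
            final int[] a = setA[k];
            final int[] b = setB[k];
            tasks.add(CompletableFuture.runAsync(() -> compare(a, b), pool));
        }
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        return !mismatch;
    }

    private void compare(int[] a, int[] b) {
        if (a.length != b.length) {
            mismatch = true;
            return;
        }
        for (int from = 0; from < a.length; from += N) {
            if (mismatch) return;                 // another worker already found a difference
            int to = Math.min(from + N, a.length);
            for (int i = from; i < to; i++) {     // N plain comparisons, no volatile read
                if (a[i] != b[i]) {
                    mismatch = true;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            int[][] x = new int[4][100_000];
            int[][] y = new int[4][100_000];
            SetComparator cmp = new SetComparator();
            System.out.println(cmp.sameSets(x, y, pool)); // prints true
            y[2][77_777] = 1;
            System.out.println(cmp.sameSets(x, y, pool)); // prints false
        } finally {
            pool.shutdown();
        }
    }
}
```

The flag is reset at the start of each sameSets call, so the same object and pool can be reused for every pair of sets, as long as sameSets is not called concurrently on the same object.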
I'm reading the book "Java SE 8 for the Really Impatient", and in the first chapter I came across the following exercise:
Is the comparator code in the Arrays.sort method called in the same thread as the call to sort or a different thread?
I've searched the javadoc for the Arrays.sort overloading which takes a Comparator argument but it doesn't specify anything about threads.
I assume that for performance reasons that code could be executed in another thread, but it is just a guess.
You can always test it by logging the id of Thread.currentThread().
Add something like this just before calling sort() and in your compare() method:
logger.debug("Thread # " + Thread.currentThread().getId());
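Instead of logging, a small self-contained check can record every thread that runs the comparator. This is a minimal sketch; SortThreadCheck and comparatorThreads are hypothetical names, and the set is synchronized only so the same code stays safe if the sort does use several threads.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SortThreadCheck {

    // Sorts a copy of `words` with Arrays.sort and records every thread that runs the comparator.
    static Set<Thread> comparatorThreads(String[] words) {
        Set<Thread> seen = Collections.synchronizedSet(new HashSet<>());
        Arrays.sort(words.clone(), (first, second) -> {
            seen.add(Thread.currentThread());
            return first.compareTo(second);
        });
        return seen;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana"};
        Set<Thread> seen = comparatorThreads(words);
        // Arrays.sort runs the comparator on the calling thread only
        System.out.println(seen.equals(Collections.singleton(Thread.currentThread()))); // prints true
    }
}
```

Swapping Arrays.sort for Arrays.parallelSort on a large enough array would typically add common-pool worker threads to the set on a multi-core machine.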
Imagine you have code to get the largest element in an array:
int[] array = new int[] {.........};
/// Few/many lines of code between...
Arrays.sort(array);
int largest = array[array.length - 1];
If sorting were done in another thread, you'd have a race condition: would the array be sorted first, or would largest be assigned first? You could avoid that problem by locking array, but what happens if the code you're running in already holds that lock? You can block the original thread using join(), but then you've pretty much defeated the purpose of spawning another thread, as your code would behave exactly as if no additional thread had been spawned.
For Arrays#sort(), the sorting takes place in the original thread, as there really isn't much point in spawning another thread. Your thread will block until the sorting is complete, just like for any other piece of code.
The closest thing to sorting in another thread is the Arrays#parallelSort() method introduced in Java 8. It still behaves much like the regular Arrays#sort(), in that it blocks your current thread until the sorting is done, but for large enough arrays the work is split into subtasks that run on the threads of the common ForkJoinPool. For larger datasets, I'd expect a speedup of up to about the number of threads used, minus whatever threading overhead there might be.
In the first test, the comparator runs in a single thread.
In the second test, it runs in multiple threads.
@Test
public void shouldSortInSingleThread() {
List<String> labels = new ArrayList<String>();
IntStream.range(0, 50000).forEach(nbr -> labels.add("str" + nbr));
System.out.println(Thread.currentThread());
Arrays.sort(labels.toArray(new String[] {}), (String first,
String second) -> {
System.out.println(Thread.currentThread());
return Integer.compare(first.length(), second.length());
});
}
@Test
public void shouldSortInParallel() {
List<String> labels = new ArrayList<String>();
IntStream.range(0, 50000).forEach(nbr -> labels.add("str" + nbr));
System.out.println(Thread.currentThread());
Arrays.parallelSort(labels.toArray(new String[] {}), (String first,
String second) -> {
System.out.println(Thread.currentThread());
return Integer.compare(first.length(), second.length());
});
}
I am writing a program that reads words from a file and sorts them in alphabetical order. You provide the input and output files in the command line, and the program reads the words from the input file and writes a sorted list back to the output file. This is done, and it works as it should do. No questions here.
I am not looking for specific code, but rather help on how to approach a problem. The next part of the assignment states that in the command line, you are to be able to set the number of Threads you want the program to use in the sorting process.
For instance, if you run the program with the following command:
java Sort 12 infile.txt outfile.txt
The above program is meant to use 12 threads to sort the words from "infile.txt". Each thread is to sort N = (numberOfWords)/(numberOfThreads) words. All the words are read into memory before the threads are started. I'm aware that this might sound cryptic, but I have been googling around looking for a good explanation of "multithreading"/defining the number of threads in a Java program, yet I am none the wiser.
If anyone knows how to explain how you can set the number of Threads in Java, even with a small example, I would be very grateful!
Thanks!
You could use the Executors.newFixedThreadPool(int nThreads) method (see its Javadoc) to get a thread pool with the required number of threads. Then divide your work into the appropriate number of chunks (12 in your example), create a Runnable object for each chunk of work, and pass those Runnable objects to the pool's submit method.
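A minimal sketch of that suggestion for the sorting assignment, assuming the words are already in a String[] (reading and writing the files is left out): each pool thread sorts one chunk of roughly numberOfWords / numberOfThreads words, and the calling thread then merges the sorted chunks with a k-way merge over a PriorityQueue. The merge step is an added assumption, since the assignment only says each thread sorts N words; ChunkSort and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkSort {

    // Sorts `words` with `nThreads` pool threads: each thread sorts one chunk,
    // then the calling thread merges the sorted chunks.
    static String[] sort(String[] words, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = Math.max(1, (words.length + nThreads - 1) / nThreads);
            List<CompletableFuture<String[]>> parts = new ArrayList<>();
            for (int from = 0; from < words.length; from += chunk) {
                String[] part = Arrays.copyOfRange(words, from, Math.min(from + chunk, words.length));
                parts.add(CompletableFuture.supplyAsync(() -> {
                    Arrays.sort(part); // each pool thread sorts only its own chunk
                    return part;
                }, pool));
            }
            List<String[]> sorted = new ArrayList<>();
            for (CompletableFuture<String[]> p : parts) {
                sorted.add(p.join());
            }
            return merge(sorted, words.length);
        } finally {
            pool.shutdown();
        }
    }

    // k-way merge: the heap holds {chunkIndex, positionInChunk} pairs ordered by the current word.
    private static String[] merge(List<String[]> chunks, int total) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (x, y) -> chunks.get(x[0])[x[1]].compareTo(chunks.get(y[0])[y[1]]));
        for (int c = 0; c < chunks.size(); c++) {
            if (chunks.get(c).length > 0) heap.add(new int[] {c, 0});
        }
        String[] out = new String[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            String[] src = chunks.get(top[0]);
            out[k++] = src[top[1]];
            if (top[1] + 1 < src.length) heap.add(new int[] {top[0], top[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana", "cherry", "date"};
        System.out.println(Arrays.toString(sort(words, 3)));
        // prints [apple, banana, cherry, date, fig, kiwi, pear]
    }
}
```

With the assignment's command line, nThreads would come from Integer.parseInt(args[0]).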
Oh sure. Well, the work a thread does is just a class with a "run" method.
You create the class and either have it extend Thread or implement Runnable. If you extend Thread, you can just call start() on an instance, and that starts the thread. If you implement Runnable instead, you have to do something like Thread t = new Thread(yourRunnableObject); and then call t.start().
So for your example:
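public class Sort {
    // static nested class, so main() can create instances without an enclosing Sort object
    static class RunnableClass implements Runnable {
        private final String[] words;
        RunnableClass(String[] words) {
            this.words = words;
        }
        @Override
        public void run() {
            java.util.Arrays.sort(words); // do your sorting
        }
    }
    public static void main(String[] args) throws InterruptedException {
        int numberOfThreads = Integer.parseInt(args[0]);
        // some code that reads args[1] and chops the words into numberOfThreads arrays (not shown)
        String[][] chunks = new String[numberOfThreads][];
        Thread[] threads = new Thread[numberOfThreads];
        for (int x = 0; x < numberOfThreads; x++) {
            threads[x] = new Thread(new RunnableClass(chunks[x]));
            threads[x].start();
        }
        for (Thread t : threads) {
            t.join(); // wait until every chunk is sorted
        }
        // then merge the sorted chunks and write them to args[2]
    }
}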
You could make this more elaborate or complex. One simple implementation would be to loop over the words, passing 2 to each thread to sort: Thread A sorts words 1 and 2, Thread B sorts words 3 and 4, and so on. Once the threads complete, shift the pairing along the list by one (words 2 and 3, 4 and 5, and so on) and repeat until a pass changes no order. That's a form of bubble sort.
The threads can communicate with each other, share state or have their own state, etc. There are many ways to implement this.
The threads could terminate, or be re-entrant, could have state, etc.
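The pairwise scheme described above is usually called odd-even transposition sort. A minimal sketch follows, assuming a parallel IntStream stands in for the explicit threads A, B, and so on: within one phase the pairs never overlap, so they can be compared and swapped in parallel, and n phases are enough to sort n elements. OddEvenSort is a hypothetical name.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class OddEvenSort {

    // Odd-even transposition sort. Even phases compare pairs (0,1), (2,3), ...;
    // odd phases compare (1,2), (3,4), ... Pairs within a phase never overlap,
    // so each phase can process its pairs in parallel.
    static void sort(String[] a) {
        for (int phase = 0; phase < a.length; phase++) { // n phases suffice for n elements
            final int start = phase % 2;
            IntStream.range(0, (a.length - start) / 2)  // number of pairs in this phase
                    .parallel()
                    .map(j -> start + 2 * j)             // left index of each pair
                    .forEach(i -> {
                        if (a[i].compareTo(a[i + 1]) > 0) {
                            String tmp = a[i];
                            a[i] = a[i + 1];
                            a[i + 1] = tmp;
                        }
                    });
        }
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana"};
        sort(words);
        System.out.println(Arrays.toString(words)); // prints [apple, banana, fig, kiwi, pear]
    }
}
```

This runs a fixed n phases rather than stopping when a pass changes nothing, which keeps the sketch short; it does O(n²) comparisons in total, so it is mainly useful for understanding the idea, not for sorting large files.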
The Executors class has a static newFixedThreadPool(int nThreads) method that creates a pool with the given number of threads. For example, if you have a class implementing Runnable:
public class MyCustomThread implements Runnable {
@Override
public void run() {
//do your work
}
}
you can create a pool with 5 threads like this:
int numberOfThreads = 5;
ExecutorService srv = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
    srv.execute(new MyCustomThread());
}
srv.shutdown(); // accept no new tasks; the threads exit once the submitted tasks finish
Using ExecutorService, it will be much easier for you to manage the lifecycle of your threads. Read the Oracle concurrency tutorial for more information.
First, which version of Java are you using? This task is not trivial, as you have to take care of a couple of things such as joining threads. Java 7 introduced the Fork/Join framework, which you can use for this task.
You can refer to the following example: Sorting using Fork/Join.
What you're looking for is a Fork/Join framework. This splits a single task into parts, handing the parts to multiple threads to be processed.
ExecutorService's FixedThreadPool allows you to create 12 worker threads, but leaves you with all the hard work of separating the work between the threads. The Fork/Join framework makes this easy, using a recursive system to break the process down if needed so it could be split between threads.
http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
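A minimal Fork/Join sketch for the same problem, assuming a merge sort built on RecursiveAction (this is not the code from the linked tutorial): ranges larger than a threshold are split in half, the halves are sorted as parallel subtasks with invokeAll, and the results are merged. Creating the pool with new ForkJoinPool(nThreads) is how the sketch respects the requested number of threads; ForkJoinSort, SortTask and THRESHOLD = 1,000 are hypothetical choices.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinSort {

    // Sorts a[from, to): splits large ranges in half, sorts the halves as parallel
    // subtasks, then merges them.
    static class SortTask extends RecursiveAction {
        private static final int THRESHOLD = 1_000; // below this size, sort sequentially
        private final String[] a;
        private final String[] buf; // scratch space, same length as a; tasks use disjoint ranges
        private final int from, to;

        SortTask(String[] a, String[] buf, int from, int to) {
            this.a = a; this.buf = buf; this.from = from; this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                Arrays.sort(a, from, to);
                return;
            }
            int mid = (from + to) >>> 1;
            invokeAll(new SortTask(a, buf, from, mid), new SortTask(a, buf, mid, to));
            merge(mid);
        }

        // Merges the sorted runs a[from, mid) and a[mid, to) using buf[from, to).
        private void merge(int mid) {
            System.arraycopy(a, from, buf, from, to - from);
            int i = from, j = mid, k = from;
            while (i < mid && j < to) {
                a[k++] = buf[i].compareTo(buf[j]) <= 0 ? buf[i++] : buf[j++];
            }
            while (i < mid) a[k++] = buf[i++];
            while (j < to) a[k++] = buf[j++];
        }
    }

    static void parallelSort(String[] words, int nThreads) {
        ForkJoinPool pool = new ForkJoinPool(nThreads); // parallelism = number of worker threads
        try {
            pool.invoke(new SortTask(words, new String[words.length], 0, words.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] words = new String[10_000];
        for (int i = 0; i < words.length; i++) {
            words[i] = "w" + (i * 7919 % 10_000);
        }
        String[] expected = words.clone();
        Arrays.sort(expected);
        parallelSort(words, 4);
        System.out.println(Arrays.equals(words, expected)); // prints true
    }
}
```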
Define a Runnable, then in a loop add new threads with that Runnable to a list. Then start all the threads, either in the same loop or a separate one, passing each Runnable the words it needs to process when you construct it.
You will also have to control access to the output file, and possibly the input file depending on how you access it; otherwise your threads will run into trouble. Take a look at race conditions and how to deal with them.