I'm reading the book "Java SE 8 for the Really Impatient", and in the first chapter I came across the following exercise question:
Is the comparator code in the Arrays.sort method called in the same thread as the call to sort or a different thread?
I've searched the Javadoc for the Arrays.sort overload that takes a Comparator argument, but it doesn't say anything about threads.
I suspect that, for performance reasons, the comparator code could be executed in another thread, but that's just a guess.
You can always test it by logging the ID of Thread.currentThread().
Add something like this just before calling sort() and inside your compare() method:
logger.debug("Thread # " + Thread.currentThread().getId());
Imagine you have code to get the largest element in an array:
int[] array = new int[] {.........};
/// Few/many lines of code between...
Arrays.sort(array);
int largest = array[array.length - 1];
If sorting were done in another thread, you'd have a race condition: would the array be sorted first, or would largest be assigned first? You could avoid that problem by locking array, but what happens if the code you're running in has already locked array? You could block the original thread using join(), but then you've pretty much defeated the purpose of spawning another thread, as your code would behave exactly as if no additional thread had been spawned.
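To make the join() point concrete, here is a minimal sketch (the class name and sample values are illustrative, not from the question) that sorts on a helper thread and then immediately waits for it. It gives the right answer, but only because the caller blocks, so it buys nothing over calling Arrays.sort() directly.

```java
import java.util.Arrays;

public class SortInAnotherThread {

    // Sorts 'array' on a helper thread, then blocks in join() until that thread finishes.
    static int largestViaHelperThread(int[] array) {
        Thread sorter = new Thread(() -> Arrays.sort(array));
        sorter.start();
        try {
            // Without this join(), reading the last element below would race with the sort.
            sorter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sort", e);
        }
        // join() guarantees the helper thread's writes to 'array' are visible here.
        return array[array.length - 1];
    }

    public static void main(String[] args) {
        int[] array = {42, 7, 19, 3, 88, 15};
        System.out.println(largestViaHelperThread(array)); // prints 88
    }
}
```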
For Arrays#sort(), the sorting takes place in the original thread, as there really isn't much point in spawning another thread. Your thread will block until the sorting is complete, just like for any other piece of code.
The closest thing to sorting in another thread is the Arrays#parallelSort() method introduced in Java 8. It still behaves much like the regular Arrays#sort() in that it blocks your current thread until the sorting is done, but it uses threads from the ForkJoin common pool in the background to help sort the array (arrays shorter than a minimum granularity are simply sorted with the regular Arrays#sort()). For larger datasets, I'd expect a speedup approaching the number of threads used, minus whatever threading overhead there might be.
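A rough way to check this yourself, assuming a Java 8+ JDK: the sketch below (class and method names are illustrative) records the name of every thread that invokes the comparator. With Arrays.sort() you should only see the calling thread. With Arrays.parallelSort() on a large array and a multi-core machine you will usually also see ForkJoinPool common-pool workers, but that part isn't guaranteed, since the JDK may fall back to a sequential sort (for example for small arrays or a single-threaded common pool).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ComparatorThreads {

    // Returns the names of every thread that invoked the comparator during the sort.
    static Set<String> threadsUsedBy(boolean parallel, int size) {
        Integer[] data = new Integer[size];
        for (int i = 0; i < size; i++) {
            data[i] = size - i; // reverse order, so the sort has real work to do
        }
        // Thread-safe set, because parallelSort may call compare() from several threads.
        Set<String> seen = ConcurrentHashMap.newKeySet();
        Comparator<Integer> cmp = (a, b) -> {
            seen.add(Thread.currentThread().getName());
            return Integer.compare(a, b);
        };
        if (parallel) {
            Arrays.parallelSort(data, cmp);
        } else {
            Arrays.sort(data, cmp);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("caller:       " + Thread.currentThread().getName());
        System.out.println("sort:         " + threadsUsedBy(false, 100_000));
        // Usually also lists ForkJoinPool.commonPool-worker threads on a multi-core machine.
        System.out.println("parallelSort: " + threadsUsedBy(true, 100_000));
    }
}
```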
In the first test, the comparator runs in a single thread (the one that called sort).
In the second test, it typically runs in multiple threads.
@Test
public void shouldSortInSingleThread() {
    List<String> labels = new ArrayList<String>();
    IntStream.range(0, 50000).forEach(nbr -> labels.add("str" + nbr));
    System.out.println(Thread.currentThread());
    Arrays.sort(labels.toArray(new String[] {}), (String first, String second) -> {
        // always prints the caller's thread
        System.out.println(Thread.currentThread());
        return Integer.compare(first.length(), second.length());
    });
}

@Test
public void shouldSortInParallel() {
    List<String> labels = new ArrayList<String>();
    IntStream.range(0, 50000).forEach(nbr -> labels.add("str" + nbr));
    System.out.println(Thread.currentThread());
    Arrays.parallelSort(labels.toArray(new String[] {}), (String first, String second) -> {
        // typically also prints ForkJoinPool common-pool worker threads
        System.out.println(Thread.currentThread());
        return Integer.compare(first.length(), second.length());
    });
}
I am writing a command-line application in Java 8. Part of it involves some computation that I believe could benefit from running in parallel on multiple threads. However, I don't have much experience writing multi-threaded applications, so I hope you can steer me in the right direction on how to design the parallel part of my code.
For simplicity, let's pretend the method in question receives a relatively big array of longs, and it should return a Set containing only prime numbers:
public final static boolean checkIfNumberIsPrime(long number) {
// algorithm implementation, not important here
// ...
}
// a single-threaded version
public Set<Long> extractPrimeNumbers(long[] inputArray) {
Set<Long> result = new HashSet<>();
for (long number : inputArray) {
if (checkIfNumberIsPrime(number)) {
result.add(number);
}
}
return result;
}
Now, I would like to refactor extractPrimeNumbers() so that it is executed by four threads in parallel and, when all of them have finished, returns the result. Off the top of my head, I have the following questions:
Which approach would be more suitable for the task: ExecutorService or Fork/Join? (each element of inputArray[] is completely independent and they can be processed in any order whatsoever)
Assuming there are 1 million elements in inputArray[], should I "ask" thread #1 to process all indexes 0..249999, thread #2 - 250000..499999, thread #3 - 500000..749999 and thread #4 - 750000..999999? Or should I rather treat each element of inputArray[] as a separate task to be queued and then executed by an applicable worker thread?
If a prime number is detected, it should be added to Set result, which therefore needs to be thread-safe (synchronized). So perhaps it would be better if each thread maintained its own local result set and, only when it has finished, transferred its contents to the global result in one go?
Is Spliterator of any use here? Should it be used to partition inputArray[] somehow?
Parallel stream
Use none of these. Parallel streams are going to be enough to deal with this problem much more straightforwardly than any of the alternatives you list.
return Arrays.stream(inputArray)  // java.util.Arrays has no parallelStream(long[]); stream() gives a LongStream
        .parallel()
        .filter(n -> checkIfNumberIsPrime(n))
        .boxed()
        .collect(Collectors.toSet());
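For reference, a self-contained version of the parallel-stream answer. The checkIfNumberIsPrime body is a hypothetical stand-in (simple trial division, fine for moderate values), since the question elides the real algorithm. Note that Collectors.toSet() on a parallel stream gives each worker its own partial set and merges them at the end, which is essentially the "local result set per thread" idea from the question, done for you.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class ParallelPrimes {

    // Hypothetical stand-in for the question's elided algorithm: plain trial division.
    public static boolean checkIfNumberIsPrime(long number) {
        if (number < 2) {
            return false;
        }
        for (long d = 2; d * d <= number; d++) {
            if (number % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static Set<Long> extractPrimeNumbers(long[] inputArray) {
        return Arrays.stream(inputArray)      // LongStream over the array
                .parallel()                   // split the work across the ForkJoin common pool
                .filter(ParallelPrimes::checkIfNumberIsPrime)
                .boxed()                      // LongStream -> Stream<Long>
                .collect(Collectors.toSet()); // partial sets per worker, merged by the stream
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3, 4, 5, 9, 11, 15, 17, 21};
        // a set containing 2, 3, 5, 11 and 17 (iteration order is not guaranteed)
        System.out.println(extractPrimeNumbers(input));
    }
}
```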
For more info, see The Java™ Tutorials > Aggregate Operations > Parallelism.
Here, two threads work on the same ArrayList: one thread reads the elements and the other removes a specific element. I expected this to throw ConcurrentModificationException, but it doesn't. Why?
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;

public class IteratorStudies {
    public static final ArrayList<String> arr;

    static {
        arr = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            arr.add("someCommonValue");
        }
        arr.add("someSpecialValue");
    }

    private static Integer initialValue = 4;

    public static void main(String x[]) {
        Thread t1 = new Thread() {
            @Override
            public void start() {
                Iterator<String> arrIter = arr.iterator();
                while (arrIter.hasNext()) {
                    try {
                        String str = arrIter.next();
                        System.out.println("value :" + str);
                    } catch (ConcurrentModificationException e) {
                        e.printStackTrace();
                    }
                }
                System.out.println("t1 complete:" + arr);
            }
        };
        Thread t2 = new Thread() {
            @Override
            public void start() {
                Iterator<String> arrIter = arr.iterator();
                while (arrIter.hasNext()) {
                    String str = arrIter.next();
                    if (str.equals("someSpecialValue")) {
                        arrIter.remove();
                    }
                }
                System.out.println("t2 complete:" + arr);
            }
        };
        t2.start();
        t1.start();
    }
}
You've made 2 somewhat common mistakes.
ConcurrentModificationException is not about concurrency
You'd think, given the name, that CoModEx is about concurrency. It's not: you don't need threads to get it. This trivial code will throw it:
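void example() {
    var list = new ArrayList<String>();
    list.add("a");
    list.add("b");
    // with only "a" and "b", removing "a" ends the loop before next() runs again,
    // so a third element is needed for the exception to actually appear
    list.add("c");
    for (String elem : list) {
        if (elem.equals("a")) list.remove(elem);
    }
}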
That's because CoModEx is thrown by iterators and simply means this happened:
1. Somebody made an iterator.
2. Somebody changed the list somehow (and not via the iterator's .remove() method).
3. Somebody ran any relevant method on the iterator made in #1.
So, in the above, the foreach loop implicitly makes an iterator (#1), then the list.remove method is invoked (#2), then, on the next pass through the foreach loop, we call a relevant method on that iterator (.next(); ArrayList's .hasNext() doesn't perform the check), and, voilà, CoModEx occurs.
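A minimal sketch of the two usual ways to remove elements while iterating without triggering step 2: remove through the iterator itself, or (Java 8+) let removeIf do the loop. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SafeRemoval {

    // Removes through the iterator itself, so the iterator's expected modification
    // count stays in sync with the list and no CoModEx is thrown.
    static List<String> removeWithIterator(List<String> input, String target) {
        List<String> list = new ArrayList<>(input);
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.remove();
            }
        }
        return list;
    }

    // Java 8+ shorthand that performs the iteration and removal internally.
    static List<String> removeWithRemoveIf(List<String> input, String target) {
        List<String> list = new ArrayList<>(input);
        list.removeIf(target::equals);
        return list;
    }

    public static void main(String[] args) {
        List<String> abc = Arrays.asList("a", "b", "c");
        System.out.println(removeWithIterator(abc, "a")); // prints [b, c]
        System.out.println(removeWithRemoveIf(abc, "a")); // prints [b, c]
    }
}
```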
In fact, with multiple threads it's less likely: after all, you should assume that if you interact with some object from multiple threads, it is broken, in that its behaviour is unspecified; thus, you have a bug, and worse, a hard one to test for. If you modify a plain ArrayList from another thread whilst iterating over it, you are not guaranteed a CoModEx. You may get it. You may not. The computer may walk off the desk and try its luck on Broadway. "Unspecified behaviour" is a nice way of saying: "Don't, seriously. It'll hurt the whole time because you cannot test it; it will work fine the entire time you are developing it, and juuust as you're giving that important demo to a bigwig client, it'll fail on you in embarrassing ways".
The way to interact with one object from multiple threads is very carefully: check that the docs of the specific object explicitly state what happens (i.e. use stuff from the java.util.concurrent package, which is specifically designed with 'interact with it from more than one thread' use cases in mind), and failing that, use locking. These are tricky things, so the usual way to do multi-threading in Java is to not have shared state in the first place. Isolate as much as you can, invert control, and use messaging strategies that have built-in transactional intrinsics, such as message queues (RabbitMQ and friends) and databases (which have transactions).
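As one small-scale illustration of the "don't share state, pass messages" advice, here is a sketch using a java.util.concurrent BlockingQueue in place of an external broker like RabbitMQ: the producer thread only puts messages on the queue, and only the consumer thread ever touches its result list. The sentinel value and all names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoff {

    // Hypothetical sentinel telling the consumer that no more items will arrive.
    private static final String DONE = "__DONE__";

    static List<String> produceAndConsume(List<String> items) throws InterruptedException {
        // The queue is the only object both threads touch; it is documented as thread-safe.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (String item : items) {
                    queue.put(item);
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Only this thread uses 'received', so a plain ArrayList is fine here.
        List<String> received = new ArrayList<>();
        for (String msg = queue.take(); !msg.equals(DONE); msg = queue.take()) {
            received.add(msg);
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(produceAndConsume(Arrays.asList("x", "y", "z"))); // prints [x, y, z]
    }
}
```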
How to use Threads
You override the run() method and then start the thread by calling its start() method. Or, better yet, don't override run at all: pass a Runnable instance to the Thread constructor.
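A minimal sketch of the recommended pattern, with a Runnable passed to the Thread constructor (the thread name "worker-1" is arbitrary). Calling start(), not run(), is what creates the new thread; the sketch returns the name of the thread the Runnable actually ran on.

```java
public class RunnableThread {

    // Starts a thread with a Runnable passed to the constructor, waits for it,
    // and returns the name of the thread the Runnable actually ran on.
    static String nameOfWorkerThread() throws InterruptedException {
        String[] ranOn = new String[1];
        Thread worker = new Thread(() -> ranOn[0] = Thread.currentThread().getName(), "worker-1");
        worker.start(); // creates the new thread, which then calls the Runnable's run()
        worker.join();  // also makes the write to ranOn[0] visible to this thread
        return ranOn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main runs on:    " + Thread.currentThread().getName());
        System.out.println("Runnable ran on: " + nameOfWorkerThread()); // prints worker-1
    }
}
```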
That's how you use threads. You didn't: you overrode start, which means starting these threads doesn't create a new thread at all; it just runs the payload in your own thread. That explains your specific case, but what you're trying to do (witness a CoModEx by messing with a list from another thread) doesn't reliably get you a CoModEx either: it gets you unspecified behaviour, which means anything goes.
You have overridden the start methods of both thread instances instead of run, so these methods complete within the main thread. Thus, no simultaneous thread execution takes place and no ConcurrentModificationException can occur here.
We have multiple threads calling add(obj) on an ArrayList.
My theory is that when add is called concurrently by two threads, only one of the two objects being added actually ends up in the ArrayList. Is this plausible?
If so, how do you get around this? Use a synchronized collection like Vector?
There is no guaranteed behavior for what happens when add is called concurrently by two threads on ArrayList. However, it has been my experience that both objects have been added fine. Most of the thread safety issues related to lists deal with iteration while adding/removing. Despite this, I strongly recommend against using vanilla ArrayList with multiple threads and concurrent access.
Vector used to be the standard for concurrent lists, but now the standard is to use Collections.synchronizedList.
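A small sketch (names and sizes are illustrative) of several threads adding to one Collections.synchronizedList. Individual calls like add() are synchronized, but iteration is not; the Collections.synchronizedList Javadoc says you must manually synchronize on the returned list while iterating, as in main() below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListDemo {

    // Several threads each add 'perThread' elements to one shared synchronized list.
    static List<Integer> fill(int threads, int perThread) throws InterruptedException {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each add() locks the wrapper, so no update is lost
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return list;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = fill(4, 10_000);
        System.out.println(list.size()); // prints 40000
        // Iteration is a sequence of calls, so hold the list's own lock for all of it.
        long sum = 0;
        synchronized (list) {
            for (int v : list) {
                sum += v;
            }
        }
        System.out.println(sum); // prints 199980000
    }
}
```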
Also I highly recommend Java Concurrency in Practice by Goetz et al if you're going to be spending any time working with threads in Java. The book covers this issue in much better detail.
Any number of things could happen. You could get both objects added correctly. You could get only one of the objects added. You could get an ArrayIndexOutOfBoundsException because the size of the underlying array was not adjusted properly. Or other things may happen. Suffice it to say that you cannot rely on any behavior occurring.
As alternatives, you could use Vector, you could use Collections.synchronizedList, you could use CopyOnWriteArrayList, or you could use a separate lock. It all depends on what else you are doing and what kind of control you have over access to the collection.
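If you go the "separate lock" route, a sketch might look like the class below (a hypothetical wrapper, not a library class). The main benefit over a synchronized wrapper is that compound actions such as check-then-add can be made atomic by holding the same lock for both steps.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedList {

    // A private lock object guards every access to the plain ArrayList below.
    private final Object lock = new Object();
    private final List<Integer> items = new ArrayList<>();

    public void add(int value) {
        synchronized (lock) {
            items.add(value);
        }
    }

    // Check-then-act made atomic by holding the same lock for both steps.
    public boolean addIfAbsent(int value) {
        synchronized (lock) {
            if (items.contains(value)) {
                return false;
            }
            items.add(value);
            return true;
        }
    }

    public int size() {
        synchronized (lock) {
            return items.size();
        }
    }

    // Returns a copy so callers can iterate without holding the lock.
    public List<Integer> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedList list = new GuardedList();
        Thread a = new Thread(() -> { for (int i = 0; i < 5000; i++) list.add(i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 5000; i++) list.add(i); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(list.size()); // prints 10000
    }
}
```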
You could also get a null, an ArrayIndexOutOfBoundsException, or something left up to the implementation. HashMaps have been observed to go into an infinite loop in production systems. You don't really need to know what might go wrong; just don't do it.
You could use Vector, but its interface tends to turn out not to be rich enough. You will probably find that you want a different data structure in most cases.
I came up with the following code to somewhat mimic a real-world scenario.
100 tasks are run in parallel and they update their completed status to the main program. I use a CountDownLatch to wait for task completion.
import java.util.concurrent.*;
import java.util.*;

public class Runner {
    // Should be replaced with Collections.synchronizedList(new ArrayList<Integer>())
    public List<Integer> completed = new ArrayList<Integer>();

    /**
     * @param args
     */
    public static void main(String[] args) {
        Runner r = new Runner();
        ExecutorService exe = Executors.newFixedThreadPool(30);
        int tasks = 100;
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            exe.submit(r.new Task(i, latch));
        }
        try {
            latch.await();
            System.out.println("Summary:");
            System.out.println("Number of tasks completed: "
                    + r.completed.size());
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        exe.shutdown();
    }

    class Task implements Runnable {
        private int id;
        private CountDownLatch latch;

        public Task(int id, CountDownLatch latch) {
            this.id = id;
            this.latch = latch;
        }

        public void run() {
            Random r = new Random();
            try {
                Thread.sleep(r.nextInt(5000)); // Actual work of the task
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            completed.add(id);
            latch.countDown();
        }
    }
}
When I ran the application 10 times, in at least 3 or 4 of the runs the program did not print the correct number of completed tasks. Ideally it should print 100 (if no exceptions happen), but in some cases it printed 98, 99, etc.
This shows that concurrent updates of an ArrayList do not give correct results.
If I replace the ArrayList with a synchronized version, the program outputs the correct results.
You can use List l = Collections.synchronizedList(new ArrayList()); if you want a thread-safe version of ArrayList.
The behavior is probably undefined, since ArrayList isn't thread-safe. If you modify the list while an Iterator is iterating over it, you will typically get a ConcurrentModificationException (fail-fast iterators only throw it on a best-effort basis). You can wrap the ArrayList with Collections.synchronizedList, use a thread-safe collection (there are many), or just put the add calls in a synchronized block.
Instead of new ArrayList(), you could use:
Collections.synchronizedList( new ArrayList() );
or
new Vector();
In my view, synchronizedList is preferable because it:
is 50-100% faster
can work with already existing ArrayLists
In my recent experience, using an ArrayList to add new elements from different threads will miss a few of them, while using Collections.synchronizedList(new ArrayList()) avoids that issue.
List<String> anotherCollection = new ArrayList<>();
List<String> list = new ArrayList<>();
// if 'anotherCollection' is big enough, some elements will be missed (or an exception thrown)
anotherCollection.parallelStream().forEach(el -> list.add("element" + el));
List<String> listSync = Collections.synchronizedList(new ArrayList<>());
// no matter how big 'anotherCollection' is, all elements will be added
anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
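A runnable sketch of the corrected snippet, assuming a generated stand-in for anotherCollection. It also shows an alternative that avoids shared mutable state altogether by letting the stream collect the results; with an ordered source such as an ArrayList, Collectors.toList() keeps the encounter order, whereas forEach into a synchronized list does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelStreamAdd {

    // Hypothetical stand-in for 'anotherCollection': the strings "0", "1", ..., "n-1".
    static List<String> source(int n) {
        return IntStream.range(0, n).mapToObj(Integer::toString).collect(Collectors.toList());
    }

    // The corrected snippet: every add goes to the synchronized list.
    static List<String> viaSynchronizedList(List<String> anotherCollection) {
        List<String> listSync = Collections.synchronizedList(new ArrayList<>());
        anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
        return listSync; // all elements present, but in no particular order
    }

    // Alternative without shared mutable state: the stream builds the list itself.
    static List<String> viaCollect(List<String> anotherCollection) {
        return anotherCollection.parallelStream()
                .map(el -> "element" + el)
                .collect(Collectors.toList()); // keeps encounter order
    }

    public static void main(String[] args) {
        List<String> anotherCollection = source(100_000);
        System.out.println(viaSynchronizedList(anotherCollection).size()); // prints 100000
        System.out.println(viaCollect(anotherCollection).size());          // prints 100000
    }
}
```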
java.util.concurrent has a thread-safe array list, CopyOnWriteArrayList. The standard ArrayList is not thread-safe, and the behavior when multiple threads update it at the same time is undefined. There can also be odd behaviors with multiple readers when one or more threads are writing at the same time.
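For illustration, a sketch of CopyOnWriteArrayList's behaviour (names are illustrative): an iterator created before another thread removes an element keeps iterating over the old snapshot, so no ConcurrentModificationException is thrown and the removed element is still seen. Because every write copies the underlying array, it suits lists that are read far more often than they are modified.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteDemo {

    // Returns what an iterator created *before* another thread's removal actually sees.
    static List<String> iterateAcrossRemoval(List<String> list, String toRemove) throws InterruptedException {
        Iterator<String> it = list.iterator(); // for CopyOnWriteArrayList this captures a snapshot
        Thread remover = new Thread(() -> list.remove(toRemove));
        remover.start();
        remover.join(); // the removal has definitely happened before we iterate
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next()); // no ConcurrentModificationException: the snapshot never changes
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> list = new CopyOnWriteArrayList<>(new String[] {"a", "b", "c"});
        System.out.println(iterateAcrossRemoval(list, "b")); // prints [a, b, c]
        System.out.println(list);                            // prints [a, c]
    }
}
```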
http://java.sun.com/j2se/1.4.2/docs/api/java/util/ArrayList.html
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Since there is no synchronization internally, what you theorize is plausible.
So, things get out of sync, with unpleasant and unpredictable results.
I need to split an array into 5 parts (the last one probably won't be the same size as the others) and feed them into threads for parallel processing.
Here is my attempt:
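int parts = 5;
int skip = arr.length / parts;
for (int p = 0; p < parts; p++) {
    int from = p * skip;
    // the last part also takes the leftover elements, so it may be larger than the rest
    int to = (p == parts - 1) ? arr.length : from + skip;
    int[] sub = Arrays.copyOfRange(arr, from, to); // needs java.util.Arrays
    // 'barrier' as in the original; Worker is a hypothetical Runnable that processes 'sub'
    Thread t = new Thread(new Worker(barrier, sub));
    t.start();
}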
Any suggestions for making it more functional and avoiding local arrays are welcome.
As for organizing the threads well, you should look into ExecutorService; see Simple youtube video on ExecutorService.
As for organizing the arrays, you can either split them non-locally or create a Queue for them; for that, see Java Queue implementations, which one?
These threads may work at different speeds, so a Queue is recommended; please look into them.
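One way to follow the ExecutorService suggestion while also avoiding the local sub-arrays: give each task an index range into the shared array rather than a copy, and collect per-task results through Futures. This is a sketch under assumptions: processRange (a plain sum here) is a hypothetical placeholder for whatever work each part really does, and the array is only read, never written, while the tasks run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedProcessing {

    // Hypothetical per-part work: sums arr[from, to). Replace with the real processing.
    static long processRange(int[] arr, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += arr[i];
        }
        return sum;
    }

    // Splits arr into 'parts' index ranges (the last one takes the remainder) and
    // processes each range on a pool thread, without copying any sub-arrays.
    static long processInParallel(int[] arr, int parts) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            int chunk = arr.length / parts;
            List<Future<Long>> results = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                int from = p * chunk;
                int to = (p == parts - 1) ? arr.length : from + chunk;
                results.add(pool.submit(() -> processRange(arr, from, to)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that part is done, however fast the others were
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] arr = new int[103];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i + 1;
        }
        System.out.println(processInParallel(arr, 5)); // prints 5356 (1 + 2 + ... + 103)
    }
}
```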