Is it possible to speed up a sequential Java iterator? - java

This is how I normally iterate over a collection:
for (Iterator iterator = collectionthing.iterator(); iterator.hasNext();) {
    Object item = iterator.next();
    // do something with item
}
I believe most of us do this. Is there a better approach than iterating sequentially? Is there a Java library that lets me run this in parallel on a multi-core CPU? =)
Looking forward to feedback from you all.

Java's multithreading is quite low level in this respect. The best you could do is something like this:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (final Object item : collectionThingy) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            // do stuff with item
        }
    });
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
This is Java 6 code. If running on Java 5, drop the @Override annotation (Java 5 does not allow it on methods that implement an interface method; Java 6 does).
This creates a task for each item in the collection. A thread pool (size 10) is created to run those tasks; you can replace it with any executor you want. Lastly, the thread pool is shut down and the code blocks until all the tasks have finished.
The last call throws a checked exception that you will need to catch or declare: awaitTermination throws InterruptedException. ExecutionException only comes up if you call get() on the Future returned by submit.
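The pattern above can be wrapped in a small helper. This is only a sketch, assuming Java 8 or later for the lambda syntax; the class and method names (ParallelForEachSketch, forEachInParallel, totalLength) are made up for illustration and are not part of any library.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ParallelForEachSketch {

    /** Runs the action once per item on a fixed pool and waits for all tasks to finish. */
    static <T> void forEachInParallel(Collection<T> items, int threads, Consumer<? super T> action) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            for (T item : items) {
                executor.execute(() -> action.accept(item)); // one task per item
            }
            executor.shutdown(); // no new tasks; queued ones still run
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown(); // harmless if already shut down
        }
    }

    /** Example use: sums string lengths with a thread-safe counter. */
    static int totalLength(List<String> words) {
        AtomicInteger total = new AtomicInteger();
        forEachInParallel(words, 4, w -> total.addAndGet(w.length()));
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("alpha", "beta", "gamma")));
    }
}
```

Note that the action must itself be thread-safe; the example uses an AtomicInteger rather than a plain int for that reason.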

In most cases, the added complexity wouldn't be worth the potential performance gain. However, if you need to process a Collection in multiple threads, you could use Executors to do this, running all the tasks in a pool of threads:
int numThreads = 4;
ExecutorService threadExecutor = Executors.newFixedThreadPool(numThreads);
for (Iterator iterator = collectionthing.iterator(); iterator.hasNext();) {
    // CollectionThingProcessor is your own Runnable that handles one element
    Runnable runnable = new CollectionThingProcessor(iterator.next());
    threadExecutor.execute(runnable);
}
threadExecutor.shutdown(); // stop accepting tasks; already submitted ones still run

As part of the fork-join framework, JDK 7 was expected to include parallel arrays, designed to allow efficient implementation of certain operations across arrays on many-core machines. (The fork-join framework itself shipped in JDK 7; the parallel-array class did not, and Java 8 later added parallel streams for this kind of work.) But just cutting the array into pieces and throwing them at a thread pool will also work.
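As an illustration of "cutting the array into pieces and throwing it at a thread pool", here is a minimal sketch that sums an int array in chunks. It assumes Java 8 or later; ChunkedSumSketch and parallelSum are hypothetical names, and summing is just a stand-in for whatever per-element operation is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedSumSketch {

    /** Splits data into at most `chunks` slices, sums each slice on a pool thread, then adds the partial sums. */
    static long parallelSum(int[] data, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int chunkSize = (data.length + chunks - 1) / chunks; // round up
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task only reads its own slice and returns its own partial result, so no locking is needed; the results are combined on the calling thread.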

Sorry, Java does not have this sort of language-level support for automatic parallelism. If you want it, you will have to implement it yourself using libraries and threads.

Related

Should I make a list thread safe even if I only add to it? [duplicate]

We have multiple threads calling add(obj) on an ArrayList.
My theory is that when add is called concurrently by two threads, only one of the two objects being added is really added to the ArrayList. Is this plausible?
If so, how do you get around this? Use a synchronized collection like Vector?
There is no guaranteed behavior for what happens when add is called concurrently by two threads on ArrayList. However, it has been my experience that both objects have been added fine. Most of the thread safety issues related to lists deal with iteration while adding/removing. Despite this, I strongly recommend against using vanilla ArrayList with multiple threads and concurrent access.
Vector used to be the standard for concurrent lists, but now the standard is to use the Collections synchronized list.
Also I highly recommend Java Concurrency in Practice by Goetz et al if you're going to be spending any time working with threads in Java. The book covers this issue in much better detail.
Any number of things could happen. You could get both objects added correctly. You could get only one of the objects added. You could get an ArrayIndexOutOfBoundsException because the size of the underlying array was not adjusted properly. Or other things may happen. Suffice it to say that you cannot rely on any behavior occurring.
As alternatives, you could use Vector, you could use Collections.synchronizedList, you could use CopyOnWriteArrayList, or you could use a separate lock. It all depends on what else you are doing and what kind of control you have over access to the collection.
You could also get a null, an ArrayIndexOutOfBoundsException, or something left up to the implementation. HashMaps have been observed to go into an infinite loop in production systems. You don't really need to know what might go wrong; just don't do it.
You could use Vector, but its interface tends to turn out not to be rich enough. You will probably find that you want a different data structure in most cases.
I came up with the following code to mimic somewhat a real world scenario.
100 tasks are run in parallel and they update their completed status to the main program. I use a CountDownLatch to wait for task completion.
import java.util.concurrent.*;
import java.util.*;

public class Runner {
    // Should be replaced with Collections.synchronizedList(new ArrayList<Integer>())
    public List<Integer> completed = new ArrayList<Integer>();

    /**
     * @param args
     */
    public static void main(String[] args) {
        Runner r = new Runner();
        ExecutorService exe = Executors.newFixedThreadPool(30);
        int tasks = 100;
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            exe.submit(r.new Task(i, latch));
        }
        try {
            latch.await();
            System.out.println("Summary:");
            System.out.println("Number of tasks completed: "
                    + r.completed.size());
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        exe.shutdown();
    }

    class Task implements Runnable {
        private int id;
        private CountDownLatch latch;

        public Task(int id, CountDownLatch latch) {
            this.id = id;
            this.latch = latch;
        }

        public void run() {
            Random r = new Random();
            try {
                Thread.sleep(r.nextInt(5000)); // Actual work of the task
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            completed.add(id);
            latch.countDown();
        }
    }
}
When I ran the application 10 times, at least 3 or 4 runs did not print the correct number of completed tasks. Ideally it should print 100 (if no exceptions happen), but in some cases it printed 98, 99, etc.
This shows that concurrent updates of an ArrayList will not give correct results.
If I replace the ArrayList with a synchronized version, the program outputs the correct results.
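A condensed version of the same experiment with the synchronized wrapper in place might look like the following. It is a sketch only, assuming Java 8 or later; SynchronizedListSketch and recordCompletions are illustrative names, and the sleep that simulated work is left out so the run is quick.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SynchronizedListSketch {

    /** Records `tasks` ids from a pool of threads and returns how many ended up in the list. */
    static int recordCompletions(int tasks, int threads) {
        // Every method of the wrapper synchronizes on the wrapper object.
        List<Integer> completed = Collections.synchronizedList(new ArrayList<Integer>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch latch = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                pool.execute(() -> {
                    completed.add(id);
                    latch.countDown();
                });
            }
            latch.await(); // wait until every task has recorded its id
            return completed.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Number of tasks completed: " + recordCompletions(100, 30));
    }
}
```

One caveat with the wrapper: individual calls are synchronized, but iterating over the list still requires holding the list's lock manually for the whole loop.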
You can use List l = Collections.synchronizedList(new ArrayList()); if you want a thread-safe version of ArrayList.
The behavior is probably undefined since ArrayList isn't thread-safe. If you modify the list while an Iterator is iterating over it, you will typically get a ConcurrentModificationException (the check is best-effort). You can wrap the ArrayList with Collections.synchronizedList, use a thread-safe collection (there are many), or just put the add calls in a synchronized block.
Instead of new ArrayList() you could use:
Collections.synchronizedList(new ArrayList());
or
new Vector();
I prefer synchronizedList because:
it was 50-100% faster in my experience
it can wrap an already existing ArrayList
In my recent experience, adding new elements to an ArrayList from different threads will miss a few of them; using Collections.synchronizedList(new ArrayList()) avoids that issue.
List<String> anotherCollection = new ArrayList<>();
List<String> list = new ArrayList<>();
// if 'anotherCollection' is big enough, this will miss some elements
anotherCollection.parallelStream().forEach(el -> list.add("element" + el));
List<String> listSync = Collections.synchronizedList(new ArrayList<>());
// regardless of how big 'anotherCollection' is, this adds all the elements
anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
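With parallel streams there is also a way to avoid the shared list altogether: let the stream build the result with a collector instead of adding from inside forEach. The sketch below is an alternative to the snippet above rather than the poster's approach; CollectInsteadOfAddSketch and label are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CollectInsteadOfAddSketch {

    /** Builds the labelled list without any shared mutable state. */
    static List<String> label(List<String> source) {
        return source.parallelStream()
                .map(el -> "element" + el)      // pure function, safe to run in parallel
                .collect(Collectors.toList());  // the stream merges per-thread results itself
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(String.valueOf(i));
        }
        System.out.println(label(input));
    }
}
```

Because no lambda mutates a shared collection, no synchronization is needed, and for an ordered source such as an ArrayList the result keeps the source order.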
java.util.concurrent has a thread-safe array list, CopyOnWriteArrayList. The standard ArrayList is not thread-safe, and the behavior when multiple threads update it at the same time is undefined. There can also be odd behavior with multiple readers when one or more threads are writing at the same time.
http://java.sun.com/j2se/1.4.2/docs/api/java/util/ArrayList.html
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Since there is no synchronization internally, you cannot count on the specific outcome you theorize; it is only one of several ways this can go wrong.
So, things get out of sync, with unpleasant and unpredictable results.

Defining Multiple Threads in Java

I am writing a program that reads words from a file and sorts them in alphabetical order. You provide the input and output files in the command line, and the program reads the words from the input file and writes a sorted list back to the output file. This is done, and it works as it should do. No questions here.
I am not looking for specific code, but rather help on how to approach a problem. The next part of the assignment states that in the command line, you are to be able to set the number of Threads you want the program to use in the sorting process.
For instance, if you run the program with the following:
java Sort 12 infile.txt outfile.txt
The above program is meant to use 12 Threads to sort the words from "infile.txt". Each Thread is to sort a number of N = (numberOfWords)/(numberOfThreads) words. All the words are read into memory, before the Threads are started. I'm aware that this might sound cryptic, but I have been googling around looking for a good explanation on "multithreading"/defining the number of Threads in a Java program, yet I am not any wiser.
If anyone knows how to explain how you can set the number of Threads in Java, even with a small example, I would be very grateful!
Thanks!
You could use the Executors.newFixedThreadPool(int nThreads) method to get a thread pool with the required number of threads. Then divide your work into the appropriate number of chunks (12 in your example), create a Runnable object for each chunk of work, and pass those Runnable objects to the pool's submit method.
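One way the chunking could look in practice is sketched below, assuming Java 8 or later. ChunkedSortSketch, sortWithThreads and merge are illustrative names; the chunk size is rounded up so that no more than numberOfThreads chunks are created, and the final merge on the calling thread is one possible way to combine the sorted chunks, not something the assignment prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedSortSketch {

    /** Sorts each chunk on its own pool thread, then merges the sorted chunks. */
    static String[] sortWithThreads(String[] words, int numberOfThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            int n = (words.length + numberOfThreads - 1) / numberOfThreads; // words per thread, rounded up
            List<Future<String[]>> sortedChunks = new ArrayList<>();
            for (int start = 0; start < words.length; start += n) {
                String[] chunk = Arrays.copyOfRange(words, start, Math.min(start + n, words.length));
                sortedChunks.add(pool.submit(() -> {
                    Arrays.sort(chunk);
                    return chunk;
                }));
            }
            String[] result = new String[0];
            for (Future<String[]> sorted : sortedChunks) {
                result = merge(result, sorted.get()); // get() waits for that chunk
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Standard two-way merge of two sorted arrays. */
    static String[] merge(String[] a, String[] b) {
        String[] out = new String[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i].compareTo(b[j]) <= 0 ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

The thread count would come from Integer.parseInt(args[0]) in the command-line program, with the words read from the input file before the pool is created.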
Oh sure. Well, a thread is just a class with a "run" method.
You create the class and either have it extend Thread or implement Runnable. If you extend Thread, you can just call start() on an instance and that starts the thread. If you implement Runnable instead, you have to do something like Thread t = new Thread(yourRunnableInstance); and then call t.start().
So for your example:
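public class Sort {
    // static, so it can be instantiated from main without an enclosing Sort instance
    static class RunnableClass implements Runnable {
        private final String[] words;

        RunnableClass(String[] words) {
            this.words = words;
        }

        @Override
        public void run() {
            // Do your sorting
        }
    }

    public static void main(String[] args) {
        int numberOfThreads = Integer.parseInt(args[0]);
        // some code that reads the words and chops them into one array per thread;
        // the empty arrays below are only a placeholder for that
        String[][] chunks = new String[numberOfThreads][0];
        for (int x = 0; x < numberOfThreads; x++) {
            Thread t = new Thread(new RunnableClass(chunks[x]));
            t.start();
        }
        // something to manage the threads and coordinate their work
    }
}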
You could make this more elaborate or complex. One simple implementation would be to loop over the words, passing two to each thread to sort, and then, once the threads complete, shift along the list and repeat until no pair changes order. That's a form of bubble sort. In other words, Thread A sorts words 1 and 2, Thread B sorts words 3 and 4, and so on.
The threads can communicate with each other, share state or have their own state, etc. There are many ways to implement this.
The threads could terminate, or be re-entrant, could have state, etc.
The Executors class has a static newFixedThreadPool(int numberOfThreads) method that takes the number of threads to pool. For example, if you have a class implementing Runnable
public class MyCustomThread implements Runnable {
    @Override
    public void run() {
        // do your work
    }
}
you can create a pool with 5 threads like this
..
int numberOfThreads = 5;
ExecutorService srv = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
    srv.execute(new MyCustomThread());
}
Using an ExecutorService makes it much easier to manage the lifecycle of the threads. Read the Oracle concurrency tutorial for more information.
Which version of Java are you using? This task is not trivial, because you have to take care of a couple of things such as joining threads. Java 7 added the Fork/Join framework, which you can use for this task.
You can refer to the following for an example.
Sorting using Fork/Join
You can start from this
What you're looking for is the Fork/Join framework. It splits a single task into parts and hands the parts to multiple threads to be processed.
A fixed thread pool (Executors.newFixedThreadPool) lets you create 12 worker threads, but leaves you with all the hard work of dividing the work between them. The Fork/Join framework makes this easier, recursively breaking the work down as needed so that it can be split between threads.
http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
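To make the recursive splitting concrete, here is a minimal merge-sort sketch on top of ForkJoinPool and RecursiveAction (both part of java.util.concurrent since Java 7). ForkJoinSortSketch, SortTask and the THRESHOLD value are assumptions chosen for illustration; a real program would tune the threshold.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class ForkJoinSortSketch {

    // Assumed cut-off below which a range is sorted directly instead of being split further.
    static final int THRESHOLD = 1000;

    static final class SortTask extends RecursiveAction {
        private final String[] words;
        private final int from; // inclusive
        private final int to;   // exclusive

        SortTask(String[] words, int from, int to) {
            this.words = words;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                Arrays.sort(words, from, to);
                return;
            }
            int mid = (from + to) >>> 1;
            // Sort both halves, possibly on different worker threads, and wait for both.
            invokeAll(new SortTask(words, from, mid), new SortTask(words, mid, to));
            merge(mid);
        }

        /** Merges the sorted halves [from, mid) and [mid, to) in place, using a copy of the left half. */
        private void merge(int mid) {
            String[] left = Arrays.copyOfRange(words, from, mid);
            int i = 0, j = mid, k = from;
            while (i < left.length && j < to) {
                words[k++] = left[i].compareTo(words[j]) <= 0 ? left[i++] : words[j++];
            }
            while (i < left.length) {
                words[k++] = left[i++];
            }
            // Any remaining right-half elements are already in their final positions.
        }
    }

    /** Sorts the array in place using a pool with the requested parallelism. */
    static void sort(String[] words, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new SortTask(words, 0, words.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

The parallelism argument plays the role of the thread count from the command line, for example ForkJoinSortSketch.sort(words, 12).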
Define a Runnable, then in a loop add new threads with that Runnable to a list. Then start all the threads, either in the same loop or a separate one, passing the words each one needs to process to its Runnable on construction.
You will also have to control access to the output file, and possibly the input file depending on how you access it; otherwise your threads will run into trouble. Take a look at race conditions and how to deal with them.

Does Java have an indexable multi-queue thread pool?

Is there a Java class such that:
1. Executable tasks can be added via an id, where all tasks with the same id are guaranteed to never run concurrently
2. The number of threads can be limited to a fixed amount
A naive solution of a Map would easily solve (1), but it would be difficult to manage (2). Similarly, all thread pooling classes that I know of will pull from a single queue, meaning (1) is not guaranteed.
Solutions involving external libraries are welcome.
For each id, you need a SerialExecutor, described in the documentation of java.util.concurrent.Executor. All serial executors delegate work to a ThreadPoolExecutor with given corePoolSize.
An optimized version of SerialExecutor can be found in my code samples.
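A minimal sketch of that idea follows, assuming Java 8 or later. The inner SerialExecutor follows the pattern shown in the java.util.concurrent.Executor documentation; the outer class, its name (KeyedSerialExecutorSketch) and the id-to-executor map are illustrative assumptions, not the poster's optimized version.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

final class KeyedSerialExecutorSketch {

    /** Runs its tasks one at a time, in submission order, on a shared executor. */
    static final class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor executor;
        private Runnable active;

        SerialExecutor(Executor executor) {
            this.executor = executor;
        }

        @Override
        public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext(); // hand over the next queued task only after this one ends
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) {
                executor.execute(active);
            }
        }
    }

    private final Map<String, SerialExecutor> byId = new ConcurrentHashMap<>();
    private final Executor pool;

    /** The pool bounds the total number of threads, e.g. Executors.newFixedThreadPool(n). */
    KeyedSerialExecutorSketch(Executor pool) {
        this.pool = pool;
    }

    /** Tasks with the same id never overlap; tasks with different ids may run in parallel. */
    void execute(String id, Runnable task) {
        byId.computeIfAbsent(id, k -> new SerialExecutor(pool)).execute(task);
    }
}
```

In this sketch the map keeps one small SerialExecutor per id forever; if the set of ids is unbounded, idle entries would need to be evicted.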
If you don't find something that does this out of the box, it shouldn't be hard to roll your own. One thing you could do is to wrap each task in a simple class that reads on a queue unique per id, e.g.:
// needs java.util.concurrent.BlockingQueue and java.util.concurrent.Callable
public static class SerialCaller<T> implements Callable<T> {
    // queue shared by all tasks with the same id
    private final BlockingQueue<Callable<T>> delegates;

    public SerialCaller(BlockingQueue<Callable<T>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public T call() throws Exception {
        // take the next task queued for this id and run it
        return delegates.take().call();
    }
}
It should be easy to maintain a map of ids to queues for submitting tasks. That satisfies condition (1), and then you can look for simple solutions to condition (2), such as Executors.newFixedThreadPool.
I think that the simplest solution is to just have a separate queue for each index and a separate executor (with one thread) for each queue.
The only thing you could achieve with a more complex solution would be to use fewer threads, but if the number of indexes is small and bounded that's probably not worth the effort.
Yes, there is such a library now: https://github.com/jano7/executor
int maxTasks = 10;
ExecutorService underlyingExecutor = Executors.newFixedThreadPool(maxTasks);
KeySequentialBoundedExecutor executor = new KeySequentialBoundedExecutor(maxTasks, underlyingExecutor);
Runnable task = new Runnable() {
    @Override
    public void run() {
        // do something
    }
};
executor.execute(new KeyRunnable<>("ID-1", task)); // execute the task by the underlying executor
executor.execute(new KeyRunnable<>("ID-2", task)); // execution is not blocked by the task for ID-1
executor.execute(new KeyRunnable<>("ID-1", task)); // execution starts when the previous task for ID-1 completes

How to multithread a computationally intensive code segment in Java?

I have a Java program, and a section of it is compute intensive, like this:
for i = 1 :512
COMPUTE INTENSIVE SECTION
end
I want to split it across multiple threads to make it run faster.
The COMPUTE INTENSIVE SECTION is not order-dependent: running i=1 first or i=5 first gives the same result.
Can anybody give me a general guide to this? How do I do it?
Thanks indeed!
Happy Thanksgiving!
You should read the Concurrency Trail of the Java Tutorial. Especially Executors and Thread Pools should be relevant for you.
Basically, you create a thread pool (which is an Executor) through one of the factory methods in the Executors class and submit Runnable instances to it:
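ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (int i = 0; i < 512; i++) {
    executor.execute(new Runnable() {
        public void run() {
            // your heavy code goes here
        }
    });
}
executor.shutdown(); // no new tasks; the submitted ones still run to completion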
Sounds like a thread pool would be good. Basically, you create a pool of N threads and hand it your tasks in a loop. Tasks wait in the pool's queue until a thread is free to run them.
ExecutorService pool = Executors.newFixedThreadPool(10); // 10 threads in the pool
List<Callable<Foo>> collectionOfCallables = new ArrayList<Callable<Foo>>();
for (...) {
    Callable<Foo> callable = new Callable<Foo>() {
        public Foo call() {
            // COMPUTE INTENSIVE SECTION; return its result
        }
    };
    collectionOfCallables.add(callable);
}
List<Future<Foo>> results = pool.invokeAll(collectionOfCallables); // blocks until every task has completed
pool.shutdown();
pool.awaitTermination(5, TimeUnit.MINUTES); // blocks until the pool has terminated or 5 minutes have passed
With the Futures you really don't need to await termination: invokeAll returns only once all tasks are done, and get()ing the result from a future blocks until the corresponding task has finished (or been cancelled).
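A complete, runnable variant of the invokeAll approach for the 512-iteration loop might look like this. It is a sketch assuming Java 8 or later; InvokeAllSketch, runAll and heavy are illustrative names, and heavy(i) is only a stand-in for the real compute-intensive section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class InvokeAllSketch {

    /** Placeholder for the compute-intensive section of iteration i. */
    static long heavy(int i) {
        return (long) i * i;
    }

    /** Runs iterations 1..iterations on a fixed pool and combines the results. */
    static long runAll(int iterations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 1; i <= iterations; i++) {
                final int index = i; // lambdas can only capture effectively final variables
                jobs.add(() -> heavy(index));
            }
            long total = 0;
            // invokeAll blocks until all jobs are done; the futures come back in submission order.
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(512, Runtime.getRuntime().availableProcessors()));
    }
}
```

Sizing the pool to the number of available processors is a common starting point for CPU-bound work, since extra threads would mostly compete for the same cores.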
Look at any Java multi-threading tutorial, either the official one:
http://download.oracle.com/javase/tutorial/essential/concurrency/index.html
or some of the others, e.g.:
Very nice in my opinion - http://www.ibm.com/developerworks/java/tutorials/j-threads/section2.html
Short one - http://www.tutorialspoint.com/java/java_multithreading.htm
A bit succinct, and touches on a bit more than the basics - http://www.vogella.de/articles/JavaConcurrency/article.html
Similar to Sean Patrick Floyd's answer, but a bit less verbose with a lambda expression:
ExecutorService es = Executors.newCachedThreadPool();
for (int i = 0; i < 512; i++) {
    es.execute(() -> {
        // code goes here
    });
}
If you can split your intensive work into recursive smaller subtasks, ForkJoinPool is ideal for you.
If your server has an 8-core CPU, you can set the pool size to 8:
ForkJoinPool forkJoinPool = new ForkJoinPool(8);
OR
you can use a fixed thread pool from Executors by moving the compute-intensive task into a Runnable or Callable, as below:
ExecutorService executorService = Executors.newFixedThreadPool(8);
Future<?> future = executorService.submit(new Runnable() {
    public void run() {
        System.out.println("Your compute intensive task");
    }
});
future.get(); // returns null if the task has finished correctly
There is one advantage with ForkJoinPool: idle threads steal tasks from the work queues of busy threads.
Java 8 added one more API to Executors: newWorkStealingPool.
If you need to wait for completion of all tasks, you can use invokeAll() on ExecutorService.
Have a look at this article by Benjamin for advanced concurrent APIs using Java 8.
