Java concurrency: shared data with Runnable vs. Callable and local data


First Case: Let's say you have a lot of tasks that each return a result of some kind (let's just call it Result for now), and all of these have to be stored in an ArrayList. There are two options:
1) Create one ArrayList in the main method and use Runnables with access to the shared list and a synchronized add method.
2) Create one ArrayList in the main method, use Callables to perform the task and return the result, and let the main method add each Result to its list.
Are there any performance differences between the two, seeing as the Runnables need synchronized access but the Callables do not?
Second Case: Let's now say each task generates a small ArrayList, say fewer than 10 items per task. This again gives two options:
1) One ArrayList in main, and Runnables with access to the shared list that add result items as they are generated.
2) One ArrayList in main, and Callables that each have their own local ArrayList, which stores the Results until the task is finished; main then uses addAll to add the results found.
Same question as before: what are the performance differences?
For the sake of clarity, I mean performance both in terms of speed (synchronization overhead etc.) and in terms of memory (do the Callables use a lot more memory due to the small local ArrayLists, or is this negligible?).

For the First Case:
Option One: If we use Runnable tasks, we cannot get anything returned from the run() method, so I don't think this option suits your requirement.
Option Two: Callable
As I understand your requirement, Callable is a good candidate.
There is one small change, though: we create a list of Futures, and for every Callable task we submit to the executor, we add the Future it returns to this list (see the code below for details). Whenever we need the result of a task, we get it from the corresponding Future.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MainTaskExecutor {
    private static final ExecutorService exe = Executors.newCachedThreadPool();
    private static final List<Future<Result>> futureResults = new ArrayList<>();

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Callable<Result> dummyTask = () -> {
            System.out.println("Task is executed");
            return new Result();
        };
        // Submit a task
        submitTask(dummyTask);
        // Get the result of the task at index 0 (blocks until it has finished)
        System.out.println(getResult(0));
        // Let the pool's threads exit so the JVM can terminate promptly
        exe.shutdown();
    }

    private static void submitTask(Callable<Result> task) {
        futureResults.add(exe.submit(task));
    }

    private static Result getResult(int taskNumber) throws ExecutionException, InterruptedException {
        return futureResults.get(taskNumber).get();
    }
}

class Result {
    // data to be added
}
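To make the two options from the question concrete, here is a minimal sketch of both, side by side. The class and method names (SharedVersusLocal, withSharedList, withLocalLists), the pool size of 4 and the Integer payload are all assumptions for illustration; the real Result type and workload will differ, and only a measurement on your own tasks can say which variant is faster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class SharedVersusLocal {

    // Option 1: Runnables append to one shared, synchronized list.
    static List<Integer> withSharedList(int tasks, int itemsPerTask) throws InterruptedException {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < tasks; t++) {
            final int base = t * itemsPerTask;
            pool.execute(() -> {
                for (int i = 0; i < itemsPerTask; i++) {
                    shared.add(base + i); // every add takes the list's lock
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return shared;
    }

    // Option 2: Callables fill a private list; only the caller touches the merged list.
    static List<Integer> withLocalLists(int tasks, int itemsPerTask) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<List<Integer>>> work = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int base = t * itemsPerTask;
                work.add(() -> {
                    List<Integer> local = new ArrayList<>(itemsPerTask);
                    for (int i = 0; i < itemsPerTask; i++) {
                        local.add(base + i);
                    }
                    return local;
                });
            }
            List<Integer> merged = new ArrayList<>(tasks * itemsPerTask);
            for (Future<List<Integer>> f : pool.invokeAll(work)) {
                merged.addAll(f.get()); // single-threaded merge, in submission order
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the local lists, the merged result also comes out in submission order, which the shared list does not guarantee. The extra memory is one short-lived small list per task.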

Related

JavaFX Task Callable

I was developing a JavaFX app and I was passing JavaFX Tasks to an ExecutorService's submit method. I was also trying to get the return value of the Task from the Future object returned by submit. Then I discovered that ExecutorService only returns a value when you submit a Callable object, and JavaFX Tasks are Runnables despite having a call method. So is there any workaround for this problem?
I tried and solved my problem the following way, but I'm open to suggestions that don't require me to write my own class.
My main method:
public static void main(String[] args) throws InterruptedException, ExecutionException {
ExecutorService executorService = Executors.newSingleThreadExecutor();
Semaphore semaphore = new Semaphore(1);
List<Integer> list = IntStream.range(0,100).boxed().collect(Collectors.toList());
Iterator<Integer> iterator = list.iterator();
while (iterator.hasNext()){
List<Integer> sendingList = new ArrayList<>();
for (int i = 0; i < 10; i++) {
sendingList.add(iterator.next());
}
System.out.println("SUBMITTING");
Future<Integer> future = executorService.submit((Callable<Integer>) new TestCallable(sendingList,semaphore));
System.out.println(future.get());
semaphore.acquire();
}
executorService.shutdown();
System.out.println("COMPLETED");
}
My TestCallable class:
class TestCallable extends Task<Integer> implements Callable<Integer> {
private Random random = new Random();
private List<Integer> list;
private Semaphore semaphore;
TestCallable(List<Integer> list, Semaphore semaphore) {
this.list = list;
this.semaphore = semaphore;
}
@Override
public Integer call(){
System.out.println("SENDING");
System.out.println(list);
try {
Thread.sleep(1000+random.nextInt(500));
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("RECEIVED");
semaphore.release();
return list.size();
}
}

Task extends java.util.concurrent.FutureTask which in turn implements the Future interface. This means you can use a Task just like a Future.
Executor executor = ...;
Task<?> task = ...;
executor.execute(task);
task.get(); // Future method
This will cause the thread calling get() to wait until completion. However, a Task's purpose is to communicate the progress of a background process to the JavaFX Application Thread. Its close relationship to the GUI means you will most likely be launching a Task from the FX thread. This will lead to get() being called on the FX thread, which is not what you want, as it will freeze the GUI until get() returns; you might as well have just called Task.run directly.
Instead, you should be using the asynchronous functionality provided by Task. If you want to retrieve the value when the Task completes successfully, you can use the onSucceeded property or listen to the value/state property. There are also ways to listen for failure/cancellation.
Executor executor = ...;
Task<?> task = ...;
task.setOnSucceeded(event -> handleResult(task.getValue()));
task.setOnFailed(event -> handleException(task.getException()));
executor.execute(task);
If you don't need the functionality provided by Task then it would probably be best to simply use Runnable or Callable directly.

It's not very clear what you want to do here.
Firstly, your Semaphore does nothing because you used Executors.newSingleThreadExecutor(), which already guarantees that only one task can run at any point in time.
Secondly, as @Slaw mentioned, you are potentially blocking the JavaFX Application Thread, depending on your actual implementation (your example isn't really a JavaFX application).
Next, ExecutorService has two main overloads of submit().
The first overload takes a Callable. It lets you retrieve the value returned by the Callable (by calling get() on the returned Future), because a Callable is something that can be called and can return a value.
The second overload takes a Runnable. Since Task implements the RunnableFuture interface, and RunnableFuture extends the Runnable interface, passing in a Task selects this overload. This overload does not expect a result to be returned, because a Runnable is something that you run without a result. Calling get() on the Future returned by this overload will block until the task finishes, and null will be returned. If you need to retrieve the value returned by the Task, you need to call get() on the Task, not on the Future returned by ExecutorService.submit().
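The difference between the two results can be seen without JavaFX. The sketch below uses a plain java.util.concurrent.FutureTask as a stand-in for a JavaFX Task (an assumption made only so the example runs without the JavaFX libraries); the class name SubmitOverloads and the value 42 are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class SubmitOverloads {

    // Returns { value from the Future returned by submit(), value from the task itself }.
    static Object[] run() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // FutureTask stands in for a JavaFX Task: it is a Runnable and a Future.
            FutureTask<Integer> task = new FutureTask<Integer>(() -> 42);
            Future<?> fromSubmit = pool.submit(task); // resolves to submit(Runnable)
            Object viaSubmit = fromSubmit.get();      // blocks until done, then yields null
            Integer viaTask = task.get();             // the value computed by the task
            return new Object[] { viaSubmit, viaTask };
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real JavaFX application the same blocking get() calls should stay off the FX thread, as described in the other answer.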
Edit based on OP's comments
Firstly, since the calling method is already running in a background thread and all tasks are expected to run sequentially (rather than in parallel), you should just run them directly without the additional ExecutorService and Task, unless there is another reason why this has to be done.
Secondly, a List only holds references to its elements. What could really have affected performance is copying those references into a new list. If the indices are known, you could have used List.subList() instead: the returned list is a view backed by the original list, so there is no additional O(n) copy.
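As a rough sketch of the subList() idea, the helper below splits a list into batches that are views rather than copies. BatchViews and batches are hypothetical names; note that such views are only valid as long as the original list is not structurally modified (no adds or removes).

```java
import java.util.ArrayList;
import java.util.List;

class BatchViews {

    // Splits source into consecutive views of at most batchSize elements each.
    // No element references are copied; every batch is backed by source.
    static List<List<Integer>> batches(List<Integer> source, int batchSize) {
        List<List<Integer>> result = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            result.add(source.subList(from, to));
        }
        return result;
    }
}
```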

Java: Getting ExecutorService to produce repeatable behavior?

I have been trying to parallelize a portion of a method within my code (as shown in the Example class's function_to_parallelize(...) method). I have examined the executor framework and found that Futures & Callables can be used to create several worker threads that will ultimately return values. However, the online examples often shown with the executor framework are very simple and none of them appear to suffer my particular case of requiring methods in the class that contains that bit of code I'm trying to parallelize. As per one Stackoverflow thread, I've managed to write an external class that implements Callable called Solver that implements that method call() and set up the executor framework as shown in the method function_to_parallelize(...). Some of the computation that would occur in each worker thread requires methods *subroutine_A(...)* that operate on the data members of the Example class (and further, some of these subroutines make use of random numbers for various sampling functions).
My issue is while my program executes and produces results (sometimes accurate, sometimes not), every time I run it the results of the combined computation of the various worker threads is different. I figured it must be a shared memory problem, so I input into the Solver constructor copies of every data member of the Example class, including the utility that contained the Random rng. Further, I copied the subroutines that I require even directly into the Solver class (even though it's able to call those methods from Example without this). Why would I be getting different values each time? Is there something I need to implement, such as locking mechanisms or synchronization?
Alternatively, is there a simpler way to inject some parallelization into that method? Rewriting the "Example" class or drastically changing my class structuring is not an option as I need it in its current form for a variety of other aspects of my software/system.
Below is my code vignette (well, it's an incredibly abstracted/reduced form so as to show you basic structure and the target area, even if it's a bit longer than usual vignettes):
public class Tools{
Random rng;
public Tools(Random rng){
this.rng = rng;
}...
}
public class Solver implements Callable<Tuple>{
public Tools toolkit;
public Item W;
public Item v;
Item input;
double param;
public Solver(Item input, double param, Item W, Item v, Tools toolkit){
this.input = input;
this.param = param;
//...so on & so forth for rest of arguments
}
public Tuple call() throws Exception {
//does computation that utilizes the data members W, v
//and calls some methods housed in the "toolkit" object
}
public Item subroutine_A(Item in){....}
public Item subroutine_B(Item in){....}
}
public class Example{
private static final int NTHREDS = 4;
public Tools toolkit;
public Item W;
public Item v;
public Example(...,Tools toolkit...){
this.toolkit = toolkit; ...
}
public Item subroutine_A(Item in){
// some of its internal computation involves sampling & random # generation using
// a call to toolkit, which houses functions that use the initialize Random rng
...
}
public Item subroutine_B(Item in){....}
public void function_to_parallelize(Item input, double param,...){
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Tuple>> list = new ArrayList<Future<Tuple>>();
while(some_stopping_condition){
// extract subset of input and feed into Solver constructor below
Callable<Tuple> worker = new Solver(input, param, W, v, toolkit);
Future<Tuple> submit = executor.submit(worker);
list.add(submit);
}
for(Future<Tuple> future : list){
try {
Tuple out = future.get();
// update W via some operation using "out" (like multiplying matrices for example)
}catch(InterruptedException e) {
e.printStackTrace();
}catch(ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown(); // properly terminate the threadpool
}
}
ADDENDUM: flob's answer below did address a problem with my vignette (you should make sure your code waits for all threads to finish, e.g. with .await()), but the issue did not go away after I made that correction. It turns out that the problem lies in how Random works with threads. In essence, the OS scheduler runs the threads in a different order on every run of the program, so the order in which they draw random numbers is not repeated and the result is not purely deterministic. I examined the thread-safe version of Random (and used it to gain a bit more efficiency), but alas it does not allow you to set the seed. Even so, I highly recommend it as the RNG for anyone looking to incorporate random computations within their thread workers.
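One way to get repeatable results despite the scheduling order (not what the addendum ended up using, just a sketch under assumptions) is to give every task its own Random with a seed derived from the task index, and to combine the results in submission order. RepeatableSampling, the baseSeed + t seeding and the summing workload are all made up for illustration; adjacent seeds are a simplification, and a seed-mixing scheme may be preferable for real sampling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RepeatableSampling {

    static double run(long baseSeed, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Double>> work = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long seed = baseSeed + t; // depends on the task, not on the thread
                work.add(() -> {
                    Random rng = new Random(seed); // private to this task, never shared
                    double sum = 0;
                    for (int i = 0; i < 1_000; i++) {
                        sum += rng.nextDouble();
                    }
                    return sum;
                });
            }
            double total = 0;
            // invokeAll returns the futures in submission order, so the
            // combination step does not depend on which task finished first.
            for (Future<Double> f : pool.invokeAll(work)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because neither the random streams nor the order of combination depend on thread scheduling, two runs with the same baseSeed give the same total.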

The problem I see is that you don't wait for all the tasks to finish before updating W, so some of the Callable instances will see the updated W instead of the one you were expecting.
At this point W is updated even if not all tasks have finished:
// update W via some operation using "out" (like multiplying matrices for example)
The tasks that have not finished yet will use the W updated above instead of the one you expect.
A quick solution (if you know how many Solver tasks you'll have) would be to use a CountDownLatch in order to see when all the tasks have finished:
public void function_to_parallelize(Item input, double param,...){
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Tuple>> list = new ArrayList<Future<Tuple>>();
CountDownLatch latch = new CountDownLatch(<number_of_tasks_created_in_next_loop>);
while(some_stopping_condition){
// extract subset of input and feed into Solver constructor below
Callable<Tuple> worker = new Solver(input, param, W, v, toolkit,latch);
Future<Tuple> submit = executor.submit(worker);
list.add(submit);
}
latch.await();
for(Future<Tuple> future : list){
try {
Tuple out = future.get();
// update W via some operation using "out" (like multiplying matrices for example)
}catch(InterruptedException e) {
e.printStackTrace();
}catch(ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown(); // properly terminate the threadpool
}
then in the Solver class you have to decrement the latch when call method ends:
public Tuple call() throws Exception {
//does computation that utilizes the data members W, v
//and calls some methods housed in the "toolkit" object
latch.countDown();
}
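A self-contained sketch of the latch pattern follows. LatchDemo, the pool size and the counter are hypothetical; the one deliberate difference from the snippet above is that countDown() sits in a finally block, so a task that throws cannot leave the waiting thread blocked forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchDemo {

    // Runs the given number of tasks and returns how many completed.
    static int run(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch latch = new CountDownLatch(tasks);
        AtomicInteger completed = new AtomicInteger();
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                try {
                    completed.incrementAndGet(); // placeholder for the real work
                } finally {
                    latch.countDown(); // always signal, even if the work throws
                }
            });
        }
        boolean allDone = latch.await(1, TimeUnit.MINUTES); // blocks until the count hits zero
        pool.shutdown();
        if (!allDone) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }
}
```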

How to use List Class in Java When multithread is needed?

I'm using Spring MVC, and I've got a class AService which acts as a buffer storing a list of Strings. Once the size of the list hits 1000, all of the queries are written to the database.
@Service
class AService {
List<String> list;
public void addAndInsert(String query) {
list.add(query);
if(list.size() >= 1000) {
writeIntoDatabase(list);
list.clear();
}
}
}
This works fine when there's only one thread. But queries can come from different users (that is, from multiple threads), so how can I guarantee that the following works properly:
When the list hits 1000 queries, I'd like to use another thread to do the write to the database, because this procedure could take long, and I don't want the user to wait for something unrelated to their query.
No query may be lost or duplicated.
Could anyone tell me how to deal with this scenario, and which implementation of List I should use? Thanks!

There are two parts to my answer:
Synchronization of adding query items to the list
Scheduling the insert of the query data into your database
Just for completeness, I will also highlight that you are open to losing queries if your JVM crashes. You state that queries cannot be lost, but at the moment everything is held in memory. I assume that you are OK with this.
Synchronizing addition to the list
Whilst a system can be inherently multi-threaded, Spring will only create a singleton of your @Service class, which means that all Threads access the same instance. Therefore we can quite easily synchronize access to member variables of that instance using basic Java functionality.
The JDK does provide some basic thread-safe List implementations out of the box. Take a look at Collections.synchronizedList() or CopyOnWriteArrayList, for example.
These implementations generally provide thread safety for a single operation on a list, e.g. add() or get(). They do not provide synchronization across multiple method calls. However, basic Java synchronization lets us achieve this:
public void addAndInsert(String query)
{
synchronized(list)
{
list.add(query);
if(list.size() >= 1000)
{
writeIntoDatabase(list);
list.clear();
}
}
}
This code uses the object monitor for your List instance to ensure that all operations on it are synchronized. One Thread's operations on the list must complete before the next's.
Scheduling insert of data into the database
You have said that you would like to use another Thread to insert data into the database. I would suggest that you get familiar with the ExecutorService interface in the java.util.concurrent package. This provides excellent implementations that provide managed pools of Threads to execute tasks. From what you have said, I would suggest that ThreadPoolExecutor is ideal for what you need. It is also imperative that you remember to pass a copy of the data within the list to the other Thread so that your List.clear() operation doesn't interfere with the insert into the database.
So this would leave us with final code looking similar to:
@Service
public class AService
{
private List<String> list;
private ExecutorService executorService;
public void addAndInsert(String query)
{
synchronized(list)
{
list.add(query);
if(list.size() >= 1000)
{
executorService.execute(writeIntoDataBase(new LinkedList<String>(list)));
list.clear();
}
}
}
private Runnable writeIntoDataBase(List<String> list)
{
//TODO - Create your Runnable to write data to the db.
}
}
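For reference, here is a runnable variant of the same idea without Spring. Instead of copying the list and clearing it, this sketch hands the full buffer to the writer and starts a fresh one, which avoids the copy; that swap is my variation, not the code above. QueryBuffer, the threshold parameter and the sink callback (standing in for the database write) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

class QueryBuffer {
    private final int threshold;
    private final ExecutorService writer;
    private final Consumer<List<String>> sink; // stands in for the database write
    private List<String> buffer = new ArrayList<>();

    QueryBuffer(int threshold, ExecutorService writer, Consumer<List<String>> sink) {
        this.threshold = threshold;
        this.writer = writer;
        this.sink = sink;
    }

    void add(String query) {
        List<String> full = null;
        synchronized (this) {
            buffer.add(query);
            if (buffer.size() >= threshold) {
                full = buffer;                       // hand the full buffer over as-is
                buffer = new ArrayList<>(threshold); // later adds go to a fresh list
            }
        }
        if (full != null) {
            final List<String> batch = full;
            // The slow write runs on the writer's thread, outside the lock.
            writer.execute(() -> sink.accept(batch));
        }
    }
}
```

Each query ends up in exactly one batch, because the add and the swap happen under the same lock. Queries still waiting in the buffer are lost if the JVM crashes, as noted above.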

An ArrayList will do fine, provided all its accesses are synchronized, and you create a copy before passing it to the inserting thread:
@Service
class AService {
private List<String> list = new ArrayList<>(1000);
public synchronized void addAndInsert(String query) {
list.add(query);
if (list.size() >= 1000) {
List<String> copy = new ArrayList<>(list);
writeIntoDatabase(copy);
list.clear();
}
}
}
But if it's critical that the query is not lost, you shouldn't use a buffer, because obviously, if the server crashes when the list contains 999 elements, you'll lose 999 queries.

You can use a concurrent collection such as BlockingQueue. Another thread can then take queries from the collection and update the database.

Building on Rob's answer, I assume you want to make sure that inserts into the DB are made for 1000 queries at once. So you can use a BlockingQueue (like ArrayBlockingQueue or LinkedBlockingQueue), which handles all synchronization for you. You also get methods like drainTo, which removes up to a specified number of elements from your blocking queue and adds them to another collection that you can use for writeIntoDataBase. For example:
BlockingQueue<String> list;
public void addAndInsert(String query) {
list.add(query);
if ( list.size() >= 1000) {
int size = 1000;
final ArrayList<String> toInsert = new ArrayList<String>( 1000);
list.drainTo( toInsert, size);
executorService.execute( new Runnable() {
public void run() {
writeIntoDataBase( toInsert);
}
});
}
}
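One detail to watch in the snippet above: the size() check and the drainTo() call are two separate steps, so two threads can both see 1000 waiting elements and the second one drains a short batch. A minimal sketch of one way around that is to serialize the check-and-drain step; QueueBatcher and pollBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueBatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int batchSize;

    QueueBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Producers need no extra locking; the queue itself is thread-safe.
    void add(String query) {
        queue.add(query);
    }

    // Returns a full batch, or an empty list if fewer than batchSize queries are waiting.
    // synchronized, so only one thread at a time performs the check-then-drain step.
    synchronized List<String> pollBatch() {
        if (queue.size() < batchSize) {
            return Collections.emptyList();
        }
        List<String> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize); // moves at most batchSize elements
        return batch;
    }
}
```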

Multi threading with Java Executor

I am stuck with this following problem.
Say, I have a request which has 1000 items, and I would like to utilize Java Executor to resolve this.
Here is the main method
public static void main(String[] args) {
    // Assume that I have a request object that contains an ArrayList of names
    // and vectorList is the container for each request result
    ExecutorService threadExecutor = Executors.newFixedThreadPool(3);
    Vector<Result> vectorList = new Vector<Result>();
    for (int i = 0; i < request.size(); i++) {
        threadExecutor.execute(new QueryTask(request.get(i).getNames(), vectorList));
    }
    threadExecutor.shutdown();
    response.setResult(vectorList);
}
And here is the QueryTask class
public class QueryTask implements Runnable {
    private String names;
    private Vector<Result> vectorList;

    public QueryTask(String names, Vector<Result> vectorList) {
        this.names = names;
        this.vectorList = vectorList;
    }

    public void run() {
        // do something with names, for example, query database
        Result result = process(names); // placeholder for the actual processing
        // add result to vectorList
        vectorList.add(result);
    }
}
So, based on the example above, I want to create a thread pool, run a task for each item in the request concurrently, and add each result to vectorList.
At the end of the process, I want all the results to be in the Vector.
I keep getting inconsistent results in the response.
For example, if I pass a request with 10 names, I get back only 3 or 4, or sometimes nothing.
I expected that if I pass 10, I would get 10 back.
Does anyone know what's causing the problem?
Any help will be appreciated.
Thanks

The easy solution is to add a call to ExecutorService.awaitTermination():
public static void main(String[] args) throws InterruptedException {
    // Assume that I have a request object that contains an ArrayList of names
    // and vectorList is the container for each request result
    ExecutorService threadExecutor = Executors.newFixedThreadPool(3);
    Vector<Result> vectorList = new Vector<Result>();
    for (int i = 0; i < request.size(); i++) {
        threadExecutor.execute(new QueryTask(request.get(i).getNames(), vectorList));
    }
    threadExecutor.shutdown();
    // Block until every submitted task has finished (or the timeout elapses)
    threadExecutor.awaitTermination(aReallyLongTime, TimeUnit.SECONDS);
    response.setResult(vectorList);
}

After calling threadExecutor.shutdown(), you also need to call threadExecutor.awaitTermination(). The former is a non-blocking call that merely initiates a shutdown, whereas the latter is a blocking call that actually waits for all tasks to finish. Since you are only using the former, you are probably returning before all tasks have finished, which is why you don't always get back all of your results. The Java API isn't too clear, so someone filed a bug about this.
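Put together as a self-contained sketch, the shutdown-then-await sequence looks like this. AwaitAll, the 30-second timeout and the toUpperCase() call (a stand-in for the database query) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitAll {

    static List<String> process(List<String> names) throws InterruptedException {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String name : names) {
            // toUpperCase() is only a placeholder for the real per-name work
            pool.execute(() -> results.add(name.toUpperCase()));
        }
        pool.shutdown(); // non-blocking: stop accepting tasks, let queued ones run
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) { // blocking: wait for them
            pool.shutdownNow();
            throw new IllegalStateException("tasks did not finish in time");
        }
        return results; // complete here, though not necessarily in input order
    }
}
```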

There are at least 2 issues here.
1. In your main, you shut down the ExecutorService, then try to get the results out right away. The executor service executes your jobs asynchronously, so there is a very good chance that not all of your jobs are done yet. When you call response.setResult(vectorList), vectorList is not fully populated.
2. You are concurrently accessing the same Vector object from within all of your runnables. Vector synchronizes each individual method, so the add() calls themselves are safe, but compound operations, such as iterating over the vector while tasks are still adding to it, can still throw ConcurrentModificationException or see partial data. For those you need to either synchronize on the vector manually or wait until all tasks have finished. In new code, a thread-safe container like Collections.synchronizedList(new ArrayList<>()) is the usual replacement for Vector.

Which ThreadPool in Java should I use?

There is a huge number of tasks.
Each task belongs to a single group. The requirement is that each group's tasks are executed serially, just as if executed in a single thread, while throughput is maximized in a multi-core (or multi-CPU) environment. Note: there is also a huge number of groups, proportional to the number of tasks.
The naive solution is to use a ThreadPoolExecutor and synchronize (or lock). However, threads would block each other and throughput would not be maximized.
Any better idea? Or is there a third-party library that satisfies the requirement?

A simple approach would be to "concatenate" all group tasks into one super task, thus making the sub-tasks run serially. But this will probably cause delay in other groups that will not start unless some other group completely finishes and makes some space in the thread pool.
As an alternative, consider chaining a group's tasks. The following code illustrates it:
public class MultiSerialExecutor {
    private final ExecutorService executor;

    public MultiSerialExecutor(int maxNumThreads) {
        executor = Executors.newFixedThreadPool(maxNumThreads);
    }

    public void addTaskSequence(List<Runnable> tasks) {
        executor.execute(new TaskChain(tasks));
    }

    public void shutdown() {
        executor.shutdown();
    }

    private class TaskChain implements Runnable {
        private final List<Runnable> seq;
        private int ind;

        public TaskChain(List<Runnable> seq) {
            this.seq = seq;
        }

        @Override
        public void run() {
            seq.get(ind++).run(); // NOTE: No special error handling
            if (ind < seq.size())
                executor.execute(this); // re-queue the rest of the chain behind other groups
        }
    }
}
The advantage is that no extra resource (thread/queue) is used, and that the granularity of tasks is finer than in the naive approach. The disadvantage is that all of a group's tasks must be known in advance.
--edit--
To make this solution generic and complete, you may want to decide on error handling (i.e. whether a chain continues even if an error occurs). It would also be a good idea to implement ExecutorService and delegate all calls to the underlying executor.

I would suggest using task queues:
For every group of tasks, create a queue and insert all tasks from that group into it.
Now all your queues can be executed in parallel, while the tasks inside one queue are executed serially.
A quick Google search suggests that the Java API has no task/thread queues of its own. However, there are many tutorials available on coding one. Everyone, feel free to list good tutorials or implementations if you know some.
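A per-group queue can be sketched on top of a shared pool roughly as follows. SerialExecutor follows the well-known pattern shown in the java.util.concurrent.Executor documentation; GroupedExecutor and the String group keys are hypothetical additions. With a huge number of groups, the map would also need some eviction of idle groups, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Runs its tasks one at a time, in order, on threads borrowed from a shared executor.
class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor executor;
    private Runnable active;

    SerialExecutor(Executor executor) {
        this.executor = executor;
    }

    @Override
    public synchronized void execute(Runnable r) {
        tasks.add(() -> {
            try {
                r.run();
            } finally {
                scheduleNext(); // start the group's next task only after this one ends
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        if ((active = tasks.poll()) != null) {
            executor.execute(active);
        }
    }
}

// One SerialExecutor (i.e. one queue) per group; groups run in parallel on the pool.
class GroupedExecutor {
    private final Executor pool;
    private final ConcurrentHashMap<String, SerialExecutor> groups = new ConcurrentHashMap<>();

    GroupedExecutor(Executor pool) {
        this.pool = pool;
    }

    void submit(String group, Runnable task) {
        groups.computeIfAbsent(group, g -> new SerialExecutor(pool)).execute(task);
    }
}
```

No thread ever blocks waiting for a group: a group occupies a pool thread only while one of its tasks is actually running. Since each task schedules its successor, the underlying pool must not be shut down until the chains have drained.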

I mostly agree with Dave's answer, but if you need to slice CPU time across all "groups", i.e. all task groups should progress in parallel, you might find this kind of construct useful (it uses removal from the queue as the "lock"). This worked fine in my case, although I imagine it tends to use more memory:
class TaskAllocator {
private final ConcurrentLinkedQueue<Queue<Runnable>> entireWork
= childQueuePerTaskGroup();
public Queue<Runnable> lockTaskGroup(){
return entireWork.poll();
}
public void release(Queue<Runnable> taskGroup){
entireWork.offer(taskGroup);
}
}
and
class DoWork implements Runnable {
    private final TaskAllocator allocator;

    public DoWork(TaskAllocator allocator) {
        this.allocator = allocator;
    }

    public void run() {
        for (;;) {
            Queue<Runnable> taskGroup = allocator.lockTaskGroup();
            if (taskGroup == null) {
                // No more work
                return;
            }
            Runnable work = taskGroup.poll();
            if (work == null) {
                // This group is done
                continue;
            }
            // Do work, but never forget to release the group to
            // the allocator.
            try {
                work.run();
            } finally {
                allocator.release(taskGroup);
            }
        } // for
    }
}
You can then use the optimum number of threads to run the DoWork task. It's a kind of round-robin load balancing.
You can even do something more sophisticated by using this instead of a simple queue in TaskAllocator (task groups with more tasks remaining tend to get executed first):
ConcurrentSkipListSet<MyQueue<Runnable>> sophisticatedQueue =
    new ConcurrentSkipListSet<>(new SophisticatedComparator());
where SophisticatedComparator is
class SophisticatedComparator implements Comparator<MyQueue<Runnable>> {
public int compare(MyQueue<Runnable> o1, MyQueue<Runnable> o2){
int diff = o2.size() - o1.size();
if(diff==0){
//This is crucial. You must assign unique ids to your
//Subqueue and break the equality if they happen to have same size.
//Otherwise your queues will disappear...
return o1.id - o2.id;
}
return diff;
}
}

Actors are another solution for this type of problem.
Scala has actors, and so does Java, where they are provided by Akka.

I had a problem similar to yours, and I used an ExecutorCompletionService, which works with an Executor to complete collections of tasks.
Here is an extract from the java.util.concurrent API documentation (Java 7):
Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
void solve(Executor e, Collection<Callable<Result>> solvers)
throws InterruptedException, ExecutionException {
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers)
ecs.submit(s);
int n = solvers.size();
for (int i = 0; i < n; ++i) {
Result r = ecs.take().get();
if (r != null)
use(r);
}
}
So, in your scenario, every task will be a single Callable<Result>, and tasks will be grouped in a Collection<Callable<Result>>.
Reference:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
