Multi-threading with Java Executor - java

I am stuck on the following problem.
Say I have a request that contains 1000 items, and I would like to use a Java Executor to process them.
Here is the main method:
public static void main(String[] args) {
    // Assume that I have a request object that contains an ArrayList of names
    // and vectorList is the container for each request's result
    ExecutorService threadExecutor = Executors.newFixedThreadPool(3);
    Vector<Result> vectorList = new Vector<Result>();
    for (int i = 0; i < request.size(); i++) {
        threadExecutor.execute(new QueryTask(request.get(i).getNames(), vectorList));
    }
    threadExecutor.shutdown();
    response.setResult(vectorList);
}
And here is the QueryTask class
public class QueryTask implements Runnable {
    private String names;
    private Vector<Result> vectorList;

    public QueryTask(String names, Vector<Result> vectorList) {
        this.names = names;
        this.vectorList = vectorList;
    }

    public void run() {
        // do something with names, for example, query the database
        Result result = process(names);
        // add the result to vectorList
        vectorList.add(result);
    }
}
So, based on the example above, I want to submit a task to the thread pool for each item in the request, run them simultaneously, and add each result to vectorList.
At the end of the process, I want all the results to already be in the vector list.
I keep getting inconsistent results in the response.
For example, if I pass a request with 10 names, I get back only 3 or 4, or sometimes nothing at all.
I expected that if I pass 10, I would get 10 back.
Does anyone know what's causing the problem?
Any help will be appreciated.
Thanks

The easy solution is to add a call to ExecutorService.awaitTermination()
public static void main(String[] args) throws InterruptedException {
    // Assume that I have a request object that contains an ArrayList of names
    // and vectorList is the container for each request's result
    ExecutorService threadExecutor = Executors.newFixedThreadPool(3);
    Vector<Result> vectorList = new Vector<Result>();
    for (int i = 0; i < request.size(); i++) {
        threadExecutor.execute(new QueryTask(request.get(i).getNames(), vectorList));
    }
    threadExecutor.shutdown();
    threadExecutor.awaitTermination(aReallyLongTime, TimeUnit.SECONDS);
    response.setResult(vectorList);
}

After calling threadExecutor.shutdown(), you also need to call threadExecutor.awaitTermination(). The former is a non-blocking call that merely initiates a shutdown, whereas the latter is a blocking call that actually waits for all tasks to finish. Since you are only calling the former, you are probably returning before all tasks have finished, which is why you don't always get back all of your results. The Java API documentation isn't too clear about this, so someone filed a bug about it.

There are at least two issues here.
1. In your main, you shut down the ExecutorService and then try to read the results right away. The executor service runs your jobs asynchronously, so there is a very good chance that not all of your jobs are done yet. When you call response.setResult(vectorList), vectorList is not fully populated.
2. You are concurrently accessing the same Vector object from within all of your runnables. This is likely to cause ConcurrentModificationExceptions, or just clobber entries in the vector. You need to either manually synchronize on the vector inside QueryTask, or pass in a thread-safe container instead, like Collections.synchronizedList(new ArrayList()). A combined fix is sketched below.
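For illustration, here is a minimal, self-contained sketch of both fixes together. The QueryDemo class, the hard-coded names, and the toUpperCase() stand-in for the real database query are assumptions made for this example, not part of the original code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueryDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> names = Arrays.asList("a", "b", "c", "d", "e");

        ExecutorService threadExecutor = Executors.newFixedThreadPool(3);
        // Thread-safe list shared by all tasks (fix for issue 2)
        final List<String> results = Collections.synchronizedList(new ArrayList<String>());

        for (final String name : names) {
            threadExecutor.execute(new Runnable() {
                public void run() {
                    // stand-in for the real query: just transform the name
                    results.add(name.toUpperCase());
                }
            });
        }

        threadExecutor.shutdown();                             // stop accepting new tasks
        threadExecutor.awaitTermination(60, TimeUnit.SECONDS); // block until all tasks finish (fix for issue 1)

        System.out.println(results.size() + " results: " + results);
    }
}
With this structure the final print always reports as many results as there were names, because the main thread does not read the list until the pool has drained.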

ArrayList throwing `ConcurrentModificationException` when trying to run `.size()` method

Update
As was pointed out by Jiri Tousek, the error that was being thrown in my code has misled many amateur (and experienced) Java developers. Contrary to what the name seems to imply, ConcurrentModificationException does not have anything to do with multi-threading. Consider the following code:
import java.util.List;
import java.util.ArrayList;

class Main {
    public static void main(String[] args) {
        List<String> originalArray = new ArrayList<>();
        originalArray.add("foo");
        originalArray.add("bar");
        List<String> arraySlice = originalArray.subList(0, 1);
        originalArray.remove(0);
        System.out.println(Integer.toString(arraySlice.size()));
    }
}
This will throw a ConcurrentModificationException despite there being no threading involved.
The misleading exception name led me to think my problem was the result of how I was handling multi-threading. I've updated the title of my post with the actual issue.
Original (title: How to inform Java that you are finished modifying an ArrayList in a thread?)
I have code that looks roughly like the following:
class MessageQueue {
    private List<String> messages = new ArrayList<>();
    private List<String> messagesInFlight = new ArrayList<>();

    public void add(String message) {
        messages.add(message);
    }

    public void send() {
        if (messagesInFlight.size() > 0) {
            // Wait for previous request to finish
            return;
        }
        messagesInFlight = messages.subList(0, Math.min(messages.size(), 10));
        for (int i = 0; i < messagesInFlight.size(); i++) {
            messages.remove(0);
        }
        sendViaHTTP(messagesInFlight, new Callback() {
            @Override
            public void run() {
                messagesInFlight.clear();
            }
        });
    }
}
This is used in my code for analytics purposes. Every 10 seconds I call messageQueue.send() from a timer, and whenever an event of interest occurs I call messageQueue.add(). This class works *for the most part* -- I can add messages and they get sent via HTTP, and when the HTTP request completes the callback is run.
The issue lies on the second tick of the timer. When I hit the line if (messagesInFlight.size() > 0) {, I get the following error:
java.util.ConcurrentModificationException
at java.util.ArrayList$SubList.size(ArrayList.java:1057)
It seems like I can't read the .size() of the array in one thread (the second timer's callback) because it thinks the array is still being modified by the other thread (the first timer's callback). However, I would expect the first timer's thread to have been destroyed and cleaned up after my call to sendViaHTTP, since there was no additional code for it to execute. Furthermore, the HTTP request completes within 500 milliseconds, so a full 9.5 seconds passes without anything touching the empty messagesInFlight array.
Is there a way to say "hey, I'm done modifying this array, people can safely read it now"? Or perhaps a better way to organize my code?
The most glaring issue you have there is that you're using ArrayList.subList() while you don't seem to understand what it really does:
Returns a view of the portion of this list ... The returned list is backed by this list.
What you're storing in messagesInFlight is a view, not a copy. When you delete the first messages from messages, you're in fact deleting the very messages you had in messagesInFlight right after the subList() call. So after the for loop, messagesInFlight will contain completely different messages, and the first n messages will be completely lost.
As to why you are getting the error you see: subList() specifically allows non-structural changes to both the sub-list and the original list (non-structural means replacing elements, not adding or removing them), and the example in the documentation also shows how the original list may be modified by modifying the sub-list. However, modifying the original list and then accessing it through the sub-list is not allowed and may result in a ConcurrentModificationException, similarly to what happens when you change a list you're iterating over with an iterator.
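If you do want to keep a separate in-flight batch, one option (not part of the original answers, just a sketch that reuses the fields of the MessageQueue class above) is to copy the slice into a new list so it is no longer backed by messages:
// Copy the first n messages instead of keeping a view backed by `messages`
int n = Math.min(messages.size(), 10);
messagesInFlight = new ArrayList<>(messages.subList(0, n));
// Clearing the range through a fresh sub-list view removes those messages from `messages`
// without touching the independent copy in `messagesInFlight`
messages.subList(0, n).clear();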
If you need to make sure that send() is not called again while a previous call has not finished, you can drop the in-flight messages collection and just use a flag variable:
private AtomicBoolean inProgress = new AtomicBoolean(false);

public void send() {
    if (inProgress.getAndSet(true)) {
        return;
    }
    // Your logic here ...
    sendViaHTTP(messagesInFlight, new Callback() {
        @Override
        public void run() {
            inProgress.set(false);
        }
    });
}

java concurrency Shared Data with Runnable vs Callable and local data

First case: let's say you have a lot of tasks that all return a result of some kind, let's just call it `Result` for now, and these all have to be stored in an ArrayList. There are two options:
1) Create one ArrayList in the main method and use Runnables with access to the shared list and a synchronized add method.
2) Create one ArrayList in the main method, use Callables to perform the tasks and return the results, and let the main method add each Result to its list.
Are there any performance differences between the two, seeing as the Runnables need synchronized access but the Callables do not?
Then, to the second case: let's now say each task generates a 'small' ArrayList, say fewer than 10 items per task. This again gives two options:
1) One ArrayList in main and Runnables with access to the shared list that add result items whenever they are generated.
2) One ArrayList in main and Callables, each with their own local ArrayList that stores the Results until the task is finished; then in main, addAll is used to add the found results.
Same question as before: what are the performance differences?
For the sake of clarity: performance both in terms of speed (synchronization overhead, etc.) and in terms of memory (do the Callables use a lot more memory due to the small local ArrayLists, or is this negligible)?
For the First Case:
Option one: if we use Runnable tasks, then we cannot get anything returned from the run() method, so I don't think this option suits your requirement.
Option two: Callable.
As per my understanding of your requirement, Callable is a good candidate.
But there is a small change: we will create a list of Futures, and for every Callable task that we submit to the executor, we will add the Future of that Callable to this list (see the code below for details). Then, whenever we need the result of any task, we can get it from the corresponding Future.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MainTaskExecutor {

    private static ExecutorService exe = Executors.newCachedThreadPool();
    private static List<Future<Result>> futureResults = new ArrayList<>();

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Callable<Result> dummyTask = () -> {
            System.out.println("Task is executed");
            Result dummyResult = new Result();
            return dummyResult;
        };

        // Submit a task
        submitTask(dummyTask);

        // Getting the result of the task at index 0 (blocks until it is available)
        System.out.println(futureResults.get(0).get());
    }

    private static void submitTask(Callable<Result> task) {
        futureResults.add(exe.submit(task));
    }

    private static Result getResult(int taskNumber) throws ExecutionException, InterruptedException {
        return futureResults.get(taskNumber).get();
    }
}

class Result {
    // data to be added
}
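For the second case, a similar approach works: each Callable returns its own small list and the main thread merges them once the Futures complete, so no synchronization on a shared list is needed. This is only a sketch, not part of the original answer; the BatchTaskExecutor class and the dummy string results are assumptions for illustration:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchTaskExecutor {
    public static void main(String[] args) throws Exception {
        ExecutorService exe = Executors.newFixedThreadPool(4);

        // Each task builds its own small local list; no shared state between tasks
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int taskId = t;
            tasks.add(() -> Arrays.asList("task" + taskId + "-a", "task" + taskId + "-b"));
        }

        // invokeAll blocks until all tasks are done and returns their Futures in order
        List<Future<List<String>>> futures = exe.invokeAll(tasks);

        // Merge the per-task lists in the main thread
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            merged.addAll(f.get());
        }
        exe.shutdown();

        System.out.println(merged);
    }
}
Memory-wise, the small per-task lists only live until the merge, so for lists of around 10 items the overhead is usually negligible compared to synchronizing every add on a shared list.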

Synchronizing HashMap in Threading

I am facing some issues with synchronizing my idMap. This map is used in two run() methods which run concurrently. In the 1st run() method I simply map an event id (value) to a response id (key). In the 2nd run() method I wish to obtain the event id (value) with that same response id (key). However, at times the event id is there and at times it can't be obtained. The program compiles just fine, but I'm no expert at threading and I believe threading is causing this idMap to be out of sync. My question is simply: how can I make idMap work smoothly and obtain the event ids as I intend to?
ConcurrentHashMap<String, String> idMap = new ConcurrentHashMap<String, String>();
ConcurrentHashMap<String, ExecutorService> executors = new ConcurrentHashMap<String, ExecutorService>();

private final class ResponderTask implements Runnable {
    private final Event event;

    private ResponderTask(Event event) {
        this.event = event;
    }

    // 1st run()
    public void run() {
        idMap.put(response.getId(), event.getId());
    }
} // end ResponderTask

private final class QuoteTask implements Runnable {
    // constructor

    // 2nd run()
    public void run() {
        idMap.get(response.getId());
    }
} // end QuoteTask

public void onResponse(final Response response) {
    ExecutorService quoteExecutor = executors.get(response.getId());
    if (quoteExecutor == null) {
        quoteExecutor = Executors.newSingleThreadExecutor();
        executors.put(event.getId(), quoteExecutor);
    }
    quoteExecutor.execute(new ResponderTask(event));
}
However, at times the event id is there and at times it can't be obtained. The program compiles just fine, but I'm no expert at threading and I believe threading is causing this idMap to be out of sync.
idMap is a ConcurrentHashMap, which is properly synchronized and heavily used and tested by many folks. If the id is not in the map when you look it up, then it has not been put in there yet. If you explain a bit more about your code, we may be able to find your problem.
For example, I don't see where the response object originates from. Is it possible that the ResponderTask is processing a different response than you expect? Are response and event supposed to be the same argument?
My question is simply: how can I make idMap work smoothly and obtain the event ids as I intend to?
It's a little hard to figure out what the proper operation of the program is supposed to be. If you are looking for the other thread to get the event, you could use a BlockingQueue; then your other thread can do a queue.take(), which will wait until there is an event to process. But I'm not sure what the goal is here.
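A minimal sketch of that hand-off, assuming a hypothetical Event type in place of the poster's class; this is not from the original code, just an illustration of the BlockingQueue idea:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventHandOff {
    // Hypothetical Event type standing in for the poster's class
    static class Event {
        final String id;
        Event(String id) { this.id = id; }
    }

    private final BlockingQueue<Event> events = new LinkedBlockingQueue<>();

    // Producer side: called when a response arrives
    void onResponse(Event event) {
        events.add(event);
    }

    // Consumer side: blocks until an event is available
    void consume() throws InterruptedException {
        Event event = events.take();
        System.out.println("Got event " + event.id);
    }
}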
One thing that is very strange is the use of a map of ExecutorServices. Do you really need multiple of them? I suspect that you really should use a single Executors.newCachedThreadPool(). Maybe, however, you want a single thread working on all requests with the same id, in which case your code should work. I assume you are doing something like the following when you want to shut down your application:
for (ExecutorService executor : executors.values()) {
    executor.shutdown();
}

Java Threading: Futures only using results from first and last thread

I have a simple utility which pings a set of nodes and returns an ArrayList of strings to a Future object so the results can be written to a file. The program should run until terminated by the user.
It doesn't appear that the Future receives the results (or at least passes them to the method that writes to the file). No matter how many threads I have running concurrently (always fewer than 100, determined by an input file), I am only getting output from the first and last initialized threads.
As a sanity check, I created a global variable to which each thread sends its results before finishing and returning its results to the Future object. This variable is correctly updated by all threads.
Does anyone have any ideas why Future doesn't seem to be receiving all my results from the threads?
public class PingUtility {

    public static ExecutorService pool = Executors.newFixedThreadPool(100);
    static Future<ArrayList<String>> future;

    public static void main(String[] args) throws Exception {
        Timer timer = new Timer();
        TimerTask task = new TimerTask() {
            public void run() {
                // Submit a ping job to the thread pool on every tick
                ArrayList<String[]> nodes = new ArrayList<String[]>();
                future = pool.submit(new PingNode(nodes));
            }
        };
        timer.scheduleAtFixedRate(task, 0, interval);

        while (true) {
            try {
                ArrayList<String> tempOutputArray = future.get();
                Iterator<String> it = tempOutputArray.iterator();
                while (it.hasNext()) appendFile(it.next());
                tempOutputArray.clear();
            } catch (Exception nullException) {
                // Do nothing
            }
        }
    }
}
Your problem is that you are modifying the future static field without synchronization in your timer-task thread(s) and reading it in the main thread. You need to either synchronize on it when you modify and read it or use another mechanism to share information between the threads.
I'd recommend switching from a static field to a LinkedBlockingQueue as a better way to send information from the PingNode call to the appendFile(...) method. This saves you from having to do the synchronization yourself and protects against the race condition where multiple timer tasks start and overwrite future before the consumer can get() from it. Maybe something like:
BlockingQueue<String[]> queue = new LinkedBlockingQueue<String[]>();
...
// inside of run(), the producer passes the queue into the PingNode
public void run() {
    pool.submit(new PingNode(queue));
}

// consumer
while (true) {
    String[] array = queue.take();
    ...
}
This doesn't take into account how you are going to stop the threads when you are done. If the timer task is killed, the producer could add a termination object (a poison pill) to the queue to stop the main loop.
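A possible sketch of that poison-pill idea (the POISON sentinel is an assumption, and exception handling is omitted):
// Sentinel object used to tell the consumer to stop
static final String[] POISON = new String[0];

// Producer side, when shutting down:
queue.add(POISON);

// Consumer side:
while (true) {
    String[] array = queue.take();
    if (array == POISON) {
        break; // the producer has signalled that no more results are coming
    }
    // process array ...
}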
A Future object is not a bin like an ArrayList; it merely points to a single computational result. Because you only have one static reference to this Future, what I imagine is happening is this:
future = null
nullException
nullException
nullException
nullException
...
First thread finally sets future = Future<ArrayList<String>>
Call to future.get() blocks...
Meanwhile, all other threads get scheduled, and they reassign future
The last thread will obviously get the last say in what future points to
Data is gathered, written to file, loop continues
future now points to the Future from the last thread
Results from last thread get printed
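One way to avoid the overwrite, complementary to the queue approach above and offered only as a sketch (the futures collection name is an assumption, and exception handling is omitted), is to collect every Future instead of keeping a single static reference:
// Thread-safe collection of futures; each timer tick adds one instead of overwriting
Queue<Future<ArrayList<String>>> futures = new ConcurrentLinkedQueue<>();

// Producer (inside the TimerTask's run()):
futures.add(pool.submit(new PingNode(nodes)));

// Consumer (main loop): drain whatever has been submitted so far
Future<ArrayList<String>> f;
while ((f = futures.poll()) != null) {
    for (String line : f.get()) { // blocks until this particular result is ready
        appendFile(line);
    }
}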

Java multithreading and iterators, should be simple, beginner

First I'd like to say that I'm working my way up from Python to more complicated code. I'm now on to Java and I'm extremely new to it. I understand that Java is really good at multithreading, which is good because I'm using it to process terabytes of data.
The input data simply goes into an iterator, and I have a class that encapsulates a run function which takes one line from the iterator, does some analysis, and then writes the analysis to a file. The only bit of info the threads have to share with each other is the name of the object they are writing to. Simple, right? I just want each thread executing the run function simultaneously so we can iterate through the input data quickly. In Python it would be simple:
from multiprocessing import Pool

f = open('someoutput.csv', 'w')

def run(x):
    f.write(analyze(x))

p = Pool(8)
p.map(run, iterator_of_input_data)
So in Java, I have my 10K lines of analysis code and can very easily iterate through my input, passing each item to my run function, which in turn calls all my analysis code and sends the result to an output object.
public class cool {
    ...
    public static void run(Input input, File output) {
        Analysis an = new Analysis(input, output);
    }

    public static void main(String[] args) throws Exception {
        Iterator<Input> iterator = new Parser(new File(input_file)).iterator();
        File output = new File(output_object);
        while (iterator.hasNext()) {
            cool.run(iterator.next(), output);
        }
    }
}
All I want to do is get multiple threads taking items from the iterator and executing the run function. Everything is independent. I keep looking at Java multithreading material, but it's all about talking over networks, sharing data, etc. Is this as simple as I think it is? If someone can just point me in the right direction, I would be happy to do the legwork.
thanks
An ExecutorService (ThreadPoolExecutor) would be the Java equivalent.
ExecutorService executorService =
        new ThreadPoolExecutor(
                maxThreads, // core thread pool size
                maxThreads, // maximum thread pool size
                1,          // keep-alive time for idle threads
                TimeUnit.MINUTES,
                new ArrayBlockingQueue<Runnable>(maxThreads, true),
                new ThreadPoolExecutor.CallerRunsPolicy());

ConcurrentLinkedQueue<ResultObject> resultQueue = new ConcurrentLinkedQueue<ResultObject>();

while (iterator.hasNext()) {
    executorService.execute(new MyJob(iterator.next(), resultQueue));
}
Implement your job as a Runnable.
class MyJob implements Runnable {

    /* collect useful parameters in the constructor */
    public MyJob(...) {
        /* omitted */
    }

    public void run() {
        /* do the job here, submit the result to resultQueue */
    }
}
The resultQueue is there to collect the results of your jobs.
See the Java API documentation for detailed information.
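One detail worth adding, echoing the awaitTermination() advice from the first answer in this collection and offered only as a sketch (the one-hour timeout and the file-writing comment are placeholders): once all jobs have been submitted, shut the pool down and wait for it to drain before reading the results.
// After the submission loop: stop accepting new jobs and wait for them to finish
executorService.shutdown();
if (!executorService.awaitTermination(1, TimeUnit.HOURS)) { // timeout is an arbitrary choice
    System.err.println("Jobs did not finish in time");
}

// Now it is safe to drain resultQueue
for (ResultObject result : resultQueue) {
    // write result to the output file ...
}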
