Java Threading: Futures only using results from first and last thread


I have a simple utility which pings a set of nodes and returns an ArrayList of strings to a future object to be outputted to a file. The program should run until terminated by the user.
It doesn't appear that the future receives the results (or at least passes them to the method to output to the file). No matter the number of threads I have concurrently running (always less than 100, determined by an input file), I am only outputting the results from the first and last initialized threads.
As a sanity check, I created a global variable in which each thread will send its results before closing and returning its results to the Future object. This variable is correctly updated by all threads.
Does anyone have any ideas why Future doesn't seem to be receiving all my results from the threads?
public class PingUtility{
public static ExecutorService pool = Executors.newFixedThreadPool(100);
static Future<ArrayList<String>> future;
public static void main(String[] args) throws Exception {
Timer timer = new Timer();
TimerTask task = new TimerTask(){
public void run(){
//Creates a pool of threads to be executed
ArrayList<String[]> nodes = new ArrayList<String[]>();
future = pool.submit(new PingNode(nodes));
}
};
timer.scheduleAtFixedRate(task, 0, interval);
while(true){
try{
ArrayList<String[]> tempOutputArray = future.get();
Iterator<String[]> it = tempOutputArray.iterator();
while(it.hasNext()) appendFile(it.next());
tempOutputArray.clear();
}catch(Exception nullException){
//Do nothing
}
}
}

Your problem is that you are modifying the future static field without synchronization in your timer-task thread(s) and reading it in the main thread. You need to either synchronize on it when you modify and read it or use another mechanism to share information between the threads.
I'd recommend switching from a static field to a LinkedBlockingQueue as a better way to send information from the PingNode call to the appendFile(...) method. This saves you from having to do the synchronization yourself and protects against the race condition where multiple timer tasks start and overwrite the future before the consumer can call get() on it. Maybe something like:
BlockingQueue<String[]> queue = new LinkedBlockingQueue<String[]>();
...
// inside of run, producer passes the queue into the PingNode
public void run() {
pool.submit(new PingNode(queue));
}
// consumer
while (true) {
String[] array = queue.take();
...
}
This doesn't take into account how you are going to stop the threads when you are done. If the timer task is cancelled, whoever cancels it could add a termination object to the queue to stop the main loop.
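To make the queue hand-off and the termination object concrete, here is a minimal sketch. The class name QueueHandoffSketch, the DONE marker and the stand-in "ping" task are assumptions made for illustration; they are not part of the original PingUtility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueHandoffSketch {
    // Hypothetical termination marker: compared by identity, never written out.
    static final String[] DONE = new String[0];

    // Runs one stand-in "ping" per host and returns every row the consumer saw.
    static List<String[]> pingAll(List<String> hosts) {
        BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String host : hosts) {
            // Producer: each task puts its own result on the shared queue.
            pool.submit(() -> {
                queue.add(new String[] { host, "reachable" });
            });
        }
        pool.shutdown();

        // Once all producers have finished, tell the consumer to stop.
        Thread closer = new Thread(() -> {
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue.add(DONE);
            }
        });
        closer.start();

        // Consumer: the original code would call appendFile(row) here.
        List<String[]> consumed = new ArrayList<>();
        try {
            while (true) {
                String[] row = queue.take(); // blocks until something arrives
                if (row == DONE) {
                    break;
                }
                consumed.add(row);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

Calling QueueHandoffSketch.pingAll(Arrays.asList("node-a", "node-b")) yields one row per host, because no result can be overwritten on its way to the consumer.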

A Future object is not a container like an ArrayList; it merely refers to a single computational result. Because you only have one static reference to this Future, what I imagine is happening is this:
future = null
nullException
nullException
nullException
nullException
...
First thread finally sets future = Future<ArrayList<String>>
Call to future.get() blocks...
Meanwhile, all other threads get scheduled, and they reassign future
The last thread will obviously get the last say in what future points to
Data is gathered, written to file, loop continues
future now points to the Future from the last thread
Results from last thread get printed
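One way to avoid losing the intermediate Futures is to remember every one of them instead of overwriting a single field. The following is only a sketch of that idea: FutureQueueSketch, the rounds parameter and the stand-in task are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

class FutureQueueSketch {
    // Submits one stand-in task per round and collects every result in order.
    static List<String> collect(int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Every Future is queued, so a later submission cannot replace an earlier one.
        BlockingQueue<Future<List<String>>> pending = new LinkedBlockingQueue<>();
        for (int i = 0; i < rounds; i++) {
            final int round = i;
            Callable<List<String>> ping = () -> Collections.singletonList("round " + round);
            pending.add(pool.submit(ping));
        }
        pool.shutdown();

        List<String> all = new ArrayList<>();
        try {
            for (int i = 0; i < rounds; i++) {
                // take() hands out Futures in submission order; get() waits for each result.
                all.addAll(pending.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return all;
    }
}
```

In the original program the timer task would add to the queue and the main loop would take from it, so the results of every scheduled run reach the file.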

Related

Get results of scheduled non-blocking operations in Java

I am trying to do some blocking operations (say HTTP request) in a scheduled and non-blocking manner. Let's say I have 10 requests and one request takes 3 seconds but I would like not to wait for 3 seconds but wait 1 second and send the next one. After all executions are finished I would like to gather all results in a list and return to the user.
Below, there is a prototype of my scenario (thread sleep used as blocking operation instead of HTTP req.)
public static List<Integer> getResults(List<Integer> inputs) throws InterruptedException, ExecutionException {
List<Integer> results = new LinkedList<Integer>();
Queue<Callable<Integer>> tasks = new LinkedList<Callable<Integer>>();
List<Future<Integer>> futures = new LinkedList<Future<Integer>>();
for (Integer input : inputs) {
Callable<Integer> task = new Callable<Integer>() {
public Integer call() throws InterruptedException {
Thread.sleep(3000);
return input + 1000;
}
};
tasks.add(task);
}
ExecutorService es = Executors.newCachedThreadPool();
ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
ses.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
Callable<Integer> task = tasks.poll();
if (task == null) {
ses.shutdown();
es.shutdown();
return;
}
futures.add(es.submit(task));
}
}, 0, 1000, TimeUnit.MILLISECONDS);
while(true) {
if(futures.size() == inputs.size()) {
for (Future<Integer> future : futures) {
Integer result = future.get();
results.add(result);
}
return results;
}
}
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
List<Integer> results = getResults(new LinkedList<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
System.out.println(Arrays.toString(results.toArray()));
}
I am waiting in a while loop until all tasks return a proper result, but it never satisfies the breaking condition and loops forever. Whenever I add an I/O operation such as a logger call, or even a breakpoint, the loop exits and everything works.
I am relatively new to Java concurrency and am trying to understand what is happening and whether this is the correct way to do it. My guess is that the I/O operation triggers something in the thread scheduler and makes it check the collections' sizes.

You need to synchronize your threads. You have two different threads (the main thread and the executor service thread) accessing the futures list, and since LinkedList is not synchronized, these two threads can see two different states of futures.
while(true) {
synchronized(futures) {
if(futures.size() == inputs.size()) {
...
}
}
}
This happens because threads in Java may use the CPU cache to improve performance, so each thread can hold different values of a variable until they are synchronized.
This SO question has more information on this.
Also from this answer:
It's all about memory. Threads communicate through shared memory, but when there are multiple CPUs in a system, all trying to access the same memory system, then the memory system becomes a bottleneck. Therefore, the CPUs in a typical multi-CPU computer are allowed to delay, re-order, and cache memory operations in order to speed things up.
That works great when threads are not interacting with one another, but it causes problems when they actually do want to interact: If thread A stores a value into an ordinary variable, Java makes no guarantee about when (or even if) thread B will see the value change.
In order to overcome that problem when it's important, Java gives you certain means of synchronizing threads. That is, getting the threads to agree on the state of the program's memory. The volatile keyword and the synchronized keyword are two means of establishing synchronization between threads.
And finally, the futures list does not update in your code because the main thread is continuously occupied by the infinite while loop. Doing any I/O operation in the loop gives the CPU enough breathing space to update its local cache.
An infinite while loop is generally a bad idea because it is very resource intensive. Adding a small delay before the next iteration can make it a little better (though still inefficient).
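As an alternative to polling in a loop, the main thread can block until every task has been submitted. The sketch below swaps in a CountDownLatch and a synchronized list, which the answer above does not mention; ScheduledSubmitSketch and the periodMillis parameter are likewise assumptions for illustration, and the 3-second sleep of the question is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ScheduledSubmitSketch {
    // Submits one task per input every periodMillis and returns the results in input order.
    static List<Integer> getResults(List<Integer> inputs, long periodMillis) {
        ExecutorService workers = Executors.newCachedThreadPool();
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        Queue<Integer> remaining = new ConcurrentLinkedQueue<>(inputs);
        List<Future<Integer>> futures = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch allSubmitted = new CountDownLatch(inputs.size());

        scheduler.scheduleAtFixedRate(() -> {
            Integer input = remaining.poll();
            if (input == null) {
                return; // nothing left to submit
            }
            Callable<Integer> task = () -> input + 1000; // stand-in for the blocking call
            futures.add(workers.submit(task));
            allSubmitted.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            allSubmitted.await(); // blocks without spinning until every task is submitted
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : new ArrayList<>(futures)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            scheduler.shutdownNow();
            workers.shutdown();
        }
    }
}
```

The latch gives the main thread a proper happens-before edge with the scheduler thread, so it sees the fully populated list without burning CPU in a busy loop.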

When is runnable object garbage collected in ExecutorService?

I have a runnable object A which exchanges heart beat signals with a server on instantiation. I submit n such objects to a executor service with fixed thread pool size of n. When the run method encounters exception it would return. For a given case, all my threads encounter exception and return, but the object created remains alive and keeps on exchanging the heart beat signals. How do I mark such objects up for garbage collection so that they would stop the heart beat signals exchange?
class A implements Runnable {
    public void run() {
        try {
            // throws error
        } catch (Exception e) {
            // returns
        }
    }
    public static void main(String[] args) {
        int n = 8; // example pool size; the question does not give a value
        ExecutorService executor = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            A a = new A();
            executor.submit(a);
        }
    }
}
Should I put an awaitTermination call at the end of my main and then return?
Edit:
Putting the question other way, one way to terminate the executorservice after all the threads return would be to call shutdown() after the for loop and call awaitTermination with Integer.MAX long seconds which is roughly 70 years ( which is a time constraint I am reluctant to impose). Is there any other alternative?

one way to terminate the executorservice after all the threads return would be to call shutdown() after the for loop and call awaitTermination with Integer.MAX long seconds which is roughly 70 years
As the documentation says, the awaitTermination method blocks until one of the following happens, whichever comes first:
all tasks have completed execution after a shutdown request,
the timeout occurs,
or the current thread is interrupted.
So it returns as soon as any one of these three events occurs; it does not have to wait the full 70 years.
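A minimal sketch of this shutdown sequence, assuming tasks that fail and return as in the question (the class name AwaitTerminationSketch and the one-hour bound are illustrative choices):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitTerminationSketch {
    // Submits n tasks that fail and return, then waits for the pool to drain.
    // Returns true if the pool terminated before the timeout elapsed.
    static boolean runAndWait(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            executor.submit(() -> {
                try {
                    throw new IllegalStateException("simulated failure");
                } catch (IllegalStateException e) {
                    // returns, as in the question
                }
            });
        }
        executor.shutdown(); // no new tasks; already submitted ones still run
        try {
            // Generous upper bound; the call returns as soon as all tasks have finished.
            return executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With short tasks like these, runAndWait(4) comes back almost immediately, well before the timeout.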

Calling shutdown() on the pool means the pool will no longer accept new tasks for execution, but the current ones will run without interruption.
Calling awaitTermination(timeout) blocks the calling thread until the pool has finished. If the timeout is reached first, the method returns false; this does not affect the tasks in the pool.
If your runnable throws an uncaught exception while being run by the thread pool, it is no longer in the running state, and the thread pool usually doesn't hold any reference to it.
If you use a FixedThreadPool, the pool will create as many threads as you specify, and will not stop any of them until you call shutdown() on it.
If you don't hold a reference to the runnable object that threw the exception, it behaves like any other unreferenced object and is eligible for garbage collection.
If you call shutdown() and then awaitTermination() on the thread pool and your program still doesn't stop, that means not all instances of your runnable have thrown an exception; some are still running and thus blocking the pool from shutting down completely.
In Java you can't kill or stop a running thread just like that (you can only kill the entire JVM using e.g. System.exit(0), not a chosen thread). If you need such functionality, you need to write the body of the runnable so that you can communicate with it, e.g. through a volatile boolean variable, and so that it responds to changes in that variable. That means adding checks of the variable in the body of the run() method so that it returns when it should.
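The volatile flag approach could look roughly like this. HeartbeatSketch, its stop() method and the 10 ms sleep are hypothetical stand-ins for the real heartbeat exchange.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class HeartbeatSketch implements Runnable {
    // volatile so that a write from another thread is seen by the running loop
    private volatile boolean running = true;

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            // stand-in for one heartbeat exchange with the server
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Starts a heartbeat, asks it to stop, and reports whether the pool terminated.
    static boolean stopsWhenAsked() {
        HeartbeatSketch heartbeat = new HeartbeatSketch();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(heartbeat);
        executor.shutdown();
        try {
            Thread.sleep(50);  // let a few heartbeats happen
            heartbeat.stop();  // the loop notices the flag on its next check
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Once run() has returned and nothing else references the object, it becomes eligible for garbage collection like any other object.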

The tasks themselves are eligible for garbage collecting as soon as their execution is complete. If and when they are actually collected depends on the garbage collector.
Example code:
public class Main implements Runnable {
@Override
protected void finalize() throws Throwable {
super.finalize();
System.out.println("finalize");
}
@Override
public void run() {
try {
throw new Exception("Error");
} catch (Exception e) {
//returns
}
}
public static void main(String args[]) {
int n = 8;
ExecutorService executor = Executors.newFixedThreadPool(n);
for (int i = 0 ; i < n; ++i) {
Main a = new Main();
executor.submit(a);
}
System.gc();
System.out.println("end");
}
}

How to cancel all the threads in ExecutorService?

I've written the following multi-threaded program. I want to cancel all the threads if one of them returns false. However, even though I'm cancelling each individual task, it's not working. What changes do I need to make in order to cancel the threads?
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
public class BeamWorkerThread implements Callable<Boolean> {
private List<BeamData> beamData;
private String threadId;
public BeamWorkerThread(
List<BeamData> beamData, String threadId) {
super();
this.beamData = beamData;
this.threadId = threadId;
}
@Override
public Boolean call() throws Exception {
Boolean result = true;
DataValidator validator = new DataValidator();
Iterator<BeamData> it = beamData.iterator();
BeamData data = null;
while(it.hasNext()){
data = it.next();
if(!validator.validateDensity(data.getBin_ll_lat(), data.getBin_ll_lon(), data.getBin_ur_lat(), data.getBin_ur_lon())){
result = false;
break;
}
}
return result;
}
}
ExecutorService threadPool = Executors.newFixedThreadPool(100);
List<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
long count = 0;
final long RowLimt = 10000;
long threadCount = 1;
while ((beamData = csvReader.read(
BeamData.class, headers1, processors)) != null) {
if (count == 0) {
beamDataList = new ArrayList<BeamData>();
}
beamDataList.add(beamData);
count++;
if (count == RowLimt) {
results.add(threadPool
.submit(new BeamWorkerThread(
beamDataList, "thread:"
+ (threadCount++))));
count = 0;
}
}
results.add(threadPool.submit(new BeamWorkerThread(
beamDataList, "thread:" + (threadCount++))));
System.out.println("Number of threads" + threadCount);
for (Future<Boolean> fs : results)
try {
if(fs.get() == false){
System.out.println("Thread is false");
for(Future<Boolean> fs1 : results){
fs1.cancel(true);
}
}
} catch(CancellationException e){
} catch (InterruptedException e) {
} catch (ExecutionException e) {
} finally {
threadPool.shutdownNow();
}
}
My comments
Thanks, all, for your input; I'm overwhelmed by the response. I know that well-implemented threading can take an app to new heights, while a bad implementation brings it to its knees. I agree that this is a fancy idea, but I don't have another option. I have more than 10 million records, so I have both memory and time constraints, and I need to tackle both. Rather than loading the whole data set, I'm breaking it into chunks, and if one record is invalid I don't want to waste time processing the remaining millions. I find @Mark Peters' suggestion to be an option. I made the changes accordingly, that is, I added a flag to interrupt the task, but I'm confused about how the future list works. What I understand is that looping through the future list only starts once all the threads have returned their values. In that case, there is no way to cancel all the tasks halfway through while iterating over the main list. I would need to pass a reference to the worker objects to each thread, so that if one thread finds invalid data it can use those references to call the cancel method of every thread and set the interrupt flag.
while(it.hasNext() && !cancelled) {
if(!validate){
// loop through each thread reference and call Cancel method
}
}

Whatever attempt you make to cancel all the remaining tasks, it will fail if your code is not carefully written to be interruptible. What that exactly entails is beyond just one StackOverflow answer. Some guidelines:
do not swallow InterruptedException. Make its occurrence break the task;
if your code does not spend much time within interruptible methods, you must insert explicit Thread.interrupted() checks and react appropriately.
Writing interruptible code is in general not beginner's stuff, so take care.
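To illustrate the second guideline, here is a sketch of a CPU-bound task that polls the interrupt flag so that Future.cancel(true) actually stops it. InterruptibleTaskSketch and the endless stand-in loop are assumptions, not the asker's validation code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class InterruptibleTaskSketch {
    // A busy loop with no blocking calls; it only stops because it checks the flag itself.
    static Callable<Boolean> validator(CountDownLatch started) {
        return () -> {
            started.countDown();
            for (long i = 0; i < Long.MAX_VALUE; i++) {
                if (Thread.interrupted()) {
                    // Do not swallow the interrupt: let it end the task.
                    throw new InterruptedException("validation cancelled");
                }
                // stand-in for validating one record
            }
            return true;
        };
    }

    // Cancels a running task and reports whether the worker thread really finished.
    static boolean cancelStopsTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        Future<Boolean> future = pool.submit(validator(started));
        try {
            started.await();      // make sure the task is actually running
            future.cancel(true);  // sets the interrupt flag of the worker thread
            pool.shutdown();
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the Thread.interrupted() check the loop would keep running after cancel(true), and awaitTermination would time out.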

Cancelling the Future will not interrupt running code. It primarily serves to prevent the task from being run in the first place.
While you can provide a true as a parameter, which will interrupt the thread running the task, that only has an effect if the thread is blocked in code that throws an InterruptedException. Other than that, nothing implicitly checks the interrupted status of the thread.
In your case, there is no blocking; it's busy work that is taking time. One option would be to have a volatile boolean that you check at each stage of your loop:
public class BeamWorkerThread implements Callable<Boolean> {
private volatile boolean cancelled = false;
@Override
public Boolean call() throws Exception {
//...
while(it.hasNext() && !cancelled) {
//...
}
}
public void cancel() {
cancelled = true;
}
}
Then you would keep references to your BeamWorkerThread objects and call cancel() on it to preempt its execution.
Why don't I like interrupts?
Marko mentioned that the cancelled flag above is essentially reinventing Thread.interrupted(). It's a valid criticism. Here's why I prefer not to use interrupts in this scenario.
1. It's dependent on certain threading configurations.
If your task represents a cancellable piece of code that can be submitted to an executor, or called directly, using Thread.interrupt() to cancel execution in the general case assumes that the code receiving the interrupt will be the code that should know how to cleanly cancel the task.
That might be true in this case, but we only know so because we know how both the cancel and the task work internally. But imagine we had something like this:
Task does piece of work
Listeners are notified on-thread for that first piece of work
First listener decides to cancel the task using Thread.interrupt()
Second listener does some interruptible piece of work, and is interrupted. It logs but otherwise ignores the interrupt.
Task does not receive interrupt, and task is not cancelled.
In other words, I feel that interrupt() is too global of a mechanism. Like any shared global state, it makes assumptions about all of the actors. That's what I mean by saying that using interrupt() exposes/couples to details about the run context. By encapsulating it in a cancel() method applicable only for that task instance, you eliminate that global state.
2. It's not always an option.
The classic example here is an InputStream. If you have a task that blocks on reading from an InputStream, interrupt() will do nothing to unblock it. The only way to unblock it is to manually close the stream, and that's something best done in a cancel() method for the task itself. Having one way to cancel a task (e.g. Cancellable), regardless of its implementation, seems ideal to me.

Use the ExecutorService.shutdownNow() method. It stops the executor from accepting more submissions, attempts to stop the running tasks by interrupting their threads, and returns the list of queued tasks that never started. You can additionally call cancel(true) on the Future objects you hold. Of course, you will have to discard this executor as it cannot be restarted.
Neither shutdownNow() nor cancel(true) terminates execution immediately if the thread is not blocked in an interruptible call, or if the task swallows the InterruptedException that is raised in that case.
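The question's loop calls get() on the futures in submission order, so a false result from a later chunk is only noticed after the earlier chunks finish. A possible variation, not proposed in the answers above, is to consume results in completion order with an ExecutorCompletionService and cancel the rest on the first false. The class and method names in this sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CancelOnFirstFailureSketch {
    // Returns true only if every task returns true; stops waiting at the first false.
    static boolean allValid(List<Callable<Boolean>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Boolean> completion = new ExecutorCompletionService<>(pool);
        List<Future<Boolean>> futures = new ArrayList<>();
        for (Callable<Boolean> task : tasks) {
            futures.add(completion.submit(task));
        }
        try {
            for (int i = 0; i < tasks.size(); i++) {
                // take() returns whichever task finishes next, not the next one submitted.
                if (!completion.take().get()) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            // Cancelling a finished task is a no-op; running ones get interrupted.
            for (Future<Boolean> future : futures) {
                future.cancel(true);
            }
            pool.shutdownNow();
        }
    }
}
```

The tasks still have to be written to react to interruption, as discussed above; otherwise the cancelled ones keep running until they finish on their own.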

Java - threads + action

I'm new to Java so I have a simple question that I don't know where to start from -
I need to write a function that accepts an Action in a multi-threaded program. Only the first thread that enters the function should perform the action; all the other threads should wait for it to finish and then return from the function without doing anything.
As I said, I don't know where to begin, because:
first, there is no static variable inside a function (static as in C/C++), so how do I make sure that only the first thread starts the action and the others do nothing?
second, for the threads to wait, should I use
public synchronized void lala(Action doThis)
{....}
or should I write something like this inside the function
synchronized (this)
{
...
notify();
}
Thanks !

If you want all threads arriving at a method to wait for the first, then they must synchronize on a common object. It could be the same instance (this) on which the methods are invoked, or it could be any other object (an explicit lock object).
If you want to ensure that the first thread is the only one that will perform the action, then you must store this fact somewhere, for all other threads to read, for they will execute the same instructions.
Going by the previous two points, one could lock on this 'fact' variable to achieve the desired outcome:
// Synchronize on this object, and also use it to store the fact.
// It is static so that, if this lives in a Runnable, new instances do not appear to reset the fact.
// Don't use the Boolean wrapper, for the value of the flag might be different in certain cases.
static final AtomicBoolean flag = new AtomicBoolean(false);
public void lala(Action doThis)
{
synchronized (flag) // synchronize on the flag so that other threads arriving here, will be forced to wait
{
if(!flag.get()) // This condition is true only for the first thread.
{
doX();
flag.set(true); //set the flag so that other threads will not invoke doX.
}
}
...
doCommonWork();
...
}

If you're doing threading in any recent version of Java, you really should be using the java.util.concurrent package instead of using Threads directly.
Here's one way you could do it:
private final ExecutorService executor = Executors.newCachedThreadPool();
private final Map<Runnable, Future<?>> submitted
= new HashMap<Runnable, Future<?>>();
public void executeOnlyOnce(Runnable action) throws InterruptedException, ExecutionException {
Future<?> future = null;
// NOTE: I was tempted to use a ConcurrentHashMap here, but we don't want to
// get into a possible race with two threads both seeing that a value hasn't
// been computed yet and both starting a computation, so the synchronized
// block ensures that no other thread can be submitting the runnable to the
// executor while we are checking the map. If, on the other hand, it's not
// a problem for two threads to both create the same value (that is, this
// behavior is only intended for caching performance, not for correctness),
// then it should be safe to use a ConcurrentHashMap and use its
// putIfAbsent() method instead.
synchronized(submitted) {
future = submitted.get(action);
if(future == null) {
future = executor.submit(action);
submitted.put(action, future);
}
}
future.get(); // ignore return value because the runnable returns void
}
Note that this assumes that your Action class (I'm assuming you don't mean javax.swing.Action, right?) implements Runnable and also has a reasonable implementation of equals() and hashCode(). Otherwise, you may need to use a different Map implementation (for example, IdentityHashMap).
Also, this assumes that you may have multiple different actions that you want to execute only once. If that's not the case, then you can drop the Map entirely and do something like this:
private final ExecutorService executor = Executors.newCachedThreadPool();
private final Object lock = new Object();
private volatile Runnable action;
private volatile Future<?> future = null;
public void executeOnlyOnce(Runnable action) throws InterruptedException, ExecutionException {
synchronized(lock) {
if(this.action == null) {
this.action = action;
this.future = executor.submit(action);
} else if(!this.action.equals(action)) {
throw new IllegalArgumentException("Unexpected action");
}
}
future.get();
}

public synchronized void foo()
{
...
}
is equivalent to
public void foo()
{
synchronized(this)
{
...
}
}
so either of the two options should work. I personally like the synchronized method option.

Synchronizing the whole method can sometimes be overkill if only a certain part of the code deals with shared data (for example, a common variable that each thread is updating).
The best approach for performance is to use the synchronized keyword only around the code that touches shared data. If you synchronize the whole method when it is not strictly necessary, many threads will be waiting when they could still be doing work within their own local scope.
When a thread enters a synchronized block it acquires a lock (if you use this, it locks on the object itself), and the other threads wait until the lock-holding thread has exited. You don't actually need a notify() call in this situation, as a thread releases the lock when it exits the synchronized block.
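Putting the pieces from the answers together, a run-once method needs little more than a synchronized method and a flag. This is a sketch that assumes the Action can be treated as a Runnable; RunOnceSketch and demo() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RunOnceSketch {
    private boolean done = false; // guarded by the lock on "this"

    // The first caller runs the action; later callers wait for the lock, then skip it.
    public synchronized void runOnce(Runnable action) {
        if (!done) {
            action.run();
            done = true;
        }
    }

    // Starts several threads that all call runOnce and returns how often the action ran.
    static int demo(int threads) {
        RunOnceSketch gate = new RunOnceSketch();
        AtomicInteger runs = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> gate.runOnce(runs::incrementAndGet));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return runs.get();
    }
}
```

No notify() is involved: threads that arrive while the action is running simply block on the monitor and proceed once it is released.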

Thread that can restart based on a condition

The basic idea is that I have a native function I want to call in a background thread with a user selected value and the thread cannot be interrupted when started. If the user decides to change the value used to perform the task while the thread is running (they can do this from a GUI), the thread should finish its task with the previous value and then restart with the new value. When the task is done and the value hasn't changed, the thread should end and call a callback function.
This is what my current code looks like for the thread starting part:
volatile int taskValue;
volatile boolean threadShouldRestart;
void setTaskValue(int value)
{
taskValue = value;
synchronized (threadShouldRestart)
{
if task thread is already running
threadShouldRestart = true
else
{
threadShouldRestart = false
create and start new thread
}
}
}
And the actual work thread looks like this:
while (true)
{
nativeFunctionCall(taskValue);
synchronized (threadShouldRestart)
{
if (!threadShouldRestart)
{
invokeTaskCompletedCallbackFunction();
return;
}
}
}
I'm locking on the "threadShouldRestart" part because e.g. I don't want this changing to true just as the thread decides it's done which means the thread wouldn't restart when it was meant to.
Are there any cleaner ways to do this or Java utility classes I could be using?

You could design your run() method as follows:
public void run() {
int currentTaskValue;
do {
currentTaskValue = taskValue;
// perform the work...
} while (currentTaskValue != taskValue);
}
I think the volatile declaration on taskValue is enough for this, since reads and writes of primitives no larger than 32 bits are atomic.
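The do/while idea can be tried out in isolation as follows. This is only a sketch: RestartingWorkerSketch, the IntConsumer standing in for the native call, and the demo that changes the value during the first pass are all assumptions. It also does not address the separate race in the question between starting a new thread and an old one finishing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

class RestartingWorkerSketch implements Runnable {
    private volatile int taskValue;
    private final IntConsumer work; // stand-in for nativeFunctionCall(value)

    RestartingWorkerSketch(int initialValue, IntConsumer work) {
        this.taskValue = initialValue;
        this.work = work;
    }

    void setTaskValue(int value) {
        taskValue = value;
    }

    @Override
    public void run() {
        int currentTaskValue;
        do {
            currentTaskValue = taskValue;  // snapshot the value for this pass
            work.accept(currentTaskValue); // runs to completion with that value
        } while (currentTaskValue != taskValue); // repeat if it changed meanwhile
    }

    // Simulates the user changing the value from 7 to 9 while the first pass runs.
    static List<Integer> demo() {
        List<Integer> seen = new ArrayList<>();
        RestartingWorkerSketch[] holder = new RestartingWorkerSketch[1];
        holder[0] = new RestartingWorkerSketch(7, value -> {
            seen.add(value);
            if (value == 7) {
                holder[0].setTaskValue(9);
            }
        });
        holder[0].run();
        return seen;
    }
}
```

Here demo() returns [7, 9]: the first pass finishes with the old value, and the loop runs once more with the new one.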

Have you considered a ThreadPoolExecutor? It seems to lend itself well to your problem as you mentioned you have no need to restart or stop a thread which has already started.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
A user could submit as many tasks as they like to a task queue; the tasks will be processed concurrently by the number of worker threads you define in the ThreadPoolExecutor constructor.
