Java + Threads: processing lines in parallel

I want to process a large number of independent lines in parallel. In the following code I'm creating a pool of NUM_THREAD Threads, each containing POOL_SIZE lines.
Each thread is started, and I then wait for each thread using 'join'.
I guess this is bad practice because a finished Thread has to wait for its siblings in the pool.
What would be the correct way to implement this code? Which classes should I use?
Thanks!
class FasterBin extends Thread
{
private List<String> dataRows=new ArrayList<String>();
private Object result=null;
@Override
public void run()
{
for(String s:dataRows)
{
//Process item here (....)
}
}
}
(...)
List<FasterBin> threads=new Vector<FasterBin>();
String line;
Iterator<String> iter=(...);
for(;;)
{
while(threads.size()< NUM_THREAD)
{
FasterBin bin=new FasterBin();
while(
bin.dataRows.size() < POOL_SIZE &&
iter.hasNext()
)
{
nRow++;
bin.dataRows.add(iter.next());
}
if(bin.dataRows.isEmpty()) break;
threads.add(bin);
}
if(threads.isEmpty()) break;
for(FasterBin t:threads)
{
t.start();
}
for(FasterBin t:threads)
{
t.join();
}
for(FasterBin t:threads)
{
save(t.result);// ## do something with the result (save into a db etc...)
}
threads.clear();
}
finally
{
while(!threads.isEmpty())
{
FasterBin b=threads.remove(threads.size()-1);
try {
b.interrupt();
}
catch (Exception e)
{
}
}
}

Do NOT do all this by yourself! It is extremely hard to get this both robust and right.
Instead, rewrite your code to create many Runnables or Callables, and use a suitable ExecutorService to process them with the behaviour you want.
Note that this stays inside the current JVM. If you have more than one JVM available (on multiple machines), I would recommend opening a new question.
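A minimal sketch of that advice, under some assumptions: processLine is a hypothetical stand-in for the "Process item here" step, and the pool and batch sizes play the roles of NUM_THREAD and POOL_SIZE from the question. Each batch becomes one Callable, and the Futures are read back in submission order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLines {
    // Hypothetical per-line work; replace with the real processing.
    static int processLine(String line) {
        return line.length();
    }

    // Submits one Callable per batch of lines and collects the results
    // in the original line order.
    static List<Integer> processAll(List<String> lines, int numThreads, int batchSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<Future<List<Integer>>>();
            for (int start = 0; start < lines.size(); start += batchSize) {
                final List<String> batch =
                        lines.subList(start, Math.min(start + batchSize, lines.size()));
                futures.add(pool.submit(new Callable<List<Integer>>() {
                    @Override
                    public List<Integer> call() {
                        List<Integer> out = new ArrayList<Integer>();
                        for (String s : batch) {
                            out.add(processLine(s));
                        }
                        return out;
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<List<Integer>> f : futures) {
                results.addAll(f.get()); // blocks until that batch is done
            }
            return results;
        } finally {
            pool.shutdown(); // worker threads exit once queued tasks finish
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("a", "bb", "ccc", "dddd", "eeeee");
        System.out.println(processAll(lines, 2, 2));
    }
}
```

Because the pool keeps a queue of pending batches, a worker that finishes early simply picks up the next batch instead of idling until its siblings are done.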

java.util.concurrent.ThreadPoolExecutor.
ThreadPoolExecutor x=new ScheduledThreadPoolExecutor(10);
x.execute(runnable);
See this for an overview: Java API for util.concurrent

Direct use of Threads is actually discouraged. Look at the package java.util.concurrent; there you'll find ThreadPools and Futures, which should be used instead.
Thread.join doesn't mean that the Thread waits for the others; it means your main Thread waits for one of the Threads in the list to die. In this case your main Thread waits for the slowest Thread to finish. I don't see a problem with this approach.

Yes, in some sense a finished Thread has to wait for its siblings in the pool: when a thread finishes, it stops and does not help the other threads finish sooner. Put better, the whole job waits for the thread that runs the longest.
This is because each thread has exactly one task. It is better to create many tasks, many more than the number of threads, and put them all in a single queue. Let all worker threads take their tasks from that queue in a loop. Then the difference in finishing time between threads is roughly the time to execute one task, which is small because the tasks are small.
You can start the pool of worker threads yourself, or you can wrap each task in a Runnable and submit them to a standard thread pool; it makes no difference.

Related

How to share the variable between two threads in java?

I have a loop that does this:
WorkTask wt = new WorkTask();
wt.count = count;
Thread a = new Thread(wt);
a.start();
When the WorkTask runs, it increments count (count++),
but the WorkTask doesn't seem to change the count, and the variable isn't shared between the two threads. What did I write wrong? Thanks.

Without seeing the code for WorkTask it's hard to pin down the problem, but most likely you are missing synchronization between the two threads.
Whenever you start a thread, there are no guarantees on whether the original thread or the newly created thread runs first, or how they are scheduled. The JVM/operating system could choose to run the original thread to completion and then start running the newly created thread, run the newly created thread to completion and then switch back to the original thread, or anything in between.
In order to control how the threads run, you have to synchronize them explicitly. There are several ways to control the interaction between threads, certainly too many to describe in a single answer. I would recommend the concurrency trail of the Java tutorials for a broad overview, but in your specific case the synchronization mechanisms to get you started will probably be Thread.join and the synchronized keyword (one specific use of this keyword is described in the Java tutorials).

Make the count variable static (it looks like each thread has its own copy of the variable right now) and use a mutex to make it thread safe (i.e. use the synchronized keyword).
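A minimal sketch of a mutex-protected counter. As a variation on the suggestion above, it shares one counter object between the threads instead of using a static field; the class and method names are made up for illustration.

```java
class SharedCounter {
    private final Object lock = new Object(); // private mutex guarding count
    private int count;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts the given number of threads, each incrementing the same
    // counter, waits for all of them, and returns the final value.
    static int runDemo(int threads, final int incrementsPerThread)
            throws InterruptedException {
        final SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < incrementsPerThread; j++) {
                        counter.increment();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait so the final read sees every increment
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(4, 1000));
    }
}
```

Without the synchronized blocks, concurrent count++ operations could overwrite each other and the total could come out lower than expected.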

From your description I came up with the following to demonstrate what I perceive as your issue. This code should output 42, but it usually outputs 41.
public class Test {
static class WorkTask implements Runnable {
static int count;
@Override
public void run() {
count++;
}
}
public static void main(String... args) throws Exception {
WorkTask wt = new WorkTask();
wt.count = 41;
Thread a = new Thread(wt);
a.start();
System.out.println(wt.count);
}
}
The problem is that the print statement runs before the thread has had a chance to start.
To make the current thread (the thread that is going to read the variable count) wait until the other thread finishes, add the following after starting the thread:
a.join();

If you want to get a result back from a thread, I would recommend using the Callable
interface and an ExecutorService to submit it, e.g.:
Future<Integer> future = Executors.newCachedThreadPool().submit
(new Callable<Integer>()
{
int count = 1000;
@Override public Integer call() throws Exception
{
//here go the operations you want to be executed concurrently.
return count + 1; //Or whatever the result is.
}
});
//Here go the operations you need before the other thread is done.
System.out.println(future.get()); //Here you retrieve the result from
//the other thread. If the result is not ready yet, the main thread
//(current thread) will wait for it to finish.
This way you don't have to deal with the synchronization problems yourself.
You can read more about this in the Java documentation:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-summary.html

Does Java notify waiting threads implicitly?

I wrote a test app that should never stop. It issues t.wait() (t is a Thread object), but I never call notify. Why does this code end?
Despite the main thread synchronizing on t, the spawned thread runs, so it doesn't lock this object.
public class ThreadWait {
public static void main(String sArgs[]) throws InterruptedException {
System.out.println("hello");
Thread t = new MyThread();
synchronized (t) {
t.start();
Thread.sleep(5000);
t.wait();
java.lang.System.out.println("main done");
}
}
}
class MyThread extends Thread {
public void run() {
for (int i = 1; i <= 5; i++) {
java.lang.System.out.println("" + i);
try {
Thread.sleep(500);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
}
The result is that the main thread waits 5 seconds, and during this time the worker prints its output. Once the 5 seconds are over, the program exits. t.wait() does not wait. If the main thread didn't sleep for 5 seconds (i.e. with that line commented out), then t.wait() would actually wait until the worker finishes. Of course, join() is the method to use here, but, unexpectedly, wait() does the same thing as join(). Why?
Maybe the JVM sees that, since only one thread is running, there is no chance to notify the main thread and solves the deadlock. If this is true, is it a documented feature?
I'm testing on Windows XP, Java 6.

You're waiting on a Thread - and while most objects aren't implicitly notified, a Thread object is notified when the thread terminates. It's documented somewhere (I'm looking for it...) that you should not use wait/notify on Thread objects, as that's done internally.
This is a good example of why it's best practice to use a "private" object for synchronization (and wait/notify) - something which only your code knows about. I usually use something like:
private final Object lock = new Object();
(In general, however, it's cleaner to use some of the higher-level abstractions provided by java.util.concurrent if you can. As noted in comments, it's also a good idea to implement Runnable rather than extending Thread yourself.)

The JavaDoc for wait gives the answer: spurious wakeups are possible. This means the JVM is free to end a call to wait whenever it wants.
The documentation even gives you a solution if you don't want this (which is probably always the case): put the call to wait in a loop and check whether the condition you wait for has become true after every wakeup.
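A minimal sketch that combines the two suggestions above: a private lock object that only this code knows about, and a wait inside a loop that rechecks the condition after every wakeup. The class name and the done flag are assumptions made for illustration.

```java
class DoneSignal {
    private final Object lock = new Object(); // private: nobody else can notify it
    private boolean done = false;             // the condition being waited for

    void markDone() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    void awaitDone() throws InterruptedException {
        synchronized (lock) {
            while (!done) {   // loop guards against spurious wakeups
                lock.wait();
            }
        }
    }

    // Starts a worker, waits for its signal, and returns a log of the order.
    static String runDemo() throws InterruptedException {
        final DoneSignal signal = new DoneSignal();
        final StringBuilder log = new StringBuilder();
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                log.append("work;"); // written before the signal is raised
                signal.markDone();
            }
        });
        worker.start();
        signal.awaitDone();          // returns only after markDone() has run
        log.append("main done");
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Since the flag is checked before the first wait, the main thread does not block at all if the worker has already finished.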

How to wait for the start of a thread in java

I have a little race condition in my current Android instrumentation test. What I want is:
T1: Start Thread T2
T2: Do something
T1: Join with T2
Steps 1 and 3 are Android lifecycle events. But because everything happens very fast in the instrumentation test, I get:
T1: Start Thread T2
T1: Join with T2 (which turns out to be a no-op)
T2: Do something
Sure, I could add a few sleeps to get the desired behaviour, but I wonder if there is a better way to do it, i.e. is there a way to make sure the thread which was just start()-ed has actually started and is not still sitting in some scheduling queue awaiting start-up?
(And boy, do I miss Ada's rendezvous-based multitasking.)
And to answer mat's question:
if (this.thread != null && this.thread.isAlive ())
{
this.stop.set (true);
try
{
this.thread.join (1000);
}
catch (final InterruptedException Exception)
{
android.util.Log.w (Actor.TAG, "Thread did not want to join.", Exception);
} // try
} // if
As I said: this is a no-op because the thread has not started yet.

I typically use a CountDownLatch e.g. see this answer on testing asynchronous processes.
If you want to synchronise the starting of many threads you can also use a CyclicBarrier.
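A minimal sketch of the CountDownLatch idea, assuming a stop flag like the one in the question. T2 counts the latch down as its first action, so T1 can wait until T2 is really running before asking it to stop. The names and the 5 second start-up limit are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class StartLatchDemo {
    // Returns true if T2 did its work and then stopped when asked.
    static boolean runDemo() throws InterruptedException {
        final CountDownLatch started = new CountDownLatch(1);
        final AtomicBoolean stop = new AtomicBoolean(false);
        final AtomicBoolean didWork = new AtomicBoolean(false);

        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                started.countDown();   // tell T1 that T2 is now running
                didWork.set(true);     // placeholder for "Do something"
                while (!stop.get()) {  // keep going until asked to stop
                    Thread.yield();
                }
            }
        });

        t2.start();
        // Block until T2 has really started instead of sleeping and hoping.
        if (!started.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("T2 did not start in time");
        }
        stop.set(true);
        t2.join(1000);
        return didWork.get() && !t2.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```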

Martin, looking at your code I get the feeling that you may not be using the Thread class the way it was designed to be used. In particular, testing whether the other thread is alive seems like an anti-pattern. In most practical scenarios you can omit the this.thread.isAlive () condition from your code, and the program will still work.
It seems that you're trying to make two threads (that should do two different things) run the same piece of code, and you use logical conditions (such as this.thread != null) to decide which of the two threads is currently running.
Typically, you'd write two classes each one extending Thread and implementing a run() method. Each run() method realizes the logic of a single thread. Then you'd launch the 2nd thread from the first, and call join() on that 2nd thread to wait for it to complete.
public class SecondThread extends Thread {
public void run() {
...
}
}
public class FirstThread extends Thread {
public void run() {
// Only FirstThread is running
...
SecondThread st = new SecondThread();
st.start();
// Now both threads are running
...
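try {
st.join(); // Wait for SecondThread to complete
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt flag
}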
// Only FirstThread is running
...
}
}

killing an infinite loop in java

I am using a third-party library to process a large number of data sets. The process very occasionally goes into an infinite loop (or is blocked - don't know why and can't get into the code). I'd like to kill this after a set time and continue to the next case. A simple example is:
for (Object data : dataList) {
Object result = TheirLibrary.processData(data);
store(result);
}
processData normally takes 1 second max. I'd like to set a timer which kills processData() after, say, 10 seconds.
EDIT
I would appreciate a code snippet (I am not practiced in using Threads). The Executor approach looks useful but I don't quite know how to start. Also the pseudocode for the more conventional approach is too general for me to code.
@Steven Schlansker suggests that unless the third-party app anticipates the interrupt, it won't work. Again, detail and examples would be appreciated.
EDIT
I got the precise solution I wanted from my colleague Sam Adams, which I am appending as an answer. It has more detail than the other answers, but I will give them both a vote. I'll mark Sam's as the accepted answer.

One of the ExecutorService.invokeAll(...) methods takes a timeout argument. Create a single Callable that calls the library, and wrap it in a List as the argument to that method. The Futures returned indicate how it went.
(Note: untested by me)
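A minimal sketch of the invokeAll approach, with a hypothetical helper named callWithTimeout. When the timeout expires, invokeAll cancels the unfinished task, which interrupts its thread. As noted in the question, code that ignores interrupts keeps running in the background even though the caller has moved on.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TimedCall {
    // Runs one task with a time limit. Returns the task's result, or
    // null if the task was cancelled because it did not finish in time.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Future<T>> futures = pool.invokeAll(
                    Collections.singletonList(task), timeoutMs, TimeUnit.MILLISECONDS);
            Future<T> f = futures.get(0);
            if (f.isCancelled()) {
                return null;      // timed out
            }
            return f.get();       // throws ExecutionException if the task threw
        } finally {
            pool.shutdownNow();   // interrupt anything still running
        }
    }

    public static void main(String[] args) throws Exception {
        Integer result = callWithTimeout(new Callable<Integer>() {
            @Override
            public Integer call() {
                return 42;
            }
        }, 1000);
        System.out.println(result);
    }
}
```

In the loop from the question, each call to TheirLibrary.processData(data) would be wrapped in such a Callable, and a null result would mean "skip this data set and continue".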

Put the call to the library in another thread and kill this thread after a timeout. That way you could also process multiple objects at the same time if they do not depend on each other.
EDIT: Demo code request
This is pseudocode, so you have to improve and extend it. Error checking on whether a call was successful will also help.
for (Object data : dataList) {
Thread t = new LibThread(data);
// store the thread somewhere with an id
// tid and starting time tstart
// threads
t.start();
}
while(!all threads finished)
{
for (Thread t : threads)
{
// get start time of thread
// and check the timeout
if (runtime > timeout)
{
t.stop();
}
}
}
class LibThread extends Thread {
private final Object data;
public LibThread(Object data)
{
this.data = data;
}
// Thread.start() invokes run(), so the library call must live here.
@Override
public void run()
{
Object result = TheirLibrary.processData(data);
store(result);
}
}

Sam Adams sent me the following answer, which is my accepted one
Thread thread = new Thread(myRunnableCode);
thread.start();
thread.join(timeoutMs);
if (thread.isAlive()) {
thread.interrupt();
}
and myRunnableCode regularly checks Thread.currentThread().isInterrupted(), and exits cleanly if this returns true.
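A minimal sketch of the first variant, assuming the work can be split into small steps so the interrupt flag is checked often. The worker class and its exitedCleanly flag are made up for illustration. This only helps when the running code cooperates; a third-party call that never checks the flag will not stop.

```java
class InterruptibleWorker implements Runnable {
    private volatile boolean exitedCleanly = false;

    @Override
    public void run() {
        // Check the interrupt flag between small units of work.
        while (!Thread.currentThread().isInterrupted()) {
            Thread.yield(); // placeholder for one unit of real work
        }
        exitedCleanly = true; // cleanup would go here
    }

    boolean exitedCleanly() {
        return exitedCleanly;
    }

    // Runs the worker, interrupts it after the timeout, and reports
    // whether it left its loop cleanly.
    static boolean runWithTimeout(long timeoutMs) throws InterruptedException {
        InterruptibleWorker worker = new InterruptibleWorker();
        Thread thread = new Thread(worker);
        thread.start();
        thread.join(timeoutMs);      // wait at most timeoutMs
        if (thread.isAlive()) {
            thread.interrupt();      // ask the worker to stop
            thread.join();           // wait for it to finish
        }
        return worker.exitedCleanly();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithTimeout(100));
    }
}
```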
Alternatively you can do:
Thread thread = new Thread(myRunnableCode);
thread.start();
thread.join(timeoutMs);
if (thread.isAlive()) {
thread.stop();
}
But this method has been deprecated since it is DANGEROUS.
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Thread.html#stop()
"This method is inherently unsafe. Stopping a thread with Thread.stop causes it to unlock all of the monitors that it has locked (as a natural consequence of the unchecked ThreadDeath exception propagating up the stack). If any of the objects previously protected by these monitors were in an inconsistent state, the damaged objects become visible to other threads, potentially resulting in arbitrary behavior."
I've implemented the second and it does what I want at present.

Is Future.get() a replacement for Thread.join()?

I want to write a command line daemon that runs forever. I understand that if I want the JVM to be able to shut down gracefully on Linux, one needs to wrap the bootstrap via some C code. I think I'll be OK with a shutdown hook for now.
On to my questions:
My main(String[]) block will fire off a separate Superdaemon.
The Superdaemon will poll and loop forever.
So normally I would do:
class Superdaemon extends Thread { ... }
class Bootstrap
{
public static void main( String[] args )
{
Thread t = new Superdaemon();
t.start();
t.join();
}
}
Now I figured that if I started Superdaemon via an Executor, I can do
Future<?> f = exec.submit( new Superdaemon() );
f.get();
Is Future.get() implemented with Thread.join() ?
If not, does it behave equivalently ?
Regards,
ashitaka

Yes, the way you've written these is equivalent.
However, you don't really need to wait for the Superdaemon thread to complete. When the main thread finishes executing main(), that thread exits, but the JVM will not. The JVM will keep running until the last non-daemon thread exits its run method.
For example,
public class KeepRunning {
public static void main(String[] args) {
Superdaemon d = new Superdaemon();
d.start();
System.out.println(Thread.currentThread().getName() + ": leaving main()");
}
}
class Superdaemon extends Thread {
public void run() {
System.out.println(Thread.currentThread().getName() + ": starting");
try { Thread.sleep(2000); } catch(InterruptedException e) {}
System.out.println(Thread.currentThread().getName() + ": completing");
}
}
You'll see the output:
main: leaving main()
Thread-0: starting
Thread-0: completing
In other words, the main thread finishes first, then the secondary thread completes and the JVM exits.

The issue is that books like JCIP advocate using Executors to start Threads. So I'm trying my best not to use Thread.start(). I'm not sure I would necessarily choose a particular way of doing things based on simplicity alone. There must be a more convincing reason, no?
The convincing reason to use java.util.concurrent is that multi-threaded programming is very tricky. Java offers the tools for it (Threads, the synchronized and volatile keywords), but that does not mean you can safely use them directly without shooting yourself in the foot: either too much synchronization, resulting in unnecessary bottlenecks and deadlocks, or too little, resulting in erratic behaviour due to race conditions.
With java.util.concurrent you get a set of utilities (written by experts) for the most common usage patterns, that you can just use without worrying that you got the low-level stuff right.
In your particular case, though, I do not quite see why you need a separate Thread at all, you might as well use the main one:
public static void main( String[] args )
{
Runnable t = new Superdaemon();
t.run();
}
Executors are meant for tasks that you want to run in the background (when you have multiple parallel tasks or when your current thread can continue to do something else).

Future.get() will get the future response from an asynchronous call. This will also block if the call has not been completed yet. It is much like a thread join.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Future.html

Sort of. Future.get() is for having a thread go off and calculate something and then return it to the calling thread in a safe fashion. It would work if the get never returned. But I'd stick with the join call, as it's simpler and has no Executor overhead (not that there would be all that much).
Edit
It looks like ExecutorService.submit(Runnable) is intended to do exactly what you're attempting. The get() of the returned Future just returns null when the Runnable completes. Interesting.
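A minimal sketch of that behaviour, with a hypothetical helper named runAndWait. Future.get() blocks until the Runnable has finished, much like Thread.join(), and then returns null.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitRunnableDemo {
    // Submits the task, blocks until it completes, and returns whatever
    // Future.get() yields (null for a plain Runnable).
    static Object runAndWait(Runnable task) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = exec.submit(task);
            return f.get();   // blocks like join(); throws if the task threw
        } finally {
            exec.shutdown();  // otherwise the pool thread keeps the JVM alive
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = runAndWait(new Runnable() {
            @Override
            public void run() {
                System.out.println("daemon loop would run here");
            }
        });
        System.out.println(result);
    }
}
```

One difference from join(): if the Runnable throws, get() rethrows the failure wrapped in an ExecutionException, whereas join() simply returns.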
