Starting thread in a servlet, what can be the issues?

Starting thread in a servlet, what can be the issues? - java

I have a web application, that, on a single request may require to load hundreds of data. Now the problem is that data is scattered. So, I have to load data from several places, apply filters on them, process them and then respond. Performing all these operations sequentially makes servlet slow!
So I have thought of loading all the data in separate threads like t[i] = new Thread(loadData).start();, waiting for all threads to finish using while(i < count) t[i].join(); and when done, join the data and respond.
Now I am not sure if this approach is right or there is some better method. I have read somewhere is that spawning thread in servlets is not advisable.
My desired code will look something like this.
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
{
Iterable<?> requireddata = requiredData(request);
Thread[] t = new Thread[requireddata.size];
int i = 0;
while (requireddata.hasNext())
{
t[i] = new Thread(new loadData(requiredata.next())).start();
i++;
}
for(i = 0 ; i < t.length ; i++)
t[i].join();
// after getting the data process and respond!
}

The main problem is that you'll bring the server to its knees if many concurrent requests comes in for your servlet, because you don't limit the number of threads that can be spawned. Another problem is that you keep creating new threads instead of reusing them, which is inefficient.
These two problems are solved easily by using a thread pool. And Java has native support for them. Read the tutorial.
Also, make sure to shutdown the thread pool when the webapp is shut down, using a ServletContextListener.

Sounds like a problem for the CyclicBarrier.
For example:
ExecutorService executor = Executors.newFixedThreadPool(requireddata.size);
public void executeAllAndAwaitCompletion(List<? extends T> threads){
final CyclicBarrier barrier = new CyclicBarrier(threads.size() + 1);
for(final T thread : threads){
executor.submit(new Runnable(){
public void run(){
//it is not a mistake to call run() here
thread.run();
barrier.await();
}
});
}
barrier.await();
}
The last thread from threads will be excuted once the all others finish.
Instead of calling Executors.newFixedThreadPool(requireddata.size);, it is better to reuse some existing thread pool.

You may consider using Executor framework from java.util.concurrent api. For example you can create your computation task as Callable and then submit that task to a ThreadPoolExecutor. Sample code from Java Concurrency in Practice:-
public class Renderer {
private final ExecutorService executor;
Renderer(ExecutorService executor) { this.executor = executor; }
void renderPage(CharSequence source) {
final List<ImageInfo> info = scanForImageInfo(source);
CompletionService<ImageData> completionService =
new ExecutorCompletionService<ImageData>(executor);
for (final ImageInfo imageInfo : info)
completionService.submit(new Callable<ImageData>() {
public ImageData call() {
return imageInfo.downloadImage();
}
});
renderText(source);
try {
for (int t = 0, n = info.size(); t < n; t++) {
Future<ImageData> f = completionService.take();
ImageData imageData = f.get();
renderImage(imageData);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (ExecutionException e) {
throw launderThrowable(e.getCause());
}
}
}

Since you are waiting for all the threads to complete and then you are providing the response, IMO multiple threads won't help if you are using just CPU cycles. It will only increase the response time by adding the context switch delay in the threads. A single thread will be better. However if network/IO etc are involved you can make use of thread pool.
But you would like to re-consider your approach. Processing huge amount of data synchronously in a http request is not advisable. Will not be a good experience for the end user. What you can do is start a thread to process the data and provide a response saying "It is processing". You can provide the web user with some kind gesture to check the status whenever he wants.

Related

Execution of Tasks in ExecutorService without Thread pauses

I have a thread pool with 8 threads
private static final ExecutorService SERVICE = Executors.newFixedThreadPool(8);
My mechanism emulating the work of 100 user (100 Tasks):
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) { // Number of users == 100
callableTasks.add(new Task(client));
}
SERVICE.invokeAll(callableTasks);
SERVICE.shutdown();
The user performs the Task of generating a document.
Get UUID of Task;
Get Task status every 10 seconds;
If Task is ready get document.
public class Task implements Callable<Boolean> {
private final ReportClient client;
public Task(ReportClient client) {
this.client = client;
}
#Override
public Boolean call() {
final var uuid = client.createDocument(documentId);
GetStatusResponse status = null;
do {
try {
Thread.sleep(10000); // This stop current thread, but not a Task!!!!
} catch (InterruptedException e) {
return Boolean.FALSE;
}
status = client.getStatus(uuid);
} while (Status.PENDING.equals(status.status()));
final var document = client.getReport(uuid);
return Boolean.TRUE;
}
}
I want to give the idle time (10 seconds) to another task. But when the command Thread.sleep(10000); is called, the current thread suspends its execution. First 8 Tasks are suspended and 92 Tasks are pending 10 seconds. How can I do 100 Tasks in progress at the same time?

The Answer by Yevgeniy looks correct, regarding Java today. You want to have your cake and eat it too, in that you want a thread to sleep before repeating a task but you also want that thread to do other work. That is not possible today, but may be in the future.
Project Loom
In current Java, a Java thread is mapped directly to a host OS thread. In all common OSes such as macOS, BSD, Linux, Windows, and such, when code executing in a host thread blocks (stops to wait for sleep, or storage I/O, or network I/O, etc.) the thread too blocks. The blocked thread suspends, and the host OS generally runs another thread on that otherwise unused core. But the crucial point is that the suspended thread performs no further work until your blocking call to sleep returns.
This picture may change in the not-so-distant future. Project Loom seeks to add virtual threads to the concurrency facilities in Java.
In this new technology, many Java virtual threads are mapped to each host OS thread. Juggling the many Java virtual threads is managed by the JVM rather than by the OS. When the JVM detects a virtual thread’s executing code is blocking, that virtual thread is "parked", set aside by the JVM, with another virtual thread swapped out for execution on that "real" host OS thread. When the other thread returns from its blocking call, it can be reassigned to a "real" host OS thread for further execution. Under Project Loom, the host OS threads are kept busy, never idled while any pending virtual thread has work to do.
This swapping between virtual threads is highly efficient, so that thousands, even millions, of threads can be running at a time on conventional computer hardware.
Using virtual threads, your code will indeed work as you had hoped: A blocking call in Java will not block the host OS thread. But virtual threads are experimental, still in development, scheduled as a preview feature in Java 19. Early-access builds of Java 19 with Loom technology included are available now for you to try. But for production deployment today, you'll need to follow the advice in the Answer by Yevgeniy.
Take my coverage here with a grain of salt, as I am not an expert on concurrency. You can hear it from the actual experts, in the articles, interviews, and presentations by members of the Project Loom team including Ron Pressler and Alan Bateman.

EDIT: I just posted this answer and realized that you seem to be using that code to emulate real user interactions with some system. I would strongly recommend just using a load testing utility for that, rather than trying to come up with your own. However, in that case just using a CachedThreadPool might do the trick, although probably not a very robust or scalable solution.
Thread.sleep() behavior here is working as intended: it suspends the thread to let the CPU execute other threads.
Note that in this state a thread can be interrupted for a number of reasons unrelated to your code, and in that case your Task returns false: I'm assuming you actually have some retry logic down the line.
So you want two mutually exclusive things: on the one hand, if the document isn't ready, the thread should be free to do something else, but should somehow return and check that document's status again in 10 seconds.
That means you have to choose:
You definitely need that once-every-10-seconds check for each document - in that case, maybe use a cachedThreadPool and have it generate as many threads as necessary, just keep in mind that you'll carry the overhead for numerous threads doing virtually nothing.
Or, you can first initiate that asynchronous document creation process and then only check for status in your callables, retrying as needed.
Something like:
public class Task implements Callable<Boolean> {
private final ReportClient client;
private final UUID uuid;
// all args constructor omitted for brevity
#Override
public Boolean call() {
GetStatusResponse status = client.getStatus(uuid);
if (Status.PENDING.equals(status.status())) {
final var document = client.getReport(uuid);
return Boolean.TRUE;
} else {
return Boolean.FALSE; //retry next time
}
}
}
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) {
var uuid = client.createDocument(documentId); //not sure where documentId comes from here in your code
callableTasks.add(new Task(client, uuid));
}
List<Future<Boolean>> results = SERVICE.invokeAll(callableTasks);
// retry logic until all results come back as `true` here
This assumes that createDocument is relatively efficient, but that stage can be parallelized just as well, you just need to use a separate list of Runnable tasks and invoke them using the executor service.
Note that we also assume that the document's status will indeed eventually change to something other than PENDING, and that might very well not be the case. You might want to have a timeout for retries.

In your case, it seems like you need to check if a certain condition is met every x seconds. In fact, from your code the document generation seems asynchronous and what the Task keeps doing after that is just is waiting for the document generation to happen.
You could launch every document generation from your Thread-Main and use a ScheduledThreadPoolExecutor to verify every x seconds whether the document generation has been completed. At that point, you retrieve the result and cancel the corresponding Task's scheduling.
Basically, one ConcurrentHashMap is shared among the thread-main and the Tasks you've scheduled (mapRes), while the other, mapTask, is just used locally within the thread-main to keep track of the ScheduledFuture returned by every Task.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(8);
//ConcurrentHashMap shared among the submitted tasks where each Task updates its corresponding outcome to true as soon as the document has been produced
ConcurrentHashMap<Integer, Boolean> mapRes = new ConcurrentHashMap<>();
for (int i = 0; i < 100; i++) {
mapRes.put(i, false);
}
String uuid;
ScheduledFuture<?> schedFut;
//HashMap containing the ScheduledFuture returned by scheduling each Task to cancel their repetition as soon as the document has been produced
Map<String, ScheduledFuture<?>> mapTask = new HashMap<>();
for (int i = 0; i < 100; i++) {
//Starting the document generation from the thread-main
uuid = client.createDocument(documentId);
//Scheduling each Task 10 seconds apart from one another and with an initial delay of i*10 to not start all of them at the same time
schedFut = pool.scheduleWithFixedDelay(new Task(client, uuid, mapRes), i * 10, 10000, TimeUnit.MILLISECONDS);
//Adding the ScheduledFuture to the map
mapTask.put(uuid, schedFut);
}
//Keep checking the outcome of each task until all of them have been canceled due to completion
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (Integer key : mapTasks.keySet()) {
//Canceling the i-th task scheduling if:
// - Its result is positive (i.e. its verification is terminated)
// - The task hasn't been canceled already
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
//... eventually adding a sleep to check the completion every x seconds ...
}
pool.shutdown();
}
}
class Task implements Runnable {
private final ReportClient client;
private final String uuid;
private final ConcurrentHashMap mapRes;
public Task(ReportClient client, String uuid, ConcurrentHashMap mapRes) {
this.client = client;
this.uuid = uuid;
this.mapRes = mapRes;
}
#Override
public void run() {
//This is taken form your code and I'm assuming that if it's not pending then it's completed
if (!Status.PENDING.equals(client.getStatus(uuid).status())) {
mapRes.replace(uuid, true);
}
}
}
I've tested your case locally, by emulating a scenario where n Tasks wait for a folder with their same id to be created (or uuid in your case). I'll post it right here as a sample in case you'd like to try something simpler first.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(2);
ConcurrentHashMap<Integer, Boolean> mapRes = new ConcurrentHashMap<>();
for (int i = 0; i < 16; i++) {
mapRes.put(i, false);
}
ScheduledFuture<?> schedFut;
Map<Integer, ScheduledFuture<?>> mapTasks = new HashMap<>();
for (int i = 0; i < 16; i++) {
schedFut = pool.scheduleWithFixedDelay(new MyTask(i, mapRes), i * 20, 3000, TimeUnit.MILLISECONDS);
mapTasks.put(i, schedFut);
}
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (Integer key : mapTasks.keySet()) {
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
}
pool.shutdown();
}
}
class MyTask implements Runnable {
private int num;
private ConcurrentHashMap mapRes;
public MyTask(int num, ConcurrentHashMap mapRes) {
this.num = num;
this.mapRes = mapRes;
}
#Override
public void run() {
System.out.println("Task " + num + " is checking whether the folder exists: " + Files.exists(Path.of("./" + num)));
if (Files.exists(Path.of("./" + num))) {
mapRes.replace(num, true);
}
}
}

Why jvm recreate thread pool in case fixedThreadPool and don't do it in case of cachedThreadPool?

I have the code sample:
public class ThreadPoolTest {
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 100; i++) {
if (test() != 5 * 100) {
throw new RuntimeException("main");
}
}
test();
}
private static long test() throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(100);
CountDownLatch countDownLatch = new CountDownLatch(100 * 5);
Set<Thread> threads = Collections.synchronizedSet(new HashSet<>());
AtomicLong atomicLong = new AtomicLong();
for (int i = 0; i < 5 * 100; i++) {
Thread.sleep(100);
executorService.submit(new Runnable() {
#Override
public void run() {
try {
threads.add(Thread.currentThread());
atomicLong.incrementAndGet();
countDownLatch.countDown();
Thread.sleep(1000);
} catch (Exception e) {
System.out.println(e);
}
}
});
}
executorService.shutdown();
countDownLatch.await();
if (threads.size() != 100) {
throw new RuntimeException("test");
}
return atomicLong.get();
}
}
I especially made application to work long.
And I see jvisualVM.
Each time gap threadpool was recreated.
After several minutes I see:
but if I use newCachedThreadPool instead of newFixedThreadPool I see constant picture:
Can you explain this behaviour?
P.S.
Problem was that exception occures in code and second iteration was not started

To answer your question; just look here:
private static long test() throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(100);
The JVM creates a new ThreadPool during each run of test(), because you tell it to do so.
In other words: if you intend to re-use the same threadpool, then avoid creating/shutting down your instances all the time.
In that sense, the simple fix is: move the creation of that ExecutorService into your main() method; and pass the service as argument to your test() method.
Edit: regarding your last comment on cached vs. fixed threadpool; you probably want to look into this question.

Because you asked it to, in your code ? :) Try moving the Pool creation code outside the test.

From docs:
newFixedThreadPool
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
newCachedThreadPool
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.

Why cannot `ExecutorService` consistently schedule threads?

I am attempting to reimplement my concurrent code using CyclicBarrier which is new to me. I can do without it but am time trialling it against my other solution, the problem I have is a deadlock situation with the following code:
//instance variables (fully initialised elsewhere).
private final ExecutorService exec = Executors.newFixedThreadPool(4);
private ArrayList<IListener> listeners = new ArrayList<IListener>();
private int[] playerIds;
private class WorldUpdater {
final CyclicBarrier barrier1;
final CyclicBarrier barrier2;
volatile boolean anyChange;
List<Callable<Void>> calls = new ArrayList<Callable<Void>>();
class SyncedCallable implements Callable<Void> {
final IListener listener;
private SyncedCallable(IListener listener) {
this.listener = listener;
}
#Override
public Void call() throws Exception {
listener.startUpdate();
if (barrier1.await() == 0) {
anyChange = processCommons();
}
barrier2.await();
listener.endUpdate(anyChange);
return null;
}
}
public WorldUpdater(ArrayList<IListener> listeners, int[] playerIds) {
barrier2 = new CyclicBarrier(listeners.size());
barrier1 = new CyclicBarrier(listeners.size());
for (int i : playerIds)
calls.add(new SyncedCallable(listeners.get(i)));
}
void start(){
try {
exec.invokeAll(calls);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
void someMethodCalledEveryFrame() {
//Calls some Fisher-something method that shuffles int[]
shufflePIDs();
WorldUpdater updater = new WorldUpdater(listeners, playerIds);
updater.start();
}
I use the debugger in Android Studio (intelliJ) to pause execution at this stage. I get multiple threads showing the my await calls as the last of my code to be executed
->Unsafe.park
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->CyclicBarrier.doWait
->CyclicBarrier.await
At least one thread will be have this stack:
->Unsafe.park.
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->LinkedBlockingQueue.take
->ThreadPoolExecutor.getTask
->ThreadPoolExecutor.runWorker
->ThreadPoolExecutor$Worker.run
->Thread.run
I notice that the CyclicBarrier plays no part in these latter stray threads.
processCommons is calling exec.invokeAll (on the 3 listeners), I suppose this means I am running out of threads. But many times this doesn't happen so please could someone clarify why ExecutorService cannot consistently schedule my threads? They have their own stack and program counter so I would have thought this to not be a problem. I only ever have max 4 running at once. Someone help me with the math?

What is the value of listeners.size() when your WorldUpdater is created? If it is more than four, then your threads will never get past the barrier.
Your ExecutorService has exactly four threads. No more, no fewer. The callers of barrier1.await() and barrier2.await() will not get past the barrier until exactly listeners.size() threads are waiting.
My gut reaction is, it would be a mistake for pool threads to use a CyclicBarrier. CyclicBarrier is only useful when you know exactly how many threads will be using it. But, when you're using a thread pool, you often do not know the size of the pool. In fact, in a real-world (i.e., commercial) application, if you're using a thread pool, It probably was not created by your code at all. It probably was created somewhere else, and passed in to your code as an injected dependency.

I did a little experiment and came up with:
#Override
public Void call() throws Exception {
System.out.println("startUpdate, Thread:" + Thread.currentThread());
listener.startUpdate();
if (barrier1.await() == 0) {
System.out.println("processCommons, Thread:" + Thread.currentThread());
anyChange = processCommons();
}
barrier2.await();
System.out.println("endUpdate, Thread:" + Thread.currentThread());
listener.endUpdate(anyChange);
return null;
}
Which revealed when using a pool of 3 with 3 listeners, I will always hang in processCommons which contains the following:
List<Callable<Void>> calls = new ArrayList<Callable<Void>>();
for (IListener listiner : listeners)
calls.add(new CommonsCallable(listener));
try {
exec.invokeAll(calls);
} catch (InterruptedException e) {
e.printStackTrace();
}
With 2 threads waiting at the barrier and the third attempting to create 3 more. I needed one extra thread in the ExecutorService and the 2 at the barrier could be "recycled" as I was asking in my question. I've got references to 6 threads at this stage when exec is only holding 4. This can run happily for many minutes.
private final ExecutorService exec = Executors.newFixedThreadPool(8);
Should be better, but it was not.
Finally I did breakpoint stepping in intelliJ (thanks ideaC!)
The problem is
if (barrier1.await() == 0) {
anyChange = processCommons();
}
barrier2.await();
Between the 2 await you may get several suspended threads that haven't actually reached the await. In the case of 3 listeners out of a pool of 4 it only takes one to get "unscheduled" (or whatever) and barrier2 will never get the full complement. But what about when I have a pool of 8? The same behaviour manifests with all but two of the threads the stack of limbo:
->Unsafe.park.
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->LinkedBlockingQueue.take
->ThreadPoolExecutor.getTask
->ThreadPoolExecutor.runWorker
->ThreadPoolExecutor$Worker.run
->Thread.run
What can be happening here to disable all 5 threads? I should have taken James Large's advice and avoided crowbarring in this over elaborate CyclicBarrier.--UPDATE-- It can run all night now without CyclicBarrier.

Process Large File for HTTP Calls in Java

I have a file with millions of lines in it that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.
I obviously could just read the file and make the calls sequentially, but it would be incredibly slow. I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory (something I'm not a huge fan of) or try to parallelize the reading of the file as well (which I'm not sure would make sense).
Just looking for some thoughts here on the best way to attack the problem. If there is an existing framework or library that does something similar I'm happy to use that as well.
Thanks.

I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory
You should used an ExecutorService with a bounded BlockingQueue. As you read in your million lines you submit jobs to the thread-pool until the BlockingQueue is full. This way you will be able to run 100 (or whatever number is optimal) of HTTP requests simultaneously without having to read all of the lines of the file beforehand.
You'll need to set up a RejectedExecutionHandler that blocks if the queue is full. This is better than a caller runs handler.
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(100);
// NOTE: you want the min and max thread numbers here to be the same value
ThreadPoolExecutor threadPool =
new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
// we need our RejectedExecutionHandler to block if the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
#Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
try {
// this will block the producer until there's room in the queue
executor.getQueue().put(r);
} catch (InterruptedException e) {
throw new RejectedExecutionException(
"Unexpected InterruptedException", e);
}
}
});
// now read in the urls
while ((String url = urlReader.readLine()) != null) {
// submit them to the thread-pool. this may block.
threadPool.submit(new DownloadUrlRunnable(url));
}
// after we submit we have to shutdown the pool
threadPool.shutdown();
// wait for them to complete
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
...
private class DownloadUrlRunnable implements Runnable {
private final String url;
public DownloadUrlRunnable(String url) {
this.url = url;
}
public void run() {
// download the URL
}
}

Gray's approach seems to be good. The other approach I would suggest is to split the files into chunks (you will have to write the logic), and process those with multiple threads.

How to wait for all threads to complete

I created some workflow how to wait for all thread which I created. This example works in 99 % of cases but sometimes method waitForAllDone is finished sooner then all thread are completed. I know it because after waitForAllDone I am closing stream which is using created thread so then occurs exception
Caused by: java.io.IOException: Stream closed
my thread start with:
#Override
public void run() {
try {
process();
} finally {
Factory.close(this);
}
}
closing:
protected static void close(final Client client) {
clientCount--;
}
when I creating thread I call this:
public RobWSClient getClient() {
clientCount++;
return new Client();
}
and clientCount variable inside factory:
private static volatile int clientCount = 0;
wait:
public void waitForAllDone() {
try {
while (clientCount > 0) {
Thread.sleep(10);
}
} catch (InterruptedException e) {
LOG.error("Error", e);
}
}

You need to protect the modification and reading of clientCount via synchronized. The main issue is that clientCount-- and clientCount++ are NOT an atomic operation and therefore two threads could execute clientCount-- / clientCount++ and end up with the wrong result.
Simply using volatile as you do above would ONLY work if ALL operations on the field were atomic. Since they are not, you need to use some locking mechanism. As Anton states, AtomicInteger is an excellent choice here. Note that it should be either final or volatile to ensure it is not thread-local.
That being said, the general rule post Java 1.5 is to use a ExecutorService instead of Threads. Using this in conjuction with Guava's Futures class could make waiting for all to complete to be as simple as:
Future<List<?>> future = Futures.successfulAsList(myFutureList);
future.get();
// all processes are complete
Futures.successfulAsList

I'm not sure that the rest of your your code has no issues, but you can't increment volatile variable like this - clientCount++; Use AtomicInteger instead

The best way to wait for threads to terminate, is to use one of the high-level concurrency facilities.
In this case, the easiest way would be to use an ExecutorService.
You would 'offer' a new task to the executor in this way:
...
ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
...
Client client = getClient(); //assuming Client implements runnable
executor.submit(client);
...
public void waitForAllDone() {
executor.awaitTermination(30, TimeUnit.SECOND) ; wait termination of all threads for 30 secs
...
}
In this way, you don't waste valuable CPU cycles in busy waits or sleep/awake cycles.
See ExecutorService docs for details.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.