Java concurrency counter not properly cleaned up

This is a Java concurrency question. 10 jobs need to be done, and each of them has 32 worker threads. Each worker thread increments a counter. Once the counter reaches 32, the job is done and its entry is removed from the counter map. From the console output, I expect "done" to be printed 10 times, with a pool size of 0 and a countThreadMap size of 0.
The issues are:
Most of the time, "pool size: 0 and countThreadMap size:3" is printed. Even though all threads are gone, 3 jobs are not finished yet.
Sometimes I see a NullPointerException at line 27. I have used ConcurrentHashMap and AtomicLong, so why do I still get a concurrency exception?
Thanks
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;
public class Test {
final ConcurrentHashMap<Long, AtomicLong[]> countThreadMap = new ConcurrentHashMap<Long, AtomicLong[]>();
final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
final ThreadPoolExecutor tPoolExecutor = ((ThreadPoolExecutor) cachedThreadPool);
public void doJob(final Long batchIterationTime) {
for (int i = 0; i < 32; i++) {
Thread workerThread = new Thread(new Runnable() {
@Override
public void run() {
if (countThreadMap.get(batchIterationTime) == null) {
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis()); //start up time
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
System.out.println("done");
countThreadMap.remove(batchIterationTime);
}
}
});
tPoolExecutor.execute(workerThread);
}
}
public void report(){
while(tPoolExecutor.getActiveCount() != 0){
//
}
System.out.println("pool size: "+ tPoolExecutor.getActiveCount() + " and countThreadMap size:"+countThreadMap.size());
}
public static void main(String[] args) throws Exception {
Test test = new Test();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
test.doJob(batchIterationTime);
}
test.report();
System.out.println("All Jobs are done");
}
}

Let’s dig through all the mistakes of thread related programming, one man can make:
Thread workerThread = new Thread(new Runnable() {
…
tPoolExecutor.execute(workerThread);
You create a Thread but don’t start it but submit it to an executor. It’s a historical mistake of the Java API to let Thread implement Runnable for no good reason. Now, every developer should be aware, that there is no reason to treat a Thread as a Runnable. If you don’t want to start a thread manually, don’t create a Thread. Just create the Runnable and pass it to execute or submit.
I want to emphasize the latter as it returns a Future which gives you for free what you are attempting to implement: the information when a task has been finished. It’s even easier when using invokeAll which will submit a bunch of Callables and return when all are done. Since you didn’t tell us anything about your actual task, it’s not clear whether you can let your tasks simply implement Callable (may return null) instead of Runnable.
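A minimal sketch of the invokeAll approach described above, assuming the real work can be expressed as a Callable. The class name InvokeAllSketch, the method runJob, and the placeholder task body are illustrative only and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllSketch {

    /** Runs one job made of workerCount tasks and returns how many completed normally. */
    static int runJob(ExecutorService pool, int workerCount) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            final int id = i;
            tasks.add(() -> id); // placeholder for the real work
        }
        int completed = 0;
        try {
            // invokeAll blocks until every task of this job has finished
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows a task failure as ExecutionException
                completed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        }
        return completed;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int job = 0; job < 10; job++) {
                System.out.println("job " + job + " done, workers completed: " + runJob(pool, 32));
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape no shared counter is needed at all: the return of invokeAll is itself the signal that the job is done.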
If you can’t use Callables or don’t want to wait immediately on submission, you have to remember the returned Futures and query them at a later time:
static final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
public static List<Future<?>> doJob(final Long batchIterationTime) {
final Random r=new Random();
List<Future<?>> list=new ArrayList<>(32);
for (int i = 0; i < 32; i++) {
Runnable job=new Runnable() {
public void run() {
// pretend to do something
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(r.nextInt(10)));
}
};
list.add(cachedThreadPool.submit(job));
}
return list;
}
public static void main(String[] args) throws Exception {
Test test = new Test();
Map<Long,List<Future<?>>> map=new HashMap<>();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
while(map.containsKey(batchIterationTime))
batchIterationTime++;
map.put(batchIterationTime,doJob(batchIterationTime));
}
// print some statistics, if you really need
int overAllDone=0, overallPending=0;
for(Map.Entry<Long,List<Future<?>>> e: map.entrySet()) {
int done=0, pending=0;
for(Future<?> f: e.getValue()) {
if(f.isDone()) done++;
else pending++;
}
System.out.println(e.getKey()+"\t"+done+" done, "+pending+" pending");
overAllDone+=done;
overallPending+=pending;
}
System.out.println("Total\t"+overAllDone+" done, "+overallPending+" pending");
// wait for the completion of all jobs
for(List<Future<?>> l: map.values())
for(Future<?> f: l)
f.get();
System.out.println("All Jobs are done");
}
But note that if you don’t need the ExecutorService for subsequent tasks, it’s much easier to wait for all jobs to complete:
cachedThreadPool.shutdown();
cachedThreadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("All Jobs are done");
But regardless of how unnecessary the manual tracking of the job status is, let’s delve into your attempt, so you may avoid the mistakes in the future:
if (countThreadMap.get(batchIterationTime) == null) {
The ConcurrentMap is thread safe, but this does not turn your concurrent code into sequential code (that would render multi-threading useless). The above line might be processed by up to all 32 threads at the same time, all finding that the key does not exist yet, so possibly more than one thread will then put the initial value into the map.
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis());
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
That’s why this is called the “check-then-act” anti-pattern. If more than one thread is going to process that code, they all will put their new value, being confident that this was the right thing as they have checked the initial condition before acting but for all but one thread the condition has changed when acting and they are overwriting the value of a previous put operation.
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
Since you are modifying the AtomicLong that is already stored in the map, the put operation is useless; it puts back the very array that was retrieved before. Were it not for the mistake described above, which allows multiple initial values, the put operation would have no effect.
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
Again, the use of a ConcurrentMap doesn’t turn the multi-threaded code into sequential code. While it is clear that only the last thread will update the atomic counter to 32 (when the initial race condition doesn’t materialize), it is not guaranteed that all other threads have already passed this if statement. Therefore more than one thread, and up to all of them, can still be at this point of execution and see the value 32. Or…
System.out.println("done");
countThreadMap.remove(batchIterationTime);
One of the threads that has seen the value 32 might execute this remove operation. At this point, there might still be threads that have not executed the above if statement; they will not see the value 32 but will instead throw a NullPointerException, as the array supposed to contain the AtomicLong is not in the map anymore. This is what happens, occasionally…
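If the manual counter is kept anyway, the races described above can be avoided by letting the map and the counter perform each check-and-update as one atomic step. The following is only a sketch under assumptions: the class CounterMapSketch, the method names, and the finishedJobs field are made up for illustration, and every job is assumed to have a unique key (the System.currentTimeMillis() keys in the question can collide when jobs are created within the same millisecond).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class CounterMapSketch {
    static final int WORKERS = 32;

    final ConcurrentHashMap<Long, AtomicLong> counters = new ConcurrentHashMap<>();
    final AtomicInteger finishedJobs = new AtomicInteger();

    /** Called once by each worker of the given job. */
    void workerFinished(Long jobId) {
        // computeIfAbsent creates the counter atomically, so there is one instance per key
        AtomicLong counter = counters.computeIfAbsent(jobId, k -> new AtomicLong());
        // incrementAndGet hands each value 1..32 to exactly one thread, and the local
        // reference means the map is never read again after the entry may be gone
        if (counter.incrementAndGet() == WORKERS) {
            counters.remove(jobId);
            finishedJobs.incrementAndGet();
        }
    }

    /** Submits jobs * WORKERS tasks (hypothetical unique job ids 0..jobs-1) and waits for them. */
    void run(int jobs) {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (long job = 0; job < jobs; job++) {
            final Long jobId = job;
            for (int i = 0; i < WORKERS; i++) {
                pool.execute(() -> workerFinished(jobId));
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CounterMapSketch sketch = new CounterMapSketch();
        sketch.run(10);
        System.out.println("finished jobs: " + sketch.finishedJobs.get()
                + ", counters left: " + sketch.counters.size());
    }
}
```

The key design choice is that the decision "I am the last worker" is taken from the return value of incrementAndGet rather than from a separate read, so exactly one thread per job removes the entry.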

After creating your 10 jobs, your main thread is still running; it doesn't wait for your jobs to complete before it calls report on the test. You try to overcome this with the while loop, but tPoolExecutor.getActiveCount() can return 0 before the worker tasks have started executing, and countThreadMap.size() is then evaluated after some threads have already added entries to your map.
There are a number of ways to fix this - but I will let another answer-er do that because I have to leave at the moment.
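One possible fix, sketched here under assumptions rather than taken from the original answer, is to replace the getActiveCount() polling loop with a CountDownLatch that every worker counts down when it finishes. The class name LatchReportSketch and the method runAndWait are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchReportSketch {

    /** Submits jobs * workersPerJob tasks, waits for all of them, and returns how many ran. */
    static int runAndWait(int jobs, int workersPerJob) {
        int total = jobs * workersPerJob;
        ExecutorService pool = Executors.newCachedThreadPool();
        CountDownLatch allDone = new CountDownLatch(total);
        AtomicInteger ran = new AtomicInteger();
        for (int i = 0; i < total; i++) {
            pool.execute(() -> {
                try {
                    ran.incrementAndGet(); // placeholder for the real work
                } finally {
                    allDone.countDown(); // always count down, even if the work throws
                }
            });
        }
        try {
            // Blocks without spinning. Unlike getActiveCount(), the latch cannot
            // report "nothing running" before the tasks have even started.
            if (!allDone.await(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for workers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runAndWait(10, 32));
    }
}
```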

Related

Execution of Tasks in ExecutorService without Thread pauses

I have a thread pool with 8 threads
private static final ExecutorService SERVICE = Executors.newFixedThreadPool(8);
My mechanism emulates the work of 100 users (100 Tasks):
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) { // Number of users == 100
callableTasks.add(new Task(client));
}
SERVICE.invokeAll(callableTasks);
SERVICE.shutdown();
The user performs the Task of generating a document.
Get UUID of Task;
Get Task status every 10 seconds;
If Task is ready get document.
public class Task implements Callable<Boolean> {
private final ReportClient client;
public Task(ReportClient client) {
this.client = client;
}
@Override
public Boolean call() {
final var uuid = client.createDocument(documentId);
GetStatusResponse status = null;
do {
try {
Thread.sleep(10000); // This stop current thread, but not a Task!!!!
} catch (InterruptedException e) {
return Boolean.FALSE;
}
status = client.getStatus(uuid);
} while (Status.PENDING.equals(status.status()));
final var document = client.getReport(uuid);
return Boolean.TRUE;
}
}
I want to give the idle time (10 seconds) to another task. But when Thread.sleep(10000); is called, the current thread suspends its execution. The first 8 Tasks are suspended and the other 92 Tasks stay pending for those 10 seconds. How can I have all 100 Tasks in progress at the same time?

The Answer by Yevgeniy looks correct, regarding Java today. You want to have your cake and eat it too, in that you want a thread to sleep before repeating a task but you also want that thread to do other work. That is not possible today, but may be in the future.
Project Loom
In current Java, a Java thread is mapped directly to a host OS thread. In all common OSes such as macOS, BSD, Linux, Windows, and such, when code executing in a host thread blocks (stops to wait for sleep, or storage I/O, or network I/O, etc.) the thread too blocks. The blocked thread suspends, and the host OS generally runs another thread on that otherwise unused core. But the crucial point is that the suspended thread performs no further work until your blocking call to sleep returns.
This picture may change in the not-so-distant future. Project Loom seeks to add virtual threads to the concurrency facilities in Java.
In this new technology, many Java virtual threads are mapped to each host OS thread. Juggling the many Java virtual threads is managed by the JVM rather than by the OS. When the JVM detects that a virtual thread’s executing code is blocking, that virtual thread is "parked", set aside by the JVM, with another virtual thread swapped in for execution on that "real" host OS thread. When the parked thread returns from its blocking call, it can be reassigned to a "real" host OS thread for further execution. Under Project Loom, the host OS threads are kept busy, never idled while any pending virtual thread has work to do.
This swapping between virtual threads is highly efficient, so that thousands, even millions, of threads can be running at a time on conventional computer hardware.
Using virtual threads, your code will indeed work as you had hoped: A blocking call in Java will not block the host OS thread. But virtual threads are experimental, still in development, scheduled as a preview feature in Java 19. Early-access builds of Java 19 with Loom technology included are available now for you to try. But for production deployment today, you'll need to follow the advice in the Answer by Yevgeniy.
Take my coverage here with a grain of salt, as I am not an expert on concurrency. You can hear it from the actual experts, in the articles, interviews, and presentations by members of the Project Loom team including Ron Pressler and Alan Bateman.

EDIT: I just posted this answer and realized that you seem to be using that code to emulate real user interactions with some system. I would strongly recommend just using a load testing utility for that, rather than trying to come up with your own. However, in that case just using a CachedThreadPool might do the trick, although probably not a very robust or scalable solution.
Thread.sleep() behavior here is working as intended: it suspends the thread to let the CPU execute other threads.
Note that in this state a thread can be interrupted for a number of reasons unrelated to your code, and in that case your Task returns false: I'm assuming you actually have some retry logic down the line.
So you want two mutually exclusive things: on the one hand, if the document isn't ready, the thread should be free to do something else, but should somehow return and check that document's status again in 10 seconds.
That means you have to choose:
You definitely need that once-every-10-seconds check for each document - in that case, maybe use a cachedThreadPool and have it generate as many threads as necessary, just keep in mind that you'll carry the overhead for numerous threads doing virtually nothing.
Or, you can first initiate that asynchronous document creation process and then only check for status in your callables, retrying as needed.
Something like:
public class Task implements Callable<Boolean> {
private final ReportClient client;
private final UUID uuid;
// all args constructor omitted for brevity
@Override
public Boolean call() {
GetStatusResponse status = client.getStatus(uuid);
if (!Status.PENDING.equals(status.status())) { // no longer pending, so the document is ready
final var document = client.getReport(uuid);
return Boolean.TRUE;
} else {
return Boolean.FALSE; //retry next time
}
}
}
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) {
var uuid = client.createDocument(documentId); //not sure where documentId comes from here in your code
callableTasks.add(new Task(client, uuid));
}
List<Future<Boolean>> results = SERVICE.invokeAll(callableTasks);
// retry logic until all results come back as `true` here
This assumes that createDocument is relatively efficient, but that stage can be parallelized just as well, you just need to use a separate list of Runnable tasks and invoke them using the executor service.
Note that we also assume that the document's status will indeed eventually change to something other than PENDING, and that might very well not be the case. You might want to have a timeout for retries.
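The retry logic with a bound on the number of attempts could look roughly like the sketch below. It does not use the real ReportClient; instead, simulatedChecks builds stand-in status checks that become ready after a fixed number of polls. All names here (RetrySketch, retryUntilDone, simulatedChecks, maxRounds, pauseMillis) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    /**
     * Runs the checks repeatedly, dropping each one once it returns true.
     * Gives up after maxRounds rounds and returns the number of checks still pending.
     */
    static int retryUntilDone(ExecutorService pool, List<Callable<Boolean>> checks,
                              int maxRounds, long pauseMillis) {
        List<Callable<Boolean>> pending = new ArrayList<>(checks);
        try {
            for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
                if (round > 0) {
                    Thread.sleep(pauseMillis); // only the coordinating thread sleeps
                }
                // invokeAll returns the futures in the same order as the submitted tasks
                List<Future<Boolean>> results = pool.invokeAll(pending);
                List<Callable<Boolean>> stillPending = new ArrayList<>();
                for (int i = 0; i < results.size(); i++) {
                    if (!results.get(i).get()) {
                        stillPending.add(pending.get(i));
                    }
                }
                pending = stillPending;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("a status check failed", e.getCause());
        }
        return pending.size();
    }

    /** Stand-in for real status checks: check i reports "ready" on its (1 + i % 3)-th poll. */
    static List<Callable<Boolean>> simulatedChecks(int count) {
        List<Callable<Boolean>> checks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int pollsNeeded = 1 + i % 3;
            final AtomicInteger polls = new AtomicInteger();
            checks.add(() -> polls.incrementAndGet() >= pollsNeeded);
        }
        return checks;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            int pending = retryUntilDone(pool, simulatedChecks(100), 5, 100);
            System.out.println("still pending after retries: " + pending);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pause happens in the coordinating thread, the 8 pool threads are only ever busy with short status checks and are never parked in Thread.sleep.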

In your case, it seems like you need to check if a certain condition is met every x seconds. In fact, from your code the document generation seems asynchronous, and what the Task keeps doing after that is just waiting for the document generation to happen.
You could launch every document generation from your Thread-Main and use a ScheduledThreadPoolExecutor to verify every x seconds whether the document generation has been completed. At that point, you retrieve the result and cancel the corresponding Task's scheduling.
Basically, one ConcurrentHashMap is shared among the thread-main and the Tasks you've scheduled (mapRes), while the other, mapTasks, is just used locally within the thread-main to keep track of the ScheduledFuture returned by every Task.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(8);
//ConcurrentHashMap shared among the scheduled tasks, keyed by document uuid; each Task sets its entry to true as soon as the document has been produced
ConcurrentHashMap<String, Boolean> mapRes = new ConcurrentHashMap<>();
String uuid;
ScheduledFuture<?> schedFut;
//HashMap containing the ScheduledFuture returned by scheduling each Task, used to cancel its repetition as soon as the document has been produced
Map<String, ScheduledFuture<?>> mapTasks = new HashMap<>();
for (int i = 0; i < 100; i++) {
//Starting the document generation from the thread-main
uuid = client.createDocument(documentId);
//Registering the outcome before scheduling, so that the Task's replace call finds the key
mapRes.put(uuid, false);
//Scheduling each Task to repeat every 10 seconds, with an initial delay of i*10 milliseconds so that they do not all start at the same time
schedFut = pool.scheduleWithFixedDelay(new Task(client, uuid, mapRes), i * 10, 10000, TimeUnit.MILLISECONDS);
//Adding the ScheduledFuture to the map
mapTasks.put(uuid, schedFut);
}
//Keep checking the outcome of each task until all of them have been canceled due to completion
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (String key : mapTasks.keySet()) {
//Canceling the task's scheduling if:
// - Its result is positive (i.e. its verification is terminated)
// - The task hasn't been canceled already
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
//... eventually adding a sleep to check the completion every x seconds ...
}
pool.shutdown();
}
}
class Task implements Runnable {
private final ReportClient client;
private final String uuid;
private final ConcurrentHashMap mapRes;
public Task(ReportClient client, String uuid, ConcurrentHashMap mapRes) {
this.client = client;
this.uuid = uuid;
this.mapRes = mapRes;
}
@Override
public void run() {
//This is taken from your code and I'm assuming that if it's not pending then it's completed
if (!Status.PENDING.equals(client.getStatus(uuid).status())) {
mapRes.replace(uuid, true);
}
}
}
I've tested your case locally, by emulating a scenario where n Tasks wait for a folder with their same id to be created (or uuid in your case). I'll post it right here as a sample in case you'd like to try something simpler first.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(2);
ConcurrentHashMap<Integer, Boolean> mapRes = new ConcurrentHashMap<>();
for (int i = 0; i < 16; i++) {
mapRes.put(i, false);
}
ScheduledFuture<?> schedFut;
Map<Integer, ScheduledFuture<?>> mapTasks = new HashMap<>();
for (int i = 0; i < 16; i++) {
schedFut = pool.scheduleWithFixedDelay(new MyTask(i, mapRes), i * 20, 3000, TimeUnit.MILLISECONDS);
mapTasks.put(i, schedFut);
}
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (Integer key : mapTasks.keySet()) {
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
}
pool.shutdown();
}
}
class MyTask implements Runnable {
private int num;
private ConcurrentHashMap mapRes;
public MyTask(int num, ConcurrentHashMap mapRes) {
this.num = num;
this.mapRes = mapRes;
}
@Override
public void run() {
System.out.println("Task " + num + " is checking whether the folder exists: " + Files.exists(Path.of("./" + num)));
if (Files.exists(Path.of("./" + num))) {
mapRes.replace(num, true);
}
}
}

Java: How do I use the result of the first of multiple threads that complete?

I have a problem in Java where I want to spawn multiple concurrent threads simultaneously. I want to use the result of whichever thread/task finishes first, and abandon/ignore the results of the other threads/tasks. I found a similar question for just cancelling slower threads but thought that this new question was different enough to warrant an entirely new question.
Note that I have included an answer below based on what I considered to be the best answer from this similar question, but changed it to best fit this new (albeit similar) problem. I wanted to share the knowledge and see if there is a better way of solving this problem, hence the question and self-answer below.

You can use ExecutorService.invokeAny. From its documentation:
Executes the given tasks, returning the result of one that has completed successfully …. Upon normal or exceptional return, tasks that have not completed are cancelled.
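A short sketch of how invokeAny can be used for this, assuming each piece of work can be written as a Callable. The class InvokeAnySketch, the method firstResult, and the random sleep that simulates work are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InvokeAnySketch {

    /** Starts numWorkers tasks and returns the id of one that completed successfully. */
    static int firstResult(int numWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            final int id = i;
            tasks.add(() -> {
                Thread.sleep((long) (Math.random() * 200)); // simulate real work
                return id;
            });
        }
        try {
            // Blocks until one task has succeeded; the remaining tasks are cancelled
            return pool.invokeAny(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("no task completed successfully", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("first result came from worker " + firstResult(100));
    }
}
```

Cancellation is delivered as an interrupt, so the slower tasks only stop early if their work responds to interruption, as Thread.sleep does here.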

This answer is based off @lreeder's answer to the question "Java threads - close other threads when first thread completes".
Basically, the difference between my answer and his answer is that he closes the threads via a Semaphore and I just record the result of the fastest thread via an AtomicReference. Note that in my code, I do something a little weird. Namely, I use an instance of AtomicReference<Integer> instead of the simpler AtomicInteger. I do this so that I can compare and set the value to a null integer; I can't use null integers with AtomicInteger. This allows me to set any integer, not just a set of integers, excluding some sentinel value. Also, there are a few less important details like the use of an ExecutorService instead of explicit threads, and the changing of how Worker.completed is set, because previously it was possible that more than one thread could finish first.
public class ThreadController {
public static void main(String[] args) throws Exception {
new ThreadController().threadController();
}
public void threadController() throws Exception {
int numWorkers = 100;
List<Worker> workerList = new ArrayList<>(numWorkers);
CountDownLatch startSignal = new CountDownLatch(1);
CountDownLatch doneSignal = new CountDownLatch(1);
//AtomicReference holds the id of the first worker to finish;
//it stays null until a worker sets it
AtomicReference<Integer> firstInt = new AtomicReference<Integer>();
ExecutorService execSvc = Executors.newFixedThreadPool(numWorkers);
for (int i = 0; i < numWorkers; i++) {
Worker worker = new Worker(i, startSignal, doneSignal, firstInt);
execSvc.submit(worker);
workerList.add(worker);
}
//tell workers they can start
startSignal.countDown();
//wait for one thread to complete.
doneSignal.await();
//Look at all workers and find which one is done
for (int i = 0; i < numWorkers; i++) {
if (workerList.get(i).isCompleted()) {
System.out.printf("Thread %d finished first, firstInt=%d\n", i, firstInt.get());
}
}
}
}
class Worker implements Runnable {
private final CountDownLatch startSignal;
private final CountDownLatch doneSignal;
// null when not yet set, not so for AtomicInteger
private final AtomicReference<Integer> singleResult;
private final int id;
private boolean completed = false;
public Worker(int id, CountDownLatch startSignal, CountDownLatch doneSignal, AtomicReference<Integer> singleResult) {
this.id = id;
this.startSignal = startSignal;
this.doneSignal = doneSignal;
this.singleResult = singleResult;
}
public boolean isCompleted() {
return completed;
}
@Override
public void run() {
try {
//block until controller counts down the latch
startSignal.await();
//simulate real work
Thread.sleep((long) (Math.random() * 1000));
//try to record this worker's id as the single result.
//compareAndSet succeeds only for the first worker to finish;
//later workers get false and simply continue without blocking.
boolean finishedFirst = singleResult.compareAndSet(null, id);
// only set this if the result was successfully set
if (finishedFirst) {
//Use a completed flag instead of Thread.isAlive because
//even though countDown is the last thing in the run method,
//the run method may not have exited by the time the
//controlling thread checks the isAlive status
completed = true;
}
}
catch (InterruptedException e) {
//don't care about this
}
//tell controller we are finished, if already there, do nothing
doneSignal.countDown();
}
}

Learning about Threads

I have written a simple program that is intended to start a few threads. Each thread should pick an integer n from an integer array, wait for n seconds, and write the time t it waited into a results array.
When a thread finishes its task, it should pick the next one that has not yet been assigned to another thread.
Of course, the order in the arrays has to be maintained, so that integers and results match.
My code runs smoothly as far as I can see.
However, there is one line of code that I find particularly unsatisfying, and I hope there is a good way to fix it without changing too much:
while(Thread.activeCount() != 1); // first evil line
I kind of abuse this line to make sure all my threads have finished all their tasks before I access the array with the results. I want to do that to prevent bad values such as 0.0, a NullPointerException, etc. (in short, anything that would make a real application crash).
I am also not sure whether my code still runs correctly for very long arrays of tasks, for example whether the results might no longer match the order of the integers.
Any constructive help is appreciated.
First class:
public class ThreadArrayWriterTest {
int[] repitions;
int len = 0;
double[] timeConsumed;
public boolean finished() {
synchronized (repitions) {
return len <= 0;
}
}
public ThreadArrayWriterTest(int[] repitions) {
this.repitions = repitions;
this.len = repitions.length;
timeConsumed = new double[this.len];
}
public double[] returnTimes(int[] repititions, int numOfThreads, TimeConsumer timeConsumer) {
for (int i = 0; i < numOfThreads; i++) {
new Thread() {
public void run() {
while (!finished()) {
len--;
timeConsumed[len] = timeConsumer.returnTimeConsumed(repititions[len]);
}
}
}.start();
}
while (Thread.activeCount() != 1) // first evil line
;
return timeConsumed;
}
public static void main(String[] args) {
long begin = System.currentTimeMillis();
int[] repitions = { 3, 1, 3, 1, 2, 1, 3, 3, 3 };
int numberOfThreads = 10;
ThreadArrayWriterTest t = new ThreadArrayWriterTest(repitions);
double[] times = t.returnTimes(repitions, numberOfThreads, new TimeConsumer());
for (double d : times) {
System.out.println(d);
}
long end = System.currentTimeMillis();
System.out.println("Total time of execution: " + (end - begin));
}
}
Second class:
public class TimeConsumer {
double returnTimeConsumed(int repitions) {
long before = System.currentTimeMillis();
for (int i = 0; i < repitions; i++) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
long after = System.currentTimeMillis();
double ret = after - before;
System.out.println("It takes: " + ret + "ms" + " for " + repitions + " runs through the for-loop");
return ret;
}
}

The easiest way to wait for all threads to complete is to keep a Collection of them and then call Thread.join() on each one in turn.
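A minimal sketch of that idea, not the asker's exact program: the threads are kept in a list and joined before the results are read. The class JoinSketch, the method process, the AtomicInteger used to hand out indices, and the doubling that stands in for the real work are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinSketch {

    /** Computes results[i] from inputs[i] with numThreads workers, then joins them all. */
    static double[] process(int[] inputs, int numThreads) {
        double[] results = new double[inputs.length];
        AtomicInteger next = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            Thread worker = new Thread(() -> {
                int i;
                // getAndIncrement hands each index to exactly one thread,
                // so every result lands in the slot matching its input
                while ((i = next.getAndIncrement()) < inputs.length) {
                    results[i] = inputs[i] * 2.0; // placeholder for the real work
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join(); // returns once the worker has finished; its writes are then visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        double[] results = process(new int[] {3, 1, 3, 1, 2, 1, 3, 3, 3}, 10);
        for (double d : results) {
            System.out.println(d);
        }
    }
}
```

Unlike the Thread.activeCount() loop, join waits only for the threads this code started and does not burn a CPU core while waiting.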

In addition to .join() you can use an ExecutorService to manage pools of threads. From its Javadoc:
An Executor that provides methods to manage termination and methods
that can produce a Future for tracking progress of one or more
asynchronous tasks.
An ExecutorService can be shut down, which will cause it to reject new
tasks. Two different methods are provided for shutting down an
ExecutorService. The shutdown() method will allow previously submitted
tasks to execute before terminating, while the shutdownNow() method
prevents waiting tasks from starting and attempts to stop currently
executing tasks. Upon termination, an executor has no tasks actively
executing, no tasks awaiting execution, and no new tasks can be
submitted. An unused ExecutorService should be shut down to allow
reclamation of its resources.
Method submit extends base method Executor.execute(Runnable) by
creating and returning a Future that can be used to cancel execution
and/or wait for completion. Methods invokeAny and invokeAll perform
the most commonly useful forms of bulk execution, executing a
collection of tasks and then waiting for at least one, or all, to
complete.
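ExecutorService executorService = Executors.newFixedThreadPool(maximumNumberOfThreads);
CompletionService completionService = new ExecutorCompletionService(executorService);
for (int i = 0; i < numberOfTasks; ++i) {
// submit each task first; take() would block forever if nothing was submitted
completionService.submit(tasks.get(i)); // tasks: placeholder for your own List of Callables
}
for (int i = 0; i < numberOfTasks; ++i) {
completionService.take(); // blocks until the next task has completed
}
executorService.shutdown();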
Plus take a look at ThreadPoolExecutor

Since Java provides a more advanced threading API in the java.util.concurrent package, you should have a look at ExecutorService, which simplifies thread management.
A simple solution to your problem:
Use the Executors API to create a thread pool
static ExecutorService newFixedThreadPool(int nThreads)
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
Use invokeAll to wait for all tasks to complete.
Sample code:
ExecutorService service = Executors.newFixedThreadPool(10);
List<MyCallable> futureList = new ArrayList<MyCallable>();
for ( int i=0; i<12; i++){
MyCallable myCallable = new MyCallable((long)i);
futureList.add(myCallable);
}
System.out.println("Start");
try{
List<Future<Long>> futures = service.invokeAll(futureList);
for(Future<Long> future : futures){
try{
System.out.println("future.isDone = " + future.isDone());
System.out.println("future: call ="+future.get());
}
catch(Exception err1){
err1.printStackTrace();
}
}
}catch(Exception err){
err.printStackTrace();
}
service.shutdown();
Refer to this related SE question for more details on achieving the same:
wait until all threads finish their work in java

How to make a thread limit in Java

Let's say I have 1000 files to read and, because of some limits, I want to read at most 5 files in parallel. As soon as one of them is finished, I want a new one to start.
I have a main function that has the list of files, and I try changing a counter whenever a thread finishes, but it doesn't work!
Any suggestions?
The following is the loop in the main function:
for (final File filename : folder.listFiles()) {
Object lock1 = new Object();
new myThread(filename, lock1).start();
counter++;
while (counter > 5);
}

Spawning threads like this is not the way to go. Use an ExecutorService and set the pool size to 5. Put all the files in something like a BlockingQueue or another thread-safe collection, and all the executing tasks can just poll() it at will.
public class ThreadReader {
public static void main(String[] args) {
File f = null;//folder
final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(1000);
for(File kid : f.listFiles()){
queue.add(kid);
}
ExecutorService pool = Executors.newFixedThreadPool(5);
for(int i = 1; i <= 5; i++){
Runnable r = new Runnable(){
public void run() {
File workFile = null;
while((workFile = queue.poll()) != null){
//work on the file.
}
}
};
pool.execute(r);
}
}
}

You can use an ExecutorService as a thread pool AND a queue.
ExecutorService pool = Executors.newFixedThreadPool(5);
File f = new File(args[0]);
for (final File kid : f.listFiles()) {
pool.execute(new Runnable() {
@Override
public void run() {
process(kid);
}
});
}
pool.shutdown();
// wait for them to finish for up to one minute.
pool.awaitTermination(1, TimeUnit.MINUTES);

The approach in Kylar's answer is the correct one. Use the executor classes provided by the Java class libraries rather than implementing thread pooling yourself from scratch (badly).
But I thought it might be useful to discuss the code in your question and why it doesn't work. (I've filled in some of the parts that you left out as best I can ...)
public class MyThread extends Thread {
private static int counter;
public MyThread(String fileName, Object lock) {
// Save parameters in instance variables
}
public void run() {
// Do stuff with instance variables
counter--;
}
public static void main(String[] args) {
// ...
for (final File filename : folder.listFiles()) {
Object lock1 = new Object();
new MyThread(filename, lock1).start();
counter++;
while (counter > 5);
}
// ...
}
}
OK, so what is wrong with this? Why doesn't it work?
Well, the first problem is that in main you are reading and writing counter without any synchronization. I assume that it is also being updated by the worker threads; the code makes no sense otherwise. That means there is a good chance that the main thread won't see the result of the updates made by the child threads. In other words, while (counter > 5); could be an infinite loop. (In fact, this is pretty likely. The JIT compiler is allowed to generate code in which counter > 5 simply tests the value of counter left in a register after the previous counter++; statement.)
The second problem is that your while (counter > 5); loop is incredibly wasteful of resources. You are telling the JVM to poll a variable ... and it will do this potentially BILLIONS of times a second ... running one processor (core) flat out. You shouldn't do that. If you are going to implement this kind of stuff using low-level primitives, you should use Java's Object.wait() and Object.notify() methods; e.g. the main thread waits, and each worker thread notifies.
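As a rough illustration of the wait/notify idea, here is a minimal sketch. The class BoundedStarter, its fields, and the runDemo method are hypothetical names made up for this example (they are not from the question), and the Thread.sleep(10) call merely stands in for the real per-file work.

```java
// Hypothetical helper (not from the question): caps the number of worker
// threads running at once by using wait()/notifyAll() instead of a busy loop.
class BoundedStarter {
    private final Object lock = new Object();
    private final int limit;
    private int running = 0;     // guarded by lock
    private int completed = 0;   // guarded by lock
    private int maxRunning = 0;  // guarded by lock; recorded for the demo only

    BoundedStarter(int limit) {
        this.limit = limit;
    }

    // Caller must hold lock. Waits until fewer than 'threshold' workers run.
    private void awaitBelow(int threshold) {
        while (running >= threshold) {
            try {
                lock.wait(); // releases the lock while waiting, no spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    // Blocks until a slot is free, then starts the task in a new thread.
    void start(Runnable task) {
        synchronized (lock) {
            awaitBelow(limit);
            running++;
            maxRunning = Math.max(maxRunning, running);
        }
        new Thread(() -> {
            try {
                task.run();
            } finally {
                synchronized (lock) {
                    running--;
                    completed++;
                    lock.notifyAll(); // wake any thread waiting in awaitBelow()
                }
            }
        }).start();
    }

    // Blocks until every started worker has finished.
    void awaitIdle() {
        synchronized (lock) {
            awaitBelow(1);
        }
    }

    // Starts 'tasks' short jobs with at most 'limit' running at a time.
    // Returns {completed count, highest number of simultaneous workers}.
    static int[] runDemo(int tasks, int limit) {
        BoundedStarter starter = new BoundedStarter(limit);
        for (int i = 0; i < tasks; i++) {
            starter.start(() -> {
                try {
                    Thread.sleep(10); // stand-in for "work on the file"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        starter.awaitIdle();
        synchronized (starter.lock) {
            return new int[] { starter.completed, starter.maxRunning };
        }
    }

    public static void main(String[] args) {
        int[] result = runDemo(20, 5);
        System.out.println("completed=" + result[0] + ", max concurrent=" + result[1]);
    }
}
```

Because the counter is only touched inside synchronized blocks, the main thread always sees the workers' updates, and it sleeps in wait() instead of burning a core. In real code an ExecutorService is still the simpler choice.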

Whatever method you are using to create a new Thread, increment a global counter and wrap the thread creation in a conditional statement, so that no new thread is created once the limit has been reached. In that case, maybe push the files onto a queue (a list?) instead. You could then add another conditional statement, after a thread is created, that checks whether there are items in the queue and processes those items first.

Are tasks parallelized when executed via an ExecutorCompletionService?

I submitted 5 jobs to an ExecutorCompletionService, but it seems like the jobs are executed in sequence. The ExecutorService that is passed to the constructor of ExecutorCompletionService is created using newCacheThreadPool form. Am I doing anything wrong ?
UPDATE Each job is basically doing a database query & some calculation. The code for the ExecutorCompletionService is lifted as-is off the javadoc. I just replaced the Callables with my own custom Callable implementations.

The ExecutorCompletionService has nothing to do with how jobs are executed, it's simply a convenient way of retrieving the results.
Executors.newCachedThreadPool by default executes tasks in separate threads, which can be parallel, given that:
tasks are independent, and don't e.g. synchronize on the same object inside;
you have multiple hardware CPU threads.
The last point deserves an explanation. Thread scheduling comes with no guarantees: in practice the Sun JVM may favour the currently executing thread rather than swapping it out for another one. That means that your 5 tasks might end up being executed serially because of the JVM implementation and the lack of, e.g., a multi-core machine.
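To make the first point concrete, the following sketch (with the made-up class name CompletionOrderDemo and a latch named release, both assumptions for illustration) shows that the completion service only affects the order in which results are handed back. The task submitted first is held back on purpose, so the second task's result is retrieved first.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not taken from the question.
class CompletionOrderDemo {
    // Returns the two results in the order take() hands them back.
    static String[] run() {
        ExecutorService pool = Executors.newCachedThreadPool();
        CompletionService<String> ecs = new ExecutorCompletionService<>(pool);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Submitted first, but blocked until the main thread releases it.
            ecs.submit(() -> {
                release.await();
                return "slow";
            });
            // Submitted second; the cached pool runs it in another thread.
            ecs.submit(() -> "fast");

            // take() returns whichever task completed first, not the one
            // submitted first. With a single-threaded executor this call
            // would block forever, because "fast" could never start.
            String first = ecs.take().get();
            release.countDown();
            String second = ecs.take().get();
            return new String[] { first, second };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] order = run();
        System.out.println(order[0] + " then " + order[1]);
    }
}
```

The thread pool passed to the constructor decides how the tasks run; the completion service merely queues the finished futures.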

I assume you meant Executors.newCachedThreadPool(). If so, execution should be parallelized as you expect.

Each job is basically doing a database query & some calculation. The code for the ExecutorCompletionService is lifted as-is off the javadoc. I just replaced the Callables with my own custom Callable implementations.
In that case, are you sure you're not mistaken in thinking they're executed sequentially because you're retrieving the results sequentially?
Throw in some debug logging lines in your callables to rule this out, and/or have a look at this limited usage scenario:
public static void main(String... args) throws InterruptedException, ExecutionException {
List<Callable<String>> list = new ArrayList<Callable<String>>();
list.add(new PowersOfX(2));
list.add(new PowersOfX(3));
list.add(new PowersOfX(5));
solve(Executors.newCachedThreadPool(), list);
}
static void solve(Executor e, Collection<Callable<String>> solvers) throws InterruptedException, ExecutionException {
CompletionService<String> ecs = new ExecutorCompletionService<String>(e);
for (Callable<String> s : solvers)
ecs.submit(s);
int n = solvers.size();
for (int i = 0; i < n; ++i) {
String r = ecs.take().get();
if (r != null)
System.out.println("Retrieved: " + r);
}
}
static class PowersOfX implements Callable<String> {
int x;
public PowersOfX(int x) {this.x = x;}
@Override
public String call() throws Exception {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) {
sb.append(Math.pow(x, i)).append('\t');
System.out.println(Math.pow(x, i));
Thread.sleep(2000);
}
return sb.toString();
}
}
Executing this, you'll see that the numbers are generated intermixed (and thus the tasks are executed concurrently), but retrieving the results alone won't show you this level of detail.

The execution will depend on a number of things. For example:
the length of time it takes to complete a job
the number of threads in the thread pool (a cached thread pool will only create threads if it thinks they are needed)
Executing in sequence is not necessarily wrong.
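The point about thread creation can be checked with a small sketch. The class name CachedPoolSizeDemo is made up for this example, and the cast to ThreadPoolExecutor relies on the standard JDK implementation of Executors.newCachedThreadPool(). Every task blocks until released, so no thread is ever idle and the pool has to create one thread per task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo class, not taken from the question.
class CachedPoolSizeDemo {
    // Submits 'tasks' jobs that all block until released, then reports the
    // largest number of threads the pool has ever held at once.
    static int largestPoolSize(int tasks) {
        // Assumption: the cached pool is a ThreadPoolExecutor (true for the
        // standard JDK), which exposes pool statistics.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep this worker busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown(); // let all workers finish
        pool.shutdown();
        return largest;
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + largestPoolSize(5));
    }
}
```

If the jobs are short and submitted one after another, an idle thread may be reused instead, which is one way the work can appear to run in sequence.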
