Java increase and decrease number of threads in interval

I want to create an API load test where the number of parallel users (threads) increases up to the pool size and, after a while, decreases. Right now I have a test where all threads start at once.
// Thread pool - how many threads we use at once as concurrent USERS
ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(prop.getProperty("threadPool")));
// numberOfRequests - How many requests we want to send in total
int numberOfRequests = Integer.parseInt(prop.getProperty("numberOfRequests"));
CountDownLatch latch = new CountDownLatch(numberOfRequests);
List<PostCallableData> tasks = IntStream.range(0, numberOfRequests).mapToObj(i -> {
return new PostCallableData("Thread ", branch, 4, 5, latch);
}).collect(Collectors.toList());
List<Future<List<Integer>>> futures = executor.invokeAll(tasks);
latch.await();
executor.shutdown();
List<List<Integer>> results = futures.stream()
.map(future -> {
try {
return future.get();
} catch (Exception e) {
throw new RuntimeException(e);
}
})
.collect(Collectors.toList());
My goal is to start with one thread and add another after each interval (configurable) until the maximum pool size is reached, e.g. add one thread every 30 s up to 30 threads.
Then hold that level for, e.g., 45 minutes (configurable) or for a set number of requests, and then decrease the number of threads by one every 30 seconds.
List<PostCallableData> tasks = IntStream.range(0, numberOfRequests).mapToObj(i -> {
return new PostCallableData("Thread ", branch, 4, 5, latch);
}).collect(Collectors.toList());
List<Future<List<Integer>>> futures = executor.invokeAll(tasks);
Ideally the lines above will be replaced by some kind of random action: I want POST/GET/DELETE actions to be run in this test.
What do I need to do to make the number of threads increase and decrease gradually?
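One possible starting point, sketched under assumptions: the ramp-up, hold, and ramp-down profile described above can be expressed as a pure function from elapsed time to a target number of active threads. The class name RampSchedule, the method targetThreads, and its parameters are hypothetical and do not come from the question's code.

```java
// A minimal sketch of the load profile: ramp up by one thread per step,
// hold at the maximum, then ramp down by one thread per step.
// All names here are illustrative assumptions, not part of the original test.
class RampSchedule {

    /**
     * @param elapsedSec seconds since the test started
     * @param maxThreads maximum number of concurrent users (e.g. 30)
     * @param stepSec    seconds between adding or removing one thread (e.g. 30)
     * @param holdSec    seconds to stay at maxThreads (e.g. 45 * 60)
     * @return how many threads should be active at this moment
     */
    static int targetThreads(long elapsedSec, int maxThreads, long stepSec, long holdSec) {
        // Time at which the last thread has been added (the test starts with one thread).
        long rampUpEnd = (long) (maxThreads - 1) * stepSec;
        if (elapsedSec < rampUpEnd) {
            return (int) (elapsedSec / stepSec) + 1;
        }
        long holdEnd = rampUpEnd + holdSec;
        if (elapsedSec < holdEnd) {
            return maxThreads;
        }
        // After the hold phase, remove one thread per step until none are left.
        long removed = (elapsedSec - holdEnd) / stepSec + 1;
        return (int) Math.max(0, maxThreads - removed);
    }
}
```

A scheduler (for example a ScheduledExecutorService firing once per step) could call targetThreads and use the result to decide how many workers are allowed to send requests, e.g. by adjusting the number of permits of a Semaphore or the size of a ThreadPoolExecutor. That wiring is left out here because it depends on how PostCallableData is implemented.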

Related

How to block threads until SmallRye Mutiny reaches a specific condition?

I'd like to run three calls in parallel on 3 threads. Each call returns a number, and I accumulate the values from all of the threads. If the running total exceeds 99, the main thread should return "99+" and stop the unfinished threads. Otherwise it keeps the integer value and waits for the other threads to finish and add their values. This is how I implemented it:
RequestDTO request; //this is http request data
ExecutorService executor = Executors.newFixedThreadPool(3);
//for flagging purpose, for counting how many sub threads end
//but I can't reference it directly just like I did to DTOResponse totalAll;
short asyncFlag = 0;
Cancellable
cancellableThreads1,
cancellableThreads2,
cancellableThreads3;
DTOResponse totalAll = new DTOResponse(); totalAll.total = 0;
LOGGER.info("start threads 1");
cancellableThreads1 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method1(request).await().indefinitely();
LOGGER.info("got uniMethod1!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread1 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 1 already running asynchronously");
LOGGER.info("start threads 2");
cancellableThreads2 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method2(request).await().indefinitely();
LOGGER.info("got uniMethod2!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread2 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 2 already running asynchronously");
LOGGER.info("start threads 3");
cancellableThreads3 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method3(request).await().indefinitely();
LOGGER.info("got uniMethod3!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread3 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 3 already running asynchronously");
do{
//executed by main threads.
//I wanted to block in here until those condition is met
//actually is not blocking thread but forever loop instead
if(totalAll.total instanceof String || asyncFlag >=3){
cancellableThreads1.cancel();
cancellableThreads2.cancel();
cancellableThreads3.cancel();
}
//asyncFlag isn't increase even all of 3 threads has execute consumer++
}while(totalAll.total instanceof Integer && asyncFlag <3);
ResponseBuilder responseBuilder = Response.ok().entity(totalAll);
return Uni.createFrom().item("").onItem().transform(s->responseBuilder.build());
totalAll can be accessed by those subthreads, but asyncFlag cannot. My editor underlines it in red with "Local variable asyncFlag defined in an enclosing scope must be final or effectively final Java(536871575)" if asyncFlag is written inside a subthread block. So I used consumer instead, but incrementing it has no effect on asyncFlag. As a result, the loop never ends unless the total turns into a String (the first condition).
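For comparison, here is a plain-JDK sketch of the same requirement that avoids mutating captured local variables altogether: results are collected on the calling thread as they complete, so no flag has to be shared with the lambdas. It uses ExecutorCompletionService rather than Mutiny; the class CappedSum and the method sumWithCap are hypothetical names, and the Callable tasks stand in for method1, method2, and method3.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: sums task results as they arrive and stops early once the cap is exceeded.
class CappedSum {

    // Assumes a non-empty list of tasks (a fixed pool needs at least one thread).
    static String sumWithCap(List<Callable<Integer>> calls, int cap) {
        ExecutorService executor = Executors.newFixedThreadPool(calls.size());
        CompletionService<Integer> completion = new ExecutorCompletionService<>(executor);
        try {
            for (Callable<Integer> call : calls) {
                completion.submit(call);
            }
            int total = 0;
            for (int finished = 0; finished < calls.size(); finished++) {
                // take() blocks the calling thread until the next task completes.
                total += completion.take().get();
                if (total > cap) {
                    return cap + "+";
                }
            }
            return Integer.toString(total);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            // Interrupts any task that is still running once the answer is known.
            executor.shutdownNow();
        }
    }
}
```

Because the running total lives only on the calling thread, there is nothing for the "effectively final" rule to complain about. Whether an interrupted task actually stops depends on the task itself responding to interruption.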

You are better off switching gears and using a reactive(-native) approach to your problem.
Instead of subscribing to each Uni and then collecting the results individually, imperatively monitoring their progress, here is the series of steps you should follow in a reactive way:
Create all the Uni objects representing your requests with whatever concurrency construct you like: Uni#emitOn
Combine all your request Unis into a Multi by merging them, so that they execute concurrently (not in an ordered fashion): MultiCreateBy#merging
Scan the items emitted by the Multi, which are your request results, adding each item as it arrives to an initial seed: MultiOnItem#scan
Keep skipping the running sum until you first see a value exceeding the threshold (99 in your case), in which case you let the result flow through your stream pipeline: MultiSkip#first (note that the skip stage will automatically cancel upstream requests and hence stop any useless request processing already in flight)
In case no item has been emitted downstream, meaning that the sum of the requests has not exceeded the threshold, you sum up the initial Uni results (which are cached to avoid re-triggering the requests): UniOnNull#ifNull
Here is a pseudo-implementation of the described stages:
public Uni<Response> request() {
RequestDTO request; //this is http request data
Uni<Object> requestOne = method1(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
Uni<Object> requestTwo = method2(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
Uni<Object> requestThree = method3(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
return Multi.createBy()
.merging()
.withConcurrency(1)
.streams(requestOne.toMulti(), requestTwo.toMulti(), requestThree.toMulti())
.onItem()
.scan(() -> 0, (result, itemTotal) -> result + (Integer) itemTotal)
.skip()
.first(total -> total <= 99)
.<Object>map(ignored -> "99+")
.toUni()
.onItem()
.ifNull()
.switchTo(
Uni.combine()
.all()
.unis(requestOne, requestTwo, requestThree)
.combinedWith((one, two, three) -> (Integer) one + (Integer) two + (Integer) three)
)
.map(result -> Response.ok().entity(result).build());
}

Why is CompletableFuture join/get faster in separate streams than using one stream

For the following program, I am trying to figure out why using 2 different streams parallelizes the tasks, while using the same stream and calling join/get on the CompletableFuture makes them take longer (as long as if they were processed sequentially).
public class HelloConcurrency {
private static Integer sleepTask(int number) {
System.out.println(String.format("Task with sleep time %d", number));
try {
TimeUnit.SECONDS.sleep(number);
} catch (InterruptedException e) {
e.printStackTrace();
return -1;
}
return number;
}
public static void main(String[] args) {
List<Integer> sleepTimes = Arrays.asList(1,2,3,4,5,6);
System.out.println("WITH SEPARATE STREAMS FOR FUTURE AND JOIN");
ExecutorService executorService = Executors.newFixedThreadPool(6);
long start = System.currentTimeMillis();
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
executorService.shutdown();
List<Integer> result = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
long finish = System.currentTimeMillis();
long timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(result);
System.out.println("WITH SAME STREAM FOR FUTURE AND JOIN");
ExecutorService executorService2 = Executors.newFixedThreadPool(6);
start = System.currentTimeMillis();
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(CompletableFuture::join)
.collect(Collectors.toList());
executorService2.shutdown();
finish = System.currentTimeMillis();
timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(results);
}
}
Output
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 6
Task with sleep time 5
Task with sleep time 1
Task with sleep time 3
Task with sleep time 2
Task with sleep time 4
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
Task with sleep time 2
Task with sleep time 3
Task with sleep time 4
Task with sleep time 5
Task with sleep time 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]

The two approaches are quite different; let me try to explain it clearly.
1st approach: you start the async requests for all 6 tasks first, and only then call join on each of them to get the results.
2nd approach: you call join immediately after starting the async request for each task. For example, after starting the async thread for task 1, calling join forces that task to complete, and only then is the second task started.
Note: if you look at the output closely, in the 1st approach the output appears in random order, since all six tasks were executed asynchronously. In the second approach all tasks were executed sequentially, one after another.
I believe you have an idea of how the stream map operation is performed; if not, the Stream API documentation describes it as follows:
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.

The stream framework does not define the order in which map operations are executed on stream elements, because it is not intended for use cases in which that might be a relevant issue. As a result, the particular way your second version is executing is equivalent, essentially, to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(CompletableFuture
.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; })
.join());
}
...which is itself essentially equivalent to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(sleepTask(sleepTime));
}
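The element-by-element behaviour described above can be observed without any threads at all. The following sketch records the order in which two consecutive map stages run; the names StreamOrderDemo and trace, and the "start"/"join" labels, are illustrative stand-ins for supplyAsync and join in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: shows that a sequential stream pushes one element through the whole
// pipeline before touching the next element.
class StreamOrderDemo {

    static List<String> trace() {
        List<String> events = new ArrayList<>();
        Stream.of(1, 2, 3)
                .map(n -> { events.add("start " + n); return n; }) // stands in for supplyAsync
                .map(n -> { events.add("join " + n); return n; })  // stands in for join
                .collect(Collectors.toList());
        return events;
    }
}
```

On current OpenJDK builds the recorded events interleave as start 1, join 1, start 2, join 2, start 3, join 3, which mirrors the for loop above: each element reaches the second map before the next element enters the first one. The specification does not promise this exact order, so treat it as an observation rather than a guarantee.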

@Deadpool answered it pretty well; I'm just adding my answer, which may help someone understand it better.
I was able to get an answer by adding more printing to both methods.
TL;DR
2-stream approach: we start all 6 tasks asynchronously and then call join on each of them in a separate stream to get the results.
1-stream approach: we call join immediately after starting each task. For example, after starting a thread for task 1, calling join makes the stream wait for task 1 to complete, and only then is the second task started.
Note: if we look at the output closely, in the 1-stream approach the output appears in sequential order, since all six tasks were executed in order. In the 2-stream approach all tasks were executed in parallel, hence the random order.
Note 2: if we replace stream() with parallelStream() in the 1-stream approach, the tasks can run in parallel as in the 2-stream approach, as long as the pool that runs the parallel stream (by default the common ForkJoinPool) has enough threads.
More proof
I added more printing to the streams, which gave the following output and confirmed the notes above:
1 stream:
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
doing join on task 1
Task with sleep time 2
doing join on task 2
Task with sleep time 3
doing join on task 3
Task with sleep time 4
doing join on task 4
Task with sleep time 5
doing join on task 5
Task with sleep time 6
doing join on task 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
2 streams:
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
List<Integer> result = futures.stream()
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 2
Task with sleep time 5
Task with sleep time 3
Task with sleep time 1
Task with sleep time 4
Task with sleep time 6
doing join on task 1
doing join on task 2
doing join on task 3
doing join on task 4
doing join on task 5
doing join on task 6
done in 6 seconds.
[1, 2, 3, 4, 5, 6]

Performance of executorService multithreading pool

I am using Java's concurrency library ExecutorService to run my tasks. The threshold for writing to the database is 200 QPS, however, this program can only reach 20 QPS with 15 threads. I tried 5, 10, 20, 30 threads, and they were even slower than 15 threads. Here is the code:
ExecutorService executor = Executors.newFixedThreadPool(15);
List<Callable<Object>> todos = new ArrayList<>();
for (final int id : ids) {
todos.add(Executors.callable(() -> {
try {
TestObject test = testServiceClient.callRemoteService();
SaveToDatabase();
} catch (Exception ex) {}
}));
}
try {
executor.invokeAll(todos);
} catch (InterruptedException ex) {}
executor.shutdown();
1) I checked the CPU usage of the linux server on which this program is running, and the usage was 90% and 60% (it has 4 CPUs). The memory usage was only 20%. So the CPU & memory were still fine. The database server's CPU usage was low (around 20%). What could prevent the speed from reaching 200 QPS? Maybe this service call: testServiceClient.callRemoteService()? I checked the server configuration for that call and it allows high number of calls per seconds.
2) If the count of id in ids is more than 50000, is it a good idea to use invokeAll? Should we split it to smaller batches, such as 5000 each batch?

There is nothing in this code that prevents this query rate, except that creating and destroying a thread pool repeatedly is very expensive. I suggest using the Streams API, which is not only simpler but also reuses a built-in thread pool:
int[] ids = ....
IntStream.of(ids).parallel()
.forEach(id -> testServiceClient.callRemoteService(id));
Here is a benchmark using a trivial service. The main overhead is the latency in creating the connection.
public static void main(String[] args) throws IOException {
ServerSocket ss = new ServerSocket(0);
Thread service = new Thread(() -> {
try {
for (; ; ) {
try (Socket s = ss.accept()) {
s.getOutputStream().write(s.getInputStream().read());
}
}
} catch (Throwable t) {
t.printStackTrace();
}
});
service.setDaemon(true);
service.start();
for (int t = 0; t < 5; t++) {
long start = System.nanoTime();
int[] ids = new int[5000];
IntStream.of(ids).parallel().forEach(id -> {
try {
Socket s = new Socket("localhost", ss.getLocalPort());
s.getOutputStream().write(id);
s.getInputStream().read();
} catch (IOException e) {
e.printStackTrace();
}
});
long time = System.nanoTime() - start;
System.out.println("Throughput " + (int) (ids.length * 1e9 / time) + " connects/sec");
}
}
prints
Throughput 12491 connects/sec
Throughput 13138 connects/sec
Throughput 15148 connects/sec
Throughput 14602 connects/sec
Throughput 15807 connects/sec
Using an ExecutorService would be better, as @grzegorz-piwowarek mentions.
ExecutorService es = Executors.newFixedThreadPool(8);
for (int t = 0; t < 5; t++) {
long start = System.nanoTime();
int[] ids = new int[5000];
List<Future<?>> futures = new ArrayList<>(ids.length);
for (int id : ids) {
futures.add(es.submit(() -> {
try {
Socket s = new Socket("localhost", ss.getLocalPort());
s.getOutputStream().write(id);
s.getInputStream().read();
} catch (IOException e) {
e.printStackTrace();
}
}));
}
for (Future<?> future : futures) {
future.get();
}
long time = System.nanoTime() - start;
System.out.println("Throughput " + (int) (ids.length * 1e9 / time) + " connects/sec");
}
es.shutdown();
In this case it produces much the same results.

Why do you restrict yourself to such a low number of threads?
You're missing performance opportunities this way. It seems that your tasks are really not CPU-bound. The network operations (remote service + database query) may take up the majority of time for each task to finish. During these times, where a single task/thread needs to wait for some event (network,...), another thread can use the CPU. The more threads you make available to the system, the more threads may be waiting for their network I/O to complete while still having some threads use the CPU at the same time.
I suggest you drastically ramp up the number of threads for the executor. As you say that both remote servers are rather under-utilized, I assume the host your program runs at is the bottleneck at the moment. Try to increase (double?) the number of threads until either your CPU utilization approaches 100% or memory or the remote side become the bottleneck.
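As a rough guide for how far to raise the thread count, a commonly cited rule of thumb sizes the pool as cores × target utilization × (1 + wait time / compute time). The sketch below only encodes that arithmetic; the class PoolSizing, the method suggestedThreads, and the example timings are assumptions for illustration, and the real wait and compute times would have to be measured.

```java
// Sketch of a pool-size heuristic for I/O-bound tasks.
// threads ~= cores * targetUtilization * (1 + waitTime / computeTime)
class PoolSizing {

    /**
     * @param cores             number of CPU cores available to the JVM
     * @param targetUtilization desired CPU utilization between 0 and 1
     * @param waitMs            average time a task spends waiting on network or database
     * @param computeMs         average time a task spends actually using the CPU
     */
    static int suggestedThreads(int cores, double targetUtilization, double waitMs, double computeMs) {
        double estimate = cores * targetUtilization * (1.0 + waitMs / computeMs);
        return Math.max(1, (int) Math.round(estimate));
    }
}
```

For example, with 4 cores and tasks that wait 90 ms for every 10 ms of CPU work, the formula suggests about 40 threads. It is only a starting point; the measured throughput should decide the final number.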
By the way, you shutdown the executor, but do you actually wait for the tasks to terminate? How do you measure the "QPS"?
One more thing comes to my mind: How are DB connections handled? I.e. how are SaveToDatabase()s synchronized? Do all threads share (and compete for) a single connection? Or, worse, will each thread create a new connection to the DB, do its thing, and then close the connection again? This may be a serious bottleneck because establishing a TCP connection and doing the authentication handshake may take up as much time as running a simple SQL statement.
"If the count of id in ids is more than 50000, is it a good idea to use invokeAll? Should we split it to smaller batches, such as 5000 each batch?"
As @Vaclav Stengl already wrote, the Executors have internal queues in which they enqueue the tasks and from which they process them, so there is no need to worry about that. You can also just call submit for each single task as soon as you have created it. This allows the first tasks to start executing while you're still creating/preparing later tasks, which makes sense especially when each task creation takes comparatively long, but won't hurt in all other cases. Think of invokeAll as a convenience method for cases where you already have a collection of tasks. If you create the tasks successively yourself and you already have access to the ExecutorService to run them on, just submit() them as soon as possible.

About batch splitting:
ExecutorService has an internal queue for storing tasks. In your case, ExecutorService executor = Executors.newFixedThreadPool(15); has 15 threads, so at most 15 tasks run concurrently and the others are stored in the queue. The size of the queue can be configured; by default it can grow up to Integer.MAX_VALUE. invokeAll calls the execute method internally, and that method places tasks into the queue when all threads are busy.
IMHO there are 2 possible reasons why the CPU is not at 100%:
the thread pool is too small (try to enlarge it)
threads are waiting for testServiceClient.callRemoteService() to complete, and meanwhile the CPU sits idle
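To illustrate the point about the queue size being configurable, here is a sketch of a fixed-size pool with a bounded queue, built directly with ThreadPoolExecutor. The class name BoundedPool, the method create, and the choice of CallerRunsPolicy as the rejection policy are assumptions for the example, not something the original code uses.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a fixed number of worker threads with a bounded task queue.
class BoundedPool {

    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                          // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,                 // keep-alive for idle extra threads
                new LinkedBlockingQueue<>(queueCapacity),  // at most queueCapacity waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy()  // when full, the submitting thread runs the task
        );
    }
}
```

With a bounded queue, submitting 50000 tasks no longer keeps them all in memory at once: when the queue is full, CallerRunsPolicy makes the submitting thread run the task itself, which naturally slows down submission.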

The QPS problem may be caused by a bandwidth limit or by transaction execution (which can lock the table or row), so simply increasing the pool size will not help. Additionally, you can try the producer-consumer pattern.

Synchronisation object to ensure all tasks are completed

Which Java synchronisation object should I use to ensure an arbitrarily large number of tasks are completed? The constraints are that:
Each task takes a non-trivial amount of time to complete and it is appropriate to perform tasks in parallel.
There are too many tasks to fit into memory (i.e. I cannot put a Future for every task into a Collection and then call get on all the futures).
I do not know how many tasks there will be (i.e. I cannot use a CountDownLatch).
The ExecutorService may be shared so I cannot use awaitTermination( long, TimeUnit )
For example, with Grand Central Dispatch, I might do something like this:
let workQueue = dispatch_get_global_queue( QOS_CLASS_BACKGROUND, 0 )
let latch = dispatch_group_create()
let startTime = NSDate()
var itemsProcessed = 0
let countUpdateQueue = dispatch_queue_create( "countUpdateQueue", DISPATCH_QUEUE_SERIAL )
for item in fetchItems() // generator returns too many items to store in memory
{
dispatch_group_enter( latch )
dispatch_async( workQueue )
{
self.processItem( item ) // method takes a non-trivial amount of time to run
dispatch_async( countUpdateQueue )
{
itemsProcessed++
}
dispatch_group_leave( latch )
}
}
dispatch_group_wait( latch, DISPATCH_TIME_FOREVER )
let endTime = NSDate()
let totalTime = endTime.timeIntervalSinceDate( startTime )
print( "Processed \(itemsProcessed) items in \(totalTime) seconds." )
It produces output that looks like this (for 128 items): Processed 128 items in 1.846794962883 seconds.
I tried something similar with a Phaser:
final Executor executor = new ThreadPoolExecutor( 64, 64, 1l, MINUTES, new LinkedBlockingQueue<Runnable>( 8 ), new CallerRunsPolicy() );
final Phaser latch = new Phaser( 0 );
final long startTime = currentTimeMillis();
final AtomicInteger itemsProcessed = new AtomicInteger( 0 );
for( final String item : fetchItems() ) // iterator returns too many items to store in memory
{
latch.register();
final Runnable task = new Runnable() {
public void run() {
processItem( item ); // method takes a non-trivial amount of time to run
itemsProcessed.incrementAndGet();
latch.arrive();
}
};
executor.execute( task );
}
latch.awaitAdvance( 0 );
final long endTime = currentTimeMillis();
out.println( "Processed " + itemsProcessed.get() + " items in " + ( endTime - startTime ) / 1000.0 + " seconds." );
The tasks do not always complete before the last print statement and I might get output that looks like this (for 128 items): Processed 121 items in 5.296 seconds. Is the Phaser even the right object to use? The documentation indicates it only supports 65,535 parties so I would need to either batch the items to be processed or introduce some sort of Phaser tiering.

The problem with the Phaser usage in this example is that the CallerRunsPolicy allows a task to execute on the initiating thread. Thus, while the loop is still in progress, the number of arrived parties can equal the number of registered parties, causing the phase to increment. The solution is to initialise the Phaser with 1 party then, when the loop is finished, arrive and wait for the other parties to arrive. This ensures the phase does not increment to 1 until all the tasks are complete.
final Executor executor = new ThreadPoolExecutor( 64, 64, 1l, MINUTES, new LinkedBlockingQueue<Runnable>( 8 ), new CallerRunsPolicy() );
final Phaser latch = new Phaser( 1 );
final long startTime = currentTimeMillis();
final AtomicInteger itemsProcessed = new AtomicInteger( 0 );
for( final String item : fetchItems() ) // iterator returns too many items to store in memory
{
latch.register();
final Runnable task = new Runnable() {
public void run() {
processItem( item ); // method takes a non-trivial amount of time to run
itemsProcessed.incrementAndGet();
final int arrivalPhase = latch.arrive();
}
};
executor.execute( task );
}
latch.arriveAndAwaitAdvance();
final long endTime = currentTimeMillis();
out.println( "Processed " + itemsProcessed.get() + " items in " + ( endTime - startTime ) / 1000.0 + " seconds." );

"to ensure an arbitrarily large number of tasks are completed" - the simplest way is to maintain a counter of completed tasks, with blocking operation to wait that given number of task is reached. There is no such ready class, but it is easy to make one:
class EventCounter {
long counter=0;
synchronized void up () {
counter++;
notifyAll();
}
synchronized void ensure (long count) throws InterruptedException {
while (counter<count) wait();
}
}
"There are too many tasks to fit into memory" - so the process of submitting new tasks must be suspended when the number of running tasks is too high. The simplest way is to consider the number of running tasks as a resource and count it with a semaphore:
Semaphore runningTasksSema=new Semaphore(maxNumberOfRunningTasks);
EventCounter eventCounter =new EventCounter ();
for( final String item : fetchItems() ) {
final Runnable task = new Runnable() {
public void run() {
processItem( item );
runningTasksSema.release();
eventCounter.up();
}
};
runningTasksSema.acquire();
executor.execute(task);
}
When a thread wants to ensure some given number of tasks are completed, it invokes:
eventCounter.ensure(givenNumberOfFinishedTasks);
Asynchronous (nonblocking) versions of the runningTasksSema.acquire() and eventCounter.ensure() operations can be designed, but they would be more complex.

If you're on Java 8, you can use CompletableFuture:
java.util.concurrent.CompletableFuture.allOf(CompletableFuture<?>... cfs)
It returns a new CompletableFuture that completes once all the futures in the passed array have completed, so calling join() or get() on it waits for all of them.
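A small sketch of how allOf is typically used, assuming the tasks are simple computations run on the default executor. The class AllOfDemo and the method squareAll are hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch: start every task, wait for all of them with allOf, then read the results.
class AllOfDemo {

    static List<Integer> squareAll(List<Integer> inputs) {
        // Start all tasks first; supplyAsync uses the common ForkJoinPool by default.
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(n -> CompletableFuture.supplyAsync(() -> n * n))
                .collect(Collectors.toList());

        // allOf completes when every future has completed; join() blocks until then.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        // All futures are done here, so these join() calls return immediately.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Note that this keeps one future per task in memory, so on its own it does not satisfy the question's constraint that the tasks are too numerous to hold in a collection; it would have to be applied per batch.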

Parallel for loop with specific number of threads

What is the best way to implement a parallel for loop with a specified number of threads?
Like this:
int maxThreads=5;
int currentThreads=0;
for(int i = 0; i < 10000; i++){
if(currentThreads<maxThreads){
start thread......
}else{
wait...
}
}

I would first create a ForkJoinPool with a fixed number of threads:
final ForkJoinPool forkJoinPool = new ForkJoinPool(numThreads);
Now simply execute a parallel stream operation in a task:
forkJoinPool.submit(() -> {
IntStream.range(0, 10_000)
.parallel()
.forEach(i -> {
//do stuff
});
}).join(); // wait for the parallel loop to finish
Obviously this example simply translates your code literally. I would recommend that you use the Stream API to its fullest rather than just loop [0, 10,000).
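An alternative sketch that does not depend on how parallel streams pick their pool: submit one task per index to a fixed thread pool and wait for the pool to drain. As far as I know, running a parallel stream inside a custom ForkJoinPool is common practice but not a documented guarantee, so the explicit version may be easier to reason about. The class ParallelFor, the method run, and the one-hour timeout are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Sketch: a parallel for loop over [from, to) that uses at most maxThreads threads.
class ParallelFor {

    static void run(int from, int to, int maxThreads, IntConsumer body) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = from; i < to; i++) {
            final int index = i; // lambdas may only capture effectively final variables
            pool.execute(() -> body.accept(index));
        }
        pool.shutdown(); // no new tasks; already queued tasks still run
        try {
            // The timeout is an arbitrary upper bound chosen for this sketch.
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
                throw new IllegalStateException("parallel loop timed out");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A call such as ParallelFor.run(0, 10000, 5, i -> process(i)) would run a hypothetical process method for every index on five threads. Exceptions thrown by the body are not propagated to the caller in this sketch; using submit and checking the returned futures would be needed for that.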
