I need to make multiple RestTemplate calls, one for each ID in a List of IDs. What is the best way to perform this?
I have used parallelStream(). The code snippet below shows a similar scenario.
List<Integer> employeeIds;
List<Employee> employees = employeeIds.parallelStream()
    .map(empId -> employeeService.fetchEmployeedetails(empId))
    .collect(Collectors.toList());
employeeService.fetchEmployeedetails is essentially a REST call that fetches the details for a given employee.
Is there any other way to tune the performance?
.parallelStream() does not guarantee much parallelism on its own, since by default it creates only about as many threads as you have cores. To really force more threads to work simultaneously, you need to use .parallelStream() together with your own ForkJoinPool.
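You can check the default on your machine (a quick sketch; the common pool typically has availableProcessors() - 1 worker threads, with the calling thread pitching in as well):
System.out.println(Runtime.getRuntime().availableProcessors());
System.out.println(java.util.concurrent.ForkJoinPool.commonPool().getParallelism());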
High Performance using ForkJoinPool
List<Employee> employees = new ArrayList<>();
ForkJoinPool forkJoinPool = new ForkJoinPool(50);
try {
    forkJoinPool.submit(() -> employeeIds.parallelStream().forEach(empId -> {
        Employee em = employeeService.fetchEmployeedetails(empId);
        synchronized (employees) {
            employees.add(em);
        }
    })).get();
} catch (Exception e) {
    e.printStackTrace();
    throw new BusinessRuleException(ErrorCodeEnum.E002501.value(), e.getMessage());
} finally {
    forkJoinPool.shutdown(); // always remember to shut down the pool
}
This ensures that parallelStream() uses at most 50 threads instead of depending on the number of cores in your system. Make sure you do not forget to shutdown() the pool in the finally block, and do not forget .get(), which blocks until the submitted task has completed.
ArrayList is not safe to modify from multiple threads, but the synchronized block provides that safety. Since adding to the list is a very quick operation, it will not hold up the threads for long either.
You will see tremendous performance improvement with this approach.
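If you prefer to avoid the shared list and the synchronized block entirely, you can let the stream collect the results for you. A minimal sketch, assuming the same employeeIds and employeeService, and a ForkJoinPool created as above:
ForkJoinPool pool = new ForkJoinPool(50);
try {
    List<Employee> employees = pool.submit(() ->
            employeeIds.parallelStream()
                    .map(empId -> employeeService.fetchEmployeedetails(empId))
                    .collect(Collectors.toList())
    ).get(); // blocks; throws InterruptedException/ExecutionException
} finally {
    pool.shutdown();
}
collect() builds a separate intermediate list per worker thread and merges them at the end, so no locking is needed.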
How to choose the optimal number of threads
Too many threads cause context switching and slow the process down; too few keep the system underutilized. The optimal thread count is often around 10 times the number of cores you have, but it also depends on how much time each task spends waiting inside the thread. I generally pass the value in as an environment variable so that I can tune the number (50 here) whenever I want, but I arrived at 50 after quite a bit of experimenting on the quad-core instance we have.
@Value("#{systemEnvironment['parallelism'] ?: 50}")
protected int parallelism;
and then use ForkJoinPool forkJoinPool = new ForkJoinPool(getParallelism());
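For what it's worth, a well-known sizing heuristic (from Java Concurrency in Practice) lines up with the roughly-10x observation when threads spend most of their time waiting on IO. A sketch; the 90ms/10ms split below is an assumed example, not a measurement:
// threads ≈ cores * (1 + waitTime / computeTime)
int cores = Runtime.getRuntime().availableProcessors();
int poolSize = cores * (1 + 90 / 10); // e.g. 4 cores, ~90ms IO wait per ~10ms CPU -> 40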
Related
I am writing a command-line application in Java 8. There's a part that involves some computation, and I believe it could benefit from running in parallel using multiple threads. However, I do not have much experience writing multi-threaded applications, so I hope you can steer me in the right direction on how I should design the parallel part of my code.
For simplicity, let's pretend the method in question receives a relatively big array of longs, and it should return a Set containing only prime numbers:
public final static boolean checkIfNumberIsPrime(long number) {
// algorithm implementation, not important here
// ...
}
// a single-threaded version
public Set<Long> extractPrimeNumbers(long[] inputArray) {
Set<Long> result = new HashSet<>();
for (long number : inputArray) {
if (checkIfNumberIsPrime(number)) {
result.add(number);
}
}
return result;
}
Now, I would like to refactor the method extractPrimeNumbers() in such a way that it is executed by four threads in parallel and, when all of them are finished, returns the result. Off the top of my head, I have the following questions:
Which approach would be more suitable for the task: ExecutorService or Fork/Join? (each element of inputArray[] is completely independent and they can be processed in any order whatsoever)
Assuming there are 1 million elements in inputArray[], should I "ask" thread #1 to process all indexes 0..249999, thread #2 - 250000..499999, thread #3 - 500000..749999 and thread #4 - 750000..999999? Or should I rather treat each element of inputArray[] as a separate task to be queued and then executed by an applicable worker thread?
If a prime number is detected, it should be added to Set<Long> result, so the set needs to be thread-safe (synchronized). So, perhaps it would be better if each thread maintained its own local result set, and only when it is finished transferred its contents to the global result in one go?
Is Spliterator of any use here? Should it be used to partition inputArray[] somehow?
Parallel stream
Use none of these. Parallel streams deal with this problem much more straightforwardly than any of the alternatives you list.
return Arrays.stream(inputArray)
        .parallel()
        .filter(n -> checkIfNumberIsPrime(n))
        .boxed()
        .collect(Collectors.toSet());
For more info, see The Java™ Tutorials > Aggregate Operations > Parallelism.
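Incidentally, collect() already does internally what your third question proposes: each worker thread accumulates matches into its own container, and the containers are merged at the end. The three-argument form of collect() makes this explicit (a sketch equivalent to the code above):
return Arrays.stream(inputArray)
        .parallel()
        .filter(n -> checkIfNumberIsPrime(n))
        .boxed()
        .collect(HashSet::new, Set::add, Set::addAll); // per-thread sets, merged at the end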
Introduction
I'm currently developing a program in which I use java.util.Collection.parallelStream(), and I am wondering if it's possible to make it more multi-threaded.
Several small map
I was wondering if using multiple map() calls might allow java.util.Collection.parallelStream() to distribute the tasks better:
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(gson::toJson)
.map(Document::parse)
.map(InsertOneModel::new)
.toList();
Single big map
That is, whether it distributes the work better than a single big map():
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(puzzle -> new InsertOneModel<>(Document.parse(gson.toJson(puzzle))))
.toList();
Question
Is one of the solutions more suitable for java.util.Collection.parallelStream(), or is there no big difference between the two?
I looked into the Stream source code. The result of a map operation is just fed into the next operation, so there is almost no difference between one big map() call and several small map() calls.
And for the map() operation, a parallel Stream makes no difference at all: each input object will be processed from start to finish by the same Thread in any case.
Also note: A parallel Stream only splits up the work if the operation chain allows it and there is enough data to process. So for a small Collection or a Collection that allows no random access, a parallel Stream behaves like a sequential Stream.
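You can observe this yourself by printing the thread name in each stage; the same thread carries a given element through the entire chain (a small debugging sketch):
List.of("a", "b", "c", "d").parallelStream()
        .map(s -> { System.out.println(Thread.currentThread().getName() + " step1 " + s); return s; })
        .map(s -> { System.out.println(Thread.currentThread().getName() + " step2 " + s); return s; })
        .toList();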
I don't think it will do any better if you chain multiple maps. If your code is not very complex, I would prefer a single big map().
To understand this, we have to check the code inside the map function (see the JDK source):
public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    Objects.requireNonNull(mapper);
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                     StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void accept(P_OUT u) {
                    downstream.accept(mapper.apply(u));
                }
            };
        }
    };
}
As you can see, a lot happens behind the scenes: multiple objects are created and multiple methods are called, and all of this is repeated for each chained map() call.
Now, coming back to parallel streams, they work on the concept of parallelism.
Streams Documentation
A parallel stream is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. Thus, you can automatically partition the workload of a given operation on all the cores of your multicore processor and keep all of them equally busy.
Parallel streams internally use the default ForkJoinPool, which by default has as many threads as you have processors, as returned by Runtime.getRuntime().availableProcessors(). But you can change the size of this pool using the system property java.util.concurrent.ForkJoinPool.common.parallelism.
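For example (note: this property must be set before the common pool is first used, e.g. at the very start of main or via a JVM flag):
// Raise the common pool's parallelism to 12 for every parallel stream in this JVM
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "12");
// equivalent command-line form: -Djava.util.concurrent.ForkJoinPool.common.parallelism=12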
ParallelStream calls spliterator() on the collection object, which returns a Spliterator implementation that provides the logic for splitting a task. Every source or collection has its own spliterator implementation. Using these spliterators, the parallel stream splits the task as far as possible, and when a task finally becomes too small it executes it sequentially and merges the partial results from all the subtasks.
So I would prefer parallelStream when:
I have a huge amount of data to process at a time
I have multiple cores to process the data
There are performance issues with the existing implementation
I don't already have other multi-threaded processes running, as parallelism adds to the complexity
Performance Implications
Overhead: Sometimes, when the dataset is small, converting a sequential stream into a parallel one results in worse performance. The overhead of managing threads, sources and results is more expensive than doing the actual work.
Splitting: Arrays can split cheaply and evenly, while LinkedList has none of these properties. TreeMap and HashSet split better than LinkedList but not as well as arrays.
Merging: The merge operation is really cheap for some operations, such as reduction and addition, but merge operations like grouping to sets or maps can be quite expensive.
Conclusion: A large amount of data and many computations done per element indicate that parallelism could be a good option.
The three steps (toJson/parse/new) have to be executed sequentially, so all you're effectively doing is comparing s.map(g.compose(f)) and s.map(f).map(g). By virtue of being a monad, Java Streams are functors, and the 2nd functor law states that, in essence, s.map(g.compose(f)) == s.map(f).map(g), meaning that the two alternative ways of expressing the computation will produce identical results. From a performance standpoint the difference between the two is likely to be minimal.
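You can sanity-check that law on a concrete stream with toy functions (a small sketch):
Function<Integer, Integer> f = x -> x + 1;
Function<Integer, Integer> g = x -> x * 2;
List<Integer> oneMap  = Stream.of(1, 2, 3).map(g.compose(f)).toList(); // [4, 6, 8]
List<Integer> twoMaps = Stream.of(1, 2, 3).map(f).map(g).toList();     // [4, 6, 8]
// oneMap.equals(twoMaps) -> true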
However, in general you should be careful using Collection.parallelStream. It uses the common ForkJoinPool, essentially a fixed pool of threads shared across the entire JVM, whose size is determined by the number of cores on the host. The problem with using the common pool is that other threads in the same process may be using it at the same time as your code. This can lead to your code randomly and inexplicably slowing down - if another part of the code has temporarily exhausted the common thread pool, for example.
It is preferable to create your own ExecutorService using one of the factory methods on Executors, and then submit your tasks to that.
private static final ExecutorService EX_SVC = Executors.newFixedThreadPool(16);
public static List<InsertOneModel<Document>> process(Stream<Puzzle> puzzles) throws InterruptedException {
final Collection<Callable<InsertOneModel<Document>>> callables =
puzzles.map(puzzle ->
(Callable<InsertOneModel<Document>>)
() -> new InsertOneModel<>(Document.parse(gson.toJson(puzzle)))
).collect(Collectors.toList());
return EX_SVC.invokeAll(callables).stream()
.map(fut -> {
try {
return fut.get();
} catch (ExecutionException|InterruptedException ex) {
throw new RuntimeException(ex);
}
}).collect(Collectors.toList());
}
I doubt that there is much difference in performance, but even if you proved one style quicker, I would still prefer to see and use the first style in code I had to maintain.
The first multi-map style is easier for others to understand, easier to maintain, and easier to debug - for example, by adding peek stages at any point in the processing chain.
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(gson::toJson)
// easy to make changes for debug, moving peek up/down
// .peek(System.out::println)
.map(Document::parse)
// easy to filter:
// .filter(this::somecondition)
.map(InsertOneModel::new)
.toList();
If your requirements change - such as needing to filter the output, or capturing intermediate data by splitting it into two collections - the first approach beats the second every time.
Consider the following code:
AtomicInteger counter1 = new AtomicInteger();
AtomicInteger counter2 = new AtomicInteger();
Flux<Object> source = Flux.generate(emitter -> {
emitter.next("item");
});
Executor executor1 = Executors.newFixedThreadPool(32);
Executor executor2 = Executors.newFixedThreadPool(32);
Flux<String> flux1 = Flux.merge(source).concatMap(item -> Mono.fromCallable(() -> {
Thread.sleep(1);
return "1_" + counter1.incrementAndGet();
}).subscribeOn(Schedulers.fromExecutor(executor1)));
Flux<String> flux2 = Flux.merge(source).concatMap(item -> Mono.fromCallable(() -> {
Thread.sleep(100);
return "2_" + counter2.incrementAndGet();
}).subscribeOn(Schedulers.fromExecutor(executor2)));
Flux.merge(flux1, flux2).subscribe(System.out::println);
You can see that one publisher is 100x faster than the other. Still, when running the code it seems that all data is printed, but there's a huge gap between the two publishers, and it increases over time.
What's interesting to note is that when changing the numbers so that executor2 has 1024 threads and executor1 has only 1 thread, we still see the gap getting larger and larger over time.
My expectation was that after tweaking the thread pools accordingly, the publishers would become balanced.
I'd like to achieve a balance between publishers (relative to the thread-pool sizes and processing time).
What would happen if I waited long enough? In other words, can backpressure occur? (Which I guess results in a runtime exception by default, right?)
I don't want to drop items, nor do I want a runtime exception. Instead, as I mentioned, I'd like the system to balance itself with respect to the resources it has and the processing times - does the code above promise that?
Thanks!
Your Flux objects in this example are not ParallelFlux objects, so they'll only ever use one thread.
It doesn't matter if you create a scheduler that's capable of handling thousands of threads, and pass that to one of the Flux objects - they'll just sit there going unused, which is exactly what's happening in this example. There's no backpressure, and it won't result in an exception - it's just going as fast as it can using one thread.
If you want to make sure that the Flux takes full advantage of the 1024 threads available to it, then you need to call .parallel(1024):
ParallelFlux<String> flux1 = Flux.merge(source).parallel(1).concatMap(item -> Mono.fromCallable(() -> {
Thread.sleep(1);
return "1_" + counter1.incrementAndGet();
}).subscribeOn(Schedulers.fromExecutor(executor1)));
ParallelFlux<String> flux2 = Flux.merge(source).parallel(1024).concatMap(item -> Mono.fromCallable(() -> {
Thread.sleep(100);
return "2_" + counter2.incrementAndGet();
}).subscribeOn(Schedulers.fromExecutor(executor2)));
If you do that to your code, then you start to see results much closer to what you seem to be expecting, with 2_ sailing past 1_ despite the fact it's sleeping for 100 times as long:
...
2_17075
2_17076
1_863
1_864
2_17077
1_865
2_17078
2_17079
...
However, a word of warning:
I'd like to achieve a balance between publishers (relative to the thread-pool sizes and processing time)
You can't pick numbers here to balance the outputs, at least not reliably or in any meaningful way - the thread scheduling will be completely arbitrary. If you want to do that, then you could use this variant of the subscribe method, allowing you to explicitly call request() on the subscription consumer. This then allows you to provide backpressure by only requesting as many elements as you're prepared to deal with.
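A minimal sketch of that, assuming the original flux1 from the question (this uses the subscribe overload that accepts a Consumer<? super Subscription>; a BaseSubscriber would work too):
flux1.subscribe(
        item -> System.out.println(item),           // onNext
        error -> error.printStackTrace(),           // onError
        () -> System.out.println("done"),           // onComplete
        subscription -> subscription.request(10));  // only request 10 elements up front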
I have a key-value map. I am iterating over the keys, calling a service for each, and, based on the response, adding the results to an uberList.
How can I execute the different operations concurrently? Will changing stream() to parallelStream() do the trick? Does it synchronize when it adds to uberList?
The idea is to minimize the response time.
List<MyClass> uberList = new LinkedList<>();
Map<String, List<MyOtherClass>> map = new HashMap<>();
//Populate map
map.entrySet().stream()
    .filter(s -> !s.getValue().isEmpty())
    .forEach(y -> {
        // Do stuff
        if (noError) {
            uberList.add(myClass3); // myClass3: some MyClass instance produced above
        }
    });
//Do stuff on uberList
How can I execute the different operations concurrently?
One thread can do one task at a time. If you want to do multiple operations concurrently, you have to offload the work to other threads.
You can do that either by creating a new Thread yourself or by using an ExecutorService to manage the thread pool and to queue and execute the tasks for you.
Will changing stream() to parallelStream() do the trick?
Yes, it does. Internally, parallelStream() uses ForkJoinPool.commonPool() to run tasks for you. But keep in mind that parallelStream() makes no guarantee that the returned stream is parallel (although the current implementation does return a parallel one).
Does it synchronize when it adds to uberList?
It's up to you to do the synchronization in the forEach pipeline. Normally you do not want to call collection.add() inside forEach to build a collection. Instead, you should use the .map().collect(toX()) methods, which free you from the synchronization:
They do not need to know about your local variable (in this case uberList), and they will not modify it during execution, which avoids a lot of the strange bugs that concurrency causes
You can freely change the type of collection in the .collect() part, which gives you more control over the result type
They do not require the given collection to be thread-safe or synchronized when used with a parallel stream, because "multiple intermediate results may be instantiated, populated, and merged so as to maintain isolation of mutable data structures" (see the Collectors documentation for more on this)
So what you want is to execute multiple similar service calls at the same time and collect the results into a list.
You can do that simply with a parallel stream:
uberList = map.entrySet().stream()
.parallel() // Use .stream().parallel() to force parallelism; .parallelStream() does not guarantee that the returned stream is parallel
.filter(yourCondition)
.map(e -> yourService.methodCall(e))
.collect(Collectors.toList());
Pretty cool, isn't it?
But as I stated, the default parallel stream uses ForkJoinPool.commonPool() for queueing and executing tasks.
The bad part: if yourService.methodCall(e) does heavy IO (an HTTP call, or even a DB call...) or is a long-running task, it may exhaust the pool, and other incoming tasks will be queued forever waiting for execution.
So typically every task that depends on this common pool (not only your own yourService.methodCall(e), but every other parallel stream as well) will be slowed down by the queueing time.
To solve this problem, you can force the parallel execution onto your own fork-join pool:
ForkJoinPool forkJoinPool = new ForkJoinPool(4); // Typically set to Runtime.getRuntime().availableProcessors()
uberlist = forkJoinPool.submit(() -> {
return map.entrySet().stream()
.parallel() // Use .stream().parallel() to force parallelism; .parallelStream() does not guarantee that the returned stream is parallel
.filter(yourCondition)
.map(e -> yourService.methodCall(e))
.collect(Collectors.toList());
}).get();
You probably don't want to use parallelStream for concurrency, only for parallelism. (That is: use it for tasks where you want to use multiple physical processors efficiently on a task that's conceptually sequential, not for tasks where you want multiple things going on at the same time conceptually.)
In your case you would probably be better off using an ExecutorService, or more specifically com.google.common.util.concurrent.ListeningExecutorService from Google Guava (warning: I haven't tried to compile the code below, so there may be syntax errors):
int MAX_NUMBER_OF_SIMULTANEOUS_REQUESTS = 100;
ListeningExecutorService myExecutor =
MoreExecutors.listeningDecorator(
Executors.newFixedThreadPool(MAX_NUMBER_OF_SIMULTANEOUS_REQUESTS));
List<ListenableFuture<Optional<MyClass>>> futures = new ArrayList<>();
for (Map.Entry<String, List<MyOtherClass>> entry : map.entrySet()) {
if (!entry.getValue().isEmpty()) {
futures.add(myExecutor.submit(() -> {
// Do stuff
if(noError) {
return Optional.of(MyClass3);
} else {
return Optional.empty();
}
}));
}
}
List<MyClass> uberList = Futures.successfulAsList(futures)
.get(1, TimeUnit.MINUTES /* adjust as necessary */)
.stream()
.filter(Optional::isPresent)
.map(Optional::get)
.collect(Collectors.toList());
The advantage of this code is that it allows you to explicitly specify that the tasks should all start at the "same time" (at least conceptually) and allows you to control your concurrency explicitly (how many simultaneous requests are allowed? What do we do if some of the tasks fail? How long are we willing to wait? etc). Parallel streams aren't really for that.
A parallel stream will help execute operations concurrently. But it is not recommended to use a forEach loop that adds elements to an outside list: if you do that, you have to make sure you synchronize the external list. The better way is to use map and collect the results into a list; in that case, parallelStream takes care of the synchronization.
List<MyClass> uberList = map.entrySet().parallelStream()
    .filter(s -> !s.getValue().isEmpty())
    .map(y -> {
        // Do stuff
        return myClass3; // placeholder for the object built above
    })
    .filter(t -> noError) // check the no-error condition
    .collect(Collectors.toList());
This is how I normally iterate over a collection:
for (Iterator iterator = collectionthing.iterator(); iterator.hasNext();) {
    // do something with iterator.next()
}
I believe most of us do this. I wonder, is there any better approach than having to iterate sequentially? Is there any Java library with which I can make this execute in parallel on a multi-core CPU? =)
Looking forward to feedback from you all.
Java's multithreading is quite low level in this respect. The best you could do is something like this:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (final Object item : collectionThingy) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            // do stuff with item
        }
    });
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
This is Java 6 code. If you are running on Java 5, drop the @Override annotation (in Java 5 it doesn't apply to methods implementing interface methods, but in Java 6 it does).
What this does: it creates a task for each item in the collection, and a thread pool (size 10) is created to run those tasks. You can replace the pool with anything you want. Lastly, the thread pool is shut down and the code blocks awaiting the completion of all the tasks.
The last call throws at least one checked exception you will need to catch - at a guess, InterruptedException (and ExecutionException if you call get() on the returned Futures).
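Concretely, the last two lines end up looking something like this (a sketch):
executor.shutdown();
try {
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
}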
In most cases, the added complexity wouldn't be worth the potential performance gain. However, if you needed to process a Collection in multiple threads, you could possibly use Executors to do this, which would run all the tasks in a pool of threads:
int numThreads = 4;
ExecutorService threadExecutor = Executors.newFixedThreadPool(numThreads);
for(Iterator iterator = collectionthing.iterator(); iterator.hasNext();){
Runnable runnable = new CollectionThingProcessor(iterator.next());
threadExecutor.execute(runnable);
}
As part of the fork-join framework, JDK 7 should (although it's not certain) have parallel arrays. These are designed to allow efficient implementation of certain operations across arrays on many-core machines. But just cutting the array into pieces and throwing them at a thread pool will also work.
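That chunk-and-submit idea is straightforward to sketch by hand; array and the per-element process() call below are hypothetical placeholders:
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<?>> futures = new ArrayList<>();
int chunkSize = (array.length + 3) / 4; // split into 4 roughly equal pieces
for (int i = 0; i < array.length; i += chunkSize) {
    final int from = i;
    final int to = Math.min(i + chunkSize, array.length);
    futures.add(pool.submit(() -> {
        for (int j = from; j < to; j++) {
            process(array[j]); // hypothetical per-element work
        }
    }));
}
for (Future<?> f : futures) {
    f.get(); // wait for each chunk; throws InterruptedException/ExecutionException
}
pool.shutdown();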
Sorry, Java does not have this sort of language-level support for automatic parallelism; if you want it, you will have to implement it yourself using libraries and threads.