Executors.newFixedThreadPool - issue - java

I have a CopyOnWriteArrayList that holds 1000 URLs; each URL points to a file.
I want to use a thread pool to download those files in parallel.
I tried the code below:
CopyOnWriteArrayList<String> fileList = DataExtractor.getRefLinks();
ExecutorService threadPool = Executors.newFixedThreadPool(4);
CompletionService<String> pool = new ExecutorCompletionService<String>(threadPool);
for (int i = 0; i < fileList.size(); i++) {
    pool.submit(new StringTask(fileList));
}
This hits the same URL 4 times. I might have done something wrong; could you please suggest where?
My requirement is to pick 4 URLs at a time and download them in parallel until all the URLs in the list have finished downloading.
Thanks.

I don't know what StringTask is, but you seem to be passing the full list of URLs to it. Make the appropriate change to submit only a single URL from the list:
pool.submit(new StringTask(fileList.get(i)));
(Or use an iterator over the fileList, whichever is more appropriate for a CopyOnWriteArrayList.)
for (String url : fileList) {
    pool.submit(new StringTask(url));
}
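Since StringTask isn't shown in the question, here is a minimal sketch of what a per-URL task might look like, assuming it is a Callable<String> that downloads one file and returns its name (the field, the file naming, and the download logic are all hypothetical):

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.Callable;

// Hypothetical task: downloads a single URL to the working directory.
class StringTask implements Callable<String> {
    private final String url;

    StringTask(String url) {
        this.url = url;
    }

    @Override
    public String call() throws Exception {
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, Paths.get(fileName)); // fails if the file already exists
        }
        return fileName;
    }
}

With the CompletionService you can then consume results as the downloads finish:

for (int i = 0; i < fileList.size(); i++) {
    String finished = pool.take().get(); // blocks until the next download completes; throws InterruptedException/ExecutionException
}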

Optimizing method with list of 500k+ elements

I'm looking for help since I don't know how to optimize a process.
I have to invoke a service that returns a list with more than 500K elements (I don't know why; these services belong to the client). For each element of the list, I have to invoke 2 more services and then save some attributes to our database. This last step is not the problem, but the whole process takes between 1 and 2 seconds per element, so at this rate it will take more than 100 hours to complete.
My approach is the following: I have my main method, inside which I get the large list. Then I use a parallelStream to iterate over the elements of the list, and a CompletableFuture to call the method that invokes the 2 services mentioned above. I've tried changing the parallelStream to stream with for-each, tried splitting the main list into smaller lists, and many other things, but I don't see better performance. I think the problem is the invocation of those 2 services, but I want to try my luck asking here.
I'm using Java 11 and Spring, with RestTemplate for the service invocations. This is my code:
public void updateDiscount() {
    // List with 500k elements
    var relationshipList = relationshipService.getLargeList();
    // CompletableFuture to make the async calls to the method below
    relationshipList.parallelStream().forEach(level1 -> {
        CompletableFuture.runAsync(() -> relationshipService.asyncDiscountSave(level1));
    });
}

// Second class
@Async("nameOfThePool")
public void asyncDiscountSave(ElementOfList element) {
    // Logic to create the request
    // .........
    var responseClients = anotherClass.getClients(element.getGroup1());   // first call with RestTemplate
    var responseProducts = anotherClass.getProducts(element.getGroup2()); // second call with RestTemplate
    for (var client : responseClients) {
        for (var product : responseProducts) {
            // Here we just save some attributes of these objects to our DB
        }
    }
}
Thanks for the help.
UPDATE:
For this particular case, the only improvement I could make was to pass a thread pool to the CompletableFuture; the real problem is the response time of the services I have to invoke.
I decided to follow the second approach, and it took about 5 hours to complete, which is acceptable compared with the first approach.
As you haven't defined an executor, you are using the default pool. Adding your own executor lets you create as many threads as you need and as the server's resources can handle:
public void updateDiscount() {
    Executor executor = Executors.newFixedThreadPool(100); // size the pool according to the server's resources
    // List with 500k elements
    var relationshipList = relationshipService.getLargeList();
    // CompletableFuture to make the async calls to the method above
    relationshipList.parallelStream().forEach(level1 -> {
        CompletableFuture.runAsync(() -> relationshipService.asyncDiscountSave(level1), executor);
    });
}
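Note that updateDiscount as written returns before the submitted tasks finish, and the pool is never shut down. A minimal sketch of one way to handle both, collecting the futures and joining them (relationshipList, relationshipService and asyncDiscountSave are taken from the question):

ExecutorService executor = Executors.newFixedThreadPool(100);
List<CompletableFuture<Void>> futures = relationshipList.stream()
        .map(level1 -> CompletableFuture.runAsync(
                () -> relationshipService.asyncDiscountSave(level1), executor))
        .collect(Collectors.toList());
// Block until every task has finished, then release the pool's threads.
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
executor.shutdown();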

How to open a thread for each element of a list and gather all the results with CompletableFuture?

I have a list of Strings, and for each of them I need to open a new thread and gather all the information into a CompletableFuture.
This is my iteration:
for (String result : results) {
    candidateInfos.add(getCandidatesInfo(result));
}
I am trying to implement threads for the first time and I would appreciate some help.
You can build a Stream of suppliers, one per method call, and then collect the results into a list as follows:
ExecutorService executor = Executors.newFixedThreadPool(results.size()); // create the pool once, not inside the map
Stream.Builder<Supplier<CanditateInfo>> streamBuilder = Stream.builder();
results.forEach(string -> streamBuilder.accept(() -> this.getCandidatesInfo(string)));
List<CanditateInfo> candidateInfos = streamBuilder.build()
        .map(supplier -> CompletableFuture.supplyAsync(supplier, executor))
        .collect(Collectors.toList()) // materialize all futures so they run concurrently
        .stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toList());
Here I have used a separate Executor because, by default, Java uses the common fork/join pool, which would block other parallel work once it fills up. For more info see http://fahdshariff.blogspot.in/2016/06/java-8-completablefuture-vs-parallel.html
Edit: Less syntax.
You can create the stream directly from the list (or, if you have an array, with Arrays.stream) instead of using a Stream.Builder:
List<CanditateInfo> candidateInfos = results.stream()
        .map(s -> CompletableFuture.supplyAsync(() -> this.getCandidatesInfo(s), executor))
        .collect(Collectors.toList()) // collect first; joining in the same pipeline would run the calls one by one
        .stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toList());
(Note that supplyAsync takes a Supplier, hence the lambda; executor is the pool created above.)
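Whichever variant you use, remember that a pool sized to results.size() creates one thread per element, which can be excessive for large lists, and that the pool should be shut down when you are done. A sketch of both points (the cap of 16 is an arbitrary example):

ExecutorService executor = Executors.newFixedThreadPool(Math.min(results.size(), 16));
try {
    List<CanditateInfo> candidateInfos = results.stream()
            .map(s -> CompletableFuture.supplyAsync(() -> this.getCandidatesInfo(s), executor))
            .collect(Collectors.toList())
            .stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
} finally {
    executor.shutdown(); // let the worker threads exit
}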

Create a worker thread which performs a specific task in the background

As per my project:
Data is fetched from the database through a query.
There is an Iterator over the result set, and data is added continuously to this result set.
By iterating over the Iterator, results are added to an ArrayList.
Once we have all the entries (more than 200000), we write them to a file.
But as this uses a lot of JVM heap space, I want to use a worker thread that runs in the background and writes the data to the file.
As I am new to multithreading, I thought of using an ExecutorService with a fixed thread pool of 1 thread: whenever the entries reach a count of 50000, submit them to the executor to append them to the file.
Please suggest whether this approach is fine or whether I should follow another approach.
I don't think you need a ThreadPool to handle a single thread. You can do it by creating the thread yourself (pseudo-code):
List<Entry> list = new ArrayList<Entry>(); // class member that holds the entries from the result set; I assume the entry type `Entry` here
....
void addEntry(Entry entry) {
    list.add(entry);
    if (list.size() >= 20000) {
        // assign the current list to a temp list in order to reinitialize the list for the next set of entries
        final List<Entry> tempList = list; // tempList has 20000 entries!
        list = new ArrayList<Entry>();     // list is reinitialized
        // start a thread to write tempList to the file
        Thread t = new Thread(new Runnable() {
            public void run() {
                // stuff that writes `tempList` to the file
            }
        });
        t.start(); // runs in the background; the calling thread (wherever `addEntry()` was called from)
                   // continues adding new entries to the reinitialized list
    } // end of if condition
}
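For comparison, the single-thread executor variant proposed in the question would look roughly like this; a minimal sketch, assuming an Entry type, an Iterator over the result set, and a hypothetical appendToFile helper:

ExecutorService writer = Executors.newSingleThreadExecutor();
List<Entry> batch = new ArrayList<>();
while (resultSetIterator.hasNext()) {
    batch.add(resultSetIterator.next());
    if (batch.size() >= 50000) {
        final List<Entry> toWrite = batch;          // hand the full batch to the writer thread
        writer.submit(() -> appendToFile(toWrite)); // appendToFile is a hypothetical helper
        batch = new ArrayList<>();
    }
}
final List<Entry> remainder = batch;
writer.submit(() -> appendToFile(remainder)); // flush what is left
writer.shutdown();                            // finish the queued writes, then stop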
Note: you mentioned heap space, but even if we use a thread, the entries still live on the heap.
Executing the writes in a thread frees up the main thread to do other work; it will not solve your heap space problem.
The heap space problem is caused by the number of entries returned by the query. You could change your query to return only a set number of rows, process those, and run the query again starting from the last row you processed.
If you are using MS SQL, there is already an answer here on how to split your queries:
Row offset in SQL Server
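A minimal sketch of that pagination idea, assuming a SQL Server style OFFSET/FETCH query, an open java.sql.Connection named connection, and a hypothetical writeToFile helper (the table and column names are illustrative):

int pageSize = 20000;
int offset = 0;
String sql = "SELECT id, payload FROM entries ORDER BY id "
           + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    while (true) {
        ps.setInt(1, offset);
        ps.setInt(2, pageSize);
        int rows = 0;
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                writeToFile(rs.getString("payload")); // hypothetical helper
                rows++;
            }
        }
        if (rows < pageSize) break; // last page reached
        offset += pageSize;
    }
}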
You don't need to fetch all 20000 entries before writing them to the file, unless they have some dependency on each other.
In the simplest case you can write the entries directly to the file as you fetch them, making it unnecessary to hold large amounts of heap.
A more advanced version of that is the producer-consumer pattern, which you can then tune for different speed/memory trade-offs.
I created a worker thread which processes entries in the background, starting it before fetching entries and stopping it when all entries have been fetched:
public class WriteToOutputFile implements Runnable {
    private static final Entry LAST_ENTRY = new Entry(); // poison pill marking the end of input
    private final BlockingQueue<Entry> queue;
    private final File file;

    WriteToOutputFile(BlockingQueue<Entry> queue, File file) {
        this.queue = queue;
        this.file = file;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() blocks until an entry (or the pill) is available; checking
                // queue.isEmpty() here would make the worker exit whenever the
                // producer momentarily fell behind
                Entry entry = queue.take();
                if (entry == LAST_ENTRY) break;
                // logic to write the entry to the file
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void stop() throws InterruptedException {
        queue.put(LAST_ENTRY); // the worker drains everything queued before the pill, then exits
    }
}
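A sketch of how this worker might be wired up, assuming an Entry type and that the fetch loop runs on the calling thread (the names are illustrative):

BlockingQueue<Entry> queue = new LinkedBlockingQueue<>(50000); // bounded, so the fetcher can't outrun the writer indefinitely
WriteToOutputFile worker = new WriteToOutputFile(queue, new File("out.txt"));
Thread writerThread = new Thread(worker);
writerThread.start();
while (resultSetIterator.hasNext()) {
    queue.put(resultSetIterator.next()); // blocks if the writer falls 50000 entries behind
}
worker.stop();
writerThread.join(); // wait for the remaining entries to be flushed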

ExecutorService slows down, bogs down my PC

I am writing a parser for a website with many pages (I call them IndexPages). Each page has a lot of links (about 300 to 400 links per IndexPage). I use Java's ExecutorService to invoke 12 Callables concurrently for one IndexPage. Each Callable just fires an HTTP request to one link and does some parsing and DB-storing actions. When the first IndexPage is finished, the program progresses to the second IndexPage, until no next IndexPage is found.
When running, it seems OK at first; I can observe the threads working and being scheduled well. Each link's parsing/storing takes about 1 to 2 seconds.
But as time goes by, I observe each Callable (parsing/storing) taking longer and longer. Sometimes it takes 10 or more seconds to finish a Callable (in the profiler's thread timeline, the green bar is RUNNING, the purple bar is WAITING), and my PC bogs down; everything becomes sluggish.
This is my main algorithm :
ExecutorService executorService = Executors.newFixedThreadPool(12);
String indexUrl = // Set initial (1st page) IndexPage
while (true) {
    String nextPage = // parse next page in the indexUrl
    Set<Callable<Void>> callables = new HashSet<>();
    for (String url : getUrls(indexUrl)) {
        Callable callable = new ParserCallable(url, … and some DAOs);
        callables.add(callable);
    }
    try {
        executorService.invokeAll(callables);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    if (nextPage == null)
        break;
    indexUrl = nextPage;
} // while (true)
executorService.shutdown();
The algorithm is simple and self-explanatory. What might cause this situation, and is there any way to prevent such performance degradation?
The CPU/memory/heap all show reasonable usage.
Environment details, FYI.
==================== updated ====================
I've changed my implementation from ExecutorService to ForkJoinPool:
ForkJoinPool pool = new ForkJoinPool(12);
String indexUrl = // Set initial (1st page) IndexPage
while (true) {
    Set<Callable<Void>> callables = new HashSet<>();
    for (String url : getUrls(indexUrl)) {
        Callable callable = new ParserCallable(url, DAOs...);
        callables.add(callable);
    }
    pool.invokeAll(callables);
    String nextPage = // parse next page in this indexUrl
    if (nextPage == null)
        break;
    indexUrl = nextPage;
} // while (true)
It takes longer than the ExecutorService solution. ExecutorService takes about 2 hours to finish all pages, while ForkJoinPool takes 3 hours, and each Callable still takes longer and longer to complete (from 1 second to 5, 6, or even 10 seconds). I don't mind that it takes longer; I just hope it takes constant time (not longer and longer) to finish a job.
I am wondering whether creating a lot of (non-thread-safe) GregorianCalendar, Date and SimpleDateFormat objects in the parser causes some threading issue. But I don't reuse these objects or pass them among threads, so I still cannot find the reason.
Based on the heap, you have a memory issue. ExecutorService.invokeAll collects all of the results of the Callable instances into a List and holds on to them until they all complete. You may want to consider simply calling ExecutorService.submit, since you don't seem to care about the result of each Callable.
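A minimal sketch of that suggestion (ParserCallable and getUrls are from the question). Note that plain submit gives up invokeAll's built-in wait for the whole page, so this only works if you don't need to finish one IndexPage before starting the next:

for (String url : getUrls(indexUrl)) {
    executorService.submit(new ParserCallable(url /* , DAOs... */)); // the returned Future is deliberately ignored
}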
I can't see why there is a need for a Callable to parse your index pages, since your calling method does not expect any result from ParserCallable. I can see you would need a bit of exception handling, but that can still be managed with a Runnable.
When you submit a Callable, it returns a Future, which is never used here.
You should be able to improve the implementation by using a Runnable, which avoids this extra bookkeeping:
ExecutorService executor = Executors.newFixedThreadPool(12);
for (String url : getUrls(indexUrl)) {
    Runnable worker = new ParserRunnable(url, … and some DAOs);
    executor.execute(worker);
}

class ParserRunnable implements Runnable {
    @Override
    public void run() {
        // fetch, parse and store the page, as ParserCallable.call() did
    }
}
As I understand it, if you have 40 pages, each with ~300 URLs, you will create ~12,000 Callables. While that is probably not too many Callables, it is a lot of HTTP connections and database connections.
I think you should try using one Callable per page. You'll still gain a lot by running them in parallel. I don't know what you are using for the HTTP requests, but you might be able to reuse system resources there instead of opening and closing 12,000 of them.
And especially for the DB: you'll have just 40 connections. You might even be able to be super efficient by collecting the ~300 records locally and then using a batch update.
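A minimal sketch of that batch idea with plain JDBC, assuming an open connection; the table, columns, and ParsedRecord type are hypothetical:

// Insert the ~300 records parsed from one page in a single batch.
String sql = "INSERT INTO parsed_link (url, title) VALUES (?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (ParsedRecord r : records) { // records collected locally for this page
        ps.setString(1, r.getUrl());
        ps.setString(2, r.getTitle());
        ps.addBatch();
    }
    ps.executeBatch(); // one round trip instead of ~300
}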

Deal with concurrent modification of a List without getting a ConcurrentModificationException

I have a stateful EJB which calls a stateless EJB method that parses Web pages.
Here is my stateful code:
@Override
public void parse() {
    while (true) {
        if (false == _activeMode) {
            break;
        }
        for (String url : _urls) {
            if (false == _activeMode) {
                break;
            }
            for (String prioritaryUrl : _prioritaryUrls) {
                if (false == _activeMode)
                    break;
                boursoramaStateless.parseUrl(prioritaryUrl);
            }
            boursoramaStateless.parseUrl(url);
        }
    }
}
No problem here.
I have some asynchronous calls (via JMS) that add values to my _urls variable (a List). The goal is to parse new URLs inside my infinite loop.
I get a ConcurrentModificationException when I try to add a new URL to my List via the JMS onMessage method, but it otherwise seems to work, because the new URL does get parsed.
When I wrap the loop body in a synchronized block:
while (true) {
    synchronized (_urls) {
        // code...
    }
}
My new URL is never parsed; I expected it to be parsed after a pass of the for() loop finished...
So my question is: how can I modify a List while it is being iterated over in a loop without getting a ConcurrentModificationException?
I just want 2 threads to modify a shared resource at the same time without a synchronized block...
You may want a CopyOnWriteArrayList.
for (String s : urls) uses an Iterator internally, and the iterator checks for concurrent modification so that its behavior is well defined.
You can use a for (int i = ... loop instead. That way no exception is thrown, and if elements are only ever added to the end of the List, you still get a consistent snapshot (the list as it existed at some point during the iteration). If elements in the list are moved around, you may get missing entries.
If you want to use synchronized, you need to synchronize on both ends, but that way you lose concurrent reads, as in the sketch below.
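A sketch of what "both ends" means here: the iteration and the mutation must lock the same monitor (_urls, boursoramaStateless and onMessage are from the question):

// reader side (the parse() loop)
synchronized (_urls) {
    for (String url : _urls) {
        boursoramaStateless.parseUrl(url);
    }
}

// writer side (the JMS callback)
public void onMessage(Message message) {
    synchronized (_urls) {
        _urls.add(((TextMessage) message).getText());
    }
}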
If you want concurrent access AND consistent snapshots, you can use the collections in the java.util.concurrent package.
CopyOnWriteArrayList has already been mentioned. Other interesting options are LinkedBlockingQueue and ArrayBlockingQueue (Collections, but not Lists), but that's about all.
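A minimal sketch of the CopyOnWriteArrayList behavior (the URLs are placeholders): the iterator works over a snapshot, so a concurrent add never throws, and the new element is simply picked up on the next pass of the loop.

List<String> urls = new CopyOnWriteArrayList<>();
urls.add("http://example.com/a");
for (String url : urls) {
    urls.add("http://example.com/b"); // safe even mid-iteration: no ConcurrentModificationException
    System.out.println("parsing " + url); // this pass sees only /a, from the snapshot
}
// a subsequent pass (e.g. the next turn of the while(true) loop) would see /b as well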
OK, thank you guys.
So I made some modifications.
1) Added an iterator and kept the synchronized blocks (inside the parse() function and around the addUrl() function which adds a new URL to my List)
--> works like a charm, no ConcurrentModificationException thrown
2) Added an iterator and removed the synchronized blocks
--> ConcurrentModificationException is still thrown...
For now, I will read more about your answers and test your solutions.
Thank you again guys.
First, forget about synchronized when running inside a Java EE container. It prevents the container from optimizing thread utilization, and it will not work in a clustered environment.
Second, it seems that your design is wrong. You should not update a private field of the bean via JMS; that is what causes the ConcurrentModificationException. You should probably modify your bean to retrieve the collection from a database, and your MDB to store the URLs into that database.
An easier solution for you is the following.
Retrieve the currently existing URLs and copy them to another collection, then iterate over the copy. When the global collection is updated via JMS, the update is not visible in the copied collection, so no exception is thrown:
while (true) {
    for (String url : copyUrls(_prioritaryUrls)) {
        // deal with url
    }
}

private List<String> copyUrls(List<String> urls) {
    return new ArrayList<String>(urls); // creates a copy of the source list
}

// ........

public void onMessage(Message message) {
    _prioritaryUrls.add(((TextMessage) message).getText());
}
