Multithreading help using ExecutorService in Java [duplicate]

This question already has an answer here: ExecutorService Future::get very slow (1 answer). Closed 5 years ago.
I am trying to search for a list of words and find their total count across multiple files.
My logic is to have a separate thread for each file get a count, and finally aggregate the counts from all of the threads.
Say I have 50 files of 1 MB each. Performance does not improve when I use multiple threads: I get almost the same total execution time whether FILE_THREAD_COUNT is 1 or 50.
Am I doing something wrong in using the executor service?
Here is my code.
public void searchText(List<File> filesInPath, Set<String> searchWords) {
    try {
        BlockingQueue<File> filesBlockingQueue = new ArrayBlockingQueue<>(filesInPath.size());
        filesBlockingQueue.addAll(filesInPath);
        ExecutorService executorService = Executors.newFixedThreadPool(FILE_THREAD_COUNT);
        int totalWordCount = 0;
        while (!filesBlockingQueue.isEmpty()) {
            Callable<Integer> task = () -> {
                int wordCount = 0;
                try {
                    File file = filesBlockingQueue.take();
                    try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
                        String currentLine;
                        while ((currentLine = bufferedReader.readLine()) != null) {
                            String[] words = currentLine.split("\\s+");
                            for (String word : words) {
                                for (String searchWord : searchWords) {
                                    if (word.contains(searchWord)) {
                                        wordCount++;
                                    }
                                }
                            }
                        }
                    } catch (Exception e) {
                        // Handle error
                    }
                } catch (Exception e) {
                    // Handle error
                }
                return wordCount;
            };
            totalWordCount += executorService.submit(task).get();
        }
        System.out.println("Final word count=" + totalWordCount);
        executorService.shutdown();
    } catch (Exception e) {
        // Handle error
    }
}

Yes, you're doing something wrong.
The problem is here:
executorService.submit(task).get()
Your code submits a task and then immediately waits for it to finish, which achieves nothing in parallel; the tasks run sequentially. And your BlockingQueue adds no value whatsoever.
The way to run tasks in parallel is to first submit all tasks, collect the Futures returned, then call get() on all of them. Like this:
List<Future<Integer>> futures = filesInPath.stream()
        .map(<create your Callable>)
        .map(executorService::submit)
        .collect(toList());
for (Future<Integer> future : futures)
    totalWordCount += future.get();
You can actually do it in one stream by going through the intermediate list (as above) and then immediately streaming that, but you would have to wrap the call to Future#get in some code to catch the checked exceptions; I leave that as an exercise for the reader.
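For reference, here is a minimal sketch of the whole method with that fix applied, assuming the same FILE_THREAD_COUNT constant; countMatches is just the per-file counting loop from the question, pulled out into a helper:

public void searchText(List<File> filesInPath, Set<String> searchWords) {
    ExecutorService executorService = Executors.newFixedThreadPool(FILE_THREAD_COUNT);
    try {
        // Submit every task first so they all run concurrently...
        List<Future<Integer>> futures = new ArrayList<>();
        for (File file : filesInPath) {
            futures.add(executorService.submit(() -> countMatches(file, searchWords)));
        }
        // ...then collect the results; get() blocks until each task is done.
        int totalWordCount = 0;
        for (Future<Integer> future : futures) {
            totalWordCount += future.get();
        }
        System.out.println("Final word count=" + totalWordCount);
    } catch (InterruptedException | ExecutionException e) {
        // Handle error
    } finally {
        executorService.shutdown();
    }
}

// Hypothetical helper: the same counting loop as in the question.
private int countMatches(File file, Set<String> searchWords) throws IOException {
    int wordCount = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String currentLine;
        while ((currentLine = reader.readLine()) != null) {
            for (String word : currentLine.split("\\s+")) {
                for (String searchWord : searchWords) {
                    if (word.contains(searchWord)) {
                        wordCount++;
                    }
                }
            }
        }
    }
    return wordCount;
}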

Related

How to get the execution results of ExecutorService without blocking the current code path?

I have a service which adds a bunch of requests to Callables and then prints the results of the executions. Currently the service request is blocked until I print all the Future results from the execution. However, I want to return 200 to the requestor and run these requests in parallel without blocking the request. How can I achieve this? Below is the code I use to run the tasks in parallel.
public void runParallelFunctions(Callable<Map<String, String>> invokerTask) {
    List<Callable<Map<String, String>>> myTasks = new ArrayList<>();
    for (int i = 0; i < invocationCount; i++) {
        myTasks.add(invokerTask);
    }
    List<Future<Map<String, String>>> results = null;
    try {
        results = executorService.invokeAll(myTasks);
    } catch (InterruptedException e) {
    }
    this.printResultsFromParallelInvocations(results);
}
Below is how I print the results from the Futures.
private void printResultsFromParallelInvocations(List<Future<Map<String, String>>> results) {
    results.forEach(executionResults -> {
        try {
            executionResults.get().entrySet().forEach(entry -> {
                LOGGER.info(entry.getKey() + ": " + entry.getValue());
            });
        } catch (InterruptedException e) {
        } catch (ExecutionException e) {
        }
    });
}
Below is how I'm invoking the above methods when someone places a request to the service.
String documentToBeIndexed = GSON.toJson(indexDocument);
int documentId = indexMyDocument(documentToBeIndexed);
createAdditionalCandidatesForFuture(someInput);
return true;
In the above code, I call createAdditionalCandidatesForFuture and then return true, but the code still waits for the printResultsFromParallelInvocations method to complete. How can I make the code return after invoking createAdditionalCandidatesForFuture without waiting for the results to print? Do I have to print the results using another executor thread, or is there another way? Any help would be much appreciated.
The answer is CompletableFuture.
Updated runParallelFunctions:
public void runParallelFunctions(Callable<Map<String, String>> invokerTask) {
    // write a wrapper to handle the exception outside CompletableFuture
    Supplier<Map<String, String>> taskSupplier = () -> {
        try {
            // some task that takes a long time
            Thread.sleep(4000);
            return invokerTask.call();
        } catch (Exception e) {
            System.out.println(e);
        }
        // return a default value on error
        return new HashMap<>();
    };
    for (int i = 0; i < 5; i++) {
        CompletableFuture.supplyAsync(taskSupplier, executorService)
                .thenAccept(this::printResultsFromParallelInvocations);
    }
    // the main thread gets here immediately after running through the loop
    System.out.println("Doing other work....");
}
And, printResultsFromParallelInvocations may look like:
private void printResultsFromParallelInvocations(Map<String, String> result) {
    result.forEach((key, value) -> System.out.println(key + ": " + value));
}
Output:
Doing other work....
// 4 secs wait
key:value
Calling get on a Future will block the thread until the task is completed, so yes, you will have to move the printing of the results to another thread/executor service.
Another option is for each task to print its own results upon completion, provided it is supplied with the necessary tools to do so (access to the logger, etc.). Put another way, each task is divided into two consecutive steps: execution and printing.
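A minimal sketch of that second option, reusing the question's invokerTask, executorService, and LOGGER; because the task itself does the printing, the caller never blocks on a Future:

// Each task executes and then prints its own result (two consecutive steps).
Callable<Map<String, String>> loggingTask = () -> {
    Map<String, String> result = invokerTask.call();                 // step 1: execution
    result.forEach((key, value) -> LOGGER.info(key + ": " + value)); // step 2: printing
    return result;
};
executorService.submit(loggingTask); // fire and forget; the caller returns immediately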

How to parallelize java source codes for large data with Threadpool

I am trying to parallelize some source code with ExecutorService and LinkedList in Java.
In the following source code, some processing is done for each line of a text file.
When each line is relatively short, the code works well (all lines are read and all tasks are executed).
However, when lines are relatively long, the program stops reading without any errors.
Is there any capacity limit for LinkedList? Can I increase the capacity? Is there another proper way to parallelize this?
ExecutorService threadPool = Executors.newFixedThreadPool(4);
Collection<Callable<Void>> processes = new LinkedList<Callable<Void>>();
int c = 0;
while ((string = br.readLine()) != null) {
    final String str = string;
    processes.add(new Callable<Void>() {
        @Override
        public Void call() {
            // some process
            return null;
        }
    });
}
try {
    threadPool.invokeAll(processes);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
} finally {
    System.out.println("parallel finishes");
    threadPool.shutdown();
}

Ordered write to the same file with ExecutorService

I'm trying to run tasks in an ExecutorService that need to write to a file in order, so if there are 33 tasks they need to write in that order...
I've tried to use a LinkedBlockingQueue and a ReentrantLock to guarantee the order, but as far as I understand, in fair mode the lock is granted to the youngest of the x threads the ExecutorService has created.
private final static Integer cores = Runtime.getRuntime().availableProcessors();
private final ReentrantLock lock = new ReentrantLock(false);
private final ExecutorService taskExecutor;
In the constructor:
taskExecutor = new ThreadPoolExecutor(
        cores, cores, 1, TimeUnit.MINUTES, new LinkedBlockingQueue());
and then I process a quota of the input file per task:
if (s.isConverting()) {
    if (fileLineNumber % quote > 0) tasks = (fileLineNumber / quote) + 1;
    else tasks = (fileLineNumber / quote);
    for (int i = 0; i < tasks || i < 1; i++) {
        taskExecutor.execute(new ConversorProcessor(lock, this, i));
    }
}
The task does:
public void run() {
    getFileQuote();
    resetAccumulators();
    process();
    writeResult();
}
and my problem occurs here:
private void writeResult() {
    lock.lock();
    try {
        BufferedWriter bw = new BufferedWriter(new FileWriter("/tmp/conversion.txt", true));
        Integer index = -1;
        if (i == 0) {
            bw.write("ano dia tmin tmax tmed umid vento_vel rad prec\n");
        }
        while (index++ < getResult().size() - 1) {
            bw.write(getResult().get(index) + "\n");
        }
        if (i == controller.getTasksNumber()) {
            bw.write(getResult().get(getResult().size() - 1));
        } else {
            bw.write(getResult().get(getResult().size() - 1) + "\n");
        }
        bw.close();
    } catch (IOException ex) {
        Logger.getLogger(ConversorProcessor.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        lock.unlock();
    }
}
It appears to me that everything needs to be done concurrently except the writing of the output to file, and this must be done in the object creation order.
I would take the code that writes to the file, the writeResult() method, out of your threading code. Instead, create Futures that return the Strings produced by the process() method, and load those Futures into an ArrayList<Future<String>>. You can then iterate through the list in a for loop, calling get() on each Future and writing the result to your text file with your BufferedWriter or PrintWriter.
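A sketch of that idea, assuming process() can be refactored into a hypothetical processChunk(int) that returns its chunk's output as a String:

ExecutorService taskExecutor = Executors.newFixedThreadPool(cores);
List<Future<String>> futures = new ArrayList<>();
// Submit in creation order; the list preserves that order even though
// the tasks themselves may finish in any order.
for (int i = 0; i < tasks; i++) {
    final int chunk = i;
    futures.add(taskExecutor.submit(() -> processChunk(chunk)));
}
try (BufferedWriter bw = new BufferedWriter(new FileWriter("/tmp/conversion.txt"))) {
    bw.write("ano dia tmin tmax tmed umid vento_vel rad prec\n");
    for (Future<String> future : futures) {
        bw.write(future.get()); // blocks until that chunk is done, so writes stay in order
    }
} catch (InterruptedException | ExecutionException | IOException e) {
    e.printStackTrace();
} finally {
    taskExecutor.shutdown();
}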

Read the 30Million user id's one by one from the big file

I am trying to read a very big file using Java. That big file will have data like this, meaning each line will have a user ID:
149905320
1165665384
66969324
886633368
1145241312
286585320
1008665352
And in that big file there will be around 30 million user IDs. Now I am trying to read all the user IDs one by one from that big file, such that each user ID is selected only once. For example, if I have 30 million user IDs, then the multithreaded code should print each of the 30 million IDs exactly once.
Below is the code I have, a multithreaded program running with 10 threads, but with it I am not able to make sure that each user ID is selected only once.
public class ReadingFile {
    public static void main(String[] args) {
        // create thread pool with given size
        ExecutorService service = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            service.submit(new FileTask());
        }
    }
}

class FileTask implements Runnable {
    @Override
    public void run() {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader("D:/abc.txt"));
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
                // do things with line
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Can anybody help me with this? What am I doing wrong? And what is the fastest way to do this?
You really can't improve on having one thread reading the file sequentially, assuming that you haven't done anything like stripe the file across multiple disks. With one thread, you do one seek and then one long sequential read; with multiple threads you're going to have the threads causing multiple seeks as each gains control of the disk head.
Edit: This is a way to parallelize the line processing while still using serial I/O to read the lines. It uses a BlockingQueue to communicate between threads; the FileTask adds lines to the queue, and the CPUTasks read and process them. This is a thread-safe data structure, so there is no need to add any synchronization around it. The FileTask uses put(E e) to add strings to the queue, so if the queue is full (it can hold up to 200 strings, as defined in the declaration in ReadingFile) the FileTask blocks until space frees up; likewise the CPUTasks use take() to remove items, blocking until an item is available.
import java.io.*;
import java.util.concurrent.*;

public class ReadingFile {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        final int threadCount = 10;
        // BlockingQueue with a capacity of 200
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(200);
        // create thread pool with given size
        ExecutorService service = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < (threadCount - 1); i++) {
            service.submit(new CPUTask(queue));
        }
        // wait until the FileTask completes
        service.submit(new FileTask(queue)).get();
        service.shutdownNow(); // interrupt CPUTasks
        // wait until the CPUTasks terminate
        service.awaitTermination(365, TimeUnit.DAYS);
    }
}
class FileTask implements Runnable {
    private final BlockingQueue<String> queue;

    public FileTask(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader("D:/abc.txt"));
            String line;
            while ((line = br.readLine()) != null) {
                // block if the queue is full
                queue.put(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            try {
                if (br != null) br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
class CPUTask implements Runnable {
    private final BlockingQueue<String> queue;

    public CPUTask(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        String line;
        while (true) {
            try {
                // block if the queue is empty
                line = queue.take();
                // do things with line
            } catch (InterruptedException ex) {
                break; // the FileTask has completed
            }
        }
        // poll() returns null if the queue is empty
        while ((line = queue.poll()) != null) {
            // do things with line
        }
    }
}
We are talking about a roughly 315 MB file with lines separated by newlines. I presume this easily fits into memory. It is implied that there is no particular order in the user IDs that has to be preserved. So I would recommend the following algorithm:
Get the file length.
Copy each tenth of the file into a byte buffer (a binary copy should be fast).
Start a thread to process each of these buffers.
Each thread processes all lines in its area except the first and the last one.
Each thread must return the first and last partial line in its data when done;
the “last” of each thread must be recombined with the “first” of the thread working on the next file block, because you may have cut through a line. These recombined tokens must then be processed afterwards.
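A sketch of that partitioning, assuming the IDs are plain ASCII lines; this variant sidesteps the recombination step by having each worker skip the partial line at its start and read through to the end of its last line instead:

// Split the file into nThreads byte ranges; each worker aligns itself to
// line boundaries so no line is lost or processed twice.
long length = new File("D:/abc.txt").length();
int nThreads = 10;
ExecutorService pool = Executors.newFixedThreadPool(nThreads);
for (int i = 0; i < nThreads; i++) {
    final long start = i * length / nThreads;
    final long end = (i + 1) * length / nThreads;
    pool.submit(() -> {
        try (RandomAccessFile raf = new RandomAccessFile("D:/abc.txt", "r")) {
            raf.seek(start);
            if (start != 0) {
                raf.readLine(); // skip the partial line; the previous worker reads it in full
            }
            String line;
            while (raf.getFilePointer() <= end && (line = raf.readLine()) != null) {
                // process the user ID in 'line'
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}
pool.shutdown();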
The Fork/Join API introduced in Java 7 is a great fit for this use case. Check out http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html. If you search, you will find lots of examples out there.
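For a flavor of it, a minimal RecursiveTask sketch, assuming the IDs have already been read into a List and that the leaf body is where your real per-ID work would go:

// Splits the list of IDs until chunks are small enough to process directly,
// then combines the partial results on the way back up.
class CountTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 100_000;
    private final List<String> ids;

    CountTask(List<String> ids) {
        this.ids = ids;
    }

    @Override
    protected Long compute() {
        if (ids.size() <= THRESHOLD) {
            return (long) ids.size(); // placeholder: process each ID here
        }
        int mid = ids.size() / 2;
        CountTask left = new CountTask(ids.subList(0, mid));
        CountTask right = new CountTask(ids.subList(mid, ids.size()));
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then combine
    }
}
// Usage: long total = new ForkJoinPool().invoke(new CountTask(allIds));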

Processing log files, distribute work among worker threads, to find a simple sum

I want to distribute work among threads: load parts of a log file and then hand out the work of processing those parts.
In my simple example, I wrote 800,000 lines of data with a number in each line, and then I sum the numbers.
When I run this example, I get totals that are slightly off. Do you see anywhere in this threading code where threads might not complete properly and hence fail to total the numbers?
public void process() {
    final String d = FILE;
    FileInputStream stream = null;
    try {
        stream = new FileInputStream(d);
        final BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        String data = "";
        do {
            final Stack<List<String>> allWork = new Stack<List<String>>();
            final Stack<ParserWorkerAtLineThread> threadPool = new Stack<ParserWorkerAtLineThread>();
            do {
                if (data != null) {
                    final List<String> currentWorkToDo = new ArrayList<String>();
                    do {
                        data = reader.readLine();
                        if (data != null) {
                            currentWorkToDo.add(data);
                        } // End of the if //
                    } while (data != null && (currentWorkToDo.size() < thresholdLinesToAdd));
                    // Hand out future work
                    allWork.push(currentWorkToDo);
                } // End of the if //
            } while (data != null && (allWork.size() < numberOfThreadsAllowedInPool));
            // Process the lines from the work to do //
            // Hand out the work
            for (final List<String> theCurrentTaskWork : allWork) {
                final ParserWorkerAtLineThread t = new ParserWorkerAtLineThread();
                t.data = theCurrentTaskWork;
                threadPool.push(t);
            }
            for (final Thread workerAboutToDoWork : threadPool) {
                workerAboutToDoWork.start();
                System.out.println(" -> Starting my work... My name is : " + workerAboutToDoWork.getName());
            } // End of the for //
            // Waiting on threads to finish //
            System.out.println("Waiting for all work to complete ... ");
            for (final Thread waiting : threadPool) {
                waiting.join();
            } // End of the for //
            System.out.println("Done waiting ... ");
        } while (data != null); // End of outer parse file loop //
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (stream != null) {
            try {
                stream.close();
            } catch (final IOException e) {
                e.printStackTrace();
            }
        } // End of the stream //
    } // End of the try - catch finally //
}
While you're at it, why not use a bounded BlockingQueue (an ArrayBlockingQueue) of size thresholdLinesToAdd? This would be your producer side: where you read the lines, call the put method on that queue to block until space is available.
As Chris mentioned before, use Executors.newFixedThreadPool() and submit your work items to it. Your consumers would call take() to block until an element is available. A sketch combining these pieces is below.
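A sketch of that producer/consumer setup for this question's simple sum, reusing FILE, thresholdLinesToAdd, and numberOfThreadsAllowedInPool from the question, and assuming the enclosing method declares throws Exception:

BlockingQueue<String> queue = new ArrayBlockingQueue<>(thresholdLinesToAdd);
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreadsAllowedInPool);
AtomicLong total = new AtomicLong();
final String POISON = "POISON"; // sentinel telling a consumer to stop

// Consumers: block on take() until a line (or the sentinel) is available.
for (int i = 0; i < numberOfThreadsAllowedInPool; i++) {
    pool.submit(() -> {
        try {
            String line;
            while (!(line = queue.take()).equals(POISON)) {
                total.addAndGet(Long.parseLong(line.trim())); // the per-line work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
}

// Producer: put() blocks whenever the queue is full.
try (BufferedReader reader = new BufferedReader(new FileReader(FILE))) {
    String line;
    while ((line = reader.readLine()) != null) {
        queue.put(line);
    }
}
for (int i = 0; i < numberOfThreadsAllowedInPool; i++) {
    queue.put(POISON); // one sentinel per consumer
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);
System.out.println("total = " + total.get());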
This is not a map/reduce. If you wanted a map/reduce, you would need another queue in the mix, to which you would publish keys. As an example, if you were to count the number of INFO and DEBUG occurrences in your logs, your mapper would queue the extracted word every time it encounters one. The reducer would dequeue the mapper's output and increment the counter for each word. The result of your reducer would be the word counts for DEBUG and INFO.
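A sketch of that mapper/reducer arrangement (run the reducer after the mappers have finished, or add a sentinel as above; lines stands for one mapper's share of the log):

BlockingQueue<String> keys = new LinkedBlockingQueue<>();
Map<String, Long> counts = new ConcurrentHashMap<>();

// Mapper: publishes a key to the queue for every occurrence it finds.
Runnable mapper = () -> {
    for (String line : lines) {
        if (line.contains("INFO")) keys.add("INFO");
        if (line.contains("DEBUG")) keys.add("DEBUG");
    }
};

// Reducer: drains the mapper output and increments the counter per key.
Runnable reducer = () -> {
    String key;
    while ((key = keys.poll()) != null) {
        counts.merge(key, 1L, Long::sum);
    }
};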
