Ordered write to the same file with ExecutorService - java

I'm trying to create tasks in an ExecutorService that need to write to a file in order, so if there are 33 tasks, they need to write in task order.
I've tried to use a LinkedBlockingQueue and a ReentrantLock to guarantee the order, but from what I understand, in fair mode it unlocks for the youngest of the x threads the ExecutorService has created.
private final static Integer cores = Runtime.getRuntime().availableProcessors();
private final ReentrantLock lock = new ReentrantLock(false);
private final ExecutorService taskExecutor;
In constructor
taskExecutor = new ThreadPoolExecutor
(cores, cores, 1, TimeUnit.MINUTES, new LinkedBlockingQueue());
and so I process a quota of the input file per task
if(s.isConverting()){
if(fileLineNumber%quote > 0) tasks = (fileLineNumber/quote)+1;
else tasks = (fileLineNumber/quote);
for(int i = 0 ; i<tasks || i<1 ; i++){
taskExecutor.execute(new ConversorProcessor(lock,this,i));
}
}
The task does:
public void run() {
getFileQuote();
resetAccumulators();
process();
writeResult();
}
and my problem occurs here:
private void writeResult() {
lock.lock();
try {
BufferedWriter bw = new BufferedWriter(new FileWriter("/tmp/conversion.txt",true));
Integer index = -1;
if(i == 0){
bw.write("ano dia tmin tmax tmed umid vento_vel rad prec\n");
}
while(index++ < getResult().size()-1){
bw.write(getResult().get(index) + "\n");
}
if(i == controller.getTasksNumber()){
bw.write(getResult().get(getResult().size()-1));
}
else{
bw.write(getResult().get(getResult().size()-1) + "\n");
}
bw.close();
} catch (IOException ex) {
Logger.getLogger(ConversorProcessor.class.getName()).log(Level.SEVERE, null, ex);
} finally {
lock.unlock();
}
}

It appears to me that everything needs to be done concurrently except the writing of the output to file, and this must be done in the object creation order.
I would take the code that writes to the file, the writeResult() method, out of your threading code. Instead, have each task return the String produced by the process() method as a Future<String>, and load the Futures into an ArrayList<Future<String>>. You could then iterate through the ArrayList in a for loop, calling get() on each Future and writing the result to your text file with your BufferedWriter or PrintWriter.
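A minimal sketch of that idea follows. The class name OrderedWriteSketch, the method writeInOrder and the stand-in processChunk (which replaces the real process() logic) are hypothetical and not part of the original code; the pool size mirrors the question's use of availableProcessors().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedWriteSketch {

    // Hypothetical stand-in for process(): builds the text for one quota of the input.
    static String processChunk(int index) {
        return "result of task " + index;
    }

    // Runs the tasks concurrently, then writes their results in submission order.
    static void writeInOrder(int tasks, Writer out) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int index = i;
                Callable<String> task = () -> processChunk(index);
                futures.add(pool.submit(task));
            }
            // get() blocks until that particular task is done, so the output
            // follows list order no matter which task finishes first.
            for (Future<String> future : futures) {
                out.write(future.get());
                out.write("\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("conversion", ".txt");
        try (Writer out = Files.newBufferedWriter(file)) {
            writeInOrder(33, out);
        }
        System.out.println("Wrote " + Files.readAllLines(file).size() + " lines to " + file);
    }
}
```

Only the main thread touches the writer here, so no lock is needed around the file access.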

Related

Create x amount of threads, and then wait for them to finish

I'm writing a console application to read json files and then do some processing with them. I have 200k json files to process, so I'm creating a thread per file. But I would like to have only 30 active threads running. I don't know how to control it in Java.
This is the piece of code I have so far:
for (String jsonFile : result) {
final String jsonFilePath = jsonFile;
Thread thread = new Thread(new Runnable() {
String filePath = jsonFilePath;
@Override
public void run() {
// Do stuff here
}
});
thread.start();
}
result is an array with the paths of the 200k files. From this point, I'm not sure how to control it. I thought about keeping a List<Thread> and having each thread implement a notifier, so that when a thread finishes it is removed from the list. But then I would have to make the main thread sleep and then wake up, which feels weird.
How can I achieve this?
I would suggest not creating one thread per file. Threads are a limited resource; creating too many can lead to starvation or even cause the program to abort.
From the information provided, I would use a ThreadPoolExecutor. Constructing such an executor with a limited number of threads is quite simple thanks to Executors::newFixedThreadPool:
ExecutorService service = Executors.newFixedThreadPool(30);
Looking at the ExecutorService interface, the method <T> Future<T> submit(Callable<T> task) might be fitting.
For this, some changes will be necessary. The tasks (i.e. what is currently a Runnable in the given implementation) must be converted to a Callable<T>, where T is replaced by the return type. The returned Future<T> instances should then be collected into a list and waited on. When all Futures have completed, the result list can be constructed, e.g. through streaming.
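The steps above could look roughly like this. It is only a sketch: BoundedPoolSketch, processAll and processFile are hypothetical names, and processFile stands in for whatever the real per-file JSON processing returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolSketch {

    // Hypothetical per-file work; here it only reports the length of the path.
    static Integer processFile(String path) {
        return path.length();
    }

    // Processes every path with at most `threads` tasks running at the same time.
    static List<Integer> processAll(List<String> paths, int threads) {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<Integer> task = () -> processFile(path);
                futures.add(service.submit(task)); // queued until a pool thread is free
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // waits for that task to complete
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a.json", "bb.json"), 30));
    }
}
```

The main thread simply blocks in get(), so there is no need for manual sleeping or notification.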
With parallel streams and a ForkJoinPool you may get more straightforward code, plus an easy way to collect the results of your files after processing. For parallel processing, I use threads directly only as a last resort, when parallelStream can't be used.
boolean doStuff( String file){
// do your magic here
System.out.println( "The file " + file + " has been processed." );
// return the status of the processed file
return true;
}
List<String> jsonFiles = new ArrayList<String>();
jsonFiles.add("file1");
jsonFiles.add("file2");
jsonFiles.add("file3");
...
jsonFiles.add("file200000");
ForkJoinPool forkJoinPool = null;
try {
final int parallelism = 30;
forkJoinPool = new ForkJoinPool(parallelism);
forkJoinPool.submit(() ->
jsonFiles.parallelStream()
.map( jsonFile -> doStuff( jsonFile) )
.collect(Collectors.toList()) // you can collect this to a List<Boolean> results
).get();
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
Put your jobs (filenames) into a queue, start 30 threads to process them, then wait until all threads are done. For example:
static ConcurrentLinkedDeque<String> jobQueue = new ConcurrentLinkedDeque<String>();
private static class Worker implements Runnable {
int threadNumber;
public Worker(int threadNumber) {
this.threadNumber = threadNumber;
}
public void run() {
try {
System.out.println("Thread " + threadNumber + " started");
while (true) {
// get the next filename from job queue
String fileName;
try {
fileName = jobQueue.pop();
} catch (NoSuchElementException e) {
// The queue is empty, exit the loop
break;
}
System.out.println("Thread " + threadNumber + " processing file " + fileName);
Thread.sleep(1000); // do something useful here
System.out.println("Thread " + threadNumber + " finished file " + fileName);
}
System.out.println("Thread " + threadNumber + " finished");
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws InterruptedException {
// Create dummy filenames for testing:
for (int i = 1; i <= 200; i++) {
jobQueue.push("Testfile" + i + ".json");
}
System.out.println("Starting threads");
// Create 30 worker threads
List<Thread> workerThreads = new ArrayList<Thread>();
for (int i = 1; i <= 30; i++) {
Thread thread = new Thread(new Worker(i));
workerThreads.add(thread);
thread.start();
}
// Wait until the threads are all finished
for (Thread thread : workerThreads) {
thread.join();
}
System.out.println("Finished");
}
}

Multithreading help using ExecutorService in Java [duplicate]

I am trying to search a list of words and find the total count of all the words across multiple files.
My logic is to have separate threads for each file and get the count. Finally I can aggregate the total count got from each of the threads.
Say, I have 50 files each of 1MB. The performance does not improve when I am using multiple threads. My total execution time does not improve with FILE_THREAD_COUNT. I am getting almost the same execution time when my thread count is either 1 or 50.
Am I doing something wrong in using the executor service?
Here is my code.
public void searchText(List<File> filesInPath, Set<String> searchWords) {
try {
BlockingQueue<File> filesBlockingQueue = new ArrayBlockingQueue<>(filesInPath.size());
filesBlockingQueue.addAll(filesInPath);
ExecutorService executorService = Executors.newFixedThreadPool(FILE_THREAD_COUNT);
int totalWordCount = 0;
while (!filesBlockingQueue.isEmpty()) {
Callable<Integer> task = () -> {
int wordCount = 0;
try {
File file = filesBlockingQueue.take();
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
String currentLine;
while ((currentLine = bufferedReader.readLine()) != null) {
String[] words = currentLine.split("\\s+");
for (String word : words) {
for (String searchWord : searchWords) {
if (word.contains(searchWord)) {
wordCount++;
}
}
}
}
} catch (Exception e) {
// Handle error
}
} catch (Exception e) {
// Handle error
}
return wordCount;
};
totalWordCount += executorService.submit(task).get();
}
System.out.println("Final word count=" + totalWordCount);
executorService.shutdown();
} catch (Exception e) {
// Handle error
}
}
Yes, you're doing something wrong.
The problem is here:
executorService.submit(task).get()
Your code submits a task then waits for it to finish, which achieves nothing in parallel; the tasks run sequentially. And your BlockingQueue adds no value whatsoever.
The way to run tasks in parallel is to first submit all tasks, collect the Futures returned, then call get() on all of them. Like this:
List<Future<Integer>> futures = filesInPath.stream()
.map(<create your Callable>)
.map(executorService::submit)
.collect(toList());
for (Future<Integer> future : futures) {
totalWordCount += future.get();
}
You can actually do it in one stream, by going through the intermediate list (as above) but then immediately streaming that, but you have to wrap the call to Future#get in some code to catch the checked exception - I leave that as an exercise for the reader.
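One possible way to do that wrapping is sketched below. WordCountSketch, countMatches, join and totalCount are hypothetical names, and in-memory strings stand in for the files so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class WordCountSketch {

    // Same matching rule as the question: a token counts once per search word it contains.
    static int countMatches(String text, Set<String> searchWords) {
        int count = 0;
        for (String word : text.split("\\s+")) {
            for (String searchWord : searchWords) {
                if (word.contains(searchWord)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Wraps Future#get so that it can be called from inside a stream.
    static int join(Future<Integer> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static int totalCount(List<String> texts, Set<String> searchWords, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // collect() forces every task to be submitted before the first get().
            List<Future<Integer>> futures = texts.stream()
                    .map(text -> {
                        Callable<Integer> task = () -> countMatches(text, searchWords);
                        return executor.submit(task);
                    })
                    .collect(Collectors.toList());
            return futures.stream().mapToInt(WordCountSketch::join).sum();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("foo bar foo", "bar baz");
        System.out.println(totalCount(texts, Collections.singleton("foo"), 4));
    }
}
```

The intermediate list matters: streams are lazy, so mapping straight from submit to get in a single sequential pipeline would wait for each task before submitting the next one.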

How to write in a file with threads?

How do I write to a file with threads? Each file should have 100 lines, and each line should be 100 characters long. The work must be done using threads and I/O.
My code:
public class CustomThread extends Thread{
private Thread t;
private String threadName;
CustomThread(String threadName){
this.threadName = threadName;
}
public void run () {
if (t == null)
{
t = new Thread (this);
}
add(threadName);
}
public synchronized void add(String threadName){
File f = new File(threadName + ".txt");
if (!f.exists()) {
try {
f.createNewFile();
} catch (IOException e) {
e.printStackTrace();
System.out.println("File does not exists!");
}
}
FileWriter fw = null;
try {
fw = new FileWriter(f);
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++) {
fw.write(threadName);
fw.write('\n');
}
}
} catch (IOException e) {
e.printStackTrace();
System.out.println("File does not exists!");
}
}
}
Is my code correct? I need to create a file with 100 lines of 100 characters each. The character must depend on the file name: if I create a file named 1, it must be filled with the character 1. Thanks.
Your code looks correct for your requirement, which is writing 100 lines of 100 characters each. The assumption is that the name of the thread is a single character, because you are writing threadName to the file. I have a few closing suggestions to complete your implementation. Then test it yourself; if you find any issue, do comment.
To make each line 100 characters long, move the statement that writes the newline character to the outer loop.
Once you finish writing all the data to the file, flush() and close() it so that it is saved.
You are creating the file from threadName. You might want to prepend the path of the directory where the file should be created.
You are missing a main() method. Create an object of the class and start() the thread.
You don't need to create a separate Thread instance. The run() method will be executed in a separate thread because you are extending the Thread class.
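Putting these suggestions together could look like the sketch below. The class name FileFillerThread, the helper repeat and the use of a temporary directory are assumptions made for illustration, not part of the original code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileFillerThread extends Thread {
    private final Path directory; // where the file is created
    private final char fill;      // character used for both the file name and the content

    FileFillerThread(Path directory, char fill) {
        super("filler-" + fill);
        this.directory = directory;
        this.fill = fill;
    }

    // Builds a string consisting of `fill` repeated `length` times.
    static String repeat(char fill, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int j = 0; j < length; j++) {
            sb.append(fill);
        }
        return sb.toString();
    }

    @Override
    public void run() {
        Path file = directory.resolve(fill + ".txt");
        // try-with-resources flushes and closes the writer when the block ends.
        try (BufferedWriter writer = Files.newBufferedWriter(file)) {
            String line = repeat(fill, 100);
            for (int i = 0; i < 100; i++) {
                writer.write(line);
                writer.write('\n'); // the newline is written once per line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("filler");
        Thread[] threads = { new FileFillerThread(dir, '1'), new FileFillerThread(dir, '2') };
        for (Thread t : threads) {
            t.start(); // start(), not run(), so each file is written on its own thread
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("Files written to " + dir);
    }
}
```

Each thread writes to its own file, so the synchronized keyword from the original add() method is not needed here.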

Running threads in round robin fashion in java

I am new to multithreading and synchronization in Java. I am trying to achieve a task in which I am given 5 files, and each file is read by one particular thread. Every thread should read one line from its file, then hand execution to the next thread, and so on. When all 5 threads have read their first line, start again from thread 1 with line 2 of file 1, and so on.
Thread ReadThread1 = new Thread(new ReadFile(0));
Thread ReadThread2 = new Thread(new ReadFile(1));
Thread ReadThread3 = new Thread(new ReadFile(2));
Thread ReadThread4 = new Thread(new ReadFile(3));
Thread ReadThread5 = new Thread(new ReadFile(4));
// starting all the threads
ReadThread1.start();
ReadThread2.start();
ReadThread3.start();
ReadThread4.start();
ReadThread5.start();
In ReadFile (which implements Runnable), in the run method, I am trying to synchronize on the BufferedReader object.
BufferedReader br = null;
String sCurrentLine;
String filename="Source/"+files[fileno];
br = new BufferedReader(new FileReader(filename));
synchronized(br)
{
while ((sCurrentLine = br.readLine()) != null) {
int f=fileno+1;
System.out.print("File No."+f);
System.out.println("-->"+sCurrentLine);
br.notifyAll();
// something needs to be done here, I guess
}}
Need Help
This is not an ideal scenario for multithreading, but as this is an assignment, here is one solution that works. The threads execute sequentially, and there are a few points to note:
The current thread cannot move ahead to read its next line until its immediately preceding thread is done, as they are supposed to read in round-robin fashion.
After the current thread is done reading its line, it must notify the next thread, or that thread will wait forever.
I have tested this code with some files in the temp package, and it read the lines in round-robin fashion. I believe a Phaser can also be used to solve this problem.
public class FileReaderRoundRobinNew {
public Object[] locks;
private static class LinePrinterJob implements Runnable {
private final Object currentLock;
private final Object nextLock;
BufferedReader bufferedReader = null;
public LinePrinterJob(String fileToRead, Object currentLock, Object nextLock) {
this.currentLock = currentLock;
this.nextLock = nextLock;
try {
this.bufferedReader = new BufferedReader(new FileReader(fileToRead));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
@Override
public void run() {
/*
* Few points to be noted:
* 1. Current thread cannot move ahead to read the line in the file until and unless its immediately previous thread is done as they are supposed to read in round-robin fashion.
* 2. After current thread is done reading the line it must notify the other thread else that thread will wait forever.
* */
String currentLine;
synchronized(currentLock) {
try {
while ( (currentLine = bufferedReader.readLine()) != null) {
try {
currentLock.wait();
System.out.println(currentLine);
}
catch(InterruptedException e) {}
synchronized(nextLock) {
nextLock.notify();
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
synchronized(nextLock) {
nextLock.notify(); /// Ensures all threads exit at the end
}
}
}
public FileReaderRoundRobinNew(int numberOfFilesToRead) {
locks = new Object[numberOfFilesToRead];
int i;
String fileLocation = "src/temp/";
//Initialize lock instances in array.
for(i = 0; i < numberOfFilesToRead; ++i) locks[i] = new Object();
//Create threads
int j;
for(j=0; j<(numberOfFilesToRead-1); j++ ){
Thread linePrinterThread = new Thread(new LinePrinterJob(fileLocation + "Temp" + j,locks[j],locks[j+1]));
linePrinterThread.start();
}
Thread lastLinePrinterThread = new Thread(new LinePrinterJob(fileLocation + "Temp" + j,locks[numberOfFilesToRead-1],locks[0]));
lastLinePrinterThread.start();
}
public void startPrinting() {
synchronized (locks[0]) {
locks[0].notify();
}
}
public static void main(String[] args) {
FileReaderRoundRobinNew fileReaderRoundRobin = new FileReaderRoundRobinNew(4);
fileReaderRoundRobin.startPrinting();
}
}
If the only objective is to read the files in round-robin fashion, and not strictly in the same order, then we can also use a Phaser. In this case the order in which the files are read is not always the same: for example, with four files (F1, F2, F3 and F4), the first phase may read them as F1-F2-F3-F4 but the next one as F2-F1-F4-F3. I am still including this solution for the sake of completeness.
public class FileReaderRoundRobinUsingPhaser {
final List<Runnable> tasks = new ArrayList<>();
final int numberOfLinesToRead;
private static class LinePrinterJob implements Runnable {
private BufferedReader bufferedReader;
public LinePrinterJob(BufferedReader bufferedReader) {
this.bufferedReader = bufferedReader;
}
@Override
public void run() {
String currentLine;
try {
currentLine = bufferedReader.readLine();
System.out.println(currentLine);
} catch (IOException e) {
e.printStackTrace();
}
}
}
public FileReaderRoundRobinUsingPhaser(int numberOfFilesToRead, int numberOfLinesToRead) {
this.numberOfLinesToRead = numberOfLinesToRead;
String fileLocation = "src/temp/";
// No extra task is created outside this loop, so iterate over every file.
for(int j=0; j<numberOfFilesToRead; j++ ){
try {
tasks.add(new LinePrinterJob(new BufferedReader(new FileReader(fileLocation + "Temp" + j))));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
public void startPrinting( ) {
final Phaser phaser = new Phaser(1){
@Override
protected boolean onAdvance(int phase, int registeredParties) {
System.out.println("Phase Number: " + phase +" Registered parties: " + getRegisteredParties() + " Arrived: " + getArrivedParties());
return ( phase >= numberOfLinesToRead || registeredParties == 0);
}
};
for(Runnable task : tasks) {
phaser.register();
new Thread(() -> {
do {
phaser.arriveAndAwaitAdvance();
task.run();
} while(!phaser.isTerminated());
}).start();
}
phaser.arriveAndDeregister();
}
public static void main(String[] args) {
FileReaderRoundRobinUsingPhaser fileReaderRoundRobin = new FileReaderRoundRobinUsingPhaser(4, 4);
fileReaderRoundRobin.startPrinting();
// Files will be accessed in round robin fashion but not exactly in same order always. For example it can read 4 files as 1234 then 1342 or 1243 etc.
}
}
The above example can be modified as per exact requirement. Here the constructor of FileReaderRoundRobinUsingPhaser takes the number of files and number of lines to read from each file. Also the boundary conditions need to be taken into consideration.
You are missing many parts of the puzzle:
you attempt to synchronize on an object local to each thread. This can have no effect and the JVM may even remove the whole locking operation;
you execute notifyAll without a matching wait;
the missing wait must be at the top of the run method, not at the bottom as you indicate.
Altogether, I'm afraid that fixing your code at this point is beyond the scope of one StackOverflow answer. My suggestion is to first familiarize yourself with the core concepts: the semantics of locks in Java, how they interoperate with wait and notify, and the precise semantics of those methods. An Oracle tutorial on the subject would be a nice start.
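To illustrate the three points listed above (a lock shared by all threads, a notify with a matching wait, and the wait placed before the work), here is a small sketch. It is not a fix of the question's code: RoundRobinSketch is a hypothetical class, in-memory lists stand in for the files, and it assumes every source has the same number of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundRobinSketch {
    private final Object lock = new Object();   // one monitor shared by all threads
    private final List<List<String>> sources;   // in-memory stand-ins for the files
    private final List<String> output = new ArrayList<>();
    private int turn = 0;                       // whose turn it is; guarded by lock

    RoundRobinSketch(List<List<String>> sources) {
        this.sources = sources;
    }

    // Assumption: all sources have the same number of lines; otherwise the
    // turn would stall on a thread that has already finished.
    private void readLines(int id) {
        for (String line : sources.get(id)) {
            synchronized (lock) {
                // The wait comes first, in a loop that re-checks the condition.
                while (turn != id) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                output.add("File No." + (id + 1) + "-->" + line);
                turn = (turn + 1) % sources.size();
                lock.notifyAll(); // the matching notification for the waiting threads
            }
        }
    }

    List<String> run() {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            final int id = i;
            Thread thread = new Thread(() -> readLines(id));
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a1", "a2"), Arrays.asList("b1", "b2"));
        new RoundRobinSketch(files).run().forEach(System.out::println);
    }
}
```

Because the turn counter is checked under the lock before waiting, a notification that arrives before a thread starts waiting is not lost.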

Read the 30Million user id's one by one from the big file

I am trying to read a very big file using Java. Each line of that file contains one user id, like this:
149905320
1165665384
66969324
886633368
1145241312
286585320
1008665352
That big file will contain around 30 million user ids. I am trying to read all the user ids one by one from that file, each exactly once. For example, if I have 30 million user ids, the multithreaded code should print each of the 30 million ids only once.
Below is my multithreaded code, running with 10 threads, but with this program I am not able to make sure that each user id is selected only once.
public class ReadingFile {
public static void main(String[] args) {
// create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
service.submit(new FileTask());
}
}
}
class FileTask implements Runnable {
@Override
public void run() {
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader("D:/abc.txt"));
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
//do things with line
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
Can anybody help me with this? What am I doing wrong? And what is the fastest way to do this?
You really can't improve on having one thread reading the file sequentially, assuming that you haven't done anything like stripe the file across multiple disks. With one thread, you do one seek and then one long sequential read; with multiple threads you're going to have the threads causing multiple seeks as each gains control of the disk head.
Edit: This is a way to parallelize the line processing while still using serial I/O to read the lines. It uses a BlockingQueue to communicate between threads; the FileTask adds lines to the queue, and the CPUTask reads them and processes them. This is a thread-safe data structure, so no need to add any synchronization to it. You're using put(E e) to add strings to the queue, so if the queue is full (it can hold up to 200 strings, as defined in the declaration in ReadingFile) the FileTask blocks until space frees up; likewise you're using take() to remove items from the queue, so the CPUTask will block until an item is available.
public class ReadingFile {
public static void main(String[] args) throws Exception { // get() and awaitTermination() throw checked exceptions
final int threadCount = 10;
// BlockingQueue with a capacity of 200
BlockingQueue<String> queue = new ArrayBlockingQueue<>(200);
// create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(threadCount);
for (int i = 0; i < (threadCount - 1); i++) {
service.submit(new CPUTask(queue));
}
// Wait til FileTask completes
service.submit(new FileTask(queue)).get();
service.shutdownNow(); // interrupt CPUTasks
// Wait til CPUTasks terminate
service.awaitTermination(365, TimeUnit.DAYS);
}
}
class FileTask implements Runnable {
private final BlockingQueue<String> queue;
public FileTask(BlockingQueue<String> queue) {
this.queue = queue;
}
@Override
public void run() {
// try-with-resources closes the reader, even when opening or reading fails
try (BufferedReader br = new BufferedReader(new FileReader("D:/abc.txt"))) {
String line;
while ((line = br.readLine()) != null) {
// block if the queue is full
queue.put(line);
}
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
// put() was interrupted; restore the flag and stop reading
Thread.currentThread().interrupt();
}
}
}
class CPUTask implements Runnable {
private final BlockingQueue<String> queue;
public CPUTask(BlockingQueue<String> queue) {
this.queue = queue;
}
@Override
public void run() {
String line;
while(true) {
try {
// block if the queue is empty
line = queue.take();
// do things with line
} catch (InterruptedException ex) {
break; // FileTask has completed
}
}
// poll() returns null if the queue is empty
while((line = queue.poll()) != null) {
// do things with line;
}
}
}
We are talking about a file of about 315 MB on average, with lines separated by newlines. I presume this easily fits into memory. It is implied that the user ids need not be kept in any particular order. So I would recommend the following algorithm:
Get the file length.
Copy each tenth of the file into a byte buffer (a binary copy should be fast).
Start a thread for processing each of these buffers.
Each thread processes all lines in its area except the first and the last one.
Each thread must return the first and last partial lines of its data when done.
The “last” fragment of each thread must be recombined with the “first” fragment of the thread working on the next file block, because the split may have cut through a line. These recombined tokens must then be processed afterwards.
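The sketch below shows a simplified variation of this algorithm rather than the exact steps above: instead of returning partial lines and recombining them, each chunk boundary is moved forward to the next newline so that no line is cut. ChunkSplitSketch, boundaries and sumIds are hypothetical names, and summing the ids is only a placeholder for the real per-line work.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkSplitSketch {

    // Returns chunks + 1 offsets; chunk i covers [bounds[i], bounds[i + 1]).
    // Every chunk start is 0, data.length, or a position right after a '\n'.
    static int[] boundaries(byte[] data, int chunks) {
        int[] bounds = new int[chunks + 1];
        bounds[chunks] = data.length;
        for (int i = 1; i < chunks; i++) {
            int nominal = (int) ((long) data.length * i / chunks);
            int pos = Math.max(bounds[i - 1], nominal);
            while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
                pos++; // slide forward until the chunk starts at a line start
            }
            bounds[i] = pos;
        }
        return bounds;
    }

    // Placeholder processing: parses each line as a number and sums them up.
    static long sumIds(byte[] data, int chunks) {
        int[] bounds = boundaries(data, chunks);
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                final int from = bounds[i];
                final int to = bounds[i + 1];
                Callable<Long> task = () -> {
                    long sum = 0;
                    String text = new String(data, from, to - from, StandardCharsets.US_ASCII);
                    for (String line : text.split("\n")) {
                        String id = line.trim();
                        if (!id.isEmpty()) {
                            sum += Long.parseLong(id);
                        }
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] data = "149905320\n1165665384\n66969324\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumIds(data, 3));
    }
}
```

Since every line ends up in exactly one chunk, each id is handled exactly once, whichever thread picks up its chunk.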
The Fork/Join API introduced in Java 7 is a great fit for this use case. Check out http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html. If you search, you are going to find lots of examples out there.
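As a rough illustration of how Fork/Join could be applied once the lines are in memory, here is a minimal RecursiveTask sketch. IdSumTask, the threshold of 1,000 lines and the choice to sum the ids are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class IdSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // hypothetical cut-off for splitting
    private final List<String> lines;
    private final int from; // inclusive
    private final int to;   // exclusive

    IdSumTask(List<String> lines, int from, int to) {
        this.lines = lines;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: process this range directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += Long.parseLong(lines.get(i));
            }
            return sum;
        }
        // Otherwise split the range in two and process both halves in parallel.
        int mid = (from + to) >>> 1;
        IdSumTask left = new IdSumTask(lines, from, mid);
        IdSumTask right = new IdSumTask(lines, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // do the right half on this thread
    }

    static long sum(List<String> lines) {
        return ForkJoinPool.commonPool().invoke(new IdSumTask(lines, 0, lines.size()));
    }

    public static void main(String[] args) {
        System.out.println(sum(Collections.nCopies(5000, "2")));
    }
}
```

Each index range is visited by exactly one leaf task, so no id is processed twice.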
