Reading multiple text files in different threads in Java - java

I am learning multithreading and want to read multiple text files simultaneously, each in its own thread, and collect the results in a single list. The text files contain the first and last names of employees.
I have written the following Employee class.
class Employee {
String first_name;
String last_name;
public Employee(String first_name, String last_name) {
super();
this.first_name = first_name;
this.last_name = last_name;
}
}
Class for reading files, with List to store the objects.
class FileReading {
List<Employee> employees = new ArrayList<Employee>();
public synchronized void readFile(String fileName) {
try {
FileReader fr = new FileReader(new File(fileName));
BufferedReader br = new BufferedReader(fr);
String line;
while ((line = br.readLine()) != null) {
String[] arr = line.split("\\s+");
employees.add(new Employee(arr[0], arr[1]));
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Class with main method and threads.
public class TestMultithreading {
public static void main(String[] args) {
final FileReading fr = new FileReading();
Thread t1 = new Thread() {
public synchronized void run() {
fr.readFile("file1.txt");
}
};
Thread t2 = new Thread() {
public synchronized void run() {
fr.readFile("file2.txt");
}
};
Thread t3 = new Thread() {
public synchronized void run() {
fr.readFile("file3.txt");
}
};
t1.start();
t2.start();
t3.start();
try {
t1.join();
t2.join();
t3.join();
} catch (InterruptedException e1) {
e1.printStackTrace();
}
System.out.println(fr.employees.size());
}
}
Does calling join() make the main thread wait for that thread to finish before moving on to the next one? If so, what is the point of multithreading?
Is there another way to make sure all threads run in parallel and to collect their results in main() after they have all finished?

All threads run in parallel; however, your readFile method is synchronized, so only one thread can enter it at any time (per object). This is a good choice in that it prevents concurrent updates to the ArrayList (which is not thread-safe), but it also means that while one thread is reading its file, the other two wait to enter readFile, so the files are effectively read one after another.
If you create three FileReading instances, one per thread, the reads will run in parallel, but each instance then has its own employees list, and you have to merge the three lists after the joins.
The join() method performs another kind of synchronization: it blocks the calling thread until the run() method of the other thread exits. Hence you are certain that after the three joins in your code the three threads have already finished.
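A minimal sketch of that idea, assuming each thread gets its own list instead of sharing one FileReading object: nothing is locked while reading, and the main thread merges the lists only after join() has returned for every thread. The class name, method name and temporary files are illustrative and not taken from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class PerThreadLists {

    // Reads every file on its own thread into its own list, then merges the lists.
    static List<String> readAll(List<Path> files) throws InterruptedException {
        List<List<String>> partial = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (Path file : files) {
            List<String> lines = new ArrayList<>(); // written by one thread only
            partial.add(lines);
            Thread t = new Thread(() -> {
                try {
                    lines.addAll(Files.readAllLines(file));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start(); // every thread is started before the first join()
        }
        for (Thread t : threads) {
            t.join(); // blocks only the main thread; the other readers keep working
        }
        List<String> merged = new ArrayList<>();
        for (List<String> lines : partial) {
            merged.addAll(lines); // safe: join() makes each thread's writes visible
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: two small temporary files standing in for file1.txt, file2.txt
        Path a = Files.createTempFile("emp", ".txt");
        Path b = Files.createTempFile("emp", ".txt");
        Files.write(a, List.of("Ada Lovelace", "Alan Turing"));
        Files.write(b, List.of("Grace Hopper"));
        System.out.println(readAll(List.of(a, b)).size());
    }
}
```

Because the lists are merged in the order the files were given, the result order is deterministic even though the reads overlap.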

The other answer explains the problem well; refer to it to understand what is going on. I will give you a recipe along with some explanation.
First, you don't need a FileReading class (bad abstraction BTW). Just Runnable instances (e.g. anonymous classes) which receive the filename to read data from and the destination list.
You pass these Runnables to Thread constructors and keep the Threads in some list or set, so you can call thread.start() on each of them (e.g. with set.forEach()) and then do the same with thread.join(). Nothing needs to be done within synchronized blocks or methods.
This way your main method will wait for all of those threads to finish, while still taking advantage of parallelism (there will be some waiting for slower files to finish but all the threads will still do their heavy work in parallel -- at least as far as the file system/storage allows it).
What you said about join() is true, but the threads are already working in parallel before the joins. A join only returns once that thread's task has finished. If the main thread is waiting in join() on a slow thread, the faster ones carry on and finish in the meantime, so their later joins return immediately. The total wait is therefore roughly the time of the slowest task, not the sum of all of them.
It's like baking a cake: you can do tasks in parallel for a while, but in the end everything has to be combined in a single container that goes into the oven.
Second, it is better to create a thread-safe List (see for instance Collections.synchronizedList(new ArrayList<>()) or a more modern alternative) so you can pass it to the Runnables and let them populate it concurrently while still preventing race conditions. This is where synchronized code is needed, and the created list already provides it internally.
Lastly, I don't think you should create one numbered Thread reference for each single file, thread1, thread2, etc. You should have a list of files and create the Threads on demand while traversing it, then storing the Threads in the mentioned set or list for referencing them all at once later as mentioned.
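The recipe above might be sketched as follows. This is only an illustration under some assumptions: the names SharedListReader and readEmployees are made up, each employee is kept as a String[] of name parts rather than an Employee object, and lines with fewer than two tokens are skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedListReader {

    // One thread per file; all threads append to the same synchronized list.
    static List<String[]> readEmployees(List<Path> files) throws InterruptedException {
        List<String[]> employees = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (Path file : files) {
            threads.add(new Thread(() -> {
                try (BufferedReader br = Files.newBufferedReader(file)) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] parts = line.trim().split("\\s+");
                        if (parts.length >= 2) {
                            employees.add(parts); // add() is synchronized inside the wrapper
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }));
        }
        threads.forEach(Thread::start); // start them all first ...
        for (Thread t : threads) {
            t.join();                   // ... then wait for each one
        }
        return employees;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input files created on the fly
        Path a = Files.createTempFile("emp", ".txt");
        Path b = Files.createTempFile("emp", ".txt");
        Files.write(a, List.of("Ada Lovelace", "Alan Turing"));
        Files.write(b, List.of("Grace Hopper"));
        System.out.println(readEmployees(List.of(a, b)).size());
    }
}
```

The order of the entries in the shared list depends on thread scheduling, so only the total count is predictable.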

Related

Java - How do I safely stop a thread in a Web App from the GUI?

Is there a way to safely and immediately stop the execution of a Thread in Java? Especially, if the logic inside the run() method of the Runnable implementation executes only a single iteration and does not regularly check for any flag that tells it to stop?
I am building a Web Application, using which a user can translate the contents of an entire document from one language to another.
Assuming the documents are extra-large, and subsequently assuming each translation is going to take a long time (say 20-25 minutes), my application creates a separate Thread for each translation that is initiated by its users. A user can see a list of active translations and decide to stop a particular translation job if he/she wishes so.
This is my Translator.java
public class Translator {
public void translate(File file, String sourceLanguage, String targetLanguage) {
//Translation happens here
//.......
//Translation ends and a new File is created.
}
}
I have created a TranslatorRunnable class which implements the Runnable interface as follows:
public class TranslatorRunnable implements Runnable {
private File document;
private String sourceLanguage;
private String targetLanguage;
public TranslatorRunnable(File document, String sourceLanguage, String targetLanguage) {
this.document = document;
this.sourceLanguage = sourceLanguage;
this.targetLanguage = targetLanguage;
}
public void run() {
// TODO Auto-generated method stub
Translator translator = new Translator();
translator.translate(this.document, this.sourceLanguage, this.targetLanguage);
System.out.println("Translator thread is finished.");
}
}
I'm creating the thread for translating a document from an outer class like this:
TranslatorRunnable tRunnable = new TranslatorRunnable(document, "ENGLISH", "FRENCH");
Thread t = new Thread(tRunnable);
t.start();
Now my problem is how do I stop a translation process (essentially a Thread) when the user clicks on "Stop" in the GUI?
I have read a few posts on StackOverflow as well as on other sites, which tell me to have a volatile boolean flag inside the Runnable implementation, which I should check on regularly from inside the run() method and decide when to stop. See this post
This doesn't work for me as the run() method is just calling the Translator.translate() method, which itself is going to take a long time. I have no option here.
The next thing I read about is using an ExecutorService and its shutdownNow() method. But even here, I'd have to handle InterruptedException regularly somewhere within my code. This is, again, not an option. I referred to the documentation of the ExecutorService class.
I know I cannot use Thread.stop() as it is deprecated and may cause issues with objects that are commonly used by all threads.
What options do I have?
Is my requirement really feasible without substantial changes to my design? If yes, please tell me how.
If it is absolutely necessary for me to change the design, could anyone tell me what is the best approach I can take?
Thanks,
Sriram
Is there a way to safely and immediately stop the execution of a Thread in Java?
No. Each thread is responsible for periodically checking whether it has been interrupted, so that it can exit as soon as possible:
if (Thread.currentThread().isInterrupted() ) {
// release resources. finish quickly what it was doing
}
If you want a more responsive application, you have to change the logic (for example, divide each job into smaller batches) so that each thread performs this check more often than every 20-25 minutes.
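As a rough sketch of that batching idea, assuming the translation can be split into independent chunks: the job below checks the interrupt flag before every batch and stops when the GUI side calls interrupt(). The class InterruptibleJob and the method translateBatch are hypothetical stand-ins, and the sleep only simulates work.

```java
import java.util.concurrent.atomic.AtomicInteger;

class InterruptibleJob implements Runnable {
    private final int totalBatches;
    private final AtomicInteger completed = new AtomicInteger();

    InterruptibleJob(int totalBatches) {
        this.totalBatches = totalBatches;
    }

    int completedBatches() {
        return completed.get();
    }

    @Override
    public void run() {
        for (int i = 0; i < totalBatches; i++) {
            if (Thread.currentThread().isInterrupted()) {
                return; // stop was requested between two batches
            }
            try {
                translateBatch(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag set for callers
                return; // stop was requested in the middle of a batch
            }
            completed.incrementAndGet();
        }
    }

    // Hypothetical stand-in for translating one chunk of the document.
    private void translateBatch(int index) throws InterruptedException {
        Thread.sleep(10);
    }

    public static void main(String[] args) throws InterruptedException {
        InterruptibleJob job = new InterruptibleJob(1_000);
        Thread worker = new Thread(job);
        worker.start();
        Thread.sleep(100);  // the user clicks "Stop" after a while
        worker.interrupt(); // request cancellation
        worker.join();
        System.out.println("Batches finished before stop: " + job.completedBatches());
    }
}
```

The smaller the batches, the sooner a stop request takes effect; the worker never has to be killed from outside.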
If you are the one who created the Translator class, what's stopping you from adding a flag that the method checks periodically and, when it is set, stops reading lines from the file? Something like this (needsToStop would have to be a volatile boolean field, static here, that the thread handling the Stop click sets to true):
public static List<String> readFile(String filename)
{
List<String> records = new ArrayList<>();
try
{
BufferedReader reader = new BufferedReader(new FileReader(filename));
String line;
while ((line = reader.readLine()) != null)
{
String[] split = line.split("\\s+");
records.addAll(Arrays.asList(split));
if (needsToStop) {
break; //Or throw exception
}
}
reader.close();
return records;
}
catch (Exception e)
{
System.err.format("Exception occurred trying to read '%s'.", filename);
e.printStackTrace();
return null;
}
}

Multiple threads working off the same list of strings, in java?

I'm trying to figure out the best way to have multiple threads working from the same list of strings. For example, say I have a list of words, and I want multiple threads to work on printing out each word on this list.
Here is what I came up with. The thread uses a while loop, and while the iterator has next, it prints out and removes it from the list.
import java.util.*;
public class ThreadsExample {
static Iterator it;
public static void main(String[] args) throws Exception {
ArrayList<String> list = new ArrayList<>();
list.add("comet");
list.add("planet");
list.add("moon");
list.add("star");
list.add("asteroid");
list.add("rocket");
list.add("spaceship");
list.add("solar");
list.add("quasar");
list.add("blackhole");
it = list.iterator();
//launch three threads
RunIt rit = new RunIt();
rit.runit();
rit.runit();
rit.runit();
}
}
class RunIt implements Runnable {
public void run()
{
while (ThreadsExample.it.hasNext()) {
//Print out and remove string from the list
System.out.println(ThreadsExample.it.next());
ThreadsExample.it.remove();
}
}
public void runit() {
Thread thread = new Thread(new RunIt());
thread.start();
}
}
This seems to work, although I get some Exception in thread "Thread-2" Exception in thread "Thread-0" java.lang.IllegalStateException errors during the run:
Exception in thread "Thread-1" Exception in thread "Thread-0" java.lang.IllegalStateException
    at java.util.ArrayList$Itr.remove(ArrayList.java:864)
    at RunIt.run(ThreadsExample.java:44)
    at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException
    at java.util.ArrayList$Itr.remove(ArrayList.java:864)
    at RunIt.run(ThreadsExample.java:44)
    at java.lang.Thread.run(Thread.java:745)
Am I doing this correctly or is there a better way to have multiple threads working on the same pool of strings?
A better way to do this is to use a concurrent queue. The Queue interface is designed to hold elements in a structure prior to processing them.
final Queue<String> queue = new ConcurrentLinkedQueue<String>();
queue.offer("asteroid");
ExecutorService executorService = Executors.newFixedThreadPool(4);
executorService.execute(new Runnable() {
public void run() {
System.out.println(queue.poll());
}
});
executorService.shutdown();
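The snippet above submits a single task that polls one element. A sketch of how the same idea might extend to several workers that drain the whole queue is shown here; the class and method names are illustrative, and the processed words are collected in a second queue so the caller can inspect them.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class QueueWorkers {

    // Drains the words with `workers` threads and returns them in processing order.
    static Queue<String> process(List<String> words, int workers) throws InterruptedException {
        Queue<String> pending = new ConcurrentLinkedQueue<>(words);
        Queue<String> done = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String word;
                // poll() atomically removes the head, or returns null when the queue is empty,
                // so no word is handed to two workers
                while ((word = pending.poll()) != null) {
                    done.add(word);
                }
            });
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the workers to finish
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> words = List.of("comet", "planet", "moon", "star", "asteroid",
                "rocket", "spaceship", "solar", "quasar", "blackhole");
        for (String word : process(words, 3)) {
            System.out.println(word);
        }
    }
}
```

Every word is processed exactly once, but the order in which the workers pick them up can differ from run to run.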
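Try creating the list as a synchronized list using Collections.synchronizedList.
Update your code like this (note that the declared type has to be List, and that iterating over such a list still requires synchronizing on it manually):
List<String> list = Collections.synchronizedList(new ArrayList<>());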
Am I doing this correctly or is there a better way to have multiple threads working on the same pool of strings?
You are not doing it correctly. Your code is not properly synchronized, and therefore its behavior is not well defined. There are a great number of ways you could approach the general problem you present, but one way the issues in your particular code could be fixed would be to change RunIt.run() to properly synchronize:
public void run()
{
while (true) {
synchronized(ThreadsExample.it) {
if (ThreadsExample.it.hasNext()) {
//Print out and remove string from the list
System.out.println(ThreadsExample.it.next());
ThreadsExample.it.remove();
} else {
break;
}
}
}
}
Note here that the hasNext() check, retrieval of the next element, and removal of that element are all handled within the same synchronized block to ensure mutual consistency of these operations. On the other hand, the scope of that block is contained within the loop, so that different threads executing the loop concurrently each get a chance to execute.
Note, too, that although in this case all the threads synchronize on the Iterator object, that's basically just a convenience (for me). As long as they all synchronize on the same object, it doesn't matter so much which object that is.

Where do i have to use synchronized?

I have done some research and could not find a solution to this problem.
From the topic "Synchronization, When to or not to use?" I understand that I could use synchronized, but doing so doesn't solve the problem.
The case is that I have a method in which a Thread is used to create an ArrayList. In that same Thread, another method is called after a BufferedReader has finished reading a file and its lines have been added to the first list.
In the second method, the first list is used to create the second list. When all that is done, the first method uses the second list.
This is roughly the code I use; if something is not clear, please ask and I will try to provide the information needed.
public synchronized void theBaseList() {
Thread t = new Thread() {
@Override
public void run() {
try {
while ((line = br.readLine()) != null) {
firstList.add(line);
}
} catch (IOException e) {
e.printStackTrace();
}
nextMethod();
currentObject = (Object[]) secondList.get(0); // throws an exception
}
};
t.start();
}
public synchronized void nextMethod() {
Thread t1 = new Thread() {
double objectListSize = calculateObjectListLength(firstList.size());
@Override
public void run() {
// create Objects
secondList.add(objects);
}
};
t1.start();
}
When I use a Thread in nextMethod() to create a new list of Objects from the items in the first list, I get an IndexOutOfBoundsException saying
Exception in thread "Thread-4" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
I avoided this by not using a Thread in the second method, and then everything works fine.
If I do use 2 Threads and make both methods synchronized, it still throws the exception.
Is it possible, or should I just settle for not using a Thread in the second method? I thought synchronized was meant for this sort of problem. I don't understand why it doesn't work.
Let's say your methods are defined in a class named Sample and you've created an instance mySample. This appears to be what your code is doing:
main thread calls mySample.theBaseList() and synchronizes by locking on mySample.
theBaseList() defines thread1 and starts it.
theBaseList() exits scope, thus unlocking on mySample.
thread1 reads in the lines of a file and adds them to list1 (these operations are not synchronized)
thread1 calls mySample.nextMethod()
mySample.nextMethod() synchronizes by locking on mySample
nextMethod() defines thread2 and starts it.
nextMethod() exits scope, thus unlocking on mySample.
* thread2 sets up list2 (these operations are not synchronized)
* thread1, having returned from nextMethod() reads from list2 (these operations are not synchronized)
The last two operations are the cause of your race condition.
In your case, using synchronized methods is perhaps too coarse grained. A better option may be to synchronize on the object on which both threads operate, secondList.
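// in thread1, after the read loop
nextMethod();
synchronized(secondList) {
currentObject = (Object[]) secondList.get(0); // mutually exclusive with the add below, but may still run before it
}
// in thread2's run()
synchronized(secondList) {
// create Objects
secondList.add(objects);
}
EDIT: Locking alone does not guarantee that thread2 adds before thread1 reads, so thread1 also has to wait until thread2 signals that the element is there:
// in thread1, after the read loop
synchronized(secondList) {
nextMethod();
while (secondList.isEmpty()) {
secondList.wait(); // releases the lock while waiting; the loop guards against spurious wakeups
}
currentObject = (Object[]) secondList.get(0); // should no longer throw an exception
}
// in thread2's run()
synchronized(secondList) {
// create Objects
secondList.add(objects);
secondList.notifyAll();
}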
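For comparison, here is a sketch of the same hand-off using a CountDownLatch from java.util.concurrent instead of wait()/notifyAll(). This is a swapped-in technique, not the answer's own code; the class WaitForSecondList and the sample array contents are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class WaitForSecondList {
    private final List<Object[]> secondList = Collections.synchronizedList(new ArrayList<>());
    private final CountDownLatch secondListReady = new CountDownLatch(1);

    // Plays the role of nextMethod(): fills secondList on another thread.
    void nextMethod() {
        new Thread(() -> {
            secondList.add(new Object[] {"first", "second"}); // hypothetical payload
            secondListReady.countDown(); // signal: at least one element is present
        }).start();
    }

    // Plays the role of the code that follows nextMethod() in theBaseList().
    Object[] firstElement() throws InterruptedException {
        nextMethod();
        secondListReady.await(); // returns at once if countDown() has already happened
        return secondList.get(0);
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] current = new WaitForSecondList().firstElement();
        System.out.println(current.length);
    }
}
```

A latch remembers that it has been counted down, so it does not matter whether the second thread finishes before or after the first thread starts waiting.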

Java - How To Synchronize 2 Threads On 1 List?

How can I synchronize 2 threads to handle data in a list ?
thread A is adding / changing items in a list (writing to the list)
thread B is displaying the items (only reading the list)
I would like to "notify" thread B when it can display the list. In the time of reading the list it must not be changed by thread A. When thread B is done reading, thread A can start changing the list again.
My guesses go to
synchronized(obj)
list.wait() + list.notify()
Threads aren't invoking each other. They run concurrently all the time.
You could put all changes in Runnables and put them in a queue that Thread A executes in order. After each job, A must generate a snapshot of the modified list and submit it to Thread B. You could use Executors for that.
General concept (as I see it in your case) would be as follows.
1) Create an instance of List that you're planning to work with.
2) Write 2 classes corresponding to your thread A and thread B that both implement Runnable and take List as their constructor parameter.
3) Synchronize these 2 classes on list instance:
// method in class that adds
public void add() {
synchronized(list) {
// perform addition ...
list.notify();
}
}
// method in class that reads
public void read() throws InterruptedException {
synchronized(list) {
while (list.isEmpty())
list.wait();
// process data ...
}
}
4) Create 2 threads with arguments corresponding to instances of these 2 classes and start them.
Reader and writer locks are your friends here.
•thread A is adding / changing items in a list (writing to the list)
... so it can use the write lock ...
•thread B is displaying the items (only reading the list)
... so it can use the read lock.
Let's assume that you're using something straightforward for your wait/notify (for example, the built-in Object methods) to block the read and display thread. At that point, your code looks something like this:
/** This is the read/write lock that both threads can see */
private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
/** This method is called by thread A (the writer / modifier) */
public void add() {
// Only one writer at a time allowed
lock.writeLock().lock();
try {
// Insert code here: Add to the list
} finally {
// Unlock in the finally block to ensure that lock is released
lock.writeLock().unlock();
}
// Notify anyone who's waiting for data; notify() must be called while holding the list's monitor
synchronized (list) {
list.notify();
}
}
/** This method is called by thread B (the reader / displayer) */
public void read() throws InterruptedException {
// As many readers as you like at a time
lock.readLock().lock();
try {
// Insert code here: read from the list
} finally {
// Unlock in the finally block to ensure that lock is released
lock.readLock().unlock();
}
// Wait for new data; wait() must also be called while holding the list's monitor
synchronized (list) {
list.wait();
}
}
To make things even more convenient, you can get rid of the notify/wait messaging by using a blocking data structure: e.g., one of the BlockingQueues. In that case, you don't write any notification at all. The reader blocks waiting for new data. When the writer adds data to the queue, the reader unblocks, drains the new data to process, does its thing and then blocks again.
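A minimal sketch of that blocking-queue hand-off, assuming one writer and one reader: the writer puts items, the reader blocks in take() until something arrives. The end marker DONE (a so-called poison pill) and the class name are assumptions added so the example terminates; they are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingQueueHandOff {
    private static final String DONE = "__done__"; // hypothetical end-of-data marker

    // Sends the items from a writer thread to a reader thread and returns what the reader saw.
    static List<String> run(List<String> items) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> displayed = new ArrayList<>(); // touched by the reader thread only

        Thread writer = new Thread(() -> {
            try {
                for (String item : items) {
                    queue.put(item);
                }
                queue.put(DONE); // tell the reader that nothing more will come
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread reader = new Thread(() -> {
            try {
                String item;
                // take() blocks until an element is available; no wait()/notify() needed
                while (!(item = queue.take()).equals(DONE)) {
                    displayed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join(); // after this, the reader's writes to `displayed` are visible here
        return displayed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c")));
    }
}
```

With a single writer and a single reader the queue preserves insertion order, so the reader sees the items in the order they were written.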
I tried concurrency packages suggested here or here and it works well. The threads lock each other out:
final Lock lock = new ReentrantLock(true);
// thread A
lock.lock();
try {
// write to list
} finally {
lock.unlock();
}
// thread B
lock.lock();
try {
// read from list
} finally {
lock.unlock();
}
Not sure if they can execute precisely one after another and I didn't get the notify feature. But that doesn't hurt my application.

How to make a thread limit in Java

Let's say I have 1000 files to read and, because of some limits, I want to read at most 5 files in parallel. As soon as one of them finishes, I want a new one to start.
I have a main function that has the list of files, and I tried changing a counter whenever a thread finishes, but it doesn't work!
Any suggestions?
The following is the main function loop
for (final File filename : folder.listFiles()) {
Object lock1 = new Object();
new myThread(filename, lock1).start();
counter++;
while (counter > 5);
}
Spawning threads like this is not the way to go. Use an ExecutorService and set the pool size to 5. Put all the files in something like a BlockingQueue or another thread-safe collection, and all the executing threads can just poll() it at will.
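import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ThreadReader {
public static void main(String[] args) {
File f = new File(args[0]); // folder; taking it from the first argument is an assumption
// capacity must be at least the number of files, or add() throws IllegalStateException
final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(1000);
for(File kid : f.listFiles()){
queue.add(kid);
}
ExecutorService pool = Executors.newFixedThreadPool(5);
for(int i = 1; i <= 5; i++){
Runnable r = new Runnable(){
public void run() {
File workFile = null;
// each worker keeps taking files until the queue is empty
while((workFile = queue.poll()) != null){
//work on the file.
}
}
};
pool.execute(r);
}
pool.shutdown(); // lets the JVM exit once the workers have drained the queue
}
}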
You can use an ExecutorService as a thread pool AND a queue.
ExecutorService pool = Executors.newFixedThreadPool(5);
File f = new File(args[0]);
for (final File kid : f.listFiles()) {
pool.execute(new Runnable() {
@Override
public void run() {
process(kid);
}
});
}
pool.shutdown();
// wait for them to finish for up to one minute.
pool.awaitTermination(1, TimeUnit.MINUTES);
The approach in Kylar's answer is the correct one. Use the executor classes provided by the Java class libraries rather than implementing thread pooling yourself from scratch (badly).
But I thought it might be useful to discuss the code in your question and why it doesn't work. (I've filled in some of the parts that you left out as best I can ...)
public class MyThread extends Thread {
private static int counter;
public MyThread(File fileName, Object lock) {
// Save parameters in instance variables
}
public void run() {
// Do stuff with instance variables
counter--;
}
public static void main(String[] args) {
// ...
for (final File filename : folder.listFiles()) {
Object lock1 = new Object();
new MyThread(filename, lock1).start();
counter++;
while (counter > 5);
}
// ...
}
}
OK, so what is wrong with this? Why doesn't it work?
Well, the first problem is that in main you are reading and writing counter without any synchronization. I assume that it is also being updated by the worker threads; the code makes no sense otherwise. That means there is a good chance that the main thread won't see the updates made by the child threads. In other words, while (counter > 5); could be an infinite loop. (In fact, this is pretty likely: the JIT compiler is allowed to generate code in which counter > 5 simply tests the value of counter left in a register after the previous counter++; statement.) In addition, counter++ and counter-- are not atomic, so concurrent updates can be lost even when they are visible.
The second problem is that your while (counter > 5); loop is incredibly wasteful of resources. You are telling the JVM to poll a variable ... and it will do this potentially BILLIONS of times a second ... running one processor (core) flat out. You shouldn't do that. If you are going to implement this kind of stuff using low-level primitives, you should use Java's Object.wait() and Object.notify() methods; e.g. the main thread waits, and each worker thread notifies.
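A minimal sketch of that low-level approach, assuming a small helper object guards the counter: the main thread blocks in acquire() while the limit is reached, and every worker calls release() when it finishes. The class RunningLimit, its methods and the demo are hypothetical names for this illustration; in practice an ExecutorService does the same job with less code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RunningLimit {
    private final int max;
    private int running; // guarded by "this"

    RunningLimit(int max) {
        this.max = max;
    }

    // Called by the main thread before it starts a worker.
    synchronized void acquire() throws InterruptedException {
        while (running >= max) {
            wait(); // sleeps without burning CPU until a worker calls release()
        }
        running++;
    }

    // Called by a worker when it is done.
    synchronized void release() {
        running--;
        notifyAll();
    }

    // Runs `tasks` short jobs with at most `max` at once and returns the observed peak.
    static int demo(int tasks, int max) throws InterruptedException {
        RunningLimit limit = new RunningLimit(max);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            limit.acquire(); // main blocks here while `max` workers are running
            threads[i] = new Thread(() -> {
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stands in for reading one file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    limit.release(); // frees a slot and wakes the main thread
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Peak concurrent workers: " + demo(20, 5));
    }
}
```

Both methods are synchronized on the same object, which fixes the visibility problem described above, and the while loop around wait() re-checks the condition after every wakeup.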
Whatever mechanism you use to create a new Thread, keep a shared counter of running threads and wrap the thread creation in a condition: if the limit has been reached, don't create a new thread, and push the file onto a queue (a list?) instead. Then add another check, after a thread is created, so that any items waiting in the queue are processed first.
