Please Note: I am not "looking for teh codez" - just ideas for algorithms to solve this problem.
This IS a homework assignment. I thought I was in the home stretch, about to finish it out, but the last part has absolutely stumped me. Never have I been stuck like this. It has to do with threading in Java.
The Driver class reads a file: the first line gives the number of threads, and the second line is a space-delimited list of file names, one for each thread to read from. Each thread is numbered (0 - N), N being the total number of files. Each thread reads the file specified and writes to a file named t#_out.txt, where # is the thread's index.
After all of this is done the Driver thread must:
After all threads finish execution, the program Driver.java opens all
output files t#_out.txt, reads a line from each file, and writes the
line to an output file out.txt.
Example of the out.txt:
MyThread[0]: Line[1]: Something there is that doesn't love a wall,
MyThread[1]: Line[1]: HOG Butcher for the World,
MyThread[2]: Line[1]: I think that I shall never see
MyThread[0]: Line[2]: That sends the frozen-ground-swell under it,
MyThread[1]: Line[2]: Tool Maker, Stacker of Wheat,
MyThread[2]: Line[2]: A poem lovely as a tree.
MyThread[0]: Line[3]: And spills the upper boulders in the sun,
MyThread[1]: Line[3]: Player with Railroads and the Nation's Freight Handler;
MyThread[2]: Line[3]: A tree whose hungry mouth is prest
My problem is: what kind of loop structure could I set up to do this? Read a line from t1_out.txt, write to out.txt, read a line from t2_out.txt, write to out.txt, read a line from tN_out.txt, write to out.txt? How do I know when one file has reached the end?
Ideas:
Use a while(!done) loop to continue looping until each scanner is done. Keep track of an array of booleans indicating whether or not the Scanner is done reading its file. The Scanners would be in an array as well. The problem with this is how do I tell when ALL are done, to finish my infinite loop? In each iteration see if booleans[i] is done and if not then done = false? No good.
Just read every file's lines into its own String[] array. Then figure out a loop to alternate the writing to out.txt. The problem with this is: what happens when I hit an array index out of bounds? Also, this is not in the specs, which say to read a line and write a line.
EDIT: The solution was to create an allFilesReachedEOF() method which starts with a boolean set to true. It then loops through each file, and if ANY has another line to read, sets the return value to false. This was my while loop's condition: while (!allFilesReachedEOF()).
My problem was that I was trying to control the loop from within the loop. So if a file had another line it would continue, but if ANY file EOF'd, the loop would stop.
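A minimal sketch of the allFilesReachedEOF() idea described in the edit above, assuming one Scanner per t#_out.txt file. The class name RoundRobinMerge and the merge() helper are made up for illustration and are not the original poster's code.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

class RoundRobinMerge {
    // Starts as true and flips to false if ANY scanner still has a line.
    static boolean allFilesReachedEOF(Scanner[] scanners) {
        boolean allDone = true;
        for (Scanner s : scanners) {
            if (s.hasNextLine()) {
                allDone = false;
            }
        }
        return allDone;
    }

    // Hypothetical helper: copies one line per file per pass into the output file.
    static void merge(String[] inputNames, String outputName) throws IOException {
        Scanner[] scanners = new Scanner[inputNames.length];
        try (PrintWriter out = new PrintWriter(outputName)) {
            for (int i = 0; i < inputNames.length; i++) {
                scanners[i] = new Scanner(new File(inputNames[i]));
            }
            while (!allFilesReachedEOF(scanners)) {
                for (Scanner s : scanners) {
                    // Files that are already exhausted are simply skipped.
                    if (s.hasNextLine()) {
                        out.println(s.nextLine());
                    }
                }
            }
        } finally {
            for (Scanner s : scanners) {
                if (s != null) {
                    s.close();
                }
            }
        }
    }
}
```

The loop condition is evaluated outside the inner for loop, so one short file no longer ends the whole merge.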
Thanks for the help!
You could do a while with a condition that not all the files have reached EOF. Then you iterate through all the files, and for those that haven't reached EOF, you read the next line and write it to your output file. As you go, you update your condition variable for the "while" loop.
Is this what you're looking to do?
It sounds like you could use a Queue to achieve this. Add each t#_out.txt's input to the Queue then implement a loop in which you read a line from the polled input and write it to your output. As long as the read line isn't EOF, re-add the input to the Queue. When the Queue is empty, break out from the loop.
Also I recommend a BufferedWriter for the output, which you flush() at the end so that the actual writing only occurs once.
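The Queue idea could be sketched roughly as follows, assuming each input is wrapped in a BufferedReader. QueueMerge and merge() are illustrative names only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

class QueueMerge {
    // Hypothetical helper: round-robins over the inputs until the queue drains.
    static void merge(List<String> inputNames, String outputName) throws IOException {
        Queue<BufferedReader> pending = new ArrayDeque<>();
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(outputName))) {
            for (String name : inputNames) {
                pending.add(Files.newBufferedReader(Paths.get(name)));
            }
            while (!pending.isEmpty()) {
                BufferedReader reader = pending.poll();
                String line = reader.readLine(); // null signals EOF
                if (line == null) {
                    reader.close();              // exhausted: do not re-add
                } else {
                    out.write(line);
                    out.newLine();
                    pending.add(reader);         // still has data: back of the queue
                }
            }
            out.flush();
        } finally {
            // Only non-empty if an exception interrupted the loop.
            for (BufferedReader r : pending) {
                r.close();
            }
        }
    }
}
```

Because an exhausted reader is never re-added, the empty queue is the termination condition and no separate "done" flags are needed.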
Here are the main points:
Create a class that implements Runnable whose run() method does what you need one thread to do. It'll likely need fields for threadNumber and filename. The run method should make sure to close() the output streams of the output files
For each thread you need to create, instantiate one of your class (giving it the filename (and other) data it needs) and pass it into the constructor of Thread. Keep a reference to the Thread objects
Call the start() method on all the threads
Call the join() method on all the threads (join waits for the thread to finish)
Do your final Driver task of opening the output files
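The steps above might look something like this sketch. FileWorker, DriverSketch, and the outputDir parameter are assumptions made for illustration, and the "MyThread[i]: Line[n]:" prefix is inferred from the sample out.txt rather than stated in the assignment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileWorker implements Runnable {
    private final int threadNumber;
    private final String inputName;
    private final Path outputDir; // assumption: directory that receives t#_out.txt

    FileWorker(int threadNumber, String inputName, Path outputDir) {
        this.threadNumber = threadNumber;
        this.inputName = inputName;
        this.outputDir = outputDir;
    }

    @Override
    public void run() {
        Path output = outputDir.resolve("t" + threadNumber + "_out.txt");
        // try-with-resources closes both streams, even on failure.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(inputName));
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            String line;
            int lineNumber = 1;
            while ((line = in.readLine()) != null) {
                out.println("MyThread[" + threadNumber + "]: Line[" + lineNumber + "]: " + line);
                lineNumber++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class DriverSketch {
    static void runAll(String[] inputNames, Path outputDir) throws InterruptedException {
        Thread[] threads = new Thread[inputNames.length];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FileWorker(i, inputNames[i], outputDir));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every worker before merging the outputs
        }
    }
}
```

Once runAll() returns, every t#_out.txt is closed and complete, so the Driver can safely open them for the final merge.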
This can be done by using a do - while "exit condition" loop and an inner for loop for iterating through the output files. Set the exit condition to true before the start of the for loop, and reset it within the for loop if you get at least a line from any of the files.
Files that have reached eof will continue to be read, but will not return any lines. You can choose to print blank lines for these or just skip them.
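A rough sketch of the do - while variant, assuming the output files are already open as BufferedReader objects. DoWhileMerge is a made-up name, and this version skips exhausted files rather than printing blank lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.List;

class DoWhileMerge {
    static void merge(List<BufferedReader> readers, Writer out) throws IOException {
        boolean finished;
        do {
            finished = true; // assume this pass is the last one
            for (BufferedReader reader : readers) {
                // readLine() keeps returning null once a file is at EOF.
                String line = reader.readLine();
                if (line != null) {
                    out.write(line);
                    out.write(System.lineSeparator());
                    finished = false; // at least one file produced a line
                }
            }
        } while (!finished);
    }
}
```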
Related
How can I read a file in Java using multithreading?
It doesn't matter if it's slower than using one thread; I have to do it.
So, for example, if there are 2 threads, the first reads the first line and, at the same time, the second reads the second line; then the first reads the third line and the second reads the fourth line, and they continue reading this way until the end of the file. How can I implement this in Java?
Just use a single BufferedReader that is shared between the threads, and synchronize on it when calling readLine().
It is completely pointless.
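For what it's worth, the shared-reader suggestion could be sketched like this. SharedReaderDemo and readWithThreads() are illustrative names. Note that this does not guarantee strict alternation between threads: whichever thread wins the lock gets the next line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SharedReaderDemo {
    static List<String> readWithThreads(BufferedReader shared, int threadCount)
            throws InterruptedException {
        List<String> seen = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        String line;
                        // Only one thread at a time may pull the next line.
                        synchronized (shared) {
                            line = shared.readLine();
                        }
                        if (line == null) {
                            break; // EOF: every thread sees null and stops
                        }
                        seen.add("thread " + id + ": " + line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return seen;
    }
}
```

Each line is handed to exactly one thread, but the reads themselves are serialized by the lock, which is why this buys nothing in read speed.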
From the document, the function of the "available" method is:
returns an estimate of the number of bytes that can be read (or
skipped over) from this input stream without blocking by the next
invocation of a method for this input stream.
So, how long does it take for this method to return a result? Suppose I have a file with 10000 words, and I want to go through each word by checking like this:
while (stream.available() > 0) {
    stream.read(); // suppose that this reads a word
}
So after reading the first word, is the method going to go through the next 9999 words? And after the second word, does it check the next 9998 words?
The documentation says that the method returns "an estimate of the number of bytes", so how does it do that?
As it states, the purpose is to tell you how many bytes you can read without the read call blocking. This is mostly useful for network connections, where data is filling the buffer and you might want to process as much of that data without the read call blocking, waiting for more data.
It's not commonly used and doesn't tell you anything about how much is GOING to be available overall. For example, I've seen it used to test the length of a message, which is of course wrong, because only part of the message may have been received at that point.
You are best off just reading the whole stream until EOF is reached. available() is only of use if you want to process as much data as you can without blocking. It says "estimate" because more data could be coming in all the time, and you may be able to read more bytes than available() returned at the exact moment you called it.
In practice, you need all the data from a stream, or you stop when you reach a certain value. But this is a separate issue from how quickly it streams in from wherever it's coming from. Whether it blocks or not, you will neither know nor care. :)
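A minimal sketch of reading until EOF instead of polling available(). ReadToEof and countBytes() are made-up names, and the 8192-byte buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;

class ReadToEof {
    // Consumes the whole stream and returns how many bytes it held.
    static long countBytes(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        int n;
        // read() blocks until data arrives and returns -1 only at end of stream.
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }
}
```

The loop never asks how much is left; it relies on the -1 return value as the end-of-stream signal.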
I would like to create a FileProcessor which will read a file sequentially. This class will be accessed by multiple threads, and each thread will read a line, process it, and store it in a data structure. So far I have imagined the steps to solve the design problem like this:
Class FileProcessor, with a getline method which opens the file.
A BufferedReader will read each line and return that line.
2nd class, FileThread, which implements Runnable. Its run method calls getline() on the FileProcessor object. This thread will also process the line and store it in the data structure.
3rd class, which will create multiple FileThread threads.
I know this design has faults, some of which I can imagine:
How will each thread know that EOF has been reached so that it can stop?
In the FileProcessor, should I read the file normally with a while loop detecting EOF, or just read line by line and have the threads take care of the EOF condition?
Please help.
No, the design isn't good.
The class that instantiates the Executor pool, reads the file, and doles out the work is the only one that needs to know about EOF. The threads that take their portion and do the work don't have to know.
Don't use raw Threads; leverage the newer concurrency classes like Executor.
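One way this could look, as a sketch only: the reading thread is the only code that checks for EOF, and the workers just receive lines. LineDispatcher is a made-up name, toUpperCase() stands in for the real per-line processing, and the one-minute timeout is arbitrary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LineDispatcher {
    static Queue<String> processAll(BufferedReader in, int workers)
            throws IOException, InterruptedException {
        Queue<String> results = new ConcurrentLinkedQueue<>(); // thread-safe store
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            String line;
            // Only this loop knows about EOF; tasks never touch the reader.
            while ((line = in.readLine()) != null) {
                final String current = line;
                pool.execute(() -> results.add(current.toUpperCase()));
            }
        } finally {
            pool.shutdown(); // no new tasks; queued ones still run
        }
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return results;
    }
}
```

Results arrive in completion order, not file order, so this fits the case where the data structure does not depend on line order.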
I'm doing some java programming in Netbeans 7.3 and I have written a program that contains a complex while loop. It reads a file in, does some calculations, then prints an output file with some changes.
Something goes slightly wrong at the point where the while loop goes through the 158th line of the text (158th cycle of the while loop). Nothing appears to be wrong with the file itself at this point.
Am I being stupid when putting a breakpoint on the while loop and continuously clicking "Step Out" 158 times to see what's going on? Is there an easier way to do this with java? Like running the loop until a variable reaches a certain value?
Adding debug code like
if (iterationCount == 158) {
    int dummy = 13; // <= put breakpoint here
}
is a practical alternative to "conditional breakpoints".
I am currently working on a data format transformation job.
There is a large file, around 10GB. My current solution reads this file line by line, transforms the format of each line, then writes it to an output file. I found that the transform step is the bottleneck, so I am trying to do this concurrently.
Each line is a complete unit and has nothing to do with other lines. Some lines may be discarded because a specific value in the line does not meet the requirements.
I have two plans:
One thread reads data line by line from the input file and puts each line into a queue. Several threads take lines from the queue, transform the format, then put the result into an output queue. Finally, an output thread reads lines from the output queue and writes them to an output file.
Several threads concurrently read data from different parts of the input file, then process the lines and write to a file through an output queue or a file lock.
Could you please give me some advice? I would really appreciate it.
Thanks in advance!
I would go for the first option ... reading data from a file in small pieces normally is slower than reading the whole file at once (depending on file caches/buffering/read ahead etc).
You also might need to think about a way to create the output file (acquiring all lines from the different processes, possibly in the correct order if needed).
Solution 1 makes sense.
This would also map nicely and simply to Java's Executor framework. Your main thread reads lines and submits each line to an Executor or ExecutorService.
It gets more complicated if you must keep order intact, though.
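If order does matter, one possible approach (a sketch, not the only option) is to keep the Future objects in submission order and write results as they are retrieved. OrderedTransform and transform() are hypothetical; transform() returns null for a discarded line. For a file as large as 10GB the list of pending futures would have to be bounded, which this sketch does not attempt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedTransform {
    // Hypothetical stand-in for the real transform; null means "discard".
    static String transform(String line) {
        return line.isEmpty() ? null : line.toUpperCase();
    }

    static void run(BufferedReader in, Writer out, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                final String current = line;
                Callable<String> task = () -> transform(current);
                pending.add(pool.submit(task)); // list order == input order
            }
            for (Future<String> future : pending) {
                String result = future.get(); // blocks until that line is done
                if (result != null) {
                    out.write(result);
                    out.write(System.lineSeparator());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The transforms still run in parallel; only the final writes are serialized, and they happen in the order the lines were read.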