How to read files in a multi-process, multi-threaded environment - Java

I was able to read files in a multi-process environment by using file locking.
In the multithreaded (single-process) case, I filled a queue with file names, started a separate thread for each file, waited until all the reading was finished, and then renamed the files. In this way I read files in batches with multiple threads.
Now I want to read the files in a directory using both multiple processes and multiple threads. I tried merging my two approaches, but that didn't go well. The log showed that many files threw FileNotFoundException (because their names had been changed), some were never read (because the thread died), and sometimes locks were not released.
///////////////////////////////////////////////////////////////////////
//file filter inner class
class myfilter implements FileFilter{
@Override
public boolean accept(File pathname) {
// TODO Auto-generated method stub
Pattern pat = Pattern.compile("email[0-9]+$");
Matcher mat = pat.matcher(pathname.toString());
if(mat.find()) {
return true;
}
return false;
}
}
/////////////////////////////////////////////////////////////////////////
myfilter filter = new myfilter();
File alreadyread[] = new File[5];
Thread t[] = new Thread[5];
fileread filer[] = new fileread[5];
File file[] = directory.listFiles(filter);
FileChannel filechannel[] = new FileChannel[5];
FileLock lock[] = new FileLock[5];
tuple_json = new ArrayList();
//System.out.println("ayush");
while(true) {
//declare a queue
ConcurrentLinkedQueue filequeue = new ConcurrentLinkedQueue();
//addfilenames to queue and their renamed file names
try{
if(file.length!=0) {
//System.out.println(file.length);
for(int i=0;i<5 && i<file.length;i++) {
System.out.println("acquiring lock on file " + file[i].toString());
try{
filechannel[i] = new RandomAccessFile(file[i], "rw").getChannel();
lock[i] = filechannel[i].tryLock();
}
catch(Exception e) {
file[i] = null;
lock[i] = null;
System.out.println("cannot acquire lock");
}
if(lock[i]!=null){
System.out.println("lock acquired on file " + file[i].toString());
filequeue.add(file[i]);
alreadyread[i] = new File(file[i].toString() + "read");
System.out.println(file[i].toString() + "-----" + times);
}
else{
System.out.println("else condition of acquiring lock");
file[i] = null;
}
System.out.println("-----------------------------------");
}
//starting the thread to read the files
for(int i=0;i<5 && i<file.length && lock[i]!=null && file[i]!=null;i++){
filer[i] = new fileread(filequeue.toArray()[i].toString());
t[i] = new Thread(filer[i]);
System.out.println("starting a thread to read file" + file[i].toString());
t[i].start();
}
//read the text
for(int i=0;i<5 && i<file.length && lock[i]!=null && file[i]!=null;i++) {
try {
System.out.println("waiting to read " + file[i].toString() + " to be read completely");
t[i].join();
System.out.println(file[i] + " was read completetly");
//System.out.println(filer[i].getText());
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
//file has been read Now rename the file
for(int i=0;i<5 && i<file.length && lock[i]!=null && file[i]!=null;i++){
if(lock[i]!=null){
System.out.println("renaming file " + file[i].toString());
file[i].renameTo(alreadyread[i]);
System.out.println("releasing lock on file " + file[i].toString());
lock[i].release();
}
}
//rest of the processing
/////////////////////////////////////////////////////////////////////////////////////////////////////
Fileread class
class fileread implements Runnable{
//String loc = "/home/ayusun/workspace/Eclipse/fileread/bin";
String fileloc;
BufferedReader br;
String text = "";
public fileread(String filename) {
this.fileloc = filename;
}
@Override
public void run() {
try {
br = new BufferedReader(new FileReader(fileloc));
System.out.println("started reading file" + fileloc);
String currline;
while((( currline = br.readLine())!=null)){
if(text.isEmpty()) // compare string content; == only compares references
text += currline;
else
text += "\n" + currline;
}
System.out.println("Read" + fileloc + " completely");
br.close();
} catch ( IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public String getText() {
return text;
}
}
I would like to know if there is any other approach that I can adopt.

If you want exclusive access to a file, you cannot rely on file locking, as on most OSes file locking is advisory, not mandatory.
I'd suggest creating a common lock directory for all your processes; in this lock directory, you would create a directory per file you want to lock, right before you open a file.
The big advantage is that directory creation, unlike file creation, is atomic; as such, you can use Files.createDirectory() (or File's .mkdir() if you still use Java6 but then don't forget to check the return code) to grab a lock on the files you read. If this fails, you know someone else is using the file.
Of course, when you're done with a file, don't forget to remove the lock directory matching this file... (in a finally block)
(note: with Java 7 you can use Files.newBufferedReader(); there is even Files.readAllLines())
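A minimal sketch of the lock-directory idea described above, assuming Java 7 or later. The class name DirectoryLock, the lock root under java.io.tmpdir, and the file name email42 are hypothetical and only there for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: one lock directory per file name under a shared lock root.
final class DirectoryLock {
    // Returns true if this caller created the lock directory, false if it already existed.
    static boolean tryLock(Path lockRoot, String fileName) {
        try {
            Files.createDirectories(lockRoot);                 // shared parent; fine if it exists
            Files.createDirectory(lockRoot.resolve(fileName)); // throws if someone else holds it
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Removes the lock directory so that other processes or threads can take the file.
    static void unlock(Path lockRoot, String fileName) {
        try {
            Files.deleteIfExists(lockRoot.resolve(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path lockRoot = Paths.get(System.getProperty("java.io.tmpdir"), "email-locks");
        String name = "email42";
        if (tryLock(lockRoot, name)) {
            try {
                System.out.println("processing " + name); // read and rename the file here
            } finally {
                unlock(lockRoot, name);                   // always release, even on failure
            }
        } else {
            System.out.println(name + " is in use elsewhere");
        }
    }
}
```

One thing this sketch does not handle: if a process dies before reaching the finally block, its lock directory stays behind and has to be cleaned up some other way.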

If you need to process a large number of files using multiple threads, you should probably first distribute the specific files to each thread before it starts.
For example, if you only want to process files whose names start with email and are followed by some digits, you could create 10 threads. The first thread would look for files with names starting with email0, the second thread could handle email1, etc.
This of course would be efficient only if the numbers are evenly distributed.
Another way could be to have the main thread run through the directory and collect all the file names to deal with. It could then divide the files across the number of available threads, and pass each thread an array of those file names.
There could be other ways of dividing the system load which are relevant to your situation.
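As a rough sketch of the second idea (the main thread collects the names and hands each worker its own batch), something like the following could work. The class name FileBatches, the round-robin split and the sample file names are assumptions made for the example, and the worker body only prints instead of reading:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the main thread splits the collected names before any worker starts.
final class FileBatches {
    // Deals the names round-robin into `parts` batches, so no two workers share a file.
    static List<List<String>> partition(List<String> names, int parts) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < names.size(); i++) {
            batches.get(i % parts).add(names.get(i));
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = Arrays.asList("email1", "email2", "email3", "email4", "email5");
        List<Thread> workers = new ArrayList<>();
        for (List<String> batch : partition(names, 2)) {
            // Placeholder work: a real worker would open and read each file in its batch.
            Thread worker = new Thread(() -> batch.forEach(n -> System.out.println("reading " + n)));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every batch before renaming or post-processing
        }
    }
}
```

Because each file name ends up in exactly one batch, the threads inside one process never compete for the same file; competition between processes still needs a separate mechanism.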

Related

Impossible to get out of "gsReference.getFile(Objects.requireNonNull(localFile)).addOnSuccessListener"

I am trying to get a value from the first line of a csv file ( header excluded) store in Firebase Storage
Here is the code :
private String readFromCsv() {
StorageReference refCompteActifs = FirebaseStorage.getInstance().getReference().child("my_file").child("my_file.csv");
StorageReference gsReference = refCompteActifs.getStorage().getReferenceFromUrl("gs://test-8095e.appspot.com/my_file/my_filer.csv");
File localFile = null;
try {
localFile = File.createTempFile("my_file", ".csv");
} catch (IOException e) {
e.printStackTrace();
}
File finalLocalFile = localFile;
final String[] result = {null};
List<String> rows = new ArrayList<>();
gsReference.getFile(Objects.requireNonNull(localFile)).addOnSuccessListener(new OnSuccessListener<FileDownloadTask.TaskSnapshot>() {
@Override
public void onSuccess(FileDownloadTask.TaskSnapshot taskSnapshot) {
try {
CSVReader reader = new CSVReader(new FileReader("./data/user/0/com.example.test/cache/" + finalLocalFile.getName()), ',', '\'', 1);
String[] nextLine = null;
while ((nextLine = reader.readNext()) != null) {
System.out.println(nextLine[4] + "\n");
rows.add(nextLine[4]);
}
} catch (IOException e) {
e.printStackTrace();
}
for (int i = 0; i < rows.size(); i++) {
result[0] = rows.get(i);
}
}
});
System.out.println(result[0] + "\n");
return result[0];
}
The console never shows the output of System.out.println(result[0] + "\n"); result[0] is set inside the listener, but I can't access it outside of it.
Thank you for your Help
That is the expected behavior. The getFile API is an asynchronous operation, which means that it executes in the background while the rest of your code continues to run. Then when the operation is complete, your onSuccess is called with the result.
This is easiest to see if you add some logging:
Log.i("File", "1. Starting to load file");
gsReference.getFile(Objects.requireNonNull(localFile)).addOnSuccessListener(new OnSuccessListener<FileDownloadTask.TaskSnapshot>() {
@Override
public void onSuccess(FileDownloadTask.TaskSnapshot taskSnapshot) {
Log.i("File", "2. Loaded file");
}
});
Log.i("File", "3. Started to load file");
If you run this code it outputs:
1. Starting to load file
3. Started to load file
2. Loaded file
This is probably not what you expected, but it is working by design - and it does completely explain why your System.out.println(result[0] + "\n"); doesn't show the file contents: the file hasn't been loaded yet.
This is an incredibly common problem, as most I/O and network APIs are asynchronous these days. The solution is always the same: any code that needs the data that is asynchronously loaded has to be inside the onSuccess handler, be called from there, or otherwise synchronized.
This means for example that you can't return the value from the file, as the return runs before the load has completed, and you'll instead want to pass a callback to your readFromCsv function, very similar to the OnSuccessListener.
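To illustrate the callback idea without depending on Firebase, here is a small self-contained sketch. loadAsync() is only a stand-in for an asynchronous download such as getFile(), and the class name, the Consumer parameter and the value "first-row-value" are made up for the example:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

final class AsyncCallbackSketch {
    // Stand-in for an asynchronous download; this is NOT a Firebase API.
    static CompletableFuture<String> loadAsync() {
        return CompletableFuture.supplyAsync(() -> "first-row-value");
    }

    // Instead of returning the value, hand it to a caller-supplied callback
    // once the asynchronous load has completed.
    static CompletableFuture<Void> readFromCsv(Consumer<String> onValue) {
        return loadAsync().thenAccept(onValue);
    }

    public static void main(String[] args) {
        // The code that needs the value lives inside the callback.
        // join() is only here so the demo does not exit before the callback runs.
        readFromCsv(value -> System.out.println("got " + value)).join();
    }
}
```

In the Firebase case the same shape would apply: the caller passes in its own callback, and onSuccess invokes it after the CSV has been parsed.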
For more on this, I recommend reading:
getContactsFromFirebase() method return an empty list
Setting Singleton property value in Firebase Listener
more questions on losing the asynchronous value outside of a callback

Reading Large file A, Search Records matching records from file B and write file C in java

I have two files; assume they are already sorted.
This is just example data. In reality I'll have around 30-40 million records in each file, 7-10 GB per file, since the rows are long and of fixed length.
They are simple text files. Once a searched record is found, I'll do some updates and write it to a file.
File A may contain 0 or more records matching an ID from File B.
The goal is to complete this processing in the least amount of time possible.
I am able to do it, but it takes a long time...
Suggestions are welcome.
File A
1000000001,A
1000000002,B
1000000002,C
1000000002,D
1000000002,D
1000000003,E
1000000004,E
1000000004,E
1000000004,E
1000000004,E
1000000005,E
1000000006,A
1000000007,A
1000000008,B
1000000009,B
1000000010,C
1000000011,C
1000000012,C
File B
1000000002
1000000004
1000000006
1000000008
1000000010
1000000012
1000000014
1000000016
1000000018
// Not working as of now, because the logic is wrong.
private static void readAndWriteFile() {
System.out.println("Read Write File Started.");
long time = System.currentTimeMillis();
try(
BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH+"input.txt"));
BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH+"search.txt"));
FileWriter myWriter = new FileWriter(Commons.ROOT_PATH+"output.txt");
) {
String inLine = in.readLine();
String searchLine = search.readLine();
boolean isLoopEnd = true;
while(isLoopEnd) {
if(searchLine == null || inLine == null) {
isLoopEnd = false;
break;
}
if(searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0,10))) {
System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10) );
myWriter.write(inLine + System.lineSeparator());
inLine = in.readLine();
}else {
inLine = in.readLine();
}
}
in.close();
myWriter.close();
search.close();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
My suggestion would be to use a database, as said in this answer. Text files have a big disadvantage compared to databases, mostly because of the lack of indexes and the other points mentioned in that answer.
So what I would do is create a database (there are lots of good ones out there, such as MySQL, PostgreSQL, etc.), create the tables that are needed, and then read the file. Insert each line of the file into the database and use the database to search and update the records.
Maybe this is not an answer to your concrete question about
Motive is to complete this processing in the least amount of time possible.
But it is a suggestion worth considering. Good luck.
With this approach I am able to process 50M records in 150 seconds on an i3 with 4 GB RAM and an SSD.
private static void readAndWriteFile() {
System.out.println("Read Write File Started.");
long time = System.currentTimeMillis();
try(
BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH+"input.txt"));
BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH+"search.txt"));
FileWriter myWriter = new FileWriter(Commons.ROOT_PATH+"output.txt");
) {
String inLine = in.readLine();
String searchLine = search.readLine();
boolean isLoopEnd = true;
while(isLoopEnd) {
if(searchLine == null || inLine == null) {
isLoopEnd = false;
break;
}
// Both files are sorted, so compare the numeric IDs to decide which reader to advance.
long seachInt = Long.parseLong(searchLine.substring(0, 10));
long inInt = Long.parseLong(inLine.substring(0, 10));
if(searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0,10))) {
System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10) );
myWriter.write(inLine + System.lineSeparator());
}
// Which pointer to move..
if(seachInt < inInt) {
searchLine = search.readLine();
}else {
inLine = in.readLine();
}
}
in.close();
myWriter.close();
search.close();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
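The two-pointer merge used in this answer can be tried on small in-memory data. The sketch below assumes, as in the question, that both inputs are sorted ascending and that the ID is the first 10 characters of each line; the class and method names are hypothetical, and it collects matches in a list instead of writing them to a file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SortedMergeSketch {
    // Returns the lines of `input` whose 10-character ID also appears in `search`.
    // Each reader is consumed exactly once, front to back.
    static List<String> matchSorted(BufferedReader input, BufferedReader search) {
        List<String> matches = new ArrayList<>();
        try {
            String inLine = input.readLine();
            String searchLine = search.readLine();
            while (inLine != null && searchLine != null) {
                long inId = Long.parseLong(inLine.substring(0, 10));
                long searchId = Long.parseLong(searchLine.substring(0, 10));
                if (inId == searchId) {
                    matches.add(inLine);
                    inLine = input.readLine();       // stay on this search ID: File A may repeat it
                } else if (inId < searchId) {
                    inLine = input.readLine();       // input is behind, advance it
                } else {
                    searchLine = search.readLine();  // search is behind, advance it
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        BufferedReader a = new BufferedReader(new StringReader(
                "1000000001,A\n1000000002,B\n1000000002,C\n1000000003,E\n1000000004,E\n"));
        BufferedReader b = new BufferedReader(new StringReader(
                "1000000002\n1000000004\n1000000006\n"));
        matchSorted(a, b).forEach(System.out::println);
    }
}
```

Since neither file is ever rewound, the running time grows with the sum of the two file sizes rather than their product.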

Editing a file using async threads in Java

I'm a small Java developer currently working on a Discord bot written in Java. One of the bot's features is a leveling system that is triggered whenever anyone sends a message (there are other conditions, but they are irrelevant to the problem I'm encountering).
Whenever someone sends a message, an event is fired and a thread is created to compute how much exp the user should gain. Eventually, the function that edits the storage file is called.
This works fine when it is called sparsely, but if two threads try to write to the file at once, the file usually gets deleted or truncated. Both cases are undesired behavior.
I then tried a queuing system, which worked for over 24 hours but still failed once, so it is more stable in a way. I only know the basics of how threads work, so I may have skipped over something important that causes the problem.
The function looks like this:
Thread editingThread = null;
public boolean editThreadStarted = false;
HashMap<String, String> queue = new HashMap<>();
public final boolean editParameter(String key, String value) {
queue.put(key, value);
if(!editThreadStarted) {
editingThread = new Thread(new Runnable() {
@Override
public void run() {
while(queue.keySet().size() > 0) {
String key = (String) queue.keySet().toArray()[0];
String value = queue.get(key);
File inputFile = getFile();
File tempFile = new File(getFile().getName() + ".temp");
try {
tempFile.createNewFile();
} catch (IOException e) {
DemiConsole.error("Failed to create temp file");
handleTrace(e);
continue;
}
//System.out.println("tempFile.isFile = " + tempFile.isFile());
try (BufferedReader reader = new BufferedReader(new FileReader(inputFile)); BufferedWriter writer = new BufferedWriter(new FileWriter(tempFile))){
String currentLine;
while((currentLine = reader.readLine()) != null) {
String trimmedLine = currentLine.trim();
if(trimmedLine.startsWith(key)) {
writer.write(key + ":" + value + System.getProperty("line.separator"));
continue;
}
writer.write(currentLine + System.getProperty("line.separator"));
}
writer.close();
reader.close();
inputFile.delete();
tempFile.renameTo(inputFile);
} catch (IOException e) {
DemiConsole.error("Caught an IO exception while attempting to edit parameter ("+key+") in file ("+getFile().getName()+"), returning false");
handleTrace(e);
continue;
}
queue.remove(key);
}
editThreadStarted = false;
}
});
editThreadStarted = true;
editingThread.start();
}
return true;
}
getFile() returns the file the function is meant to write to.
The file format is:
memberid1:expamount
memberid2:expamount
memberid3:expamount
memberid4:expamount
The editing works by creating a temporary file to which I write all of the original file's data line by line, checking whether the memberid matches the one I want to edit. If it does, I write the edited line with the new expamount instead of the original line, and then continue with the rest of the lines. Once that is done, the original file is deleted and the temporary file is renamed to the original file's name, replacing it.
This function will always be called asynchronously, so making the whole thing synchronous is not an option.
Thanks in advance
Edit(1) :
It was suggested that I use semaphores, and after digging into them a little (I had never heard of semaphores before) they seem to be a really good option that removes the need for a queue: simply acquire at the beginning and release at the end, nothing more required!
I ended up using semaphores as per user207421's suggestion and it seems to work perfectly.
I put delays between each line write to artificially make the task longer, so that multiple threads would be more likely to try to write at once, and they all wait for their turn!
Thanks
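For readers landing here, a minimal sketch of what the semaphore approach from the edit could look like. The question does not show the final code, so everything below (the class GuardedFileEditor, reading the whole file with Files.readAllLines, replacing it with Files.move) is an assumption and not the poster's actual implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

final class GuardedFileEditor {
    private final Path file;
    private final Semaphore permit = new Semaphore(1); // at most one editor at a time

    GuardedFileEditor(Path file) {
        this.file = file;
    }

    // Rewrites the line starting with "key:" as "key:value".
    // Callers on other threads block here until the current edit has finished.
    void editParameter(String key, String value) {
        permit.acquireUninterruptibly();
        try {
            List<String> updated = new ArrayList<>();
            for (String line : Files.readAllLines(file)) {
                updated.add(line.trim().startsWith(key + ":") ? key + ":" + value : line);
            }
            // Write the new content next to the original, then swap it in.
            Path temp = file.resolveSibling(file.getFileName() + ".temp");
            Files.write(temp, updated);
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            permit.release(); // release even if the edit failed
        }
    }
}
```

All threads have to share the same GuardedFileEditor instance (or at least the same Semaphore) for the guard to have any effect. Matching on key + ":" rather than on key alone also avoids editing memberid12 when the key is memberid1.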

How to write in a file with threads?

How do I write to a file using threads? Each file should have 100 lines, and each line should be 100 characters long. The work must be done with threads and I/O.
My code:
public class CustomThread extends Thread{
private Thread t;
private String threadName;
CustomThread(String threadName){
this.threadName = threadName;
}
public void run () {
if (t == null)
{
t = new Thread (this);
}
add(threadName);
}
public synchronized void add(String threadName){
File f = new File(threadName + ".txt");
if (!f.exists()) {
try {
f.createNewFile();
} catch (IOException e) {
e.printStackTrace();
System.out.println("File does not exists!");
}
}
FileWriter fw = null;
try {
fw = new FileWriter(f);
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++) {
fw.write(threadName);
fw.write('\n');
}
}
} catch (IOException e) {
e.printStackTrace();
System.out.println("File does not exists!");
}
}
}
Is my code correct? I need to create a file with 100 lines of 100 characters each. The character must depend on the file name: if I create a file named 1, it must be filled with 1s. Thanks.
Your code looks mostly correct for your requirement, which is writing 100 lines of 100 characters each. The assumption is that the thread name is a single character, because you are writing threadName to the file. I have a few suggestions to complete your implementation. Then test it yourself, and comment if you find any issue.
To make each line 100 characters long, move the statement that writes the newline character to the outer loop.
Once you finish writing all the data to the file, flush() and close() it so that the data is saved.
You are creating the file from threadName; you might want to add the path of the directory in which the file should be created.
You are missing a main() method. Create an object of the class and start() the thread.
You don't need to create a separate Thread instance. The run() method will be executed in a separate thread because you are extending the Thread class.
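Putting those suggestions together, a possible version could look like the sketch below. The class name LineFileWriter, the use of a Path and a fill character instead of a thread name, and the output file 1.txt are choices made for this example:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LineFileWriter extends Thread {
    private final Path target;
    private final char fill;

    LineFileWriter(Path target, char fill) {
        this.target = target;
        this.fill = fill;
    }

    @Override
    public void run() {
        // try-with-resources flushes and closes the writer, so the data is saved.
        try (BufferedWriter writer = Files.newBufferedWriter(target)) {
            for (int line = 0; line < 100; line++) {
                for (int col = 0; col < 100; col++) {
                    writer.write(fill);
                }
                writer.newLine(); // newline in the outer loop: one per 100 characters
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LineFileWriter writer = new LineFileWriter(Paths.get("1.txt"), '1');
        writer.start(); // run() executes on the new thread
        writer.join();  // wait until the file is complete
    }
}
```

Each thread writes its own file here, so no synchronization is needed.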

Java multi threading file saving

I have an app that creates multiple endless threads. Each thread reads some info, and I create some tasks using a thread pool (which works fine).
I have added functions that handle arrays; when they finish, they send those ArrayLists to a new thread that saves the lists as files. I have implemented the saving in 3 ways, and only one of them succeeds. I would like to know why the other 2 do not.
I created a thread (via new Thread(Runnable)) and gave it the array and the name of the file. In the thread's constructor I create the PrintWriter and save the files. It runs without any problems (I have 1-10 file-save threads running in parallel).
If I place the save code outputStream.println(aLog); in the run() method, it is never reached, and the thread exits after the constructor finishes.
I place the created runnables (file save) in a thread pool (and the code for saving is in the run() method). When I send just 1 task (1 file to save), all is fine. When more than 1 task is added to the pool (very quickly), exceptions are thrown (in the debugger I can see that all the needed info is available) and some of the files are not saved.
Can someone explain the difference in behavior?
Thanks
Please see the code below, starting with a function that is part of an endless-thread class that also places some tasks in the pool. The pool is created in the endless thread:
ExecutorService iPool = Executors.newCachedThreadPool();
private void logRate(double r1,int ind){
historicalData.clear();
for (int i = 499; i>0; i--){
// some Code
Data.add(0,array1[ind][i][0] + "," + array1[ind][i][1] + "," +
array1[ind][i][2] + "," + array1[ind][i][3] + "," +
array2[ind][i] + "\n" );
}
// first item
array1[ind][0][0] = r1;
array1[ind][0][1] = array1[ind][0][0] ;
array1[ind][0][2] = array1[ind][0][0] ;
array2[ind][0] = new SimpleDateFormat("HH:mm:ss yyyy_MM_dd").format(today);
Data.add(0,r1+","+r1+","+r1+","+r1+ "," + array2[ind][0] + '\n') ;
// save the log send it to the pool (this is case 3)
//iPool.submit(new FeedLogger(fName,Integer.toString(ind),Data));
// Case 1 and 2
Thread fl = new Thread(new FeedLogger(fName,Integer.toString(ind),Data)) ;
}
here is the FeedLogger class:
public class FeedLogger implements Runnable{
private List<String> fLog = new ArrayList<>() ;
PrintWriter outputStream = null;
String asName,asPathName;
public FeedLogger(String aName,String ind, List<String> fLog) {
this.fLog = fLog;
this.asName = aName;
try {
asPathName = System.getProperty("user.dir") + "\\AsLogs\\" + asName + "\\Feed" + ind
+ ".log" ;
outputStream = new PrintWriter(new FileWriter(asPathName));
outputStream.println(fLog); // Case 1: all is fine
outputStream.flush(); // Case 1: all is fine
outputStream.close(); // Case 1: all is fine
}
catch (Exception ex) {
JavaFXApplication2.logger.log(Level.SEVERE, null,asName + ex.getMessage());
}
}
@Override
public void run()
{
try{
outputStream.println(fLog); // Case 2: this code is never reached; Case 3 (as a task): throws an
// exception when there are multiple tasks
outputStream.flush();
}
catch (Exception e) {
System.out.println("err in file save e=" + e.getMessage() + asPathName + " feed size=" +
fLog.size());
JavaFXApplication2.logger.log(Level.ALL, null,asName + e.getMessage());
}
finally {if (outputStream != null) {outputStream.close();}}
}
}
You need to call start() on a Thread instance to make it actually do something.
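A tiny sketch that shows the difference: constructing a Thread only runs the constructor on the calling thread, and run() is not executed until start() is called. The class and method names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ThreadStartSketch {
    // Creates a thread around a runnable and reports whether the runnable executed.
    // `callStart` controls whether start() is invoked on the new Thread object.
    static boolean runnableExecuted(boolean callStart) {
        AtomicBoolean ran = new AtomicBoolean(false);
        Thread worker = new Thread(() -> ran.set(true));
        if (callStart) {
            worker.start();
            try {
                worker.join(); // wait for run() to finish before checking the flag
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("without start(): " + runnableExecuted(false));
        System.out.println("with start():    " + runnableExecuted(true));
    }
}
```

This matches cases 1 and 2 in the question: the constructor code ran (on the caller's thread), while the code in run() never did because the Thread object was created but not started.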
