I'm trying to read files from a folder. There are more than 1000 files in it, and I'm trying to read 100 files every 10 seconds using ScheduledThreadPoolExecutor.scheduleWithFixedDelay().
Files.list(Paths.get(path)).sorted().limit(limit).forEach(file -> { // limit = 100
    try {
        String content = new String(Files.readAllBytes(file), "UTF-8");
        // do business logic
        doneFile.add(file.getFileName().toString());
    } catch (Exception e) {
        log.error(e);
    }
});
if (doneFile.size() > 0) {
    for (String fileName : doneFile) {
        try {
            Files.delete(Paths.get(path + fileName));
        } catch (IOException e) {
            log.error(e);
        }
    }
}
About 60-80 files are read successfully, and the others throw java.nio.file.NoSuchFileException.
The delete calls also throw some java.nio.file.NoSuchFileException.
But eventually, all the files are read and deleted after several runs, in spite of the exceptions.
What causes the exceptions in this case, and how can I fix it?
Many thanks!
P.S. Sorry about my bad English.
I commented:
It sounds like something else is adding and removing files. Is it possible that you have two scheduled tasks running? Is it possible that an external process is doing this?
You replied:
I checked the log file and found that ScheduledThreadPoolExecutor.scheduleWithFixedDelay() created 10 threads instead of the 8 I expected for 4 loops.
So ... it sounds like you have two (or more) threads iterating over the same directory, processing and deleting files without any coordination. Combine that with the fact that Files.list is going to buffer part or all of the directory listing, and you will get problems like:
One thread deletes a file that is in another thread's stream before the second thread tries to open it. This leads to a NoSuchFileException when the second thread tries to read the file.
Two threads process the same file at the same time, and one deletes it while the other is still working on it. This leads to a NoSuchFileException when the second thread tries to delete the file.
Possible solutions:
Write the code to ignore these exceptions. (But you could still have two threads processing the same file at the same time, which seems like a bad thing.)
Restructure the code so that there is only one thread scanning the directory at any one time.
If you need parallelism when processing files within the directory, have the single directory-scanner thread submit a task for each file to an executor with a bounded thread pool, as in the sketch below.
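For illustration, here is a minimal sketch combining the last two suggestions: one scheduled scanner thread plus a bounded worker pool. The class and method names are my own, and the in-flight set is just one way to keep a file from being submitted twice while a worker is still on it:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Set;
import java.util.concurrent.*;
import java.util.stream.Stream;

public class DirectoryScanner {
    private final ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // bounded pool
    private final Set<Path> inFlight = ConcurrentHashMap.newKeySet(); // files already submitted
    private final Path dir;

    public DirectoryScanner(Path dir) { this.dir = dir; }

    public void start() {
        // A single-threaded scheduler means scans can never overlap.
        scanner.scheduleWithFixedDelay(this::scanOnce, 0, 10, TimeUnit.SECONDS);
    }

    private void scanOnce() {
        try (Stream<Path> files = Files.list(dir)) { // close the stream to release the directory handle
            files.sorted()
                 .limit(100)
                 .filter(inFlight::add) // skip files a worker is still handling
                 .forEach(file -> workers.submit(() -> {
                     try {
                         String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                         // ... business logic on 'content' goes here ...
                         Files.deleteIfExists(file); // the thread that processed the file deletes it
                     } catch (IOException e) {
                         // log and leave the file for the next scan
                     } finally {
                         inFlight.remove(file);
                     }
                 }));
        } catch (IOException e) {
            // log the failed scan
        }
    }
}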
Related
I basically want to make a watch service (or something like it) that checks whether a file has been closed and instantly removes that file once it is closed (i.e. the program using it has finished).
How can I achieve this? Please give me cmd commands or some code (I prefer Java).
OK, this should not be hard to do. If you google a bit, you will find a java.io.File method called file.canWrite(), which basically tells you whether the file is locked by another program.
So, code-wise, you could do something like this:
boolean isDeleted = false;
File f = new File("some_path.txt"); // put your file here
while (!isDeleted) {
    if (f.canWrite()) {
        isDeleted = f.delete(); // delete() itself may still fail, so loop on its result
    } else {
        try {
            Thread.sleep(10); // throws an exception you need to catch somewhere...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
You need to include this code in some Java program. I added a simple Thread.sleep(10) so that your PC does not have to check aaaaaalllllllll the time.
See Check if a file is locked in Java
Another possibility would be to try renaming the file with file.renameTo(new File("some_path.txt"));, as this method also returns a boolean telling you whether it was successful! Just note that you then need to point your File reference at the new path before removing it.
The last possibility I see is pretty similar to the second one: you try to delete the file by calling file.delete(). If the file still exists, you know the call was not successful, and you loop on that.
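A minimal sketch of that delete-and-retry loop (the helper name is mine):

import java.io.File;

static void deleteWhenFree(File f) throws InterruptedException {
    // delete() returns false while another program still holds the file
    while (f.exists() && !f.delete()) {
        Thread.sleep(10); // back off instead of spinning at full speed
    }
}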
I assume you mean when the file is not open in another program, and you cannot make changes to that other program? (If you are talking about your own program opening the file, this is much easier.)
On Windows, it is not very easy to tell which program has a file open. Take a look at https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows for some options. I like the handle tool for this, but it has to run as Administrator, which may be a problem. You can try renaming or writing to the file, as suggested at Check if a file is locked in Java
Once you have a check that determines, to your satisfaction, whether the file is open, it should be fairly straightforward to write a script that loops while the file is open and then deletes the file.
I have to process thousands of files, but my program fails after about 20 files with a "No Space Left" exception.
This is my pseudocode:
for (Task t : tasks) {
    File f = t.createTempFile();
    processing(f);
    f.delete();
}
I checked the /tmp folder; the files are not getting deleted. I am tearing my hair out. Can someone give me some suggestions?
PS: the program has permission to create the files, so it should have permission to delete them as well.
It is probably because you still have an input or output stream open on the file and you forgot to close it.
If the JVM itself (in any thread) is still holding an open input or output stream on the file, it won't be deleted.
As was said in the comment above, you should also check the return value of the delete() method, which tells you whether the delete actually happened.
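For illustration, a hedged sketch of both points: Task and createTempFile come from your pseudocode, and process(InputStream) is a hypothetical variant of processing() that reads from a stream instead of the File.

import java.io.*;
import java.nio.file.Files;
import java.util.List;

static void processAll(List<Task> tasks) throws IOException {
    for (Task t : tasks) {
        File f = t.createTempFile();
        try (InputStream in = new FileInputStream(f)) { // closed even if processing throws
            process(in); // hypothetical: read from the stream, don't open further streams on f
        }
        // Files.delete (unlike File.delete) throws a descriptive IOException on failure
        Files.delete(f.toPath());
    }
}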
In my application I watch a directory for new files.
I keep an array of current files obtained with Files.list(dir), process this list one file after another, and then reload the directory with Files.list() again.
While I have never encountered the problem myself, a coworker told me that the predecessor software had an additional check that the file is older than 3 seconds (calculated as (System.currentTimeMillis() - Files.getLastModifiedTime(path).toMillis()) > 3000), because there were issues with incompletely transferred (or better: not yet fully transferred) files going into processing.
Can I assume that the files returned by Files.list() have been copied fully into the directory I am watching?
Is there a cleaner way to check whether a file is complete? The 3-second check is more of a hack: a file several GB in size could be copied over a slow (network) connection and may not be fully transferred even after 3 s have passed.
It is not safe to assume that a file is complete just because it appears in a directory. Another process could still hold a lock on it, either because it is writing to it or for some other reason.
I use the following method to check whether a file is ready (this uses Java 1.8):
public static boolean isFileReady(Path file) {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
         FileLock lock = ch.tryLock()) {
        return lock != null;
    } catch (IOException ex) {
        return false;
    }
}
This will try to open the file for appending (opening it for normal writing would erase all of its content) and acquire a lock. If the lock is established, we are good to go; otherwise we are not.
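For illustration, a polling loop around isFileReady() might look like this; the process() call is a hypothetical stand-in for whatever handles the file:

import java.io.IOException;
import java.nio.file.Path;

static void waitAndProcess(Path incoming) throws InterruptedException, IOException {
    while (!isFileReady(incoming)) {
        Thread.sleep(500); // back off while the writer still holds its lock
    }
    process(incoming); // hypothetical processing step
}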
For example, I want to save a large file (3 GB+) from the web. Code sample:
try {
Files.copy(inputstream, destFilePath);
} catch (IOException ex) {
Files.deleteIfExists(destFilePath);
} finally {
IOUtils.closeQuietly(inputstream);
}
According to the JavaDoc for deleteIfExists:
On some operating systems it may not be possible to remove a file when it is open and in use by this Java virtual machine or other programs.
Is it safe to delete the file this way? Files.copy releases the output stream even when an error occurs, but does that guarantee that the JVM has released its lock on the file?
The file should not be in use in your case. Take into consideration that Files.copy does not ask you for an OutputStream or File, just for a Path. It would be weird if it could leave a file descriptor open upon exit, whether an exception occurred or not; the file API would be broken in that case, IMO. In any case, the javadoc would inform you of that possibility.
In any case, if you want to minimise the chances of the file not being deleted, you can also add a file.deleteOnExit(); then, when the JVM terminates, it makes another attempt to delete the file (unless the JVM terminates abnormally).
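For illustration, a hedged variant of the question's snippet with that fallback added (reusing inputstream, destFilePath, and Commons IO's IOUtils from the question):

try {
    Files.copy(inputstream, destFilePath);
} catch (IOException ex) {
    try {
        Files.deleteIfExists(destFilePath);
    } catch (IOException deleteFailed) {
        // fallback: ask the JVM to retry the delete on normal termination
        destFilePath.toFile().deleteOnExit();
    }
} finally {
    IOUtils.closeQuietly(inputstream);
}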
If you have ever used P2P downloading software, you know it can download a file with multiple threads while creating only one file. So I wonder how the threads write data into that file: sequentially or in parallel?
Imagine that you want to dump a big database table to a file. How can you make this job faster?
You can use multiple threads to write to a file, e.g. a log file, but you have to coordinate your threads, as @Thilo points out. Either you need to synchronize file access and write only whole records/lines, or you need a strategy for allocating regions of the file to different threads, e.g. rebuilding a file with known offsets and sizes.
This is rarely done for performance reasons, as most disk subsystems perform best when written to sequentially and disk I/O is the bottleneck. If the CPU cost of creating the record or line of text (or the network I/O) is the bottleneck, it can help.
Imagine that you want to dump a big database table to a file, and how to make this job faster?
Writing it sequentially is likely to be the fastest.
The Java NIO package was designed to allow this. Take a look, for example, at http://docs.oracle.com/javase/1.5.0/docs/api/java/nio/channels/FileChannel.html.
You can map several regions of one file to different buffers, and each buffer can be filled separately by a separate thread.
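For illustration, a minimal sketch of that mapped-regions idea; the output file name, region size, and payload are placeholders:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class RegionWriter {
    public static void main(String[] args) throws Exception {
        Path out = Paths.get("out.bin"); // placeholder file
        long regionSize = 1024 * 1024;
        int regions = 4;
        try (FileChannel ch = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            Thread[] threads = new Thread[regions];
            for (int i = 0; i < regions; i++) {
                // each thread gets its own buffer over a distinct region of the file
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE,
                        i * regionSize, regionSize);
                threads[i] = new Thread(() -> fill(buf));
                threads[i].start();
            }
            for (Thread t : threads) t.join();
        }
    }

    private static void fill(MappedByteBuffer buf) {
        while (buf.hasRemaining()) buf.put((byte) 'x'); // placeholder payload
    }
}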
The synchronized keyword makes this possible: it lets many threads share one file, writing one at a time. Try the code below, which I use in a similar context.
package hrblib;

import java.io.*;

public class FileOp {

    static public String getContents(String sFileName) {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader oReader = new BufferedReader(new FileReader(sFileName))) {
            StringBuilder sContent = new StringBuilder();
            String sLine;
            while ((sLine = oReader.readLine()) != null) {
                if (sContent.length() > 0) {
                    sContent.append("\r\n");
                }
                sContent.append(sLine);
            }
            return sContent.toString();
        } catch (IOException oException) {
            throw new IllegalArgumentException("Invalid file path/File cannot be read: \n" + sFileName);
        }
    }

    static public void setContents(String sFileName, String sContent) {
        // FileWriter creates the file if it does not exist
        try (BufferedWriter oWriter = new BufferedWriter(new FileWriter(sFileName))) {
            oWriter.write(sContent);
        } catch (IOException oException) {
            throw new IllegalArgumentException("Invalid folder path/File cannot be written: \n" + sFileName);
        }
    }

    // synchronized: only one thread at a time can append, so writes are not interleaved
    public static synchronized void appendContents(String sFileName, String sContent) {
        try (BufferedWriter oWriter = new BufferedWriter(new FileWriter(sFileName, true))) {
            oWriter.write(sContent);
        } catch (IOException oException) {
            throw new IllegalArgumentException("Error appending/File cannot be written: \n" + sFileName);
        }
    }
}
You can have multiple threads write to the same file, but only one at a time. All threads need to enter a synchronized block before writing to the file.
In the P2P example, one way to implement it is to find the size of the file and create an empty file of that size. Each thread downloads a different section of the file; when a thread needs to write, it enters a synchronized block, moves the file pointer using seek, and writes the contents of its buffer.
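A minimal sketch of that approach, using RandomAccessFile (the class name and method names are mine):

import java.io.IOException;
import java.io.RandomAccessFile;

public class SharedFileWriter implements AutoCloseable {
    private final RandomAccessFile file;

    public SharedFileWriter(String path, long totalSize) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(totalSize); // pre-create the empty file at its full size
    }

    // Each downloader thread calls this with the offset of its own section;
    // synchronized ensures the seek and write happen atomically per chunk.
    public synchronized void writeChunk(long offset, byte[] data) throws IOException {
        file.seek(offset);
        file.write(data);
    }

    @Override
    public synchronized void close() throws IOException {
        file.close();
    }
}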
What kind of file is this? Why do you need to feed it from more than one thread? It depends on the usage pattern of the file.
Transferring a file from several places over the network (in short: torrent-like)
If you are transferring an existing file, the program should:
as soon as it knows the size of the file, create it with empty content; this prevents a later out-of-disk error (if there is not enough space, that turns out at creation time, before anything is downloaded), and it also helps performance;
if you organize the transfer well (and why not), each thread will be responsible for a distinct portion of the file, so the file writes will be distinct;
even if two threads somehow pick the same portion of the file, it causes no error, because they write the same data to the same file positions.
Appending data blocks to a file (in short: logging)
If the threads just append fixed- or variable-length records to a file, you should use a single common writer thread. It should use a relatively large write buffer, so it can serve client threads quickly (just taking their strings) and flush the buffer out with optimal scheduling and block sizes. It should use a dedicated disk or even a dedicated machine.
Also, there can be several performance issues around this; that's why logging servers exist, even expensive commercial ones.
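For illustration, a minimal sketch of such a common writer thread backed by a queue; the class name and the poison-pill shutdown are my own choices:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.*;

public class AsyncLogger implements AutoCloseable {
    private static final String POISON = "\u0000stop"; // sentinel telling the writer to quit

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    public AsyncLogger(Path logFile) throws IOException {
        BufferedWriter out = Files.newBufferedWriter(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        writer = new Thread(() -> {
            try (BufferedWriter w = out) {
                String line;
                while (!(line = queue.take()).equals(POISON)) {
                    w.write(line);
                    w.newLine(); // BufferedWriter batches small writes into larger disk writes
                }
            } catch (IOException | InterruptedException e) {
                // give up; real code would report this
            }
        });
        writer.start();
    }

    // Client threads only enqueue; they never block on disk I/O.
    public void log(String line) {
        queue.add(line);
    }

    @Override
    public void close() throws InterruptedException {
        queue.add(POISON);
        writer.join();
    }
}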
Reading and writing at random times, at random positions (in short: a database)
This requires a complex design, with mutexes etc.; I have never done this kind of thing, but I can imagine it. Ask Oracle for some tricks :)