I'm doing some file I/O with multiple files (writing to 19 files, it so happens). After writing to them a few hundred times I get the Java IOException: Too many open files. But I actually have only a few files opened at once. What is the problem here? I can verify that the writes were successful.
On Linux and other UNIX / UNIX-like platforms, the OS places a limit on the number of open file descriptors that a process may have at any given time. In the old days, this limit used to be hardwired¹, and relatively small. These days it is much larger (hundreds / thousands), and subject to a "soft" per-process configurable resource limit. (Look up the ulimit shell builtin ...)
Your Java application must be exceeding the per-process file descriptor limit.
You say that you have 19 files open, and that after a few hundred times you get an IOException saying "too many files open". Now this particular exception can ONLY happen when a new file descriptor is requested; i.e. when you are opening a file (or a pipe or a socket). You can verify this by printing the stacktrace for the IOException.
Unless your application is being run with a small resource limit (which seems unlikely), it follows that it must be repeatedly opening files / sockets / pipes, and failing to close them. Find out why that is happening and you should be able to figure out what to do about it.
FYI, the following pattern is a safe way to write to files that is guaranteed not to leak file descriptors.
Writer w = new FileWriter(...);
try {
    // write stuff to the file
} finally {
    try {
        w.close();
    } catch (IOException ex) {
        // Log error writing file and bail out.
    }
}
¹ - Hardwired, as in compiled into the kernel. Changing the number of available fd slots required a recompilation ... and could result in less memory being available for other things. In the days when Unix commonly ran on 16-bit machines, these things really mattered.
UPDATE
The Java 7 way is more concise:
try (Writer w = new FileWriter(...)) {
    // write stuff to the file
} // the `w` resource is automatically closed
UPDATE 2
Apparently you can also encounter a "too many files open" error while attempting to run an external program. The basic cause is as described above. However, the reason that you encounter this in exec(...) is that the JVM is attempting to create "pipe" file descriptors that will be connected to the external application's standard input / output / error.
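If you do launch external programs, the remedy is the same as for files: make sure the pipe descriptors get closed. Here is a minimal sketch (the command name is a placeholder) that closes the child's stdin immediately and drains / closes its combined stdout and stderr:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // "some-command" is a placeholder for whatever external program you run
        Process p = new ProcessBuilder("some-command").redirectErrorStream(true).start();

        // We don't write to the child's stdin, so close it straight away.
        p.getOutputStream().close();

        // Read (and thereby drain) the child's combined stdout/stderr, then close it.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }

        p.waitFor();  // reap the child so its remaining descriptors can be released
    }
}

If you launch processes in a loop without doing this, the pipe descriptors accumulate just like leaked files.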
For UNIX:
As Stephen C has suggested, raising the maximum number of file descriptors avoids this problem.
Try looking at your present file descriptor capacity:
$ ulimit -n
Then change the limit according to your requirements.
$ ulimit -n <value>
Note that this just changes the limits in the current shell and any child / descendant process. To make the change "stick" you need to put it into the relevant shell script or initialization file.
You're obviously not closing your file descriptors before opening new ones. Are you on Windows or Linux?
Although in most cases the error quite clearly means that file handles have not been closed, I just encountered an instance with JDK 7 on Linux that is convoluted enough to be worth explaining here.
The program opened a FileOutputStream (fos), a BufferedOutputStream (bos) and a DataOutputStream (dos). After writing to the DataOutputStream, the dos was closed and I thought everything had gone fine.
Internally, however, the dos tried to flush the bos, which returned a "disk full" error. That exception was swallowed by the DataOutputStream, and as a consequence the underlying bos was not closed, so the fos was still open.
At a later stage the file was then renamed from its .tmp name to its real name. Thereby the Java file descriptor tracking lost track of the original .tmp file, yet it was still open!
To solve this, I had to flush the DataOutputStream myself first, catch the IOException, and close the FileOutputStream myself.
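For what it's worth, a rough sketch of the pattern that ended up working (the file name and written data are illustrative): flush explicitly so a "disk full" IOException surfaces instead of being swallowed, and close the bottom-level FileOutputStream in a finally block.

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class FlushThenClose {
    public static void main(String[] args) throws IOException {
        // "data.tmp" is just an illustrative file name
        FileOutputStream fos = new FileOutputStream("data.tmp");
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(fos));
        try {
            dos.writeInt(42);   // ... write stuff ...
            dos.flush();        // a "disk full" IOException surfaces here instead of being swallowed
        } finally {
            fos.close();        // releases the file descriptor no matter what happened above
        }
    }
}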
I hope this helps someone.
If you're seeing this in automated tests: it's best to properly close all files between test runs.
If you're not sure which file(s) you have left open, a good place to start is the "open" calls which are throwing exceptions! 😄
If you have a file handle that should be open exactly as long as its parent object is alive, you could add a finalize method on the parent that closes the file handle, and call System.gc() between tests.
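A minimal sketch of that idea (the class and field names are made up for illustration; note that finalize() is deprecated in recent JDKs, so treat this as a test-only stop-gap):

import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical parent object that owns a file handle for its whole lifetime.
public class RecordSource {
    private final FileInputStream in;

    public RecordSource(String path) throws IOException {
        this.in = new FileInputStream(path);
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            in.close();   // release the descriptor when the parent is collected
        } finally {
            super.finalize();
        }
    }
}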
Recently I had a program batch-processing files. I certainly closed each file in the loop, but the error was still there.
Later, I resolved the problem by garbage collecting eagerly every hundred files or so:
int index = 0;
while (hasMoreFiles()) {                  // hasMoreFiles() stands in for the real loop condition
    OutputStream out = openNextFile();    // openNextFile() stands in for the real open call
    try {
        // write to the outputStream ...
    } finally {
        out.close();
    }
    if (index++ % 100 == 0) {
        System.gc();
    }
}
Related
I know this may get downvoted, but this is bothering me a lot.
I have already read all the posts on the .close() method, like:
explain the close() method in Java in Layman's terms
Why do I need to call a close() or shutdown() method?
the usage of close() method(Java Beginner)
I have these questions, which may seem trivial.
1. What does the word 'resource' exactly mean? (Is it the file, the FileWriter object, or some other thing? Try to explain as broadly as possible.)
Let's consider the following code:
import java.io.*;

public class characterstreams
{
    public static void main(String[] args) throws Exception
    {
        File f = new File("thischaracter.txt");
        FileWriter fw = new FileWriter(f);
        char[] ch = {'a', 'c', 'd'};
        fw.write('a');
        fw.write(ch);
        fw.write("aaaa aaaaa aaaaaaa");
        fw.flush();

        FileReader fr = new FileReader(f);
        int r = fr.read();
        System.out.println(r);
        char[] gh = new char[30];
        System.out.println(fr.read(gh));
    }
}
After compiling and executing it:
G:/>java characterstreams
Let's say the resource below is the FileWriter (since I have yet to grasp the meaning of 'resource').
The JVM starts and opens up the so-called resources; then execution completes, after which the JVM shuts down.
2. It releases the resources it has opened since it's no longer running, right? (Correct me if I am wrong.)
G:/>
At this point the JVM is not running.
3. Before shutting down, the garbage collector is called, right? (Correct me if I am wrong.) So the FileWriter object gets destroyed.
Then why are we supposed to close all the resources that we have opened up?
And
4. Again, I read that 'resources get leaked'. What is this supposed to mean?
A resource means anything that is needed by the JVM and/or the operating system to provide you with the functionality you request.
Take your example. If you open a FileWriter, the operating system will in general (depending on the operating system, file system, etc.) do the following (assuming you want to write a file to a disk, like an HDD/SSD):
create a directory entry for the requested filename
create a data structure to maintain the writing process to the file
allocate disc space if you actually write data to the file
(note: this is not an exhaustive list)
These steps happen for any file you open for writing. If you don't close the resource, all of this remains in memory and is still maintained by the operating system.
Assume your application runs for a long time and is constantly opening files. The number of files the operating system allows you to keep open is limited (the concrete number depends on the operating system, quota settings, ...). If the resources are exhausted, something will behave unexpectedly or fail.
Find below a small demonstration on Linux.
public static void main(String[] args) throws IOException {
List<OutputStream> files = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
files.add(Files.newOutputStream(Paths.get("/tmp/demo." + i),
StandardOpenOption.CREATE));
}
}
The code opens one thousand files for writing.
Assume your limit of open files is 1024:
ulimit -n
1024
Run the snippet and it will generate 1000 files, /tmp/demo.*.
If your limit of open files is only 100, the code will fail:
ulimit -n 100
java.nio.file.FileSystemException: /tmp/demo.94: Too many open files
(it fails a bit earlier because the JVM itself already has some files open)
To prevent such problems (lack of resources) you should close files which you no longer need to write to. If you don't do it in Java (close()), the operating system doesn't know whether the memory etc. can be freed and used for another request.
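For contrast, here is a minimal sketch of the same loop written with try-with-resources, so each stream is closed before the next one is opened and the descriptor count never grows (the written byte is just illustrative):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class CloseEachFile {
    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 1000; i++) {
            // try-with-resources closes the stream at the end of each iteration
            try (OutputStream out = Files.newOutputStream(Paths.get("/tmp/demo." + i),
                    StandardOpenOption.CREATE)) {
                out.write(i);   // write something trivial to the file
            }
        }
    }
}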
I am creating temporary files in my application using java.io.File.createTempFile().
While creating the file, I call deleteOnExit() for that File object.
This code is used in many scenarios in my application. Sometimes the temp files are too large, so I have to delete them immediately after my job is completed. So I am calling File.delete() for some objects.
Now the problem is, when I delete the file using the delete() method, a reference is kept open for this deleted file (because of it being a temp file, in my opinion). Because of this, I am facing a memory leakage issue.
(Correct me if I am wrong on my above hypothesis)
I am seeing high disk utilization in my environment: I found a discrepancy of over 30 GB between the output of the 'df' and 'du' commands ('df' looks at the stats of the FS itself, whereas 'du' ignores deleted files whose descriptors are still open).
If I remove deleteOnExit(), I will have to take care of deleting all the files manually. Even doing this, the descriptors remain open (I used lsof +al1 on Linux to see open files). Why is this happening?
If I remove delete(), then I will have to wait until the VM stops for the temp files to be deleted (which is a very rare event on the production server). (Huge space utilization.)
Is there any way to remove a file from the deleteOnExit() list if I am deleting the file manually?
I suspect that your analysis is correct, and that could be seen as a bug in Java: once you call delete, it would be fair to expect the reference created by deleteOnExit to be removed.
However, we are at least warned (sort of). The Javadoc for deleteOnExit says:
Once deletion has been requested, it is not possible to cancel the request. This method should therefore be used with care.
So I guess calling delete after deleteOnExit would be considered careless.
However, it seems to me that your question implies its own solution. You say:
If I remove delete(), then I will have to wait until the VM stops for the temp files to be deleted (which is a very rare event on the production server).
If the JVM is very rarely ended, then deleteOnExit will very rarely do you any good. Which suggests that the solution is to handle your own deletions, by having your application call delete when it is finished with a file, and not use deleteOnExit at all.
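A minimal sketch of that approach (the processing step and the file-name prefix are placeholders): create the temp file without calling deleteOnExit(), and delete it yourself in a finally block as soon as you are done with it.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempFileDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("myapp", ".tmp");   // no deleteOnExit()
        try {
            process(tmp);                                  // placeholder for the real work
        } finally {
            // Files.delete throws an informative IOException if the deletion fails
            Files.delete(tmp.toPath());
        }
    }

    private static void process(File f) {
        // ... read from / write to the temp file ...
    }
}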
The pointer will remain open until the application releases the resource for that file. Try
fileVar = null;
after your
fileVar.delete();
I see this problem as well. I've temporarily "fixed" it by executing the following, prior to calling delete:
FileChannel outChan = new FileOutputStream(tmpfile, true).getChannel();
outChan.truncate(newSize);
outChan.close();
This at least makes it so that the tmp files don't consume disk space, and df and du report the same stats. It does still leak file descriptors, and I presume it leaks a small amount of heap.
It's noteworthy that File.delete() returns a boolean to indicate if the delete succeeded. It's possible that it's silently failing for you, and you actually have an unclosed stream to the file, which prevents the delete. You may want to try using the following call, which will throw an IOException with diagnostics if it is unable to delete the file.
java.nio.file.Files.delete(tmpfile.toPath())
If that still doesn't isolate the problem for you, I've had some luck using file-leak-detector, which keeps track of the time files are accessed via streams and grabs a stack trace at the time the stream is created. If a stream doesn't get closed, the stack trace can point you to the origin of that stream. Unfortunately, it doesn't cover all forms of file access, such as nio.
Every 5 seconds (for example), a server checks if files have been added to a specific directory. If yes, it reads and processes them. The files concerned can be quite big (100+ MB for example), so copying/uploading them to said directory can take quite a while.
What if the server tries to access a file that hasn't finished being copied/uploaded? How does Java manage these concurrent accesses? Does it depend on the OS of the server?
I made an attempt, copying a ~1,300,000-line TXT file (i.e. about 200 MB) from a remote server to my local computer: it takes about 5 seconds. During this interval, I run the following Java class:
public static void main(String[] args) throws Exception {
String local = "C:\\large.txt";
BufferedReader reader = new BufferedReader(new FileReader(local));
int lines = 0;
while (reader.readLine() != null)
lines++;
reader.close();
System.out.println(lines + " lines");
}
I get the following exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.BufferedReader.readLine(BufferedReader.java:345)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at main.Main.main(Main.java:15)
When running the class once the file has finished being copied, I get the expected output (i.e. 1229761 lines), so the exception isn't due to the size of the file (as we might think at first). What is Java doing in the background that threw this OutOfMemoryError?
How does Java manage these concurrent accesses? Does it depend on the OS of the server?
It depends on the specific OS. If you run the copy and the server in a single JVM, the AsynchronousFileChannel class (new in 1.7) could be of great help. However, if the client and the server are represented by different JVMs (or, even more so, are started on different machines), it all turns out to be platform specific.
From JavaDoc for AsynchronousFileChannel:
As with FileChannel, the view of a file provided by an instance of this class is guaranteed to be consistent with other views of the same file provided by other instances in the same program. The view provided by an instance of this class may or may not, however, be consistent with the views seen by other concurrently-running programs due to caching performed by the underlying operating system and delays induced by network-filesystem protocols. This is true regardless of the language in which these other programs are written, and whether they are running on the same machine or on some other machine. The exact nature of any such inconsistencies are system-dependent and are therefore unspecified.
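For reference, a minimal sketch of reading through an AsynchronousFileChannel (re-using the path from the question); it only illustrates the API and does not by itself solve the "file is still being copied" problem:

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncReadDemo {
    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                Paths.get("C:\\large.txt"), StandardOpenOption.READ)) {   // path from the question
            ByteBuffer buf = ByteBuffer.allocate(8192);
            Future<Integer> result = ch.read(buf, 0);   // read starting at position 0
            int bytesRead = result.get();               // block until this read completes
            System.out.println("Read " + bytesRead + " bytes");
        }
    }
}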
Why are you using a buffered reader just to count the lines?
From the javadoc:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.
This means it will "buffer", i.e. save, that entire file in memory, which causes your stack dump. Try a FileReader.
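For example, a minimal sketch that counts lines with a plain FileReader, reading one character at a time and counting newline characters, so no whole line is ever held in memory (path re-used from the question):

import java.io.FileReader;
import java.io.IOException;

public class LineCounter {
    public static void main(String[] args) throws IOException {
        int lines = 0;
        try (FileReader reader = new FileReader("C:\\large.txt")) {   // path from the question
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    lines++;   // count newline characters instead of building String lines
                }
            }
        }
        System.out.println(lines + " lines");
    }
}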
Last summer, I made a Java application that would parse some PDF files and get the information they contain to store them in a SQLite database.
Everything was fine and I kept adding new files to the database every week or so without any problems.
Now, I'm trying to improve my application's speed and I wanted to see how it would fare if I parsed all the files I have from the last two years in a new database. That's when I started getting this error: OutOfMemoryError: Java Heap Space. I didn't get it before because I was only parsing about 25 new files per week, but it seems like parsing 1000+ files one after the other is a lot more demanding.
I partially solved the problem: I made sure to close my connection after every call to the database and the error went away, but at a huge cost. Parsing the files is now unbearably slow. As for my ResultSets and Statements / PreparedStatements, I'm already closing them after every call.
I guess there's something I don't understand about when I should close my connection and when I should keep re-using the same one. I thought that since auto-commit is on, it commits after every transaction (select, update, insert, etc.) and the connection releases the extra memory it was using. I'm probably wrong since when I parse too many files, I end up getting the error I'm mentioning.
An easy solution would be to close it after every x calls, but then again I won't understand why and I'm probably going to get the same error later on. Can anyone explain when I should be closing my connections (if at all except when I'm done)? If I'm only supposed to do it when I'm done, then can someone explain how I'm supposed to avoid this error?
By the way, I didn't tag this as SQLite because I got the same error when I tried running my program on my online MySQL database.
Edit
As it has been pointed out by Deco and Mavrav, maybe the problem isn't my Connection. Maybe it's the files, so I'm going to post the code I use to call the function to parse the files one by one:
public static void visitAllDirsAndFiles(File dir){
    if (dir.isDirectory()){
        String[] children = dir.list();
        for (int i = 0; i < children.length; i++){
            visitAllDirsAndFiles(new File(dir, children[i]));
        }
    }
    else {
        try {
            // System.out.println("File: " + dir);
            BowlingFilesReader.readFile(dir, playersDatabase);
        }
        catch (Exception exc){
            System.out.println("Other exception in file: " + dir);
        }
    }
}
So if I call the method using a directory, it recursively calls the function again using the File object I just created. My method then detects that it's a file and calls BowlingFilesReader.readFile(dir, playersDatabase);
The memory should be released when the method is done I think?
Your first instinct about open result sets and connections was good, though that is maybe not entirely the cause. Let's start with your database connection first.
Database
Try using a database connection pooling library, such as the Apache Commons DBCP (BasicDataSource is a good place to start): http://commons.apache.org/dbcp/
You will still need to close your database objects, but this will keep things running smoothly on the database front.
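A minimal sketch of wiring up BasicDataSource, assuming Commons DBCP 2 on the classpath (method names differ slightly in 1.x) and an illustrative SQLite JDBC URL:

import java.sql.Connection;
import java.sql.SQLException;
import org.apache.commons.dbcp2.BasicDataSource;   // assuming Commons DBCP 2

public class PoolDemo {
    public static void main(String[] args) throws SQLException {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:sqlite:bowling.db");   // illustrative JDBC URL
        ds.setMaxTotal(8);                     // cap on pooled connections

        // Borrow a connection per unit of work; close() returns it to the pool.
        try (Connection con = ds.getConnection()) {
            // ... create statements, run queries, close them ...
        }
    }
}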
JVM Memory
Increase the size of the memory you give to the JVM. You may do so by adding -Xmx and a memory amount after, such as:
-Xmx64m <- this would give the JVM 64 megs of memory to play with
-Xmx512m <- 512 megs
Be careful with your numbers, though, throwing more memory at the JVM will not fix memory leaks. You may use something like JConsole or JVisualVM (included in your JDK's bin/ folder) to observe how much memory you are using.
Threading
You may increase the speed of your operations by threading them out, assuming the operation you are performing to parse these records is threadable. But more information might be necessary to answer that question.
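If the parsing does turn out to be threadable, here is a rough sketch using an ExecutorService; it assumes that BowlingFilesReader.readFile and the database access are safe to call from multiple threads, which you would need to verify:

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParse {

    public static void parseAll(File[] files) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // pool size is a guess; tune it
        for (File f : files) {
            pool.submit(() -> parseOneFile(f));                   // one task per file
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);                 // wait for all tasks to finish
    }

    private static void parseOneFile(File f) {
        // Stand-in for the question's BowlingFilesReader.readFile(f, playersDatabase) call.
        try {
            // ... parse the file and write to the database ...
        } catch (Exception exc) {
            System.out.println("Other exception in file: " + f);
        }
    }
}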
Hope this helps.
As it happens with garbage collection, I don't think the memory would be immediately reclaimed for the subsequent processes and threads, so we can't entirely put our eggs in that basket. To begin with, put all the files in one directory and not in child directories of the parent. Then load the files one by one by iterating like this:
File f = null;
for (int i = 0; i < children.length; i++){
f = new File(dir, children[i]);
BowlingFilesReader.readFile(f, playersDatabase);
f = null;
}
So we are invalidating the reference so that the File object is released and will be picked up in a subsequent GC. And to find the limits, test by increasing the number of files: start with 100, then 200, ... and we will know at what point the OutOfMemoryError is thrown.
Hope this helps.