Error with NIO while trying to copy large file - java

I have the following code to copy a file to another location:
public static void copyFile(String sourceDest, String newDest) throws IOException {
    File sourceFile = new File(sourceDest);
    File destFile = new File(newDest);
    if (!destFile.exists()) {
        destFile.createNewFile();
    }
    FileChannel source = null;
    FileChannel destination = null;
    try {
        source = new FileInputStream(sourceFile).getChannel();
        destination = new FileOutputStream(destFile).getChannel();
        destination.transferFrom(source, 0, source.size());
    } finally {
        if (source != null) {
            source.close();
        }
        if (destination != null) {
            destination.close();
        }
    }
}
Copying smaller files, say 300-400 MB, works like magic. But when I tried to copy a 1.5 GB file, it failed. The stack trace is:
run:
12.01.2011 11:16:36 FileCopier main
SEVERE: Exception occured while copying file. Try again.
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(FileChannelImpl.java:527)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:590)
at FileCopier.copyFile(FileCopier.java:64)
at FileCopier.main(FileCopier.java:27)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 4 more
BUILD SUCCESSFUL (total time: 0 seconds)
I haven't worked with NIO closely. Could you please help me out? Thank you so much in advance.

I think you might have been hit by an old bug which I encountered some time ago. I was not trying to copy a file but rather to seek through a memory-mapped file, which failed as well. For me the workaround was to seek through the file in a loop and request the GC and finalizers to run every now and then.
The memory-mapped ByteBuffers release their mapping in the finalizer and make room for new mappings. This is very ugly, but at least it works. Let's hope they do something about this in the coming NIO iteration.
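For what it's worth, a rough sketch of that workaround (illustrative only; System.gc() and System.runFinalization() are hints, not guarantees):
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Workaround sketch: map and process the file in bounded chunks, drop each
// mapping's reference, then nudge the JVM so the finalizer can unmap it
// before the next map call.
static void processInMappedChunks(FileChannel channel, long chunkSize) throws IOException {
    long size = channel.size();
    for (long pos = 0; pos < size; pos += chunkSize) {
        long len = Math.min(chunkSize, size - pos);
        MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
        // ... read from 'mapped' here ...
        mapped = null;             // drop the only reference to the mapping
        System.gc();               // hint: collect unreachable objects
        System.runFinalization();  // hint: run finalizers that release the mapping
    }
}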

You are memory-mapping the file, but a 32-bit JVM (which I presume you are using) has a limited address space, so the map method fails. I don't think you can map more than about 1.3-1.4 GB of disk data. What heap size are you using?
You can try reducing your heap size or use a 64-bit JRE. Alternatively, don't read the file by mapping it into memory with NIO; instead, copy the data the traditional way, using buffered input and output streams to read from one file and write to the other.
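For illustration, a minimal sketch of the plain buffered-stream copy suggested above (the 8 KB buffer size is an arbitrary choice); it moves fixed-size chunks, so nothing close to the whole file is ever held in memory or mapped:
import java.io.*;

public static void copyWithStreams(String sourcePath, String destPath) throws IOException {
    InputStream in = new BufferedInputStream(new FileInputStream(sourcePath));
    OutputStream out = new BufferedOutputStream(new FileOutputStream(destPath));
    try {
        byte[] buffer = new byte[8 * 1024];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);   // write only the bytes actually read
        }
    } finally {
        in.close();
        out.close();
    }
}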

Related

Unzipping into a ByteArrayOutputStream -- why am I getting an EOFException?

I have been trying to create a Java program that reads zip files from an online API, unzips them into memory (not onto the file system), and loads them into a database. Since the unzipped files need to be loaded into the database in a specific order, I have to unzip all of the files before I load any of them.
I basically used another question on Stack Overflow as a model for how to do this. Using ZipInputStream from java.util.zip I was able to do this with a smaller zip (0.7 MB zipped, ~4 MB unzipped), but when I tried a larger file (25 MB zipped, 135 MB unzipped), the two largest files were not read into memory. I was not even able to retrieve a ZipEntry for them (8 MB and 120 MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access the unzipped files that had failed to be written, at which point it threw a NullPointerException.
I am using Jsoup to get the zip file from online.
Has anyone had any experience with this and can give guidance on why I am unable to retrieve the complete contents of the zip file?
Below is the code I am using. I collect the unzipped files as InputStreams in a HashMap, and the loop should stop looking for entries once getNextEntry() returns null.
private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {
    Map<String, InputStream> result = new HashMap<>();
    while (true) {
        ZipEntry entry;
        byte[] b = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int l;
        entry = verZip.getNextEntry(); // might throw IOException
        if (entry == null) {
            break;
        }
        try {
            while ((l = verZip.read(b)) > 0) {
                out.write(b, 0, l);
            }
            out.flush();
        } catch (EOFException e) {
            e.printStackTrace();
        } catch (IOException i) {
            System.out.println("there was an ioexception");
            i.printStackTrace();
            fail();
        }
        result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
    }
    return result;
}
Might I be better off if my program took advantage of the filesystem to unzip files?
It turns out that Jsoup is the root of the issue. When obtaining binary data over a Jsoup connection, there is a limit to how many bytes will be read. By default, this limit is 1048576 bytes, or 1 megabyte. As a result, when I fed the binary data from Jsoup into a ZipInputStream, the resulting data was cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
InputStream oneMb = c.execute().bodyStream();
ZipInputStream oneMbZip = new ZipInputStream(oneMb);
Trying to unzip the truncated oneMbZip is what led to the EOFException.
With the code below, I was able to change the Connection's byte limit to 1 GB (1073741824) and then retrieve the zip file without running into an EOFException.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
Connection.Request theRequest = c.request();
theRequest.maxBodySize(1073741824);
c.request(theRequest);//Now this connection will retrieve as much as 1GB of data
InputStream oneGb = c.execute().bodyStream();
ZipInputStream oneGbZip = new ZipInputStream(oneGb);
Note that maxBodySizeBytes is an int, so its upper limit is 2,147,483,647 bytes, or just under 2 GB.
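As a side note, the same fix can be written more concisely by setting the limit directly on the Connection; this sketch assumes your Jsoup version's Connection.maxBodySize(int), where 0 is documented to mean no limit:
InputStream body = Jsoup.connect("example.com/download")
        .ignoreContentType(true)
        .maxBodySize(0)   // 0 = no limit; or pass an explicit cap such as 1073741824
        .execute()
        .bodyStream();
ZipInputStream zip = new ZipInputStream(body);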

FileChannel works even after removing backing file

I noticed something odd: an open FileChannel keeps working even after the backing file is deleted while the channel is in use. I created a 15 GB test file, and the following program reads about 100 MB of the file content sequentially per second.
Path path = Paths.get("/home/elbek/tmp/file.txt");
FileChannel fileChannel = FileChannel.open(path, StandardOpenOption.READ);
ByteBuffer byteBuffer = ByteBuffer.allocate(1024 * 1024);
while (true) {
    int read = fileChannel.read(byteBuffer);
    if (read < 0) {
        break;
    }
    Thread.sleep(10);
    byteBuffer.clear();
    System.out.println(fileChannel.position());
}
fileChannel.close();
After the program has run for ~5 seconds (having read about 0.5 GB), I delete the file from the file system and expect an error to be thrown after a few more reads, but the program carries on and reads the file to the end. I initially thought the reads might be served from the file cache, which is why I made the file so large; at 15 GB, I think it is big enough not to fit in the cache.
Anyway, how is the OS serving read requests while the file itself is no longer there? The OS I am testing this on is Fedora.
Thanks.

Java SSH Recursive Download causes memory leaks

I am using JSch to provide a utility that backs up an entire server's data for my company.
The application is developed using Java 8 & JavaFX 2.
My problem is that I believe my recursive download is at fault, because my program's RAM usage grows by the second and never seems to be freed.
This is the order of the operations I perform:
Connecting to the remote server: OK;
Opening the SFTP channel -> session.openChannel("sftp"): OK
Changing to the main remote directory -> sftpChannel.cd(MAIN_DIRECTORY): OK
Listing the directory content -> final Vector<ChannelSftp.LsEntry> entries = sftpChannel.ls(".");
Calling a recursive method:
if (entry.getAttrs().isDir()) -> call the recursive method again
else -> it's a file, there are no more sub-folders to go into;
Process the download
Now, here is where I think the memory leak occurs, in the download part:
Starting the download and retrieving the InputStream:
final InputStream is = sftpChannel.get(remoteFilePath, new SftpProgressMonitor());
Where SftpProgressMonitor() is an interface for progress monitoring which I use to update the UI (progress bar). Just to make that clear: this implementation never references the InputStream internally. But it is still a non-static anonymous class, so it does hold a reference to the enclosing download method's scope.
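For what it's worth, a monitor written as a named static class (a sketch with illustrative names, not the actual application code) would hold no reference to the download method's scope at all:
// import com.jcraft.jsch.SftpProgressMonitor;
// Illustrative sketch: a static nested monitor that captures no enclosing state.
static final class UiProgressMonitor implements SftpProgressMonitor {
    private long max;
    private long transferred;

    @Override
    public void init(int op, String src, String dest, long max) {
        this.max = max;
        this.transferred = 0;
    }

    @Override
    public boolean count(long count) {
        transferred += count;
        // update the progress bar here, e.g. via Platform.runLater(...) in JavaFX
        return true;   // returning false would cancel the transfer
    }

    @Override
    public void end() {
        // transfer finished
    }
}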
While the file is downloading, I create the file to save to and open an OutputStream to write the downloaded content into it:
final BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileToSave));
This is where I write to file as the remote file gets downloaded :
Code:
int readCount;
final byte[] buffer = new byte[8 * 1024];
while ((readCount = is.read(buffer)) > 0) {
    bos.write(buffer, 0, readCount);
    bos.flush();
}
And of course, once this is completed, I don't forget to close both streams:
is.close(); //the inputstream from sftChannel.get()
bos.close(); //the FileOutputStream
So, as you can see, I process these operations recursively (see the sketch after this list):
List the current directory content;
Check the first entry:
if it's a directory, go inside and repeat from step 1;
if it's a file, download it;
Check the second entry;
etc.
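Here is a minimal sketch of that recursion (illustrative names, assuming JSch's ChannelSftp API; not the actual application code):
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.SftpException;
import java.io.*;
import java.util.Vector;

// Walk the remote tree depth-first and let JSch write each file straight to disk.
private void downloadDir(ChannelSftp sftp, String remoteDir, File localDir)
        throws SftpException, IOException {
    localDir.mkdirs();
    @SuppressWarnings("unchecked")
    Vector<ChannelSftp.LsEntry> entries = sftp.ls(remoteDir);
    for (ChannelSftp.LsEntry entry : entries) {
        String name = entry.getFilename();
        if (".".equals(name) || "..".equals(name)) {
            continue;                                  // skip the pseudo-entries
        }
        String remotePath = remoteDir + "/" + name;
        if (entry.getAttrs().isDir()) {
            downloadDir(sftp, remotePath, new File(localDir, name));   // recurse
        } else {
            OutputStream out = new BufferedOutputStream(
                    new FileOutputStream(new File(localDir, name)));
            try {
                sftp.get(remotePath, out);             // stream directly to the file
            } finally {
                out.close();
            }
        }
    }
}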
Multiple tests show exactly the same behaviour (and the content to download remained exactly the same across these tests): my memory usage keeps growing, and at the same pace.
[UPDATE 1]
I tried a solution where I let JSch write to the FileOutputStream itself:
final BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileToSave));
sftpChannel.get(remoteFilePath, bos, new SftpProgressMonitor()
And in SftpProgressMonitor.end() I close the stream -> bos.close().
No change at all.
I also tried listing all the files, still recursively, just adding their respective byte lengths to a private long totalBytesToDownload, and my program's memory remained very stable: only 20 MB used during the whole process (even though totalBytesToDownload kept increasing), which confirms that my downloading method really is at fault.
If I do close my streams, why won't the GC collect them?

JAI create seems to leave file descriptors open

I have some old code that was working until recently, but it seems to barf now that it runs on a new server using OpenJDK 6 rather than Java SE 6.
The problem seems to revolve around JAI.create. I have JPEG files which I scale and convert to PNG files. This code used to work with no leaks, but now that the move has been made to a box running OpenJDK, the file descriptors never seem to close, and I see more and more tmp files accumulating in the tmp directory on the server. These are not files I create, so I assume JAI creates them.
Another reason might be the larger heap size on the new server. If JAI cleans up on finalize, but GC happens less frequently, then maybe the files pile up because of that. Reducing the heap size is not an option, and we seem to be having unrelated issues with increasing the ulimit.
Here's an example of a file that leaks when I run this:
/tmp/imageio7201901174018490724.tmp
Some code:
// Processor is an internal class that aggregates operations
// performed on the image, like resizing
private byte[] processImage(Processor processor, InputStream stream) {
    byte[] bytes = null;
    SeekableStream s = null;
    try {
        // Read the file from the stream
        s = SeekableStream.wrapInputStream(stream, true);
        RenderedImage image = JAI.create("stream", s);
        BufferedImage img = PlanarImage.wrapRenderedImage(image).getAsBufferedImage();
        // Process image
        if (processor != null) {
            image = processor.process(img);
        }
        // Convert to bytes
        bytes = convertToPngBytes(image);
    } catch (Exception e) {
        // error handling
    } finally {
        // Clean up streams
        IOUtils.closeQuietly(stream);
        IOUtils.closeQuietly(s);
    }
    return bytes;
}

private static byte[] convertToPngBytes(RenderedImage image) throws IOException {
    ByteArrayOutputStream out = null;
    byte[] bytes = null;
    try {
        out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        bytes = out.toByteArray();
    } finally {
        IOUtils.closeQuietly(out);
    }
    return bytes;
}
My questions are:
Has anyone run into this and solved it? Since the tmp files created are not mine, I don't know what their names are and thus can't really do anything about them.
What are some libraries of choice for resizing and reformatting images? I have heard of Scalr - anything else I should look into?
I would rather not rewrite the old code at this time, but if there is no other choice...
Thanks!
Just a comment on the temp files/finalizer issue, now that you seem to have solved the root of the problem (too long for a comment, so I'll post it as an answer... :-P):
The temp files are created by ImageIO's FileCacheImageInputStream. These instances are created whenever you call ImageIO.createImageInputStream(stream) and the useCache flag is true (the default). You can set it to false to disable the disk caching, at the expense of in-memory caching. This might make sense as you have a large heap, but probably not if you are processing very large images.
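For reference, disabling the disk cache globally is a one-liner; whether that trade-off is right depends on your image sizes, as noted above:
import javax.imageio.ImageIO;

// Make ImageIO buffer ImageInputStreams in memory instead of temp files on disk.
ImageIO.setUseCache(false);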
I also think you are (almost) correct about the finalizer issue. You'll find the following finalize() method on FileCacheImageInputStream (Sun JDK 6/1.6.0_26):
protected void finalize() throws Throwable {
    // Empty finalizer: for performance reasons we instead use the
    // Disposer mechanism for ensuring that the underlying
    // RandomAccessFile is closed/deleted prior to garbage collection
}
There's some quite "interesting" code in the class's constructor that sets up automatic stream closing and disposal when the instance is finalized (should client code forget to do so). This might be different in the OpenJDK implementation; at the very least it seems kind of hacky. It's also unclear to me at the moment exactly what "performance reasons" we are talking about...
In any case, it seems calling close on the ImageInputStream instance, as you now do, will properly close the file descriptor and delete the temp file.
Found it!
So a stream gets wrapped by another stream in a different area of the code:
iis = ImageIO.createImageInputStream(stream);
And further down, stream is closed.
This doesn't seem to leak any resources when running with Sun Java, but does seem to cause a leak when running with Open JDK.
I'm not sure why that is (I have not looked at source code to verify, though I have my guesses), but that's what seems to be happening. Once I explicitly closed the wrapping stream, all was well.
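For anyone hitting the same thing, here is a sketch of the fix described (illustrative names): close the wrapping ImageInputStream as well, not only the underlying InputStream.
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;

// Closing the wrapping ImageInputStream lets the FileCacheImageInputStream
// release its file descriptor and delete its temp cache file promptly.
static BufferedImage readAndCloseBoth(InputStream stream) throws IOException {
    ImageInputStream iis = ImageIO.createImageInputStream(stream);
    try {
        return ImageIO.read(iis);   // decode from the wrapped stream
    } finally {
        iis.close();                // releases the descriptor and the temp file
        stream.close();             // then close the original stream
    }
}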

Java out of memory using FileOutputStream

I'm trying to export some files from a system and save them to my drive. The problem is that some files are pretty big and I get the Java out of memory error.
FileOutputStream fileoutstream = new FileOutputStream(filenameExtension);
fileoutstream.write(dataManagement.getContent(0).getData());
fileoutstream.flush();
fileoutstream.close();
Any recommendation I can try? I added the flush but it made no difference. The code above calls the export method, generates the file, and saves it. I'm using a cursor, not an array, to run over the data I'm exporting. I tried adding more memory, but the files are too big.
You are loading the whole file into memory before writing it. Instead you should:
load only a chunk of data
write it
repeat the steps above until you have processed all data.
If the files are really big, you may need to read/write them in chunks. If the files are small enough to fit in memory, you can instead increase the size of the virtual machine's memory.
i.e:
java -Xmx512M ...
FileInputStream fi = new FileInputStream(infile);
FileOutputStream fo = new FileOutputStream(outfile);
byte[] buffer = new byte[5000];
int n;
while ((n = fi.read(buffer)) > 0) {
    fo.write(buffer, 0, n);
}
fi.close();
fo.close();
Hope this helps to get the idea.
You can use the Spring Batch framework to read and write the file in chunks.
http://static.springsource.org/spring-batch/
