I have been trying to create a Java program that will read zip files from an online API, unzip them into memory (not into the file system), and load them into a database. Since the unzipped files need to be loaded into the database in a specific order, I will have to unzip all of the files before I load any of them.
I basically used another question on Stack Overflow as a model for how to do this. Using ZipInputStream from java.util.zip, I was able to do this with a smaller zip (0.7MB zipped, ~4MB unzipped), but when I tried a larger file (25MB zipped, 135MB unzipped), the two largest entries were not read into memory. I was not even able to retrieve a ZipEntry for these larger files (8MB and 120MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access one of the unzipped files that had failed to be written, at which point it threw a NullPointerException.
I am using Jsoup to get the zipfile from online.
Has anyone had any experience with this and can give guidance on why I am unable to retrieve the complete contents of the zip file?
Below is the code that I am using. I am collecting the unzipped files as InputStreams in a HashMap; the loop stops looking for ZipEntrys once there are none left.
private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {
    Map<String, InputStream> result = new HashMap<>();
    while (true) {
        ZipEntry entry;
        byte[] b = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int l;
        entry = verZip.getNextEntry(); // might throw IOException
        if (entry == null) {
            break;
        }
        try {
            while ((l = verZip.read(b)) > 0) {
                out.write(b, 0, l);
            }
            out.flush();
        } catch (EOFException e) {
            e.printStackTrace();
        } catch (IOException i) {
            System.out.println("there was an ioexception");
            i.printStackTrace();
            fail();
        }
        result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
    }
    return result;
}
Might I be better off if my program took advantage of the filesystem to unzip files?
It turns out that Jsoup is the root of the issue. When obtaining binary data with a Jsoup connection, there is a limit to how many bytes will be read from the connection. By default, this limit is 1048576, or 1 megabyte. As a result, when I feed the binary data from Jsoup into a ZipInputStream, the resulting data is cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
InputStream oneMb = c.execute().bodyStream();
ZipInputStream oneMbZip = new ZipInputStream(oneMb);
Trying to unzip the truncated oneMbZip is what led to the EOFException.
With the code below, I was able to change Connection's byte limit to 1 GB (1073741824), and then was able to retrieve the zip file without running into an EOFException.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
Connection.Request theRequest = c.request();
theRequest.maxBodySize(1073741824);
c.request(theRequest);//Now this connection will retrieve as much as 1GB of data
InputStream oneGb = c.execute().bodyStream();
ZipInputStream oneGbZip = new ZipInputStream(oneGb);
Note that maxBodySizeBytes is an int and its upper limit is 2,147,483,647, or just under 2GB.
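As a side note (an assumption on my part, not part of the original answer), recent Jsoup versions also expose maxBodySize directly on Connection, and passing 0 is documented to disable the limit; a minimal sketch, assuming that method exists in the version in use:
// Sketch only: relies on org.jsoup.Connection.maxBodySize(int) being available;
// 0 removes the body-size cap entirely.
Connection c = Jsoup.connect("example.com/download")
        .ignoreContentType(true)
        .maxBodySize(0);
InputStream fullBody = c.execute().bodyStream();
ZipInputStream fullZip = new ZipInputStream(fullBody);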
Related
The following code creates a file, but the file cannot be opened, and its size does not remotely correspond to the size of the file I am trying to download (using the WhatsApp updater link as an example):
private static boolean download(){
    try {
        String outfile = "/sdcard/whatsapp.apk";
        URL download = new URL("https://www.whatsapp.com/android/current/WhatsApp.apk");
        ReadableByteChannel rbc = Channels.newChannel(download.openStream());
        FileOutputStream fileOut = new FileOutputStream(outfile);
        fileOut.getChannel().transferFrom(rbc, 0, 1 << 24);
        fileOut.close();
        rbc.close();
        return true;
    } catch (IOException ioe){
        return false;
    }
}
EDIT: this is a shortened version of my full code (the full code allows network ops on the main thread and trusts all certificates); I have also changed the code in the question.
Tests show that no IOException is being thrown and the code completes without error. So why is the downloaded file not usable?
From the Javadoc:
Fewer than the requested number of bytes will be transferred if the source channel has fewer than count bytes remaining, or if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.
This means that it is not guaranteed that this will save the entire file at once. You should put this call in a loop which breaks once the entire download has completed.
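A minimal sketch of that loop, assuming the same URL and output path as the question (this is an illustration, not the poster's actual code); transferFrom reports how many bytes it copied, so the write position is advanced until the source channel is exhausted:
private static boolean download() {
    try (ReadableByteChannel rbc = Channels.newChannel(
                 new URL("https://www.whatsapp.com/android/current/WhatsApp.apk").openStream());
         FileOutputStream fileOut = new FileOutputStream("/sdcard/whatsapp.apk")) {
        long position = 0;
        long transferred;
        // transferFrom returns 0 once the source has no more data,
        // so keep advancing the write position until then.
        while ((transferred = fileOut.getChannel().transferFrom(rbc, position, 1 << 24)) > 0) {
            position += transferred;
        }
        return true;
    } catch (IOException ioe) {
        return false;
    }
}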
I have around 5 to 6 large files, each about 3 GB in size. My goal is to zip those files and then transfer them using a file servlet. My current code takes a great amount of time, resulting in a session timeout in the browser. Is there a better way to zip the files?
File zipFile = new File(downloadedFileLocation.getAbsolutePath() + "/Download.zip");
FileOutputStream fos = new FileOutputStream(zipFile);
ZipOutputStream zos = new ZipOutputStream(fos);
for (File f : downloadedFileLocation.listFiles()) {
    byte[] buffer = new byte[1024];
    ZipEntry ze = new ZipEntry(f.getName());
    zos.putNextEntry(ze);
    FileInputStream in = new FileInputStream(f.getAbsolutePath());
    int len;
    while ((len = in.read(buffer)) > 0) {
        zos.write(buffer, 0, len);
    }
    in.close();
    zos.closeEntry();
    f.delete();
}
zos.close();
fos.close();
Will changing the buffer size make any difference?
Can anyone suggest a better way to make zipping faster?
Can anyone suggest a better way to make zipping faster?
No, you can't do zipping faster, but you can do it "live".
Don't write the zipped content to a temporary file before transmitting it. Write it straight to the OutputStream in the Servlet.
The result is that zipped content is transmitted as it is compressed, so the connection will not time out, and total response time is reduced.
You should also use try-with-resources for resource management, and the newer NIO file classes for ease of use and better error messages.
Something like this:
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    resp.setContentType("application/zip");
    try (ZipOutputStream zos = new ZipOutputStream(resp.getOutputStream())) {
        for (File f : downloadedFileLocation.listFiles()) {
            zos.putNextEntry(new ZipEntry(f.getName()));
            Files.copy(f.toPath(), zos);
            Files.delete(f.toPath());
        }
    }
}
I left the delete() in there, but depending on what you're doing, it is likely not appropriate when doing it this way. Or at the very least, you should not delete until download is complete, i.e. until after the for loop ends.
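For illustration (not part of the original answer), one way to defer the deletes is to write all entries first and only then remove the source files; a minimal sketch under that assumption:
// Sketch only: same streaming approach, but files are deleted only after
// every entry has been written to the response.
try (ZipOutputStream zos = new ZipOutputStream(resp.getOutputStream())) {
    File[] files = downloadedFileLocation.listFiles();
    for (File f : files) {
        zos.putNextEntry(new ZipEntry(f.getName()));
        Files.copy(f.toPath(), zos);
        zos.closeEntry();
    }
    for (File f : files) {
        Files.delete(f.toPath()); // delete only once the whole zip has been written
    }
}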
IMHO, there is always a better way of doing things. Recently (it was Java 7 NIO) I learned about the NIO way of zipping files, and it is much faster than any conventional method I have used so far. I haven't measured exact timings, but it was roughly twice as fast as the conventional approach.
It's worth a try. Refer to this.
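The linked article isn't reproduced here, but as a rough illustration of the Java 7 NIO zip-FileSystem approach this answer alludes to (names and paths are assumptions, not the poster's code):
// Sketch only: treats the zip as a FileSystem and copies files into it.
Path zipPath = Paths.get(downloadedFileLocation.getAbsolutePath(), "Download.zip");
Map<String, String> env = new HashMap<>();
env.put("create", "true"); // create the zip if it does not exist yet
try (FileSystem zipFs = FileSystems.newFileSystem(URI.create("jar:" + zipPath.toUri()), env)) {
    for (File f : downloadedFileLocation.listFiles()) {
        if (!f.getName().equals("Download.zip")) {
            // copy each source file into the root of the zip file system
            Files.copy(f.toPath(), zipFs.getPath("/" + f.getName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}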
The FileOutputStream should be wrapped in a BufferedOutputStream. The ZipOutputStream writes many small chunks to its destination OutputStream when zipping the data. The buffer should be at least 16KB. This should speed it up by a factor of 10.
When reading file data, the buffer size should also be at least 16KB.
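A minimal sketch of that change applied to the question's loop (the 16KB buffer sizes are the values suggested above):
// Sketch only: buffers both the zip destination and the file reads with 16KB buffers.
try (ZipOutputStream zos = new ZipOutputStream(
             new BufferedOutputStream(new FileOutputStream(zipFile), 16 * 1024))) {
    byte[] buffer = new byte[16 * 1024];
    for (File f : downloadedFileLocation.listFiles()) {
        zos.putNextEntry(new ZipEntry(f.getName()));
        try (InputStream in = new BufferedInputStream(new FileInputStream(f), 16 * 1024)) {
            int len;
            while ((len = in.read(buffer)) > 0) {
                zos.write(buffer, 0, len);
            }
        }
        zos.closeEntry();
    }
}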
I'm having a really strange problem. I'm trying to download a file and store it. My code is relatively simple and straightforward (see below) and works fine on my local machine.
But it is intended to run on a Windows Terminal Server accessed through Citrix and a VPN. The file is to be saved to a mounted network drive. This mount is the local C:\ drive mounted through the Citrix VPN, so there might be some lag involved. Unfortunately I have no inside detail about how exactly the whole infrastructure is set up...
Now my problem is that the code below throws an IOException telling me there is no space left on the disk, when attempting to execute the write() call. The directory structure is created alright and a zero byte file is created, but content is never written.
There is more than a gigabyte of space available on the drive, the Citrix client has been given "Full Access" permissions, and copying/writing files on that mapped drive with Windows Explorer or Notepad works just fine. Only Java is giving me trouble here.
I also tried downloading to a temporary file first and then copying it to the destination, but since copying is basically the same stream operation as in my original code, there was no change in behavior. It still fails with an out-of-disk-space exception.
I have no idea what else to try. Can you give any suggestions?
public boolean downloadToFile(URL url, File file){
    boolean ok = false;
    try {
        file.getParentFile().mkdirs();
        BufferedInputStream bis = new BufferedInputStream(url.openStream());
        byte[] buffer = new byte[2048];
        FileOutputStream fos = new FileOutputStream(file);
        BufferedOutputStream bos = new BufferedOutputStream(fos, buffer.length);
        int size;
        while ((size = bis.read(buffer, 0, buffer.length)) != -1) {
            bos.write(buffer, 0, size);
        }
        bos.flush();
        bos.close();
        bis.close();
        ok = true;
    } catch (Exception e){
        e.printStackTrace();
    }
    return ok;
}
Have a try with commons-io, especially the util classes FileUtils and IOUtils.
After changing our code to use commons-io, all file operations went much smoother, even with mapped network drives.
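For illustration (my own sketch, not the answerer's code), the question's downloadToFile method maps roughly onto a single commons-io call, assuming commons-io is on the classpath:
// Sketch only: uses org.apache.commons.io.FileUtils from commons-io.
public boolean downloadToFile(URL url, File file) {
    try {
        // copyURLToFile creates the parent directories and streams the content to disk
        FileUtils.copyURLToFile(url, file);
        return true;
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }
}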
I have the code to copy a file to another location.
public static void copyFile(String sourceDest, String newDest) throws IOException {
    File sourceFile = new File(sourceDest);
    File destFile = new File(newDest);
    if (!destFile.exists()) {
        destFile.createNewFile();
    }
    FileChannel source = null;
    FileChannel destination = null;
    try {
        source = new FileInputStream(sourceFile).getChannel();
        destination = new FileOutputStream(destFile).getChannel();
        destination.transferFrom(source, 0, source.size());
    } finally {
        if (source != null) {
            source.close();
        }
        if (destination != null) {
            destination.close();
        }
    }
}
While copying smaller files, say 300-400 MB, everything works like magic. But when I tried to copy a file of 1.5 GB, it failed. The stack trace is:
run:
12.01.2011 11:16:36 FileCopier main
SEVERE: Exception occured while copying file. Try again.
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(FileChannelImpl.java:527)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:590)
at FileCopier.copyFile(FileCopier.java:64)
at FileCopier.main(FileCopier.java:27)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 4 more
BUILD SUCCESSFUL (total time: 0 seconds)
I haven't worked with NIO closely. Could you please help me out? Thank you so much in advance.
I think you might have been hit by an old bug which I already encountered some time ago. I was not trying to copy a file but rather to seek through a memory-mapped file, which failed as well. For me the workaround is to seek through the file in a loop and request the GC and finalizers to run every now and then.
The memory-mapped ByteBuffers release their mapping in the finalizer and make room for new mappings. This is very ugly, but at least it works. Let's hope they do something about this in a coming NIO iteration.
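Applied to the copyFile method in the question, that workaround roughly amounts to transferring bounded chunks and nudging the GC between them; a rough sketch only, with an arbitrary chunk size and no guarantee it avoids the bug:
// Sketch only: copies in bounded chunks instead of one huge transferFrom call,
// hinting the GC between chunks so stale mappings can be released.
long chunk = 64L * 1024 * 1024; // 64MB per transfer; arbitrary choice
long size = source.size();
long position = 0;
while (position < size) {
    position += destination.transferFrom(source, position, Math.min(chunk, size - position));
    System.gc();
    System.runFinalization();
}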
You are memory mapping a file but there is limited memory address space in a 32-bit JVM (which I presume you are using) so the map method is failing. I don't think you can map more than 1.3-1.4 GB of disk data. What heap size are you using?
You can try reducing your heap size or use a 64-bit JRE. Alternatively, don't read the file by mapping it to memory using NIO. Instead, use the traditional way of a buffered reader and writer to read and write data from one file to the other.
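A minimal sketch of that traditional stream-based copy (no memory mapping involved), keeping the question's method signature:
// Sketch only: plain buffered stream copy, avoiding FileChannel.map entirely.
public static void copyFile(String sourceDest, String newDest) throws IOException {
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(sourceDest));
         BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(newDest))) {
        byte[] buffer = new byte[16 * 1024];
        int len;
        while ((len = in.read(buffer)) > 0) {
            out.write(buffer, 0, len);
        }
    }
}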
I have a database file in the res/raw/ folder. I am calling Resources.openRawResource() with the resource ID R.raw.FileName and I get an InputStream. I have another database file on the device, so to copy the contents of that db into the device db I use:
BufferedInputStream bi = new BufferedInputStream(is);
and a FileOutputStream, but I get an exception saying the database file is corrupted. How can I proceed?
I also tried reading the file using File and FileInputStream with the path /res/raw/fileName, but that doesn't work either.
Yes, you should be able to use openRawResource to copy a binary across from your raw resource folder to the device.
Based on the example code in the API demos (content/ReadAsset), you should be able to use a variation of the following code snippet to read the db file data.
InputStream ins = getResources().openRawResource(R.raw.my_db_file);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int size = 0;
// Read the entire resource into a local byte buffer.
byte[] buffer = new byte[1024];
while ((size = ins.read(buffer, 0, 1024)) >= 0) {
    outputStream.write(buffer, 0, size);
}
ins.close();
buffer = outputStream.toByteArray();
A copy of your file should now exist in buffer, so you can use a FileOutputStream to save the buffer to a new file.
FileOutputStream fos = new FileOutputStream("mycopy.db");
fos.write(buffer);
fos.close();
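One usage note (an assumption on my part, not from the original answer): on Android a bare relative path like "mycopy.db" is usually not writable, so the destination is typically resolved through the Context, for example:
// Sketch only: writes the copied bytes into the app's own database directory.
// getDatabasePath() is a standard Context method; "mycopy.db" is just an example name.
File dbFile = context.getDatabasePath("mycopy.db");
dbFile.getParentFile().mkdirs(); // make sure the databases/ directory exists
FileOutputStream fos = new FileOutputStream(dbFile);
fos.write(buffer);
fos.close();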
InputStream.available has severe limitations and should never be used to determine the length of the content available for streaming.
http://developer.android.com/reference/java/io/FileInputStream.html#available():
"[...]Returns an estimated number of bytes that can be read or skipped without blocking for more input. [...]Note that this method provides such a weak guarantee that it is not very useful in practice."
You have 3 solutions:
Go through the content twice, first just to compute content length, second to actually read the data
Since Android resources are prepared by you, the developer, hardcode its expected length
Put the file in the /asset directory and read it through AssetManager, which gives you access to AssetFileDescriptor and its content-length methods (see the sketch below). This may however give you the UNKNOWN value for length, which isn't that useful.
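A minimal sketch of that third, asset-based option (the asset name is hypothetical, and openFd() can fail for assets that are stored compressed):
// Sketch only: reads the declared length through the AssetFileDescriptor;
// getLength() may report AssetFileDescriptor.UNKNOWN_LENGTH.
AssetFileDescriptor afd = getAssets().openFd("my_db_file.db");
long declaredLength = afd.getLength();
InputStream ins = afd.createInputStream();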