Is there a better way to zip large files in Java?

I have around 5 or 6 large files, each about 3 GB in size. My goal is to zip those files and then transfer them using a file servlet. My current code takes a great amount of time, resulting in a session timeout in the browser. Is there a better way to zip the files?
File zipFile = new File(downloadedFileLocation.getAbsolutePath() + "/Download.zip");
FileOutputStream fos = new FileOutputStream(zipFile);
ZipOutputStream zos = new ZipOutputStream(fos);
for (File f : downloadedFileLocation.listFiles()) {
    byte[] buffer = new byte[1024];
    ZipEntry ze = new ZipEntry(f.getName());
    zos.putNextEntry(ze);
    FileInputStream in = new FileInputStream(f.getAbsolutePath());
    int len;
    while ((len = in.read(buffer)) > 0) {
        zos.write(buffer, 0, len);
    }
    in.close();
    zos.closeEntry();
    f.delete();
}
zos.close();
fos.close();
Will changing the buffer size make any difference?
Can anyone suggest a better way to zip the files faster?

Can anyone suggest a better way to zip the files faster
No, you can't make the zipping itself faster, but you can do it "live".
Don't write the zipped content to a temporary file before transmitting it. Write it straight to the OutputStream in the Servlet.
The result is that zipped content is transmitted as it is compressed, so the connection will not time out, and total response time is reduced.
You should also use try-with-resources for resource management, and the newer NIO file classes for ease of use and better error messages.
Something like this:
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    resp.setContentType("application/zip");
    try (ZipOutputStream zos = new ZipOutputStream(resp.getOutputStream())) {
        for (File f : downloadedFileLocation.listFiles()) {
            zos.putNextEntry(new ZipEntry(f.getName()));
            Files.copy(f.toPath(), zos);
            Files.delete(f.toPath());
        }
    }
}
I left the delete() in there, but depending on what you're doing, it is likely not appropriate when streaming this way. At the very least, you should not delete until the download is complete, i.e. until after the for loop ends.

IMHO, there is always a better way of doing things. Recently I got to know about the Java 7 NIO way of zipping files, and it's way faster than any conventional method. I haven't measured precise timings, but it's almost twice the speed of any conventional method.
It's worth a try. Refer to this.
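For reference, here is a minimal sketch of the Java 7 zip FileSystem approach; the source directory name is illustrative, and the "create" option tells the zip provider to create the archive if it does not exist:
// needs java.net.URI, java.nio.file.*, java.util.Collections
URI zipUri = URI.create("jar:" + Paths.get("Download.zip").toUri());
try (FileSystem zipFs = FileSystems.newFileSystem(zipUri,
        Collections.singletonMap("create", "true"))) {
    try (DirectoryStream<Path> sources = Files.newDirectoryStream(Paths.get("downloads"))) {
        for (Path source : sources) {
            // copy each file into the root of the archive
            Files.copy(source, zipFs.getPath("/" + source.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}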

The FileOutputStream should be wrapped in a BufferedOutputStream. The ZipOutputStream writes many small chunks to its destination OutputStream when zipping the data, so it should have a buffer of at least 16 KB. This should speed it up by a factor of 10.
When reading the file data, the buffer size should also be at least 16 KB.
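Applied to the code from the question, a sketch might look like this (reusing zipFile and downloadedFileLocation from above, with 16 KB buffers on both ends):
try (ZipOutputStream zos = new ZipOutputStream(
        new BufferedOutputStream(new FileOutputStream(zipFile), 16 * 1024))) {
    byte[] buffer = new byte[16 * 1024];
    for (File f : downloadedFileLocation.listFiles()) {
        zos.putNextEntry(new ZipEntry(f.getName()));
        try (InputStream in = new BufferedInputStream(new FileInputStream(f), 16 * 1024)) {
            int len;
            while ((len = in.read(buffer)) > 0) {
                zos.write(buffer, 0, len);
            }
        }
        zos.closeEntry();
    }
}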

Related

Return a .zip file containing several .csv files to the browser with ZipOutputStream

To begin with, I am well aware that a similar topic is available on Stack Overflow, but it does not answer my problem. That's why I'm making this post.
At the moment, my program is able to search for ".csv" files locally, according to certain criteria, to add them to a list, and then to create a file in ".zip" format, which contains all these files. This works without any problem.
Now, I want to download this ".zip" file, from a Web interface, with a button. The connection between the code and the button works, but when I open my downloaded ".zip" file, it contains only one file, and not all the ".csv" files it is supposed to contain. Where can the problem come from?
Here is my code:
FileOutputStream baos = new FileOutputStream("myZip.zip");
ZipOutputStream zos = new ZipOutputStream(baos);
for (String sCurrent : selectedFiles) {
    zos.putNextEntry(new ZipEntry(new File(sCurrent).getName()));
    Files.copy(Paths.get(sCurrent), zos);
    zos.closeEntry();
}
zos.close();
response.getOutputStream().flush();
response.getOutputStream().close();
You are closing the ZIP after sending the response. Swap the order:
zos.close();
response.getOutputStream().write(baos.toByteArray());
It would be more efficient to use try-with-resources to handle closing, Files.copy(Paths.get(currentFile), zos); to transfer the files, and to zip straight to the response output stream, since building the whole archive in memory risks an OutOfMemoryError for large files:
ZipOutputStream zos = new ZipOutputStream(response.getOutputStream());
Not an answer so much as consolidation based on @DuncG
ZipOutputStream zos = new ZipOutputStream(response.getOutputStream());
for (String path : selectedFiles) {
    zos.putNextEntry(new ZipEntry(new File(path).getName()));
    Files.copy(Paths.get(path), zos);
    zos.closeEntry();
}
zos.close();

Unzipping into a ByteArrayOutputStream -- why am I getting an EOFException?

I have been trying to create a Java program that will read zip files from an online API, unzip them into memory (not into the file system), and load them into a database. Since the unzipped files need to be loaded into the database in a specific order, I will have to unzip all of the files before I load any of them.
I basically used another question on Stack Overflow as a model on how to do this. Using ZipInputStream from util.zip, I was able to do this with a smaller zip (0.7 MB zipped, ~4 MB unzipped), but when I encountered a larger file (25 MB zipped, 135 MB unzipped), the two largest files were not read into memory. I was not even able to retrieve a ZipEntry for these larger files (8 MB and 120 MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access the unzipped files that failed to be written, and threw a NullPointerException.
I am using Jsoup to get the zipfile from online.
Has anyone had any experience with this and can give guidance on why I am unable to retrieve the complete contents of the zip file?
Below is the code that I am using. I am collecting the unzipped files as InputStreams in a HashMap; the program should stop looking for ZipEntrys when there are none left.
private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {
    Map<String, InputStream> result = new HashMap<>();
    while (true) {
        byte[] b = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int l;
        ZipEntry entry = verZip.getNextEntry(); // might throw IOException
        if (entry == null) {
            break;
        }
        try {
            while ((l = verZip.read(b)) > 0) {
                out.write(b, 0, l);
            }
            out.flush();
        } catch (EOFException e) {
            e.printStackTrace();
        } catch (IOException i) {
            System.out.println("there was an ioexception");
            i.printStackTrace();
            fail();
        }
        result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
    }
    return result;
}
Might I be better off if my program took advantage of the filesystem to unzip files?
It turns out that Jsoup is the root of the issue. When obtaining binary data over a Jsoup connection, there is a limit to how many bytes will be read. By default, this limit is 1048576 bytes, or 1 megabyte. As a result, when I fed the binary data from Jsoup into a ZipInputStream, the resulting data was cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
// ^^ returns a Connection that will only retrieve 1 MB of data
InputStream oneMb = c.execute().bodyStream();
ZipInputStream oneMbZip = new ZipInputStream(oneMb);
Trying to unzip the truncated oneMbZip is what led me to the EOFException.
With the code below, I was able to change Connection's byte limit to 1 GB (1073741824), and then was able to retrieve the zip file without running into an EOFException.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
// ^^ by default, this Connection will only retrieve 1 MB of data
Connection.Request theRequest = c.request();
theRequest.maxBodySize(1073741824);
c.request(theRequest); // now this connection will retrieve as much as 1 GB of data
InputStream oneGb = c.execute().bodyStream();
ZipInputStream oneGbZip = new ZipInputStream(oneGb);
Note that maxBodySizeBytes is an int, so its upper limit is 2,147,483,647 bytes, or just under 2 GB.
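If the download might exceed that, jsoup also treats a max body size of zero as unlimited, and maxBodySize can be set on the Connection directly:
Connection c = Jsoup.connect("example.com/download")
        .ignoreContentType(true)
        .maxBodySize(0); // 0 = no limit, bounded only by available memory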

write a XSSFWorkbook to a zip file

I have this problem: I want to write an Excel file held in an XSSFWorkbook object into a zip file (e.g. an example.zip containing example.xlsx) on a remote server.
I have tried the following, but it does not work; it created a folder with some odd files in the zip file:
XSSFWorkbook workbook = new XSSFWorkbook();
// add some data
ZipOutputStream zipstream = new ZipOutputStream(/* destination outputstream */);
workbook.write(zipstream);
So does anyone know the right way to do this? Thanks in advance.
PS: workbook.write(fileoutputstream) works, but it only writes to the local disk as a flat file, e.g. test.xlsx, instead of inside a zip as I need.
Passing a ZipOutputStream to XSSFWorkbook.write will result in the stream being hijacked and closed by the workbook. This is because an XSSFWorkbook writes a .xlsx, which is itself a zip archive of xml and other files (you can unzip any .xlsx to see what's in there).
If you're able to fit the Excel file in memory, I've found this to work well:
ZipOutputStream zos = new ZipOutputStream(/* destination outputstream */);
zos.putNextEntry(new ZipEntry("AnExcelFile.xlsx"));
ByteArrayOutputStream bos = new ByteArrayOutputStream();
workbook.write(bos);
bos.writeTo(zos);
zos.closeEntry();
// Add other entries as needed
zos.close();
Calling close() on a ByteArrayOutputStream has no effect, and its contents can still be written to zos.
You are missing some necessary calls on your ZipOutputStream. You will need to create a ZipEntry for your spreadsheet file, then write it out. You'll need something like
zipstream.putNextEntry(new ZipEntry("example.xlsx"));
Then you should be able to call
workbook.write(zipstream);
But after that you'll need to close the entry before closing the stream.
zipstream.closeEntry();
Please see "Write And Read .Zip File From Java" for details on how to use Java's ZipOutputStream.
Also, be aware that .xlsx files are already compressed zip files, so placing it in a .zip file may not compress it very much.
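If that double compression is a concern, one optional tweak (not required for correctness) is to turn deflation off on the stream before writing entries, trading a slightly larger archive for less CPU work:
zipstream.setLevel(Deflater.NO_COMPRESSION); // java.util.zip.Deflater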
A colleague of mine, M. Bunshaft, suggested a solution similar to that of Klugscheißer but that does not require the use of a ByteArrayOutputStream, and hence can accommodate larger output.
The idea is to subclass ZipOutputStream, overriding the close() method so it will not do a close.
public class UncloseableZipOutputStream extends ZipOutputStream {

    public UncloseableZipOutputStream(OutputStream os) {
        super(os);
    }

    /** Just flush, but do not close. */
    @Override
    public void close() throws IOException {
        flush();
    }

    public void reallyClose() throws IOException {
        super.close();
    }
}
Then, simply use it the way you would use the ZipOutputStream.
UncloseableZipOutputStream zos = new UncloseableZipOutputStream(/* destination outputstream */);
zos.putNextEntry(new ZipEntry("AnExcelFile.xlsx"));
workbook.write(zos);
zos.closeEntry(); // now this will not cause a close of the stream
// Add other entries as needed
zos.reallyClose();

Write with ObjectOutputStream into multiple ZipEntrys in a single ZipOutputStream

I want to create a zip archive in Java where each contained file is produced by serializing some objects. I have a problem with correctly closing the streams.
The code looks like this:
try (OutputStream os = new FileOutputStream(file);
     ZipOutputStream zos = new ZipOutputStream(os)) {
    ZipEntry ze;
    ObjectOutputStream oos;

    ze = new ZipEntry("file1");
    zos.putNextEntry(ze); // start first file in zip archive
    oos = new ObjectOutputStream(zos);
    oos.writeObject(obj1a);
    oos.writeObject(obj1b);
    // I want to close oos here without closing zos
    zos.closeEntry(); // end first file in zip archive

    ze = new ZipEntry("file2");
    zos.putNextEntry(ze); // start second file in zip archive
    oos = new ObjectOutputStream(zos);
    oos.writeObject(obj2a);
    oos.writeObject(obj2b);
    // And here again
    zos.closeEntry(); // end second file in zip archive
}
I know of course that I should close each stream after finishing using it, so I should close the ObjectOutputStreams in the indicated positions. However, closing the ObjectOutputStreams would also close the ZipOutputStream that I still need.
I do not want to omit the call to ObjectOutputStream.close(), because I do not want to rely on the fact that it currently does no more than flush() and reset().
I also cannot use a single ObjectOutputStream instance because then I miss the stream header that is written by the constructor (each single file in the zip archive would not be a full object serialization file, and I could not de-serialize them independently).
The same problem occurs when reading the file again.
The only way I see would be to wrap the ZipOutputStream in some kind of "CloseProtectionOutputStream" that would forward all methods except close() before giving it to the ObjectOutputStream. However, this seems rather hacky and I wonder if I missed a nicer solution in the API.
If your OutputStream wrapper throws an exception when closed more than once, it is not a hack. You can create a wrapper for each zip entry.
From an architectural point of view, I think the ObjectOutputStream author should have provided an option to disable close() cascading. You are just working around a lacking API.
In this case, and for all the reasons you mentioned, I would simply not pipe my ObjectOutputStream to the ZipOutputStream. Instead, serialize to a byte[] and then write that straight into the ZipOutputStream. This way, you are free to close the ObjectOutputStream, and each byte[] you produce will have the proper header from the serializer. One downside is that you wind up with a byte[] in memory that you didn't have before, but if you get rid of it right away (assuming we're not talking about millions of objects), the garbage collector shouldn't have a hard time cleaning up.
Just my two cents...
It at least sounds less hacky than a stream subclass that changes the close() behavior.
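A minimal sketch of that byte[] approach, using a hypothetical serialize() helper (not part of the question's code) that produces a standalone serialization stream per entry:
try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(file))) {
    zos.putNextEntry(new ZipEntry("file1"));
    zos.write(serialize(obj1a, obj1b));
    zos.closeEntry();
    zos.putNextEntry(new ZipEntry("file2"));
    zos.write(serialize(obj2a, obj2b));
    zos.closeEntry();
}

// hypothetical helper: each returned byte[] carries its own stream header
private static byte[] serialize(Object... objects) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        for (Object o : objects) {
            oos.writeObject(o);
        }
    }
    return bos.toByteArray();
}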
If you're intending to throw the ObjectOutputStream away anyway, then it should be sufficient to call flush() rather than close(), but as you say in the question, the safest approach is probably to use a wrapper around the underlying ZipOutputStream that blocks the close() call. Apache commons-io has CloseShieldOutputStream for this purpose.
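A sketch of that approach for one entry (CloseShieldOutputStream.wrap() exists in recent commons-io releases; older versions use the constructor instead):
zos.putNextEntry(new ZipEntry("file1"));
ObjectOutputStream oos = new ObjectOutputStream(CloseShieldOutputStream.wrap(zos));
oos.writeObject(obj1a);
oos.writeObject(obj1b);
oos.close(); // flushes the entry; the shield swallows the close
zos.closeEntry();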

Android FileOutputStream creates corrupted file

I have an app that creates multiple files using a byte array it gets from a Socket InputStream. The file saves perfectly when I just save one file, but if I save one file, then re-instantiate the file stream and save a different file, the first file gets corrupted while the second file is saved perfectly. I opened the two files in a text editor, and it seems that roughly the first fifth of the first file is blank spaces while the second file is full, yet they both have the same size properties (9,128,731 bytes). The following example duplicates the scenario, with the same corruption as a result:
FileOutputStream outStream;
outStream = new FileOutputStream("/mnt/sdcard/testmp3.mp3");
File file = new File("/mnt/sdcard/test.mp3");
FileInputStream inStream = new FileInputStream(file);
byte[] buffer = new byte[9128731];
inStream.read(buffer);
outStream.write(buffer, 0, buffer.length);
inStream.close();
outStream.flush();
outStream.close();
outStream = null;
outStream = new FileOutputStream("/mnt/sdcard/testmp32.mp3");
outStream.write(buffer, 0, buffer.length);
inStream.close();
outStream.flush();
outStream.close();
outStream = null;
I tried this EXACT code in a regular Java application and both files were saved without a problem. Does anyone know why Android is doing this?
Any help would be GREATLY appreciated
As jtahlborn mentioned, you cannot assume that InputStream.read(byte[]) will always read as many bytes as you want. You should also avoid writing out such a large byte array at once; at the very least, without buffering you risk running out of memory. You can handle these concerns and save some memory by copying the file like this:
File inFile = new File("/mnt/sdcard/test.mp3");
File outFile = new File("/mnt/sdcard/testmp3.mp3");
FileInputStream inStream = new FileInputStream(inFile);
FileOutputStream outStream = new FileOutputStream(outFile);
byte[] buffer = new byte[65536];
int len;
while ((len = inStream.read(buffer)) != -1) {
    outStream.write(buffer, 0, len);
}
inStream.close();
outStream.close();
I see some potential issues that can get you started debugging:
You're writing to the first output stream before you close the input stream. This is a bit weird.
You can't accurately gauge the similarity/difference between two binary files using a text editor. You need to look at the files in a hex editor (or better, Audacity)
I would use BufferedOutputStream as suggested by the Android docs:
out = new BufferedOutputStream(new FileOutputStream(file));
http://developer.android.com/reference/java/io/FileOutputStream.html
As a debugging technique, print the contents of buffer after the first write. Also, inStream.read() returns an int. I would additionally compare this to buffer.length and make sure they are the same. Regardless, I would just call write(buffer) instead of write(buffer, 0, buffer.length) unless you have a really good reason.
-tjw
You are assuming that the read() call will read as many bytes as you want. That is incorrect: that method is free to read anywhere from 1 to buffer.length bytes. That is why you should always use the return value to determine how many bytes were actually read. There are plenty of stream tutorials out there which will show you how to correctly read from a Java stream (i.e. how to completely fill your buffer).
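For illustration, a readFully-style loop that fills the whole buffer (DataInputStream.readFully does the same thing for you):
int off = 0;
while (off < buffer.length) {
    int n = inStream.read(buffer, off, buffer.length - off);
    if (n == -1) {
        throw new EOFException("stream ended after " + off + " bytes");
    }
    off += n;
}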
If anyone's having the same problem and wondering how to fix it: I found out the problem was being caused by my SD card. I bought a 32 GB Kingston SD card, and just yesterday I decided to try running the same code again, except using the internal storage instead, and everything worked perfectly. I also tried the stock 2 GB SD card the device came with, and it also worked perfectly. I'm glad to know my code works, but a little frustrated I spent 50 bucks on a defective memory card. Thanks for everyone's input.
