Java: how to compress a byte[] using ZipOutputStream without intermediate file - java

Requirement: compress a byte[] to get another byte[] using java.util.zip.ZipOutputStream, BUT without using any files, either on disk or in memory (like here: https://stackoverflow.com/a/18406927/9132186). Is this even possible?
All the examples I found online read from a file (.txt) and write to a file (.zip). ZipOutputStream needs a ZipEntry to work with, and that ZipEntry needs a file.
However, my use case is as follows: I need to compress one chunk (say 10MB) of a file at a time using the zip format and append all these compressed chunks to make a .zip file. But when I unzip the resulting .zip file, it is corrupted.
I am using in-memory files as suggested in https://stackoverflow.com/a/18406927/9132186 to avoid files on disk, but I need a solution without these in-memory files as well.
public void testZipBytes() {
    String infile = "test.txt";
    FileInputStream in = new FileInputStream(infile);
    String outfile = "test.txt.zip";
    FileOutputStream out = new FileOutputStream(outfile);
    byte[] buf = new byte[10];
    int len;
    while ((len = in.read(buf)) > 0) {
        out.write(zipBytes(buf));
    }
    in.close();
    out.close();
}
// ACTUAL function that compresses byte[]
public static class MemoryFile {
    public String fileName;
    public byte[] contents;
}
public byte[] zipBytesMemoryFileWORKS(byte[] input) {
    MemoryFile memoryFile = new MemoryFile();
    memoryFile.fileName = "try.txt";
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(baos);
    ZipEntry entry = new ZipEntry(memoryFile.fileName);
    entry.setSize(input.length);
    zos.putNextEntry(entry);
    zos.write(input);
    zos.finish();
    zos.closeEntry();
    zos.close();
    return baos.toByteArray();
}
Scenario 1:
If test.txt has a small amount of data (less than 10 bytes), like "this", then unzipping test.txt.zip yields try.txt with "this" in it.
Scenario 2:
If test.txt has a larger amount of data (more than 10 bytes), like "this is a test for zip output stream and it is not working", then unzipping test.txt.zip yields try.txt with broken, incomplete pieces of the data.
The 10 bytes here is the buffer size in testZipBytes, i.e. the amount of data that zipBytes compresses at a time.
Expected (or rather desired):
1. Unzipping test.txt.zip does not use the "try.txt" filename I gave in the MemoryFile, but rather unzips to the filename test.txt itself.
2. The unzipped data is not broken and yields the input data as is.
3. I have done the same with GZIPOutputStream and it works perfectly fine.

Requirement: compress a byte[] to get another byte[] using java.util.zip.ZipOutputStream, BUT without using any files, either on disk or in memory (like here: https://stackoverflow.com/a/18406927/9132186). Is this even possible?
Yes, you've already done it. You don't actually need MemoryFile in your example; just delete it from your implementation and write ZipEntry entry = new ZipEntry("try.txt") instead.
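But you can't concatenate the zips of 10MB chunks of a file and get a valid zip file for the combined file. Zipping doesn't work like that: each call to your method produces a complete archive with its own entry header and central directory, so the concatenation is several archives back to back, not one. You could have a solution which minimizes how much is in memory at once, perhaps. But breaking the original file up into chunks that are zipped separately seems unworkable.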
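A minimal sketch of the "minimize how much is in memory at once" idea, assuming the goal is one valid archive with one entry: keep a single ZipOutputStream open, add one ZipEntry, and push the input through it a buffer at a time. The class and method names (ChunkedZipSketch, zipStream, zipBytes, unzipFirstEntry) and the 8192-byte default are illustrative choices, not part of the question or of java.util.zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ChunkedZipSketch {

    /** Streams everything from in into ONE entry of ONE archive written to out. */
    static void zipStream(InputStream in, OutputStream out, String entryName, int chunkSize)
            throws IOException {
        // Closing the ZipOutputStream writes the central directory and closes out.
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(entryName));
            byte[] buf = new byte[chunkSize];
            int len;
            while ((len = in.read(buf)) != -1) {
                // Write only the bytes actually read, not the whole buffer.
                zos.write(buf, 0, len);
            }
            zos.closeEntry();
        }
    }

    /** byte[] in, zipped byte[] out; no files on disk or "memory files" involved. */
    static byte[] zipBytes(byte[] input, String entryName) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        zipStream(new ByteArrayInputStream(input), baos, entryName, 8192);
        return baos.toByteArray();
    }

    /** Reads the first entry back; used here only to check the round trip. */
    static byte[] unzipFirstEntry(byte[] zipped) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int len;
            while ((len = zis.read(buf)) != -1) {
                baos.write(buf, 0, len);
            }
            return baos.toByteArray();
        }
    }
}
```

The entry name passed in (for example "test.txt") is the name unzip will give the extracted file, and chunkSize only bounds how much uncompressed data is held in memory at a time.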

Related

Java in websphere produces corrupted zip file, if previously zipped content as byte array is written into file

My Java zip problem appears only in WebSphere in a Linux environment,
but everything works on my local Windows developer computer in TomEE.
I try to compress a content and send the result as mail and backup the same content in a folder.
My input data is a CSV file content in a byte-array.
I compress the input byte array to a zip result byte array.
I try to save my zipped result data in a folder and send the same content as a mail attachment.
Result: zip-attachment in mail is delivered correctly but the same zip byte array saved as a file is corrupted.
public static byte[] zipContent(String filename, byte[] content) throws IOException {
    final CRC32 crc = new CRC32();
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (final ZipOutputStream zipOut = new ZipOutputStream(baos);
         final InputStream fis = new ByteArrayInputStream(content);) {
        crc.reset();
        final ZipEntry zipEntry = new ZipEntry(filename);
        zipOut.putNextEntry(zipEntry);
        int length;
        byte[] bytes = new byte[1024];
        while ((length = fis.read(bytes)) >= 0) {
            crc.update(bytes,0,length);
            zipOut.write(bytes, 0, length);
        }
        final long crcValue = crc.getValue();
        zipEntry.setCrc(crcValue);
        zipEntry.setSize(content.length);
        zipEntry.setComment("CRC_" + Long.toString(crcValue));
        zipOut.closeEntry();
        zipOut.flush();
    } finally {
        baos.flush();
        baos.close();
    }
    return baos.toByteArray();
}
I call
final byte[] csvContentArray = ... // Get CSV input anyway ...
File csvFile = new File(aCsvPathString);
Files.write(csvFile.toPath(), csvContentArray); // OKAY, saved correctly pre compress in server
//COMPRESS: byte[] --> byte[]
final byte[] zippedContent = zipContent(csvFileName, csvContentArray);
File zipFile = new File(aZipPathString);
Files.write(zipFile.toPath(), zippedContent); // CORRUPTED, saved after compress in server !!!
...
sendMail(emailAdress, zippedContent); // the same zipped content in mail is delivered correctly and can be opened without warnings !
What differs between the zip content as byte[] when it is saved to the file system and when it is sent as a mail attachment (MimeBodyPart)?
It is very strange to me, because it works fine on Windows/TomEE but not on Linux/WebSphere.
My zipped CSV file is huge; after compression, about the first 1000 lines are fine and the rest of the file is garbled.
I tried both with and without setting the CRC on the ZipEntry, but it makes no difference.
Opening corrupted zip in 7zip gives warnings like:
Unerwartetes Datenende. Translated: unexpected data end.
Es gibt noch Daten hinter den Hauptdaten. Translated: existing data after main data.
I'm always grateful for opinions and suggestions.

Zip and Unzip a large file without loading the entire file in memory in apache Camel

We are using Apache Camel for compressing and decompressing our files.
We use the standard .marshal().gzip() and .unmarshal().gzip() APIs.
Our problem is that when we get really large files, say 800MB to more than 1GB, our application runs out of memory, since the entire file is loaded into memory for compression and decompression.
Are there any Camel APIs or Java libraries which will help zip/unzip the file without loading the entire file in memory?
There is a similar unanswered question here
Explanation
Use a different approach: stream the file.
That is, don't load it into memory completely, but read it a small buffer at a time and write each buffer out again as soon as it has been read.
To decompress, get an InputStream to the file and wrap a gzip input stream (such as GZIPInputStream) around it. Read from that, and write to an OutputStream.
Do the opposite if you want to compress: wrap the OutputStream in a gzip output stream (such as GZIPOutputStream) and write the plain bytes to it.
Code
The example uses Apache Commons Compress but the logic of the code remains the same for all libraries.
Unpacking a gz archive:
Path inputPath = Paths.get("archive.tar.gz");
Path outputPath = Paths.get("archive.tar");
final int bufferSize = 8192; // assumed value; the original snippet left it undefined
try (InputStream fin = Files.newInputStream(inputPath);
     GzipCompressorInputStream in = new GzipCompressorInputStream(
             new BufferedInputStream(fin));
     OutputStream out = Files.newOutputStream(outputPath)) {
    // Read and write one buffer at a time
    final byte[] buffer = new byte[bufferSize];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
Packing as gz archive:
Path inputPath = Paths.get("archive.tar");
Path outputPath = Paths.get("archive.tar.gz");
final int bufferSize = 8192; // assumed value; the original snippet left it undefined
try (InputStream in = Files.newInputStream(inputPath);
     OutputStream fout = Files.newOutputStream(outputPath);
     // Declared as a resource so that it is closed, which writes the gzip trailer
     GzipCompressorOutputStream out = new GzipCompressorOutputStream(
             new BufferedOutputStream(fout))) {
    // Read and write one buffer at a time
    final byte[] buffer = new byte[bufferSize];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
You could also wrap a BufferedReader and a PrintWriter around the streams if you feel more comfortable with them. They manage the buffering themselves, and you can read and write lines instead of bytes. Note that this only works correctly if the file is text made up of lines, not some other (binary) format.
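The line-based variant mentioned above could look roughly like the following sketch. It uses the JDK's own java.util.zip.GZIPInputStream and GZIPOutputStream rather than Commons Compress, and a BufferedWriter in place of PrintWriter, because PrintWriter swallows I/O errors instead of throwing them. UTF-8 and the names GzipLinesSketch, gzipLines, gunzipLines and copyLines are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipLinesSketch {

    /** Compresses text line by line; only one line is held in memory at a time. */
    static void gzipLines(InputStream plainIn, OutputStream gzOut) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(plainIn, StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                     new GZIPOutputStream(gzOut), StandardCharsets.UTF_8))) {
            copyLines(reader, writer);
        } // closing the writer also finishes the gzip stream (writes its trailer)
    }

    /** Decompresses gzip-compressed text line by line. */
    static void gunzipLines(InputStream gzIn, OutputStream plainOut) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(gzIn), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(plainOut, StandardCharsets.UTF_8))) {
            copyLines(reader, writer);
        }
    }

    /** Copies lines; every line, including the last, is terminated with '\n'. */
    private static void copyLines(BufferedReader reader, BufferedWriter writer)
            throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.write('\n');
        }
    }
}
```

Because readLine() strips line terminators, this normalises "\r\n" to "\n" and adds a final newline if the input lacked one, which is one more reason to keep the byte-buffer loop for anything that is not plain text.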

Transferring file in client-server program but all bytes not being transmitted

I am currently writing a client-server program that would allow me to upload a file from the client to the server. However, when I try this the file becomes corrupt and it appears not all the bytes are being transferred. Can someone tell me why this is happening? Thanks.
Here is part of the client code:
System.out.println("What file would you like to upload?");
String file=in.next();//get file name
outToServer.writeUTF(file);//send file name to server
File test= new File(file);//create file
byte[] bits = new byte[(int) test.length()]; //byte array to store file
FileInputStream fis= new FileInputStream(test); //read in file
//write bytes into array
int size=(int) test.length();//size of array
outToServer.write(size);//send size of array to Server
fis.read(bits);//read in byte values
fis.close();//close stream
outToServer.write(bits, 0, size);//writes bytes out to server
And here is the server code:
String filename= inFromClient.readUTF();//read in file name that is being uploaded
int size=inFromClient.read(); //read in size of file
byte[] bots=new byte[size]; //create array
inFromClient.read(bots); //read in bytes
FileOutputStream fos=new FileOutputStream(filename);
fos.write(bots);
fos.flush();
fos.close();
String complete="Upload Complete.";
outToClient.writeUTF(complete);
Try and use Java 7's Files.copy().
On the client side:
final Path source = Paths.get(file);
Files.copy(source, outToServer);
On the server side:
final Path destination = Paths.get(file);
Files.copy(inFromClient, destination);
See the javadoc for Files.
Usual mistake. You're assuming that read() fills the buffer. It isn't obliged to do that. See the Javadoc.
The canonical way to copy streams in Java is as follows:
while ((count = in.read(buffer)) > 0)
{
    out.write(buffer, 0, count);
}
Use this at both ends. You don't need a buffer the size of the file either. This works for any byte array with one or more elements.
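As a sketch of how that loop might be used at both ends while still letting the server know where the file stops, the example below sends the length first and then copies exactly that many bytes. It sends the length with writeLong because write(int) transmits only the low 8 bits of its argument, so a size sent that way (as in the question) arrives truncated. The names LengthPrefixedTransfer, send and receive and the 8192-byte buffer are assumptions for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class LengthPrefixedTransfer {

    /** Sender: writes an 8-byte length, then exactly size bytes read from source. */
    static void send(InputStream source, long size, DataOutputStream out) throws IOException {
        out.writeLong(size);
        byte[] buffer = new byte[8192];
        long remaining = size;
        while (remaining > 0) {
            // read() may return fewer bytes than requested, so use its return value
            int count = source.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (count == -1) {
                throw new EOFException("source ended before " + size + " bytes were read");
            }
            out.write(buffer, 0, count);
            remaining -= count;
        }
        out.flush();
    }

    /** Receiver: reads the 8-byte length, then copies exactly that many bytes to sink. */
    static void receive(DataInputStream in, OutputStream sink) throws IOException {
        long remaining = in.readLong();
        byte[] buffer = new byte[8192];
        while (remaining > 0) {
            int count = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (count == -1) {
                throw new EOFException("connection closed with " + remaining + " bytes missing");
            }
            sink.write(buffer, 0, count);
            remaining -= count;
        }
        sink.flush();
    }
}
```

Bounding each read by the remaining count leaves the stream positioned right after the file data, so later messages on the same connection (such as the "Upload Complete." reply) can still be read.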

How to read content of the Zipped file without extracting in java

I have files with names like ex.zip. In this example, the zip file contains only one file with the same name (i.e. ex.txt), which is quite large. I don't want to extract the zip file every time, so I need to read the content of the file (ex.txt) without extracting the zip file. I tried some code like the one below, but I can only read the name of the file into the variable.
How do I read the content of the file and store it in the variable?
Thank you in Advance
fis = new FileInputStream("C:/Documents and Settings/satheesh/Desktop/ex.zip");
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null) {
    i = i + 1;
    System.out.println(entry);
    System.out.println(i);
    //read from zis until available
}
One idea is to read the zip file as it is into a byte array and store it in a variable.
Later, when you need the content, you extract it on demand, which saves memory:
First read the content of the zip file into a byte array zipFileBytes.
If you have Java 7 or later:
Path path = Paths.get("path/to/file");
byte[] zipFileBytes = Files.readAllBytes(path);
Otherwise use the Apache Commons IO library:
byte[] zipFileBytes;
zipFileBytes = IOUtils.toByteArray(input); // input is an InputStream opened on the zip file
Now your zip file is stored in the variable zipFileBytes, still in compressed form.
Then, when you need to extract something, use
ByteArrayInputStream bis = new ByteArrayInputStream(zipFileBytes);
ZipInputStream zis = new ZipInputStream(bis);
Try this:
String zipFile = "ex.zip";
try (ZipFile zip = new ZipFile(zipFile)) {
    int i = 0;
    for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); ) {
        ZipEntry entry = e.nextElement();
        i = i + 1; // count the entries, as in the question
        System.out.println(entry);
        System.out.println(i);
        // Stream over the entry's uncompressed content; read it like any other InputStream
        InputStream in = zip.getInputStream(entry);
    }
}
For example, if the file contains text, and you want to print it as a String, you can read the InputStream like this: How do I read / convert an InputStream into a String in Java?
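A minimal sketch of that last step, assuming the entry is UTF-8 text and small enough to hold in memory as a String (for the "quite large" file in the question, processing the InputStream incrementally may be the better fit). ZipEntryReader and readEntryAsString are hypothetical names; the read loop is used instead of InputStream.readAllBytes() so that the sketch also works before Java 9.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipEntryReader {

    /** Returns the content of one named entry as text, without extracting it to disk. */
    static String readEntryAsString(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry named " + entryName + " in " + zipPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                // Loop until end of entry; a single read() may return only part of it.
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                // Assumption: the entry holds UTF-8 encoded text.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

With the file from the question, a call might look like readEntryAsString("ex.zip", "ex.txt").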
I think that in your case the fact that a zipfile is a container that can hold many files (and thus forces you to navigate to the right contained file each time you open it) seriously complicates things, as you state that each zipfile only contains one textfile. Maybe it's a lot easier to just gzip the text file (gzip is not a container, just a compressed version of your data). And it's very simple to use:
GZIPInputStream gis = new GZIPInputStream(new FileInputStream("file.txt.gz"));
// and a BufferedReader on top to comfortably read the file
BufferedReader in = new BufferedReader(new InputStreamReader(gis) );
Producing them is equally simple:
GZIPOutputStream gos = new GZIPOutputStream(new FileOutputStream("file.txt.gz"));

Help! unexpected java.lang.ArrayIndexOutOfBoundsException when using ByteArrayInputStream

I get a java.lang.ArrayIndexOutOfBoundsException when using ByteArrayInputStream.
First, I use a ZipInputStream to read through a zip file,
and while looping through the zipEntries,
I use a ByteArrayInputStream to capture the data of each zipEntry
using the
ZipInputStream.read(byte[] b) and ByteArrayInputStream(byte[] b) methods.
At the end, I have a total of 6 different ByteArrayInputStream objects containing data from 6 different zipEntries.
I then use OpenCSV to read through each of the ByteArrayInputStream.
I have no problem reading 4 of the 6 ByteArrayInputStream objects, which have byte sizes of less than 2000.
The other 2 ByteArrayInputStream objects have byte sizes of 2155 and 4010 respectively, and the CSVReader was only able to read part of these 2 objects before throwing a java.lang.ArrayIndexOutOfBoundsException.
This is the code I used to loop through the ZipInputStream
InputStream fileStream = attachment.getInputStream();
try {
    ZipInputStream zippy = new ZipInputStream(fileStream);
    ZipEntry entry = zippy.getNextEntry();
    ByteArrayInputStream courseData = null;
    while (entry != null) {
        String name = entry.getName();
        long size = entry.getSize();
        if (name.equals("course.csv")) {
            courseData = copyInputStream(zippy, (int)size);
        }
        //similar IF statements for 5 other ByteArrayInputStream objects
        entry = zippy.getNextEntry();
    }
    CourseDataManager.load(courseData);
} catch (Exception e) {
    e.printStackTrace();
}
The following is the code I use to copy the data from the ZipInputStream to the ByteArrayInputStream.
public ByteArrayInputStream copyInputStream(InputStream in, int size)
        throws IOException {
    byte[] buffer = new byte[size];
    in.read(buffer);
    ByteArrayInputStream b = new ByteArrayInputStream(buffer);
    return b;
}
Both pieces of OpenCSV code are able to read a few lines of data before throwing that exception, which leads me to believe that it is the byte array that is causing the problem. Is there anything I can do to work around this problem?
I am trying to make an application that accepts a zip file, while not storing any temporary files in the web app, as I am deploying to both google app engine and tomcat server.
Fixed!!! Thanks to Stephen C, I realized that read(byte[]) does not read everything, so I adjusted the code to make copyInputStream fully functional.
Since this looks like homework, here's a hint:
The read(byte[]) method returns the number of bytes read.
On what line do you get the error? And have you checked the value of size? I suspect it's 0
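The poster's adjusted code is not shown, so the following is only a sketch of one way copyInputStream could be written so that it reads everything: loop on read(byte[]) until it returns -1 and collect the bytes in a ByteArrayOutputStream. It deliberately drops the size parameter, on the assumption that the entry size may not be reliable (ZipEntry.getSize() returns -1 when the size is unknown). The wrapper class name CopySketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class CopySketch {

    /**
     * Copies everything up to end-of-stream into memory. For a ZipInputStream,
     * end-of-stream means the end of the current entry, so this reads one entry.
     */
    static ByteArrayInputStream copyInputStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int count;
        // read() reports how many bytes it actually delivered; keep going until -1.
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```

In the loop from the question, the call would become copyInputStream(zippy), and the stream is not closed here, so getNextEntry() can still be called afterwards.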
