Compressing and decompressing large data in Java

I need to compress/decompress different types of files contained in a folder; the folder might be more than 10-11 GB in size.
I used the following code, but it is taking a long time to compress the data.
BufferedReader in = new BufferedReader(new FileReader("D:/ziptest/expansion1.MPQ"));
BufferedOutputStream out = new BufferedOutputStream(
        new GZIPOutputStream(new FileOutputStream("test.gz")));
int c;
while ((c = in.read()) != -1)
    out.write(c);
in.close();
out.close();
Please suggest a fast compression and decompression library for Java. I also want to split the large file into parts, e.g. chunks of 100 MB each.

Reader/Writer are only for text; if you try to read binary data with them, it will get corrupted.
Instead I suggest you use FileInputStream. The fastest way to copy the data is to use your own buffer:
InputStream in = new FileInputStream("D:/ziptest/expansion1.MPQ");
OutputStream out = new GZIPOutputStream(
        new BufferedOutputStream(new FileOutputStream("test.gz")));
byte[] bytes = new byte[32 * 1024];
int len;
while ((len = in.read(bytes)) > 0)
    out.write(bytes, 0, len);
in.close();
out.close();
Since you are reading large chunks of bytes with your own buffer, it is more efficient not to use a BufferedInputStream/BufferedOutputStream around the file streams, as that just adds an extra copy. There is a BufferedOutputStream after the GZIPOutputStream because you cannot control the size of the data it produces.
BTW: If you are only ever reading the result back with Java, you can use DeflaterOutputStream; it is slightly faster and produces slightly smaller output, but the format is only supported by Java AFAIK.
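For example, a minimal sketch of that variant using the same copy loop as above (the output file name is just a placeholder):
InputStream in = new FileInputStream("D:/ziptest/expansion1.MPQ");
// DeflaterOutputStream writes raw DEFLATE data (no gzip header); read it back with an InflaterInputStream.
OutputStream out = new DeflaterOutputStream(
        new BufferedOutputStream(new FileOutputStream("test.deflate")));
byte[] bytes = new byte[32 * 1024];
int len;
while ((len = in.read(bytes)) > 0)
    out.write(bytes, 0, len);
in.close();
out.close();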

Related

Zip and Unzip a large file without loading the entire file in memory in Apache Camel

We are using Apache Camel for compressing and decompressing our files.
We use the standard .marshal().gzip() and .unmarshall().gzip() APIs.
Our problem is that when we get really large files, say 800 MB to more than 1 GB, our application runs out of memory, since the entire file is loaded into memory for compression and decompression.
Are there any Camel APIs or Java libraries that will help zip/unzip the file without loading the entire file in memory?
There is a similar unanswered question here
Explanation
Use a different approach: stream the file.
That is, don't load it into memory completely, but read it in small chunks and simultaneously write each chunk back out.
Get an InputStream to the file and wrap a gzip input stream around it. Read chunk by chunk and write to an OutputStream.
Do the opposite if you want to compress into an archive: wrap the OutputStream in a gzip output stream.
Code
The examples use Apache Commons Compress, but the logic of the code remains the same for all libraries.
Unpacking a gz archive:
Path inputPath = Paths.get("archive.tar.gz");
Path outputPath = Paths.get("archive.tar");
final int buffersize = 8192; // any reasonable chunk size
try (InputStream fin = Files.newInputStream(inputPath);
        OutputStream out = Files.newOutputStream(outputPath);
        GzipCompressorInputStream in = new GzipCompressorInputStream(
                new BufferedInputStream(fin))) {
    // Read and write chunk by chunk
    final byte[] buffer = new byte[buffersize];
    int n = 0;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
Packing as a gz archive:
Path inputPath = Paths.get("archive.tar");
Path outputPath = Paths.get("archive.tar.gz");
final int buffersize = 8192; // any reasonable chunk size
try (InputStream in = Files.newInputStream(inputPath);
        OutputStream fout = Files.newOutputStream(outputPath);
        GzipCompressorOutputStream out = new GzipCompressorOutputStream(
                new BufferedOutputStream(fout))) {
    // Read and write chunk by chunk
    final byte[] buffer = new byte[buffersize];
    int n = 0;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
You could also wrap a BufferedReader and PrintWriter around the streams if you feel more comfortable with them. They manage the buffering themselves and you can read and write lines instead of bytes. Note that this only works correctly if you are reading a file that consists of lines of text, not some other (binary) format.

Input stream reads large files very slowly, why?

I am trying to submit a 500 MB file.
I can load it but I want to improve the performance.
This is the slow code:
File dest = getDestinationFile(source, destination);
if (dest == null) return false;
in = new BufferedInputStream(new FileInputStream(source));
out = new BufferedOutputStream(new FileOutputStream(dest));
byte[] buffer = new byte[1024 * 20];
int i = 0;
// this while loop is very slow
while ((i = in.read(buffer)) != -1) {
    out.write(buffer, 0, i); // <-- SLOW HERE
    out.flush();
}
How can I find out why it is slow?
Isn't the byte array / buffer size sufficient?
Do you have any ideas to improve the performance?
Thanks in advance for any help.
You should not flush inside the loop.
You are using a BufferedOutputStream. This means that after "caching" some amount of data, it flushes the data to the file.
Your code kills performance by forcing a flush after writing each small amount of data.
Try it like this:
while ((i = in.read(buffer)) != -1) {
    out.write(buffer, 0, i);
}
out.flush();
..:: Edit: in response to the comment below ::..
In my opinion you should not use your own buffer at all here. You are using Buffered(Input/Output)Stream, which means they have their own internal buffer to read a "package" of data from disk and write a "package" of data back. I'm not 100% sure about the performance of an additional buffer, but I want to show you how I would do it:
File dest = getDestinationFile(source, destination);
if (dest == null) return false;
in = new BufferedInputStream(new FileInputStream(source));
out = new BufferedOutputStream(new FileOutputStream(dest));
int i;
while ((i = in.read()) != -1) {
    out.write(i);
}
out.flush();
In my version you read just one byte at a time (not an int; see the doc: http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read() — read() returns an int, but it contains just a single byte), so there is no need to read into a whole buffer of your own, and you don't need to worry about its size.
You should probably read more about streams to better understand how to work with them.

Java: is it possible to store raw data in source file?

OK, I know this is a bit of a weird question:
I'm writing a piece of Java code and need to load raw data (approx. 130,000 floating-point values).
This data never changes, and since I don't want to write different loading methods for PC and Android, I was thinking of embedding it into the source file as a float[].
Too bad, there seems to be a limit of 65535 entries; is there an efficient way to do it?
Store that data in a file in the classpath; then read that data as a ByteBuffer which you then "convert" to a FloatBuffer. Note that the below code assumes big endian:
final InputStream in = getClass().getResourceAsStream("/path/to/data");
final ByteArrayOutputStream out = new ByteArrayOutputStream();
final byte[] buf = new byte[8192];
int count;
try {
    while ((count = in.read(buf)) != -1)
        out.write(buf, 0, count);
} finally {
    out.close();
    in.close();
}
final FloatBuffer floats = ByteBuffer.wrap(out.toByteArray()).asFloatBuffer();
You can then .get() from the FloatBuffer.
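For example, a minimal sketch of reading the values back out of the floats buffer defined above:
// Bulk-copy all values into an array...
float[] values = new float[floats.remaining()];
floats.get(values);
// ...or read them one at a time: while (floats.hasRemaining()) { float f = floats.get(); ... }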
You could use 2 or 3 arrays to get around the limit, if that was your only problem with that approach.
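If you do keep the data in source, a minimal sketch of that workaround (the split points and values are placeholders):
// Each initializer stays under the per-method size limit; stitch the parts together at runtime.
static final float[] PART_1 = { 0.1f, 0.2f /* ... first half of the data ... */ };
static final float[] PART_2 = { 0.3f, 0.4f /* ... second half of the data ... */ };

static float[] allData() {
    float[] all = new float[PART_1.length + PART_2.length];
    System.arraycopy(PART_1, 0, all, 0, PART_1.length);
    System.arraycopy(PART_2, 0, all, PART_1.length, PART_2.length);
    return all;
}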

Buffered Input Stream does not load file correctly

I have the following code to download a list of files. After downloading, I compare the md5 of the online file with that of the downloaded one.
They match when the download size is lower than 1024 bytes. For anything over 1024 bytes there is a different md5 sum.
Now I don't know the reason. I think it depends on the array size of 1024 bytes? Maybe it writes the full 1024 bytes to the file every time, but then the question is, why does it work with files smaller than 1 KB?
String fileUrl = url_str;
URL url = new URL(fileUrl);
BufferedInputStream bufferedInputStream = new BufferedInputStream(url.openStream());
FileOutputStream fileOutputStream = new FileOutputStream(target);
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream, 1024);
byte data[] = new byte[1024];
while (bufferedInputStream.read(data, 0, 1024) > 0) {
    bufferedOutputStream.write(data);
}
bufferedOutputStream.close();
bufferedInputStream.close();
This is broken:
while (bufferedInputStream.read(data, 0, 1024) > 0) {
    bufferedOutputStream.write(data);
}
You're assuming that every read call fills up the entire buffer. You should use the return value of read:
int bytesRead;
while ((bytesRead = bufferedInputStream.read(data, 0, 1024)) > 0) {
    bufferedOutputStream.write(data, 0, bytesRead);
}
(Additionally, you should be closing all your streams in finally blocks, but that's another matter.)
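A minimal sketch of that point, using try-with-resources rather than explicit finally blocks (same url and target variables as in the question):
// try-with-resources closes both streams even if the copy throws.
try (BufferedInputStream in = new BufferedInputStream(url.openStream());
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
    byte[] data = new byte[1024];
    int bytesRead;
    while ((bytesRead = in.read(data, 0, 1024)) > 0) {
        out.write(data, 0, bytesRead);
    }
}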
After the first read, data[] already contains bytes. So during the last read the array holds the last n bytes plus leftover bytes from the previous read. You should check the return value of read: it indicates how many bytes were actually read into the array, and you should write out just that many bytes.

Sending big file using FileInputStream/ObjectOutputStream

I need help with my homework; any help will be much appreciated. I can send small files without a problem. But when I try to send, say, a 1 GB file, allocating the byte array throws an OutOfMemoryError, so I need a better solution to send the file from server to client. How can I improve this code to send big files? Please help me.
Server Code:
FileInputStream fis = new FileInputStream(file);
byte[] fileByte = new byte[fis.available()]; //This causes the problem.
bytesRead = fis.read(fileByte);
oos = new ObjectOutputStream(sock.getOutputStream());
oos.writeObject(fileByte);
Client Code:
ois = new ObjectInputStream(sock.getInputStream());
byte[] fileBytes = (byte[]) ois.readObject();
fos = new FileOutputStream(file);
fos.write(fileBytes);
Don't read the whole file into memory, use a small buffer and write while you are reading the file:
BufferedOutputStream bos = new BufferedOutputStream(sock.getOutputStream());
File file = new File("asd");
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
byte[] buffer = new byte[1024 * 1024 * 10];
int n = -1;
while ((n = bis.read(buffer)) != -1) {
    bos.write(buffer, 0, n);
}
Use the Buffered* streams to optimize reading from and writing to the underlying streams.
Just split the array into smaller chunks so that you don't need to allocate any big array.
For example, you could split it into 16 KB chunks, e.g. new byte[16384], and send them one by one. On the receiving side you would have to wait until a chunk can be fully read, store it somewhere, and then start with the next chunk.
But if you are not able to allocate a whole array of the size you need on the server side, you won't be able to hold all the data you are going to receive in memory anyway.
You could also compress the data before sending it to save bandwidth (and time); take a look at ZipOutputStream and ZipInputStream.
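For example, a minimal sketch of compressing on the fly with ZipOutputStream (the entry name and buffer size are just assumptions; the receiver would mirror this with ZipInputStream and getNextEntry()):
ZipOutputStream zos = new ZipOutputStream(sock.getOutputStream());
zos.putNextEntry(new ZipEntry(file.getName()));
byte[] buffer = new byte[16384];
int n;
try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
    while ((n = in.read(buffer)) > 0)
        zos.write(buffer, 0, n);
}
zos.closeEntry();
zos.finish(); // finish the zip stream without closing the socket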
Here's how I solved it:
Client Code:
bis = new BufferedInputStream(sock.getInputStream());
fos = new FileOutputStream(file);
int n;
byte[] buffer = new byte[8192];
while ((n = bis.read(buffer)) > 0) {
    fos.write(buffer, 0, n);
}
Server Code:
bos = new BufferedOutputStream(sock.getOutputStream());
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
int n = -1;
byte[] buffer = new byte[8192];
while ((n = bis.read(buffer)) > -1)
    bos.write(buffer, 0, n);
Depending on whether or not you have to write the code yourself, there are existing libraries which solve this problem, e.g. rmiio. If you are not using RMI, just plain Java serialization, you can use the DirectRemoteInputStream, which is kind of like a serializable InputStream. (This library also has support for things like automagically compressing the data.)
Actually, if you are only sending file data, you would be better off ditching the object streams and using DataInputStream/DataOutputStream. First write an integer (or long) indicating the file length, then copy the bytes directly to the stream. On the receiving side, read the file length, then read exactly that many bytes.
When you copy the data between streams, use a small, fixed-size byte[] to move chunks of data between the input and output streams in a loop. There are numerous examples of how to do this correctly available online (e.g. #ErikFWinter's answer).
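A minimal sketch of that length-prefixed approach (using a long length so files over 2 GB also work; the buffer size and the sock/file/target variable names are assumptions based on the snippets above):
// Sender: announce the file length, then stream the bytes.
DataOutputStream dos = new DataOutputStream(sock.getOutputStream());
dos.writeLong(file.length());
try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) > 0)
        dos.write(buffer, 0, n);
}
dos.flush();

// Receiver: read exactly that many bytes and no more.
DataInputStream dis = new DataInputStream(sock.getInputStream());
long remaining = dis.readLong();
try (OutputStream fos = new BufferedOutputStream(new FileOutputStream(target))) {
    byte[] buffer = new byte[8192];
    while (remaining > 0) {
        int n = dis.read(buffer, 0, (int) Math.min(buffer.length, remaining));
        if (n < 0) throw new EOFException("connection closed before the whole file arrived");
        fos.write(buffer, 0, n);
        remaining -= n;
    }
}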
