Compression / Decompression of Strings using the deflater - java

I want to compress/decompress and serialize/deserialize String content. I'm using the following two static functions.
/**
* Compress data based on the {@link Deflater}.
*
* @param pToCompress
* input byte-array
* @return compressed byte-array
* @throws NullPointerException
* if {@code pToCompress} is {@code null}
*/
public static byte[] compress(@Nonnull final byte[] pToCompress) {
checkNotNull(pToCompress);
// Compressed result.
byte[] compressed = new byte[] {};
// Create the compressor.
final Deflater compressor = new Deflater();
compressor.setLevel(Deflater.BEST_SPEED);
// Give the compressor the data to compress.
compressor.setInput(pToCompress);
compressor.finish();
/*
* Create an expandable byte array to hold the compressed data.
* You cannot use an array that's the same size as the original because
* there is no guarantee that the compressed data will be smaller than
* the uncompressed data.
*/
try (ByteArrayOutputStream bos = new ByteArrayOutputStream(pToCompress.length)) {
// Compress the data.
final byte[] buf = new byte[1024];
while (!compressor.finished()) {
final int count = compressor.deflate(buf);
bos.write(buf, 0, count);
}
// Get the compressed data.
compressed = bos.toByteArray();
} catch (final IOException e) {
LOGWRAPPER.error(e.getMessage(), e);
throw new RuntimeException(e);
}
return compressed;
}
/**
* Decompress data based on the {@link Inflater}.
*
* @param pCompressed
* compressed byte-array
* @return decompressed byte-array
* @throws NullPointerException
* if {@code pCompressed} is {@code null}
*/
public static byte[] decompress(@Nonnull final byte[] pCompressed) {
checkNotNull(pCompressed);
// Create the decompressor and give it the data to decompress.
final Inflater decompressor = new Inflater();
decompressor.setInput(pCompressed);
byte[] decompressed = new byte[] {};
// Create an expandable byte array to hold the decompressed data.
try (final ByteArrayOutputStream bos = new ByteArrayOutputStream(pCompressed.length)) {
// Decompress the data.
final byte[] buf = new byte[1024];
while (!decompressor.finished()) {
try {
final int count = decompressor.inflate(buf);
bos.write(buf, 0, count);
} catch (final DataFormatException e) {
LOGWRAPPER.error(e.getMessage(), e);
throw new RuntimeException(e);
}
}
// Get the decompressed data.
decompressed = bos.toByteArray();
} catch (final IOException e) {
LOGWRAPPER.error(e.getMessage(), e);
}
return decompressed;
}
Yet compared to non-compressed values it is orders of magnitude slower, even though I cache the decompressed result and only decompress values when the content is really needed.
The code is used for a DOM-like persistable tree structure, and XPath queries that force decompression of the String values are about 50 times slower, if not more (not really benchmarked, I just ran the unit tests). My laptop even freezes after some unit tests (every time; I checked about five times), because Eclipse stops responding due to heavy disk I/O and whatnot. I've already set the compression level to Deflater.BEST_SPEED; other compression levels might be better, so maybe I'll provide a configuration option that can be set per resource. Maybe I've messed something up, as I haven't used the Deflater before. I'm also only compressing content where the String length is > 10.
Edit: After extracting the Deflater instantiation to a static field, it seems that creating an instance of Deflater and Inflater is very costly: the performance bottleneck is gone, and without microbenchmarks or the like I can't see any performance loss :-) I'm just resetting the deflater/inflater before using a new input.
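The reuse described in the edit can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class name `ReusableDeflate` is hypothetical, and the `synchronized` keyword is an assumption added here because a shared Deflater or Inflater is not safe to use from several threads at once.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: the name and the locking strategy are assumptions.
final class ReusableDeflate {
    // Created once; reset() makes each instance ready for the next input.
    private static final Deflater DEFLATER = new Deflater(Deflater.BEST_SPEED);
    private static final Inflater INFLATER = new Inflater();

    static synchronized byte[] compress(byte[] input) {
        DEFLATER.reset();
        DEFLATER.setInput(input);
        DEFLATER.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!DEFLATER.finished()) {
            int count = DEFLATER.deflate(buf);
            bos.write(buf, 0, count);
        }
        return bos.toByteArray();
    }

    static synchronized byte[] decompress(byte[] compressed) throws DataFormatException {
        INFLATER.reset();
        INFLATER.setInput(compressed);
        ByteArrayOutputStream bos = new ByteArrayOutputStream(compressed.length);
        byte[] buf = new byte[1024];
        while (!INFLATER.finished()) {
            int count = INFLATER.inflate(buf);
            // Guard against looping forever on truncated input.
            if (count == 0 && (INFLATER.needsInput() || INFLATER.needsDictionary())) {
                throw new DataFormatException("truncated or dictionary-dependent input");
            }
            bos.write(buf, 0, count);
        }
        return bos.toByteArray();
    }
}
```

Whether this removes a bottleneck in a given application still has to be measured; the sketch only shows the reset-and-reuse pattern itself.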

Have you considered using a higher-level API such as GZIP?
Here is an example for compressing:
public static byte[] compressToByte(final String data, final String encoding)
throws IOException
{
if (data == null || data.length() == 0)
{
return null;
}
else
{
byte[] bytes = data.getBytes(encoding);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
GZIPOutputStream os = new GZIPOutputStream(baos);
os.write(bytes, 0, bytes.length);
os.close();
byte[] result = baos.toByteArray();
return result;
}
}
Here is an example for uncompressing:
public static String unCompressString(final byte[] data, final String encoding)
throws IOException
{
if (data == null || data.length == 0)
{
return null;
}
else
{
ByteArrayInputStream bais = new ByteArrayInputStream(data);
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
GZIPInputStream is = new GZIPInputStream(bais);
byte[] tmp = new byte[256];
while (true)
{
int r = is.read(tmp);
if (r < 0)
{
break;
}
buffer.write(tmp, 0, r);
}
is.close();
byte[] content = buffer.toByteArray();
return new String(content, 0, content.length, encoding);
}
}
We get very good performance and compression ratio with this.
The zip api is also an option.

Your comments are the correct answer.
In general, if a method is going to be used frequently, you want to eliminate any allocations and copying of data. This often means moving instance initialization and other setup either to static variables or to the constructor.
Using statics is easier, but you may run into lifetime issues (as in how do you know when to clean up the statics - do they exist forever?).
Doing the setup and initialization in the constructor allows the user of the class to determine the lifetime of the object and clean up appropriately. You could instantiate it once before going into a processing loop and GC it after exiting.
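The constructor-based variant might look like the sketch below. The class name `ScopedCompressor` is hypothetical; the idea is simply that the caller decides how long the Deflater lives, and `close()` releases it through `Deflater.end()` instead of waiting for garbage collection.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper: owns one Deflater for as long as the caller keeps it open.
final class ScopedCompressor implements AutoCloseable {
    private final Deflater deflater;
    private final byte[] buf = new byte[1024];

    public ScopedCompressor(int level) {
        this.deflater = new Deflater(level); // expensive setup happens once, here
    }

    public byte[] compress(byte[] input) {
        deflater.reset();                    // forget the previous input
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        return bos.toByteArray();
    }

    @Override
    public void close() {
        deflater.end();                      // release the native zlib resources now
    }
}
```

A caller would typically open it with try-with-resources around the processing loop, for example `try (ScopedCompressor c = new ScopedCompressor(Deflater.BEST_SPEED)) { ... }`, so cleanup happens when the loop exits. Instances are not thread-safe in this sketch.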

Related

Java byte array compression

I'm trying to use the java DeflaterOutputStream and InflaterOutputStream classes to compress a byte array, but both appear to not be working correctly. I assume I'm incorrectly implementing them.
public static byte[] compress(byte[] in) {
try {
ByteArrayOutputStream out = new ByteArrayOutputStream();
DeflaterOutputStream defl = new DeflaterOutputStream(out);
defl.write(in);
defl.flush();
defl.close();
return out.toByteArray();
} catch (Exception e) {
e.printStackTrace();
System.exit(150);
return null;
}
}
public static byte[] decompress(byte[] in) {
try {
ByteArrayOutputStream out = new ByteArrayOutputStream();
InflaterOutputStream infl = new InflaterOutputStream(out);
infl.write(in);
infl.flush();
infl.close();
return out.toByteArray();
} catch (Exception e) {
e.printStackTrace();
System.exit(150);
return null;
}
}
Here are the two methods I'm using to compress and decompress the byte array. Most implementations I've seen online use a fixed-size buffer array for the decompression portion, but I'd prefer to avoid that if possible, because I'd need to make that buffer array have a size of one if I want to have any significant compression.
If anyone can explain to me what I'm doing wrong it would be appreciated. Also, to explain why I know these methods aren't working correctly: The "compressed" byte array that it outputs is always larger than the uncompressed one, no matter what size byte array I attempt to provide it.
This will depend on the data you are compressing. For example if we take an array of 0 bytes it compresses well:
byte[] plain = new byte[10000];
byte[] compressed = compress(plain);
System.out.println(compressed.length); // 33
byte[] result = decompress(compressed);
System.out.println(result.length); // 10000
Compression always has overhead to allow for future decompression. If the compression produces no reduction in length (the data was unique or nearly unique), then the output can be longer than the input.
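To see the other side of this, the sketch below compresses 10,000 pseudo-random bytes next to 10,000 zero bytes. The class name `OverheadDemo` and the fixed seed are arbitrary choices for illustration. The random input should come out slightly larger than it went in, since there is nothing to shorten and the stream header and checksum still have to be written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;

// Illustration only: compares compressible and incompressible input.
final class OverheadDemo {
    static byte[] deflate(byte[] in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the stream finishes the deflate stream and flushes everything.
        try (DeflaterOutputStream defl = new DeflaterOutputStream(out)) {
            defl.write(in);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zeros = new byte[10000];
        byte[] random = new byte[10000];
        new Random(42).nextBytes(random); // fixed seed, arbitrary value

        System.out.println("zeros:  " + zeros.length + " -> " + deflate(zeros).length);
        System.out.println("random: " + random.length + " -> " + deflate(random).length);
    }
}
```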

Decompressing byte[] using LZ4

I am using LZ4 to compress and decompress a string. I have tried the following:
public class CompressionDemo {
public static byte[] compressLZ4(LZ4Factory factory, String data) throws IOException {
final int decompressedLength = data.getBytes().length;
LZ4Compressor compressor = factory.fastCompressor();
int maxCompressedLength = compressor.maxCompressedLength(decompressedLength);
byte[] compressed = new byte[maxCompressedLength];
compressor.compress(data.getBytes(), 0, decompressedLength, compressed, 0, maxCompressedLength);
return compressed;
}
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
LZ4FastDecompressor decompressor = factory.fastDecompressor();
byte[] restored = new byte[data.length];
decompressor.decompress(data,0,restored, 0,data.length);
return new String(restored);
}
public static void main(String[] args) throws IOException, DataFormatException {
String string = "kjshfhshfashfhsakjfhksjafhkjsafhkjashfkjhfjkfhhjdshfhhjdfhdsjkfhdshfdskjfhksjdfhskjdhfkjsdhfk";
LZ4Factory factory = LZ4Factory.fastestInstance();
byte[] arr = compressLZ4(factory, string);
System.out.println(arr.length);
System.out.println(deCompressLZ4(factory, arr) + "decom");
}
}
It gives the following exception:
Exception in thread "main" net.jpountz.lz4.LZ4Exception: Error decoding offset 92 of input buffer
The problem is that decompression works only if I pass the actual length of the String's byte[], i.e.
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
LZ4FastDecompressor decompressor = factory.fastDecompressor();
byte[] restored = new byte[data.length];
decompressor.decompress(data,0,restored, 0,"kjshfhshfashfhsakjfhksjafhkjsafhkjashfkjhfjkfhhjdshfhhjdfhdsjkfhdshfdskjfhksjdfhskjdhfkjsdhfk".getBytes().length);
return new String(restored);
}
It expects the actual size of the string's byte[].
Can someone help me with this?
Since compression and decompression may happen on different machines, and the machine's default character encoding may not be one of the Unicode formats, you should specify the encoding explicitly.
Beyond that, use the actual compressed and decompressed lengths, and store the size of the uncompressed data as well, in plain form, so that it can be read before decompressing.
public static byte[] compressLZ4(LZ4Factory factory, String data) throws IOException {
byte[] decompressed = data.getBytes(StandardCharsets.UTF_8);
LZ4Compressor compressor = factory.fastCompressor();
int maxCompressedLength = compressor.maxCompressedLength(decompressed.length);
// Reserve 4 leading bytes for the uncompressed length.
byte[] compressed = new byte[4 + maxCompressedLength];
int compressedSize = compressor.compress(decompressed, 0, decompressed.length,
compressed, 4, maxCompressedLength);
ByteBuffer.wrap(compressed).putInt(decompressed.length);
// Trim the buffer to the bytes actually used.
return Arrays.copyOf(compressed, 4 + compressedSize);
}
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
LZ4FastDecompressor decompressor = factory.fastDecompressor();
// Read the uncompressed length stored in the first 4 bytes.
int decompressedLength = ByteBuffer.wrap(data).getInt();
byte[] restored = new byte[decompressedLength];
decompressor.decompress(data, 4, restored, 0, decompressedLength);
return new String(restored, StandardCharsets.UTF_8);
}
It should be noted that String is not suited for binary data, and this compression/decompression is for text handling only. (A String contains Unicode text in the form of UTF-16 two-byte chars. Converting it to binary data always involves an encoding step, which costs memory and speed and risks data corruption.)
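The length-prefix idea itself does not depend on LZ4. The sketch below shows the same frame layout using `java.util.zip` as a stand-in compressor, so that it runs without the lz4-java dependency; it is not LZ4 code. The class name `LengthPrefixedFrame` and the method names `pack` and `unpack` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Frame layout: [4-byte big-endian uncompressed length][compressed bytes].
// Deflater/Inflater stand in for the LZ4 compressor and decompressor.
final class LengthPrefixedFrame {
    static byte[] pack(String text) {
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header first: the size the reader must allocate.
        out.write(ByteBuffer.allocate(4).putInt(plain.length).array(), 0, 4);

        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static String unpack(byte[] frame) throws DataFormatException {
        // The stored length tells us exactly how large the output buffer must be.
        int plainLength = ByteBuffer.wrap(frame).getInt();
        byte[] plain = new byte[plainLength];

        Inflater inflater = new Inflater();
        inflater.setInput(frame, 4, frame.length - 4);
        int n = inflater.inflate(plain);
        inflater.end();
        if (n != plainLength) {
            throw new DataFormatException("expected " + plainLength + " bytes, got " + n);
        }
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```

Both sides name the charset explicitly, so the round trip does not depend on the platform default encoding.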
I just faced the same error on Android and resolved it based on issue below:
https://github.com/lz4/lz4-java/issues/68
In short make sure you are using the same factory for both operations (compression + decompression) and use Arrays.copyOf() as below:
byte[] compress(final byte[] data) {
LZ4Factory lz4Factory = LZ4Factory.safeInstance();
LZ4Compressor fastCompressor = lz4Factory.fastCompressor();
int maxCompressedLength = fastCompressor.maxCompressedLength(data.length);
byte[] comp = new byte[maxCompressedLength];
int compressedLength = fastCompressor.compress(data, 0, data.length, comp, 0, maxCompressedLength);
return Arrays.copyOf(comp, compressedLength);
}
byte[] decompress(final byte[] compressed) {
LZ4Factory lz4Factory = LZ4Factory.safeInstance();
LZ4SafeDecompressor decompressor = lz4Factory.safeDecompressor();
byte[] decomp = new byte[compressed.length * 4];//you might need to allocate more
decomp = decompressor.decompress(Arrays.copyOf(compressed, compressed.length), decomp.length);
return decomp;
}
Hope this will help.
The restored byte[] is too small: you should not size it with the compressed data.length. Use data.length * 3 or more instead.
I resolved it like this:
public static byte[] decompress( byte[] finalCompressedArray,String ... extInfo) {
int len = finalCompressedArray.length * 3;
int i = 5;
while (i > 0) {
try {
return decompress(finalCompressedArray, len);
} catch (Exception e) {
len = len * 2;
i--;
if (LOGGER.isInfoEnabled()) {
LOGGER.info("decompress Error: extInfo ={} ", extInfo, e);
}
}
}
throw new ItemException(1, "decompress error");
}
/**
* Decompress an array.
*
* @param finalCompressedArray the compressed data
* @param length length of the original data; must be exact, neither larger nor smaller
* @return the decompressed data
*/
private static byte[] decompress(byte[] finalCompressedArray, int length) {
byte[] desc = new byte[length ];
int decompressLen = decompressor.decompress(finalCompressedArray, desc);
byte[] result = new byte[decompressLen];
System.arraycopy(desc,0,result,0,decompressLen);
return result;
}

Why is the gzip compressed buffer size greater than the uncompressed buffer?

I'm trying to write a compression utility class.
But during testing, I found that the result is larger than the original buffer.
Is my code right?
Please see the code:
/**
* This class provide compress ability
* <p>
* Support:
* <li>GZIP
* <li>Deflate
*/
public class CompressUtils {
final public static int DEFAULT_BUFFER_SIZE = 4096; // Compress/Decompress buffer is 4K
/**
* GZIP Compress
*
* @param data The data to be compressed
* @return The compressed data
* @throws IOException
*/
public static byte[] gzipCompress(byte[] data) throws IOException {
Validate.isTrue(ArrayUtils.isNotEmpty(data));
ByteArrayInputStream bis = new ByteArrayInputStream(data);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try {
gzipCompress(bis, bos);
bos.flush();
return bos.toByteArray();
} finally {
bis.close();
bos.close();
}
}
/**
* GZIP Decompress
*
* @param data The data to be decompressed
* @return The decompressed data
* @throws IOException
*/
public static byte[] gzipDecompress(byte[] data) throws IOException {
Validate.isTrue(ArrayUtils.isNotEmpty(data));
ByteArrayInputStream bis = new ByteArrayInputStream(data);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try {
gzipDecompress(bis, bos);
bos.flush();
return bos.toByteArray();
} finally {
bis.close();
bos.close();
}
}
/**
* GZIP Compress
*
* @param is The input stream to be compressed
* @param os The compressed result
* @throws IOException
*/
public static void gzipCompress(InputStream is, OutputStream os) throws IOException {
GZIPOutputStream gos = null;
byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
int count = 0;
try {
gos = new GZIPOutputStream(os);
while ((count = is.read(buffer)) != -1) {
gos.write(buffer, 0, count);
}
gos.finish();
gos.flush();
} finally {
if (gos != null) {
gos.close();
}
}
}
/**
* GZIP Decompress
*
* @param is The input stream to be decompressed
* @param os The decompressed result
* @throws IOException
*/
public static void gzipDecompress(InputStream is, OutputStream os) throws IOException {
GZIPInputStream gis = null;
int count = 0;
byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
try {
gis = new GZIPInputStream(is);
while ((count = gis.read(buffer)) != -1) {
os.write(buffer, 0, count);
}
} finally {
if (gis != null) {
gis.close();
}
}
}
}
And here's the test code:
public class CompressUtilsTest {
private Random random = new Random();
@Test
public void gzipTest() throws IOException {
byte[] buffer = new byte[1023];
random.nextBytes(buffer);
System.out.println("Orignal: " + Hex.encodeHexString(buffer));
byte[] result = CompressUtils.gzipCompress(buffer);
System.out.println("Compressed: " + Hex.encodeHexString(result));
byte[] decompressed = CompressUtils.gzipDecompress(result);
System.out.println("DeCompressed: " + Hex.encodeHexString(decompressed));
Assert.assertArrayEquals(buffer, decompressed);
}
}
And the result is:
original is 1023 bytes long
compressed is 1036 bytes long
How does this happen?
In your test you initialize the buffer with a set of random characters.
GZIP consists of two parts:
1) LZ77 compression
2) Encoding using a Huffman code
The former relies heavily on repeated sequences in the input. Basically it says something like: "The next 10 characters are the same as the 10 characters starting at index X".
In your case there are (possibly) no such repeated sequences, thus no compression by the first algorithm.
The Huffman encoding on the other hand should work, but in total the GZIP overhead (storing the used Huffman encoding, e.g.) outweighs the advantages of compressing the input.
If you test your algorithm with real files, you will get some meaningful results.
Best results are usually acquired when trying to compress structured files like XML.
It's because compression generally works well on medium to large inputs (1023 bytes is quite small); moreover, it works best on data that contains repeated patterns, not on random data.
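Both effects can be observed with a small experiment. The sketch below (the class name `GzipSizeDemo` and the generated XML-like input are made up for illustration) prints the gzip size of an empty input, which is pure container overhead, and of a repetitive structured input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: fixed gzip overhead versus gains on repetitive input.
final class GzipSizeDemo {
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive, XML-like input: many back-references are possible.
        StringBuilder xml = new StringBuilder("<items>");
        for (int i = 0; i < 200; i++) {
            xml.append("<item id=\"").append(i).append("\">value</item>");
        }
        xml.append("</items>");
        byte[] structured = xml.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("empty input:      " + gzipSize(new byte[0]) + " bytes");
        System.out.println("structured input: " + structured.length
                + " -> " + gzipSize(structured) + " bytes");
    }
}
```

The gzip format adds a 10-byte header and an 8-byte trailer to every stream, which is why very small or incompressible inputs can grow.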

Decompressed video file is not working in Java

Basically, I compress a video using a custom compressor class in Java. I have assembled my complete code here. My actual problem is that the video [A.mp4] generated from the decompressed byte array does not play. I got this compressor class code from the internet. As I am new to the Java platform, I am struggling to resolve this problem. Could anyone please help me with this?
public class CompressionTest
{
public static void main(String[] args)
{
Compressor compressor = new Compressor();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
FileInputStream fis=null;
File file=null;
try
{
URL uri=CompressionTest.class.getResource("/Files/Video.mp4");
file=new File(uri.getPath());
fis = new FileInputStream(file);
}
catch ( FileNotFoundException fnfe )
{
System.out.println( "Unable to open input file");
}
try
{
byte[] videoBytes = getBytesFromFile(file);
System.out.println("CompressionVideoToCompress is: '" +videoBytes + "'");
byte[] bytesCompressed = compressor.compress(videoBytes);
System.out.println("bytesCompressed is: '" +bytesCompressed+ "'");
byte[] bytesDecompressed=compressor.decompress(bytesCompressed);
System.out.println("bytesDecompressed is: '" +bytesDecompressed+ "'");
FileOutputStream out = new FileOutputStream("A.mp4");
out.write(bytesDecompressed,0,bytesDecompressed.length-1);
out.close();
}
catch (IOException e)
{
// TODO Auto-generated catch block
System.out.println("bytesCompressed is: '");
}
}
public static byte[] getBytesFromFile(File file) throws IOException
{
InputStream is = new FileInputStream(file);
// Get the size of the file
long length = file.length();
// You cannot create an array using a long type.
// It needs to be an int type.
// Before converting to an int type, check
// to ensure that file is not larger than Integer.MAX_VALUE.
if (length > Integer.MAX_VALUE) {
// File is too large
}
// Create the byte array to hold the data
byte[] bytes = new byte[1064];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0)
{
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}
// Close the input stream and return bytes
is.close();
return bytes;
}
}
class Compressor
{
public Compressor()
{}
public byte[] compress(byte[] bytesToCompress)
{
Deflater deflater = new Deflater();
deflater.setInput(bytesToCompress);
deflater.finish();
byte[] bytesCompressed = new byte[Short.MAX_VALUE];
int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
byte[] returnValues = new byte[numberOfBytesAfterCompression];
System.arraycopy
(
bytesCompressed,
0,
returnValues,
0,
numberOfBytesAfterCompression
);
return returnValues;
}
public byte[] decompress(byte[] bytesToDecompress)
{
Inflater inflater = new Inflater();
int numberOfBytesToDecompress = bytesToDecompress.length;
inflater.setInput
(
bytesToDecompress,
0,
numberOfBytesToDecompress
);
int compressionFactorMaxLikely = 3;
int bufferSizeInBytes =
numberOfBytesToDecompress
* compressionFactorMaxLikely;
byte[] bytesDecompressed = new byte[bufferSizeInBytes];
byte[] returnValues = null;
try
{
int numberOfBytesAfterDecompression = inflater.inflate(bytesDecompressed);
returnValues = new byte[numberOfBytesAfterDecompression];
System.arraycopy
(
bytesDecompressed,
0,
returnValues,
0,
numberOfBytesAfterDecompression
);
}
catch (DataFormatException dfe)
{
dfe.printStackTrace();
}
inflater.end();
return returnValues;
}
}
I've tested your code by compressing and decompressing a simple TXT file. The code is broken, since the compressed file, when uncompressed, is different from the original one.
Take for granted that the code is broken at least in the getBytesFromFile function. Its logic is tricky and troublesome, since it only allows files up to length 1064 and the check (throwing IOException when a longer file is read) does not work at all. The file gets read only partially and no exception is thrown.
What you are trying to achieve (file compression/decompression) can be done this way. I've tested it and it works; you just need the Apache Commons IO library.
import java.io.*;
import java.util.zip.*;
import org.apache.commons.io.IOUtils; // <-- get this from http://commons.apache.org/io/index.html
public class CompressionTest2 {
public static void main(String[] args) throws IOException {
File input = new File("input.txt");
File output = new File("output.bin");
Compression.compress(input, output);
File input2 = new File("input2.txt");
Compression.decompress(output, input2);
// At this point, input.txt and input2.txt should be equal
}
}
class Compression {
public static void compress(File input, File output) throws IOException {
FileInputStream fis = new FileInputStream(input);
FileOutputStream fos = new FileOutputStream(output);
GZIPOutputStream gzipStream = new GZIPOutputStream(fos);
IOUtils.copy(fis, gzipStream);
gzipStream.close();
fis.close();
fos.close();
}
public static void decompress(File input, File output) throws IOException {
FileInputStream fis = new FileInputStream(input);
FileOutputStream fos = new FileOutputStream(output);
GZIPInputStream gzipStream = new GZIPInputStream(fis);
IOUtils.copy(gzipStream, fos);
gzipStream.close();
fis.close();
fos.close();
}
}
This code doesn't come from "credible and/or official sources" but at least it works. :)
Moreover, to get more answers, adjust the title to state your real problem: your compressed files don't decompress correctly. There is no 'video' stuff here. Also, zipping a .mp4 file achieves little, since the format is already compressed (the output will likely be around 99.99% of the original size).
Two tips:
1) Replace getBytesFromFile with a well-known API call, either from Apache Commons (IOUtils) or, since Java 7, the JDK's own Files.readAllBytes.
2) Test compress and decompress by writing a JUnit test:
Create a large random byte array, write it out, read it back and compare it with the original.
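Both tips can be combined in a few lines. The following is a minimal sketch rather than a full JUnit test: the class name `RoundTripCheck` and its method names are hypothetical, and GZIP is used for the compression step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for checking that compression is lossless on a given file.
final class RoundTripCheck {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gis.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    /** Reads the whole file, compresses, decompresses, and compares. */
    static boolean survivesRoundTrip(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file); // replaces a hand-written read loop
        return Arrays.equals(original, gunzip(gzip(original)));
    }
}
```

In a unit test, the input file could be a temporary file filled with `Random.nextBytes`, with the result of `survivesRoundTrip` asserted to be true.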

Iterable gzip deflate/inflate in Java

Is there a library for gzip-deflating in terms of ByteBuffers hidden in the Internet? Something which allows us to push raw data then pull deflated data? We have searched for it but found only libraries which deal with InputStreams and OutputStreams.
We are tasked with creating gzip filters for deflating a flow of ByteBuffers in a pipeline architecture. This is a pull architecture where the last element pulls data from earlier elements. Our gzip filter deals with a flow of ByteBuffers, there is no single Stream object available.
We have toyed with adapting the data flow as some kind of InputStream and then using GZIPOutputStream to satisfy our requirements, but the amount of adaptor code is annoying to say the least.
Post-accept edit: for the record, our architecture is similar to that of GStreamer and the likes.
I don't understand the "hidden in the internet" part, but zlib does in-memory gzip format compression and decompression. The java.util.zip API provides some access to zlib, though it is limited. Due to the interface limitations, you cannot request that zlib produce and consume gzip streams directly. You can however use the nowrap option to produce and consume raw deflate data. Then it's easy to roll your own gzip header and trailer, using the CRC32 class in java.util.zip. You can prepend a fixed 10-byte header, append the four-byte CRC and then the four-byte uncompressed length (modulo 2^32), both in little-endian order, and you're good to go.
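For a single in-memory buffer, that recipe can be sketched as below. The class name `ManualGzip` is hypothetical, and this one-shot version is only meant to show the byte layout; a pipeline would feed the Deflater incrementally instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Hypothetical one-shot helper: gzip container built by hand around raw deflate data.
final class ManualGzip {
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Fixed 10-byte header: magic 1f 8b, method 8 (deflate), then flags,
        // modification time, extra flags and OS all left at zero.
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0}, 0, 10);

        // nowrap = true: raw deflate data without the zlib header and checksum.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();

        // Trailer: CRC-32 of the uncompressed data, then its length, little-endian.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, data.length);
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }
}
```

A convenient way to check such output is to read it back through `java.util.zip.GZIPInputStream`, which validates both trailer fields.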
Much credit to Mark Adler for suggesting this approach, which is much better than my original answer.
package stack;
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
public class BufferDeflate2 {
/** The standard 10 byte GZIP header */
private static final byte[] GZIP_HEADER = new byte[] { 0x1f, (byte) 0x8b,
Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
/** CRC-32 of uncompressed data. */
private final CRC32 crc = new CRC32();
/** Deflater to deflate data */
private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION,
true);
/** Output buffer building area */
private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
/** Internal transfer space */
private final byte[] transfer = new byte[1000];
/** The flush mode to use at the end of each buffer */
private final int flushMode;
/**
* New buffer deflater
*
* @param syncFlush
* if true, all data in buffer can be immediately decompressed
* from output buffer
*/
public BufferDeflate2(boolean syncFlush) {
flushMode = syncFlush ? Deflater.SYNC_FLUSH : Deflater.NO_FLUSH;
buffer.write(GZIP_HEADER, 0, GZIP_HEADER.length);
}
/**
* Deflate the buffer
*
* @param in
* the buffer to deflate
* @return deflated representation of the buffer
*/
public ByteBuffer deflate(ByteBuffer in) {
// convert buffer to bytes
byte[] inBytes;
int off = in.position();
int len = in.remaining();
if( in.hasArray() ) {
inBytes = in.array();
} else {
off = 0;
inBytes = new byte[len];
in.get(inBytes);
}
// update CRC and deflater
crc.update(inBytes, off, len);
deflater.setInput(inBytes, off, len);
while( !deflater.needsInput() ) {
int r = deflater.deflate(transfer, 0, transfer.length, flushMode);
buffer.write(transfer, 0, r);
}
byte[] outBytes = buffer.toByteArray();
buffer.reset();
return ByteBuffer.wrap(outBytes);
}
/**
* Write the final buffer. This writes any remaining compressed data and the GZIP trailer.
* @return the final buffer
*/
public ByteBuffer doFinal() {
// finish deflating
deflater.finish();
// write all remaining data
int r;
do {
r = deflater.deflate(transfer, 0, transfer.length,
Deflater.FULL_FLUSH);
buffer.write(transfer, 0, r);
} while( r == transfer.length );
// write GZIP trailer
writeInt((int) crc.getValue());
writeInt((int) deflater.getBytesRead());
// reset deflater
deflater.reset();
// final output
byte[] outBytes = buffer.toByteArray();
buffer.reset();
return ByteBuffer.wrap(outBytes);
}
/**
* Write a 32 bit value in little-endian order
*
* @param v
* the value to write
*/
private void writeInt(int v) {
System.out.println("v="+v);
buffer.write(v & 0xff);
buffer.write((v >> 8) & 0xff);
buffer.write((v >> 16) & 0xff);
buffer.write((v >> 24) & 0xff);
}
/**
* For testing. Pass in the name of a file to GZIP compress
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
File inFile = new File(args[0]);
File outFile = new File(args[0]+".test.gz");
FileChannel inChan = (new FileInputStream(inFile)).getChannel();
FileChannel outChan = (new FileOutputStream(outFile)).getChannel();
BufferDeflate2 def = new BufferDeflate2(false);
ByteBuffer buf = ByteBuffer.allocate(500);
while( true ) {
buf.clear();
int r = inChan.read(buf);
if( r==-1 ) break;
buf.flip();
ByteBuffer compBuf = def.deflate(buf);
outChan.write(compBuf);
}
ByteBuffer compBuf = def.doFinal();
outChan.write(compBuf);
inChan.close();
outChan.close();
}
}
Processing ByteBuffers is not hard. See my sample code below. You need to know how the buffers are created. The options are:
1) Each buffer is compressed independently. This is so simple to handle that I assume it is not the case. You would just transform the buffer into a byte array and wrap it in a ByteArrayInputStream within a GZIPInputStream.
2) Each buffer was ended with a SYNC_FLUSH by the writer, and thus comprises an entire block of data within a stream. All the data written by the writer to the buffer can be read immediately by the reader.
3) Each buffer is just part of a GZIP stream. There is no guarantee the reader can read anything from the buffer.
Data generated by GZIP must be processed in order. The ByteBuffers will have to be processed in the same order they are generated.
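The SYNC_FLUSH case can be demonstrated without any streams or threads. The sketch below is a simplified illustration: the class name `SyncFlushChunks` is hypothetical, and it uses the plain zlib format of Deflater and Inflater rather than a full GZIP container. Each chunk written with SYNC_FLUSH can be inflated completely as soon as it arrives, provided the chunks are processed in order by the same Inflater.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical pairing of a writer and a reader that share one continuous stream.
final class SyncFlushChunks {
    private final Deflater deflater = new Deflater();
    private final Inflater inflater = new Inflater();
    private final byte[] buf = new byte[1024];

    /** Writer side: compress one chunk and end it with SYNC_FLUSH. */
    public byte[] write(byte[] chunk) {
        deflater.setInput(chunk);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        do {
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length); // a full buffer means more output may be pending
        return out.toByteArray();
    }

    /** Reader side: inflate everything that this chunk makes available. */
    public byte[] read(byte[] compressed) throws DataFormatException {
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = inflater.inflate(buf)) > 0) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The price of flushing after every buffer is a somewhat worse compression ratio, since each flush adds a small marker to the stream.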
Sample code:
package stack;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.GZIPInputStream;
public class BufferDeflate {
static AtomicInteger idSrc = new AtomicInteger(1);
/** Queue for transferring buffers */
final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<ByteBuffer>();
/** The entry point for deflated buffers */
final Pipe.SinkChannel bufSink;
/** The source for the inflater */
final Pipe.SourceChannel infSource;
/** The destination for the inflater */
final Pipe.SinkChannel infSink;
/** The source for the outside world */
public final SelectableChannel source;
class Relayer extends Thread {
public Relayer(int id) {
super("BufferRelayer" + id);
}
public void run() {
try {
while( true ) {
ByteBuffer buf = buffers.take();
if( buf != null ) {
bufSink.write(buf);
} else {
bufSink.close();
break;
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
class Inflater extends Thread {
public Inflater(int id) {
super("BufferInflater" + id);
}
public void run() {
try {
InputStream in = Channels.newInputStream(infSource);
GZIPInputStream gzip = new GZIPInputStream(in);
OutputStream out = Channels.newOutputStream(infSink);
int ch;
while( (ch = gzip.read()) != -1 ) {
out.write(ch);
}
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
/**
* New buffer inflater
*/
public BufferDeflate() throws IOException {
Pipe pipe = Pipe.open();
bufSink = pipe.sink();
infSource = pipe.source();
pipe = Pipe.open();
infSink = pipe.sink();
source = pipe.source().configureBlocking(false);
int id = idSrc.incrementAndGet();
Thread thread = new Relayer(id);
thread.setDaemon(true);
thread.start();
thread = new Inflater(id);
thread.setDaemon(true);
thread.start();
}
/**
* Add the buffer to the stream. A null buffer closes the stream
*
* @param buf
* the buffer to add
* @throws IOException
*/
public void add(ByteBuffer buf) throws IOException {
buffers.offer(buf);
}
}
Simply pass the buffers to the add method and read from the public source channel. The amount of data that can be read from GZIP after processing a given number of bytes is impossible to predict. I have therefore made the source channel non-blocking so you can safely read from it in the same thread that you add the byte buffers.
