Iterable gzip deflate/inflate in Java

Is there a library for gzip-deflating in terms of ByteBuffers hidden in the Internet? Something which allows us to push raw data then pull deflated data? We have searched for it but found only libraries which deal with InputStreams and OutputStreams.
We are tasked with creating gzip filters for deflating a flow of ByteBuffers in a pipeline architecture. This is a pull architecture where the last element pulls data from earlier elements. Our gzip filter deals with a flow of ByteBuffers, there is no single Stream object available.
We have toyed with adapting the data flow as some kind of InputStream and then using GZIPOutputStream to satisfy our requirements, but the amount of adaptor code is annoying, to say the least.
Post-accept edit: for the record, our architecture is similar to that of GStreamer and the like.

I don't understand the "hidden in the internet" part, but zlib does in-memory gzip format compression and decompression. The java.util.zip API provides some access to zlib, though it is limited. Due to the interface limitations, you cannot request that zlib produce and consume gzip streams directly. You can however use the nowrap option to produce and consume raw deflate data. Then it's easy to roll your own gzip header and trailer, using the CRC32 class in java.util.zip. You can prepend a fixed 10-byte header, append the four-byte CRC and then the four-byte uncompressed length (modulo 2^32), both in little-endian order, and you're good to go.

Much credit to Mark Adler for suggesting this approach, which is much better than my original answer.
package stack;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class BufferDeflate2 {
    /** The standard 10 byte GZIP header */
    private static final byte[] GZIP_HEADER = new byte[] { 0x1f, (byte) 0x8b,
            Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };

    /** CRC-32 of uncompressed data. */
    private final CRC32 crc = new CRC32();

    /** Deflater producing raw (nowrap) deflate data */
    private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION,
            true);

    /** Output buffer building area */
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Internal transfer space */
    private final byte[] transfer = new byte[1000];

    /** The flush mode to use at the end of each buffer */
    private final int flushMode;

    /**
     * New buffer deflater
     *
     * @param syncFlush
     *            if true, all data in a pushed buffer can be immediately
     *            decompressed from the returned output buffer
     */
    public BufferDeflate2(boolean syncFlush) {
        flushMode = syncFlush ? Deflater.SYNC_FLUSH : Deflater.NO_FLUSH;
        buffer.write(GZIP_HEADER, 0, GZIP_HEADER.length);
    }

    /**
     * Deflate the buffer
     *
     * @param in
     *            the buffer to deflate
     * @return deflated representation of the buffer
     */
    public ByteBuffer deflate(ByteBuffer in) {
        // convert buffer to bytes
        byte[] inBytes;
        int off = in.position();
        int len = in.remaining();
        if( in.hasArray() ) {
            // use the backing array directly, allowing for its offset
            inBytes = in.array();
            off += in.arrayOffset();
            in.position(in.limit());
        } else {
            off = 0;
            inBytes = new byte[len];
            in.get(inBytes);
        }

        // update CRC and deflater
        crc.update(inBytes, off, len);
        deflater.setInput(inBytes, off, len);

        while( !deflater.needsInput() ) {
            int r = deflater.deflate(transfer, 0, transfer.length, flushMode);
            buffer.write(transfer, 0, r);
        }

        byte[] outBytes = buffer.toByteArray();
        buffer.reset();
        return ByteBuffer.wrap(outBytes);
    }

    /**
     * Write the final buffer. This writes any remaining compressed data and the GZIP trailer.
     *
     * @return the final buffer
     */
    public ByteBuffer doFinal() {
        // finish deflating
        deflater.finish();

        // write all remaining data
        int r;
        do {
            r = deflater.deflate(transfer, 0, transfer.length,
                    Deflater.FULL_FLUSH);
            buffer.write(transfer, 0, r);
        } while( !deflater.finished() );

        // write GZIP trailer: CRC-32, then uncompressed length modulo 2^32
        writeInt((int) crc.getValue());
        writeInt((int) deflater.getBytesRead());

        // reset deflater so this instance can be reused
        deflater.reset();

        // final output
        byte[] outBytes = buffer.toByteArray();
        buffer.reset();
        return ByteBuffer.wrap(outBytes);
    }

    /**
     * Write a 32 bit value in little-endian order
     *
     * @param v
     *            the value to write
     */
    private void writeInt(int v) {
        buffer.write(v & 0xff);
        buffer.write((v >> 8) & 0xff);
        buffer.write((v >> 16) & 0xff);
        buffer.write((v >> 24) & 0xff);
    }

    /**
     * For testing. Pass in the name of a file to GZIP compress.
     *
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        File inFile = new File(args[0]);
        File outFile = new File(args[0] + ".test.gz");
        FileChannel inChan = (new FileInputStream(inFile)).getChannel();
        FileChannel outChan = (new FileOutputStream(outFile)).getChannel();
        BufferDeflate2 def = new BufferDeflate2(false);
        ByteBuffer buf = ByteBuffer.allocate(500);
        while( true ) {
            buf.clear();
            int r = inChan.read(buf);
            if( r == -1 ) break;
            buf.flip();
            ByteBuffer compBuf = def.deflate(buf);
            outChan.write(compBuf);
        }
        ByteBuffer compBuf = def.doFinal();
        outChan.write(compBuf);
        inChan.close();
        outChan.close();
    }
}
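The question asks about inflating too, so here is the reverse direction as a minimal sketch, assuming the fixed 10-byte header produced above (no optional gzip fields) and that the first buffer pushed in contains the complete header. Verifying the CRC and length against the trailer is omitted for brevity:

package stack;

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class BufferInflate2 {
    /** Inflater consuming raw (nowrap) deflate data */
    private final Inflater inflater = new Inflater(true);

    /** Output buffer building area */
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Internal transfer space */
    private final byte[] transfer = new byte[1000];

    /** Whether the fixed 10-byte gzip header has been skipped yet */
    private boolean headerSkipped = false;

    /** Inflate one buffer of gzip data and return whatever is decodable so far. */
    public ByteBuffer inflate(ByteBuffer in) throws DataFormatException {
        byte[] inBytes = new byte[in.remaining()];
        in.get(inBytes);

        // skip the fixed 10-byte header at the start of the first buffer
        int off = headerSkipped ? 0 : 10;
        headerSkipped = true;

        inflater.setInput(inBytes, off, inBytes.length - off);
        while( !inflater.needsInput() && !inflater.finished() ) {
            int r = inflater.inflate(transfer);
            if( r == 0 ) break;
            buffer.write(transfer, 0, r);
        }

        // once finished() is true, the 8 unread trailer bytes (CRC and
        // length) remain in the input; they could be checked here
        byte[] outBytes = buffer.toByteArray();
        buffer.reset();
        return ByteBuffer.wrap(outBytes);
    }
}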

Processing ByteBuffers is not hard. See my sample code below. You need to know how the buffers are created. The options are:
Each buffer is compressed independently. This is so simple to handle I assume this is not the case. You would just transform the buffer into a byte array and wrap it in a ByteArrayInputStream within a GZIPInputStream (see the short sketch after this list).
Each buffer was ended with a SYNC_FLUSH by the writer, and thus comprises an entire block of data within a stream. All the data written by the writer to the buffer can be read immediately by the reader.
Each buffer is just part of a GZIP stream. There is no guarantee the reader can read anything from the buffer.
Data generated by GZIP must be processed in order. The ByteBuffers will have to be processed in the same order they are generated.
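For the first case the handling really is trivial; a minimal sketch, assuming buf is one such independently compressed ByteBuffer:

byte[] bytes = new byte[buf.remaining()];
buf.get(bytes);
// each buffer is a complete gzip stream, so it can be inflated directly
InputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes));
// read the uncompressed bytes from 'in' as usual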
Sample code:
package stack;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.GZIPInputStream;

public class BufferDeflate {
    static AtomicInteger idSrc = new AtomicInteger(1);

    /** End-of-stream marker (BlockingQueues do not accept nulls) */
    private static final ByteBuffer POISON = ByteBuffer.allocate(0);

    /** Queue for transferring buffers */
    final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<ByteBuffer>();

    /** The entry point for deflated buffers */
    final Pipe.SinkChannel bufSink;

    /** The source for the inflater */
    final Pipe.SourceChannel infSource;

    /** The destination for the inflater */
    final Pipe.SinkChannel infSink;

    /** The source for the outside world */
    public final SelectableChannel source;

    class Relayer extends Thread {
        public Relayer(int id) {
            super("BufferRelayer" + id);
        }

        public void run() {
            try {
                while( true ) {
                    ByteBuffer buf = buffers.take();
                    if( buf != POISON ) {
                        bufSink.write(buf);
                    } else {
                        bufSink.close();
                        break;
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    class Inflater extends Thread {
        public Inflater(int id) {
            super("BufferInflater" + id);
        }

        public void run() {
            try {
                InputStream in = Channels.newInputStream(infSource);
                GZIPInputStream gzip = new GZIPInputStream(in);
                OutputStream out = Channels.newOutputStream(infSink);
                int ch;
                while( (ch = gzip.read()) != -1 ) {
                    out.write(ch);
                }
                out.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * New buffer inflater
     */
    public BufferDeflate() throws IOException {
        Pipe pipe = Pipe.open();
        bufSink = pipe.sink();
        infSource = pipe.source();

        pipe = Pipe.open();
        infSink = pipe.sink();
        source = pipe.source().configureBlocking(false);

        int id = idSrc.incrementAndGet();
        Thread thread = new Relayer(id);
        thread.setDaemon(true);
        thread.start();

        thread = new Inflater(id);
        thread.setDaemon(true);
        thread.start();
    }

    /**
     * Add a buffer to the stream. A null buffer closes the stream.
     *
     * @param buf
     *            the buffer to add
     * @throws IOException
     */
    public void add(ByteBuffer buf) throws IOException {
        // queue a marker for null, since BlockingQueue rejects nulls
        buffers.offer(buf == null ? POISON : buf);
    }
}
Simply pass the buffers to the add method and read from the public source channel. The amount of data that can be read from GZIP after processing a given number of bytes is impossible to predict. I have therefore made the source channel non-blocking so you can safely read from it in the same thread that you add the byte buffers.
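A hypothetical usage sketch (the surrounding pipeline is an assumption): the public source field is really a Pipe.SourceChannel, so it can be cast to ReadableByteChannel for reading:

BufferDeflate filter = new BufferDeflate();
filter.add(compressedBuf); // a ByteBuffer from the upstream element
filter.add(null);          // signal end of stream

ByteBuffer out = ByteBuffer.allocate(8192);
int n;
// poll the non-blocking channel: 0 means nothing decodable yet, -1 means done
while( (n = ((ReadableByteChannel) filter.source).read(out)) != -1 ) {
    if( n == 0 ) continue;
    out.flip();
    // ... consume the inflated bytes ...
    out.clear();
}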

Related

InputStream.read() stuck

I have searched for this problem but all the people encountering it are using sockets.
I am just using a plain file, so everything should already be in the stream...
Here is my code:
AudioInputStream in = null;
MpegAudioFileReader mp = new MpegAudioFileReader();
in = mp.getAudioInputStream(new File(this.directory + currentBeatmaps.get(0).getAudioFileName()));
AudioFormat baseFormat = in.getFormat();
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
        baseFormat.getSampleRate(),
        16,
        baseFormat.getChannels(),
        baseFormat.getChannels() * 2,
        baseFormat.getSampleRate(),
        false);
final AudioInputStream din2 = AudioSystem.getAudioInputStream(decodedFormat, in);
long length = din2.getFrameLength();
byte[] bytes = IOUtils.toByteArray(din2);
in.close();
InputStream is1 = new ByteArrayInputStream(bytes);
AudioInputStream din = new AudioInputStream(
        is1,
        decodedFormat,
        length
);
IOUtils.toByteArray(din2) gets stuck for some files and works for others, so I checked what was inside, and this is the function that gets stuck when debugging:
/**
 * Copies bytes from a large (over 2GB) <code>InputStream</code> to an
 * <code>OutputStream</code>.
 * <p>
 * This method uses the provided buffer, so there is no need to use a
 * <code>BufferedInputStream</code>.
 * <p>
 *
 * @param input the <code>InputStream</code> to read from
 * @param output the <code>OutputStream</code> to write to
 * @param buffer the buffer to use for the copy
 * @return the number of bytes copied
 * @throws NullPointerException if the input or output is null
 * @throws IOException if an I/O error occurs
 * @since 2.2
 */
public static long copyLarge(final InputStream input, final OutputStream output, final byte[] buffer)
        throws IOException {
    long count = 0;
    int n;
    while (EOF != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
It dies when doing input.read(buffer) on the first iteration.
And I repeat, this works for some files but not for others, so I don't know how to handle this...
If someone could at least find a way to detect when this will happen, so that I can print an error, that would be great (this part is not critical to my software).

Java decompress GZIP stream sequentially

My Java program implements a server that should get a very large file, compressed using gzip, from a client over websockets, and should check for a certain byte pattern in the file content.
The client sends the file chunks embedded inside a proprietary protocol so I'm getting message after message from the client, parse the message and extract the gzipped file content.
I can't hold the whole file in the program memory so I'm trying to decompress each chunk, process the data and continue to the next chunk.
I'm using the following code:
public static String gzipDecompress(byte[] compressed) throws IOException {
    String uncompressed;
    try (
            ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
            GZIPInputStream gis = new GZIPInputStream(bis);
            Reader reader = new InputStreamReader(gis);
            Writer writer = new StringWriter()
    ) {
        char[] buffer = new char[10240];
        for (int length = 0; (length = reader.read(buffer)) > 0; ) {
            writer.write(buffer, 0, length);
        }
        uncompressed = writer.toString();
    }
    return uncompressed;
}
But I'm getting the following exception when calling the function with the first compressed chunk:
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:140)
It's important to mention that I'm not skipping any chunk and trying to decompress the chunks sequentially.
What am I missing?
The problem is that you play with those chunks manually.
The correct way would be to obtain some InputStream, wrap it with GZIPInputStream and then read the data.
InputStream is = // obtain the original gzip stream
GZIPInputStream gis = new GZIPInputStream(is);
Reader reader = new InputStreamReader(gis);
//... proceed reading and so on
GZIPInputStream works in stream fashion, so if you only ask 10kb at a time from your reader, the overall memory footprint will be low regardless of the size of the initial GZIP file.
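For example (a sketch; handlePattern() stands in for whatever check you run on the decompressed text):

char[] chunk = new char[10 * 1024];
int n;
// memory use stays bounded by the chunk size, however large the file is
while ((n = reader.read(chunk)) > 0) {
    handlePattern(chunk, n); // hypothetical handler
}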
Update after the question was updated
A possible solution for your situation is to write an InputStream implementation that streams bytes that are being put to it in chunks by your client protocol handler.
Here is a prototype:
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProtocolDataInputStream extends InputStream {
    private final BlockingQueue<byte[]> nextChunks = new ArrayBlockingQueue<byte[]>(100);
    private byte[] currentChunk = null;
    private int currentChunkOffset = 0;
    private volatile boolean noMoreChunks = false;

    @Override
    public synchronized int read() throws IOException {
        boolean takeNextChunk = currentChunk == null || currentChunkOffset >= currentChunk.length;
        if (takeNextChunk) {
            // check the queue as well: the last chunks may still be queued
            // after noMoreChunks has been set
            if (noMoreChunks && nextChunks.isEmpty()) {
                // stream is exhausted
                return -1;
            }
            try {
                currentChunk = nextChunks.take();
            } catch (InterruptedException e) {
                throw new InterruptedIOException();
            }
            currentChunkOffset = 0;
        }
        // mask to 0..255 as the InputStream contract requires
        return currentChunk[currentChunkOffset++] & 0xff;
    }

    @Override
    public synchronized int available() throws IOException {
        if (currentChunk == null) {
            return 0;
        }
        return currentChunk.length - currentChunkOffset;
    }

    // Not synchronized: read() holds the monitor while blocked in take(),
    // so a synchronized addChunk() could deadlock. The queue itself is
    // thread-safe.
    public void addChunk(byte[] chunk, boolean chunkIsLast) {
        nextChunks.add(chunk);
        if (chunkIsLast) {
            noMoreChunks = true;
        }
    }
}
Your client protocol handler adds byte chunks using addChunk(), while your decompressing code pulls the data out of this stream (via a Reader).
Please note that this code has some issues:
The queue being used has a limited size. If addChunk() is being called too frequently, the queue may be filled, which will block addChunk(). This may be desirable or not.
Only the read() method is implemented, for illustration purposes. For performance, it is better to implement read(byte[]) in the same manner; see the sketch after this list.
Synchronization is kept minimal: the queue itself is thread-safe, and addChunk() must not synchronize on the stream's monitor, because read() holds that monitor while blocked in take(). The assumption is that the reader (decompressor) and the writer (the protocol handler calling addChunk()) are different threads.
InterruptedException on take() is simply wrapped in an InterruptedIOException, to avoid too much detail.
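A sketch of that bulk read, to be added to the prototype above (an illustration, not part of the original prototype):

@Override
public synchronized int read(byte[] b, int off, int len) throws IOException {
    if (currentChunk == null || currentChunkOffset >= currentChunk.length) {
        if (noMoreChunks && nextChunks.isEmpty()) {
            return -1;
        }
        try {
            currentChunk = nextChunks.take();
        } catch (InterruptedException e) {
            throw new InterruptedIOException();
        }
        currentChunkOffset = 0;
    }
    // copy as much of the current chunk as fits, in one arraycopy
    int n = Math.min(len, currentChunk.length - currentChunkOffset);
    System.arraycopy(currentChunk, currentChunkOffset, b, off, n);
    currentChunkOffset += n;
    return n;
}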
If your decompressor and addChunk() execute in the same thread (in the same loop), then you could try to use the InputStream.available() method when pulling using InputStream or Reader.ready() when pulling with a Reader.
An arbitrary sequence of bytes from a gzipped stream is not valid standalone gzip data. One way or another, you must concatenate all the byte chunks.
The easiest way is to accumulate them all with a simple pipe:
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.zip.GZIPInputStream;

public class ChunkInflater {
    private final PipedOutputStream pipe;
    private final PipedInputStream pipedIn;
    private InputStream stream;

    public ChunkInflater()
            throws IOException {
        pipe = new PipedOutputStream();
        pipedIn = new PipedInputStream(pipe);
    }

    // The GZIPInputStream constructor reads the gzip header, so the wrapping
    // is deferred until a writer thread is already feeding chunks; wrapping
    // eagerly in the constructor would block forever.
    public synchronized InputStream getInputStream() throws IOException {
        if (stream == null) {
            stream = new GZIPInputStream(pipedIn);
        }
        return stream;
    }

    public void addChunk(byte[] compressedChunk)
            throws IOException {
        pipe.write(compressedChunk);
    }

    /** Signal the end of the compressed data. */
    public void finish() throws IOException {
        pipe.close();
    }
}
Now you have an InputStream you can read in whatever increments you desire. For instance:
ChunkInflater inflater = new ChunkInflater();

Callable<Void> chunkReader = new Callable<Void>() {
    @Override
    public Void call()
            throws IOException {
        byte[] chunk;
        while ((chunk = readChunkFromSource()) != null) {
            inflater.addChunk(chunk);
        }
        inflater.finish(); // close the pipe so the reader sees end-of-stream
        return null;
    }
};
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.submit(chunkReader);
executor.shutdown();

Reader reader = new InputStreamReader(inflater.getInputStream());
// read text here

C#-Server and Java-Client: TCP Socket Communication Issues

I have written a server program in C# using TcpListener and a client program in Java using a socket, but I fail to send complex objects from the Java client to the C# server.
When I send a simple string from the Java client to the C# server by converting the string into a byte array,
it always shows some invalid characters at the start of the message when converted back to a string (using Encoding.UTF8.GetString(bytesArray)) on the C# server. When I pass a string from the C# server to the Java client, it shows an invalid header error.
Please help me if anyone has an alternative or knows about any free API which can solve my problem. I have tried java-cs-bridge to send complex objects, but it always shows an exception on the C# server.
Here is the code:
C# Server Code - Main Function
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.IO;

namespace netSocketServer
{
    class Program
    {
        static void Main(string[] args)
        {
            TcpListener server = new TcpListener(IPAddress.Any, 8888);
            var IP = Dns.GetHostEntry(Dns.GetHostName()).AddressList
                .Where(ip => ip.AddressFamily == AddressFamily.InterNetwork)
                .Select(ip => ip)
                .FirstOrDefault();
            server.Start();
            Console.WriteLine("Server is Running at " + IP.ToString());
            TcpClient clientSocket = server.AcceptTcpClient();
            Console.WriteLine("Client Connected ... ");
            Writer wr = new Writer(clientSocket);
            wr.start();
            Reader r = new Reader(clientSocket);
            r.start();
            Console.Read();
        }
    }
}
C# Server Reader Class
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Net.Sockets;
using System.Net;
using System.IO;

namespace netSocketServer
{
    class Reader
    {
        TcpClient socket;
        NetworkStream ns;

        public Reader(TcpClient s)
        {
            socket = s;
            ns = socket.GetStream();
        }

        public void start()
        {
            new Thread(
                t => {
                    while (true)
                    {
                        try
                        {
                            int size = ns.ReadByte();
                            byte[] buff = new byte[size];
                            ns.Read(buff, 0, size);
                            String message = Encoding.UTF8.GetString(buff);
                            Console.WriteLine("Message from Client : {0}", message);
                            ns.Flush();
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine("Client Disconnected : " + e.Message);
                        }
                    }
                }).Start();
        }
    }
}
C# Server Writer Class
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Sockets;
using System.Text;
using System.Threading;

namespace netSocketServer
{
    class Writer
    {
        TcpClient socket;
        NetworkStream ns;

        public Writer(TcpClient s)
        {
            socket = s;
            ns = socket.GetStream();
        }

        public void start()
        {
            new Thread(
                t => {
                    while (true)
                    {
                        try
                        {
                            Console.Write("Please Enter your Message : ");
                            string Message = Console.ReadLine();
                            byte[] buff = Encoding.UTF8.GetBytes(Message);
                            byte size = (byte)Message.Length;
                            ns.WriteByte(size);
                            ns.Write(buff, 0, buff.Length);
                            ns.Flush();
                        }
                        catch (IOException e)
                        {
                            Console.WriteLine("Client Disconnected : " + e.Message);
                            socket.Close();
                            Thread.CurrentThread.Abort();
                            Console.WriteLine("Press any key to Close Server .... ");
                        }
                    }
                }).Start();
        }
    }
}
Java Client - Main Function
package javaclient.net;

import java.io.IOException;
import java.net.Socket;
import java.util.Scanner;

/**
 *
 * @author Numan
 */
public class JavaClientNet {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        Socket socket;
        Read r;
        Writer wr;
        Scanner s = new Scanner(System.in);
        try
        {
            // TODO code application logic here
            System.out.print("Please Enter Server IP : ");
            socket = new Socket(s.next(), 8888);
            wr = new Writer(socket);
            wr.start();
            r = new Read(socket);
            r.start();
        }
        catch (IOException ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}
Java Client - Reader Class
package javaclient.net;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.net.Socket;

/**
 *
 * @author Numan
 */
public class Read extends Thread
{
    Socket socket;
    ObjectInputStream inStream;

    Read(Socket s)
    {
        socket = s;
        try {
            inStream = new ObjectInputStream(socket.getInputStream());
        }
        catch (IOException ex)
        {
            System.out.println(ex.getMessage());
        }
    }

    @Override
    public void run()
    {
        while(true)
        {
            try
            {
                String str;
                byte size = inStream.readByte();
                byte[] buf = new byte[size];
                inStream.read(buf);
                str = new String(buf);
                System.out.println("Message from Server : " + str);
            }
            catch(IOException e)
            {
                System.out.println(e.getMessage());
                Thread.currentThread().stop();
            }
        }
    }
}
Java Client - Writer Class
package javaclient.net;

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.net.Socket;
import java.util.Scanner;
import javacsconverter.core.tobyte.ToByteConvertHelper;

/**
 *
 * @author Numan
 */
public class Writer extends Thread
{
    Socket socket;
    ObjectOutputStream outStream;
    Scanner scanner = new Scanner(System.in);

    Writer(Socket s)
    {
        socket = s;
        try
        {
            outStream = new ObjectOutputStream(socket.getOutputStream());
        }
        catch (IOException ex)
        {
            System.out.println(ex.getMessage());
        }
    }

    @Override
    public void run()
    {
        while(true)
        {
            try
            {
                System.out.print("Please Enter Your Message : ");
                String str = scanner.nextLine();
                byte[] buff = str.getBytes();
                outStream.write(buff);
                outStream.flush();
            }
            catch (IOException ex)
            {
                System.out.println(ex.getMessage());
            }
        }
    }
}
General notes
Please do not abort the threads (both C# and Java).
C# Server
Program class
There is a data race because the static Console class is used by multiple threads:
Main thread: the Program.Main() method calls the Console.Read() method;
Worker thread: the Writer.start() method calls the Console.ReadLine() method.
Please consider replacing the Console.Read() method call of the Program.Main() method with something different, for example, Thread.Sleep(Timeout.Infinite).
Reader class
There is a mistake — the Stream.Read() method is not guaranteed to read the whole array of the specified "size" in a single call; the return value should be used to determine the actual number of bytes read. Let's see the original implementation:
int size = ns.ReadByte();
byte[] buff = new byte[size];
// The Stream.Read() method does not guarantee to read the **whole array** "at once".
// Please use the return value of the method.
ns.Read(buff, 0, size);
String message = Encoding.UTF8.GetString(buff);
Corrected version:
/// <summary>
/// Helper method to read the specified byte array (number of bytes to read is the size of the array).
/// </summary>
/// <param name="inputStream">Input stream.</param>
/// <param name="buffer">The output buffer.</param>
private static void ReadFully(Stream inputStream, byte[] buffer)
{
    if (inputStream == null)
    {
        throw new ArgumentNullException("inputStream");
    }
    if (buffer == null)
    {
        throw new ArgumentNullException("buffer");
    }

    int totalBytesRead = 0;
    int bytesLeft = buffer.Length;
    if (bytesLeft <= 0)
    {
        throw new ArgumentException("There is nothing to read for the specified buffer", "buffer");
    }

    while (totalBytesRead < buffer.Length)
    {
        var bytesRead = inputStream.Read(buffer, totalBytesRead, bytesLeft);
        if (bytesRead > 0)
        {
            totalBytesRead += bytesRead;
            bytesLeft -= bytesRead;
        }
        else
        {
            throw new InvalidOperationException("Input stream reaches the end before reading all the bytes");
        }
    }
}

public void start()
{
    ...
    int size = ns.ReadByte();
    byte[] buff = new byte[size];
    ReadFully(ns, buff);
    using (var memoryStream = new MemoryStream(buff, false))
    {
        // The StreamReader class is used to extract the UTF-8 string which is encoded with the byte order mark (BOM).
        using (var streamReader = new StreamReader(memoryStream, Encoding.UTF8))
        {
            string message = streamReader.ReadToEnd();
            Console.WriteLine("Message from Client: {0}", message);
        }
    }
    ...
}
Writer class
First of all, to describe and determine the byte order of the text stream, consider including the byte order mark (BOM) in each message (for example).
Also, there is a mistake — wrong "buffer length" value is sent. Let's see the original implementation:
string Message = Console.ReadLine();
byte[] buff = Encoding.UTF8.GetBytes(Message);
// Problem: instead of the length of the string, the size of byte array must be used
// because the UTF-8 encoding is used: generally, string length != "encoded number of bytes".
byte size = (byte)Message.Length;
ns.WriteByte(size);
ns.Write(buff, 0, buff.Length);
ns.Flush();
Corrected version:
// UTF-8 with BOM.
var encoding = new UTF8Encoding(true);
// Buffer encoded as UTF-8 with BOM.
byte[] buff = encoding.GetPreamble()
.Concat(encoding.GetBytes(message))
.ToArray();
// Size of the encoded buffer.
byte size = Convert.ToByte(buff.Length);
ns.WriteByte(size);
ns.Write(buff, 0, buff.Length);
ns.Flush();
Alternative corrected version — the StreamWriter class is used to encode the string as UTF-8 with the byte order mark (BOM):
string message = Console.ReadLine();
using (var memoryStream = new MemoryStream())
{
    using (var streamWriter = new StreamWriter(memoryStream, Encoding.UTF8, 1024, true))
    {
        streamWriter.Write(message);
    }
    memoryStream.Flush();

    byte size = Convert.ToByte(memoryStream.Length);
    ns.WriteByte(size);

    memoryStream.Seek(0, SeekOrigin.Begin);
    memoryStream.CopyTo(ns);
    ns.Flush();
}
Java Client
Read class
First, please consider using DataInputStream class because the following statement is not true according to the question:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
-- java.io.ObjectInputStream class, Java™ Platform
Standard Ed. 7.
The instantiation of the stream is almost the same:
inStream = new DataInputStream(socket.getInputStream());
Second, there is a mistake — reading the byte array, but ignoring the return value (actual number of bytes read):
String str;
byte size = inStream.readByte();
byte[] buf = new byte[size];
// The InputStream.read() method does not guarantee to read the **whole array** "at once".
// Please use the return value of the method.
inStream.read(buf);
str = new String(buf);
Third, as stated above, the byte order mark (BOM) is included.
Corrected version:
// Note: inStream must be an instance of DataInputStream class.
byte size = inStream.readByte();
byte[] buf = new byte[size];
// The DataInputStream.readFully() method reads the number of bytes required to fill the buffer entirely.
inStream.readFully(buf);

// Create in-memory stream for the byte array and read the UTF-8 string.
try (ByteArrayInputStream inputStream = new ByteArrayInputStream(buf);
        // The BOMInputStream class belongs to Apache Commons IO library.
        BOMInputStream bomInputStream = new BOMInputStream(inputStream, false)) {
    String charsetName = bomInputStream.getBOMCharsetName();
    // The IOUtils class belongs to Apache Commons IO library.
    String message = IOUtils.toString(bomInputStream, charsetName);
    System.out.println("Message from Server : " + message);
}
Writer class
There is a mistake — the encoding is not specified explicitly. Let's see the original implementation:
String str = scanner.nextLine();
byte[] buff = str.getBytes();
Corrected version:
String str = scanner.nextLine();
byte[] byteOrderMarkBytes = ByteOrderMark.UTF_8.getBytes();
byte[] stringBytes = str.getBytes(StandardCharsets.UTF_8);
// The ArrayUtils.addAll() method belongs to Apache Commons Lang library.
byte[] buff = ArrayUtils.addAll(byteOrderMarkBytes, stringBytes);
outStream.writeByte(buff.length);
outStream.write(buff);
outStream.flush();
Alternative corrected version — the ByteArrayOutputStream class is used to concatenate the arrays:
String str = scanner.nextLine();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
byte[] byteOrderMarkBytes = ByteOrderMark.UTF_8.getBytes();
byteArrayOutputStream.write(byteOrderMarkBytes);
byte[] stringBytes = str.getBytes(StandardCharsets.UTF_8);
byteArrayOutputStream.write(stringBytes);
byteArrayOutputStream.flush();
byte[] buff = byteArrayOutputStream.toByteArray();
outStream.writeByte(buff.length);
outStream.write(buff);
outStream.flush();
Hope this helps!

Why is the gzip compressed buffer size greater than the uncompressed buffer?

I'm trying to write a compression utils class.
But during testing, I found that the result is greater than the original buffer.
Is my code right?
Please see the code:
/**
 * This class provides compression ability
 * <p>
 * Support:
 * <li>GZIP
 * <li>Deflate
 */
public class CompressUtils {
    public static final int DEFAULT_BUFFER_SIZE = 4096; // Compress/Decompress buffer is 4K

    /**
     * GZIP Compress
     *
     * @param data The data to be compressed
     * @return The compressed data
     * @throws IOException
     */
    public static byte[] gzipCompress(byte[] data) throws IOException {
        Validate.isTrue(ArrayUtils.isNotEmpty(data));

        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try {
            gzipCompress(bis, bos);
            bos.flush();
            return bos.toByteArray();
        } finally {
            bis.close();
            bos.close();
        }
    }

    /**
     * GZIP Decompress
     *
     * @param data The data to be decompressed
     * @return The decompressed data
     * @throws IOException
     */
    public static byte[] gzipDecompress(byte[] data) throws IOException {
        Validate.isTrue(ArrayUtils.isNotEmpty(data));

        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try {
            gzipDecompress(bis, bos);
            bos.flush();
            return bos.toByteArray();
        } finally {
            bis.close();
            bos.close();
        }
    }

    /**
     * GZIP Compress
     *
     * @param is The input stream to be compressed
     * @param os The compressed result
     * @throws IOException
     */
    public static void gzipCompress(InputStream is, OutputStream os) throws IOException {
        GZIPOutputStream gos = null;
        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
        int count = 0;

        try {
            gos = new GZIPOutputStream(os);
            while ((count = is.read(buffer)) != -1) {
                gos.write(buffer, 0, count);
            }
            gos.finish();
            gos.flush();
        } finally {
            if (gos != null) {
                gos.close();
            }
        }
    }

    /**
     * GZIP Decompress
     *
     * @param is The input stream to be decompressed
     * @param os The decompressed result
     * @throws IOException
     */
    public static void gzipDecompress(InputStream is, OutputStream os) throws IOException {
        GZIPInputStream gis = null;
        int count = 0;
        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];

        try {
            gis = new GZIPInputStream(is);
            // read from the GZIP stream (gis), not the raw input stream (is)
            while ((count = gis.read(buffer)) != -1) {
                os.write(buffer, 0, count);
            }
        } finally {
            if (gis != null) {
                gis.close();
            }
        }
    }
}
And here's the testing codes:
public class CompressUtilsTest {
    private Random random = new Random();

    @Test
    public void gzipTest() throws IOException {
        byte[] buffer = new byte[1023];
        random.nextBytes(buffer);
        System.out.println("Original: " + Hex.encodeHexString(buffer));

        byte[] result = CompressUtils.gzipCompress(buffer);
        System.out.println("Compressed: " + Hex.encodeHexString(result));

        byte[] decompressed = CompressUtils.gzipDecompress(result);
        System.out.println("Decompressed: " + Hex.encodeHexString(decompressed));

        Assert.assertArrayEquals(buffer, decompressed);
    }
}
And the result is:
original is 1023 bytes long
compressed is 1036 bytes long
How did this happen?
In your test you initialize the buffer with a set of random characters.
GZIP consists of two parts:
LZ77 compression
Encoding using a Huffman code
The former relies heavily on repeated sequences in the input. Basically it says something like: "The next 10 characters are the same as the 10 characters starting at index X".
In your case there are (possibly) no such repeated sequences, thus no compression by the first algorithm.
The Huffman encoding on the other hand should work, but in total the GZIP overhead (storing the used Huffman encoding, e.g.) outweighs the advantages of compressing the input.
If you test your algorithm with real files, you will get some meaningful results.
Best results are usually acquired when trying to compress structured files like XML.
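A quick way to see this, reusing the CompressUtils class from the question (the printed sizes are illustrative):

byte[] randomData = new byte[1023];
new Random().nextBytes(randomData);
byte[] repeated = new byte[1023];
Arrays.fill(repeated, (byte) 'a');

// random input: no repeated sequences, so the gzip header, trailer and
// block bookkeeping make the output slightly larger than the input
System.out.println(CompressUtils.gzipCompress(randomData).length);
// repetitive input: collapses to a few dozen bytes
System.out.println(CompressUtils.gzipCompress(repeated).length);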
It's because compression generally works well on medium to large inputs (1023 bytes is quite small), and moreover it works best on data that contains repeated patterns, not on random data.

Compression / Decompression of Strings using the deflater

I want to compress/decompress and serialize/deserialize String content. I'm using the following two static functions.
/**
 * Compress data based on the {@link Deflater}.
 *
 * @param pToCompress
 *            input byte-array
 * @return compressed byte-array
 * @throws NullPointerException
 *             if {@code pToCompress} is {@code null}
 */
public static byte[] compress(@Nonnull final byte[] pToCompress) {
    checkNotNull(pToCompress);

    // Compressed result.
    byte[] compressed = new byte[] {};

    // Create the compressor.
    final Deflater compressor = new Deflater();
    compressor.setLevel(Deflater.BEST_SPEED);

    // Give the compressor the data to compress.
    compressor.setInput(pToCompress);
    compressor.finish();

    /*
     * Create an expandable byte array to hold the compressed data.
     * You cannot use an array that's the same size as the original because
     * there is no guarantee that the compressed data will be smaller than
     * the uncompressed data.
     */
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream(pToCompress.length)) {
        // Compress the data.
        final byte[] buf = new byte[1024];
        while (!compressor.finished()) {
            final int count = compressor.deflate(buf);
            bos.write(buf, 0, count);
        }
        // Get the compressed data.
        compressed = bos.toByteArray();
    } catch (final IOException e) {
        LOGWRAPPER.error(e.getMessage(), e);
        throw new RuntimeException(e);
    }

    return compressed;
}

/**
 * Decompress data based on the {@link Inflater}.
 *
 * @param pCompressed
 *            input byte-array
 * @return decompressed byte-array
 * @throws NullPointerException
 *             if {@code pCompressed} is {@code null}
 */
public static byte[] decompress(@Nonnull final byte[] pCompressed) {
    checkNotNull(pCompressed);

    // Create the decompressor and give it the data to decompress.
    final Inflater decompressor = new Inflater();
    decompressor.setInput(pCompressed);

    byte[] decompressed = new byte[] {};

    // Create an expandable byte array to hold the decompressed data.
    try (final ByteArrayOutputStream bos = new ByteArrayOutputStream(pCompressed.length)) {
        // Decompress the data.
        final byte[] buf = new byte[1024];
        while (!decompressor.finished()) {
            try {
                final int count = decompressor.inflate(buf);
                bos.write(buf, 0, count);
            } catch (final DataFormatException e) {
                LOGWRAPPER.error(e.getMessage(), e);
                throw new RuntimeException(e);
            }
        }
        // Get the decompressed data.
        decompressed = bos.toByteArray();
    } catch (final IOException e) {
        LOGWRAPPER.error(e.getMessage(), e);
    }

    return decompressed;
}
Yet, compared to non-compressed values it's orders of magnitude slower, even though I'm caching the decompressed result and values are only decompressed when the content is really needed.
That is, it's used for a DOM-like persistable tree structure, and XPath queries which force decompression of the String values are about 50 times slower, if not more (not really benchmarked, just observed in unit tests). My laptop even freezes after some unit tests (every time, checked it about 5 times), because Eclipse stops responding due to heavy disk I/O and whatnot. I've even set the compression level to Deflater.BEST_SPEED; other compression levels might be better, so maybe I'll provide a configuration option parameter which can be set per resource. Maybe I've messed something up, as I haven't used the Deflater before. I'm only compressing content where the String length is > 10.
Edit: After extracting the Deflater instantiation to a static field, it seems creating instances of Deflater and Inflater is very costly: the performance bottleneck is gone, and without microbenchmarks or the like I can't see any performance loss :-) I'm just resetting the deflater/inflater before each new input.
Have you considered using the higher-level API like GZIP?
Here is an example for compressing:
public static byte[] compressToByte(final String data, final String encoding)
        throws IOException
{
    if (data == null || data.length() == 0)
    {
        return null;
    }
    else
    {
        byte[] bytes = data.getBytes(encoding);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream os = new GZIPOutputStream(baos);
        os.write(bytes, 0, bytes.length);
        os.close();
        byte[] result = baos.toByteArray();
        return result;
    }
}
Here is an example for uncompressing:
public static String unCompressString(final byte[] data, final String encoding)
        throws IOException
{
    if (data == null || data.length == 0)
    {
        return null;
    }
    else
    {
        ByteArrayInputStream bais = new ByteArrayInputStream(data);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        GZIPInputStream is = new GZIPInputStream(bais);
        byte[] tmp = new byte[256];
        while (true)
        {
            int r = is.read(tmp);
            if (r < 0)
            {
                break;
            }
            buffer.write(tmp, 0, r);
        }
        is.close();
        byte[] content = buffer.toByteArray();
        return new String(content, 0, content.length, encoding);
    }
}
We get very good performance and compression ratio with this.
The zip api is also an option.
Your comments are the correct answer.
In general, if a method is going to be used frequently, you want to eliminate any allocations and copying of data. This often means moving instance initialization and other setup to either static variables or to the constructor.
Using statics is easier, but you may run into lifetime issues (as in how do you know when to clean up the statics - do they exist forever?).
Doing the setup and initialization in the constructor allows the user of the class to determine the lifetime of the object and clean up appropriately. You could instantiate it once before going into a processing loop and GC it after exiting.
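A minimal sketch of that constructor-based approach (the class and method names are illustrative):

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Note: not thread-safe; use one instance per thread.
public final class Compressor {
    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    private final byte[] buf = new byte[1024];

    public byte[] compress(final byte[] in) {
        deflater.reset();      // reuse the instance instead of reallocating
        deflater.setInput(in);
        deflater.finish();
        final ByteArrayOutputStream bos = new ByteArrayOutputStream(in.length);
        while (!deflater.finished()) {
            final int n = deflater.deflate(buf);
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }

    public void dispose() {
        deflater.end();        // release the native zlib state explicitly
    }
}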
