I have a ByteArrayOutputStream that has large amounts of data written into it, which is ultimately converted into a byte array and written to a cache:
try {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (JsonGenerator jg = mapper.getFactory().createGenerator(baos)) {
        for (Object result : results) {
            jg.writeObject(result);
        }
    }
    localCache.put(cacheKey, baos.toByteArray());
}
catch (IOException e) {
    throw Throwables.propagate(e);
}
Here baos.toByteArray() creates a whole new copy of the data in memory, which I'm trying to avoid. Is there a way to convert the stream to a byte array without using the extra memory?
The internal buffer and current count are protected fields documented in the Javadoc. This means you should be OK to subclass ByteArrayOutputStream and provide a byte[] getBuffer() method to access the buffer directly. Use the existing size() method to determine how much data is present.
public class MyBAOS extends ByteArrayOutputStream
{
    public MyBAOS() { super(); }
    public MyBAOS(int size) { super(size); }

    // Exposes the internal buffer without copying; only the first
    // size() bytes contain valid data.
    public byte[] getBuffer() { return buf; }
}
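A hypothetical usage sketch (the ByteBuffer view is my illustration, not part of the original answer): this only avoids the copy if the consumer can work with a buffer plus a length, e.g. via a ByteBuffer, rather than requiring an exact-size byte[]:

MyBAOS baos = new MyBAOS();
try (JsonGenerator jg = mapper.getFactory().createGenerator(baos)) {
    for (Object result : results) {
        jg.writeObject(result);
    }
}
// Wrap the internal buffer without copying. Only the first size()
// bytes are valid data; the rest is unused capacity.
ByteBuffer view = ByteBuffer.wrap(baos.getBuffer(), 0, baos.size());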
I'm trying to use the Java DeflaterOutputStream and InflaterOutputStream classes to compress a byte array, but both appear not to be working correctly. I assume I'm implementing them incorrectly.
public static byte[] compress(byte[] in) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeflaterOutputStream defl = new DeflaterOutputStream(out);
        defl.write(in);
        defl.flush();
        defl.close();
        return out.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(150);
        return null;
    }
}
public static byte[] decompress(byte[] in) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InflaterOutputStream infl = new InflaterOutputStream(out);
        infl.write(in);
        infl.flush();
        infl.close();
        return out.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(150);
        return null;
    }
}
Here are the two methods I'm using to compress and decompress the byte array. Most implementations I've seen online use a fixed-size buffer array for the decompression portion, but I'd prefer to avoid that if possible, because I'd need to make that buffer array have a size of one if I want any significant compression.
If anyone can explain what I'm doing wrong, it would be appreciated. Also, to explain how I know these methods aren't working correctly: the "compressed" byte array they output is always larger than the uncompressed one, no matter what size byte array I provide.
This will depend on the data you are compressing. For example, if we take an array of 10,000 zero bytes, it compresses well:
byte[] plain = new byte[10000];
byte[] compressed = compress(plain);
System.out.println(compressed.length); // 33
byte[] result = decompress(compressed);
System.out.println(result.length); // 10000
Compression always has overhead to allow for future decompression. If the compression produces no reduction in length (because the data is unique or nearly unique, i.e. high-entropy), the output can be longer than the input.
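Conversely, here is a quick sketch (my own illustration, using the compress method above) with high-entropy input, where the stream overhead makes the output slightly larger than the input:

byte[] random = new byte[10000];
new java.util.Random(42).nextBytes(random); // incompressible, high-entropy data
byte[] randomCompressed = compress(random);
// Deflate cannot shrink random data, so stored blocks plus stream
// overhead make the result a little larger than 10000 bytes.
System.out.println(randomCompressed.length);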
I want to get the SHA-2 hash of a particular Java object. I don't want it as an int; I want a byte[] or at least a String. I've got the following code to create the SHA-2 hash:
static byte[] sha2(byte[] message) {
    if (message == null || message.length == 0) {
        throw new IllegalArgumentException("message is null or empty");
    }
    try {
        MessageDigest sha256 = MessageDigest.getInstance(SHA_256);
        return sha256.digest(message);
    } catch (NoSuchAlgorithmException e) {
        throw new IllegalArgumentException(e);
    }
}
I can just convert my object to byte[], but I don't think it's a good idea to store a big array in memory just to create a 32-byte one. So how can I compute the SHA-2 (or maybe another cryptographic hash) of an object?
You do not have to load the whole object into memory; you can load parts of it into a temporary buffer.
Dump the object into a temporary file using FileOutputStream/BufferedOutputStream; this makes sure the serialized object does not pollute JVM memory.
Then load the serialized object from the temporary file using FileInputStream/BufferedInputStream and feed it to the MessageDigest#update(buf) method in a loop.
Finally, call MessageDigest#digest() to finish the work:
byte[] buf = new byte[1024];
int readBytes;
// "in" is the BufferedInputStream over the temporary file from the
// previous step; update() consumes each chunk as it is read.
while ((readBytes = in.read(buf)) != -1) {
    sha256.update(buf, 0, readBytes);
}
return sha256.digest();
If you can afford to store the entire serialized object in memory, use ByteArrayOutputStream and pass the resulting byte[] to MessageDigest#digest(buf):
try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
     ObjectOutputStream oos = new ObjectOutputStream(baos)) {
    oos.writeObject(obj);
    MessageDigest sha256 = MessageDigest.getInstance(SHA_256);
    return sha256.digest(baos.toByteArray());
}
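If holding the whole serialized form in memory is the concern, one alternative sketch (not from the original answer; it assumes Java 11+ for OutputStream.nullOutputStream()) streams the serialization through a DigestOutputStream, updating the digest on the fly and discarding the bytes:

static byte[] sha2(Object obj) throws IOException, NoSuchAlgorithmException {
    MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
    // Bytes are fed to the digest as they are produced, then discarded,
    // so the full serialized form never exists in memory.
    try (ObjectOutputStream oos = new ObjectOutputStream(
            new DigestOutputStream(OutputStream.nullOutputStream(), sha256))) {
        oos.writeObject(obj);
    }
    return sha256.digest();
}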
I'm working on an Android app where I use serialization to convert an object to a byte array. After the conversion I read the size, and the byte array turns out to be much bigger than expected.
The method I have made is as follows:
public void Send(testpacket packet){
    try
    {
        // First convert the CommStruct to a byte array
        // Then send the byte array
        byte [] buffer = toByteArray(packet);
        int size = buffer.length;
        System.out.println("SIZE OF BYTE ARRAY: " + size);
        server.send(buffer);
    }
    catch (IOException e)
    {
        Log.e("USBCommunicator", "problem sending TCP message", e);
    }
}
The serialization method toByteArray converts an object to a byte array and looks as follows:
public static byte[] toByteArray(Object obj) throws IOException {
    byte[] bytes = null;
    ByteArrayOutputStream bos = null;
    ObjectOutputStream oos = null;
    try {
        bos = new ByteArrayOutputStream();
        oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        bytes = bos.toByteArray();
    } finally {
        if (oos != null) {
            Log.i(TAG, "not null");
            oos.close();
        }
        if (bos != null) {
            bos.close();
            Log.i(TAG, "not null");
        }
    }
    return bytes;
}
The packet object consists of two classes with a total of 7 integers (so I expect the size to be 28 bytes). It is defined as follows:
public class testpacket implements java.io.Serializable {
    public ObjectInfo VisionData;
    public SensorDataStruct SensorData;

    //Constructor
    public testpacket(){
        // Call constructors
        VisionData = new ObjectInfo();
        SensorData = new SensorDataStruct();
    }
}
ObjectInfo consists of the following:
//ObjectInfo struct definition
public class ObjectInfo implements java.io.Serializable
{
    public int ObjectXCor;
    public int ObjectYCor;
    public int ObjectMass;

    //Constructor
    public ObjectInfo(){
        ObjectMass = 0;
        ObjectXCor = 0;
        ObjectYCor = 0;
    }
};
And SensorDataStruct is as follows:
//SensorDataStruct struct definition
public class SensorDataStruct implements java.io.Serializable
{
    public int PingData;
    public int IRData;
    public int ForceData;
    public int CompassData;

    //Constructor
    public SensorDataStruct(){
        CompassData = 0;
        ForceData = 0;
        IRData = 0;
        PingData = 0;
    }
};
But when I read out the length of the byte buffer after the conversion, the size is 426. Does anybody have an idea or suggestion why this is not 28 bytes? If I need to supply more information, please say so! Any tips and suggestions are welcome!
Update
I have changed the code with the help of EJP. I use a DataOutputStream to convert the object data (the actual variable values) to bytes. The object described above contains 7 integers, and when the object is created the starting value of all these integers is 0.
The conversion function is as follows:
public static byte[] toByteArray(testpacket obj) throws IOException {
    byte[] bytes = null;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream w = new DataOutputStream(baos);
    w.write(obj.SensorData.CompassData);
    w.write(obj.SensorData.ForceData);
    w.write(obj.SensorData.IRData);
    w.write(obj.SensorData.PingData);
    w.write(obj.VisionData.ObjectMass);
    w.write(obj.VisionData.ObjectXCor);
    w.write(obj.VisionData.ObjectYCor);
    //w.flush();
    bytes = baos.toByteArray();
    int size = bytes.length;
    System.out.println("SIZE OF BYTE ARRAY IN CONVERSION FUNCTION: " + size);
    return bytes;
}
Now I only have one question: the size is 7 when I read out the size of the byte buffer. This is (I think) because the values of the integers (all 0) are so small that they fit in one byte each. My question is: how can I make it so that four bytes are always used for each integer value in the data stream? Any suggestions are welcome!
The serialized stream for your object contains:
An object stream header.
Tag information saying the next item is an object.
Class information for the object.
Version information for the object.
Type-name-value tuples, for each serialized member of the object.
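As for the follow-up question in the update: DataOutputStream.write(int) writes only the low-order byte of its argument, regardless of the value, which is why you see 7 bytes. To always emit four bytes per int, writeInt can be used instead; a minimal sketch of the change inside toByteArray:

DataOutputStream w = new DataOutputStream(baos);
// writeInt always writes all four bytes of the int, high byte first,
// so 7 ints produce exactly 28 bytes regardless of their values.
w.writeInt(obj.SensorData.CompassData);
w.writeInt(obj.SensorData.ForceData);
w.writeInt(obj.SensorData.IRData);
w.writeInt(obj.SensorData.PingData);
w.writeInt(obj.VisionData.ObjectMass);
w.writeInt(obj.VisionData.ObjectXCor);
w.writeInt(obj.VisionData.ObjectYCor);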
On the server (C++), binary data is compressed using the ZLib function compress2() and sent over to the client (Java).
On the client side (Java), the data should be decompressed using the following code snippet:
public static String unpack(byte[] packedBuffer) {
    InflaterInputStream inStream = new InflaterInputStream(
            new ByteArrayInputStream(packedBuffer));
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    int readByte;
    try {
        while ((readByte = inStream.read()) != -1) {
            outStream.write(readByte);
        }
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
        // ... the rest of the code follows
    }
The problem is that when it tries to read in the while loop, it always throws:
java.util.zip.ZipException: invalid stored block lengths
Before I check for other possible causes, can someone please tell me whether I can compress on one side with compress2 and decompress on the other side using the above code, so I can eliminate this as a problem? Also, does anyone have a clue about what might be wrong here? (I know I didn't provide much of the code, but the projects are rather big.)
Thanks.
I think the problem is not with the unpack method but with the packedBuffer content. Unpack works fine:
public static byte[] pack(String s) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DeflaterOutputStream dout = new DeflaterOutputStream(out);
    dout.write(s.getBytes());
    dout.close();
    return out.toByteArray();
}

public static void main(String[] args) throws Exception {
    byte[] a = pack("123");
    String s = unpack(a); // calls your unpack
    System.out.println(s);
}
output
123
public static String unpack(byte[] packedBuffer) {
    try (InflaterInputStream inStream = new InflaterInputStream(
             new ByteArrayInputStream(packedBuffer));
         ByteArrayOutputStream outStream = new ByteArrayOutputStream()) {
        inStream.transferTo(outStream); // Java 9+
        //...
        return outStream.toString(StandardCharsets.UTF_8); // Java 10+
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
        throw new IllegalArgumentException(e);
    }
}
ZLib's compress2 produces the zlib format (RFC 1950), which InflaterInputStream reads directly; GZIPInputStream expects the gzip wrapper (RFC 1952), so it is not the right fit here.
As you seem to expect the bytes to represent text, hence be in some encoding, add that encoding, a Charset, to the conversion to String (which always holds Unicode).
Note, UTF-8 is assumed as the encoding of the bytes here. In your case it might be another encoding.
The ugly try-with-resources syntax closes the streams even on an exception or, as here, at the return.
I rethrow a RuntimeException, as it seems dangerous to continue with no result.
I want to compress/decompress and serialize/deserialize String content. I'm using the following two static functions.
/**
 * Compress data based on the {@link Deflater}.
 *
 * @param pToCompress
 *          input byte-array
 * @return compressed byte-array
 * @throws NullPointerException
 *           if {@code pToCompress} is {@code null}
 */
public static byte[] compress(@Nonnull final byte[] pToCompress) {
    checkNotNull(pToCompress);

    // Compressed result.
    byte[] compressed = new byte[] {};

    // Create the compressor.
    final Deflater compressor = new Deflater();
    compressor.setLevel(Deflater.BEST_SPEED);

    // Give the compressor the data to compress.
    compressor.setInput(pToCompress);
    compressor.finish();

    /*
     * Create an expandable byte array to hold the compressed data.
     * You cannot use an array that's the same size as the original because
     * there is no guarantee that the compressed data will be smaller than
     * the uncompressed data.
     */
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream(pToCompress.length)) {
        // Compress the data.
        final byte[] buf = new byte[1024];
        while (!compressor.finished()) {
            final int count = compressor.deflate(buf);
            bos.write(buf, 0, count);
        }
        // Get the compressed data.
        compressed = bos.toByteArray();
    } catch (final IOException e) {
        LOGWRAPPER.error(e.getMessage(), e);
        throw new RuntimeException(e);
    }

    return compressed;
}
/**
 * Decompress data based on the {@link Inflater}.
 *
 * @param pCompressed
 *          compressed input byte-array
 * @return decompressed byte-array
 * @throws NullPointerException
 *           if {@code pCompressed} is {@code null}
 */
public static byte[] decompress(@Nonnull final byte[] pCompressed) {
    checkNotNull(pCompressed);

    // Create the decompressor and give it the data to decompress.
    final Inflater decompressor = new Inflater();
    decompressor.setInput(pCompressed);
    byte[] decompressed = new byte[] {};

    // Create an expandable byte array to hold the decompressed data.
    try (final ByteArrayOutputStream bos = new ByteArrayOutputStream(pCompressed.length)) {
        // Decompress the data.
        final byte[] buf = new byte[1024];
        while (!decompressor.finished()) {
            try {
                final int count = decompressor.inflate(buf);
                bos.write(buf, 0, count);
            } catch (final DataFormatException e) {
                LOGWRAPPER.error(e.getMessage(), e);
                throw new RuntimeException(e);
            }
        }
        // Get the decompressed data.
        decompressed = bos.toByteArray();
    } catch (final IOException e) {
        LOGWRAPPER.error(e.getMessage(), e);
    }

    return decompressed;
}
Yet, compared to non-compressed values it's orders of magnitude slower, even though I'm caching the decompressed result and values are only decompressed when the content is really needed.
That is, it's used for a DOM-like persistable tree structure, and XPath queries which force the decompression of the String values are roughly 50 times slower, if not more (not really benchmarked, just observed while running unit tests). My laptop even freezes after some unit tests (every time; checked about 5 times), because Eclipse stops responding due to heavy disk I/O. I've even set the compression level to Deflater.BEST_SPEED; other compression levels might be better, and maybe I'll provide a configuration option parameter which can be set per resource. Maybe I've messed something up, as I haven't used the Deflater before. I'm only compressing content where the String length is > 10.
Edit: After extracting the Deflater instantiation to a static field, it seems creating an instance of Deflater/Inflater is very costly: the performance bottleneck is gone, and without microbenchmarks or the like I can't see any performance loss anymore :-) I'm just resetting the deflater/inflater before using it with a new input.
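A minimal sketch of the pattern described in the edit (my reconstruction, assuming single-threaded use; Deflater is not thread-safe, so a shared static instance must not be used from several threads at once):

// Shared instance created once, instead of one per call.
private static final Deflater COMPRESSOR = new Deflater(Deflater.BEST_SPEED);

public static byte[] compress(final byte[] pToCompress) {
    COMPRESSOR.reset();               // discard state from the previous input
    COMPRESSOR.setInput(pToCompress);
    COMPRESSOR.finish();
    final ByteArrayOutputStream bos = new ByteArrayOutputStream(pToCompress.length);
    final byte[] buf = new byte[1024];
    while (!COMPRESSOR.finished()) {
        final int count = COMPRESSOR.deflate(buf);
        bos.write(buf, 0, count);
    }
    return bos.toByteArray();
}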
Have you considered using a higher-level API like GZIP?
Here is an example for compressing:
public static byte[] compressToByte(final String data, final String encoding)
    throws IOException
{
    if (data == null || data.isEmpty())
    {
        return null;
    }
    else
    {
        byte[] bytes = data.getBytes(encoding);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream os = new GZIPOutputStream(baos);
        os.write(bytes, 0, bytes.length);
        os.close();
        byte[] result = baos.toByteArray();
        return result;
    }
}
Here is an example for uncompressing:
public static String unCompressString(final byte[] data, final String encoding)
    throws IOException
{
    if (data == null || data.length == 0)
    {
        return null;
    }
    else
    {
        ByteArrayInputStream bais = new ByteArrayInputStream(data);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        GZIPInputStream is = new GZIPInputStream(bais);
        byte[] tmp = new byte[256];
        while (true)
        {
            int r = is.read(tmp);
            if (r < 0)
            {
                break;
            }
            buffer.write(tmp, 0, r);
        }
        is.close();
        byte[] content = buffer.toByteArray();
        return new String(content, 0, content.length, encoding);
    }
}
We get very good performance and compression ratio with this.
The zip API is also an option.
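For completeness, a small sketch of that entry-based zip API (my own illustration; the entry name "data" is an arbitrary placeholder):

public static byte[] zipString(final String data) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(baos)) {
        zos.putNextEntry(new ZipEntry("data")); // a single named entry
        zos.write(data.getBytes("UTF-8"));
        zos.closeEntry();
    }
    return baos.toByteArray();
}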
Your comments are the correct answer.
In general, if a method is going to be used frequently, you want to eliminate any allocations and copying of data. This often means moving instance initialization and other setup either to static variables or to the constructor.
Using statics is easier, but you may run into lifetime issues (as in how do you know when to clean up the statics - do they exist forever?).
Doing the setup and initialization in the constructor allows the user of the class to determine the lifetime of the object and clean up appropriately. You could instantiate it once before going into a processing loop and let it be garbage-collected after exiting.
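A hypothetical sketch of that constructor-based approach (names are placeholders): the caller creates the object once, reuses it across a processing loop, and closes it to release the native zlib resources deterministically:

public final class ReusableCompressor implements AutoCloseable {
    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);

    public byte[] compress(final byte[] input) {
        deflater.reset();            // reuse state across calls
        deflater.setInput(input);
        deflater.finish();
        final ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        final byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        return bos.toByteArray();
    }

    @Override
    public void close() {
        deflater.end();              // release native memory deterministically
    }
}

A caller would then create one instance before the loop, call compress for each input, and close the wrapper (or use try-with-resources) when done.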