Why does Java serialization take up so much space? - java

I tried serializing instances of Byte and Integer and was shocked by how much space they took up when they were received on the other end. Why is it that it only takes 4 bytes to make an Integer, but it takes up over 10 times that many bytes upon serialization? I mean in C++, a final class has a 64 bit class identifier, plus its contents. Going off that logic, I would expect an Integer to take up 64 + 32, or 96 bits when serialized.
import java.io.*;

public class Test {
    public static void main(String[] ar) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutput out = new ObjectOutputStream(bos);
        out.writeObject(new Integer(32));
        byte[] yourBytes = bos.toByteArray();
        System.out.println("length: " + yourBytes.length + " bytes");
    }
}
Output:
length: 81 bytes
Update:
public static void main(String[] args) throws IOException {
    {
        ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
        ObjectOutput out1 = new ObjectOutputStream(bos1);
        out1.writeObject(new Boolean(false));
        byte[] yourBytes = bos1.toByteArray();
        System.out.println("1 Boolean length: " + yourBytes.length);
    }

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = new ObjectOutputStream(bos);
    for (int i = 0; i < 1000; ++i) {
        out.writeObject(new Boolean(true)); // 47 bytes
    }
    byte[] yourBytes = bos.toByteArray();
    System.out.println("1000 Booleans length: " + yourBytes.length); // 7040 bytes

    final int count = 1000;

    ArrayList<Boolean> listBoolean = new ArrayList<>(count);
    listBoolean.addAll(Collections.nCopies(count, Boolean.TRUE));
    System.out.printf("ArrayList: %d%n", sizeOf(listBoolean)); // 5096 bytes

    Boolean[] arrayBoolean = new Boolean[count];
    Arrays.fill(arrayBoolean, true);
    System.out.printf("Boolean[]: %d%n", sizeOf(arrayBoolean)); // 5083 bytes

    boolean[] array = new boolean[count];
    Arrays.fill(array, true);
    System.out.printf("boolean[]: %d%n", sizeOf(array)); // 1027 bytes

    BitSet bits = new BitSet(count);
    bits.set(0, count);
    System.out.printf("BitSet: %d%n", sizeOf(bits)); // 201 bytes
}

static int sizeOf(Serializable obj) throws IOException {
    ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
    ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
    objsOut.writeObject(obj);
    return bytesOut.toByteArray().length;
}
Output:
1 Boolean length: 47 (47 bytes per Boolean)
1000 Booleans length: 7040 (7 bytes per Boolean)
ArrayList: 5096 (5 bytes per Boolean)
Boolean[]: 5083 (5 bytes per Boolean)
boolean[]: 1027 (1 byte per boolean)
BitSet: 201 (1/5 of 1 byte per boolean)

Though Radiodef has clarified why the serialized object is so large, I would like to make another point here so we don't forget the optimization built into Java's underlying serialization algorithm (and present in almost all such algorithms).
When you write a second Integer object (or any further object of a class that has already been written), you will not see the size double (that is, the total will not be 81 * 2 = 162 bytes) in this case:
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(new Integer(32));
out.writeObject(new Integer(65));
byte[] yourBytes = bos.toByteArray();
System.out.println("length: " + yourBytes.length + " bytes");
The way it works is that when an instance of a class is serialized for the first time, the stream writes out information about the whole class: the class name, the serialVersionUID, and the name and type of each field present in the class. That's why the byte count is so high. This is done largely so that class evolution cases can be handled properly.
When it sends the class metadata for the first time, it also caches that information in a local handle table (sometimes called a value cache or indirection table). So the next time another instance of the same class is serialized (remember, the cache applies only within a single stream, and only until reset() is called), it writes only a small back-reference handle (a few bytes), so the size is much smaller.
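You can watch this back-referencing in action by measuring the stream after each write. A minimal sketch (the class name is mine; the exact byte counts vary with JVM and class version):

import java.io.*;

public class HandleDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);

        out.writeObject(Integer.valueOf(32)); // full class metadata is written here
        out.flush();
        int afterFirst = bos.size();
        System.out.println("first object:  " + afterFirst + " bytes");

        out.writeObject(Integer.valueOf(65)); // only a class handle plus the int value
        out.flush();
        System.out.println("second object: " + (bos.size() - afterFirst) + " bytes");

        out.reset(); // clears the handle table; metadata will be re-sent in full
        out.writeObject(Integer.valueOf(99));
        out.flush();
        System.out.println("total after reset: " + bos.size() + " bytes");
    }
}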

java.lang.Byte and java.lang.Integer are objects, so at the very least the qualified names of their classes also need to be stored for them to be deserialized. The serialVersionUID needs to be stored as well, etc. It is easy to see how this extra information inflates the size quickly.
If you want to learn about the serialization format, there is an article about it at JavaWorld: http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html.
If you're concerned about the size of serialized data, pick a format which is more compact:
import java.util.*;
import java.io.*;

class Example {
    public static void main(String[] args) throws IOException {
        final int count = 1000;

        ArrayList<Boolean> list = new ArrayList<>(count);
        list.addAll(Collections.nCopies(count, Boolean.TRUE));
        System.out.printf("ArrayList: %d%n", sizeOf(list));

        boolean[] array = new boolean[count];
        Arrays.fill(array, true);
        System.out.printf("boolean[]: %d%n", sizeOf(array));

        BitSet bits = new BitSet(count);
        bits.set(0, count);
        System.out.printf("BitSet: %d%n", sizeOf(bits));
    }

    static int sizeOf(Serializable obj) throws IOException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
        objsOut.writeObject(obj);
        return bytesOut.toByteArray().length;
    }
}
ArrayList: 5096
boolean[]: 1027
BitSet: 201
Example on Ideone.

Related

Gzip decompression adding one extra byte ... Why?

I've written a simple Java code snippet which takes a String, converts it to byte[], and then compresses it using Gzip. Then it decompresses the result to get back the byte[], which now contains one extra garbage byte. Why is there a garbage byte here?
public static void main(String[] args) throws Exception {
    String testString = "Sample String here";
    byte[] originalBytes = testString.getBytes();

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream gzos = new GZIPOutputStream(baos);
    gzos.write(originalBytes);
    gzos.close();
    byte[] compressedBytes = baos.toByteArray();

    ByteArrayInputStream bais = new ByteArrayInputStream(compressedBytes);
    GZIPInputStream gzis = new GZIPInputStream(bais);
    ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
    while (gzis.available() > 0) {
        dbaos.write(gzis.read());
    }
    byte[] decompressedBytes = dbaos.toByteArray();
    String decompressedString = new String(decompressedBytes);

    System.out.println(">>" + decompressedString + "<<");
    System.out.println("Size of bytes before: " + originalBytes.length);
    System.out.println("Size of bytes after: " + decompressedBytes.length);
}
Output:
>>Sample String here�<<
Size of bytes before: 18
Size of bytes after: 19
Can someone tell me why there is a garbage byte, and how I can get rid of it WITHOUT changing the setup of the code above?
You are using available() here, and that is where the extra byte comes from. available() only tells you whether a read can complete without blocking; for GZIPInputStream it keeps returning 1 until end-of-stream has actually been observed, so the loop runs one extra iteration. That final read() returns -1, and write(int) keeps only the low 8 bits, appending a spurious 0xFF byte. You should instead read the stream until read() returns a value less than 0. Change this
ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
while (gzis.available() > 0) {
    dbaos.write(gzis.read());
}
to something like
ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
int b;
while ((b = gzis.read()) >= 0) {
    dbaos.write(b);
}
and I get
>>Sample String here<<
Size of bytes before: 18
Size of bytes after: 18
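As a side note, if you can target Java 9 or later, the read loop can be replaced entirely with readAllBytes(), which drains the stream until end-of-stream for you:

byte[] decompressedBytes = gzis.readAllBytes(); // Java 9+; no available() needed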

Java compress byte array and base 64 encode to base 64 decode and decompress byte array error: different sized input/output arrays

My application requires a list of doubles encoded as a byte array with little endian encoding that has been zlib compressed and then encoded as base 64. I wrote up a harness to test my encoding, which wasn't working. I was able to make progress.
However, I noticed that when I attempt to decompress to a fixed size buffer, I am able to come up with input such that the size of the decompressed byte array is smaller than the original byte array, which obviously isn't right. Coincident with this, the last double in the list disappears. On most inputs, the fixed buffer size reproduces the input. Does anyone know why that would be? I am guessing the error is in the way I am encoding the data, but I can't figure out what is going wrong.
When I try using a ByteArrayOutputStream to handle variable-length output of arbitrary size (which will be important for the real version of the code, as I can't guarantee max size limits), the inflate method of Inflater continuously returns 0. I looked up the documentation and it said this means it needs more data. Since there is no more data, I again suspect my encoding, and guess that it is the same issue causing the previously explained behavior.
In my code I've included an example of data that works fine with the fixed buffer size, as well as data that doesn't work for fixed buffer. Both data sets will cause the variable buffer size error I explained.
Any clues as to what I am doing wrong? Many thanks.
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

import org.apache.commons.codec.binary.Base64;

public class BinaryReaderWriter {
    public static void main(String[] args) throws UnsupportedEncodingException, DataFormatException {
        // this input will break the fixed buffer method
        //double[] centroids = {123.1212234143345453223123123, 28464632322456781.23, 3123121.0};

        // this input works with the fixed buffer method
        double[] centroids = {123.1212234143345453223123123, 28464632322456781.23, 31.0};

        BinaryReaderWriter brw = new BinaryReaderWriter();
        String output = brw.compressCentroids(centroids);
        brw.decompressCentroids(output);
    }

    void decompressCentroids(String encoded) throws DataFormatException {
        byte[] binArray = Base64.decodeBase64(encoded);

        // This block of code is the fixed buffer version
        System.out.println("binArray length " + binArray.length);
        Inflater deCompressor = new Inflater();
        deCompressor.setInput(binArray, 0, binArray.length);
        byte[] decompressed = new byte[1024];
        int decompressedLength = deCompressor.inflate(decompressed);
        deCompressor.end();
        System.out.println("decompressedLength = " + decompressedLength);
        byte[] decompressedData = new byte[decompressedLength];
        for (int i = 0; i < decompressedLength; i++) {
            decompressedData[i] = decompressed[i];
        }

        /*
        // This block of code is the variable buffer version
        ByteArrayOutputStream bos = new ByteArrayOutputStream(binArray.length);
        Inflater deCompressor = new Inflater();
        deCompressor.setInput(binArray, 0, binArray.length);
        byte[] decompressed = new byte[1024];
        while (!deCompressor.finished()) {
            int decompressedLength = deCompressor.inflate(decompressed);
            bos.write(decompressed, 0, decompressedLength);
        }
        deCompressor.end();
        byte[] decompressedData = bos.toByteArray();
        */

        ByteBuffer bb = ByteBuffer.wrap(decompressedData);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("decompressedData length = " + decompressedData.length);
        double[] doubleValues = new double[decompressedData.length / 8];
        for (int i = 0; i < doubleValues.length; i++) {
            doubleValues[i] = bb.getDouble(i * 8);
        }
        for (double dbl : doubleValues) {
            System.out.println(dbl);
        }
    }

    String compressCentroids(double[] centroids) {
        byte[] cinput = new byte[centroids.length * 8];
        ByteBuffer buf = ByteBuffer.wrap(cinput);
        buf.order(ByteOrder.LITTLE_ENDIAN);
        for (double cent : centroids) {
            buf.putDouble(cent);
        }

        byte[] input = buf.array();
        System.out.println("raw length = " + input.length);

        byte[] output = new byte[input.length];
        Deflater compresser = new Deflater();
        compresser.setInput(input);
        compresser.finish();
        int compressedLength = compresser.deflate(output);
        compresser.end();
        System.out.println("Compressed length = " + compressedLength);

        byte[] compressed = new byte[compressedLength];
        for (int i = 0; i < compressedLength; i++) {
            compressed[i] = output[i];
        }

        String encoded = Base64.encodeBase64String(compressed);
        return encoded;
    }
}
When compressing data, what we are really doing is re-encoding it to increase the entropy density of the data. During the re-encoding process we have to add metadata that tells us how the data was encoded, so it can be converted back to what it was previously.
Compression only succeeds if that metadata is smaller than the space we save by re-encoding the data.
Consider Huffman encoding:
Huffman is a simple encoding scheme where we replace a fixed-width character set with a variable-width character set plus a code-length table. The table size will be greater than zero for obvious reasons, and if all characters appear with a near-equal distribution we will not be able to save any space on the codes themselves. So our compressed data ends up being larger than our uncompressed data.
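The same effect is easy to demonstrate with Deflater directly. Here is a minimal sketch (the class name and buffer sizes are my own choices): it deflates 64 highly repetitive bytes and 64 random bytes, and on a typical JVM the random input comes out larger than it went in, because the zlib header, block framing, and checksum outweigh any savings:

import java.util.Random;
import java.util.zip.Deflater;

public class DeflateOverhead {
    public static void main(String[] args) {
        byte[] repetitive = new byte[64];            // all zeros: very low entropy
        byte[] random = new byte[64];
        new Random(42).nextBytes(random);            // high entropy: incompressible

        System.out.println("repetitive 64 bytes -> " + deflatedSize(repetitive));
        System.out.println("random 64 bytes     -> " + deflatedSize(random));
    }

    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length * 2 + 64]; // generous output buffer
        int n = d.deflate(out);                       // completes in one call here
        d.end();
        return n;
    }
}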

Serialization gives wrong size of object

I'm working on an Android app where I use serialization to convert an object to a byte array. After the conversion I read the size of the byte array, and it is much bigger than I expected.
The method that I have made is as follows:
public void Send(testpacket packet) {
    try {
        // First convert the CommStruct to a byte array
        // Then send the byte array
        byte[] buffer = toByteArray(packet);
        int size = buffer.length;
        System.out.println("SIZE OF BYTE ARRAY: " + size);
        server.send(buffer);
    } catch (IOException e) {
        Log.e("USBCommunicator", "problem sending TCP message", e);
    }
}
The serialization method toByteArray converts an object to a byte array and looks as follows:
public static byte[] toByteArray(Object obj) throws IOException {
    byte[] bytes = null;
    ByteArrayOutputStream bos = null;
    ObjectOutputStream oos = null;
    try {
        bos = new ByteArrayOutputStream();
        oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        bytes = bos.toByteArray();
    } finally {
        if (oos != null) {
            Log.i(TAG, "not null");
            oos.close();
        }
        if (bos != null) {
            bos.close();
            Log.i(TAG, "not null");
        }
    }
    return bytes;
}
The packet object consists of two classes with a total of 7 integers (so I expected the size to be 28 bytes). It is defined as follows:
public class testpacket implements java.io.Serializable {
    public ObjectInfo VisionData;
    public SensorDataStruct SensorData;

    // Constructor
    public testpacket() {
        // Call constructors
        VisionData = new ObjectInfo();
        SensorData = new SensorDataStruct();
    }
}
ObjectInfo consists of the following:
// ObjectInfo struct definition
public class ObjectInfo implements java.io.Serializable {
    public int ObjectXCor;
    public int ObjectYCor;
    public int ObjectMass;

    // Constructor
    public ObjectInfo() {
        ObjectMass = 0;
        ObjectXCor = 0;
        ObjectYCor = 0;
    }
}
And SensorDataStruct is as follows:
// SensorDataStruct struct definition
public class SensorDataStruct implements java.io.Serializable {
    public int PingData;
    public int IRData;
    public int ForceData;
    public int CompassData;

    // Constructor
    public SensorDataStruct() {
        CompassData = 0;
        ForceData = 0;
        IRData = 0;
        PingData = 0;
    }
}
But when I read out the length of the byte buffer after the conversion, the size is 426. Does anybody have an idea or suggestion why this is not 28 bytes? If I need to supply more information, please say so! Any tips and suggestions are welcome!
Update
I have changed the code with the help of EJP. I now use a DataOutputStream to convert the object data (the actual variable data) to bytes. The object described above in this post contains 7 integers, and when the object is created the starting value of all these integers is 0.
The conversion function is as follows:
public static byte[] toByteArray(testpacket obj) throws IOException {
    byte[] bytes = null;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream w = new DataOutputStream(baos);
    w.write(obj.SensorData.CompassData);
    w.write(obj.SensorData.ForceData);
    w.write(obj.SensorData.IRData);
    w.write(obj.SensorData.PingData);
    w.write(obj.VisionData.ObjectMass);
    w.write(obj.VisionData.ObjectXCor);
    w.write(obj.VisionData.ObjectYCor);
    //w.flush();
    bytes = baos.toByteArray();
    int size = bytes.length;
    System.out.println("SIZE OF BYTE ARRAY IN CONVERTION FUNCTION: " + size);
    return bytes;
}
Now I only have one question: the size is 7 when I read out the size of the byte buffer. This is because write(int) writes only a single byte (the low 8 bits of its argument), regardless of the value. My question is: how can I make sure that four bytes are always used for each integer value in the data stream? Any suggestions are welcome!
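For the update: one way to get a fixed four bytes per value is DataOutputStream.writeInt, which always writes exactly 4 bytes in big-endian order. A minimal sketch reusing the field names from the question:

public static byte[] toByteArray(testpacket obj) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream w = new DataOutputStream(baos);
    w.writeInt(obj.SensorData.CompassData); // always 4 bytes, big-endian
    w.writeInt(obj.SensorData.ForceData);
    w.writeInt(obj.SensorData.IRData);
    w.writeInt(obj.SensorData.PingData);
    w.writeInt(obj.VisionData.ObjectMass);
    w.writeInt(obj.VisionData.ObjectXCor);
    w.writeInt(obj.VisionData.ObjectYCor);
    w.flush();
    return baos.toByteArray();
}

With that, the buffer comes out to 7 * 4 = 28 bytes.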
The serialized stream for your object contains:
An object stream header.
Tag information saying the next item is an object.
Class information for the object.
Version information for the object.
Type-name-value tuples, for each serialized member of the object.
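The stream header alone accounts for the first four bytes of any serialized stream. A quick sketch to see it (the magic value 0xACED and version 0x0005 are defined by the Java Object Serialization Stream Protocol):

import java.io.*;

public class HeaderPeek {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).flush(); // header only, no objects written
        for (byte b : bos.toByteArray()) {
            System.out.printf("%02x ", b);   // prints: ac ed 00 05
        }
        System.out.println();
    }
}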

Breaking binary files

With reference to my previous question.
I have made the program with the following approach:
The program first reads 2k of data from the file and stores it into a byte array.
Then the data to be added to each packet is also stored in an array and both are added to an array list.
The array list is then written to an output stream for the file.
The code is here:
File bin = chooser.getSelectedFile();
int filesize = (int) bin.length();
int pcount = filesize / 2048;
byte[] file = new byte[filesize];
byte[] meta = new byte[12];
int arraysize = pcount * 12 + filesize;
byte[] rootfile = new byte[46];
ArrayList al = new ArrayList();
String root;
prbar.setVisible(true);
int mark = 0;
String metas;
try {
    FileInputStream fis = new FileInputStream(bin);
    FileOutputStream fos = new FileOutputStream(bin.getName().replace(".bin", ".xyz"));
    ObjectOutputStream os = new ObjectOutputStream(fos);
    root = "46kb" + "5678" + "0000" + pcount + "MYBOX" + "13" + "S208";
    rootfile = root.getBytes();
    for (int n = 0; n <= pcount; n++) {
        fis.read(file, 0, 2048);
        mark = mark + 2048;
        int v = (mark / filesize) * 100;
        prbar.setValue(v);
        metas = "02KB" + "1234" + n;
        meta = metas.getBytes();
        al.add(rootfile);
        al.add(meta);
        al.add(file);
    }
    os.writeObject(al.toArray());
} catch (Exception ex) {
    erlabel.setText(ex.getMessage());
}
The program runs without any errors, but the file is not created correctly. Either the approach is wrong or the code is.
Please help.
You appear to be writing your own binary format, but you are using ObjectOutputStream, which has its own header. writeObject writes an Object, not raw data, in a manner that lets a Java process deserialize that object, e.g. with its class hierarchy and field names.
For binary output, I suggest you use a plain DataOutputStream or a BufferedOutputStream, which will be more efficient and do what you want.
I also suggest you write the data as you generate it rather than collecting it in an ArrayList. This will use less memory, make the code simpler, and be faster.
I would write the code more like this:
File bin = chooser.getSelectedFile();
int filesize = (int) bin.length();
int pcount = (filesize + 2048 - 1) / 2048;
byte[] file = new byte[2048];

FileInputStream fis = new FileInputStream(bin);
String name2 = bin.getName().replace(".bin", ".xyz");
OutputStream os = new BufferedOutputStream(new FileOutputStream(name2));

byte[] rootfile = ("46kb" + "5678" + "0000" + pcount + "MYBOX" + "13" + "S208").getBytes("UTF-8");
for (int n = 0; n < pcount; n++) {
    os.write(rootfile);
    byte[] metas = ("02KB" + "1234" + n).getBytes("UTF-8");
    os.write(metas);
    int len = fis.read(file);
    os.write(file, 0, len);
    int percent = 100 * n / pcount;
    prbar.setValue(percent);
}
os.close();
fis.close();
With the smallest thing first:
int v = (mark / filesize) * 100;
uses integer division, which always yields 0 here (mark / filesize is 0 until mark reaches filesize). It should be:
int v = mark * 100 / filesize;
Also, the byte[] objects (file, for instance) are each created once and added to the list many times, so the list ends up holding n references to the same array, that is, n copies of whatever was last written into it.
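A minimal fix sketch for that second problem (Arrays.copyOf is my suggestion, not from the original code): copy each chunk before storing it, so every list entry owns its own bytes:

byte[] chunk = new byte[2048];
int len = fis.read(chunk, 0, 2048);
if (len > 0) {
    al.add(Arrays.copyOf(chunk, len)); // fresh copy; the shared buffer can be reused
}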

Converting String to InputStream, and OutputStream to String back again

I am trying to do such conversions, but I have a little problem.
Let's say I have the following String:
String in = "1234567890123456";
Then I convert it to ByteArrayInputStream like this:
ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
I also have:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Then I do my encryption:
ch.encrypt(bais, baos);
So now I have my "output" in baos. When i do such thing:
byte[] b2 = baos.toByteArray();
int[] i2 = toUnsignedIntArray(b2);
writeIntegersAsHex(i2);
where (I know it is not the most elegant way but it's only for testing):
public static void writeIntegersAsHex(int[] integers) {
    int height = integers.length;
    for (int i = 0; i < height; i++) {
        System.out.print(Integer.toHexString(integers[i]) + ",");
    }
    System.out.println("");
}
I get such output:
d1,68,a0,46,32,37,25,64,67,71,17,df,ee,ef,2,12,
And that output is correct: when I process a file that contains the same string, the output is the same. But I can't get a proper string from baos.
Please don't ask me why I am doing it this way; it was not my call. I am a student and this is one of the exercises.
The algorithm (btw, it's AES-128) works fine with files, but I can't get the String-to-InputStream and OutputStream-to-String conversions to work properly.
But I can't get a proper string from baos.
At this point your output is just arbitrary binary data. It's not encoded text - it's just a bunch of bits.
To convert that to a sensible string which will let you convert it back to the original bytes, you should probably use either hex or base64. There's a public domain base64 library which works well in my experience, or plenty of other alternatives (for both base64 and hex).
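For example, a minimal sketch with the java.util.Base64 API that has shipped with the JDK since Java 8 (an alternative to a third-party library):

import java.util.Base64;

// Encrypted bytes -> printable String, and back again, losslessly
String encoded = Base64.getEncoder().encodeToString(baos.toByteArray());
byte[] original = Base64.getDecoder().decode(encoded);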
public static void main(String[] args) throws IOException {
    String in = "1234567890123456";
    ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int i;
    while ((i = bais.read()) != -1) {
        baos.write(i);
        baos.flush();
    }
    System.out.print(baos.toString());
}
