Gzip decompression adding one extra byte ... Why? - java

I've written a simple Java snippet that takes a String, converts it to a byte[], and compresses it with Gzip. It then decompresses the result to get the byte[] back, but the decompressed array contains one extra garbage byte. Why is that byte there?
public static void main(String[] args) throws Exception {
String testString = "Sample String here";
byte[] originalBytes = testString.getBytes();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
GZIPOutputStream gzos = new GZIPOutputStream(baos);
gzos.write(originalBytes);
gzos.close();
byte[] compressedBytes = baos.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(compressedBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
while(gzis.available() > 0) {
dbaos.write(gzis.read());
}
byte[] decompressedBytes = dbaos.toByteArray();
String decompressedString = new String(decompressedBytes);
System.out.println(">>" + decompressedString + "<<");
System.out.println("Size of bytes before: " + originalBytes.length);
System.out.println("Size of bytes after: " + decompressedBytes.length);
}
Output:
>>Sample String here�<<
Size of bytes before: 18
Size of bytes after: 19
Can someone tell me why there is a garbage byte? How do I get rid of it WITHOUT changing the setup of the code above?

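The extra byte comes from the available() check. GZIPInputStream.available() keeps returning 1 until a read has actually hit the end of the stream, so the loop calls read() one time too many. That last call returns -1, and write(-1) stores its low 8 bits, the byte 0xFF. Instead, read from the stream and stop when read() returns a value less than 0. Change this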
ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
while(gzis.available() > 0) {
dbaos.write(gzis.read());
}
to something like
ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
int b;
while ((b = gzis.read()) >= 0) {
dbaos.write(b);
}
and I get
>>Sample String here<<
Size of bytes before: 18
Size of bytes after: 18
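If single-byte reads are not a requirement, the same fix can be written with a buffer. This is only a sketch of the round trip from the question: the class name GzipRoundTrip and the helpers gzip and gunzip are made up for illustration, and the 4096-byte buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Hypothetical helper: compresses a byte array with gzip.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(input);
        } // closed here, so the gzip trailer is written before toByteArray()
        return baos.toByteArray();
    }

    // Hypothetical helper: decompresses by looping until read() reports end of stream.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of stream
            while ((n = gzis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "Sample String here".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println("Size of bytes before: " + original.length);
        System.out.println("Size of bytes after: " + restored.length);
    }
}
```

The loop condition depends only on the return value of read(), so available() never enters the picture.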

Related

Compressing byte[] to byte[] with GZIPOutputStream? Unexpected end of ZLIB input stream

I am trying to compress an array of bytes into another array of bytes using GZIPOutputStream (in Java).
This is my code:
@Test
public void testCompressBytes() throws IOException {
final byte[] uncompressed = RandomStringUtils.randomAlphanumeric(100000 /* 100 kb */).getBytes();
// compress
byte[] compressed;
try (InputStream is = new ByteArrayInputStream(uncompressed);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStream os = new GZIPOutputStream(baos)) {
IOUtils.copy(is, os); // org.apache.commons.io
os.flush();
compressed = baos.toByteArray();
}
System.out.println("Size before compression = " + uncompressed.length + ", after = " + compressed.length);
// decompress back
byte[] decompressedBack;
try (InputStream is = new GZIPInputStream(new ByteArrayInputStream(compressed));
ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
IOUtils.copy(is, baos); // EXCEPTION THROWN HERE
baos.flush();
decompressedBack = baos.toByteArray();
}
assertArrayEquals(uncompressed, decompressedBack);
}
And this is the output I'm getting:
Size before compression = 100000, after = 63920
java.io.EOFException: Unexpected end of ZLIB input stream
What could I be doing wrong?
You need to call GZIPOutputStream::close before calling ByteArrayOutputStream::toByteArray, so that GZIPOutputStream can write out the remaining compressed data and the gzip trailer.
In your current code, ByteArrayOutputStream::toByteArray is called inside the try block, before try-with-resources has closed the GZIPOutputStream. The array you get is therefore a truncated gzip stream, which is why decompression fails.
Thanks, everybody!
Although calling GZIPOutputStream::finish() before ByteArrayOutputStream::toByteArray() seems to do the trick, I believe it's better to completely close the GZIP stream first, which in turn forces us to keep ByteArrayOutputStream outside the try-with-resources clause.
So, my reworked compression part now looks like this:
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (InputStream is = new ByteArrayInputStream(uncompressed);
GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
IOUtils.copy(is, gzos);
} catch (final IOException e) {
throw new RuntimeException(e);
}
IOUtils.closeQuietly(baos);
final byte[] compressed = baos.toByteArray();
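To make the ordering issue concrete, here is a small sketch without commons-io that takes the snapshot at two different moments. The class name GzipCloseOrder and both method names are invented for this illustration; only the java.util.zip and java.io calls are real API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipCloseOrder {
    // Snapshot taken after close(): the stream is complete.
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(input);
        } // close() finishes the deflater and appends the gzip trailer
        return baos.toByteArray();
    }

    // Snapshot taken before close(): mirrors the mistake in the question.
    static byte[] compressTooEarly(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(input);
            gzos.flush();
            // The return value is computed here, before try-with-resources closes gzos.
            return baos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[100_000];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) ('a' + i % 26);
        }
        System.out.println("complete stream:  " + compress(input).length + " bytes");
        System.out.println("truncated stream: " + compressTooEarly(input).length + " bytes");
    }
}
```

The truncated array is always shorter than the complete one, because at least the trailer is missing; feeding it to GZIPInputStream is what produces the "Unexpected end of ZLIB input stream" error.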

Convert to Base64 with repeated 'A' in the result?

I fetch a URL on Android and convert the data stream to a Base64 string with this code:
URL url = new URL("http://iranassistance.com/images/sos-logo.png");
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
//urlConnection.setDoOutput(true);
urlConnection.connect();
File SDCardRoot = Environment.getExternalStorageDirectory().getAbsoluteFile();
String filename="downloadedFile.png";
Log.i("Local filename:",""+filename);
File file = new File(SDCardRoot,filename);
if(file.createNewFile()) {
file.createNewFile();
}
FileOutputStream fileOutput = new FileOutputStream(file);
InputStream inputStream = urlConnection.getInputStream();
byte[] imageBytes = new byte[urlConnection.getContentLength()];
inputStream.read(imageBytes, 0, imageBytes.length);
inputStream.close();
String base64Image = Base64.encodeToString(imageBytes, Base64.DEFAULT);
But the base64Image result is not complete and looks something like this:
......nUTJaJnb7PLyscfBMQLLiexyKSEh/o2RfctcZtc8Hr5xcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA........
The repeated 'A' characters show that something is wrong and the image is not complete. Why does this not work properly?
Simple:
inputStream.read(imageBytes, 0, imageBytes.length);
You assume that the above always reads all bytes in one shot.
Wrong. A single call reads as many bytes as happen to be available, up to the requested length, which is why it returns the number of bytes it actually read. See its javadoc:
Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
In other words: you have to loop and accumulate these counts until you have exactly the number of bytes you are looking for!
That is also where the A characters come from: your array is initially all zeros, and you only fill part of it. The rest of the array still holds zeros, which come out as AAAAA after Base64 encoding.
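A minimal sketch of such a loop, in plain Java rather than Android-specific code: the class name ReadFully and the method readExactly are made up here. Note that it assumes the length is known up front; getContentLength() can return -1 when the server does not send one, in which case reading into a ByteArrayOutputStream until end of stream is the safer route.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Hypothetical helper: keeps calling read() until exactly `length` bytes have arrived.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int offset = 0;
        while (offset < length) {
            // Each call may deliver fewer bytes than requested.
            int n = in.read(data, offset, length - offset);
            if (n == -1) {
                throw new EOFException("Stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8};
        byte[] copy = readExactly(new ByteArrayInputStream(source), source.length);
        System.out.println("read " + copy.length + " bytes");
    }
}
```

The JDK's DataInputStream.readFully(byte[]) does the same job if you prefer not to write the loop yourself.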
You can use the following function to convert an image to Base64; just pass in your image:
private String encodeImage(Bitmap mphoto)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mphoto.compress(Bitmap.CompressFormat.JPEG,100,baos);
byte[] b = baos.toByteArray();
String encImage = Base64.encodeToString(b, Base64.DEFAULT);
return encImage;
}
GhostCat gave the correct answer.
I changed my code as below and it worked fine:
InputStream is = null;
try {
URL url = new URL("http://iranassistance.com/images/sos-logo.png");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
is = url.openStream ();
byte[] byteChunk = new byte[4096];
int n;
while ( (n = is.read(byteChunk)) > 0 ) {
baos.write(byteChunk, 0, n);
}
String base64Image2 = Base64.encodeToString(baos.toByteArray(), Base64.DEFAULT);
db.UpdateImage64(base64Image2);
productModel pd = db.GetProductById(2);
}
catch (IOException e) {
e.printStackTrace ();
}
finally {
if (is != null) {
try{
is.close();
}
catch (IOException s){
}
}
}

Why does Java serialization take up so much space?

I tried serializing instances of Byte and Integer and was shocked by how much space they took up when they were received on the other end. Why is it that it only takes 4 bytes to make an Integer, but it takes up over 10 times that many bytes upon serialization? I mean in C++, a final class has a 64 bit class identifier, plus its contents. Going off that logic, I would expect an Integer to take up 64 + 32, or 96 bits when serialized.
import java.io.*;
public class Test {
public static void main (String[] ar) throws Exception {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(new Integer(32));
byte[] yourBytes = bos.toByteArray();
System.out.println("length: " + yourBytes.length + " bytes");
}
}
Output:
length: 81 bytes
Update:
public static void main(String[] args) throws IOException {
{
ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
ObjectOutput out1 = new ObjectOutputStream(bos1);
out1.writeObject(new Boolean(false));
byte[] yourBytes = bos1.toByteArray();
System.out.println("1 Boolean length: " + yourBytes.length);
}
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = new ObjectOutputStream(bos);
for (int i = 0; i < 1000; ++i) {
out.writeObject(new Boolean(true)); // 47 bytes
}
byte[] yourBytes = bos.toByteArray();
System.out.println("1000 Booleans length: " + yourBytes.length); // 7040 bytes
final int count = 1000;
ArrayList<Boolean> listBoolean = new ArrayList<>(count);
listBoolean.addAll(Collections.nCopies(count, Boolean.TRUE));
System.out.printf("ArrayList: %d%n", sizeOf(listBoolean)); // 5096 bytes
Boolean[] arrayBoolean = new Boolean[count];
Arrays.fill(arrayBoolean, true);
System.out.printf("Boolean[]: %d%n", sizeOf(arrayBoolean)); // 5083 bytes
boolean[] array = new boolean[count];
Arrays.fill(array, true);
System.out.printf("boolean[]: %d%n", sizeOf(array)); // 1027 bytes
BitSet bits = new BitSet(count);
bits.set(0, count);
System.out.printf("BitSet: %d%n", sizeOf(bits)); // 201 bytes
}
static int sizeOf(Serializable obj) throws IOException {
ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
objsOut.writeObject(obj);
return bytesOut.toByteArray().length;
}
Output:
1 Boolean length: 47 (47 bytes per Boolean)
1000 Booleans length: 7040 (7 bytes per Boolean)
ArrayList: 5096 (5 bytes per Boolean)
Boolean[]: 5083 (5 bytes per Boolean)
boolean[]: 1027 (1 bytes per boolean)
BitSet: 201 (1/5 of 1 byte per boolean)
Though Radiodef has clarified why the serialized object is so large, I would like to make another point so that we don't forget the optimization built into Java's serialization algorithm (and into almost all such algorithms).
When you write another Integer object (or any object whose class has already been written), the size does not simply double (that is, the total is not 81 * 2 = 162 bytes), as in this case:
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(new Integer(32));
out.writeObject(new Integer(65));
byte[] yourBytes = bos.toByteArray();
System.out.println("length: " + yourBytes.length + " bytes");
The way it works is that when an instance of a class is serialized for the first time, the stream writes a description of the whole class: the class name, the name of each field in the class, and so on. That is why so many bytes are written. This is basically there to handle class evolution properly.
While it writes the class metadata for the first time, the stream also records that information in a local cache, called the value cache or indirection table. The next time another instance of the same class is serialized (remember that the cache applies per stream, and only until reset() is called), it writes just a back-reference to the description, a 4-byte handle, instead of repeating it, so the output is much smaller.
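One way to see this effect is to measure the stream after each write. The sketch below is only an illustration: the class SerializedSizeDemo and the method sizesAfterEachWrite are names invented here, and Integer.valueOf is used in place of the deprecated new Integer(...) constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializedSizeDemo {
    // Hypothetical helper: writes all values to ONE stream and records the
    // total stream size after each writeObject call.
    static int[] sizesAfterEachWrite(Object... values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        int[] sizes = new int[values.length];
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            for (int i = 0; i < values.length; i++) {
                out.writeObject(values[i]);
                out.flush(); // push any buffered data into bos before measuring
                sizes[i] = bos.size();
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAfterEachWrite(Integer.valueOf(32), Integer.valueOf(65));
        System.out.println("after first Integer:  " + sizes[0] + " bytes");
        System.out.println("after second Integer: " + sizes[1] + " bytes");
        System.out.println("cost of second write: " + (sizes[1] - sizes[0]) + " bytes");
    }
}
```

The second write should cost far less than the first, because the class description is referenced rather than repeated.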
java.lang.Byte and java.lang.Integer are objects, so at the very least the qualified names of their classes need to also be stored for them to be deserialized. Also the serialVersionUID needs to be stored, etc. We can easily see how this extra information inflates the size quickly.
If you want to learn about the serialization format, there is an article about it at JavaWorld: http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html.
If you're concerned about the size of serialized data, pick a format which is more compact:
import java.util.*;
import java.io.*;
class Example {
public static void main(String[] args) throws IOException {
final int count = 1000;
ArrayList<Boolean> list = new ArrayList<>(count);
list.addAll(Collections.nCopies(count, Boolean.TRUE));
System.out.printf("ArrayList: %d%n", sizeOf(list));
boolean[] array = new boolean[count];
Arrays.fill(array, true);
System.out.printf("boolean[]: %d%n", sizeOf(array));
BitSet bits = new BitSet(count);
bits.set(0, count);
System.out.printf("BitSet: %d%n", sizeOf(bits));
}
static int sizeOf(Serializable obj) throws IOException {
ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
objsOut.writeObject(obj);
return bytesOut.toByteArray().length;
}
}
ArrayList: 5096
boolean[]: 1027
BitSet: 201
Example on Ideone.

Converting String to InputStream, and OutputStream to String back again

I am trying to do such conversions, but I have a little problem.
Let's say I have the following String:
String in = "1234567890123456";
Then I convert it to ByteArrayInputStream like this:
ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
I also have:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Then I do my encryption:
ch.encrypt(bais, baos);
So now I have my "output" in baos. When I do this:
byte[] b2 = baos.toByteArray();
int[] i2 = toUnsignedIntArray(b2);
writeIntegersAsHex(i2);
where (I know it is not the most elegant way but it's only for testing):
public static void writeIntegersAsHex(int[] integers) {
int height = integers.length;
for (int i = 0; i < height; i++) {
System.out.print(Integer.toHexString(integers[i]) + ",");
}
System.out.println("");
}
I get such output:
d1,68,a0,46,32,37,25,64,67,71,17,df,ee,ef,2,12,
And that output is correct, because processing a file that contains the same string produces the same output. But I can't get a proper string from baos.
Please don't ask me why I am doing it this way, because it was not my call. I am a student and this is one of the exercises.
The algorithm (by the way, it's AES-128) works fine with files, but I can't get the String to InputStream and OutputStream to String conversions to work properly.
But I can't get a proper string from baos.
At this point your output is just arbitrary binary data. It's not encoded text - it's just a bunch of bits.
To convert that to a sensible string which will let you convert it back to the original bytes, you should probably use either hex or base64. There's a public domain base64 library which works well in my experience, or plenty of other alternatives (for both base64 and hex).
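As a rough sketch of both options: instead of a third-party library, this uses java.util.Base64, which assumes Java 8 or later. The class name BinaryToText and its helper methods are invented for the example, and the sample bytes are simply the ones printed in the question.

```java
import java.util.Arrays;
import java.util.Base64;

public class BinaryToText {
    // Renders each byte as two lowercase hex digits.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The ciphertext bytes shown in the question.
        byte[] cipher = {
            (byte) 0xd1, 0x68, (byte) 0xa0, 0x46, 0x32, 0x37, 0x25, 0x64,
            0x67, 0x71, 0x17, (byte) 0xdf, (byte) 0xee, (byte) 0xef, 0x02, 0x12
        };
        String encoded = toBase64(cipher);
        System.out.println("hex:    " + toHex(cipher));
        System.out.println("base64: " + encoded);
        System.out.println("round trip ok: " + Arrays.equals(cipher, fromBase64(encoded)));
    }
}
```

Either form is plain ASCII text, so it survives being stored or printed as a String and can be turned back into the exact original bytes before decryption.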
public static void main(String[] args) throws IOException {
String in = "1234567890123456";
ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int i;
while ( ( i = bais.read() ) != -1 ){
baos.write(i);
baos.flush();
}
System.out.print(baos.toString("UTF-8")); // decode with the same charset used for getBytes
}

Assigning to a byte array in Java

I have a byte array I want to assign as follows:
First byte specifies the length of the string: (byte)string.length()
2nd - Last bytes contain string data from string.getBytes()
Other than using a for loop, is there a quick way to initialize a byte array using bytes from two different variables?
You can use System.arraycopy() to copy your bytes:
String x = "xx";
byte[] data = x.getBytes(); // encode once; the byte count can differ from x.length()
byte[] out = new byte[data.length + 1];
out[0] = (byte) (0xFF & data.length); // length prefix, only valid up to 255 bytes
System.arraycopy(data, 0, out, 1, data.length);
Though using something like a ByteArrayOutputStream or a ByteBuffer like other people suggested is probably a cleaner approach and will be better for you in the long run :-)
How about ByteBuffer?
Example:
byte[] data = string.getBytes();
ByteBuffer bb = ByteBuffer.allocate(data.length + 1);
bb.put((byte) data.length); // prefix with the byte count, not the character count
bb.put(data);
While ByteBuffer is generally the best way to build up byte arrays, given the OP's goals I think the following will be more robust:
public static void main(String[] argv)
throws Exception
{
String s = "any string up to 64k long";
ByteArrayOutputStream bos = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(bos);
out.writeUTF(s);
out.close();
byte[] bytes = bos.toByteArray();
ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
DataInputStream in = new DataInputStream(bis);
String s2 = in.readUTF();
}
How about ByteArrayOutputStream?
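A minimal sketch of what that could look like, assuming UTF-8 and a one-byte length prefix as described in the question. The class name LengthPrefixed and the methods encode and decode are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixed {
    // First byte holds the payload length, the rest holds the string bytes.
    static byte[] encode(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        if (data.length > 255) {
            throw new IllegalArgumentException("Payload does not fit a one-byte length prefix");
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream(data.length + 1);
        bos.write(data.length);          // write(int) stores the low 8 bits
        bos.write(data, 0, data.length); // then the payload itself
        return bos.toByteArray();
    }

    static String decode(byte[] framed) {
        int length = framed[0] & 0xFF;   // read the prefix back as an unsigned value
        return new String(framed, 1, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] framed = encode("xx");
        System.out.println("total bytes: " + framed.length);
        System.out.println("decoded: " + decode(framed));
    }
}
```

The explicit length check matters because a single prefix byte cannot describe more than 255 payload bytes.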
