Java byte array compression

I'm trying to use the Java DeflaterOutputStream and InflaterOutputStream classes to compress a byte array, but neither appears to be working correctly. I assume I'm implementing them incorrectly.
public static byte[] compress(byte[] in) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeflaterOutputStream defl = new DeflaterOutputStream(out);
        defl.write(in);
        defl.flush();
        defl.close();
        return out.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(150);
        return null;
    }
}
public static byte[] decompress(byte[] in) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InflaterOutputStream infl = new InflaterOutputStream(out);
        infl.write(in);
        infl.flush();
        infl.close();
        return out.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(150);
        return null;
    }
}
Here are the two methods I'm using to compress and decompress the byte array. Most implementations I've seen online use a fixed-size buffer array for the decompression portion, but I'd prefer to avoid that if possible, because I'd need to give that buffer array a size of one to get any significant compression.
If anyone can explain what I'm doing wrong, it would be appreciated. Also, to explain how I know these methods aren't working correctly: the "compressed" byte array they output is always larger than the uncompressed one, no matter what size byte array I provide.

This will depend on the data you are compressing. For example, if we take an array of 10,000 zero-valued bytes, it compresses well:
byte[] plain = new byte[10000];
byte[] compressed = compress(plain);
System.out.println(compressed.length); // 33
byte[] result = decompress(compressed);
System.out.println(result.length); // 10000

Compression always has overhead to allow for future decompression. If the compression produces no reduction in length (because the data is random or already compressed), the output can be longer than the input.
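A minimal sketch of the opposite case, using the compress method from the question: random bytes are essentially incompressible, so the deflate overhead makes the output slightly larger than the input.
byte[] random = new byte[10000];
new java.util.Random().nextBytes(random); // incompressible input
byte[] compressed = compress(random);
System.out.println(compressed.length);    // slightly more than 10000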

Related

Lengthy string compression/decompression in Java

I am looking for string length compression to avoid lengthy filenames like the one below. The string contains UTF-8 characters as well.
"dt=20200623_isValid=valid_module_name=A&B&C_data_source=internet_part-00001-1234-9d12-1234-123d-1234567890a1.b001.json"
I tried Huffman compression from a GitHub project; it reduces the size, but not much on the string length.
Size before compression: 944
Size after compression: 569
Compressed string:
01011111001111100011101000111011101011001000111110001101000011011001000110001111010001010111111001010110001111010001010001101101010000101101110001110000000110101011010110100000111111001101011111100111101111110100000010101011011110011000010011001000101110010011101001000001111101001010111110000001001101010000111100001110101001100100111110001011101110111011101001001010011000111110111000101100000101100110000010100110001111101110001010011000111110101001010011000111110111011010111011001101100110110111000011100110100111000111011101110111010011100011101111001100100010101
Please advise how to achieve length compression in Java. (The decompressed filename value is needed for further processing.)
You should try zlib/gzip compression. You can find a gzip compression snippet in "compression and decompression of string data in java".
A zlib compression implementation is also fairly easy. You can use the code below as a starter and improve upon it.
For a detailed explanation of how these compression formats relate, see "How are zlib, gzip and zip related? What do they have in common and how are they different?"
Read about Deflater strategies before proceeding: "Java Deflater strategies - DEFAULT_STRATEGY, FILTERED and HUFFMAN_ONLY"
public void compressFile(String originalFileName, String compressedFileName) {
    try (FileInputStream fileInputStream = new FileInputStream(originalFileName);
         FileOutputStream fileOutputStream = new FileOutputStream(compressedFileName);
         DeflaterOutputStream deflaterOutputStream = new DeflaterOutputStream(fileOutputStream)) {
        int data;
        while ((data = fileInputStream.read()) != -1) {
            deflaterOutputStream.write(data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
You can decompress using an InflaterInputStream:
public void decompressFile(String fileToBeDecompressed, String outputFile) {
    try (FileInputStream fileInputStream = new FileInputStream(fileToBeDecompressed);
         FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
         InflaterInputStream inflaterInputStream = new InflaterInputStream(fileInputStream)) {
        int data;
        while ((data = inflaterInputStream.read()) != -1) {
            fileOutputStream.write(data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
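Since the question is about a string rather than a file, here is a minimal in-memory sketch along the same lines: deflate the UTF-8 bytes of the string, then URL-safe Base64-encode the result so it stays filename-safe. (deflateToToken is a hypothetical helper name; it assumes java.util.zip, java.util.Base64 and java.nio.charset.StandardCharsets are imported.)
static String deflateToToken(String s) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DeflaterOutputStream defl = new DeflaterOutputStream(out)) {
        defl.write(s.getBytes(StandardCharsets.UTF_8)); // compress the UTF-8 bytes
    }
    // URL-safe Base64 keeps the token usable in a filename
    return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
}
Be aware that for short strings the deflate header and checksum overhead can eat much of the savings.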
Refer: http://cr.openjdk.java.net/~iris/se/11/latestSpec/api/java.base/java/util/zip/Deflater.html
Of course, using one character per binary digit is going to use up a lot of space. That library is using 16 bits (the size of a Java char) to represent a single bit, so it is literally making its result 16 times larger than it needs to be.
A far more compact way to represent binary data is by converting it to hexadecimal.
byte[] compressedBytes = new BigInteger(compressedString, 2).toByteArray();
Formatter formatter = new Formatter();
for (byte b : compressedBytes) {
    formatter.format("%02x", b);
}
String hex = formatter.toString();
Then the result is 142 bytes:
BE7C7477591F1A1B231E8AFCAC7A28DA85B8E0356B41F9AFCF7E8156F30991727483E95F026A1E1D4C9F17777494C7DC582CC14C7DC531F5298FBB5D9B36E1CD38EEEE9C779915
You could even go a step further and Base64-encode it, reducing the result to 96 bytes:
String s = Base64.getEncoder().encodeToString(compressedBytes);
Result:
AL58dHdZHxobIx6K/Kx6KNqFuOA1a0H5r89+gVbzCZFydIPpXwJqHh1Mnxd3dJTH3FgswUx9xTH1KY+7XZs24c047u6cd5kV

Why is my binary data bigger after getting it from the webserver?

I need to serve a binary file through a web service implemented in Python/Django. The problem is that when I compare the original file with the transferred file using vbindiff, I see trailing bytes on the transferred file, sadly rendering it useless.
The binary file is fetched and saved by a Java client with:
HttpURLConnection userdataConnection = null;
URL userdataUrl = null;
try {
    userdataUrl = new URL("http://localhost:8000/app/vuforia/10");
    userdataConnection = (HttpURLConnection) userdataUrl.openConnection();
    userdataConnection.setRequestMethod("GET");
    userdataConnection.setRequestProperty("Content-Type", "application/octet-stream");
    userdataConnection.connect();
    InputStream userdataStream = new BufferedInputStream(userdataConnection.getInputStream());
    try (ByteArrayOutputStream fileStream = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[4094];
        while (userdataStream.read(buffer) != -1) {
            fileStream.write(buffer);
        }
        byte[] fileBytes = fileStream.toByteArray();
        try (FileOutputStream fos = new FileOutputStream("./test.dat")) {
            fos.write(fileBytes);
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
I think HttpURLConnection.getInputStream only reads the body of the response, doesn't it?
This code serves the data in the backend.
In views.py:
if request.method == "GET":
    all_data = VuforiaDatabase.objects.all()
    data = all_data.get(id=version)
    return FileResponse(data.get_dat_bytes())
In models.py:
def get_dat_bytes(self):
    return self.dat_upload.open()
How do I go about transferring the binary data 1:1?
You’re ignoring the return value of InputStream.read.
From the documentation:
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
Your code is assuming that the buffer is filled with every call to userdataStream.read(buffer), instead of checking how many bytes were actually read into buffer.
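A minimal corrected loop, keeping the rest of the code unchanged:
byte[] buffer = new byte[4094];
int bytesRead;
while ((bytesRead = userdataStream.read(buffer)) != -1) {
    fileStream.write(buffer, 0, bytesRead); // write only the bytes actually read
}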
That said, you don't need to write the read loop yourself at all. Just use Files.copy:
Path file = Paths.get("./test.dat");
try (InputStream userdataStream = new BufferedInputStream(userdataConnection.getInputStream())) {
    Files.copy(userdataStream, file, StandardCopyOption.REPLACE_EXISTING);
}
You always write a multiple of 4094 bytes, no matter how many bytes you actually read.
Don't do .write(buffer); write only the amount you actually read. That is what userdataStream.read returns: it can return a number smaller than the buffer size, but still positive.
If your project already uses Apache Commons, you can just use copyInputStreamToFile (see the sketch below).
Note: 4K = 4096, not 4094, and it's a ridiculously small buffer unless you are operating something like a smart card. On a PC, use at least a few hundred KB.
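A minimal sketch of the Commons IO route mentioned above, assuming commons-io is on the classpath (copyInputStreamToFile handles partial reads and buffering internally):
import java.io.File;
import org.apache.commons.io.FileUtils;

FileUtils.copyInputStreamToFile(userdataStream, new File("./test.dat"));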

Convert byte stream to byte array without extra space

I have a ByteArrayOutputStream that has large amounts of data written into it, which is ultimately converted into a byte array and written to a cache:
try {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (JsonGenerator jg = mapper.getFactory().createGenerator(baos)) {
        for (Object result : results) {
            jg.writeObject(result);
        }
    }
    localCache.put(cacheKey, baos.toByteArray());
} catch (IOException e) {
    throw Throwables.propagate(e);
}
Here baos.toByteArray() creates a whole new copy of the data in memory, which I'm trying to avoid. Is there a way to convert the stream to a byte array without using the extra memory?
The internal buffer (buf) and current count (count) are protected fields documented in the Javadoc. This means you should be OK to subclass ByteArrayOutputStream and provide a byte[] getBuffer() method to access the buffer directly. Use the existing size() method to determine how much data is present.
public class MyBAOS extends ByteArrayOutputStream {
    public MyBAOS() { super(); }
    public MyBAOS(int size) { super(size); }
    public byte[] getBuffer() { return buf; }
}
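A usage sketch; the key caveat is that only the first size() bytes of the returned array are valid data, since the backing array is usually longer than the content.
MyBAOS baos = new MyBAOS(64 * 1024);
// ... write the JSON into baos as before ...
byte[] raw = baos.getBuffer(); // direct reference to the backing array, no copy
int length = baos.size();      // valid bytes are raw[0 .. length - 1]
This only avoids the copy if the cache can accept an (array, length) pair; if localCache.put must receive an exactly-sized byte[], the copy is unavoidable.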

Java Decompressing byte array - incorrect data check

I have a little problem: I decompress a byte array, and everything is OK with the following code, but sometimes with some data it throws a DataFormatException with "incorrect data check". Any ideas?
private byte[] decompress(byte[] compressed) throws DecoderException {
    Inflater decompressor = new Inflater();
    decompressor.setInput(compressed);
    ByteArrayOutputStream outPutStream = new ByteArrayOutputStream(compressed.length);
    byte[] temp = new byte[8196];
    while (!decompressor.finished()) {
        try {
            int count = decompressor.inflate(temp);
            logger.info("count = " + count);
            outPutStream.write(temp, 0, count);
        } catch (DataFormatException e) {
            logger.info(e.getMessage());
            throw new DecoderException("Wrong format", e);
        }
    }
    try {
        outPutStream.close();
    } catch (IOException e) {
        throw new DecoderException("Cant close outPutStream ", e);
    }
    return outPutStream.toByteArray();
}
Try a different compression level, or use the nowrap option.
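A minimal sketch of the nowrap pairing; both sides must agree on it, because nowrap means raw deflate data with no zlib header or checksum:
Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap = true
Inflater inflater = new Inflater(true); // must match the deflater's nowrap setting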
1. Some warnings: do you use the same algorithm on both sides? Do you use bytes (not String)? Do your arrays have the right sizes?
2. I suggest you check step by step, catching exceptions, checking sizes and nulls, and comparing bytes, like in: Using Java Deflater/Inflater with custom dictionary causes IllegalArgumentException. Take your input, compress it, copy your bytes, decompress them, and compare the output with the input (see the round-trip sketch after this list).
3. If you can't find the problem, take another example that works and modify it step by step.
Hope it helps.
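A round-trip self-test sketch for step 2, assuming compress/decompress helpers like the ones in the first question above:
byte[] input = "hello hello hello".getBytes(StandardCharsets.UTF_8);
byte[] packed = compress(input);      // your compressing side
byte[] unpacked = decompress(packed); // your decompressing side
System.out.println(java.util.Arrays.equals(input, unpacked)); // expect true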
I found out why it's happening:
byte[] temp = new byte[8196];
It's too big; it must be exactly the size of the decompressed array, because the data was Base64-encoded earlier. How can I get this size before decompressing?

Java decompressing array of bytes

On the server (C++), binary data is compressed using the zlib function:
compress2()
and sent over to the client (Java).
On the client side (Java), the data should be decompressed using the following code snippet:
public static String unpack(byte[] packedBuffer) {
    InflaterInputStream inStream = new InflaterInputStream(new ByteArrayInputStream(packedBuffer));
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    int readByte;
    try {
        while ((readByte = inStream.read()) != -1) {
            outStream.write(readByte);
        }
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
        // ... the rest of the code follows
    }
The problem is that when it tries to read in the while loop, it always throws:
java.util.zip.ZipException: invalid stored block lengths
Before I check for other possible causes, can someone please tell me whether I can compress on one side with compress2() and decompress on the other side using the above code, so I can eliminate this as a problem? Also, does anyone have a clue about what might be wrong here? (I know I didn't provide much of the code, but the projects are rather big.)
Thanks.
I think the problem is not with the unpack method but with the packedBuffer content. unpack works fine:
public static byte[] pack(String s) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DeflaterOutputStream dout = new DeflaterOutputStream(out);
    dout.write(s.getBytes());
    dout.close();
    return out.toByteArray();
}
public static void main(String[] args) throws Exception {
    byte[] a = pack("123");
    String s = unpack(a); // calls your unpack
    System.out.println(s);
}
output
123
public static String unpack(byte[] packedBuffer) {
    try (InflaterInputStream inStream = new InflaterInputStream(
             new ByteArrayInputStream(packedBuffer));
         ByteArrayOutputStream outStream = new ByteArrayOutputStream()) {
        inStream.transferTo(outStream);
        //...
        return outStream.toString(StandardCharsets.UTF_8);
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
        throw new IllegalArgumentException(e);
    }
}
Note that zlib (RFC 1950) and gzip (RFC 1952) are different wrappers around the same DEFLATE data; compress2() produces the zlib format, which InflaterInputStream reads (GZIPInputStream is for the gzip format).
As you seem to expect the bytes to represent text, and hence be in some encoding, pass that encoding (a Charset) to the conversion to String (which always holds Unicode).
Note: UTF-8 is assumed here as the encoding of the bytes. In your case it might be another encoding.
The admittedly ugly try-with-resources syntax closes the streams even on an exception or, here, on the return.
I rethrew a RuntimeException, as it seems dangerous to continue with no result.
