Using Java Deflater/Inflater with custom dictionary causes IllegalArgumentException - java

The following code is based on the example given in the javadocs for java.util.zip.Deflater. The only change I have made is to create a byte array called dict and set it as the dictionary on both the Deflater and Inflater instances using the setDictionary(byte[]) method.
The problem I'm seeing is that when I call Inflater.setDictionary() with the exact same array as I used for the Deflater, I get an IllegalArgumentException.
Here is the code in question:
import java.util.zip.Deflater;
import java.util.zip.Inflater;
public class DeflateWithDictionary {
public static void main(String[] args) throws Exception {
String inputString = "blahblahblahblahblah??";
byte[] input = inputString.getBytes("UTF-8");
byte[] dict = "blah".getBytes("UTF-8");
// Compress the bytes
byte[] output = new byte[100];
Deflater compresser = new Deflater();
compresser.setInput(input);
compresser.setDictionary(dict);
compresser.finish();
int compressedDataLength = compresser.deflate(output);
// Decompress the bytes
Inflater decompresser = new Inflater();
decompresser.setInput(output, 0, compressedDataLength);
decompresser.setDictionary(dict); // IllegalArgumentException thrown here
byte[] result = new byte[100];
int resultLength = decompresser.inflate(result);
decompresser.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println("Decompressed String: " + outputString);
}
}
If I try inflating the same compressed bytes without setting the dictionary, I get no error, but the result returned is zero bytes.
Is there anything special I need to do in order to use a custom dictionary with Deflater/Inflater?

I actually figured this out while formulating the question but thought I should post the question anyway so others might benefit from my struggles.
It turns out you have to call inflate() once after setting the input but before setting the dictionary. The value returned will be 0, and a call to needsDictionary() will then return true. After that you can set the dictionary and call inflate again.
The amended code is as follows:
import java.util.zip.Deflater;
import java.util.zip.Inflater;
public class DeflateWithDictionary {
public static void main(String[] args) throws Exception {
String inputString = "blahblahblahblahblah??";
byte[] input = inputString.getBytes("UTF-8");
byte[] dict = "blah".getBytes("UTF-8");
// Compress the bytes
byte[] output = new byte[100];
Deflater compresser = new Deflater();
compresser.setInput(input);
compresser.setDictionary(dict);
compresser.finish();
int compressedDataLength = compresser.deflate(output);
// Decompress the bytes
Inflater decompresser = new Inflater();
decompresser.setInput(output, 0, compressedDataLength);
byte[] result = new byte[100];
decompresser.inflate(result);
decompresser.setDictionary(dict);
int resultLength = decompresser.inflate(result);
decompresser.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println("Decompressed String: " + outputString);
}
}
This seems very counterintuitive and clunky from an API design perspective, so please enlighten me if there are any better alternatives.
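The same sequence can be written so that the dictionary is supplied only when the Inflater asks for it. This is a minimal sketch rather than the one right way to do it; the class name DictionaryRoundTrip and its helper methods are made up for illustration, and both helpers drain their output in a loop instead of assuming a 100-byte buffer is enough.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of java.util.zip.
public class DictionaryRoundTrip {

    // Compress input with a preset dictionary, looping until the Deflater is done.
    static byte[] deflate(byte[] input, byte[] dict) {
        Deflater deflater = new Deflater();
        try {
            deflater.setDictionary(dict);
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decompress, handing over the dictionary only after inflate() returns 0
    // and needsDictionary() reports true.
    static byte[] inflate(byte[] compressed, byte[] dict) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                if (n == 0 && !inflater.finished()) {
                    if (inflater.needsDictionary()) {
                        inflater.setDictionary(dict);
                    } else if (inflater.needsInput()) {
                        throw new IllegalStateException("compressed data is truncated");
                    }
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("compressed data is corrupt", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] dict = "blah".getBytes(StandardCharsets.UTF_8);
        byte[] input = "blahblahblahblahblah??".getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflate(deflate(input, dict), dict);
        System.out.println("Decompressed String: " + new String(restored, StandardCharsets.UTF_8));
    }
}
```

Checking needsDictionary() inside the loop keeps the "inflate once, then set the dictionary" step in one place, and the same loop still works for streams that were compressed without a dictionary.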

Related

How can I decompress a stream in c# like this java snippet code?

I'm trying to convert this Java snippet to C#, but I'm a bit confused about it.
My attempt has some errors in gis.Read, because it wants a char* and not a byte[], and in the String constructor for the same reason.
This is the Java code:
public static String decompress(InputStream input) throws IOException
{
final int BUFFER_SIZE = 32;
GZIPInputStream gis = new GZIPInputStream(input, BUFFER_SIZE);
StringBuilder string = new StringBuilder();
byte[] data = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = gis.read(data)) != -1) {
string.append(new String(data, 0, bytesRead));
}
gis.close();
// is.close();
return string.toString();
}
I expected to get a readable string.
You need to transform the bytes to characters first. For that, you need to know the encoding.
In your code, you could have replaced new String(data, 0, bytesRead) with Encoding.UTF8.GetString(data, 0, bytesRead) to do that. However, I would handle this slightly differently.
StreamReader is a useful class to read bytes as text in C#. Just wrap it around your GZipStream and let it do its magic.
public static string Decompress(Stream input)
{
// note this buffer size is REALLY small.
// You could stick with the default buffer size of the StreamReader (1024)
const int BUFFER_SIZE = 32;
string result = null;
using (var gis = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
using (var reader = new StreamReader(gis, Encoding.UTF8, true, BUFFER_SIZE))
{
result = reader.ReadToEnd();
}
return result;
}
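The same idea (wrap the decompressing stream in a reader that knows the encoding, instead of decoding each byte chunk separately) also applies on the Java side. The sketch below is an illustration only: the class name GzipText and the compress helper are made up, and UTF-8 is an assumption about how the text was encoded. Decoding through a reader avoids splitting a multi-byte character across two 32-byte chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration.
public class GzipText {

    // Decompress a gzip stream and decode it as UTF-8 text (assumed encoding).
    // Note: closing the reader also closes the underlying input stream.
    static String decompress(InputStream input) {
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(input), StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            char[] chunk = new char[1024];
            int charsRead;
            while ((charsRead = reader.read(chunk)) != -1) {
                text.append(chunk, 0, charsRead);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counterpart used only to produce test data for the example.
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = compress("a readable string");
        System.out.println(decompress(new ByteArrayInputStream(packed)));
    }
}
```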

Decompressing byte[] using LZ4

I am using LZ4 for compressing and decompressing a string. I have tried the following:
public class CompressionDemo {
public static byte[] compressLZ4(LZ4Factory factory, String data) throws IOException {
final int decompressedLength = data.getBytes().length;
LZ4Compressor compressor = factory.fastCompressor();
int maxCompressedLength = compressor.maxCompressedLength(decompressedLength);
byte[] compressed = new byte[maxCompressedLength];
compressor.compress(data.getBytes(), 0, decompressedLength, compressed, 0, maxCompressedLength);
return compressed;
}
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
LZ4FastDecompressor decompressor = factory.fastDecompressor();
byte[] restored = new byte[data.length];
decompressor.decompress(data,0,restored, 0,data.length);
return new String(restored);
}
public static void main(String[] args) throws IOException, DataFormatException {
String string = "kjshfhshfashfhsakjfhksjafhkjsafhkjashfkjhfjkfhhjdshfhhjdfhdsjkfhdshfdskjfhksjdfhskjdhfkjsdhfk";
LZ4Factory factory = LZ4Factory.fastestInstance();
byte[] arr = compressLZ4(factory, string);
System.out.println(arr.length);
System.out.println(deCompressLZ4(factory, arr) + "decom");
}
}
It throws the following exception:
Exception in thread "main" net.jpountz.lz4.LZ4Exception: Error decoding offset 92 of input buffer
The problem is that decompression only works if I pass the length of the original string's byte[], i.e.:
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
LZ4FastDecompressor decompressor = factory.fastDecompressor();
byte[] restored = new byte[data.length];
decompressor.decompress(data,0,restored, 0,"kjshfhshfashfhsakjfhksjafhkjsafhkjashfkjhfjkfhhjdshfhhjdfhdsjkfhdshfdskjfhksjdfhskjdhfkjsdhfk".getBytes().length);
return new String(restored);
}
It expects the size of the original string's byte[].
Can someone help me with this?
Because compression and decompression may happen on different machines, and the platform default character encoding may not be a Unicode encoding, you should specify the encoding explicitly.
Beyond that, use the actual compressed and decompressed lengths. It is best to store the size of the uncompressed data as well, in plain (uncompressed) form in front of the compressed bytes, so that it can be read before decompressing.
public static byte[] compressLZ4(LZ4Factory factory, String data) throws IOException {
    byte[] decompressed = data.getBytes(StandardCharsets.UTF_8);
    LZ4Compressor compressor = factory.fastCompressor();
    int maxCompressedLength = compressor.maxCompressedLength(decompressed.length);
    // Reserve 4 leading bytes for the uncompressed length.
    byte[] compressed = new byte[4 + maxCompressedLength];
    int compressedSize = compressor.compress(decompressed, 0, decompressed.length,
            compressed, 4, maxCompressedLength);
    ByteBuffer.wrap(compressed).putInt(decompressed.length);
    // Trim the buffer to the header plus the bytes actually written.
    return Arrays.copyOf(compressed, 4 + compressedSize);
}
public static String deCompressLZ4(LZ4Factory factory, byte[] data) throws IOException {
    LZ4FastDecompressor decompressor = factory.fastDecompressor();
    // Read the uncompressed length back from the 4-byte header.
    int decompressedLength = ByteBuffer.wrap(data).getInt();
    byte[] restored = new byte[decompressedLength];
    decompressor.decompress(data, 4, restored, 0, decompressedLength);
    return new String(restored, StandardCharsets.UTF_8);
}
It should be noted that String is not suited to binary data, so this compression/decompression is for text handling only. (A String holds Unicode text in the form of UTF-16 two-byte chars. Converting to or from binary data always involves an encoding step, which costs memory and speed and risks data corruption.)
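The length-prefix idea from this answer can be looked at in isolation, without the LZ4 dependency. The sketch below only shows the framing: a 4-byte big-endian header (ByteBuffer's default order) holding the uncompressed length, followed by the compressed bytes. The class name LengthPrefixedFrame and its methods are made up for illustration, and the byte arrays stand in for real compressor output.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper class; the payload would normally come from a compressor.
public class LengthPrefixedFrame {

    static final int HEADER_BYTES = Integer.BYTES; // 4-byte length header

    // Prepend the uncompressed length to the first compressedLength bytes of the buffer.
    static byte[] frame(int originalLength, byte[] compressed, int compressedLength) {
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + compressedLength);
        out.putInt(originalLength);
        out.put(compressed, 0, compressedLength);
        return out.array();
    }

    // Read the uncompressed length back from the header.
    static int originalLength(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    // Return the compressed bytes that follow the header.
    static byte[] payload(byte[] framed) {
        return Arrays.copyOfRange(framed, HEADER_BYTES, framed.length);
    }

    public static void main(String[] args) {
        byte[] oversizedBuffer = {1, 2, 3, 0, 0}; // only the first 3 bytes are "compressed data"
        byte[] framed = frame(93, oversizedBuffer, 3);
        System.out.println(originalLength(framed) + " " + Arrays.toString(payload(framed)));
    }
}
```

With the length available up front, the receiver can allocate the destination array exactly, which is what the fast decompressor in the question requires.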
I just faced the same error on Android and resolved it based on issue below:
https://github.com/lz4/lz4-java/issues/68
In short make sure you are using the same factory for both operations (compression + decompression) and use Arrays.copyOf() as below:
byte[] compress(final byte[] data) {
LZ4Factory lz4Factory = LZ4Factory.safeInstance();
LZ4Compressor fastCompressor = lz4Factory.fastCompressor();
int maxCompressedLength = fastCompressor.maxCompressedLength(data.length);
byte[] comp = new byte[maxCompressedLength];
int compressedLength = fastCompressor.compress(data, 0, data.length, comp, 0, maxCompressedLength);
return Arrays.copyOf(comp, compressedLength);
}
byte[] decompress(final byte[] compressed) {
LZ4Factory lz4Factory = LZ4Factory.safeInstance();
LZ4SafeDecompressor decompressor = lz4Factory.safeDecompressor();
byte[] decomp = new byte[compressed.length * 4];//you might need to allocate more
decomp = decompressor.decompress(Arrays.copyOf(compressed, compressed.length), decomp.length);
return decomp;
}
Hope this will help.
The restored byte[] is too small. You should not size it from the compressed data.length; use data.length * 3 or more instead.
I resolved it like this:
public static byte[] decompress( byte[] finalCompressedArray,String ... extInfo) {
int len = finalCompressedArray.length * 3;
int i = 5;
while (i > 0) {
try {
return decompress(finalCompressedArray, len);
} catch (Exception e) {
len = len * 2;
i--;
if (LOGGER.isInfoEnabled()) {
LOGGER.info("decompress Error: extInfo ={} ", extInfo, e);
}
}
}
throw new ItemException(1, "decompress error");
}
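/**
* Decompresses an array.
*
* @param finalCompressedArray the compressed data
* @param length length of the original data; it must be exact, neither larger nor smaller.
* @return the decompressed bytes
*/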
private static byte[] decompress(byte[] finalCompressedArray, int length) {
byte[] desc = new byte[length ];
int decompressLen = decompressor.decompress(finalCompressedArray, desc);
byte[] result = new byte[decompressLen];
System.arraycopy(desc,0,result,0,decompressLen);
return result;
}

Convert ZIP to byte array without saving the output to file

I have a ZIP file and when I convert it into byte array and encode it, I am unable to print the encoded format without writing it into file.
Could anyone help in solving this issue?
My code is
InputStream is = null;
OutputStream os = null;
is = new FileInputStream("C:/Users/DarkHorse/Desktop/WebServicesTesting/PolicyCredit.zip");
os = new FileOutputStream("D:/EclipseTestingFolder/EncodedFile1.txt");
int bytesRead = 0;
int chunkSize = 10000000;
byte[] chunk = new byte[chunkSize];
while ((bytesRead = is.read(chunk)) > 0)
{
byte[] ba = new byte[bytesRead];
for(int i=0;i<ba.length;i++)
{
ba[i] = chunk[i];
}
byte[] encStr = Base64.encodeBase64(ba);
os.write(encStr);
}
os.close();
is.close();
}
My Output in the file is
UEsDBBQAAAAIANGL/UboGxdAAQUAAK0WAAAQAAAAUG9saWN5Q3JlZGl0LnhtbJVY3Y6rNhC+r9R34AlqSPankSwkdtNskbLZKOk5Va8QC95d6wRIDZyeffszxgSMGUPKFcx8M/b8egwN87IWcZ6waF+cePLp//qLAw/d8BOL/mRxykRL6sk89T1KLq8adx1XLHp5i55YzkRc8SL3F6534y69O0oQpia6K6LiLTqwpBBpKdUPCRq
But when I try to print it on the screen, I get this:
8569115686666816565656573657871764785981117112010065658185656575488765656581656565658571571159787785381517410890711084876110104116987486895189541147810467431145782515265108113838097110107831191071001167811510798769075791075386975681675753100541198273689012110110210211512212010383777185807570991205677479856101103119785655738799905411997704399101807611247471137665119471005666797647109821201211078276
You need to create a string representation of Base 64 encoded data.
System.out.println( new String(encStr, Charset.forName("UTF-8")));
Here are some other examples: Base 64 Print Question, String Class
Assuming your result array byte[] encStr = Base64.encodeBase64(ba) actually holds the encoded string, try the following:
System.out.println(new String(encStr, Charset.defaultCharset()));
If you are using JDK 7 you can use Files.readAllBytes(path)
Your code would be much simpler like below:
Path path = Paths.get("C:/Users/DarkHorse/Desktop/WebServicesTesting/PolicyCredit.zip");
byte[] data = Files.readAllBytes(path);
byte[] encStr = Base64.encodeBase64(data);
System.out.println( new String(encStr));
You will then be able to print it on the console.
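The digits printed in the question look like the decimal values of the encoded bytes written one after another (85, 69, 115, 68 are 'U', 'E', 's', 'D'), which is what you get when the bytes are printed as numbers rather than as text. As an alternative to the answers above, and assuming Java 8 or later is available, java.util.Base64 can return the encoded form directly as a String. A minimal sketch; the class name Base64Print is made up, and the file path is taken from the command line:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper class; requires Java 8+ for java.util.Base64.
public class Base64Print {

    // Encode raw bytes straight to a printable Base64 String.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]); // path to the ZIP file, supplied by the caller
        System.out.println(encode(Files.readAllBytes(path)));
    }
}
```

Like the Files.readAllBytes answer, this reads the whole file into memory, so it suits files that comfortably fit in the heap.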

Java compress byte array and base 64 encode to base 64 decode and decompress byte array error: different sized input/output arrays

My application requires a list of doubles encoded as a byte array with little endian encoding that has been zlib compressed and then encoded as base 64. I wrote up a harness to test my encoding, which wasn't working. I was able to make progress.
However, I noticed that when I attempt to decompress to a fixed size buffer, I am able to come up with input such that the size of the decompressed byte array is smaller than the original byte array, which obviously isn't right. Coincident with this, the last double in the list disappears. On most inputs, the fixed buffer size reproduces the input. Does anyone know why that would be? I am guessing the error is in the way I am encoding the data, but I can't figure out what is going wrong.
When I try using a ByteArrayOutputStream to handle variable-length output of arbitrary size (which will be important for the real version of the code, as I can't guarantee max size limits), the inflate method of Inflater continuously returns 0. I looked up the documentation and it said this means it needs more data. Since there is no more data, I again suspect my encoding, and guess that it is the same issue causing the previously explained behavior.
In my code I've included an example of data that works fine with the fixed buffer size, as well as data that doesn't work for fixed buffer. Both data sets will cause the variable buffer size error I explained.
Any clues as to what I am doing wrong? Many thanks.
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import org.apache.commons.codec.binary.Base64;
public class BinaryReaderWriter {
public static void main(String [ ] args) throws UnsupportedEncodingException, DataFormatException
{
// this input will break the fixed buffer method
//double[] centroids = {123.1212234143345453223123123, 28464632322456781.23, 3123121.0};
// this input will break the fixed buffer method
double[] centroids = {123.1212234143345453223123123, 28464632322456781.23, 31.0};
BinaryReaderWriter brw = new BinaryReaderWriter();
String output = brw.compressCentroids(centroids);
brw.decompressCentroids(output);
}
void decompressCentroids(String encoded) throws DataFormatException{
byte[] binArray = Base64.decodeBase64(encoded);
// This block of code is the fixed buffer version
//
System.out.println("binArray length " + binArray.length);
Inflater deCompressor = new Inflater();
deCompressor.setInput(binArray, 0, binArray.length);
byte[] decompressed = new byte[1024];
int decompressedLength = deCompressor.inflate(decompressed);
deCompressor.end();
System.out.println("decompressedLength = " + decompressedLength);
byte[] decompressedData = new byte[decompressedLength];
for(int i=0;i<decompressedLength;i++){
decompressedData[i] = decompressed[i];
}
/*
// This block of code is the variable buffer version
//
ByteArrayOutputStream bos = new ByteArrayOutputStream(binArray.length);
Inflater deCompressor = new Inflater();
deCompressor.setInput(binArray, 0, binArray.length);
byte[] decompressed = new byte[1024];
while (!deCompressor.finished()) {
int decompressedLength = deCompressor.inflate(decompressed);
bos.write(decompressed, 0, decompressedLength);
}
deCompressor.end();
byte[] decompressedData = bos.toByteArray();
*/
ByteBuffer bb = ByteBuffer.wrap(decompressedData);
bb.order(ByteOrder.LITTLE_ENDIAN);
System.out.println("decompressedData length = " + decompressedData.length);
double[] doubleValues = new double[decompressedData.length / 8];
for (int i = 0; i< doubleValues.length; i++){
doubleValues[i] = bb.getDouble(i * 8);
}
for(double dbl : doubleValues){
System.out.println(dbl);
}
}
String compressCentroids(double[] centroids){
byte[] cinput = new byte[centroids.length * 8];
ByteBuffer buf = ByteBuffer.wrap(cinput);
buf.order(ByteOrder.LITTLE_ENDIAN);
for (double cent : centroids){
buf.putDouble(cent);
}
byte[] input = buf.array();
System.out.println("raw length = " + input.length);
byte[] output = new byte[input.length];
Deflater compresser = new Deflater();
compresser.setInput(input);
compresser.finish();
int compressedLength = compresser.deflate(output);
compresser.end();
System.out.println("Compressed length = " + compressedLength);
byte[] compressed = new byte[compressedLength];
for(int i = 0; i < compressedLength; i++){
compressed[i] = output[i];
}
String decrypted = Base64.encodeBase64String(compressed);
return decrypted;
}
}
When compressing data, what we are really doing is re-encoding it to increase the entropy of the data. During the re-encoding process we have to add metadata that tells us how the data was encoded, so that it can be converted back to what it was previously.
Compression is only successful if the metadata is smaller than the space we save by re-encoding the data.
Consider Huffman encoding:
Huffman is a simple encoding scheme where we replace the fixed width character set with a variable width character set plus a charset length table. The length table size will be greater than 0 for obvious reasons. If all characters appear with a near equal distribution we will not be able to save any space. So our compressed data ends up being larger than our uncompressed data.
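Applied to the question's code, one likely consequence is that the output buffer (new byte[input.length]) is too small whenever the deflated form is larger than the raw bytes, so the compressed data gets cut off and the Inflater keeps waiting for more input. The sketch below assumes that is the cause and avoids fixed-size buffers on both sides by looping until finished(). The class name CentroidCodec and its method names are made up for illustration, and the Base64 step is left out to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class for illustration.
public class CentroidCodec {

    // Encode doubles as little-endian bytes and deflate them,
    // draining the Deflater until it reports finished().
    static byte[] compress(double[] values) {
        ByteBuffer raw = ByteBuffer.allocate(values.length * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            raw.putDouble(v);
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw.array());
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[64];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inflate into a growable stream and decode the little-endian doubles.
    static double[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[64];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                out.write(chunk, 0, n);
                // Zero bytes without being finished means the input is incomplete
                // (or a dictionary is required); fail instead of looping forever.
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalStateException("compressed data is truncated or needs a dictionary");
                }
            }
            ByteBuffer raw = ByteBuffer.wrap(out.toByteArray()).order(ByteOrder.LITTLE_ENDIAN);
            double[] values = new double[raw.remaining() / Double.BYTES];
            for (int i = 0; i < values.length; i++) {
                values[i] = raw.getDouble();
            }
            return values;
        } catch (DataFormatException e) {
            throw new IllegalStateException("compressed data is corrupt", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        double[] centroids = {123.1212234143345453223123123, 28464632322456781.23, 3123121.0};
        byte[] packed = compress(centroids);
        System.out.println("raw = " + centroids.length * Double.BYTES + " bytes, compressed = " + packed.length + " bytes");
        for (double d : decompress(packed)) {
            System.out.println(d);
        }
    }
}
```

Throwing when inflate() returns 0 before the stream is finished turns the silent endless loop from the question into a visible error.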

Converting String to InputStream, and OutputStream to String back again

I am trying to do such conversions, but I have a little problem.
Let's say I have the following String:
String in = "1234567890123456";
Then I convert it to ByteArrayInputStream like this:
ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
I also have:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Then I do my encryption:
ch.encrypt(bais, baos);
So now I have my "output" in baos. When I do this:
byte[] b2 = baos.toByteArray();
int[] i2 = toUnsignedIntArray(b2);
writeIntegersAsHex(i2);
where (I know it is not the most elegant way but it's only for testing):
public static void writeIntegersAsHex(int[] integers) {
int height = integers.length;
for (int i = 0; i < height; i++) {
System.out.print(Integer.toHexString(integers[i]) + ",");
}
System.out.println("");
}
I get such output:
d1,68,a0,46,32,37,25,64,67,71,17,df,ee,ef,2,12,
And that output is correct, because when I process a file that contains the same string, the output is the same. But I can't get a proper string from baos.
Please don't ask me why I am doing it this way, because it was not my call. I am a student and this is one of the exercises.
The algorithm (by the way, it's AES-128) works fine with files, but I can't get String to InputStream and OutputStream to String to work properly.
But I can't get a proper string from baos.
At this point your output is just arbitrary binary data. It's not encoded text - it's just a bunch of bits.
To convert that to a sensible string which will let you convert it back to the original bytes, you should probably use either hex or base64. There's a public domain base64 library which works well in my experience, or plenty of other alternatives (for both base64 and hex).
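As a sketch of the hex/base64 suggestion above, here is one way to turn the encrypted bytes into text and back. It uses java.util.Base64 from the standard library (an assumption: Java 8 or later) rather than the public domain library mentioned in the answer, and the class name BinaryToText and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class; requires Java 8+ for java.util.Base64.
public class BinaryToText {

    // Two lowercase hex digits per byte, so 0x02 becomes "02" rather than "2".
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Inverse of toHex; expects an even number of hex digits.
    static byte[] fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The ciphertext bytes shown in the question.
        byte[] cipher = {(byte) 0xd1, 0x68, (byte) 0xa0, 0x46, 0x32, 0x37, 0x25, 0x64,
                0x67, 0x71, 0x17, (byte) 0xdf, (byte) 0xee, (byte) 0xef, 0x02, 0x12};
        String asText = toBase64(cipher);
        System.out.println(toHex(cipher));
        System.out.println(asText);
        System.out.println(Arrays.equals(cipher, fromBase64(asText)));
    }
}
```

Either representation can be stored or printed as an ordinary String and later decoded to the exact original bytes, which is not guaranteed when arbitrary binary data is passed to new String(...).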
public static void main(String[] args) throws IOException {
String in = "1234567890123456";
ByteArrayInputStream bais = new ByteArrayInputStream(in.getBytes("UTF-8"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int i;
while ( ( i = bais.read() ) != -1 ){
baos.write(i);
baos.flush();
}
System.out.print(baos.toString());
}
