compression and decompression of string data in java - java
I am using the following code to compress and decompress string data, but the problem which I am facing is, it is easily getting compressed without error, but the decompress method throws the following error.
Exception in thread "main" java.io.IOException: Not in GZIP format
public static void main(String[] args) throws Exception {
String string = "I am what I am hhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
+ "bjggujhhhhhhhhh"
+ "rggggggggggggggggggggggggg"
+ "esfffffffffffffffffffffffffffffff"
+ "esffffffffffffffffffffffffffffffff"
+ "esfekfgy enter code here`etd`enter code here wdd"
+ "heljwidgutwdbwdq8d"
+ "skdfgysrdsdnjsvfyekbdsgcu"
+ "jbujsbjvugsduddbdj";
System.out.println("after compress:");
String compressed = compress(string);
System.out.println(compressed);
System.out.println("after decompress:");
String decomp = decompress(compressed);
System.out.println(decomp);
}
public static String compress(String str) throws Exception {
if (str == null || str.length() == 0) {
return str;
}
System.out.println("String length : " + str.length());
ByteArrayOutputStream obj=new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(obj);
gzip.write(str.getBytes("UTF-8"));
gzip.close();
String outStr = obj.toString("UTF-8");
System.out.println("Output String length : " + outStr.length());
return outStr;
}
public static String decompress(String str) throws Exception {
if (str == null || str.length() == 0) {
return str;
}
System.out.println("Input String length : " + str.length());
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes("UTF-8")));
BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
String outStr = "";
String line;
while ((line=bf.readLine())!=null) {
outStr += line;
}
System.out.println("Output String lenght : " + outStr.length());
return outStr;
}
Still couldn't figure out how to fix this issue!
This is because of
String outStr = obj.toString("UTF-8");
Send the byte[] which you can get from your ByteArrayOutputStream and use it as such in your ByteArrayInputStream to construct your GZIPInputStream. Following are the changes which need to be done in your code.
byte[] compressed = compress(string); //In the main method
public static byte[] compress(String str) throws Exception {
...
...
return obj.toByteArray();
}
public static String decompress(byte[] bytes) throws Exception {
...
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
...
}
The above Answer solves our problem but in addition to that.
if we are trying to decompress a uncompressed("not a zip format") byte[] .
we will get "Not in GZIP format" exception message.
For solving that we can add addition code in our Class.
public static boolean isCompressed(final byte[] compressed) {
return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
}
My Complete Compression Class with compress/decompress would look like:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
public class GZIPCompression {
public static byte[] compress(final String str) throws IOException {
if ((str == null) || (str.length() == 0)) {
return null;
}
ByteArrayOutputStream obj = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(obj);
gzip.write(str.getBytes("UTF-8"));
gzip.flush();
gzip.close();
return obj.toByteArray();
}
public static String decompress(final byte[] compressed) throws IOException {
final StringBuilder outStr = new StringBuilder();
if ((compressed == null) || (compressed.length == 0)) {
return "";
}
if (isCompressed(compressed)) {
final GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
String line;
while ((line = bufferedReader.readLine()) != null) {
outStr.append(line);
}
} else {
outStr.append(compressed);
}
return outStr.toString();
}
public static boolean isCompressed(final byte[] compressed) {
return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
}
}
If you ever need to transfer the zipped content via network or store it as text, you have to use Base64 encoder(such as apache commons codec Base64) to convert the byte array to a Base64 String, and decode the string back to byte array at remote client.
Found an example at Use Zip Stream and Base64 Encoder to Compress Large String Data!
Another example of correct compression and decompression:
#Slf4j
public class GZIPCompression {
public static byte[] compress(final String stringToCompress) {
if (isNull(stringToCompress) || stringToCompress.length() == 0) {
return null;
}
try (final ByteArrayOutputStream baos = new ByteArrayOutputStream();
final GZIPOutputStream gzipOutput = new GZIPOutputStream(baos)) {
gzipOutput.write(stringToCompress.getBytes(UTF_8));
gzipOutput.finish();
return baos.toByteArray();
} catch (IOException e) {
throw new UncheckedIOException("Error while compression!", e);
}
}
public static String decompress(final byte[] compressed) {
if (isNull(compressed) || compressed.length == 0) {
return null;
}
try (final GZIPInputStream gzipInput = new GZIPInputStream(new ByteArrayInputStream(compressed));
final StringWriter stringWriter = new StringWriter()) {
IOUtils.copy(gzipInput, stringWriter, UTF_8);
return stringWriter.toString();
} catch (IOException e) {
throw new UncheckedIOException("Error while decompression!", e);
}
}
}
The problem is this line:
String outStr = obj.toString("UTF-8");
The byte array obj contains arbitrary binary data. You can't "decode" arbitrary binary data as if it was UTF-8. If you try you will get a String that cannot then be "encoded" back to bytes. Or at least, the bytes you get will be different to what you started with ... to the extent that they are no longer a valid GZIP stream.
The fix is to store or transmit the contents of the byte array as-is. Don't try to convert it into a String. It is binary data, not text.
Client send some messages need be compressed, server (kafka) decompress the string meesage
Below is my sample:
compress:
public static String compress(String str, String inEncoding) {
if (str == null || str.length() == 0) {
return str;
}
try {
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes(inEncoding));
gzip.close();
return URLEncoder.encode(out.toString("ISO-8859-1"), "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
decompress:
public static String decompress(String str, String outEncoding) {
if (str == null || str.length() == 0) {
return str;
}
try {
String decode = URLDecoder.decode(str, "UTF-8");
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(decode.getBytes("ISO-8859-1"));
GZIPInputStream gunzip = new GZIPInputStream(in);
byte[] buffer = new byte[256];
int n;
while ((n = gunzip.read(buffer)) >= 0) {
out.write(buffer, 0, n);
}
return out.toString(outEncoding);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
You can't convert binary data to String. As a solution you can encode binary data and then convert to String. For example, look at this How do you convert binary data to Strings and back in Java?
Related
Compress and Decompress String in Java
I'm trying to compress and decompress a string from producer and consumer environment (which accepts only string as params). So after I compress a string, I'm converting compressed byte array to string and then passing it to producer. Then in consumer part, I'm taking the string back , converting into byte array and then decompressing the string from bytes. Instead of converting into string, if I used byte[], then it is working fine. But what I need is to convert into string and viceversa. Here is my code : public class Compression { public static void main(String[] args) throws Exception{ // TODO Auto-generated method stub String strToCompress = "Helloo!! "; byte[] compressedBytes = compress(strToCompress); String compressedStr = new String(compressedBytes, StandardCharsets.UTF_8); byte[] bytesToDecompress = compressedStr.getBytes(StandardCharsets.UTF_8); String decompressedStr = decompress(bytesToDecompress); System.out.println("Compressed Bytes : "+Arrays.toString(compressedBytes)); System.out.println("Decompressed String : "+decompressedStr); } public static byte[] compress(final String str) throws IOException { if ((str == null) || (str.length() == 0)) { return null; } ByteArrayOutputStream obj = new ByteArrayOutputStream(); GZIPOutputStream gzip = new GZIPOutputStream(obj); gzip.write(str.getBytes("UTF-8")); gzip.flush(); gzip.close(); return obj.toByteArray(); } public static String decompress(final byte[] compressed) throws IOException { final StringBuilder outStr = new StringBuilder(); if ((compressed == null) || (compressed.length == 0)) { return ""; } if (isCompressed(compressed)) { //It is not going into this if part final GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed)); final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8")); String line; while ((line = bufferedReader.readLine()) != null) { outStr.append(line); } } else { outStr.append(compressed); } return outStr.toString(); } public static boolean isCompressed(final byte[] compressed) { return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8)); } }
You can't assume a compressed string can be treated as UTF-8 as many possible byte combinations are not valid UTF-8. I suggest trying ISO-8859-1 which keeps all 8-bit values untranslated. Also note that while large pieces of text should get smaller, small strings can get larger. Note: This loop will strip any newline characters String line; while ((line = bufferedReader.readLine()) != null) { outStr.append(line); } I suggest instead copying using a char[] which won't drop any characters. char[] chars = new char[512]; for(int len; (len = reader.read(chars)) > 0;) outStr.append(chars, 0, len);
Is there any compression method in java to reduce the number of charaters in a string?
I am currently facing a problem while compressing a string to fewer characters in java. I have a huge string which is about 751396 characters and there is a requirement of compressing the string into a 1500 characters. I have tried GZIP Compressor, Inflater & Deflater but these libraries return byte arrays Then I tried LZ-String compressor in which I was able to get satisfactory results using UTF16 encoding and base64 encoding, But these compression return some characters which are neither alphanumeric nor are they included in the symbols list provided. N.B. The list for the Symbols is [+,-,*,/,!,#,#] is there any other technique of compressing the string into another string with fewer characters and providing at least 30% of compression ratio. The codes which I am using for GZip compression is as follows:- import java.io.BufferedReader; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.InputStreamReader; import java.util.zip.GZIPInputStream; import java.util.zip.GZIPOutputStream; public class GZIPCompression { public static byte[] compress(final String str) throws IOException { if ((str == null) || (str.length() == 0)) { return null; } ByteArrayOutputStream obj = new ByteArrayOutputStream(); GZIPOutputStream gzip = new GZIPOutputStream(obj); gzip.write(str.getBytes("UTF-8")); gzip.close(); return obj.toByteArray(); } public static String decompress(final byte[] compressed) throws IOException { String outStr = ""; if ((compressed == null) || (compressed.length == 0)) { return ""; } if (isCompressed(compressed)) { GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed)); BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8")); String line; while ((line = bufferedReader.readLine()) != null) { outStr += line; } } else { outStr = new String(compressed); } return outStr; } public static boolean isCompressed(final byte[] compressed) { return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8)); } } The code for the Inflater & Deflater program is as follows:- import java.io.ByteArrayOutputStream; import java.io.IOException; import java.util.Arrays; import java.util.zip.DataFormatException; import java.util.zip.Deflater; import java.util.zip.Inflater; public class Apple { public static void main(String[] args) { String sr = " [120,-100,-19,89,91,79,-21,56,16,-2,43,40,-49,104,55,113,-18,-68,-91,-23,21,104,90,-38,-62,10,-83,120,48,-83,91,34,-46,-92,-21,-92,8,-124,-8,-17,103,-100,-92,77,-22,-38,-25,112,86,27,-119,106,65,84,-86,103,-58,-10,124,-98,-15,101,-66,-66,43,25,126,-99,-112,116,-109,-60,41,81,46,-34,-107,-57,36,121,14,-29,-43,-20,109,3,77,101,-94,-100,43,120,-79,-115,50,63,-39,-58,25,8,52,16,-52,-97,-62,104,-79,19,-88,32,8,-29,37,-114,-77,-70,-92,71,-78,89,125,-36,-65,-33,-107,5,-50,-40,-120,-86,-11,39,82,-31,95,-77,-40,-48,-21,-94,15,82,13,11,-58,-115,112,-102,-6,-55,-126,-103,-7,126,-65,53,-22,-113,-64,-58,123,-63,97,52,37,-85,53,97,-106,-17,74,55,10,87,79,-39,96,-63,-100,65,76,31,-46,40,-116,73,-39,111,-38,3,81,97,18,108,-41,-113,-124,-126,-52,-48,-100,-62,-50,-89,120,-103,-107,-56,108,-99,9,71,52,92,-123,49,52,91,-41,-109,125,19,44,55,9,-51,102,-124,-82,-61,24,71,-96,5,85,-101,-92,25,-76,-78,48,-55,-51,71,-61,67,-103,-92,-49,6,-45,108,75,73,27,-80,-49,-62,53,-101,23,-64,-25,75,-96,89,103,72,-67,48,-44,11,-107,-83,-105,71,105,-8,-126,35,-119,29,-70,-48,74,-69,-10,-106,-18,92,-48,-98,104,122,-90,-85,48,93,10,-118,2,108,-78,-100,102,-55,38,85,46,-44,115,-27,46,-60,-123,23,-2,-106,82,18,-49,-33,-54,21,26,4,-109,-35,-86,-114,-15,107,23,-125,119,36,-125,70,-102,71,-55,23,-58,96,47,5,-60,-13,-61,-24,-80,-28,96,97,105,-31,52,-100,123,101,60,53,-61,112,33,12,48,54,19,-61,-56,74,-112,116,41,-127,-42,74,41,-28,-69,-4,34,-53,109,-68,-64,-113,17,1,-7,-3,77,-18,-8,44,-55,112,4,-39,-77,27,12,68,61,-102,-92,-23,126,112,-45,48,-64,-91,100,-67,14,-45,-76,88,11,-45,-2,-61,-75,-108,-113,-113,-13,67,0,-99,-114,12,64,-91,17,3,-128,-124,108,14,0,-46,28,-99,3,-32,104,66,0,-35,-82,12,64,-91,-111,0,48,-35,6,1,-40,-74,-54,1,-48,84,93,-120,-96,-33,-105,33,-88,52,98,4,-70,-34,96,8,116,-45,114,121,4,-70,24,-63,-27,-91,12,65,-91,-111,32,112,27,-116,-127,-127,44,-115,71,96,-70,66,4,87,87,50,4,-107,70,-116,-64,112,26,-116,-127,-87,89,54,-113,-64,21,-57,-32,-6,90,-122,-96,-46,-120,17,-104,77,34,-80,-112,-50,111,100,36,-55,-94,-31,80,-122,-96,-46,-120,17,-40,77,30,69,-74,-87,-15,89,-124,36,103,81,16,-56,16,84,26,49,2,71,111,112,31,56,-82,-55,-97,69,-70,110,10,17,-116,70,50,4,-107,70,-116,-64,117,26,68,-96,-87,-90,-31,-16,16,92,49,-124,-101,27,25,-124,74,35,-71,-110,-31,-38,108,16,3,-46,85,126,51,27,-106,56,-111,38,19,25,-122,74,35,-63,-48,-24,-99,-96,25,8,-103,-4,-61,66,-78,-99,-89,83,25,-122,74,35,-63,96,-94,38,115,-55,-46,85,-2,72,-78,52,113,28,102,51,25,-122,74,35,-63,96,55,-71,-93,53,-57,52,-8,45,109,-19,-10,-61,-61,-57,-71,-78,43,43,-90,25,-50,-74,48,57,-100,96,-73,41,-95,51,-118,-25,-49,121,93,48,25,-34,-113,6,-82,-83,-62,-3,91,124,116,-37,-123,67,-56,50,12,3,-23,-102,-19,-72,-106,-125,92,56,-25,-64,39,-16,99,-76,-51,54,-37,18,-28,106,-123,87,-60,-117,23,67,-126,-93,-76,27,1,-100,-36,-29,-68,-108,41,-86,-118,-78,18,41,94,-53,71,-75,8,91,46,-80,-50,-21,20,-74,58,-108,-32,-25,-37,-51,-2,-127,93,-82,-93,-16,69,46,20,10,94,-43,-99,127,-74,-84,84,0,31,-40,12,59,41,76,90,127,-58,-77,64,-46,112,83,-106,10,-29,105,-105,57,-73,-49,64,-107,-91,-62,-95,-55,109,-69,110,-94,-101,-24,-40,-92,55,-99,-43,76,28,-29,-40,-30,-22,-54,-81,15,-14,-15,112,28,108,-45,97,-50,82,28,-89,120,-50,122,117,9,-55,-105,120,74,-24,75,56,39,-2,19,-90,43,-112,56,-4,-117,51,47,16,-123,111,126,54,-53,-124,-64,-94,80,-78,-24,-122,36,90,-28,-21,-100,71,2,-60,53,-55,42,-65,-59,-64,-63,-63,98,76,-109,100,89,-74,-58,-112,-5,-84,116,-37,-105,-117,65,94,-65,-58,-117,-68,113,-49,-118,46,-88,-54,70,-53,86,72,-77,39,-82,79,-25,117,19,-46,-73,118,81,-39,-42,21,-125,52,-35,66,21,-99,-105,-60,-12,-83,84,6,121,-23,-122,-93,48,43,36,-112,-54,62,43,-91,-65,-66,-101,-125,-68,-64,111,-60,-49,-5,-1,-50,79,112,52,-33,-73,-5,-8,-105,45,-40,15,-20,91,-71,-79,-18,122,67,118,-78,49,-55,97,-10,6,-55,89,-47,-95,80,-18,-113,39,-106,-25,-121,-3,-81,-123,-3,107,-118,-86,80,50,-71,-34,99,-65,-27,11,123,-113,59,94,112,59,-101,-98,121,65,-5,-84,-43,-71,-13,38,94,-81,-61,-115,-90,25,-68,47,-63,-99,-60,-105,-102,66,-18,75,32,-13,37,-16,-4,-2,-24,55,93,-71,12,36,-82,92,122,-125,-32,108,-40,-15,120,127,116,-109,31,-94,-35,-110,12,-47,30,120,-83,-50,108,-32,127,110,24,95,6,-53,-9,-90,-3,-65,58,-97,89,-27,-121,-113,-4,-74,-84,81,86,-58,17,101,5,23,-54,33,101,85,-29,20,126,66,89,-63,-33,39,73,43,-3,-41,-92,85,-50,66,125,-98,-76,-54,57,-82,127,69,90,25,123,50,74,117,-10,100,-108,-128,-76,-86,-20,-64,72,64,90,33,70,90,53,-55,89,125,85,-54,7,-23,-4,-3,117,98,-108,-113,-125,-8,119,-27,-87,81,62,22,66,39,78,-7,-24,26,79,124,-98,26,-27,-125,-48,17,113,120,106,-108,-113,-61,111,-28,-109,-93,124,44,62,-117,78,-116,-14,113,-43,-93,26,-9,-28,40,31,75,-27,121,-73,19,-92,124,44,126,51,-97,32,-27,99,-13,-44,-37,9,82,62,38,127,36,-99,32,-27,-29,30,-47,86,95,-97,-14,41,-33,-14,-115,-110,62,-59,-77,-108,39,125,10,-23,79,73,-97,-39,-92,-50,-24,-120,56,-97,-33,-90,-123,12,83,-1,21,45,-92,33,1,115,116,-56,11,25,34,94,-56,-74,63,-59,11,-79,95,43,-72,119,-87,-50,-1,-112,-73,-69,-52,-66,-119,-95,-26,-35,-4,38,-122,-66,-119,-95,-1,27,49,4,-97,31,15,0,-88,84]"; byte[] data = sr.getBytes(); try { String x = new String(decompress(compress(data))); System.out.println("decompressed " + x); } catch (IOException | DataFormatException e) { e.printStackTrace(); } } public static byte[] compress(byte[] data) throws IOException { Deflater deflater = new Deflater(); deflater.setInput(data); ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length); deflater.finish(); byte[] buffer = new byte[1024]; while (!deflater.finished()) { int count = deflater.deflate(buffer); outputStream.write(buffer, 0, count); } outputStream.close(); byte[] output = outputStream.toByteArray(); System.out.println("Original: " + data.length); System.out.println("Compressed: " + output.length); return output; } public static byte[] decompress(byte[] data) throws IOException, DataFormatException { Inflater inflater = new Inflater(); inflater.setInput(data); ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length); byte[] buffer = new byte[1024]; while (!inflater.finished()) { int count = inflater.inflate(buffer); outputStream.write(buffer, 0, count); } outputStream.close(); byte[] output = outputStream.toByteArray(); System.out.println(); return output; } } A sample of how the data will look like:- "120,-100,-19,89,91,79,-21,56,16,-2,43,40,-49,104,55,113,-18,-68,-91,-23,21,104,90,-38,-62,10,-83,120,48,-83,91,34,-46,-92,-21,-92,8,-124,-8,-17,103,-100,-92,77,-22,-38,-25,112,86,27,-119,106,65,84,-86,103,-58,-10,124,-98,-15,101,-66,-66,43,25,126,-99,-112,116,-109,-60,41,81,46,-34,-107,-57,36,121,14,-29,-43,-20,109,3,77,101,-94,-100,43,120,-79,-115,50,63,-39,-58,25,8,52,16,-52,-97,-62,104,-79,19,-88,32,8,-29,37,-114,-77,-70,-92,71,-78,89,125,-36,-65,-33,-107,5,-50,-40,-120,-86,-11,39,82,-31,95,-77,-40,-48,-21,-94,15,82,13,11,-58,-115,112,-102,-6,-55,-126,-103,-7,126,-65,53,-22,-113,-64,-58,123,-63,97,52,37,-85,53,97,-106,-17,74,55,10,87,79,-39,96,-63,-100,65,76,31,-46,40,-116,73,-39,111,-38,3,81,97,18,108,-41,-113,-124,-126,-52,-48,-100,-62,-50,-89,120,-103,-107,-56,108,-99,9,71,52,92,-123,49,52,91,-41,-109,125,19,44,55,9,-51,102,-124,-82,-61,24,71,-96,5,85,-101,-92,25,-76,-78,48,-55,-51,71,-61,67,-103,-92,-49,6,-45,108,75,73,27,-80,-49,-62,53,-101,23,-64,-25,75,-96,89,103,72,-67,48,-44,11,-107,-83,-105,71,105,-8,-126,35,-119,29,-70,-48,74,-69,-10,-106,-18,92,-48,-98,104,122,-90,-85,48,93,10,-118,2,108,-78,-100,102,-55,38,85,46,-44,115,-27,46,-60,-123,23,-2,-106,82,18,-49,-33,-54,21,26,4,-109,-35,-86,-114,-15,107,23,-125,119,36,-125,70,-102,71,-55,23,-58,96,47,5,-60,-13,-61,-24,-80,-28,96,97,105,-31,52,-100,123,101,60,53,-61,112,33,12,48,54,19,-61,-56,74,-112,116,41,-127,-42,74,41,-28,-69,-4,34,-53,109,-68,-64,-113,17,1,-7,-3,77,-18,-8,44,-55,112,4,-39,-77,27,12,68,61,-102,-92,-23,126,112,-45,48,-64,-91,100,-67,14,-45,-76,88,11,-45,-2,-61,-75,-108,-113,-113,-13,67,0,-99,-114,12,64,-91,17,3,-128,-124,108,14,0,-46,28,-99,3,-32,104,66,0,-35,-82,12,64,-91,-111,0,48,-35,6,1,-40,-74,-54,1,-48,84,93,-120,-96,-33,-105,33,-88,52,98,4,-70,-34,96,8,116,-45,114,121,4,-70,24,-63,-27,-91,12,65,-91,-111,32,112,27,-116,-127,-127,44,-115,71,96,-70,66,4,87,87,50,4,-107,70,-116,-64,112,26,-116,-127,-87,89,54,-113,-64,21,-57,-32,-6,90,-122,-96,-46,-120,17,-104,77,34,-80,-112,-50,111,100,36,-55,-94,-31,80,-122,-96,-46,-120,17,-40,77,30,69,-74,-87,-15,89,-124,36,103,81,16,-56,16,84,26,49,2,71,111,112,31,56,-82,-55,-97,69,-70,110,10,17,-116,70,50,4,-107,70,-116,-64,117,26,68,-96,-87,-90,-31,-16,16,92,49,-124,-101,27,25,-124,74,35,-71,-110,-31,-38,108,16,3,-46,85,126,51,27,-106,56,-111,38,19,25,-122,74,35,-63,-48,-24,-99,-96,25,8,-103,-4,-61,66,-78,-99,-89,83,25,-122,74,35,-63,96,-94,38,115,-55,-46,85,-2,72,-78,52,113,28,102,51,25,-122,74,35,-63,96,55,-71,-93,53,-57,52,-8,45,109,-19,-10,-61,-61,-57,-71,-78,43,43,-90,25,-50,-74,48,57,-100,96,-73,41,-95,51,-118,-25,-49,121,93,48,25,-34,-113,6,-82,-83,-62,-3,91,124,116,-37,-123,67,-56,50,12,3,-23,-102,-19,-72,-106,-125,92,56,-25,-64,39,-16,99,-76,-51,54,-37,18,-28,106,-123,87,-60,-117,23,67,-126,-93,-76,27,1,-100,-36,-29,-68,-108,41,-86,-118,-78,18,41,94,-53,71,-75,8,91,46,-80,-50,-21,20,-74,58,-108,-32,-25,-37,-51,-2,-127,93,-82,-93,-16,69,46,20,10,94,-43,-99,127,-74,-84,84,0,31,-40,12,59,41,76,90,127,-58,-77,64,-46,112,83,-106,10,-29,105,-105,57,-73,-49,64,-107,-91,-62,-95,-55,109,-69,110,-94,-101,-24,-40,-92,55,-99,-43,76,28,-29,-40,-30,-22,-54,-81,15,-14,-15,112,28,108,-45,97,-50,82,28,-89,120,-50,122,117,9,-55,-105,120,74,-24,75,56,39,-2,19,-90,43,-112,56,-4,-117,51,47,16,-123,111,126,54,-53,-124,-64,-94,80,-78,-24,-122,36,90,-28,-21,-100,71,2,-60,53,-55,42,-65,-59,-64,-63,-63,98,76,-109,100,89,-74,-58,-112,-5,-84,116,-37,-105,-117,65,94,-65,-58,-117,-68,113,-49,-118,46,-88,-54,70,-53,86,72,-77,39,-82,79,-25,117,19,-46,-73,118,81,-39,-42,21,-125,52,-35,66,21,-99,-105,-60,-12,-83,84,6,121,-23,-122,-93,48,43,36,-112,-54,62,43,-91,-65,-66,-101,-125,-68,-64,111,-60,-49,-5,-1,-50,79,112,52,-33,-73,-5,-8,-105,45,-40,15,-20,91,-71,-79,-18,122,67,118,-78,49,-55,97,-10,6,-55,89,-47,-95,80,-18,-113,39,-106,-25,-121,-3,-81,-123,-3,107,-118,-86,80,50,-71,-34,99,-65,-27,11,123,-113,59,94,112,59,-101,-98,121,65,-5,-84,-43,-71,-13,38,94,-81,-61,-115,-90,25,-68,47,-63,-99,-60,-105,-102,66,-18,75,32,-13,37,-16,-4,-2,-24,55,93,-71,12,36,-82,92,122,-125,-32,108,-40,-15,120,127,116,-109,31,-94,-35,-110,12,-47,30,120,-83,-50,108,-32,127,110,24,95,6,-53,-9,-90,-3,-65,58,-97,89,-27,-121,-113,-4,-74,-84,81,86,-58,17,101,5,23,-54,33,101,85,-29,20,126,66,89,-63,-33,39,73,43,-3,-41,-92,85,-50,66,125,-98,-76,-54,57,-82,127,69,90,25,123,50,74,117,-10,100,-108,-128,-76,-86,-20,-64,72,64,90,33,70,90,53,-55,89,125,85,-54,7,-23,-4,-3,117,98,-108,-113,-125,-8,119,-27,-87,81,62,22,66,39,78,-7,-24,26,79,124,-98,26,-27,-125,-48,17,113,120,106,-108,-113,-61,111,-28,-109,-93,124,44,62,-117,78,-116,-14,113,-43,-93,26,-9,-28,40,31,75,-27,121,-73,19,-92,124,44,126,51,-97,32,-27,99,-13,-44,-37,9,82,62,38,127,36,-99,32,-27,-29,30,-47,86,95,-97,-14,41,-33,-14,-115,-110,62,-59,-77,-108,39,125,10,-23,79,73,-97,-39,-92,-50,-24,-120,56,-97,-33,-90,-123,12,83,-1,21,45,-92,33,1,115,116,-56,11,25,34,94,-56,-74,63,-59,11,-79,95,43,-72,119,-87,-50,-1,-112,-73,-69,-52,-66,-119,-95,-26,-35,-4,38,-122,-66,-119,-95,-1,27,49,4,-97,31,15,0,-88,84" Is there a better Option for reducing the number of characters in a string without converting it to byte array and unwanted characters? Thanks in advance,
You can compress to a byte[] and then encode the result in Base64. This will only use alphanumeric and fewer symbols which are safe for transfering as text. i.e. it is widely used for this. public static void main(String[] args) { StringBuilder sb = new StringBuilder(); while (sb.length() < 751396) sb.append("Size: ").append(sb.length()).append("\n"); String s = sb.toString(); String s2 = deflateBase64(s); System.out.println("Uncompressed size = " + s.length() + ", compressed size=" + s2.length()); String s3 = inflateBase64(s2); System.out.println("Same after inflating is " + s3.equals(s)); } public static String deflateBase64(String text) { try { ByteArrayOutputStream baos = new ByteArrayOutputStream(); try (Writer writer = new OutputStreamWriter(new DeflaterOutputStream(baos))) { writer.write(text); } return Base64.getEncoder().encodeToString(baos.toByteArray()); } catch (IOException e) { throw new AssertionError(e); } } public static String inflateBase64(String base64) { try (Reader reader = new InputStreamReader( new InflaterInputStream( new ByteArrayInputStream( Base64.getDecoder().decode(base64))))) { StringWriter sw = new StringWriter(); char[] chars = new char[1024]; for (int len; (len = reader.read(chars)) > 0; ) sw.write(chars, 0, len); return sw.toString(); } catch (IOException e) { throw new AssertionError(e); } } prints Uncompressed size = 751400, compressed size=219564 Same after inflating is true
You can use the Deflater a little more: public static byte[] compress(byte[] data) throws IOException { new Deflater(Deflater.BEST_COMPRESSION, true); //... } So you'll have the strongest compression and you'll skip some of the header data. This is the best you can do with the builtin algorithms.
Convert UTF-8 file to ISO-8859-1
New question: Source: C:\\temp\\test.csv "Русслэнд";"Ελλάς";"Réunion" Expected result: C:\\temp\\test.properties "\u0420\u0443\u0441\u0441\u043b\u044d\u043d\u0434";"\u0395\u03bb\u03bb\u03ac\u03c2";"R\u00e9unio" Current result: C:\\temp\\test.properties "????????", "?????","R궮ion" Code: try { File file = new File("C:\\temp\\test.csv"); FileInputStream is = new FileInputStream(file); InputStreamReader r = new InputStreamReader(is, Charset.forName("UTF-8")); FileOutputStream os = new FileOutputStream("C:\\temp\\test.properties"); OutputStreamWriter ow = new OutputStreamWriter(os, "ISO-8859-1"); char[] buffer = new char[1024]; int x; while ((x = r.read(buffer)) == buffer.length) { ow.write(buffer); } ow.write(buffer, 0, x); ow.flush(); ow.close(); r.close(); } catch (IOException e) { e.printStackTrace(); } ** Old question: ** How to convert a big UTF-8 .csv file to ISO-8859-1 in Java 1.6? I want to read a given file, convert and save it. private byte[] convertToISO(File file, Charset enc) { // enc = Charset.forName("UTF-8"); try { FileInputStream is = new FileInputStream(file); InputStreamReader r = new InputStreamReader(is, enc); char[] buffer = new char[1024]; StringWriter w = new StringWriter(); int x = 0; while ((x = r.read(buffer)) == buffer.length) { w.write(buffer); } w.write(buffer, 0, x); w.flush(); String res = w.toString(); r.close(); return res.getBytes("ISO-8859-1"); } catch (IOException e) { System.err.println("Failed to read file: " + file.getPath()); e.printStackTrace(); return null; } }
You're not trying to convert from UTF-8 to ISO-8859-1, you're rather trying to escape unicode characters into an ASCII stream. This is different from just recoding. Here is a function that does just that, it escapes unicode chars on the fly while writing to the output stream: public class OutputEscapingStreamWriter extends OutputStreamWriter { public OutputEscapingStreamWriter(OutputStream out, Charset cs) { super(out, cs); } public OutputEscapingStreamWriter(OutputStream out) { super(out); } public OutputEscapingStreamWriter(OutputStream out, String cs) throws UnsupportedEncodingException { super(out, cs); } public OutputEscapingStreamWriter(OutputStream out, CharsetEncoder cs) { super(out, cs); } private static String HEX_DIGITS = "0123456789abcdef"; #Override public void write(int c) throws IOException { if (c < 128) { super.write(c); } else { super.write(toHexString(c)); } } #Override public void write(String str, int off, int len) throws IOException { for (int i = off; i < (off + len); i++) { write(str.charAt(i)); } } #Override public void write(char cbuf[], int off, int len) throws IOException { for (int i = off; i < (off + len); i++) { write(cbuf[i]); } } private String toHexString(int c) { StringBuilder sb = new StringBuilder("\\u"); sb.append(HEX_DIGITS.charAt((c & 0xF000) >> 12)); sb.append(HEX_DIGITS.charAt((c & 0x0F00) >> 8)); sb.append(HEX_DIGITS.charAt((c & 0x00F0) >> 4)); sb.append(HEX_DIGITS.charAt((c & 0x000F) )); return sb.toString(); } } To use it on a file, just open a FileOutputStream and wrap it with the OutputEscapingStreamWriter like this: OutputEscapingStreamWriter out = new OutputEscapingStreamWriter(new FileOutputStream("file.txt")); A quick and dirty unit test that demonstrates that it produces the output you expect: #Test public void testConversion() throws Exception { ByteArrayOutputStream output = new ByteArrayOutputStream(); OutputEscapingStreamWriter wrapper = new OutputEscapingStreamWriter(output); wrapper.write("\"Русслэнд\";\"Ελλάς\";\"Réunion\""); wrapper.flush(); wrapper.close(); String result = output.toString(); assertEquals("\"\\u0420\\u0443\\u0441\\u0441\\u043b\\u044d\\u043d\\u0434\";\"\\u0395\\u03bb\\u03bb\\u03ac\\u03c2\";\"R\\u00e9union\"", result); }
I'm assuming that you are trying to print the results in to console. By default any jdk/JRE will use UTF-8 while printing anything in console. To use ISO-8859-1 charset, You may either use -Dfile.encoding=ISO-8859-1 in your JVM parameters. Or, You can configure your IDE as shown below
Exception in thread "main" java.util.zip.ZipException: Not in GZIP format. when compressing file using ISO charset
I'm trying in my program to Compress/Decompress data using GZIP streams and when using the charset "ISO-8859-1", everything working well but when changing the charset to "UTF-8", i'm getting the Error message "Exception in thread "main" java.util.zip.ZipException: Not in GZIP format". this is my code: public static String compress(String str) throws IOException { if (str == null || str.length() == 0) { return str; } System.out.println("String length : " + str.length()); ByteArrayOutputStream out = new ByteArrayOutputStream(); GZIPOutputStream gzip = new GZIPOutputStream(out); gzip.write(str.getBytes()); gzip.close(); String outStr = out.toString("UTF-8"); System.out.println("Output String lenght : " + outStr.length()); System.out.println("Output : " + outStr.toString()); return outStr; } public static String decompress(String str) throws IOException { if (str == null || str.length() == 0) { return str; } System.out.println("Input String length : " + str.length()); GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes("UTF-8"))); BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "UTF-8")); String outStr = ""; String line; while ((line=bf.readLine())!=null) { outStr += line; } System.out.println("Output String lenght : " + outStr.length()); return outStr; } public static void main(String[] args) throws IOException { String string = "my data"; System.out.println("after compress:"); String compressed = compress(string); System.out.println(compressed); System.out.println("after decompress:"); String decomp = decompress(compressed); System.out.println(decomp); }
String outStr = out.toString("UTF-8"); This "out" is ziped byte stream,encode it to String then decode it from String will be lose some bytes.This maybe a bug of java. To Resolve it,you can encode bytes to String in compress() to return, such as : String infoBase64Encode = new String(Base64.encodeBase64(out.toByteArray())) decode String to bytes in decompress() to return, such as : String infoBase64Decode = Base64.decodeBase64(decryptAESinfo) complete code as follows: public static String compress(String str) throws IOException { if (str == null || str.length() == 0) { return str; } System.out.println("String length : " + str.length()); ByteArrayOutputStream out = new ByteArrayOutputStream(); GZIPOutputStream gzip = new GZIPOutputStream(out); gzip.write(str.getBytes()); gzip.close(); String outStr = new String(Base64.encodeBase64(out.toByteArray())); System.out.println("Output String lenght : " + outStr.length()); System.out.println("Output : " + outStr.toString()); return outStr; } public static String decompress(String str) throws IOException { if (str == null || str.length() == 0) { return str; } System.out.println("Input String length : " + str.length()); GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(Base64.decodeBase64(str))); String outStr = ""; ByteArrayOutputStream out = new ByteArrayOutputStream(); byte[] buffer = new byte[256]; int n; while ((n = gis.read(buffer)) >= 0) { out.write(buffer, 0, n); } System.out.println("Output String lenght : " + outStr.length()); return new String(out.toByteArray()); } public static void main(String[] args) throws IOException { String string = "my data"; System.out.println("after compress:"); String compressed = compress(string); System.out.println(compressed); System.out.println("after decompress:"); String decomp = decompress(compressed); System.out.println(decomp); }
Compress a string to gzip in Java
public static String compressString(String str) throws IOException{ if (str == null || str.length() == 0) { return str; } ByteArrayOutputStream out = new ByteArrayOutputStream(); GZIPOutputStream gzip = new GZIPOutputStream(out); gzip.write(str.getBytes()); gzip.close(); Gdx.files.local("gziptest.gzip").writeString(out.toString(), false); return out.toString(); } When I save that string to a file, and run gunzip -d file.txt in unix, it complains: gzip: gzip.gz: not in gzip format
Try to use BufferedWriter public static String compressString(String str) throws IOException{ if (str == null || str.length() == 0) { return str; } BufferedWriter writer = null; try{ File file = new File("your.gzip") GZIPOutputStream zip = new GZIPOutputStream(new FileOutputStream(file)); writer = new BufferedWriter(new OutputStreamWriter(zip, "UTF-8")); writer.append(str); } finally{ if(writer != null){ writer.close(); } } } About your code example try: public static String compressString(String str) throws IOException{ if (str == null || str.length() == 0) { return str; } ByteArrayOutputStream out = new ByteArrayOutputStream(str.length()); GZIPOutputStream gzip = new GZIPOutputStream(out); gzip.write(str.getBytes()); gzip.close(); byte[] compressedBytes = out.toByteArray(); Gdx.files.local("gziptest.gzip").writeBytes(compressedBytes, false); out.close(); return out.toString(); // I would return compressedBytes instead String }
Try that : //... String string = "string"; FileOutputStream fos = new FileOutputStream("filename.zip"); GZIPOutputStream gzos = new GZIPOutputStream(fos); gzos.write(string.getBytes()); gzos.finish(); //...
Save bytes from out with FileOutputStream FileOutputStream fos = new FileOutputStream("gziptest.gz"); fos.write(out.toByteArray()); fos.close(); out.toString() seems suspicious, the result will be unreadable, if you dont care then why not to return byte[], if you do care it would look better as hex or base64 string.