Inconsistent output of zip functions between Python and Java methods - java

I'm integrating my system with an external one.
My system is written in Java, while the external one is written in Python.
The external system requires the request body to be compressed before it is sent.
Below are the functions it uses for compression and decompression:
import base64
import zlib
def decompress(input_bytes):
    bytes = input_bytes.encode('utf-8')
    deflate_byte = base64.decodebytes(bytes)
    output = zlib.decompress(deflate_byte)
    out_str = output.decode('utf-8')
    return out_str
def compress(input_str):
    input_bytes = input_str.encode('utf-8')
    compress = zlib.compress(input_bytes)
    output_bytes = base64.encodebytes(compress)
    out_str = output_bytes.decode('utf-8')
    return out_str
I have developed code for compression/decompression in Java:
public byte[] compress(String str) {
    byte[] data = str.getBytes(UTF_8);
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
    deflater.finish();
    byte[] buffer = new byte[1024];
    while (!deflater.finished()) {
        int count = deflater.deflate(buffer); // returns the generated code... index
        outputStream.write(buffer, 0, count);
    }
    try {
        outputStream.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    byte[] output = outputStream.toByteArray();
    return Base64.decodeBase64(output);
}
@Override
public String decompress(byte[] data) {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
    byte[] buffer = new byte[1024];
    while (!inflater.finished()) {
        int count = 0;
        try {
            count = inflater.inflate(buffer);
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
        outputStream.write(buffer, 0, count);
    }
    try {
        outputStream.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    byte[] output = outputStream.toByteArray();
    return new String(output);
}
But the output of compression methods is incompatible:
Input:
test
Output of compression implemented in Python:
eJwrSS0uAQAEXQHB
Output of compression implemented in Java:
��>
I would be grateful for an explanation of that difference and for help writing a Java solution that is equivalent to those Python methods.
Thank you in advance.
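Judging only from the code shown, one likely cause is the last line of the Java compress method: it calls Base64.decodeBase64(output), which tries to Base64-decode the raw zlib bytes instead of encoding them, so the result is binary rather than text. The Java decompress also passes its input straight to the Inflater without Base64-decoding it first. Below is a minimal sketch of a Java counterpart to the two Python functions. It assumes java.util.Base64 (Java 8+) in place of the Base64 class used above, and the class name ZlibBase64 is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class; the name is not from the original post.
final class ZlibBase64 {

    // Mirrors the Python compress(): UTF-8 bytes -> zlib -> Base64 text.
    static String compress(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int count = deflater.deflate(buffer);
            out.write(buffer, 0, count);
        }
        deflater.end();
        // Encode (not decode) the compressed bytes.
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Mirrors the Python decompress(): Base64 text -> zlib -> UTF-8 string.
    static String decompress(String base64Text) {
        // The MIME decoder skips the line breaks that Python's encodebytes adds.
        byte[] compressed = Base64.getMimeDecoder().decode(base64Text);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Avoid looping forever on truncated or unsupported input.
                    throw new IllegalArgumentException("Incomplete zlib stream");
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Note that Python's base64.encodebytes appends a newline and inserts one after every 76 characters, while the encoder used in this sketch emits a single line; base64.decodebytes on the Python side should accept either form.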

Related

Java 1.8 and below equivalent for InputStream.readAllBytes()

I wrote a program that gets all bytes from an InputStream in Java 9 with
InputStream.readAllBytes()
Now I want to port it to Java 1.8 and below. Is there an equivalent function? I couldn't find one.
InputStream.readAllBytes() has been available since Java 9, not Java 7.
Without third-party libraries, you can do this:
byte[] bytes = new byte[(int) file.length()];
try (DataInputStream dataInputStream = new DataInputStream(new FileInputStream(file))) {
    dataInputStream.readFully(bytes);
}
Or, if you don't mind using a third-party library (Commons IO):
byte[] bytes = IOUtils.toByteArray(is);
Guava also helps:
byte[] bytes = ByteStreams.toByteArray(inputStream);
You can use the good old read method like this:
public static byte[] readAllBytes(InputStream inputStream) throws IOException {
    final int bufLen = 1024;
    byte[] buf = new byte[bufLen];
    int readLen;
    IOException exception = null;
    try {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        while ((readLen = inputStream.read(buf, 0, bufLen)) != -1)
            outputStream.write(buf, 0, readLen);
        return outputStream.toByteArray();
    } catch (IOException e) {
        exception = e;
        throw e;
    } finally {
        if (exception == null) inputStream.close();
        else try {
            inputStream.close();
        } catch (IOException e) {
            exception.addSuppressed(e);
        }
    }
}

How to make FileInputStream read() function stop after having read a certain byte?

I'm trying to create a program that compresses files and saves their bytes into a single .txt file for later decompression. So far, I've only been successful at saving the bytes of one file to the .txt file. When saving multiple files, however, I can't find a way to let the program know which bytes belong to which file. How can I instruct the program to stop reading when it encounters the bytes of the next file? My compress function:
private void compress(File source, File destination) {
    try {
        byte[] buffer = new byte[1024];
        FileInputStream fis = new FileInputStream(source);
        GZIPOutputStream gzip = new GZIPOutputStream(new FileOutputStream(destination, true));
        int len;
        while ((len = fis.read(buffer)) != -1) {
            System.out.println(len);
            gzip.write(buffer, 0, len);
        }
        gzip.finish();
        gzip.close();
        fis.close();
    } catch (FileNotFoundException e) {
        System.out.println("File couldn't be located. Please check the path given.");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
and my decompress function:
private byte[] decompress(File source) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        byte[] buffer = new byte[1024];
        GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(source));
        int len;
        while ((len = gzip.read(buffer)) != -1) {
            baos.write(buffer, 0, len);
        }
        gzip.close();
    } catch (FileNotFoundException e) {
        System.out.println("File couldn't be located. Please check the path given.");
    } catch (IOException e) {
        e.printStackTrace();
    }
    return baos.toByteArray();
}
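A GZIP stream carries no per-file names or boundaries that the reading code above could use, so appending several files to one destination loses track of where each one ends. One possible way around this, offered as a sketch and not as the only option, is to switch from GZIP to java.util.zip.ZipOutputStream, which stores one named entry per file. The class name MultiFileArchive and the use of in-memory byte arrays instead of File objects are assumptions made to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; swaps the GZIP approach for a ZIP archive with entries.
final class MultiFileArchive {

    // Packs each named byte array into its own ZIP entry.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(archive)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return archive.toByteArray();
    }

    // Reads the archive back, keeping the bytes of each entry separate.
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[1024];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int len;
                // read() returns -1 at the end of the current entry.
                while ((len = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, len);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

With real files, the same pattern should work by wrapping a FileOutputStream and FileInputStream instead of the in-memory streams.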

File md5 hash changes when chunking it (for netty transfer)

Question at the bottom
I'm using netty to transfer a file to another server.
I limit my file chunks to 1024*64 bytes (64 KB) because of the WebSocket protocol. The following method is a local example of what happens to the file:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        while(is.read(buf) > 0) {
            os.write(buf);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null && os != null) {
                is.close();
                os.close();
            }
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
The file is read by the InputStream into a byte buffer and written directly to the OutputStream.
The content of the file cannot change during this process.
To get the MD5 hashes of the files, I wrote the following method:
public static String checksum(File file) {
    InputStream is = null;
    try {
        is = new FileInputStream(file);
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read = 0;
        while((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
        return new BigInteger(1, digest.digest()).toString(16);
    } catch(IOException | NoSuchAlgorithmException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            is.close();
        } catch(IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
    return null;
}
So, in theory, it should return the same hash, shouldn't it? The problem is that it returns two different hashes, and they are the same on every run. The file size and the content stay the same.
When I run the method once with file-1 as input and file-2 as output, and again with file-2 as input and file-3 as output, the hashes of file-2 and file-3 are the same! This means the method changes the file the same way every time.
1. 58a4a9fbe349a9e0af172f9cf3e6050a
2. 7b3f343fa1b8c4e1160add4c48322373
3. 7b3f343fa1b8c4e1160add4c48322373
Here is a little test that compares all buffers for equality. The test passes, so there aren't any differences.
File file1 = new File("controller/templates/Example.zip");
File file2 = new File("controller/templates2/Example.zip");
try {
    byte[] buf1 = new byte[1024*64];
    byte[] buf2 = new byte[1024*64];
    FileInputStream is1 = new FileInputStream(file1);
    FileInputStream is2 = new FileInputStream(file2);
    boolean run = true;
    while(run) {
        int read1 = is1.read(buf1), read2 = is2.read(buf2);
        String result1 = Arrays.toString(buf1), result2 = Arrays.toString(buf2);
        boolean test = result1.equals(result2);
        System.out.println("1: " + result1);
        System.out.println("2: " + result2);
        System.out.println("--- TEST RESULT: " + test + " ----------------------------------------------------");
        if(!(read1 > 0 && read2 > 0) || !test) run = false;
    }
} catch (IOException e) {
    e.printStackTrace();
}
Question: Can you help me chunk the file without changing the hash?
while(is.read(buf) > 0) {
    os.write(buf);
}
The read() method with the array argument returns the number of bytes read from the stream. When the file length is not an exact multiple of the byte array length, the last return value is smaller than the array length because you reached the end of the file.
However, your os.write(buf); call writes the whole byte array to the stream, including the leftover bytes after the end of the file. This means the written file ends up bigger, and therefore the hash changes.
Interestingly you didn't make the mistake when you updated the message digest:
while((read = is.read(buffer)) > 0) {
    digest.update(buffer, 0, read);
}
You just have to do the same when you "rechunk" your files.
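To make the size difference concrete, here is a small in-memory sketch that copies the same data both ways. The class and method names are invented for illustration, and byte arrays stand in for the files so the effect can be checked without touching the disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class comparing the two copy loops.
final class ChunkCopyDemo {

    // Buggy variant: always writes the whole buffer, like os.write(buf).
    static byte[] copyWholeBuffer(byte[] source, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[bufferSize];
            while (in.read(buf) > 0) {
                out.write(buf);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fixed variant: writes only the bytes that read() actually delivered.
    static byte[] copyReadLength(byte[] source, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[bufferSize];
            int length;
            while ((length = in.read(buf)) > 0) {
                out.write(buf, 0, length);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a 100-byte source and a 64-byte buffer, the first variant produces 128 bytes (two full buffers), while the second produces the original 100 bytes.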
Your rechunk method has a bug in it. Since it uses a fixed-size buffer, your file is split into byte-array parts, but the last part of the file can be smaller than the buffer, so you write too many bytes to the new file. That is why the checksum no longer matches. The error can be fixed like this:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        int length;
        while((length = is.read(buf)) > 0) {
            os.write(buf, 0, length);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null)
                is.close();
            if(os != null)
                os.close();
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
Thanks to the length variable, the write method knows that only the first length bytes of the array belong to the file; everything after that is stale data from an earlier read and must not be written.

Compression and Encoding giving Wrong results in Strings

I'm trying to compress a string. I'm using Base64 encoding and decoding to convert between String and bytes and vice versa.
import org.apache.axis.encoding.Base64;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
public class UtilTesting {
    public static void main(String[] args) {
        String original = "I am the god";
        System.out.println("Starting Zlib");
        System.out.println("==================================");
        String zcompressed = compressString(original);
        String zdecompressed = decompressString(zcompressed);
        System.out.println("Original String: "+original);
        System.out.println("Compressed String: "+zcompressed);
        System.out.println("Decompressed String: "+zdecompressed);
    }
    public static String compressString(String uncompressedString){
        String compressedString = null;
        byte[] bytes = Base64.decode(uncompressedString);
        try {
            bytes = compressBytes(bytes);
            compressedString = Base64.encode(bytes);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return compressedString;
    }
    public static String decompressString(String compressedString){
        String decompressedString = null;
        byte[] bytes = Base64.decode(compressedString);
        try {
            bytes = decompressBytes(bytes);
            decompressedString = Base64.encode(bytes);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (DataFormatException e) {
            e.printStackTrace();
        }
        return decompressedString;
    }
    public static byte[] compressBytes(byte[] data) throws IOException {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
        deflater.finish();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int count = deflater.deflate(buffer); // returns the generated code... index
            outputStream.write(buffer, 0, count);
        }
        outputStream.close();
        byte[] output = outputStream.toByteArray();
        return output;
    }
    public static byte[] decompressBytes(byte[] data) throws IOException, DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int count = inflater.inflate(buffer);
            outputStream.write(buffer, 0, count);
        }
        outputStream.close();
        byte[] output = outputStream.toByteArray();
        return output;
    }
}
This is giving the result :
Starting Zlib
==================================
Original String: I am the god
Compressed String: eJxTXLm29YUGAApUAw0=
Decompressed String: Iamthego
As you can see, it is missing the whitespace and has even lost the final letter of the given String.
Can someone please suggest what is wrong with this code?
I'm following below steps:
Decode
compress
encode
save
retrieve
decode
decompress
encode.
Please help. Thank you.
In compressString, replace:
Base64.decode(uncompressedString)
with
uncompressedString.getBytes(StandardCharsets.UTF_8)
You're not passing in a base64-encoded string; you simply want the bytes of the input string. Note that spaces never appear in base64 encoding, so they are likely treated as redundant and discarded.
Similarly in decompressString, replace:
Base64.encode(bytes)
with
new String(bytes, StandardCharsets.UTF_8)
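Put together, the two replacements lead to something like the sketch below. It is not the asker's exact code: it assumes java.util.Base64 in place of the Axis Base64 class, uses DeflaterOutputStream and InflaterInputStream instead of the manual Deflater and Inflater loops, and the class name StringCompressor is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical class showing the corrected order of operations.
final class StringCompressor {

    static String compressString(String uncompressedString) {
        // Plain text becomes bytes via a charset, not via Base64 decoding.
        byte[] raw = uncompressedString.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(compressed)) {
            deflate.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Base64 is only used to make the compressed bytes printable.
        return Base64.getEncoder().encodeToString(compressed.toByteArray());
    }

    static String decompressString(String compressedString) {
        byte[] compressed = Base64.getDecoder().decode(compressedString);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (InputStream inflate = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = inflate.read(buffer)) != -1) {
                raw.write(buffer, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The decompressed bytes are the original text, so decode them as UTF-8.
        return new String(raw.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this ordering, decompressString(compressString("I am the god")) should give back the original string, spaces and final letter included.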

How to return and delete file?

I want to return a file (read or load it) from a method and then remove the file.
public File method() {
    File f = loadFile();
    f.delete();
    return f;
}
But when I delete the file, it is removed from disk, and the return statement hands back only a descriptor for a file that no longer exists. What is the most effective way to do this?
You can't keep the File handle of a deleted file. Instead, keep the data in a byte array temporarily, delete the file, and then return the byte array:
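public byte[] method() throws IOException {
    File f = loadFile();
    byte[] data = new byte[(int) f.length()];
    try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
        // readFully keeps reading until the array is full; a single read() may stop early
        in.readFully(data);
    }
    f.delete();
    return data;
}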
// Edit: Approach 2
FileInputStream input = new FileInputStream(f);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int bytesRead = input.read(buf);
while (bytesRead != -1) {
    baos.write(buf, 0, bytesRead);
    bytesRead = input.read(buf);
}
input.close();
baos.flush();
byte[] bytes = baos.toByteArray();
You can reconstruct the file data from the byte array.
However, my suggestion is to use IOUtils.toByteArray(InputStream input) from Apache Commons IO (formerly Jakarta Commons); why rewrite something that is already available?
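If Java 7 or later is available, the same read-then-delete idea can also be sketched with java.nio.file.Files, which avoids both the manual loop and the extra library. The class and method names below are invented for illustration, and wrapping the IOException in an UncheckedIOException is just one possible choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the original answers.
final class ReadThenDelete {

    // Reads the whole file into memory, deletes it, and returns the bytes.
    static byte[] readAndDelete(Path path) {
        try {
            byte[] data = Files.readAllBytes(path);
            // Files.delete throws if the file cannot be removed,
            // unlike File.delete(), which only returns false.
            Files.delete(path);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An existing File can be passed in via its toPath() method, for example ReadThenDelete.readAndDelete(f.toPath()).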
Assuming you want to return the file to the browser, this is how I did it:
File pdf = new File("file.pdf");
if (pdf.exists()) {
    try {
        InputStream inputStream = new FileInputStream(pdf);
        httpServletResponse.setContentType("application/pdf");
        httpServletResponse.addHeader("content-disposition", "inline;filename=file.pdf");
        copy(inputStream, httpServletResponse.getOutputStream());
        inputStream.close();
        pdf.delete();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
private static int copy(InputStream input, OutputStream output) throws IOException {
    byte[] buffer = new byte[512];
    int count = 0;
    int n = 0;
    while (-1 != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
