Are zlib.compress on Python and Deflater.deflate on Java (Android) compatible? - java

I am porting a Python application to Android and, at some point, this application has to communicate with a Web Service, sending it compressed data.
In order to do that it uses the next method:
def stuff(self, data):
"Convert into UTF-8 and compress."
return zlib.compress(simplejson.dumps(data))
I am using the next method to try to emulate this behavior in Android:
private String compressString(String stringToCompress)
{
Log.i(TAG, "Compressing String " + stringToCompress);
byte[] input = stringToCompress.getBytes();
// Create the compressor with highest level of compression
Deflater compressor = new Deflater();
//compressor.setLevel(Deflater.BEST_COMPRESSION);
// Give the compressor the data to compress
compressor.setInput(input);
compressor.finish();
// Create an expandable byte array to hold the compressed data.
// You cannot use an array that's the same size as the orginal because
// there is no guarantee that the compressed data will be smaller than
// the uncompressed data.
ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
// Compress the data
byte[] buf = new byte[1024];
while (!compressor.finished())
{
int count = compressor.deflate(buf);
bos.write(buf, 0, count);
}
try {
bos.close();
} catch (IOException e)
{
}
// Get the compressed data
byte[] compressedData = bos.toByteArray();
Log.i(TAG, "Finished to compress string " + stringToCompress);
return new String(compressedData);
}
But the HTTP response from the server is not correct and I guess it is because the result of the compression in Java is not the same as the one in Python.
I ran a little test compressing "a" both with zlib.compress and deflate.
Python, zlib.compress() -> x%9CSJT%02%00%01M%00%A6
Android, Deflater.deflate -> H%EF%BF%BDK%04%00%00b%00b
How should I compress the data in Android to obtain the same value of zlib.compress() in Python?
Any help, guidance or pointer is greatly appreciated!

compress and deflate are different compression algorithms so the answer is they will not be compatible. As an example of the difference here is 'a' compressed using the two algorithms via Tcl:
% binary encode hex [zlib compress a]
789c4b040000620062
% binary encode hex [zlib deflate a]
4b0400
Your python code is indeed doing compress. And the android code is doing deflate, however you are also getting the UTF-8 byte order mark prepended to the android version (\xef\xbf\xbf)
You can emit deflate data using python:
def deflate(data):
zobj = zlib.compressobj(6,zlib.DEFLATED,-zlib.MAX_WBITS,zlib.DEF_MEM_LEVEL,0)
zdata = zobj.compress(data)
zdata += zobj.flush()
return zdata
>>> deflate("a")
'K\x04\x00'

Although they are not exactly the same algorithms, it seems that they are totally compatible (meaning that if you compress, for example, an String using Deflater.deflate you can correctly uncompress it using zlib).
What caused my problem was that all form variables in a POST need to be percent escaped, and the Android application was not doing that. Encoding the data to Base64 before sending it, and modifying the server to decode it using Base64 before uncompressing it using zlib solved the problem.

Does byte[] input = stringToCompress.getBytes("utf-8"); help? In case your platform's default encoding is not UTF-8, this will force the encoding String -> bytes to use UTF-8. Also, the same goes for the last line of your code where you create a new String - you may want to explicitly specify UTF-8 as the decoding Charset.

Related

Is there a way to read files in bytes?

I am working on changing the code completed with php to java.
I have a test.tpf file encrypted with SEED and the file is encoded in ANSI.
The test.tpf file contains a string and image information for base64 encoding and output.
I have to cut 16 bytes each and read the file to decrypt test.tpf.
So I used fileInputStream to save and decrypt the bytes that I read in a 16-size byte array.
int nReadCur=0;
byte[] outbuf = new byte[16];
int nRead =0;
FileInputStream fis = new FileInputStream(tpf);
while (true) {
byte[] fileContentArray=new byte[16];
nRead = fis.read(fileContentArray);
nReadCur = nReadCur + nRead;
seed.SeedDecrypt(fileContentArray, pdwRoundKey, outbuf); //Decryption
String dbgmsg =new String(outbuf,"MS949");
mergeStr+=dbgmsg;
if(nFileSize<=nReadCur || nRead==-1)
break;
}//while
Then, they encoded the part corresponding to the image information in base64.
In js, the base64 code was changed to a string to receive the string and base64 information in json and display it on the screen.
String[] dataExplode=mergeStr.split("<TextData>");
String[] dataExplode1=dataExplode[1].split("</FileInfo>");
String[] dataExplode2=dataExplode1[0].split("</TextData>");
String textData = null;
String imageData = null;
textData=dataExplode2[0];
imageData=dataExplode1[1];
Encoder encoder=Base64.getEncoder();
imageData=encoder.encodeToString(imageData.getBytes());
JSONArray ja=new JSONArray();
ja.put(textData);
ja.put(imageData);
result.put("imageContent", ja);
However, it seems that the file cannot be read properly.
Compared to the result value of the php code I have, the string is incorrectly entered.
My Eclipse basic encoding is UTF8, so I think this problem is due to encoding.When I read files using fileInputStream, I wanted to set up characters and read them.
I don't know how to read bytes at this time.
How can I read files 16 bytes at a time after setting up the encoding?
Also, I would like to know if there is a mistake in my code.
My java version is 1.8 and I use spring 3.1.1
++)add
I succeeded in making a 16-size outbuf array into one array using the ByteArrayOutputStream.
ByteArrayOutputStream baos =new ByteArrayOutputStream();
.
.
.
seed.SeedDecrypt(fileContentArray, pdwRoundKey, outbuf);
baos.write(outbuf, 0, 16);
break;
}
}//while
mergeStr=new String(baos.toByteArray(),"MS949");
.
.
.
However, compared to the php code I have, I found that the result value of php is different from the result value of java.
in java:System.out.println("mergeStr:"+mergeStr.length()+" / image:"+imageData.length());
java console: mergeStr:69716 / image:168092
in php:alog("mergeStr: ".strlen($mergeStr)." / imageData : ".strlen($imageData ));
php log: mergeStr: 85552 / imageData: 111860
Since the result string decoded by java and php is different, the result of java and php is different for the value encoded by imageData as base64
When you read a file in bytes, there is no encoding.
Only when you convert/interprete the bytes as characters, encoding becomes important.
So if you want to stick to reading bytes, FileInputStream is the way.
If you want to read characters, a FileReader is the way.
Note you can specify the encoding to be used on the constructor - which is necessary if the file is not in the system default encoding.
Edit:
How can I read files 16 bytes at a time after setting up the encoding?
You simply cannot. But you can read 16 bytes, then when you convert to String specify the encoding.

Uploading image to server corrupts the image

I have in my application a image upload method that need to send a image and a string to my server.
The problem is that the server receives the content (image and string) but when it saves the image on the disk it is corrupted and can't be opened.
This is the relevant part of the script.
HttpPost httpPost = new HttpPost(url);
Bitmap bmp = ((BitmapDrawable) imageView.getDrawable()).getBitmap();
ByteArrayOutputStream stream = new ByteArrayOutputStream();
bmp.compress(Bitmap.CompressFormat.PNG, 100, stream);
byte[] byteArray = stream.toByteArray();
String byteStr = new String(byteArray);
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("--"+boundary+"\r\n");
stringBuilder.append("Content-Disposition: form-data; name=\"content\"\r\n\r\n");
stringBuilder.append(message+"\r\n");
stringBuilder.append("--"+boundary+"\r\n");
stringBuilder.append("Content-Disposition: form-data; name=\"image\"; filename=\"image.jpg\"\r\n");
stringBuilder.append("Content-Type: image/jpeg\r\n\r\n");
stringBuilder.append(byteStr);
stringBuilder.append("\r\n");
stringBuilder.append("--"+boundary+"--\r\n");
StringEntity entity = new StringEntity(stringBuilder.toString());
httpPost.setEntity(entity);
I can't change the server because other clients use it and it works for them. I just need to understand why the image is being corrupted.
When you do new String(byteArray), it's converting binary into the default character set (which is typically UTF-8). Most character sets aren't a suitable encoding for binary data. In other words if you were to encode certain binary strings to UTF-8 and then decode back to binary, you would not get the same binary string.
Since you're using multipart encoding, you need to write directly to the stream of the entity. Apache HTTP Client has helpers for doing this. See this guide, or this Android guide to uploading with multipart.
If you NEED to using strings only, you can safely convert your byte array to a string with
String byteStr = android.util.Base64.encode(byteArray, android.util.Base64.DEFAULT);
But it's important to note that your server will need to Base64 decode the string back to a byte array and save it to an image. Further, the transfer size will be greater because Base64 encoding isn't as space efficient as raw binary.
Your solutions above is not working because you are using new String(byteArray). The constructor encodes the byte array using the default encoding - see What is the default encoding - and it is very likely, that you have byte sequences in your data that cannot be encoded into a character.
To be more precise, a charset defines how characters are represented as bytes.
Most charsets have more than 256 characters. That is why you need more than one byte to represent a character. UTF-8 and UTF-16 uses up to four bytes.
So you have a mapping between the number space and the character space and this mapping is not bejectiv a priori. So it is very likely that there exist a number in the number space that have no character mapped to it.
The solution #Samuel suggested is foolproof because Base64 uses A–Z, a–z, 0–9, + , / and terminates with = to represent a byte. I would prefer this solution!
If you don't want or cannot use Base64, than you can try just to throw in every byte as it is into the StringBuilder hoping that the server does not do any encoding before you get it.
for (byte b : byteArray) {
stringBuilder.append((char)b);
}
I do not recommand that solution in general, but it may help you to get your stuff done.

Java: How to restore string data which was compressed with python's zlib encoder

I send data through the socket from python to java.
So on the python 2.7 side I have:
s = "this is test str"
compressed = s.encode('zlib')
push_to_tcp_socket(compressed)
So I need to restore initial string on the java side. How I could do that?
You will need to send gthe length of the string, or close the connection so you know where the last byte is.
The most likely class to help you is the DeflatorInputStream which youc an use once the bytes have been read. This is a bare wrapper for the zlib class. I haven't tested it works with python but it's you best option.
You can try other compressions like Snappy or LZ4 which have cross platform support.
I assumed you already know the networking part on Java. You can use Inflater class to get your string like in javadocs
// Decompress the bytes
Inflater decompresser = new Inflater();
decompresser.setInput(output, 0, compressedDataLength);
byte[] result = new byte[100];
int resultLength = decompresser.inflate(result);
decompresser.end();
//Then create string in java i assumed you are using python 2 and string is ASCII
String str = new String(result,"US-ASCII")

transfer jpeg from c++ to android (java)

I am struggling with the transfer of a simple jpeg file inside an ID3v2 tag from c++ over a TCP socket to java (Android). The library "taglib" offers to extract this file and I am able to save the jpeg as a new file.
The send function looks like this
char *parameter_full = new char[f3->picture().size()+2];
sprintf(parameter_full,"%s\n\0",f3->picture().data());
// send
result = send(c,parameter_full,strlen(parameter_full),0);
delete[] parameter_full;
where
f3->picture().data() returns a pointer to the internal data structure (it returns char*) and
f3->picture().size() returns the size of the array.
Then Android receives it with
String imageString = inFromServer.readLine();
byte[] imageBytes = imageString.getBytes();
Bitmap cover = BitmapFactory.decodeByteArray(imageBytes,0,imageBytes.length);
But somehow decodeByteArray always returns null. My idea is that Java doesn't receive the image correctly because imageString only consists of 4 characters...while the extracted jpeg file has a size of 12.7 KB.
But what has gone wrong?
Martin
You shouldn't use string functions on byte data because 0 values are taken as string terminators. Try looking into memcpy on the C++ side if you need to copy the char* and also the byte[] read functions for InputStream on the Java side.

How do you convert binary data to Strings and back in Java?

I have binary data in a file that I can read into a byte array and process with no problem. Now I need to send parts of the data over a network connection as elements in an XML document. My problem is that when I convert the data from an array of bytes to a String and back to an array of bytes, the data is getting corrupted. I've tested this on one machine to isolate the problem to the String conversion, so I now know that it isn't getting corrupted by the XML parser or the network transport.
What I've got right now is
byte[] buffer = ...; // read from file
// a few lines that prove I can process the data successfully
String element = new String(buffer);
byte[] newBuffer = element.getBytes();
// a few lines that try to process newBuffer and fail because it is not the same data anymore
Does anyone know how to convert binary to String and back without data loss?
Answered: Thanks Sam. I feel like an idiot. I had this answered yesterday because my SAX parser was complaining. For some reason when I ran into this seemingly separate issue, it didn't occur to me that it was a new symptom of the same problem.
EDIT: Just for the sake of completeness, I used the Base64 class from the Apache Commons Codec package to solve this problem.
String(byte[]) treats the data as the default character encoding. So, how bytes get converted from 8-bit values to 16-bit Java Unicode chars will vary not only between operating systems, but can even vary between different users using different codepages on the same machine! This constructor is only good for decoding one of your own text files. Do not try to convert arbitrary bytes to chars in Java!
Encoding as base64 is a good solution. This is how files are sent over SMTP (e-mail). The (free) Apache Commons Codec project will do the job.
byte[] bytes = loadFile(file);
//all chars in encoded are guaranteed to be 7-bit ASCII
byte[] encoded = Base64.encodeBase64(bytes);
String printMe = new String(encoded, "US-ASCII");
System.out.println(printMe);
byte[] decoded = Base64.decodeBase64(encoded);
Alternatively, you can use the Java 6 DatatypeConverter:
import java.io.*;
import java.nio.channels.*;
import javax.xml.bind.DatatypeConverter;
public class EncodeDecode {
public static void main(String[] args) throws Exception {
File file = new File("/bin/ls");
byte[] bytes = loadFile(file, new ByteArrayOutputStream()).toByteArray();
String encoded = DatatypeConverter.printBase64Binary(bytes);
System.out.println(encoded);
byte[] decoded = DatatypeConverter.parseBase64Binary(encoded);
// check
for (int i = 0; i < bytes.length; i++) {
assert bytes[i] == decoded[i];
}
}
private static <T extends OutputStream> T loadFile(File file, T out)
throws IOException {
FileChannel in = new FileInputStream(file).getChannel();
try {
assert in.size() == in.transferTo(0, in.size(), Channels.newChannel(out));
return out;
} finally {
in.close();
}
}
}
If you encode it in base64, this will turn any data into ascii safe text, but base64 encoded data is larger than the orignal data
See this question, How do you embed binary data in XML?
Instead of converting the byte[] into String then pushing into XML somewhere, convert the byte[] to a String via BASE64 encoding (some XML libraries have a type to do this for you). The BASE64 decode once you get the String back from XML.
Use http://commons.apache.org/codec/
You data may be getting messed up due to all sorts of weird character set restrictions and the presence of non-priting characters. Stick w/ BASE64.
How are you building your XML document? If you use java's built in XML classes then the string encoding should be handled for you.
Take a look at the javax.xml and org.xml packages. That's what we use for generating XML docs, and it handles all the string encoding and decoding quite nicely.
---EDIT:
Hmm, I think I misunderstood the problem. You're not trying to encode a regular string, but some set of arbitrary binary data? In that case the Base64 encoding suggested in an earlier comment is probably the way to go. I believe that's a fairly standard way of encoding binary data in XML.

Categories

Resources