Decompressing PHP's gzcompress in Java

I'm trying to decompress a JSON object in Java that was initially compressed in PHP. Here's how it gets compressed in PHP:
function zip_json_encode(&$arr) {
    $uncompressed = json_encode($arr);
    return pack('L', strlen($uncompressed)) . gzcompress($uncompressed);
}
and decoded (again in PHP):
function unzip_json_decode(&$data) {
    $uncompressed = @gzuncompress(substr($data, 4));
    return json_decode($uncompressed, $array_instead_of_object);
}
That gets put into MySQL and now it must be pulled out of the db by Java. We pull it out from the ResultSet like this:
String field = rs.getString("field");
I then pass that string to a method to decompress it. This is where it falls apart.
private String decompressHistory(String historyString) throws SQLException {
    StringBuffer buffer = new StringBuffer();
    try {
        byte[] historyBytes = historyString.substring(4).getBytes();
        ByteArrayInputStream bin = new ByteArrayInputStream(historyBytes);
        InflaterInputStream in = new InflaterInputStream(bin, new Inflater(true));
        int len;
        byte[] buf = new byte[1024];
        while ((len = in.read(buf)) != -1) {
            // buf should be decoded, right?
        }
    } catch (IOException e) {
        e.getStackTrace();
    }
    return buffer.toString();
}
Not quite sure what's going wrong here, but any pointers would be appreciated!

You need to get rid of the true in Inflater(true). Use just Inflater(). The true makes it expect raw deflate data. Without the true, it is expecting zlib-wrapped deflate data. PHP's gzcompress() produces zlib-wrapped deflate data.
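A minimal sketch of that fix, assuming the column is read back as raw bytes (for example via rs.getBytes("field")) rather than as a String, and that the 4-byte length prefix from pack('L', ...) is skipped; the method name inflateZlib is only illustrative:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

static String inflateZlib(byte[] raw) throws IOException {
    // Skip the 4-byte length prefix added by pack('L', ...) on the PHP side.
    byte[] compressed = Arrays.copyOfRange(raw, 4, raw.length);
    // The default Inflater() expects the zlib wrapper that gzcompress() produces.
    try (InflaterInputStream in = new InflaterInputStream(
             new ByteArrayInputStream(compressed), new Inflater());
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        return out.toString("UTF-8");
    }
}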

Gzipped data is binary, a byte[]. Going through a String (Unicode text) not only requires a conversion, but that conversion is faulty.
For instance this involves a conversion:
byte[] historyBytes = historyString.substring(4).getBytes();
byte[] historyBytes = historyString.substring(4).getBytes("ISO-8859-1");
The first version uses the default platform encoding, making the application non-portable.
The first thing to do is to store the data in the database as binary, in a VARBINARY or BLOB column.
InputStream field = rs.getBinaryStream("field");
try (InputStream in = new GZIPInputStream(field)) {
    ...
}
Or something along those lines. And mind the other answer about the zlib wrapper.

In the end, neither of the above solutions worked for us, but both have merit. When we pulled the data out of MySQL and cast it to bytes, a number of character bytes (67 of them) were missing, which made it impossible to decompress on the Java side. As for the answers above: Mark is correct that gzcompress() uses zlib, and therefore you should use the Inflater() class in Java.
Joop is correct that the data conversion is faulty. Our table was too large to convert to VARBINARY or BLOB; that might have solved the problem, but it wasn't an option for us. We ended up having Java make a request to our PHP app and simply unpacked the compressed data on the PHP side. This worked well. Hopefully this is helpful to anyone else who stumbles across it.

Related

Java UDP - Sending a string array from server to client

Hi and thanks in advance,
So I'm trying to take an array of JList items and convert them to a string array (which I think I've gotten right), and then I'm trying to send that string array over to my client who will then attempt to display them back into a JList on their side.
I've tried a few different things but none are working.
Here is my latest code attempt to send the string array over:
String[] FilesList = (String[]) lClient1Files.getSelectedValues();
FilesBuffer = FilesList.getBytes();
DatagramPacket DGPFilesResponse = new DatagramPacket(FilesBuffer,FilesBuffer.length, DGP.getAddress(), DGP.getPort());
SeederSocket.send(DGPFilesResponse);
The line FilesBuffer = FilesList.getBytes(); is causing the issue, because an array does not have a getBytes() method.
So my questions are:
1) How do I send the array of JList items (they are names) over to the client (it doesn't particularly have to be a string array), and
2) How would I receive the list on the clients side, so that I can use it?
Thank you.
One must make a binary format for the string array.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (DataOutputStream dos = new DataOutputStream(baos)) {
    dos.writeInt(filesList.length);
    for (String files : filesList) {
        dos.writeUTF(files);
    }
}
byte[] bytes = baos.toByteArray();
Internally, writeUTF first writes the encoded length (two bytes) and then the string in a UTF-8 based encoding, so any string can be written.
Reading is done with the corresponding input classes, in reverse.
If you expect to have many clients out there, possibly running different versions,
then add a version number to the message.
On the other side
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
try (DataInputStream dis = new DataInputStream(bais)) {
    int stringsCount = dis.readInt();
    String[] filesList = new String[stringsCount];
    for (int i = 0; i < stringsCount; ++i) {
        filesList[i] = dis.readUTF();
    }
    return filesList;
}
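As a hedged usage sketch for the UDP side (the DatagramSocket, host name and port below are placeholders, not taken from the question):
// Sending side: wrap the encoded bytes in a datagram.
// `socket` is an existing DatagramSocket; host and port are placeholders.
DatagramPacket packet = new DatagramPacket(
        bytes, bytes.length, InetAddress.getByName("client-host"), 4445);
socket.send(packet);

// Receiving side: read into a generously sized buffer, then decode only the
// bytes that actually arrived (getLength()), using the reader shown above.
byte[] receiveBuf = new byte[8192];
DatagramPacket received = new DatagramPacket(receiveBuf, receiveBuf.length);
socket.receive(received);
byte[] payload = Arrays.copyOf(received.getData(), received.getLength());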
The UDP payload has to be a byte[]. You need to choose a way to encode your data into a byte[], such that it can be converted back at the receiving end.
So you need to write encode() and decode() so that unit tests like this work:
@Test
public void encodesAndDecodesStringArray() {
    String[] strings = new String[] { "foo", "bar" };
    byte[] encoded = Encoder.encode(strings);
    String[] decoded = Encoder.decode(encoded);
    assertThat(decoded, is(strings));
}
There are literally hundreds of encoding schemes you could choose from. Delimiter-separated, length-separated, JSON, XML, BSON, ASN.1 ... take a look at Wikipedia's List of data serialization formats.
A very simple option that might work for you is delimiter-separation:
public byte[] encode(String[] strings) {
    return String.join(",", strings).getBytes(UTF_8);
}

public String[] decode(byte[] encodedArray) {
    return new String(encodedArray, UTF_8).split(",");
}
But note that this very basic scheme fails if any of the input strings contains a "," (or whatever delimiter you choose). Pick a scheme that works for you.
Consider using JSON -- there are easy to use libraries to read and write JSON. Readable ASCII in network traces is often convenient. The space overhead is not that high. It's ready for arbitrarily complex hierarchical data structures.
Consider that if you change the structure of the data produced by your sender, the receiver must also change. If that matters, consider encoding a protocol version into what you send (it might be enough to just say "the first two bytes are the version", and always stick a 0x0001 in there to start with).
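To illustrate the JSON route mentioned above, here is a sketch assuming the Gson library is on the classpath (any JSON library would do; Gson is just one common choice):
import com.google.gson.Gson;
import java.nio.charset.StandardCharsets;

// Encode the array as JSON text such as ["foo","bar"], then as UTF-8 bytes.
public byte[] encode(String[] strings) {
    return new Gson().toJson(strings).getBytes(StandardCharsets.UTF_8);
}

// Decode by parsing the JSON text back into a String[].
public String[] decode(byte[] encodedArray) {
    return new Gson().fromJson(new String(encodedArray, StandardCharsets.UTF_8), String[].class);
}
Because the JSON encoder escapes special characters, this also avoids the delimiter problem of the comma-separated scheme above.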

How to convert a String-represented ByteBuffer into a byte array in Java

I'm new to Java and I'm not sure how to do the following:
A Scala application somewhere converts a String into bytes:
ByteBuffer.wrap(str.getBytes)
I collect this byte array as a Java String, and I wish to do the inverse of what the Scala code above did, hence get the original String (object str above).
Getting the ByteBuffer as a String to begin with is the only option I have, as I'm reading it from an AWS Kinesis stream (or is it?). The Scala code shouldn't change either.
Example string:
String str = "AAAAAAAAAAGZ7dFR0XmV23BRuufU+eCekJe6TGGUBBu5WSLIse4ERy9............";
How can this be achieved in Java?
EDIT
Okay, so I'll try to elaborate a little more about the process:
A 3rd party Scala application produces CSV rows which I need to consume
Before storing those rows in an AWS Kinesis stream, the application does the following to each row:
ByteBuffer.wrap(output.getBytes);
I read the data from the stream as a string, and the string could look like the following one:
String str = "AAAAAAAAAAGZ7dFR0XmV23BRuufU+eCekJe6TGGUBBu5WSLIse4ERy9............";
I need to restore the contents of the string above into its original, readable form.
I hope I've made it clearer now, sorry for puzzling you all to begin with.
If you want to go from byte[] to String, try new String(yourBytes).
Both getBytes() and the String(byte[]) constructor use the default character encoding.
From Amazon Kinesis Service API Reference:
The data blob to put into the record, which is Base64-encoded when the blob is serialized.
You need to base64 decode the string. Using Java 8 it would look like:
byte[] bytes = Base64.getDecoder().decode("AAAAAAAAAAGZ7dFR0XmV23BR........");
String str = new String(bytes, "utf-8");
Other options: Base64 Encoding in Java
I'm not sure if I understand the question exactly, but do you mean this?
String decoded = new String(bytes);
public static void main(String[] args) {
    // bytesData is assumed to hold the received bytes
    String decoded = new String(bytesData);
    String actualString;
    try {
        actualString = new String(bytesData, "UTF-8");
        System.out.println("String is " + actualString);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}
Sorry, wrong answer. Again, ByteBuffer is a Java class, so the Scala and Java sides may work the same way; you need the Java version.
From kafka ApiUtils:
def writeShortString(buffer: ByteBuffer, string: String) {
  if (string == null) {
    buffer.putShort(-1)
  } else {
    val encodedString = string.getBytes("utf-8")
    if (encodedString.length > Short.MaxValue) {
      throw YourException("Your Message")
    } else {
      buffer.putShort(encodedString.length.asInstanceOf[Short])
      buffer.put(encodedString)
    }
  }
}
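For reference, a rough Java sketch of the same idea (this is not Kafka's API, just an assumed equivalent of the helper above):
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

static void writeShortString(ByteBuffer buffer, String string) {
    if (string == null) {
        buffer.putShort((short) -1);
    } else {
        byte[] encoded = string.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("String too long: " + encoded.length + " bytes");
        }
        buffer.putShort((short) encoded.length);
        buffer.put(encoded);
    }
}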
For Kinesis data blobs:
private CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.decode(record.getData()).toString();

Invalid info_hash (Java BitTorrent client)

According to the specification: http://wiki.theory.org/BitTorrentSpecification
info_hash: urlencoded 20-byte SHA1 hash of the value of the info key from the Metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key above.
torrentMap is my dictionary. I get the info key, which is another dictionary, calculate the hash, and URL-encode it.
But I always get an invalid info_hash message when I try to send it to the tracker.
This is my code:
public String GetInfo_hash() {
    String info_hash = "";
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = null;
    try {
        out = new ObjectOutputStream(bos);
        out.writeObject(torrentMap.get("info"));
        byte[] bytes = bos.toByteArray(); // Map => byte[]
        MessageDigest md = MessageDigest.getInstance("SHA1");
        info_hash = urlencode(md.digest(bytes)); // Hashing and URLEncoding
        out.close();
        bos.close();
    } catch (Exception ex) { }
    return info_hash;
}

private String urlencode(byte[] bs) {
    StringBuffer sb = new StringBuffer(bs.length * 3);
    for (int i = 0; i < bs.length; i++) {
        int c = bs[i] & 0xFF;
        sb.append('%');
        if (c < 16) {
            sb.append('0');
        }
        sb.append(Integer.toHexString(c));
    }
    return sb.toString();
}
This is almost certainly the problem:
out = new ObjectOutputStream(bos);
out.writeObject(torrentMap.get("info"));
What you're going to be hashing is the Java binary serialization format of the value of torrentMap.get("info"). I find it very hard to believe that all BitTorrent programs are meant to know about that.
It's not immediately clear to me from the specification what the value of the "info" key is meant to be, but you need to work out some other way of turning it into a byte array. If it's a string, I'd expect some well-specified encoding (e.g. UTF-8). If it's already binary data, then use that byte array directly.
EDIT: Actually, it sounds like the value will be a "bencoded dictionary" as per your quote, which looks like it will be a string. Quite how you're meant to encode that string (which sounds like it may include values which aren't in ASCII, for example) before hashing it is up for grabs. If your sample strings are all ASCII, then using "ASCII" and "UTF-8" as the encoding names for String.getBytes(...) will give the same result anyway, of course...
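As a sketch of that idea, assuming you keep the exact bencoded byte slice of the info dictionary while parsing the .torrent file (the method name and the byte-slice parameter are assumptions for illustration):
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hash the raw bencoded bytes of the "info" dictionary as they appear in the
// .torrent file, not a Java-serialized form of the parsed Map.
public String getInfoHash(byte[] bencodedInfoBytes) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    return urlencode(md.digest(bencodedInfoBytes)); // urlencode() from the question
}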

Corrupt Gzip string due to character encoding

I have some corrupted Gzip log files that I'm trying to restore. The files were transferred to our servers through a Java-backed web page. The files have always been sent as plain text, but we recently started to receive log files that were gzipped. These gzipped files appear to be corrupted and cannot be unzipped, and the originals have been deleted. I believe this is due to the character encoding in the method below.
Is there any way to reverse the process to restore the files to their original zipped format? I have the resulting String's binary data in a database BLOB.
Thanks for any help you can give!
private String convertStreamToString(InputStream is) throws IOException {
    /*
     * To convert the InputStream to String we use the
     * Reader.read(char[] buffer) method. We iterate until the
     * Reader return -1 which means there's no more data to
     * read. We use the StringWriter class to produce the string.
     */
    if (is != null) {
        Writer writer = new StringWriter();
        char[] buffer = new char[1024];
        try {
            Reader reader = new BufferedReader(
                    new InputStreamReader(is, "UTF-8"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    } else {
        return "";
    }
}
If this is the method that was used to convert the InputStream to a String, then your data is almost certainly lost.
The problem is that UTF-8 has quite a few byte sequences that are simply not legal (i.e. they don't represent any value). These sequences will be replaced with the Unicode replacement character.
That character is the same no matter which invalid byte sequence was decoded. Therefore the specific information in those bytes is lost.
If that's the code you have, you should never have converted to a Reader (or in fact a String) in the first place. Only preserving the data as a stream (or byte array) would have avoided corrupting the binary files; once it's read into a String, the illegal byte sequences (and there are many in UTF-8) are discarded.
So no, unless you are quite lucky, there is no way to recover the information. You'll have to provide another process that handles the raw stream and inserts it as a BLOB, not a CLOB.
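A sketch of such a process, assuming a table with a BLOB column (the table and column names below are hypothetical):
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Store the uploaded file as raw bytes; never wrap the stream in a Reader.
void storeLog(Connection conn, String name, InputStream upload) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO logs (name, log_data) VALUES (?, ?)")) {
        ps.setString(1, name);
        ps.setBinaryStream(2, upload); // JDBC streams the bytes untouched
        ps.executeUpdate();
    }
}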

Fetch attachment content using javamail

I am using javamail to automate some email handling.
I managed to get a connection to the POP3 server and fetch the messages. Some of them contain an attachment. Based on the email title, I am able to "predict" the filename of the attachment that I need to fetch.
But I can't get its content :(
I have a function
public byte[] searchForContent(Part part, String fileName) throws Exception {
    if (part.getFileName() != null) {
        if (part.getFileName().equals(fileName)) {
            byte[] content = new byte[part.getSize()];
            part.getInputStream().read(content);
            return content;
        }
    }
    return null;
}
The function works very well (i.e. it returns content only if the part is the attachment described by fileName), but the array it returns is too big.
The downloaded attachment is 256 bytes long, yet the function returns 352 bytes of content.
I think that the problem comes from the headers, but I can't be sure.
How would you proceed to get the content only ?
Thank you.
For what it's worth, the API documentation for javax.mail.Part.getSize() says
Note that the size may not be an exact measure of the content size and may or may not account for any transfer encoding of the content. The size is appropriate for display in a user interface to give the user a rough idea of the size of this part.
Assuming that you know the type of content, you can probably fetch it from Part.getContent() and process that. getContent() returns an Object, but if you know the content type you can cast it appropriately (e.g. to a String for text).
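For illustration, a small hedged sketch of that approach (the exact runtime type depends on the part's content type and the installed data handlers):
// getContent() returns an Object; cast based on what the part actually holds.
Object content = part.getContent();
if (content instanceof String) {
    String text = (String) content;            // typical for text/* parts
} else if (content instanceof InputStream) {
    InputStream data = (InputStream) content;  // typical for binary attachments
}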
I finally found a solution.
As eaj said, part.getSize() returns the size of the part object, not the size of the attachment itself.
However, the InputStream returned by part.getInputStream() contains the content of the attachment only.
So the code below gives the expected result:
public byte[] searchForContent(Part part, String fileName) throws Exception {
    if (part.getFileName() != null) {
        if (part.getFileName().equals(fileName)) {
            InputStream stream = part.getInputStream();
            byte[] buffer = new byte[512];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            int bytesRead;
            while ((bytesRead = stream.read(buffer)) != -1) {
                baos.write(buffer, 0, bytesRead);
            }
            return baos.toByteArray();
        }
    }
    return null;
}
