Byte length difference when retrieving from WebSphere MQ Message

Byte length difference when retrieving from WebSphere MQ Message - java

In Java, I am polling a WebSphere MQ message queue, expecting a message of `STRING format, that is composed entirely of XML. Part of this XML will contain bytes to a file attachment (any format: pdf, image, etc) which will then be converted to a blob for storage in an Oracle Db and later retrieval.
The issue I am having is that the known size of example files being sent over end up in my Db with a different size. I am not adding anything to the bytes (as far as I know), and the size appears to be larger directly after I get the message. I cannot determine if I am somehow adding information at retrieve, conversion from bytes -> String, or if this is happening on the front end when the sender populates the message.
My code at retrieve of the message:
inboundmsg = new MQMessage();
inboundmsg = getMQMsg(FrontIncomingQueue, gmo);
strLen = inboundmsg.getMessageLength();
strData = new byte[strLen];
ibm_id = inboundmsg.messageId;
inboundmsg.readFully(strData);
inboundmsgContents = new String(strData);
I see a file known to have size 21K go to 28K. A coworker has suggested that charset/encoding may be the issue. I do not specify a charset in the constructor call to String above, nor in any of the calls to getBytes when converting back from a string (for other unrelated uses). My default charset is ISO-8859-1. When speaking with the vendor who is initiating the message transfer, I asked her what charset she is using. Her reply:
"I am using the File.WriteAllBytes method in C# - I pass it the path of my file and it writes it to a byte[]. I could not find any documentation on the MSDN about what encoding the function uses. The method creates a byte array and from what I have read online this morning there is no encoding, its just a sequence of 8bit unsigned binary data with no encoding."
Another coworker has suggested that perhaps the MQ charset is the culprit, but my reading of the documentation suggests that MQ charset only affects the behavior of readString, readLine, & writeString.
If I circumvent MQ totally, and populate a byte array using a File Input Stream and a local file, the file size is preserved all the way to Db store, so this definitely appears to be happening at or during message transfer.

The problem is evident in the wording of the question. You describe a payload that contains arbitrary binary data and also trying to process it as a string. These two things are mutually exclusive.
This appears to be complicated by the vendor not providing valid XML. For example, consider the attachment:
<PdfBytes>iVBORw0KGgoAAAANS … AAAAASUVORK5CYII=</PdfBytes>
If the attachment legitimately contains any XML special character such as < or > then the result is invalid XML. If it contains null bytes, some parsers assume they have reached the end of the text and stop parsing there. That is why you normally see any attachment in XML either converted to Base64 for transport or else converted to hexadecimal.
The vendor describes writing raw binary data which suggests that what you are receiving contains non-string characters and therefore should not be sent as string data. If she had described some sort of conversion that would make the attachment XML compliant then string would be appropriate.
Interestingly, a Base64 encoding results in a payload that is 1.33 times larger than the original. Coincidence that 21k * 1.3 = 28k? One would think that what is received is actually the binary payload in Base64 format. That actually would be parseable as a string and accounts for the difference in file sizes. But it isn't at all what the vendor described doing. she said she's writing "8bit unsigned binary data with no encoding" and not Base64.
So we expect it to fail but not necessarily to result in a larger payload. Consider that WebSphere MQ receiving a message in String format will attempt to convert it. If the CCSID of the message differs from that requested on the GET then MQ will attempt a conversion. If the inbound CCSID is UTF-16 or any double-byte character set, certain characters will be expanded from one to two bytes - assuming the conversion doesn't hit invalid binary characters that cause it to fail.
If the two CCSIDs are the same then no conversion is attempted in the MQ classes but there is still an issue in that something has to parse an XML payload that is by definition not valid and therefore subject to unexpected results. If it happens that the binary payload does not contain any XML special characters and the parser doesn't choke on any embedded null bytes, then the parser is going to rather heroic lengths to forgive the non-compliant payload. If it gets to the </PdfBytes> tag without choking, it may assume that the payload is valid and convert everything between the <PdfBytes>...</PdfBytes> tags itself. Presumably to Base64.
All of this is conjecture, of course. But in a situation where the payload is unambiguously not string data any attempt to parse it as string data will either fail outright or produce unexpected and potentially bizarre results. You are actually unfortunate that it doesn't fail outright because now there's an expectation that the problem is on your end when it clearly appears to be the vendor's fault.
Assuming that the content of the payload remains unchanged, the vendor should be sending bytes messages and you should be receiving them as bytes. That would at least fix the problems MQ is having reconciling the expected format with the actual received format, but it would still be invalid XML. If it works that the vendor sends binary data in a message set to type String with you processing it as bytes then count your blessings and use it that way but don't count on it being reliable. Eventually you'll get a payload with an embedded XML special character and then you will have a very bad day.
Ideally, the vendor should know better than to send binary data in an XML payload without converting it first to string and it is up to them to fix it so that it is compliant with the XML spec and reliable.
Please see this MSDN page: XML, SOAP, and Binary Data

Related

HexBinary byte order

The first google link about HexBinary for XML shows link to this article.
There are an example of HexBinary representation:
"<?xml version="1.0" encoding="UTF-8"?>"
"3f3c6d78206c657673726f693d6e3122302e20226e656f636964676e223d54552d4622383e3f"
But when I try to implement encoding myself or use online converter, I receive next string:
"3C3F786D6C2076657273696F6E3D22312E302220656E636F64696E673D225554462D38223F3E"
Difference in byte order 3f3c vs 3c3f. It cannot be UTF-16 problem because it will produce something like 3c003f00 or 003c003f.
So, my question: is it problem in article (book from O'Reilly) or there are special rules for xsd:hexBinary which require to invert byte order?

There are no rules that tell you to invert the byte order.
The XSD specification for hexBinary data type is just:
hexBinary represents arbitrary hex-encoded binary data.
Note the word arbitrary - it doesn't tell you how to encode the data, or that you need to swap every two bytes after the encoding is done.
This is an error in the book. It seems likely the author of the book just copied and pasted text into a hex editor or converter, and didn't notice that with default settings it displays data in units of 2 bytes in little-endian byte order. This is an easy mistake to make if you're not used to working with binary data.
It doesn't look like this errata has been reported. If you want to, you can submit an errata to https://www.oreilly.com/catalog/errata.csp?isbn=9780596004217

Java decompress HTTP GZIP content from json attribute

We are working with packetbeat, a network packet analyzer tool to capture http requests and http responses. Packebeat persists this packet events in json format. The problem comes when the server supports gzip compression, packetbeat could not unzip content and save it directly the gzip content as json attribute. As you can see (Note: json has been simplified);
{
{
... ,
"content-type":"application/json;charset=UTF-8",
"transfer-encoding":"chunked",
"content-length":6347,
"x-application-context":"proxy-service:pre,native:8080",
"content-encoding":"gzip",
"connection":"keep-alive",
"date":"Mon, 18 Dec 2017 07:18:23 GMT"
},
"body": "\u001f\ufffd\u0008\u0000\u0000\u0000\u0000\u0000\u0000\u0003\ufffd]k\ufffd\u0014Ǳ\ufffd/\ufffdYI\ufffd#\ufffd*\ufffdo\ufffd\ufffd\ufffd\u0002\t\u0010^\ufffd\u001c\u000eE=\ufffd{\ufffdb\ufffd\ufffdE\ufffd\ufffdC\ufffd\ufffdf\ufffd,\ufffd\u003e\ufffd\ufffd\ufffd\u001ef\u001a\u0008\u0005\ufffd\ufffdg\ufffd\ufffd\ufffdYYU\ufffd\ufffd;\ufffdoN\ufffd\ufffd\ufffdg\ufffd\u0011UdK\ufffd\u0015\u0015\ufffdo\u000eH\ufffd\u000c\u0015Iq\ndC\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd ... "
}
We are thinking in preprocess packet json files to unzip content. Could someone tell me what i need to decompress zipped "body" json attribute using java?

Your data is irrecoverably broken. Generally I would suggest using Base64 encoding for transferring binary data packed into JSON, but you can read about possible alternatives in Binary Data in JSON String. Something better than Base64 if you like experimenting.
Otherwise, in theory you could just use a variant of String.getBytes() to get an array of bytes, and wrap the result into the mentioned (in the other answer) streams:
byte bodyBytes[]=body.getBytes();
ByteArrayInputStream bais=new ByteArrayInputStream(bodyBytes);
GZipInputStream gis=new GZipInputStream(bais);
<do something with gis here, perhaps use an additional DataInputStream>
Apart from the String-thing (which is usually not a good idea), this is how you unpack a gzip-compressed array of bytes.
However valid gzip data starts with a magic number 0x1F,0x8B (see Wikipedia, or you can also dig up the actual specification). Your data starts with 0x1F (the \u001F part), but continues with a \ufffd Unicode character, which a replacement character (see Wikipedia again).
Some tool was encoding the binary data and did not like the 0x8B, most probably because it was >=0x80. If you further read in your JSON, there are many \ufffd-s in it, all values above (or equal to) 0x80 have been replaced with this. So the data at the moment is irrecoverably broken even if JSON would support raw binary data inside (but it does not).

In Java you can use the GZIPInputStream class to decode the GZIP data, I think you would need to turn the value into a ByteArrayInputStream first.

Problems parsing iso8583 message

I've download code from here https://github.com/vikrantlabde/iso8583-Java and after some modifications I'm parsing almost fine my fields....
I defined the schema like this:
ISOSCHEMA.put("1","BITMAP");
ISOSCHEMA.put("2","NUM-2-19-0_0");
ISOSCHEMA.put("3","NUMERIC-0-6-0_0");
ISOSCHEMA.put("4","NUMERIC-0-12-0_0");
ISOSCHEMA.put("7","NUMERIC-0-10-0_0");
ISOSCHEMA.put("11","NUMERIC-0-6-0_0");
ISOSCHEMA.put("12","NUMERIC-0-6-0_0");
ISOSCHEMA.put("13","NUMERIC-0-4-0_0");
ISOSCHEMA.put("22","NUMERIC-0-3-0_0");
ISOSCHEMA.put("23","NUMERIC-0-3-0_0");
ISOSCHEMA.put("35","NUM-2-37-0_0");
ISOSCHEMA.put("41","FCHAR-0-8-0_0");
ISOSCHEMA.put("49","FCHAR-0-3-0_0");
ISOSCHEMA.put("55","NUM-3-999-0_0");
The problem is the field 55 that is a binary field. The standard documentation says it:
55 Reserved ISO B 255 LLLVAR (ISO DOCUMENTATION)
I'm having an error parsing a string that has the bitmap turned on for the field 55.
I'm having from the output:
820200409F36020004950500000000009A031409039C01005F2A0209789F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
But I expect:
820200409F36020004950500000000009A031409039C01005F2A0201249F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
The length of the iso payload converted is highly different too...
The program output is:
303130307238060020C280C28200313636353433323131313232333334343535303030303030303030303030303030303031313031363138333432363030323339343133333433303130313630373130303133373635343332313131323233333434353564333131303232303030393238333030313031303238343031373430393132343233303832303230303430394633363032303030343935303530303030303030303030394130333134303930333943303130303546324130323039373839463032303630303030303030303035303039463033303630303030303030303030303039463130323031463433303230303230303030303030303030303030303030303034353839353030303030303030303030303030303030303030303030303030303030303030394632363038343044323643344241353537374346423946323730313830394633373034343344443745383739463141303230323530394633333033453042304338
What I expect is:
30313030723806002080820031363635343332313131323233333434353530303030303030303030303030303030303131303136313833343236303032333934313333343330313031363037313030313337363534333231313132323333343435353D33313130323230303039323833303031303130323834303137343039313234313135820200409F36020004950500000000009A031409039C01005F2A0201249F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
One advice is:
I have to make the explicit conversion to hex from the resultant byte[] and viceversa.
It is:
String isoMessage = ISOUtil.hexString(packIsoMsg("0100", isofields).getBytes());
And:
unpackIsoMsg(new String(ISOUtil.hex2byte(isoMessage), "UTF-8"));
What about the definition of this kind of fields in this class? I'm really a newbie with the standard but I arrived here because jpos doesn't work in an Android environment. Also I'm confused with the last mentioned conversion to hex.
Any help is really appreciated...
Kind regards.

DE55 is defined as a Tag-Lag-Value (TLV) field that is not in the normal Binary / text / or numeric packed format you see the rest of ISO-8583 messages typically but is in ASN.1 BER-TLV / X.690-0207 format.
Unless you account for the BER-TLV you will not successfully unpack DE55 unless it is for non-EMV/Tokenization purposes. It threw me at first as well as I was thinking something more straight forward. Be aware that sometimes the field transport format is actually longer in this format than the original plain text or other binary data output so it is not the most efficient.
There are a couple of other fields depending on the ISO specification may also use BER-TLV but DE55 is the industry standard field to use BER-TLV for EMV functionality replacing DE55's previous use as a generic and rarely used 'fee field'.
The ISO-7816 specification its ISO-8583 use in detail for EMV and Tokenization, as well there are other references and quick guides out there if you are just looking for something not so in-depth. All volumes of the ISO-7816 specification can be found openly on the internet for free, or can be purchased directly (spendy) from the ISO organization if you want them in the plain ISO format.
I am not familiar with the specific JAVA Git you referenced but this one has a help page on how to use BER-TLV. Oracle also has a page on dealing with BER-TLV here. BinaryFoo has a Git available as well.
For the purposes of initial testing, if your data is just test data (DO NOT USE PRODUCTION DATA!) you can use http://www.emvlab.org/tlvutils/ to verify your results. Which when I input your inputs it kicks out your expected output.

what is the field defined by sender in field 55.Assign the same in unpacking.If they are sending string it should be LLLVAR.
when sending ISO message header must be in hex format.hence they convert bytes to hex.

From the looks of it, you consider 5F2A020978 as wrong whereas you expect 5F2A020124. The EMV tag's 5F2A data (with length 02) is the transaction currency code. This means your transaction is performed in EURO currency instead of Canadian dollar as you expect. You can find a currency code list here.
Hope this helps.

Can I add a binary file to a String based server message queue?

I have a multi-threaded client-server application that uses Vector<String> as a queue of messages to send.
I need, however, to send a file using this application. In C++ I would not really worry, but in Java I'm a little confused when converting anything to string.
Java has 2 byte characters. When you see Java string in HEX, it's usually like:
00XX 00XX 00XX 00XX
Unless some Unicode characters are present.
Java also uses Big endian.
These facts make me unsure, whether - and eventually how - to add the file into the queue. Preferred format of the file would be:
-- Headers --
2 bytes Size of the block (excluding header, which means first four bytes)
2 bytes Data type (text message/file)
-- End of headers --
2 bytes Internal file ID (to avoid referring by filenames)
2 bytes Length of filename
X bytes Filename
X bytes Data
You can see I'm already using 2 bytes for all numbers to avoid some horrible operations required when getting 2 numbers out of one char.
But I have really no idea how to add the file data correctly. For numbers, I assume this would do:
StringBuilder packetData = new StringBuilder();
packetData.append((char) packetSize);
packetData.append((char) PacketType.BINARY.ordinal()); //Just convert enum constant to number
But file is really a problem. If I have also described anything wrongly regarding the Java data types please correct me - I'm a beginner.

Does it have to send only Strings? I think if it does then you really need to encode it using base64 or similar. The best approach overall would probably be to send it as raw bytes. Depending on how difficult it would be to refactor your code to support byte arrays instead of just Strings, that may be worth doing.
To answer your String question I just saw pop up in the comments, there's a getBytes method on a String.
For the socket question, see:
Java sending and receiving file (byte[]) over sockets

Java: Faster alternative to String(byte[])

I am developing a Java-based downloader for binary data. This data is transferred via a text-based protocol (UU-encoded). For the networking task the netty library is used. The binary data is split by the server into many thousands of small packets and sent to the client (i.e. the Java application).
From netty I receive a ChannelBuffer object every time a new message (data) is received. Now I need to process that data, beside other tasks I need to check the header of the package coming from the server (like the HTTP status line). To do so I call ChannelBuffer.array() to receive a byte[] array. This array I can then convert into a string via new String(byte[]) and easily check (e.g. compare) its content (again, like comparison to the "200" status message in HTTP).
The software I am writing is using multiple threads/connections, so that I receive multiple packets from netty in parallel.
This usually works fine, however, while profiling the application I noticed that when the connection to the server is good and data comes in very fast, then this conversion to the String object seems to be a bottleneck. The CPU usage is close to 100% in such cases, and according to the profiler very much time is spent in calling this String(byte[]) constructor.
I searched for a better way to get from the ChannelBuffer to a String, and noticed the former also has a toString() method. However, that method is even slower than the String(byte[]) constructor.
So my question is: Does anyone of you know a better alternative to achieve what I am doing?

Perhaps you could skip the String conversion entirely? You could have constants holding byte arrays for your comparison values and check array-to-array instead of String-to-String.
Here's some quick code to illustrate. Currently you're doing something like this:
String http200 = "200";
// byte[] -> String conversion happens every time
String input = new String(ChannelBuffer.array());
return input.equals(http200);
Maybe this is faster:
// Ideally only convert String->byte[] once. Store these
// arrays somewhere and look them up instead of recalculating.
final byte[] http200 = "200".getBytes("UTF-8"); // Select the correct charset!
// Input doesn't have to be converted!
byte[] input = ChannelBuffer.array();
return Arrays.equals(input, http200);

Some of the checking you are doing might just look at part of the buffer. If you could use the alternate form of the String constructor:
new String(byteArray, startCol, length)
That might mean a lot less bytes get converted to a string.
Your example of looking for "200" within the message would be an example.
2
You might find that you can use the length of the byte array as a clue. If some messages are long and you are looking for a short one, ignore the long ones and don't convert to characters. Or something like that.
3
Along with what #EricGrunzke said, partially looking in the byte buffer to filter out some messages and find that you don't need to convert them from bytes to characters.
4
If your bytes are ASCII characters, the conversion to characters might be quicker if you use charset "ASCII" instead of whatever the default is for your server:
new String(bytes, "ASCII")
might be faster in that case.
In fact, you might be able to pick and choose the charset for conversion byte-character in some organized fashion that speeds up things.

Depending on what you are trying to do there are a few options:
If you are just trying to get the response status to then can't you just call getStatus()? This would probably be faster than getting the string out.
If you are trying to convert the buffer, then, assuming you know it will be ASCII, which it sounds like you do, then just leave the data as byte[] and convert your UUDecode method to work on a byte[] instead of a String.
The biggest cost of the string conversion is most likely the copying of the data from the byte array to the internal char array of the String, this combined with the conversion is most likely just a bunch of work that you don't need to do.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.