HexBinary byte order - java

The first google link about HexBinary for XML shows link to this article.
There are an example of HexBinary representation:
"<?xml version="1.0" encoding="UTF-8"?>"
"3f3c6d78206c657673726f693d6e3122302e20226e656f636964676e223d54552d4622383e3f"
But when I try to implement encoding myself or use online converter, I receive next string:
"3C3F786D6C2076657273696F6E3D22312E302220656E636F64696E673D225554462D38223F3E"
Difference in byte order 3f3c vs 3c3f. It cannot be UTF-16 problem because it will produce something like 3c003f00 or 003c003f.
So, my question: is it problem in article (book from O'Reilly) or there are special rules for xsd:hexBinary which require to invert byte order?

There are no rules that tell you to invert the byte order.
The XSD specification for hexBinary data type is just:
hexBinary represents arbitrary hex-encoded binary data.
Note the word arbitrary - it doesn't tell you how to encode the data, or that you need to swap every two bytes after the encoding is done.
This is an error in the book. It seems likely the author of the book just copied and pasted text into a hex editor or converter, and didn't notice that with default settings it displays data in units of 2 bytes in little-endian byte order. This is an easy mistake to make if you're not used to working with binary data.
It doesn't look like this errata has been reported. If you want to, you can submit an errata to https://www.oreilly.com/catalog/errata.csp?isbn=9780596004217

Related

Converting Java byte array to Buffer in Node.js

In an Android app I have a byte array containing data in the following format:
In another Node.js server, the same data is stored in a Buffer which looks like this:
I am looking for a way to convert both data to the same format so I can compare the two and check if they are equal. What would be the best way to approach this?
[B#cbf1911 is not a format. That is the result of invoking the .toString() method on a java object which doesn't have a custom toString implementation (thus, you get the default implementation written in java.lang.Object itself. The format of that string is:
binary-style-class-name#system-identity-hashcode.
[B is the binary style class name. That's JVM-ese for byte[].
cbf1911 is the system identity hashcode, which is (highly oversimplified and not truly something you can use to look stuff up) basically the memory address.
It is not the content of that byte array.
Lots of java APIs allow you to pass in any object and will just invoke the toString for you. Where-ever you're doing this, you wrote a bug; you need to write some explicit code to turn that byte array into data.
Note that converting bytes into characters, which you'll have to do whenever you need to put that byte array onto a character-based comms channel (such as JSON or email), is tricky.
<Buffer 6a 61 ...>
This is listing each byte as an unsigned hex nibble. This is an incredibly inefficient format, but it gets the job done.
A better option is base64. That is merely highly inefficient (but not incredibly inefficient); it spends 4 characters to encode 3 bytes (vs the node.js thing which spends 3 characters to encode 1 byte). Base64 is a widely supported standard.
When encoding, you need to explicitly write that. When decoding, same story.
In java, to encode:
import android.util.Base64;
class Foo {
void example() {
byte[] array = ....;
String base64 = Base64.encodeToString(array, Base64.DEFAULT);
System.out.println(base64);
}
}
That string is generally 'safe' - it has no characters in it that could end up being interpreted as control flow (so no <, no ", etc), and is 100% ASCII which tends to survive broken charset encoding transitions, which are common when tossing strings about the interwebs.
How do you decode base64 in node? I don't know, but I'm sure a web search for 'node base64 decode' will provide hundreds of tutorials.
Good luck!

Problems parsing iso8583 message

I've download code from here https://github.com/vikrantlabde/iso8583-Java and after some modifications I'm parsing almost fine my fields....
I defined the schema like this:
ISOSCHEMA.put("1","BITMAP");
ISOSCHEMA.put("2","NUM-2-19-0_0");
ISOSCHEMA.put("3","NUMERIC-0-6-0_0");
ISOSCHEMA.put("4","NUMERIC-0-12-0_0");
ISOSCHEMA.put("7","NUMERIC-0-10-0_0");
ISOSCHEMA.put("11","NUMERIC-0-6-0_0");
ISOSCHEMA.put("12","NUMERIC-0-6-0_0");
ISOSCHEMA.put("13","NUMERIC-0-4-0_0");
ISOSCHEMA.put("22","NUMERIC-0-3-0_0");
ISOSCHEMA.put("23","NUMERIC-0-3-0_0");
ISOSCHEMA.put("35","NUM-2-37-0_0");
ISOSCHEMA.put("41","FCHAR-0-8-0_0");
ISOSCHEMA.put("49","FCHAR-0-3-0_0");
ISOSCHEMA.put("55","NUM-3-999-0_0");
The problem is the field 55 that is a binary field. The standard documentation says it:
55 Reserved ISO B 255 LLLVAR (ISO DOCUMENTATION)
I'm having an error parsing a string that has the bitmap turned on for the field 55.
I'm having from the output:
820200409F36020004950500000000009A031409039C01005F2A0209789F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
But I expect:
820200409F36020004950500000000009A031409039C01005F2A0201249F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
The length of the iso payload converted is highly different too...
The program output is:
303130307238060020C280C28200313636353433323131313232333334343535303030303030303030303030303030303031313031363138333432363030323339343133333433303130313630373130303133373635343332313131323233333434353564333131303232303030393238333030313031303238343031373430393132343233303832303230303430394633363032303030343935303530303030303030303030394130333134303930333943303130303546324130323039373839463032303630303030303030303035303039463033303630303030303030303030303039463130323031463433303230303230303030303030303030303030303030303034353839353030303030303030303030303030303030303030303030303030303030303030394632363038343044323643344241353537374346423946323730313830394633373034343344443745383739463141303230323530394633333033453042304338
What I expect is:
30313030723806002080820031363635343332313131323233333434353530303030303030303030303030303030303131303136313833343236303032333934313333343330313031363037313030313337363534333231313132323333343435353D33313130323230303039323833303031303130323834303137343039313234313135820200409F36020004950500000000009A031409039C01005F2A0201249F02060000000005009F03060000000000009F10201F430200200000000000000000045895000000000000000000000000000000009F260840D26C4BA5577CFB9F2701809F370443DD7E879F1A0202509F3303E0B0C8
One advice is:
I have to make the explicit conversion to hex from the resultant byte[] and viceversa.
It is:
String isoMessage = ISOUtil.hexString(packIsoMsg("0100", isofields).getBytes());
And:
unpackIsoMsg(new String(ISOUtil.hex2byte(isoMessage), "UTF-8"));
What about the definition of this kind of fields in this class? I'm really a newbie with the standard but I arrived here because jpos doesn't work in an Android environment. Also I'm confused with the last mentioned conversion to hex.
Any help is really appreciated...
Kind regards.
DE55 is defined as a Tag-Lag-Value (TLV) field that is not in the normal Binary / text / or numeric packed format you see the rest of ISO-8583 messages typically but is in ASN.1 BER-TLV / X.690-0207 format.
Unless you account for the BER-TLV you will not successfully unpack DE55 unless it is for non-EMV/Tokenization purposes. It threw me at first as well as I was thinking something more straight forward. Be aware that sometimes the field transport format is actually longer in this format than the original plain text or other binary data output so it is not the most efficient.
There are a couple of other fields depending on the ISO specification may also use BER-TLV but DE55 is the industry standard field to use BER-TLV for EMV functionality replacing DE55's previous use as a generic and rarely used 'fee field'.
The ISO-7816 specification its ISO-8583 use in detail for EMV and Tokenization, as well there are other references and quick guides out there if you are just looking for something not so in-depth. All volumes of the ISO-7816 specification can be found openly on the internet for free, or can be purchased directly (spendy) from the ISO organization if you want them in the plain ISO format.
I am not familiar with the specific JAVA Git you referenced but this one has a help page on how to use BER-TLV. Oracle also has a page on dealing with BER-TLV here. BinaryFoo has a Git available as well.
For the purposes of initial testing, if your data is just test data (DO NOT USE PRODUCTION DATA!) you can use http://www.emvlab.org/tlvutils/ to verify your results. Which when I input your inputs it kicks out your expected output.
what is the field defined by sender in field 55.Assign the same in unpacking.If they are sending string it should be LLLVAR.
when sending ISO message header must be in hex format.hence they convert bytes to hex.
From the looks of it, you consider 5F2A020978 as wrong whereas you expect 5F2A020124. The EMV tag's 5F2A data (with length 02) is the transaction currency code. This means your transaction is performed in EURO currency instead of Canadian dollar as you expect. You can find a currency code list here.
Hope this helps.

Byte length difference when retrieving from WebSphere MQ Message

In Java, I am polling a WebSphere MQ message queue, expecting a message of `STRING format, that is composed entirely of XML. Part of this XML will contain bytes to a file attachment (any format: pdf, image, etc) which will then be converted to a blob for storage in an Oracle Db and later retrieval.
The issue I am having is that the known size of example files being sent over end up in my Db with a different size. I am not adding anything to the bytes (as far as I know), and the size appears to be larger directly after I get the message. I cannot determine if I am somehow adding information at retrieve, conversion from bytes -> String, or if this is happening on the front end when the sender populates the message.
My code at retrieve of the message:
inboundmsg = new MQMessage();
inboundmsg = getMQMsg(FrontIncomingQueue, gmo);
strLen = inboundmsg.getMessageLength();
strData = new byte[strLen];
ibm_id = inboundmsg.messageId;
inboundmsg.readFully(strData);
inboundmsgContents = new String(strData);
I see a file known to have size 21K go to 28K. A coworker has suggested that charset/encoding may be the issue. I do not specify a charset in the constructor call to String above, nor in any of the calls to getBytes when converting back from a string (for other unrelated uses). My default charset is ISO-8859-1. When speaking with the vendor who is initiating the message transfer, I asked her what charset she is using. Her reply:
"I am using the File.WriteAllBytes method in C# - I pass it the path of my file and it writes it to a byte[]. I could not find any documentation on the MSDN about what encoding the function uses. The method creates a byte array and from what I have read online this morning there is no encoding, its just a sequence of 8bit unsigned binary data with no encoding."
Another coworker has suggested that perhaps the MQ charset is the culprit, but my reading of the documentation suggests that MQ charset only affects the behavior of readString, readLine, & writeString.
If I circumvent MQ totally, and populate a byte array using a File Input Stream and a local file, the file size is preserved all the way to Db store, so this definitely appears to be happening at or during message transfer.
The problem is evident in the wording of the question. You describe a payload that contains arbitrary binary data and also trying to process it as a string. These two things are mutually exclusive.
This appears to be complicated by the vendor not providing valid XML. For example, consider the attachment:
<PdfBytes>iVBORw0KGgoAAAANS … AAAAASUVORK5CYII=</PdfBytes>
If the attachment legitimately contains any XML special character such as < or > then the result is invalid XML. If it contains null bytes, some parsers assume they have reached the end of the text and stop parsing there. That is why you normally see any attachment in XML either converted to Base64 for transport or else converted to hexadecimal.
The vendor describes writing raw binary data which suggests that what you are receiving contains non-string characters and therefore should not be sent as string data. If she had described some sort of conversion that would make the attachment XML compliant then string would be appropriate.
Interestingly, a Base64 encoding results in a payload that is 1.33 times larger than the original. Coincidence that 21k * 1.3 = 28k? One would think that what is received is actually the binary payload in Base64 format. That actually would be parseable as a string and accounts for the difference in file sizes. But it isn't at all what the vendor described doing. she said she's writing "8bit unsigned binary data with no encoding" and not Base64.
So we expect it to fail but not necessarily to result in a larger payload. Consider that WebSphere MQ receiving a message in String format will attempt to convert it. If the CCSID of the message differs from that requested on the GET then MQ will attempt a conversion. If the inbound CCSID is UTF-16 or any double-byte character set, certain characters will be expanded from one to two bytes - assuming the conversion doesn't hit invalid binary characters that cause it to fail.
If the two CCSIDs are the same then no conversion is attempted in the MQ classes but there is still an issue in that something has to parse an XML payload that is by definition not valid and therefore subject to unexpected results. If it happens that the binary payload does not contain any XML special characters and the parser doesn't choke on any embedded null bytes, then the parser is going to rather heroic lengths to forgive the non-compliant payload. If it gets to the </PdfBytes> tag without choking, it may assume that the payload is valid and convert everything between the <PdfBytes>...</PdfBytes> tags itself. Presumably to Base64.
All of this is conjecture, of course. But in a situation where the payload is unambiguously not string data any attempt to parse it as string data will either fail outright or produce unexpected and potentially bizarre results. You are actually unfortunate that it doesn't fail outright because now there's an expectation that the problem is on your end when it clearly appears to be the vendor's fault.
Assuming that the content of the payload remains unchanged, the vendor should be sending bytes messages and you should be receiving them as bytes. That would at least fix the problems MQ is having reconciling the expected format with the actual received format, but it would still be invalid XML. If it works that the vendor sends binary data in a message set to type String with you processing it as bytes then count your blessings and use it that way but don't count on it being reliable. Eventually you'll get a payload with an embedded XML special character and then you will have a very bad day.
Ideally, the vendor should know better than to send binary data in an XML payload without converting it first to string and it is up to them to fix it so that it is compliant with the XML spec and reliable.
Please see this MSDN page: XML, SOAP, and Binary Data

Can I add a binary file to a String based server message queue?

I have a multi-threaded client-server application that uses Vector<String> as a queue of messages to send.
I need, however, to send a file using this application. In C++ I would not really worry, but in Java I'm a little confused when converting anything to string.
Java has 2 byte characters. When you see Java string in HEX, it's usually like:
00XX 00XX 00XX 00XX
Unless some Unicode characters are present.
Java also uses Big endian.
These facts make me unsure, whether - and eventually how - to add the file into the queue. Preferred format of the file would be:
-- Headers --
2 bytes Size of the block (excluding header, which means first four bytes)
2 bytes Data type (text message/file)
-- End of headers --
2 bytes Internal file ID (to avoid referring by filenames)
2 bytes Length of filename
X bytes Filename
X bytes Data
You can see I'm already using 2 bytes for all numbers to avoid some horrible operations required when getting 2 numbers out of one char.
But I have really no idea how to add the file data correctly. For numbers, I assume this would do:
StringBuilder packetData = new StringBuilder();
packetData.append((char) packetSize);
packetData.append((char) PacketType.BINARY.ordinal()); //Just convert enum constant to number
But file is really a problem. If I have also described anything wrongly regarding the Java data types please correct me - I'm a beginner.
Does it have to send only Strings? I think if it does then you really need to encode it using base64 or similar. The best approach overall would probably be to send it as raw bytes. Depending on how difficult it would be to refactor your code to support byte arrays instead of just Strings, that may be worth doing.
To answer your String question I just saw pop up in the comments, there's a getBytes method on a String.
For the socket question, see:
Java sending and receiving file (byte[]) over sockets

Best Practice for Writing Data Using protobuf

We need our protobuf messages to contain as little data as possible. So what are the best practices we can follow in order to gain the maximum out of it. As an example writing byte[] as a String or ByteString ? What makes the difference? And adding a list of Integers as a repeated list or something else ?
As an example writing byte[] as a String or ByteString ?
If you want to write binary data, use a bytes fields (so ByteString). A string field is UTF-8-encoded text, so can't be used for all possible byte sequences.
And adding a list of integers as a repeated list or something else ?
Yes, use a repeated list - but with the [packed=true] option.
Basically, look over the whole encoding documentation and work out what's most appropriate for you. In particular, choose carefully between the various numeric representations, based on what your actual data will be. (If you're writing 32-bit values which are typically very large, consider using the fixed32 format instead of just int32 for example.)

Categories

Resources