Unable to unmarshal strange XML format using Java and JAXB

Unable to unmarshal strange XML format using Java and JAXB - java

I need to retrieve financial data using the Open Financial Exchange (OFX) protocol. In order to do this, I am using JAXB to marshal an object tree into an XML string that specifies data request parameters, and then I am sending this XML string to a bank's server. The bank then responds with an XML string containing the requested data, which I unmarshal into an object tree using JAXB. For the first couple of banks I tried, I received the data back in well-formed XML that conformed to the published OFX schema, and I was able to unmarshal it easily using JAXB.
However, when I requested data from Citigroup, they sent me back the following:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20150513180826.000
<LANGUAGE>ENG
<FI>
<ORG>Citigroup
<FID>24909
</FI>
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
Note that this is an abbreviated form of the actual output, but it is enough to illustrate the problem. The problem is that I cannot figure out how to use JAXB to unmarshal this content. It is not well-formed XML because (1) it doesn't have an XML header, (2) the custom processing instructions (the first nine lines above) are not enclosed in <?...?> tags, and (3) most importantly, the simpleTypes have only opening tags but no closing tags.
I have searched all over for an answer to this and found a similar XML-ish format in a couple of places, and one of those places indicated that this may even be a valid format for sending XML over the web. But I haven't found any information that can help me unmarshal it or parse it.
Does anyone have any suggestions? I am usually pretty resourceful when it comes to these types of problems (hence why this is my first question on here), but this one has me stumped. Thanks in advance for any help you can provide.

Your basic problem is that the input you show here is not XML, it's SGML (see DATA:OFXSGML). You will have to preprocess it to make it acceptable to an XML parser. The kind of preprocessing you have to do will be application specific, as there's no general mechanism to deal well with that. If you have the SGML DTD, you might be able to get a product such as omnimark to "mostly" fix it up.

Well , maybe you need to handle this bank services in some other manner, for example when you receive data from this bank maybe read the Stream and maybe try to undetify the beggining of tag and then the end of (read line by line link)the rest of the stream ..free will . After that the string that remains is the XML that you need , so pass it through your already implemented JAXB code.

Related

xml validate allowed values in java

I have a question what is the best way to validate XML against XSD. I need to validate allowed values in XML, which can be easily done in XSD using enumeration. Problem is, that the list of allowed values is quite big and do this in XSD could be paintfull. Another thing is, that allowed values can be changed from time to time, so I would like to avoid changing XSD schema. I was thinking to filter this values by using java. E.g. to make some config files for each XML tag filled with values and when validating XML, values would be checked. If content of XML tag is not in config file, error would be raised.
My another question is, which parser is the best to do this? XML file has arround 40 XML elements/tags, one XML file could have around 40k records.
And my last question is, how can I change english language of errors which are default in parser? I have read some tutorials which parser to use, but your experiences would be really helpfull. Thank you
example of values:
<order>pancake</order>
<order>milk</order>
Pancake is allowed value, so no error is raised. Milk is not allowed, so the error would be raised: Milk is not allowed.

read this JAXB Turorial to see how to convert from and to xml

Thrift - converting from simple JSON

I created the following Thrift Object:
struct Student{
1: string id;
2: string firstName;
3: string lastName
}
Now I would like to read this object from JSON. According to this post this is possible
So I wrote the following code:
String json = "{\"id\":\"aaa\",\"firstName\":\"Danny\",\"lastName\":\"Lesnik\"}";
StudentThriftObject s = new StudentThriftObject();
byte[] jsonAsByte = json.getBytes("UTF-8");
TMemoryBuffer memBuffer = new TMemoryBuffer(jsonAsByte.length);
memBuffer.write(jsonAsByte);
TProtocol proto = new TJSONProtocol(memBuffer);
s.read(proto);
What I'm getting is the following exception:
Exception in thread "main" org.apache.thrift.protocol.TProtocolException: Unexpected character:i
at org.apache.thrift.protocol.TJSONProtocol.readJSONSyntaxChar(TJSONProtocol.java:322)
at org.apache.thrift.protocol.TJSONProtocol.readJSONInteger(TJSONProtocol.java:698)
at org.apache.thrift.protocol.TJSONProtocol.readFieldBegin(TJSONProtocol.java:837)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:486)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:479)
at com.vanilla.thrift.example.entities.StudentThriftObject.read(StudentThriftObject.java:413)
at com.vanilla.thrift.controller.Main.main(Main.java:24)
Am I missing something?

You are missing the fact, that Thrift's JSON is different from yours. The field names are not written, instead the assigned field ID numbers are written (and expected). Here's an example for Thrift's JSON protocol:
[1,"MyService",2,1,{"1":{"rec":{"1":{"str":"Error: Process() failed"}}}}]
In other words, Thrift is not intended to parse any kind of JSON. It supports a very specific JSON format as one of the possible transports.
However, depending on what the origin of your JSON data is, Thrift can possibly still help you out, if you are able to use it on both sides. In that case, write an IDL to describe the data structures, feed it to the Thrift compiler and integrate both the generated code and the neccessary parts of the library with your projects.
If the origin of the JSON lies outside of your reach, or if the JSON format cannot be changed for some reason, you need to find another way.
Format and semantics are different beasts
To some extent, the whole issue can be compared with XML: There is one general XML syntax, which tells us how we have to fomat things so any standard conformant XML processor can read them.
But knowing the rules of XML is only half the answer, if we get a certain XML file from someone. Even if our XML parser can read the file successfully, because it is well-formed XML, we need to know the semantics of the data to really make use of what's within that file: Is it a customer data record? Or is it a SOAP envelope? Maybe a configuration file?
That is where DTDs or XML Schema come into play, they exist to describe the contents of the XML data. Without knowing the logical structure you are lost, because there are myriads of possible ways to express things in XML. And exactly the same is true with JSON, except that JSON schema descriptions are less commonly used.
"So you mean, we need just a way to tell Thrift how the JSON is organized?"
No, because the purpose and idea behind Thrift is to have a framework to de/serialize things and/or implement RPC servers and clients as efficiently as possible. It is not intended to have a general purpose file parser. Instead, Thrift reads and speaks only its own set of formats, which are plugged into the architecture as protocols: Thrift Binary, Thrift JSON, Thrift Compact, and a few more.
What you could do: In addition to what I said at in the first section of my answer, you may consider writing your own custom Thrift protocol implementation to support your particular JSON format of choice. It is not that hard, and worth a try.

How to get the line number an xml element is on via the Java w3c dom api

Is there a way to lookup the line number that a given element is at in an xml file via the w3c dom api?
My use case for this is that we have 30,000+ maps in kml/xml format. I wrote a unit test that iterates over each file found on the hard drive (about 17GB worth) and tests that it is parseable by our application. When it fails I throw an exception that contains the element instance that was considered "invalid". In order for our mapping department (nobody here knows how to program) to easily track down the typo we would like to log the line number of the element that caused the exception.
Can anybody suggest a way to do this? Please note we are using the W3C dom api included in the Android 1.6 SDK.

I'm not sure whether the Android API is different, but a normal Java application could catch a SAXParseException when parsing and look at the line number.

I may be wrong, but the line number shouldn't be relevant to your XML parser/reader as long as the XML structure itself is valid.
You might try to extrapolate the line-number programatically on the assumption that each node/content must be on a distinct line but it's going to be tricky.

It looks like you're validating your XML files. That is, you're not interested in whether the documents are syntactically correct ("well-formed"), but if they are semantically valid for your application. The right tool for this would be a validating XML parser, coupled with a dedicated XML scheme. See for example this tutorial on XML validation in Java. Validation errors will usually contain detailed error information, including the line number of problematic elements.

validating xml in java as the document is built

I am working on converting an excel spread sheet into an xml document that needs to be validated against a schema. I am currently building the xml document using the DOM api, and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the xml produced from each Cell as I parse the excel document so I can indicate which cells are problematic in a friendlier way.
The problem that I am currently encountering, is that after validating the xml for the simple types, once they are built into a complex type, all the children nodes get validated again, producing redundant errors.
I found this question here at SO but it is using C# and the Microsoft API.
Thoughts? Thanks!

Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?
Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.
Are you looking to validate your program's output? If yes, then write unit tests.

You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.

So the solution that I decided to go with and am almost finished implementing, was to use XSOM to parse the XSD. Than when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (since the column headers map to simple types in the XSD) and than did manual validation against the restrictions. I am still building the tree so that at the end of it I can validate the entire XML tree against the XSD since there are some things that I can't catch at the Cell level.
Thanks for all of your input.

Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.

How do you format the body of a JMS text message?

Does everyone just use XML in the message? Are there any good alternatives to XML? If you do use XML, do you define an XML Schema so clients know how to send messages to your service?

We use XML, but I think the important thing is to tailor the solution to the problem. The reason we use XML is that we are basically sending an object across in the message. There's no reason it can't be plain text, if applicable for the message you are sending, using headers to send along properties if appropriate.
We haven't defined an XSD or DTD for our XML messages, but we do have a formal document describing their composition so that other teams can use our feeds without bugging us.

XML, CSV, HTML, a simple word or sentence, ... Any of these are valid depending on the context in which the message is used and created. Just keep it simple and send what is needed in that context.
It is very flexible and can be adapted to the problem space.

XML is probably the most popular along with JSON a close second - but as others have said in this thread - XML, CSV, JSON or even HTML are fine.
XSDs are overrated really - their only real value is if you want your clients/customers to code generate marshalling code (e.g. using JAXB) or if you want to let folks use XSDs in their editors / IDE to get smart completion

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.