Custom JAXB validation error messages

Custom JAXB validation error messages - java

I am writing a small application which will be used for validating xml-files, correct them (if possible) and then perform tests on the contents. The end users will have very little knowledge of XML and parsing, so I want to catch the validation errors and then write my own event and error handler that produces error messages that are hopefully easier to understand for the end users.
I based my initial attempt on the solution detailed in this blogpost.
So far I have classified the errors based on the contents of event.getMessage(). Unfortunately, without knowing all types of parsing errors that may occur, it is more or less impossible to write good custom error messages. Is there a good way to find what types of error messages that can occur during validation?
I.e. I am looking for a listing of all messages, e.g. The content of element X is not complete one of ..., Invalid content was found starting with element Y..., Value Z is not facet-valid with respect to pattern...
Or is there some better way to do this?

The format of and number of messages will depend on which XML Schema validator you've plugged into JAXB. If you're using one based on Xerces-J (like the one included in Oracle's JDK) most of the messages will be prefixed with an identifier corresponding to a validation rule/constraint in the XML Schema specification (such as cvc-maxLength-valid). The list of identifiers for the XML Schema validation rules are available here in the specification. The full list of XML schema related error messages produced by Xerces can be found in its XMLSchemaMessages.properties message file, but keep in mind that this has changed over time and will depend on which version of Xerces you're using.

Related

Unable to unmarshal strange XML format using Java and JAXB

I need to retrieve financial data using the Open Financial Exchange (OFX) protocol. In order to do this, I am using JAXB to marshal an object tree into an XML string that specifies data request parameters, and then I am sending this XML string to a bank's server. The bank then responds with an XML string containing the requested data, which I unmarshal into an object tree using JAXB. For the first couple of banks I tried, I received the data back in well-formed XML that conformed to the published OFX schema, and I was able to unmarshal it easily using JAXB.
However, when I requested data from Citigroup, they sent me back the following:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20150513180826.000
<LANGUAGE>ENG
<FI>
<ORG>Citigroup
<FID>24909
</FI>
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
Note that this is an abbreviated form of the actual output, but it is enough to illustrate the problem. The problem is that I cannot figure out how to use JAXB to unmarshal this content. It is not well-formed XML because (1) it doesn't have an XML header, (2) the custom processing instructions (the first nine lines above) are not enclosed in <?...?> tags, and (3) most importantly, the simpleTypes have only opening tags but no closing tags.
I have searched all over for an answer to this and found a similar XML-ish format in a couple of places, and one of those places indicated that this may even be a valid format for sending XML over the web. But I haven't found any information that can help me unmarshal it or parse it.
Does anyone have any suggestions? I am usually pretty resourceful when it comes to these types of problems (hence why this is my first question on here), but this one has me stumped. Thanks in advance for any help you can provide.

Your basic problem is that the input you show here is not XML, it's SGML (see DATA:OFXSGML). You will have to preprocess it to make it acceptable to an XML parser. The kind of preprocessing you have to do will be application specific, as there's no general mechanism to deal well with that. If you have the SGML DTD, you might be able to get a product such as omnimark to "mostly" fix it up.

Well , maybe you need to handle this bank services in some other manner, for example when you receive data from this bank maybe read the Stream and maybe try to undetify the beggining of tag and then the end of (read line by line link)the rest of the stream ..free will . After that the string that remains is the XML that you need , so pass it through your already implemented JAXB code.

How to get HL72.6 ORC-21 in hapi2.1

I am trying to get the value in ORC-21:
//--------------
ORC orcObj = messageObj.getCOMMON_ORDER().getORC();
String result = orcObj.getOrc21_OrderingFacilityName(0).getOrganizationName().getValue();
//--------------
But it turns out that I have to put the ORC field between PID and FT1, as a "global ORC". Otherwise, the return is null.
Does anyone know how to fix this? I use PipeParser()

HL7 standard is mainly based on the correct order of segments in a message and fields in a segment.
What type of message are You trying to parse? Chances are, that the HL7 standard specifies the message order and HAPI in its object model follows the standard precisely. Any non-standard segments or unexpected segments in wrong order are ingored by the parser.
If You are dealing with a 3rd party source of messages and You have no way of making the input messages standards compliant, then You might have to modify the existing HAPI message types to accept Your own order of segments. A simple example is available on HAPI website - have a look!. Based on this example I added custom Z-segment mappings in an application I developed recently. It might also help You!

Validation with rule engine

We work with messages that are text-based (no XML). Our goal is to validate the messages, a message is valid if the content is correct. We developed our own language defined in XML to express rules on the message. We need to add more complex rules and we think that it’s now time to look at other alternative and use real rules engine. We support these types of rules:
name in a list of values or a regular
expression ex {SMITH, MOORE, A*}
name is present in the message-
name is not present in the message
if condition then name = John else name = Jane
Note that the condition is simple and does not contain any logical operators.
We need to support these types of rules:
if then else but the condition contains logical operators
for ... loop :
For all the customers in the message we want at least one from the USA and at least one from France
For all the customers in the message we want at least five that are from the USA and are buying more than $1000 a year
For any customer with name John, the last name must be Doe
Total customers with name John < 15
The name of the company is equal to the name of the company in another location in the message
The rules will depend on the type of messages we process. So we were investigating several existing solutions like:
JESS
OWL (Consistency checking)
Schematron (by transforming the message in XML)
What would be the best alternatives considering that we develop in Java? Another thing to consider is that we should be able to do error reporting like error description, error location (line and column number).

It sounds to me like you're on the right track already; my suggestions are:
Inspect your text-based messages directly with a parser/interpreter and apply rules over the generated objects. #Kdeveloper has suggested JavaCC for generating parser/interpreters, and I can add to this by personally vouching for ANTLRv3 which is an excellent environment for generating parser/interpreter/transformers in Java (amongst other languages). From there, you could use Jess or some other Java rules engine to validate the objects you generate. You could possibly also try encoding your rules into a parser/interpreter directly, but I'd advise against this and instead opt for separating the rules out to keep the parsing and semantic validation steps separate.
Transforming your text-based messages to XML to apply Schematron is also another viable option, but you'll obviously need to parse your text messages to get them into XML anyway. For this, I'd still suggest looking at JavaCC or ANTLRv3, and perhaps populating a pre-determined object model which can be marshaled to XML (such as that which can be generated by Castor or JAXB from a W3C XML Schema). From there, you can apply Schematron over the resulting XML.
I'd argue that transforming to OWL the trickiest option of your suggestions, but could be the most powerful. To start with, you'll probably want an ontology terminology (TBox) (the classes, properties, etc). to map your instance data (ABox) into. From there, consistency checking will only get you so far; many of the kinds of constraints you've outlined as wanting to capture simply can't be represented in OWL and validated using a DL-reasoner alone. However, if you couple your OWL ontology with SWRL rules (for example), you have a chance of capturing much of the types of rules you've outlined. Look at the types of rules and built-ins available in SWRL to see if this is expressive enough for you. If it is, you can employ the use of DL-Reasoners with SWRL support such as Pellet or HermiT. Note that individual implementations of OWL/SWRL reasoners such as these may implement more or less of the W3C specification, so you'll need to inspect each to determine their applicability.

If you're rules are static (i.e. known at compile time) you could make this with well known Java parser generator: JavaCC.

How to get the line number an xml element is on via the Java w3c dom api

Is there a way to lookup the line number that a given element is at in an xml file via the w3c dom api?
My use case for this is that we have 30,000+ maps in kml/xml format. I wrote a unit test that iterates over each file found on the hard drive (about 17GB worth) and tests that it is parseable by our application. When it fails I throw an exception that contains the element instance that was considered "invalid". In order for our mapping department (nobody here knows how to program) to easily track down the typo we would like to log the line number of the element that caused the exception.
Can anybody suggest a way to do this? Please note we are using the W3C dom api included in the Android 1.6 SDK.

I'm not sure whether the Android API is different, but a normal Java application could catch a SAXParseException when parsing and look at the line number.

I may be wrong, but the line number shouldn't be relevant to your XML parser/reader as long as the XML structure itself is valid.
You might try to extrapolate the line-number programatically on the assumption that each node/content must be on a distinct line but it's going to be tricky.

It looks like you're validating your XML files. That is, you're not interested in whether the documents are syntactically correct ("well-formed"), but if they are semantically valid for your application. The right tool for this would be a validating XML parser, coupled with a dedicated XML scheme. See for example this tutorial on XML validation in Java. Validation errors will usually contain detailed error information, including the line number of problematic elements.

validating xml in java as the document is built

I am working on converting an excel spread sheet into an xml document that needs to be validated against a schema. I am currently building the xml document using the DOM api, and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the xml produced from each Cell as I parse the excel document so I can indicate which cells are problematic in a friendlier way.
The problem that I am currently encountering, is that after validating the xml for the simple types, once they are built into a complex type, all the children nodes get validated again, producing redundant errors.
I found this question here at SO but it is using C# and the Microsoft API.
Thoughts? Thanks!

Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?
Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.
Are you looking to validate your program's output? If yes, then write unit tests.

You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.

So the solution that I decided to go with and am almost finished implementing, was to use XSOM to parse the XSD. Than when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (since the column headers map to simple types in the XSD) and than did manual validation against the restrictions. I am still building the tree so that at the end of it I can validate the entire XML tree against the XSD since there are some things that I can't catch at the Cell level.
Thanks for all of your input.

Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.