xml validate allowed values in java - java

What is the best way to validate XML against an XSD? I need to validate allowed values in the XML, which can easily be done in XSD using an enumeration. The problem is that the list of allowed values is quite big, and maintaining it in the XSD could be painful. The allowed values can also change from time to time, so I would like to avoid changing the XSD schema. I was thinking of filtering these values in Java, e.g. by creating a config file for each XML tag that lists its allowed values and checking the values against it when validating the XML. If the content of an XML tag is not in the config file, an error would be raised.
My second question is: which parser is best for this? The XML file has around 40 different XML elements/tags, and one XML file could have around 40k records.
My last question is: how can I change the language of the parser's default error messages, which are in English? I have read some tutorials on which parser to use, but your experience would be really helpful. Thank you.
Example of values:
<order>pancake</order>
<order>milk</order>
Pancake is an allowed value, so no error is raised. Milk is not allowed, so an error would be raised: Milk is not allowed.

Read this JAXB tutorial to see how to convert to and from XML.
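The config-driven check described in the question could be sketched with the JDK's streaming StAX parser, which avoids loading 40k records into memory at once. This is only a sketch under assumptions: the class name `AllowedValuesCheck`, the in-memory map standing in for the config files, and the wording of the error message are all made up for illustration, and it assumes the checked elements contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: checks element text against per-tag allowed values.
class AllowedValuesCheck {

    /** Streams through the XML and reports every checked element whose text is not allowed. */
    static List<String> check(String xml, Map<String, Set<String>> allowed) throws XMLStreamException {
        List<String> errors = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                Set<String> values = allowed.get(name);
                if (values == null) {
                    continue; // no allowed-values list configured for this tag
                }
                int line = reader.getLocation().getLineNumber();
                // getElementText() only works for text-only elements (an assumption here).
                String text = reader.getElementText().trim();
                if (!values.contains(text)) {
                    // The message is built by our own code, so it can be in any language.
                    errors.add(String.format("line %d: '%s' is not allowed in <%s>", line, text, name));
                }
            }
        } finally {
            reader.close();
        }
        return errors;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<orders>\n<order>pancake</order>\n<order>milk</order>\n</orders>";
        // In practice the map would be loaded from the config files mentioned in the question.
        Map<String, Set<String>> allowed = Map.of("order", Set.of("pancake", "waffle"));
        check(xml, allowed).forEach(System.out::println);
    }
}
```

Because the messages come from your own code rather than from the parser, this approach also sidesteps the question of translating the parser's built-in error texts.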

Related

Unable to unmarshal strange XML format using Java and JAXB

I need to retrieve financial data using the Open Financial Exchange (OFX) protocol. In order to do this, I am using JAXB to marshal an object tree into an XML string that specifies data request parameters, and then I am sending this XML string to a bank's server. The bank then responds with an XML string containing the requested data, which I unmarshal into an object tree using JAXB. For the first couple of banks I tried, I received the data back in well-formed XML that conformed to the published OFX schema, and I was able to unmarshal it easily using JAXB.
However, when I requested data from Citigroup, they sent me back the following:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20150513180826.000
<LANGUAGE>ENG
<FI>
<ORG>Citigroup
<FID>24909
</FI>
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
Note that this is an abbreviated form of the actual output, but it is enough to illustrate the problem. The problem is that I cannot figure out how to use JAXB to unmarshal this content. It is not well-formed XML because (1) it doesn't have an XML header, (2) the custom processing instructions (the first nine lines above) are not enclosed in <?...?> tags, and (3) most importantly, the simpleTypes have only opening tags but no closing tags.
I have searched all over for an answer to this and found a similar XML-ish format in a couple of places, and one of those places indicated that this may even be a valid format for sending XML over the web. But I haven't found any information that can help me unmarshal it or parse it.
Does anyone have any suggestions? I am usually pretty resourceful when it comes to these types of problems (hence why this is my first question on here), but this one has me stumped. Thanks in advance for any help you can provide.
Your basic problem is that the input you show here is not XML, it's SGML (see DATA:OFXSGML). You will have to preprocess it to make it acceptable to an XML parser. The kind of preprocessing you have to do will be application specific, as there's no general mechanism that deals well with this. If you have the SGML DTD, you might be able to get a product such as OmniMark to "mostly" fix it up.
Well, maybe you need to handle this bank's service in some other manner. For example, when you receive data from this bank, read the stream line by line and try to identify where each tag begins and ends, then deal with the rest of the stream as you see fit. After that, the string that remains is the XML that you need, so pass it through your already implemented JAXB code.
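The preprocessing both answers describe might look like the following for the sample shown in the question. It is a minimal sketch, not a general OFX/SGML converter: the class name `OfxSgmlToXml` is made up, and it assumes one tag per line, that the header ends where `<OFX>` begins, and that values contain no `<` and no entities that need escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor for the line-oriented OFX sample in the question.
class OfxSgmlToXml {

    // A line with an opening tag followed by a value and no closing tag, e.g. "<CODE>0".
    private static final Pattern OPEN_LEAF = Pattern.compile("^<([A-Za-z0-9.]+)>([^<]+)$");

    /** Drops the header lines before <OFX> and adds the missing closing tags. */
    static String toXml(String ofx) {
        int start = ofx.indexOf("<OFX>");
        if (start < 0) {
            throw new IllegalArgumentException("no <OFX> element found");
        }
        StringBuilder xml = new StringBuilder();
        for (String raw : ofx.substring(start).split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher m = OPEN_LEAF.matcher(line);
            if (m.matches()) {
                // "<CODE>0" becomes "<CODE>0</CODE>"
                xml.append('<').append(m.group(1)).append('>')
                   .append(m.group(2).trim())
                   .append("</").append(m.group(1)).append('>');
            } else {
                // Aggregate open/close tags such as <STATUS> and </STATUS> are kept as they are.
                xml.append(line);
            }
            xml.append('\n');
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        String ofx = "OFXHEADER:100\nDATA:OFXSGML\n<OFX>\n<STATUS>\n<CODE>0\n<SEVERITY>INFO\n</STATUS>\n</OFX>\n";
        System.out.println(toXml(ofx));
    }
}
```

The returned string can then be handed to the existing JAXB unmarshalling code, for example through a `StringReader`.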

Parsing Invalid XML Characters using XStream parser - Java [duplicate]

This question already has answers here:
How to parse invalid (bad / not well-formed) XML?
(4 answers)
Closed 5 years ago.
I am having a classic XML validation question -
I need to parse incoming XML (from other applications, which don't use a proper XML formatter) where there are broken tags and XML special characters embedded in the data (but not wrapped in a CDATA section).
I am using the simple XStream parser to unmarshal the incoming stream, as it does simple serialization and is not a strict parser. For special characters it throws a ConverterException and won't parse the file.
I want to know if there is any other parser which can be used to parse Invalid XML files (special characters etc)
We have no control over what would be sent as Input stream and as a part of auditing application, need to read as much Good records from the incoming file as possible.
Is there a better parsing option available or do I need to write Custom Parser to parse these files?
I am using Spring Batch to do batch processing and XStream(1.x) to parse the XML files.
As XSD validation is failing, I am wondering whether it is even worth exploring other parsers or a custom parser.
Looking for your expert opinions on XML Validations..
I understand that you are trying to make the best of messy input. Unfortunately, since there doesn't seem to be a clear specification of the format of that input, you are on your own. One approach could be to first convert the input files to valid XML, which is basically what you would do by writing your own parser. In Java you could do this by reading and parsing the files with your own specialized code and feeding the result into a standard Java XML interface (SAX, DOM, etc.). But, depending on your knowledge, it may be faster to use a different language that specializes in text parsing.
My experience is that the only real long-term solution here is to force the data suppliers to provide valid XML. The reason for this is that, although you can do your best in making valid data out of the invalid data, there is always the risk that your interpretation is wrong. And half-valid data is often worse than no data at all. IMHO it is best to leave the responsibility for correct data at the suppliers.

When using the JAXB_FRAGMENT property, do you need to output the XML Declaration?

I need to output large amounts of data to an XML file using JAXB. My question is a follow-up question to:
Can JAXB Incrementally Marshall An Object?
In Blaise Doughan's answer he stated to first manually write the opening XML tag, followed by the repeated elements (which must be root elements), and then the closing tag. His example wrote to the console (System.out) and not to a file. If a FileOutputStream were used instead, what is the best way to ensure the XML declaration (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>) is written to the file before the opening XML tag? I would not think the best answer would be to manually write it as well.
I reviewed the following answer:
How to stream large Files using JAXB Marshaller?
However, I would think JAXB would have a solution to this problem without using an external interface to do so.
If your object model fits into memory and you have a single root object, then JAXB can marshal it for you and write out the XML declaration.
If, on the other hand, you have a large number of objects that wouldn't fit into memory if referenced by a single root object, then you need to do things differently. You would need to start the document yourself using StAX or an OutputStream/Writer directly, then marshal the objects one by one, and then end the document yourself. With this approach you need to ensure the declaration is written out (StAX will handle this for you).
http://blog.bdoughan.com/2012/08/handle-middle-of-xml-document-with-jaxb.html
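To illustrate the StAX part of this answer, here is a JDK-only sketch in which `XMLStreamWriter.writeStartDocument` emits the declaration before the root element. The class name `StreamedDocument` and the `orders`/`order` element names are assumptions; to keep the example free of a JAXB dependency, the repeated elements are written by hand at the spot where a fragment marshaller would be called.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical writer: StAX starts and ends the document, the items are streamed in between.
class StreamedDocument {

    static void write(OutputStream out, Iterable<String> orders) throws XMLStreamException {
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        writer.writeStartDocument("UTF-8", "1.0"); // writes the XML declaration
        writer.writeStartElement("orders");        // the root element, opened manually
        for (String order : orders) {
            // With JAXB, a marshaller with Marshaller.JAXB_FRAGMENT set to true
            // would be called here as marshaller.marshal(item, writer).
            writer.writeStartElement("order");
            writer.writeCharacters(order);
            writer.writeEndElement();
        }
        writer.writeEndElement();                  // closes the root element
        writer.writeEndDocument();
        writer.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, List.of("pancake", "waffle"));
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Passing a `FileOutputStream` instead of the `ByteArrayOutputStream` would write the same document to a file.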

How to get the line number an xml element is on via the Java w3c dom api

Is there a way to lookup the line number that a given element is at in an xml file via the w3c dom api?
My use case for this is that we have 30,000+ maps in kml/xml format. I wrote a unit test that iterates over each file found on the hard drive (about 17GB worth) and tests that it is parseable by our application. When it fails I throw an exception that contains the element instance that was considered "invalid". In order for our mapping department (nobody here knows how to program) to easily track down the typo we would like to log the line number of the element that caused the exception.
Can anybody suggest a way to do this? Please note we are using the W3C dom api included in the Android 1.6 SDK.
I'm not sure whether the Android API is different, but a normal Java application could catch a SAXParseException when parsing and look at the line number.
I may be wrong, but the line number shouldn't be relevant to your XML parser/reader as long as the XML structure itself is valid.
You might try to extrapolate the line number programmatically on the assumption that each node/content must be on a distinct line, but it's going to be tricky.
It looks like you're validating your XML files. That is, you're not so much interested in whether the documents are syntactically correct ("well-formed") as in whether they are semantically valid for your application. The right tool for this would be a validating XML parser, coupled with a dedicated XML schema. See for example this tutorial on XML validation in Java. Validation errors will usually contain detailed error information, including the line number of problematic elements.
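As a sketch of what the last answer suggests, the standard `javax.xml.validation` API on a desktop JVM reports validation problems as `SAXParseException`s that carry a line number. The class name `LineNumberValidation` and the message format are made up here, and whether the same classes are available in the Android 1.6 SDK is not something this example checks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates XML against an XSD and collects errors with line numbers.
class LineNumberValidation {

    static List<String> validate(String xsd, String xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();

        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }

            @Override
            public void error(SAXParseException e) {
                // Collect instead of throwing, so that validation continues past the first error.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // the document is not well-formed, so parsing cannot continue
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }
}
```

The `error` callback is also the place where the parser's own message could be replaced by wording suited to non-programmers, keeping only the line number.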

validating xml in java as the document is built

I am working on converting an Excel spreadsheet into an XML document that needs to be validated against a schema. I am currently building the XML document using the DOM API, and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the XML produced from each cell as I parse the Excel document, so I can indicate which cells are problematic in a friendlier way.
The problem that I am currently encountering is that after validating the XML for the simple types, once they are built into a complex type, all the child nodes get validated again, producing redundant errors.
I found this question here at SO but it is using C# and the Microsoft API.
Thoughts? Thanks!
Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?
Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.
Are you looking to validate your program's output? If yes, then write unit tests.
You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.
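One way to try the SAX-event idea with the JDK alone is `javax.xml.validation.ValidatorHandler`, a `ContentHandler` that validates the events it receives. The sketch below fires the events for a single element per cell; the class name `SaxEventValidation` and the `validateCell` and `schemaFrom` helpers are assumptions, and it expects the element to be declared as a global element in the schema.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: validates one cell by firing SAX events at a ValidatorHandler.
class SaxEventValidation {

    static Schema schemaFrom(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Fires the events for <element>value</element> and returns the validation errors. */
    static List<String> validateCell(Schema schema, String element, String value) throws SAXException {
        List<String> errors = new ArrayList<>();
        ValidatorHandler handler = schema.newValidatorHandler();
        handler.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }

            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });

        char[] text = value.toCharArray();
        handler.startDocument();
        handler.startElement("", element, element, new AttributesImpl()); // no namespace, no attributes
        handler.characters(text, 0, text.length);
        handler.endElement("", element, element); // simple-type errors are reported here
        handler.endDocument();
        return errors;
    }
}
```

Since the errors come back per call, the calling code knows which cell produced them. `ValidatorHandler.setContentHandler` can forward the validated events to another handler, which is where a DOM-building handler could be attached.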
So the solution that I decided to go with, and have almost finished implementing, was to use XSOM to parse the XSD. Then, when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (since the column headers map to simple types in the XSD) and then did manual validation against the restrictions. I am still building the tree so that at the end I can validate the entire XML tree against the XSD, since there are some things that I can't catch at the cell level.
Thanks for all of your input.
Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.
