I am working on converting an Excel spreadsheet into an XML document that needs to be validated against a schema. I am currently building the XML document using the DOM API and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the XML produced from each cell as I parse the Excel document, so I can indicate which cells are problematic in a friendlier way.
The problem I am currently encountering is that after validating the XML for the simple types, once they are built into a complex type, all the child nodes get validated again, producing redundant errors.
I found this question here on SO, but it uses C# and the Microsoft API.
Thoughts? Thanks!
Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?
Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.
Are you looking to validate your program's output? If yes, then write unit tests.
You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.
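If it helps, here is a rough sketch of that wiring with the standard JAXP classes: a ValidatorHandler sits in front of an identity TransformerHandler that builds the DOM, and your Excel-reading code fires SAX events at the validator. The schema file name, the element names, and the assumption of a no-namespace schema are placeholders, not taken from your question.

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMResult;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.ValidatorHandler;
    import org.w3c.dom.Document;
    import org.xml.sax.helpers.AttributesImpl;

    public class StreamingBuildAndValidate {
        public static void main(String[] args) throws Exception {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new File("spreadsheet.xsd"));        // hypothetical schema

            // Identity transformer that collects the SAX events into a DOM.
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler domBuilder = stf.newTransformerHandler();
            DOMResult result = new DOMResult();
            domBuilder.setResult(result);

            // The ValidatorHandler checks each event, then forwards it downstream.
            ValidatorHandler validator = schema.newValidatorHandler();
            validator.setContentHandler(domBuilder);
            // validator.setErrorHandler(yourExistingErrorHandler); // plug in your handler here

            // Your Excel-reading code fires SAX events here instead of making DOM calls
            // (the empty URI argument assumes a no-namespace schema).
            AttributesImpl noAttrs = new AttributesImpl();
            validator.startDocument();
            validator.startElement("", "row", "row", noAttrs);
            char[] value = "42".toCharArray();
            validator.startElement("", "quantity", "quantity", noAttrs);
            validator.characters(value, 0, value.length);
            validator.endElement("", "quantity", "quantity");
            validator.endElement("", "row", "row");
            validator.endDocument();

            Document doc = (Document) result.getNode();             // the finished DOM
        }
    }

Because the ValidatorHandler validates each event as it arrives, an error should surface while you are still processing the offending cell, which is exactly when you know which cell it came from.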
The solution that I decided to go with, and am almost finished implementing, was to use XSOM to parse the XSD. Then, when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (since the column headers map to simple types in the XSD) and did manual validation against those restrictions. I am still building the tree so that at the end I can validate the entire XML tree against the XSD, since there are some things that I can't catch at the cell level.
Thanks for all of your input.
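For anyone who lands here later, a rough sketch of the XSOM lookup described above. The schema file name, the empty target namespace, and the column name are placeholders from my setup, and the facet handling is only illustrative.

    import java.io.File;
    import com.sun.xml.xsom.XSFacet;
    import com.sun.xml.xsom.XSRestrictionSimpleType;
    import com.sun.xml.xsom.XSSchemaSet;
    import com.sun.xml.xsom.XSSimpleType;
    import com.sun.xml.xsom.parser.XSOMParser;

    public class ColumnRestrictionLookup {
        public static void main(String[] args) throws Exception {
            XSOMParser parser = new XSOMParser();
            parser.parse(new File("spreadsheet.xsd"));      // hypothetical schema file
            XSSchemaSet schemas = parser.getResult();

            // Column header "accountNumber" maps to a global simple type of the same name.
            XSSimpleType type = schemas.getSimpleType("", "accountNumber");
            if (type != null && type.isRestriction()) {
                XSRestrictionSimpleType restriction = type.asRestriction();
                for (XSFacet facet : restriction.getDeclaredFacets()) {
                    // e.g. "pattern" -> "[0-9]{10}", "maxLength" -> "10"
                    System.out.println(facet.getName() + " -> " + facet.getValue().value);
                }
            }
        }
    }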
Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.
Background
I have a situation where I can get data either in the form of an XML file or Excel/CSV files. In case the data comes in a non-XML format, it will be divided into several different files/tables representing different subsections of the XML. The end goal is to validate the data and generate a valid XML file using an existing schema, regardless of the format of the incoming data.
When receiving an XML file, the idea is to unmarshal and validate it. For simple errors, automatic fixes will be applied, and in the end a new XML file will be marshalled from the JAXB classes.
Question
In order to generalize as much of the solution as possible, my idea was to try to generate a JAXB representation of the non-XML data too, and then generate the final XML file from those classes. I have been trying to find a good tutorial or introduction to converting non-XML data into a JAXB representation, but I haven't really been able to find anything useful, which makes me wonder: is this a really bad approach? Any better suggestions for how to solve this problem? In the majority of cases the files are likely to be non-XML, so I am willing to throw out the current approach if anyone has a better solution that uses some other technology.
I've worked with univocity-parsers before. They work well and are simple to use for converting CSV into Java objects, which you can then serialize using JAXB as well.
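Something along these lines, for example (the CsvRow bean, its fields, and the file name are made up for illustration; the JAXB step is left as a comment):

    import java.io.File;
    import java.util.List;
    import com.univocity.parsers.annotations.Parsed;
    import com.univocity.parsers.common.processor.BeanListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public class CsvToJaxb {

        // Hypothetical bean representing one CSV row.
        public static class CsvRow {
            @Parsed(field = "name")   public String name;
            @Parsed(field = "amount") public double amount;
        }

        public static void main(String[] args) {
            CsvParserSettings settings = new CsvParserSettings();
            settings.setHeaderExtractionEnabled(true);          // first line holds column names
            BeanListProcessor<CsvRow> processor = new BeanListProcessor<>(CsvRow.class);
            settings.setProcessor(processor);

            new CsvParser(settings).parse(new File("input.csv"));
            List<CsvRow> rows = processor.getBeans();

            // From here, copy the rows into your JAXB-generated classes
            // and marshal them against the existing schema.
            System.out.println(rows.size() + " rows parsed");
        }
    }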
I need to retrieve financial data using the Open Financial Exchange (OFX) protocol. In order to do this, I am using JAXB to marshal an object tree into an XML string that specifies data request parameters, and then I am sending this XML string to a bank's server. The bank then responds with an XML string containing the requested data, which I unmarshal into an object tree using JAXB. For the first couple of banks I tried, I received the data back in well-formed XML that conformed to the published OFX schema, and I was able to unmarshal it easily using JAXB.
However, when I requested data from Citigroup, they sent me back the following:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20150513180826.000
<LANGUAGE>ENG
<FI>
<ORG>Citigroup
<FID>24909
</FI>
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
Note that this is an abbreviated form of the actual output, but it is enough to illustrate the problem. The problem is that I cannot figure out how to use JAXB to unmarshal this content. It is not well-formed XML because (1) it doesn't have an XML header, (2) the custom processing instructions (the first nine lines above) are not enclosed in <?...?> tags, and (3) most importantly, the simpleTypes have only opening tags but no closing tags.
I have searched all over for an answer to this and found a similar XML-ish format in a couple of places, and one of those places indicated that this may even be a valid format for sending XML over the web. But I haven't found any information that can help me unmarshal it or parse it.
Does anyone have any suggestions? I am usually pretty resourceful when it comes to these types of problems (which is why this is my first question on here), but this one has me stumped. Thanks in advance for any help you can provide.
Your basic problem is that the input you show here is not XML, it's SGML (see DATA:OFXSGML). You will have to preprocess it to make it acceptable to an XML parser. The kind of preprocessing you have to do will be application-specific, as there's no general mechanism that deals well with this. If you have the SGML DTD, you might be able to get a product such as OmniMark to "mostly" fix it up.
Well, maybe you need to handle this bank's service in some other manner: when you receive data from this bank, read the stream line by line, identify the beginning of the <OFX> tag, and discard everything that comes before it. The string that remains is the XML you need (once the unclosed tags are dealt with), so pass it through your already implemented JAXB code.
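Building on both answers, here is a rough sketch of one possible preprocessing heuristic: skip the key/value header, then treat any line that opens a tag and carries text on the same line as a leaf element and append the missing closing tag. It assumes the one-element-per-line layout of the sample above and is not a general SGML solution.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class OfxSgmlToXml {

        // A line like "<CODE>0" is treated as a leaf element missing its close tag.
        private static final Pattern LEAF = Pattern.compile("^<([A-Za-z0-9._]+)>([^<]+)$");

        public static String toXml(String ofx) {
            StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n");
            boolean inBody = false;
            for (String line : ofx.split("\\r?\\n")) {
                String trimmed = line.trim();
                if (!inBody) {                                  // skip the OFXHEADER:... key/value block
                    if (trimmed.startsWith("<OFX>")) inBody = true; else continue;
                }
                Matcher m = LEAF.matcher(trimmed);
                if (m.matches()) {
                    xml.append('<').append(m.group(1)).append('>')
                       .append(m.group(2).trim())
                       .append("</").append(m.group(1)).append(">\n");
                } else {
                    xml.append(trimmed).append('\n');           // container tags pass through unchanged
                }
            }
            return xml.toString();
        }
    }

The resulting string should then be acceptable to your existing JAXB unmarshalling code, at least for responses shaped like the Citigroup sample.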
I am new to this validation process in Java...
--> XML file named "Validation Limits"
--> Structure of the XML:
    <parameter> ... </parameter>
    <lowerLimit> ... </lowerLimit>
    <upperLimit> ... </upperLimit>
    <enable> ... </enable>
--> Depending on the enable status, 'true' or 'false', I must perform the validation process for the respective parameter.
--> What could be the best possible method to perform this operation?
I have parsed the XML (DOM) [forgot to mention this earlier] and stored the values in arrays, but this gets complicated with a lot of referencing from one array to another. Any better method that could replace the array approach would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, and read the other elements depending on its value.
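For what it's worth, a small sketch of that DOM approach; I'm guessing at the surrounding structure (a repeating <limit> element wrapping the four children you listed), so adjust the element names to your actual file:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class LimitReader {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File("ValidationLimits.xml"));   // hypothetical file name
            NodeList limits = doc.getElementsByTagName("limit"); // assumed grouping element
            for (int i = 0; i < limits.getLength(); i++) {
                Element limit = (Element) limits.item(i);
                String parameter = text(limit, "parameter");
                boolean enabled = Boolean.parseBoolean(text(limit, "enable"));
                if (!enabled) continue;                          // only validate when enable is "true"
                double lower = Double.parseDouble(text(limit, "lowerLimit"));
                double upper = Double.parseDouble(text(limit, "upperLimit"));
                System.out.println(parameter + ": [" + lower + ", " + upper + "]");
            }
        }

        private static String text(Element parent, String tag) {
            return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
        }
    }

Reading each group into a small object like this avoids the parallel arrays and the cross-referencing between them.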
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
Is there a way to look up the line number that a given element is at in an XML file via the W3C DOM API?
My use case for this is that we have 30,000+ maps in KML/XML format. I wrote a unit test that iterates over each file found on the hard drive (about 17 GB worth) and tests that it is parseable by our application. When it fails, I throw an exception that contains the element instance that was considered "invalid". In order for our mapping department (nobody here knows how to program) to easily track down the typo, we would like to log the line number of the element that caused the exception.
Can anybody suggest a way to do this? Please note that we are using the W3C DOM API included in the Android 1.6 SDK.
I'm not sure whether the Android API is different, but a normal Java application could catch a SAXParseException when parsing and look at the line number.
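Something along these lines, for instance (file name made up); note that this only reports a position for documents that actually fail to parse:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.xml.sax.SAXParseException;

    public class ParseWithLineNumbers {
        public static void main(String[] args) throws Exception {
            try {
                DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new File("map.kml"));
            } catch (SAXParseException e) {
                // SAXParseException carries the position of the offending content.
                System.err.println("Parse error at line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            }
        }
    }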
I may be wrong, but the line number shouldn't be relevant to your XML parser/reader as long as the XML structure itself is valid.
You might try to extrapolate the line number programmatically on the assumption that each node/content must be on a distinct line, but it's going to be tricky.
It looks like you're validating your XML files. That is, you're not just interested in whether the documents are syntactically correct ("well-formed"), but in whether they are semantically valid for your application. The right tool for this would be a validating XML parser coupled with a dedicated XML schema. See, for example, this tutorial on XML validation in Java. Validation errors will usually contain detailed error information, including the line number of problematic elements.
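A minimal sketch of that approach, assuming you have (or write) an XSD to validate against and can run the check on a desktop JVM as part of your unit tests (the file names are placeholders):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXParseException;

    public class KmlValidator {
        public static void main(String[] args) throws Exception {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new File("kml.xsd"));            // hypothetical schema file
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { report(e); }
                public void error(SAXParseException e) { report(e); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    report(e);
                    throw e;
                }
                private void report(SAXParseException e) {
                    // Each reported problem includes the line it was found on.
                    System.err.println("Line " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
            validator.validate(new StreamSource(new File("map.kml")));
        }
    }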
I have a requirement where I need to generate HTML forms on the fly based on many different XML schemas (as of now I have 20 of them, and the count keeps increasing). I need to collect data from the user to create instance documents corresponding to each of them and then store the instance documents in a DB.
Challenges:
1) The schema has a lot of unbounded complex types, so we don't know in advance the number and types of input fields to be created; pre-creating the HTML is therefore not an option.
2) Even if I can handle generating the form on the fly, the problem is collecting the data entered, as dynamically generated forms should/will have dynamic ids/names for the input fields.
Can anyone suggest the best way to implement this?
Thank you in advance.
It seems to me like a clear case for XSLT.
Generating HTML from XML through XSLT is the primary goal of XSLT.
As for the id/names, you can create an XSLT which will also generate a set of id/names in a way that you can use.
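As a rough starting point, invoking such a stylesheet from Java is only a few lines; the stylesheet name below is a placeholder, and writing it (mapping xs:element and xs:simpleType declarations to form fields) is where the real work lies:

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class SchemaToForm {
        public static void main(String[] args) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("xsd-to-form.xsl"))); // hypothetical stylesheet
            // An XSD is itself XML, so it can be the transformation input.
            t.transform(new StreamSource(new File("order.xsd")),   // one of the 20+ schemas
                        new StreamResult(new File("order-form.html")));
        }
    }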
Use WSDL2XForms to create XForms from XML Schemas (XSD). Then publish them with Chiba (chiba.sourceforge.net) - it converts these XForms to standard HTML forms on the server side.
The Google Code project xsd-forms seems to be a promising approach.
An XQuery-based translator from XSD to XForms is available at http://en.wikibooks.org/wiki/XRX/XForms_Generator.
I don't know much about that one: http://nunojob.wordpress.com/2008/01/05/creating-a-user-interface-for-xml-schema-using-xforms/. Seems to be a presentation only.
We had a problem somewhat like this. One of our team thought that we ought to be able to create a web form UI on the fly to accept data conforming to an XSD. It turned out that this is very difficult ... given all the complexity of full XSD. So we ended up inventing our own schema language (which was both simpler and richer than XSD) and using this as the basis for generating our UI layouts. We also implemented a tool-chain for creating and validating the schemas and for generating equivalent XSDs and OWL schemas.