CSV to XML using JAXB? - java

Background
I have a situation where I can get data either in the form of an XML file or Excel/CSV files. If the data comes in a non-XML format it will be divided into several different files/tables, representing different subsections of the XML. The end goal is to validate the data and generate a valid XML file using an existing schema, regardless of the format of the input data.
When receiving an XML file the idea is to unmarshal and validate it. Automatic fixes will be applied for simple errors, and in the end a new XML file will be marshalled from the JAXB classes.
Question
To generalize as much of the solution as possible, my idea was to generate a JAXB representation of the non-XML data too, and then produce the final XML file from those classes. I have been trying to find a good tutorial or introduction to converting non-XML data into a JAXB representation, but I haven't really been able to find anything useful, which makes me wonder: is this a really bad approach? Any better suggestions for how to solve this problem? In the majority of cases the files are likely to be non-XML, so I am willing to throw out the current approach if anyone has a better solution that uses some other technology.

I've worked with univocity-parsers before. It works well and is simple to use for converting CSV into Java objects, which you can then serialize with JAXB.
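A minimal sketch of that pipeline, assuming univocity-parsers 2.x and the javax.xml.bind JAXB API; the Person bean, its fields and the inline CSV are made up for illustration:

    import java.io.StringReader;
    import java.util.List;

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    import com.univocity.parsers.annotations.Parsed;
    import com.univocity.parsers.common.processor.BeanListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public class CsvToXmlSketch {

        // Hypothetical record type; @Parsed maps fields to CSV headers by name.
        @XmlRootElement(name = "person")
        public static class Person {
            @Parsed public String name;
            @Parsed public int age;
        }

        public static void main(String[] args) throws Exception {
            String csv = "name,age\nAlice,30\nBob,25\n";

            // Parse the CSV rows directly into Person beans.
            BeanListProcessor<Person> processor = new BeanListProcessor<>(Person.class);
            CsvParserSettings settings = new CsvParserSettings();
            settings.setHeaderExtractionEnabled(true);
            settings.setProcessor(processor);
            new CsvParser(settings).parse(new StringReader(csv));
            List<Person> people = processor.getBeans();

            // Marshal each bean to XML with JAXB.
            Marshaller marshaller = JAXBContext.newInstance(Person.class).createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            for (Person p : people) {
                marshaller.marshal(p, System.out);
            }
        }
    }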

Related

How to convert Badgerfish style JSON to XML in Java?

My Java application takes XML input and parses it using the simpleframework. I want it to accept JSON as well, so I want to convert the JSON to XML.
Tags and attributes are important, therefore I use the Badgerfish convention.
This works well in Python with xmljson, but I can't find a decent Java package to do this. GSON doesn't seem to have a Badgerfish implementation. This topic doesn't provide any tag/attribute-retaining packages, and it is a bit old as well.
Which Java packages can do the conversion from JSON to XML while putting tags/attributes at the right place?
Suggestions for alternative methods than Badgerfish are welcome as well...
Many thanks in advance!
You can use the XPath 3.1 json-to-xml() function, and then do an XSLT transformation on the generated XML to get it into the format you require.
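If you are calling this from Java, here is a small sketch via Saxon's s9api, assuming a Saxon version with XPath 3.1 support (e.g. Saxon-HE 9.8 or later); the JSON string is just sample data. json-to-xml() yields XML in the http://www.w3.org/2005/xpath-functions namespace, which the follow-up XSLT would then reshape into your target format:

    import net.sf.saxon.s9api.Processor;
    import net.sf.saxon.s9api.XPathCompiler;
    import net.sf.saxon.s9api.XdmValue;

    public class JsonToXmlSketch {
        public static void main(String[] args) throws Exception {
            Processor processor = new Processor(false); // false = Saxon-HE
            XPathCompiler xpath = processor.newXPathCompiler();

            // Sample JSON; double quotes are fine inside an XPath string literal
            // delimited by single quotes.
            String json = "{\"name\": \"Alice\", \"age\": 30}";

            XdmValue result = xpath.evaluate("json-to-xml('" + json + "')", null);
            System.out.println(result);
        }
    }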

How to generate an XML file in Java?

I have to generate huge and quite complex XML files in Java, fetching the data from an Oracle database. What I really don't know is a proper and reliable way to do this. I could of course create a String and concatenate all the tags, attributes and data, but it doesn't feel right. I guess this is a quite common task and there are many established ways to do it in Java. My question is: what is the best way to do this? What is your suggestion?
Thank you for any clues...
You could use JAXB for building XML out of structured objects that are a result of querying your data store.
If your object hierarchy is not complex, you can use Oracle's capability to generate results in XML.
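A rough sketch of that database route with plain JDBC and Oracle's SQL/XML functions; the connection URL, the credentials and the emp demo table are purely illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class OracleXmlSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//localhost:1521/ORCL", "scott", "tiger");
                 Statement stmt = conn.createStatement();
                 // XMLELEMENT/XMLATTRIBUTES build the XML inside the database;
                 // XMLSERIALIZE returns it as text the JDBC client can read directly.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT XMLSERIALIZE(CONTENT"
                       + "         XMLELEMENT(NAME \"employee\","
                       + "           XMLATTRIBUTES(e.empno AS \"id\"), e.ename)"
                       + "       AS CLOB)"
                       + " FROM emp e")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }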
There are several options for object-to-XML transformation:
JAXB
SAX parser
DOM parser
I would personally suggest JAXB for ease of use and a SAX parser for performance-centric applications.
You can use JAXP (the Java API for XML Processing) to create an XML structure. It has all the features you want.
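For the DOM route, a small JAXP sketch that builds a document in memory and serializes it; the employees/employee element names are invented for the example:

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class JaxpDomSketch {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // Root element with one child per (hypothetical) database row.
            Element root = doc.createElement("employees");
            doc.appendChild(root);

            Element emp = doc.createElement("employee");
            emp.setAttribute("id", "42");
            emp.setTextContent("Alice");
            root.appendChild(emp);

            // Serialize the DOM tree; an identity Transformer does the writing.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.INDENT, "yes");
            out.transform(new DOMSource(doc), new StreamResult(System.out));
        }
    }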

Validating XML in Java as the document is built

I am working on converting an Excel spreadsheet into an XML document that needs to be validated against a schema. I am currently building the XML document using the DOM API, and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the XML produced from each cell as I parse the Excel document, so I can indicate which cells are problematic in a friendlier way.
The problem I am currently encountering is that after validating the XML for the simple types, once they are built into a complex type all the child nodes get validated again, producing redundant errors.
I found this question here at SO but it is using C# and the Microsoft API.
Thoughts? Thanks!
Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?
Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.
Are you looking to validate your program's output? If yes, then write unit tests.
You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.
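A sketch of that approach with the standard javax.xml.validation and javax.xml.transform.sax APIs: a ValidatorHandler checks each SAX event against the schema and forwards it to a TransformerHandler that assembles the DOM, so a violation surfaces as a SAXException at the offending event. The sheet.xsd file and the row/cell element names are placeholders for whatever your schema actually declares:

    import javax.xml.XMLConstants;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMResult;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.ValidatorHandler;

    import org.w3c.dom.Document;
    import org.xml.sax.helpers.AttributesImpl;

    public class ValidateWhileBuilding {
        public static void main(String[] args) throws Exception {
            // A ContentHandler that validates events against the (placeholder) schema.
            ValidatorHandler validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new java.io.File("sheet.xsd"))
                    .newValidatorHandler();

            // Forward the validated events to a handler that builds the DOM tree.
            DOMResult domResult = new DOMResult();
            TransformerHandler domBuilder = ((SAXTransformerFactory)
                    TransformerFactory.newInstance()).newTransformerHandler();
            domBuilder.setResult(domResult);
            validator.setContentHandler(domBuilder);

            // Fire events as cells are read from the spreadsheet (illustrative only).
            AttributesImpl noAttrs = new AttributesImpl();
            validator.startDocument();
            validator.startElement("", "row", "row", noAttrs);
            validator.startElement("", "cell", "cell", noAttrs);
            char[] value = "42".toCharArray();
            validator.characters(value, 0, value.length);
            validator.endElement("", "cell", "cell");
            validator.endElement("", "row", "row");
            validator.endDocument();

            Document doc = (Document) domResult.getNode();
            System.out.println(doc.getDocumentElement().getNodeName());
        }
    }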
So the solution that I decided to go with, and am almost finished implementing, was to use XSOM to parse the XSD. Then, when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (since the column headers map to simple types in the XSD) and did manual validation against those restrictions. I am still building the tree so that at the end I can validate the entire XML tree against the XSD, since there are some things that I can't catch at the cell level.
Thanks for all of your input.
Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.

Error-tolerant XML parsing in Scala

I would like to be able to parse XML that isn't necessarily well-formed. I'd be looking for a fuzzy rather than a strict parser, able to recover from badly nested tags, for example. I could write my own but it's worth asking here first.
Update:
What I'm trying to do is extract links and other info from HTML. In the case of well-formed XML I can use the Scala XML API. In the case of ill-formed XML, it would be nice to somehow convert it into correct XML and deal with it the same way, otherwise I'd have to have two completely different sets of functions for dealing with documents.
Obviously, because the input is not well-formed and I'm trying to create a well-formed tree, some heuristics have to be involved (such as: when you see <parent><child></parent> you close the <child> first, and when you then see a stray </child> you ignore it). But of course this isn't a proper grammar, so there's no single correct way of doing it.
What you're looking for would not be an XML parser. XML is very strict about nesting, closing, etc. One of the other answers suggests TagSoup. That is a good suggestion, though technically it is much closer to a lexer than a parser. If all you want from XML-ish content is an event stream without any validation, then it's almost trivial to roll your own solution: just loop through the input, consuming content which matches regular expressions along the way (this is essentially what TagSoup does).
The problem is that a lexer is not going to be able to give you many of the features you want from a parser (e.g. production of a tree-based representation of the input). You have to implement that logic yourself, because there is no way such a "lenient" parser could determine how to handle cases like the following:
<parent>
<child>
</parent>
</child>
Think about it: what sort of tree would you expect to get out of this? There's really no sane answer to that question, which is precisely why a parser isn't going to be of much help.
Now, that's not to say that you couldn't use Tag Soup (or your own hand-written lexer) to produce some sort of tree structure based on this input, but the implementation would be very fragile. With tree-oriented formats like XML, you really have no choice but to be strict, otherwise it becomes nearly impossible to get a reasonable result (this is part of why browsers have such a hard time with compatibility).
Try the parser on the XHtml object. It is much more lenient than the one on XML.
Take a look at htmlcleaner. I have used it successfully to convert "HTML from the wild" to valid XML.
Try Tag Soup.
JTidy does something similar but only for HTML.
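For reference, TagSoup plugs in as an ordinary SAX XMLReader, so an identity transform is enough to turn messy HTML into well-formed markup; the input string below is just an example:

    import java.io.StringReader;

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;

    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;

    public class TagSoupSketch {
        public static void main(String[] args) throws Exception {
            String messyHtml = "<p>unclosed <b>tags<p>and <i>bad nesting</b></i>";

            // TagSoup's Parser is a SAX XMLReader that always emits balanced events.
            XMLReader tagSoup = new org.ccil.cowan.tagsoup.Parser();

            // An identity transform serializes that event stream as well-formed XHTML.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(
                    new SAXSource(tagSoup, new InputSource(new StringReader(messyHtml))),
                    new StreamResult(System.out));
        }
    }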
I mostly agree with Daniel Spiewak's answer. This is just another way to create "your own parser".
While I don't know of any Scala-specific solution, you can try using Woodstox, a Java library that implements the StAX API. (Being an event-based API, I am assuming it will be more fault tolerant than a DOM parser.)
There is also a Scala wrapper around Woodstox called Frostbridge, developed by the same guy who made the Simple Build Tool for Scala.
I had mixed opinions about Frostbridge when I tried it, but perhaps it is more suitable for your purposes.
I agree with the answers that turning invalid XML into "correct" XML is impossible.
Why don't you just do a regular text search for the hrefs if that's all you're interested in? One issue would be commented out links, but if the XML is invalid, it might not be possible to tell what is intended to be commented out!
Caucho has a JAXP compliant XML parser that is a little bit more tolerant than what you would usually expect. (Including support for dealing with escaped character entity references, AFAIK.)
Find JavaDoc for the parsers here
A related topic (with my solution) is listed below:
Scala and html parsing

Web form generation out of an XML schema

I have a requirement where I need to generate HTML forms on the fly based on many different XML schemas (as of now I have 20 of them and the count keeps increasing). I need to collect data from the user to create instance docs corresponding to each of them and then store the instance docs in a DB....
challenges
1) The schemas have a lot of unbounded complex types, so we don't know in advance the number and type of input controls to be created; pre-creating the HTML etc. is not an option.
2) Even if I can handle generating the form on the fly, the problem is collecting the data entered, as dynamically generated forms should/will have dynamic ids/names for their input controls.
Can anyone suggest the best way to implement this?
thank you in advance
It seems to me like a clear case for XSLT.
Generating HTML from XML is the primary purpose of XSLT.
As for the ids/names, you can write the XSLT so that it also generates a set of ids/names in a way that you can use.
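Mechanically, applying such a stylesheet from Java is straightforward with javax.xml.transform; the schema-to-form.xsl stylesheet and the file names here are hypothetical:

    import java.io.File;

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class SchemaToFormSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical stylesheet that walks an XSD and emits <form>/<input> markup.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("schema-to-form.xsl")));

            // Apply it to one schema to produce the HTML form for that document type.
            transformer.transform(
                    new StreamSource(new File("order.xsd")),
                    new StreamResult(new File("order-form.html")));
        }
    }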
Use WSDL2XForms to create XForms from XML Schemas (XSD). Then publish them with Chiba (chiba.sourceforge.net) - it converts these XForms to standard HTML forms on the server side.
The Google Code project xsd-forms seems to be a promising approach.
An XQuery-based translator from XSD to XForms is available at http://en.wikibooks.org/wiki/XRX/XForms_Generator.
I don't know much about that one: http://nunojob.wordpress.com/2008/01/05/creating-a-user-interface-for-xml-schema-using-xforms/. Seems to be a presentation only.
We had a problem somewhat like this. One of our team thought that we ought to be able to create a web form UI on the fly to accept data conforming to an XSD. It turned out that this is very difficult ... given all the complexity of full XSD. So we ended up inventing our own schema language (which was both simpler and richer than XSD) and using this as the basis for generating our UI layouts. We also implemented a tool-chain for creating and validating the schemas and for generating equivalent XSDs and OWL schemas.
