Using Jackson to parse XML with repeated element - java

I am looking for a simple way to parse an XML structure with a repeated element using Jackson. Here is a simplified example:
<root>
<a>
<x>
<b>some content</b>
<c>some content</c>
</x>
<x>
<b>some content</b>
<c>some content</c>
</x>
...
<x>
<b>some content</b>
<c>some content</c>
</x>
</a>
... some other content ...
</root>
I would like to collect all the x elements in a list or array The problem is that when using something like:
XmlMapper().readTree(xml).get("a").fields()
the result contains only the last instance of x so it looks like some map key gets overwritten. There is a solution using JsonParser
e.g.: XmlMapper().createParser(xml) but it's a bit icky.
Is there a better way?
Edit
The way I read the "Known Limitations" section in the README , XML seems to be a second class citizen in Jackson. I accept #galuszkak 's answer as it is what it is , but to my mind when a developer invokes XmlMapper().readTree(xml) they do not expect XML processing to be shoehorned into JSON processing model with the limitations that come with that approach.

The problem mentioned here is described in this Github issue:
https://github.com/FasterXML/jackson-dataformat-xml/issues/187
Basically what is happening is that Jackson is translating XML tree structure in JsonNode data model and this will not work as it's not supported.
There is 2 options described in that Github issue:
Fully transform this XML to JSON (answer from #cawena on Github)
Or if you know your data structure to just use answer from p0sitron which is:
Code:
List<List<JsonNode>> elA = xmlMapper.readValue(xml, new TypeReference<List<List<JsonNode>>>() { });
assertEquals(3, elA.get(0).size());

Related

Parse XML in which tag name is not fixed

It is easy to parse XML in which tags name are fixed. In XStream, we can simply use #XStreamAlias("tagname") annotation. But how to parse XML in which tag name is not fixed. Suppose I have following XML :
<result>
<result1>
<fixed1> ... </fixed1>
<fixed2> ... </fixed2>
</result1>
<result2>
<item>
<America>
<name> America </name>
<language> English </language>
</America>
</item>
<item>
<Spain>
<name> Spain </name>
<language> Spanish </language>
</Spain>
</item>
</result2>
</result>
Tag names America and Spain are not fixed and sometimes I may get other tag names like Germany, India, etc.
How to define pojo for tag result2 in such case? Is there a way to tell XStream to accept anything as alias name if tag name is not known before-hand?
if it is ok for you to get the tag from inside the tag itself (field 'name'), using Xpath, you can do:
//result2/*/name/text()
another option could be to use the whole element, like:
//result2/*
or also:
//result2/*/name()
Some technologies (specifically, data binding approaches) are optimized for handling XML whose structure is known at compile time. Others (like DOM and other DOM-like tree models - JDOM, XOM etc) are designed for handling XML whose structure is not known in advance. Use the tool for the job.
XSLT and XQuery try to blend both. In their schema-aware form, they can take advantage of static structure information when it is available. But more usually they are run in "untyped" mode, where there is no a-priori knowledge of element names or structure, and everything is handled as it comes. The XSLT rule-based processing paradigm is particularly well suited to "semi-structured" XML whose content is unpredictable or variable.

XML parsing /JAVA

I have an XML, for example
<root>
<config x="xxx" y="yyyy" z="zzz" />
<properties>blah blah blah </properties>
<example>
<name>...</name>
<decr>...</descr>
</example>
<example>
<name>...</name>
<decr>...</descr>
</example>
</root>
and I need to get nodes config, and properties and all values in it.
Thank you
Xpath can fetch you the data in the config tag. You need to create an expression first like this
expression="//root/config/#x", to get value of x,y,z.
For properties, follow this thread :
Parsing XML with XPath in Java
Hope this helps
DOM,DOM4J,SAX..
if the size of XML file is small,you can use DOM or DOM4J,but the size is big , you use the SAX
If you directly want to query or fetch data XPath can help, but if you want the data as Java Objects so that you can use it further then use JAXB
You can use SAX parser to read the xml manipulate its event based parsing and consumes more memory.
If your xml is big and requires lot of manipulations then go-for DOM/DOM4j either is good. DOM4L is very latest. DOM is widely used in industry.
Based on your requirement go for good parser.
Thanks,
Pavan

XML Parent & child Attributes reading in java?

I have the following data in my XML file.
<main>
<Team Name="Development" ID="10">
<Emp Source="Business" Total="130" Active="123" New="12" />
<Emp Source="Business" Total="131" Active="124" New="13" />
</Team>
<Team Name="Testing" ID="10">
<Emp Source="Business" Total="133" Active="125" New="14" />
</Team>
</main>
I want to read above data & store values into arrays,Can any one help on these?
Not sure why you need to convert those xml into Arrays, anyhow you can read xml and parse it by several ways. Normally we use DOM or Stax Parser and a Tutorial link is here, also here is a Java SAX Parsing Example tutorial.
Hope this can help you to achieve your goal. Update your question again if you stuck anywhere.
You can use parser in JAVA to parse the XML document. The package in java for this purpose is javax.xml.parsers . DocumentBuilder parses XML into a Document and Document is a tree structured data structure that is DOM(Document Object Model) readable file. Its nodes can be traversed/ changed/ accessed by DOM methods.
Here is a very good tutorial on XML DOM: http://www.roseindia.net/xml/dom/
and more specifically: http://www.roseindia.net/xml/dom/accessing-xml-file-java.shtml
also you can always refer to w3school for more theory on DOM!

Removing nodes with invalid tag names from a xml document

I transform xml with the Saxon XSLT2 processor (using Java + the Saxon S9API) and have to deal with xml-documents as the source, which contain invalid characters as tag names and therefore can't be parsed by the document-builder.
Example:
<A>
<B />
<C>
<D />
</C>
<E!_RANDOM_ />
< />
</A>
Code:
import net.sf.saxon.s9api.*;
[...]
/* XSLT Processor & Compiler */
proc = new Processor(false);
/* build document from input*/
XdmNode source = proc.newDocumentBuilder().build(new StreamSource(input));
Error:
Error on line X column Y
SXXP0003: Error reported by XML parser: Element type
"E" must be followed by either attribute specifications, ">" or "/>".
The exclamation mark and the tag name just being space are currently my only invalid tags.
I am searching for a more robust solution rather than just removing whole lines of the (formated) xml.
With some mind-bending I could come up with a regular expression to identify the invalid strings, but would struggle with the removal of the nodes containing attributes and child-nodes.
Thank you for your help!
If the input contains invalid tags then it is not XML. It's best to get your mind-set right by referring to these as non-XML documents rather than XML documents; that helps to make it clear that to process non-XML documents, you need non-XML tools. (Forget about "nodes" - there are no nodes until the document has been parsed, and it can't be parsed until you have turned it into well-formed XML). To turn non-XML into XML, you will typically want to use non-XML tools that are good at text manipulation, such as Perl. Of course, it's much better to fix the problem at source: all the benefits of XML are lost if people generate data in private non-XML formats.

The markup must be well-formed

First off, let me say I am a new to SAX and Java.
I am trying to read information from an XML file that is not well formed.
When I try to use the SAX or DOM Parser I get the following error in response:
The markup in the document following the root element must be well-formed.
This is how I set up my XML file:
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
...
Can I force the SAX or DOM to parse XML files even if they are not well formed XML?
Thank you for your help. Much appreciated.
Haythem
Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can achieve that simply by putting an XML declaration on (and even that's optional) and providing a root element (which is not optional), like this:
<?xml version="1.0"?>
<wrapper>
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
</wrapper>
There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
Hint: using sax or stax you can successfully parse a not well formed xml document until the FIRST "well formed-ness" error is encountered.
(I know that this is not of too much help...)
As the DOM will scan you xml file then build a tree, the root node of the tree is like the as 1 Answer. However, if the Parser can't find the or even , it can even build the tree. So, its better to do some pre-processing the xml file before parser it by DOM or Sax.

Categories

Resources