Java Sax tree (duplicate attributes)

I'm trying to write a program that builds a tree from an XML document using a SAX parser.
The tag names in the resulting tree are correct, but every tag shows only the attributes of the last tag.
What is wrong with the code?
The tree is printed by tag.toString().

Try changing line 48 to:
Tag t = new Tag(eName, new org.xml.sax.helpers.AttributesImpl(attrs));

I guess the problem is that you store the Attributes instance itself in each Tag, and the parser reuses that same Attributes instance for every call to startElement(). Every Tag therefore refers to the same object, which holds whatever the parser wrote into it last: the attributes of the last element. You have to copy the attributes for each Tag (into an AttributesImpl, a Map, or something similar).
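Since the question's code is not shown, here is a minimal sketch of the fix. The Tag class and the parse helper are hypothetical stand-ins for the asker's code; only the AttributesImpl copy constructor is the actual suggestion from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class CopyAttributesDemo {
    // Hypothetical stand-in for the question's Tag class.
    static class Tag {
        final String name;
        final Attributes attrs;

        Tag(String name, Attributes attrs) {
            this.name = name;
            this.attrs = attrs;
        }
    }

    // Parses the XML and returns one Tag per start tag, in document order.
    static List<Tag> parse(String xml) throws Exception {
        List<Tag> tags = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Snapshot the attributes: the parser may reuse the object it passed in.
                tags.add(new Tag(qName, new AttributesImpl(attrs)));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return tags;
    }

    public static void main(String[] args) throws Exception {
        for (Tag t : parse("<a id=\"1\"><b id=\"2\"/><c id=\"3\"/></a>")) {
            System.out.println(t.name + " id=" + t.attrs.getValue("id"));
        }
    }
}
```

Storing attrs directly instead of the copy is what produces the symptom described in the question.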

Related

How to read XML attribute values using JXPathContext if reference is missing

Given following XML, we are using JXPathContext to create Java object out of it.
<fb1:Activity fb2:metadata="Activity1">
</fb1:Activity>
<fb21:ActivityMetadata fb2:id="Activity1">
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>
We read the value like this:
String responseCode = context.getValue("metadata[1]/Response/value");
This works as expected. Now let's say the reference from Activity to ActivityMetadata is missing. What can we do to read the response value in that case? It is guaranteed that the XML contains at most one ActivityMetadata element.
Incomplete XML that we need to parse:
<fb1:Activity fb2:metadata="">
</fb1:Activity>
<fb21:ActivityMetadata>
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>

The path you're giving us doesn't match the document you're showing us.
Ignoring that for a moment:
XML doesn't constrain the tree at all; that's done by the XML Schema (if there is one) and/or the applications which process that kind of document. Only the folks who defined this particular kind of document, or the schema, or the code can tell you whether there are any guarantees about only one ActivityMetadata being present or what it means if there's more than one.
XML is pure syntax. Meaning is someone else's problem.

Parsing XML for deeply nested data

I have an XML file that is structured something like this:
<element1>
<element2>
<element3>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
</element3>
</element2>
</element1>
As you can see, I am interested in two elements, the first of which is deeply nested within the root element, and the second of which is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.
I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.
I'm able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I get to an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine except that such an object should also contain multiple otherElementIAmInterestedIn.
This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlElementWrapper annotation, but it seems to work for only one layer of nesting. Also, I cannot use @XmlPath.
Maybe I should scratch that idea and use a whole new approach. I'm really just getting started with XML parsing so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?

Maybe you should use a SAX parser instead of DOM. With DOM you load the whole document into memory, and in your case you only want to read two kinds of elements. That is quite inefficient.
With a SAX parser you can process only the nodes you are interested in. Here is pseudocode for your task using the SAX parsing model:
1) Keep reading nodes until you reach an <elementIAmInterestedIn> node
2) Store its data in your object
3) Keep reading until you reach an <otherElementIAmInterestedIn> node
4) Store that data too and save the object.
Repeat steps 1 to 4 until the parser reaches the end of the document.
If you try this approach, I suggest you first read this document to understand how a SAX parser works, because it is very different from the DOM approach: How to Use SAX
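The steps above could be sketched roughly as follows. The Interesting holder class and the parse helper are assumptions made for illustration, and the sketch assumes the data children (data1, data2, ...) contain plain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InterestingElementsDemo {
    // Hypothetical holder for one elementIAmInterestedIn and its nested data.
    static class Interesting {
        String attribute;
        final List<Map<String, String>> others = new ArrayList<>();
    }

    static List<Interesting> parse(String xml) throws Exception {
        List<Interesting> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            Interesting current;          // non-null while inside elementIAmInterestedIn
            Map<String, String> other;    // non-null while inside otherElementIAmInterestedIn
            final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0);
                if (qName.equals("elementIAmInterestedIn")) {
                    current = new Interesting();
                    current.attribute = attrs.getValue("attribute");
                } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
                    other = new LinkedHashMap<>();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so accumulate.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("otherElementIAmInterestedIn") && other != null) {
                    current.others.add(other);
                    other = null;
                } else if (qName.equals("elementIAmInterestedIn")) {
                    result.add(current);
                    current = null;
                } else if (other != null) {
                    // A data child such as data1: store its text under its tag name.
                    other.put(qName, text.toString().trim());
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<element1><elementIAmInterestedIn attribute=\"data\"><element4>"
                + "<otherElementIAmInterestedIn><data1>text1</data1><data2>text2</data2>"
                + "</otherElementIAmInterestedIn></element4></elementIAmInterestedIn></element1>";
        for (Interesting i : parse(xml)) {
            System.out.println(i.attribute + " " + i.others);
        }
    }
}
```

The intermediate elements (element4, element5, ...) are simply skipped, so the depth of nesting does not matter to the handler.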

Define the order of attributes in dom

I'm currently working with DOM and I wonder how I can change the order of a tag's attributes.
For example, I have created an element:
propElement = document.createElement("prop");
This creates the prop tag.
Then:
propElement.setAttribute("name", "name1");
propElement.setAttribute("name2", "name2");
The problem is that although I call the setter for name2 after the one for name, name2 appears before name in the resulting tag.
How can I change the order?
(Note: I'm using a Java DOM API, not JavaScript.)

You can't, the order of attributes on elements is not significant. In fact, in a live DOM, there is no order. Order only seems to exist in relation to the serialized form of a DOM (e.g., HTML markup and the like). And even then, the order doesn't have any meaning except in relation to invalid text (more below).
Attributes are basically simple properties of an object (the DOM element to which they're attached). There is absolutely no order to them, and in fact the representation of them in the DOM is a NamedNodeMap which is "...not maintained in any particular order."
It's important to remember that the DOM describes an object model. The serialized form of a DOM may be textual (for instance, an HTML document defining a DOM), but the DOM is not. In an HTML document, since it's linear text (top-to-bottom, left-to-right), naturally the text defining one attribute has to precede the text describing another, but that does not imply any kind of order to the attributes in the resulting DOM object, because they have no order at all. So this:
<div a="1" b="2">...</div>
describes exactly the same element as this:
<div b="2" a="1">...</div>
The resulting element is a div which has an attribute a with the value 1 and an attribute b with the value 2.
This is exactly the same as setting properties on an object in program source. Consider some hypothetical obj with a and b properties. This code:
obj.a = 1;
obj.b = 2;
...results in exactly the same object as this code:
obj.b = 2;
obj.a = 1;
...provided a and b really are simple fields (not hidden function calls that may have side effects), which is true of attributes in the DOM.
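The same point can be checked against the Java DOM API from the question. This is a small sketch, with the prop helper and the method names invented for illustration: two elements whose attributes were set in opposite orders are compared with Node.isEqualNode, which compares attributes by name rather than by position.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeOrderDemo {
    // Hypothetical helper: creates a <prop> element and sets the attributes in the given order.
    static Element prop(Document doc, String[][] attrs) {
        Element e = doc.createElement("prop");
        for (String[] kv : attrs) {
            e.setAttribute(kv[0], kv[1]);
        }
        return e;
    }

    // Builds the same element twice with the attributes set in opposite orders
    // and reports whether the DOM considers the two elements equal.
    static boolean sameRegardlessOfOrder() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element first = prop(doc, new String[][] {{"name", "name1"}, {"name2", "name2"}});
        Element second = prop(doc, new String[][] {{"name2", "name2"}, {"name", "name1"}});
        return first.isEqualNode(second);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("equal: " + sameRegardlessOfOrder());
    }
}
```

Code that reads attributes by name with getAttribute is unaffected by whatever order a serializer happens to write them in.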
There is one small way in which attribute order in the textual (serialized) form of a DOM may be significant, and it's only related to invalid text: If the same attribute is specified more than once, only the first value given is used, because it's invalid to specify the same attribute more than once. The values are not combined, and the subsequent value doesn't overwrite the previous one. The first one, only, is used.
So this invalid HTML:
<div class="foo" class="bar">...</div>
...actually results in a div with class "foo" ("bar" is not present at all). But this is just a coping mechanism for dealing with invalid serialized forms.

Apache Digester How do I get some xml nested within a tag as a literal string?

I am parsing an XML document with Digester. Part of it contains content formatted in cryptic pseudo-HTML XML elements, which I need to transform into a PDF. That will be done via Apache FOP, so I need to access the XML element that contains the content elements directly and pipe it to FOP. For this, the Digester FAQ says to either
Wrap the nested xml in CDATA
or
If this can't be done then you need to use a NodeCreateRule to create a DOM node representing the body tag and its children, then serialise that DOM node back to text
Since it is third-party XML, the CDATA approach could only be done via (another) XSLT step, which I hesitate to add.
It looks like this issue should be solvable via NodeCreateRule, but I cannot figure out how to get it done.
The documentation states that NodeCreateRule will push a Node onto the stack; however, I can only get it to pass null.
I tried
digester.addRule(docPath + "/contents", new NodeCreateRule());
digester.addCallMethod(docPath + "/contents", "setContentsXML");
setContentsXML expects an Element parameter.
I also tried this and this without any luck.
I am using the latest stable Digester. I would be thankful for any advice.
Update:
I found the bug. The result on my system is null, too. I am using JDK 6u24.

The problem, in my case as well as in the linked bug, lies in the proper serialisation of an Element. In my case the null value I mentioned was not returned by Digester but by Element#toString(). I assume something has changed since JDK 1.4.
In the bug's example:
result contains another (text) node with the actual content. toString(), however, only reports the content of the Element instance it is called upon.
The Element tree has to be serialized explicitly, for example with the serialization method in this usage example of NodeCreateRule.
In case someone else tries to use that with Digester 3: you have to change the method signature SetSerializedNodeRule#end() to SetSerializedNodeRule#end(String, String).
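The linked usage example is not reproduced here, so as an alternative sketch, an Element can be serialized with the JDK's own javax.xml.transform API. The serialize helper and the sample document are assumptions for illustration, not the method from the linked example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SerializeElementDemo {
    // Serializes the element and all of its children back to XML text
    // using an identity transformation.
    static String serialize(Element element) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(element), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document made up for this sketch.
        String xml = "<doc><contents><p>Hello <b>world</b></p></contents></doc>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element contents = (Element) doc.getElementsByTagName("contents").item(0);
        System.out.println(serialize(contents));
    }
}
```

In a Digester setup, the Element pushed by NodeCreateRule would be passed to a method like serialize instead of being converted with toString().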

how to parse xml using sax by checking both parent and child tags at a time?

Hi,
I want to parse an XML document using a SAX parser. I need to check both the outer and the inner tags, because the same tag appears inside several different outer tags. I want to read the data based on the outer tag, i.e. only from tag or tag1 in our case. Can I check both the parent and the child tags using SAX in Java?
Thanks.

Write a ContentHandler that uses a Stack to hold data about the nested elements. At the beginning of startElement() push onto the stack, and at the end of endElement() pop from it. The stack can hold just the element names, or instances of your own Element class if you need more data.
You may also write a general-purpose abstract ContentHandler that holds the stack and gives its subclasses methods for getting all kinds of information about the current element and its path.
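A minimal sketch of the stack idea follows. The parentsOf helper is hypothetical, and as a small variation on the description above it looks at the top of the stack before pushing the new element, so that the top is still the parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParentAwareHandlerDemo {
    // Returns the parent tag name for every occurrence of the given child tag.
    static List<String> parentsOf(String xml, String child) throws Exception {
        List<String> parents = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Names of the elements that are currently open, innermost on top.
            final Deque<String> open = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(child)) {
                    // Before the push, the top of the stack is the parent element.
                    parents.add(open.isEmpty() ? "(none)" : open.peek());
                }
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return parents;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parentsOf("<r><x><y/></x><z><y/></z></r>", "y"));
    }
}
```

Because the stack holds the whole path from the root, the same handler can also answer questions about grandparents or the full element path.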

One way to do it, if the outer tags are different: when you reach an outer tag, set a variable, and when you reach the inner tag, check that variable to find out which parent tag you are in.
For example:
<x>
<y/>
</x>
<z>
<y/>
</z>
In the handler's startElement(), check whether the name is x and set the variable to x (otherwise to z). When you reach y, check what the variable is set to (x or z).
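As a rough sketch of this variable-based approach, using the x, y and z tags from the example: the countYUnderX helper is made up for illustration, and the sketch assumes that x and z elements are never nested inside each other.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParentFlagDemo {
    // Counts the y elements that occur inside an x element (and not inside z).
    static int countYUnderX(String xml) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            String outer; // name of the enclosing x or z element, or null

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("x") || qName.equals("z")) {
                    outer = qName;
                } else if (qName.equals("y") && "x".equals(outer)) {
                    count[0]++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("x") || qName.equals("z")) {
                    outer = null; // left the outer element
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countYUnderX("<r><x><y/></x><z><y/></z></r>"));
    }
}
```

A single variable is enough only when the outer elements do not nest; for arbitrary nesting, a stack of open elements is the more general tool.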
