XML Document traversal in Java

Which is the preferable XML document traversal method in Java: getElementsByTagName or TreeWalker?
I have a TreeModel whose root is a DOM Node. Two threads add nodes to it, and one thread adds nodes based on the nodes added by the other.
For example, one thread adds nodes named App, and the other thread adds nodes according to the name attribute of those App nodes. Sometimes the nodes are not added correctly. The TreeModel only shows the details of the elements by traversing the nodes.
Note: App nodes are added according to their Name attribute.
Currently, the second thread gets the nodes by calling getElementsByTagName. Is there any advantage to switching to a TreeWalker?
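To make the comparison concrete, here is a minimal sketch that collects the name attribute of every App element both ways. It assumes the JDK's built-in DOM implementation, which supports DOM Level 2 Traversal (the code checks for DocumentTraversal before casting); the Root and Group element names are made up for the example. getElementsByTagName returns a live NodeList in document order, while a TreeWalker lets a NodeFilter decide which nodes are visible and can also move backwards or up the tree. Neither one makes it safe for two threads to modify the same Document, because the DOM API gives no thread-safety guarantees, so intermittent problems like the ones described may call for synchronization rather than a different traversal API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class TraversalDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Option 1: getElementsByTagName returns a live NodeList of all
    // matching descendants, in document order.
    static List<String> namesByTagName(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList apps = doc.getElementsByTagName("App");
        for (int i = 0; i < apps.getLength(); i++) {
            names.add(((Element) apps.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Option 2: a TreeWalker whose NodeFilter accepts only <App> elements.
    static List<String> namesByTreeWalker(Document doc) {
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM implementation lacks Traversal support");
        }
        NodeFilter onlyApps = node -> "App".equals(node.getNodeName())
                ? NodeFilter.FILTER_ACCEPT
                : NodeFilter.FILTER_SKIP; // SKIP still descends into the node's children
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc, NodeFilter.SHOW_ELEMENT, onlyApps, true);
        List<String> names = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            names.add(((Element) n).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Root><App name=\"a\"/><Group><App name=\"b\"/></Group></Root>");
        System.out.println(namesByTagName(doc));    // [a, b]
        System.out.println(namesByTreeWalker(doc)); // [a, b]
    }
}
```

For a flat "find all elements with this name" query the two give the same result; the TreeWalker pays off when the filtering logic is more involved or when you need to skip whole subtrees (return FILTER_REJECT instead of FILTER_SKIP).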

I like XPath. W3Schools link here, Javadocs here. Getting started with the factories and builders is tedious; IMO, write your own utility class to save on that tedium. But the syntax for traversing is expressive and powerful, and it is a "standard" with good documentation.
If you are brave, check out my beta Groovy-like xpath-like project, but I would not propose this as "the most preferable". :-)
ADDED: XPath is a query language for selecting nodes from an XML document. It is good for traversing (moving around in) a DOM structure. However, OP's updated requirements are for manipulating / modifying the DOM structure. XPath is not a good fit there.
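A minimal sketch of the kind of utility class suggested above, using only the JDK's javax.xml.xpath API. The class and method names (XPathUtil, nodes, string) are hypothetical. A new XPath object is created per call because XPath objects are documented as not thread-safe; caching compiled expressions would be faster.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathUtil {
    private XPathUtil() {}

    // Hides the DocumentBuilderFactory/DocumentBuilder boilerplate.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Fresh XPath per call: simple and thread-confined, not the fastest option.
    private static XPath xpath() {
        return XPathFactory.newInstance().newXPath();
    }

    // All nodes matched by the expression, as a java.util.List.
    static List<Node> nodes(Node context, String expression) throws XPathExpressionException {
        NodeList list = (NodeList) xpath().evaluate(expression, context, XPathConstants.NODESET);
        List<Node> result = new ArrayList<>(list.getLength());
        for (int i = 0; i < list.getLength(); i++) {
            result.add(list.item(i));
        }
        return result;
    }

    // String value of the expression (empty string if nothing matches).
    static String string(Node context, String expression) throws XPathExpressionException {
        return xpath().evaluate(expression, context);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Apps><App name=\"a\"/><App name=\"b\"/></Apps>");
        System.out.println(nodes(doc, "//App").size());        // 2
        System.out.println(string(doc, "/Apps/App[2]/@name")); // b
    }
}
```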

Related

Dom4j vs JAXB for reading and updating large and complex XML files

I have an XML file with a stable tree structure and more than 5000 elements.
A fraction of it is below:
<Companies>
  <Offices>
    <RevenueInfo>
      <TransactionId>14042015014606877</TransactionId>
      <Company>
        <Identification>
          <GlobalId>25142400905</GlobalId>
          <BranchId>373287734</BranchId>
          <GeoId>874</GeoId>
          <LastUpdated>2015-04-14T01:46:06.940</LastUpdated>
          <RecordType>7785</RecordType>
        </Identification>
        <Info>
          <DataEntry>
            <EntryId>12345</EntryId>
          </DataEntry>
          <DataEntry>
            <EntryId>34567</EntryId>
          </DataEntry>
          <DataEntry>
            <EntryId>89076</EntryId>
          </DataEntry>
          <DataEntry>
            <EntryId>13211</EntryId>
          </DataEntry>
        </Info>
        ...more elements
      </Company>
    </RevenueInfo>
  </Offices>
</Companies>
I need to be able to update any of the values in the document based on user input and create a new XML file with the updated information. The user will pass the BranchId, the name of the element to update, and its position if the element occurs multiple times (for example, for EntryId 12345 the user will pass 373287734 EntryId=1 010101).
I've been looking at JAXB. Creating the model classes for this kind of XML seems like a considerable effort, but it also seems like it would make locating the element to update and printing to a file a lot easier.
Dom4j seems to have good performance results too, but I'm not sure what parsing would look like.
My question is, is JAXB the best approach in this case or can you suggest a better way to parse this type of XML?
In my experience JAXB only works well when the schema is simple and stable. In other cases you are better off using a generic tree model. The main generic models in the Java world are DOM, JDOM2, DOM4J, XOM, AXIOM. My own preferences are JDOM2 and XOM; DOM4J seems to me overcomplex, and somewhat old-fashioned. But it depends what you are looking for.
But then, the application you describe looks an ideal candidate for an "XML end-to-end" or XRX approach - XForms, XSLT, XQuery, XProc. You don't need Java at all.
Leaving performance and memory requirements aside, I would recommend trying XPath together with DOM4J (or JDOM, or even plain DOM). To select the company you could use an XPath expression like this:
"//Company[Identification/BranchId = '373287734']"
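Then, using the returned company element as context, you can get the element to be updated with a relative XPath expression:
"(.//EntryId)[1]"
The leading . keeps the search inside the selected company (an expression starting with // searches the whole document again), and the parentheses apply the position to the combined result rather than to each parent's children.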

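For the plain-DOM option, a minimal sketch with the JDK's javax.xml.xpath and javax.xml.transform APIs might look like the following. The class and method names are hypothetical, and it assumes the element name and 1-based position arrive as in the question's example (373287734, EntryId, 1, 010101). The BranchId is bound as an XPath variable instead of being concatenated into the expression; the element name is still spliced in, so it is assumed to be trusted input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntryUpdater {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Sets the text of the position-th (1-based) <elementName> inside the Company
    // whose Identification/BranchId equals branchId; returns false if nothing matched.
    static boolean update(Document doc, String branchId, String elementName,
                          int position, String newValue) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // $branch resolves to the user-supplied BranchId.
        xpath.setXPathVariableResolver(
                name -> "branch".equals(name.getLocalPart()) ? branchId : null);
        Node company = (Node) xpath.evaluate(
                "//Company[Identification/BranchId = $branch]", doc, XPathConstants.NODE);
        if (company == null) {
            return false;
        }
        // "." keeps the search inside this Company; the parentheses make
        // [position] count across all matches, not per parent element.
        Node target = (Node) xpath.evaluate(
                "(.//" + elementName + ")[" + position + "]", company, XPathConstants.NODE);
        if (target == null) {
            return false;
        }
        target.setTextContent(newValue);
        return true;
    }

    // Serializes to a String; use new StreamResult(new java.io.File(...)) to write a file.
    static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Companies><Company>"
                + "<Identification><BranchId>373287734</BranchId></Identification>"
                + "<Info><DataEntry><EntryId>12345</EntryId></DataEntry>"
                + "<DataEntry><EntryId>34567</EntryId></DataEntry></Info>"
                + "</Company></Companies>");
        update(doc, "373287734", "EntryId", 1, "010101");
        System.out.println(serialize(doc));
    }
}
```

With a 5000-element file this still loads the whole tree into memory, which matches the "leaving performance and memory aside" caveat above.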
What is the best way to locate an element in Selenium WebDriver other than XPath?

The application I'm testing is developing fast, and new features keep being added, which requires changes to the XPaths used in the tests. Selenium scripts that used to pass now fail because the XPaths have changed. Is there a reliable way to locate elements that will not change? FYI, I thought of using IDs, but my application does not have IDs on every element, since it is not recommended to add IDs in the code.
I feel the following is the order of preference for choosing a locator in Selenium:
1. id
2. class name
3. name
4. css
5. xpath
6. link text
7. partial link text
8. tag name
If the DOM structure changes, you can try using functions like text() and contains(). The following link explains the basics of these functions.
http://www.guru99.com/using-contains-sbiling-ancestor-to-find-element-in-selenium.html
The following link covers writing reliable locators:
https://blog.mozilla.org/webqa/2013/09/26/writing-reliable-locators-for-selenium-and-webdriver-tests/
Hope this helps you.
If you cannot impose #id discipline on the interface that keeps changing, one alternative is to use CSS selectors.
Another alternative is to write more robust XPath:
Be smart about using the descendant-or-self axis (//):
Rather than /some/long/and/brittle/path/uniquepart use //uniquepart or //uniquepart/further/path to bypass that which is likely to change.
Don't overspecify label matching.
Use case-insensitive contains(), and try to match critical parts of labels that are likely to remain invariant across interface changes.
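Selenium evaluates XPath in the browser, but the two tips above can be tried out with the JDK's own javax.xml.xpath against two made-up, well-formed XHTML snippets: the same login button before and after a redesign that adds a wrapper div and changes the label's case and wording. This is a sketch of the expressions only, not Selenium code. XPath 1.0 has no lower-case() function, so translate() does the case folding.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocatorDemo {
    // Brittle: spells out the full layout from the root.
    static final String BRITTLE = "/html/body/div[2]/form/button";

    // More robust: // skips the layout, and only a lower-cased,
    // stable fragment of the label has to match.
    static final String ROBUST = "//form//button[contains(translate(normalize-space(.),"
            + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'sign in')]";

    static final String BEFORE = "<html><body><div id=\"nav\"/>"
            + "<div><form><button>Sign in</button></form></div></body></html>";

    // The same page after a redesign: an extra wrapper div and a changed label.
    static final String AFTER = "<html><body><div id=\"nav\"/>"
            + "<div class=\"wrapper\"><div><form><button>SIGN IN now</button></form></div></div>"
            + "</body></html>";

    // True if the expression selects at least one node in the markup.
    static boolean matches(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE) != null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matches(BEFORE, BRITTLE) + " " + matches(AFTER, BRITTLE)); // true false
        System.out.println(matches(BEFORE, ROBUST) + " " + matches(AFTER, ROBUST));   // true true
    }
}
```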
One other way I can think of is to load your page elements into a DOM and use DOM element navigation. It is good practice to have IDs on elements, though. If you have to use XPath, it is good practice to keep the common part of the path separate and add the leaf elements as needed. In a way, a test failing because an XPath changed is a useful signal that the page has changed.

Given a Node, how can I select the equivalent Node in a Document?

My caller is handing me an org.w3c.dom.Node and an org.w3c.dom.Document that serves as its owner document. The supplied Node, in other words, is guaranteed to be parented in the supplied Document, and represents the Node in that Document against which some work should be performed.
I am now in a position where I need to effectively clone the Document (I need to perform modifications, and cannot modify the source.)
Obviously, if I do that, the Node I still have in my hand is not owned by the new Document resulting from the clone. I have now effectively lost the selection in the cloned Document.
However, I know that there will be a Node in that cloned document that is exactly equal to the Node I have in my hand. I need to find it.
What is the best way to accomplish this, short of plowing through the whole Document and calling isEqualNode(Node) on each one?
I thought perhaps there would be some way to say document.find(myUnparentedNode), but no such method exists.
You could generate an XPath expression that describes the position of the node in the old document and then apply it to the new document. See this SO question for approaches to doing that.
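One way to generate such a path is sketched below, assuming the target is an Element and the document does not use namespaces. For each ancestor it records the element name and its 1-based position among same-named siblings, then evaluates that path against the copy. The copy is made by importing the document element into a fresh Document, so anything outside the root element (a DOCTYPE, top-level comments) is left out. Class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodePaths {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Builds an absolute path such as /a[1]/b[2]/c[1] by walking up to the root.
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            int index = 1; // XPath positions are 1-based
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    // Deep copy of the root element into a new Document.
    static Document copyOf(Document original) throws Exception {
        Document copy = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        copy.appendChild(copy.importNode(original.getDocumentElement(), true));
        return copy;
    }

    // Finds the element in `copy` that sits at the same position as `selected`.
    static Element counterpartIn(Document copy, Element selected) throws Exception {
        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(pathOf(selected), copy, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><b><c/></b></a>");
        Element c = (Element) doc.getElementsByTagName("c").item(0);
        Document copy = copyOf(doc);
        Element twin = counterpartIn(copy, c);
        System.out.println(pathOf(c));                       // /a[1]/b[2]/c[1]
        System.out.println(twin.getOwnerDocument() == copy); // true
    }
}
```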
If it's possible for you to modify the node before cloning just give it a unique attribute that doesn't collide with anything else (e.g. generated randomly) that you can then locate in the cloned document.
If it already has an id, just use that.
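The marker-attribute variant, sketched under the assumption that the selected node is an Element and that touching the source briefly is acceptable: the attribute (hypothetical name cloneMarker, with a random UUID as its value) is added just before copying and removed again from both the original and the copy. Class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MarkerLookup {
    // Hypothetical attribute name; pick one that cannot collide with your vocabulary.
    static final String MARKER = "cloneMarker";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Copies the document and returns the copy's counterpart of `selected`.
    static Element copyAndFind(Document original, Element selected) throws Exception {
        String id = UUID.randomUUID().toString();
        selected.setAttribute(MARKER, id); // temporary tag on the source
        try {
            Document copy = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            copy.appendChild(copy.importNode(original.getDocumentElement(), true));
            Element found = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[@" + MARKER + "='" + id + "']", copy, XPathConstants.NODE);
            if (found != null) {
                found.removeAttribute(MARKER); // keep the copy clean too
            }
            return found;
        } finally {
            selected.removeAttribute(MARKER); // restore the source
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><b x=\"1\"/></a>");
        Element selected = (Element) doc.getElementsByTagName("b").item(1);
        Element twin = copyAndFind(doc, selected);
        System.out.println(twin.getAttribute("x")); // 1
    }
}
```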

Creating Multiple Child Nodes in XML for Java

I need to create multiple child nodes in one element node in XML. Do I just append as many times as required to create these nodes, like this?
rootElement.appendChild(creator);
creator.appendChild(name);
creator.appendChild(email);
creator.appendChild(name);
creator.appendChild(email);
Or does Java automatically create the extra child nodes when I do this:
name.appendChild(doc.createTextNode("Bob"));
email.appendChild(doc.createTextNode("bob#email.com"));
name.appendChild(doc.createTextNode("Smith"));
email.appendChild(doc.createTextNode("smith#email.com"));
I'm not too sure how it works, any advice or help would be appreciated!
Per the DOM specification, appending a node that already has a parent first removes it from its old position: the node is moved, not copied. This means the first approach does nothing but shuffle the same two children. The second approach is closer in spirit, but as written it appends both text nodes to the same name and email elements (name would serialize as BobSmith). Create a new name and email element for each entry as well; that way the previously added children remain untouched by later API calls.
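A minimal sketch of creating new elements for every child, using the element names and values from the question (the root element name is an assumption). The helper textElement is hypothetical; it builds a fresh element with one text node each time it is called. The second method reproduces the question's first approach: appending the same element twice only moves it, so the parent ends up with one child.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ChildNodesDemo {
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: a brand-new element containing one text node.
    static Element textElement(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    static Document build() throws Exception {
        Document doc = newDocument();
        Element rootElement = doc.createElement("root"); // assumed root name
        doc.appendChild(rootElement);
        Element creator = doc.createElement("creator");
        rootElement.appendChild(creator);
        // Fresh <name>/<email> elements for each person.
        creator.appendChild(textElement(doc, "name", "Bob"));
        creator.appendChild(textElement(doc, "email", "bob#email.com"));
        creator.appendChild(textElement(doc, "name", "Smith"));
        creator.appendChild(textElement(doc, "email", "smith#email.com"));
        return doc;
    }

    // The question's first approach: appending the same element twice moves it.
    static int childCountWhenReusing() throws Exception {
        Document doc = newDocument();
        Element creator = doc.createElement("creator");
        Element name = doc.createElement("name");
        creator.appendChild(name);
        creator.appendChild(name); // removed from its old position, then appended again
        return creator.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(build()), new StreamResult(out));
        System.out.println(out);
        System.out.println(childCountWhenReusing()); // 1
    }
}
```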

Code to insert arbitrary XML string into XML document by XPath

I am attempting to create a script that wraps a Groovy class that will take the following arguments:
An input XML file to update.
An arbitrary snippet to insert into the input file (it might not even be well-formed in and of itself; it would become part of a larger well-formed document).
XPath for the marker element (used for positioning the snippet in #2).
An action (insert before, insert after, append child).
Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, a DocumentFragment can help. The DocumentFragment node type itself is standard DOM, and some DOM implementations add a non-standard method to populate one directly from a string.
EDIT: Quick Google search throws up some JavaDocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
IIRC the API works like this (pseudocode; appendXML is not part of Java's org.w3c.dom API):
parent = find_parent_node_of_fragment(document);
fragment = document.createDocumentFragment();
fragment.appendXML("<my>xmlstring</my>");
parent.appendChild(fragment);
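Java's org.w3c.dom has no appendXML, but the same effect can be approximated with standard JDK APIs, assuming the snippet is well-formed once wrapped in a single element: parse it inside a dummy root, import the parsed nodes into the target document through a DocumentFragment, then insert the fragment relative to the node found by XPath. The names insertSnippet and Action are hypothetical; since Groovy can call Java classes directly, the same calls should be usable from a Groovy script. Namespace prefixes or entities declared only in the target document are not handled by this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SnippetInserter {
    enum Action { INSERT_BEFORE, INSERT_AFTER, APPEND_CHILD }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static void insertSnippet(Document doc, String markerXPath, String snippet, Action action)
            throws Exception {
        Node marker = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(markerXPath, doc, XPathConstants.NODE);
        if (marker == null) {
            throw new IllegalArgumentException("No node matches " + markerXPath);
        }
        // Wrapping lets a snippet with several top-level nodes parse as one document.
        Document parsed = parse("<wrapper>" + snippet + "</wrapper>");
        // Copy the parsed nodes into the target document, collected in a fragment.
        DocumentFragment fragment = doc.createDocumentFragment();
        for (Node n = parsed.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            fragment.appendChild(doc.importNode(n, true));
        }
        // Inserting a fragment inserts its children in order.
        switch (action) {
            case INSERT_BEFORE:
                marker.getParentNode().insertBefore(fragment, marker);
                break;
            case INSERT_AFTER:
                // A null reference node means "append at the end".
                marker.getParentNode().insertBefore(fragment, marker.getNextSibling());
                break;
            case APPEND_CHILD:
                marker.appendChild(fragment);
                break;
        }
    }

    // Space-separated names of a node's children, for quick inspection.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            names.append(n.getNodeName()).append(' ');
        }
        return names.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/><b/></root>");
        insertSnippet(doc, "/root/a", "<x/><y/>", Action.INSERT_AFTER);
        System.out.println(childNames(doc.getDocumentElement())); // a x y b
    }
}
```

Writing the result to the optional output file can then be done with a javax.xml.transform Transformer.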
If you don't have this luxury, or if your string is not well-formed, there is the option to inject CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.
