XML handling in Java - java

I need to select all nodes on the path a/b/c as a NodeList from a Document using getElementsByTagName(). How do I provide the path of the node as input?
For example:
<root>
<a>
<b>
<c>1</c>
<c>2</c>
<c>3</c>
<c>4</c>
<c>5</c>
<c>6</c>
</b>
</a>
</root>
I need to select all 'c' nodes on the path a/b/c. Selecting c directly is an option, but to avoid ambiguity if other 'c' elements are present elsewhere in the document, I need to give the path. How can I achieve this?

Take a look at the Java XPath API (javax.xml.xpath). getElementsByTagName() matches on the tag name only and cannot take a path. You probably want an XPath of /root/a/b/c to select all the <c/> nodes in the above hierarchy.
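A minimal sketch of that approach using the JDK's built-in javax.xml.xpath package. The class name PathSelect, the helper selectText and the extra stray <c> element in the sample are made up for illustration; they are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class PathSelect {
    // Returns the text content of every node matched by the XPath expression.
    static List<String> selectText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // NODESET makes evaluate() return an org.w3c.dom.NodeList.
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The <c>stray</c> element sits outside a/b and is not selected.
        String xml = "<root><a><b><c>1</c><c>2</c><c>3</c></b></a><c>stray</c></root>";
        System.out.println(selectText(xml, "/root/a/b/c"));
    }
}
```

If you need the NodeList itself rather than the text values, you can return the result of evaluate() directly.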

Related

Reading value outside span tags in selenium

I am new to Selenium and am trying to get the text that is outside the span tags:
<span>Source directory: </span>
"/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64"
<br>.
I want to get this value:
/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64
I am using the XPath /html/body/div/div[4]/div[2]/span[3], but it gives the output: Source directory:
Can someone please suggest a solution?
I used the following XML to test it:
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
<span>Source directory: </span>
"/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64"
</root>
and the XPath
root/span/following-sibling::text()
results in
Text='
"/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64"
'
so it is more or less what you want.
I am not sure, however, whether this
/html/body/div/div[4]/div[2]/span[3]/following-sibling::text()
will be a valid XPath expression in your case.
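Here is a rough sketch of the same following-sibling::text() test run from plain Java with javax.xml.xpath, against a cut-down version of the XML above. The class and method names are invented for the example. Note that this runs against a parsed XML document; as far as I know, Selenium's findElement only accepts XPath expressions that return elements, so a text() result may not work there directly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class SiblingText {
    // Returns the first text node that follows the <span>, with surrounding whitespace removed.
    static String textAfterSpan(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // normalize-space() trims the line breaks around the text node.
            return XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("normalize-space(/root/span/following-sibling::text()[1])", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n"
                + "<span>Source directory: </span>\n"
                + "\"/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64\"\n"
                + "</root>";
        System.out.println(textAfterSpan(xml));
    }
}
```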
Assuming your html looks something like this:
<div>
<span>Source directory: </span>
"/cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64"
<br />
</div>
Shortening your XPath to the parent element, you would select: /html/body/div/div[4]/div[2]
So, you have a WebElement that represents the parent div. When you call element.Text (getText() in the Java bindings), you should get something like: Source directory: /cs/orasoa/devtools/build/2353704573222573/A2B_LSP/RPMS/x86_64
Side note:
I would very much suggest putting an ID on the element. The xpath that you have is extremely brittle as it relies on the entire page loading in a specific way. If someone adds a div above the current one, your test will not behave the same.

XPath how to check if an ancestor element has a certain name

I have an element structure like so:
<template>
<x name="foo">
<y>
<x>
<a>
<b number="1" />
<c>2</c>
</a>
</x>
<c>5</c>
</y>
</x>
</template>
I need it so that a b element can only exist in a as long as one of the ancestors is an x with a name. I have a set of alternatives written, but I can't get the test to work. This is what I thought would work:
ancestor::x[@name]
but I get no result. The only xpath I've had luck with is this:
ancestor::node()
Can anyone tell me what I'm missing?
Edit:
I'm using the Java Xerces 2.11 library to load an XSD file which includes tags.
I tried the XPath below and it gave me the proper result. I hope you are using the proper context for the ancestor element.
//template//b/ancestor::x[@name]
The output is as below:
Element='<x name="foo">
<y>
<x>
<a>
<b number="1" />
<c>2</c>
</a>
</x>
<c>5</c>
</y>
</x>'
As per your condition, the XPath will be:
//template//b[ancestor::x[@name]]
The output is as below; the b element is only returned if it has an ancestor element 'x' with a name attribute:
Element='<b number="1" />'
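To check the predicate outside the schema, here is a small sketch with the JDK's javax.xml.xpath API. AncestorCheck and countMatches are names I made up; the two sample documents are trimmed versions of the structure in the question, one with the name attribute and one without.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class AncestorCheck {
    // Counts the nodes selected by the XPath expression in the given document.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String named = "<template><x name=\"foo\"><y><x><a><b number=\"1\"/></a></x></y></x></template>";
        String unnamed = "<template><x><y><x><a><b number=\"1\"/></a></x></y></x></template>";
        String expr = "//template//b[ancestor::x[@name]]";
        // b is matched only when some ancestor x carries a name attribute.
        System.out.println(countMatches(named, expr));
        System.out.println(countMatches(unnamed, expr));
    }
}
```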

In XSLT, how do I get the filepath of the xml file of a certain element if that xml file was included with xinclude?

I have these XML files:
master.xml (which uses XInclude to include child1.xml and child2.xml)
child1.xml
child2.xml
Both child1.xml and child2.xml contain a <section> element with some text.
In the XSLT transformation, I'd like to add the name of the file the <section> element came from, so I get something like:
<section srcFile="child1.xml">Text from child 1.</section>
<section srcFile="child2.xml">Text from child 2.</section>
How do I retrieve the values child1.xml and child2.xml?
Unless you turn off that feature, all XInclude processors should add an @xml:base attribute
with the URL of the included file. So you don't have to do anything; it should already be:
<section xml:base="child1.xml">Text from child 1.</section>
<section xml:base="child2.xml">Text from child 2.</section>
(If you want, you can use XSLT to transform the @xml:base attribute into @srcFile.)
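If you would rather do the rename in Java than in XSLT, a sketch along these lines might do. It assumes the XInclude processor has already run and left xml:base on each <section>; the wrapping <doc> element and the names BaseToSrcFile and renameBase are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class BaseToSrcFile {
    // Moves xml:base to srcFile on every <section> and returns the srcFile values.
    static List<String> renameBase(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up xml:base by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList sections = doc.getElementsByTagName("section");
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < sections.getLength(); i++) {
                Element section = (Element) sections.item(i);
                if (section.hasAttributeNS(XMLConstants.XML_NS_URI, "base")) {
                    String base = section.getAttributeNS(XMLConstants.XML_NS_URI, "base");
                    section.removeAttributeNS(XMLConstants.XML_NS_URI, "base");
                    section.setAttribute("srcFile", base);
                }
                sources.add(section.getAttribute("srcFile"));
            }
            return sources;
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for master.xml after XInclude processing (an assumption).
        String merged = "<doc>"
                + "<section xml:base=\"child1.xml\">Text from child 1.</section>"
                + "<section xml:base=\"child2.xml\">Text from child 2.</section>"
                + "</doc>";
        System.out.println(renameBase(merged));
    }
}
```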
I'm 99% sure that once xi:include has been processed, you have a single document (and single infoset) that won't let you determine which URL any given part of the document came from.
I think you will need to place that information directly in the individual included files. Having said that, you can still give document-uri a try, but I think all nodes will return the same URI.

dom4j: How to resolve this XPath Error?

I am reading an XML using dom4j by using XPath techniques for selecting desired nodes. Consider that my XML looks like this:
<Emp_Dir>
<Emp_Classification type ="Permanent" >
<Emp id= "1">
<name>jame</name>
<Emp_Bio>
<age>12</age>
<height>5.4</height>
<weight>78</weight>
</Emp_Bio>
<Emp_Details>
<salary>2000</salary>
<designation>developer</designation>
</Emp_Details>
</Emp>
<Emp id= "2">
<name>jame</name>
<Emp_Bio>
<age>12</age>
<height>5.4</height>
<weight>78</weight>
</Emp_Bio>
<Emp_Details>
<salary>2000</salary>
<designation>developer</designation>
</Emp_Details>
</Emp>
</Emp_Classification>
<Emp_Classification type ="Contract" >
.
.
.
</Emp_Classification>
<Emp_Classification type ="PartTime" >
.
.
.
</Emp_Classification>
</Emp_Dir>
Note: The above XML might look ugly to you, but I only created this dummy file for the sake of understanding and to keep my project confidential.
When I specify a simple XPath expression, like:
//Emp_Classification (or)
/Emp_Dir/Emp_Classification
then it works fine, but when I specify a more complex expression like:
/Emp_Dir/Emp_Classification/[@type='Permanent'] (or)
//Emp_Dir/Emp_Classification/[@type='Permanent']
then it gives me the following error:
"Invalid XPath expression: /Emp_Dir/Emp_Classification/[@type='Permanent'] Expected one of '.', '..', '@', '*', <QName>"
Could anybody tell me what is wrong with my XPath?
My second question is: how do I select the Emp_Bio node of Permanent employees only? Does this work?
//Emp_Dir/Emp_Classification/[@type='Permanent']/Emp/Emp_Bio
Use: //Emp_Dir/Emp_Classification[@type='Permanent']
(note the removal of the / before the predicate; a predicate attaches directly to the step it filters)
And then use this: //Emp_Dir/Emp_Classification[@type='Permanent']/Emp/Emp_Bio for the latter part of the question.
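The sketch below tries both forms. It uses the JDK's javax.xml.xpath rather than dom4j so that it runs without extra libraries, and the XPath syntax rule is the same in both. The sample document is a shortened version of the one in the question, with one Contract employee added, and the class and method names are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class PermanentBio {
    static final String BIO_PATH =
            "//Emp_Dir/Emp_Classification[@type='Permanent']/Emp/Emp_Bio";

    // True if the expression compiles as XPath 1.0.
    static boolean isValid(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    // Returns the <age> text of each Emp_Bio that belongs to a Permanent employee.
    static List<String> permanentAges(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList bios = (NodeList) xpath.evaluate(BIO_PATH, doc, XPathConstants.NODESET);
            List<String> ages = new ArrayList<>();
            for (int i = 0; i < bios.getLength(); i++) {
                // A relative expression is evaluated against the Emp_Bio node.
                ages.add(xpath.evaluate("age", bios.item(i)));
            }
            return ages;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Emp_Dir>"
                + "<Emp_Classification type=\"Permanent\"><Emp id=\"1\"><Emp_Bio><age>12</age></Emp_Bio></Emp></Emp_Classification>"
                + "<Emp_Classification type=\"Contract\"><Emp id=\"3\"><Emp_Bio><age>40</age></Emp_Bio></Emp></Emp_Classification>"
                + "</Emp_Dir>";
        System.out.println(isValid("/Emp_Dir/Emp_Classification/[@type='Permanent']"));
        System.out.println(isValid("/Emp_Dir/Emp_Classification[@type='Permanent']"));
        System.out.println(permanentAges(xml));
    }
}
```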

Using Both Tagged And Untagged Data With XPath

I'm trying to parse some HTML using XPath in Java. Consider this HTML:
<td class="postbody">
<img src="...""><br />
<br />
<b>What is Blah?</b><br />
<br />
Blah blah blah
<br />
Note that "What Is Blah" is helpfully contained within a b tag and is therefore easily parseable. But "Blah blah blah" is out in the open, and so I can only pick it up by calling text() on its parent node.
Thing is, I need to go through this in sequence, putting the img down, then the bolded text, then the body text. It's important it ends up in order (it needn't be processed in order, if you can suggest a way that takes two passes).
So are there any suggestions for how, if I've got the above contained within a Java XPath node, I can go through it in turn and get what I need?
I think a SAX-based parser would be a better tool for this problem. It's event-based, so you can process your XML document in order.
But it's an XML parser, so you'll need a well-formed XML document. I have never used JTidy, but it's a Java port of HTML Tidy, so hopefully it can help you transform your (invalid) HTML documents into valid XML.
Use this XPath expression evaluated with the parent of the provided XML fragment as the context node:
node()
This selects every child node of the context node: every element child, every text-node child, every comment child and every PI (processing instruction) child.
In case you want to exclude comments and PIs, use:
node()[not(self::comment() or self::processing-instruction())]
In case that, in addition to this, you don't want to select the whitespace-only text nodes, use:
node()
[not(self::comment() or self::processing-instruction())]
[not(self::text()[not(normalize-space())])]
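A rough sketch of walking those children in document order with javax.xml.xpath. It assumes the HTML has already been cleaned into well-formed XML (for example by a tidy step), so the sample below closes the td and uses self-closing tags. The class name, the method name and the "name=text" output format are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class OrderedChildren {
    // Children of <td>, minus comments, PIs and whitespace-only text nodes.
    static final String CHILDREN = "/td/node()"
            + "[not(self::comment() or self::processing-instruction())]"
            + "[not(self::text()[not(normalize-space())])]";

    // Returns "nodeName=trimmed text" for each selected child, in document order.
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(CHILDREN, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // Text nodes report the node name "#text".
                result.add(node.getNodeName() + "=" + node.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<td class=\"postbody\">\n"
                + "<img src=\"...\"/><br/>\n"
                + "<br/>\n"
                + "<b>What is Blah?</b><br/>\n"
                + "<br/>\n"
                + "Blah blah blah\n"
                + "<br/>\n"
                + "</td>";
        for (String child : describeChildren(xml)) {
            System.out.println(child);
        }
    }
}
```

Inside the loop you can switch on node.getNodeType() to handle the img, the bolded text and the body text differently while keeping their original order.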
