How do I create an xPath statement from a NodeInfo? - java

I'm using the S9API with Saxon 9.7 HE, and I have a NodeInfo object. I need to create an xPath statement that uniquely corresponds to this NodeInfo. It's easy enough to get the prefix, the node name, and the parent:
String prefix = node.getPrefix();
String localPart = node.getLocalPart();
NodeInfo parent = node.getParent();
So I can walk up the tree, building the xPath as I go. But what I can't find is any way to get the positional predicate info. IOW, it's not sufficient to create:
/persons/person/child
because it might match multiple child elements. I need to create:
/persons/person[2]/child[1]
which will match only one child element. Any ideas on how to get the positional predicate info? Or maybe there's a better way to do this altogether?
BTW, for those who use the generic DOM and not the S9API, here's an easy solution to this problem: http://practicalxml.sourceforge.net/apidocs/net/sf/practicalxml/DomUtil.html#getAbsolutePath(org.w3c.dom.Element)
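For the generic DOM case, the walk-up approach described above can be sketched with nothing but the JDK's org.w3c.dom API. This is only a minimal illustration, not the DomUtil implementation. The names DomPathSketch, parse, absolutePath and position are made up for this example. It always emits a positional predicate (even on the root element), and it ignores namespaces by comparing raw node names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomPathSketch {

    /** Parses an XML string; checked exceptions are wrapped to keep callers short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Walks up from the element to the root, prepending one step per ancestor. */
    static String absolutePath(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName() + "[" + position((Element) n) + "]");
        }
        return path.toString();
    }

    /** 1-based position among the preceding sibling elements that share the same name. */
    static int position(Element element) {
        int index = 1;
        for (Node s = element.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s instanceof Element && s.getNodeName().equals(element.getNodeName())) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Document doc = parse("<persons><person><child/></person>"
                + "<person><child/><child/></person></persons>");
        // Second <child> in document order: the first child of the second person.
        Element target = (Element) doc.getElementsByTagName("child").item(1);
        System.out.println(absolutePath(target)); // prints /persons[1]/person[2]/child[1]
    }
}
```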
Edit: @Michael Kay's answer works. To add some meat to it:
XPathExpression xPathExpression = xPath.compile("./path()");
List matches = (List) xPathExpression.evaluate(node, XPathConstants.NODESET);
String pathToNode = matches.get(0).toString();
// If you want to remove the expanded QName syntax:
pathToNode = pathToNode.replaceAll("Q\\{.*?\\}", "");
This must be done using the same xPath object that was previously used to acquire the NodeInfo object.

In XPath 3.0 you can use fn:path().
Earlier Saxon releases offer saxon:path().
The challenge here is handling namespaces. fn:path() returns a path that's not sensitive to namespace-prefix bindings by using the new expanded-QName syntax:
/Q{}persons/Q{}person[2]/Q{}child[1]

Related

Find child element by xpath

public WebElement findChildByXpath(WebElement parent, String xpath) {
loggingService.timeMark("findChildByXpath", "begin. Xpath: " + xpath);
String parentInnerHtml = parent.getAttribute("innerHTML"); // Uncomment for debug purpose.
WebElement child = parent.findElement(By.xpath(xpath));
String childInnerHtml = child.getAttribute("innerHTML"); // Uncomment for debug purpose.
return child;
}
The problem with this code is that childInnerHtml gives me the wrong result. I scrape numbers, and they all come out equal.
I even suspect that my code is equivalent to driver.findElement(By.xpath.
Could you tell me whether my code really finds a child, or what I should correct?
The child XPath needs to be a relative XPath. Normally this means the expression starts with a dot (.), which makes it relative to the node it is applied to, i.e. the parent node. Otherwise Selenium will search for the given XPath (the parameter you pass to this method) starting from the top of the entire page.
So if, for example, the passed XPath is "//span[@id='myId']", it should be ".//span[@id='myId']".
Alternatively, you can add the dot inside the parent.findElement(By.xpath(xpath)); line to make it
WebElement child = parent.findElement(By.xpath("." + xpath));
But passing the XPath with the dot already in it is the simpler and cleaner way, especially if the passed XPath is some complex expression like "(//div[@class='myClass'])[5]//input": in that case automatically prepending a dot may not work properly.
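The difference between "//..." and ".//..." is a property of XPath itself, so it can be reproduced without Selenium. The sketch below uses the JDK's javax.xml.xpath API as a stand-in for WebElement.findElement. The class name RelativeXPathSketch, its helper methods and the sample markup are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPathSketch {

    /** Parses an XML string; checked exceptions are wrapped to keep callers short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the nodes selected by the expression when evaluated against the context node. */
    static int count(Node context, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document page = parse("<body><div id='a'><span/></div>"
                + "<div id='b'><span/><span/></div></body>");
        Node parent = page.getDocumentElement().getLastChild(); // the div with id='b'
        // Absolute: starts at the document root even though a context node is supplied.
        System.out.println(count(parent, "//span"));  // prints 3
        // Relative: the leading dot restricts the search to descendants of the context node.
        System.out.println(count(parent, ".//span")); // prints 2
    }
}
```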

Difference between JSoup Element and JSoup Node

Can anyone please explain the difference between the Element object and Node object provided in JSoup ?
Which is the best thing to be used in which situation/condition.
A node is the generic name for any type of object in the DOM hierarchy.
An element is one specific type of node.
The JSoup class model reflects this:
Node
    Element
Since Element extends Node, anything you can do on a Node you can do on an Element too. But Element provides additional behaviour which makes it easier to use. For example, an Element has properties such as id and class, which make it easier to find elements in an HTML document.
In most cases using Element (or one of the other subclasses of Node) will meet your needs and will be easier to code to. I suspect the only scenario in which you might need to fall back to Node is if there is a specific node type in the DOM for which JSoup does not provide a subclass of Node.
Here's an example showing the same HTML document inspection using both Node and Element:
String html = "<html><head><title>This is the head</title></head><body><p>This is the body</p></body></html>";
Document doc = Jsoup.parse(html);
Node root = doc.root();
// some content assertions, using Node
assertThat(root.childNodes().size(), is(1));
assertThat(root.childNode(0).childNodes().size(), is(2));
assertThat(root.childNode(0).childNode(0), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(0)).text(), is("This is the head"));
assertThat(root.childNode(0).childNode(1), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(1)).text(), is("This is the body"));
// the same content assertions, using Element
Elements head = doc.getElementsByTag("head");
assertThat(head.size(), is(1));
assertThat(head.first().text(), is("This is the head"));
Elements body = doc.getElementsByTag("body");
assertThat(body.size(), is(1));
assertThat(body.first().text(), is("This is the body"));
YMMV but I think the Element form is easier to use and much less error prone.
They seem the same, but they are different.
A Node can be an Element, but it can also be a TextNode.
For example:
<p>A<span>B</span></p>
On the p element:
.childNodes() // gets the list of nodes
-> A
-> <span>B</span>
.children() // gets the list of elements
-> <span>B</span>
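The same Node/Element split exists in the JDK's W3C DOM, so the distinction can be tried out without JSoup on the classpath. The sketch below is an analogue, not JSoup code: getChildNodes() plays the role of childNodes(), and filtering for Element instances plays the role of children(). The class and method names are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeVsElementSketch {

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Names of all child nodes, text nodes included (text nodes are named "#text"). */
    static List<String> childNodeNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            names.add(kids.item(i).getNodeName());
        }
        return names;
    }

    /** Names of the child nodes that are elements; text nodes are skipped. */
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                names.add(kids.item(i).getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>A<span>B</span></p>");
        System.out.println(childNodeNames(p));    // prints [#text, span]
        System.out.println(childElementNames(p)); // prints [span]
    }
}
```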

Java: How to execute an XPath query on a node

So I'm reading from an XML file with many layers of nesting in Java using xPath.
At the moment I have a method that takes the path to XML file and a xpath query as parameters, and returns a NodeIterator.
Then I iterate through those nodes, and for some of them (if their name matches) I need to execute another query on them and get a NodeIterator of their children, etc.
Is it possible to have a function with 2 parameters, one an already existing Node and the other an XPath query to execute on that Node?
So replacing: NodeIterator ni = XPathAPI.selectNodeIterator(document, xpathQuery);
With something like: NodeIterator ni2 = XPathAPI.selectNodeIterator(parentNode, query);
I've searched on the internet and I can't find any examples, and I'm not sure what the syntax to do the above would be, or if it's even possible?
Many thanks in advance :)
Presumably your XPathAPI class is the Apache/Xalan org.apache.xpath.XPathAPI?
In that case, what's wrong with
static NodeIterator selectNodeIterator(Node contextNode, java.lang.String str)
It seems to do exactly what you want.
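If Xalan's XPathAPI is not on the classpath, the same two-parameter shape can be sketched with the JDK's own javax.xml.xpath API, which also accepts any Node as the evaluation context. The class NodeQuerySketch, its select helper and the sample library/shelf/book document are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeQuerySketch {

    /** Parses an XML string; checked exceptions are wrapped to keep callers short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs the query with contextNode as the starting point and collects the hits. */
    static List<Node> select(Node contextNode, String query) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, contextNode, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<library><shelf name='a'><book><title>X</title></book></shelf>"
                + "<shelf name='b'><book><title>Y</title></book>"
                + "<book><title>Z</title></book></shelf></library>");
        // Outer query on the document, inner (relative) query on each matching node.
        for (Node shelf : select(doc, "/library/shelf")) {
            for (Node title : select(shelf, "book/title")) {
                System.out.println(title.getTextContent());
            }
        }
    }
}
```

Note that the inner query is relative ("book/title"), so it is resolved against the node passed in rather than against the document root.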

How to check if an element exists in the XML using XPath?

Below is my element hierarchy. How can I check (using XPath) that the AttachedXml element is present under the CreditReport of the Primary Consumer?
<Consumers xmlns="http://xml.mycompany.com/XMLSchema">
<Consumer subjectIdentifier="Primary">
<DataSources>
<Credit>
<CreditReport>
<AttachedXml><![CDATA[ blah blah]]>
Use the boolean() XPath function:
The boolean function converts its argument to a boolean as follows:
a number is true if and only if it is neither positive or negative zero nor NaN
a node-set is true if and only if it is non-empty
a string is true if and only if its length is non-zero
an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
If there is an AttachedXml in the CreditReport of primary Consumer, then it will return true().
boolean(/mc:Consumers
/mc:Consumer[@subjectIdentifier='Primary']
//mc:CreditReport/mc:AttachedXml)
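The mc prefix in that expression has to be bound to the document's namespace before the expression can be evaluated. A minimal sketch with the JDK's JAXP API follows; the class name ElementExistsSketch and the method hasAttachedXml are hypothetical, and any XML passed in is assumed to be a complete, well-formed version of the hierarchy in the question.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ElementExistsSketch {

    static final String NS = "http://xml.mycompany.com/XMLSchema";

    /** Returns true if the Primary consumer's CreditReport contains an AttachedXml element. */
    static boolean hasAttachedXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed names to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "mc" prefix used in the expression to the document's namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "mc".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "mc" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singleton(prefix).iterator();
                }
            });

            return (Boolean) xpath.evaluate(
                    "boolean(/mc:Consumers/mc:Consumer[@subjectIdentifier='Primary']"
                            + "//mc:CreditReport/mc:AttachedXml)",
                    doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Requesting XPathConstants.BOOLEAN gives a plain true/false answer, so no null check on a returned node is needed.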
The Saxon documentation, though a little unclear, seems to suggest that the JAXP XPath API will return false when evaluating an XPath expression if no matching nodes are found.
This IBM article mentions a return value of null when no nodes are matched.
You might need to play around with the return types a bit based on this API, but the basic idea is that you just run a normal XPath and check whether the result is a node / false / null / etc.
XPathFactory xpathFactory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
XPath xpath = xpathFactory.newXPath();
XPathExpression expr = xpath.compile("/Consumers/Consumer/DataSources/Credit/CreditReport/AttachedXml");
Object result = expr.evaluate(doc, XPathConstants.NODE);
if ( result == null ) {
// do something
}
Use:
boolean(/*/*[@subjectIdentifier="Primary"]/*/*/*/*
[name()='AttachedXml'
and
namespace-uri()='http://xml.mycompany.com/XMLSchema'
]
)
Normally, when you try to select a node using XPath, your XPath engine will return null or an equivalent if the node doesn't exist.
xpath: "/Consumers/Consumer/DataSources/Credit/CreditReport/AttachedXml"
If you're using XSL, check out this question for an answer:
xpath find if node exists
Take a look at my example:
<tocheading language="EN">
<subj-group>
<subject>Editors Choice</subject>
<subject>creative common</subject>
</subj-group>
</tocheading>
Now, to check whether "creative common" exists:
tocheading/subj-group/subject/text() = 'creative common'
Hope this helps.
If boolean() is not available (it isn't in the tool I'm using), one way to achieve this is:
//SELECT[@id='xpto']/OPTION[not(not(@selected))]
In this case, one of the OPTION elements under the SELECT is the selected one. The "selected" attribute does not have a value; it just exists, while the other OPTION elements do not have it. This achieves the objective.

Remove Element from JDOM document using removeContent()

Given the following scenario, where the xml, Geography.xml looks like -
<Geography xmlns:ns="some valid namespace">
<Country>
<Region>
<State>
<City>
<Name></Name>
<Population></Population>
</City>
</State>
</Region>
</Country>
</Geography>
and the following sample java code -
InputStream is = new FileInputStream("C:\\Geography.xml");
SAXBuilder saxBuilder = new SAXBuilder();
Document doc = saxBuilder.build(is);
XPath xpath = XPath.newInstance("/*/Country/Region/State/City");
Element el = (Element) xpath.selectSingleNode(doc);
boolean b = doc.removeContent(el);
The removeContent() method doesn't remove the City element from the content list of the doc. The value of b is false.
I don't understand why it is not removing the element. I even tried deleting the Name and Population elements from the XML just to see if that was the issue, but apparently it's not.
Another thing I tried, even though I know it's not essentially different, was to use Parent:
Parent p = el.getParent();
boolean s = p.removeContent(new Element("City"));
What might the problem be, and what is a possible solution? Can anyone explain the real behaviour of removeContent()? I suspect it has to do with the parent-child relationship.
Sure, removeContent(Content child) removes child if child is one of the parent's immediate children, which it is not in your case. Use el.detach() instead.
If you want to remove the City element, get its parent and call removeContent:
XPath xpath = XPath.newInstance("/*/Country/Region/State/City");
Element el = (Element) xpath.selectSingleNode(doc);
el.getParent().removeContent(el);
The reason why doc.removeContent(el) does not work is because el is not a child of doc.
Check the javadocs for details. There are a number of overloaded removeContent methods there.
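The same parent-child rule applies in the JDK's W3C DOM, which makes it easy to try out without JDOM. The sketch below is an analogue, not JDOM code: the detach helper mimics what JDOM's detach() does by asking the node's actual parent to remove it. The names DetachSketch, parse and detach are made up for this example. One difference to be aware of: per the DOM specification, calling removeChild on a node that is not the real parent raises a DOMException rather than returning false.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DetachSketch {

    /** Parses an XML string; checked exceptions are wrapped to keep callers short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Removes the node from its immediate parent; returns false if it has no parent. */
    static boolean detach(Node node) {
        Node parent = node.getParentNode();
        if (parent == null) {
            return false; // already detached
        }
        parent.removeChild(node); // only the direct parent can remove the node
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse("<Geography><Country><Region><State><City>"
                + "<Name/><Population/></City></State></Region></Country></Geography>");
        Node city = doc.getElementsByTagName("City").item(0);
        System.out.println(detach(city));                                 // prints true
        System.out.println(doc.getElementsByTagName("City").getLength()); // prints 0
    }
}
```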
This works, but keep in mind that getParent() returns a Parent object rather than an Element object, and that the detach() method, which removes the actual node, must be called on an Element.
Instead do:
el.getParentElement().detach();
This will remove the parent element along with all its children!
