XPath expression with where statement - java

I have xml and I want to get, using xpath expression, text from Text node only if Text_2 contains elements. Is that possible? I couldn't find a way.
<List>
<Response>
<Node>
<SomeNode>
<Text>text</Text>
<Text_1>text_1</Text_1>
<Text_2 value_1="some value 1" value_2="some value 2" />
</SomeNode>
</Node>
</Response>
</List>
I tried to get the Text_2 elements using //*[@value_1], but I got stuck and don't have any other ideas.

Your text says "want to get, using xpath expression, text from Text node only if Text_2 contains elements"; with your given sample that would be //SomeNode[Text_2/*]/Text. However, in your sample the Text_2 element doesn't have any child elements, only attributes, so that expression selects nothing there.
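For reference, here is a minimal Java sketch of both readings of the question, using the JDK's javax.xml.xpath API. The class and method names are made up for illustration, and the second expression rests on an assumption: that "contains elements" was meant as "has a value_1 attribute".

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
final class ConditionalTextDemo {

    // The sample document from the question, written on one line.
    static final String XML =
        "<List><Response><Node><SomeNode>"
        + "<Text>text</Text>"
        + "<Text_1>text_1</Text_1>"
        + "<Text_2 value_1=\"some value 1\" value_2=\"some value 2\" />"
        + "</SomeNode></Node></Response></List>";

    // Returns the string value of the first node selected by the expression,
    // or the empty string when nothing is selected.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Text_2 has no child elements in the sample, so nothing is selected.
        System.out.println("[" + eval("//SomeNode[Text_2/*]/Text") + "]");
        // Assumption: the intended condition is "Text_2 has a value_1 attribute".
        System.out.println("[" + eval("//SomeNode[Text_2/@value_1]/Text") + "]");
    }
}
```

The predicate in square brackets plays the role of the "where" clause: it filters SomeNode elements, and the trailing /Text step then picks the child to return.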

Related

XPath returns for one element but doesn't return the other?

I am using Java to extract values using XPath. I was able to extract elements under the element fields but the elements under records are not returned.
XML is as follows:
<?xml version="1.0" ?>
<qdbapi>
<action>****</action>
<errcode>0</errcode>
<errtext>No error</errtext>
<qid>****</qid>
<qname>****</qname>
<table>
<fields>
<field id="19" field_type="text" base_type="text">
</field>
</fields>
<records>
<record>
<f id="6">1</f>
</record>
</records>
</table>
</qdbapi>
Code below:
XMLDOMDocObj.selectNodes("//*[local-name()='fields']") // 21 fields returned
XMLDOMDocObj.selectNodes("//*[local-name()='records']") // no records are returned
XML must have a single root element; yours has two: fields and records.
Wrap them in a single common root to get the results you expect.
Also, if your XML has no namespaces, there's no reason to defeat them. Instead of
//*[local-name()='records']
use
//records
See also
How does XPath deal with XML namespaces?
Why must XML documents have a single root element?
What is the difference between root node, root element and document element in XML?
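As a rough illustration, the sketch below runs the plain //records form against a well-formed, single-root document in Java. It assumes a shortened copy of the sample XML from the question and uses the JDK's javax.xml.xpath API rather than the selectNodes call shown there; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a shortened copy of the question's sample.
final class RecordsDemo {

    static final String XML =
        "<qdbapi><table>"
        + "<fields><field id=\"19\" field_type=\"text\" base_type=\"text\"></field></fields>"
        + "<records><record><f id=\"6\">1</f></record></records>"
        + "</table></qdbapi>";

    // Counts the nodes selected by the expression in the sample document.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)),
                          XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // No namespaces are declared, so plain element names are enough.
        System.out.println(count("//fields/field"));
        System.out.println(count("//records/record"));
        // The local-name() form selects the same element here.
        System.out.println(count("//*[local-name()='records']"));
    }
}
```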

How to get only first level nodes with Jsoup

I have following XML tree and I'm using Jsoup to parse it.
<?xml version="1.0" encoding="UTF-8" ?>
<nodes>
<node>
<name>NODE 1</name>
<value1>
<value1>NODE 1 VALUE 1</value1>
</value1>
<nodes>
<node>
<name>NODE 1 CHILD</name>
<value1>NODE 1 CHILD VALUE 1</value1>
</node>
</nodes>
</node>
<node>
<name>NODE 2</name>
<value1>NODE 2 VALUE 1</value1>
</node>
</nodes>
However, when I try to get only the first level of node elements, it returns all of them, including the child nodes. That is technically correct, because the child elements also match my query.
Elements elements = data.select("nodes > node");
Is there any way to get just first level node-elements without adding additional level information to XML data?
You can do something like this:
Elements elements = data.select("nodes").first().select("> node");
This will work as well:
Elements elements = data.select("> nodes > node");
but only if you've used Jsoup.parse(xml, "", Parser.xmlParser()) to parse the XML and the XML is indeed as you've specified in your question (<nodes> is the root element).

Why is my XPath selecting nothing?

My XML file
<classifications>
<classification sequence="1">
<classification-scheme office="" scheme="CS" />
<section>G</section>
<class>01</class>
<subclass>R</subclass>
<main-group>33</main-group>
<subgroup>365</subgroup>
<classification-value>I</classification-value>
</classification>
<classification sequence="2">
<classification-scheme office="" scheme="CS" />
<section>G</section>
<class>01</class>
<subclass>R</subclass>
<main-group>33</main-group>
<subgroup>3415</subgroup>
<classification-value>A</classification-value>
</classification>
<classification sequence="1">
<classification-scheme office="US" scheme="UC" />
<classification-symbol>324/300</classification-symbol>
</classification>
<classification sequence="2">
<classification-scheme office="US" scheme="UC" />
<classification-symbol>324/307</classification-symbol>
</classification>
</classifications>
I want to extract values under the following condition:
I need all the classification-symbol element values where office="US".
I tried with below XPath,
NodeList usClassification = (NodeList)xPath.compile("//classifications//classification//classification-scheme[@office=\"US\"]//classification-symbol//text()").evaluate(xmlDocument, XPathConstants.NODESET);
but I'm getting an empty result set,
System.out.println(usClassification.getLength()); // prints 0
This XPath (written on two lines to ease readability),
/classifications/classification[classification-scheme/@office='US']
/classification-symbol/text()
will select the classification-symbol text of the classification elements whose classification-scheme has an @office attribute value equal to US:
324/300
324/307
as requested.
classification-symbol is not a child of classification-scheme; they are siblings. Use the following-sibling axis to get from "scheme" to "symbol" instead:
//classifications/classification/classification-scheme[@office=\"US\"]/following-sibling::classification-symbol/text()
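A minimal Java sketch of both suggestions follows, assuming a shortened copy of the XML from the question. The class, constant, and method names are invented for illustration; only the javax.xml.xpath calls come from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a shortened copy of the question's sample.
final class UsClassificationDemo {

    static final String XML =
        "<classifications>"
        + "<classification sequence=\"1\">"
        + "<classification-scheme office=\"\" scheme=\"CS\" />"
        + "<section>G</section>"
        + "</classification>"
        + "<classification sequence=\"1\">"
        + "<classification-scheme office=\"US\" scheme=\"UC\" />"
        + "<classification-symbol>324/300</classification-symbol>"
        + "</classification>"
        + "<classification sequence=\"2\">"
        + "<classification-scheme office=\"US\" scheme=\"UC\" />"
        + "<classification-symbol>324/307</classification-symbol>"
        + "</classification>"
        + "</classifications>";

    // Filter the parent by a predicate on its child, then step to the symbol.
    static final String PREDICATE =
        "/classifications/classification[classification-scheme/@office='US']"
        + "/classification-symbol/text()";

    // Start at the matching scheme element and move sideways to its sibling.
    static final String SIBLING =
        "//classifications/classification/classification-scheme[@office=\"US\"]"
        + "/following-sibling::classification-symbol/text()";

    // Returns the values of all text nodes selected by the expression.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)),
                          XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(PREDICATE)); // [324/300, 324/307]
        System.out.println(select(SIBLING));   // [324/300, 324/307]
    }
}
```

Both expressions select the same two text nodes on this input. The predicate form avoids depending on the order of the sibling elements, whereas following-sibling only matches a symbol that comes after its scheme.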

java xpath text

I'm kind of stuck with XPath and/or Java. I have XML document structures like the following:
<document>
<text>
<headline>This is the text's headline</headline>
This is the text.
</text>
</document>
What I need is this:
<document>
<text>
<headline>This is the text's headline</headline>
<w>This</w>
<w>is</w>
<w>the</w>
<w>text</w>
<w>.</w>
</text>
</document>
How in the world can I change the text content of the text-node while leaving the headline-node untouched?! (I'm using org.jdom2.xpath.)
Bob
Since you need to clear the text content of <text> and setting (empty) text removes all other content, you could clone the <headline> element first and add it back after using setText(). Then continue into your loop and create the <w> elements.
As for XPath, you can select the <headline> element with
/document/text/headline
and the text content of the <text> element with
/document/text/text()
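For comparison, here is a rough sketch of the same transformation using the JDK's built-in W3C DOM instead of JDOM2. Note that this is a different technique from the clone-and-setText approach described above: it replaces only the text-node children in place and never touches <headline>. The class name, the method name, and the tokenization rule (runs of word characters, or single punctuation marks) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class using W3C DOM, not the JDOM2 API from the question.
final class WordWrapDemo {

    static final String XML =
        "<document><text><headline>This is the text's headline</headline>"
        + "This is the text.</text></document>";

    // Assumed tokenization: a run of word characters, or one punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Replaces each text-node child of <text> with <w> elements and returns
    // the resulting children as "name=content" strings for inspection.
    static List<String> wrapWords(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element text = (Element) doc.getElementsByTagName("text").item(0);

            // Snapshot the children first, because the loop edits the child list.
            List<Node> children = new ArrayList<>();
            for (Node n = text.getFirstChild(); n != null; n = n.getNextSibling()) {
                children.add(n);
            }

            for (Node child : children) {
                if (child.getNodeType() != Node.TEXT_NODE) {
                    continue; // element children such as <headline> stay untouched
                }
                Matcher m = TOKEN.matcher(child.getNodeValue());
                while (m.find()) {
                    Element w = doc.createElement("w");
                    w.setTextContent(m.group());
                    text.insertBefore(w, child); // keeps document order
                }
                text.removeChild(child);
            }

            List<String> result = new ArrayList<>();
            for (Node n = text.getFirstChild(); n != null; n = n.getNextSibling()) {
                result.add(n.getNodeName() + "=" + n.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapWords(XML));
    }
}
```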

Using Both Tagged And Untagged Data With XPath

I'm trying to parse some HTML using XPath in Java. Consider this HTML:
<td class="postbody">
<img src="..."><br />
<br />
<b>What is Blah?</b><br />
<br />
Blah blah blah
<br />
Note that "What Is Blah" is helpfully contained within a b tag and is therefore easily parseable. But "Blah blah blah" is out in the open, and so I can only pick it up by calling text() on its parent node.
Thing is, I need to go through this in sequence, putting the img down, then the bolded text, then the body text. It's important it ends up in order (it needn't be processed in order, if you can suggest a way that takes two passes).
So are there any suggestions for how, if I've got the above contained within a Java XPath node, I can go through it in turn and get what I need?
I think a SAX-based parser would be a better tool for this problem. It's event-based, so you can process your XML document in order.
But it's an XML parser, so you'll need a valid XML document. I have never used JTidy, but it's a Java port of HTML Tidy, so hopefully it can help you transform your (invalid) HTML documents into valid XML.
Use this XPath expression evaluated with the parent of the provided XML fragment as the context node:
node()
This selects every node that is a child of the context node: every element child, every text-node child, every comment child and every PI (processing instruction) child.
In case you want to exclude comments and PIs, use:
node()[not(self::comment() or self::processing-instruction())]
If, in addition, you don't want to select whitespace-only text nodes, use:
node()
[not(self::comment() or self::processing-instruction())]
[not(self::text()[string-length(normalize-space()) = 0])]
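A small Java sketch of walking the children in document order with such an expression is shown below. It assumes the HTML has already been converted to well-formed XML (the src value a.png, the self-closed tags, and the added comment are placeholders for the demo), and the class and method names are hypothetical. Whitespace-only text nodes are filtered with not(normalize-space()).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the input is an assumed well-formed version of the HTML.
final class ContentOrderDemo {

    static final String XML =
        "<td class=\"postbody\"><!-- note --><img src=\"a.png\"/><br/>\n"
        + "<b>What is Blah?</b><br/>\n"
        + "Blah blah blah\n<br/></td>";

    // Child nodes, minus comments, PIs and whitespace-only text nodes.
    static final String EXPR =
        "node()[not(self::comment() or self::processing-instruction())]"
        + "[not(self::text()[not(normalize-space())])]";

    // Lists the selected children of the root element in document order.
    static List<String> childrenInOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // The root element (td) is the context node for the relative expression.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                if (n.getNodeType() == Node.TEXT_NODE) {
                    out.add("text:" + n.getNodeValue().trim());
                } else {
                    out.add(n.getNodeName() + ":" + n.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        // [img:, br:, b:What is Blah?, br:, text:Blah blah blah, br:]
        System.out.println(childrenInOrder(XML));
    }
}
```

Because tagged and untagged content come back in one ordered list, the image, the bolded text and the body text can be handled in a single pass by switching on the node type.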
