Searching part of a String using XPath query - java

I am using the following XPath query to search the name of the author of a book and return the book name when it matches the author.
String rawXPath = String.format("//book[author= '%s']/bookname/text()", authorBook);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile(rawXPath);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
How can I modify the search so that it looks inside the content node (.../content) for a specific word?
Example: the content element in the XML holds the string "This book contains the glory and history of our forefathers. And the impact of it in our daily life is immense."
Now I want to search for the word "daily". If it matches, the query should return the book name or the author name, whichever the user wants.
Thanks

Use the contains() XPath function:
//book[contains(content, '%s')]/bookname
The exact expression depends a bit on the structure of your input XML.

You want:
//book[contains(content, $searchString)
and
author = $wantedAuthor
]
/bookname/text()
This selects the text() nodes that are children of the bookname child of any book element in the document whose content child has a string value containing $searchString and whose author child has a string value equal to $wantedAuthor.
In this XPath expression the variables need to be replaced with specific strings. It also assumes that the content element is a child of book.
I don't know Java, but I suppose the final Java code will look like the following:
String rawXPath = String.format(
    "//book[contains(content, '%s') and author = '%s']/bookname/text()",
    searchString, authorBook);
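To make the combined predicate concrete, here is a minimal runnable sketch. The sample document, the class name BookSearch and the method findBookNames are assumptions made for illustration; only the element names (book, author, content, bookname) come from the question. It uses plain String.format substitution as in the question, so it will break if either argument contains a single quote.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookSearch {
    // Hypothetical sample data; only the element names come from the question.
    static final String SAMPLE_XML =
        "<library>"
        + "<book><author>Jane Roe</author><bookname>Our Forefathers</bookname>"
        + "<content>The impact of it in our daily life is immense.</content></book>"
        + "<book><author>Jane Roe</author><bookname>Other Book</bookname>"
        + "<content>Nothing relevant here.</content></book>"
        + "</library>";

    // Returns the names of all books by the given author whose content contains the word.
    static List<String> findBookNames(String xml, String word, String author) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Plain substitution: breaks if word or author contains a single quote.
            String rawXPath = String.format(
                "//book[contains(content, '%s') and author = '%s']/bookname/text()",
                word, author);
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(rawXPath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findBookNames(SAMPLE_XML, "daily", "Jane Roe"));
    }
}
```

To return the author instead of the book name, the last step of the expression would change from bookname/text() to author/text().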

If your XML looks like this:
<book>
<author> Author Name </author>
<description> This book contains the glory and history of our forefathers. And the impact of it in our daily life is immense.</description>
</book>
then you can try:
String rawXPath = "//book//*[contains(text(),\"daily\")]";
which means "give me any element below a book element whose first text node contains the word 'daily'".
I hope that helps you out! Cheers!
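One caveat with contains(text(), ...): in XPath 1.0, which the standard javax.xml.xpath API implements, a node-set passed to contains() is converted using only its first node. The sketch below compares it with contains(., ...) on mixed content. The sample XML with the nested em element and the names TextVersusDot and countMatches are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextVersusDot {
    // Hypothetical sample: the description has mixed content, so it owns two text nodes.
    static final String SAMPLE_XML =
        "<book><author>Author Name</author>"
        + "<description>The impact <em>today</em> on our daily life is immense.</description>"
        + "</book>";

    // Counts the nodes selected by the given XPath expression.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Only the first text node ("The impact ") is tested here.
        System.out.println(countMatches(SAMPLE_XML, "//book//*[contains(text(), 'daily')]"));
        // The whole string value of each element is tested here.
        System.out.println(countMatches(SAMPLE_XML, "//book//*[contains(., 'daily')]"));
    }
}
```

With the simpler XML shown in the answer, where description holds a single text node, both expressions select the same element.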

Now, if you also want to search for a term inside a nested node, you can do it like this:
String rawXPath = String.format("//book/title[contains(innerItem, '%s')]/type/text()", value);
Add one forward slash plus the node name for each level of nesting and you're in.

Related

Retrive xml info using java

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss"><title>USGS All Earthquakes, Past Week</title><updated>2020-02-26T21:28:38Z</updated><author><name>U.S. Geological Survey</name><uri>https://earthquake.usgs.gov/</uri></author><id>https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom</id><link rel="self" href="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom"/><icon>https://earthquake.usgs.gov/favicon.ico</icon>
<entry><id>urn:earthquake-usgs-gov:ci:39084495</id><title>M 0.6 - 13km WNW of Anza, CA</title><updated>2020-02-26T21:23:25.292Z</updated><link rel="alternate" type="text/html" href="https://earthquake.usgs.gov/earthquakes/eventpage/ci39084495"/><summary type="html"><![CDATA[<dl><dt>Time</dt><dd>2020-02-26 21:19:49 UTC</dd><dd>2020-02-26 13:19:49 -08:00 at epicenter</dd><dt>Location</dt><dd>33.602°N 116.804°W</dd><dt>Depth</dt><dd>4.62 km (2.87 mi)</dd></dl>]]></summary><georss:point>33.6016667 -116.8035</georss:point><georss:elev>-4620</georss:elev><category label="Age" term="Past Hour"/><category label="Magnitude" term="Magnitude 0"/><category label="Contributor" term="ci"/><category label="Author" term="ci"/></entry></feed>
I'm trying to extract information from this XML. I managed to do it, but I'm not sure how it works; more precisely, I don't understand why the value seems to be so deep in the nodes. The code I'm using is as follows:
builder = fac.newDocumentBuilder();
Document doc = builder.parse(source);
NodeList nodeList = doc.getDocumentElement().getChildNodes();
for(int i=0;i < nodeList.getLength();i++){
Node node = nodeList.item(i);
if (node.getNodeName().equals("entry")){
Element element = (Element) node;
String nl1 = element.getElementsByTagName("georss:point").item(0).getChildNodes().item(0).getNodeValue();
}
}
I had expected that after getting the element by its tag I would be able to get the value right away, but instead I have to go two levels deeper. Can anyone explain why?
EDIT: typo
You are asking why the value seems to be so deeply nested. I will answer that first and then suggest three other approaches you could also consider for working with that XML in Java.
Why is it so nested
Element.getElementsByTagName() returns a NodeList which contains 0 to n elements. You know that there is only one, but the XML engine does not. So you need to say: give me the first element of that list (.item(0)).
Now you have the point element, but you want its content. In the DOM, the text inside an element is itself a child node (a Text node), and there could be multiple children (such as other tags), so again you need to say which one you want.
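The two levels can be made visible with a small sketch. The trimmed sample entry and the names NestingDemo and firstPoint are assumptions made for illustration; the DOM calls are the ones from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NestingDemo {
    // Hypothetical, trimmed-down version of one entry from the feed.
    static final String SAMPLE_XML =
        "<entry xmlns:georss=\"http://www.georss.org/georss\">"
        + "<georss:point>33.6016667 -116.8035</georss:point></entry>";

    // Level 1: pick the first match out of the NodeList.
    static Element firstPoint(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("georss:point").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Element point = firstPoint(SAMPLE_XML);
        // An element node has no node value of its own.
        System.out.println(point.getNodeValue());
        // Level 2: the text lives in a child Text node.
        Node text = point.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE);
        System.out.println(text.getNodeValue());
    }
}
```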
Alternative approaches
Access the content as text content: element.getElementsByTagName("georss:point").item(0).getTextContent();
Using Xpath:
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//*[local-name()='point']");
String evaluate = expr.evaluate(doc);
System.out.println(evaluate);
Kindly note that the XPath //*[local-name()='point'] is not the best way. You should work with the namespaces, which requires some more code.
Use XML-to-bean mapping, for example with JAXB
You should really consider this approach, as it basically allows you to have POJOs holding all the content of the XML (like the age, magnitude, ...) and simply have the XML transformed into an object like this:
public class Entry{
private String id;
private String title;
...
private String point;
private String age;
private String magnitude;
...
I hope this helps.
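For the namespace-aware XPath variant mentioned above, a rough sketch could look like the following. The prefix names atom and georss used in the expression, the trimmed sample feed, and the class names NamespacedPoint and FeedNamespaces are assumptions made for illustration; the namespace URIs are the ones declared in the feed.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespacedPoint {
    // Hypothetical, trimmed-down version of the feed from the question.
    static final String SAMPLE_XML =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\""
        + " xmlns:georss=\"http://www.georss.org/georss\">"
        + "<entry><title>M 0.6 - 13km WNW of Anza, CA</title>"
        + "<georss:point>33.6016667 -116.8035</georss:point></entry></feed>";

    // Minimal prefix-to-URI mapping; only getNamespaceURI is filled in for this sketch.
    static class FeedNamespaces implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("atom".equals(prefix)) return "http://www.w3.org/2005/Atom";
            if ("georss".equals(prefix)) return "http://www.georss.org/georss";
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    }

    // Returns the text of the first georss:point inside an Atom entry.
    static String firstPoint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new FeedNamespaces());
            return xpath.evaluate("/atom:feed/atom:entry/georss:point", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPoint(SAMPLE_XML));
    }
}
```

The prefixes in the expression only have to match the NamespaceContext, not the prefixes used in the document, which is why the default Atom namespace can be addressed as atom here.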

Find a Element by attribute in a XML Document in Java and get value of another attribute in the same element

I was tasked with converting some of my Python code to Java.
In the original there are a lot of operations like this:
name = element.find('*/DIAttribute[@name="ui_display_name"]').attrib['value']
Where element is a lxml.etree.Element object.
In Java I'm doing this to get the same value:
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodesName = (NodeList) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
if (nodesName.getLength() > 0) {
Node node = nodesName.item(0);
name = node.getAttributes().getNamedItem("value");
}
Am I doing it right? Is there a better way of doing this? I'm using the org.w3c.dom objects, and the powers that be forbid me from using other XML libraries.
Thanks!
Passing XPathConstants.NODE does not cause evaluate to return a NodeList; it causes the method to return a single Node. The class documentation describes exactly what type of object will be returned for each XPathConstants field.
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
Attributes are document nodes too, so you can simplify the code into a single XPath expression:
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element, XPathConstants.NODE);
String name = node.getNodeValue();
Since you just want the string value of the node, you can use the two-argument evaluate method instead, omitting the XPathConstants value:
String name = xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element);
That will only find DIAttribute elements which are direct children of element. If you want to search all DIAttribute descendants at all levels below element, use .// in your XPath expression:
String name = xPath.evaluate(".//DIAttribute[@name='ui_display_name']/@value", element);
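The difference between the child-only and the descendant form can be checked with a small sketch. The sample XML (an item element with a group wrapper) and the names AttributeLookup and attributeValue are assumptions made for illustration; the expressions are the ones discussed above, plus the */DIAttribute form from the original lxml call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeLookup {
    // Hypothetical sample: the DIAttribute elements sit one level below the context element.
    static final String SAMPLE_XML =
        "<item><group>"
        + "<DIAttribute name=\"ui_display_name\" value=\"Widget\"/>"
        + "<DIAttribute name=\"color\" value=\"blue\"/>"
        + "</group></item>";

    // Evaluates the expression relative to the document element and returns its string value.
    static String attributeValue(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element element = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Two-argument evaluate: the result is converted to a String,
            // which is empty when nothing matches.
            return xpath.evaluate(expression, element);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Direct children only: no match in this sample.
        System.out.println(attributeValue(SAMPLE_XML, "DIAttribute[@name='ui_display_name']/@value"));
        // Grandchildren, like the lxml expression in the question.
        System.out.println(attributeValue(SAMPLE_XML, "*/DIAttribute[@name='ui_display_name']/@value"));
        // Any depth below the context element.
        System.out.println(attributeValue(SAMPLE_XML, ".//DIAttribute[@name='ui_display_name']/@value"));
    }
}
```

Because the two-argument evaluate returns an empty string rather than null when nothing matches, a caller that needs to tell "missing" from "empty value" would still want the XPathConstants.NODE form.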

How to distinguish between attribute and element nodes returned from a Saxon XPathSelector

Given the XML:
<root name="value">
<level1>
<level2>Text</level2>
</level1>
</root>
I want the XPath /root/@name to return value, and the XPath /root/level1 to return the XML serialisation of the <level1> node:
<level1>
<level2>Text</level2>
</level1>
I'm using the s9api interface from Saxon 9.6 in Java.
I've found that I can call XdmValue.toString() to get the XML serialisation of the result of the evaluation of the XPath, which gets me the desired result for selecting an element, but returns name="value" when selecting an attribute. And I can call XdmItem.getStringValue() to get the string value, which gets me the right value for the attribute, but returns the textual content of the element.
Michael Kay has previously said "Saxon's s9api interface ... returns XdmValue objects whose type you can interrogate". I can see that I could perform an instanceof check to determine whether it is an XdmAtomicValue, XdmExternalObject, XdmFunctionItem, or XdmNode, but elements and attributes are both instances of XdmNode. How do I distinguish between the two?
(I can't modify the XPaths, as they're provided by the user.)
I discovered the answer just as I finished writing the question, so I'll share it for others.
After casting the XdmItem to an XdmNode, you can call XdmNode.getNodeKind(), which returns a value from the XdmNodeKind enumeration specifying which type of node it is:
XdmValue matchList = xPathSelector.evaluate();
XdmItem firstItem = matchList.itemAt(0);
if (firstItem instanceof XdmNode) {
XdmNode xdmNode = (XdmNode) firstItem;
XdmNodeKind nodeKind = xdmNode.getNodeKind();
if (nodeKind == XdmNodeKind.ELEMENT) {
return xdmNode.toString();
}
}
return firstItem.getStringValue();

Extract Java element based on its corresponding XML element

I have an XML file generated from an input Java file. I also have XPath expressions for the XML file.
I need a function that receives one XPath expression and returns its Java element (in the abstract syntax tree). I tried the code below:
First, extract the XML element based on the input XPath expression.
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//unit[1]/unit[1]/class[1]/block[1]/function[6]"; //a method
Node node = (Node) xPath.compile(query).evaluate(XmlDocument, XPathConstants.NODE);
However, I do not know how to link the extracted XML node to the Java element in the source code.
PS:
The result should be a node in the abstract syntax tree. I have an AST created by Spoon. Therefore, in the above example, I want to extract the related CtMethodImpl.
node.getTextContent() is not the answer, as there may be more than one instance with similar text content.
To the best of my knowledge there is no 'direct' way of doing this.
This: "//unit[1]/unit[1]/class[1]/block[1]/function[6]" is what we call a signature in the sense that it uniquely identifies an element (somehow).
What I would do is to create a spoon processor and go through the entire AST checking each element to see if it matches the signature.
public class ProcessorExample <E extends CtElement> extends AbstractProcessor<E> {
HashMap<String, Node> nodes;
//Sets your XML Nodes here, sorted by signature
public void setNodes(HashMap<String, Node> nodes) {
this.nodes = nodes;
}
@Override
public void process(E element) {
if (nodes.containsKey(signature(element))) {
Node n = nodes.get(signature(element));
//YOU FOUND IT!
}
}
private String signature(E element) {
//YOU MUST PROVIDE THIS IMPLEMENTATION
//TO MATCH YOUR "//unit[1]/unit[1]/class[1]/block[1]/function[6]"
//KIND OF SIGNATURE
return null;
}
}

Sending XPath a variable from Java

I have an XPath expression that searches for a static value. In this example, "test" is that value:
XPathExpression expr = xpath.compile("//doc[contains(., 'test')]/*/text()");
How can I pass a variable instead of a fixed string? I use Java with Eclipse. Is there a way to use the value of a Java String to declare an XPath variable?
You can define a variable resolver and have the evaluation of the expression resolve variables such as $myvar, for example:
XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");
There's a fairly good explanation here. I haven't actually done this before myself, so I might have a go and provide a more complete example.
Update:
I gave this a go and it works a treat. For an example of a very simple implementation, you could define a class that returns the value for a given variable from a map, like this:
class MapVariableResolver implements XPathVariableResolver {
// local store of variable name -> variable value mappings
Map<String, String> variableMappings = new HashMap<String, String>();
// a way of setting new variable mappings
public void setVariable(String key, String value) {
variableMappings.put(key, value);
}
// override this method in XPathVariableResolver to
// be used during evaluation of the XPath expression
@Override
public Object resolveVariable(QName varName) {
// if using namespaces, there's more to do here
String key = varName.getLocalPart();
return variableMappings.get(key);
}
}
Now, declare and initialise an instance of this resolver in the program, for example
MapVariableResolver vr = new MapVariableResolver() ;
vr.setVariable("myVar", "text");
...
XPath xpath = factory.newXPath();
xpath.setXPathVariableResolver(vr);
Then, during evaluation of the XPath expression XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");, the variable $myVar will be replaced with the string text.
Nice question, I learnt something useful myself!
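Putting the pieces together, a complete run could look like the sketch below. The sample document and the names VariableDemo and search are assumptions made for illustration. The resolver is set before the expression is evaluated, and because the value is bound as a variable rather than pasted into the expression text, a search string containing a single quote needs no escaping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class VariableDemo {
    // Hypothetical sample document.
    static final String SAMPLE_XML =
        "<root>"
        + "<doc><title>it's a test</title></doc>"
        + "<doc><title>something else</title></doc>"
        + "</root>";

    // Returns the text of the child elements of every doc whose content contains the needle.
    static List<String> search(String xml, String needle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> variables = new HashMap<>();
            variables.put("myVar", needle);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Set the resolver before the expression is compiled or evaluated.
            xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
            NodeList nodes = (NodeList) xpath.evaluate(
                "//doc[contains(., $myVar)]/*/text()", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getNodeValue());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(search(SAMPLE_XML, "test"));
        System.out.println(search(SAMPLE_XML, "it's"));
    }
}
```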
You don't need to evaluate Java (or any other programming language) variables in XPath. In C# (I don't know Java well) I would use:
string XPathExpression =
"//doc[contains(., '" + myVar.ToString() + "')]/*/text()";
XmlNodeList result = xmlDoc.SelectNodes(XPathExpression);
Apart from this answer, which explains well how to do it with the standard Java API, you could also use a third-party library like jOOX that can handle variables in a simple way:
List<String> list = $(doc).xpath("//doc[contains(., $1)]/*", "test").texts();
I use something similar to @brabster:
// expression: "/message/PINConfiguration/pinValue[../keyReference=$keyReference]";
Optional<Node> getNode(String xpathExpression, Map<String, String> variablesMap)
throws XPathExpressionException {
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setXPathVariableResolver(qname -> variablesMap.get(qname.getLocalPart()));
return Optional.ofNullable((Node) xpath.evaluate(xpathExpression, document,
XPathConstants.NODE));
}
Optional<Node> getNode(String xpathExpression) throws XPathExpressionException {
return getNode(xpathExpression, Collections.emptyMap());
}
