Retrieve XML info using Java

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss"><title>USGS All Earthquakes, Past Week</title><updated>2020-02-26T21:28:38Z</updated><author><name>U.S. Geological Survey</name><uri>https://earthquake.usgs.gov/</uri></author><id>https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom</id><link rel="self" href="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom"/><icon>https://earthquake.usgs.gov/favicon.ico</icon>
<entry><id>urn:earthquake-usgs-gov:ci:39084495</id><title>M 0.6 - 13km WNW of Anza, CA</title><updated>2020-02-26T21:23:25.292Z</updated><link rel="alternate" type="text/html" href="https://earthquake.usgs.gov/earthquakes/eventpage/ci39084495"/><summary type="html"><![CDATA[<dl><dt>Time</dt><dd>2020-02-26 21:19:49 UTC</dd><dd>2020-02-26 13:19:49 -08:00 at epicenter</dd><dt>Location</dt><dd>33.602°N 116.804°W</dd><dt>Depth</dt><dd>4.62 km (2.87 mi)</dd></dl>]]></summary><georss:point>33.6016667 -116.8035</georss:point><georss:elev>-4620</georss:elev><category label="Age" term="Past Hour"/><category label="Magnitude" term="Magnitude 0"/><category label="Contributor" term="ci"/><category label="Author" term="ci"/></entry></feed>
I'm trying to extract information from this XML. I managed to do it, but I'm not sure how it works; more precisely, I don't understand why the value seems to be so deep in the nodes. The code I'm using is as follows:
builder = fac.newDocumentBuilder();
Document doc = builder.parse(source);
NodeList nodeList = doc.getDocumentElement().getChildNodes();
for(int i=0;i < nodeList.getLength();i++){
Node node = nodeList.item(i);
if (node.getNodeName().equals("entry")){
Element element = (Element) node;
String nl1 = element.getElementsByTagName("georss:point").item(0).getChildNodes().item(0).getNodeValue();
}
}
I had expected that after getting the element by its tag I would be able to get the value right away, but instead I have to go two levels deeper. Can anyone explain why?
EDIT: typo

You are asking why the value seems to be so deeply nested. I will answer that first and then suggest three other approaches you could consider for working with this XML in Java.
Why is it so nested
Element.getElementsByTagName() returns a NodeList, which contains 0 to n elements. You know that there is only one, but the XML engine does not, so you have to say: give me the first element of that list (.item(0)).
Now you have the point element, but you want its content. In the DOM, the text inside an element is itself a child node (a text node), and there could be multiple children (such as other tags), so again you have to say which child you want.
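The two levels described above can be made visible with a small sketch. It assumes a trimmed-down stand-in for the feed (one entry containing only georss:point); the class name PointNesting and the helper firstPoint() are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class PointNesting {
    // Hypothetical, trimmed-down stand-in for the feed in the question.
    static final String XML =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\""
        + " xmlns:georss=\"http://www.georss.org/georss\">"
        + "<entry><georss:point>33.6016667 -116.8035</georss:point></entry>"
        + "</feed>";

    // Returns the first georss:point element of the sample document.
    static Element firstPoint() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            // Level 1: the lookup yields a NodeList, even though only one element matches.
            return (Element) doc.getElementsByTagName("georss:point").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element point = firstPoint();
        // Level 2: the text is a child node of the element, not the element's own value.
        Node text = point.getFirstChild();
        System.out.println(point.getNodeValue());                 // element nodes carry no value of their own
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // the first child is a text node
        System.out.println(text.getNodeValue());                  // the coordinates
        System.out.println(point.getTextContent());               // shortcut: all text below the element
    }
}
```

The element itself has no node value; only its text child does, which is why the original code needs the extra getChildNodes().item(0) step.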
Alternative approaches
Access the content as text content: element.getElementsByTagName("georss:point").item(0).getTextContent();
Using XPath:
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//*[local-name()='point']");
String evaluate = expr.evaluate(doc);
System.out.println(evaluate);
Kindly note that the XPath //*[local-name()='point'] is not the best way, because it ignores the namespace. You should work with the namespaces, which requires some more code.
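As a rough sketch of what "working with the namespaces" could look like: the parser is made namespace-aware and the XPath object gets a NamespaceContext that maps prefixes to the two URIs declared in the feed. The prefix atom, the class name GeorssXPath and the shortened sample XML are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class GeorssXPath {
    // Hypothetical, shortened version of the feed from the question.
    static final String XML =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\""
        + " xmlns:georss=\"http://www.georss.org/georss\">"
        + "<entry><georss:point>33.6016667 -116.8035</georss:point></entry>"
        + "</feed>";

    // Returns the string value of the first entry's georss:point, or "" if there is none.
    static String firstPoint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-based matching
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    // "atom" is an arbitrary prefix chosen here for the feed's default namespace.
                    if ("atom".equals(prefix)) return "http://www.w3.org/2005/Atom";
                    if ("georss".equals(prefix)) return "http://www.georss.org/georss";
                    return XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            return xpath.evaluate("/atom:feed/atom:entry[1]/georss:point", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPoint(XML));
    }
}
```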
Use XML-to-bean mapping, for example with JAXB
You should really consider this approach, as it lets you have POJOs holding all the content of the XML (such as the age, magnitude, ...) and have the XML transformed into an object like this:
public class Entry{
private String id;
private String title;
...
private String point;
private String age;
private String magnitude;
...
I hope this helps.

Related

Find an Element by attribute in an XML Document in Java and get the value of another attribute in the same element

I was tasked with converting some of my Python code to Java.
In the original there are a lot of operations like this:
name = element.find('*/DIAttribute[@name="ui_display_name"]').attrib['value']
Where element is a lxml.etree.Element object.
In Java I'm doing this to get the same value:
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodesName = (NodeList) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
if (nodesName.getLength() > 0) {
Node node = nodesName.item(0);
name = node.getAttributes().getNamedItem("value");
}
Am I doing it right? Is there a better way of doing this? I'm using the org.w3c.dom objects, and the powers that be forbid me from using other XML libraries.
Thanks!

Passing XPathConstants.NODE does not cause evaluate to return a NodeList, it causes the method to return a single Node. The class documentation describes exactly what type of object will be returned for each XPathConstant field.
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
Attributes are document nodes too, so you can simplify the code into a single XPath expression:
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element, XPathConstants.NODE);
String name = node.getNodeValue();
Since you just want the string value of the node, you can use the two-argument evaluate method instead, omitting the XPathConstants value:
String name = xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element);
That will only find DIAttribute elements which are direct children of element. If you want to search all DIAttribute descendants at all levels below element, use .// in your XPath expression:
String name = xPath.evaluate(".//DIAttribute[@name='ui_display_name']/@value", element);
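To see how the different expressions behave, here is a minimal sketch. The sample XML (a DIAttribute nested one level below the context element, mirroring the */DIAttribute path from the Python code), the value "Pump A" and the class name DiAttributeLookup are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;

class DiAttributeLookup {
    // Hypothetical sample: the DIAttribute elements are grandchildren of <item>.
    static final String XML =
        "<item><group>"
        + "<DIAttribute name=\"ui_display_name\" value=\"Pump A\"/>"
        + "<DIAttribute name=\"other\" value=\"x\"/>"
        + "</group></item>";

    // Evaluates the expression relative to the <item> element and returns its string value.
    static String lookup(String expression) {
        try {
            Element element = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Two-argument evaluate: returns "" when nothing matches.
            return xPath.evaluate(expression, element);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Direct children only: no match here, so the result is an empty string.
        System.out.println("[" + lookup("DIAttribute[@name='ui_display_name']/@value") + "]");
        // Exactly one level down, like the original Python path.
        System.out.println(lookup("*/DIAttribute[@name='ui_display_name']/@value"));
        // Any depth below the context element.
        System.out.println(lookup(".//DIAttribute[@name='ui_display_name']/@value"));
    }
}
```

Because the two-argument evaluate returns an empty string rather than null when nothing matches, callers that need to tell "missing" from "empty value" may prefer the XPathConstants.NODE variant and a null check.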

How to distinguish between attribute and element nodes returned from a Saxon XPathSelector

Given the XML:
<root name="value">
<level1>
<level2>Text</level2>
</level1>
</root>
I want the XPath /root/@name to return value, and the XPath /root/level1 to return the XML serialisation of the <level1> node:
<level1>
<level2>Text</level2>
</level1>
I'm using the s9api interface from Saxon 9.6 in Java.
I've found that I can call XdmValue.toString() to get the XML serialisation of the result of the evaluation of the XPath, which gets me the desired result for selecting an element, but returns name="value" when selecting an attribute. And I can call XdmItem.getStringValue() to get the string value, which gets me the right value for the attribute, but returns the textual content of the element.
Michael Kay has previously said "Saxon's s9api interface ... returns XdmValue objects whose type you can interrogate". I can see that I could perform an instanceof check to determine whether it is an XdmAtomicValue, XdmExternalObject, XdmFunctionItem, or XdmNode, but elements and attributes are both instances of XdmNode. How do I distinguish between the two?
(I can't modify the XPaths, as they're provided by the user.)

I discovered the answer just as I finished writing the question, so I'll share it for others.
After casting the XdmItem to an XdmNode, you can call XdmNode.getNodeKind(), which returns a value from the XdmNodeKind enumeration specifying which type of node it is:
XdmValue matchList = xPathSelector.evaluate();
XdmItem firstItem = matchList.itemAt(0);
if (firstItem instanceof XdmNode) {
XdmNode xdmNode = (XdmNode) firstItem;
XdmNodeKind nodeKind = xdmNode.getNodeKind();
if (nodeKind == XdmNodeKind.ELEMENT) {
return xdmNode.toString();
}
}
return firstItem.getStringValue();

Extract Java element based on its corresponding XML element

I have an XML file generated from an input Java file. I also have XPath expressions for the XML file.
I need a function that receives one XPath expression and returns the corresponding Java element (in the abstract syntax tree). I tried the code below:
First, extract the XML element based on the input XPath expression.
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//unit[1]/unit[1]/class[1]/block[1]/function[6]"; //a method
Node node = (Node) xPath.compile(query).evaluate(XmlDocument, XPathConstants.NODE);
However, I do not know how to link the extracted XML node to the Java element in the source code.
PS:
The result should be a node in the abstract syntax tree. I have an AST created by Spoon. Therefore, in the above example, I want to extract the related CtMethodImpl.
node.getTextContent() is not the answer, as there may be more than one instance with similar text content.

To the best of my knowledge there is no 'direct' way of doing this.
This: "//unit[1]/unit[1]/class[1]/block[1]/function[6]" is what we call a signature in the sense that it uniquely identifies an element (somehow).
What I would do is to create a spoon processor and go through the entire AST checking each element to see if it matches the signature.
public class ProcessorExample <E extends CtElement> extends AbstractProcessor<E> {
HashMap<String, Node> nodes;
//Sets your XML Nodes here, sorted by signature
public void setNodes(HashMap<String, Node> nodes) {
this.nodes = nodes;
}
@Override
public void process(E element) {
if (nodes.containsKey(signature(element))) {
Node n = nodes.get(signature(element));
//YOU FOUND IT!
}
}
private String signature(E element) {
//YOU MUST PROVIDE THIS IMPLEMENTATION
//TO MATCH YOUR "//unit[1]/unit[1]/class[1]/block[1]/function[6]"
//KIND OF SIGNATURE
return null;
}
}

How to query XML using namespaces in Java with XPath?

When my XML looks like this (no xmlns) then I can easily query it with XPath like /workbook/sheets/sheet[1]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
<sheets>
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
</sheets>
</workbook>
But when it looks like this then I can't
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheets>
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
</sheets>
</workbook>
Any ideas?

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.
The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.
However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.
You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:
/*[local-name()='workbook'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheets'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheet'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]
As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
You could also just match on the local-name() of the element and ignore the namespace. For example:
/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]
However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue for this instance) that use the same local-name(), your XPath could match on the wrong elements and select the wrong content.
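The risk can be illustrated with a small sketch. The document below is invented for the purpose: it mixes two made-up vocabularies (urn:example:spreadsheet and urn:example:music) that both define a sheet element. The class name MixedVocabularies is likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class MixedVocabularies {
    // Hypothetical document: two namespaces share the local name "sheet".
    static final String XML =
        "<doc xmlns:s=\"urn:example:spreadsheet\" xmlns:m=\"urn:example:music\">"
        + "<s:sheet>Sheet1</s:sheet>"
        + "<m:sheet>Sonata</m:sheet>"
        + "</doc>";

    // Counts the nodes selected by the given XPath expression.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() and namespace-uri() are populated
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matches both sheet elements, including the unrelated one.
        System.out.println(count("/*/*[local-name()='sheet']"));
        // Restricting by namespace URI leaves only the intended element.
        System.out.println(count("/*/*[local-name()='sheet' and namespace-uri()='urn:example:spreadsheet']"));
    }
}
```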

Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html
One of the conclusions they draw is:
So, to be able to use XPath
expressions on XML content defined in
a (default) namespace, we need to
specify a namespace prefix mapping
Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your java code and use said prefix in your XPath expression. Here, we'll create a mapping from spreadsheet to your default namespace.
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// there's no default implementation for NamespaceContext...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Null prefix");
else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
return XMLConstants.NULL_NS_URI;
}
// This method isn't necessary for XPath processing.
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
// This method isn't necessary for XPath processing either.
public Iterator getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
});
// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");
// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);
And voila...Now you've got your element saved in the result variable.
Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!

All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, there is no implementation of NamespaceContext provided in the SDK.
Fortunately, it's very easy to write your own:
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;
public class SimpleNamespaceContext implements NamespaceContext {
private final Map<String, String> PREF_MAP = new HashMap<String, String>();
public SimpleNamespaceContext(final Map<String, String> prefMap) {
PREF_MAP.putAll(prefMap);
}
public String getNamespaceURI(String prefix) {
return PREF_MAP.get(prefix);
}
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
public Iterator getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
}
Use it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
HashMap<String, String> prefMap = new HashMap<String, String>() {{
put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
}};
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath
.compile("/main:workbook/main:sheets/main:sheet[1]");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:
/main:workbook/main:sheets/main:sheet[1]
The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.

If you are using Spring, it already contains org.springframework.util.xml.SimpleNamespaceContext.
import org.springframework.util.xml.SimpleNamespaceContext;
...
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
SimpleNamespaceContext nsc = new SimpleNamespaceContext();
nsc.bindNamespaceUri("a", "http://some.namespace.com/nsContext");
xpath.setNamespaceContext(nsc);
XPathExpression xpathExpr = xpath.compile("//a:first/a:second");
String result = (String) xpathExpr.evaluate(object, XPathConstants.STRING);

I've written a simple NamespaceContext implementation (here), that takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.
It follows the NamespaceContext specification, and you can see how it works in the unit tests.
Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");
context = new SimpleNamespaceContext(mappings);
context.getNamespaceURI("foo"); // "http://foo"
context.getPrefix("http://foo"); // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]
Note that it has a dependency on Google Guava

Make sure that you are referencing the namespace in your XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" >

Startlingly, if I don't set factory.setNamespaceAware(true); then the xpath you mentioned does work with and without namespaces at play. You just aren't able to select things "with namespace specified" only generic xpaths. Go figure. So this may be an option:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false);
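A small sketch to compare both settings side by side, assuming the JDK's built-in parser and XPath implementation (other implementations may behave differently). The class name NamespaceAwareToggle is made up, and @name is appended to the question's XPath only to get a printable string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class NamespaceAwareToggle {
    // The second document from the question, on one line.
    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Evaluates the unprefixed XPath against a document parsed with the given setting.
    static String sheetName(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/workbook/sheets/sheet[1]/@name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Not namespace-aware: elements are matched by their plain names.
        System.out.println("[" + sheetName(false) + "]");
        // Namespace-aware: the unprefixed path addresses "no namespace" and finds nothing.
        System.out.println("[" + sheetName(true) + "]");
    }
}
```

The trade-off is the one mentioned above: with namespace awareness switched off, expressions can no longer distinguish elements by namespace.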

Two things to add to the existing answers:
I don't know whether this was the case when you asked the question: With Java 10, your XPath actually works for the second document if you don't use setNamespaceAware(true) on the document builder factory (false is the default).
If you do want to use setNamespaceAware(true), other answers have already shown how to do this using a namespace context. However, you don't need to provide the mapping of prefixes to namespaces yourself, as these answers do: It's already there in the document element, and you can use that for your namespace context:
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class DocumentNamespaceContext implements NamespaceContext {
Element documentElement;
public DocumentNamespaceContext (Document document) {
documentElement = document.getDocumentElement();
}
public String getNamespaceURI(String prefix) {
return documentElement.getAttribute(prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix);
}
public String getPrefix(String namespaceURI) {
throw new UnsupportedOperationException();
}
public Iterator<String> getPrefixes(String namespaceURI) {
throw new UnsupportedOperationException();
}
}
The rest of the code is as in the other answers. Then the XPath /:workbook/:sheets/:sheet[1] yields the sheet element. (You could also use a non-empty prefix for the default namespace, as the other answers do, by replacing prefix.isEmpty() by e.g. prefix.equals("spreadsheet") and using the XPath /spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1].)
P.S.: I just found here that there's actually a method Node.lookupNamespaceURI(String prefix), so you could use that instead of the attribute lookup:
public String getNamespaceURI(String prefix) {
return documentElement.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
}
Also, note that namespaces can be declared on elements other than the document element, and those wouldn't be recognized (by either version).

Searching part of a String using XPath query

I am using the following XPath query to search the name of the author of a book and return the book name when it matches the author.
String rawXPath = String.format("//book[author= '%s']/bookname/text()", authorBook);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr
= xpath.compile(rawXPath);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
How can I modify the search so that I can search inside the content node (<content>.....</content>) for a specific word?
Example: the string inside the XML content element: "This book contains the glory and history of our forefathers. And the impact of it in our daily life is immense."
Now I want to search for the word "daily". If it matches, it should return the book name or the author name, whatever the user wants.
Thanks

Use the contains() XPath function:
//book[contains(content, '%s')]/bookname
The exact expression depends a bit on the structure of your input XML.

You want:
//book[contains(content, $searchString)
and
author = $wantedAuthor
]
/bookname/text()
This selects the text() nodes that are children of the bookname element that is a child of any book element in the document whose content child has a string value containing $searchString and whose author element has a string value equal to $wantedAuthor.
In this XPath expression the variables need to be substituted by specific strings. It also assumes that the content element is a child of book.
I don't know Java, but I suppose the final Java code will look like the following:
String.format("//book[contains(content, '%s')"
+ " and author = '%s']/bookname/text()",
searchString, authorBook);
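As an alternative to pasting the strings into the expression, the standard javax.xml.xpath API lets the $searchString and $wantedAuthor variables stay in the XPath and be resolved through an XPathVariableResolver. The sketch below is one possible way to do that; the sample library XML, the class name BookSearch and the method name findBookName are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class BookSearch {
    // Hypothetical sample data; assumes content is a child of book.
    static final String XML =
        "<library>"
        + "<book><author>Jane Roe</author><bookname>Roots</bookname>"
        + "<content>And the impact of it in our daily life is immense.</content></book>"
        + "<book><author>Jane Roe</author><bookname>Stars</bookname>"
        + "<content>A guide to the night sky.</content></book>"
        + "</library>";

    // Returns the name of the first matching book, or "" if none matches.
    static String findBookName(String searchString, String wantedAuthor) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

            // Values for the XPath variables, keyed by variable name.
            Map<String, String> variables = new HashMap<>();
            variables.put("searchString", searchString);
            variables.put("wantedAuthor", wantedAuthor);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // The resolver is asked for each $variable while the expression is evaluated.
            xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));

            return xpath.evaluate(
                "//book[contains(content, $searchString) and author = $wantedAuthor]/bookname/text()",
                doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findBookName("daily", "Jane Roe"));
    }
}
```

Compared with String.format, this keeps quotes in the search term (for example "it's") from breaking the expression, because the value never becomes part of the XPath text.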

If your XML looks like this:
<book>
<author> Author Name </author>
<description> This book contains the glory and history of our forefathers. And the impact of it in our daily life is immense.</description>
</book>
then you can try:
String rawXPath = "//book//*[contains(text(),\"daily\")]";
which means "give me any element below a book element whose text contains the word 'daily'".
I hope that helps you out! Cheers!

If you also want to search for a term inside a nested node, you can do it like this:
String rawXPath = String.format("//book/title[contains(innerItem, '%s')]/type/text()", value);
Add one forward slash and the node name for each level, and you're in.
