Extract Java element based on its corresponding XML element - java

I have an XML file generated from an input Java file. I also have XPath expressions for that XML file.
I need a function that takes one XPath expression and returns the corresponding Java element (in the abstract syntax tree). I tried the code below.
First, extract the XML element that matches the input XPath expression:
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//unit[1]/unit[1]/class[1]/block[1]/function[6]"; //a method
Node node = (Node) xPath.compile(query).evaluate(XmlDocument, XPathConstants.NODE);
However, I do not know how to link the extracted XML node to the Java element in the source code.
PS:
The result should be a node in the abstract syntax tree. I have an AST created by Spoon. Therefore, in the above example, I want to extract the related CtMethodImpl.
node.getTextContent() is not the answer, because more than one element may have the same text content.

To the best of my knowledge there is no 'direct' way of doing this.
This: "//unit[1]/unit[1]/class[1]/block[1]/function[6]" is what we call a signature in the sense that it uniquely identifies an element (somehow).
What I would do is to create a spoon processor and go through the entire AST checking each element to see if it matches the signature.
import java.util.HashMap;
import org.w3c.dom.Node;
import spoon.processing.AbstractProcessor;
import spoon.reflect.declaration.CtElement;
public class ProcessorExample<E extends CtElement> extends AbstractProcessor<E> {
    HashMap<String, Node> nodes;
    // Set your XML nodes here, keyed by signature
    public void setNodes(HashMap<String, Node> nodes) {
        this.nodes = nodes;
    }
    @Override
    public void process(E element) {
        if (nodes.containsKey(signature(element))) {
            Node n = nodes.get(signature(element));
            // YOU FOUND IT!
        }
    }
    private String signature(E element) {
        // YOU MUST PROVIDE THIS IMPLEMENTATION
        // TO MATCH YOUR "//unit[1]/unit[1]/class[1]/block[1]/function[6]"
        // KIND OF SIGNATURE
        return null;
    }
}
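For the XML side of the matching, a positional signature can be derived from a DOM node by walking up its ancestors. The following is only a sketch: the class name NodeSignature, the sample document, and the exact signature format are assumptions, and the Spoon-side signature(element) would still have to produce strings in the same format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeSignature {

    // Builds a positional path such as /unit[1]/class[1]/function[2] for an element node.
    static String of(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            int index = 1;
            // Count preceding siblings with the same element name (XPath positions are 1-based).
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    // Parses a small hypothetical document and returns the signature of its second function.
    static String demo() {
        String xml = "<unit><class><block><function/><decl/><function/></block></class></unit>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//function[2]", doc, XPathConstants.NODE);
            return of(node);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With keys built this way, the map passed to setNodes can be filled once from the XML document and looked up per AST element.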

Related

Retrieve XML info using Java

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss"><title>USGS All Earthquakes, Past Week</title><updated>2020-02-26T21:28:38Z</updated><author><name>U.S. Geological Survey</name><uri>https://earthquake.usgs.gov/</uri></author><id>https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom</id><link rel="self" href="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.atom"/><icon>https://earthquake.usgs.gov/favicon.ico</icon>
<entry><id>urn:earthquake-usgs-gov:ci:39084495</id><title>M 0.6 - 13km WNW of Anza, CA</title><updated>2020-02-26T21:23:25.292Z</updated><link rel="alternate" type="text/html" href="https://earthquake.usgs.gov/earthquakes/eventpage/ci39084495"/><summary type="html"><![CDATA[<dl><dt>Time</dt><dd>2020-02-26 21:19:49 UTC</dd><dd>2020-02-26 13:19:49 -08:00 at epicenter</dd><dt>Location</dt><dd>33.602°N 116.804°W</dd><dt>Depth</dt><dd>4.62 km (2.87 mi)</dd></dl>]]></summary><georss:point>33.6016667 -116.8035</georss:point><georss:elev>-4620</georss:elev><category label="Age" term="Past Hour"/><category label="Magnitude" term="Magnitude 0"/><category label="Contributor" term="ci"/><category label="Author" term="ci"/></entry></feed>
I'm trying to extract information from this XML. I did manage to do it, but I'm not sure how it works; more precisely, I don't understand why the value seems to be so deep in the nodes. The code I'm using is as follows:
builder = fac.newDocumentBuilder();
Document doc = builder.parse(source);
NodeList nodeList = doc.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);
    if (node.getNodeName().equals("entry")) {
        Element element = (Element) node;
        String nl1 = element.getElementsByTagName("georss:point").item(0).getChildNodes().item(0).getNodeValue();
    }
}
I had expected that after getting the element by its tag I would be able to get the value right away, but instead I have to go two levels deeper. Can anyone explain why?
EDIT: typo
You are asking why the value seems to be so deeply nested. I will answer that first and then suggest three other approaches you could consider for working with that XML in Java.
Why is it so nested
Element.getElementsByTagName() returns a NodeList containing zero or more elements. You know there is only one, but the XML engine does not, so you have to say: give me the first element of that list (.item(0)).
Now you have the point element, but you want its content. In the DOM the text is stored as a child text node of the element, and an element can have several children (other tags, for example), so again you have to say which child you want.
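The two levels can be made visible with a tiny document. This is a minimal sketch; the class name TextNodeDemo and the un-namespaced sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeDemo {

    // Returns the element for the first <point> in a tiny hypothetical document.
    static Element firstPoint() {
        String xml = "<entry><point>33.6016667 -116.8035</point></entry>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Level 1: pick one element out of the NodeList.
            return (Element) doc.getElementsByTagName("point").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element point = firstPoint();
        // Level 2: the text lives in a child node of type TEXT_NODE.
        Node child = point.getFirstChild();
        System.out.println(point.getNodeValue());   // null: element nodes have no node value
        System.out.println(child.getNodeType() == Node.TEXT_NODE);
        System.out.println(child.getNodeValue());
        System.out.println(point.getTextContent()); // concatenated text of all descendants
    }
}
```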
Alternative approaches
Access the content as text content: element.getElementsByTagName("georss:point").item(0).getTextContent();
Using Xpath:
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//*[local-name()='point']");
String evaluate = expr.evaluate(doc);
System.out.println(evaluate);
Note that the XPath //*[local-name()='point'] is not the best way. You should work with the namespaces, which requires some more code.
Use XML-to-bean mapping, for example with JAXB
You should really consider this approach, as it basically lets you have POJOs holding all the content of the XML (like the age, magnitude, ...) and simply have the XML transformed into an object like this:
public class Entry {
    private String id;
    private String title;
    ...
    private String point;
    private String age;
    private String magnitude;
    ...
}
I hope this helps.

How to find children of children's context in ANTLR?

As the title says, is there a way to find the children of a child node when listening to or visiting a node in ANTLR?
For example (using the grammars-v4 Java lexer and parser rules):
First, I parse a Java file into a parse tree.
grun Java compilationUnit -gui Example.java
// Example.java
public class Example {
    String name = "test";
    void call() {
        String name1 = "test";
    }
}
and the grammar tree is: (parse tree image not included)
Then I extend the base listener in Java to handle enterClassDeclaration, which gives me the ClassDeclarationContext node. I want to find all descendants of that ClassDeclarationContext node whose type is LocalVariableDeclarationContext.
In this example:
public class MyListener extends JavaParserBaseListener {
    @Override
    public void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        // find the children of children by ctx
        List<ParserRuleContext> contexts = findChildContextBy(ctx, LocalVariableDeclarationContext.class);
        super.enterClassDeclaration(ctx);
    }
}
The variable contexts should have two elements: name and name1.
I do not want to search for the children one layer at a time. Is there a convenient way?
For a given parse tree it's easy to look up specific child nodes (at any nesting level) using ANTLR4's XPath implementation.
You can trigger that search from either the full parse tree returned by the called parser rule or within a listener/visitor method for the particular subtree, for example:
Collection<ParseTree> matches = XPath.findAll(ctx, "//localVariableDeclaration", parser);
The returned matches are instances of LocalVariableDeclarationContext (if any matched).
Note: the linked page describes two search utilities, parse tree matching and XPath, which can be used individually or together.

Find a Element by attribute in a XML Document in Java and get value of another attribute in the same element

I was tasked with converting some of my Python code to Java.
In the original there are a lot of operations like this:
name = element.find('*/DIAttribute[@name="ui_display_name"]').attrib['value']
Where element is an lxml.etree.Element object.
In Java I'm doing this to get the same value:
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodesName = (NodeList) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
if (nodesName.getLength() > 0) {
    Node node = nodesName.item(0);
    name = node.getAttributes().getNamedItem("value");
}
Am I doing it right? Is there a better way of doing this? I'm using the org.w3c.dom objects, and the powers that be forbid me from using other XML libraries.
Thanks!
Passing XPathConstants.NODE does not cause evaluate to return a NodeList; it causes the method to return a single Node. The class documentation describes exactly what type of object will be returned for each XPathConstants field.
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']", element, XPathConstants.NODE);
Attributes are document nodes too, so you can simplify the code into a single XPath expression:
Node node = (Node) xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element, XPathConstants.NODE);
String name = node.getNodeValue();
Since you just want the string value of the node, you can use the two-argument evaluate method instead, omitting the XPathConstants value:
String name = xPath.evaluate("DIAttribute[@name='ui_display_name']/@value", element);
That will only find DIAttribute elements which are direct children of element. If you want to search all DIAttribute descendants at all levels below element, use .// in your XPath expression:
String name = xPath.evaluate(".//DIAttribute[@name='ui_display_name']/@value", element);
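To see the difference between the child and descendant forms, here is a small self-contained sketch. The sample XML, the attribute values, and the class name AttributeLookup are made up for illustration; only the DIAttribute element with its name and value attributes comes from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeLookup {

    // Hypothetical input: the DIAttribute elements are grandchildren of the context element.
    static final String XML =
            "<item><group>"
            + "<DIAttribute name='ui_display_name' value='Pump A'/>"
            + "<DIAttribute name='ui_order' value='3'/>"
            + "</group></item>";

    // Evaluates the expression relative to the document element and returns its string value.
    static String evaluate(String expression) {
        try {
            Element element = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML))).getDocumentElement();
            XPath xPath = XPathFactory.newInstance().newXPath();
            // The two-argument form returns "" when nothing matches.
            return xPath.evaluate(expression, element);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Direct children only: no match here, because DIAttribute sits below <group>.
        System.out.println(evaluate("DIAttribute[@name='ui_display_name']/@value"));
        // Grandchildren, like the '*/DIAttribute' path in the Python code.
        System.out.println(evaluate("*/DIAttribute[@name='ui_display_name']/@value"));
        // Any depth below the context element.
        System.out.println(evaluate(".//DIAttribute[@name='ui_display_name']/@value"));
    }
}
```

Because the two-argument evaluate returns an empty string rather than null when nothing matches, a caller that must distinguish "missing" from "empty value" would still need the XPathConstants.NODE form.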

How to query XML using namespaces in Java with XPath?

When my XML looks like this (no xmlns), I can easily query it with an XPath like /workbook/sheets/sheet[1]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
<sheets>
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
</sheets>
</workbook>
But when it looks like this then I can't
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheets>
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
</sheets>
</workbook>
Any ideas?
In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.
The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.
However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.
You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:
/*[local-name()='workbook'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheets'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheet'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]
As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
You could also just match on the local-name() of the element and ignore the namespace. For example:
/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]
However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue in this instance) that use the same local-name(), your XPath could match the wrong elements and select the wrong content.
Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html
One of the conclusions they draw is:
"So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping"
Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your java code and use said prefix in your XPath expression. Here, we'll create a mapping from spreadsheet to your default namespace.
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// there's no default implementation for NamespaceContext...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new NullPointerException("Null prefix");
        else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }
    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }
    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
});
// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");
// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);
And voila...Now you've got your element saved in the result variable.
Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!
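Putting the parser setting and the prefix mapping together, a self-contained sketch could look like the following. The class name WorkbookQuery and the inlined XML string are assumptions; the prefix "spreadsheet" is the arbitrary one chosen above.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WorkbookQuery {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static final String XML =
            "<workbook xmlns='" + MAIN_NS + "'"
            + " xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'>"
            + "<sheets><sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>";

    // Evaluates the expression against a namespace-aware DOM and returns its string value.
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "spreadsheet".equals(prefix) ? MAIN_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed steps address elements in no namespace, so nothing matches.
        System.out.println(evaluate("/workbook/sheets/sheet[1]/@name"));
        // Prefixed steps match the elements bound to the default namespace.
        System.out.println(evaluate("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]/@name"));
    }
}
```

The name attribute itself is unprefixed in the expression because unprefixed attributes are not in the default namespace.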
All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, there is no implementation of NamespaceContext provided in the SDK.
Fortunately, it's very easy to write your own:
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;
public class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> PREF_MAP = new HashMap<String, String>();
    public SimpleNamespaceContext(final Map<String, String> prefMap) {
        PREF_MAP.putAll(prefMap);
    }
    public String getNamespaceURI(String prefix) {
        return PREF_MAP.get(prefix);
    }
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}
Use it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
HashMap<String, String> prefMap = new HashMap<String, String>() {{
    put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
    put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
}};
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath
        .compile("/main:workbook/main:sheets/main:sheet[1]");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:
/main:workbook/main:sheets/main:sheet[1]
The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
If you are using Spring, it already contains org.springframework.util.xml.SimpleNamespaceContext.
import org.springframework.util.xml.SimpleNamespaceContext;
...
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
SimpleNamespaceContext nsc = new SimpleNamespaceContext();
nsc.bindNamespaceUri("a", "http://some.namespace.com/nsContext");
xpath.setNamespaceContext(nsc);
XPathExpression xpathExpr = xpath.compile("//a:first/a:second");
String result = (String) xpathExpr.evaluate(object, XPathConstants.STRING);
I've written a simple NamespaceContext implementation (here) that takes a Map<String, String> as input, where the key is a prefix and the value is a namespace.
It follows the NamespaceContext specification, and you can see how it works in the unit tests.
Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");
context = new SimpleNamespaceContext(mappings);
context.getNamespaceURI("foo"); // "http://foo"
context.getPrefix("http://foo"); // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]
Note that it has a dependency on Google Guava
Make sure that you are referencing the namespace in your XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" >
Startlingly, if I don't call factory.setNamespaceAware(true), the XPath you mentioned works both with and without namespaces in the document. You just can't select elements by a specific namespace, only with generic XPaths. Go figure. So this may be an option:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false);
Two things to add to the existing answers:
I don't know whether this was the case when you asked the question: with Java 10, your XPath actually works for the second document if you don't use setNamespaceAware(true) on the document builder factory (false is the default).
If you do want to use setNamespaceAware(true), other answers have already shown how to do this using a namespace context. However, you don't need to provide the mapping of prefixes to namespaces yourself, as these answers do: It's already there in the document element, and you can use that for your namespace context:
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class DocumentNamespaceContext implements NamespaceContext {
    Element documentElement;
    public DocumentNamespaceContext(Document document) {
        documentElement = document.getDocumentElement();
    }
    public String getNamespaceURI(String prefix) {
        return documentElement.getAttribute(prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix);
    }
    public String getPrefix(String namespaceURI) {
        throw new UnsupportedOperationException();
    }
    public Iterator<String> getPrefixes(String namespaceURI) {
        throw new UnsupportedOperationException();
    }
}
The rest of the code is as in the other answers. Then the XPath /:workbook/:sheets/:sheet[1] yields the sheet element. (You could also use a non-empty prefix for the default namespace, as the other answers do, by replacing prefix.isEmpty() by e.g. prefix.equals("spreadsheet") and using the XPath /spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1].)
P.S.: I just found here that there's actually a method Node.lookupNamespaceURI(String prefix), so you could use that instead of the attribute lookup:
public String getNamespaceURI(String prefix) {
    return documentElement.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
}
Also, note that namespaces can be declared on elements other than the document element, and those wouldn't be recognized (by either version).
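That limitation can be shown with a document that declares a prefix below the root. This is a minimal sketch; the class name NestedNamespaceDemo and the sample XML with the urn:example:x namespace are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NestedNamespaceDemo {

    // Hypothetical document: the prefix "x" is declared on <b>, not on the document element.
    static final String XML = "<a><b xmlns:x='urn:example:x'><x:c/></b></a>";

    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parse().getDocumentElement();
        Element b = (Element) root.getFirstChild();
        // The document element does not know the prefix, by either lookup.
        System.out.println(root.getAttribute("xmlns:x").isEmpty());
        System.out.println(root.lookupNamespaceURI("x"));
        // The element that carries the declaration does.
        System.out.println(b.lookupNamespaceURI("x"));
    }
}
```

A namespace context that has to cover such documents would need to resolve prefixes against the element where they are declared, or fall back to an explicit prefix map as in the other answers.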

Sending XPath a variable from Java

I have an XPath expression that searches for a static value. In this example, "test" is that value:
XPathExpression expr = xpath.compile("//doc[contains(., 'test')]/*/text()");
How can I pass a variable instead of a fixed string? I use Java with Eclipse. Is there a way to use the value of a Java String to declare an XPath variable?
You can define a variable resolver and have the evaluation of the expression resolve variables such as $myvar, for example:
XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");
There's a fairly good explanation here. I haven't actually done this before myself, so I might have a go and provide a more complete example.
Update:
Gave this a go, and it works a treat. For an example of a very simple implementation, you could define a class that returns the value for a given variable from a map, like this:
class MapVariableResolver implements XPathVariableResolver {
    // local store of variable name -> variable value mappings
    Map<String, String> variableMappings = new HashMap<String, String>();
    // a way of setting new variable mappings
    public void setVariable(String key, String value) {
        variableMappings.put(key, value);
    }
    // override this method in XPathVariableResolver to
    // be used during evaluation of the XPath expression
    @Override
    public Object resolveVariable(QName varName) {
        // if using namespaces, there's more to do here
        String key = varName.getLocalPart();
        return variableMappings.get(key);
    }
}
Now declare and initialise an instance of this resolver in the program, for example:
MapVariableResolver vr = new MapVariableResolver();
vr.setVariable("myVar", "text");
...
XPath xpath = factory.newXPath();
xpath.setXPathVariableResolver(vr);
Then, when the expression compiled with xpath.compile("//doc[contains(., $myVar)]/*/text()") is evaluated, the variable $myVar resolves to the string "text".
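An end-to-end version of the resolver approach might look like this. It is a sketch under assumptions: the class name VariableDemo and the sample document are hypothetical, and the resolver is written as a lambda over a map instead of the MapVariableResolver class.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class VariableDemo {

    // Hypothetical input with two <doc> elements.
    static final String XML =
            "<root><doc><title>unit test notes</title></doc>"
            + "<doc><title>O'Brien release plan</title></doc></root>";

    // Returns the first text node under a <doc> whose content contains the given value.
    static String firstMatch(String needle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Map<String, Object> variables = new HashMap<>();
            variables.put("myVar", needle);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The resolver must be set before the expression is compiled or evaluated.
            xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
            return xpath.evaluate("//doc[contains(., $myVar)]/*/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("test"));
        // The value is passed as data, so quotes in it need no escaping.
        System.out.println(firstMatch("O'Brien"));
    }
}
```

Since the value never becomes part of the expression text, this avoids the quoting problems of building the XPath by string concatenation.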
Nice question, I learnt something useful myself!
You don't need to evaluate Java (or any other programming language's) variables in XPath. In C# (I don't know Java well) I'd use:
// the value must be quoted inside the expression; this breaks if myVar contains an apostrophe
string XPathExpression =
    "//doc[contains(., '" + myVar.ToString() + "')]/*/text()";
XmlNodeList result = xmlDoc.SelectNodes(XPathExpression);
Apart from this answer here, that explains well how to do it with the standard Java API, you could also use a third-party library like jOOX that can handle variables in a simple way:
List<String> list = $(doc).xpath("//doc[contains(., $1)]/*", "test").texts();
I use something similar to @brabster:
// expression: "/message/PINConfiguration/pinValue[../keyReference=$keyReference]";
Optional<Node> getNode(String xpathExpression, Map<String, String> variablesMap)
        throws XPathExpressionException {
    XPath xpath = XPathFactory.newInstance().newXPath();
    xpath.setXPathVariableResolver(qname -> variablesMap.get(qname.getLocalPart()));
    return Optional.ofNullable((Node) xpath.evaluate(xpathExpression, document,
            XPathConstants.NODE));
}
Optional<Node> getNode(String xpathExpression) throws XPathExpressionException {
    return getNode(xpathExpression, Collections.emptyMap());
}
