Parsing XML with filter - java

I parse an XML document in Java with:
doc = DocumentBuilderFactory
.newInstance()
.newDocumentBuilder()
.parse(new URL(url).openStream());
This works, but is it possible to parse with a filter? For example, each entry in my XML file has a priority element; is it possible to parse with a filter such as priority > 8,
so that the doc contains only the elements with priority > 8?
Example xml:
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>1.0</priority>
</url>
...
Thanks

For the following sample input file named urls.xml
<root>
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>1.0</priority>
</url>
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>7.0</priority>
</url>
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>10.0</priority>
</url>
</root>
You first create the full Document tree as usual
Document document = DocumentBuilderFactory
.newInstance()
.newDocumentBuilder()
.parse(new File("urls.xml"));
Then run the XPath query that selects all the Nodes above a certain priority
XPathExpression expr = XPathFactory.newInstance()
.newXPath().compile("//url[priority > 5]");
NodeList urls = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
If you want to serialize the results to another xml file, create a new Document first.
Document result = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().newDocument();
Node root = result.createElement("results");
result.appendChild(root);
Then append the filtered url Nodes as
for (int i = 0; i < urls.getLength(); i++) {
Node copy = result.importNode(urls.item(i), true);
root.appendChild(result.createTextNode("\n\t"));
root.appendChild(copy);
}
root.appendChild(result.createTextNode("\n"));
Now all you need to do is serialize the new Document to a String and write that out to a file. Here I'm just printing it to the console.
System.out.println(
((DOMImplementationLS) result.getImplementation())
.createLSSerializer().writeToString(result));
Output:
<?xml version="1.0" encoding="UTF-16"?>
<results>
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>7.0</priority>
</url>
<url>
<loc>http</loc>
<lastmod>2015-02-26</lastmod>
<title>Hello</title>
<priority>10.0</priority>
</url>
</results>
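If you do want a file rather than console output, one option is a Transformer, which also lets you pick the encoding (the LSSerializer output above is declared as UTF-16). The following is only a sketch under some assumptions: the class name PriorityFilter, the helper names, and the temp-file target are made up for illustration, and the input is inlined as a string instead of being read from urls.xml.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
public class PriorityFilter {

    // Copies every <url> whose priority is above minPriority into a new
    // document with a <results> root. Intended for small, plain numbers.
    static Document filter(Document source, double minPriority) throws Exception {
        NodeList urls = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//url[priority > " + minPriority + "]", source, XPathConstants.NODESET);
        Document result = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node root = result.createElement("results");
        result.appendChild(root);
        for (int i = 0; i < urls.getLength(); i++) {
            root.appendChild(result.importNode(urls.item(i), true));
        }
        return result;
    }

    // Writes the document to the given file as UTF-8.
    static void write(Document doc, File target) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(target));
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<url><loc>http</loc><priority>1.0</priority></url>"
                + "<url><loc>http</loc><priority>7.0</priority></url>"
                + "<url><loc>http</loc><priority>10.0</priority></url>"
                + "</root>";
        File out = File.createTempFile("filtered", ".xml");
        write(filter(parse(xml), 5), out);
        System.out.println(new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8));
    }
}
```

The exact indentation of the written file depends on the Transformer implementation that ships with your JDK.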

You should use XPath to find the elements you require:
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(yourXPath); // yourXPath: a String holding your XPath expression
Then...
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
... to get the nodes you require. You can use...
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
}
}
... to pull out only the genuine elements.
Of course, you'll need to also build up a basic XPath expression to find the nodes you require.
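For the question as asked, the expression could be something like //url[priority > 8]. Here is a small sketch of how that might fit together. The class name PriorityXPath, the helper names, and the sample loc values (http://a and so on) are assumptions made up for this example, not taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; names and sample data are illustrative.
public class PriorityXPath {

    // Evaluates the expression as a node set and returns the text content
    // of every matched node that is an element.
    static List<String> textOf(Document doc, String expression) throws Exception {
        NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            Node node = nl.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                values.add(node.getTextContent());
            }
        }
        return values;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root>"
                + "<url><loc>http://a</loc><priority>1.0</priority></url>"
                + "<url><loc>http://b</loc><priority>9.0</priority></url>"
                + "<url><loc>http://c</loc><priority>10.0</priority></url>"
                + "</root>");
        // The predicate compares the numeric value of the <priority> child.
        System.out.println(textOf(doc, "//url[priority > 8]/loc")); // [http://b, http://c]
    }
}
```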

Related

Extracting the node values in XML with XPath in Java

I have an XML document:
<response>
<result>
<phone>1233</phone>
<sys_id>asweyu4</sys_id>
<link>rft45fgd</link>
<!-- Many more in result -->
</result>
<!-- Many more result nodes -->
</response>
The XML structure is unknown. I get the XPath expressions for the values from the user.
E.g. the inputs are strings like:
//response/result/sys_id , //response/result/phone
How can I get these node values for the whole XML document by evaluating the XPath?
I referred to this, but my XPath is as shown above, i.e. it does not have the * or text() format.
An XPath evaluator works perfectly fine with my input format, so is there any way I can achieve the same in Java?
Thank you!
It's difficult without seeing your code... I'd just evaluate as a NodeList and then call getTextContent() on each node in the result list...
String input = "<response><result><phone>1233</phone><sys_id>asweyu4</sys_id><link>rft45fgd</link></result><result><phone>1233</phone><sys_id>another-sysid</sys_id><link>another-link</link></result></response>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(input.getBytes("UTF-8")));
XPath path = XPathFactory.newInstance().newXPath();
NodeList node = (NodeList) path.compile("//response/result/sys_id").evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < node.getLength(); i++) {
System.out.println(node.item(i).getTextContent());
}
Output
asweyu4
another-sysid
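Since the expressions come from the user, the same idea can be wrapped in a loop over several inputs. This is just a sketch: the class name UserXPaths and the method evaluateAll are hypothetical, and it assumes every expression selects nodes (an expression that returns a number or string would make the NODESET evaluation throw).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative only.
public class UserXPaths {

    // Maps each expression to the text content of all nodes it selects,
    // keeping the order in which the expressions were supplied.
    static Map<String, List<String>> evaluateAll(String xml, List<String> expressions) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String expression : expressions) {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            results.put(expression, values);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        String input = "<response>"
                + "<result><phone>1233</phone><sys_id>asweyu4</sys_id></result>"
                + "<result><phone>1233</phone><sys_id>another-sysid</sys_id></result>"
                + "</response>";
        System.out.println(evaluateAll(input,
                Arrays.asList("//response/result/sys_id", "//response/result/phone")));
    }
}
```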

XPath select element with an attribute

I have a xml file which looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<alarm-response-list xmlns="http://www.ca.com/spectrum/restful/schema/response"
error="EndOfResults" throttle="277" total-alarms="288">
<alarm-responses>
<alarm id="53689bf8-6cc8-1003-0060-008010186429">
<attribute id="0x11f4a" error="NoSuchAttribute" />
<attribute id="0x12b4c">UPS DIAGNOSTIC TEST FAILED</attribute>
<attribute id="0x10b5a">IDG860237, SL3-PL4, US, SapNr=70195637,</attribute>
</alarm>
<alarm id="536b8c9a-28b3-1008-0060-008010186429">
<attribute id="0x11f4a" error="NoSuchAttribute" />
<attribute id="0x12b4c">DEVICE IN MAINTENANCE MODE</attribute>
<attribute id="0x10b5a">IDG860237, SL3-PL4, US, SapNr=70195637,</attribute>
</alarm>
</alarm-responses>
</alarm-response-list>
There are a lot of these alarms. For every alarm tag, I want to save the attribute element with id = 0x10b5a in a String, but I don't have much of a clue how. My attempt doesn't work: it only prints the expression.
My idea:
FileInputStream file = new FileInputStream(
new File(
"alarms.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression = "/alarm-responses/alarm/attribute[@id='0x10b5a'] ";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(
xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild()
.getNodeValue());
}
There are several different problems here that are interacting to mean that your XPath expression doesn't match anything. Firstly the alarm-responses element isn't the root of the document - you need an extra step on the front of the path to select the alarm-response-list element. But more importantly you have namespace issues.
XPath only works when the XML has been parsed with namespaces enabled, which for some reason is not the default for DocumentBuilderFactory. You need to enable namespaces before you do newDocumentBuilder.
Now your XML document has xmlns="http://www.ca.com/spectrum/restful/schema/response", which puts all the elements in this namespace, but unprefixed node names in an XPath expression always refer to nodes that are not in a namespace. In order to match namespaced nodes you need to bind a prefix to the namespace URI and then use prefixed names in the path.
For javax.xml.xpath this is done using a NamespaceContext, but annoyingly there is no ready-made implementation of this interface in the Java core library. There is a SimpleNamespaceContext implementation available as part of Spring, or it's fairly simple to write your own. Using the Spring class:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
// enable namespaces
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
// Set up the namespace context
SimpleNamespaceContext ctx = new SimpleNamespaceContext();
ctx.bindNamespaceUri("ca", "http://www.ca.com/spectrum/restful/schema/response");
xPath.setNamespaceContext(ctx);
System.out.println("*************************");
// corrected expression
String expression = "/ca:alarm-response-list/ca:alarm-responses/ca:alarm/ca:attribute[@id='0x10b5a']";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(
xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getTextContent());
}
Note also how I'm using getTextContent() to get the text under each matched element. The getNodeValue() method always returns null for element nodes.
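If you'd rather not pull in Spring, a hand-written NamespaceContext for a single prefix might look roughly like the sketch below. The class name AlarmAttributes and the method attributeTexts are made up for this example, the XML is a cut-down version of the one in the question, and the context only knows the one "ca" prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; names are illustrative only.
public class AlarmAttributes {
    static final String NS = "http://www.ca.com/spectrum/restful/schema/response";

    // Minimal context that binds only the "ca" prefix to the namespace above.
    static final NamespaceContext CA_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "ca".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return NS.equals(uri) ? "ca" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyList().iterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns the text of every attribute element with the given id.
    // The id is pasted into the expression, so it must not contain quotes.
    static List<String> attributeTexts(String xml, String id) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace matching
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CA_CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(
                "/ca:alarm-response-list/ca:alarm-responses/ca:alarm/ca:attribute[@id='" + id + "']",
                doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<alarm-response-list xmlns='" + NS + "'><alarm-responses>"
                + "<alarm id='a1'><attribute id='0x12b4c'>UPS DIAGNOSTIC TEST FAILED</attribute>"
                + "<attribute id='0x10b5a'>IDG860237, SL3-PL4, US, SapNr=70195637,</attribute></alarm>"
                + "</alarm-responses></alarm-response-list>";
        System.out.println(attributeTexts(xml, "0x10b5a"));
    }
}
```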

Boolean operation in Xpath: Using attributes

Following is a snippet of .xml file. I did following :
Document doc = docBuilder.parse(filesInDirectory.get(i));
doc.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr1 = xPath.compile("//codes[@class='class2']/code[@code]");
Object result1 = expr1.evaluate(doc, XPathConstants.NODESET);
NodeList nodes1 = (NodeList) result1;
Now,
System.out.println("result length"+":"+nodes1.getLength());
returns 2.
I would like to make logical decision based on the attribute names, like(pseudocode)
if(nodes1.contains(123))
or
if(nodes1.contains(123) && nodes1.contains(456))
and make decision.
How would I do it?
<metadata>
<codes class="class1">
<code code="ABC">
<detail "blah" "blah">
</code>
</codes>
<codes class="class2">
<code code="123">
<detail "blah blah"/>
</code>
<code code="456">
<detail "blah blah"/>
</code>
</codes>
</metadata>
This:
XPathExpression expr1 = xPath.compile("//codes[@class]");
Object result1 = expr1.evaluate(doc, XPathConstants.NODESET);
NodeList nodes1 = (NodeList) result1;
should return a list of codes elements that have a class attribute. Iterate over this node list, for each node read the code attribute of its code child elements, and use a check like
if (node.getNodeValue().equals("123"))
to establish whether your node has the value you are looking for.
Try out this:
File f = new File("test.xml");
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
InputSource src = new InputSource(new FileInputStream(f));
Object result = xpath.evaluate("//codes[@class='class2']/code/@code",src,XPathConstants.NODESET);
NodeList lst = (NodeList)result;
List<String> codeList = new ArrayList<String>();
for(int idx=0; idx<lst.getLength(); idx++){
codeList.add(lst.item(idx).getNodeValue());
}
if(codeList.contains("123")){
System.out.println("123");
}
if(codeList.contains("123") && codeList.contains("456")){
System.out.println("123 and 456");
}
Explanation:
XPath //codes[@class='class2']/code/@code will collect all code attribute values under the codes element whose class is class2.
You can then build a List from NodeList so that you can use contains() method.
Use this XPath expression:
/*/codes[@class]/code[@code = '123' or @code = '456']
It selects any code element whose code attribute's string value is one of the strings "123" or "456" and that (the code element) is a child of a codes element that has a class attribute and is a child of the top element of the XML document.
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/codes[@class]/code[@code = '123' or @code = '456']"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (corrected to be made well-formed):
<metadata>
<codes class="class1">
<code code="ABC">
<detail/>
</code>
</codes>
<codes class="class2">
<code code="123">
<detail />
</code>
<code code="456">
<detail />
</code>
</codes>
</metadata>
the XPath expression is evaluated and the selected nodes are copied to the output:
<code code="123">
<detail/>
</code>
<code code="456">
<detail/>
</code>
Explanation:
Proper use of the standard XPath operator or.
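From Java, the yes/no decision can also be left to XPath itself by asking for XPathConstants.BOOLEAN instead of a node set. A rough sketch, assuming the corrected, well-formed XML from above; the class name CodeChecks and the method matches are hypothetical names used only here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; names are illustrative only.
public class CodeChecks {

    // Evaluates the expression as an XPath boolean. A node-set expression
    // converts to true when it selects at least one node.
    static boolean matches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<metadata>"
                + "<codes class='class1'><code code='ABC'><detail/></code></codes>"
                + "<codes class='class2'><code code='123'><detail/></code>"
                + "<code code='456'><detail/></code></codes>"
                + "</metadata>";
        String scope = "//codes[@class='class2']/code";

        // Equivalent of the pseudocode nodes1.contains(123)
        System.out.println(matches(xml, scope + "[@code='123']")); // true

        // Equivalent of nodes1.contains(123) && nodes1.contains(456)
        System.out.println(matches(xml,
                scope + "[@code='123'] and " + scope + "[@code='456']")); // true

        // A code that is not present under class2
        System.out.println(matches(xml, scope + "[@code='ABC']")); // false
    }
}
```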

Java XPath: Get all the elements that match a query

I want to make an XPath query on this XML file (excerpt shown):
<?xml version="1.0" encoding="UTF-8"?>
<!-- MetaDataAPI generated on: Friday, May 25, 2007 3:26:31 PM CEST -->
<Component xmlns="http://xml.sap.com/2002/10/metamodel/webdynpro" xmlns:IDX="urn:sap.com:WebDynpro.Component:2.0" mmRelease="6.30" mmVersion="2.0" mmTimestamp="1180099591892" name="MassimaleContr" package="com.bi.massimalecontr" masterLanguage="it">
...
<Component.UsedModels>
<Core.Reference package="com.test.test" name="MasterModel" type="Model"/>
<Core.Reference package="com.test.massimalecontr" name="MassimaleModel" type="Model"/>
<Core.Reference package="com.test.test" name="TravelModel" type="Model"/>
</Component.UsedModels>
...
I'm using this snippet of code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document document = builder.parse(new File("E:\\Test branch\\test.wdcomponent"));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new NamespaceContext() {
...(omitted)
System.out.println(xpath.evaluate(
"//d:Component/d:Component.UsedModels/d:Core.Reference/@name",
document));
What I'm expecting to get:
MasterModel
MassimaleModel
TravelModel
What I'm getting:
MasterModel
It seems that only the first element is returned. How can I get all the occurrences that matches my query?
Evaluate the expression with XPathConstants.NODESET and you'll get an item of type NodeList instead of a single string. With the d prefix bound in your NamespaceContext:
XPathExpression expr = xpath.compile("//d:Core.Reference/@name");
NodeList list = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
System.out.println(node.getTextContent());
// work with node
}
See How to read XML using XPath in Java
As per that example, if you first compile the XPath expression and then execute it, specifying that you want a NODESET back, you should get the result you want.
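As a side note, if setting up a NamespaceContext feels like too much for a one-off query, a different technique is to match on local-name() and ignore the namespace entirely. The sketch below swaps in that technique rather than the prefixed path from the question; the class name UsedModels, the method modelNames, and the trimmed sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; names are illustrative only.
public class UsedModels {

    // Collects the name attribute of every Core.Reference element,
    // whatever namespace the element is in.
    static List<String> modelNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//*[local-name()='Core.Reference']/@name", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeValue()); // attribute nodes carry their value here
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Component xmlns='http://xml.sap.com/2002/10/metamodel/webdynpro' name='MassimaleContr'>"
                + "<Component.UsedModels>"
                + "<Core.Reference package='com.test.test' name='MasterModel' type='Model'/>"
                + "<Core.Reference package='com.test.massimalecontr' name='MassimaleModel' type='Model'/>"
                + "<Core.Reference package='com.test.test' name='TravelModel' type='Model'/>"
                + "</Component.UsedModels></Component>";
        System.out.println(modelNames(xml)); // [MasterModel, MassimaleModel, TravelModel]
    }
}
```

The trade-off is that local-name() matches elements of that name in any namespace, so the prefixed form is the safer choice when the document mixes vocabularies.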

How do I find an XML element by an attribute and delete it in Java [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do I remove a node element by id in XML?
XML Structure
<Servers>
<server ID="12234"> // <-- I want to find by this id and remove the entire node
<name>Greg</name>
<ip>127.0.0.1</ip>
<port>1897</port>
</server>
<server ID="42234">
<name>Bob</name>
<ip>127.0.0.1</ip>
<port>1898</port>
</server>
<server ID="5634">
<name>Tom</name>
<ip>127.0.0.1</ip>
<port>1497</port>
</server>
</Servers>
JAVA CODE:
public void removeNodeFromXML(String name)
throws ParserConfigurationException, SAXException, IOException,
TransformerException, XPathExpressionException
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(file_);
/**
* BEG FIX ME
*/
Element element = (Element) doc.getElementsByTagName(name).item(0);
// Remove the node
element.removeChild(element);
// Normalize the DOM tree to combine all adjacent nodes
/**
* END FIX ME
*/
doc.normalize();
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(file_);
transformer.transform(source, result);
}
DESIRED OUTCOME
<Servers>
<server ID="42234">
<name>Bob</name>
<ip>127.0.0.1</ip>
<port>1898</port>
</server>
<server ID="5634">
<name>Tom</name>
<ip>127.0.0.1</ip>
<port>1497</port>
</server>
</Servers>
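You can use XPath to get the node and then remove it through its parent. Note that the attribute is named ID in your XML, and XPath names are case-sensitive.
Example:
XPathExpression expr = xpath.compile("/Servers/server[@ID='" + idToBeDeleted + "']");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
// if you have at least one match
if (nodes.getLength() > 0) {
Node nodeToBeRemoved = nodes.item(0);
nodeToBeRemoved.getParentNode().removeChild(nodeToBeRemoved);
}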
The broad answer is: XPath. XPath is a very expressive language that allows you to select nodes based on the structure and content of your XML document.
Specifically for your question, some code making use of XPath (here via the XPathAPI class from Apache Xalan) will go roughly like this:
String xpath = "/Servers/server[@ID='<your data goes here>']";
NodeList nodelist = XPathAPI.selectNodeList(doc, xpath);
if (nodelist.getLength()==1) { // you found the node, and there's only one.
Element elem = (Element)nodelist.item(0);
... // remove the node
}
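Putting the pieces together with only the standard javax.xml.xpath API, a removal helper might look like the sketch below. The class name RemoveServer and its method names are hypothetical, the XML is a shortened version of the one in the question, and the id is pasted straight into the expression, so it is assumed not to contain quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; names are illustrative only.
public class RemoveServer {

    // Removes the first <server> whose ID attribute equals id.
    // Returns false when no such element exists.
    static boolean removeServer(Document doc, String id) throws Exception {
        Node server = (Node) XPathFactory.newInstance().newXPath().evaluate(
                "/Servers/server[@ID='" + id + "']", doc, XPathConstants.NODE);
        if (server == null) {
            return false;
        }
        // A node is always removed through its parent, not through itself.
        server.getParentNode().removeChild(server);
        return true;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Servers>"
                + "<server ID='12234'><name>Greg</name></server>"
                + "<server ID='42234'><name>Bob</name></server>"
                + "<server ID='5634'><name>Tom</name></server>"
                + "</Servers>");
        System.out.println("removed: " + removeServer(doc, "12234"));

        // Print the remaining document to the console.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
```

This also shows why element.removeChild(element) in the question fails: an element is not its own child, so the call has to go through getParentNode().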
