How to update XML using XPath and Java

I have an XML document and an XPath expression for that document. I need to update the document at runtime by using the XPath expression.
How can I do this in Java?
Below is my XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<PersonList>
<Person>
<Name>Sonu Kapoor</Name>
<Age>24</Age>
<Gender>M</Gender>
<PostalCode>54879</PostalCode>
</Person>
<Person>
<Name>Jasmin</Name>
<Age>28</Age>
<Gender>F</Gender>
<PostalCode>78745</PostalCode>
</Person>
<Person>
<Name>Josef</Name>
<Age>232</Age>
<Gender>F</Gender>
<PostalCode>53454</PostalCode>
</Person>
</PersonList>
I have to change the values of Name and Age under //PersonList/Person[2] (for example //PersonList/Person[2]/Name).

Use setNodeValue. First, get a NodeList, for example:
myNodeList = (NodeList) xpath.compile("//MyXPath/text()")
.evaluate(myXmlDoc, XPathConstants.NODESET);
Then set the value of e.g. the first node:
myNodeList.item(0).setNodeValue("Hi mom!");
More examples can be found here.
As mentioned in two other answers here, as well as in your previous question: technically, XPath is not a way to "update" an XML document, but only to locate nodes within an XML document. But I presume the above is what you want.
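A self-contained sketch of the setNodeValue approach, applied to a shortened copy of the question's PersonList. The class and method names (UpdateViaTextNodes, setTextNodes, parse) are made up for illustration; only standard JAXP/DOM calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UpdateViaTextNodes {

    // Parses an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Sets the value of every node selected by textXPath (expected to end in
    // /text()) and returns how many nodes were changed.
    public static int setTextNodes(Document doc, String textXPath, String newValue)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.compile(textXPath)
                .evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            nodes.item(i).setNodeValue(newValue);
        }
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<PersonList>"
                + "<Person><Name>Sonu Kapoor</Name><Age>24</Age></Person>"
                + "<Person><Name>Jasmin</Name><Age>28</Age></Person>"
                + "</PersonList>");
        setTextNodes(doc, "//PersonList/Person[2]/Name/text()", "Anonymous");
        setTextNodes(doc, "//PersonList/Person[2]/Age/text()", "30");
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(xpath.evaluate("//PersonList/Person[2]/Name", doc));
        System.out.println(xpath.evaluate("//PersonList/Person[2]/Age", doc));
    }
}
```

Because the expression ends in /text(), an empty element such as <Name/> is not selected at all (it has no text node), so setTextNodes returns 0 for it; in that case select the element itself and call setTextContent instead.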
EDIT: Responding to your comment... Are you asking how to write your DOM to an XML file after you've finished editing the DOM? If so, here are two examples of how to do it:
http://www.java2s.com/Code/Java/XML/WriteDOMout.htm
http://download.oracle.com/javaee/1.4/tutorial/doc/JAXPXSLT4.html
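For completeness, a minimal sketch of writing the edited DOM back out with the JDK's identity Transformer, either to a String or to a file. WriteDom, toXmlString and writeToFile are hypothetical names; passing ISO-8859-1 as the encoding would simply mirror the question's XML declaration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WriteDom {

    // Serializes the whole DOM into a String (without the XML declaration).
    public static String toXmlString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Same identity transform, written to a file in the given encoding;
    // try-with-resources makes sure the file handle is closed afterwards.
    public static void writeToFile(Document doc, File file, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        try (OutputStream out = new FileOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<PersonList><Person><Name>Jasmin</Name></Person></PersonList>")));
        doc.getElementsByTagName("Name").item(0).setTextContent("Anonymous");
        System.out.println(toXmlString(doc));
    }
}
```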

You can delete the file and create a new one: parse the file, modify the DOM, then write the result to a new file (data_new.xml here):
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
new InputSource("data.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("//employee/name[text()='old']", doc,
XPathConstants.NODESET);
for (int idx = 0; idx < nodes.getLength(); idx++) {
nodes.item(idx).setTextContent("new value");
}
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File("data_new.xml")));

XPath is used to select parts of an XML document. It has no provision for updating. But since it returns DOM objects (a single Node, or a NodeList of Nodes, depending on the requested return type), you can then use DOM methods to alter the document.
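To illustrate that point, a small sketch where XPath only locates the second Person and plain DOM methods (setAttribute, createElement, appendChild) do the editing. The Email element and the updated attribute are hypothetical additions, not part of the question's data; DomEditAfterXPath and tagPerson are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomEditAfterXPath {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // XPath only finds Person[index]; all changes are ordinary DOM calls.
    public static Element tagPerson(Document doc, int index, String email) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPathConstants.NODE returns a single org.w3c.dom.Node, or null if nothing matched
        Node node = (Node) xpath.evaluate(
                "//PersonList/Person[" + index + "]", doc, XPathConstants.NODE);
        if (!(node instanceof Element)) {
            throw new IllegalArgumentException("No Person at position " + index);
        }
        Element person = (Element) node;
        person.setAttribute("updated", "true");       // add or replace an attribute
        Element emailEl = doc.createElement("Email");  // create a new element
        emailEl.setTextContent(email);
        person.appendChild(emailEl);                   // attach it as the last child
        return person;
    }
}
```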

XPath can be used to select nodes in a document, not to modify them.
You apply the XPath expression to your document and get an element (in your case). Once you have this Element, you can use the Element methods to change values (name and age in your case).
Starting from a NodeList it should work like this:
NodeList nodes = getNodeListFromXPathExpression(); // you know how
if (nodes.getLength() == 0)
return; // empty NodeList, the XPath expression didn't select anything
Node first = nodes.item(0); // take the first from the list, your element
// this is a shortcut for your example:
// first is the actual selected element (a node)
// .getFirstChild() returns the first child node, the "text node" (="Jasmin", ="28")
// .setNodeValue() replaces the actual value of that text node with a new string
first.getFirstChild().setNodeValue("New Name or new age");
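A runnable sketch of the same shortcut, applied to both fields from the question (Name and Age of Person[2]). UpdatePersonFields and update are illustrative names; the fallback to setTextContent for elements that have no text child is an addition that the answer does not include.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class UpdatePersonFields {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // For each childName -> newValue entry, selects Person[index]/childName and
    // replaces its text; returns how many of the child elements were found.
    public static int update(Document doc, int index, Map<String, String> values)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        int found = 0;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            String expr = "//PersonList/Person[" + index + "]/" + entry.getKey();
            Node element = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
            if (element == null) {
                continue; // the expression selected nothing
            }
            found++;
            Node first = element.getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                first.setNodeValue(entry.getValue());     // the answer's shortcut
            } else {
                element.setTextContent(entry.getValue()); // e.g. <Age/> has no text child
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<PersonList>"
                + "<Person><Name>Sonu Kapoor</Name><Age>24</Age></Person>"
                + "<Person><Name>Jasmin</Name><Age>28</Age></Person>"
                + "</PersonList>");
        Map<String, String> changes = new LinkedHashMap<>();
        changes.put("Name", "Jonathan");
        changes.put("Age", "42");
        System.out.println(update(doc, 2, changes) + " field(s) updated");
    }
}
```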

Consider using XQuery Update instead of XPath. This allows you to write
replace value of node //PersonList/Person[2]/Name with "Anonymous"
This is much easier than using the Java DOM API.

I've created a small project for using XPATH to create/update XML:
https://github.com/shenghai/xmodifier
The code to change your XML looks like this:
Document document = readDocument("personList.xml");
XModifier modifier = new XModifier(document);
modifier.addModify("//PersonList/Person[2]/Name", "newName");
modifier.modify();

This is a handy function that lets you modify any tag value in any XML document using its XPath. You pass three arguments (xml, xpathExpression and newValue) and it returns the modified XML as a String.
If you want to pass the XML as a file, you need to change the function accordingly, but the logic stays the same.
// Imports needed: java.io.StringReader, java.io.StringWriter, javax.xml.parsers.*,
// javax.xml.transform.*, javax.xml.transform.dom.DOMSource,
// javax.xml.transform.stream.StreamResult, javax.xml.xpath.*, org.w3c.dom.*
public String updateXML(String xml, String xpathExpression, String newValue)
{
try {
//Creating document builder
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new org.xml.sax.InputSource(new StringReader(xml)));
//Evaluating the XPath expression: NODE returns the first matching node, or null
XPath xpath = XPathFactory.newInstance().newXPath();
Node node = (Node) xpath.evaluate(xpathExpression, document, XPathConstants.NODE);
if (node == null) {
return xml; // nothing matched, so return the input unchanged
}
//Setting the new value (works for elements, attributes and text nodes)
node.setTextContent(newValue);
//Transformation of document to xml
StringWriter stringWriter = new StringWriter();
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(document), new StreamResult(stringWriter));
xml = stringWriter.toString();
}
catch (Exception e)
{
e.printStackTrace();
}
return xml;
}

Here is the code to change the content with vtd-xml. vtd-xml is unique in that it is the only API that offers incremental update capability.
import com.ximpleware.*;
import java.io.*;
public class changeName {
public static void main(String s[]) throws VTDException,java.io.UnsupportedEncodingException,java.io.IOException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
XMLModifier xm = new XMLModifier(vn);
ap.selectXPath("//PersonList/Person[2]");
int i=0;
while((i=ap.evalXPath())!=-1){
if (vn.toElement(VTDNav.FIRST_CHILD,"Name")){
int k=vn.getText();
if (i!=-1)
xm.updateToken(k, "Jonathan");
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD,"Age")){
int k=vn.getText();
if (i!=-1)
xm.updateToken(k, "42");
vn.toElement(VTDNav.PARENT);
}
}
xm.output("new.xml");
}
}

Related

Getting a node from an XML document

I use the worldweatheronline API. The service gives xml in the following form:
<hourly>
<tempC>-3</tempC>
<weatherDesc>rain</weatherDesc>
<precipMM>0.0</precipMM>
</hourly>
<hourly>
<tempC>5</tempC>
<weatherDesc>no</weatherDesc>
<precipMM>0.1</precipMM>
</hourly>
Can I somehow get all the <hourly> nodes in which <tempC> > 0 and <weatherDesc> = rain?
How can I exclude the <hourly> nodes that are not interesting to me from the response?
This is quite feasible using XPath.
You can filter a document based on element values, attribute values and other criteria.
Here is a working example that gets the elements according to the first point in the question:
try (InputStream is = Files.newInputStream(Paths.get("C:/temp/test.xml"))) {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document xmlDocument = builder.parse(is);
XPath xPath = XPathFactory.newInstance().newXPath();
// get hourly elements that have tempC child element with value > 0 and weatherDesc child element with value = "rain"
String expression = "//hourly[tempC>0 and weatherDesc=\"rain\"]";
NodeList hours = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < hours.getLength(); i++) {
System.out.println(hours.item(i) + " " + hours.item(i).getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
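The answer above covers the first point. For the second point (dropping the uninteresting <hourly> nodes), one option is to select the complement with not(...) and remove each match from its parent. This sketch assumes the <hourly> elements are wrapped in a single root element (the fragment in the question has none, so it would not parse on its own); FilterHourly, removeUninteresting and the <data> root are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FilterHourly {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes every <hourly> that does NOT satisfy tempC > 0 and weatherDesc = 'rain'
    // and returns the number of removed elements.
    public static int removeUninteresting(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList unwanted = (NodeList) xpath.evaluate(
                "//hourly[not(tempC > 0 and weatherDesc = 'rain')]",
                doc, XPathConstants.NODESET);
        // Copy the matches first, then detach them, so removing nodes cannot
        // interfere with the list being iterated.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < unwanted.getLength(); i++) {
            toRemove.add(unwanted.item(i));
        }
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }
        return toRemove.size();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<data>"
                + "<hourly><tempC>-3</tempC><weatherDesc>rain</weatherDesc></hourly>"
                + "<hourly><tempC>5</tempC><weatherDesc>no</weatherDesc></hourly>"
                + "<hourly><tempC>4</tempC><weatherDesc>rain</weatherDesc></hourly>"
                + "</data>");
        System.out.println("removed " + removeUninteresting(doc));
    }
}
```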
I think you should create an XSD from the XML and generate JAXB classes. Using those JAXB classes you can easily unmarshal the XML and process your logic.

DOM4J Parse not returning any child nodes

I am attempting to begin writing a program which uses DOM4J to parse an XML file, save it to some tables and finally allow the user to manipulate the data.
Unfortunately I am stuck on the most basic step, the parsing.
Here is the portion of my XML I am attempting to include:
<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.04">
<BkToCstmrDbtCdtNtfctn>
<GrpHdr>
<MsgId>000022222</MsgId>
When I attempt to find the root of my XML it does return the root correctly as "Document". When I attempt to get the child node from Document it also correctly gives me "BkToCstmrDbtCdtNtfctn". The problem is that when I try to go any further and get the child nodes from "Bk" I can't. I get this in the console:
org.dom4j.tree.DefaultElement@2b05039f [Element: <BkToCstmrDbtCdtNtfctn uri: urn:iso:std:iso:20022:tech:xsd:camt.054.001.04 attributes: []/>]
Here is my code; I would appreciate any feedback. Ultimately I want to get the value of the "MsgId" element back, but in general I just want to figure out how to parse deeper into the XML, because in reality it probably has about 25 layers.
public static Document getDocument(final String xmlFileName){
Document document = null;
SAXReader reader = new SAXReader();
try{
document = reader.read(xmlFileName);
}
catch (DocumentException e)
{
e.printStackTrace();
}
return document;
}
public static void main(String args[]){
String xmlFileName = "C:\\Users\\jhamric\\Desktop\\Camt54.xml";
String xPath = "//Document";
Document document = getDocument(xmlFileName);
Element root = document.getRootElement();
List<Node> nodes = document.selectNodes(xPath);
for(Iterator i = root.elementIterator(); i.hasNext();){
Element element = (Element) i.next();
System.out.println(element);
}
for(Iterator i = root.elementIterator("BkToCstmrDbtCdtNtfctn");i.hasNext();){
Element bk = (Element) i.next();
System.out.println(bk);
}
}
}
The best approach is probably to use XPath, but since the XML document uses namespaces, you cannot use the "simple" selectNodes methods in the API. I would create a helper method to easily evaluate any XPath expression on either the Document or the Element level:
public static void main(String[] args) throws Exception {
Document doc = getDocument(...);
Map<String, String> namespaceContext = new HashMap<>();
namespaceContext.put("ns", "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04");
// Select the first GrpHdr element in document order
Element element = (Element) select("//ns:GrpHdr[1]", doc, namespaceContext);
System.out.println(element.asXML());
// Select the text content of the MsgId element
Text msgId = (Text) select("./ns:MsgId/text()", element, namespaceContext);
System.out.println(msgId.getText());
}
static Object select(String expression, Branch contextNode, Map<String, String> namespaceContext) {
XPath xp = contextNode.createXPath(expression);
xp.setNamespaceURIs(namespaceContext);
return xp.evaluate(contextNode);
}
Note that the XPath expression must use namespace prefixes that are mapped to the namespace URIs used in the input document, but the actual value of the prefix doesn't matter.
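If you are not tied to DOM4J, the same prefix-mapping idea works with the JDK's built-in javax.xml.xpath API. This is a swapped-in technique, not DOM4J: a minimal sketch that assumes the document is parsed with a namespace-aware DocumentBuilderFactory and that GrpHdr/MsgId sit directly under BkToCstmrDbtCdtNtfctn as in the question's fragment. CamtMsgId, CAMT_CONTEXT and readMsgId are hypothetical names.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CamtMsgId {

    static final String CAMT_NS = "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04";

    // Maps the arbitrary prefix "ns" to the document's default namespace.
    static final NamespaceContext CAMT_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "ns".equals(prefix) ? CAMT_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return CAMT_NS.equals(namespaceURI) ? "ns" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return CAMT_NS.equals(namespaceURI)
                    ? Collections.singletonList("ns").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    public static String readMsgId(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, prefixed steps like ns:GrpHdr never match
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CAMT_CONTEXT);
        // evaluate(String, Object) returns the string value ("" if nothing matched)
        return xpath.evaluate("/ns:Document/ns:BkToCstmrDbtCdtNtfctn/ns:GrpHdr/ns:MsgId", doc);
    }
}
```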

DOM parsing in Java not able to get the nested nodes

I have to parse an xml file in which I have many name value pairs.
I have to update the value in case it matches a given name.
I opted for DOM parsing as it can easily traverse any part of the document and quickly update the value.
However, it is giving me some weird results when I run it on my sample file.
I am new to DOM, so any help would be appreciated.
I tried various things, but they all result in either null content or a #text node name.
I am not able to get the text content of the tag.
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(xmlFilePath);
//This will get the first NVPair
Node NVPairs = document.getElementsByTagName("NVPairs").item(0);
//This should assign nodes with all the child nodes of NVPairs. This should be ideally
//<nameValuePair>
NodeList nodes = NVPairs.getChildNodes();
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
// I think it will consider both starting and closing tag as node so checking for if it has
//child
if(node.hasChildNodes())
{
//This should give me the content in the name tag.
//However this is not happening
if ("Tom".equals(node.getFirstChild().getTextContent())) {
node.getLastChild().setTextContent("2000000");
}
}
}
Sample xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?><application>
<NVPairs>
<nameValuePair>
<name>Tom</name>
<value>12</value>
</nameValuePair>
<nameValuePair>
<name>Sam</name>
<value>121</value>
</nameValuePair>
</NVPairs>
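#getChildNodes() and #getFirstChild() return all kinds of nodes, not just Element nodes. In this case the first child of <nameValuePair> is a Text node (containing the newline and blanks before <name>), so its text content is never "Tom" and your test never returns true. Likewise, #getLastChild() returns the whitespace Text node after </value>.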
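A minimal DOM-only sketch of the fix, which skips Text nodes by checking getNodeType() before comparing names. NvPairsDom, firstChildElement and updateValue are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NvPairsDom {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first child *element* with the given name; whitespace Text
    // nodes (and any other non-element children) are skipped.
    static Element firstChildElement(Node parent, String tagName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && tagName.equals(c.getNodeName())) {
                return (Element) c;
            }
        }
        return null;
    }

    // Sets <value> of every <nameValuePair> whose <name> equals the given name;
    // returns how many pairs were updated.
    public static int updateValue(Document doc, String name, String newValue) {
        NodeList pairs = doc.getElementsByTagName("nameValuePair");
        int updated = 0;
        for (int i = 0; i < pairs.getLength(); i++) {
            Element nameEl = firstChildElement(pairs.item(i), "name");
            Element valueEl = firstChildElement(pairs.item(i), "value");
            if (nameEl != null && valueEl != null
                    && name.equals(nameEl.getTextContent().trim())) {
                valueEl.setTextContent(newValue);
                updated++;
            }
        }
        return updated;
    }
}
```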
However, in cases like this, it is always much more convenient to use XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(
"//nameValuePair/value[preceding-sibling::name = 'Tom']", document,
XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
node.setTextContent("2000000");
}
I.e., return all <value> elements that have a preceding sibling element <name> with the value 'Tom'.

Convert XML String to ArrayList

Seems like a basic question but I can't find this anywhere. Basically I've got a list of XML links like so: (all in one string)
I already have the "string" var which contains all the XML. Just extracting the HTML strings.
<?xml version="1.0" encoding="UTF-8"?>
<fql_query_response xmlns="http://api.facebook.com/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" list="true">
<photo>
<src_small>http://photos-a.ak.fbcdn.net/hphotos-ak-ash4/486603_10151153207000351_1200565882_t.jpg</src_small>
</photo>
<photo>
<src_small>http://photos-c.ak.fbcdn.net/hphotos-ak-ash3/578919_10150988289678715_1110488833_t.jpg</src_small>
</photo>
I want to convert these into an ArrayList, so that something like URLArray[0] would be the first address as a string.
Can anyone tell me how to do this? Thanks.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource( new StringReader( xmlString) );
Document doc = builder.parse( is );
XPathFactory xpathFactory = XPathFactory.newInstance(); // "factory" is already declared above
XPath xpath = xpathFactory.newXPath();
// PersonalNamespaceContext is a custom NamespaceContext class that is not shown here;
// the expression below uses no prefixes, so this line can be left out:
// xpath.setNamespaceContext(new PersonalNamespaceContext());
XPathExpression expr = xpath.compile("//src_small/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
List<String> urls = new ArrayList<String>();
for (int i = 0; i < nodes.getLength(); i++) {
urls.add (nodes.item(i).getNodeValue());
System.out.println(nodes.item(i).getNodeValue());
}
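A self-contained variant of the answer above that returns a List<String>. It is a sketch under two assumptions: the input is well-formed (the question's snippet is missing the closing </fql_query_response>), and matching on local-name() is acceptable instead of registering the http://api.facebook.com/1.0/ namespace. PhotoUrls and extractSrcSmall are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PhotoUrls {

    public static List<String> extractSrcSmall(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() matches src_small in any namespace, including the default
        // xmlns on <fql_query_response>, so no NamespaceContext is needed.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//*[local-name()='src_small']", doc, XPathConstants.NODESET);
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() on the element also copes with entity-split text
            urls.add(nodes.item(i).getTextContent().trim());
        }
        return urls;
    }
}
```

With that list, urls.get(0) plays the role of URLArray[0] from the question.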
You are right, there should be some other resources out there that can help you. Maybe your searches just do not use the right keywords.
You basically have 2 choices:
Use an XML processing library. SAX, DOM, XPath and XmlReader are some keywords you can use to find one.
Just ignore the fact that your string is XML and perform normal string operations on it: splitting, iterating through it, regular expressions, etc.
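For the second option, a regular-expression sketch. It is deliberately naive: it assumes <src_small> is never prefixed, never has attributes, and contains no CDATA or escaped characters such as &amp; (which it would return still escaped). RegexSrcSmall is a hypothetical name; for anything beyond a quick hack, the XML-library route is safer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSrcSmall {

    // Reluctant (.*?) stops at the nearest closing tag; DOTALL lets the
    // content span line breaks.
    private static final Pattern SRC_SMALL =
            Pattern.compile("<src_small>(.*?)</src_small>", Pattern.DOTALL);

    public static List<String> extract(String xml) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC_SMALL.matcher(xml);
        while (m.find()) {
            urls.add(m.group(1).trim());
        }
        return urls;
    }
}
```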
Yes, for that you have to parse the XML
and then store the values in an ArrayList.
For example:
ArrayList<String> aList = new ArrayList<String>();
aList.add("your string");

How to getElementById using DOM?

I have part of an HTML page (given below) and want to extract the content of the div tag whose id is hiddenDivHL using a DOM parser:
Part of the HTML page:
<div id='hiddenDivHL' style='display:none'>http://74.127.61.106/udayavaniIpad/details.php?home=0&catid=882&newsid=123069[InnerSep]http://www.udayavani.com/udayavani_cms/gall_content/2012/1/2012_1$thumbimg117_Jan_2012_000221787.jpg[InnerSep]ಯುವಜನತೆಯಿಂದ ಭವ್ಯಭಾರತ[OuterSep]
So far I have used the code below, but I am unable to use getElementById. How do I do that?
DOM Parser:
try {
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("item");
/** Assign TextView array length by list size */
name = new TextView[nodeList.getLength()];
website = new TextView[nodeList.getLength()];
category = new TextView[nodeList.getLength()];
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
name[i] = new TextView(this);
Element fstElmnt = (Element) node;
NodeList nameList = fstElmnt.getElementsByTagName("hiddenDivHL");
Element nameElement = (Element) nameList.item(0);
nameList = nameElement.getChildNodes();
name[i].setText("Name = "
+ ((Node) nameList.item(0)).getNodeValue());
layout.addView(name[i]);
}
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
/** Set the layout view to display */
setContentView(layout);
}
XPath is IMHO the most common and easiest way to navigate the DOM in Java.
try{
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
// the div whose id attribute is hiddenDivHL, inside any <item> element
String expression = "//item/div[@id='hiddenDivHL']";
Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
I'm not sure if the XPath expression is right, but the link is here: http://developer.android.com/reference/javax/xml/xpath/package-summary.html
There are 2 differences between getElementById and getElementsByName:
getElementById requires a single unique id in your document, whereas getElementsByName can fetch several occurrences of the same name.
getElementById is a method (or function) of the document object. You can only access it through document.getElementById(..).
Your code seems to violate both of these requirements: you loop over a list of nodes and expect a hiddenDivHL id in each of them, so the id is not unique. Second, your starting point is not the document but each node in that list.
If you know there is a single instance with that id, try document.getElementById.
I didn't really get the question.
a) Do you mean getting several elements with document.getElementById('hiddenDivHL')?
Then my answer would be that in an HTML document an id must be used by one element only.
b) Or do you just want to get that one element?
What exactly does not work? What are you trying to achieve? I fear I don't really get the point.
You have to call fstElmnt.getElementsByTagName("div") to get all div elements and then check whether their id attribute equals hiddenDivHL.
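A minimal sketch of that loop. It assumes the page is well-formed enough for a DOM parser (real-world HTML often is not) and that the id is matched exactly; FindDivById and findDivById are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindDivById {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Searches the descendants of root for a <div> whose id attribute equals id;
    // returns null if there is none.
    public static Element findDivById(Element root, String id) {
        NodeList divs = root.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (id.equals(div.getAttribute("id"))) { // getAttribute returns "" when absent
                return div;
            }
        }
        return null;
    }
}
```

Inside the question's loop this would be called as findDivById(fstElmnt, "hiddenDivHL"), followed by getTextContent() on the result when it is not null.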
The easiest way I can think of is to use the jsoup library: it parses the DOM for you and lets you select elements using a CSS-style (or jQuery-style) selector.
In this example you would do something like this:
Document doc = Jsoup.connect("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593").get();
String divContents = doc.select("#hiddenDivHL").first().text();
Why are you unable to use getElementById()? It is available in Java SE 7, 6, 5 and 1.4.2, since it is part of DOM Level 2.
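One likely reason in Java: Document.getElementById only finds attributes of type ID, and for a plain parse without a DTD or schema an attribute merely named id is not of that type, so the call returns null. A minimal sketch that registers id attributes with Element.setIdAttribute first; GetElementByIdDemo and registerIdAttributes are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GetElementByIdDemo {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Marks every attribute literally named "id" as an XML ID, so that
    // Document.getElementById can find the element that carries it.
    public static void registerIdAttributes(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                e.setIdAttribute("id", true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<item><div id='hiddenDivHL'>content</div></item>");
        System.out.println(doc.getElementById("hiddenDivHL")); // no DTD, so not found yet
        registerIdAttributes(doc);
        System.out.println(doc.getElementById("hiddenDivHL").getTextContent());
    }
}
```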
To get the contents of an element in JavaScript:
var el = document.getElementById('hiddenDivHL');
var contents = el.innerHTML;
alert("Found " + contents.length + " characters of content.");
See your example on jsfiddle.
I think the confusion is due to the fact that your question is tagged JavaScript, but the code you posted is Java. They are different languages, and JavaScript people will only be confused by that parser. I haven't used Java in years so I can't really help you there.
