I have a Java RESTful web service that returns a userdetails XML document like the one below:
<userdetails>
<firstName>first</firstName>
<lastName>last</lastName>
<email>123@gmail.com</email>
</userdetails>
Any of the fields in the XML can contain special characters, which cause problems for clients when they use JAXB to convert the XML into a Java object.
I can use "StringEscapeUtils.escapeXml" to escape the special characters in a single field, for example firstName, and it escapes them correctly:
StringEscapeUtils.escapeXml(firstName);
But I have to do this for every field in my XML. Is there any way to escape the entire XML at once instead of doing it field by field?
Another option is to traverse all text nodes and escape each one along the way (the code below is taken from another source and slightly modified):
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.apache.commons.lang.StringEscapeUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class EscapeXml {
static String inputFile = "data.xml";
static String outputFile = "data_new.xml";
public static void main(String[] args) throws Exception {
Document doc = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse(new InputSource(inputFile));
// collect every element in the document
NodeList nodes = doc.getElementsByTagName("*");
// escape text nodes
for (int idx = 0; idx < nodes.getLength(); idx++) {
Node node = nodes.item(idx);
if (node.getNodeType() == Node.ELEMENT_NODE) {
NodeList childNodes = node.getChildNodes();
for (int cIdx = 0; cIdx < childNodes.getLength(); cIdx++) {
Node childNode = childNodes.item(cIdx);
if (childNode.getNodeType() == Node.TEXT_NODE) {
String newTextContent =
StringEscapeUtils.escapeXml(childNode.getTextContent());
childNode.setTextContent(newTextContent);
}
}
}
}
// save the result
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File(outputFile)));
}
}
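As an alternative to escaping an existing file, the sketch below builds the document through the DOM API and lets the serializer escape the text when it writes it out, so no per-field call is needed. It is not from the original answer: the class name UserDetailsXmlSketch, the helper names and the lastName field are assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original post.
class UserDetailsXmlSketch {

    // Builds <userdetails> from raw values; escaping happens during serialization.
    static String toXml(String firstName, String lastName, String email) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("userdetails");
        doc.appendChild(root);
        appendField(doc, root, "firstName", firstName);
        appendField(doc, root, "lastName", lastName);
        appendField(doc, root, "email", email);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void appendField(Document doc, Element parent, String name, String value) {
        Element field = doc.createElement(name);
        field.setTextContent(value); // stored unescaped in the DOM tree
        parent.appendChild(field);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("Tom & Jerry", "a<b", "123@gmail.com"));
    }
}
```

The values are passed in raw form; escaping them first with StringEscapeUtils and then serializing would escape them twice.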
I'm writing some simple code to scrape data from a web page using Selenium and XPath 2.0 functions.
Since Selenium supports only XPath 1.0 functions, I am trying to use Saxon.
I have downloaded the Saxon9he.jar files and extracted them into "C:\Program Files\Java\jre1.8.0_111\lib\ext".
I have created a file named "jaxp.properties" with the following lines:
javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
I have also added the JAR files to the Eclipse build path.
However, I am not able to fetch values with the XPath 2.0 functions.
In my code, if I use
XPathFactory factory = XPathFactory.newInstance();
instead of
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
I am able to use the XPath 1.0 functions, but I need XPath 2.0 functions. Please guide me on this.
My code is:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XpathCheckClass {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException{
WebDriver dr = new FirefoxDriver();
dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
try {
Thread.sleep(3000);
} catch (Exception e) {
}
String source = dr.getPageSource();
Document doc = null;
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
doc = db.parse( new InputSource( new StringReader(source)));
} catch (Exception e) {
e.printStackTrace();
}
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
// XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getTextContent());
}
dr.close();
}
}
Recent releases of Saxon no longer advertise themselves as JAXP XPath services, so you need to instantiate the XPath factory explicitly:
XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();
I have XML like the following:
<name>
<value>123</value>
<value>456</value>
<value>789</value>
</name>
Using Java's XPath API, I tried the following:
NodeList list3 = (NodeList) xpath.evaluate("name/value", element,XPathConstants.NODESET);
But it gives me only the first value. How can I print all the <value> elements?
Your XPath expression is correct, so there is most likely another problem in your code. You really should provide a complete example that demonstrates your problem.
The following code demonstrates what this would look like:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class XmlTest {
public static void main(String[] args) throws Exception {
String xml = "<name>\n" +
"<value>123</value>\n" +
"<value>456</value>\n" +
"<value>789</value>\n" +
"</name>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
NodeList list = (NodeList) xpath.evaluate("name/value", doc, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); ++i) {
Node node = list.item(i);
System.out.println(node.getNodeName());
}
}
}
Running this results in the following output:
value
value
value
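To print the contents of the elements rather than their names, a minimal sketch along the same lines could collect getTextContent() from each node. The class and method names here are hypothetical. The sketch also shows one possible cause of seeing only the first value, which is an assumption about the asker's code: when the same expression is evaluated as a string instead of a NODESET, XPath 1.0 returns the string value of the first matching node only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class ValueTextSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the expression as a node set and returns the text of every match.
    static List<String> valueTexts(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList list = (NodeList) xpath.evaluate("name/value", parse(xml), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            texts.add(list.item(i).getTextContent());
        }
        return texts;
    }

    // Evaluates the same expression as a string: only the first match is used.
    static String firstValueOnly(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("name/value", parse(xml));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name><value>123</value><value>456</value><value>789</value></name>";
        System.out.println(valueTexts(xml));
        System.out.println(firstValueOnly(xml));
    }
}
```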
I have an XML file and need to find a nested child tag of the parent tag (Body) using Java. I use DOM to read the elements
and load the XML file from my local drive. I have one String variable (String getSubChildValue = "181_paragraph_13") and need to compare its value
with every attribute value in the XML file. When the given value is on a nested child tag, I am not able to find it.
How do I compare the String variable with the attribute values in the XML file?
How do I print the tag name when the String value equals an attribute value?
Example: the (p) tag is a nested child of the (Body) tag and contains the given String value, so I need to get the tag name p.
How can I avoid hard-coding the nested child's name?
Example XML file:
<parent>
<Body class="student" id="181_student_method_3">
<Book class="Book_In_School_11" id="181_student_method_11"/>
<subject class="subject_information " id="181_student_subject_12"/>
<div class="div_passage " id="181_div_method_3">
<p class=" paragraph_book_name" id="181_paragraph_13">
<LiberaryBook class="Liberary" id="181_Liberary_9" >
<Liberary class="choice "
id="Liberary_replace_1" Uninversity="University_Liberary_1">
Dubliners</Liberary>
<Liberary class="choice "
id="Liberary_replace_2" Uninversity="University_Liberary_2">
Adventure if sherlock Holmes</Liberary>
<Liberary class="choice "
id="Liberary_replace_3" Uninversity="University_Liberary_3">
Charlotte’s Web</Liberary>
<Liberary class="choice "
id="Liberary_replace_4" Uninversity="University_Liberary_4">
The Outsiders</Liberary>
</LiberaryBook>
</p>
</div>
</Body>
</parent>
Example Java code:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class PerfectTagChange {
public static void main(String[] args) {
String filePath = "/xmlfile/Xml/check/sample.xml";
File xmlFile = new File(filePath);
DocumentBuilderFactory
dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
Element root = doc.getDocumentElement();
changeValue(root,doc);
doc.getDocumentElement().normalize();
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("/xmlfile/Xml/check/Demo.xml"));
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
System.out.println("XML file updated successfully");
} catch (SAXException | ParserConfigurationException | IOException | TransformerException e1) {
e1.printStackTrace();
}
}
// Checks which attribute contains the given string value. The parent tag is hard-coded, but no other tag is.
private static void changeValue(Node someNode, Document doc) {
String getSubChildValue = "181_paragraph_13";
NodeList childs = someNode.getChildNodes();
for (int in = 0; in < childs.getLength(); in++) {
Node child = childs.item(in);
if (child.getNodeType() == Node.ELEMENT_NODE) {
if (child.getNodeName().equalsIgnoreCase("Body")) {
// The attribute name is hard-coded here in getNamedItem("id").
// If the attribute name changes from "id" to "name", this check breaks.
// 3. What is the solution to this problem?
Node idAttribute = child.getAttributes().getNamedItem("id");
if (idAttribute != null && idAttribute.getNodeValue().equals(getSubChildValue)) {
System.out.println(idAttribute.getNodeValue());
}
}
}
}
}
}
If you change your code to this:
private static void changeValue(Node someNode, Document doc, String searchString) throws Exception {
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xPath.evaluate("//*[@*=\"" + searchString + "\"]",
doc.getDocumentElement(),
XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println("Tagname: " + nodes.item(i).getNodeName());
}
}
you no longer need to hard-code the name of the attribute.
EDIT:
Added searchString as parameter.
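A self-contained sketch of the same idea follows. Instead of concatenating the search string into the expression, it binds the value through an XPath variable, so a quote character in the value cannot break the expression. This is a variation on the answer, not the answer's own code; the class name, method name and the variable name $search are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class AttributeSearchSketch {

    // Returns the tag names of all elements that have any attribute equal to searchString.
    static List<String> tagsWithAttributeValue(String xml, String searchString) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $search to the caller's value; any other variable is left unresolved.
        xpath.setXPathVariableResolver(
                name -> "search".equals(name.getLocalPart()) ? searchString : null);
        NodeList nodes = (NodeList) xpath.evaluate("//*[@* = $search]", doc, XPathConstants.NODESET);
        List<String> tagNames = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            tagNames.add(nodes.item(i).getNodeName());
        }
        return tagNames;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent><Body id=\"181_student_method_3\"><div id=\"181_div_method_3\">"
                + "<p class=\"paragraph_book_name\" id=\"181_paragraph_13\"/></div></Body></parent>";
        System.out.println(tagsWithAttributeValue(xml, "181_paragraph_13"));
    }
}
```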
I'm trying to parse an RDFS XML file in order to find all the classes in it.
The XPath "/rdf:RDF/rdfs:Class"
works in my XML editor.
When I use the XPath in my Java program (I have implemented a DOM parser), I get 0 classes.
The following example runs, but it outputs 0 classes!
This is what I do:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.SAXException;
public class Main {
public static void main(String args[]) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException{
FindClasses FSB = new FindClasses();
FSB.FindAllClasses("C:\\Workspace\\file.xml"); //rdfs file
}
}
The class FindClasses is as follows:
import java.io.IOException;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class FindClasses {
public void FindAllClasses(String fileName) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
Object result = classes_expr.evaluate(doc, XPathConstants.NODESET);
NodeList classes = (NodeList) result;
System.out.println("I found : " + classes.getLength() + " classes " );
}
}
The rdfs file is:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xml:lang="en" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="Class1">
</rdfs:Class>
<rdfs:Class rdf:about="Class2">
</rdfs:Class>
</rdf:RDF>
I don't really understand why the XPath returns 0 nodes in this example.
It's weird, because I have implemented other DOM parsers as well and they worked fine.
Can somebody help me?
Thanks
I visited the following link and solved my problem:
Issues with xpath in java
The problem was that the XPath contains two namespace prefixes (rdf and rdfs), as in "/rdf:RDF/rdfs:Class", and the XPath object did not know which namespaces they refer to.
An XPath without prefixes, e.g. /RDF/Class evaluated against a document that declares no namespaces, would not have had this issue.
So after the line:
xpath = XPathFactory.newInstance().newXPath();
and before the line:
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
I added the following:
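// Needs imports: javax.xml.XMLConstants, javax.xml.namespace.NamespaceContext, java.util.Iterator
xpath.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "rdf": return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
}
// unknown prefix: no namespace bound
return XMLConstants.NULL_NS_URI;
}
@Override
public String getPrefix(String namespace) {
// reverse lookup: namespace URI to prefix
if (namespace.equals("http://www.w3.org/1999/02/22-rdf-syntax-ns#")) return "rdf";
else if (namespace.equals("http://www.w3.org/2000/01/rdf-schema#")) return "rdfs";
else return null;
}
@Override
public Iterator getPrefixes(String arg0) {
// not needed for XPath evaluation
return null;
}
});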
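Put together as one runnable unit, the fix might look like the sketch below. It uses a small map-backed NamespaceContext instead of the anonymous class; the names RdfsClassCountSketch, MapNamespaceContext and countClasses are hypothetical, and the XML is passed as a string rather than read from a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class RdfsClassCountSketch {

    // Minimal prefix-to-URI mapping used by the XPath engine to resolve prefixes.
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri;

        MapNamespaceContext(Map<String, String> prefixToUri) {
            this.prefixToUri = new HashMap<>(prefixToUri);
        }

        @Override
        public String getNamespaceURI(String prefix) {
            return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String uri) {
            for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                if (entry.getValue().equals(uri)) return entry.getKey();
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            List<String> prefixes = (prefix == null)
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix);
            return prefixes.iterator();
        }
    }

    static int countClasses(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefixed XPath steps to match
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Map<String, String> namespaces = new HashMap<>();
        namespaces.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        namespaces.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MapNamespaceContext(namespaces));
        NodeList classes = (NodeList) xpath.evaluate("/rdf:RDF/rdfs:Class", doc, XPathConstants.NODESET);
        return classes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\">"
                + "<rdfs:Class rdf:about=\"Class1\"/><rdfs:Class rdf:about=\"Class2\"/></rdf:RDF>";
        System.out.println("I found : " + countClasses(xml) + " classes");
    }
}
```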
I am trying to retrieve the names of all the nodes in an XML file using "node.getNodeName()". When I do so, every node name is preceded and followed by "#text", so I am not getting the correct node count either. I want "#text" to be excluded when retrieving the names. How do I do that?
With this:
package com.hum;
import java.io.InputStreamReader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
/**
*
* @author herve
*/
public class PrintNameXML
{
public static void main(String[] args) throws Exception
{
String xml = "<a><o><u>ok</u></o></a>";
Document doc =
DocumentBuilderFactory
.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new StringReader(xml)));
NodeList nl = doc.getElementsByTagName("*");
for (int i = 0; i < nl.getLength(); i++)
{
System.out.println("name is : "+nl.item(i).getNodeName());
}
}
}
I get:
name is : a
name is : o
name is : u
Is that what you are looking for?
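If you walk getChildNodes() yourself instead of calling getElementsByTagName("*"), the "#text" entries are the whitespace text nodes between the elements. A minimal sketch, with hypothetical class and method names, that skips them by checking the node type:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class ElementChildrenSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Names of every direct child, including whitespace text nodes ("#text").
    static List<String> allChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Names of direct children that are elements; text nodes are skipped.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Node root = parse("<a>\n<o>ok</o>\n<u/>\n</a>").getDocumentElement();
        System.out.println(allChildNames(root));
        System.out.println(childElementNames(root));
    }
}
```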