Extracting XML Elements from Java Object [duplicate] - java

I am new to XML. I want to read the following XML by request name. How can I read the XML below in Java?
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>

If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
document.getDocumentElement() returns the node that is the document element of the document (in your case <config>).
Once you have rootElement, you can read an attribute of that element by calling rootElement.getAttribute(name), and so on. See the Javadoc of org.w3c.dom.Element for the other methods.
See also the Javadoc of DocumentBuilder and DocumentBuilderFactory. Bear in mind that this example builds an XML DOM tree in memory, so a huge XML document produces a huge tree.
Update: here is an example that gets the value of the <requestqueue> element:
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can call it like this:
String requestQueueName = getString("requestqueue", element);
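To select a value by request name, as the question asks, the pieces above can be combined roughly as follows. This is a minimal sketch, not the answer author's code: the class name RequestConfigReader and the method findValue are made up for illustration, and the XML is assumed to have the same shape as in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class RequestConfigReader {

    // Returns the text of <childTag> inside the <Request> whose name attribute
    // equals requestName, or null when no such request or child exists.
    public static String findValue(String xml, String requestName, String childTag)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        NodeList requests = document.getElementsByTagName("Request");
        for (int i = 0; i < requests.getLength(); i++) {
            Element request = (Element) requests.item(i);
            if (requestName.equals(request.getAttribute("name"))) {
                NodeList children = request.getElementsByTagName(childTag);
                return children.getLength() > 0 ? children.item(0).getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "<responsequeue>emailresponse</responsequeue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "<responsequeue>Cleanresponse</responsequeue>"
                + "</Request>"
                + "</config>";
        System.out.println(findValue(xml, "CleanEmail", "responsequeue"));
    }
}
```

Matching on the name attribute first and only then reading the child keeps the lookup independent of the order of the Request elements.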

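If you only need to retrieve one value (the first occurrence) from the XML, you can split the string. This only works when the tag is written without attributes and occurs at least once: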
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
If you want to parse the whole XML document, use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}

If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
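For the XML in the question, the same XPath approach can select a queue by request name. The sketch below is an illustration under assumptions: the class name RequestXPathLookup and the method requestQueueFor are hypothetical, and the name is passed through an XPath variable ($name) rather than concatenated into the expression, so quotes in the name cannot break it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class RequestXPathLookup {

    // Returns the text of <requestqueue> for the given request name,
    // or an empty string when no <Request> has that name.
    public static String requestQueueFor(String xml, String requestName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Resolve the XPath variable $name to the requested name.
        xPath.setXPathVariableResolver(
                variable -> "name".equals(variable.getLocalPart()) ? requestName : null);
        // evaluate(String, Object) returns the string value of the first matching node.
        return xPath.evaluate("/config/Request[@name=$name]/requestqueue", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueueFor(xml, "ValidateEmailRequest"));
    }
}
```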

There are a number of different ways to do this. You might want to check out XStream or JAXB; both have tutorials and examples.

If the XML is well formed, you can convert it to a Document and then use XPath to get the XML elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements using their XPath.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get value in between two strings.
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>

The following links might help:
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152

There are two general ways of doing that. You can either build a Document Object Model (DOM) of the XML file, or use event-driven parsing (for example SAX), which is an alternative to the DOM representation. Of course there is much more to know about processing XML; for instance, if you are given an XML schema definition (XSD), you could use JAXB.

There are various APIs available to read and write XML files in Java.
I would prefer StAX.
This overview may also be useful: Java XML APIs
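As a rough illustration of the StAX suggestion, the sketch below streams through the configuration XML from the question and stops at the first matching value. The class name StaxRequestReader and the method requestQueueFor are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named here only for illustration.
public class StaxRequestReader {

    // Returns the text of <requestqueue> inside the <Request> with the given
    // name attribute, or null when there is no such request.
    public static String requestQueueFor(String xml, String requestName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            boolean inWantedRequest = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("Request".equals(tag)) {
                        // A null namespace URI means the namespace is not checked.
                        inWantedRequest =
                                requestName.equals(reader.getAttributeValue(null, "name"));
                    } else if (inWantedRequest && "requestqueue".equals(tag)) {
                        return reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "Request".equals(reader.getLocalName())) {
                    inWantedRequest = false;
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueueFor(xml, "CleanEmail"));
    }
}
```

Because the reader pulls events one at a time, no tree is built in memory, which is the main reason to choose StAX over DOM for large files.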

You can make a class that extends org.xml.sax.helpers.DefaultHandler and dispatches each element to a method named
start_<tag_name>(String uri, Attributes attrs);
and
end_<tag_name>();
For the <requestqueue> element that would be:
start_requestqueue(uri, attrs);
and so on.
Then extend that class to implement the XML configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException {
// Handler methods have the signature start_<tag>(String uri, Attributes attrs).
Class<?>[] args = new Class<?>[] { String.class, org.xml.sax.Attributes.class };
try {
// The local name is empty when the parser is not namespace aware; fall back to qname.
String tag = (name == null || name.isEmpty()) ? qname : name;
String mname = tag.replace("-", "");
java.lang.reflect.Method m =
getClass().getDeclaredMethod("start_" + mname, args);
m.invoke(this, uri, attrs);
}
catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
catch (NoSuchMethodException e) {
throw new RuntimeException(e);
}
catch (java.lang.reflect.InvocationTargetException e) {
// Rethrow the handler's failure as a SAXException, keeping its stack trace.
org.xml.sax.SAXException se =
new org.xml.sax.SAXException(String.valueOf(e.getTargetException()));
se.setStackTrace(e.getTargetException().getStackTrace());
throw se;
}
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name=" + attrs.getValue("name"));
}
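If the reflection-based dispatch is more machinery than you need, a plain DefaultHandler that branches on the element name does the same job for a small configuration file. Note that this swaps the technique above for a simpler one; the class name SaxRequestCollector and its parse method are hypothetical names chosen for this sketch.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects request name -> request queue pairs.
public class SaxRequestCollector extends DefaultHandler {
    private final Map<String, String> requestQueues = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentRequest;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("Request".equals(qName)) {
            currentRequest = attrs.getValue("name");
        }
        text.setLength(0); // start collecting the text of the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("requestqueue".equals(qName) && currentRequest != null) {
            requestQueues.put(currentRequest, text.toString().trim());
        } else if ("Request".equals(qName)) {
            currentRequest = null;
        }
    }

    // Parses the XML with the default (not namespace-aware) SAX parser.
    public static Map<String, String> parse(String xml) throws Exception {
        SaxRequestCollector handler = new SaxRequestCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.requestQueues;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(parse(xml));
    }
}
```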

Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to

Related

Unable to fetch nodes from xml even with namespace management in place in java

Below is a screenshot of the XML file I am working with. I need to get the value 'switchboardid1' from the Extensions tag:
Below is the code I have written. I always get null in return. Please correct my code and help me understand.
I have a NamespaceContext implementation, 'HardcodedNamespaceResolver', and it correctly returns the URI of the nfh namespace.
public void test() throws Throwable
{
String xpath="//ElectricalProject/Equipments/Equipment/Extensions/Extension/nfh:extensionProperty[@name='switchboardId']";
Node node = GetNodeFromXml("PutNFInProj.xml",xpath);
Element ele = (Element) node;
System.out.println(ele.getNodeValue().toString());
}
//Function to GET a single Node from xml file wrt to xpath defined
public Node GetNodeFromXml(String XmlFileName, String xPathExpression) throws Throwable
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(GetDataFile(XmlFileName));
((org.w3c.dom.Document) doc).getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new HardcodedNamespaceResolver());
NodeList nodeList = (NodeList) xPath.evaluate(xPathExpression,doc,XPathConstants.NODESET);
switch (nodeList.getLength())
{
case 0:
{
log.error("In Function: GetNodeFromXml - There are no nodes with respect to given xpath, Please Check the Xpath");
return null;
}
case 1:
{
Node nNode = nodeList.item(0);
return nNode;
}
default:
{
log.error("In Function: GetNodeFromXml- There are more than one nodes with respect to given xpath");
return null;
}
}
}
}
SimpleXml can do it:
final String yourxml = ...
final SimpleXml simple = new SimpleXml();
System.out.println(getSwitchBoardId(simple.fromXml(yourxml)));
private static String getSwitchBoardId(final Element element) {
return element.children.get(2).children.get(0).children.get(4).children.get(0).children.get(0).text;
}
Will output:
switchboardid1
From maven central:
<dependency>
<groupId>com.github.codemonstur</groupId>
<artifactId>simplexml</artifactId>
<version>1.4.0</version>
</dependency>

XPath expression unable to match

I'm trying to parse out some information from XML using XPath in Java (v 1.7). My XML looks like this:
<soap:Fault xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>code</faultcode>
<faultstring>string</faultstring>
<detail>detail</detail>
</soap:Fault>
My code:
final InputSource inputSource = new InputSource(new StringReader(xmlContent));
final DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
final Document document = documentBuilder.parse(inputSource);
final XPath xPath = XPathFactory.newInstance().newXPath();
final String faultCode = xPath.compile("/soap:Fault/faultcode/text()[1]").evaluate(document);
I have tried the XPath expression in an online checker with the XML content and it suggests that a match is made. However, when I run it in a wee stand-alone program, I get no value in "faultCode".
This issue is probably something simple, but I am unable to identify what the problem is.
Thanks for any assistance.
You should bind the namespace prefix "soap" to the URI "http://schemas.xmlsoap.org/soap/envelope/" using the XPath.setNamespaceContext() method.
First you need a namespace aware document builder factory:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Then you need a namespace context:
NamespaceContext nsContext = new NamespaceContext() {
@Override
public Iterator getPrefixes(String namespaceURI) {
return null;
}
@Override
public String getPrefix(String namespaceURI) {
return "soap";
}
@Override
public String getNamespaceURI(String prefix) {
return "http://schemas.xmlsoap.org/soap/envelope/";
}
};
xPath.setNamespaceContext(nsContext);
With these additions, your code should work.
Regarding namespace contexts, I suggest you read http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html.
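Put together, the two steps look roughly like this. The sketch is a variant of the snippets above rather than the exact code: the class name SoapFaultXPath and the method faultCode are hypothetical, and the NamespaceContext here only answers for the soap prefix instead of returning the SOAP URI for every prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class SoapFaultXPath {
    private static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    public static String faultCode(String xmlContent) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required, otherwise the prefix cannot match
        Document document = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlContent)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "soap".equals(prefix) ? SOAP_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return SOAP_NS.equals(namespaceURI) ? "soap" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });
        return xPath.evaluate("/soap:Fault/faultcode/text()[1]", document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<soap:Fault xmlns:soap=\"" + SOAP_NS + "\">"
                + "<faultcode>code</faultcode>"
                + "<faultstring>string</faultstring>"
                + "<detail>detail</detail>"
                + "</soap:Fault>";
        System.out.println(faultCode(xml)); // prints "code"
    }
}
```

The prefix used in the XPath expression only has to match the NamespaceContext, not the prefix used in the document; what must agree is the namespace URI.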
It may be caused by the namespace. You can match elements independently of the prefix by testing local-name() (and, optionally, namespace-uri()), using the syntax below:
/*[local-name()='Fault' and namespace-uri()='http://schemas.xmlsoap.org/soap/envelope/']/*[local-name()='faultcode']/text()[1]
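A minimal sketch of running that expression from Java follows. The class name LocalNameXPath and the method faultCode are assumptions for this example; no NamespaceContext is needed because the expression contains no prefixes, but the parser still has to be namespace aware for local-name() and namespace-uri() to report the expected values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class LocalNameXPath {
    private static final String EXPRESSION =
            "/*[local-name()='Fault' and "
            + "namespace-uri()='http://schemas.xmlsoap.org/soap/envelope/']"
            + "/*[local-name()='faultcode']/text()[1]";

    public static String faultCode(String xmlContent) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document document = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlContent)));
        return XPathFactory.newInstance().newXPath().evaluate(EXPRESSION, document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<soap:Fault xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<faultcode>code</faultcode>"
                + "<faultstring>string</faultstring>"
                + "<detail>detail</detail>"
                + "</soap:Fault>";
        System.out.println(faultCode(xml)); // prints "code"
    }
}
```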

Java Transformer how to ignore namespaces

I have to transform XML to XHTML, but the XML defines a namespace xmlns='http://www.lotus.com/dxl' which is never used in the whole XML, so the parser won't parse anything ...
Is there a way to ignore namespaces? I am using the Oracle Java transformer (import javax.xml.transform.Transformer; import javax.xml.transform.TransformerFactory).
Or are there any better libraries?
No, you can't ignore namespaces.
If the namespace declaration xmlns='http://www.lotus.com/dxl' appears in the outermost element, then you can't say it "isn't used anywhere" - on the contrary, it is used everywhere! It effectively changes every element name in the document to a different name. There's no way you can ignore that.
If you were using XSLT 2.0, then you would be able to say in your stylesheet xpath-default-namespace="http://www.lotus.com/dxl" which would pretty much do what you want: it says that any unprefixed name in a match pattern or XPath expression should be interpreted as referring to a name in namespace http://www.lotus.com/dxl. Sadly, you've chosen an XSLT processor that doesn't implement XSLT 2.0. So you'll have to do it the hard way (which is described in about 10,000 posts that you will find by searching for "XSLT default namespace").
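For reference, the "hard way" in XSLT 1.0 usually means binding a prefix to the namespace in the stylesheet and using that prefix in every match pattern and XPath expression. The sketch below shows the idea with the JDK's built-in transformer; the class name DxlPrefixTransform, the prefix dxl, and the element names document and title are invented for this example and are not taken from the real DXL vocabulary.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, named here only for illustration.
public class DxlPrefixTransform {

    // XSLT 1.0 stylesheet: the prefix "dxl" is bound to the source namespace and
    // used in the patterns; exclude-result-prefixes keeps it out of the output.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:dxl='http://www.lotus.com/dxl'"
            + " exclude-result-prefixes='dxl'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/dxl:document'>"
            + "<p><xsl:value-of select='dxl:title'/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Input uses the namespace as its default namespace, without any prefix.
        String xml = "<document xmlns='http://www.lotus.com/dxl'><title>Hello</title></document>";
        System.out.println(transform(xml));
    }
}
```

The prefix in the stylesheet does not have to appear in the input; only the namespace URI has to match.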
You can't ignore namespaces easily, and it won't be pretty, but it is possible. Of course, tricking the right part inside the Transformer implementation into just outputting the prefixes without getting flustered is implementation dependent!
OK then, this works for me going from a Node to a StringWriter:
public static String nodeToString(Node node) throws TransformerException {
StringWriter results = new StringWriter();
Transformer transformer = createTransformer();
transformer.transform(new DOMSource(node), new StreamResult(results) {
@Override
public Writer getWriter() {
Field field = findFirstAssignable(transformer.getClass());
try {
field.setAccessible(true);
field.set(transformer, new TransletOutputHandlerFactory(false) {
@Override
public SerializationHandler getSerializationHandler() throws
IOException, ParserConfigurationException {
SerializationHandler handler = super.getSerializationHandler();
SerializerBase base = (SerializerBase) handler.asDOMSerializer();
base.setNamespaceMappings(new NamespaceMappings() {
@Override
public String lookupNamespace(String prefix) {
return prefix;
}
});
return handler;
}
});
} catch(IllegalAccessException e) {
throw new AssertionError("Must not happen", e);
}
return super.getWriter();
}
});
return results.toString();
}
private static <E> Field findFirstAssignable(Class<E> clazz) {
return Stream.<Class<? super E>>iterate(clazz, Convert::iteration)
.flatMap(Convert::classToFields)
.filter(Convert::canAssign).findFirst().get();
}
private static <E> Class<? super E> iteration(Class<? super E> c) {
return c == null ? null : c.getSuperclass();
}
private static boolean canAssign(Field f) {
return f == null ||
f.getType().isAssignableFrom(TransletOutputHandlerFactory.class);
}
private static <E> Stream<Field> classToFields(Class<? super E> c) {
return c == null ? Stream.of((Field) null) :
Arrays.stream(c.getDeclaredFields());
}
What this is doing is pretty much just customizing the mapping of namespaces to prefixes. Every prefix is mapped to a namespace identified by its prefix, so there shouldn't even be any conflicts. The rest of it is fighting the API.
To make the example complete, here are the methods converting to and from the XML as well:
public static Transformer createTransformer()
throws TransformerFactoryConfigurationError,
TransformerConfigurationException {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "no");
return transformer;
}
public static ArrayList<Node> parseNodes(String uri, String expression)
throws ParserConfigurationException, SAXException,
IOException,XPathExpressionException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(uri);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(expression);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
ArrayList<Node> nodes = new ArrayList<>();
for(int i = 0; i < nl.getLength(); i++) {
nodes.add(nl.item(i));
}
return nodes;
}

XPath returning null for "Node" when isNameSpaceAware and isValidating are "true"

I am getting a null node when I am trying to parse an XML file.
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = null;
try {
node = (Node) xPath.evaluate(
"/mynode",
doc,
XPathConstants.NODE);
I am facing this issue only when:
1. DocumentBuilderFactory.setNamespaceAware is true
2. DocumentBuilderFactory.setValidating is true
If these are set to false, I get correct results. Can anyone help me understand why setting these attributes to false makes a difference?
(I have checked this question, but it does not clear up my doubt)
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<mynode xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.example.com" xsi:schemaLocation="http://www.example.com example.xsd">
<name>TargetName</name>
<desc>desc goes here</desc>
<pack>my.this</pack>
<object>my.ExampleObject</object>
<properties>
<attrib>
<name>id</name>
<value>ZZZ</value>
</attrib>
<attrib>
<name>ind</name>
<value>X</value>
</attrib>
</properties>
<children>
<child>
<name>childnodename</name>
<desc>description goes here</desc>
<invalues>
<scope>ALL</scope>
</invalues>
<outvalues>
<scope>ALL</scope>
</outvalues>
<akey>
<aname>AAA</aname>
<key></key>
</akey>
<msg>
<success>code1</success>
<failure>code2</failure>
</msg>
</child>
</children>
</mynode>
The quickest fix is to not call setNamespaceAware(true) :-) However, if you want a namespace-aware XPath, you have stumbled across a classic problem (see "XPath: Is there a way to set a default namespace for queries?"): XPath does not support the concept of a default namespace.
So your XPath must use a namespace prefix for the query to find any nodes. You can set a NamespaceContext on the XPath instance to resolve that prefix to the URI of the document's default namespace. One way to do this, for example:
import java.util.*;
import java.io.ByteArrayInputStream;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class XmlParse {
public static void main(String[] args) throws Exception {
String xml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<mynode xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://www.example.com\" xsi:schemaLocation=\"http://www.example.com example.xsd\">" +
"<name>TargetName</name>" +
"</mynode>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes()));
final String nonameNamespace = doc.getFirstChild().getNamespaceURI();
NamespaceContext ctx = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
String uri = null;
if (prefix.equals("n")) {
uri = nonameNamespace;
}
return uri;
}
@Override
public Iterator getPrefixes(String val) {
throw new IllegalAccessError("Not implemented!");
}
@Override
public String getPrefix(String uri) {
throw new IllegalAccessError("Not implemented!");
}
};
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(ctx);
Node node = null;
try {
node = (Node) xPath.evaluate("/n:mynode/n:name", doc, XPathConstants.NODE);
System.out.println(node.getNodeName());
System.out.println(node.getFirstChild().getNodeValue());
} catch (Exception e) {
// Report the failure instead of silently swallowing it.
e.printStackTrace();
}
}
}
This resolves the prefix n in the XPath expression to http://www.example.com, which is the document's default namespace (xmlns).
XML is namespace-aware. Each XML element (and attribute) has an associated namespace; if none is declared, the name is in no namespace (the empty namespace).
In your case the XML document you're trying to read declares a namespace, while your XPath query only looks in the empty namespace. Therefore you don't get a result back. Make sure to use the proper namespace and it will work.
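The effect described in the question can be reproduced with a few lines. This is only a sketch: the class name NamespaceAwareDemo and the method matches are made up, validation is left out, and the result for the non-namespace-aware parser is what the question reports rather than something guaranteed by the XPath specification.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, named here only for illustration.
public class NamespaceAwareDemo {

    // Returns true when the expression selects at least one node.
    public static boolean matches(String xml, String expression, boolean namespaceAware)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate(expression, doc, XPathConstants.NODE) != null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mynode xmlns=\"http://www.example.com\"><name>TargetName</name></mynode>";
        // Namespace-aware: mynode lives in http://www.example.com, so an
        // unprefixed step (which means "no namespace") does not select it.
        System.out.println(matches(xml, "/mynode", true));
        // Matching on the local name sidesteps the namespace.
        System.out.println(matches(xml, "/*[local-name()='mynode']", true));
        // Not namespace-aware: the situation in which the question reports a match.
        System.out.println(matches(xml, "/mynode", false));
    }
}
```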

JAXP: How to force XPath to validate namespace prefixes?

I am relying on the default JAXP implementation and using the Oracle JRE.
When evaluating a XPath which contains an unknown namespace prefix, it does not throw an (expected) exception.
When I run the same application on an IBM JRE, everything is fine and it throws the expected exception javax.xml.xpath.XPathExpressionException: org.apache.xpath.domapi.XPathStylesheetDOM3Exception: Prefix must resolve to a namespace
I am using the following code which tries to access an invalid namespace unknownns
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setNamespaceAware(true);
documentBuilderFactory.setValidating(true);
documentBuilderFactory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
Document doc = builder.parse(xmlFile_);
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("path/to/node/@unknownns:bla", doc,
XPathConstants.NODESET);
Question:
How can I enforce this validation independently from the JAXP implementation?
Try setting a NamespaceContext on your XPath instance:
public final class NSValidator {
private NSValidator() {
}
private static final NamespaceContext INSTANCE = new NamespaceContext() {
@Override public String getNamespaceURI(String prefix) {
return null;
}
@Override public String getPrefix(String namespaceURI) {
return null;
}
@Override public Iterator<String> getPrefixes(String namespaceURI) {
return Collections.emptyIterator();
}
};
public static NamespaceContext noNamespaces() {
return INSTANCE;
}
}
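A sketch of how such a context might be used follows. To keep it self-contained it defines its own empty NamespaceContext instead of reusing NSValidator; the class name UnknownPrefixCheck and the method rejects are hypothetical. With a context installed that resolves no prefix, the JDK's bundled XPath implementation is expected to fail on an unbound prefix with an XPathExpressionException; other implementations may word the error differently.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, named here only for illustration.
public class UnknownPrefixCheck {

    // A context that knows no prefixes at all.
    private static final NamespaceContext NO_NAMESPACES = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return null;
        }
        @Override public String getPrefix(String namespaceURI) {
            return null;
        }
        @Override public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    };

    // Returns true when evaluating the expression fails with an XPathExpressionException.
    public static boolean rejects(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root><child/></root>")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(NO_NAMESPACES);
        try {
            xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rejects("/root/child"));
        System.out.println(rejects("/root/unknownns:child"));
    }
}
```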
