Read an XML file which has 2 root nodes? - java

Input file (XML):
<?xml version="1.0" encoding="UTF-8"?>
<response>
<message></message>
<messagecode></messagecode>
<messagedescription></messagedescription>
</response>
<response>
<message></message>
<messagecode></messagecode>
<messagedescription></messagedescription>
</response>
There are two response elements, and response is the root node.
Java code:
public void readXML(String output) {
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(output);
doc.getDocumentElement().normalize();
NodeList nodes = doc.getElementsByTagName("response");
for (int i = 0; i < nodes.getLength(); i++) {
Node nNode = nodes.item(i);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) nNode;
// Each lookup is scoped to the current response element, so take the first match (index 0), not index i
NodeList msg = element.getElementsByTagName("message");
Element line = (Element) msg.item(0);
System.out.println("Message: " + getCharacterDataFromElement(line));
NodeList msgcode = element.getElementsByTagName("messagecode");
line = (Element) msgcode.item(0);
System.out.println("Message Code: " + getCharacterDataFromElement(line));
NodeList msgdes = element.getElementsByTagName("messagedescription");
line = (Element) msgdes.item(0);
System.out.println("Message Description: " + getCharacterDataFromElement(line));
NodeList medialink = element.getElementsByTagName("medialink");
line = (Element) medialink.item(0);
System.out.println("Media link: " + getCharacterDataFromElement(line));
NodeList mediastatus = element.getElementsByTagName("mediastatus");
line = (Element) mediastatus.item(0);
System.out.println("Media Status: " + getCharacterDataFromElement(line));
}
}
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
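public static String getCharacterDataFromElement(Element e) {
// item(...) returns null when the element is missing from the document
if (e == null) {
return "";
}
Node child = e.getFirstChild();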
if (child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
return cd.getData();
}
return "";
}
I tried this code, but it fails with an error. How can I resolve this?
A SAXException is thrown because the XML file is not well-formed.
How can I read the values of both root nodes from the same file in Java?

Your XML does not have a single root node: the two response elements sit side by side at the top level, so the document is not well-formed.
Wrapping them in one root element would suffice:
<?xml version="1.0" encoding="UTF-8"?>
<responses>
<response>
<message></message>
<messagecode></messagecode>
<messagedescription></messagedescription>
</response>
<response>
<message></message>
<messagecode></messagecode>
<messagedescription></messagedescription>
</response>
</responses>

XML should always be well-formed, meaning that it has exactly one root node and every opening tag has a matching closing tag. You can check the correct XML syntax on the W3Schools pages: http://www.w3schools.com/xml/xml_syntax.asp There is also a lot of other useful information about constructing XML documents there.
If for some reason you absolutely need to handle XML with multiple root nodes, you are probably left with manual parsing using substrings or regular expressions, which I don't recommend.
Either way, if you have any control over the input, form the XML so that it has only one root node. Period.
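If you cannot change the producer, one workaround is to wrap the input in a synthetic root before handing it to the parser. The following is only a sketch under stated assumptions: the class name MultiRootXml, its methods, and the wrapper element name "responses" are made up for illustration, and the input is assumed to be well-formed apart from having several top-level elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MultiRootXml {

    /** Wraps a multi-root fragment in a synthetic root (hypothetical name "responses") and parses it. */
    public static Document parseFragment(String xml) throws Exception {
        // The XML declaration is only legal at the very start of a document, so drop it before wrapping.
        String body = xml.trim().replaceFirst("^<\\?xml[^>]*\\?>", "");
        String wrapped = "<responses>" + body + "</responses>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(wrapped)));
    }

    /** Returns the text of the first messagecode element inside every response element. */
    public static List<String> messageCodes(String xml) throws Exception {
        NodeList responses = parseFragment(xml).getElementsByTagName("response");
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < responses.getLength(); i++) {
            Element response = (Element) responses.item(i);
            // The lookup is scoped to this response, so the first match is always at index 0.
            codes.add(response.getElementsByTagName("messagecode").item(0).getTextContent());
        }
        return codes;
    }

    public static void main(String[] args) throws Exception {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<response><messagecode>1</messagecode></response>\n"
                + "<response><messagecode>2</messagecode></response>";
        System.out.println(messageCodes(input)); // prints [1, 2]
    }
}
```

String concatenation keeps the sketch short; for large inputs a streaming wrapper would avoid holding the whole text in memory twice.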

Related

Android XML Parsing error: Only one root element allowed

I want to parse an XML response, which I am posting below. Inside the pre tag there is a JSON string that I want to print, but I am not able to parse it with my code. Here is the code I use to parse this XML:
private void xmlParsing(String qrCode) {
try {
qrCode = qrCode.replaceAll("[^\\x20-\\x7e]", "");
//loge("qrCode : " + qrCode);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(qrCode.getBytes("utf-8")));
Element element = doc.getDocumentElement();
element.normalize();
NodeList nList = doc.getElementsByTagName("head");
loge("--df--nList.getLength()---"+nList.getLength());
for (int i=0; i<nList.getLength(); i++) {
Node node = nList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element2 = (Element) node;
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">{"status":true,"message":"Login Successfull","data":{"user":{"id":2,"name":"Rommy Garg","email":"rommy#signitysolutions.com","user_group_id":"2","company_id":2,"last_login":"2019-05-29 05:48:27","last_logout":"2019-05-28 10:33:39","profile_pic":null,"created_at":"2018-12-20 10:12:23","updated_at":"2019-05-29 05:48:27","sf_reference_id":"0056F00000BqMZSQA3","sf_setup":1},"company_logo":"http:\/\/staging.sales-chap.com\/dist\/uploads\/company\/1545300743.jpg","client_id":1,"client_secret":"IQ09J2BdDuc3lSKUJlQAp8uhCXRq+s2EucsBOb9rfjo="}}</pre>
</body>
But I am getting the error below:
org.w3c.dom.DOMException: Only one root element allowed
Well, as the error says, XML only allows a single root element. You could create a fake one around the string you receive:
qrCode = "<html>" + qrCode + "</html>";
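Putting the wrapping step together with the lookup of the pre element could look roughly like this. It is a sketch, not the asker's actual code: the class PreTagExtractor and its method are hypothetical names, and it assumes the received fragment is well-formed XML once wrapped (for example, the JSON contains no raw < or & characters).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PreTagExtractor {

    /** Wraps the head/body fragment in a synthetic html root and returns the text of the first pre element. */
    public static String extractPreText(String fragment) throws Exception {
        String wrapped = "<html>" + fragment + "</html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));
        Node pre = doc.getElementsByTagName("pre").item(0);
        // item(0) is null when the fragment contains no pre element
        return pre == null ? "" : pre.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fragment = "<head></head><body><pre style=\"word-wrap: break-word;\">"
                + "{\"status\":true,\"message\":\"ok\"}</pre></body>";
        System.out.println(extractPreText(fragment)); // prints {"status":true,"message":"ok"}
    }
}
```

The returned string can then be passed to whichever JSON parser the app already uses.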

Unable to parse XML using java

I received an XML string as a response, but I am unable to reach the ResponseCode and Remark elements. Can anybody help me get the response code?
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<GetIMEIInfoResponse xmlns="http://tempuri.org/">
<GetIMEIInfoResult>
<![CDATA[
<SerialsDetail>
<Item>
<ResponseCode>2</ResponseCode>
<Remark>Invalid Input</Remark>
</Item>
</SerialsDetail>
]]>
</GetIMEIInfoResult>
</GetIMEIInfoResponse>
</s:Body>
</s:Envelope>
This is how I am trying to do it:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(response)));
NodeList list = doc.getElementsByTagName("Remark");
System.out.println(list.getLength());
Node n = list.item(0);
System.out.println(n.getTextContent());
} catch (Exception e) {
e.printStackTrace();
}
You are asking for an element named "Remark", but your document does not contain such an element. Instead, it contains only a "GetIMEIInfoResult" element with a block of text in it. This text happens to be XML, but to access the contents of that inner piece of XML you have to parse the contents of "GetIMEIInfoResult" in the same way that you parsed the entire document.
Here is how you can do it:
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class NestedCDATA {
private static String response =
"<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\">" +
" <s:Body>" +
" <GetIMEIInfoResponse xmlns=\"http://tempuri.org/\">" +
" <GetIMEIInfoResult>" +
" <![CDATA[" +
" <SerialsDetail>" +
" <Item>" +
" <ResponseCode>2</ResponseCode>" +
" <Remark>Aawwwwwwww yeaaaah!</Remark>" +
" </Item>" +
" </SerialsDetail>" +
" ]]>" +
" </GetIMEIInfoResult>" +
" </GetIMEIInfoResponse>" +
" </s:Body>" +
"</s:Envelope>";
public static String getCdata(Node parent) {
NodeList cs = parent.getChildNodes();
for(int i = 0; i < cs.getLength(); i++){
Node c = cs.item(i);
if(c instanceof CharacterData) {
CharacterData cdata = (CharacterData)c;
String content = cdata.getData().trim();
if (content.length() > 0) {
return content;
}
}
}
return "";
}
public static void main(String[] args) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(response)));
Node cdataParent = doc.getElementsByTagName("GetIMEIInfoResult").item(0);
DocumentBuilder cdataBuilder = factory.newDocumentBuilder();
Document cdataDoc = cdataBuilder.parse(new InputSource(new StringReader(
getCdata(cdataParent)
)));
Node remark = cdataDoc.getElementsByTagName("Remark").item(0);
System.out.println("Content of Remark in CDATA: " + getCdata(remark));
} catch (Exception e) {
e.printStackTrace();
}
}
}
Result: "Content of Remark in CDATA: Aawwwwwwww yeaaaah!".
Here is another interesting question for you: why does your service output XML with XML in it? XML all by itself is already nested enough. Is it really necessary to wrap parts of it in CDATA?
The problem with the XML is that the content of the GetIMEIInfoResult tag is CDATA, so the builder does not treat it as XML. To access the data in the GetIMEIInfoResult tag you can use the following:
Element infoResult = (Element) list.item(0);
String elementData = getCharacterDataOfNode(infoResult.getFirstChild());
public static String getCharacterDataOfNode(Node node) {
String data = "";
if (node instanceof CharacterData) {
data = ((CharacterData) node).getData();
}
return data;
}
Then you have to parse that data again with a DocumentBuilder, which gives you access to the Remark tag. To get its content, use the getCharacterDataOfNode() method again.
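A shorter variant of the same two-pass idea uses getTextContent(), which concatenates the whitespace text nodes and the CDATA section of an element, so it does not matter whether the CDATA node is the first child. This is a sketch with hypothetical names (CdataPayload, remark), assuming the inner payload is itself a well-formed XML document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataPayload {

    /** Parses an XML string into a DOM document with a fresh, default-configured builder. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Extracts the CDATA payload of GetIMEIInfoResult, parses it again, and returns the Remark text. */
    public static String remark(String soapResponse) throws Exception {
        Document outer = parse(soapResponse);
        // trim() removes the whitespace around the CDATA section, leaving only the inner XML
        String payload = outer.getElementsByTagName("GetIMEIInfoResult").item(0).getTextContent().trim();
        Document inner = parse(payload);
        return inner.getElementsByTagName("Remark").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String response =
                "<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<s:Body><GetIMEIInfoResponse xmlns=\"http://tempuri.org/\">"
                + "<GetIMEIInfoResult>\n<![CDATA[\n<SerialsDetail><Item>"
                + "<ResponseCode>2</ResponseCode><Remark>Invalid Input</Remark>"
                + "</Item></SerialsDetail>\n]]>\n</GetIMEIInfoResult>"
                + "</GetIMEIInfoResponse></s:Body></s:Envelope>";
        System.out.println(remark(response)); // prints Invalid Input
    }
}
```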

Incoherent XPath output

I just discovered XPath and I'm trying to use it in order to parse an XML file. I read a few courses about it, but I am stuck with a problem. When I try to get a NodeList from the file, the getLength() method always returns 0.
However, when I call
document.getElementsByTagName("crtx:env").getLength()
the output is correct (7 in my case).
I do not really understand this: since my NodeList is built from the same Document, shouldn't the output be the same?
Here is part of my code:
IFile f = (IFile) PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage().getActiveEditor().getEditorInput().getAdapter(IFile.class);
String fileURL = f.getLocation().toOSString();
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
Document document = null;
if (builder != null){
try{
document = builder.parse(fileURL);
System.out.println("DOCUMENT URI : " + document.getDocumentURI());
} catch (Exception e){
e.printStackTrace();
}
} else {
System.out.println("builder null");
}
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = null;
try {
nodeList = (NodeList) xPath.compile("crtx:env").evaluate(document, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
System.out.println("NODELIST SIZE : " + nodeList.getLength());
System.out.println(document.getElementsByTagName("crtx:env").getLength());
}
The first System.out.println() prints sensible output (a valid URI), but the last two lines print different numbers.
Any help would be appreciated.
Thanks for reading.
XPath is defined for XML with namespaces, so make the factory namespace-aware:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
Then, to use a path with a namespace prefix, you need to call setNamespaceContext (https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html#setNamespaceContext%28javax.xml.namespace.NamespaceContext%29) to bind the prefix(es) you use to namespace URIs. See https://xml.apache.org/xalan-j/xpath_apis.html#namespacecontext for an example.
With a namespace-aware DOM, however, you will also need to change your getElementsByTagName call to getElementsByTagNameNS.
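As a rough illustration of those two steps, here is a self-contained sketch. The namespace URI http://example.com/crtx is a placeholder (the question does not show the real one), and the class and method names are hypothetical. Note also that a bare "crtx:env" path only matches children of the context node, so the sketch uses "//crtx:env" to search the whole tree; adjust the path to your document's structure.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {

    // Placeholder URI: replace it with the URI your document binds to the crtx prefix.
    static final String CRTX_NS = "http://example.com/crtx";

    /** Minimal NamespaceContext that only knows the crtx prefix. */
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "crtx".equals(prefix) ? CRTX_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return CRTX_NS.equals(uri) ? "crtx" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
        }
    };

    /** Counts all crtx:env elements in the document using a namespace-aware parser and XPath. */
    public static int countEnv(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // must be set before newDocumentBuilder()
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(CONTEXT);
        NodeList nodes = (NodeList) xPath.evaluate("//crtx:env", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<crtx:root xmlns:crtx=\"http://example.com/crtx\">"
                + "<crtx:env/><crtx:group><crtx:env/></crtx:group></crtx:root>";
        System.out.println(countEnv(xml)); // prints 2
    }
}
```

The DOM equivalent on the same namespace-aware document would be doc.getElementsByTagNameNS(CRTX_NS, "env").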

getNodeName() operation on an XML node returns #text

<person>
<firstname/>
<lastname/>
<salary/>
</person>
This is the XML I am parsing. When I print the node names of the child elements of person, I get:
#text
firstname
#text
lastname
#text
salary
How do I prevent the #text nodes from being generated?
Update: here is my code:
try {
NodeList nl = null;
int l, i = 0;
File fXmlFile = new File("file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
dbFactory.setValidating(false);
dbFactory.setIgnoringElementContentWhitespace(true);
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
dbFactory.setCoalescing(true);
InputStream in;
in = new FileInputStream(fXmlFile);
Document doc = dBuilder.parse(in);
doc.getDocumentElement().normalize();
Node n = doc.getDocumentElement();
System.out.println(dbFactory.isIgnoringElementContentWhitespace());
System.out.println(n);
if (n != null && n.hasChildNodes()) {
nl = n.getChildNodes();
for (i = 0; i < nl.getLength(); i++) {
System.out.println(nl.item(i).getNodeName());
}
}
} catch (Exception e) {
e.printStackTrace();
}
setIgnoringElementContentWhitespace only works if you use setValidating(true), and then only if the XML file you are parsing references a DTD that the parser can use to work out which whitespace-only text nodes are actually ignorable. If your document doesn't have a DTD it errs on the safe side and assumes that no text nodes can be ignored, so you'll have to write your own code to ignore them as you traverse the child nodes.
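A minimal sketch of such filtering code is shown below; the class and method names are made up for illustration. It simply skips every child that is not an element. Separately, note that in the question's code the factory setters are called after newDocumentBuilder(), so they cannot affect the builder that was already created; factory settings need to be applied before the builder is obtained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementChildren {

    /** Returns the names of the direct child elements, skipping text, comment and other node types. */
    public static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    /** Convenience overload that parses the XML string first. */
    public static List<String> childElementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return childElementNames(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person>\n  <firstname/>\n  <lastname/>\n  <salary/>\n</person>";
        System.out.println(childElementNames(xml)); // prints [firstname, lastname, salary]
    }
}
```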

Node.getTextContent(): is there a way to get the text content of the current node only, not the descendants' text?

Node.getTextContent() returns the text content of the current node and its descendants.
Is there a way to get the text content of the current node only, without the descendants' text?
Example:
<paragraph>
<link>XML</link>
is a
<strong>browser based XML editor</strong>
editor allows users to edit XML data in an intuitive word processor.
</paragraph>
expected output
paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor
I tried the code below:
String str = "<paragraph>"+
"<link>XML</link>"+
" is a "+
"<strong>browser based XML editor</strong>"+
"editor allows users to edit XML data in an intuitive word processor."+
"</paragraph>";
org.w3c.dom.Document domDoc = null;
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
try {
docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (ParserConfigurationException e1) {
e1.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(
domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + ((Element)n).getTextContent());
}
But it gives output like this:
paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
Note that the output for the paragraph element contains the text of the link and strong tags, which I don't want.
Please suggest some ideas.
What you want is to filter the children of your <paragraph> node and keep only the ones with node type Node.TEXT_NODE.
Here is an example of a method that returns the desired content:
public static String getFirstLevelTextContent(Node node) {
NodeList list = node.getChildNodes();
StringBuilder textContent = new StringBuilder();
for (int i = 0; i < list.getLength(); ++i) {
Node child = list.item(i);
if (child.getNodeType() == Node.TEXT_NODE)
textContent.append(child.getTextContent());
}
return textContent.toString();
}
Applied to your example:
String str = "<paragraph>" + //
"<link>XML</link>" + //
" is a " + //
"<strong>browser based XML editor</strong>" + //
"editor allows users to edit XML data in an intuitive word processor." + //
"</paragraph>";
Document domDoc = null;
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (Exception e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}
Output:
paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
It iterates over all the children of a Node, keeps only the text nodes (thus excluding comments, elements and so on), and accumulates their text content.
There is no direct method in Node or Element to get only the text content at the first level.
If you change the last for loop into the following one, it behaves as you wanted:
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
StringBuilder content = new StringBuilder();
NodeList children = n.getChildNodes();
for(int i=0; i<children.getLength(); i++) {
Node child = children.item(i);
if(child.getNodeName().equals("#text"))
content.append(child.getTextContent());
}
System.out.println(tagname + "=" + content);
}
I do this with Java 8 streams and a helper class:
import java.util.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class NodeLists
{
/** converts a NodeList to java.util.List of Node */
static List<Node> list(NodeList nodeList)
{
List<Node> list = new ArrayList<>();
for(int i=0;i<nodeList.getLength();i++) {list.add(nodeList.item(i));}
return list;
}
}
And then:
NodeLists.list(node.getChildNodes())
.stream()
.filter(n -> n.getNodeType() == Node.TEXT_NODE)
.map(Node::getTextContent)
.reduce("", (s, t) -> s + t);
There is no built-in function for the text of the current node only, but a simple trick can help: check whether node.getTextContent() contains "\n"; if it does, the current node has no text of its own. Note that this heuristic only holds for pretty-printed documents whose text values themselves contain no line breaks.
Hope this helps.
