Java DOM parser returns null document - java

I have an HTML template which I want to read in:
<html>
<head>
<title>TEST</title>
</head>
<body>
<h1 id="hey">Hello, World!</h1>
</body>
</html>
I want to find the tag with the id hey and then insert new content (e.g. new tags) into it. For this purpose I use the DOM parser, but my code prints null:
public static void main(String[] args) {
try {
File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
System.out.println(doc.getElementById("hey")); // returns null
} catch (Exception e) {
e.printStackTrace();
}
}
What am I doing wrong?

You are parsing the file with the Java XML API, which follows the XML specification strictly and makes no allowances for HTML conventions.
In XML, an attribute named id is not automatically of type ID, so getElementById() does not find it. You can either use another library (jsoup, for example), tell the parser to treat id as an ID (via a DTD), or look the element up with custom code.
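The "custom code" option can be sketched with the JDK's own XPath API. This is only a minimal sketch, assuming the template is well-formed XML; the class name IdLookupSketch and its helper methods are made up for illustration. It searches for an element whose id attribute matches, without relying on getElementById():

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of any library.
class IdLookupSketch {
    static final String TEMPLATE =
        "<html><head><title>TEST</title></head>"
        + "<body><h1 id=\"hey\">Hello, World!</h1></body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first element whose "id" attribute equals the given value, or null.
    static Element findById(Document doc, String id) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Bind the id as an XPath variable so quotes in it cannot break the expression.
        xPath.setXPathVariableResolver(name -> id);
        return (Element) xPath.evaluate("//*[@id=$id]", doc, XPathConstants.NODE);
    }

    // Convenience wrapper: text content of the element with that id, or null.
    static String textById(String xml, String id) throws Exception {
        Element element = findById(parse(xml), id);
        return element == null ? null : element.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(TEMPLATE);
        Element h1 = findById(doc, "hey");
        System.out.println(h1.getTextContent());
        // Optionally mark the attribute as an ID so that getElementById() finds it too.
        h1.setIdAttribute("id", true);
        System.out.println(doc.getElementById("hey") == h1);
    }
}
```

Marking the attribute with setIdAttribute() is per element, so for a large template the XPath lookup (or a DTD that declares id as an ID) is probably the more practical route.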

I modified your example to use jsoup:
public static void main(String[] args) {
try {
File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
Document doc = Jsoup.parse(file, "UTF-8"); // org.jsoup.nodes.Document, not org.w3c.dom.Document
Element elementById = doc.getElementById("hey");
System.out.println("hey =" + elementById.ownText()); // text of the element only
System.out.println("hey =" + elementById); // the element with its markup
} catch (Exception e) {
e.printStackTrace();
}
}

Related

How to get xml page by url

OK, so I have a URL like https://stackoverflow.com/ and I'm trying to parse it into a Document, but I get an error, presumably because the page is not an XML file. So the question is: how can I get the data as XML if all I have is the URL?
My code:
public class URLReader {
public static void main(String[] args) throws Exception {
// or if you prefer DOM:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL("https://stackoverflow.com/").openStream());
int nodes = doc.getChildNodes().getLength();
System.out.println(nodes + " nodes found");
}
}
To parse HTML you can use jsoup: https://jsoup.org/
This library can also convert HTML to XHTML, which is an XML serialization of HTML:
Document document = Jsoup.parse(html);
document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().escapeMode(org.jsoup.nodes.Entities.EscapeMode.xhtml);
String xhtml=document.html();

Parsing dom error in java

I am trying to parse an XML file and then insert its contents into an Excel file.
When I run my code it works despite the reported errors, but I cannot modify it because the errors remain. Here is my code:
public class Parsing {
private void parseXmlFile(){
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//using Factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation
dom = db.parse("Employee.xml"); }
} catch )
}
}
What is wrong with this?
Can someone help me? I've been searching all over Google and it's getting on my nerves.
It should be like this:
private void parseXmlFile(){
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//using Factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation
Document dom = db.parse("Employee.xml");
} catch (ParserConfigurationException | SAXException | IOException ex) { // newDocumentBuilder() and parse() throw these checked exceptions
// your error handling code here
}
}
Also, Employee.xml must be in the current working directory; otherwise give the complete absolute path to the file.

Extracting XML Elements from Java Object [duplicate]

I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then the Document is instantiated like this:
Document document = builder.parse(new File("file.xml"));
document.getDocumentElement() returns the node that is the document element of the document (in your case <config>).
Once you have rootElement, you can read the element's attributes (by calling rootElement.getAttribute(name)), and so on. See the org.w3c.dom.Element documentation for more methods.
See also the documentation for DocumentBuilder and DocumentBuilderFactory. Bear in mind that this example builds an XML DOM tree in memory, so if your XML data is huge, the tree will be huge too.
Update Here's an example to get "value" of element <requestqueue>
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can call it like this:
String requestQueueName = getString("requestqueue", element);
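Since the question asks for a lookup by request name, the matching <Request> element has to be picked first. A minimal sketch of that step, assuming the XML from the question is available as a String (the class RequestConfigSketch and its lookup method are hypothetical names, not from the answer above):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class RequestConfigSketch {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // Returns the text of <childTag> inside the <Request> whose name attribute
    // equals requestName, or null if there is no such request or child.
    static String lookup(String xml, String requestName, String childTag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList requests = doc.getElementsByTagName("Request");
        for (int i = 0; i < requests.getLength(); i++) {
            Element request = (Element) requests.item(i);
            if (requestName.equals(request.getAttribute("name"))) {
                NodeList children = request.getElementsByTagName(childTag);
                return children.getLength() > 0 ? children.item(0).getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup(SAMPLE, "CleanEmail", "requestqueue"));
    }
}
```

getTextContent() is used here instead of walking the child nodes by hand; it returns the concatenated text of the element.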
If you only need to retrieve one (the first) value from the XML:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
If you want to parse the whole XML document, use jsoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML, you may want to use Java's XPath API. For an example, see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// "@name" selects the name attribute; the path starts at the <config> root of the XML in the question
Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
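To select by request name rather than just taking the first match, the XPath expression can carry a predicate. The following is a sketch under the assumption that the XML has the <config>/<Request> layout from the question; XPathRequestSketch and queueFor are illustrative names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class XPathRequestSketch {
    static final String SAMPLE =
        "<config>"
        + "<Request name=\"ValidateEmailRequest\"><requestqueue>emailrequest</requestqueue></Request>"
        + "<Request name=\"CleanEmail\"><requestqueue>Cleanrequest</requestqueue></Request>"
        + "</config>";

    // Returns the <requestqueue> text of the named request, or "" if nothing matches.
    static String queueFor(String xml, String requestName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // $name is resolved through the variable resolver instead of string concatenation.
        xPath.setXPathVariableResolver(variable -> requestName);
        // evaluate(String, Object) returns the string value of the first matching node.
        return xPath.evaluate("/config/Request[@name=$name]/requestqueue", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(queueFor(SAMPLE, "CleanEmail"));
    }
}
```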
There are a number of different ways to do this. You might want to check out XStream or JAXB; both have tutorials and examples.
If the XML is well formed, you can convert it to a Document and then use XPath to get the XML elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements by their XPath.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get the value between two strings.
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
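The toString(node, true) helper called in getXmlContentAsString is not shown in this answer. One possible implementation, offered purely as an assumption about what it does, serializes a node with the JDK's identity Transformer; the name nodeToString and the meaning of the boolean flag (omit the XML declaration) are guesses:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the helper that the answer does not show.
class NodeSerializerSketch {
    // Serializes a DOM node to its XML text using an identity transformation.
    static String nodeToString(Node node, boolean omitXmlDeclaration) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
            omitXmlDeclaration ? "yes" : "no");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    // Parses the XML and serializes the first element with the given tag name.
    static String serializeFirst(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return nodeToString(doc.getElementsByTagName(tagName).item(0), true);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
        System.out.println(serializeFirst(xml, "age"));
    }
}
```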
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
The following links might help:
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You can either build a Document Object Model (DOM) of the XML file, or use event-driven parsing (such as SAX), which is an alternative to the DOM representation. Of course there is much more to know about processing XML; for instance, if you are given an XML schema definition (XSD), you could use JAXB.
There are various APIs available to read and write XML files in Java.
I would prefer using StAX.
This can also be useful: Java XML APIs
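As a rough illustration of the StAX approach, here is a sketch that streams through the configuration from the question and collects the request queue for each request name. The class StaxConfigSketch and the method requestQueues are illustrative names, and the XML layout is assumed to match the question:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class for illustration.
class StaxConfigSketch {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // Maps each Request name to the text of its <requestqueue> child.
    static Map<String, String> requestQueues(String xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        String currentRequest = null;
        while (reader.hasNext()) {
            // Only start tags matter here; everything else is skipped.
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                String tag = reader.getLocalName();
                if ("Request".equals(tag)) {
                    currentRequest = reader.getAttributeValue(null, "name");
                } else if ("requestqueue".equals(tag) && currentRequest != null) {
                    // getElementText() reads up to the matching end tag.
                    result.put(currentRequest, reader.getElementText());
                }
            }
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestQueues(SAMPLE));
    }
}
```

Unlike DOM, this never holds the whole document in memory, which matters mainly for large files.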
You can write a class that extends org.xml.sax.helpers.DefaultHandler and dispatches each element to a method named
start_<tag_name>(Attributes attrs);
and
end_<tag_name>();
For example:
start_request_queue(attrs);
etc.
Then extend that class to implement the XML configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException {
Class[] args = new Class[2];
args[0] = uri.getClass();
args[1] = org.xml.sax.Attributes.class;
try {
String mname = name.replace("-", "");
java.lang.reflect.Method m =
getClass().getDeclaredMethod("start_" + mname, args); // matches handlers such as start_Request
m.invoke(this, new Object[] { uri, attrs });
}
catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
catch (NoSuchMethodException e) {
throw new RuntimeException(e);
}
catch (java.lang.reflect.InvocationTargetException e) {
// SAXException has no Throwable constructor, so carry over the message and stack trace
Throwable cause = e.getTargetException();
org.xml.sax.SAXException se =
new org.xml.sax.SAXException(String.valueOf(cause));
se.setStackTrace(cause.getStackTrace());
throw se;
}
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name=" + attrs.getValue(0));
}
Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to

validation a xml with several xsd schema DOM java

After searching the internet and various forums, I have not found an answer.
I have an XML file which is defined by two XSD schemas.
There are two ways to write the XML file:
(I had to delete the "<" characters to display the XML here)
First way to write it:
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#" xmlns:ns2="http://www.W3C.com /PolicyExtension/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
ns2:Validation>
ns2:ConditionID>1.0.1</ns2:ConditionID>
ns2:TConditionID>1.0.2</ns2:TConditionID>
/ns2:Validation>
/Policy>"
Second way:
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
Validation xmlns:ns2="http://www.W3C.com/PolicyExtension/v3#">
ConditionID>1.0.1</ns2:ConditionID>
TConditionID>1.0.2</ns2:TConditionID>
/Validation>
/Policy>
To parse my XML files, I use:
InputStream doc = new FileInputStream(myXMLFile);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
List<Source> sourceListSchema = new ArrayList<Source>();
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_1)));
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_2)));
Schema schema;
try {
Source[] sourceTmp = new Source[1];
schema = sf.newSchema(sourceListSchema.toArray(sourceTmp));
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : The schema can not be parse :"+e.getMessage());
}
dbf.setIgnoringElementContentWhitespace(true);
dbf.setNamespaceAware(true);
dbf.setIgnoringComments(true);
dbf.setSchema(schema);
DocumentBuilder db;
try {
db = dbf.newDocumentBuilder();
documentPolicy = db.parse(Doc);
} catch (ParserConfigurationException e) {
LogMachine.logger.severe(
"ParserConfigurationException : the file can not be parse by DOM :"+e.getMessage());
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : the file can not be parse by DOM :"+e.getMessage());
} catch (IOException e) {
LogMachine.logger.severe(
"IOException : the file can not be open like a file :"+e.getMessage());
}
When I parse these documents with DOM, the first XML file produces an error:
Exception in thread "main" org.w3c.dom.ls.LSException: The prefix "ns2" for element "ns2:Validation" is not bound.
But the second XML file parses fine.
Can someone help me parse both documents?
Thank you for your help.

Ignoring DTD when parsing XML

How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here
And when I try to build() my document I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can the lookup be skipped when using XOM?
Here is a code snippet:
public BlastXMLParser(String filePath) {
Builder b = new Builder(false);
//not a good idea to have exception-throwing code in constructor
try {
_document = b.build(filePath);
} catch (ParsingException ex) {
Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
} catch (IOException ex) {
//
}
}
private Elements getBlastReads() {
Element root = _document.getRootElement();
Elements rootChildren = root.getChildElements();
for (int i = 0; i < rootChildren.size(); i++) {
Element child = rootChildren.get(i);
if (child.getLocalName().equals("BlastOutput_iterations")) {
return child.getChildElements();
}
}
return null;
}
}
I get a NullPointerException at this line:
Element root = _document.getRootElement();
With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.
The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you
- don't have access to the DTD,
- are absolutely sure you won't need it (apart from validation it might also declare character entities that are used in the document), and
- are using the Xerces XML parser implementation,
you can disable fetching of the DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);
If you are not using XOM but plain JAXP, the above solution just needs to be tweaked into:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(...);
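The EntityResolver approach mentioned at the start of this answer can be sketched with plain JAXP as follows. Note the swapped-in simplification: instead of redirecting to an embedded copy of the DTD, this sketch answers every external entity request with an empty stream, so it only suits documents that do not rely on anything the DTD declares. SkipDtdSketch and rootName are illustrative names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class SkipDtdSketch {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" \"NCBI_BlastOutput.dtd\">"
        + "<BlastOutput><BlastOutput_iterations/></BlastOutput>";

    // Parses the XML without fetching the external DTD and returns the root element name.
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Answer every external entity request (including the DTD) with empty content.
        // To use an embedded copy instead, return an InputSource that reads that copy.
        builder.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(SAMPLE));
    }
}
```

The resolver could also check publicId or systemId and return null for anything other than the known DTD, which leaves the parser's default resolution in place for other entities.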
According to the XOM documentation, this is the way to parse a document without any validation:
try {
Builder parser = new Builder();
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
If you do want to validate the document (XOM's validating Builder checks it against its DTD), you have to call new Builder(true):
try {
Builder parser = new Builder(true);
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ValidityException ex) {
System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
Note that yet another exception can now be thrown: ValidityException.
