OK, so I have a URL like https://stackoverflow.com/ and I'm trying to parse it into a Document, but I get an error because the page is not an XML file. So the question is: how can I get the data as XML if all I have is the URL?
My code:
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class URLReader {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Throws here: the page is HTML, which is not well-formed XML
        Document doc = db.parse(new URL("https://stackoverflow.com/").openStream());
        int nodes = doc.getChildNodes().getLength();
        System.out.println(nodes + " nodes found");
    }
}
To parse HTML you can use jsoup: https://jsoup.org/
This library also provides features to convert HTML to XHTML, which is a form of XML:
Document document = Jsoup.parse(html);
document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().escapeMode(org.jsoup.nodes.Entities.EscapeMode.xhtml);
String xhtml=document.html();
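To see why the original code fails and why converting to XHTML helps, here is a small JDK-only sketch. The class WellFormedCheck and the two hard-coded snippets are hypothetical stand-ins for the downloaded page: the plain-HTML snippet contains an unclosed <br>, which an XML parser rejects, while the XHTML-style form, in which every element is closed, parses fine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    // Returns the parsed DOM, or null if the text is not well-formed XML.
    public static Document tryParse(String text) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            return db.parse(new InputSource(new StringReader(text)));
        } catch (SAXException e) {
            // Typical for plain HTML: tags such as <br> are never closed
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Plain HTML: <br> is left open, so the XML parser rejects the document
        String html = "<html><body><p>Hello<br></p></body></html>";
        // XHTML style: every element is closed
        String xhtml = "<html><body><p>Hello<br/></p></body></html>";

        System.out.println("HTML parses as XML:  " + (tryParse(html) != null));
        Document doc = tryParse(xhtml);
        System.out.println("XHTML parses as XML: " + (doc != null));
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
    }
}
```

In practice you would feed the xhtml string produced by the jsoup snippet above into DocumentBuilder.parse in the same way. Note that the default error handler also prints a "[Fatal Error]" line to stderr for the rejected input.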
I am new to XML. I want to read the following XML based on the request name. Please help me with how to read the XML below in Java:
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
document.getDocumentElement() returns the node that is the document element of the document (in your case <config>).
Once you have rootElement, you can access the element's attributes (by calling rootElement.getAttribute()), etc. For more methods, see the documentation of Java's org.w3c.dom.Element.
See the Javadoc of DocumentBuilder and DocumentBuilderFactory for more info. Bear in mind that this example builds an XML DOM tree in memory, so if your XML data is huge, the tree will be huge too.
Update: here's an example that gets the "value" (text content) of the element <requestqueue>:
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can then call it like this:
String requestQueueName = getString("requestqueue", element);
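Putting the pieces together for the original question (looking a request up by its name attribute), a minimal sketch might look like this. The class and method names are hypothetical; it uses only the JDK's DOM API (getElementsByTagName, getAttribute, getTextContent).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RequestConfigReader {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Finds the <Request> whose name attribute matches and returns the text of
    // its first child element called childTag, or null if either is missing.
    public static String lookup(Document doc, String requestName, String childTag) {
        NodeList requests = doc.getDocumentElement().getElementsByTagName("Request");
        for (int i = 0; i < requests.getLength(); i++) {
            Element request = (Element) requests.item(i);
            if (requestName.equals(request.getAttribute("name"))) {
                NodeList children = request.getElementsByTagName(childTag);
                return children.getLength() > 0 ? children.item(0).getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\"><requestqueue>emailrequest</requestqueue>"
                + "<responsequeue>emailresponse</responsequeue></Request>"
                + "<Request name=\"CleanEmail\"><requestqueue>Cleanrequest</requestqueue>"
                + "<responsequeue>Cleanresponse</responsequeue></Request>"
                + "</config>";
        Document doc = parse(xml);
        System.out.println(lookup(doc, "CleanEmail", "requestqueue"));            // Cleanrequest
        System.out.println(lookup(doc, "ValidateEmailRequest", "responsequeue")); // emailresponse
    }
}
```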
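In case you just need to retrieve one (the first) value from the XML, a plain string split works, as long as the opening tag has no attributes and the tag is present (otherwise it throws ArrayIndexOutOfBoundsException):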
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
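If you do go the string route, a variant that returns null instead of throwing when the tag is missing could look like this. TagValue is a hypothetical helper with the same limitations as the split version: it does not understand attributes, CDATA, comments or nested tags of the same name.

```java
public class TagValue {

    // Text between the first <tagName> and the next </tagName>, or null if
    // either marker is missing.
    public static String getTagValue(String xml, String tagName) {
        String open = "<" + tagName + ">";
        String close = "</" + tagName + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<Request><requestqueue>emailrequest</requestqueue></Request>";
        System.out.println(getTagValue(xml, "requestqueue")); // emailrequest
        System.out.println(getTagValue(xml, "missing"));      // null
    }
}
```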
In case you want to parse the whole XML document, use jsoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML, you may want to use Java's XPath library. For an example, see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class Demo {
    public static void main(String[] args) {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            Document dDoc = builder.parse(new File("E:/test.xml"));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // name attribute of the first <Request> under the <config> root
            Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
            System.out.println(node.getNodeValue());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
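Applied to the <config> XML from the question, XPath can also select a request by its name directly. This is a sketch with hypothetical class and method names; the name is spliced into the expression unescaped, so it assumes trusted input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathByName {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Text of <requestqueue> inside the <Request> with the given name ("" if absent).
    public static String requestQueue(Document doc, String requestName) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // The name is inserted unescaped: only safe for trusted values
        String expr = "/config/Request[@name='" + requestName + "']/requestqueue";
        return xPath.evaluate(expr, doc); // string value of the first match
    }

    // All values of the name attribute, in document order.
    public static String[] requestNames(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList attrs = (NodeList) xPath.evaluate("/config/Request/@name", doc, XPathConstants.NODESET);
        String[] names = new String[attrs.getLength()];
        for (int i = 0; i < names.length; i++) {
            names[i] = attrs.item(i).getNodeValue();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<config>"
                + "<Request name=\"ValidateEmailRequest\"><requestqueue>emailrequest</requestqueue></Request>"
                + "<Request name=\"CleanEmail\"><requestqueue>Cleanrequest</requestqueue></Request>"
                + "</config>");
        System.out.println(String.join(", ", requestNames(doc))); // ValidateEmailRequest, CleanEmail
        System.out.println(requestQueue(doc, "CleanEmail"));       // Cleanrequest
    }
}
```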
There are a number of different ways to do this. You might want to check out XStream or JAXB; there are tutorials and examples for both.
If the XML is well-formed, you can convert it to a Document and then use XPath to get the XML elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements using their XPath:
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI (from Apache Xalan) to get a Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get the value between two strings.
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
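The snippets above call a toString(Node, boolean) helper that is not shown (and getXmlContentAsString(nodeList) assumes an overload taking a NodeList, also not shown). Below is a minimal sketch of what the missing helper could look like, using the JDK's identity Transformer. The class name NodeSerializer and the meaning of the boolean (omit the XML declaration) are assumptions, not the original author's code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeSerializer {

    // Hypothetical stand-in for toString(Node, boolean): serializes one node with
    // an identity Transformer; the flag is assumed to mean "omit <?xml ...?>".
    public static String toString(Node node, boolean omitXmlDeclaration) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitXmlDeclaration ? "yes" : "no");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Same shape as getXmlContentAsString(Node) above: concatenates the serialized children.
    public static String getXmlContentAsString(Node node) throws TransformerException {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(toString(children.item(i), true));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(getXmlContentAsString(doc.getDocumentElement()));
    }
}
```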
The following links might help:
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You either build a Document Object Model (DOM) of that XML file (take a look at this),
or you use event-driven parsing (SAX), which is an alternative to the DOM representation. IMHO the best overall comparison of these two basic techniques can be found here. Of course there is much more to know about processing XML; for instance, if you are given an XML Schema definition (XSD), you could use JAXB.
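As an illustration of the event-driven (SAX) approach, here is a minimal sketch that maps each Request name to its requestqueue value from the question's XML. The class name is hypothetical. Note that characters() may deliver one text node in several chunks, hence the StringBuilder.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRequestReader {

    // Maps each Request name to its requestqueue text using SAX callbacks only.
    public static Map<String, String> read(String xml) throws Exception {
        Map<String, String> queues = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentRequest; // name of the enclosing <Request>
            private StringBuilder text;    // non-null while inside <requestqueue>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("Request")) {
                    currentRequest = attrs.getValue("name");
                } else if (qName.equals("requestqueue")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // accumulate chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("requestqueue") && text != null) {
                    queues.put(currentRequest, text.toString());
                    text = null;
                } else if (qName.equals("Request")) {
                    currentRequest = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return queues;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\"><requestqueue>emailrequest</requestqueue></Request>"
                + "<Request name=\"CleanEmail\"><requestqueue>Cleanrequest</requestqueue></Request>"
                + "</config>";
        System.out.println(read(xml)); // {ValidateEmailRequest=emailrequest, CleanEmail=Cleanrequest}
    }
}
```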
There are various APIs available for reading and writing XML files in Java.
I would recommend using StAX.
This can also be useful: Java XML APIs
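For comparison, a minimal StAX (pull-parser) sketch that produces the same name-to-queue map from the question's XML; the class name is hypothetical. getElementText() reads the text of the current element and leaves the reader on its end tag.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxRequestReader {

    // Pull-parses the config and maps each Request name to its requestqueue text.
    public static Map<String, String> read(String xml) throws XMLStreamException {
        Map<String, String> queues = new LinkedHashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String currentRequest = null;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    String tag = r.getLocalName();
                    if (tag.equals("Request")) {
                        // null namespace URI: match the attribute by local name only
                        currentRequest = r.getAttributeValue(null, "name");
                    } else if (tag.equals("requestqueue")) {
                        queues.put(currentRequest, r.getElementText());
                    }
                }
            }
        } finally {
            r.close();
        }
        return queues;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\"><requestqueue>emailrequest</requestqueue></Request>"
                + "<Request name=\"CleanEmail\"><requestqueue>Cleanrequest</requestqueue></Request>"
                + "</config>";
        System.out.println(read(xml)); // {ValidateEmailRequest=emailrequest, CleanEmail=Cleanrequest}
    }
}
```

Unlike SAX, the application drives the loop, so the state (currentRequest) can live in a local variable instead of handler fields.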
You can write a class that extends org.xml.sax.helpers.DefaultHandler and, for each element, calls
start_<tag_name>(uri, attrs);
and
end_<tag_name>();
For your XML that would be, for example:
start_requestqueue(uri, attrs);
etc.
Then extend that class to implement whichever configuration-file parsers you need. Example:
...
public void startElement(String uri, String name, String qname,
                         org.xml.sax.Attributes attrs)
        throws org.xml.sax.SAXException {
    // handler signature: start_<tag>(String uri, Attributes attrs)
    Class<?>[] args = { String.class, org.xml.sax.Attributes.class };
    // name (the local name) is empty unless the parser is namespace-aware
    String tag = (name == null || name.isEmpty()) ? qname : name;
    String mname = tag.replace("-", "");
    try {
        java.lang.reflect.Method m =
            getClass().getDeclaredMethod("start_" + mname, args);
        m.invoke(this, uri, attrs);
    }
    catch (NoSuchMethodException e) {
        // no handler declared for this tag: ignore it
    }
    catch (IllegalAccessException e) {
        throw new RuntimeException(e);
    }
    catch (java.lang.reflect.InvocationTargetException e) {
        Throwable cause = e.getTargetException();
        org.xml.sax.SAXException se =
            new org.xml.sax.SAXException(cause.toString());
        se.setStackTrace(cause.getStackTrace());
        throw se;
    }
}
And in a particular configuration parser (a subclass):
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
    // read the attribute by name rather than by position
    System.err.println("Request, name=" + attrs.getValue("name"));
}
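A self-contained version of this reflective-dispatch idea might look like the following sketch. The class names are hypothetical; handlers are only looked up on the concrete subclass (getDeclaredMethod), and tags without a start_<tag> method are simply ignored.

```java
import java.io.StringReader;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReflectiveDispatchDemo {

    // Base handler: calls start_<tag>(String uri, Attributes attrs) if the subclass declares it.
    static class DispatchingHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs)
                throws SAXException {
            // localName is empty when the parser is not namespace-aware, so fall back to qName
            String tag = (localName == null || localName.isEmpty()) ? qName : localName;
            Method m;
            try {
                m = getClass().getDeclaredMethod("start_" + tag.replace("-", ""),
                        String.class, Attributes.class);
            } catch (NoSuchMethodException e) {
                return; // no handler for this tag
            }
            try {
                m.invoke(this, uri, attrs);
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            } catch (InvocationTargetException e) {
                // report the exception thrown by the handler method itself
                throw new SAXException(e.getCause().toString());
            }
        }
    }

    // Configuration parser for the question's XML: only <Request> is handled.
    static class ConfigHandler extends DispatchingHandler {
        final List<String> requestNames = new ArrayList<>();

        public void start_Request(String uri, Attributes attrs) {
            requestNames.add(attrs.getValue("name"));
        }
    }

    public static List<String> requestNames(String xml) throws Exception {
        ConfigHandler handler = new ConfigHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.requestNames;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><Request name=\"ValidateEmailRequest\"/>"
                + "<Request name=\"CleanEmail\"/></config>";
        System.out.println(requestNames(xml)); // [ValidateEmailRequest, CleanEmail]
    }
}
```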
Since you are using this for configuration, your best bet is Apache Commons Configuration. For simple files it's much easier to use than "raw" XML parsers.
See its XML how-to.
I have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application xmlns="http://research.sun.com/wadl/2006/10">
<doc xmlns:jersey="http://jersey.dev.java.net/"
jersey:generatedBy="Jersey: 1.0.2 02/11/2009 07:45 PM"/>
<resources base="http://localhost:8080/stock/">
<resource path="categories"> (<<---I want to get here)
<method id="getCategoriesResource" name="GET">
I want to get the value of resource/@path, so I have the following Java code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
// get the xml to parse from URI
Document doc = builder.parse(serviceUri + "application.wadl");
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression expression =
    xpath.compile("/application/resources/resource/@path");
this.baseUri = (String) expression.evaluate(doc, XPathConstants.STRING);
With this XPath expression the result (baseUri) is always the empty string ("").
The nodes are not in the empty namespace, so you must specify it: /wadl:application/wadl:resources/wadl:resource/@path. You also have to register the namespace in the XPath engine's namespace context.
Here is a working example:
xpath.setNamespaceContext(new NamespaceContext()
{
    @Override
    public String getNamespaceURI(final String prefix)
    {
        if(prefix.equals("wadl"))
            return "http://research.sun.com/wadl/2006/10";
        else
            return null;
    }
    @Override
    public String getPrefix(final String namespaceURI)
    {
        throw new UnsupportedOperationException();
    }
    @Override
    public Iterator getPrefixes(final String namespaceURI)
    {
        throw new UnsupportedOperationException();
    }
});
XPathExpression expression = xpath.compile("/wadl:application/wadl:resources/wadl:resource/@path");
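A self-contained sketch of the fix, parsing a trimmed-down copy of the WADL from the question. The class name and the sample document are assumptions. The prefix wadl is arbitrary; only the namespace URI has to match the document.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WadlPathReader {

    static final String WADL_NS = "http://research.sun.com/wadl/2006/10";

    public static String firstResourcePath(String wadlXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // prefixed XPath steps only match with this on
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wadlXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                // map our chosen prefix to the WADL namespace; anything else is unbound
                return "wadl".equals(prefix) ? WADL_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                throw new UnsupportedOperationException();
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                throw new UnsupportedOperationException();
            }
        });
        // @path itself is unprefixed: unprefixed attributes are in no namespace
        return (String) xpath.evaluate(
                "/wadl:application/wadl:resources/wadl:resource/@path", doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String wadl = "<application xmlns=\"" + WADL_NS + "\">"
                + "<resources base=\"http://localhost:8080/stock/\">"
                + "<resource path=\"categories\"/>"
                + "</resources></application>";
        System.out.println(firstResourcePath(wadl)); // categories
    }
}
```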
I am getting a null node when I try to parse an XML file:
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = null;
try {
node = (Node) xPath.evaluate(
"/mynode",
doc,
XPathConstants.NODE);
I am facing this issue only when:
1. DocumentBuilderFactory.setNamespaceAware is true
2. DocumentBuilderFactory.setValidating is true
If these are set to false, I get correct results. Can anyone help me understand why setting these attributes to false makes a difference?
(I have checked this question, but it does not clear my doubt)
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<mynode xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.example.com" xsi:schemaLocation="http://www.example.com example.xsd">
<name>TargetName</name>
<desc>desc goes here</desc>
<pack>my.this</pack>
<object>my.ExampleObject</object>
<properties>
<attrib>
<name>id</name>
<value>ZZZ</value>
</attrib>
<attrib>
<name>ind</name>
<value>X</value>
</attrib>
</properties>
<children>
<child>
<name>childnodename</name>
<desc>description goes here</desc>
<invalues>
<scope>ALL</scope>
</invalues>
<outvalues>
<scope>ALL</scope>
</outvalues>
<akey>
<aname>AAA</aname>
<key></key>
</akey>
<msg>
<success>code1</success>
<failure>code2</failure>
</msg>
</child>
</children>
</mynode>
The quickest fix is to not call setNamespaceAware(true); :-) However, if you want namespace-aware XPath, you have stumbled across a classic problem (see "XPath: Is there a way to set a default namespace for queries?"): XPath does not support the concept of a default namespace.
So your XPath must use a namespace prefix in order for the query to find any nodes. However, you can set a NamespaceContext on the XPath instance to resolve the namespace prefix or default namespace to a URI. One way to do this, for example:
import java.util.*;
import java.io.ByteArrayInputStream;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class XmlParse {
public static void main(String[] args) throws Exception {
String xml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<mynode xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://www.example.com\" xsi:schemaLocation=\"http://www.example.com example.xsd\">" +
"<name>TargetName</name>" +
"</mynode>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes()));
final String nonameNamespace = doc.getFirstChild().getNamespaceURI();
NamespaceContext ctx = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
String uri = null;
if (prefix.equals("n")) {
uri = nonameNamespace;
}
return uri;
}
@Override
public Iterator getPrefixes(String val) {
throw new UnsupportedOperationException("Not implemented!");
}
@Override
public String getPrefix(String uri) {
throw new UnsupportedOperationException("Not implemented!");
}
};
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(ctx);
Node node = null;
try {
node = (Node) xPath.evaluate("/n:mynode/n:name", doc, XPathConstants.NODE);
System.out.println(node.getNodeName());
System.out.println(node.getFirstChild().getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
So this resolves the prefix n to the document's default namespace (http://www.example.com, read from the root element), which lets /n:mynode/n:name match.
With namespace-aware parsing, each XML element and attribute has an associated namespace URI (or none). An unprefixed element belongs to the default namespace declared with xmlns="..." in scope, or to no namespace if none is declared; unprefixed attributes are never in a namespace.
In your case, the XML document declares a default namespace (xmlns="http://www.example.com"), but an unprefixed XPath step such as /mynode only matches elements in no namespace, so you get no result back. Use the proper namespace in the query and it will work.
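To see the effect, this sketch (hypothetical class name, sample documents trimmed from the question) parses with namespace awareness and checks both the root element's namespace URI and whether the unprefixed XPath /mynode matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    static Document parseNamespaceAware(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Namespace URI of the root element (null if it is in no namespace).
    public static String rootNamespace(String xml) throws Exception {
        return parseNamespaceAware(xml).getDocumentElement().getNamespaceURI();
    }

    // Whether the unprefixed XPath "/mynode" matches the root element.
    public static boolean unprefixedQueryMatches(String xml) throws Exception {
        Object node = XPathFactory.newInstance().newXPath()
                .evaluate("/mynode", parseNamespaceAware(xml), XPathConstants.NODE);
        return node != null;
    }

    public static void main(String[] args) throws Exception {
        String withDefaultNs = "<mynode xmlns=\"http://www.example.com\"><name>TargetName</name></mynode>";
        String noNs = "<mynode><name>TargetName</name></mynode>";
        System.out.println(rootNamespace(withDefaultNs));          // http://www.example.com
        System.out.println(unprefixedQueryMatches(withDefaultNs)); // false
        System.out.println(rootNamespace(noNs));                   // null
        System.out.println(unprefixedQueryMatches(noNs));          // true
    }
}
```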
I am relying on the default JAXP implementation and using the Oracle JRE.
When evaluating an XPath that contains an unknown namespace prefix, it does not throw the (expected) exception.
When I run the same application on an IBM JRE, everything is fine and it throws the expected exception javax.xml.xpath.XPathExpressionException: org.apache.xpath.domapi.XPathStylesheetDOM3Exception: Prefix must resolve to a namespace
I am using the following code, which uses an undeclared namespace prefix (unknowns):
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setNamespaceAware(true);
documentBuilderFactory.setValidating(true);
documentBuilderFactory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
Document doc = builder.parse(xmlFile_);
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("path/to/node/unknowns:#bla", doc,
XPathConstants.NODESET);
Question:
How can I enforce this validation independently from the JAXP implementation?
Try setting a NamespaceContext that resolves no prefixes on your XPath instance, for example with xpath.setNamespaceContext(NSValidator.noNamespaces()):
public final class NSValidator {
private NSValidator() {
}
private static final NamespaceContext INSTANCE = new NamespaceContext() {
@Override public String getNamespaceURI(String prefix) {
return null;
}
@Override public String getPrefix(String namespaceURI) {
return null;
}
@Override public Iterator<String> getPrefixes(String namespaceURI) {
return Collections.<String>emptyList()
.iterator();
}
};
public static NamespaceContext noNamespaces() {
return INSTANCE;
}
}