How to get xml page by url - java

Ok, so I got some url link like https://stackoverflow.com/ and I'm trying to parse it in document but getting error. Why? Because this is not xml file, so the question is how can I get data as xml if i got only url?
My code:
public class URLReader {
public static void main(String[] args) throws Exception {
// or if you prefer DOM:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL("https://stackoverflow.com/").openStream());
int nodes = doc.getChildNodes().getLength();
System.out.println(nodes + " nodes found");
}
}

To parse HTML you may use JSOUP: https://jsoup.org/
This library provides also some features to transform HTML to XHTML, which some sort of XML:
Document document = Jsoup.parse(html);
document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().escapeMode(org.jsoup.nodes.Entities.EscapeMode.xhtml);
String xhtml=document.html();

Related

XPath expression unable to match

I'm trying to parse out some information from XML using XPath in Java (v 1.7). My XML looks like this:
<soap:Fault xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>code</faultcode>
<faultstring>string</faultstring>
<detail>detail</detail>
</soap:Fault>
My code:
final InputSource inputSource = new InputSource(new StringReader(xmlContent));
final DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
final Document document = documentBuilder.parse(inputSource);
final XPath xPath = XPathFactory.newInstance().newXPath();
final String faultCode = xPath.compile("/soap:Fault/faultcode/text()[1]").evaluate(document);
I have tried the XPath expression in an online checker with the XML content and it suggests that a match is made. However, when I run it in a wee stand-alone program, I get no value in "faultCode".
This issue is probably something simple, but I am unable to identify what the problem is.
Thanks for any assistance.
You should bind the namespace prefix "soap" to the URI "http://schemas.xmlsoap.org/soap/envelope/" using the XPath.setNamespaceContext() method.
First you need a namespace aware document builder factory:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Then you need a namespace context:
NamespaceContext nsContext = new NamespaceContext() {
#Override
public Iterator getPrefixes(String namespaceURI) {
return null;
}
#Override
public String getPrefix(String namespaceURI) {
return "soap";
}
#Override
public String getNamespaceURI(String prefix) {
return "http://schemas.xmlsoap.org/soap/envelope/";
}
};
xPath.setNamespaceContext(nsContext);
With these additions, your code should work.
Regarding namespace contexts, I suggest you read http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html.
It may be because of the namespaces effect. You can try namespace independent tags by matching with the local-name giving the syntax below:
/*[local-name()='Fault' and namespace-uri()='http://schemas.xmlsoap.org/soap/envelope/']/*[local-name()='faultcode']/text()[1]

Java DOM parser returns null document

I have an HTML template which I want to read in:
<html>
<head>
<title>TEST</title>
</head>
<body>
<h1 id="hey">Hello, World!</h1>
</body>
</html>
I want find the tag with the id hey and then paste in new stuff (e.g. new tags). For this purpose I use the DOM parser. But my code returns me null:
public static void main(String[] args) {
try {
File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
System.out.println(doc.getElementById("hey")); // returns null
} catch (Exception e) {
e.printStackTrace();
}
}
What am I doing wrong?
You are trying to parse a piece of XML with the Java XML API, that is very compliant with the XML specification and doesn't help the casual developer.
In XML an attribute named id is not automatically of ID type, and thus the XML implementation doesn't get it with .getElementById(). Either you use another library (Jsoup for example), or instruct the parser to treat id as an ID (via the DTD) or you use custom code.
I modified your example to using jsoup
public static void main(String[] args) {
try {
File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
Document doc = Jsoup.parse(file, "UTF8");
Element elementById = doc.getElementById("hey");
System.out.println("hey ="+doc.getElementById("hey").ownText());
System.out.println("hey ="+doc.getElementById("hey"));
} catch (Exception e) {
e.printStackTrace();
}
}

Parsing dom error in java

I am trying to parse an XML and then insert it an Excel File.
If I run my code it works even with errors but I cannot make any modification to it because I still got errors. Here is my code:
public class Parsing {
private void parseXmlFile(){
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//using Factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation
dom = db.parse("Employee.xml"); }
} catch )
}
}
What is wrong with this?
Can someone help me? I've been searching all over google and it's eating my nerves.
it should be like this :-
private void parseXmlFile(){
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//using Factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation
Document dom = db.parse("Employee.xml");
} catch(IOException ex ){ // OR Any Specific Exception should be catched here
// your error handling code here
}
}
Also Employee.xml should be in the current directory or give complete abosulte path of Employee.xml file also.

Extracting XML Elements from Java Object [duplicate]

I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, Then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
The document.getDocumentElement() returns you the node that is the document element of the document (in your case <config>).
Once you have a rootElement, you can access the element's attribute (by calling rootElement.getAttribute() method), etc. For more methods on java's org.w3c.dom.Element
More info on java DocumentBuilder & DocumentBuilderFactory. Bear in mind, the example provided creates a XML DOM tree so if you have a huge XML data, the tree can be huge.
Related question.
Update Here's an example to get "value" of element <requestqueue>
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can effectively call it as,
String requestQueueName = getString("requestqueue", element);
In case you just need one (first) value to retrieve from xml:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
In case you want to parse whole xml document use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = (Node) xPath.evaluate("/Request/#name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
There are a number of different ways to do this. You might want to check out XStream or JAXB. There are tutorials and the examples.
If the XML is well formed then you can convert it to Document. By using the XPath you can get the XML Elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Form XML-String Create Document and find the elements using its XML-Path.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get value in between two strings.
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
OutPut:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
following links might help
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You will either create a Domain Object Model of that XML file, take a look at this
and the second choice is using event driven parsing, which is an alternative to DOM xml representation. Imho you can find the best overall comparison of these two basic techniques here. Of course there are much more to know about processing xml, for instance if you are given XML schema definition (XSD), you could use JAXB.
There are various APIs available to read/write XML files through Java.
I would refer using StaX
Also This can be useful - Java XML APIs
You can make a class which extends org.xml.sax.helpers.DefaultHandler and call
start_<tag_name>(Attributes attrs);
and
end_<tag_name>();
For it is:
start_request_queue(attrs);
etc.
And then extends that class and implement xml configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException {
Class[] args = new Class[2];
args[0] = uri.getClass();
args[1] = org.xml.sax.Attributes.class;
try {
String mname = name.replace("-", "");
java.lang.reflect.Method m =
getClass().getDeclaredMethod("start" + mname, args);
m.invoke(this, new Object[] { uri, (org.xml.sax.Attributes)attrs });
}
catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
catch (NoSuchMethodException e) {
throw new RuntimeException(e); }
catch (java.lang.reflect.InvocationTargetException e) {
org.xml.sax.SAXException se =
new org.xml.sax.SAXException(e.getTargetException());
se.setStackTrace(e.getTargetException().getStackTrace());
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name="+ attrs.getValue(0);
}
Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to

How to get the root element in xml using Android?

Using only code Java I can get the root name with these lines.
Element root = document.getDocumentElement();
and get the name with root.getNodeName()
But in a Android enviroment, how can I get for example, the name 'aluno' as root name?
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<ns1:autenticaAlunoResponse xmlns:ns1="http://xfire.codehaus.org/AlunoService">
<aluno xmlns="urn:bean.wsi.br">
<matricula xmlns="http://bean.wsi.br">61203475</matricula>
<turma xmlns="http://bean.wsi.br"><codigo>2547</codigo>
<nome>B</nome>
</turma>
</aluno>
</ns1:autenticaAlunoResponse>
</soap:Body>
</soap:Envelope>
Update (copied from a comment below):
I'm using Ksoap2 and trying to parse using SAX.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
Document doc = db.parse(is);
In your example 'aluno' appears as a tag name. If you work with Jsoup you can find the element by tag and then use tagName method to retrieve its name:
Document doc;
Elements tagName;
String name;
try {
doc = Jsoup.connect(url).userAgent("Mozilla").get();
} catch (IOException ioe) {
ioe.printStackTrace();
}
doc.select("aluno");
name = tagName.tagName();

Categories

Resources