Problem when parsing xml string in java

I'm writing an Android application, and I would like to fetch an XML string from the web and extract all the information it contains.
First of all, I get the string (this code works):
URL url = new URL("here my address");
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
String myData = reader.readLine();
reader.close();
Then, I use DOM:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(myData));
Still no problem. When I write
Document doc = db.parse(is);
the application doesn't do anything more. It stops, without errors.
Can someone please tell me what's going on?

I can't say why your code doesn't work since there is no error, but I can offer alternatives.
First, I am pretty sure your new InputSource "is" is unnecessary. "parse()" can take "url.openStream()" directly as an argument. Note that it cannot take "myData" directly: "parse(String)" treats its argument as a URI, not as XML content, so a string of XML still needs the StringReader wrapper.
Another cause of error could be that your XML data spans more than one line (I know you said that the first part of your code worked, but I'd rather mention it, just to be sure). If so, "reader.readLine()" will only get you the first line of your XML data.
I hope this helps.
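The two suggestions above could be sketched roughly as follows. This is a minimal sketch, not the asker's code: the class name XmlFetch and both method names are hypothetical, the in-memory byte array stands in for url.openStream(), and UTF-8 is an assumed encoding for the string variant.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlFetch {  // hypothetical class name

    // Option 1: hand the stream straight to the parser, with no intermediate String.
    public static Document parseStream(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(in);
    }

    // Option 2: if a String is needed, read every line, not just the first one.
    // UTF-8 is an assumption here; use the charset the server actually sends.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for url.openStream(): a two-line document held in memory.
        byte[] data = "<root>\n<item>42</item></root>".getBytes(StandardCharsets.UTF_8);

        Document doc = parseStream(new ByteArrayInputStream(data));
        System.out.println(doc.getDocumentElement().getNodeName());

        String xml = readAll(new ByteArrayInputStream(data));
        System.out.println(xml.contains("</root>"));
    }
}
```

With a single readLine() call, the second line of this sample (including the closing root tag) would be lost, and the parser would be handed an incomplete document.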

Use SAXParser instead of the DOM parser. SAXParser streams through the document and reports events instead of building the whole tree in memory, so it generally needs less memory than DOM. Here are two good tutorials on SAXParser:
1. http://www.androidpeople.com/android-xml-parsing-tutorial-using-saxparser
2. http://www.anddev.org/parsing_xml_from_the_net_-_using_the_saxparser-t353.html
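For readers who cannot reach the tutorials, here is a minimal sketch of the SAX approach using the standard javax.xml.parsers API. The class name SaxSketch, the method name itemTexts, and the tag name "item" are assumptions made for illustration; they do not come from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {  // hypothetical class name

    // Collects the text of every <item> element ("item" is an assumed tag name).
    public static List<String> itemTexts(String xml) throws Exception {
        final List<String> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text;  // non-null only while inside an <item>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName)) {
                    items.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return items;
    }

    public static void main(String[] args) throws Exception {
        // prints [a, b]
        System.out.println(itemTexts("<list><item>a</item><item>b</item></list>"));
    }
}
```

The handler keeps only the data it is interested in, which is where the memory saving over DOM comes from.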

Use XmlPullParser, it's very fast. Pass in the string from the web and get a hashtable with all the values.
public Hashtable<String, String> parse(String myData) {
    XmlPullParser parser = Xml.newPullParser();
    Hashtable<String, String> responseFromServer = new Hashtable<String, String>();
    try {
        parser.setInput(new StringReader(myData));
        String currentName = null;
        int eventType = parser.getEventType();
        while (eventType != XmlPullParser.END_DOCUMENT) {
            if (eventType == XmlPullParser.START_TAG) {
                // Remember the innermost open element. Calling nextText() here
                // instead would throw on any element that has child elements.
                currentName = parser.getName();
            } else if (eventType == XmlPullParser.TEXT && currentName != null) {
                String currentText = parser.getText();
                if (currentText.trim().length() > 0) {
                    responseFromServer.put(currentName, currentText);
                }
            } else if (eventType == XmlPullParser.END_TAG) {
                currentName = null;
            }
            eventType = parser.next();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return responseFromServer;
}

Related

JAVA how to find and delete the structure of sentences?

I have an XML file, and its structure is like this.
<?xml version="1.0" encoding="MS949"?>
<pmd-cpd>
    <duplication lines="123" tokens="123">
        <file line="1" path=".."/>
        <file line="1" path=".."/>
        <codefragment><![CDATA[........]]></codefragment>
    </duplication>
    <duplication>
    ...
    </duplication>
</pmd-cpd>
I want to delete the 'codefragment' node, because my parser fails with the error 'invalid XML character (0x1)'.
My parsing code is like this:
private void parseXML(File f){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
Document document = null;
try {
builder = factory.newDocumentBuilder();
document = builder.parse(f);
}catch(...)
The error happens in document = builder.parse(f); so I cannot use the parser to delete the codefragment node.
This is why I want to delete these lines without the parser.
How can I delete this node without the parser?

This is a followup answer to OP's self-answer, and the comment I made to that answer. Here's the recap, plus some extra:
Never do String += String in a loop. Use StringBuilder.
Read the XML in blocks, not lines.
Don't use String.replaceAll(). It has to recompile the regex every time, a regex you already have. Use Matcher.replaceAll().
Remember to close() the Reader. Better yet, use try-with-resources.
No need to save the clean XML back out, just use it directly.
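Read the file with the right charset. XML is usually in UTF-8, so the code below reads the file as UTF-8; the sample file in the question declares MS949, so that charset would be needed for that file.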
Don't print and ignore errors. Let caller handle errors.
private static void parseXML(File f) throws IOException, ParserConfigurationException, SAXException {
    StringBuilder xml = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(f),
                                                                      StandardCharsets.UTF_8))) {
        Pattern badChars = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]+");
        char[] cbuf = new char[1024];
        for (int len; (len = in.read(cbuf)) != -1; )
            xml.append(badChars.matcher(CharBuffer.wrap(cbuf, 0, len)).replaceAll(""));
    }
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
    Document document = domBuilder.parse(new InputSource(new StringReader(xml.toString())));
    // insert code using DOM here
}
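To try the filtering step in isolation, a small self-contained sketch follows. It reuses the character class from the code above; the class name XmlCleaner, the method name stripInvalidXmlChars, and the sample input are hypothetical. One limitation worth knowing: this character class also removes characters above U+FFFF, which XML 1.0 does allow.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlCleaner {  // hypothetical class name

    // Compiled once and reused, instead of String.replaceAll() recompiling it per call.
    private static final Pattern BAD_CHARS =
            Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]+");

    // Removes every character outside the ranges listed in the pattern.
    public static String stripInvalidXmlChars(CharSequence s) {
        return BAD_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) throws Exception {
        // A CDATA section containing the control character 0x1 from the question.
        String dirty = "<a><![CDATA[x\u0001y]]></a>";
        String clean = stripInvalidXmlChars(dirty);

        // The cleaned string now parses without the "invalid XML character" error.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(clean)));
        System.out.println(doc.getDocumentElement().getTextContent());  // prints xy
    }
}
```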

I solved this problem by removing the bad characters such as 0x01, saving the result as a new XML file, and then parsing the new file.
Because I could not even parse my old XML file, I could not remove the node with the parser.
The code for removing the invalid characters and saving a new file looks like this.
// parse the XML string into a DOM Document
public static Document stringToDom(String xmlSource)
throws SAXException, ParserConfigurationException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(new StringReader(xmlSource)));
}
//get the file and remove bad characters in it
private static void cleanString(File fileName) {
try {
BufferedReader in = new BufferedReader(new FileReader(fileName));
String xmlLines, cleanXMLString="";
Pattern p = null;
Matcher m = null;
p = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]");
while (((xmlLines = in.readLine()) != null)){
m = p.matcher(xmlLines);
if (m.find()){
cleanXMLString = cleanXMLString + xmlLines.replaceAll("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "")+"\n";
}else
cleanXMLString = cleanXMLString + xmlLines+"\n";
}
Document doc = stringToDom(cleanXMLString);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("\\new\\"+fileName.getName()));
transformer.transform(source, result);
} catch (IOException | SAXException | ParserConfigurationException | TransformerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Maybe that's not a good method, since it takes quite a long time even for a small file (under 5 MB).
But if your file is small, you can try this.

Converting XML to document in java creates null document

I'm trying to parse xml, downloaded from the web, in java, following examples from here (stackoverflow) and other sources.
First I pack the xml in a string:
String xml = getXML(url, logger);
If I printout the xml string at this point:
System.out.println("XML " + xml);
I get a printout of the xml so I'm assuming there is no fault up to this point.
Then I try to create a document that I can evaluate:
InputSource is= new InputSource(new StringReader(xml));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
If I print out the document here:
System.out.println("Doc: " + doc);
I get:
Doc: [#document: null]
When I later try to evaluate expressions with XPath I get a java.lang.NullPointerException, and the same happens when just trying to get the length of the root:
System.out.println("Root length " + rootNode.getLength());
which leads me to believe the document (and later the node) is truly null.
When I try to print out the InputSource or the Node I get, e.g.,
Input Source: org.xml.sax.InputSource@29453f44
which I don't know how to interpret.
Can any one see what I've done wrong or suggest a way forward?
Thanks in advance.

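The output [#document: null] does not mean the document is null: toString() on a DOM node prints the node name and the node value, and the value of a document node is always null. To see the content, you need another way to render the document as a string.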
For JDOM:
public static String toString(final Document document) {
try {
final ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
final XMLOutputter outp = new XMLOutputter();
outp.output(document, out);
final String string = out.toString("UTF-8");
return string;
}
catch (final Exception e) {
throw new IllegalStateException("Cannot stringify document.", e);
}
}
The output
org.xml.sax.InputSource@29453f44
is simply the class name plus the hash code of the instance (as defined in the Object class). It indicates that the class does not override toString().
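The question uses the W3C DOM (org.w3c.dom.Document) rather than JDOM, so here is a comparable sketch for that API, using the JDK's javax.xml.transform classes. The class name DomToString, the method name toXmlString, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomToString {  // hypothetical class name

    // Serializes a W3C DOM document back to markup with an identity transform.
    public static String toXmlString(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Cannot stringify document.", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root><a>1</a></root>")));

        System.out.println("Doc: " + doc);       // node name and node value only
        System.out.println(toXmlString(doc));    // the actual markup
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the serialized output shows the expected markup, the document was parsed correctly and the NullPointerException most likely comes from a later step, such as the XPath expression or a namespace mismatch.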

RSS reader vs BOM error

I'm trying to read an RSS feed/XML file into my application. The problem is that there's a BOM (Byte Order Mark) that my input stream doesn't like, and it throws an error which throws another error and everything dies.
Here's the method:
private Document getDomFromXMLString(String xml) {
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
doc = db.parse(is);
} catch (Exception e) {
e.printStackTrace();
}
return doc;
}
So I'm trying to figure out how to effectively skip the BOM and read in the rest of the file.

If you have a character stream (and a String is one), then skipping the BOM is as easy as stripping the first character when it is the BOM:
if (xml.startsWith("\uFEFF"))
    xml = xml.substring(1);
What you should really do, though, is ask the source to fix its feed; the BOM shouldn't be there in the first place.
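Put together with the method from the question, the BOM check could look like the sketch below. The class name BomSketch, the helper name stripBom, and the sample feed are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BomSketch {  // hypothetical class name

    // Drops a leading U+FEFF if present; also safe for empty strings.
    public static String stripBom(String xml) {
        return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
    }

    public static void main(String[] args) throws Exception {
        // A feed whose text starts with a BOM character.
        String xml = "\uFEFF<rss><channel><title>t</title></channel></rss>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBom(xml))));
        System.out.println(doc.getDocumentElement().getNodeName());  // prints rss
    }
}
```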

From URL to Document object

I would like to transform a feed to a Document object.
I tried the following code, but it doesn't seem to work with a real feed (uri = null), although it works with an XML file that is already on my computer.
The transform function:
public static Document obtainDocument(String feedurl) {
Document doc = null;
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL url = new URL(feedurl);
doc = builder.parse(url.openStream());
...Exceptions...
return doc;
}
EDIT
I'm pretty sure that the URL is right, I use:
String feedurl = "http://feeds2.feedburner.com/Pressecitron";
I tried to use the following code too:
public static Document obtainDocument(String feedurl) {
Document doc = null;
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL url = new URL(feedurl);
URLConnection conn = url.openConnection();
doc = builder.parse(conn.getInputStream());
...
return doc;
}
which doesn't seem to work any better.
My first version of the parser used a String too, but my teammate wants me to use a Document (if the connection doesn't work). It worked with the String, if I remember correctly.

Have you tried all the possible ways of using the parse() method?
Are you sure the URI/URL is correct?
Your method receives the feed URL as a String. DocumentBuilder has a parse(String uri) overload, so you can pass the String directly to parse() and see if that works.
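A minimal sketch of that suggestion, under stated assumptions: the class name ParseByUri is hypothetical, and a temporary local file with a made-up feed body stands in for the real feed URL so the example runs without network access.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseByUri {  // hypothetical class name

    // parse(String) interprets its argument as a URI and fetches the content itself.
    public static Document obtainDocument(String uri) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(uri);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the feed: a temporary file addressed by a file: URI.
        File tmp = File.createTempFile("feed", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<rss><channel/></rss>".getBytes(StandardCharsets.UTF_8));

        Document doc = obtainDocument(tmp.toURI().toString());
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getDocumentURI());
    }
}
```

With a real feed, the String passed in would be the http URL from the question instead of the file: URI.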

How to create a XML object from String in Java?

I am trying to write code that creates an XML object. For example, I will give a string as input to a function and it will return an XMLObject.
XMLObject convertToXML(String s) {}
When I searched the net, I mostly saw examples about creating XML documents: creating the XML, writing it out, and creating the file. But I have done something like this:
Document document = new Document();
Element child = new Element("snmp");
child.addContent(new Element("snmpType").setText("snmpget"));
child.addContent(new Element("IpAdress").setText("127.0.0.1"));
child.addContent(new Element("OID").setText("1.3.6.1.2.1.1.3.0"));
document.setContent(child);
Do you think this is enough to create an XML object? Also, can you please help me get data out of the XML? For example, how can I get the IpAdress from that XML?
Thank you all a lot.
EDIT 1: Actually, now I think it might be much easier to have a file like base.xml, into which I write all the basic things, for example:
<snmp>
<snmpType></snmpType>
<OID></OID>
</snmp>
and then use this file to create an XML object. What do you think about that?

If you can create an XML string, you can easily transform it into an XML Document object, e.g.:
String xmlString = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a><b></b><c></c></a>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xmlString)));
} catch (Exception e) {
e.printStackTrace();
}
You can then use the Document object with an XML parsing library or XPath to get the IP address back.
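As a sketch of the XPath route, using the JDK's javax.xml.xpath API on a W3C DOM document (not the JDOM classes from the question): the class name SnmpXml and the method name ipAddress are hypothetical, and the element names are taken from the question's snippet, including its IpAdress spelling.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SnmpXml {  // hypothetical class name

    // Parses the string and returns the text of /snmp/IpAdress ("" if absent).
    public static String ipAddress(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/snmp/IpAdress", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<snmp>"
                + "<snmpType>snmpget</snmpType>"
                + "<IpAdress>127.0.0.1</IpAdress>"
                + "<OID>1.3.6.1.2.1.1.3.0</OID>"
                + "</snmp>";
        System.out.println(ipAddress(xml));  // prints 127.0.0.1
    }
}
```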

Try something like:
public static Document loadXML(String xml) throws Exception
{
DocumentBuilderFactory fctr = DocumentBuilderFactory.newInstance();
DocumentBuilder bldr = fctr.newDocumentBuilder();
InputSource insrc = new InputSource(new StringReader(xml));
return bldr.parse(insrc);
}
