Hello, I am getting back a string from a web service.
I need to parse this string and get the text of the error message.
My string looks like this:
<response>
<returnCode>-2</returnCode>
<error>
<errorCode>100</errorCode>
<errorMessage>ERROR HERE!!!</errorMessage>
</error>
</response>
Is it better to just parse the string directly, or to convert it to XML and then parse it?
I'd use Java's XML document libraries. It's a bit of a mess, but works.
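import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

String xml = "<response>\n" +
        "<returnCode>-2</returnCode>\n" +
        "<error>\n" +
        "<errorCode>100</errorCode>\n" +
        "<errorMessage>ERROR HERE!!!</errorMessage>\n" +
        "</error>\n" +
        "</response>";
// Parse the string into a DOM document
Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
NodeList errNodes = doc.getElementsByTagName("error");
if (errNodes.getLength() > 0) {
    // An <error> element is present: print its message
    Element err = (Element) errNodes.item(0);
    System.out.println(err.getElementsByTagName("errorMessage")
            .item(0)
            .getTextContent());
} else {
    // success
}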
I would probably use an XML parser to load it into a DOM, then get the text. This has the advantage of being robust and coping with unusual situations, such as a line like this where something has been commented out:
<!-- commented out <errorMessage>ERROR HERE!!!</errorMessage> -->
If you try to parse it yourself, you might fall foul of things like this. A parser also has the advantage that if the requirements expand, it's really easy to change your code.
http://docs.oracle.com/cd/B28359_01/appdev.111/b28394/adx_j_parser.htm
It's an XML document. Use an XML parser.
You could tease it apart using string operations. But you have to worry about entity decoding, character encodings, CDATA sections etc. An XML parser will do all of this for you.
Check out JDOM for a simpler XML parsing approach than using raw DOM/SAX implementations.
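The pitfalls named in these answers (commented-out markup, entities, CDATA sections) can be illustrated with a minimal sketch. It uses the JDK's built-in DOM parser rather than JDOM so that it has no dependencies. The class name, the helper firstText, the <detail> element, and the sample message text are made up for illustration and are not part of the original web service response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParserPitfallsExample {
    // Hypothetical helper: text of the first element named `tag`, or null if there is none.
    static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input (an assumption): a commented-out element, an entity, and a CDATA section.
        String xml = "<response><error>"
                + "<!-- commented out <errorMessage>OLD MESSAGE</errorMessage> -->"
                + "<errorMessage>Tom &amp; Jerry</errorMessage>"
                + "<detail><![CDATA[a < b && c]]></detail>"
                + "</error></response>";
        // The comment is skipped and the entity is decoded.
        System.out.println(firstText(xml, "errorMessage")); // Tom & Jerry
        // The CDATA content comes back as plain text.
        System.out.println(firstText(xml, "detail"));       // a < b && c
    }
}
```

A naive indexOf("<errorMessage>") on the same string would land inside the comment and return the old message, which is the kind of failure the answers warn about.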
I have an XML file which I need to parse using XMLInputFactory (javax.xml.stream).
The XML looks like this:
<SACL>
<Criteria>Dinner</Criteria>
<Value> Rice &amp; Beverage </Value>
</SACL>
I am parsing this with an XMLEventReader in Java, and my code is:
if (xmlEvent.asStartElement().getName().getLocalPart().equals("Value")) {
    xmlEvent = xmlEventReader.nextEvent();
    value = xmlEvent.asCharacters().getData().trim(); // Issue is in this if block only
}
// using javax.xml.stream.XMLEventReader
xmlEventReader = XMLInputFactory.newInstance().createXMLEventReader(new FileInputStream(file.getPath()));
But it only parses the data as "Rice" (the "& Beverage" part is missing).
Expected output: Rice & Beverage
Can someone suggest what the issue with "&amp;" is and how it can be fixed?
I've worked on a project that did XML parsing recently, so I know almost exactly what's happening here: the parser sees &amp; as a separate event (XMLStreamConstants.ENTITY_REFERENCE).
Try setting property XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES to true in your XML parser's options. If the parser is properly implemented, the entity is replaced and made part of the text.
Keep in mind that the parser is allowed to split it into multiple characters events, especially if you have large pieces of text. Setting property XMLInputFactory.IS_COALESCING to true should prevent that.
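A minimal sketch of that configuration follows. It reads from a String instead of a file, and the class and method names are illustrative only. Because a parser may still split text, the loop appends every consecutive characters event rather than relying on a single one.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxEntityExample {
    // Hypothetical helper: trimmed text of the first <Value> element, or null if none is found.
    static String readValue(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Replace entity references such as &amp; and deliver them as part of the text.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("Value")) {
                    StringBuilder text = new StringBuilder();
                    // Append every consecutive characters event, in case the text is still split.
                    while (reader.hasNext() && reader.peek().isCharacters()) {
                        text.append(reader.nextEvent().asCharacters().getData());
                    }
                    return text.toString().trim();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml)); // Rice & Beverage
    }
}
```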
I have an XML document in which some nested tags should not be interpreted as XML tags.
For example, something like this:
<something>cbaabc</something> should be parsed as the plain String "cbaabc" (it should be mentioned that the document has other elements as well that get parsed just fine). Jackson, though, tries to interpret it as an Object, and I don't know how to prevent this. I tried using @JacksonXmlText, turning off wrapping, and a custom Deserializer, but I couldn't get it to work.
The <a should be translated to &lt;a. This back-and-forth conversion normally happens in every XML API: setting and getting text uses those entities (&...;).
Another option is to use an additional CDATA section: <![CDATA[ ... ]]>.
<something><![CDATA[cbaabc]]></something>
If you cannot correct that, and have to live with an already corrupted XML text, you must do your own hack:
Load the wrong XML in a String
Repair the XML
Pass the XML string to jackson
Repairing:
String xml = ...
xml = xml.replaceAll("<(/?a\\b[^>]*)>", "&lt;$1&gt;"); // Links
StringReader in = new StringReader(xml);
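The repair step can be sketched as a small, self-contained check. The sample input with a nested <a href="x"> link is an assumption about what the broken document looks like, and the JDK's DOM parser stands in for Jackson here only to show that the escaped markup comes back as text. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RepairLinksExample {
    // Escapes <a ...> and </a> so that a parser treats them as text rather than markup.
    static String escapeAnchorTags(String xml) {
        return xml.replaceAll("<(/?a\\b[^>]*)>", "&lt;$1&gt;");
    }

    // Parses the document and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed shape of the broken input: a link nested inside a text element.
        String broken = "<something>cba<a href=\"x\">abc</a></something>";
        String repaired = escapeAnchorTags(broken);
        System.out.println(repaired);
        System.out.println(rootText(repaired)); // cba<a href="x">abc</a>
    }
}
```

The regular expression is only a rough heuristic: it also matches any other tag whose name starts with "a" followed by a non-word character, so it should be checked against the real data before use.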
I have a large String which contains some XML. This XML contains input like:
<xyz1>...</xyz1>
<hello>text between strings #1</hello>
<xyz2>...</xyz2>
<hello>text between strings #2</hello>
<xyz3>...</xyz3>
I want to get all of these <hello>text between strings</hello> elements.
So in the end I want to have a List or any other Collection which contains all the <hello>...</hello> entries.
I tried it with Regex and Matcher, but it doesn't work with large strings; with smaller strings it works. I read a blog post about this which says that Java regex is broken for alternation over large strings.
Is there an easy and good way to do this?
Edit:
An attempt is...
String pattern1 = "<hello>";
String pattern2 = "</hello>";
List<String> helloList = new ArrayList<String>();
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
Pattern pattern = Pattern.compile(regexString);
Matcher matcher = pattern.matcher(scannerString);
while (matcher.find()) {
    String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
    // You can insert match into a List/Collection here
    helloList.add(textInBetween);
    logger.info("-------------->>>> " + textInBetween);
}
If you have to parse an XML file, I suggest using the XPath language. Basically, you have to do the following:
Parse the XML String into a DOM object
Create an XPath query
Query the DOM
Try to have a look at this link.
An example of what you have to do is this:
String xml = ...;
try {
    // Build structures to parse the String
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // Parse the XML string into a DOM object (a Reader avoids the platform charset that getBytes() would use)
    Document document = builder.parse(new InputSource(new StringReader(xml)));
    // Create an XPath query
    XPath xPath = XPathFactory.newInstance().newXPath();
    // Query the DOM object with the query '//hello'
    NodeList nodeList = (NodeList) xPath.compile("//hello").evaluate(document, XPathConstants.NODESET);
} catch (Exception e) {
    e.printStackTrace();
}
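The three steps above can be put together into a complete sketch that returns the texts as a List. It assumes the XML has a single root element (here a made-up <root> wrapper), since a DOM parser rejects a string with several top-level elements. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HelloXPathExample {
    // Returns the text content of every <hello> element, in document order.
    static List<String> helloTexts(String xml) {
        try {
            // Step 1: parse the XML string into a DOM object.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Steps 2 and 3: create the XPath query and run it against the DOM.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//hello", document, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query XML", e);
        }
    }

    public static void main(String[] args) {
        // <root> is an assumed wrapper element; the question's snippet has no single root.
        String xml = "<root>"
                + "<xyz1>...</xyz1><hello>text between strings #1</hello>"
                + "<xyz2>...</xyz2><hello>text between strings #2</hello>"
                + "<xyz3>...</xyz3></root>";
        System.out.println(helloTexts(xml));
    }
}
```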
You have to parse your XML with an XML parser. It is easier than using regular expressions.
A DOM parser is the simplest to use, but if your XML is very big, use a SAX parser.
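For the very large case, a SAX handler might look like the following sketch. It streams through the input and keeps only the <hello> texts in memory. It assumes a single root element (the <root> wrapper is made up) and that <hello> elements are not nested inside each other. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HelloSaxExample {
    // Collects the text of every <hello> element without building a DOM tree.
    static List<String> helloTexts(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <hello> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("hello")) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append rather than assign.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("hello") && current != null) {
                    result.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<root><xyz1>...</xyz1><hello>text between strings #1</hello>"
                + "<xyz2>...</xyz2><hello>text between strings #2</hello></root>";
        System.out.println(helloTexts(xml));
    }
}
```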
I would highly recommend using one of the multiple public XML parsers available:
Woodstox
StAX
dom4j
It is simply easier to achieve what you're trying to do with one of these (even if you need to extend your requirements in the future). If speed and memory are not a concern, go ahead and use dom4j. There are a lot of resources online; I can post good examples in this answer if you wish. For now my answer simply points you to alternative options, because I'm not sure what your limitations are.
Regarding REGEX when parsing XML, Dour High Arch gave a great response:
XML is not a regular language. You cannot parse it using a regular expression. An expression you think will work will break when you get nested tags, then when you fix that it will break on XML comments, then CDATA sections, then processor directives, then namespaces, ... It cannot work, use an XML parser.
Parsing XML with REGEX in Java
With Java 8 you could use the Dynamics library to do this in a straightforward way:
XmlDynamic xml = new XmlDynamic(
        "<bunch_of_data>" +
        "<xyz1>...</xyz1>" +
        "<hello>text between strings #1</hello>" +
        "<xyz2>...</xyz2>" +
        "<hello>text between strings #2</hello>" +
        "<xyz3>...</xyz3>" +
        "</bunch_of_data>");
List<String> hellos = xml.get("bunch_of_data").children()
        .filter(XmlDynamic.hasElementName("hello"))
        .map(hello -> hello.asString())
        .collect(Collectors.toList()); // ["text between strings #1", "text between strings #2"]
See https://github.com/alexheretic/dynamics#xml-dynamics
I have some xml that looks like this:
<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>
The tags change and are variable, so there won't always be a 'name' tag.
I've tried 3 or 4 parsers and they all seem to choke on it. Any hints?
Just because it doesn't have a defined schema, doesn't mean it isn't "valid" XML - your sample XML is "well formed".
The dom4j library will do it for you. Once parsed (your XML will parse OK) you can iterate through child elements, no matter what their tag name, and work with your data.
Here's an example of how to use it:
import java.util.Iterator;
import org.dom4j.*;
String text = "<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>";
Document document = DocumentHelper.parseText(text);
Element root = document.getRootElement();
// Walk over all child elements, whatever their tag names are
for (Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    String tagName = element.getQualifiedName(); // getQName() returns a QName object, not a String
    String contents = element.getText();
    // do something
}
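If adding dom4j is not an option, the same iteration over child elements with unknown tag names can be sketched with the JDK's built-in DOM API. This is an alternative to the dom4j code above, not a description of it, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class UnknownTagsExample {
    // Maps each child element of the root to its text, keeping document order.
    // If a tag name occurs more than once, the last occurrence wins.
    static Map<String, String> childElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new LinkedHashMap<>();
            Node child = doc.getDocumentElement().getFirstChild();
            for (; child != null; child = child.getNextSibling()) {
                // Skip whitespace text nodes and comments; keep only elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.put(child.getNodeName(), child.getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>";
        System.out.println(childElements(text)); // {name=oscar, race=puppet, class=grouch}
    }
}
```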
This is valid xml; try adding an XML Schema that allows for optional elements. If you can write an xml schema, you can use JAXB to parse it. XML allows for having optional elements; it isn't too "strict" about it.
Your XML sample is well-formed XML, and if anything "chokes" on it then it would be useful for us to know exactly what the symptoms of the "choking" are.
I have a String value that is actually XML data. I need to parse this XML string and extract individual values from it. How can I do this?
There are lots of different XML APIs in Java - some built into the framework, some not.
One of the simpler ones to use is JDOM:
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(new StringReader(text));
// Now examine the document, perhaps by XPath or by navigating
// programmatically, like this:
String fooContents = doc.getRootElement()
        .getChild("foo")
        .getText();
For example, XStream. It is very simple to use.