Read XML String using StAX - java

I am using StAX for the first time to parse an XML String. I have found some examples but can't get my code to work. This is the latest version of my code:
public class AddressResponseParser
{
private static final String STATUS = "status";
private static final String ADDRESS_ID = "address_id";
private static final String CIVIC_ADDRESS = "civic_address";
String status = null;
String addressId = null;
String civicAddress = null;
public static AddressResponse parseAddressResponse(String response)
{
try
{
byte[] byteArray = response.getBytes("UTF-8");
ByteArrayInputStream inputStream = new ByteArrayInputStream(byteArray);
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader = inputFactory.createXMLStreamReader(inputStream);
while (reader.hasNext())
{
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT)
{
String element = reader.getLocalName();
if (element.equals(STATUS))
{
status = reader.getElementText();
continue;
}
if (element.equals(ADDRESS_ID))
{
addressId = reader.getText();
continue;
}
if (element.equals(CIVIC_ADDRESS))
{
civicAddress = reader.getText();
continue;
}
}
}
}
catch (Exception e)
{
log.error("Couldn't parse AddressResponse", e);
}
}
}
I've put watches on "event" and "reader.getElementText()". When the code is stopped on
String element = reader.getLocalName();
the "reader.getElementText()" value is displayed, but as soon as it moves away from that line it can't be evaluated. When the code is stopped on:
status = reader.getElementText();
the "element" watch displays the correct value. Finally, when I step the code one more line, I catch this exception:
(com.ctc.wstx.exc.WstxParsingException) com.ctc.wstx.exc.WstxParsingException: Current state not START_ELEMENT
at [row,col {unknown-source}]: [1,29]
I've tried using status = reader.getText(); instead, but then I get this exception:
(java.lang.IllegalStateException) java.lang.IllegalStateException: Not a textual event (END_ELEMENT)
Can somebody point out what I'm doing wrong?
EDIT:
Adding JUnit code used to test:
public class AddressResponseParserTest
{
private String status = "OK";
private String address_id = "123456";
private String civic_address = "727";
@Test
public void testAddressResponseParser() throws UnsupportedEncodingException, XMLStreamException
{
AddressResponse parsedResponse = AddressResponseParser.parseAddressResponse(this.responseXML());
assertEquals(this.status, parsedResponse.getStatus());
assertEquals(this.address_id, parsedResponse.getAddress()
.getAddressId());
assertEquals(this.civic_address, parsedResponse.getAddress()
.getCivicAddress());
}
private String responseXML()
{
StringBuffer buffer = new StringBuffer();
buffer.append("<response>");
buffer.append("<status>OK</status>");
buffer.append("<address>");
buffer.append("<address_id>123456</address_id>");
buffer.append("<civic_address>727</civic_address>");
buffer.append("</address>");
buffer.append("</response>");
return buffer.toString();
}
}

I found a solution that uses XMLEventReader instead of XMLStreamReader:
public MyObject parseXML(String xml)
throws XMLStreamException, UnsupportedEncodingException
{
byte[] byteArray = xml.getBytes("UTF-8");
ByteArrayInputStream inputStream = new ByteArrayInputStream(byteArray);
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = inputFactory.createXMLEventReader(inputStream);
MyObject object = new MyObject();
while (reader.hasNext())
{
XMLEvent event = (XMLEvent) reader.next();
if (event.isStartElement())
{
StartElement element = event.asStartElement();
if (element.getName().getLocalPart().equals("ElementOne"))
{
event = (XMLEvent) reader.next();
if (event.isCharacters())
{
String elementOne = event.asCharacters().getData();
object.setElementOne(elementOne);
}
continue;
}
if (element.getName().getLocalPart().equals("ElementTwo"))
{
event = (XMLEvent) reader.next();
if (event.isCharacters())
{
String elementTwo = event.asCharacters().getData();
object.setElementTwo(elementTwo);
}
continue;
}
}
}
return object;
}
I would still be interested in seeing a solution using XMLStreamReader.

Make sure you read the Javadocs for StAX: since it is a fully streaming parsing mode, only the information contained in the current event is available. There are some exceptions, however. getElementText(), for example, must be called on a START_ELEMENT, but it will then combine all textual tokens inside the current element, and when it returns, the reader points to the matching END_ELEMENT.
Conversely, getText() on START_ELEMENT will not return anything useful (since START_ELEMENT refers to the tag, not to the child text tokens/nodes 'inside' the start/end element pair). If you want to use it instead, you have to explicitly move the cursor in the stream by calling streamReader.next(), whereas getElementText() does that for you.
So what is causing the error? After you have consumed all start/end-element pairs, the next token will be an END_ELEMENT (matching whatever was the parent tag). So you must check for the case where you get END_ELEMENT instead of yet another START_ELEMENT.
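A minimal sketch of how this could look with XMLStreamReader, assuming the response layout from the question's test (status, address_id and civic_address each contain only text). ParsedAddress and StreamReaderAddressSketch are hypothetical names introduced for illustration, since the question's AddressResponse class is not shown. Every leaf value is read with getElementText() while the reader is on START_ELEMENT, and all other events are skipped:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical holder for the three values (stands in for AddressResponse).
final class ParsedAddress {
    String status;
    String addressId;
    String civicAddress;
}

final class StreamReaderAddressSketch {
    static ParsedAddress parse(String xml) {
        ParsedAddress result = new ParsedAddress();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Only START_ELEMENT is interesting; END_ELEMENT, CHARACTERS etc. are skipped.
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                switch (reader.getLocalName()) {
                    case "status":
                        // getElementText() reads the text and leaves the reader on </status>
                        result.status = reader.getElementText();
                        break;
                    case "address_id":
                        result.addressId = reader.getElementText();
                        break;
                    case "civic_address":
                        result.civicAddress = reader.getElementText();
                        break;
                    default:
                        // container elements such as <response> and <address>
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse address response", e);
        }
        return result;
    }
}
```

Calling StreamReaderAddressSketch.parse(...) with the XML built by responseXML() in the question should fill all three fields. Note that getElementText() only works on elements without child elements, which is why it is not called for the containers.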

I faced a similar issue: I was getting the "IllegalStateException: Not a textual event" message.
Looking through your code, I noticed that you call getText() inside this condition:
if (event == XMLStreamConstants.START_ELEMENT){
....
addressId = reader.getText(); // it throws exception here
....
}
(Please note: StaXMan did point out this in his answer!)
This happens because, to fetch text, the XMLStreamReader instance must be positioned on an 'XMLStreamConstants.CHARACTERS' event.
There may be a better way to do this, but here is a quick and dirty fix (I have only shown the lines of code that may be of interest).
To make this work, modify your code slightly:
// this tells the loop that the next CHARACTERS event belongs to an element we care about
boolean pickupText = false;
while (reader.hasNext()) {
// advance the cursor; without this call the loop never moves forward
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
String name = reader.getLocalName();
if (name.equals(STATUS)
|| name.equals(ADDRESS_ID)
|| name.equals(CIVIC_ADDRESS)) {
// tell the loop that it has to pick up text soon
pickupText = true;
}
} else if (event == XMLStreamConstants.CHARACTERS && pickupText) {
String textFromXML = reader.getText();
// process textFromXML ...
//...
// reset the flag so that unrelated text is ignored
pickupText = false;
}
}
Hope that helps!

Here is an example with XMLStreamReader:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
Map<String, String> elements = new HashMap<>();
try {
XMLStreamReader xmlReader = inputFactory.createXMLStreamReader(file);
String elementValue = "";
while (xmlReader.hasNext()) {
int xmlEventType = xmlReader.next();
switch (xmlEventType) {
// Check for Start Elements
case XMLStreamConstants.START_ELEMENT:
//Get current Element Name
String elementName = xmlReader.getLocalName();
if(elementName.equals("td")) {
//Get Elements Value
elementValue = xmlReader.getElementText();
}
//Add the new Start Element to the Map
elements.put(elementName, elementValue);
break;
default:
break;
}
}
//Close Session
xmlReader.close();
} catch (Exception e) {
log.error(e.getMessage(), e);
}

Related

How to get XPath of all Nodes in XML by using Java or Scala?

I want to get the XPath of all nodes in an XML document using Java or Scala.
<foo>
<foo1>Foo Test 1</foo1>
<foo2>
<another1>
<test10>This is a duplicate</test10>
</another1>
</foo2>
<foo2>
<another1>
<test1>Foo Test 2</test1>
</another1>
</foo2>
<foo3>Foo Test 3</foo3>
<foo4>Foo Test 4</foo4>
</foo>
Output :
foo
foo/foo2/
/foo/foo2/another1/
I think what you need is a StAX parser. Consider the following code:
public class XmlPathIterator implements Iterator<String> {
private static XMLInputFactory factory = XMLInputFactory.newFactory();
private final XMLStreamReader xmlReader;
private List<String> tags = new ArrayList<>(); // really need just Stack but it is old and Vector-based
public XmlPathIterator(XMLStreamReader xmlReader) {
this.xmlReader = xmlReader;
moveNext();
}
public static XmlPathIterator fromInputStream(InputStream is) {
try {
return new XmlPathIterator(factory.createXMLStreamReader(is));
} catch (XMLStreamException e) {
throw new RuntimeException(e);
}
}
public static XmlPathIterator fromReader(Reader reader) {
try {
return new XmlPathIterator(factory.createXMLStreamReader(reader));
} catch (XMLStreamException e) {
throw new RuntimeException(e);
}
}
private void moveNext() {
try {
while (xmlReader.hasNext()) {
int type = xmlReader.next();
switch (type) {
case XMLStreamConstants.END_DOCUMENT:
tags.clear(); // finish
return;
case XMLStreamConstants.START_ELEMENT:
QName qName = xmlReader.getName();
tags.add(qName.getLocalPart());
return;
case XMLStreamConstants.END_ELEMENT:
tags.remove(tags.size() - 1);
break; // but continue the loop!
// also continue the loop on everything else
}
}
} catch (XMLStreamException ex) {
throw new RuntimeException(ex); // just pass it through
}
}
@Override
public boolean hasNext() {
return !tags.isEmpty();
}
@Override
public String next() {
String cur = "/" + String.join("/", tags);
moveNext();
return cur;
}
}
It is an iterator of String that returns the XPath of each node. If your file is small and fits in memory, you can easily build a List from it.
Things that are not handled:
Namespaces (your example has none), but you can change how the String is generated from the QName in the XMLStreamConstants.START_ELEMENT case.
Positional predicates for the case where several matching tags share the same path. If you only want unique strings, you can build a Set from this iterator to filter out duplicates.
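The de-duplication idea can be sketched on its own, assuming the whole document is available as a String and that insertion order should be kept. UniquePathsSketch and uniquePaths are hypothetical names for this illustration, not part of the iterator above. The sketch tracks the open tags in a Deque and records each path in a LinkedHashSet:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class UniquePathsSketch {
    // Returns every distinct element path in document order (namespaces and positions ignored).
    static Set<String> uniquePaths(String xml) {
        Set<String> paths = new LinkedHashSet<>();
        Deque<String> openTags = new ArrayDeque<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    // push the tag and record the path from the root down to it
                    openTags.addLast(reader.getLocalName());
                    paths.add("/" + String.join("/", openTags));
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    // leaving the element, so drop it from the current path
                    openTags.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return paths;
    }
}
```

Because a Set is used, the two foo2/another1 branches in the question's sample would collapse into a single /foo/foo2/another1 entry.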

Spring Jaxb2: How to append batch data to an XML file without reading it into memory?

I need to write data to XML in batches.
There are the following domain objects:
@XmlRootElement(name = "country")
public class Country {
@XmlElements({@XmlElement(name = "town", type = Town.class)})
private Collection<Town> towns = new ArrayList<>();
....
}
And:
@XmlRootElement(name = "town")
public class Town {
@XmlElement
private String townName;
// etc
}
I'm marshalling objects with Jaxb2. The configuration is as follows:
marshaller = new Jaxb2Marshaller();
marshaller.setClassesToBeBound(Country.class, Town.class);
Simple marshalling, as in marshaller.marshal(fileName, country), doesn't work here: it produces malformed XML.
Is there a way to tweak the marshaller so that it creates the file with all the marshalled data if it doesn't exist, or just appends to the end of the XML file if it does?
Also, as these files are potentially large, I don't want to read the whole file into memory, append data and then write it to disk.
I've used StAX for XML processing, as it is stream based, consumes less memory than DOM, and can both read and write, unlike SAX, which can only parse XML data but can't write it.
This is the approach I came up with:
public enum StAXBatchWriter {
INSTANCE;
private static final Logger LOGGER = LoggerFactory.getLogger(StAXBatchWriter.class);
public void writeUrls(File original, Collection<Town> towns) {
XMLEventReader eventReader = null;
XMLEventWriter eventWriter = null;
try {
String originalPath = original.getPath();
File from = new File(original.getParent() + "/old-" + original.getName());
boolean isRenamed = original.renameTo(from);
if (!isRenamed)
throw new IllegalStateException("Failed to rename file: " + original.getPath() + " to " + from.getPath());
File to = new File(originalPath);
XMLInputFactory inFactory = XMLInputFactory.newInstance();
eventReader = inFactory.createXMLEventReader(new FileInputStream(from));
XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
eventWriter = outFactory.createXMLEventWriter(new FileWriter(to));
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
eventWriter.add(event);
if (event.getEventType() == XMLEvent.START_ELEMENT && event.asStartElement().getName().toString().contains("country")) {
for (Town town : towns) {
writeTown(eventWriter, eventFactory, town);
}
}
}
boolean isDeleted = from.delete();
if (!isDeleted)
throw new IllegalStateException("Failed to delete old file: " + from.getPath());
} catch (IOException | XMLStreamException e) {
LOGGER.error(e.getMessage(), e);
throw new RuntimeException(e);
} finally {
try {
if (eventReader != null)
eventReader.close();
} catch (XMLStreamException e) {
LOGGER.error(e.getMessage(), e);
}
try {
if (eventWriter != null)
eventWriter.close();
} catch (XMLStreamException e) {
LOGGER.error(e.getMessage(), e);
}
}
}
private void writeTown(XMLEventWriter eventWriter, XMLEventFactory eventFactory, Town town) throws XMLStreamException {
eventWriter.add(eventFactory.createStartElement("", null, "town"));
// write town id
eventWriter.add(eventFactory.createStartElement("", null, "id"));
eventWriter.add(eventFactory.createCharacters(town.getId()));
eventWriter.add(eventFactory.createEndElement("", null, "id"));
//write town name
if (StringUtils.isNotEmpty(town.getName())) {
eventWriter.add(eventFactory.createStartElement("", null, "name"));
eventWriter.add(eventFactory.createCharacters(town.getName()));
eventWriter.add(eventFactory.createEndElement("", null, "name"));
}
// write other fields
eventWriter.add(eventFactory.createEndElement("", null, "town"));
}
}
It's not the best approach: although it is stream based and it works, it has some overhead. Every time a batch is added, the old file has to be re-read.
It would be nice to have an option to append data at some point in the file (like "append data to this file after line 4"), but it seems this can't be done.

Parse special characters in xml stax file

I have the following question:
Here is part of the original RSS file:
<item>
<title> I can get data in tag this </title>
<description>&lt;p&gt; i don't get data in this &lt;/p&gt;</description></item>
When I read the file using the StAX parser, the special character '&lt;' is automatically converted to '<', and then I cannot get the rest of the data in the <description> tag.
This is my code:
public Feed readFeed() {
Feed feed = null;
try {
boolean isFeedHeader = true;
String description = "";
String title = "";
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream in = read();
XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
if (event.isStartElement()) {
String localPart = event.asStartElement().getName()
.getLocalPart();
switch (localPart) {
case "title":
title = getCharacterData(event, eventReader);
break;
case "description":
description = getCharacterData(event, eventReader);
break;
}
} else if (event.isEndElement()) {
if ("item".equals(event.asEndElement().getName().getLocalPart())) {
FeedMessage message = new FeedMessage();
message.setDescription(description);
message.setTitle(title);
feed.getMessages().add(message);
event = eventReader.nextEvent();
continue;
}
}
}
} catch (XMLStreamException e) {
throw new RuntimeException(e);
}
return feed;}
private String getCharacterData(XMLEvent event, XMLEventReader eventReader)
throws XMLStreamException {
String result = "";
event = eventReader.nextEvent();
if (event instanceof Characters) {
result = event.asCharacters().getData();
}
return result;}
I am following the instructions at: http://www.vogella.com/tutorials/RSSFeed/article.html
The tutorial is flawed. It doesn't account for the fact that you could get multiple text events for a single block of text (which tends to happen when you have embedded entities).
In order to make your life easier, make sure you set the IS_COALESCING property to true on the XMLInputFactory before creating your XMLEventReader (this property forces the reader to combine all adjacent text events into a single event).
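As a rough illustration of that advice, the sketch below sets IS_COALESCING before creating the XMLEventReader and then reads the text of one element the same way the tutorial's getCharacterData does. CoalescingSketch and firstElementText are hypothetical names used only for this example, and the XML is passed in as a String rather than read from the feed URL:

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class CoalescingSketch {
    // Returns the text of the first element called elementName, or "" if it has none.
    static String firstElementText(String xml, String elementName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent text events (including decoded entities) into one Characters event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader events = factory.createXMLEventReader(new StringReader(xml));
            String text = "";
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                    // With coalescing on, the whole text arrives as a single event.
                    XMLEvent next = events.nextEvent();
                    if (next.isCharacters()) {
                        text = next.asCharacters().getData();
                    }
                    break;
                }
            }
            events.close();
            return text;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }
}
```

An alternative that avoids the property is XMLEventReader.getElementText(), which also gathers all text up to the matching end tag, provided the element contains no child elements.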

How to write a unit test for an XML parser I wrote in Java

The context is as follows:
I've got objects that represent Tweets (from Twitter). Each object has an id, a date and the id of the original tweet (if there was one).
I receive a file of tweets (where each tweet is in the format of 05/04/2014 12:00:00, tweetID, originalID and is on its own line) and I want to save them as an XML file where each field has its own tag.
I want to then be able to read the file and return a list of Tweet objects corresponding to the Tweets from the XML file.
After writing the XML parser that does this I want to test that it works correctly. I've got no idea how to test this.
The XML Parser:
public class TweetToXMLConverter implements TweetImporterExporter {
//there is a single file used for the tweets database
static final String xmlPath = "src/main/resources/tweetsDataBase.xml";
//some "defines", as we like to call them ;)
static final String DB_HEADER = "tweetDataBase";
static final String TWEET_HEADER = "tweet";
static final String TWEET_ID_FIELD = "id";
static final String TWEET_ORIGIN_ID_FIELD = "original tweet";
static final String TWEET_DATE_FIELD = "tweet date";
static File xmlFile;
static boolean initialized = false;
@Override
public void createDB() {
try {
Element tweetDB = new Element(DB_HEADER);
Document doc = new Document(tweetDB);
doc.setRootElement(tweetDB);
XMLOutputter xmlOutput = new XMLOutputter();
// pretty-print the XML output
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(doc, new FileWriter(xmlPath));
xmlFile = new File(xmlPath);
initialized = true;
} catch (IOException io) {
System.out.println(io.getMessage());
}
}
@Override
public void addTweet(Tweet tweet) {
if (!initialized) {
//TODO throw an exception? should not come to pass!
return;
}
SAXBuilder builder = new SAXBuilder();
try {
Document document = (Document) builder.build(xmlFile);
Element newTweet = new Element(TWEET_HEADER);
newTweet.setAttribute(new Attribute(TWEET_ID_FIELD, tweet.getTweetID()));
newTweet.setAttribute(new Attribute(TWEET_DATE_FIELD, tweet.getDate().toString()));
if (tweet.isRetweet())
newTweet.addContent(new Element(TWEET_ORIGIN_ID_FIELD).setText(tweet.getOriginalTweet()));
document.getRootElement().addContent(newTweet);
} catch (IOException io) {
System.out.println(io.getMessage());
} catch (JDOMException jdomex) {
System.out.println(jdomex.getMessage());
}
}
//break glass in case of emergency
@Override
public void addListOfTweets(List<Tweet> list) {
for (Tweet t : list) {
addTweet(t);
}
}
@Override
public List<Tweet> getListOfTweets() {
if (!initialized) {
//TODO throw an exception? should not come to pass!
return null;
}
try {
SAXBuilder builder = new SAXBuilder();
Document document;
document = (Document) builder.build(xmlFile);
List<Tweet> $ = new ArrayList<Tweet>();
for (Object o : document.getRootElement().getChildren(TWEET_HEADER)) {
Element rawTweet = (Element) o;
String id = rawTweet.getAttributeValue(TWEET_ID_FIELD);
String original = rawTweet.getChildText(TWEET_ORIGIN_ID_FIELD);
Date date = new Date(rawTweet.getAttributeValue(TWEET_DATE_FIELD));
$.add(new Tweet(id, original, date));
}
return $;
} catch (JDOMException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
}
Some usage:
private TweetImporterExporter converter;
List<Tweet> tweetList = converter.getListOfTweets();
for (String tweetString : lines)
converter.addTweet(new Tweet(tweetString));
How can I make sure the XML file I read (that contains tweets) corresponds to the file I receive (in the form stated above)?
How can I make sure the tweets I add to the file correspond to the ones I tried to add?
Assuming that you have the following model:
public class Tweet {
private Long id;
private Date date;
private Long originalTweetid;
//getters and setters
}
The process would be the following:
create an instance of TweetToXMLConverter
create a list of Tweet instances that you expect to receive after parsing the file
feed the converter the list you generated
compare the list returned by parsing the file with the list you created at the start of the test
public class MainTest {
private TweetToXMLConverter converter;
// initialise the list, otherwise tweets.add(...) throws a NullPointerException
private List<Tweet> tweets = new ArrayList<>();
@Before
public void setup() {
Tweet tweet = new Tweet(1, "05/04/2014 12:00:00", 2);
Tweet tweet2 = new Tweet(2, "06/04/2014 12:00:00", 1);
Tweet tweet3 = new Tweet(3, "07/04/2014 12:00:00", 2);
tweets.add(tweet);
tweets.add(tweet2);
tweets.add(tweet3);
converter = new TweetToXMLConverter();
// createDB() must run first: addTweet() returns silently until the converter is initialized
converter.createDB();
converter.addListOfTweets(tweets);
}
@Test
public void testParse() {
List<Tweet> parsedTweets = converter.getListOfTweets();
// JUnit expects the expected value first and the actual value second
Assert.assertEquals(tweets.size(), parsedTweets.size());
for (int i = 0; i < parsedTweets.size(); i++) {
// assuming that both lists are in the same order and Tweet implements equals()
Assert.assertEquals(tweets.get(i), parsedTweets.get(i));
}
}
}
I am using JUnit for the actual testing.

apache.commons.fileupload throws MalformedStreamException

I have got this piece of code (I didn't write it, I'm just maintaining it):
public class MyMultipartResolver extends CommonsMultipartResolver{
public List parseEmptyRequest(HttpServletRequest request) throws IOException, FileUploadException {
String contentType = request.getHeader(CONTENT_TYPE);
int boundaryIndex = contentType.indexOf("boundary=");
InputStream input = request.getInputStream();
byte[] boundary = contentType.substring(boundaryIndex + 9).getBytes();
MultipartStream multi = new MultipartStream(input, boundary);
multi.setHeaderEncoding(getHeaderEncoding());
ArrayList items = new ArrayList();
boolean nextPart = multi.skipPreamble();
while (nextPart) {
Map headers = parseHeaders(multi.readHeaders());
// String fieldName = getFieldName(headers);
String subContentType = getHeader(headers, CONTENT_TYPE);
if (subContentType == null) {
FileItem item = createItem(headers, true);
OutputStream os = item.getOutputStream();
try {
multi.readBodyData(os);
} finally {
os.close();
}
items.add(item);
} else {
multi.discardBodyData();
}
nextPart = multi.readBoundary();
}
return items;
}
}
I am using commons-fileupload.jar version 1.2.1 and obviously the code is using some deprecated methods...
Anyway, while trying to use this code to upload a very large file (780 MB) I get this:
org.apache.commons.fileupload.MultipartStream$MalformedStreamException: Stream ended unexpectedly
at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:983)
at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
at java.io.InputStream.read(InputStream.java:89)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
at org.apache.commons.fileupload.MultipartStream.readBodyData(MultipartStream.java:593)
at org.apache.commons.fileupload.MultipartStream.discardBodyData(MultipartStream.java:619)
It is thrown from the 'multi.discardBodyData();' line.
My question:
How can I avoid this error and succeed in collecting the FileItems?
catch (org.apache.commons.fileupload.MultipartStream.MalformedStreamException e)
{
e.printStackTrace();
return ERROR;
}
Catch the exception and handle it, either via the InputStream or by returning ERROR, which you can then use in the Struts action tag.
