How to make XStream parse partial input from StAX - java

I am new to StAX and XStream. I am trying to unmarshal some common elements from a huge XML stream (there might be between 1.5 million and 2.5 million elements to unmarshal).
I tried using StAX to advance through the stream to an element of interest and then calling XStream to unmarshal the XML up to the matching EndElement:
XMLStreamReader reader = xmlInputFactory.createXMLStreamReader(fis);
while (reader.hasNext()) {
if (reader.isStartElement() && reader.getLocalName().toLowerCase().equals("person")) {
break;
}
reader.next();
}
StaxDriver sd = new StaxDriver();
AbstractPullReader rd = sd.createStaxReader(reader);
XStream xstream = new XStream(sd);
xstream.registerConverter(new PersonConverter());
Person p = (Person) xstream.unmarshal(rd);
I created a test input:
<Persons>
<Person>
<name>A</name>
</Person>
<Person>
<name>B</name>
</Person>
<Person>
<name>C</name>
</Person>
</Persons>
The problem is that, first, my converter is not called. Second, I get a CannotResolveClassException for the element "name" inside Person, and XStream doesn't create my Person object.
What did I miss in my code?

When you instantiate an AbstractPullReader, it reads the first start-element event from the stream to establish the "root" element. Because you've already read the first Person event, it advances to the next one (name), which it doesn't know how to unmarshal.
You'll have to do two things to make your example work:
First, alias the element name Person to your Java class:
xstream.alias("Person", Person.class);
Second, only advance the StAX cursor up to the element that encloses the one you want to read (here, Persons):
while (reader.hasNext()) {
if (reader.isStartElement() && reader.getLocalName().equals("Persons")) {
break;
}
reader.next();
}
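If positioning XStream's pull reader proves fiddly, the same one-element-at-a-time idea can be sketched with the JDK's own StAX API alone. Note that this swaps in plain StAX instead of XStream, and that the `PersonStream` class, its `Person` holder and the `forEachPerson`/`readNames` helpers are hypothetical names. Each `<Person>` subtree is read and handed to a callback before the cursor moves on, so memory use should not grow with the number of elements (the list built in `readNames` is only for small demos).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PersonStream {

    /** Hypothetical stand-in for the question's Person class. */
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    /** Streams every <Person> to the sink; only one Person is materialised at a time. */
    static void forEachPerson(XMLStreamReader reader, Consumer<Person> sink)
            throws XMLStreamException {
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("Person")) {
                sink.accept(readPerson(reader));
            }
        }
    }

    /** Called with the cursor on <Person>; returns with the cursor on </Person>. */
    private static Person readPerson(XMLStreamReader reader) throws XMLStreamException {
        String name = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("name")) {
                name = reader.getElementText(); // leaves the cursor on </name>
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("Person")) {
                break;
            }
        }
        return new Person(name);
    }

    /** Convenience wrapper for small inputs: collects the names into a list. */
    static List<String> readNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            List<String> names = new ArrayList<>();
            forEachPerson(reader, p -> names.add(p.name));
            return names;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Persons><Person><name>A</name></Person>"
                + "<Person><name>B</name></Person>"
                + "<Person><name>C</name></Person></Persons>";
        System.out.println(readNames(xml)); // [A, B, C]
    }
}
```

For a real file, pass a reader created with `createXMLStreamReader(new FileInputStream(...))` to `forEachPerson` and handle each Person inside the callback instead of collecting them.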

Related

How can I skip element with error in SAXParser

I want to skip a node that contains an error. I'm using SAXParser.
Example XML:
<file>
<person>
<id>1
<name>Jhon</name>
</person>
<person>
<id>2</id>
<name>Julia</name>
</person>
</file>
I use:
SAXParserFactory fact = SAXParserFactory.newInstance();
SAXParser parser = fact.newSAXParser();
MyHandler handler = new MyHandler();
parser.parse(new File(path), handler);
Example handler:
public class MyHandler extends org.xml.sax.helpers.DefaultHandler
{
private String message = "";
@Override
public void fatalError(final SAXParseException e)
{
message += "Error : " + e.getMessage();
}
}
I want to skip person 1, which is missing its </id> end tag, save the error message, and continue parsing with person 2.
There are parsers such as TagSoup and validator.nu that attempt to parse bad XML. Whether they succeed depends on just how bad the XML is.
And of course they have to guess what the "correct" XML was meant to be. In your example, the XML can be made well-formed by adding an </id> end tag anywhere before the </person> tag, so the repair may not be the one you would have liked.
You say you want to skip invalid records, but I think the philosophy of these products is to try to repair them rather than skipping them.
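For comparison, here is a sketch of what the JDK's built-in SAX parser does with the example document (other SAX implementations may behave differently, and the `ErrorCollectingHandler` class is a hypothetical name). Even though `fatalError` only records the message and does not rethrow, the built-in parser still abandons `parse` at the first well-formedness error, so the second person is never reported; this is why the answer above points to a lenient parser instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorCollectingHandler extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            names.add(text.toString().trim());
        }
    }

    @Override
    public void fatalError(SAXParseException e) {
        // Record the error instead of rethrowing it.
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Parses the document and returns the handler, even if parsing was aborted. */
    static ErrorCollectingHandler parse(String xml)
            throws ParserConfigurationException, IOException {
        ErrorCollectingHandler handler = new ErrorCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // The built-in parser still stops at the first fatal error.
            if (handler.errors.isEmpty()) {
                handler.errors.add(e.getMessage());
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<file>\n<person>\n<id>1\n<name>Jhon</name>\n</person>\n"
                + "<person>\n<id>2</id>\n<name>Julia</name>\n</person>\n</file>";
        ErrorCollectingHandler h = parse(xml);
        System.out.println("names:  " + h.names);
        System.out.println("errors: " + h.errors);
    }
}
```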

Streaming xml in java

I am trying to read a large XML file. I only want to read the car owners, and I can't load the whole XML into memory. How can I do that?
The XML file:
<root>
<message>
<car>
<owner>adam</owner>
</car>
<desk>
<owner>sam</owner>
<game>
<owner>dorothy</owner>
</game>
<pen>
<owner>dorothy</owner>
</pen>
</desk>
</message>
</root>
For example, this code does not know exactly what it is reading. How can I be sure that I am reading car owners?
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = xmlInputFactory.createXMLEventReader(new FileInputStream(entry.toFile()));
while (reader.hasNext()) {
XMLEvent nextEvent = reader.nextEvent();
if (nextEvent.isStartElement()) {
StartElement startElement = nextEvent.asStartElement();
log.info(startElement.getName().toString());
switch (startElement.getName().getLocalPart()) {
case "owner":
// whose owner is this?
break;
}
}
}
A simple but viable solution is to build a small state machine: capture events as they arrive and update the state accordingly.
If you enter a car node, store a reference to the car.
If you enter an owner node AND you have previously entered a car node, store it as the car's owner.
When you exit the car node, return the car-owner pair.
Repeat, tracking nesting and/or node depth so that only car > owner is accepted.
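The state machine described above can be sketched with the JDK's StAX `XMLStreamReader`. Here the state is a stack of the currently open element names, which covers the nesting concern: an `<owner>` is accepted only when the element directly enclosing it is `<car>`. The `CarOwners` class and its methods are hypothetical names, and the sketch assumes `<owner>` contains only text.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CarOwners {

    /** Returns the text of every <owner> whose direct parent is <car>. */
    static List<String> carOwners(XMLStreamReader reader) throws XMLStreamException {
        List<String> owners = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("owner") && "car".equals(path.peek())) {
                    // getElementText() consumes up to </owner>, so "owner" is never pushed.
                    owners.add(reader.getElementText());
                } else {
                    path.push(name);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                path.pop();
            }
        }
        return owners;
    }

    /** Convenience overload for in-memory strings. */
    static List<String> carOwners(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            return carOwners(reader);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><message>"
                + "<car><owner>adam</owner></car>"
                + "<desk><owner>sam</owner>"
                + "<game><owner>dorothy</owner></game>"
                + "<pen><owner>dorothy</owner></pen></desk>"
                + "</message></root>";
        System.out.println(carOwners(xml)); // [adam]
    }
}
```

To stream the real file, create the reader with `createXMLStreamReader(new FileInputStream(...))` and pass it to `carOwners(reader)`; only the stack of open element names and the collected owners are held in memory.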

Unmarshalling if stream contains collection?

I have two classes at the moment, Customer and CustomerItem; each Customer can contain multiple CustomerItems:
@XmlRootElement(name = "Customer")
@XmlAccessorType(XmlAccessType.FIELD)
public class Customer implements Serializable {
private final String customerNumber;
private final String customerName;
@XmlElementWrapper(name = "customerItems")
@XmlElement(name = "CustomerItem")
private List<CustomerItem> customerItems;
...
Via REST we can get a List<Customer>, which results in XML like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<collection>
<Customer>
// normal values
<customerItems>
<customerItem>
// normal values
</customerItem>
</customerItems>
</Customer>
<Customer>
...
</Customer>
<Customer>
...
</Customer>
</collection>
When I try to unmarshal the response, I get an error:
javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"collection"). Expected elements are <{}Customer>,<{}CustomerItem>
This is the code that unmarshals the response:
private List<Customer> getCustomersFromResponse(HttpResponse response)
throws JAXBException, IllegalStateException, IOException {
JAXBContext jaxbContext = JAXBContext.newInstance(Customer.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
InputStream content = response.getEntity().getContent();
InputStreamReader reader = new InputStreamReader(content);
java.util.List<Customer> unmarshal = (List<Customer>) jaxbUnmarshaller.unmarshal(reader);
return unmarshal;
}
I know why it's not working: JAXB expects Customer to be the root element, but instead finds collection (which seems to be the default wrapper when a List is returned?).
A workaround would be to create a new Customers class that contains a list of Customer objects and make it the root element, but I'd rather not add a new class.
There must be a way to tell JAXB that I have a list of the classes I want to get and that it should look inside the collection tag, or something like that?!
I see two ways here:
1) Create a special wrapper class annotated with @XmlRootElement(name = "collection") and pass it to the unmarshaller.
2) Split the input XML into smaller fragments using a simple SAX parser, then parse each fragment separately with JAXB (see: Split 1GB Xml file using Java).
I don't think you can simply tell JAXB: parse this XML as a set of elements of type XXX.

Java - parse xml string with variable tagnames?

I'm trying to parse an XML string, and the tag names are variable; I haven't seen any examples of how to pull the information out without knowing them. For example, I will always know the <response> and <data> tags below, but what falls inside/outside of them could be anything, from <employee> to you name it.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<generic>
....
</generic>
<data>
<employee>
<name>Seagull</name>
<id>3674</id>
<age>34</age>
</employee>
<employee>
<name>Robin</name>
<id>3675</id>
<age>25</age>
</employee>
</data>
</response>
You could parse it into a generic DOM tree and traverse it. For example, you could use dom4j to do this.
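From the dom4j quick start guide:
import java.net.URL;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
public void treeWalk(Document document) {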
treeWalk( document.getRootElement() );
}
public void treeWalk(Element element) {
for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
Node node = element.node(i);
if ( node instanceof Element ) {
treeWalk( (Element) node );
}
else {
// do something....
}
}
}
public Document parse(URL url) throws DocumentException {
SAXReader reader = new SAXReader();
Document document = reader.read(url);
return document;
}
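If adding dom4j is not an option, a similar recursive walk can be sketched with the JDK's built-in `org.w3c.dom` API. This version (the `DomTreeWalk` class and its methods are hypothetical names) records the path and text of every element that has no child elements, so it needs no advance knowledge of tag names such as `<employee>`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTreeWalk {

    /** Recursively collects "path=text" for every element without child elements. */
    static void treeWalk(Element element, String path, List<String> out) {
        String here = path + "/" + element.getTagName();
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                treeWalk((Element) node, here, out);
            }
            // Text, comment and whitespace nodes are skipped here.
        }
        if (!hasChildElements) {
            out.add(here + "=" + element.getTextContent().trim());
        }
    }

    static List<String> leaves(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        treeWalk(doc.getDocumentElement(), "", out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><data>"
                + "<employee><name>Seagull</name><id>3674</id></employee>"
                + "</data></response>";
        leaves(xml).forEach(System.out::println);
    }
}
```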
I have seen a similar situation in projects.
If you are going to deal with large XML documents, you can use a StAX or SAX parser to read the XML. At each step (for example, on reaching an end element), put the data into a Map or another data structure of your choice, keeping tag names as keys and element text as values. Once parsing is done, use this Map to figure out which object to build, since by then you have a complete representation of the information in the XML.
If the XML is small, use DOM and build the entity object directly by reading the specific tag (like <employee>), or use XPath to query where you expect the tag to be, which gives you a hint about the entity. Then build that object directly from the specific information in the XML.
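The Map-based streaming approach can be sketched with StAX. It assumes, based on the example, that each direct child of `<data>` is one record and that each of that record's children is a text-only field; the `DataRecords` class, the `DataRecord` holder and the `readRecords` methods are hypothetical names. The record's tag name is kept next to its field map, so the decision about which entity to build can be made afterwards.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DataRecords {

    /** One child of <data>: its tag name plus field tag -> text pairs. */
    static final class DataRecord {
        final String type;
        final Map<String, String> fields = new LinkedHashMap<>();
        DataRecord(String type) { this.type = type; }
    }

    static List<DataRecord> readRecords(XMLStreamReader r) throws XMLStreamException {
        List<DataRecord> records = new ArrayList<>();
        int depth = 0;       // current element depth; the root element is depth 1
        int dataDepth = -1;  // depth of <data> while the cursor is inside it, else -1
        DataRecord current = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                String name = r.getLocalName();
                if (dataDepth < 0 && name.equals("data")) {
                    dataDepth = depth;
                } else if (dataDepth > 0 && depth == dataDepth + 1) {
                    current = new DataRecord(name);   // e.g. <employee>
                } else if (current != null && depth == dataDepth + 2) {
                    // getElementText() stops on this field's END_ELEMENT, which the
                    // loop never sees, so undo the depth++ here.
                    current.fields.put(name, r.getElementText().trim());
                    depth--;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (current != null && depth == dataDepth + 1) {
                    records.add(current);             // e.g. </employee>
                    current = null;
                } else if (depth == dataDepth) {
                    dataDepth = -1;                   // </data>
                }
                depth--;
            }
        }
        return records;
    }

    static List<DataRecord> readRecords(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            return readRecords(r);
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<response><generic>x</generic><data>"
                + "<employee><name>Seagull</name><id>3674</id><age>34</age></employee>"
                + "<employee><name>Robin</name><id>3675</id><age>25</age></employee>"
                + "</data></response>";
        for (DataRecord rec : readRecords(xml)) {
            System.out.println(rec.type + " " + rec.fields);
        }
    }
}
```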

How to add a node to XML with XMLBeans XmlObject

My goal is to take an XML string and parse it with XMLBeans XmlObject and add a few child nodes.
Here's an example document (xmlString):
<?xml version="1.0"?>
<rootNode>
<person>
<emailAddress>joefoo@example.com</emailAddress>
</person>
</rootNode>
Here's how I'd like the XML document to look after adding some nodes:
<?xml version="1.0"?>
<rootNode>
<person>
<emailAddress>joefoo@example.com</emailAddress>
<phoneNumbers>
<home>555-555-5555</home>
<work>555-555-5555</work>
</phoneNumbers>
</person>
</rootNode>
Basically, just adding the <phoneNumbers/> node with two child nodes <home/> and <work/>.
This is as far as I've gotten:
XmlObject xml = XmlObject.Factory.parse(xmlString);
Thank you
Here is an example of using the XmlCursor to insert new elements. You can also get a DOM Node for an XmlObject and use those APIs instead.
import org.apache.xmlbeans.*;
/**
* Adding nodes to xml using XmlCursor.
* @see http://xmlbeans.apache.org/docs/2.4.0/guide/conNavigatingXMLwithCursors.html
* @see http://xmlbeans.apache.org/docs/2.4.0/reference/org/apache/xmlbeans/XmlCursor.html
*/
public class AddNodes
{
public static final String xml =
"<rootNode>\n" +
" <person>\n" +
" <emailAddress>joefoo@example.com</emailAddress>\n" +
" </person>\n" +
"</rootNode>\n";
public static XmlOptions saveOptions = new XmlOptions().setSavePrettyPrint().setSavePrettyPrintIndent(2);
public static void main(String[] args) throws XmlException
{
XmlObject xobj = XmlObject.Factory.parse(xml);
XmlCursor cur = null;
try
{
cur = xobj.newCursor();
// We could use the convenient xobj.selectPath() or cur.selectPath()
// to position the cursor on the <person> element, but let's use the
// cursor's toChild() instead.
cur.toChild("rootNode");
cur.toChild("person");
// Move to </person> end element.
cur.toEndToken();
// Start a new <phoneNumbers> element
cur.beginElement("phoneNumbers");
// Start a new <work> element
cur.beginElement("work");
cur.insertChars("555-555-5555");
// Move past the </work> end element
cur.toNextToken();
// Or insert a new element the easy way in one step...
cur.insertElementWithText("home", "555-555-5555");
}
finally
{
if (cur != null) cur.dispose();
}
System.out.println(xobj.xmlText(saveOptions));
}
}
XMLBeans seems like a hassle; here's a solution using XOM:
import java.io.StringReader;
import nu.xom.*;
// build() throws ParsingException and IOException
Builder builder = new Builder();
Document doc = builder.build(new StringReader(inputXml));
// The query runs relative to the document node, so search the whole tree
Nodes nodes = doc.query("//person");
Element homePhone = new Element("home");
homePhone.appendChild(new Text("555-555-5555"));
Element workPhone = new Element("work");
workPhone.appendChild(new Text("555-555-5555"));
Element phoneNumbers = new Element("phoneNumbers");
phoneNumbers.appendChild(homePhone);
phoneNumbers.appendChild(workPhone);
((Element) nodes.get(0)).appendChild(phoneNumbers);
System.out.println(doc.toXML()); // prints the modified XML
It may be a little difficult to manipulate the objects using just the XmlObject interface. Have you considered generating XMLBeans Java classes from this XML?
If you don't have an XSD for this schema, you can generate one with XMLSpy or a similar tool.
If you just want XML manipulation (i.e., adding nodes), you could try other APIs such as JDOM or XStream.
The getDomNode() method gives you access to the underlying W3C DOM Node. Then you can append children using the W3C DOM Document interface.
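The DOM route can be sketched with the JDK alone: parse the string into a W3C `Document`, find `<person>`, and append new elements created with `Document.createElement`. With XMLBeans you would start from the node returned by `getDomNode()` rather than parsing the string yourself; this sketch parses with a plain `DocumentBuilder` so that it runs without XMLBeans, and `AddPhoneNumbers`/`addPhoneNumbers` are hypothetical names. It assumes the document contains at least one `<person>` element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AddPhoneNumbers {

    /** Appends <phoneNumbers><home/><work/></phoneNumbers> to the first <person>. */
    static void addPhoneNumbers(Document doc, String home, String work) {
        Element person = (Element) doc.getElementsByTagName("person").item(0);
        Element homeEl = doc.createElement("home");
        homeEl.setTextContent(home);
        Element workEl = doc.createElement("work");
        workEl.setTextContent(work);
        Element phoneNumbers = doc.createElement("phoneNumbers");
        phoneNumbers.appendChild(homeEl);
        phoneNumbers.appendChild(workEl);
        person.appendChild(phoneNumbers);
    }

    /** Parses the string, adds the nodes, and serializes without an XML declaration. */
    static String addAndSerialize(String xml) throws ParserConfigurationException,
            SAXException, IOException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        addPhoneNumbers(doc, "555-555-5555", "555-555-5555");
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rootNode><person>"
                + "<emailAddress>joefoo@example.com</emailAddress>"
                + "</person></rootNode>";
        System.out.println(addAndSerialize(xml));
    }
}
```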
