Suggestion to parse this XML in Java - java

Not new to Java; but relatively new to XML-parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I am also not an XML-pro.
My particular problem is this... I have been given an XML-document which I cannot modify and from which I need only to parse random bits of it into Java objects. Sheer speed is not much of a factor so long as it's reasonable. Likewise, memory-footprint need not be absolutely optimal either, just not insane. I only need to read through the document one time to parse it, after that I'll be throwing it in the bitbucket and just using my POJO.
So, I'm open to suggestion... which tool would you use?
And, would you kindly suggest a bit of starter-code to address my particular need?
Here's a snippet of sample XML and the associated POJO I'm trying to craft:
<xml>
<item id="...">
...
</item>
<metadata>
<resources>
<resource>
<ittype>Service_Links</ittype>
<links>
<link>
<path>http://www.stackoverflow.com</path>
<description>Stack Overflow</description>
</link>
<link>
<path>http://www.google.com</path>
<description>Google</description>
</link>
</links>
</resource>
<resource>
<ittype>Article_Links</ittype>
<links>
...
</links>
</resource>
...
</resources>
</metadata>
</xml>
public class MyPojo {
    @Attribute(name="id")
    @Path("item")
    public String id;
    @ElementList(entry="link")
    @Path("metadata/resources/resource/links")
    public List<Link> links;
}
NOTE: this question was originally spawned by this question with me trying to solve it using SimpleXml; I'm to the point where I thought maybe someone could suggest a different route to solving the same problem.
Also Note: I'm really hoping for a CLEAN solution... by which I mean, using annotations and/or xpath with the least amount of code... the last thing I want is huge class file with huge unwieldy methods... THAT, I already have... I'm trying to find a better way.
:D

OK, so I settled on a solution that (to me) seemed to address my needs in the most reasonable way. My apologies to the other suggestions, but I just liked this route better because it kept most of the parsing-rules as annotations and what little procedural-code I had to write was very minimal.
I ended up going with JAXB; initially I thought JAXB would either create XML from a Java-class or parse XML into a Java-class but only with an XSD. Then I discovered that JAXB has annotations that can parse XML into a Java-class without an XSD.
The XML-file I'm working with is huge and very deep, but I only need bits and pieces of it here and there; I was worried that keeping track of what maps to where would become very difficult in the future. So I chose to structure a tree of folders modeled after the XML... each folder maps to an element and in each folder is a POJO representing that actual element.
Problem is, sometimes there is an element that has a child-element several levels down which has a single property I care about. It would be a pain to create 4 nested-folders and a POJO for each just to get access to a single property. But that's how you do it with JAXB (at least, from what I can tell); once again I was in a corner.
Then I stumbled on EclipseLink's JAXB-implementation: MOXy.
MOXy has an @XmlPath annotation that I could place in that parent POJO and use to navigate several levels down to get access to a single property without creating all those folders and element-POJOs. Nice.
So I created something like this:
(note: I chose to use getters for cases where I need to massage the value)
// maps to the root "xml" element in the file
@XmlRootElement( name="xml" )
@XmlAccessorType( XmlAccessType.FIELD )
public class Xml {
    // this is standard JAXB
    @XmlElement
    private Item item;
    public Item getItem() {
        return this.item;
    }
    ...
}
// maps to the "<xml><item>" element in the file
public class Item {
    // standard JAXB; maps to "<xml><item id="...">"
    @XmlAttribute
    private String id;
    public String getId() {
        return this.id;
    }
    // getting an attribute buried deep down
    // MOXy; maps to "<xml><item><rating average="...">"
    @XmlPath( "rating/@average" )
    private Double averageRating;
    public Double getAverageRating() {
        return this.averageRating;
    }
    // getting a list buried deep down
    // MOXy; maps to "<xml><item><service><identification><aliases><alias.../><alias.../>"
    @XmlPath( "service/identification/aliases/alias/text()" )
    private List<String> aliases;
    public List<String> getAliases() {
        return this.aliases;
    }
    // using a getter to massage the value
    @XmlElement(name="dateforindex")
    private String dateForIndex;
    public Date getDateForIndex() {
        // logic to parse the string-value into a Date
    }
}
Also note that I took the route of separating the XML-object from the model-object I actually use in the app. Thus, I have a factory that transforms these crude objects into much more robust objects which I actually use in my app.

If your XML documents are relatively small (as appears to be the case here), I would use the DOM framework and XPath class. Here is some boilerplate DOM/XPath code from one of my tutorials:
File xmlFile = ...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlFile);
XPath xp = XPathFactory.newInstance().newXPath();
String value = xp.evaluate("/path/to/element/text()", doc);
// .. reuse xp to get other values as required
In other words, basically you:
get your XML into a Document object, via a DocumentBuilder;
create an XPath object;
repeatedly call XPath.evaluate(), passing in the path of the element(s) required
and your Document.
As you see, there's a little bit of fiddliness in getting hold of your Document object and like all good XML APIs, it throws a plethora of silly pointless checked exceptions. But apart from that, it's fairly no-nonsense for parsing simple small to medium XML documents whose structure is relatively fixed.
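Applied to the sample document from the question, the steps above could be sketched roughly as follows. This is only an illustration: the class name XPathLinkReader, its helper methods, and the id value "42" are made up here (the question elides the real id), and the XML is a cut-down copy of the asker's snippet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLinkReader {

    // Cut-down version of the sample XML; the id "42" is a placeholder value.
    static final String SAMPLE =
        "<xml><item id=\"42\"/><metadata><resources><resource>"
      + "<ittype>Service_Links</ittype><links>"
      + "<link><path>http://www.stackoverflow.com</path>"
      + "<description>Stack Overflow</description></link>"
      + "<link><path>http://www.google.com</path>"
      + "<description>Google</description></link>"
      + "</links></resource></resources></metadata></xml>";

    // Step 1: get the XML into a Document via a DocumentBuilder.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Steps 2 and 3: create an XPath object and evaluate a path to a single string.
    static String itemId(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/xml/item/@id", doc);
    }

    // For repeated elements, ask for a NODESET and walk the resulting NodeList.
    static List<String> linkPaths(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(
                "/xml/metadata/resources/resource/links/link/path",
                doc, XPathConstants.NODESET);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            paths.add(nodes.item(i).getTextContent());
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(itemId(doc));
        System.out.println(linkPaths(doc));
    }
}
```

The whole document is held in memory, which is the trade-off this answer accepts for small to medium files.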

You can use a SAX or StAX parser. If you can afford some more memory, you can also use a DOM parser. I would advise that a StAX parser would be best for you.
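As a rough idea of what the StAX route might look like with the JDK's built-in javax.xml.stream API, here is a minimal sketch that collects the text of every path element in one forward pass. The class and method names are invented for this illustration, and it matches path elements anywhere in the document rather than only under links/link.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxLinkReader {

    // Streams through the document once, keeping only the <path> values.
    static List<String> linkPaths(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> paths = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // next() advances the cursor and reports the event type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "path".equals(reader.getLocalName())) {
                    // getElementText() reads a text-only element up to its end tag.
                    paths.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><links>"
                + "<link><path>http://www.stackoverflow.com</path></link>"
                + "<link><path>http://www.google.com</path></link>"
                + "</links></xml>";
        System.out.println(linkPaths(xml));
    }
}
```

Because the reader only moves forward, memory use stays small even for a huge file, at the cost of tracking context (which element you are inside) by hand.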

Related

Modeling JSON in Java

The few times I've worked with Java/Rest/JSon, JSON elements have always been built in camelCase format.
For example:
"someField": {
"someSonField1": "20191106",
"someSonField2": "20201119",
...
}
However, in a functional document they have passed me to build a Rest JSon client, they use this notation:
"some_field": {
"some_son_field_1": "20191106",
"some_son_field_2": "20201119",
...
}
Is it stated somewhere that Java has to use the first notation?
It seems to me that if it is done this way, everything goes much more smoothly when modeling the objects:
@XmlRootElement(name = "someField")
@XmlType(propOrder = {"someSonField1", "someSonField2"})
public class someField {
    private String someSonField1;
    private String someSonField2;
    //...
}
Thanks!
Q: Is it expressed somewhere that Java has to use the notation?
A: No: it's 100% "convention", not mandatory.
As it happens, the usual convention for both JSON (a creature of JavaScript) and Java is camelCase. For example: Java Naming Conventions.
some_son_field_1 is an example of snake case. It's associated with classic "C" programs. It's also common (but NOT universal) with XML. It, too, is a "convention" - not a requirement.
I'm curious why you're choosing XML bindings for JSON data. Have you considered using Jackson?
Finally, this article might be of interest to you:
5 Basic REST API Design Guidelines
I see you're using the javax.xml.bind package? Have you tried @XmlElement?
@XmlRootElement(name = "someField")
@XmlType(propOrder = {"some_son_field_1", "some_son_field_2"})
public class someField {
    @XmlElement(name="some_son_field_1")
    private String someSonField1;
    @XmlElement(name="some_son_field_2")
    private String someSonField2;
    //...
}
I'm not sure, but you should probably try putting the annotations on the getters, as your fields are private.
Or you could use unify-jdocs, a library which I created to read and write JSON documents without using any POJO objects. Rather than defining POJO objects, which we know can be difficult to manage in case of complex documents and changes to the JSON document, just don't use them at all. Directly read and write paths in the JSON document. For example, in your snippet, you could read and write the fields as:
Document d = new JDocument(s); // where s is a JSON string
String s1 = d.getString("$.some_field.some_son_field_1");
String s2 = d.getString("$.some_field.some_son_field_2");
You could use a similar way to write to these paths as so:
d.setString("$.some_field.some_son_field_1", "val1");
d.setString("$.some_field.some_son_field_2", "val2");
This library offers a whole lot of other features which can be used to manipulate JSON documents. Features like model documents which lock the structure of documents to a template, field level validations, comparisons, merging documents etc.
Obviously, you would consider it only if you wanted to work without POJO objects. Alternatively, you could use it to read and write a POJO object using your own method.
Check it out on https://github.com/americanexpress/unify-jdocs.

Unmarshalling multiple XML elements of the same name into a list using JAXB

I'm attempting to unmarshall an XML message into a Java Object. I have it working for the most part but there is one issue I'm stuck on. I have a schema that looks like this:
<DeliveryDetails>
<Name>Ed</Name>
<Location>Toronto</Location>
<Event>
<Date>2013-05-06</Date>
<Time>12:12</Time>
<Description>MARKHAM</Description>
</Event>
<Event>
<Date>2013-05-07</Date>
<Time>05:12</Time>
<Description>MARKHAM</Description>
</Event>
<Event>
<Date>2013-05-08</Date>
<Time>15:12</Time>
<Description>MARKHAM</Description>
</Event>
</DeliveryDetails>
Now, the issue is that the JAXB ObjectFactory is only saving the last event. If there were an element wrapping the events, I would know how to handle it using an XML element wrapper (@XmlElementWrapper). But since there is no wrapper, I'm not sure what to do. Anybody have any ideas?
I'm guessing the ObjectFactory is getting all the events but constantly overwriting the old one with the newest one. There needs to be some way to tell it to save each individual event instead of just writing over the same one every time, but I don't know how to accomplish that.
By default a JAXB (JSR-222) implementation will represent a List as multiple elements with the same name. As long as you have something like the following you will be fine:
@XmlRootElement(name="DeliveryDetails")
@XmlAccessorType(XmlAccessType.FIELD)
public class DeliveryDetails {
    @XmlElement(name="Name")
    private String name;
    @XmlElement(name="Location")
    private String location;
    @XmlElement(name="Event")
    private List<Event> events;
}
For More Information
http://blog.bdoughan.com/2010/09/jaxb-collection-properties.html

How to generate xml in Java?

Please, tell me, how to generate XML in Java?
I couldn't find any example using SAX framework.
Try Xembly, a small open source library that wraps native Java DOM with a "fluent" interface:
String xml = new Xembler(
new Directives()
.add("root")
.add("order")
.attr("id", "553")
.set("$140.00")
).xml();
Will generate:
<root>
<order id="553">$140.00</order>
</root>
See this, this, Generating XML using SAX and Java and this
SAX is a library to parse existing XML files with Java. It is not to create a new XML file out of Java. If you want to do this use a library like DOM4J to create a XML tree and then write it to a file.
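If adding a dependency is not an option, the JDK's own StAX writer (javax.xml.stream.XMLStreamWriter) can also produce small documents. The sketch below is only an illustration and is not part of the answer above; the class and method names are made up, and the element names and values are borrowed from the Xembly example earlier in this thread.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class OrderXmlWriter {

    // Writes <root><order id="...">amount</order></root> into a String.
    static String orderXml(String id, String amount) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        writer.writeStartElement("order");
        writer.writeAttribute("id", id);   // attributes must follow the start tag directly
        writer.writeCharacters(amount);    // text is escaped by the writer where needed
        writer.writeEndElement();          // closes order
        writer.writeEndElement();          // closes root
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(orderXml("553", "$140.00"));
    }
}
```

The writer does no pretty-printing, so the result comes out on a single line.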
use dom4j, here is quick start for dom4j
dom4j guide
You can also use libraries like JAXB or SimpleXML or XStream if you want to easily map/convert your java objects to XML.
Say we have a simple entity/POJO, Item. The properties of the POJO class can be made the XML's elements or attributes with simple annotations.
@Entity @Root public class Item {
    @Attribute
    @Id
    @GeneratedValue(strategy=GenerationType.AUTO)
    private Long id;
    @Transient
    @ManyToOne
    private Order order;
    @Element
    private String product;
    @Element
    private double price;
    @Element
    private int quantity; }
To generate XML from this item, the code can be simply
Serializer serializer=new Persister();//SimpleXML serializer
Item itemToSerializeToXml=new Item(2456L, "Head First Java", 250.00,10);//Object to be serialized
StringWriter destinationXMLWriter=new StringWriter();//Destination of XML
serializer.write(itemToSerializeToXml,destinationXMLWriter);//Call to serialize the POJO to XML
System.out.println(destinationXMLWriter.toString());
I found a nice library for XML creation on GitHub at https://github.com/jmurty/java-xmlbuilder . Really good for simple documents at least (I didn't have an opportunity to employ it for anything bigger than around a dozen lines).
The good thing about this library is that each of its commands (i.e. create attribute, create element, etc.) has 3 levels of abbreviations. For example, to add the tag <foo> to the document you can use the following methods:
.e("foo") (single-letter form)
.elem("foo") (4-letter form)
.element("foo") (fully spelled-out form)
This allows creating XML using both longer and more abbreviated code, as appropriate, so it can be a good fit for a variety of coding styles.

After I got XML data, how to parse it and transfer to JSON?

In the Jersey RESTful framework, I know I can get XML data in the client as follows:
private static final String BaseURI = "http://DOMAN.com";
ClientConfig config = new DefaultClientConfig();
Client client = Client.create(config);
WebResource service = client.resource(BaseURI);
String xmlData = service.path("rest").path("todos").accept(
MediaType.APPLICATION_XML).get(String.class);
My question is how can I parse the xmlData then? I would like to get the needed data from xmlData, and transfer the needed data to JSON, what is the best way to implement this?
As a general rule, NEVER convert straight from XML to JSON (or vice versa) if you do not have to.
Rather, bind data from XML or JSON to POJOs, then do the other conversion. While it may seem non-intuitive, this gives cleaner results and fewer problems: conversions between POJOs and data formats have many more options and mature, well-designed libraries, and POJOs are easier to configure (with annotations) and carry more metadata to guide the conversion process.
Direct conversions libs (like Jettison, see below) are plagued with various issues; often producing "franken-JSON", JSON that is technically correct but looks alien because of added constructs needed by conversion.
In case of Jersey, then, use JAXB for XML to/from POJOs, and Jackson for doing the same with JSON. These are libraries Jersey uses anyway; and direct usage is quite easy.
If you absolutely insist on direct conversion, you could try Jettison, but be prepared to hit a problem with Lists, arrays and Maps, if you need them (esp. single-element arrays -- arrays are problematic with XML, and auto-conversion often goes wrong).
If your service doesn't provide JSON as an option already (what happens if you change MediaType.APPLICATION_XML to MediaType.APPLICATION_JSON?), then I believe you have a few options, which I list in order of my preference.
Option 1: You have an XML schema for the data
If you have an XML schema for the returned XML, you could use xjc to generate the JAXB-annotated Java classes and then leverage Jackson to convert the instances to JSON data. I think this will get you going fast by leveraging these libraries rather than doing the parsing yourself. Jackson is a robust library, used by GlassFish for their Jersey (JAX-RS) implementation, and I don't feel there is any risk in depending on this library.
Option 2: Use the json.org library, but I've had significant problem with this library having to do with its reflection-based methodology, etc. That said, it might work well for you...and you can test relatively easily and see if it does meet your requirements. If so...you're done! =)
Option 3: You don't have the XML schema and/or you want more control
As @Falcon pointed out, you can always use traditional XML parsing technologies to parse the XML into whatever you want. I'm partial to SAX parsing, but DOM could work depending on the XML size.
Regards,
Steve
Simplest and easiest way would be using org.json package : http://json.org/javadoc/org/json/XML.html
XML.toJSONObject(xmlData).toString()
Just this one line apart from necessary import statement will do it all.
Now that I have mentioned the org.json library, a lot of people may comment negatively about it. Remember, I said the simplest and easiest way, not the best or the most performant way ;-)
In case you are using maven, add this dependency :
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20090211</version>
</dependency>
Do you have any access to the "lower level interface" that generates the XML? If you do, the only change needed is to have the XML objects annotated with "@XmlRootElement". Then, you can just pass back the XML object as JSON without any further code.
Check Jsonix. If you have an XML schema, you can generate XML-JSON mappings and unmarshal/marshal XML in JavaScript. Very similar to JAXB (which Steve Siebert mentioned), but works on client.
// The PO variable provides Jsonix mappings for the purchase order test case
// Its definition will be shown in the next section
var PO = { };
// ... Declaration of Jsonix mappings for the purchase order schema ...
// First we construct a Jsonix context - a factory for unmarshaller (parser)
// and marshaller (serializer)
var context = new Jsonix.Context([ PO ]);
// Then we create an unmarshaller
var unmarshaller = context.createUnmarshaller();
// Unmarshal an object from the XML retrieved from the URL
unmarshaller.unmarshalURL('/org/hisrc/jsonix/samples/po/test/po-0.xml',
// This callback function will be provided with the result
// of the unmarshalling
function(result) {
// We just check that we get the values we expect
assertEquals('Alice Smith', result.value.shipTo.name);
assertEquals('Baby Monitor', result.value.item[1].productName);
});

parse google geocode with xstream

I'm using Java and XStream to parse a Google geocode request over HTTP. My idea is to have an Address class with all the geocode attributes (i.e. lat/long, city, province/state etc.) but I'm having problems parsing the XML with XStream.
The google response is similar to this:
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0"><Response>
<name>98 St. Patrick St, Toronto</name>
<Status>
<code>200</code>
<request>geocode</request>
</Status>
<Placemark id="p1">
<address>98 St Patrick St, Toronto, ON, Canada</address>
<AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0"> <Country><CountryNameCode>CA</CountryNameCode><CountryName>Canada</CountryName><AdministrativeArea><AdministrativeAreaName>ON</AdministrativeAreaName><Locality><LocalityName>Toronto</LocalityName><Thoroughfare><ThoroughfareName>98 St Patrick St</ThoroughfareName></Thoroughfare><PostalCode><PostalCodeNumber>M5T</PostalCodeNumber></PostalCode></Locality></AdministrativeArea></Country></AddressDetails>
<ExtendedData>
<LatLonBox north="43.6560378" south="43.6497426" east="-79.3864912" west="-79.3927864" />
</ExtendedData>
<Point><coordinates>-79.3896388,43.6528902,0</coordinates></Point>
</Placemark>
</Response></kml>
That doesn't show up very well, but the meat of the code is in the AddressDetails tag.
Anyway, I'm new to Java and XStream so the API terminology is a bit confusing for me. I just need to be able to write some mapper that maps all these tags (ie. CountryName) to an attribute within my Address object, (ie. address.country = blah) The address object will be pretty simple, mainly just strings for country name etc and floats for lat/long.
The docs and example just show straight mapping where each xml tag maps directly to the attribute of the same name of the object. In my case however, the tags are named different than the object attr's. A quick point in the right direction is all I'm looking for really.
I've used XStream in several projects. Unfortunately, your problem isn't really what XStream is designed to solve. You might be able to use its converter mechanism to achieve your immediate goal, but you'll run into limitations. In a nutshell, XStream isn't designed to do conversion of Tree Structure A into Tree Structure B -- its purpose is to convert from a Java domain model into some reasonable XML. XStream is a great tool when you don't care much about the details of the XML produced. If you care more about the XML than the Java objects, look at XMLBeans -- the Java is ugly, but it's incredibly schema-compliant.
For your project, I'd run the Google XML schema through XMLBeans and generate some Java that will give you a more literate way of hand-coding a converter. You could use a raw DOM tree, but you'd have code like myAddress.setStreet(root.getFirstChild().getAttribute("addr1")). With XMLBeans, you say things like myAddress.setStreet(googleResult.getAddress().getStreetName()).
I'd ignore JAXB, as its attempt to separate interface from implementation adds needless complexity. Castor might be a good tool to consider as well, but I haven't used it in years.
In a nutshell, there aren't a lot of good Object-to-Object or XML-to-Object converters that handle structure conversion well. Of those I've seen that attempt declarative solutions, all of them seemed much more complicated (and no more maintainable) than using XStream/XmlBeans along with hand-coded structure conversions.
Would it be possible to define a separate class specifically for dealing with XStream's mapping? You could then simply populate your AddressDetails object by querying values out of this other object.
I've ended up just using xpath and populating my own address object manually. Seems to work fine.
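For anyone taking the same route, the manual XPath approach might look roughly like the sketch below. It is not the poster's actual code: the class name GeocodeXPath, the Address fields, and the prefixes k and x are invented for this illustration, and the XML is a trimmed copy of the response in the question. The one detail that tends to trip people up is that the response uses two default namespaces, so the parser has to be namespace-aware and the XPath needs a NamespaceContext.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class GeocodeXPath {

    // Trimmed copy of the geocode response shown in the question.
    static final String SAMPLE =
        "<kml xmlns=\"http://earth.google.com/kml/2.0\"><Response><Placemark id=\"p1\">"
      + "<AddressDetails Accuracy=\"8\" xmlns=\"urn:oasis:names:tc:ciq:xsdschema:xAL:2.0\">"
      + "<Country><CountryNameCode>CA</CountryNameCode><CountryName>Canada</CountryName>"
      + "<AdministrativeArea><AdministrativeAreaName>ON</AdministrativeAreaName>"
      + "<Locality><LocalityName>Toronto</LocalityName></Locality>"
      + "</AdministrativeArea></Country></AddressDetails>"
      + "<Point><coordinates>-79.3896388,43.6528902,0</coordinates></Point>"
      + "</Placemark></Response></kml>";

    // The prefixes are local to this sketch; only the URIs come from the response.
    static final Map<String, String> NS = Map.of(
            "k", "http://earth.google.com/kml/2.0",
            "x", "urn:oasis:names:tc:ciq:xsdschema:xAL:2.0");

    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix is null");
            return NS.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }
        @Override public String getPrefix(String uri) {
            for (Map.Entry<String, String> e : NS.entrySet()) {
                if (e.getValue().equals(uri)) return e.getKey();
            }
            return null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : List.of(prefix).iterator();
        }
    };

    // Hypothetical target object; plain fields keep the sketch short.
    static class Address {
        String country;
        String city;
        String coordinates;
    }

    static Address parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(CONTEXT);

        Address address = new Address();
        address.country = xp.evaluate("//x:CountryName", doc);
        address.city = xp.evaluate("//x:LocalityName", doc);
        address.coordinates = xp.evaluate("//k:Point/k:coordinates", doc);
        return address;
    }

    public static void main(String[] args) throws Exception {
        Address a = parse(SAMPLE);
        System.out.println(a.country + ", " + a.city + " @ " + a.coordinates);
    }
}
```

Splitting the coordinates string into longitude and latitude floats is left out here; that part depends on how the Address object is meant to store them.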
Have you tried with json format? It should be the same but you'll need to set a com.thoughtworks.xstream.io.json.JettisonMappedXmlDriver as the driver for XStream
You could use EclipseLink JAXB (MOXy) to do this:
package com.example;
import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlPath;
@XmlRootElement(name="kml")
public class Address {
    private String country;
    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryName/text()")
    public String getCountry() {
        return country;
    }
    public void setCountry(String country) {
        this.country = country;
    }
}
and
@javax.xml.bind.annotation.XmlSchema(
    namespace = "http://earth.google.com/kml/2.0",
    xmlns = {
        @javax.xml.bind.annotation.XmlNs(
            prefix = "ns", namespaceURI ="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0")
    },
    elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED)
package com.example;
A full example is available here:
http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html
