There are many pretty good JSON libs like Gson. But for XML I know only Xerces/JDOM, and both have tedious APIs.
I don't like having to use unnecessary objects like DocumentFactory, XpathExpressionFactory, NodeList and so on.
So, in light of the native XML support in languages such as Groovy and Scala, I have a question:
Is there a minimalistic Java XML IO framework?
PS: XStream/JAXB are good for serialization/deserialization, but in this case I'm looking to stream some data as XML, with XPath support for example.
The W3C DOM model is unpleasant and cumbersome, I agree. JDOM is already pretty simple. The only other DOM API that I'm aware of that is simpler is XOM.
What about StAX? With Java 6 you don't even need additional libs.
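To illustrate the point, here is a minimal StAX pull-parsing sketch using only the JDK; the element and attribute names are made up for the example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxExample {

    // Collects the "name" attribute of every <item> element while streaming,
    // without ever building a full document tree.
    static List<String> itemNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                names.add(reader.getAttributeValue(null, "name"));
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(itemNames("<root><item name='a'/><item name='b'/></root>"));
    }
}
```

No factories beyond `XMLInputFactory`, no `NodeList` — you just pull events in a loop.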
Dom4J rocks. It's very easy and understandable
Sample Code:
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public static void main(String[] args) throws Exception {
    final String xml = "<root><foo><bar><baz name=\"phleem\" />"
            + "<baz name=\"gumbo\" /></bar></foo></root>";
    Document document = DocumentHelper.parseText(xml);
    // simple collection views
    for (Element element : (List<Element>) document
            .getRootElement()
            .element("foo")
            .element("bar")
            .elements("baz")) {
        System.out.println(element.attributeValue("name"));
    }
    // and easy XPath support
    List<Element> elements2 = (List<Element>)
            document.createXPath("//baz").evaluate(document);
    for (final Element element : elements2) {
        System.out.println(element.attributeValue("name"));
    }
}
Output:
phleem
gumbo
phleem
gumbo
Try VTD-XML. It's almost 3 to 4 times faster than DOM parsers, with an outstanding memory footprint.
Depends on how complex your Java objects are: are they self-referencing, etc. (like graph nodes)? If your objects are simple, you can use Google Gson - it has the simplest API (IMO).
With XStream, things start to get messy when you need to debug. You also need to be careful when choosing an appropriate Driver for XStream.
JDOM and XOM are probably the simplest. DOM4J is more powerful but more complex. DOM is just horrible. Processing XML in Java will always be more complex than processing JSON, because JSON was designed for structured data while XML was designed for documents, and documents are more complex than structured data. Why not use a language that was designed for XML instead, specifically XSLT or XQuery?
NanoXML is very small, below 50 KB. I found it today and I'm really impressed.
I'm confused and hoping you can help me with this question: I am reading an XML message directly, using XPath to extract values from it. I am also trying to convert the XML to JSON and read the values from that instead, since JSON is a lighter-weight format. Which is the better approach for reading values? I am attaching the following snippets.
Below is the code to read values from the XML:
// imports: javax.xml.parsers.*, javax.xml.xpath.*, org.w3c.dom.Document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(source); // 'source' is the XML input
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String eventNumber = xpath.evaluate("/event/eventnumber", document);
Below is the code to convert the XML to JSON:
JSONObject xmlJSONObj = XML.toJSONObject(xml1);
This really depends on what you will be using the data for. If you need to do a lot of parsing/traversing through the data, JSON is much faster due to the nature of JSON APIs. So in the case that you will be needing to do a lot of data extraction/examination on a single file, I would convert the XML to JSON.
If you only need to find a single field in the XML file or only do a small amount of parsing/data extraction, I would stick to XML. It is not worth the extra processing of converting an XML file to JSON if you are only going to do a small amount of traversing through the XML file. The processing time it takes to convert the XML to JSON and then traverse the JSON is more costly than the processing time it takes to traverse through the XML file once.
Another discussion that was published here:
JSON and XML comparison
Bottom line, in my humble opinion: since you receive the message in XML, it won't increase efficiency to convert it to JSON and then parse that - so stick with XML.
With your input being XML, converting to JSON for the sake of speed doesn't look like a good idea. I suspect that with your other approach you're losing time on the DOM creation, and then even more on the XPath evaluation. The fastest solution is IMHO a SAX parser: it creates no objects and only calls you back on the events you're interested in.
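To make that concrete, here is a minimal SAX sketch that extracts the text of `/event/eventnumber` without building a tree. The element names follow the snippet in the question; everything else is illustrative, and for brevity only the current element name is checked rather than the full path:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventNumber {

    // Streams the document and collects the text content of <eventnumber>.
    static String readEventNumber(String xml) throws Exception {
        final StringBuilder value = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                path.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                path.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Simplification: checks only the innermost element name.
                if ("eventnumber".equals(path.peek())) {
                    value.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return value.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readEventNumber(
                "<event><eventnumber>42</eventnumber></event>"));
    }
}
```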
I have .xml files inside a package in my Java project that contains data in the following format...
<?xml version="1.0"?>
<postcodes>
<entry postcode='AB1 0AA' latitude='7.101478' longitude='2.242852' />
</postcodes>
I have currently overridden startElement() in my custom DefaultHandler as follows:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    if (attributes.getValue("postcode") == "AB43 8TZ"){
        System.out.println("The postcode 'AB43 8TZ', has a latitude of " + attributes.getValue("latitude") + " and a longitude of " + attributes.getValue("longitude"));
    }
}
I know the code is working outside of this method because I previously tested it by having it print out all of the attributes for each element and that worked fine. Now however, it does nothing, as if it never found that postcode value. (I know it's there because it's a copy paste job from the XML source)
Extra details: apologies for originally leaving out important information. Some of these files have up to 50k lines, so storing them in memory is a no-no if at all possible; that is why I am using SAX. As an aside, I say "from these files from within my project" because I also can't work out how to reference a file from within the same project rather than from an absolute directory.
(From comments as requested by OP.)
First, you cannot compare strings with the == operator. Use equals() instead. See the question How do I compare strings in Java? for more information.
Second, not every element has the postcode attribute, so it is possible that you will be invoking equals() on a null object, leading to NullPointerException. Do it the other way around, e.g.
"AB43 8TZ".equals(attributes.getValue("postcode"))
You would use an XML parser. Luckily, the JDK offers these out of the box in the form of JAXP. Now, there are several ways to do it, as there are a few major "flavours" of XML parsing. For this task, I believe a DOM parser would be easiest to use. You could do it like this:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new File("name/of/the/file.xml"));
Element root = document.getDocumentElement();
and then use DOM traversal API.
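For instance, given the `<postcodes>`/`<entry>` format from the question, a traversal sketch could look like this (the in-memory XML string stands in for the parsed file):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTraversal {

    // Collects the postcode attribute of every <entry> element.
    static List<String> postcodes(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList entries = document.getDocumentElement()
                .getElementsByTagName("entry");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            result.add(entry.getAttribute("postcode"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(postcodes("<postcodes>"
                + "<entry postcode='AB1 0AA' latitude='7.101478' longitude='2.242852'/>"
                + "</postcodes>"));
    }
}
```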
Edit: it was not clear from the original question that the data you want to process is large. In that case, DOM parser is indeed not a good solution, precisely due to memory consumption. For the purpose of parsing large XML documents, SAX and StAX parsers were invented. You might find them a little more cumbersome to use, due to their streaming nature, but that's also the source of their efficiency. Linked Oracle JAXP tutorial has sections on SAX and StAX as well.
Assuming you can read the XML relatively quickly using SAX or DOM, I would parse it in advance, and use the attributes to construct a map of postcode vs long/lang e.g.
Map<String, Pair<BigDecimal,BigDecimal>>
and simply lookup using Map.get(String)
I note that you say:
Some of these files have up to 50k lines, so storing them in memory is
a no no if at all possible
I wouldn't worry about that at all. A map of 50k entries isn't going to be a major deal.
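A sketch of that approach, building the map in one SAX pass; here `double[]` stands in for the `Pair` type, which is not in the JDK:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PostcodeIndex {

    // Builds postcode -> {latitude, longitude} in a single streaming pass.
    static Map<String, double[]> build(String xml) throws Exception {
        Map<String, double[]> index = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String postcode = atts.getValue("postcode");
                if (postcode != null) {
                    index.put(postcode, new double[] {
                            Double.parseDouble(atts.getValue("latitude")),
                            Double.parseDouble(atts.getValue("longitude"))
                    });
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return index;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<postcodes><entry postcode='AB1 0AA' "
                + "latitude='7.101478' longitude='2.242852'/></postcodes>";
        double[] coords = build(xml).get("AB1 0AA"); // constant-time lookup
        System.out.println(coords[0] + ", " + coords[1]);
    }
}
```

After the one-off parse, every lookup is a cheap `Map.get(String)` rather than a fresh traversal.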
You can use the javax.xml.xpath APIs included in the JDK/JRE and use XPath to specify the data you wish to retrieve from the XML document.
Example
Xml parser, one item
How would I parse this JSON array in Java? I'm confused because there is no object. Thanks!
EDIT: I'm an idiot! I should have read the documentation... that's probably what it's there for...
[
{
"id":"63565",
"name":"Buca di Beppo",
"user":null,
"phone":"(408)377-7722",
"address":"1875 S Bascom Ave Campbell, California, United States",
"gps_lat":"37.28967000",
"gps_long":"-121.93179700",
"monhh":"",
"tuehh":"",
"wedhh":"",
"thuhh":"",
"frihh":"",
"sathh":"",
"sunhh":"",
"monhrs":"",
"tuehrs":"",
"wedhrs":"",
"thuhrs":"",
"frihrs":"",
"sathrs":"",
"sunhrs":"",
"monspecials":"",
"tuespecials":"",
"wedspecials":"",
"thuspecials":"",
"frispecials":"",
"satspecials":"",
"sunspecials":"",
"description":"",
"source":"ripper",
"worldsbarsname":"BucadiBeppo31",
"url":"www.bucadebeppo.com",
"maybeDupe":"no",
"coupontext":"",
"couponimage":"0",
"distance":"1.00317",
"images":[
0
]
}
]
It is perfectly valid JSON. It is an array containing one object.
In JSON, arrays and objects don't have names. Only attributes of objects have names.
This is all described clearly by the JSON syntax diagrams at http://json.org. (FWIW, the site has translations in a number of languages ...)
How do you parse it? There are many libraries for parsing JSON. Many of them are linked from the site above. I suggest you use one of those rather than writing your own parsing code.
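For instance, with the org.json library (the same API is bundled with Android) the array above can be read like this; the JSON literal is abbreviated to a few of the fields:

```java
import org.json.JSONArray;
import org.json.JSONObject;

public class ParseVenues {

    public static void main(String[] args) {
        // Abbreviated version of the JSON from the question.
        String json = "[{\"id\":\"63565\",\"name\":\"Buca di Beppo\","
                + "\"user\":null,\"images\":[0]}]";
        JSONArray venues = new JSONArray(json);      // the top level is an array
        JSONObject venue = venues.getJSONObject(0);  // its single element is an object
        System.out.println(venue.getString("name"));
        System.out.println(venue.isNull("user"));    // null values need a special check
        System.out.println(venue.getJSONArray("images").getInt(0));
    }
}
```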
In response to this comment:
OTOH, writing your own parser is a reasonable project, and a good exercise for both learning JSON and learning Java (or whatever language). A reasonable parser can be written in about 500 lines of text.
In my opinion (having written MANY parsers in my time), writing a parser for a language is a very inefficient way to gain a working understanding of the syntax of a language. And depending on how you implement the parser (and the nature of the language syntax specification), you can easily come away with an incorrect understanding.
A better approach is to read the language's syntax specification, which the OP has now done, and which you would have to do in order to implement a parser.
Writing a parser can be a good learning exercise, but it is really a learning exercise in writing parsers. Even then, you need to pick an appropriate implementation approach, and an appropriate language to be parsed.
It's an array containing one element. That element is an object. The object (dictionary) contains about 20 name/value pairs.
I am trying to port code written in Java to Objective C (for iPhone), but I'm kind of confused about a few lines of my code (mentioned below). How should I port this efficiently?
Namespace nmgrhistory=Namespace.getNamespace("history", "http://www.mywebsite.com/History.xsd");
pEventEl.addContent(new Element("History",nmgrhistory));
Namespace nmgrState=Namespace.getNamespace("state", "http://www.mywebsite.com/State.xsd");
pEventEl.addContent(new Element("State",nmgrState));
Iterator<Element> eld=(Iterator<Element>) pEventEl.getChild(
pEventEl.getName() == "event"? "./history:history/state:state" : "./state:state",pEventEl.getNamespace());
I'm not very sure about the replacements for the classes Namespace, Iterator and Element.
Anybody who has an idea, or has done this before, please enlighten me.
OK... so although these are not exact replacements, what you basically need for parsing XML in Objective-C is NSXMLParser.
So you could say that NSXMLParser is the replacement for Namespace.
And for Iterator, NSXMLParserDelegate has methods named:
- parser:didStartElement:namespaceURI:qualifiedName:attributes:
or
- parser:foundCharacters:
I don't know Java, but the URLs you are pointing at are .xsd files, which are XML schema definition files. XML parsing on iOS is somewhat limited out of the box: NSXMLParser.
I strongly recommend one of the bazillion open source XML parsers. They're much more user friendly.
Well, thanks to all for making the effort to answer, but I found a nice library, TouchXML, that solves the problem.
Are there any production-ready libraries for streaming XPath expressions evaluation against provided xml-document? My investigations show that most of existing solutions load entire DOM-tree into memory before evaluating xpath expression.
XSLT 3.0 provides a streaming mode of processing, and this will become a standard when the XSLT 3.0 W3C specification becomes a W3C Recommendation.
At the time of writing this answer (May 2011), Saxon provides some support for XSLT 3.0 streaming.
Would this be practical for a complete XPath implementation, given that XPath syntax allows for:
/AAA/XXX/following::*
and
/AAA/BBB/following-sibling::*
which imply look-ahead requirements? That is, from a particular node you're going to have to load the rest of the document anyway.
The doc for the Nux library (specifically StreamingPathFilter) makes this point, and references some implementations that rely on a subset of XPath. Nux claims to perform some streaming query capability, but given the above there will be some limitations in terms of XPath implementation.
There are several options:
DataDirect Technologies sells an XQuery implementation that employs projection and streaming, where possible. It can handle files into the multi-gigabyte range - e.g. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java-only.
Saxon is an open-source implementation, with a modestly priced commercial cousin, that will do streaming in some contexts. Java, but with a .NET port as well.
MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.
Try Joost.
Though I have no practical experience with it, I thought it worth mentioning QuiXProc (http://code.google.com/p/quixproc/). It is a streaming approach to XProc, and uses libraries that provide streaming support for XPath, amongst others.
FWIW, I've used Nux streaming-filter XPath queries against very large (>3 GB) files, and it has both worked flawlessly and used very little memory. My use case has been slightly different (not validation-centric), but I'd highly encourage you to give Nux a shot.
I think I'll go for custom code. The .NET library gets us quite close to the target if one just wants to read certain paths of the XML document.
Since all the solutions I have seen so far support only an XPath subset, this is that kind of solution too. The subset is really small, though. :)
This C# code reads an XML file and counts the nodes matching an explicitly given path. You can also operate on attributes easily, using the xr["attrName"] syntax.
int c = 0;
var r = new System.IO.StreamReader(asArgs[1]);
var se = new System.Xml.XmlReaderSettings();
var xr = System.Xml.XmlReader.Create(r, se);
var lstPath = new System.Collections.Generic.List<String>();
var sbPath = new System.Text.StringBuilder();
while (xr.Read()) {
    //Console.WriteLine("type " + xr.NodeType);
    if (xr.NodeType == System.Xml.XmlNodeType.Element) {
        lstPath.Add(xr.Name);
    }
    // It takes some time. If 1 unit is the time needed for parsing the file,
    // then this takes about 1.0.
    sbPath.Clear();
    foreach (object n in lstPath) {
        sbPath.Append('/');
        sbPath.Append(n);
    }
    // This takes about 0.6 time units.
    string sPath = sbPath.ToString();
    if (xr.NodeType == System.Xml.XmlNodeType.EndElement
            || xr.IsEmptyElement) {
        if (xr.Name == "someElement" && lstPath[0] == "main")
            c++;
        // Or test a simple XPath explicitly:
        // if (sPath == "/main/someElement")
    }
    if (xr.NodeType == System.Xml.XmlNodeType.EndElement
            || xr.IsEmptyElement) {
        lstPath.RemoveAt(lstPath.Count - 1);
    }
}
xr.Close();