Are the messages from an xpath-splitter ordered? - java

Can I depend on the order of the messages output from an int-xml:xpath-splitter?
I am using:
Java 1.6
spring-integration-xml-3.0.0.RELEASE
spring-xml-2.1.1.RELEASE
Saxon-HE-9.4.0-9
Example:
Given the following XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<tests>
    <test>test</test>
    <test>test2</test>
</tests>
And the following int-xml:xpath-splitter:
<int-xml:xpath-splitter id="messageSplitter"
                        input-channel="inbound"
                        output-channel="routing"
                        create-documents="false">
    <int-xml:xpath-expression expression="/*/*"/>
</int-xml:xpath-splitter>
<int:channel id="routing">
    <int:queue/>
</int:channel>
Will the routing channel always receive <test>test</test> before <test>test2</test>?

I believe the answer is that it depends on which JAXP provider you are using.
If you have configured an XPath engine that supports XPath 2.0, then yes, messages will be in document order.
If you use the default or an XPath 1.0 engine, then no, order cannot be guaranteed (see note 1).
Does XPath support ordering?
XPath 1.0 does not (see note 1); XPath 2.0 does.
According to the XPath 1.0 specification, XPath 1.0 expressions return node-set types and node-set types are unordered:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
node-set (an unordered collection of nodes without duplicates)
boolean (true or false)
number (a floating-point number)
string (a sequence of UCS characters)
On the other hand, the XPath 2.0 specification introduces the idea of document order:
Document order is a total ordering, although the relative order of some nodes is implementation-dependent. [Definition: Informally, document order is the order in which nodes appear in the XML serialization of a document.] [Definition: Document order is stable, which means that the relative order of two nodes will not change during the processing of a given expression, even if this order is implementation-dependent.]
The specification also states that evaluation of Path Expressions returns a sequence of nodes in document order:
The sequences resulting from all the evaluations of E2 are combined as follows:
If every evaluation of E2 returns a (possibly empty) sequence of nodes, these sequences are combined, and duplicate nodes are eliminated based on node identity. The resulting node sequence is returned in document order.
If every evaluation of E2 returns a (possibly empty) sequence of atomic values, these sequences are concatenated, in order, and returned.
If the multiple evaluations of E2 return at least one node and at least one atomic value, a type error is raised [err:XPTY0018].
Does Spring Integration/Spring XML use XPath 1.0 or XPath 2.0?
Spring uses javax.xml.xpath.XPathFactory.newInstance(String uri) to create an XPathFactory that is used to create XPath objects. The JavaDoc explains in detail how XPathFactory.newInstance(uri) behaves, but suffice it to say that it searches for an appropriate factory within your project.
If your project does not specify differently, Spring will use a JAXP 1.3 implementation to create XPath Expressions.
As of Java 1.6, JAXP 1.3 only supports XPath 1.0.
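To see which factory your own project resolves, a small probe along these lines may help. It is only a sketch: the class name XPathFactoryProbe is made up for illustration, and it assumes the lookup uses the default DOM object model URI.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

final class XPathFactoryProbe {

    /** Returns the class name of the XPathFactory that JAXP resolves for the DOM object model. */
    static String resolvedFactoryClassName() {
        try {
            // Same lookup entry point as described above: newInstance(String uri).
            XPathFactory factory = XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
            return factory.getClass().getName();
        } catch (XPathFactoryConfigurationException e) {
            throw new IllegalStateException("No XPathFactory available for the DOM object model", e);
        }
    }

    public static void main(String[] args) {
        // Prints the implementation class, e.g. a JDK-internal class or a Saxon class.
        System.out.println(resolvedFactoryClassName());
    }
}
```

If the printed class belongs to the JDK rather than to Saxon (or another XPath 2.0 engine), expressions are being evaluated by an XPath 1.0 implementation.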
What if I have an XPathFactory that supports XPath 2.0?
Refer to the documentation for the XPathFactory you are using.
In my case, I have Saxon-HE-9.4.0-9 set up for my project. Saxon implements the JAXP 1.3 API and supports XPath 2.0.
The Saxon documentation on how XPath expressions are evaluated does not explicitly state that NODESET expressions will be returned in document order. However, it does state that the return object is a Java List object.
Since XPath 2.0 expressions return nodes in document order, Saxon supports XPath 2.0, and Saxon returns a Java List object (which is ordered), I think that Saxon will return nodes in document order.
So my factory returns nodes from my XPath expression in document-order; are the messages from xpath-splitter also in document-order?
Yes, they will, based on my reading of the code in org.springframework.integration.xml.splitter.XPathMessageSplitter.splitMessage(Message<?>).
Both splitDocument(document) and splitNode(Node node) retain the order of the List returned from the evaluation of the XPath Expression.
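Because the splitter keeps whatever order the XPath evaluation returns, it can be worth checking that order directly for your own engine. The following is a minimal sketch that runs outside Spring Integration, using plain JAXP on the example document. The class and method names are made up for illustration, and the cast to NodeList assumes the JDK's built-in engine is the one picked up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SplitOrderCheck {

    /** Evaluates the expression and returns the text of each matched node, in the order the engine returned them. */
    static List<String> textOfMatches(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tests><test>test</test><test>test2</test></tests>";
        // Same expression as in the splitter configuration.
        System.out.println(textOfMatches(xml, "/*/*"));
    }
}
```

This only shows what one engine does on one document. It does not turn the XPath 1.0 behaviour into a guarantee.<test>test2</test></tests>", "/*/*");
if (!out.equals(java.util.Arrays.asList("test", "test2"))) throw new AssertionError(out);
</test>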
1. As noted by Michael Kay in his comment and answer on another question, although the XPath specification does not guarantee order, in practice implementations of XPath 1.0 generally also support XSLT 1.0 and will return nodes in document-order because XSLT requires it.

Related

How do I get the string value of a node?

The XPath string(/ROOT/Products/UnitPrice) works fine in dom4j & the .NET runtime. But in Saxon it throws an exception of:
net.sf.saxon.s9api.SaxonApiException: A sequence of more than one item is not allowed as the first argument of string() (<UnitPrice/>, <UnitPrice/>, ...)
What's going on here? Why is this not OK?
Saxon expects a single node as input.
The .NET implementation is different; it considers only the first one:
The string() function converts a node-set to a string by returning the string value of the first node in the node-set, which in some instances may yield unexpected results.
See MSDN
The problem is that /ROOT/Products/UnitPrice may return more than one node, and the XPath 2.0 string() function does not accept a sequence of more than one item as its argument (see here).
Saxon is XPath 2.0 compliant. To solve your problem, you can write this XPath expression:
for $price in /ROOT/Products/UnitPrice return string($price)
You will then have to iterate over the result (XdmValue object).
If you are using the s9api interface, you can call
XPathCompiler.setBackwardsCompatible(true);
to make XPath expressions run in XPath 1.0 compatibility mode. This doesn't completely replicate all aspects of XPath 1.0 behaviour, but it will handle most of the things that changed between XPath 1.0 and 2.0.
Very often the incompatibilities that were introduced in 2.0 are because they affect areas that were a common source of user errors in 1.0. It's really best not to rely on the implicit truncation of an input sequence performed by functions like string(); it's the cause of many application bugs.
==LATER==
We tried to remove 1.0 compatibility mode in Saxon-HE 9.8, thinking that after 10 years few people would still be relying on it. Unfortunately those few made a fuss, and we decided to backtrack. But I've just seen that in HE 9.8, the setBackwardsCompatible() method will throw an error saying it's not supported. Try instead:
XPathCompiler.getUnderlyingStaticContext().setBackwardsCompatibilityMode(true);
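To avoid relying on implicit truncation at all, the first node can be selected explicitly, or every node can be converted in turn. Below is a minimal sketch using plain JAXP rather than s9api. The sample document and the class and method names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class UnitPrices {

    /** String value of the first UnitPrice only; the [1] predicate makes the choice explicit. */
    static String firstPrice(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("string((/ROOT/Products/UnitPrice)[1])",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** String value of every UnitPrice, converted one node at a time. */
    static List<String> allPrices(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("/ROOT/Products/UnitPrice",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> prices = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                prices.add(nodes.item(i).getTextContent());
            }
            return prices;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question.
        String xml = "<ROOT><Products><UnitPrice>10.00</UnitPrice>"
                + "<UnitPrice>22.50</UnitPrice></Products></ROOT>";
        System.out.println(firstPrice(xml));
        System.out.println(allPrices(xml));
    }
}
```

The expression with the explicit [1] predicate is valid in both XPath 1.0 and XPath 2.0, so it should behave the same way whichever engine evaluates it.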

Find difference between xml file contents

I am comparing my XML files using the sample code by acdcjunior in the post "Best way to compare 2 XML documents in Java" (possible duplicate).
I see the below error from the assert test.
Expected presence of doctype declaration 'null' but was 'not null' - comparing at to <!DOCTYPE plist PSECTOR " ..........
Can someone please tell me what I can do to fix this?
Okay, I found the solution here - http://xmlunit.sourceforge.net/userguide/XMLUnit-Java.pdf
For efficiency reasons a Diff stops the comparison process as soon as the first difference is found. To get all the differences between two pieces of XML an instance of the DetailedDiff class, a subclass of Diff, is required. Note that a DetailedDiff is constructed using an existing Diff instance.
For future readers, here is the solution (also in the link - Pg 9) -
DifferenceListener myDifferenceListener = new IgnoreTextAndAttributeValuesDifferenceListener();
Diff myDiff = new Diff(expectedXML, actualXML);
myDiff.overrideDifferenceListener(myDifferenceListener);
Assert.assertTrue("test XML matches control skeleton XML", myDiff.similar());
From the link again,
The DifferenceEngine class generates the events that are passed to a DifferenceListener implementation as two pieces of XML are compared. Using recursion it navigates through the nodes in the control XML DOM, and determines which node in the test XML DOM qualifies for comparison to the current control node. The qualifying test node will match the control node’s node type, as well as the node name and namespace (if defined for the control node).
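If the only goal is to compare element content while ignoring a DOCTYPE that is present on one side, a JDK-only check is another option. Note that this is not XMLUnit: it is a sketch that swaps in the standard DOM method Node.isEqualNode on the two root elements. The class name and the sample documents are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class IgnoreDoctypeCompare {

    /** True when both root elements (and everything below them) are equal; the DOCTYPE node is never compared. */
    static boolean sameContent(String expectedXml, String actualXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document expected = builder.parse(new InputSource(new StringReader(expectedXml)));
            Document actual = builder.parse(new InputSource(new StringReader(actualXml)));
            // Merge adjacent text nodes so equal content compares equal.
            expected.normalize();
            actual.normalize();
            return expected.getDocumentElement().isEqualNode(actual.getDocumentElement());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML under comparison", e);
        }
    }

    public static void main(String[] args) {
        String withoutDoctype = "<plist><key>a</key></plist>";
        String withDoctype = "<!DOCTYPE plist><plist><key>a</key></plist>";
        System.out.println(sameContent(withoutDoctype, withDoctype));
    }
}
```

Unlike XMLUnit, this reports only equal or not equal, and whitespace-only text nodes count as differences, so it suits quick checks rather than detailed diffs.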

Simple java recursive descent parsing library with placeholders

For an application I want to parse a String with arithmetic expressions and variables. Just imagine this string:
((A + B) * C) / (D - (E * F))
So I have placeholders here and no actual integer/double values. I am searching for a library which allows me to get the first placeholder, put (via a database query for example) a value into the placeholder and proceed with the next placeholder.
So what I essentially want to do is to allow users to write a string in their domain language without knowing the actual values of the variables. So the application would provide numeric values depending on some "contextual logic" and would output the result of the calculation.
I googled and did not find any suitable library. I found ANTLR, but I think it would be very "heavyweight" for my use case. Any suggestions?
You are right that ANTLR is a bit of an overkill. However parsing arithmetic expressions in infix notation isn't that hard, see:
Operator-precedence parser
Shunting-yard algorithm
Algorithms for Parsing Arithmetic Expressions
You should also consider using a scripting language such as Groovy or JRuby. JDK 6 onwards also provides built-in JavaScript support. See my answer here: Creating meta language with Java.
If all you want to do is simple expressions, and you know the grammar for those expressions in advance, you don't even need a library; you can code this trivially in pure Java.
See this answer for a detailed version of how:
Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
If the users are defining their own expression language, and it always takes the form of a few monadic or binary operators whose precedence they can specify, you can bend the above answer by parameterizing the parser with a list of operators at several levels of precedence.
If the language can be more sophisticated, you might want to investigate metacompilers.
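For the simple case, a hand-written recursive descent parser fits in one small class. The sketch below assumes a grammar with +, -, *, /, unary minus, parentheses, numeric literals, and alphanumeric placeholder names. The class name and the resolver callback are made up for illustration. The resolver is called each time a placeholder is reached, so it could just as well run a database query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class PlaceholderExpression {
    private final String text;
    private final Function<String, Double> resolver;
    private int pos;

    private PlaceholderExpression(String text, Function<String, Double> resolver) {
        this.text = text;
        this.resolver = resolver;
    }

    /** Parses and evaluates the text, asking the resolver for each placeholder as it is encountered. */
    static double evaluate(String text, Function<String, Double> resolver) {
        PlaceholderExpression parser = new PlaceholderExpression(text, resolver);
        double value = parser.expression();
        parser.skipSpaces();
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := '-' factor | '(' expression ')' | placeholder | number
    private double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            double value = expression();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at position " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        if (pos < text.length() && Character.isLetter(text.charAt(pos))) {
            while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++;
            String name = text.substring(start, pos);
            Double value = resolver.apply(name);
            if (value == null) throw new IllegalArgumentException("No value for placeholder " + name);
            return value;
        }
        while (pos < text.length() && (Character.isDigit(text.charAt(pos)) || text.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a value at position " + pos);
        return Double.parseDouble(text.substring(start, pos));
    }

    // Consumes the given character if it is next (ignoring whitespace).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        // Hypothetical values; in the real application these would come from the contextual logic.
        Map<String, Double> values = new HashMap<>();
        values.put("A", 1.0);
        values.put("B", 2.0);
        values.put("C", 4.0);
        values.put("D", 10.0);
        values.put("E", 2.0);
        values.put("F", 2.0);
        System.out.println(evaluate("((A + B) * C) / (D - (E * F))", values::get)); // prints 2.0
    }
}
```

Each grammar rule maps to one method, so adding an operator or a precedence level means adding or extending one method.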

XSD: Index of sequence in Element name

I'm building an XSD to generate JAXB objects in Java. Then I ran into this:
<TotalBugs>
    <Bug1>...</Bug1>
    <Bug2>...</Bug2>
    ...
    <BugN>...</BugN>
</TotalBugs>
How do I build a sequence of elements where the index of the sequence is in the element name? Specifically, how do I get the 1 in Bug1?
You don't want to do it this way. XML has a top-down order by nature, so you don't have to number the elements yourself:
<totalBugs>
    <bug><!-- Here comes 1st bug --></bug>
    <bug><!-- Here comes 2nd bug --></bug>
    ...
    <bug><!-- Here comes last bug --></bug>
</totalBugs>
You can access the 1st bug node in the list by the XPath expression:
/totalBugs/bug[1]
Note that, per the W3C standard, indexes start at 1. Please refer to w3schools for further reading.
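From Java, the positional lookup can be sketched with plain JAXP. The sample document and the class and method names here are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class BugLookup {

    /** Returns the text of the bug at the given 1-based position, or an empty string if there is none. */
    static String bugAt(String xml, int position) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/totalBugs/bug[" + position + "]", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<totalBugs><bug>first</bug><bug>second</bug></totalBugs>";
        System.out.println(bugAt(xml, 1)); // prints first
    }
}
```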
I'm pretty sure XSD won't support what you need. However you can use <xsd:any> for that bit of the schema, then use something lower-level than JAXB to generate the XML for that particular part. (I think your generated classes will have fields like protected List<Element> any; which you can fill in using DOM).
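If the numbered names really are required, the DOM part could look roughly like the sketch below. It builds the Bug1..BugN elements directly and serializes them for inspection. The class name and sample values are made up, and wiring the resulting Element objects into the JAXB-generated List<Element> field is left out.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NumberedBugElements {

    /** Builds <TotalBugs> with one child per entry, named Bug1, Bug2, ... and returns it as a string. */
    static String toXml(List<String> bugs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("TotalBugs");
            doc.appendChild(root);
            for (int i = 0; i < bugs.size(); i++) {
                // The 1-based index becomes part of the element name.
                Element bug = doc.createElement("Bug" + (i + 1));
                bug.setTextContent(bugs.get(i));
                root.appendChild(bug);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the TotalBugs document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList("crash on start", "typo in title")));
    }
}
```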

Streaming XPath evaluation

Are there any production-ready libraries for streaming evaluation of XPath expressions against a given XML document? My investigations show that most existing solutions load the entire DOM tree into memory before evaluating the XPath expression.
XSLT 3.0 provides a streaming mode of processing, which will become standard once the XSLT 3.0 specification becomes a W3C Recommendation.
At the time of writing this answer (May, 2011) Saxon provides some support for XSLT 3.0 streaming.
Would this be practical for a complete XPath implementation, given that XPath syntax allows for:
/AAA/XXX/following::*
and
/AAA/BBB/following-sibling::*
which implies look-ahead requirements? That is, from a particular node you're going to have to load the rest of the document anyway.
The doc for the Nux library (specifically StreamingPathFilter) makes this point, and references some implementations that rely on a subset of XPath. Nux claims to perform some streaming query capability, but given the above there will be some limitations in terms of XPath implementation.
There are several options:
DataDirect Technologies sells an XQuery implementation that employs projection and streaming, where possible. It can handle files into the multi-gigabyte range - e.g. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java-only.
Saxon has an open-source edition and a modestly priced paid cousin that will do streaming in some contexts. It is written in Java, with a .NET port as well.
MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.
Try Joost.
Though I have no practical experience with it, I thought it worth mentioning QuiXProc ( http://code.google.com/p/quixproc/ ). It is a streaming approach to XProc, and uses libraries that provide streaming support for XPath, among others.
FWIW, I've used Nux streaming filter XPath queries against very large (>3GB) files, and it has both worked flawlessly and used very little memory. My use case has been slightly different (not validation-centric), but I'd highly encourage you to give Nux a shot.
I think I'll go for custom code. The .NET library gets us quite close to the target, if one just wants to read some paths of the XML document.
Since all the solutions I have seen so far support only a subset of XPath, this one does too. The subset is really small, though. :)
This C# code reads an XML file and counts the nodes at an explicit path. You can also operate on attributes easily, using the xr["attrName"] syntax.
int c = 0;
var r = new System.IO.StreamReader(asArgs[1]);
var se = new System.Xml.XmlReaderSettings();
var xr = System.Xml.XmlReader.Create(r, se);
var lstPath = new System.Collections.Generic.List<String>();
var sbPath = new System.Text.StringBuilder();
while (xr.Read()) {
    //Console.WriteLine("type " + xr.NodeType);
    if (xr.NodeType == System.Xml.XmlNodeType.Element) {
        lstPath.Add(xr.Name);
    }
    // It takes some time. If 1 unit is time needed for parsing the file,
    // then this takes about 1.0.
    sbPath.Clear();
    foreach(object n in lstPath) {
        sbPath.Append('/');
        sbPath.Append(n);
    }
    // This takes about 0.6 time units.
    string sPath = sbPath.ToString();
    if (xr.NodeType == System.Xml.XmlNodeType.EndElement
            || xr.IsEmptyElement) {
        if (xr.Name == "someElement" && lstPath[0] == "main")
            c++;
        // And test simple XPath explicitly:
        // if (sPath == "/main/someElement")
    }
    if (xr.NodeType == System.Xml.XmlNodeType.EndElement
            || xr.IsEmptyElement) {
        lstPath.RemoveAt(lstPath.Count - 1);
    }
}
xr.Close();
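For Java, the same idea can be sketched with the StAX pull parser that ships with the JDK (javax.xml.stream). This is a rough equivalent rather than a port of the code above: it matches one explicit path such as /main/someElement and nothing more. The class and method names are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingPathCounter {

    /** Counts elements whose full path from the root equals targetPath, without building a DOM tree. */
    static int count(Reader xml, String targetPath) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            StringBuilder path = new StringBuilder();
            int matches = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Extend the current path with the element just opened.
                    path.append('/').append(reader.getLocalName());
                    if (targetPath.contentEquals(path)) {
                        matches++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    // Drop the last path segment; empty elements also produce an END_ELEMENT event.
                    path.setLength(path.lastIndexOf("/"));
                }
            }
            reader.close();
            return matches;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<main><someElement/><other><someElement/></other>"
                + "<someElement>x</someElement></main>";
        System.out.println(count(new StringReader(xml), "/main/someElement")); // prints 2
    }
}
```

Only the current path is held in memory, so the footprint stays small however large the input is. Axes such as following:: or following-sibling:: are out of reach for this approach.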
