How do I get the string value of a node? - java

The XPath expression string(/ROOT/Products/UnitPrice) works fine in dom4j and the .NET runtime, but in Saxon it throws this exception:
net.sf.saxon.s9api.SaxonApiException: A sequence of more than one item is not allowed as the first argument of string() (<UnitPrice/>, <UnitPrice/>, ...)
What's going on here? Why is this not OK?

Saxon expects a single node as input.
The .NET implementation is different; it considers only the first one:
The string() function converts a node-set to a string by returning the string value of the first node in the node-set, which in some instances may yield unexpected results.
See MSDN
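The same first-node behaviour shows up in the JDK's built-in javax.xml.xpath engine, which implements XPath 1.0 when no other factory (such as Saxon) is configured. Here is a minimal sketch with a made-up two-product document. The class name and sample XML are only for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPath1StringDemo {
    // Hypothetical sample document with two UnitPrice elements.
    static final String XML =
        "<ROOT><Products><UnitPrice>1.50</UnitPrice></Products>"
      + "<Products><UnitPrice>2.75</UnitPrice></Products></ROOT>";

    // Parses xml and evaluates expr with the JDK's default (XPath 1.0) engine,
    // converting the result to a string.
    static String evalString(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Here evalString(XML, "string(/ROOT/Products/UnitPrice)") returns "1.50". The second price is silently ignored, which is exactly the XPath 1.0 rule the MSDN quote describes.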

The problem is that /ROOT/Products/UnitPrice may return more than one node, and the XPath 2.0 string() function does not accept a sequence of more than one item as its argument (see here).
Saxon is XPath 2.0 compliant. To solve your problem, you can write this XPath expression:
for $price in /ROOT/Products/UnitPrice return string($price)
You will then have to iterate over the result (XdmValue object).
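Saxon's XdmValue requires the Saxon library. If you only have the JDK, its javax.xml.xpath engine implements XPath 1.0 and cannot evaluate a for expression. A sketch of the equivalent is to select the nodes as a NODESET and take each node's string value in Java. The class and method names are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StringValues {
    // Returns the string value of every node selected by expr, in NodeList order.
    static List<String> stringValues(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For element and attribute nodes, getTextContent() matches
                // the XPath string value (concatenated descendant text).
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```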

If you are using the s9api interface, you can call
XPathCompiler.setBackwardsCompatible(true);
to make XPath expressions run in XPath 1.0 compatibility mode. This doesn't completely replicate all aspects of XPath 1.0 behaviour, but it will handle most of the things that changed between XPath 1.0 and 2.0.
Very often the incompatibilities that were introduced in 2.0 are because they affect areas that were a common source of user errors in 1.0. It's really best not to rely on the implicit truncation of an input sequence performed by functions like string(); it's the cause of many application bugs.
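If you really want only the first node, it is safer to say so explicitly. The form below works in both XPath 1.0 and 2.0. Note the parentheses: (/ROOT/Products/UnitPrice)[1] selects the first UnitPrice in the document, whereas /ROOT/Products/UnitPrice[1] selects the first UnitPrice child of each Products element and can still return several nodes. Here is a sketch using the JDK's XPath 1.0 engine and a made-up document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ExplicitFirst {
    // Hypothetical sample: two Products, each with one UnitPrice.
    static final String XML =
        "<ROOT><Products><UnitPrice>1.50</UnitPrice></Products>"
      + "<Products><UnitPrice>2.75</UnitPrice></Products></ROOT>";

    // Evaluates expr against XML and returns the result converted to a string.
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this document, eval("string((/ROOT/Products/UnitPrice)[1])") returns "1.50", eval("count(/ROOT/Products/UnitPrice[1])") returns "2", and eval("count((/ROOT/Products/UnitPrice)[1])") returns "1".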
==LATER==
We tried to remove 1.0 compatibility mode in Saxon-HE 9.8, thinking that after 10 years few people would still be relying on it. Unfortunately those few made a fuss, and we decided to backtrack. But I've just seen that in HE 9.8, the setBackwardsCompatible() method will throw an error saying it's not supported. Try instead:
XPathCompiler.getUnderlyingStaticContext().setBackwardsCompatibilityMode(true);

Related

Are the messages from an xpath-splitter ordered?

Can I depend on the order of the messages output from an int-xml:xpath-splitter?
I am using:
Java 1.6
spring-integration-xml-3.0.0.RELEASE
spring-xml-2.1.1.RELEASE
Saxon-HE-9.4.0-9
Example:
Given the following XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<test>test</test>
<test>test2</test>
</tests>
And the following int-xml:xpath-splitter:
<int-xml:xpath-splitter id="messageSplitter"
input-channel="inbound"
output-channel="routing"
create-documents="false">
<int-xml:xpath-expression expression="/*/*"/>
</int-xml:xpath-splitter>
<int:channel id="routing">
<int:queue/>
</int:channel>
Will the routing channel always receive <test>test</test> before <test>test2</test>?
I believe the answer is that it depends on which JAXP provider you are using.
If you have configured an XPath engine that supports XPath 2.0, then yes, the messages will be in document order.
If you use the default or an XPath 1.0 engine, then no, the order cannot be guaranteed.[1]
Does XPath support ordering?
XPath 1.0 does not[1]; XPath 2.0 does.
According to the XPath 1.0 specification, XPath 1.0 expressions return node-set types and node-set types are unordered:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
node-set (an unordered collection of nodes without duplicates)
boolean (true or false)
number (a floating-point number)
string (a sequence of UCS characters)
On the other hand, the XPath 2.0 specification introduces the idea of document order:
Document order is a total ordering, although the relative order of some nodes is implementation-dependent. [Definition: Informally, document order is the order in which nodes appear in the XML serialization of a document.] [Definition: Document order is stable, which means that the relative order of two nodes will not change during the processing of a given expression, even if this order is implementation-dependent.]
The specification also states that evaluation of Path Expressions returns a sequence of nodes in document order:
The sequences resulting from all the evaluations of E2 are combined as follows:
If every evaluation of E2 returns a (possibly empty) sequence of nodes, these sequences are combined, and duplicate nodes are eliminated based on node identity. The resulting node sequence is returned in document order.
If every evaluation of E2 returns a (possibly empty) sequence of atomic values, these sequences are concatenated, in order, and returned.
If the multiple evaluations of E2 return at least one node and at least one atomic value, a type error is raised [err:XPTY0018].
Does Spring Integration/Spring XML use XPath 1.0 or XPath 2.0?
Spring uses javax.xml.xpath.XPathFactory.newInstance(String uri) to create an XPathFactory that is used to create XPath objects. The JavaDoc explains in detail how XPathFactory.newInstance(uri) behaves, but suffice it to say that it searches for an appropriate factory within your project.
If your project does not specify differently, Spring will use a JAXP 1.3 implementation to create XPath Expressions.
The JAXP XPath implementation bundled with Java 1.6 supports only XPath 1.0.
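For a quick check outside Spring, here is a minimal sketch that gets a factory via XPathFactory.newInstance(uri) and evaluates the splitter's /*/* expression against the example document. It uses the default DOM object model URI, which is my assumption about the relevant URI. With the JDK's built-in XPath 1.0 engine, the nodes come back in document order in practice, although the XPath 1.0 specification does not promise this (see note 1).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SplitOrder {
    // The example document from the question.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<tests><test>test</test><test>test2</test></tests>";

    // Evaluates expr and returns each node's text in NodeList order,
    // i.e. the order a splitter iterating the list would emit them.
    static List<String> split(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPathFactory factory =
                XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
            NodeList nodes = (NodeList) factory.newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> payloads = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                payloads.add(nodes.item(i).getTextContent());
            }
            return payloads;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```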
What if I have an XPathFactory that supports XPath 2.0?
Refer to the documentation for the XPathFactory you are using.
In my case, I have Saxon-HE-9.4.0-9 set up for my project. Saxon implements the JAXP 1.3 API and supports XPath 2.0.
The Saxon documentation on how XPath expressions are evaluated does not explicitly state that NODESET results are returned in document order. However, it does state that the returned object is a Java List.
Since XPath 2.0 path expressions return nodes in document order, Saxon supports XPath 2.0, and Saxon returns a Java List (which is ordered), I think Saxon will return the nodes in document order.
So my factory returns the nodes from my XPath expression in document order; are the messages from the xpath-splitter also in document order?
Yes, they will be, based on my analysis of the code in org.springframework.integration.xml.splitter.XPathMessageSplitter.splitMessage(Message<?>).
Both splitDocument(document) and splitNode(Node node) retain the order of the List returned from evaluating the XPath expression.
[1] As noted by Michael Kay in his comment and answer on another question, the XPath 1.0 specification does not guarantee order. In practice, however, XPath 1.0 implementations generally also support XSLT 1.0, and they return nodes in document order because XSLT requires it.

Simple java recursive descent parsing library with placeholders

For an application I want to parse a String with arithmetic expressions and variables. Just imagine this string:
((A + B) * C) / (D - (E * F))
So I have placeholders here and no actual integer/double values. I am searching for a library which allows me to get the first placeholder, put (via a database query for example) a value into the placeholder and proceed with the next placeholder.
So what I essentially want to do is to allow users to write a string in their domain language without knowing the actual values of the variables. So the application would provide numeric values depending on some "contextual logic" and would output the result of the calculation.
I googled and did not find any suitable library. I found ANTLR, but I think it would be very "heavyweight" for my use case. Any suggestions?
You are right that ANTLR is a bit of an overkill. However parsing arithmetic expressions in infix notation isn't that hard, see:
Operator-precedence parser
Shunting-yard algorithm
Algorithms for Parsing Arithmetic Expressions
You could also consider a scripting language such as Groovy or JRuby. JDK 6 onwards also provides built-in JavaScript support through javax.script, although the bundled Nashorn engine was removed in JDK 15. See my answer here: Creating meta language with Java.
If all you want to do is simple expressions, and you know the grammar for those expressions in advance, you don't even need a library; you can code this trivially in pure Java.
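Here is a minimal recursive-descent sketch for the grammar in the question: +, -, *, /, unary minus, parentheses, numbers and identifiers. The grammar and the resolver callback are my assumptions. The resolver stands in for something like the database lookup the question mentions. Each placeholder is resolved when the parser reaches it, from left to right.

```java
import java.util.function.Function;

final class ExprParser {
    private final String src;
    private final Function<String, Double> resolver; // supplies placeholder values
    private int pos;

    private ExprParser(String src, Function<String, Double> resolver) {
        this.src = src;
        this.resolver = resolver;
    }

    // Parses and evaluates src; identifiers are looked up via resolver.
    static double evaluate(String src, Function<String, Double> resolver) {
        ExprParser p = new ExprParser(src, resolver);
        double value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := '-' factor | '(' expr ')' | number | identifier
    private double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        skipSpaces();
        int start = pos;
        if (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
            return Double.parseDouble(src.substring(start, pos));
        }
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String name = src.substring(start, pos);
            Double value = resolver.apply(name); // fill in the placeholder now
            if (value == null) throw error("no value for " + name);
            return value;
        }
        throw error("expected a number, identifier or '('");
    }

    // Consumes c (after optional whitespace) if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }
}
```

For example, with A=1, B=2, C=4, D=10, E=2 and F=2, the question's expression ((A + B) * C) / (D - (E * F)) evaluates to 12 / 6 = 2.0.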
See this answer for a detailed version of how:
Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
If the users define their own expression language, and it always consists of a few monadic (unary) or binary operators whose precedence they can specify, you can adapt the above answer by parameterizing the parser with a list of operators at several levels of precedence.
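Here is a hedged sketch of that parameterization. Each precedence level is a map from operator symbol to a DoubleBinaryOperator, ordered from lowest to highest precedence, and one generic routine handles every level. In this simplified version, every operator is a left-associative binary operator, and unary operators are not supported. The TableParser name and the arithmetic helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.Function;

final class TableParser {
    // levels.get(0) holds the lowest-precedence operators.
    private final List<Map<String, DoubleBinaryOperator>> levels;
    private final Function<String, Double> resolver;
    private String src;
    private int pos;

    TableParser(List<Map<String, DoubleBinaryOperator>> levels, Function<String, Double> resolver) {
        this.levels = levels;
        this.resolver = resolver;
    }

    // Example table: + and - bind more loosely than * and /.
    static TableParser arithmetic(Function<String, Double> resolver) {
        Map<String, DoubleBinaryOperator> additive = new LinkedHashMap<>();
        additive.put("+", (a, b) -> a + b);
        additive.put("-", (a, b) -> a - b);
        Map<String, DoubleBinaryOperator> multiplicative = new LinkedHashMap<>();
        multiplicative.put("*", (a, b) -> a * b);
        multiplicative.put("/", (a, b) -> a / b);
        return new TableParser(List.of(additive, multiplicative), resolver);
    }

    double evaluate(String text) {
        src = text;
        pos = 0;
        double value = level(0);
        skipSpaces();
        if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
        return value;
    }

    // level(i) := level(i+1) (op_i level(i+1))*; below the last level come primaries.
    private double level(int i) {
        if (i == levels.size()) return primary();
        double value = level(i + 1);
        while (true) {
            String op = matchOperator(levels.get(i));
            if (op == null) return value;
            value = levels.get(i).get(op).applyAsDouble(value, level(i + 1));
        }
    }

    // Naive matching: if one symbol is a prefix of another, this may pick the wrong one.
    private String matchOperator(Map<String, DoubleBinaryOperator> ops) {
        skipSpaces();
        for (String op : ops.keySet()) {
            if (src.startsWith(op, pos)) { pos += op.length(); return op; }
        }
        return null;
    }

    // primary := '(' level(0) ')' | number | identifier
    private double primary() {
        skipSpaces();
        if (src.startsWith("(", pos)) {
            pos++;
            double value = level(0);
            skipSpaces();
            if (!src.startsWith(")", pos)) throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("expected operand at " + pos);
        String token = src.substring(start, pos);
        if (Character.isDigit(token.charAt(0))) return Double.parseDouble(token);
        Double value = resolver.apply(token);
        if (value == null) throw new IllegalArgumentException("no value for " + token);
        return value;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

With the arithmetic table, A - B - 1 parses as (A - B) - 1. To support a new level of precedence, you would add another map to the list.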
If the language can be more sophisticated, you might want to investigate metacompilers.

Detecting equivalent expressions

I'm currently working on a Java application where I need to implement a system for building BPF expressions. I also need to implement a mechanism for detecting equivalent BPF expressions.
Building the expression is not too hard. I can build a syntax tree using the Interpreter design pattern and implement the toString for getting the BPF syntax.
However, detecting if two expressions are equivalent is much harder. A simple example would be the following:
A: src port 1024 and dst port 1024
B: dst port 1024 and src port 1024
To detect that A and B are equivalent, I probably need to transform each expression into a "normalized" form before comparing them. This would be easy for the above example; however, with a combination of nested AND, OR and NOT operations it gets harder.
Does anyone know how I should best approach this problem?
One way to compare boolean expressions may be to convert both to the disjunctive normal form (DNF), and compare the DNF. Here, the variables would be Berkeley Packet Filter tokens, and the same token (e.g. port 80) appearing anywhere in either of the two expressions would need to be assigned the same variable name.
There is an interesting-looking applet at http://www.izyt.com/BooleanLogic/applet.php - sadly I can't give it a try right now due to Java problems in my browser.
Detecting equivalent expressions is a hard problem, even for Boolean-only expressions: deciding equivalence is coNP-complete, since two formulas differ exactly when their XOR is satisfiable. That means the straightforward way to do it perfectly is to build complete tables of all possible combinations of inputs and their results, and then compare the tables.
Maybe BPF expressions are limited in some way that changes that? I don't know, so I'm assuming not.
If your expressions involve only a few distinct variables, that may not be a problem. I do exactly that as part of a decision-tree design algorithm.
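Here is a brute-force version of that table comparison, as a sketch. Each distinct BPF primitive (e.g. src port 1024) becomes one Boolean variable, and the two expressions are modelled as hypothetical Java predicates over an array of those variables. This treats the primitives as independent, which is an assumption. In real BPF, port 1024 means src port 1024 or dst port 1024, so dependencies like that would need extra handling.

```java
import java.util.function.Predicate;

final class TruthTable {
    // True if a and b agree on all 2^n assignments of n Boolean variables.
    // The cost is exponential in n, so this is only practical for small n.
    static boolean equivalent(int n, Predicate<boolean[]> a, Predicate<boolean[]> b) {
        if (n > 30) throw new IllegalArgumentException("too many variables for brute force");
        boolean[] vars = new boolean[n];
        for (long mask = 0; mask < (1L << n); mask++) {
            // Bit i of mask gives the value of variable i in this row of the table.
            for (int i = 0; i < n; i++) {
                vars[i] = (mask & (1L << i)) != 0;
            }
            if (a.test(vars) != b.test(vars)) return false;
        }
        return true;
    }
}
```

For the question's example, with v[0] = src port 1024 and v[1] = dst port 1024, equivalent(2, v -> v[0] && v[1], v -> v[1] && v[0]) returns true.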
Alternatively, don't try to be perfect. Allow some false negatives (cases which are equivalent, but which you won't detect).
A simple approach is a variant of normal expression evaluation that builds an alternative representation of the expression instead of computing its result. Impose an ordering on the operands of commutative operators. Apply some obvious simplifications during the evaluation. Replace a rich operator set with a minimal set of primitive operators, e.g. using De Morgan's laws to eliminate OR operators.
This alternative representation forms a canonical representation for all members of a set of equivalent expressions. It should be an equivalence class in the sense that you always find the same canonical form for any member of that set. But that's only the set-theory/abstract-algebra sense of an equivalence class - it doesn't mean that all equivalent expressions are in the same equivalence class.
For efficient dictionary lookups, you can use hashes or comparisons based on that canonical representation.
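Here is a minimal sketch of such a canonical representation for a plain AND/OR/NOT tree. It assumes the leaves are already-normalized BPF primitives. The sketch pushes negations down to the leaves with De Morgan's laws, flattens nested AND/OR, and sorts and deduplicates the operands. As a result, A and B from the question map to the same string, which can also serve as a dictionary key. The Canon types are hypothetical and require Java 16+ (records and instanceof patterns). As the answer warns, this misses some equivalences: a | (a & b) and a get different forms.

```java
import java.util.List;
import java.util.TreeSet;

final class Canon {
    interface Expr {}
    record Var(String name) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(List<Expr> args) implements Expr {}
    record Or(List<Expr> args) implements Expr {}

    // Canonical string of e: NOTs pushed to the variables, nested AND/OR
    // flattened, operands sorted and deduplicated.
    static String canon(Expr e) {
        return canon(e, false);
    }

    // neg == true means "canonical form of NOT e".
    private static String canon(Expr e, boolean neg) {
        if (e instanceof Var v) return neg ? "!" + v.name() : v.name();
        if (e instanceof Not n) return canon(n.e(), !neg);
        // Under negation, De Morgan turns AND into OR and vice versa.
        boolean isAnd = (e instanceof And) != neg;
        TreeSet<String> operands = new TreeSet<>();
        collect(e, neg, isAnd, operands);
        if (operands.size() == 1) return operands.first();
        return "(" + String.join(isAnd ? " & " : " | ", operands) + ")";
    }

    // Adds the canonical operands of e (negated if neg) to out, descending into
    // nested nodes that end up as the same operator so that they get flattened.
    private static void collect(Expr e, boolean neg, boolean isAnd, TreeSet<String> out) {
        if (e instanceof Not n) {
            collect(n.e(), !neg, isAnd, out);
            return;
        }
        List<Expr> args = e instanceof And a ? a.args() : e instanceof Or o ? o.args() : null;
        if (args != null && ((e instanceof And) != neg) == isAnd) {
            for (Expr child : args) collect(child, neg, isAnd, out);
        } else {
            out.add(canon(e, neg));
        }
    }
}
```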
I'd definitely go with syntax normalization. That is, like aix suggested, transform the booleans using DNF and reorder the abstract syntax tree such that the lexically smallest arguments are on the left-hand side. Normalize all comparisons to < and <=. Then, two equivalent expressions should have equivalent syntax trees.

XSD: Index of sequence in Element name

I'm building an XSD to generate JAXB objects in Java. Then I ran into this:
<TotalBugs>
<Bug1>...</Bug1>
<Bug2>...</Bug2>
...
<BugN>...</BugN>
</TotalBugs>
How do I build a sequence of elements where the index of the sequence is in the element name? Specifically, how do I get the 1 in Bug1?
You don't want to do it this way. XML elements already have a document order, so you don't need to number them yourself:
<totalBugs>
<bug><!-- Here comes 1st bug --></bug>
<bug><!-- Here comes 2nd bug --></bug>
...
<bug><!-- Here comes last bug --></bug>
</totalBugs>
You can access the 1st bug node in the list by the XPath expression:
/totalBugs/bug[1]
Note that, per the W3C XPath standard, positions start at 1. For further reading, refer to w3schools.
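Here is a short illustration using the JDK's built-in XPath engine and a made-up three-bug document. The position in the predicate takes the place of the number you wanted in the element name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BugIndex {
    // Sample in the recommended shape: repeated <bug> elements, no numbers in the names.
    static final String XML =
        "<totalBugs><bug>first</bug><bug>second</bug><bug>third</bug></totalBugs>";

    // Evaluates expr against XML and returns the result converted to a string.
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Here eval("/totalBugs/bug[1]") returns "first", eval("/totalBugs/bug[last()]") returns "third", and eval("count(/totalBugs/bug)") returns "3".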
I'm pretty sure XSD won't support what you need. However you can use <xsd:any> for that bit of the schema, then use something lower-level than JAXB to generate the XML for that particular part. (I think your generated classes will have fields like protected List<Element> any; which you can fill in using DOM).
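If you do go the xsd:any route, the numbered elements themselves are easy to create with plain DOM. Here is a minimal sketch. The createBugs helper is hypothetical and only builds the elements. You would still need to add them to the generated class's List<Element> property, whose exact name depends on your schema.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NumberedBugs {
    // Creates <Bug1>...</Bug1> through <BugN>...</BugN>, owned by a fresh DOM document.
    static List<Element> createBugs(List<String> descriptions) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        List<Element> bugs = new ArrayList<>();
        for (int i = 0; i < descriptions.size(); i++) {
            Element bug = doc.createElement("Bug" + (i + 1)); // 1-based index in the name
            bug.setTextContent(descriptions.get(i));
            bugs.add(bug);
        }
        return bugs;
    }
}
```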

Streaming XPath evaluation

Are there any production-ready libraries for streaming evaluation of XPath expressions against a provided XML document? My investigations show that most existing solutions load the entire DOM tree into memory before evaluating an XPath expression.
XSLT 3.0 provides a streaming mode of processing, which was expected to become standard once the XSLT 3.0 specification became a W3C Recommendation (this happened in June 2017).
At the time of writing this answer (May 2011), Saxon provides some support for XSLT 3.0 streaming.
Would this be practical for a complete XPath implementation, given that XPath syntax allows for:
/AAA/XXX/following::*
and
/AAA/BBB/following-sibling::*
which implies look-ahead requirements? That is, from a particular node you're going to have to load the rest of the document anyway.
The doc for the Nux library (specifically StreamingPathFilter) makes this point, and references some implementations that rely on a subset of XPath. Nux claims to perform some streaming query capability, but given the above there will be some limitations in terms of XPath implementation.
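To make the subset idea concrete, here is a minimal Java sketch using the JDK's StAX API (javax.xml.stream). It streams the document, keeps only the stack of open element names, and counts the elements that match an absolute path made of child steps only, such as /main/someElement. Predicates, wildcards, attributes, namespaces and axes such as following:: are deliberately not supported. This is exactly the kind of restriction that makes streaming possible. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingPathCounter {
    // Counts elements whose ancestor chain equals the absolute child-only path.
    // Memory use is proportional to document depth, not document size.
    static int count(String xml, String path) {
        String[] steps = path.substring(1).split("/");
        Deque<String> stack = new ArrayDeque<>(); // names of currently open elements
        int matches = 0;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamReader.START_ELEMENT) {
                        stack.addLast(r.getLocalName()); // local names only, namespaces ignored
                        if (stack.size() == steps.length && matchesPath(stack, steps)) matches++;
                    } else if (event == XMLStreamReader.END_ELEMENT) {
                        stack.removeLast();
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
        return matches;
    }

    // Compares the open-element stack (root first) with the path steps.
    private static boolean matchesPath(Deque<String> stack, String[] steps) {
        int i = 0;
        for (String name : stack) {
            if (!name.equals(steps[i++])) return false;
        }
        return true;
    }
}
```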
There are several options:
DataDirect Technologies sells an XQuery implementation that employs projection and streaming, where possible. It can handle files into the multi-gigabyte range - e.g. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java-only.
Saxon has an open-source edition and a modestly priced commercial edition that will do streaming in some contexts. It is Java-based, with a .NET port as well.
MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.
Try Joost.
Though I have no practical experience with it, I thought it worth mentioning QuiXProc ( http://code.google.com/p/quixproc/ ). It is a streaming approach to XProc, and it uses libraries that provide streaming support for XPath, among other things.
FWIW, I've used Nux streaming-filter XPath queries against very large (>3GB) files. It has worked flawlessly and used very little memory. My use case has been slightly different (not validation-centric), but I'd highly encourage you to give Nux a shot.
I think I'll go for custom code. The .NET XmlReader gets us quite close to the target if one only needs to read a few paths of the XML document.
Since all the solutions I have seen so far support only a subset of XPath, this is that kind of solution too, although the subset is really small. :)
This C# code reads an XML file and counts the elements at an explicit path. You can also read attributes easily using the xr["attrName"] syntax.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

class CountPath
{
    static void Main(string[] asArgs)
    {
        int c = 0;
        // Element names from the root down to the current element.
        var lstPath = new List<string>();
        var sbPath = new StringBuilder();
        // asArgs[0] is the path of the XML file to scan.
        using (var r = new StreamReader(asArgs[0]))
        using (var xr = XmlReader.Create(r, new XmlReaderSettings()))
        {
            while (xr.Read())
            {
                if (xr.NodeType == XmlNodeType.Element)
                    lstPath.Add(xr.Name);

                // It takes some time. If 1 unit is time needed for parsing the file,
                // then this takes about 1.0.
                sbPath.Clear();
                foreach (string n in lstPath)
                {
                    sbPath.Append('/');
                    sbPath.Append(n);
                }
                // This takes about 0.6 time units.
                string sPath = sbPath.ToString();

                // An element is complete at its end tag, or immediately if it is empty (<x/>).
                bool closing = xr.NodeType == XmlNodeType.EndElement
                    || (xr.NodeType == XmlNodeType.Element && xr.IsEmptyElement);
                if (closing)
                {
                    // Test a simple absolute XPath explicitly.
                    if (sPath == "/main/someElement")
                        c++;
                    lstPath.RemoveAt(lstPath.Count - 1);
                }
            }
        }
        Console.WriteLine(c);
    }
}
