How do I correctly iterate through xml using Java?

How do I correctly iterate through xml using Java? - java

I have an XML file starting:
<?xml version="1.0"?>
<results>
<result id="0001">
<hometeam>
<name>Dantooine Destroyers</name>
<score>6</score>
</hometeam>
<awayteam>
<name>Wayland Warriors</name>
<score>0</score>
</awayteam>
</result>
<result id="0002">
<hometeam>
<name>Dantooine Destroyers</name>
<score>3</score>
</hometeam>
<awayteam>...
and in a java file:
if(event.isStartElement()){
if(event.asStartElement().getName().getLocalPart().equals(HOME)){
System.out.println("In hometeam"); // for testing purposes
event = eventReader.nextEvent(); // I expect <name> element
if(event.isStartElement()){ // <------------ FALSE
if(event.asStartElement().getName().getLocalPart().equals(NAME)){....
I'd expect this if statement to be true for the <name> element but if I stick in a System.out.println(event.isStartElement()) I get FALSE....
Also event.getEventType() returns XMLEvent.CHARACTERS which I don't understand... Can anybody see why?
Feel free to make edits to tags/title and question if necessary.

Characters means that next part of XML are well - characters (in your case newline and indentation) - low level parser is unqualified to discardthem for you. it delivers just raw events. It's your work to proces structure correctly.

That .nextEvent() call is probably bringing in the whitespace between <hometeam> and <name>. Note that in XML, all character data between tags (even if it's whitespace or newlines) is also accessible from the API.
You can test this by printing the element.
You don't normally see that whitespace with DOM-based APIs (or you can easily ignore it) but with event-driven APIs (like SAX or StAX) you have to ignore it.

Related

How to store XQuery results as new document in BaseX

I am very new to XML document database technologies, Xquery/Xpath and so on. So this is probably a very newbie question.
Scenario: A number of XML documents as the input, want to run some number of transformations can be run on these XML documents (using XQuery). I'd like to store these results. Into the same XML data store as the input.
So far I am experimenting with using the BaseX document database to store and processes these XML documents, and so far its been very easy to work with, I am impressed.
Ideally I'd like to interface with BaseX using the XQJ API (http://xqj.net/basex/) as my reasoning would be that XQJ would keep the implementation of application code independent of BaseX as can be. The secondary option would write my java code directly to the BaseX API.
The Problem: I am having a hard time figuring out how to store the results from an XQuery as a new "document" in the database. Perhaps this is more of a conceptual lack of understanding with XQuery (or XQuery Update) itself than any difficulty with the BaseX/XQJ API.
In this simple example if I have a query like this, it returns some XML output with in a format that I want for my new document
let $items := //firstName
return <results>
{ for $item in $items
return <result> {$item} </result>
}
</results>
Gives
<results>
<result>
<firstName>Bob</firstName>
</result>
<result>
<firstName>Joe</firstName>
</result>
<result>
<firstName>Tom</firstName>
</result>
</results>
I want to store this new <result> document back into the database, for use in later querying/transformations/etc. In SQL this makes sense to me. I would do CREATE TABLE <name> SELECT <query> or INSERT INTO, etc. But I am unclear what the equivalent is in XQuery. I think the XQuery Update functionality is what I need here, but I'm having trouble finding concrete examples.
This is further complicated when dealing with XQJ
XQResultSequence rs = xqe.executeQuery("//firstName");
// what do i do with it now??
Is there a way to persist this XQResultSequence back INTO the database using BaseX? Or even better, can I run additional XQueries directly on the XQResultSequence?
Thanks for the help!

BaseX implements XQuery Update Facility, so you should be able to use fn:put:
let $items := //firstName
return fn:put(
<results>{
for $item in $items
return <result> {$item} </result>
}</results>,
"/results/result-new.xml")
If you are running simple ad-hoc queries like above, it should be fairly straightforward. I'm not very familiar with XQJ, but if you want to run queries in a sequence, I suspect there is a way to pass those XQResultSequence variables back to a new query, in which you would likely accept it by declaring a variable as external in the following query:
declare variable $previous-result as item()* external;

xml to jaxb in xml cyclic references

How to convert following XML to java using jaxb
<work>
<subwork id="sub">
<ret="it">
</subwork>
<ret id="it">
<time>9</time>
</ret>
</work>
It is a bit tough since ret tag is outside subwork tag

Frst, you need to start with valid XML. I've made assumptions in correcting the XML:
<work>
<subwork id="sub">
<ret id="it"/>
</subwork>
<ret id="it">
<time>9</time>
</ret>
</work>
Second (and there are other ways of doing this), you need to create a schema that describes this XML. Without doing it for you, I'll say that the trick is to define an element, ret, and then refer to that element within the work element and again within the subwork element.
Third, you then feed that schema file (.XSD) into a tool that generates the JAXB classes. Typically this is xcj.exe (included with the Java JDK).

How can I parse CDATA?

How can I find and iterate through all the nodes present under CDATA and those nodes are started by (<) and closed by (>)?
Also, how should I iterate over all the child nodes and get the values like in below child node? I want to retrieve the value.
Input XML
<SOURCE TransactionId="1" ProviderName="ABCDD"><RESPONSE><![CDATA[<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><NetworkResponse xmlns="http://www.example.com/"><NetworkResult><Network offering_id="13" transaction_id="2" submission_id="3" timestamp="20140828 16010683 GMT" customer_id="NETTest">
<Network_List>
<Network_Info att0="Y" att1="N" att2="N" att3="Y" att4="Y">
<SIM_DATA>
<SIM><![CDATA[1100040101]]></SIM>
</SIM_DATA>
<NetworkResponseInfo k_status="C">
<KEY1>269</KEY1>
<PARENTNODE>
<CHILDNODE1>
<KEY2>XXXXXXX</KEY2>
<KEY3>YYYYYYY</KEY3>
</CHILDNODE1>
<CHILDNODE2>
<KEY4>N</KEY4>
<KEY5>I</KEY5>
</CHILDNODE2>
<CHILDNODE3>
<KEY6>1</KEY6>
<KEY7>3</KEY7>
</CHILDNODE3>
</PARENTNODE>
<KEY8><![CDATA[some image not visible]]></KEY8>
<KEY9>N</KEY9>
<KEY10>15</KEY10>
</NetworkResponseInfo>
</Network_Info>
</Network_List>
<response_message_list transaction_status_code="000" transaction_status_text="Successful"/>
</Network></NetworkResult></NetworkResponse></soap:Body></soap:Envelope>]]></RESPONSE></SOURCE>
Output XML
<ns3:NetworkResponse>
<Networks_OF_List>
<NetCharSeq>
<Nrep>
<type>Some Image</type>
<data> Data Coming from KEY8 CDATA section</data>
</Nrep>
<Nrep>
<type>ANYTHING</type>
<data>VALUE INSIDE SIM CDATA</data>
</Nrep>
<NetDetail>
<MYKEY1>Value present inside KEY4</MYKEY1>
<MYKEY2>Value present inside KEY5</MYKEY2>
</NetDetail>
<SystemID>Value of KEY2</SystemID>
<SystemPath>Valuelue of KEY3</SystemPath>
</NetCharSeq>
</Networks_OF_List>
</ns3:NetworkResponse>

(Welcome at SO. Please note that you are downvoted by some users because you do not show what you have done so far. Have a look at the How To Ask section to learn how to ask questions that actually can be answered and are considered proper questions in the SO format.)
If you can use XSLT 3.0, you can consider using the new fn:parse-xml function, which will take a document-as-a-string.
However, your CDATA-section contains itself escaped data, which means that, after you apply fn:parse-xml, you will have to do it once again for the text node that is the child of NetworkResult.
A better solution is often to fix this at the source and creating an XML format that allows other XML in certain elements (you can allow this with a proper XSD). It will save you a lot of trouble and at least you XML can then be pre-validated.
If you are stuck with XSLT 2.0 or 1.0, you can use disable-output-escaping (google it, there is a lot of info around on how to use it), but you will have to re-process your output once more because of the double-escape that is used. You may want to consider an XProc pipeline to ease the process.
You wrote: Also, how should I iterate over all the child nodes and get the values like in below child node
That is what XSLT is all about, please read this XSLT Tutorial, or any other tutorial you can find, it will be explained to you in the first minutes.
Update: as suggested by michael.hor257k in the comments, you can also parse the escaped data by hand using string manipulation functions. As he already says in the comments, this is laborious and error-prone, but sometimes, esp. if the XML is not really XML after unescaping, but something like XML, then this may be your only option.

Retrieve value of attribute using XPath

I am trying to retrieve the value of an attribute from an xmel file using XPath and I am not sure where I am going wrong..
This is the XML File
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing..
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/#testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?

However you're iterating within the java, your context node is probably not what you think, so remove the "." specifier in your local-name(.) like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/#testID worked fine for me with your XML, although as akaIDIOT says, there isn't an <Envelope> tag to be seen.

The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses a specific namespace for the elements you're trying to match. An XPath engine is namespace-aware, meaning you'll have to ask it exactly what you need. And, keep in mind that a namespace is defined by its uri, not by its abbreviation (so, /namespace:element doesn't do much unless you let the XPath engine know what the namespace namespace refers to).

Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']
/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/#testID
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.

If you don't care for the namespaces and use an XPath 2.0 compatible engine, use * for it.
//*:Header/*:TestInfo/#testID
will return the desired input.
It will probably be more elegant to register the needed namespaces (not covered here, depends on your XPath engine) and query using these:
//soapenv:Header/common:TestInfo/#testID

Xpath transformation not working in java

This is my xml document. I want to sign only the userID part using xml signature. I am using xpath transformation to select that particular element.
<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
Version="2.0" IssueInstant="2012-05-22T13:40:52:390" ProtocolBinding="urn:oasis:na
mes:tc:SAML:2.0:bindings:HTTP-POST" AssertionConsumerServiceURL="localhos
t:8080/consumer.jsp">
<UserID>
xyz
</UserID>
<testing>
text
</testing>
<saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
http://localhost:8080/saml/SProvider.jsp
</saml:Issuer>
</samlp:AuthnRequest>
I am using the following code to add the transformations :
transformList.add(exc14nTransform);
transformList.add(fac.newTransform(Transform.XPATH, new XPathFilterParameterSpec("samlp:AuthnRequest/UserID xmlns:samlp=\"urn:oasis:names:tc:SAML:2.0:protocol\"")));
But I get the following :
Original Exception was javax.xml.transform.TransformerException: Extra illegal t
okens: 'xmlns', ':', 'samlp', '=', '"urn:oasis:names:tc:SAML:2.0:protocol"'
So, I tried removing the xmlns part.
transformList.add(fac.newTransform(Transform.XPATH, new XPathFilterParameterSpec("samlp:AuthnRequest/UserID")));
But it signs the whole document and gives the following message :
com.sun.org.apache.xml.internal.security.utils.CachedXPa
thFuncHereAPI fixupFunctionTable
INFO: Registering Here function
What is the problem?
EDIT
As #Jörn Horstmann said the message is just a log or something like that. Now the problem is that even after giving the xpath query the whole document is signed instead of just the UserID. I confirmed this by changing the value of <testing>element after signing the document. The result is that the document does not get validated(If it signed only the UserID part, then any changes made to <testing> should result in a valid signature .)

This is not a valid xpath expression, there is no way to declare namespace prefixe inside the expression.
samlp:AuthnRequest/UserID xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
XPathFilterParameterSpec does have another constructor that allows to specify a mapping of namespace prefixes, you could try the following expression:
new XPathFilterParameterSpec("samlp:AuthnRequest/UserID",
Collections.singletonMap("samlp", "urn:oasis:names:tc:SAML:2.0:protocol"))
Edit:
The message does not seem to be an error, see line 426 here, its log level should probably be lower than INFO though.
I also had a look at the description of xpath filtering:
The XPath expression appearing in the XPath parameter is evaluated once for each node in the input node-set. The result is converted to a boolean. If the boolean is true, then the node is included in the output node-set. If the boolean is false, then the node is omitted from the output node-set.
So the correct xpath expression to only include the UserID in the signature would be self::UserID. But don't ask me if this actually makes sense for a xml signature. The example in the specification seems to use a xpath expression to include everything except the signature element itself:
not(ancestor-or-self::dsig:Signature)
Edit 2:
The correct expression is actually ancestor-or-self::UserID since the filter also has to include the text child nodes of the UserID node.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.