groovy xml parsing function

I would like a Groovy function that takes two or more parameters, something like input and find_tag.
I wrote the test below (not yet a function), but it does not give me D_1164898448. Please help me with it.
def temp="""<Portals objVersion=\"1.1.19\">
<vector xsi:type=\"domainservice:Portals\" objVersion=\"1.1.19\">
<domainName>D_1164898448</domainName>
<address xsi:type=\"metadata:NodeRef\" objVersion=\"1.1.19\">
<host>Komodo</host>
<port>18442</port>
</address>
</vector>
</Portals>"""
def fInput="domainName"
def records = new XmlParser().parseText(temp)
def t=records.findAll{ it.fInput}.text()
println t
Update
For attributes I am doing something like this:
println "id = ${records.attribute("id")}"
But how do I do the same for nodes, e.g.:
println "host = ${records.vector.address.host.text()}"

If you don't know the exact path to the XML tag you're searching for, you can do something like this to get the content of all tags with the given name:
def t = records."**"."$fInput".text()
To access attributes from a given XML node you can also use the @ notation, e.g.
records.vector.@objVersion
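For comparison, the "find it at any depth" idea can be sketched in plain Java with the JDK's DOM API, as a two-parameter function of the kind the question asks for. This is not the Groovy answer's method. TagFinder and findTagText are hypothetical names, and the sketch parses without namespace awareness so that the unbound xsi: prefix in the sample is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: a Java counterpart of records."**"."$fInput".text()
class TagFinder {

    // Returns the text content of every element named `tag`, at any depth, in document order.
    static List<String> findTagText(String xml, String tag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness stays off, so the unbound xsi: prefix is not an error.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName searches the whole document, whatever the path.
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Called with the question's XML and "domainName", it returns a one-element list containing D_1164898448. An unknown tag name gives an empty list.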

What you need to do is:
Turn off namespace awareness so that XmlParser won't throw an error when it encounters the unbound xsi: prefix. You can do this by passing the right arguments to the XmlParser constructor.
Properly traverse the DOM tree returned by the parser: it returns a Node, not a list, so using findAll the way you did will not work. Also, it.fInput looks for a child literally named fInput; to use the variable's value, write it."$fInput".
(Optionally) remove the backslashes before the double quotes in your XML, since escaping double quotes inside a triple-quoted string is not necessary.
Your code after corrections:
def temp="""
<Portals objVersion="1.1.19">
<vector xsi:type="domainservice:Portals" objVersion="1.1.19">
<domainName>D_1164898448</domainName>
<address xsi:type="metadata:NodeRef" objVersion="1.1.19">
<host>Komodo</host>
<port>18442</port>
</address>
</vector>
</Portals>
"""
def fInput="domainName"
def records= new XmlParser(false, false).parseText(temp)
def t = records.vector."$fInput".text()
println t
Running it displays 'D_1164898448', as expected.
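The same two fixes can be sketched with the JDK's built-in parser. Namespace awareness is a switch on DocumentBuilderFactory, and an explicit XPath path plays the role of records.vector."$fInput". This is a Java sketch rather than Groovy, and PathLookup and textAt are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper mirroring the fixes: choose namespace awareness, then walk a known path.
class PathLookup {

    static String textAt(String xml, String expression, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Like records.vector."$fInput", the expression walks an explicit path from the root.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (SAXException e) {
            // With namespaceAware = true, the unbound xsi: prefix ends up here.
            throw new IllegalArgumentException("Not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With namespaceAware set to false, textAt(xml, "/Portals/vector/domainName", false) returns D_1164898448. Passing true makes the parse fail on the unbound xsi: prefix, which is the error the answer avoids.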

I think you should use an XPath expression here. Or, if your input XML is exactly as shown in the question, I'd recommend a regex like:
def temp = ".." // your temp
def m = temp =~ /<domainName>(.*?)<\/domainName>/
print m[0][1] // should be your domain
More about Groovy regular expressions: http://groovy.codehaus.org/Regular+Expressions
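The same shortcut in Java can be generalized to a tag-name parameter, as the question asks. The TagRegex class is hypothetical. As the answer says, this only holds up when the input looks exactly like the sample. A start tag with attributes, a comment, or CDATA breaks it. In a Java string the / in the closing tag needs no escaping, whereas inside a Groovy slashy string it has to be written as \/.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRegex {
    // Returns the text between <tag> and </tag> for the first match, or null if there is none.
    // Pattern.quote keeps characters such as '.' in the tag name literal, and the reluctant
    // (.*?) stops at the first closing tag. DOTALL lets the text span several lines.
    static String firstText(String xml, String tag) {
        String t = Pattern.quote(tag);
        Matcher m = Pattern.compile("<" + t + ">(.*?)</" + t + ">", Pattern.DOTALL).matcher(xml);
        return m.find() ? m.group(1) : null;
    }
}
```

On the question's XML, firstText(xml, "domainName") gives D_1164898448. firstText(xml, "vector") gives null because that start tag carries attributes, which is exactly the kind of input a regex like this does not handle.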

Related

unknown number of children in ANTLR tree

I am working on a parser for a calculator, which also needs to build a tree.
For example:
exp returns [Tree tree] : e1=exp e2=operator e3=exp {
Tree tempTree = ($e2.tree);
tempTree.insertChild ($e1.tree);
tempTree.insertChild ($e3.tree);
$tree = tempTree;
} ;
I would like to know how I can build a tree for a function with multiple arguments without assuming the number of children.
For example: max(a,b,c,d,..)
I thought of using something like FUNCTION LEFTBRACKET exp (COMMA exp)* RIGHTBRACKET
but I am not sure how to build the tree for the (COMMA exp)* part.

Something like:
function : FUNCTION_NAME LEFTBRACKET parameters RIGHTBRACKET ;
parameters : exp | exp COMMA parameters ;
may help.

What you did works fine. In ANTLR 4, the children will be put into a list that you can access via the generated exp() method of the rule's context.
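ANTLR generates this for you, but the underlying idea can be shown without ANTLR. The (COMMA exp)* repetition becomes a loop that appends each parsed argument to a list of children, so the number of arguments never has to be known in advance. Below is a minimal hand-written recursive-descent sketch in Java, not ANTLR output. The TreeNode and CallParser classes are hypothetical, and only names and nested calls are supported (no operators).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical n-ary tree node: a label plus any number of children.
class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }

    // Prints the tree in LISP style, e.g. (max a b c).
    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (TreeNode child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

// Recursive descent for:  exp : NAME ( '(' exp (',' exp)* ')' )? ;
class CallParser {
    private final String src;
    private int pos = 0;

    CallParser(String src) { this.src = src.replaceAll("\\s+", ""); }

    static TreeNode parse(String src) {
        CallParser p = new CallParser(src);
        TreeNode root = p.exp();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return root;
    }

    private TreeNode exp() {
        TreeNode node = new TreeNode(name());
        if (peek() == '(') {
            pos++;                          // consume '('
            node.children.add(exp());       // first argument
            while (peek() == ',') {         // (',' exp)*: one child per extra argument
                pos++;
                node.children.add(exp());
            }
            expect(')');
        }
        return node;
    }

    private String name() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a name at " + pos);
        return src.substring(start, pos);
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }
}
```

For example, CallParser.parse("max(a, b, c, d)") yields a max node with four children, printed as (max a b c d). Nested calls such as max(a, min(b, c)) nest the same way.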

How to use XPath to get attributes of BPMN nodes in java?

I have tried to use XPath with XML files and it works fine. Now I want to use it with BPMN files.
My BPMN file looks something like this:
<bpmn2:startEvent id="StartEvent_1" name="StartProcess">
<bpmn2:outgoing>SequenceFlow_1</bpmn2:outgoing>
</bpmn2:startEvent>
I try to get the value of the id attribute of the bpmn2:startEvent node using this line of code:
startEventID = xml.getParameterString("(//bpmn2:startEvent/@id)");
System.out.println(startEventID);
But it prints a blank line instead of the id, StartEvent_1.
Any suggestions?

You can use this expression: "//*[local-name()='startEvent']/@id".
Note that this can give the wrong result if the same tag name is used in different namespaces, because local-name() ignores the namespace.
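A small Java sketch of the local-name() approach using the JDK's own javax.xml.xpath API. It assumes the real file declares the bpmn2 prefix (the fragment in the question doesn't) and is parsed namespace-aware. The question's xml.getParameterString is not used because its API isn't shown. BpmnStartEvent is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BpmnStartEvent {
    // Returns the id of the first startEvent element whatever its prefix, or "" if there is none.
    static String startEventId(String bpmnXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // BPMN files declare their namespaces
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(bpmnXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() ignores the bpmn2: prefix, so no NamespaceContext is needed.
            return xpath.evaluate("//*[local-name()='startEvent']/@id", doc);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```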

Groovy: how to parse xml and preserve namespaces and schemaLocations

I'm trying to use Groovy to simply add a node at a particular location. My source XML looks like this:
<s1:RootNode
xmlns:s1="http://localhost/s1schema"
xmlns:s2="http://localhost/s2schema"
xsi:schemaLocation="http://localhost/s1schema s1schema.xsd
http://localhost/s2schema s2schema.xsd">
<s1:aParentNode>
<s2:targetNode>
<s2:childnode1 />
<s2:childnode2 />
<s2:childnode3 />
<s2:childnode4 />
</s2:targetNode>
</s1:aParentNode>
</s1:RootNode>
I'd like to simply add a new child node alongside the other ones, so that the output is:
<s1:RootNode
xmlns:s1="http://localhost/s1schema"
xmlns:s2="http://localhost/s2schema"
xsi:schemaLocation="http://localhost/s1schema s1schema.xsd
http://localhost/s2schema s2schema.xsd">
<s1:aParentNode>
<s2:targetNode>
<s2:childnode1 />
<s2:childnode2 />
<s2:childnode3 />
<s2:childnode4 />
<s2:childnode5 >value</s2:childnode5>
</s2:targetNode>
</s1:aParentNode>
</s1:RootNode>
To do this I have the following simple Groovy script:
def data = 'value'
def root = new XmlSlurper(false,true).parseText( sourceXML )
root.'aParentNode'.'targetNode'.appendNode{
's2:childnode5' data
}
groovy.xml.XmlUtil.serialize(root);
However, when I do this, the namespaces and schemaLocations applied to the root node are removed, and the namespace (but not the schema location) is added to each of the child nodes.
This causes validation issues downstream.
How do I simply process this XML: perform no validation, leave the XML as is, and add a single node in a namespace I specify?
One note: we process many messages and I won't know the outermost namespace (s1 in the example above) in advance, but even so, I'm really just looking for a technique that does "dumber" processing of XML.
Thanks!

First, I had to add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to define your xsi namespace. Without it I would receive a SAXParseException for the unbound xsi prefix.
Additionally, I consulted this question on successfully appending a namespaced xml node to an existing document.
Finally, I had to use StreamingMarkupBuilder to work around the namespaces being moved. Basically, by default the serializer moves each referenced namespace to the first node that actually uses it. In your case it was moving your s2 namespace attribute to the "targetNode" tag. The following code produces the result you want, but you still have to know the correct namespaces to set up the StreamingMarkupBuilder.
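import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil

// validating = false, namespaceAware = true
def root = new XmlSlurper(false, true).parseText( sourceXML )
// Parse the new node as its own fragment so it carries the s2 namespace binding
def data = '<s2:childnode5 xmlns:s2="http://localhost/s2schema">value</s2:childnode5>'
def xmlFragment = new XmlSlurper(false, true).parseText(data)
root.'aParentNode'.'targetNode'.appendNode(xmlFragment)
// Declare both namespaces up front so they stay on the root element
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind {
    mkp.declareNamespace('s1':"http://localhost/s1schema")
    mkp.declareNamespace('s2':"http://localhost/s2schema")
    mkp.yield root }
)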
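For comparison, here is a sketch of the same edit using the JDK's DOM API instead of XmlSlurper. The parsed document keeps the root's xmlns:* and xsi:schemaLocation as ordinary attributes of the root element, so an identity transform writes them back there. Only the namespace of the new node (s2 here) has to be known, not the outermost one. The class and parameter names are hypothetical. As in the answer above, the source must declare the xsi prefix to parse in namespace-aware mode.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AppendNamespacedChild {
    // Appends <qualifiedName>text</qualifiedName> in namespace nsUri as the last child of the
    // first element whose namespace is nsUri and whose local name is parentLocalName.
    static String append(String xml, String nsUri, String parentLocalName,
                         String qualifiedName, String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList parents = doc.getElementsByTagNameNS(nsUri, parentLocalName);
            if (parents.getLength() == 0) {
                throw new IllegalArgumentException("No element " + parentLocalName + " in " + nsUri);
            }
            Element child = doc.createElementNS(nsUri, qualifiedName);
            child.setTextContent(text);
            parents.item(0).appendChild(child);

            // Identity transform: serializes the modified DOM, root attributes included.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Called with the question's document (plus the xmlns:xsi declaration), "http://localhost/s2schema", "targetNode", "s2:childnode5" and "value", it returns the document with the new element as the last child of targetNode.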

XmlSlurper (or XmlParser) does not handle namespaces if you set the second parameter of the constructor,
XmlSlurper(boolean validating, boolean namespaceAware)
to false:
def root = new XmlSlurper(false, false).parseText( sourceXML )
Without setting namespaceAware to false, I also saw strange behavior from the parser. After setting it to false, the parser leaves the XML as is, with no namespace changes.

Retrieve value of attribute using XPath

I am trying to retrieve the value of an attribute from an XML file using XPath, and I am not sure where I am going wrong.
This is the XML File
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing:
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/@testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?

However you're iterating in your Java code, your context node is probably not what you think, so remove the "." from your local-name(.) calls, like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/@testID worked fine for me with your XML, although, as akaIDIOT says, there isn't an <Envelope> tag to be seen.

The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses specific namespaces for the elements you're trying to match. An XPath engine is namespace-aware, meaning you'll have to ask it for exactly what you need. Keep in mind that a namespace is defined by its URI, not by its prefix (so /namespace:element doesn't do much unless you tell the XPath engine which URI the prefix namespace refers to).

Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/@testID
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.

If you don't care about the namespaces and are using an XPath 2.0-compatible engine, use * as a wildcard for the prefix:
//*:Header/*:TestInfo/@testID
will return the desired value.
It will probably be more elegant to register the needed namespaces (not covered here, depends on your XPath engine) and query using these:
//soapenv:Header/common:TestInfo/@testID
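For the "register the namespaces" option, the JDK's javax.xml.xpath (an XPath 1.0 engine, so the *: wildcard above is not available there) needs only a minimal NamespaceContext. This is a sketch, and the class names are hypothetical. The namespace URIs in the usage below are assumptions, because the question's XML doesn't show its declarations. soapenv is commonly bound to the SOAP 1.1 envelope URI, and urn:example:common is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Minimal NamespaceContext backed by a fixed prefix -> namespace URI map.
class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri;

    MapNamespaceContext(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
        // Unknown prefixes map to "no namespace", as the NamespaceContext contract asks.
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) return e.getKey();
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }
}

class NamespacedXPath {
    // Evaluates `expression` as a string. Its prefixes are resolved through `namespaces`,
    // which need not use the same prefixes as the document.
    static String evaluate(String xml, String expression, Map<String, String> namespaces) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for matching by namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(namespaces));
            return xpath.evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because matching is by URI, the expression may use different prefixes than the document. For example, //e:Header/c:TestInfo/@testID also returns PI1 when e and c are mapped to the same two URIs. The same helper also shows the extra-wrapper problem described earlier: local-name(...) around the whole path returns "TestInfo", not the attribute value.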

extracting attribute value in XML using regex

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE ... ]>
<abc-config version="THIS" id="abc">
...
</abc-config>
Hi all,
In the code above, how can I extract the value of the version attribute using a regex in Groovy/Java?
Thanks.

A regex to handle this could be something like:
/<\?xml version="([0-9.]+)"/
I'll spare you one of the 10000 lectures about not using a regex to parse markup languages.
Edit: The One whose Name cannot be expressed in the Basic Multilingual Plane, He compelled me.
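The pattern above matches the version in the <?xml ...?> declaration (1.0 in the sample), not the version attribute of abc-config. The Java sketch below anchors on the abc-config start tag instead. It assumes the tag name is known in advance and that the value contains no quote of the same kind that delimits it. AbcConfigVersion is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbcConfigVersion {
    // <abc-config start tag, any other attributes, then whitespace and version="..." or
    // version='...'. Group 1 is the opening quote and \1 requires the same quote to close it.
    private static final Pattern VERSION = Pattern.compile(
            "<abc-config\\b[^>]*?\\sversion\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the version attribute of the first abc-config start tag, or null if absent.
    static String version(String xml) {
        Matcher m = VERSION.matcher(xml);
        return m.find() ? m.group(2) : null;
    }
}
```

On the question's sample it returns THIS. It also handles single-quoted values and returns null when the attribute is missing, but it shares the usual limits of regex-based markup parsing (comments, CDATA, a '>' inside another attribute value).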

I know you asked for a regex, but what's wrong with this in Groovy?
Assuming the xml is something like:
def xml= '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE abc-config>
<abc-config version="THIS" id="abc">
<node></node>
</abc-config>'''
Then I can parse it with:
def n = new XmlSlurper().parseText( xml )
And then this line:
println n.@version
Prints out "THIS"
If you have problems with a more complex DOCTYPE failing to load, you can try disabling the DOCTYPE checks either with:
def parser = new XmlSlurper()
parser.setFeature( "http://apache.org/xml/features/nonvalidating/load-external-dtd", false )
parser.setFeature( "http://xml.org/sax/features/namespaces", false )
parser.parseText( xml )
or by using the two-parameter XmlSlurper constructor to disable this checking.
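A rough JDK equivalent of the DOCTYPE workaround, offered as a sketch. It sets the same load-external-dtd feature on a DocumentBuilderFactory and reads the attribute from the root element. RootAttribute is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RootAttribute {
    // Returns the named attribute of the root element, or "" if it is absent.
    static String of(String xml, String attributeName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Same Xerces feature as the Groovy answer: don't fetch an external DTD named
            // in the DOCTYPE (the JDK's built-in parser accepts this feature).
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute(attributeName);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With the question's document, RootAttribute.of(xml, "version") returns THIS even when the DOCTYPE points at a DTD file that is not available.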

This is a Perl regex, not a Java one:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*["'](.+?)["'][^>]*?\s*\/?>/sg
Note that this fails on many levels; I could fill the page with a proper regex, but I don't have the desire.
This fails too:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(".+?"|'.+?')[^>]*?\s*\/?>/sg
So does this:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(["'])(.+?)\1[^>]*?\s*\/?>/sg
