How to ignore xsi:nil attributes when parsing xml with XmlSlurper - java

We're upgrading a 3rd-party product from which we consume XML content. The new version generates XML with xsi:nil="true" attributes, indicating null elements:
<?xml version="1.0" encoding="UTF-8"?>
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cusip xsi:nil="true"/>
<ticker xsi:nil="true"/>
<year>2014</year>
</data>
When parsing, we use:
def parsed = new XmlSlurper().parseText(xml)
...
element.attributes().each { k, v ->
}
...but the attribute key for xsi:nil="true" comes back as:
"{http://www.w3.org/2001/XMLSchema-instance}nil"
...and this is raising hell with our downstream processing because it's not expecting an attribute key enclosed in braces.
Does XmlSlurper support a way to ignore xsi schema type attributes without having to filter them out manually?
TO BE CLEAR
Given the xml...
<?xml version="1.0" encoding="UTF-8"?>
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cusip xsi:nil="true"/>
<ticker xsi:nil="true"/>
<year format='yyyy'>2014</year>
</data>
...only attribute format is visible; xsi:nil attributes are ignored:
def parser1 = new XmlParser(false, false).parseText(xml)
assert parser1.children()*.attributes().size() == 1 // for 'format'

UPDATE:
You can use XmlSlurper with namespaceAware set to false as:
def parsed = new XmlSlurper(false, false).parseText(xml)
If feasible, you can also parse with XmlParser, which works much like XmlSlurper. It can be made namespace-unaware in the same way:
def parsed = new XmlParser(false, false).parseText(xml)
Toggle the second argument (namespaceAware) of the constructor to true to see the difference.
Example:
def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cusip xsi:nil="true"/>
<ticker xsi:nil="true"/>
<year>2014</year>
</data>
'''
def parser1 = new XmlParser(false, false).parseText(xml)
def parser2 = new XmlParser(false, true).parseText(xml)
println parser1.children()*.attributes()
println parser2.children()*.attributes()
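For anyone hitting the same thing from plain Java, here is a rough sketch of the same idea using the JDK's built-in DOM parser instead of XmlSlurper/XmlParser. The class and method names (XsiAttributeFilter, childAttributeNames) are made up for illustration. With namespace awareness on, attributes in the XMLSchema-instance namespace can be recognised by their namespace URI and skipped; with it off, they show up as plain "xsi:nil" names with no braces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Groovy or the JDK.
class XsiAttributeFilter {

    /** Lists attribute names on the root's child elements, skipping attributes in the xsi namespace. */
    static List<String> childAttributeNames(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> names = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) {
                    continue; // skip whitespace text nodes
                }
                NamedNodeMap attrs = n.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    // The namespace URI is only populated when the parser is namespace-aware.
                    if (XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(attr.getNamespaceURI())) {
                        continue;
                    }
                    names.add(attr.getName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<data xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<cusip xsi:nil=\"true\"/><ticker xsi:nil=\"true\"/>"
                + "<year format=\"yyyy\">2014</year></data>";
        System.out.println(childAttributeNames(xml, true));  // [format]
        System.out.println(childAttributeNames(xml, false)); // [xsi:nil, xsi:nil, format]
    }
}
```

Note that the namespace-unaware parse does not drop the attributes; it just reports them under their prefixed name, so downstream code still has to tolerate or filter "xsi:nil".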

Related

How to merge multiple XML files and produce proper namespaces in the resulting XML

I have a few XML files that need to be merged into a new XML document. How can I extract the XML namespaces from the source files and use them in the resulting XML?
xml1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root1 xmlns:xlink="ABC/xlink" xmlns:xsi="XYZ/instance">
<tag1>123</tag1>
</root1>
xml2
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root2 xmlns:xlink2="EFG/xlink2" xmlns:xsi="XYZ/instance">
<tag2>321</tag2>
</root2>
expected result xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root xmlns:xlink="ABC/xlink" xmlns:xsi="XYZ/instance" xmlns:xlink2="EFG/xlink2">
<tag1>123</tag1>
<tag2>321</tag2>
</root>
Currently I am trying something like this:
Pattern pattern = Pattern.compile("xmlns[\\s\\S]*?>");
Matcher matcher = pattern.matcher(stringxml);
while(matcher.find()){
//storing the namespace.
}
I am trying to build the proper namespace declarations through string manipulation. That takes quite a few string operations to get right, for example avoiding duplicate declarations when the same namespace is defined in both XML files. Is there a better way to do it?
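One possible alternative to the regex approach, sketched with the JDK's DOM API: parse each file, copy the xmlns declarations from each source root onto a new root element, and import the children. NamespaceMerger and merge are hypothetical names. Setting the same xmlns:prefix twice simply overwrites the attribute, which takes care of duplicates. This sketch assumes a prefix is never bound to two different URIs across the inputs and that all declarations live on the root elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class NamespaceMerger {

    /** Builds a new document whose root carries the xmlns declarations and children of every source root. */
    static Document merge(String rootName, String... sources) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document result = builder.newDocument();
            Element root = result.createElementNS(null, rootName);
            result.appendChild(root);

            for (String xml : sources) {
                Element srcRoot = builder.parse(new InputSource(new StringReader(xml)))
                        .getDocumentElement();

                // Copy namespace declarations; a repeated prefix overwrites the earlier one.
                NamedNodeMap attrs = srcRoot.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    String name = attr.getName();
                    if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, name, attr.getValue());
                    }
                }

                // Import every child node of the source root into the merged root.
                for (Node child = srcRoot.getFirstChild(); child != null; child = child.getNextSibling()) {
                    root.appendChild(result.importNode(child, true));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not merge XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document merged = merge("root",
                "<root1 xmlns:xlink=\"ABC/xlink\" xmlns:xsi=\"XYZ/instance\"><tag1>123</tag1></root1>",
                "<root2 xmlns:xlink2=\"EFG/xlink2\" xmlns:xsi=\"XYZ/instance\"><tag2>321</tag2></root2>");
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(merged), new StreamResult(out));
        System.out.println(out);
    }
}
```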

How to compare prefixed and unprefixed XML documents in XMLUnit so that they are treated as similar

XMLUnit does not recognize the following two XML documents, which are "identical" except that one declares a namespace, as similar:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:message xmlns:ns3="https://www.bookmarks.dev/xml/bookmarks">
<ns3:bookmarks>
<ns3:bookmark>
<ns3:name>Bookmarks and Snippets Manager</ns3:name>
<ns3:url>https://www.bookmarks.dev</ns3:url>
</ns3:bookmark>
</ns3:bookmarks>
</ns3:message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<message>
<bookmarks>
<bookmark>
<name>Bookmarks and Snippets Manager</name>
<url>https://www.bookmarks.dev</url>
</bookmark>
</bookmarks>
</message>
The failing unit test comparing the two:
@Test
void givenSameMessageOneWithoutNamespace_shouldBeSimilar() {
ClassLoader classLoader = getClass().getClassLoader();
final var withNamespaceInput =
Input.from(
new File(classLoader.getResource("with-namespace.xml").getFile()));
final var noNamespaceInput =
Input.from(
new File(
classLoader
.getResource("no-namespace.xml")
.getFile()));
final Diff documentDiff =
DiffBuilder.compare(withNamespaceInput)
.withTest(noNamespaceInput)
.checkForSimilar()
.build();
assertThat(documentDiff.hasDifferences()).isFalse();
}
The differences come in the form Expected namespace uri 'null' but was 'https://www.bookmarks.dev/xml/bookmarks' - comparing <message...> at /message[1] to <ns3:message...> at /message[1] (DIFFERENT)...
Any ideas how I can configure the comparator to ignore the missing prefix in the second document?
My problem was that when generating the no-namespace document I had no default namespace defined in the root element. Adding it solves the problem and xmlunit recognizes them as similar:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<message xmlns="https://www.bookmarks.dev/xml/bookmarks">
<bookmarks>
<bookmark>
<name>Bookmarks and Snippets Manager</name>
<url>https://www.bookmarks.dev</url>
</bookmark>
</bookmarks>
</message>
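To illustrate why the default namespace matters, here is a small sketch using the JDK's DOM parser rather than XMLUnit (RootNamespaceCheck and rootNamespace are hypothetical names). An element is identified by its namespace URI plus local name, so the prefix itself is irrelevant, but an element with no namespace at all is a different element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class RootNamespaceCheck {

    /** Returns the namespace URI of the root element, or null if it has none. */
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String uri = "https://www.bookmarks.dev/xml/bookmarks";
        // Prefixed and default-namespace forms resolve to the same namespace URI.
        System.out.println(rootNamespace("<ns3:message xmlns:ns3=\"" + uri + "\"/>"));
        System.out.println(rootNamespace("<message xmlns=\"" + uri + "\"/>"));
        // No declaration at all: the element is in no namespace.
        System.out.println(rootNamespace("<message/>")); // null
    }
}
```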

Java XML Programming - Extracting the Child nodes

I have an XML file like the one below. I need to extract all the child nodes under logdata, and all the sub-child nodes under each of those, along with their values. How can I extract these?
<logdata>
<Request RequestID="123" RequestType = "Read">
<Data Mode = "Read">
<Type>ReadWrite</Type>
</Data>
<Textdetails Eligible = "true">
<Code>1</Code>
<Name>ABC</Name>
</Textdetails>
</Request>
<Request RequestID="456" RequestType = "Read">
<Data Mode = "Read">
<Type>ReadWrite</Type>
</Data>
<Textdetails Eligible = "true">
<Code>2</Code>
<Name>DEF</Name>
</Textdetails>
</Request>
</logdata>
Using the XOM Library this would be rather simple. All you would need is to build the Document from a Builder. Then get the root element (logdata) using getRootElement(). After that you can use getChildElements() to get all the child elements from logdata and any other Element.
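If adding XOM is not an option, the same walk can be sketched with the DOM API that ships with the JDK. This is not the XOM code described above, just an equivalent under that substitution; LogDataWalker and leafValues are hypothetical names. It recurses through child elements and records the path and text of each leaf element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class LogDataWalker {

    /** Returns "path=value" for every element that has no child elements. */
    static List<String> leafValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void walk(Element element, String parentPath, List<String> out) {
        String path = parentPath.isEmpty()
                ? element.getTagName()
                : parentPath + "/" + element.getTagName();
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElement = true;
                walk((Element) child, path, out); // descend into sub-child nodes
            }
        }
        if (!hasChildElement) {
            out.add(path + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        String xml = "<logdata><Request RequestID=\"123\" RequestType=\"Read\">"
                + "<Data Mode=\"Read\"><Type>ReadWrite</Type></Data>"
                + "<Textdetails Eligible=\"true\"><Code>1</Code><Name>ABC</Name></Textdetails>"
                + "</Request></logdata>";
        leafValues(xml).forEach(System.out::println);
    }
}
```

Attribute values such as RequestID can be read at each step with element.getAttribute("RequestID") if they are needed as well.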

Writing an XML in Matlab: How to add reference to DTD?

I'm trying to write an XML file using Matlab and I need to specify a DOCTYPE DTD in the header, but I haven't found any method for this in the Matlab documentation or in related questions. Every question involving a DTD reference is about how to read an XML file into Matlab.
What I am able to do now is an XML file of the type
<?xml version="1.0"?>
<root>
<child>
Hello world!
</child>
</root>
with the code
docNode = com.mathworks.xml.XMLUtils.createDocument('root');
root = docNode.getDocumentElement;
child = docNode.createElement('child');
child.appendChild(docNode.createTextNode('Hello World!'));
root.appendChild(child);
xmlwrite(docNode)
However, I need the file to include a DTD reference:
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "root.dtd" []>
<root>
<child>
Hello world!
</child>
</root>
Is there any function in com.mathworks.xml.XMLUtils for this? Or will I have to open the generated XML and manually insert the DTD reference?
You can stay within the org.w3c.dom package and use the createDocumentType method of DOMImplementation. Its arguments are, in order, the qualified name, the public ID, and the system ID.
domImpl = docNode.getImplementation();
doctype = domImpl.createDocumentType('root', 'SYSTEM', 'root.dtd');
With this update the full sample code is:
docNode = com.mathworks.xml.XMLUtils.createDocument('root');
domImpl = docNode.getImplementation();
doctype = domImpl.createDocumentType('root', 'SYSTEM', 'root.dtd');
docNode.appendChild(doctype);
root = docNode.getDocumentElement;
child = docNode.createElement('child');
child.appendChild(docNode.createTextNode('Hello World!'));
root.appendChild(child);
xmlwrite(docNode)
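Output (since 'SYSTEM' is passed as the public ID, the declaration is written with the PUBLIC keyword rather than SYSTEM):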
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root PUBLIC "SYSTEM" "root.dtd">
<root>
<child>Hello World!</child>
</root>
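Since Matlab is calling the Java org.w3c.dom API here, the same calls can be sketched directly in Java. This version passes null as the public ID so that only a system ID is recorded, which is what a SYSTEM declaration needs; DoctypeExample and buildWithSystemDoctype are hypothetical names. In main, the system ID is also set explicitly as an output property so the printed result does not depend on whether the serializer copies the DOM doctype on its own.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only.
class DoctypeExample {

    /** Builds <root><child>Hello World!</child></root> with a SYSTEM doctype pointing at root.dtd. */
    static Document buildWithSystemDoctype() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // Arguments: qualified name, public ID (none here), system ID.
            DocumentType doctype = impl.createDocumentType("root", null, "root.dtd");
            Document doc = impl.createDocument(null, "root", doctype);

            Element child = doc.createElement("child");
            child.appendChild(doc.createTextNode("Hello World!"));
            doc.getDocumentElement().appendChild(child);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildWithSystemDoctype();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}
```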

Java appending an element to XML document

I am trying to append an element to my xml document so it looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<students>
</students>
However, it ends up looking like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<students/>
This is the code I am using:
// results is the new XML document I created using DocumentBuilder.newDocument();
Element root = results.createElement("students");
results.appendChild(root);
How come it isn't looking like how I want it to?
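Java's DOM implementation follows the XML specification, which says that an element with no content is empty and may be written either as a start-tag immediately followed by an end-tag or as an empty-element tag: https://www.w3.org/TR/REC-xml/#sec-starttags. So <students/> and <students></students> are the same element.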
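A quick way to convince yourself that the two spellings are the same thing to a parser, sketched with the JDK's DOM API (EmptyElementEquivalence and sameElement are hypothetical names): parse both forms and compare the resulting root elements with isEqualNode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class EmptyElementEquivalence {

    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** True when both strings parse to structurally equal root elements. */
    static boolean sameElement(String first, String second) {
        return parseRoot(first).isEqualNode(parseRoot(second));
    }

    public static void main(String[] args) {
        System.out.println(sameElement("<students/>", "<students></students>")); // true
        System.out.println(sameElement("<students/>", "<student/>"));            // false
    }
}
```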
