TransformerHandler to output CDATA sections

TransformerHandler to output CDATA sections - java

I am trying to output CDATA section using the below code. While other declarations are being honoured the CDATA sections still comes out as plain text without its enclosing tags (CDATA). What am i doing wrong?
private TransformerHandler getHandler(StringWriter sw) {
SAXTransformerFactory stf = (SAXTransformerFactory)SAXTransformerFactory.newInstance();
TransformerHandler th = null;
th = stf.newTransformerHandler();
th.getTransformer().setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "{ns1}elem");
th.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
th.setResult(new StreamResult(sw));
}

Try re-reading the JavaDoc section for OutputKeys.CDATA_SECTION_ELEMENTS: http://docs.oracle.com/javase/6/docs/api/javax/xml/transform/OutputKeys.html#CDATA_SECTION_ELEMENTS
... as well as the referenced explanation of how to specify a literal QName http://docs.oracle.com/javase/6/docs/api/javax/xml/transform/package-summary.html#qname-delimiter
The parameter value that you specify "{ns1}elem", does NOT look like it includes a namespace URI to me, but rather looks like a namespace prefix (ns1). Find out what the "xmlns:ns1" declaration is, and include the namespace URI in the literal QName.
Example (assuming the namespace declaration for the ns1 prefix looks like xmlns:ns1="http://softee.org" you should specify;
setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "{http://softee.org}elem");

Related

XML with different namespaces drilling down to needed value

I am trying to figure out how to go about getting the value of jxdm:ID from the following XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<My:Message
xmlns:Abcd="http://...."
xmlns:box-1="http://...."
xmlns:bulb="http://...."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://....stores.xsd">
<Abcd:StoreDataSection>
<Abcd:DataSection>
<Abcd:FirstStore>
<box-1:Response>
<box-1:DataSection>
<box-1:Release>
<box-1:Activity>
<bulb:Date>2017-04-29</bulb:Date>
<bulb:Store xsi:type="TPIR:Organization">
<bulb:StoreID>
<bulb:ID>D79G2102</bulb:ID>
</bulb:StoreID>
</bulb:Store>
</box-1:Activity>
</box-1:Release>
</box-1:DataSection>
</box-1:Response>
</Abcd:FirstStore>
</Abcd:DataSection>
</Abcd:StoreDataSection>
</ My:Message>
I keep getting "null" as the value of node
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
This is my current Java code:
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new File("c:/temp/testingNamespace.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//My/Message//Abcd/StoreDataSection/DataSection/FirstStore//box-1/Response/DataSection/Release/Activity//bulb/Store/StoreID/ID";
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
node.setTextContent("changed ID");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(new File("C:/temp/test-updated.xml")));
} catch (Exception e) {
System.out.println(e.getMessage());
}
How would the correct XPath be formatted in order for me to get that value and change it?
Update 1
So something like this?
String expression = "/My:Message/Abcd:StoreDataSection/Abcd:DataSection/Abcd:FirstStore/box-1:Response/box-1:DataSection/box-1:Release/box-1:Activity/bulb:Store/bulb:StoreID/bulb:ID";

The problem is that you should access to Node by prefix (if you want to) but in a different way, like: //bulb:StoreID if you want to access StorID for example.
Then again it would still not work because you need to tell XPath how to resolve namspaces prefixes.
You should check this answer : How to query XML using namespaces in Java with XPath?
for details on how to implement and use a NamespaceContext.
The bottom line is that you need to implement a javax.xml.namespace.NamespaceContext and set it to the XPath.
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContext context = new MyNamespaceContext();
xpath.setNamespaceContext(context);

Two things wrong here:
Your XML is not namespace-well-formed; it does not declare the used namespace prefixes.
Once namespace prefixes are properly declared in the XML and in your Java code, you use them in XPath via : not via /. So, it'd be not /Abcd/StoreDataSection but rather /Abcd:StoreDataSection (and so on for the rest of the steps in your XPath).
See also How does XPath deal with XML namespaces?
I am unable to change anything in the XML so I have to go with it as-is sadly.
Technically you might be able to use some XML tools with undeclared namespaces because this omission only renders the XML only namespace-not-well-formed. Many tools expect not only well-formed but also namespace-well-formed XML. (See Namespace-Well-Formed
for the difference)
Otherwise, see How to parse invalid (bad / not well-formed) XML? to repair your XML.

Handling change in newlines by XML transformation for CDATA from Java 8 to Java 11

With Java 9 there was a change in the way javax.xml.transform.Transformer with OutputKeys.INDENT handles CDATA tags. In short, in Java 8 a tag named 'test' containing some character data would result in:
<test><![CDATA[data]]></test>
But with Java 9 the same results in
<test>
<![CDATA[data]]>
</test>
Which is not the same XML.
I understood (from a source no longer available) that for Java 9 there was a workaround using a DocumentBuilderFactory with setIgnoringElementContentWhitespace=true but this no longer works for Java 11.
Does anyone know a way to deal with this in Java 11? I'm either looking for a way to prevent the extra newlines (but still be able to format my XML), or be able to ignore them when parsing the XML (preferably using SAX).
Unfortunately I don't know what the CDATA tag will actually contain in my application. It might begin or end with white space or newlines so I can't just strip them when reading the XML or actually setting the value in the resulting object.
Sample program to demonstrate the issue:
public static void main(String[] args) throws TransformerException, ParserConfigurationException, IOException, SAXException
{
String data = "data";
StreamSource source = new StreamSource(new StringReader("<foo><bar><![CDATA[" + data + "]]></bar></foo>"));
StreamResult result = new StreamResult(new StringWriter());
Transformer tform = TransformerFactory.newInstance().newTransformer();
tform.setOutputProperty(OutputKeys.INDENT, "yes");
tform.transform(source, result);
String xml = result.getWriter().toString();
System.out.println(xml); // I expect bar and CDATA to be on same line. This is true for Java 8, false for Java 11
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new StringReader(xml)));
String resultData = document.getElementsByTagName("bar")
.item(0)
.getTextContent();
System.out.println(data.equals(resultData)); // True for Java 8, false for Java 11
}
EDIT: For future reference, I've submitted a bug report to Oracle, and this is fixed in Java 14: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8223291

As your code relies on unspecified behavior, extra explicit code seems better:
You want indentation like:
tform.setOutputProperty(OutputKeys.INDENT, "yes");
tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
However not for elements containing a CDATA.
String xml = result.getWriter().toString();
// No indentation (whitespace) for elements with a CDATA section.
xml = xml.replaceAll(">\\s*(<\\!\\[CDATA\\[.*?]]>)\\s*</", ">$1</");
The regex uses:
(?s) DOT_ALL to have . match any character, also newline characters.
.*? the shortest matching sequence, to not match "...]]>...]]>".
Alternatively: In a DOM tree (preserving CDATA) you can retrieve all CDATA sections per XPath, and remove whitespace siblings using the parent element.

Creating namespace prefixed XML nodes in Java DOM

I am creating several XML files via Java and up to this point everything worked fine, but now I've run into a problem when trying to create a file with namespace prefixed nodes, i.e, stuff like <tns:node> ... </tns:node> using a refactored version of my code that's already working for normal xml files without namespaces.
The error getting thrown is:
org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: Ungültiges XML-Zeichen angegeben.
Sorry for the German in there, it says "invalid XML-sign specified".
The codeline where the error occurs:
Element mainRootElement = doc.createElement("tns:cmds xmlns:tns=\"http://abc.de/x/y/z\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://abc.de/x/y/z xyzschema.xsd\"");
To eliminate the possibility of the error resulting in escaping that rather long string or something among those lines I also tried just using Element mainRootElement = doc.createElement("tns:cmds");, however, this results in the same error.
That's why I figure it has something to do with the namespace declaration, i.e., the : used to do it, as that's the only "invalid" character I could think of in that string.
Can anyone confirm this is the source of the problem? If so, is there an easy solution to it? Can Java DOM use namespaced tags at all?
Edit: Whole method for reference
private void generateScriptXML()
{
DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;
try
{
icBuilder = icFactory.newDocumentBuilder();
Document doc = icBuilder.newDocument();
Element mainRootElement = doc.createElement("tns:cmds xmlns:tns=\"http://abc.de/x/y/z\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://abc.de/x/y/z xyzschema.xsd\"");
doc.appendChild(mainRootElement);
mainRootElement.appendChild(getAttributes(doc,"xxx", "yyy", "zzz"));
mainRootElement.appendChild(getAttributes(doc,"aaa", "bbb", "ccc"));
mainRootElement.appendChild(getAttributes(doc,"ddd", "eee", "fff"));
...
...
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult streamResult = new StreamResult(new File(vfsPath));
transformer.transform(source, streamResult);
}
catch (Exception e)
{
e.printStackTrace();
}
}

Wrong method, try the *NS variants:
Element mainRootElement = doc.createElementNS(
"http://abc.de/x/y/z", // namespace
"tns:cmds" // node name including prefix
);
First argument is the namespace, second the node name including the prefix/alias. Namespace definitions will be added automatically for the namespace if needed. It works to set them as attributes, too.
The namespace in your original source is http://abc.de/x/y/z. With the attribute xmlns:tns="http://abc.de/x/y/z" the alias/prefix tns is defined for the namespace. The DOM api will implicitly add namespaces for nodes created with the *NS methods.
xmlns and xml are reserved/default namespace prefixes for specific namespaces. The namespace for xmlns (namespace definitions) is http://www.w3.org/2000/xmlns/.
To add an xmlns:* attribute with setAttributeNS() use the xmlns namespace:
mainRootElement.setAttributeNS(
"http://www.w3.org/2000/xmlns/", // namespace
"xmlns:xsi", // node name including prefix
"http://www.w3.org/2001/XMLSchema-instance" // value
);
But even that is not needed. Just like for elements, the namespace definition will be added implicitly if you add an attribute node using it.
mainRootElement.setAttributeNS(
"http://www.w3.org/2001/XMLSchema-instance", // namespace
"xsi:schemaLocation", // node name including prefix
"http://abc.de/x/y/z xyzschema.xsd" // value
);
Namespaces Prefixes
If you see a nodename like xsi:schemaLocation you can resolve by looking for the xmlns:xsi attribute. This attribute is the namepace definition. The value is the actual namespace. So if you have an attribute xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" the node name can be resolved to {http://www.w3.org/2001/XMLSchema-instance}schemaLocation (Clark notation).
If you want to create the node you need 3 values:
the namespace: http://www.w3.org/2001/XMLSchema-instance
the local node name: schemaLocation
the prefix: xsi
The prefix is optional for element nodes, but mandatory for attribute nodes. The following three XMLs resolve all to the element node name {http://abc.de/x/y/z}cmds:
<tns:cmds xmlns:tns="http://abc.de/x/y/z"/>
<cmds xmlns="http://abc.de/x/y/z"/>
<other:cmds xmlns:other="http://abc.de/x/y/z"/>

Since moving to java 1.7 Xml Document Element does not indent

I'm trying to indent XML which generated by Transformer.
All the DOM Node are being Indent as expected except for the First Node - The Document Element.
document element does not start in a new line , just concat right after the XML Declaration.
This bug arise when I moved to java 1.7 , when using java 1.6 or 1.5 it does not happen.
My code :
ByteArrayOutputStream s = new OutputStreamWriter(out, "utf-8");
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","4");
transformer.transform(new DOMSource(doc), new StreamResult(s));
The output:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>bbbbb</b>
</a>
Anyone knows why ?
btw,
when I add the property
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
It work as expected , but the xml declaration is changed,
it now have the standalone property as well, and i don't want to change the xml declaration..

Ok ,
I found in Java API this :
If the doctype-system property is specified, the xml output method should output a document type declaration immediately before the first element.
SO I used this property
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
and it solve my problem with out changed my xml declartion.
Thanks.

Xalan has at some point changed the behavior regarding the newline character after the XML declaration.
OpenJDK (and thus Oracle JDK, too) has implemented a workaround for this problem. This workaround can be enabled by setting a special property on the Transformer object:
try {
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
} catch (IllegalArgumentException e) {
// Might be thrown by JDK versions not implementing the workaround.
}
This way, the old behavior (printing a newline character after the XML declaration) is restored without adding the standalone attribute to the XML declaration.

For me writing the XML declaration to the Writer or OutputStream before writing the XML and telling the transformer to omit declaration was the only thing that worked. The only other solution to preserve the spacing appears to be VTD-XML library.
StringBuilder sb = new StringBuilder();
sb.append("<?xml version=\"").append(doc.getXmlVersion()).append("\"");
sb.append(" encoding=\"").append(doc.getXmlEncoding()).append("\"");
if (doc.getXmlStandalone()) {
sb.append(" standalone=\"").append("" + doc.getXmlStandalone()).append("\"");
}
sb.append("?>").append("\n");
writer.write(sb.toString());
TransformerFactory tf = TransformerFactory.newInstance();
try {
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
}
catch (Exception e) {
//snipped out for brevity
}

This seems to be a problem (bug) of the XML implementaion in Java. The only way to get a linebreak after the XML declaration is to explicitly specify the standalone attribute. You may set it to no, which is the implicit default, even if it is completely irrelevant when not using a DTD.

Control order of XML attributes in outputed file in Java

How do I control the order that the XML attributes are listed within the output file?
It seems by default they are getting alphabetized, which the program I'm sending this XML to apparently isn't handling.
e.g. I want zzzz to show first, then bbbbb in the following code.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("requests");
doc.appendChild(root);
root.appendChild(request);
root.setAttribute("zzzzzz", "My z value");
root.setAttribute("bbbbbbb", "My b value");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);

The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.
Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.

I had the same issue when I used XML DOM API for writing file. To resolve the problem I had to use XMLStreamWriter. Attributes appear in a xml file in the order you write it using XMLStreamWriter.

XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.

If you don't want to use another framework just for a custom attribute order you can simply add an order identifier to the attributes.
<someElement a__price="32" b__amount="3"/>
After the XML serializer is done post process the raw XML like so:
public static String removeAttributeOrderIdentifiers(String xml) {
return xml.replaceAll(
" [a-z]__(.+?=\")",
"$1"
);
}
And you will get:
<someElement amount="3" price="32"/>

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

TransformerHandler to output CDATA sections - java

Related

XML with different namespaces drilling down to needed value

Handling change in newlines by XML transformation for CDATA from Java 8 to Java 11

Creating namespace prefixed XML nodes in Java DOM

Since moving to java 1.7 Xml Document Element does not indent

Control order of XML attributes in outputed file in Java

Categories

Resources