Append an element in an XML using DOM keeping the format

I have an XML file like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Empleado>
  <ConsultorTecnico>
    <Nombre>Pablo</Nombre>
    <Legajo antiguedad="4 meses">7778</Legajo>
  </ConsultorTecnico>
  <CNC>
    <Nombre>Brian</Nombre>
    <Legajo antiguedad="1 año, 7 meses">2134</Legajo>
  <Sueldo>4268.0</Sueldo>
  </CNC>
</Empleado>
What I want is to read the XML and append "Sueldo" (salary) at the same level as "Nombre" and "Legajo" inside the element "CNC". "Sueldo" must be "Legajo" times 2.
The code I have appends "Sueldo", as you can see in the XML above, but it does not indent it as it should. I'm using the output properties to indent (this XML was created the same way, using DOM):
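import java.io.File;
import java.io.FileOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Main
{
    public static void main(String[] args)
    {
        try
        {
            File xml = new File("C:\\Empleado.xml");
            if (xml.exists())
            {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.parse(xml);
                // The second <Legajo> (index 1) is the one inside <CNC>
                String legajo = doc.getElementsByTagName("Legajo").item(1).getFirstChild().getNodeValue();
                Element sueldo = doc.createElement("Sueldo");
                Node valorSueldo = doc.createTextNode(String.valueOf(Float.valueOf(legajo)*2));
                sueldo.appendChild(valorSueldo);
                Node cnc = doc.getElementsByTagName("CNC").item(0);
                cnc.appendChild(sueldo);
                DOMSource source = new DOMSource(doc);
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer t = tf.newTransformer();
                t.setOutputProperty(OutputKeys.INDENT, "yes");
                t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","2");
                // try-with-resources closes the file once the transform is done
                try (FileOutputStream fos = new FileOutputStream("C:\\Empleado.xml"))
                {
                    StreamResult sr = new StreamResult(fos);
                    t.transform(source,sr);
                }
            }
            else
                // "There is no XML file with that name in the directory"
                throw new Exception("No hay archivo XML con ese nombre en el directorio");
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
    }
}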
Thank you in advance, guys; I'd appreciate any help here!

Assuming your input file is the same as the output you've shown but without the Sueldo element, then the initial CNC element has five child nodes as far as the DOM is concerned:
The whitespace text node (newline and four spaces) between <CNC> and <Nombre>
The Nombre element node
The whitespace text node (newline and four spaces) between </Nombre> and <Legajo
The Legajo element node
The whitespace text node (newline and two spaces) between </Legajo> and </CNC>
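To see those five child nodes for yourself, here is a small sketch that parses a cut-down copy of the input and prints every child of the first CNC element, with the newline in each whitespace text node made visible. The class and method names are made up for illustration; it assumes the default (non-validating) DocumentBuilderFactory, which keeps whitespace-only text nodes in the tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListCncChildren {

    // Returns one line per child node of the first <CNC> element.
    public static List<String> describeChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getElementsByTagName("CNC").item(0).getChildNodes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Show the newline as \n so the whitespace is visible
                result.add("text [" + child.getNodeValue().replace("\n", "\\n") + "]");
            } else {
                result.add("element <" + child.getNodeName() + ">");
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Empleado>\n"
                + "  <CNC>\n"
                + "    <Nombre>Brian</Nombre>\n"
                + "    <Legajo antiguedad=\"1 a\u00f1o, 7 meses\">2134</Legajo>\n"
                + "  </CNC>\n"
                + "</Empleado>";
        for (String line : describeChildren(xml)) {
            System.out.println(line);
        }
    }
}
```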
You are inserting the Sueldo element after this final text node, which produces:
  <CNC>
    <Nombre>Brian</Nombre>
    <Legajo antiguedad="1 año, 7 meses">2134</Legajo>
  <Sueldo>4268.0</Sueldo></CNC>
and the INDENT output property simply moves the closing </CNC> tag to the next line, aligned with the opening one. To get the auto indentation to do the right thing you would need to remove all the whitespace-only text nodes from the initial tree.
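A minimal sketch of that first option, assuming the default JDK transformer (which honours the Xalan indent-amount property used in the question): strip the whitespace-only text nodes recursively, append Sueldo exactly as before, and let INDENT lay out the whole tree. StripAndReindent and its methods are illustrative names, and the input is passed as a string rather than read from C:\Empleado.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StripAndReindent {

    // Recursively removes text nodes that contain only whitespace.
    static void removeWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards: the NodeList is live and shrinks as children are removed
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespaceNodes(child);
            }
        }
    }

    public static String addSueldo(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        removeWhitespaceNodes(doc.getDocumentElement());

        // Same logic as the question: Sueldo = (second Legajo) * 2
        String legajo = doc.getElementsByTagName("Legajo").item(1).getTextContent();
        Element sueldo = doc.createElement("Sueldo");
        sueldo.setTextContent(String.valueOf(Float.parseFloat(legajo) * 2));
        doc.getElementsByTagName("CNC").item(0).appendChild(sueldo);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Empleado>\n  <ConsultorTecnico>\n    <Nombre>Pablo</Nombre>\n"
                + "    <Legajo antiguedad=\"4 meses\">7778</Legajo>\n  </ConsultorTecnico>\n"
                + "  <CNC>\n    <Nombre>Brian</Nombre>\n"
                + "    <Legajo antiguedad=\"1 a\u00f1o, 7 meses\">2134</Legajo>\n  </CNC>\n"
                + "</Empleado>";
        System.out.println(addSueldo(xml));
    }
}
```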
Alternatively, forget the auto-indentation and do it yourself: rather than adding Sueldo as the last child of CNC (after that final text node), add a newline-and-four-spaces text node immediately after the Legajo (i.e. before the last text node) and then add the Sueldo element after that.
As an entirely different approach, I would consider doing the transformation in XSLT rather than using the DOM APIs:
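The do-it-yourself variant could look like the sketch below (class and method names are hypothetical). Instead of hard-coding four spaces it copies whatever whitespace text node precedes Legajo, falling back to a newline plus four spaces, and it deliberately leaves INDENT off so the serializer writes the existing whitespace as-is.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManualIndentSueldo {

    public static String addSueldo(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // As in the question: the second <Legajo> is the one inside <CNC>
        Element legajo = (Element) doc.getElementsByTagName("Legajo").item(1);
        Node cnc = legajo.getParentNode();

        Element sueldo = doc.createElement("Sueldo");
        sueldo.setTextContent(String.valueOf(Float.parseFloat(legajo.getTextContent()) * 2));

        // Reuse the whitespace before <Legajo> (e.g. "\n    ") if there is one
        Node prev = legajo.getPreviousSibling();
        Node indent = (prev != null && prev.getNodeType() == Node.TEXT_NODE)
                ? prev.cloneNode(false)
                : doc.createTextNode("\n    ");

        // Insert indent + <Sueldo> before the whitespace that precedes </CNC>
        Node beforeClose = legajo.getNextSibling(); // null means "append at the end"
        cnc.insertBefore(indent, beforeClose);
        cnc.insertBefore(sueldo, beforeClose);

        // No INDENT property: the whitespace nodes already carry the layout
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Empleado>\n  <ConsultorTecnico>\n    <Nombre>Pablo</Nombre>\n"
                + "    <Legajo>7778</Legajo>\n  </ConsultorTecnico>\n"
                + "  <CNC>\n    <Nombre>Brian</Nombre>\n    <Legajo>2134</Legajo>\n  </CNC>\n"
                + "</Empleado>";
        System.out.println(addSueldo(xml));
    }
}
```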
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- ignore whitespace-only text nodes in the input -->
  <xsl:strip-space elements="*"/>
  <!-- and re-indent the output -->
  <xsl:output method="xml" indent="yes" />
  <!-- Copy everything verbatim except where otherwise specified -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>
  <!-- For CNC elements, add a Sueldo as the last child -->
  <xsl:template match="CNC">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
      <Sueldo><xsl:value-of select="Legajo * 2" /></Sueldo>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
which you could either run using the TransformerFactory API from Java code or using a standalone command-line XSLT processor.
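Running that stylesheet from Java with the TransformerFactory API might look like this. The stylesheet is inlined as a string only to keep the sketch self-contained (normally you would load it from a file with a StreamSource), and AddSueldoXslt is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AddSueldoXslt {

    // The stylesheet from the answer, inlined so the example is self-contained
    static final String XSLT =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
            + "<xsl:strip-space elements='*'/>"
            + "<xsl:output method='xml' indent='yes'/>"
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match='CNC'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/>"
            + "<Sueldo><xsl:value-of select='Legajo * 2'/></Sueldo>"
            + "</xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        // Compile the stylesheet, then apply it to the input document
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
                "<Empleado>\n  <CNC>\n    <Nombre>Brian</Nombre>\n"
                + "    <Legajo>2134</Legajo>\n  </CNC>\n</Empleado>"));
    }
}
```

One difference from the DOM version: XSLT 1.0 converts the number 4268 to the string "4268", not "4268.0", so the Sueldo text will not match the Float output of the question's code.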

XML does not intrinsically define any indentation or pretty-form. If you want it to be "indented", you need to insert content with newlines and spaces. In this case, you need content immediately after element Legajo and before element Sueldo.
To my taste, the best strategy is to ignore all formatting from XML files and use generalized prettyfiers immediately before human consumption. Or, better, give them good XML editors. If you have every program that manipulates XML files concerned about this detail, most of the benefits of XML are gone (and a lot of effort misused).
UPDATE: Just noticed that you are using element CNC to "position" the insert, not Legajo. The spaces-and-newline content needs to go immediately before the end of element CNC (that is, after element Sueldo).

Related

How to parse large xml document with DOM?

I want to parse an XML document that has the following characteristics:
it has no XML declaration
its elements can appear in no particular order
<employees>
<employee>
<details>
<name>Joe</name>
<age>34</age>
</details>
<address>
<street>test</street>
<nr>12</nr>
</address>
</employee>
<employee>
<address>....</address>
<details>
<!-- note the changed order of elements! -->
<age>24</age>
<name>Sam</name>
</details>
</employee>
</employees>
Output should be a csv:
name;age;street;nr
Joe,34,test,12
Sam,24,...
Problem: when using event-driven parsers like StAX/SAX, I would have to create a temporary Employee bean whose properties I set on each event node, and later on convert the bean to CSV.
But as my xml file is several GB in size, I'd like to prevent having to create additional bean objects for each entry.
Thus I probably have to use plain old DOM parsing? Correct me if I'm wrong; I'm happy to hear any suggestions.
I tried as follows. Problem is that doc.getElementsByTagName("employees") returns an empty nodelist, while I'd expect one xml element. Why?
StringBuilder sb = new StringBuilder();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
doc.getDocumentElement().normalize();
NodeList employeesList = doc.getElementsByTagName("employees");
for (int i = 0; i < employeesList.getLength(); i++) {
    Node employees = employeesList.item(i);
    if (employees.getNodeType() == Node.ELEMENT_NODE) {
        NodeList employeeList = ((Element) employees).getElementsByTagName("employee");
        for (int j = 0; j < employeeList.getLength(); j++) {
            Element employee = (Element) employeeList.item(j);
            // getElementsByTagName searches all descendants, so the order of
            // <details>/<address> and of their children does not matter
            String[] columns = {"name", "age", "street", "nr"};
            for (int k = 0; k < columns.length; k++) {
                Node value = employee.getElementsByTagName(columns[k]).item(0);
                if (k > 0) sb.append(',');
                sb.append(value == null ? "" : value.getTextContent());
            }
            sb.append('\n');
        }
    }
}

A DOM solution is going to use a lot of memory; a SAX/StAX solution is going to involve writing and debugging a lot of code. The ideal tool for this job is an XSLT 3.0 streamable transformation:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:template match="employee">
<xsl:value-of select="copy-of(.)!(.//name, .//age, .//street, .//nr)"
separator=","/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:transform>
NOTE
I originally wrote the select expression as copy-of(.)//(name, age, street, nr). This is incorrect, because the // operator sorts the results into document order, which we don't want. The use of ! and , carefully avoids the sorting.

Do not use a StringBuilder but write immediately to the file (Files.newBufferedWriter).
It is not a big deal to parse the XML manually, as there does not seem to be much complexity, nor any need for XML-based validation.
DOM parsing would build a document object model, which is exactly what you do not want.
StAX needs to build a full employee if the sub-elements are unordered.
So reading an employee yourself would not be that different.
Also, the XML does not seem to come from a proper XML writer, and you might need to patch text that is invalid in XML, like a bare & that should be &amp;.
If the XML is valid (you could have a Reader that adds <?xml ...> in front), scanning through the XML would be:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader r = f.createXMLStreamReader( ... );
while(r.hasNext()) {
r.next();
}
That easily allows maintaining a Map of employee values, started at <employee> and validated and written out at </employee>.
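A sketch of that idea with the JDK's StAX API (javax.xml.stream): one small Map per employee, filled as the leaf elements arrive in whatever order, and written out as a CSV row at </employee>, so only one employee is held in memory at a time. The column list and class name are assumptions based on the sample. It uses a comma for both header and rows (the question mixes ; and ,), leaves missing values empty, and does not do the &-patching mentioned above, so the input must already be well-formed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EmployeesToCsv {

    // CSV columns in output order (assumed from the sample in the question)
    static final List<String> COLUMNS = Arrays.asList("name", "age", "street", "nr");

    public static void convert(Reader in, Writer out) throws XMLStreamException, IOException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Map<String, String> employee = null; // values of the current <employee>
        String field = null;                 // column element we are inside, if any
        StringBuilder text = new StringBuilder();
        out.write(String.join(",", COLUMNS) + "\n");
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("employee")) {
                    employee = new HashMap<>();
                } else if (employee != null && COLUMNS.contains(name)) {
                    field = name;
                    text.setLength(0);
                }
            } else if ((event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) && field != null) {
                text.append(r.getText()); // text may arrive in several chunks
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals(field)) {
                    employee.put(field, text.toString().trim());
                    field = null;
                } else if (name.equals("employee") && employee != null) {
                    // Fixed column order, whatever order the input used
                    StringBuilder row = new StringBuilder();
                    for (int i = 0; i < COLUMNS.size(); i++) {
                        if (i > 0) row.append(',');
                        row.append(employee.getOrDefault(COLUMNS.get(i), ""));
                    }
                    out.write(row.append('\n').toString());
                    employee = null; // the map can now be garbage-collected
                }
            }
        }
        r.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees><employee><details><name>Joe</name><age>34</age></details>"
                + "<address><street>test</street><nr>12</nr></address></employee>"
                + "<employee><address><street>main</street><nr>7</nr></address>"
                + "<details><age>24</age><name>Sam</name></details></employee></employees>";
        StringWriter out = new StringWriter();
        convert(new StringReader(xml), out);
        System.out.print(out);
    }
}
```

For the multi-GB file you would pass a buffered file reader and a Files.newBufferedWriter instead of the string-based Reader and Writer used in main.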

Change xml namespace url in Java

I have a java REST API and we recently changed domain. The api is versioned although up to now this has involved adding removing elements across the versions.
I would like to change the namespaces if someone goes back to previous versions, but I am struggling. After some hacking about, I have realised that this is probably because I am changing the namespace of XML that is actually being referenced. I was thinking of it as a text document, but I guess the tooling does not treat it that way?
So, looking at this XML with the namespace URL veg.com:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://veg.com/app/api/apple/red"
xmlns:ns4="http://veg.com/app/banana" xmlns:ns5="http://veg.com/app/api/pear" xmlns:ns6="http://veg.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
<ns2:name>granny smith</ns2:name>
<ns2:flavour>sweet</ns2:flavour>
<ns2:origin>southwest region</ns2:origin>
...
</ns2:apple>
I would like to change the namespaces to fruit.com. This is a very hacky unit test which shows the broad approach that I have been trying...
@Test
public void testNamespaceChange() throws Exception {
Document appleDoc = load("apple.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
org.w3c.dom.Node node = (org.w3c.dom.Node) xpath.evaluate("//*[local-name()='apple']", appleDoc , XPathConstants.NODE);
NamedNodeMap nodeMap = node.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
if (nodeMap.item(i).getNodeName().startsWith("xmlns:ns")) {
nodeMap.item(i).setTextContent( nodeMap.item(i).getNodeValue().replace( "veg.com", "fruit.com"));
}
}
//Check values have been set
for (int i = 0; i < nodeMap.getLength(); i++) {
System.out.println(nodeMap.item(i).getNodeName());
System.out.println(nodeMap.item(i).getNodeValue());
System.out.println("----------------");
}
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), result);
System.out.println("XML IN String format is: \n" +
writer.toString());
}
So the result of this is that the loop of nodeMap items shows the updates taking hold
i.e. all updated along these lines
xmlns:ns1
http://fruit.com/app/api
-------------------------------------------
xmlns:ns2
http://fruit.com/app/api/apple
-------------------------------------------
xmlns:ns3
http://fruit.com/app/api/apple/red
-------------------------------------------
...
but when I print out the transformed document I get what I see in the API response...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://fruit.com/app/api/apple/red"
xmlns:ns4="http://fruit.com/app/banana" xmlns:ns5="http://fruit.com/app/api/pear" xmlns:ns6="http://fruit.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
The sibling (and further down the hierarchy) namespaces have been changed but ns1 and ns2 have remained unchanged.
Can anyone tell me why and whether there is a simple way for me to update them ? I guess the next step for me might be to stream the xml doc into a string, update them as text and then reload it as an xml document but I'm hoping I'm being defeatist and there is a more elegant solution ?

I would solve it with an XSLT like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*[namespace-uri()='http://veg.com/app/api/apple']" priority="1">
    <!-- name is an attribute value template, so local-name() needs braces -->
    <xsl:element name="{local-name()}" namespace="http://fruit.com/app/api/apple">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
This stylesheet combines the identity transform with a template which changes namespace of elements in http://veg.com/app/api/apple to http://fruit.com/app/api/apple.
I think it is much simpler than the Java code that you have. You would also be more flexible, should you find more differences between versions of your XML apart from just namespaces.
Please consider this to be a rough sketch. I wrote a book on XSLT some 15 years ago, but did not use XSLT for more than 6 or 7 years.
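For comparison, a DOM-only sketch aimed at why ns1 and ns2 did not change in the question: in a namespace-aware DOM each element and attribute carries its own namespace URI, and the serializer appears to emit declarations from those URIs rather than from the edited xmlns attributes, so the nodes themselves have to be moved with Document.renameNode (DOM Level 3). It assumes the document is parsed namespace-aware and that every URI to change starts with http://veg.com/; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RenameNamespaces {
    static final String OLD = "http://veg.com/";   // assumed old prefix of every URI
    static final String NEW = "http://fruit.com/";

    static String swap(String uri) {
        return uri != null && uri.startsWith(OLD) ? NEW + uri.substring(OLD.length()) : uri;
    }

    // Moves the element and its attributes to the new namespace, then recurses.
    static void rename(Document doc, Element el) {
        if (swap(el.getNamespaceURI()) != el.getNamespaceURI()) {
            el = (Element) doc.renameNode(el, swap(el.getNamespaceURI()), el.getNodeName());
        }
        // Snapshot first: renaming an attribute removes and re-adds it in the map
        NamedNodeMap attrs = el.getAttributes();
        List<Attr> snapshot = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) snapshot.add((Attr) attrs.item(i));
        for (Attr a : snapshot) {
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                a.setValue(swap(a.getValue()));  // the xmlns:nsN="..." declarations
            } else if (swap(a.getNamespaceURI()) != a.getNamespaceURI()) {
                doc.renameNode(a, swap(a.getNamespaceURI()), a.getNodeName());
            }
        }
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) rename(doc, (Element) child);
        }
    }

    public static String convert(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // otherwise getNamespaceURI() returns null
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        rename(doc, doc.getDocumentElement());
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<ns2:apple xmlns:ns2=\"http://veg.com/app/api/apple\""
                + " xmlns:ns1=\"http://veg.com/app/api\" ns1:id=\"1\">"
                + "<ns2:name>granny smith</ns2:name></ns2:apple>"));
    }
}
```

The reference comparisons (swap(...) != ...) rely on swap returning the very same String when nothing changes, which it does.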

Doing DOM Node-to-String transformation, but with namespace issues

So we have an XML Document with custom namespaces. (The XML is generated by software we don't control. It's parsed by a namespace-unaware DOM parser; standard Java7SE/Xerces stuff, but also outside our effective control.) The input data looks like this:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
.... 18 blarzillion lines of XML ....
<Thing CustomAttr:gibberish="borkborkbork" ... />
.... another 27 blarzillion lines ....
</MainTag>
The Document we get is usable and xpath-queryable and traversable and so on.
Converting this Document into a text format for writing out to a data sink uses the standard Transformer approach described in a hundred SO "how do I change my XML Document into a Java string?" questions:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter stringwriter = new StringWriter();
transformer.transform (new DOMSource(theXMLDocument), new StreamResult(stringwriter));
return stringwriter.toString();
and it works perfectly.
But now I'd like to transform individual arbitrary Nodes from that Document into strings. A DOMSource constructor accepts Node pointers just the same as it accepts a Document (and in fact Document is just a subclass of Node, so it's the same API as far as I can tell). So passing in an individual Node in the place of "theXMLDocument" in the snippet above works great... until we get to the Thing.
At that point, transform() throws an exception:
java.lang.RuntimeException: Namespace for prefix 'CustomAttr' has not been declared.
at com.sun.org.apache.xml.internal.serializer.SerializerBase.getNamespaceURI(Unknown Source)
at com.sun.org.apache.xml.internal.serializer.SerializerBase.addAttribute(Unknown Source)
at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.addAttribute(Unknown Source)
......
That makes sense. (The "com.sun.org.apache" is weird to read, but whatever.) It makes sense, because the namespace for the custom attribute was declared at the root node, but now the transformer is starting at a child node and can't see the declarations "above" it in the tree. So I think I understand the problem, or at least the symptom, but I'm not sure how to solve it though.
If this were a String-to-Document conversion, we'd be using a DocumentBuilderFactory instance and could call .setNamespaceAware(false), but this is going in the other direction.
None of the available properties for transformer.setOutputProperty() affect the namespaceURI lookup, which makes sense.
There is no such corresponding setInputProperty or similar function.
The input parser wasn't namespace aware, which is how the "upstream" code got as far as creating its Document to hand to us. I don't know how to hand that particular status flag on to the transforming code, which is what I really would like to do, I think.
I believe it's possible to (somehow) add a xmlns:CustomAttr="http://BlitherBlither" attribute to the Thing node, the same as the root MainTag had. But at that point the output is no longer identical XML to what was read in, even if it "means" the same thing, and the text strings are eventually going to be compared in the future. We wouldn't know if it were needed until the exception got thrown, then we could add it and try again... ick. For that matter, changing the Node would alter the original Document, and this really ought to be a read-only operation.
Advice? Is there some way of telling the Transformer, "look, don't stress your dimwitted little head over whether the output is legit XML in isolation, it's not going to be parsed back in on its own (but you don't know that), just produce the text and let us worry about its context"?

Given your quoted error message "Namespace for prefix 'CustomAttr' has not been declared.",
I'm assuming that your pseudo code is along the lines of:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
.... 18 blarzillion lines of XML ....
<Thing CustomAttr:attributeName="borkborkbork" ... />
.... another 27 blarzillion lines ....
</MainTag>
With that assumption, here's my suggestion:
So you want to extract the "Thing" node from the "big" XML. The standard approach is to use a little XSLT to do that. You prepare the XSL transformation with:
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new File("isolate-the-thing-node.xslt")));
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setParameter("elementName", stringWithCurrentThing); // parameterize transformation for each Thing
...
EDIT: @Ti, please note the parameterization instruction above (and below in the XSLT).
The file 'isolate-the-thing-node.xslt' could be a flavour of the following:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:custom0="http://BlahBlahBlah"
    xmlns:custom1="http://BlitherBlither"
    version="1.0">
  <xsl:param name="elementName">to-be-parameterized</xsl:param>
  <xsl:output encoding="utf-8" indent="yes" method="xml" omit-xml-declaration="no" />
  <xsl:template match="/*" priority="2" >
    <!--<xsl:apply-templates select="//custom0:Thing" />-->
    <!-- changed to parameterized selection: -->
    <xsl:apply-templates select="custom0:*[local-name()=$elementName]" />
  </xsl:template>
  <xsl:template match="node() | @*" priority="1">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Hope that gets you over the "Thing" thing :)

I have managed to parse the provided document, get the Thing node and print it without issues.
Take a look at the Working Example:
Node rootElement = d.getDocumentElement();
System.out.println("Whole document: \n");
System.out.println(nodeToString(rootElement));
Node thing = rootElement.getChildNodes().item(1);
System.out.println("Just Thing: \n");
System.out.println(nodeToString(thing));
nodeToString:
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
Output:
Whole document:
<?xml version="1.0" encoding="UTF-8"?><MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
<Thing CustomAttr="borkborkbork"/>
</MainTag>
Just Thing:
<?xml version="1.0" encoding="UTF-8"?><Thing CustomAttr="borkborkbork"/>
When I try the same code with CustomAttr:attributeName, as suggested by @marty, it fails with the original exception, so it looks like somewhere in your original XML you are prefixing an attribute or node with that custom CustomAttr namespace.
In that case you can work around the problem with setNamespaceAware(true), which will include the namespace information on the Thing node itself:
<?xml version="1.0" encoding="UTF-8"?><Thing xmlns:CustomAttr="http://BlitherBlither" CustomAttr:attributeName="borkborkbork" xmlns="http://BlahBlahBlah"/>
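A minimal sketch of the setNamespaceAware(true) route, assuming you can control how the Document is built (the question says the parser is outside your control, in which case this does not apply). The element is looked up by namespace URI and local name; IsolateThing is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IsolateThing {

    public static String thingToString(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // every node keeps its namespace URI
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        // Look the element up by namespace URI + local name
        Node thing = doc.getElementsByTagNameNS("http://BlahBlahBlah", "Thing").item(0);

        // Serializing the element alone: the needed xmlns declarations are re-emitted on it
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(thing), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(thingToString(
                "<MainTag xmlns=\"http://BlahBlahBlah\" xmlns:CustomAttr=\"http://BlitherBlither\">"
                + "<Thing CustomAttr:gibberish=\"borkborkbork\"/></MainTag>"));
    }
}
```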

Using regexp in java to modify an xml

I'm trying to change an xml by using regular expressions in java, but I can't find the right way. I have an xml like this (simplified):
<ROOT>
<NODE ord="1" />
<NODE ord="3,2" />
</ROOT>
The XML actually represents a sentence with its nodes, chunks ... in two languages, and has more attributes. Each sentence is loaded into two RichTextAreas (one for the source sentence and the other for the translated one).
What I need to do is add a style attribute to every node that has a specific value in its ord attribute (this style attribute will show correspondences between the two languages, like Google Translate does when you mouse over a word). I know this could be done using DOM (getting all the NODE nodes and then checking the ord attribute one by one), but I am looking for the fastest way to make the change, as it is going to execute on the client side of my GWT app.
When the ord attribute has a single value (like in the first node), it is easy to do by just taking the XML as a string and using the replaceAll() function. The problem is when the attribute has composite values (like in the second node).
For example, how could I add that attribute if the value I'm looking for is 2? I believe this could be done using regular expressions, but I can't work out how. Any hint or help would be appreciated (even if it doesn't use regexp and the replaceAll function).
Thanks in advance.

XPath can do this for you. You could select:
/ROOT/NODE[contains(concat(',', @ord, ','), ',2,')]
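On a regular JVM (the javax.xml.xpath API is not available in GWT client code), the same expression could drive a DOM update like the sketch below. Binding the searched value as the XPath variable $ord through an XPathVariableResolver avoids pasting it into the expression string; the class name, helper methods and style value are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAddStyle {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Adds style="..." to every NODE whose comma-separated ord list contains ordValue;
    // returns how many nodes were changed.
    public static int addStyle(Document doc, String ordValue, String style)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $ord instead of concatenating ordValue into the expression
        xpath.setXPathVariableResolver(
                qname -> "ord".equals(qname.getLocalPart()) ? ordValue : null);
        NodeList hits = (NodeList) xpath.evaluate(
                "/ROOT/NODE[contains(concat(',', @ord, ','), concat(',', $ord, ','))]",
                doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            ((Element) hits.item(i)).setAttribute("style", style);
        }
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ROOT><NODE ord=\"1\" /><NODE ord=\"3,2\" /></ROOT>");
        System.out.println(addStyle(doc, "2", "background:yellow") + " node(s) styled");
    }
}
```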
Since you intend to use GWT on the client, you could give gwtxslt a try. With it you could specify an XSLT stylesheet to do the transformation (i.e. adding the attribute) for you:
XsltProcessor processor = new XsltProcessor();
processor.importStyleSheet(styleSheetText);
processor.importSource(sourceText);
processor.setParameter("ord", "2");
processor.setParameter("style", "whatever");
String resultString = processor.transform();
// do something with resultString
where styleSheetText could be an XSLT document along the lines of
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="ord" select="''" />
  <xsl:param name="style" select="''" />
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="NODE">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <xsl:if test="contains(concat(',', @ord, ','), concat(',', $ord, ','))">
        <xsl:attribute name="style">
          <xsl:value-of select="$style" />
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Note that I use concat() to prevent partial matches in the comma-separated list that the value of @ord actually is.

String resultString = subjectString.replaceAll("<NODE ord=\"([^\"]*\\b2\\b[^\"]*)\" />", "<NODE ord=\"$1\" style=\"whatever\"/>");
will find any <NODE> tag that has a single ord attribute with a value of "2" (or "1,2", "2,3" or "1,2,3", but not "12") and add a style attribute.
This is quick and dirty, and rightfully advised against by many here, but for a one-off quick job it should be OK.
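To try the one-liner in isolation, here it is wrapped in a method (RegexAddStyle is an illustrative name). The last input in main shows the fragility other answers warn about: the pattern expects exactly ` />`, so `<NODE ord="2"/>` without the space is left untouched.

```java
public class RegexAddStyle {

    // The replaceAll call from the answer, unchanged
    public static String addStyle(String xml) {
        return xml.replaceAll("<NODE ord=\"([^\"]*\\b2\\b[^\"]*)\" />",
                "<NODE ord=\"$1\" style=\"whatever\"/>");
    }

    public static void main(String[] args) {
        String input = "<ROOT>\n<NODE ord=\"1\" />\n<NODE ord=\"3,2\" />\n<NODE ord=\"12\" />\n</ROOT>";
        System.out.println(addStyle(input));
        // No space before "/>", so the pattern does not match here
        System.out.println(addStyle("<NODE ord=\"2\"/>"));
    }
}
```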
Explanation:
<NODE ord=" # Match <NODE ord=" verbatim
( # Match and capture...
[^"]* # any number of characters except "
\b2\b # "2" as a whole word (surrounded by non-alphanumerics)
[^"]* # any number of characters except "
) # End of capturing group
" /> # Match " /> verbatim

I'm trying to change an xml by using regular expressions in java, but I can't find the right way.
That's because there isn't a right way. Regular expressions are not the right way to manipulate XML. That's because XML is not a regular grammar (which is a technical term in computer science, not a generalized insult.)

It might sound like overkill, but I'd consider using the standard DOM parsers to read the fragment, modify it using setAttribute() calls, and then write it out again. I know you said that efficiency is important, but how long does this really take? Testing shows 60ms on my ageing 2GHz Pentium.
This approach will be more robust against comments, things split across lines etc. It is also much more likely to give you well-formed XML. Also things like your requirement of only doing it if certain values are present will become trivial.
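import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AddStyleExample {
    public static void main(final String[] args) {
        String input = "<ROOT> <NODE ord=\"1\" /> <NODE ord=\"3,2\" /> </ROOT>";
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            final Document doc = builder.parse(new InputSource(new StringReader(input)));
            NodeList tags = doc.getElementsByTagName("NODE");
            for (int i = 0; i < tags.getLength(); i++) {
                Element node = (Element) tags.item(i);
                // Styles every NODE; check node.getAttribute("ord") here to style only matches
                node.setAttribute("style", "example value");
            }
            StringWriter writer = new StringWriter();
            final StreamResult result = new StreamResult(writer);
            final Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), result);
            System.out.println(writer.toString());
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}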

How to make javax Transformer output HTML (no self-closing tags)?

I'm using a javax.xml.transform.Transformer to convert an XML file into an HTML file. It can happen that a div will have no content, which causes the Transformer to output <div/>, which breaks rendering.
I've searched and found that "You can change the xslt output to html instead of xml to avoid the problem with self closing tags", but that was for a different tool and I'm left wondering: how do I do that with a javax Transformer?

It looks like you create the transformer as normal, and then use Transformer.setOutputProperty to set the METHOD property to "html".
For example:
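import static javax.xml.transform.OutputKeys.INDENT;
import static javax.xml.transform.OutputKeys.METHOD;
import static javax.xml.transform.OutputKeys.OMIT_XML_DECLARATION;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import org.w3c.dom.DOMImplementation;

// Enclosing class added so the snippet compiles; its name is illustrative.
public class HtmlOutput {
  private static final DocumentBuilderFactory sDocumentFactory;
  private static DocumentBuilder sDocumentBuilder;
  private static DOMImplementation sDomImplementation;
  private static final TransformerFactory sTransformerFactory =
    TransformerFactory.newInstance();
  private static Transformer sTransformer;

  static {
    sDocumentFactory = DocumentBuilderFactory.newInstance();
    sDocumentFactory.setNamespaceAware( true );
    sDocumentFactory.setIgnoringComments( true );
    sDocumentFactory.setIgnoringElementContentWhitespace( true );
    try {
      sDocumentBuilder = sDocumentFactory.newDocumentBuilder();
      sDomImplementation = sDocumentBuilder.getDOMImplementation();
      sTransformer = sTransformerFactory.newTransformer();
      sTransformer.setOutputProperty( OMIT_XML_DECLARATION, "yes" );
      sTransformer.setOutputProperty( INDENT, "no" );
      // "html" makes empty elements come out as <div></div> instead of <div/>
      sTransformer.setOutputProperty( METHOD, "html" );
    } catch( final Exception ex ) {
      ex.printStackTrace();
    }
  }
}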
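A self-contained sketch to compare the two output methods on the same DOM: an empty div serialized once with METHOD "xml" and once with METHOD "html". The class and helper names are illustrative; INDENT is set to "no" explicitly because the html method may otherwise indent by default.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlVsXmlOutput {

    // Serializes doc with the given output method ("xml" or "html")
    static String serialize(Document doc, String method) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    // Builds <html><body><div/></body></html> in memory
    static Document emptyDivPage() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.appendChild(doc.createElement("div")); // a div with no content
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = emptyDivPage();
        System.out.println(serialize(doc, "xml"));
        System.out.println(serialize(doc, "html"));
    }
}
```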

The way to output valid HTML with XSLT is to use the <xsl:output> instruction with its method attribute set to html.
Here is a small example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<div>
<xsl:apply-templates select="x/y/z"/>
</div>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the following XML document:
<t/>
the wanted result is produced (the same result is produced by 8 different XSLT processors I am working with):
<div></div>
In case the unwanted output happens only with a specific XSLT processor, then this is an implementation issue with this particular processor and more an "xsltprocessors" than "xslt" question.

This answer in another thread doesn't seem to work in my case; even if I specify <xsl:output method="html"...> it still produces <div/> instead of <div></div>.
I don't know if my IDE or compiler is broken (IBM Rational Application Developer), but I'm using a work-around of detecting blank nodes and inserting single spaces in them. Less clean, but effective...
