Unable to do XSLT transformation on a latin2 xml document - java

I want to run an XSLT transformation on an XML document that is in Latin-2 (ISO-8859-2) encoding.
The XML document looks like this:
<?xml version="1.0" encoding="ISO-8859-2"?>
<Cégjegyzék>
<Kiadmány>
<CégAdatlap>...</CégAdatlap>
<CégAdatlap>...</CégAdatlap>
<CégAdatlap>...</CégAdatlap>
</Kiadmány>
</Cégjegyzék>
My XSLT tries to iterate through those elements.
<results>
<xsl:for-each select="//Cégjegyzék/Kiadmány/CégAdatlap" >
...
</xsl:for-each>
</results>
However, the output always looks like this:
<?xml version="1.0" encoding="UTF-8"?><results/>
I suspect that the XPath doesn't match because of an encoding issue, which is why I tried converting the document to UTF-8 like this:
byte[] latin2 = xml.getBytes("ISO-8859-2");
byte[] utf8 = new String(latin2, "ISO-8859-2").getBytes("UTF-8");
String utf8String = new String(utf8);
utf8String = utf8String.replace("ISO-8859-2","UTF-8");
return utf8String;
Unfortunately this didn't help.
Does anyone have a clue what the issue is and how I could fix it?
The code that starts the transformation:
TransformerFactory tf = TransformerFactory.newInstance();
File f = new File(searchProviderXslt.getXsltFilename());
StreamSource ss = new StreamSource(f);
Transformer transformer = tf.newTransformer(ss);
//String response = convert_response((String) resp, sc);
String response = (String) resp;
Source xml = new StreamSource((new StringReader(response)));
StreamResult xmlres = new StreamResult(new StringWriter());
transformer.transform(xml, xmlres);
String xmls = xmlres.getWriter().toString();

The most likely explanation is that the encoding of the XSLT stylesheet is declared incorrectly. It doesn't matter what the actual encoding is, as long as it is declared correctly. If the stylesheet is actually encoded in ISO-8859-2 but its XML declaration is omitted or says UTF-8, then the accented element names in it will be misinterpreted and won't match the names in your source document.
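A minimal, self-contained sketch of that point: the stylesheet and the source document are both encoded as ISO-8859-2 bytes, both declare encoding="ISO-8859-2", and the for-each then matches the accented element names. The class name Latin2XsltDemo and the sample values a and b are made up for illustration, and the default JDK TransformerFactory is assumed. In the question the stylesheet is read from a File, so the equivalent check there is that the file's bytes really are in the encoding its declaration names.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Latin2XsltDemo {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Unicode escapes keep this source file independent of javac's source encoding.
    static final String ROOT = "C\u00e9gjegyz\u00e9k";
    static final String ISSUE = "Kiadm\u00e1ny";
    static final String SHEET = "C\u00e9gAdatlap";

    public static String run() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-2\"?>"
                + "<" + ROOT + "><" + ISSUE + ">"
                + "<" + SHEET + ">a</" + SHEET + ">"
                + "<" + SHEET + ">b</" + SHEET + ">"
                + "</" + ISSUE + "></" + ROOT + ">";

        // The stylesheet bytes are ISO-8859-2 AND its declaration says so.
        String xslt = "<?xml version=\"1.0\" encoding=\"ISO-8859-2\"?>"
                + "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:template match=\"/\"><results>"
                + "<xsl:for-each select=\"//" + ROOT + "/" + ISSUE + "/" + SHEET + "\">"
                + "<item><xsl:value-of select=\".\"/></item>"
                + "</xsl:for-each>"
                + "</results></xsl:template></xsl:stylesheet>";

        Transformer transformer = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new ByteArrayInputStream(xslt.getBytes(LATIN2))));

        StringWriter out = new StringWriter();
        // Feeding bytes (not a String) lets the parser honour each encoding declaration.
        transformer.transform(
                new StreamSource(new ByteArrayInputStream(xml.getBytes(LATIN2))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```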

Related

Java DOM Transformer - XML creation doesn't replace apostrophe and quotes in the final xml

I'm trying to create an XML document and return it as a response to the caller, based on the input.
The transformer works as expected for the most part, but it doesn't convert apostrophes and quotes to their XML entity equivalents. Below is the code I'm using:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
// root elements
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("template");
doc.appendChild(rootElement);
/* Adding attendant ID */
Element line = doc.createElement("line");
line.appendChild(doc.createTextNode("----&----<------>------'-----\"--------"));
Attr Attr1 = doc.createAttribute("Attr1");
Attr1.setValue("attribute value 1");
line.setAttributeNode(Attr1);
Attr Attr2 = doc.createAttribute("Attr2");
Attr2.setValue("attribute value 2");
line.setAttributeNode(Attr2);
rootElement.appendChild(line);
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
// Output to String
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.transform(source, result);
String strResult = writer.toString();
//return escapeXml(strResult);
System.out.println(strResult);
Resulting output
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<template>
<line Attr1="attribute value 1" Attr2="attribute value 2">----&amp;----&lt;------&gt;------'-----"--------</line>
</template>
Expected Result
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<template>
<line Attr1="attribute value 1" Attr2="attribute value 2">----&amp;----&lt;------&gt;------&apos;-----&quot;--------</line>
</template>
Initially I thought I could escape those characters before passing them to the transformer, but then the transformer replaced every ampersand with its escaped form "&amp;". If I replace the apostrophes or quotes after the final XML is created, the quotes around the attribute values are replaced as well.
I think this could be solved in two ways:
1. Escape &, <, >, ' and " myself before adding the text to the node, and have the transformer leave them alone.
2. Tell the transformer explicitly to convert ' and " to their XML equivalents.
Currently I don't know how to do either. Any help with this, or a better way to create valid XML, would be hugely appreciated.
Thanks.
Why do you want quotation marks and apostrophes to be escaped? XML doesn't require them to be escaped (except in attributes where they conflict with the attribute delimiters). The serializer knows what it's doing: trust it.
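To illustrate, here is a sketch (class and constant names are hypothetical) that builds the question's line element from raw, unescaped strings, serializes it with the default identity Transformer, and parses the result back. The serializer escapes what it has to (for example & and <, and " inside a double-quoted attribute value), and the parser restores the original strings exactly, so no manual escaping is needed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EscapeRoundTrip {
    static final String TEXT = "----&----<------>------'-----\"--------";
    static final String ATTR = "say \"hi\" it's";

    public static String serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("template");
        doc.appendChild(root);
        Element line = doc.createElement("line");
        line.setAttribute("Attr1", ATTR);            // raw value, no manual escaping
        line.appendChild(doc.createTextNode(TEXT));  // raw text, no manual escaping
        root.appendChild(line);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    // Parse the serialized XML again and return the <line> element.
    public static Element reparse(String xml) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Element) d.getElementsByTagName("line").item(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize();
        System.out.println(xml);
        Element line = reparse(xml);
        System.out.println("text survives:      " + TEXT.equals(line.getTextContent()));
        System.out.println("attribute survives: " + ATTR.equals(line.getAttribute("Attr1")));
    }
}
```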

Parsing an XML document with XPath: why is an <?xml ...?> header added to the result?

I searched Google first and found many results about how to parse an XML document with XPath. I have parsed it, but I want to convert a NodeList to a String, so I created a method for it:
private String processResult(Document responseDocument) throws XPathExpressionException, TransformerException {
NodeList soaphead = responseDocument.getElementsByTagName("xmlTagToTrasform");
StringWriter sw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(soaphead.item(0)), new StreamResult(sw));
String result = sw.toString();
return result;
}
This method works, but the transformer adds an <?xml version="1.0" encoding="UTF-8"?> declaration at the top of the result, and I don't want that. This is the result of the method:
<?xml version="1.0" encoding="UTF-8"?>
<xmlTagToTrasform>
<xmlTagToTrasform2>
.
.
.
.
</xmlTagToTrasform2>
</xmlTagToTrasform>
You can configure the transformer not to output the XML declaration, before you call transform:
serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
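Put together with the question's method, it could look like the sketch below. The wrapper element envelope and the demo() helper are hypothetical additions so the example runs on its own, and the unused XPathExpressionException has been dropped from the throws clause.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FragmentToString {
    static String processResult(Document responseDocument) throws TransformerException {
        NodeList nodes = responseDocument.getElementsByTagName("xmlTagToTrasform");
        StringWriter sw = new StringWriter();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // Must be set before transform(); otherwise the declaration is written.
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.transform(new DOMSource(nodes.item(0)), new StreamResult(sw));
        return sw.toString();
    }

    // Hypothetical input: the fragment wrapped in an "envelope" element.
    static String demo() throws Exception {
        String xml = "<envelope><xmlTagToTrasform><xmlTagToTrasform2>x</xmlTagToTrasform2>"
                + "</xmlTagToTrasform></envelope>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return processResult(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```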
XML is a markup language, and an XML document usually starts with this line to specify the version and the encoding. Strictly speaking, though, the declaration is optional in XML 1.0, which is why the transformer can be told to omit it.

Fixing format of print statement

I recently started working with DocumentBuilder to build a title with my XML. This is the code I am using right now:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream = new FileInputStream(new File("c:\\staff.xml"));
org.w3c.dom.Document doc = documentBuilderFactory.newDocumentBuilder().parse(inputStream);
StringWriter stw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(doc), new StreamResult(stw));
String xmldata = (stw.toString());
System.out.println(xmldata);
It builds the title fine, but the XML content starts on the same line as the XML declaration. Can someone show me how to alter this code to get <company> onto the second line?
Here is my output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><company>
<comp id="512">
<firstname>Brandon</firstname>
<lastname>Liens</lastname>
<empid>612</empid>
<rqid>51265</rqid>
</comp>
</company>
Staff.XML file:
<company>
<comp id="512">
<firstname>Brandon</firstname>
<lastname>Nyberg</lastname>
<empid>612</empid>
<rqid>51265</rqid>
</comp>
</company>
There is no standard output property that puts a newline after the XML declaration, but if you can live with removing the declaration entirely, you can add this line just after your Transformer serializer = ... line:
serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
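If you do want to keep a declaration, one workaround (an assumption on my part, not something the Transformer API offers) is to omit the serializer's declaration and write your own, followed by a line break, into the same StringWriter first. The hard-coded encoding="UTF-8" is also an assumption: it has to match however the string is eventually written out. The sample document is a trimmed copy of staff.xml parsed from a string instead of c:\staff.xml, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeclarationOnOwnLine {
    static String toXml(Document doc) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter stw = new StringWriter();
        // Workaround: write the declaration ourselves, then a line break,
        // then let the serializer append the document itself.
        stw.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        stw.write("\n");
        serializer.transform(new DOMSource(doc), new StreamResult(stw));
        return stw.toString();
    }

    static String demo() throws Exception {
        String xml = "<company><comp id=\"512\"><firstname>Brandon</firstname></comp></company>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return toXml(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```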

How do I add a namespace prefix to an XML DOM object?

I am trying to build an XML document using a specific namespace. The final document I am trying to generate is supposed to look like this:
<m:documentObject xmlns:m="http://www.myschema.com">
<sender>token</sender>
<receiver>token</receiver>
<payload>token</payload>
</m:documentObject>
Here is what I have so far:
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element requestElement = document.createElementNS("http://www.myschema.com", "documentObject");
document.appendChild(requestElement);
Element sender = document.createElement("sender");
requestElement.appendChild(sender);
Text senderText = document.createTextNode("Xmlsender");
sender.appendChild(senderText);
Element receiver = document.createElement("receiver");
requestElement.appendChild(receiver);
Text receiverText = document.createTextNode("Xmlreceiver");
receiver.appendChild(receiverText);
Element payload = document.createElement("payload");
requestElement.appendChild(payload);
Text payloadText = document.createTextNode("Xmlpayload");
payload.appendChild(payloadText);
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(requestElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.transform(source, result);
String xmlString = sw.toString();
System.out.println(xmlString);
For some reason, when I run the above, the namespace comes out without the prefix, as shown below:
<?xml version="1.0" encoding="utf-8"?>
<documentObject xmlns="http://www.myschema.com">
<sender>Xmlsender</sender>
<receiver>Xmlreceiver</receiver>
<payload>Xmlpayload</payload>
</documentObject>
What do I need to do so that the XML is exactly as shown in the first example, with the namespace prefix declared and used on the tag?
I am trying to create an XML string for a Spring-WS web service that expects a JAXB object in the format shown in the first example.
You can use setPrefix on the element.
But it is better to create the root element like this:
document.createElementNS("http://www.myschema.com", "m:documentObject");
Note also that passing null as the namespace URI to createElementNS (for example createElementNS(null, "sender")) is a supported way of forcing a null namespace. In your original example, however, this would not work as intended, because your document element effectively declares a default namespace by combining a namespace URI with no prefix.
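A sketch of the createElementNS approach applied to the whole question: the root is created with the qualified name m:documentObject, and the children with createElementNS(null, ...) so they stay in no namespace, as in the target document. The class name PrefixedRoot and the loop over name/text pairs are illustrative choices, and the default JDK serializer is assumed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrefixedRoot {
    static final String NS = "http://www.myschema.com";

    static String build() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // The qualified name "m:documentObject" gives the element both the namespace and the prefix.
        Element request = document.createElementNS(NS, "m:documentObject");
        document.appendChild(request);

        String[][] children = {
            {"sender", "Xmlsender"}, {"receiver", "Xmlreceiver"}, {"payload", "Xmlpayload"}
        };
        for (String[] pair : children) {
            // null namespace URI: the child stays unqualified, as in the target XML.
            Element child = document.createElementNS(null, pair[0]);
            child.appendChild(document.createTextNode(pair[1]));
            request.appendChild(child);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```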

java DOM xml file create - Have no tabs or whitespaces in output file

I have already looked through the posts on Stack Overflow, but nothing seems to help.
Here is what I have:
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(xmlDoc);
StreamResult result = new StreamResult(new File("C:\\testing.xml"));
transformer.transform(source, result);
and this is what I get as output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Satellite SatelliteName="" XmlFileVersion="">
<test0>
<test1>
<test2>
<test3>
<test4>
<test5>
<test6>
<test7>
<test8>
<test9/>
</test8>
</test7>
</test6>
</test5>
</test4>
</test3>
</test2>
</test1>
</test0>
</Satellite>
There are no tabs or spaces.
I set indent-number because of a possible Java bug, and I enabled OutputKeys.INDENT.
Any other ideas?
Edit 1 (after adarshr's fix):
I now have indentation. However, the Satellite root element is placed on the first line, right after the XML declaration, which it shouldn't be.
<?xml version="1.0" encoding="UTF-8"?><Satellite SatelliteName="" XmlFileVersion="">
  <test0>
    <test1>
      <test2>
        <test3>
          <test4>
            <test5>
              <test6>
                <test7>
                  <test8>
                    <test9>blah</test9>
                  </test8>
                </test7>
              </test6>
            </test5>
          </test4>
        </test3>
      </test2>
    </test1>
  </test0>
  <sdjklhewlkr/>
</Satellite>
Edit 2:
So now I have indentation, but there is no line feed after the XML declaration. How can I fix this?
Try setting the indent amount like this:
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
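A small sketch of this property combined with OutputKeys.INDENT, using a shortened version of the question's nested test elements built in memory and written to a String instead of C:\testing.xml (the class name IndentDemo is hypothetical). The {http://xml.apache.org/xslt}indent-amount key is specific to Xalan-based serializers such as the JDK's built-in one; other TransformerFactory implementations may ignore it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IndentDemo {
    static String build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element satellite = doc.createElement("Satellite");
        satellite.setAttribute("SatelliteName", "");
        doc.appendChild(satellite);

        // Nest test0 > test1 > test2, like the question's chain of elements.
        Element parent = satellite;
        for (int i = 0; i < 3; i++) {
            Element child = doc.createElement("test" + i);
            parent.appendChild(child);
            parent = child;
        }
        parent.appendChild(doc.createTextNode("blah"));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Xalan/JDK-specific: number of spaces per nesting level.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```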
I've played with Transformer, but never got it to work. I used the Xerces (Apache) library, which has always worked like a charm for me. Try something like:
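// Needs the Xerces-J jar (org.apache.xml.serialize.OutputFormat / XMLSerializer);
// that package is deprecated in later Xerces releases.
OutputFormat format = new OutputFormat(document);
format.setLineWidth(65);
format.setIndenting(true);
format.setIndent(2);
// try-with-resources makes sure the file is flushed and closed
try (Writer outxml = new FileWriter(new File("out.xml"))) {
    XMLSerializer serializer = new XMLSerializer(outxml, format);
    serializer.serialize(document);
}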
I faced the same problem some time back. The issue was that the TransformerFactory (or Transformer) implementation being loaded was different from the one Java intends to use.
To solve it, we also had to set a system property.
EDIT: Try this:
System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
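To check which implementation you are actually getting, a sketch like the following prints the class resolved by TransformerFactory.newInstance() (which honours the javax.xml.transform.TransformerFactory system property above) next to the JDK's built-in one. TransformerFactory.newDefaultInstance() assumes Java 9 or later; the class name WhichFactory is hypothetical.

```java
import javax.xml.transform.TransformerFactory;

public class WhichFactory {
    // Class chosen by the normal lookup: system property, jaxp.properties,
    // ServiceLoader, then the platform default.
    static String resolved() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Java 9+: always the JDK's built-in implementation, bypassing the lookup.
    static String builtIn() {
        return TransformerFactory.newDefaultInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("newInstance():        " + resolved());
        System.out.println("newDefaultInstance(): " + builtIn());
    }
}
```

If the two names differ, some library on the classpath (or a system property) is supplying its own TransformerFactory, which is the situation this answer describes.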
I can give you two pieces of advice:
1. You can use an XSL file for pretty output.
2. I found an interesting library, ode-utils-XXX.jar, with which you can simply write:
String result = "";
try {
result = DOMUtils.prettyPrint(doc);
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(result);
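For the first suggestion, a generic identity stylesheet with xsl:output indent="yes" is one possible XSL file for pretty output; the sketch below (class and method names are hypothetical) applies it to a string. xsl:strip-space removes existing whitespace-only text so the indentation comes out uniform, and the indent width is set with the same Xalan/JDK-specific indent-amount key shown earlier, since XSLT 1.0 itself has no way to set it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslPrettyPrint {
    // Identity stylesheet: copies every node and attribute unchanged,
    // drops whitespace-only text, and asks the serializer to indent.
    static final String PRETTY_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" indent=\"yes\"/>"
            + "<xsl:strip-space elements=\"*\"/>"
            + "<xsl:template match=\"@*|node()\">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String prettyPrint(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(PRETTY_XSL)));
        // Xalan/JDK-specific: spaces per nesting level.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<Satellite><test0><test1>blah</test1></test0></Satellite>"));
    }
}
```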
