How to handle (R) symbol during XML XSLT transformation - java

I have a UTF-8 XML document (passed as a string) which contains the following line:
<LongName>SomeName®</LongName>.
It should be transformed into another UTF-8 XML document by an XSLT transformation. The only problem is the ® symbol, which is transformed into two symbols: Â®
Here's the code:
public String transform(String inputXML) throws TransformerException {
    TransformerFactory factory = TransformerFactory.newInstance();
    OutputStream os = new ByteArrayOutputStream();
    InputStream transformationFile = getClass().getResourceAsStream(TRANSFORMER_PATH);
    Transformer transformer = factory.newTransformer(new StreamSource(transformationFile));
    InputStream is = new ByteArrayInputStream(inputXML.getBytes(Charset.forName("UTF-8")));
    Source input = new StreamSource(is);
    transformer.transform(input, new StreamResult(os));
    return os.toString();
}
So the question is - how to correctly transform ® to ® from UTF-8 to UTF-8 XML?

Your error is the last line:
return os.toString();
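Since os is a ByteArrayOutputStream, toString() has to convert the byte array to a String, and it uses the platform default encoding instead of UTF-8. Use return os.toString("UTF-8"); instead. Note that this overload is defined on ByteArrayOutputStream, so os must be declared as ByteArrayOutputStream rather than OutputStream for the call to compile.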
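A minimal sketch of that fix, assuming an identity transformation stands in for the real stylesheet. The class name Utf8ByteArrayTransform and its transform helper are illustrative and not from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8ByteArrayTransform {

    // Hypothetical helper: copies the input through an identity Transformer
    // (no stylesheet), which is enough to show the encoding round trip.
    public static String transform(String inputXml)
            throws TransformerException, UnsupportedEncodingException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Declared as ByteArrayOutputStream so that toString(String) is available.
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        transformer.transform(
                new StreamSource(new ByteArrayInputStream(
                        inputXml.getBytes(StandardCharsets.UTF_8))),
                new StreamResult(os));

        // Decode with the charset the transformer wrote, not the platform default.
        return os.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // \u00AE is the registered-trademark sign from the question.
        System.out.println(transform("<LongName>SomeName\u00AE</LongName>"));
    }
}
```

The input is written with a Unicode escape so that the example does not depend on the encoding of the source file itself.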

Instead of
InputStream is = new ByteArrayInputStream(inputXML.getBytes(Charset.forName("UTF-8")));
Source input = new StreamSource(is);
try
Source input = new StreamSource(new StringReader(inputXML));
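The same idea can be carried through to the output side, so that no byte-to-String conversion happens in your own code at all. This is only a sketch under assumptions: it uses an identity transformation instead of the real stylesheet, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CharacterStreamTransform {

    // Hypothetical helper: reads characters from a StringReader and writes
    // characters to a StringWriter, so no charset decoding is done by hand.
    public static String transform(String inputXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(inputXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<LongName>SomeName\u00AE</LongName>"));
    }
}
```

With a Reader as input, the parser works on characters directly, so an encoding named in the XML declaration of the string no longer has to match any byte encoding.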

Related

Java Transformer: How do you make its result into an OutputStream?

I am new to javax.xml.transform.Transformer.
I am applying an XSLT to an XML document and it works fine.
What I want to achieve is to write the output of that transformation to an OutputStream.
This is my code:
OutputStream outputStream = null;
InputStream agent = new FileInputStream("src/res/testxmlfile.xml");
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("src/res/trans.xslt"));
transformer.transform(new StreamSource(agent), outputStream ????????);
I know it can be used to write a file like this, but I want to write to an OutputStream object.
transformer.transform(new StreamSource(agent),
new StreamResult(new FileOutputStream("/result.xml")));
How can I pass an OutputStream to be used here?
This is the error I get when I pass the OutputStream:
Exception in thread "main" javax.xml.transform.TransformerException:
Result object passed to ''{0}'' is invalid.
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
.getOutputHandler(TransformerImpl.java:468)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
.transform(TransformerImpl.java:344)
at com.gohealth.TestXmlStream.main(TestXmlStream.java:75)
Use a StreamResult. It provides constructors to write to a File or an OutputStream:
Example using File:
transformer.transform(new StreamSource(agent), new StreamResult(file));
Example using FileOutputStream:
FileOutputStream outputStream = new FileOutputStream(new File("outputfile.xml"));
transformer.transform(new StreamSource(agent), new StreamResult(outputStream));
Example using ByteArrayOutputStream:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
transformer.transform(new StreamSource(agent), new StreamResult(outputStream));
byte[] bytes = outputStream.toByteArray();
Use a "StreamResult" constructed with an object that represents where you want the output. See http://docs.oracle.com/javase/7/docs/api/javax/xml/transform/stream/StreamResult.html

Transformer not reading Special Character from Document Object

I am trying to read XML data from a Document object and then use a Transformer with an XSL stylesheet to render that data to PDF.
My code is:
Document doc = toXML(arg1,arg2);
doc contains data like:
İlkyönetmeliği
within tags.
InputStream inputStream = new FileInputStream(xslFilePath);
transformer = factory.newTransformer(new StreamSource(inputStream));
transformer.setParameter("encoding", "UTF-8");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc.getDocumentElement()), res);
Special characters present in the XML are not rendered correctly and are displayed like
#lk yard#m.
I have also set the encoding to UTF-8, but the output still looks like the above.
It is not clear what causes your encoding problem because I cannot see how your document is read/constructed and how your transformation result res is set up. Try the following standalone example code which handles encoding with XSLT. Maybe you can even modify it gradually to use your actual data in order to see what goes wrong.
public static void main(String[] args) {
    try {
        String inputEncoding = "UTF-16";
        String xsltEncoding = "ASCII";
        String outputEncoding = "UTF-8";
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Build the input document as bytes in inputEncoding.
        OutputStreamWriter osw = new OutputStreamWriter(bos, inputEncoding);
        osw.write("<?xml version='1.0' encoding='" + inputEncoding + "'?>");
        osw.write("<root>İlkyönetmeliği</root>"); osw.close();
        byte[] inputBytes = bos.toByteArray();
        bos.reset();
        // Build the identity stylesheet as bytes in xsltEncoding.
        osw = new OutputStreamWriter(bos, xsltEncoding);
        osw.write("<?xml version='1.0' encoding='" + xsltEncoding + "'?>");
        osw.write("<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>");
        osw.write("<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>");
        osw.write("</xsl:stylesheet>"); osw.close();
        byte[] xsltBytes = bos.toByteArray();
        bos.reset();
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document d = db.parse(new InputSource(new InputStreamReader(new ByteArrayInputStream(inputBytes), inputEncoding)));
        // if encoding declaration correct, use: Document d = db.parse(new InputSource(new ByteArrayInputStream(inputBytes)));
        System.out.println(XPathFactory.newInstance().newXPath().evaluate("/root[1]", d));
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer(new StreamSource(new InputStreamReader(new ByteArrayInputStream(xsltBytes), xsltEncoding)));
        // if encoding declaration correct, use: Transformer t = tf.newTransformer(new StreamSource(new ByteArrayInputStream(xsltBytes)));
        StreamResult sr = new StreamResult(new OutputStreamWriter(bos, outputEncoding));
        t.setOutputProperty(OutputKeys.ENCODING, outputEncoding);
        t.transform(new DOMSource(d.getDocumentElement()), sr);
        byte[] outputBytes = bos.toByteArray();
        Scanner s = new Scanner(new InputStreamReader(new ByteArrayInputStream(outputBytes), outputEncoding));
        String output = s.useDelimiter("</>").next(); // read all
        s.close();
        System.out.println(output);
    } catch (Exception ex) {
        ex.printStackTrace(System.err);
    }
}
The example code applies the XSLT identity template to a minimal input containing the non-ASCII characters.
I output the string to check if it has been parsed correctly in the document using XPath. You may want to check your (intermediate) document if you know how to locate it with XPath.
Note that by default the parser tries to pick up the encoding declared in the XML declaration (the <?xml ... ?> line at the top of the file), if present, when reading an XML file. It assumes that the actual and declared encodings are the same. If they differ or the declaration is missing, you can enforce the actual encoding, e.g. by using an InputStreamReader as in the code above.
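To isolate just that last point, here is a small sketch that parses bytes which carry no XML declaration at all. It assumes the bytes are known to be ISO-8859-1; the class name and the parseRootText helper are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ForcedEncodingParse {

    // Hypothetical helper: decodes the bytes with the encoding the caller
    // knows to be correct, then hands the parser a character stream.
    public static String parseRootText(byte[] xmlBytes, Charset actualEncoding) throws Exception {
        InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(xmlBytes), actualEncoding));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // No XML declaration, and the bytes are ISO-8859-1 rather than UTF-8.
        byte[] latin1 = "<root>y\u00F6netmelik</root>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseRootText(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the raw byte stream instead would leave the parser to guess the encoding, which for a document without a declaration means it would not assume ISO-8859-1.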

How to keep character "&" from ISO-8859-1 to UTF-8

I have just written a Java file in Eclipse, using ISO-8859-1 as the file encoding.
In this file, I want to create a String like this (in order to build XML content and save it into a database):
// <image><img src="path_of_picture"></image>
String xmlContent = "<image><img src=\"" + path_of_picture+ "\"></image>";
In another file, I get this String and create a new String with this constructor:
String myNewString = new String(xmlContent.getBytes(), "UTF-8");
In order to be understood by an XML parser, my XML content must be converted to:
<image><img src="path_of_picture"></image>
Unfortunately, I can't work out how to write xmlContent to get this result in myNewString.
I tried two methods:
// First :
String xmlContent = "<image><img src=\"" + content + "\"></image>";
// But the result is just myNewString = <image><img src="path_of_picture"></image>
// and my XML parser can't get the content of <image/>
//Second :
String xmlContent = "<image><img src=\"" + content + "\"></image>";
// But the result is just myNewString = <image>&lt;img src="path_of_picture"&gt;</image>
Do you have any idea?
This is unclear. But Strings don't have an encoding. So when you write
String s = new String(someOtherString.getBytes(), someEncoding);
you will get various results depending on your default encoding setting (which is used for the getBytes() method).
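A short sketch of why this pattern is fragile. The charsets are passed explicitly here to make the result reproducible; the class name and the recode helper are hypothetical and only serve the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: encodes text with one charset and decodes the
    // resulting bytes with another, like new String(s.getBytes(), "UTF-8").
    public static String recode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";

        // Same charset on both sides: the text survives unchanged.
        String same = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Mismatched charsets: the single ISO-8859-1 byte for U+00E9 is not valid UTF-8.
        String broken = recode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(original.equals(same));   // true
        System.out.println(original.equals(broken)); // false
    }
}
```

At best the round trip returns the same String, so it never converts anything; at worst it damages the non-ASCII characters.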
If you want to read a file encoded with ISO-8859-1, you simply do:
read the bytes from the file: byte[] bytes = Files.readAllBytes(path);
create a string using the file's encoding: String content = new String(bytes, "ISO-8859-1");
If you need to write back the file with a UTF-8 encoding you do:
convert the string to bytes with UTF-8 encoding: byte[] utfBytes = content.getBytes("UTF-8");
write the bytes to the file: Files.write(path, utfBytes);
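The read and write steps above can be combined into one small sketch. The class name, the reencode helper, and the temporary file used in main are assumptions made for the example, not part of the original question.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReencodeFile {

    // Hypothetical helper: rewrites a file in place from one charset to another.
    public static void reencode(Path path, Charset from, Charset to) throws IOException {
        byte[] bytes = Files.readAllBytes(path);   // raw bytes, no charset involved yet
        String content = new String(bytes, from);  // decode with the file's actual encoding
        Files.write(path, content.getBytes(to));   // encode with the target encoding
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("reencode-demo", ".txt");
        try {
            Files.write(tmp, "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1));
            reencode(tmp, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
            // 'c', 'a', 'f' take one byte each, U+00E9 takes two bytes in UTF-8.
            System.out.println(Files.readAllBytes(tmp).length); // 5
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Using Charset constants instead of charset names avoids the checked UnsupportedEncodingException and any typo in the name.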
I don't think your question is really about encoding, but if you want to build a String like that in order to create XML content and save it into a database, you can use this code:
public static Document loadXMLFromString(String xml) throws Exception
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
return builder.parse(is);
}
Refer to this SO answer.

How to force javax xslt transformer to encode national characters using utf-8 and not html entities?

I'm working on a filter that should transform its output with a stylesheet. The important sections of the code look like this:
PrintWriter out = response.getWriter();
...
StringReader sr = new StringReader(content);
Source xmlSource = new StreamSource(sr, requestSystemId);
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setParameter("encoding", "UTF-8");
//same result when using ByteArrayOutputStream xo = new java.io.ByteArrayOutputStream();
StringWriter xo = new StringWriter();
StreamResult result = new StreamResult(xo);
transformer.transform(xmlSource, result);
out.write(xo.toString());
The problem is that national characters are encoded as html entities and not by using UTF. Is there any way to force transformer to use UTF-8 instead of entities?
You need to set the output method to text instead of (default) xml.
transformer.setOutputProperty(OutputKeys.METHOD, "text");
You should however also set the response encoding beforehand:
response.setCharacterEncoding("UTF-8");
And instruct the web browser to use the same encoding:
response.setContentType("text/html;charset=UTF-8");
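To see what the text output method does in isolation, here is a rough sketch outside of any servlet. It assumes an identity transformation rather than the real stylesheet, and the class and method names are invented for the example. Keep in mind that the text method serializes only text nodes, so element markup from the result tree is dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TextOutputMethodDemo {

    // Hypothetical helper: serializes the document with the "text" method,
    // which writes character data as-is and omits element markup.
    public static String toText(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "text");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Polish letters written as Unicode escapes to keep the source ASCII-only.
        System.out.println(toText("<root>\u017C\u00F3\u0142\u0107</root>"));
    }
}
```

Whether dropping the markup is acceptable depends on what the stylesheet produces, so this is worth checking against the actual output before relying on it.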

how to create an InputStream from a Document or Node

How can I create an InputStream object from an XML Document or Node object to be used in xstream? I need to replace the ??? with some meaningful code. Thanks.
Document doc = getDocument();
InputStream is = ???;
MyObject obj = (MyObject) xstream.fromXML(is);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Source xmlSource = new DOMSource(doc);
Result outputTarget = new StreamResult(outputStream);
TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
If you are using Java without any third-party libraries, you can create an InputStream using the code below:
/*
* Convert a w3c dom node to a InputStream
*/
private InputStream nodeToInputStream(Node node) throws TransformerException {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Result outputTarget = new StreamResult(outputStream);
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.transform(new DOMSource(node), outputTarget);
return new ByteArrayInputStream(outputStream.toByteArray());
}
One way to do it: Adapt the Document to a Source with DOMSource. Create a StreamResult to adapt a ByteArrayOutputStream. Use a Transformer from TransformerFactory.newTransformer to copy across the data. Retrieve your byte[] and stream with ByteArrayInputStream.
Putting the code together is left as an exercise.
public static InputStream document2InputStream(Document document) throws IOException {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
OutputFormat outputFormat = new OutputFormat(document);
XMLSerializer serializer = new XMLSerializer(outputStream, outputFormat);
serializer.serialize(document);
return new ByteArrayInputStream(outputStream.toByteArray());
}
This works if you are using the Apache Xerces implementation. You can also adjust the formatting through the OutputFormat object.
public static InputStream documentToPrettyInputStream(Document doc) throws IOException {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
XMLWriter xmlWriter = new XMLWriter(outputStream, OutputFormat.createPrettyPrint());
xmlWriter.write(doc);
xmlWriter.close();
InputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
return inputStream;
}
Use this if you happen to be using dom4j and need pretty-printed output.
