UTF-8 to UTF16 Parsing

UTF-8 to UTF16 Parsing - java

I have an XML that is UTF-8 and have some special characters in Chinese, I need to parse this xml.
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
factory.setIgnoringElementContentWhitespace(true);
factory.setNamespaceAware(true);
factory.setValidating(true);
//byte[] buffer = xmlMsg.getBytes("UTF-16");
logger.info("transformToUTP " + xmlMsg);
//byte[] buffer = soapMessage.getBytes();
//ByteArrayInputStream stream = new ByteArrayInputStream(buffer);
InputSource is = new InputSource(new ByteArrayInputStream(
xmlMsg.getBytes("UTF-16")));
Document doc = factory.newDocumentBuilder().parse(is);
//Document doc = factory.newDocumentBuilder().parse(
new InputSource(new StringReader(xmlMsg)));
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(getNameSpace());
XPathExpression soapBodyExpr = xpath.compile(BODY_XPATH_EXP);
Node soapBody = (Node) soapBodyExpr.evaluate(doc,
XPathConstants.NODE);
Node reqMsgNode = soapBody.getFirstChild();
I am getting a null pointer exception on reqMsgNode.

Do not convert xml into a string, parse it as is, use
DocummentBuilder.parse(File) or DocumentBuilder.parse(InputStream)
the parser will take encoding from xml declaration e.g. <?xml version="1.0" encoding="UTF-8"?>, and if it is missing then it will use UTF-8 by default

Related

Remove SOAP envelope

I have an InputStream containing a SOAP message, including the envelope. I don't know the contents of the body beforehand and therefore cannot create a Jaxb annotated class for it.
I've tried many ways, inlcuding a custom SOAPWrapper JaxB Class with XmlAnyElement and other ways. Currently I have this:
private InputStream removeSoapEnvelope(final InputStream inputStream) throws IOException, TransformerException
{
final SoapBody body = messageFactory.createWebServiceMessage(inputStream)
.getSoapBody();
final Transformer transformer = TransformerFactory.newInstance()
.newTransformer();
final DOMResult domResult = new DOMResult();
transformer.transform(body.getPayloadSource(), domResult);
final StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(domResult.getNode()), new StreamResult(writer));
byte[] barray = writer.toString()
.getBytes(StandardCharsets.UTF_8);
return new ByteArrayInputStream(barray);
}
It seems to work but is horribly inefficient. Is there no short and concise way of achieving this with standard libraries and without regex?
Thanks

Here's a solution using XPath to get the element (pure JaxB? not sure). Takes the document as a regular XML document so it should work for any I guess
FileInputStream fileIS;
fileIS = new FileInputStream(System.getProperty("user.home") + "/tmp/soap.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument;
xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression01 = "//*[local-name()='Body']";
Node currentNode = (Node) xPath.compile(expression01).evaluate(xmlDocument, XPathConstants.NODE);
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.setOutputProperty(OutputKeys.INDENT, "yes");
xform.transform(new DOMSource(currentNode), new StreamResult(buf));
System.out.println(buf.toString());
Result:
<soap:Body>
<incident xmlns="http://example.com">
<Company type="String">Test</Company>
</incident>
</soap:Body>

I ended up doing it with regex. All other options are too slow:
private InputStream removeSoapEnvelope(final InputStream inputStream) throws IOException
{
final String text = new String(inputStream.readAllBytes(), UTF_8);
final String replace = text.replaceAll("\\s*<\\/?(?:SOAP-ENV|soap):(?:.|\\s)*?>", "");
File file = File.createTempFile("temp", XML_NS_PREFIX);
Files.writeString(file.toPath(), replace);
return new FileInputStream(file);
}

creating space in end of xml tag in java

My xml tag is :
<Description/>
I want with space :
<Description />
How can I do this in Java?
I am signing xml document , in original file space has been used but when I used following code and print it, it printing without space.
String thisLine = "";
String xmlString = "";
BufferedReader br = new BufferedReader(new FileReader(originalXmlFilePath));
while ((thisLine = br.readLine()) != null) {
xmlString = xmlString + thisLine.trim();
}
br.close();
ByteArrayInputStream xmlStream = new ByteArrayInputStream(xmlString.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setValidating(false);
Document doc = dbf.newDocumentBuilder().parse
(xmlStream );
doc.setXmlStandalone(true);
DOMSignContext dsc = new DOMSignContext
(keyEntry.getPrivateKey(), doc.getDocumentElement());
javax.xml.crypto.dsig.XMLSignature signature = fac.newXMLSignature(si, ki);
signature.sign(dsc);
// Output the resulting document.
// OutputStream os = new FileOutputStream(new File(destnSignedXmlFilePath));
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
trans.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter writer = new StringWriter();
trans.transform(new DOMSource(doc), new StreamResult(writer));
String output = writer.getBuffer().toString();//.replaceAll("\n|\r", "");
System.out.println("output== "+output);

What you are doing wrong is signing an arbitrary unprocessed text instead of submitting a canonical version of your document (without spaces in tags, but also with sorted attributes, with quotes of the same type, etc.) to the digital signature computation.
The Canonical XML and Exclusive Canonical XML W3C recommendations specify a standard and comprehensive way to eliminate arbitrary differences.

Transformer not reading Special Character from Document Object

I am trying to read xml data from Document Object, and then using transformer to render the data inside the document object to pdf,using XSL,
My code is :
Document doc = toXML(arg1,arg2);
doc contains data like :
İlkyönetmeliği
with in tags
InputStream inputStream = new FileInputStream(xslFilePath);
transformer = factory.newTransformer(new StreamSource(inputStream));
transformer.setParameter("encoding", "UTF-8");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc.getDocumentElement()), res);
Special characters present in xml are not getting rendered accordingly and displaying like
#lk yard#m.
I have also set encoding to UTF-8 ,but still it is displaying like above.

It is not clear what causes your encoding problem because I cannot see how your document is read/constructed and how your transformation result res is set up. Try the following standalone example code which handles encoding with XSLT. Maybe you can even modify it gradually to use your actual data in order to see what goes wrong.
public static void main(String[] args) {
try {
String inputEncoding = "UTF-16";
String xsltEncoding = "ASCII";
String outputEncoding = "UTF-8";
ByteArrayOutputStream bos = new ByteArrayOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(bos, inputEncoding);
osw.write("<?xml version='1.0' encoding='" + inputEncoding + "'?>");
osw.write("<root>İlkyönetmeliği</root>"); osw.close();
byte[] inputBytes = bos.toByteArray();
bos.reset();
osw = new OutputStreamWriter(bos, xsltEncoding);
osw.write("<?xml version='1.0' encoding='" + xsltEncoding + "'?>");
osw.write("<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>");
osw.write("<xsl:template match='#*|node()'><xsl:copy><xsl:apply-templates select='#*|node()'/></xsl:copy></xsl:template>");
osw.write("</xsl:stylesheet>"); osw.close();
byte[] xsltBytes = bos.toByteArray();
bos.reset();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.parse(new InputSource(new InputStreamReader(new ByteArrayInputStream(inputBytes), inputEncoding)));
// if encoding declaration correct, use: Document d = db.parse(new InputSource(new ByteArrayInputStream(inputBytes)));
System.out.println(XPathFactory.newInstance().newXPath().evaluate("/root[1]", d));
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(new StreamSource(new InputStreamReader(new ByteArrayInputStream(xsltBytes), xsltEncoding)));
// if encoding declaration correct, use: Transformer t = tf.newTransformer(new StreamSource(new ByteArrayInputStream(xsltBytes)));
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, outputEncoding));
t.setOutputProperty(OutputKeys.ENCODING, outputEncoding);
t.transform(new DOMSource(d.getDocumentElement()), sr);
byte[] outputBytes = bos.toByteArray();
Scanner s = new Scanner(new InputStreamReader(new ByteArrayInputStream(outputBytes), outputEncoding));
String output = s.useDelimiter("</>").next(); // read all
s.close();
System.out.println(output);
} catch (Exception ex) {
ex.printStackTrace(System.err);
}
The example code applies the XSLT identity template to a minimal input containing the non-ASCII characters.
I output the string to check if it has been parsed correctly in the document using XPath. You may want to check your (intermediate) document if you know how to locate it with XPath.
Note that, if present, the parser tries to pick up the encoding declared in the XML processing instruction (PI) by default when reading an XML file. It assumes that actual and declared encoding are the same. If they differ or the PI is missing, then you can enforce the actual encoding e.g. by using an InputStreamReader as in the code above.

Parse XML string on BlackBerry

I am trying to parse XML with the following code, but StringReader is not available in the BlackBerry JDE. What is the right way to do this?
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);

String xmlString = "<xml> </xml>" // your xml string
ByteArrayInputStream bis = new ByteArrayInputStream(xmlString.getBytes("UTF-8"));
Document doc = builder.parse(bis);
Try this out

If you want to build a DOM from data coming from a server, you're much better off parsing the InputStream directly with a DocumentBuilder rather than reading the data into a String and trying to work with that. One way is:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

DOMImplementationLS serialize to String in UTF-8 in Java

reading the documentation for java org.w3c.dom.ls it seems as a Element only can be serialized to a String with the java native string encoding, UTF-16. I need however to create a UTF-8 string, escaped or what not, I understand that it still will be a UTF-16 String. Anyone has an idea to get around this?
I need the string to pass in to a generated WS client that will consume the String, then it should be UTF-8.
the code i use to create the string:
DOMImplementationRegistry domImplementationRegistry = DOMImplementationRegistry.
DOMImplementationLS domImplementationLS = (DOMImplementationLS) REGISTRY.getDOMImplementation("LS");
LSSerializer writer = domImplementationLS.createLSSerializer();
String result = writer.writeToString(element);

You can still use DOMImplementationLS:
DOMImplementationRegistry domImplementationRegistry = DOMImplementationRegistry.
DOMImplementationLS domImplementationLS = (DOMImplementationLS)REGISTRY.getDOMImplementation("LS");
LSOutput lsOutput = domImplementationLS.createLSOutput();
lsOutput.setEncoding("UTF-8");
Writer stringWriter = new StringWriter();
lsOutput.setCharacterStream(stringWriter);
lsSerializer.write(doc, lsOutput);
String result = stringWriter.toString();

I find that the most flexible way of serializing a DOM to String is to use the javax.xml.transform API:
Node node = ...
StringWriter output = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), new StreamResult(output));
String xml = output.toString();
It's not especially elegant, but it should give you better control over output encoding.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

UTF-8 to UTF16 Parsing - java

Do not convert xml into a string, parse it as is, use DocummentBuilder.parse(File) or DocumentBuilder.parse(InputStream) the parser will take encoding from xml declaration e.g. <?xml version="1.0" encoding="UTF-8"?>, and if it is missing then it will use UTF-8 by default

Related

Remove SOAP envelope

creating space in end of xml tag in java

Transformer not reading Special Character from Document Object

Parse XML string on BlackBerry

DOMImplementationLS serialize to String in UTF-8 in Java

Categories

Resources