How to Canonicalize a Stax XML object

I want to canonicalize a StAX object. The program currently does this with DOM, but DOM cannot handle big XML documents (around 1 GB), so StAX looks like the solution.
The code I have is:
File file=new File("big-1gb.xml");
org.apache.xml.security.Init.init();
DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = dfactory.newDocumentBuilder();
Document doc = documentBuilder.parse(file);
Canonicalizer c14n = Canonicalizer.getInstance("http://www.w3.org/TR/2001/REC-xml-c14n-20010315");
outputBytes = c14n.canonicalizeSubtree(doc.getElementsByTagName("SomeTag").item(0));
The idea is to do the same as the code above, but with StAX.
Thanks :)

I solved this problem with the XOM library. Here is the equivalent code:
// Write the canonical form straight into the byte buffer. Wrapping the buffer in an
// ObjectOutputStream would mix Java serialization framing bytes into the digest input.
ByteArrayOutputStream bytestream = new ByteArrayOutputStream();
// The false argument turns validation off, which avoids the validation exception XOM throws.
nu.xom.Builder builder = new nu.xom.Builder(false, new nu.xom.samples.MinimalNodeFactory());
try {
    nu.xom.canonical.Canonicalizer outputter = new nu.xom.canonical.Canonicalizer(bytestream);
    nu.xom.Document input = builder.build(file);
    outputter.write(input);
}
catch (Exception ex) {
    ex.printStackTrace();
}
// Hash the canonical bytes and Base64-encode the digest (Base64 is Apache Commons Codec).
MessageDigest sha1 = MessageDigest.getInstance("SHA1");
sha1.update(bytestream.toByteArray());
byte[] salidasha1 = sha1.digest();
String tagDigestValue = new String(Base64.encodeBase64(salidasha1));
This code can handle files of 200 MB and takes about 7 minutes to do the canonicalization. If you have doubts, see the XOM documentation; it is pretty clear and has a lot of examples.
Thanks to all for your comments :)
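The answer above keeps the whole canonical form in a ByteArrayOutputStream before hashing it. If only the digest is needed, one possible refinement is to hash the bytes as they are written, using java.security.DigestOutputStream. The sketch below is not part of the original answer: StreamingDigest and ByteProducer are hypothetical names, the producer is a stand-in for the canonicalizer, and OutputStream.nullOutputStream() assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not part of XOM or of the original answer.
final class StreamingDigest {

    /** Anything that writes bytes to a stream, for example a canonicalizer. */
    interface ByteProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Returns the Base64-encoded SHA-1 of whatever the producer writes. */
    static String sha1Base64(ByteProducer producer)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        // Each byte updates the digest as it passes through and is then discarded,
        // so the canonical form is never held in memory as a second copy.
        try (DigestOutputStream out =
                 new DigestOutputStream(OutputStream.nullOutputStream(), sha1)) {
            producer.writeTo(out);
        }
        return Base64.getEncoder().encodeToString(sha1.digest());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the canonicalizer: writes a small fixed document.
        String digest = sha1Base64(out ->
            out.write("<SomeTag></SomeTag>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(digest);
    }
}
```

With XOM, the lambda body would presumably become new nu.xom.canonical.Canonicalizer(out).write(input), since that constructor accepts any OutputStream. The XOM document itself still lives in memory; only the extra byte-array copy is avoided.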

Related

text encoding in ftp download app causing errors

I have created a script to download files from an FTP endpoint. I was assured that the files would be in UTF-8 encoding, but after downloading and parsing the XML we see badly formatted text. The process is to download the file, convert the XML to JSON, then parse it and convert it to a different format. After converting to JSON we see, for example, the following in place of Chinese/Hindi/Arabic characters:
"Size": 3227,
"Title": "??? ???? ????? ?? ???? ?? 5 ??? ?? ??? ?? ?? ???? ?? ????????? ?? ???? ???? ??????-Pakistan new army chief
The code snippet is the following:
ftp.connect("xx.xxx.xxx.xx");
ftp.login("xxxx", "xxxxx");
ftp.enterLocalPassiveMode();
ftp.setControlEncoding("UTF-8");
ftp.setFileType(FTP.BINARY_FILE_TYPE);
...
String remoteFile1 = ftp.printWorkingDirectory() + "/" + file.getName();
File downloadFile1 = new File(destFolder + "/" + "/" + file.getName());
OutputStream outputStream1 = new BufferedOutputStream(new FileOutputStream(downloadFile1));
boolean success = ftp.retrieveFile(remoteFile1, outputStream1);
outputStream1.flush();
outputStream1.close();
....
DocumentBuilderFactory docFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
doc = docBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
StringWriter sw = new StringWriter();
trans.transform(new DOMSource(doc), new StreamResult(sw));
String xml = sw.toString();
JSONObject xmlJSONObj = XML.toJSONObject(xml);
String jsonPrettyPrintString = xmlJSONObj.toString(4);
jsonMapper.configure(SerializationFeature.WRAP_ROOT_VALUE, false);...
Can someone advise how to make sure the encoding is handled so that foreign characters are output correctly?
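Literal question marks usually mean that text was encoded to bytes with a charset that cannot represent the characters, often the platform default. Whether that is what happens here cannot be confirmed from the snippet, so the following is only a minimal sketch of the effect and of the usual remedy, which is to name UTF-8 explicitly wherever text is turned into bytes. CharsetRoundTrip and its methods are hypothetical names, and the sample title is made up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demonstration class, not taken from the question.
final class CharsetRoundTrip {

    /** Encodes with US-ASCII, which cannot represent the Hindi characters. */
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Writes and re-reads the text, naming UTF-8 explicitly on both sides. */
    static String throughUtf8File(String text) throws IOException {
        Path tmp = Files.createTempFile("title", ".json");
        try {
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Six Devanagari characters followed by ASCII text.
        String title = "\u0928\u092E\u0938\u094D\u0924\u0947-Pakistan";
        // Unmappable characters are replaced by '?' during encoding.
        System.out.println(throughAscii(title));
        // The UTF-8 round trip preserves the text.
        System.out.println(throughUtf8File(title).equals(title));
    }
}
```

In the code from the question, the places worth checking are wherever the JSON string is written to a file, a socket or the console, since the DOM and the StringWriter hold characters rather than bytes.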

Remove SOAP envelope

I have an InputStream containing a SOAP message, including the envelope. I don't know the contents of the body beforehand and therefore cannot create a JAXB-annotated class for it.
I've tried many ways, including a custom SOAPWrapper JAXB class with XmlAnyElement. Currently I have this:
private InputStream removeSoapEnvelope(final InputStream inputStream) throws IOException, TransformerException
{
final SoapBody body = messageFactory.createWebServiceMessage(inputStream)
.getSoapBody();
final Transformer transformer = TransformerFactory.newInstance()
.newTransformer();
final DOMResult domResult = new DOMResult();
transformer.transform(body.getPayloadSource(), domResult);
final StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(domResult.getNode()), new StreamResult(writer));
byte[] barray = writer.toString()
.getBytes(StandardCharsets.UTF_8);
return new ByteArrayInputStream(barray);
}
It seems to work but is horribly inefficient. Is there no short and concise way of achieving this with standard libraries and without regex?
Thanks

Here's a solution using XPath to get the element (pure JAXB? Not sure). It treats the message as a regular XML document, so it should work for any payload, I guess.
FileInputStream fileIS;
fileIS = new FileInputStream(System.getProperty("user.home") + "/tmp/soap.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument;
xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression01 = "//*[local-name()='Body']";
Node currentNode = (Node) xPath.compile(expression01).evaluate(xmlDocument, XPathConstants.NODE);
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.setOutputProperty(OutputKeys.INDENT, "yes");
xform.transform(new DOMSource(currentNode), new StreamResult(buf));
System.out.println(buf.toString());
Result:
<soap:Body>
<incident xmlns="http://example.com">
<Company type="String">Test</Company>
</incident>
</soap:Body>

I ended up doing it with regex. All other options are too slow:
private InputStream removeSoapEnvelope(final InputStream inputStream) throws IOException
{
final String text = new String(inputStream.readAllBytes(), UTF_8);
final String replace = text.replaceAll("\\s*<\\/?(?:SOAP-ENV|soap):(?:.|\\s)*?>", "");
File file = File.createTempFile("temp", XML_NS_PREFIX);
Files.writeString(file.toPath(), replace);
return new FileInputStream(file);
}
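Another option that stays within the standard libraries and avoids both DOM and regex is to walk the message with StAX and copy only the payload element. This is a sketch, not a benchmarked replacement: SoapPayloadExtractor is a hypothetical name, the Body element is matched by local name only, and the payload is still buffered in memory before being returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, not taken from any of the answers.
final class SoapPayloadExtractor {

    static InputStream extractPayload(InputStream soapMessage) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(soapMessage);
        // Advance to the start tag of the Body element, whatever its prefix.
        boolean inBody = false;
        while (!inBody && reader.hasNext()) {
            inBody = reader.next() == XMLStreamConstants.START_ELEMENT
                && "Body".equals(reader.getLocalName());
        }
        // Move to the first child element of Body, which is the payload root.
        if (!inBody || reader.nextTag() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalArgumentException("No payload element inside a SOAP Body");
        }
        // An identity transform copies the element the reader is positioned on.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        identity.transform(new StAXSource(reader), new StreamResult(payload));
        reader.close();
        return new ByteArrayInputStream(payload.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        String soap =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<soap:Body><incident xmlns=\"http://example.com\">"
            + "<Company type=\"String\">Test</Company></incident></soap:Body>"
            + "</soap:Envelope>";
        InputStream in = new ByteArrayInputStream(soap.getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(extractPayload(in).readAllBytes(),
            StandardCharsets.UTF_8));
    }
}
```

One caveat: namespace prefixes that are declared on the Envelope but used inside the payload may need extra handling, because only the payload element and its descendants are copied.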

using apache fop with PipedOutputStream

I want to build a PDF file from a JAXB object, using Apache FOP to generate it and iText's PdfStamper to modify it. Since FOP writes to an OutputStream and PdfStamper reads from an InputStream, my idea was to use Piped[I|O]Streams for this. Here is what I tried:
public void transform2XSLFO_onthefly(Medium medium, OutputStream out) throws Exception {
PipedInputStream pInputPipe = new PipedInputStream();
PipedOutputStream outputTemp = new PipedOutputStream(pInputPipe);
try {
JAXBSource source = new JAXBSource( JAXBContext.newInstance(medium.getClass()) , medium );
FOUserAgent userAgent = fopFactory.newFOUserAgent();
// settings
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, userAgent ,outputTemp);
InputStream XSLinputStream = xslfoStylesheet.getInputStream();
StreamSource XSLsource = new StreamSource(XSLinputStream);
Result res = new SAXResult(fop.getDefaultHandler());
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(XSLsource);
// run transformation
t.transform(source, res);
// does not come so far, no use closing the stream
outputTemp.close();
PdfReader reader = new PdfReader(pInputPipe);
pdfStamper = new PdfStamper(reader, out);
//..... postProcess...
pdfStamper.close();
} catch (Exception ex) {
log.error("ERROR", ex);
}
However, it hangs at the line "t.transform(source, res);". It looks like it is waiting for something in the middle of the FOP transformation. It works when I use a ByteArrayOutputStream, convert it to an InputStream and use that as the PdfStamper input:
InputStream pdfInput = new ByteArrayInputStream(((ByteArrayOutputStream) outputTemp).toByteArray());
But the files can get quite large (a few MB), so I think the piped version would perform better. What do you think?

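You should read up on how to use PipedInputStream/PipedOutputStream. FOP and the PdfStamper will need to run in separate threads: a pipe has only a small internal buffer, so once FOP has filled it, the write blocks until another thread reads from the PipedInputStream. With everything in one thread, t.transform(...) therefore never returns. Basically, this has nothing to do with FOP per se. I'm sure you'll find various examples on the net on how this works. If you're not comfortable with multi-threaded programming, I suggest you just buffer FOP's output in a byte[] or a temporary file.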
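A minimal sketch of the two-thread arrangement, using only java.io. PipedProducerConsumer and Producer are hypothetical names: the producer stands in for the FOP transformation and the read loop stands in for whatever consumes the PDF, so none of this is FOP or iText API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demonstration class; no FOP or iText calls are involved.
final class PipedProducerConsumer {

    /** Stand-in for the code that generates the PDF bytes. */
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    static byte[] pipe(Producer producer) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // The writing side runs in its own thread so that it can block on a
        // full pipe while the calling thread drains it.
        Thread writer = new Thread(() -> {
            // Closing the output end signals end-of-stream to the reader.
            try (OutputStream o = out) {
                producer.writeTo(o);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // The reading side: here it just collects the bytes.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (InputStream i = in) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = i.read(buffer)) != -1) {
                received.write(buffer, 0, n);
            }
        }
        writer.join();
        return received.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Far more data than the pipe's internal buffer can hold at once.
        byte[] data = new byte[100_000];
        System.out.println(pipe(out -> out.write(data)).length);
    }
}
```

In the question's code, the transformation would go inside the producer and the PdfReader would be given the PipedInputStream in place of the read loop.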

Transformer not reading Special Character from Document Object

I am trying to read XML data from a Document object and then use a Transformer with an XSL stylesheet to render that data to PDF.
My code is:
Document doc = toXML(arg1,arg2);
The doc contains data like:
İlkyönetmeliği
within tags.
InputStream inputStream = new FileInputStream(xslFilePath);
transformer = factory.newTransformer(new StreamSource(inputStream));
transformer.setParameter("encoding", "UTF-8");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc.getDocumentElement()), res);
Special characters present in the XML are not rendered correctly and are displayed like
#lk yard#m.
I have also set the encoding to UTF-8, but it is still displayed as above.

It is not clear what causes your encoding problem because I cannot see how your document is read/constructed and how your transformation result res is set up. Try the following standalone example code which handles encoding with XSLT. Maybe you can even modify it gradually to use your actual data in order to see what goes wrong.
public static void main(String[] args) {
try {
String inputEncoding = "UTF-16";
String xsltEncoding = "ASCII";
String outputEncoding = "UTF-8";
ByteArrayOutputStream bos = new ByteArrayOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(bos, inputEncoding);
osw.write("<?xml version='1.0' encoding='" + inputEncoding + "'?>");
osw.write("<root>İlkyönetmeliği</root>"); osw.close();
byte[] inputBytes = bos.toByteArray();
bos.reset();
osw = new OutputStreamWriter(bos, xsltEncoding);
osw.write("<?xml version='1.0' encoding='" + xsltEncoding + "'?>");
osw.write("<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>");
osw.write("<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>");
osw.write("</xsl:stylesheet>"); osw.close();
byte[] xsltBytes = bos.toByteArray();
bos.reset();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.parse(new InputSource(new InputStreamReader(new ByteArrayInputStream(inputBytes), inputEncoding)));
// if encoding declaration correct, use: Document d = db.parse(new InputSource(new ByteArrayInputStream(inputBytes)));
System.out.println(XPathFactory.newInstance().newXPath().evaluate("/root[1]", d));
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(new StreamSource(new InputStreamReader(new ByteArrayInputStream(xsltBytes), xsltEncoding)));
// if encoding declaration correct, use: Transformer t = tf.newTransformer(new StreamSource(new ByteArrayInputStream(xsltBytes)));
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, outputEncoding));
t.setOutputProperty(OutputKeys.ENCODING, outputEncoding);
t.transform(new DOMSource(d.getDocumentElement()), sr);
byte[] outputBytes = bos.toByteArray();
Scanner s = new Scanner(new InputStreamReader(new ByteArrayInputStream(outputBytes), outputEncoding));
String output = s.useDelimiter("</>").next(); // read all
s.close();
System.out.println(output);
} catch (Exception ex) {
ex.printStackTrace(System.err);
}
}
The example code applies the XSLT identity template to a minimal input containing the non-ASCII characters.
I output the string to check if it has been parsed correctly in the document using XPath. You may want to check your (intermediate) document if you know how to locate it with XPath.
Note that by default the parser tries to pick up the encoding declared in the XML declaration, if one is present, when reading an XML file. It assumes that the actual and declared encodings are the same. If they differ, or the declaration is missing, you can enforce the actual encoding, e.g. by using an InputStreamReader as in the code above.
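The mismatch case described in that note can be shown in isolation. In the sketch below the bytes are ISO-8859-1 although the declaration claims UTF-8, and the parse goes through an InputStreamReader with the actual charset. EnforcedEncodingParse and rootText are hypothetical names and the sample text is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demonstration class, not part of the answer's example.
final class EnforcedEncodingParse {

    /** Parses the bytes using the given charset, regardless of the declared one. */
    static String rootText(byte[] xml, Charset actualEncoding) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The parser receives characters that are already decoded, so the
        // encoding named in the XML declaration is not used for decoding.
        InputSource source = new InputSource(
            new InputStreamReader(new ByteArrayInputStream(xml), actualEncoding));
        return db.parse(source).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Declared as UTF-8 but actually encoded as ISO-8859-1.
        byte[] mislabeled =
            "<?xml version='1.0' encoding='UTF-8'?><root>y\u00F6netmelik</root>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(mislabeled, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the same bytes to the parser as a plain InputStream would make it trust the UTF-8 declaration, and the lone 0xF6 byte would then not decode as intended.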

Parse XML string on BlackBerry

I am trying to parse XML with the following code, but StringReader is not available in the BlackBerry JDE. What is the right way to do this?
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);

Try this out:
String xmlString = "<xml> </xml>"; // your XML string
ByteArrayInputStream bis = new ByteArrayInputStream(xmlString.getBytes("UTF-8"));
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(bis);

If you want to build a DOM from data coming from a server, you're much better off parsing the InputStream directly with a DocumentBuilder rather than reading the data into a String and trying to work with that. One way is:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
