Apache FOP Xml to PDF - java

I'm trying to transform XML financial data to PDF in Java using XSLT and Apache FOP, but I get the following exception while transforming the XML to PDF with the generated XSL-FO:
Caused by: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.01 Transitional//EN; systemId: http://www.w3.org/TR/html4/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
http://www.w3.org/TR/html4/loose.dtd is included in my XSLT file, and the DTD really does contain that line without a closing '>'. I read at https://sourceforge.net/p/saxon/mailman/message/23058335/ that it is an SGML DTD. Is it impossible to transform this XSLT to XSL-FO with Apache FOP because the underlying Saxon parser can't parse an SGML DTD?
The code that transforms the XML to XSL-FO and then the XSL-FO to PDF looks like the following. Could someone tell me what I'm doing wrong, and how I can transform the XML to PDF? Thanks in advance.
private byte[] generateFOFromXML(Source xslt, Source invoice) throws TransformerException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Set up the XSLT transformer from the stylesheet
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xslt);
    // The generated XSL-FO is written into the in-memory buffer
    Result res = new StreamResult(out);
    // Run the XSLT transformation (XML to XSL-FO)
    transformer.transform(invoice, res);
    // A ByteArrayOutputStream holds no system resources, so no close() is needed
    return out.toByteArray();
}
byte[] xslFO = generateFOFromXML(xsltSource, invoiceSource);
FopFactoryBuilder builder = new FopFactoryBuilder(new File(".").toURI());
builder.setStrictFOValidation(false);
FopFactory fopFactory= builder.build();
// FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
ByteArrayOutputStream tempBAO = new ByteArrayOutputStream();
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, tempBAO);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setParameter("versionParam", "2.0");
Result result = new SAXResult(fop.getDefaultHandler());
Source foSource = new StreamSource(new ByteArrayInputStream(xslFO));
transformer.transform(foSource, result);
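One possible workaround, offered as a sketch rather than a confirmed fix for this setup: wherever the document carrying the HTML DOCTYPE is parsed (the stylesheet, or the generated XSL-FO if the DOCTYPE comes from xsl:output), the parser can be told not to fetch the external DTD at all. The class and method names below (SkipDtdDemo, withoutExternalDtd, copy) are made up for illustration; only standard JAXP/SAX calls are used, and the identity transform stands in for the step that pipes the FO into FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SkipDtdDemo {

    /** Wraps the input in a SAXSource whose parser never downloads an external DTD. */
    static SAXSource withoutExternalDtd(InputSource input) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT and XSL-FO both rely on namespaces
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Answer every external entity request (including the DOCTYPE's DTD)
        // with empty content instead of fetching loose.dtd
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return new SAXSource(reader, input);
    }

    /** Identity-copies the input; stands in for sending the FO on to FOP. */
    static String copy(InputSource input) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(withoutExternalDtd(input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical FO document that carries the HTML DOCTYPE from the question
        String fo = "<!DOCTYPE fo:root PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
                + "\"http://www.w3.org/TR/html4/loose.dtd\">"
                + "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\"/>";
        System.out.println(copy(new InputSource(new StringReader(fo))));
    }
}
```

In the code above, new StreamSource(new ByteArrayInputStream(xslFO)) could be swapped for withoutExternalDtd(new InputSource(new ByteArrayInputStream(xslFO))). If the DOCTYPE is produced by doctype-public/doctype-system on xsl:output, removing those attributes from the stylesheet is probably the cleaner option.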

Related

Generating PDF using fop and XSL when having URLS in XSLT

I am generating a PDF using FOP 2.0 and XSLT. I load the XSL from a web URL, and that XSL includes and imports other XSL files by URL. With a single XSL I am able to generate the PDF, but when one XSLT on the web references several other stylesheets, FOP does not connect to the other URLs automatically. Example of the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:include href="abc.xsl"/>
<xsl:include href="xyz.xsl"/>
<xsl:include href="wgh.xsl"/>
This is how the main XSL includes the other XSLs. In this case FOP does not follow the includes and cannot generate the PDF.
ERROR:
SystemId Unknown; Line #3; Column #34; Had IO Exception with stylesheet file: header.xsl
SystemId Unknown; Line #4; Column #34; Had IO Exception with stylesheet file: footer.xsl
SystemId Unknown; Line #5; Column #36; Had IO Exception with stylesheet file: mainbody.xsl
SystemId Unknown; Line #6; Column #41; Had IO Exception with stylesheet file: secondarybody.xsl
SystemId Unknown; Line #10; Column #38; org.xml.sax.SAXException: ElemTemplateElement error: layout
javax.xml.transform.TransformerException: ElemTemplateElement error: layout
13:58:27.326 [http-nio-auto-1-exec-2] DEBUG org.apache.fop.fo.FOTreeBuilder - Building formatting object tree
SystemId Unknown; Line #10; Column #38; Could not find template named: layout
Code for PDF Generator:
public class PdfGenerator {
    private static final Logger LOG = LoggerFactory.getLogger(PdfGenerator.class);
    public List<OutputStream> generatePdfs(List<Content> xmlList, int reqestListSize, String xslPath) {
        try {
            List<OutputStream> pdfOutputStreams = new ArrayList<>();
            for (int p = 0; p < reqestListSize; p++) {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                String jaxbType = "com.abc.model"; // model package
                JAXBContext context = JAXBContext.newInstance(jaxbType);
                Marshaller marshaller = context.createMarshaller();
                marshaller.setProperty("jaxb.formatted.output", Boolean.TRUE);
                marshaller.marshal(xmlList.get(p), bos);
                ByteArrayInputStream inStream = new ByteArrayInputStream(bos.toByteArray());
                StreamSource xmlSource = new StreamSource(inStream);
                // create an instance of fop factory
                FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
                // a user agent is needed for transformation
                FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
                ByteArrayOutputStream tempOutputStream = new ByteArrayOutputStream();
                Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, tempOutputStream);
                pdfOutputStreams.add(p, tempOutputStream);
                // Setup XSLT
                TransformerFactory transformerFactory = TransformerFactory.newInstance();
                // xslPath is a URL such as http://home.www.test.com/abc_web/xsl/test.xsl.
                // Loading the XSLT through this stream fails because of xsl:include.
                URL url = new URL(xslPath);
                InputStream xslFile = url.openStream();
                StreamSource xsltStreamSource = new StreamSource(xslFile);
                Transformer transformer = transformerFactory.newTransformer(xsltStreamSource);
                Result res = new SAXResult(fop.getDefaultHandler());
                // Start XSLT transformation and FOP processing.
                // That's where the XML is first transformed to XSL-FO and then
                // the PDF is created
                transformer.transform(xmlSource, res);
            }
            return pdfOutputStreams;
        } catch (Exception ex) {
            LOG.error("Error", ex);
            return new ArrayList<>();
        }
    }
}
Simply replace
URL url = new URL(xslPath);
InputStream xslFile = url.openStream();
StreamSource xsltStreamSource = new StreamSource(xslFile);
with
StreamSource xsltStreamSource = new StreamSource(xslPath);
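and the XSLT processor should be able to resolve any relative imports or includes, because the source now carries a system ID to use as the base URI. A StreamSource built only from an InputStream has no system ID, which is why the errors report "SystemId Unknown".
Alternatively, you could explicitly set the system ID on your xsltStreamSource with setSystemId, but the single line I have suggested should do the job just fine.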
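Both options can be tried with the JDK alone. The sketch below is only an illustration: it writes a hypothetical main.xsl and header.xsl into a temp directory (those file names and the class name IncludeResolutionDemo are invented here), then compiles the main stylesheet once from its location and once from a stream with the system ID set.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IncludeResolutionDemo {

    static final String XSL_NS = "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"";

    // Main stylesheet: pulls in header.xsl through a relative href
    static final String MAIN_XSL =
        "<xsl:stylesheet " + XSL_NS + " version=\"1.0\">"
      + "<xsl:include href=\"header.xsl\"/>"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:call-template name=\"layout\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Included stylesheet: defines the named template the main one calls
    static final String HEADER_XSL =
        "<xsl:stylesheet " + XSL_NS + " version=\"1.0\">"
      + "<xsl:template name=\"layout\">included</xsl:template>"
      + "</xsl:stylesheet>";

    /** Writes both stylesheets to a temp directory and returns the path of main.xsl. */
    static Path writeStylesheets() throws Exception {
        Path dir = Files.createTempDirectory("xsl-demo");
        Files.write(dir.resolve("header.xsl"), HEADER_XSL.getBytes(StandardCharsets.UTF_8));
        Path main = dir.resolve("main.xsl");
        Files.write(main, MAIN_XSL.getBytes(StandardCharsets.UTF_8));
        return main;
    }

    static String run(StreamSource xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(xslt);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toString();
    }

    /** Option 1: hand the location itself to StreamSource. */
    static String viaSystemId(Path main) throws Exception {
        return run(new StreamSource(main.toUri().toString()));
    }

    /** Option 2: keep the stream, but tell the source where it came from. */
    static String viaStreamWithSystemId(Path main) throws Exception {
        try (InputStream in = Files.newInputStream(main)) {
            StreamSource src = new StreamSource(in);
            src.setSystemId(main.toUri().toString());
            return run(src);
        }
    }

    public static void main(String[] args) throws Exception {
        Path main = writeStylesheets();
        System.out.println(viaSystemId(main));
        System.out.println(viaStreamWithSystemId(main));
    }
}
```

The same idea should carry over to an http URL such as xslPath in the question, since the system ID only has to be a URI the processor can resolve relative hrefs against.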

Display XML with stylesheet in JEditorPane

I have an XML file that uses an XSS and an XSL stored in the folder to display the XML in a proper format.
When I use the following code:
JEditorPane editor = new JEditorPane();
editor.setBounds(114, 65, 262, 186);
frame.getContentPane().add(editor);
editor.setContentType( "html" );
File file=new File("c:/r/testResult.xml");
editor.setPage(file.toURI().toURL());
All I can see is the text part of the XML without any styling. What should I do to display it with the style sheet?
The JEditorPane does not automatically process XSLT style-sheets. You must perform the transformation yourself:
try (InputStream xslt = getClass().getResourceAsStream("StyleSheet.xslt");
InputStream xml = getClass().getResourceAsStream("Document.xml")) {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(xml);
StringWriter output = new StringWriter();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer(new StreamSource(xslt));
transformer.transform(new DOMSource(doc), new StreamResult(output));
String html = output.toString();
// JEditorPane doesn't like the META tag...
html = html.replace("<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">", "");
editor.setContentType("text/html; charset=UTF-8");
editor.setText(html);
} catch (IOException | ParserConfigurationException | SAXException | TransformerException e) {
editor.setText("Unable to format document due to:\n\t" + e);
}
editor.setCaretPosition(0);
Use an appropriate InputStream or StreamSource for your particular xslt and xml documents.
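For documents that live on disk, as in the question, a file-based StreamSource is probably the simplest choice, because it also gives the processor a base location for relative references. A minimal sketch, assuming the stylesheet path is known (the class name XsltToHtml and the method toHtml are invented for illustration):

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToHtml {

    /** Applies the stylesheet file to the XML file and returns the result as a string. */
    static String toHtml(File xmlFile, File xsltFile) throws TransformerException {
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new StreamSource(xsltFile));
        StringWriter output = new StringWriter();
        // StreamSource(File) records the file location as the system ID
        transformer.transform(new StreamSource(xmlFile), new StreamResult(output));
        return output.toString();
    }
}
```

The returned string can then be passed to editor.setText(...) after editor.setContentType("text/html"), with the same META tag clean-up as above if the pane complains about it.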

How to Convert .doc/.docx to pdf in java using POI..?

How can I convert an MS Word document to PDF? If there is an example, please share it with me. Thanks.
If you are required to use POI, I guess you should take a look at org.apache.poi.hwpf.converter.
I never tried this, but I guess it's worth a try at least.
It seems you can use WordToFoConverter to convert your document to an FO file (example here). Note that this converter belongs to the HWPF package, so it works on an HWPFDocument (the binary .doc format) rather than on an XWPFDocument (.docx).
From there you can use Apache FOP to transform the FO file to a PDF like this:
// Step 1: Construct a FopFactory
// (reuse if you plan to render multiple documents!)
FopFactory fopFactory = FopFactory.newInstance();
// Step 2: Set up output stream.
// Note: Using BufferedOutputStream for performance reasons (helpful with FileOutputStreams).
OutputStream out = new BufferedOutputStream(new FileOutputStream(new File("C:/Temp/myfile.pdf")));
try {
// Step 3: Construct fop with desired output format
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
// Step 4: Setup JAXP using identity transformer
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(); // identity transformer
// Step 5: Setup input and output for XSLT transformation
// Setup input stream
Source src = new StreamSource(new File("C:/Temp/myfile.fo"));
// Resulting SAX events (the generated FO) must be piped through to FOP
Result res = new SAXResult(fop.getDefaultHandler());
// Step 6: Start XSLT transformation and FOP processing
transformer.transform(src, res);
} finally {
//Clean-up
out.close();
}
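This code was taken from https://xmlgraphics.apache.org/fop/0.95/embedding.html in case you want to read more on this topic. It targets FOP 0.95; FOP 2.x no longer has the no-argument FopFactory.newInstance() and expects a base URI instead, for example FopFactory.newInstance(new File(".").toURI()).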

One FOP XSLT transformation but different files rendered

Is there any way to make only one xslt transformation and render the output to pdf, png, svg files?
StreamSource contentSource = new StreamSource(xmlContentStream);
StreamSource transformSource = new StreamSource(xslFoMarkupStream);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Transformer xslfoTransformer = getTransformer(transformSource);
Fop fop = fopFactory.newFop("application/pdf", foUserAgent, outStream);
Result res = new SAXResult(fop.getDefaultHandler());
// Start XSLT transformation and FOP processing
xslfoTransformer.transform(contentSource, res);
xmlContentStream.close();
xslFoMarkupStream.close();
return outStream;
In the case above, to generate a PDF and then a PNG I have to create a new Fop instance with a different MIME type and call xslfoTransformer.transform() again.
That means the transformation runs twice. Is there a way to run the transformation once and then render the output to different formats? (A custom renderer?)
Or are there any suggestions to speed up the rendering, as I still need to do it several times: once each for PDF, PNG and SVG?
I also tried to generate the PDF via FOP and then convert it to an image via Apache PDFBox. That works slightly faster, but looks silly.
Thank you.
You can save one step. Your code above does two steps: it takes some arbitrary XML and transforms it into XSL-FO using XSLT, and then it renders the output into whatever format you want. You could do the XML to XSL-FO transformation (probably the slower part) once and use that result as input to two Fop instances. Something like this:
public void fopReport(OutputStream pdfOut, OutputStream jpgOut, Source xmlSource, Source xsltSource) throws Exception {
// Create the FO content
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsltSource);
ByteArrayOutputStream foBytesStream = new ByteArrayOutputStream();
StreamResult foByteStreamResult = new StreamResult(foBytesStream);
transformer.transform(xmlSource, foByteStreamResult);
byte[] foBytes = foBytesStream.toByteArray();
// Render twice
FopFactory fopFactory = FopFactory.newInstance();
FOUserAgent uaPDF = fopFactory.newFOUserAgent();
FOUserAgent uaJpg = fopFactory.newFOUserAgent();
Fop fopPDF = fopFactory.newFop(MimeConstants.MIME_PDF, uaPDF, pdfOut);
Fop fopJpg = fopFactory.newFop(MimeConstants.MIME_JPEG, uaJpg, jpgOut);
//PDF
Source src = new StreamSource(new ByteArrayInputStream(foBytes));
Transformer resultTransformer = factory.newTransformer();
resultTransformer.transform(src, new SAXResult(fopPDF.getDefaultHandler()));
//JPEG
src = new StreamSource(new ByteArrayInputStream(foBytes));
resultTransformer = factory.newTransformer();
resultTransformer.transform(src, new SAXResult(fopJpg.getDefaultHandler()));
}
Hope that helps
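The buffering idea can be tried without FOP on the classpath. In the sketch below, which is only an illustration, a simple counting SAX handler stands in for fop.getDefaultHandler(); the names TransformOnceDemo, CountingHandler, transformOnce and replay are invented here. The XSLT runs once, and the buffered XSL-FO bytes are replayed into each handler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

class TransformOnceDemo {

    // Tiny stylesheet that wraps the input text in fo:root/fo:block
    static final String XSLT =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\" version=\"1.0\">"
      + "<xsl:template match=\"/\"><fo:root><fo:block>"
      + "<xsl:value-of select=\"/doc\"/></fo:block></fo:root></xsl:template>"
      + "</xsl:stylesheet>";

    /** Stand-in for fop.getDefaultHandler(): counts the elements it receives. */
    static class CountingHandler extends DefaultHandler {
        int elements;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Runs the (potentially slow) XML to XSL-FO transformation exactly once. */
    static byte[] transformOnce(Source xml, Source xslt) throws TransformerException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer(xslt)
                .transform(xml, new StreamResult(buffer));
        return buffer.toByteArray();
    }

    /** Feeds the buffered FO to one renderer via an identity transform. */
    static void replay(byte[] fo, ContentHandler target) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new ByteArrayInputStream(fo)), new SAXResult(target));
    }

    public static void main(String[] args) throws Exception {
        byte[] fo = transformOnce(
                new StreamSource(new StringReader("<doc>hi</doc>")),
                new StreamSource(new StringReader(XSLT)));
        CountingHandler pdfStandIn = new CountingHandler();
        CountingHandler jpgStandIn = new CountingHandler();
        replay(fo, pdfStandIn);
        replay(fo, jpgStandIn);
        System.out.println(pdfStandIn.elements + " " + jpgStandIn.elements);
    }
}
```

Each replay still parses and lays out the FO, so this only saves the XSLT step, not the rendering itself.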

Converting a .docx to html using Apache POI and getting no text

I currently have some code that converts a .doc document to HTML, but the code I am using for converting a .docx to text unfortunately doesn't get the text and convert it. Below is my code.
private void convertWordDocXtoHTML(File file) throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException {
XWPFDocument wordDocument = null;
try {
wordDocument = new XWPFDocument(new FileInputStream(file));
} catch (IOException ex) {
Exceptions.printStackTrace(ex);
}
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
String result = new String(out.toByteArray());
acDocTextArea.setText(newDocText);
String htmlText = result;
}
Any ideas as to why this isn't working would be much appreciated. The ByteArrayOutputStream should return the entire HTML, but it is empty and has no text.
Mark, you're using the HWPF package, which supports only the .doc format (see this description). The document also mentions attempts to provide an interface for .docx files through the XWPF package. However, they seem to lack human resources, and users are encouraged to submit extensions. Limited functionality should be available though, and extracting the text must be part of it.
You should also see this question: How to Extract docx (word 2007 above) using apache POI.
I too was stuck at this point.
I have since found a third-party API that converts docx to HTML and works fine:
https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
