Using a stylesheet with iText, possible? - java

Does iText provide or support any kind of style sheet?
What I mean is, like in Apache FOP, the data is represented in the XML and the formatting is programmed in the XSL. So then we pass the XML and XSL to the FOP engine which in turn converts the data in XML using the formatting specified in the XSL to create a PDF.
Does iText support similar functionality, or is the only way to program the whole formatting in the Java code itself, meaning specifying each table/cell (its dimensions etc.) and paragraph (its font, color etc.)?

iText isn't FOP, no. The only way is to program the whole formatting in the Java code itself. On the other hand, your program could read formatting information from various files in the format of your choice, but you'd have to write that code yourself.
iText in Action 2nd ed has a sample that outlines building an XML parser and using that to feed iText. Nothing about style info other than what's written in the code.
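A minimal sketch of the "read the formatting from a file yourself" idea, assuming the style information lives in a plain java.util.Properties file. The key scheme (name.font, name.size, name.color), the StyleLoader class and its TextStyle holder are all made up for illustration. None of this is iText API, and mapping the loaded values onto iText fonts and colors is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: loads named text styles from a properties-style "style sheet".
final class StyleLoader {

    // Hypothetical holder for the values a PDF-building routine would consume.
    static final class TextStyle {
        final String fontFamily;
        final float fontSize;
        final int rgbColor;

        TextStyle(String fontFamily, float fontSize, int rgbColor) {
            this.fontFamily = fontFamily;
            this.fontSize = fontSize;
            this.rgbColor = rgbColor;
        }
    }

    private StyleLoader() {}

    // Reads the keys "<name>.font", "<name>.size" and "<name>.color" (an assumed scheme),
    // falling back to Helvetica, 12 pt, black when a key is missing.
    static TextStyle load(Reader source, String name) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String font = props.getProperty(name + ".font", "Helvetica");
        float size = Float.parseFloat(props.getProperty(name + ".size", "12"));
        // Colors are written as six hex digits, for example 336699.
        int color = Integer.parseInt(props.getProperty(name + ".color", "000000"), 16);
        return new TextStyle(font, size, color);
    }

    public static void main(String[] args) {
        String sheet = "heading.font=Times-Roman\nheading.size=18\nheading.color=336699\n";
        TextStyle heading = load(new StringReader(sheet), "heading");
        System.out.println(heading.fontFamily + " " + heading.fontSize
                + " #" + Integer.toHexString(heading.rgbColor));
    }
}
```

The code that builds the document would then look up a style by name instead of hard-coding font, size and color at every call site.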

Related

rich PDF generation framework in java

I have hundreds of rich PDFs that need to be generated from my application; they have images and colorful content. I was looking to build a framework that supports a template and a data model and takes care of the rest, so adding a new PDF would just mean adding a new template. In the past I have used FreeMarker to generate HTML and then printed that HTML to PDF. Are there any better, more recent solutions to this problem?
There are various things you could do:
generate XML data, apply an XSLT transformation to style it, and convert the resulting HTML document to a PDF
code a small class that converts whatever data format you have to a pdf document (you would need to do all the layout through code)
create a template (using whatever design program you want) pdf document, insert form fields, and have iText fill the form (several of our customers go for this approach)
Keep in mind that JasperReports uses a proprietary format, whereas the approaches I suggested use only open and well-established formats.
Take a look at JasperReports.

Convert ALTO XML to formatted PDF/RTF/TXT?

I am looking to batch convert a large amount of ALTO format XML docs to various formats in Windows, txt at least, rtf if possible and pdf would be convenient as well.
ALTO is an xml standard used by libraries and archives to hold metadata/format/font/layout aware text for reconstruction in PDF images.
I have only the XML files for a large archive that I would like to convert for text mining. The software I am using requires clean text or rtf files, so converting the xml to plain text is kind of the goal. Because ALTO is a standard the conversion should be possible, no?
A bonus would be the ability to either embed the metadata in a PDF or convert it to a bibliographical format file like LaTeX. This could be a separate program.
I'd appreciate any ideas,
Thanks.
In order to get plain text from the ALTO xml, you may try implementing the simple method used in this (hacky) Python script in Java: https://github.com/cneud/alto-ocr-text.
I am not currently aware of a straight conversion to PDF or LaTeX, though you may be able to do this with a stylesheet, depending on what exactly your ALTO files look like.
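For the plain-text part, here is a rough Java sketch. It is not a port of the linked script; it only assumes the usual ALTO layout in which each TextLine element contains String elements whose CONTENT attribute holds one word. The class name AltoText is made up, elements are matched by local name because the ALTO namespace differs between schema versions, and hyphenation (HYP) and spacing (SP) elements are simply ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the recognized words out of an ALTO document, one output line per TextLine.
final class AltoText {

    private AltoText() {}

    static String extract(String altoXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(altoXml)));

            StringBuilder text = new StringBuilder();
            // "*" matches any namespace, so the ALTO schema version does not matter here.
            NodeList lines = doc.getElementsByTagNameNS("*", "TextLine");
            for (int i = 0; i < lines.getLength(); i++) {
                NodeList words = ((Element) lines.item(i)).getElementsByTagNameNS("*", "String");
                for (int j = 0; j < words.getLength(); j++) {
                    if (j > 0) {
                        text.append(' ');
                    }
                    text.append(((Element) words.item(j)).getAttribute("CONTENT"));
                }
                text.append('\n');
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse ALTO XML", e);
        }
    }
}
```

For a batch run you would read each file into a string (or pass a file-based InputSource instead) and write the result next to it as a .txt file.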

xsl stylesheet for generating word documents from Java

I was wondering if there is any Java API that can generate a Word document in a way similar to what Apache FOP does.
With FOP, it is possible to specify a style sheet which defines the layout of the page in which the data (stored in an xml file) are printed.
The Transformer object within FOP library is in charge of that.
Is there any equivalent API for Word documents?
With FOP, you can try XML to RTF, which Word accepts.
From their webpage, XMLmind XSL-FO Converter apparently generates:
RTF (can be opened in Word 2000+),
WordprocessingML (can be opened in Word 2003+),
Office Open XML (.docx, can be opened in Word 2007+),
Putting FO to one side, here are 2 different approaches:
The first would be to write an XSLT to convert your XML to Flat OPC XML. Most parts in the Flat OPC XML would simply be copied there by your XSLT. (Generate that template content in Word, using "save as XML"). You'll be focusing mainly on populating the document.xml part. Word can open a Flat OPC XML file, or you can use docx4j (a project I work on) to convert Flat OPC XML to docx.
The second would be to use the docx4j Flying Saucer fork to convert your XML + CSS to docx content. See the code samples. You may need to customise it a bit; one way of feeding it CSS is this file. This actually ought to work pretty well; there is stuff there for mapping class attributes to Word styles, so if you could adorn your XML with class attributes, you could get even better results.
I will assume that your input is an XML document, or at least a CSV file.
1) Create an XSLT stylesheet to transform your input into the Word document format. The result will be a file we will call content.xml. You can apply the stylesheet to the input from Java.
2) Create a MS Word shell and put the content.xml into the shell. There are tools within Apache POI to do this.
I may have taken your question too literally. You might also be able to generate the document using Apache POI API. Also, if your MS Word document doesn't have tables, you can use Apache FOP to generate an RTF document, which can then be easily translated to a .docx file.
Apache POI provides Java APIs for reading and writing Microsoft Excel, Word, and PowerPoint files.
You can check out POI's Javadocs here.

Converting XML to PDF, using styles from XSL

I have the following problem: I have an XML file with an XSL stylesheet that renders the XML as a neat HTML table when I load it in a web browser. Now I need to make a PDF that looks EXACTLY like that XSL-styled XML in the web browser, without needing to write custom FOs for every file. Everything must be done in Java.
I need to make a PDF that looks EXACTLY like that XSL-styled XML in the web browser
Think again about this requirement. Paged media such as PDF and non-paged media such as HTML may only look "close enough", but never "exactly like" each other. This is even more obvious if you consider your HTML being displayed on devices with different screen sizes.
If you relax the above requirement somewhat, you'll probably agree that XSL-FO is the best choice. You definitely do not need to write "custom FO's for every file": write an XSLT just once, and use it on-the-fly to convert your XML to XSL-FO, and then use a rendering engine to process XSL-FO to PDF. Simple.
XSL-FO does sound like exactly what you need. But if that's not an option, first explicitly doing the XSLT transform on the XML in Java and then converting the resulting HTML (which by then is a String/byte array/DOM/whatever you want) to PDF using some additional library would do the trick. There are some libraries that support HTML to PDF, iText for example. XSLT transformations in Java are really simple; little code is involved.
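To give an idea of how little code the XSLT step takes, here is a sketch using only the JDK's javax.xml.transform API. The class name XsltToHtml and the sample XML and stylesheet are invented for illustration; the HTML-to-PDF step is not shown because it depends on the library you pick.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet to an XML document and returns the result as a string.
final class XsltToHtml {

    private XsltToHtml() {}

    static String transform(String xml, String xsl) {
        try {
            // Compile the stylesheet, then run the XML input through it.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items><item>A</item><item>B</item></items>";
        String xsl =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"html\" indent=\"no\"/>"
              + "<xsl:template match=\"/items\"><table>"
              + "<xsl:for-each select=\"item\"><tr><td><xsl:value-of select=\".\"/></td></tr></xsl:for-each>"
              + "</table></xsl:template>"
              + "</xsl:stylesheet>";
        // The resulting HTML string is what you would hand to an HTML-to-PDF library.
        System.out.println(transform(xml, xsl));
    }
}
```

The same method works unchanged if the stylesheet emits XSL-FO instead of HTML; only the downstream renderer differs.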

Creating PDF, HTML, and optionally RTF documents from the same source using Java?

I was looking at using iText to create both a pdf and html version of a document with RTF as a possible option. According to this question this is no longer possible with iText. Is there a library that will allow me to create a document in Java and output it as both PDF and HTML? The ability to output RTF would be nice but is not required.
As that answer to the other question states, you can just use the iText RTF Library.
I have used PD4ML to convert HTML to PDF. Even though it is a commercial product, it is very reliable and supports CSS well.
JasperReports. If you look at this package it supports export to:
pdf
html
rtf
xls
xml
You have two options to create the documents:
via iReport - a visual designer for reports
via an API, where you construct everything with Java code.
Note that even though JasperReports's main function is to create reports, it can very well create other documents, with no tabular data for example.
You could also try Docmosis since that supports the output formats provided by OpenOffice (including the ones you specified) and you can often do the job with a lot less code.
