One of my requirements is to create a PDF from a Word document such that:
1) There is a page number on every page.
2) There is a line number on every line, restarting at 1 on each page.
Our application is written in Java, so a Java API would be most helpful. But if it is more convenient in another language, that is OK.
Here is a link to the most common Java PDF APIs :)
http://java-source.net/open-source/pdf-libraries
iText is the best known one. This is the link for iText:
http://itextpdf.com/
I hope that I helped :)
Since a Word file can't be read like a .txt file, you will need two APIs:
one to read data from the .doc file and another to write that data into a PDF file.
To read data from the .doc file you can use Apache POI,
and to write that data into a PDF file you can use iText.
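Between the read step and the write step, the numbering itself is plain bookkeeping. Here is a minimal sketch of that part only, assuming the text has already been split into rendered lines and that every page holds a fixed number of lines (in a real PDF the line breaks depend on fonts and page size). `PageLineNumberer` and `paginate` are hypothetical names, not part of Apache POI or iText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups text lines into pages and restarts the
// line counter at 1 on every page. Not part of Apache POI or iText.
class PageLineNumberer {

    /** One output page: its page number and its already-numbered lines. */
    static final class Page {
        final int pageNumber;
        final List<String> numberedLines;

        Page(int pageNumber, List<String> numberedLines) {
            this.pageNumber = pageNumber;
            this.numberedLines = numberedLines;
        }
    }

    /** Splits lines into pages of at most linesPerPage lines each. */
    static List<Page> paginate(List<String> lines, int linesPerPage) {
        if (linesPerPage <= 0) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        List<Page> pages = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerPage) {
            int end = Math.min(start + linesPerPage, lines.size());
            List<String> numbered = new ArrayList<>();
            for (int i = start; i < end; i++) {
                int lineNo = i - start + 1; // restarts at 1 on each page
                numbered.add(lineNo + "  " + lines.get(i));
            }
            pages.add(new Page(pages.size() + 1, numbered));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon");
        for (Page page : paginate(lines, 2)) {
            System.out.println("--- page " + page.pageNumber + " ---");
            for (String line : page.numberedLines) {
                System.out.println(line);
            }
        }
    }
}
```

Each `Page` could then be handed to whichever PDF library you pick, which writes the numbered lines and the page number.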
I suggest JODReports or Docmosis since you want to start with a Word document. You can dynamically insert numbered records and page breaks from a Java API, so it sounds like they will be able to meet your requirements.
You haven't indicated why you want to start with a Word document. If you are strictly generating a document, you could possibly drop that requirement and use iText or docx4j.
You can use Apache FOP (http://xmlgraphics.apache.org/fop/). One caveat: the last update I see is the 1.1 release in October 2012, and there are still open issues.
You could try docx4j, although you would need to make a minor enhancement to support line numbering on every line, and possibly more problematically, probably use a commercial XSL FO processor.
docx4j uses XSL FO for PDF output, and line numbering isn't part of the 1.0 spec. (It is part of the XSL 2.0 requirements spec.)
This means you'd have to use an XSL FO processor which supports a vendor-specific extension, for example Antenna House.
UPDATE 2016 04
From v3.3.0, docx4j defaults to using our commercial converter, which you can try at http://converter-eval.plutext.com/
Related
Does anyone know of any easy-to-learn APIs for printing Word documents? I'm using Apache POI to create and edit the documents, but I can't find any documentation on automatically printing each document after it is created. I just need an API that is well documented (again, preferably easy to learn, but it does not have to be) and does not require the end user to download any SDKs (also preferably no XML required).
You could check JODConverter to try and convert your Word document to a printable format.
You could try this
You could (credit-Rob Spoor):
Store your Word file in a temporary file (File.createTempFile())
Use java.awt.Desktop and its print(File) method
Delete the temp File
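The three steps above can be sketched as follows. `java.awt.Desktop` exposes printing as `print(File)`, which hands the file to whatever application the operating system associates with it. The class and method names (`TempFilePrinter`, `writeTempFile`, `print`) are made up for this example. The sketch uses `deleteOnExit()` instead of deleting right away, on the assumption that the associated application may still be reading the file when `print` returns.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper illustrating the temp-file + Desktop.print approach.
class TempFilePrinter {

    /** Step 1: store the document bytes in a temporary file. */
    static File writeTempFile(byte[] documentBytes, String suffix) {
        try {
            File temp = File.createTempFile("print-", suffix);
            Files.write(temp.toPath(), documentBytes);
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Steps 2 and 3: ask the desktop to print the file, then clean up.
     * Returns false when printing is not available on this platform.
     */
    static boolean print(byte[] documentBytes, String suffix) {
        File temp = writeTempFile(documentBytes, suffix);
        // Assumption: the associated application may still need the file
        // after print() returns, so remove it when the JVM exits.
        temp.deleteOnExit();

        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.PRINT)) {
            return false;
        }
        try {
            desktop.print(temp);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With Apache POI you would first write the document into a `ByteArrayOutputStream` and then call something like `TempFilePrinter.print(out.toByteArray(), ".docx")`. Whether anything actually prints depends on an application being registered for that file type on the user's machine.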
But, you should really be more specific in your question and show evidence of personal effort and research.
I am searching for a way for my Java application to generate a Word document using some kind of template (the data for the document will be provided by the application).
Here are the requirements:
- The template should be editable by a non-developer. Creating a Jasper template using the appropriate tool, or editing a Word document with some kind of templating language, is acceptable. Asking someone to edit the document's XML file is not.
- The result should be easily editable by a human being using Microsoft Word. For example, a document generated by Jasper or Birt is not acceptable, as the table layout prevents easy editing.
So far I have looked at the following solutions, none of which meets both requirements:
Jasper. The generated documents are not easily edited.
Birt. Same problem.
Generating the XML using a template engine (Velocity, FreeMarker). I cannot ask the end client to edit this kind of XML file...
You can check out Templater. It has a pretty good demo page.
Disclaimer: I'm the author.
LibreOffice
LibreOffice is an open-source implementation of an app suite similar to Microsoft Office. Besides supporting the standardized OpenDocument format, it also reads and writes Microsoft Office formats.
LibreOffice offers a Java API. So you may be able to programmatically create documents from a template.
In the past we’ve done something similar, modifying a document with search-and-replace and document-variables.
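To illustrate the search-and-replace idea on its own, here is a small sketch that fills placeholders in a piece of template text. It deliberately does not use the LibreOffice API; the `${name}` placeholder syntax and the `PlaceholderTemplate` class are assumptions made for the example. With a real document, the text would come from and go back into the document through the office API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces ${name} placeholders with supplied values.
class PlaceholderTemplate {

    // Assumed placeholder syntax: ${letters, digits or underscores}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    /** Returns templateText with every known placeholder replaced. */
    static String fill(String templateText, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(templateText);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            // Unknown placeholders are left as-is so they stay visible.
            String replacement = values.getOrDefault(key, matcher.group(0));
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("customer", "ACME Corp");
        values.put("total", "$1,200");
        System.out.println(
                fill("Dear ${customer}, your total is ${total}.", values));
    }
}
```

The appeal of this style for non-developers is that the template stays an ordinary document: they only type the placeholder names where the data should appear.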
Apache POI
Apache POI is an open-source library for reading and writing Microsoft Office compatible documents.
I don't know its details but you might take a look.
JODReports (open source) and Docmosis (commercial) are designed to use normal/human-managed documents as templates (Word, OpenOffice, etc), merge in your data and return editable documents, PDFs etc. Please note I work for Docmosis.
Both JODReports and Docmosis provide a Java API.
If you are interested in automating OpenOffice or LibreOffice directly (as mentioned in Basil's answer), this blog about converting DOC to PDF will give you a quick start on how to:
load a doc file as a template
search and replace
export to file (pdf in the example)
To change the output format to Doc instead of PDF:
propertyValues[1].Value = "writer_pdf_Export";
to
propertyValues[1].Value = "MS Word 97";
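If you need both outputs, one option is to pick the filter name from the target file name instead of hard-coding it. This is only a sketch: `ExportFilterNames` and `filterNameFor` are hypothetical, and the only filter names used are the two quoted above.

```java
import java.util.Locale;

// Hypothetical helper: maps an output file name to an export filter name.
class ExportFilterNames {

    /** Returns the filter name for .pdf or .doc output files. */
    static String filterNameFor(String outputFileName) {
        String lower = outputFileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return "writer_pdf_Export";
        }
        if (lower.endsWith(".doc")) {
            return "MS Word 97";
        }
        throw new IllegalArgumentException(
                "No export filter configured for: " + outputFileName);
    }
}
```

The assignment from the blog example would then read `propertyValues[1].Value = ExportFilterNames.filterNameFor("report.doc");`.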
I hope that helps.
Was searching for this kind of solution as well, and I found XDocReport, including an example of a table. I will give it a try.
[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx, PDF or print documents. This solution broke in the latest version of Windows Server 2012. We learned that MS does not intend Word to be used in this manner, and after troubleshooting with MS support engineers we have come to the conclusion that it will never work.
[Currently]
I am currently researching potential technologies and tools that my company can use to regain this functionality. We need to be able to create Docx, PDF and print files to a local printer.
I have looked into a number of tools already and I am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for advice and suggested tools that we could use to implement a pure Java approach. Currently our application creates HTM files with all the required information. So ideally we would like to take these HTM files and "convert" them into Docx/XSL-FO format.
[Question]
So here is my question, which I hope you can help me with.
What are the best tools I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank You
It depends on the level of control and the complexity of the source HTML. There are HTML to FO stylesheets, but you might find them wanting for your specific need.
So you could use the Jericho parser to read the HTML and generate FO. Or you could generate the target format directly using Apache PDFBox and Apache POI.
It all boils down to the level of control you want or need.
docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution eg LibreOffice/OpenOffice) to do docx to PDF.
docx4j supports docx to XSL FO, and by default uses FOP.
We are converting a C++ project to Java in which we generate reports with a ".doc" extension. The problem is that we don't use any third-party library to generate an MS Word document; we just write a file with a .doc extension. Everything works fine except that we can't seem to find a way to add a header at the top of every page. Using line numbers is not an option. Is there any other way it can be done?
Thank you.
The Apache POI library might be of some help.
It has facilities to read and modify Microsoft proprietary file formats like MS Word .doc and MS Excel .xls.
How can I parse a PDF file and write the content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you are reading the contents of the PDF file, you can just as well store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be any problem.