I want to be able to generate .docx files through either Java or Python, based on a template .docx file. I need to be able to insert simple text, some bullets and a table or two.
I would like suggestions on specific libraries/modules for either Python or Java that would allow me to load a template, insert basic text and tables and then save it.
I have been looking into JACOB for Java and docx for Python. Any alternatives? Or will one of these be able to do what I need?
Thanks in advance
If you want to generate a docx, then you might like docxtemplater, a library I maintain that generates docx files from a template (much like Mustache for HTML).
It runs on node but has a command line interface so you can use it from any language.
DocxTemplater Library
Demo Site
Consider docx4j as an option. It covers similar ground to Apache POI (it is built on JAXB rather than on POI itself) and has better documentation.
Did you look at the Apache POI project (the Java API for Microsoft Documents)?
http://poi.apache.org/
Good luck!
Related
[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx or PDF files, or to print them. This solution broke on the latest version of Windows Server 2012. We learned that MS does not intend Word to be used in this manner, and after troubleshooting with MS support engineers we have concluded that it will never work.
[Currently]
I am currently researching technologies and tools that my company can use to regain this functionality. We need to be able to create Docx and PDF files and to print to a local printer.
I have looked into a number of tools already and I am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for advice and suggested tools for a pure Java approach. Currently our application creates HTM files with all the required information, so ideally we would like to take these HTM files and "convert" them into Docx/XSL-FO format.
[Question]
So, my question, which I'm hoping you will be able to help me with:
What are the best tools that I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank You
It depends on the level of control and the complexity of the source HTML. There are HTML to FO stylesheets but you might find them wanting for your specific need.
So you could use the Jericho parser to read the HTML and generate FO. Or you could generate the target formats directly, using Apache PDFBox for PDF and Apache POI for Docx.
It all boils down to the level of control you want or need.
docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution eg LibreOffice/OpenOffice) to do docx to PDF.
docx4j supports docx to XSL FO, and by default uses FOP.
Is there a way to convert an xls file into a pdf?
I want to generate a dynamic report directly as a pdf file, but I didn't find a way to make dynamic columns in iReport, so I've written a Java method that exports to xls dynamically.
So I was wondering if there is a way to convert this file to pdf. It needs to be done in a method from my own code. If you have a better idea, that would work too.
Maybe there's a way to create the pdf file from my code, as I did with the xls. Please help me out.
Thanks.
Try using iText http://itextpdf.com/ - I've used it to create PDFs with columnar structure.
In addition to using iText directly, there are a couple of report engines that sit on top of it:
Eclipse BIRT (uses iText 2.1.7, the last MPL/LGPL version)
Jasper Reports (which uses a very old version, 1.3.1 IIRC)
This is a commercial solution:
http://www.dancrintea.ro/xls-to-pdf/
If you want open source, try Apache POI (formerly Jakarta POI).
Try using the Muhimbi PDF Converter Services. It comes with a Java compatible Web Services based interface and sample code. It does other things as well.
I worked on this application, so the usual disclaimer applies.
I am very new to Java. I am trying to fetch some data from a database and display the result set in Excel. I am able to interact with the database, but how should I go about inserting the data into an Excel sheet? It's a simple Java program, and in future I would like to generate files in other formats such as PDF, doc etc.
I am looking for an approach that is fast and puts less load on the CPU.
Thanks in advance for your help.
Just spit out a CSV file. It's lightweight and portable. You can grab a CSV writer from Apache Commons, I think, but spending the 10 minutes it would take to write one might be a good learning exercise as well.
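For what it's worth, here is a minimal sketch of what such a hand-written CSV writer might look like, using only the standard library. The class and method names (SimpleCsvWriter, escape, formatRow) are made up for illustration, and the quoting follows the common RFC 4180 convention (quote fields containing commas, quotes or line breaks, and double any embedded quotes). In a real program the rows would come from your JDBC ResultSet rather than the hard-coded lists shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns lists of strings into CSV lines.
public class SimpleCsvWriter {

    // Quote a field only when it contains a comma, a double quote or a line break.
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the escaped fields with commas and end the record with CRLF.
    public static String formatRow(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.append("\r\n").toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for rows read from a database.
        System.out.print(formatRow(Arrays.asList("id", "name", "note")));
        System.out.print(formatRow(Arrays.asList("1", "Smith, Jane", "said \"hi\"")));
    }
}
```

To produce a file, you would write each formatted row to a BufferedWriter instead of System.out. Excel opens the resulting .csv directly.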
If you want a real solution with different outputs (e.g. Excel, PDF, rich text etc.) then use a reporting tool. There are plenty of open source tools like iReport which let you create a template and then write a simple Java app that renders it to PDF, Excel etc. Otherwise you will end up doing it by hand. It's a bit heavyweight, but anything more than trivial tabular output will be easier.
Apache POI is for you in this case, but you may find it a little overwhelming if you just need to read and write data in an Excel file.
Try jExcel instead, the API is simple and straightforward, you can also manipulate sheets within an excel workbook.
The easiest and standard way of doing this is to use POI library:
http://poi.apache.org/spreadsheet/quick-guide.html
The new xlsx format is based on Open XML, which makes it possible to generate these files without any dependency on the Microsoft Office COM libraries. The same could be done for the docx and pptx formats later, as well as other XML-based formats like EPUB.
The Apache POI project looks like it might provide one possible solution. There's also an article on the MSDN interop blog that discusses this in some detail.
The key words you should google for are OLE and DDE.
Java is not the best language for interfacing with Microsoft's software, though.
For generating Excel, I think you should try simreport at jsimreport.com. In my opinion it makes creating an Excel report quite simple: it uses an Excel sheet as the report template, so it is easy to configure and you can see the layout visually.
I am generating PDF documents for various purposes using the iText library.
It's like one class per PDF document. There are a lot of similarities among the classes, which are listed below:
Each field has an (x,y) location
A field can be wrapped after some number of words
A field can have a value which is a function of one or more parameters
Subsequent pages of the PDF can have the same layout or a different one
I am thinking of handling this layout business through an XML file. Any thoughts or innovative ideas for solving this are welcome.
Take a look at the PDFBox library, which is now in the Apache incubator.
PDFBox is nice. I've used it before and got good help from the developer. You might also want to have a look at XSL-FO. It is an XML-based formatting language whose output can be rendered as PDF (and other formats) using Apache FOP.
What about Prince? Like FOP it is a formatter that produces PDF, but it uses CSS files for styling, and it has a Java API. It's not free though (apart from the free Personal License).
Flying Saucer supports using XHTML/CSS to create PDFs.
Does anybody know of an open source Java library that will do robust diffing of the text parts of pdf files?
Ideally I would like something that would produce a diff in the form of a patch.
Extract the pdf text with http://incubator.apache.org/pdfbox/ and create a diff with http://code.google.com/p/google-diff-match-patch.
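To illustrate the second half of that pipeline, here is a rough sketch of a line-based diff in plain Java. It assumes the text of both PDFs has already been extracted (for example with PDFBox) and split into lines. It is not google-diff-match-patch and does not emit a real patch format. It is just a simple longest-common-subsequence comparison that marks lines as removed ("- "), added ("+ ") or unchanged ("  "). The class name LineDiff and its diff method are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares two lists of lines using an LCS table.
public class LineDiff {

    public static List<String> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk both lists from the start, following the table to choose each step.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i++)); // line only in the old text
            } else {
                out.add("+ " + b.get(j++)); // line only in the new text
            }
        }
        while (i < n) {
            out.add("- " + a.get(i++));
        }
        while (j < m) {
            out.add("+ " + b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample lines standing in for text extracted from two PDF versions.
        List<String> oldText = Arrays.asList("Title", "First paragraph", "Footer");
        List<String> newText = Arrays.asList("Title", "First paragraph, revised", "Footer");
        diff(oldText, newText).forEach(System.out::println);
    }
}
```

The table needs memory proportional to the product of the two line counts, so for large documents a dedicated diff library is the better choice. This is only meant to show the idea.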
If the PDFs are different only in text, you could also rasterize the pages and then look at the differences that way - we use that for regression testing output on our PDF code.
You can take a look at xdiffweb.com. It's a pure Java open source project based on Apache PDFBox.