I need to insert data in some doc template and return it's changed value. I decided to use POI, but if there are other ways to solve my problems I may change the library. I can change the string using Range.replaceText(), but by this way I loose my text formatting, and the text itself turns into plain document with no styles and tables. Are there any ways to replace some characters saving the formatting? I tried RTFTemplate, but it could slightly help me, because it depends on Spring, but I use vaadin in my project.
Thanks in advance
Several years ago I was solving similar problem. The easiest way was to use RTF files as templates, and avoid using any parsing library, because MS Office RTF is not so standatd as you could expect, and any library that tries to "understand" this format tends to loose part of formatting.
So I just opened rtf files as plain text, and searched for my keywords within it. There was a problem when this keywords were splitted into several parts, divided by some non meaningful parts.
I will search my delphi sources and will try to port it to java later this week.
Related
I am currently working at a project which generates contracts. The idea is that I put the data in a form and save it in a simple database.
So long, this was my favorite place to search for good ideas and simple solutions.
Now I am facing another problem and I don't know how I can solve that. I want to create a PDF and replace some placeholders with some data from my form.
One idea was, that I use an existing Word template with some bookmarks and replace them with the data from my form. Maybe there is a way to do that, and I am just too stupid to find it.
Another idea was, that I am using XML. Therefore, I thought I was clever and just converted the Word template to an PDF, so I am able to convert that PDF to an XML. Attached, you find the XML file. But now I need the XSL file - is there an easy way to create the XSL file?
Or maybe there is another simple solution to solve my problem.
In these attachments you find the PDF file, the Word template and the XML:
Thank you a lot :)
Using a template is a good idea - it makes some changes much quicker to make and then deploy. The comments above are focused on conversion, but don't forget you need to merge your data in (population) first.
If you can use Adobe tools, you can have a PDF template and use the Adobe tools to populate. This saves a "conversion" stage.
You mentioned using Word for templates. This means you to run through two stages of processing:
population - docx is a zipped set of XML files - so you can process them with your own code or using a library.
conversion - you need pdf, so you have to convert the docx to pdf. You also have to watch out for fonts at this stage (ie make sure they are available on your host).
The population stage you could do yourself since you are familiar with XML. But it is definitely complicated. The conversion needs to use a tool that is ideal for it. There are a few mentioned in the comments already.
There are some free/os and commercial tools that can do both parts:
docx4j
JOD Reports
Libre Office (using the Java Uno API) (I blogged this once - Java Convert Word to PDF with UNO)
Docmosis (please note I work for Docmosis)
I suggest starting with the simple example you have attached and prove you can both populate and convert that. Then switch to a more complicated example to see if you can do the other things that might be required (eg repeating or conditions or other logic) during the population stage.
I know this may sound silly to some of you experienced guys out there but it’s really important for me and my group at school, we need to create a software that allows the user to create a new RTF document from scratch (like an editor where you can center, change font size, style, save, insert picture), it also needs to be able read a docx document with images and format included and save it as a RTF document.
What we have done so far is being able to open the .docx document, extract the text without format and put it into an RTF document out. In other words using docx4j library we have been able to transform a .docx document text to .rtf, no pictures included, no formatting, just plain text surrounded by [ ].
We have made some progress today but we can’t figure out the next steps, considering the delivery date is in 72 hours, I thought it’d be a good idea to ask for help from more experienced people than us.
Please leave your answers or request info about the project, we’ll be glad to learn from you guys
To convert a .docx to .rtf use a library like https://code.google.com/p/jodconverter/. It will do all the heavy lifting for you.
Anyway, now about your editor itself. If I had to do it as fast as I could, I would use JavaFX to make my interface. There is a control called "Rich Text Editor" (http://docs.oracle.com/javafx/2/ui_controls/editor.htm) which you can just put into your application.
The trick here is that you can actually extract the HTML of the editor using getHtmlText(), and then you can the HTML to RTF using... yes, a library. I suspect that jodconverter can do this too, but if not, you can look at this question: Convert HTML to RTF in java?.
This should give you a better idea of how to do your project. There are Java libraries to handle conversion between HTML and RTF, so you can use an HTML editor (provided by JavaFX). And of course, a .docx can be converted to HTML too. Let libraries do all the dirty work :).
I have been given a Microsoft Word Document, with some tables and spots to fill in automatically. I am not sure if this can be done with JAVA, which is my most preferable language.
I am looking for a way to implement a function which I can give the word file to it, and it fills the required spots for me. Is it possible to do it? A hint or a link to a tutorial would definitely suffice. Thanks.
Newer versions Word store documents as zipped XML. Have you filled out the form manually in Word and done a before/after comparison on the XML? Depending on the extent of the changes you could use the standard Java XML APIs to do the same thing programmatically.
A bit of googling and I found docx4j and Apache POI. I haven't used either personally, but it appears that what you're asking for is certainly possible. See this example from the POI SVN repo on how to manipulate tables.
I have an Arabic PDF, and I want to parse it into text document using Java. I have tried many times, and the English words parse successfully but the Arabic words don't.
Can anyone recommend a solution that will convert the Arabic words properly as well?
There are several libraries that come to mind. Apache Tika, iText or pdfbox will all more or less solve your problem. Although, I must put in a word for Tika, as it supports language detection, and can also handle other document types too.
I think you can use iText for pdf manipulation using Java. It supports Arabic too.
Anyone know if it is possible?
And got any sample code for this?
Or any other java API that can do this?
The Office 2007 format is based on XML and so can probably be written to using XML tools. However there is this library which claims to be able to write DocX format word documents.
The only other alternative is to use a Java-COM Bridge and use COM to manipulate word. This is probably not a good idea though - I would suggest finding a simpler way.
For example, Word can easily read RTF documents and you can generate .rtf documents from within Java. You don't have to use the Microsoft Word format!
As others have said POI isn't going to allow you to do anything really fancy - plus it doesn't support Office 2007+ formats. Treating MS Word as a component that provides this type of functionality via COM is most likely the best approach here (unless you are running on a non-Windows OS or just can't guarantee that Word will be installed on the machine).
If you do go the COM route, I recommend that you look into the JACOB project. You do need to be somewhat familiar with COM (which has a very steep learning curve), but the library works quite well and is easier than trying to do it in native code with a JNI wrapper.
If you are using docx, you could try docx4j.
See the AddImage sample
Surely:
Take a look at this: http://code.google.com/p/java2word
Word 2004+ is XML based. The above framework gets the image, convert to Base64 representation and adds it to the XML.
When you open your Word Document, there will be your image.
Simple like this:
IDocument myDoc = new Document2004();
myDoc.getBody().addEle("path/myImage.png"));
Java2Word is one API to generate Word Docs using obviously Java code. J2W takes care of all implementation and XML generation behind the scenes.
As far as can be gathered from the project website: no.
POI's HWPF can extract an MS Word document's text and perform simple modifications (basically deleting and inserting text).
AFAIK it can't do much more than that.
Also keep in mind that HWPF works only with the older MS Word (97) format, not the latest ones.
Not sure if Java out of the box can do it directly. But i've read about a component that can pretty much do anything in terms of automating word document generation without having Word. Aspose Words
JasperReports uses this API alternatively to POI, because it supports images:
JExcelAPI
I didn't try it yet and don't know how good/bad it is.