I have a .docx template with fields defined in it. I need to take data entered by a user in a web service and insert it into those fields using Java.
My team and I have been researching this for most of the day, and we have been unable to find a straightforward solution to this.
Is there a way to do this relatively easily?
Thanks.
EDIT:
After pressing Alt+F9, all of the fields display like this: { FORMTEXT }
POI doesn't seem to have sufficient support to do this.
I was unable to set up the OpenOffice SDK on Windows XP because I couldn't satisfy all of its dependencies.
docx4j may work, but its MailMerger currently does not fill the fields with the given data.
If I extract the docx and open the word/document.xml file, this is what the XML around one field looks like: http://pastebin.com/uXBtz7X5 (search for FieldName and FieldValue to see where these are defined)
Have a look at docx4j, which you can use to update fields in docx documents; there is also a fieldupdater example.
Disclosure: my company sponsors docx4j
Have a look at MailMerger; see the main method at the bottom.
For fields of other types, you can try the more generic field support.
The docx format is a zip file, with XML and other files inside. You may be able to edit the XML files using standard XML tools.
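Because a .docx is just a ZIP archive, the standard java.util.zip classes are enough to rewrite word/document.xml. The sketch below is an illustration under assumptions, not a docx4j or POI API: it copies every entry unchanged and performs a literal text replacement in word/document.xml only. The class name DocxXmlEdit and the marker strings are hypothetical. Word often splits visible text across several w:r runs, so a marker may not appear contiguously in the XML; and for {FORMTEXT} fields the displayed value sits in the w:t run(s) between the field's "separate" and "end" w:fldChar elements, which this sketch does not try to locate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxXmlEdit {
    // Copies every entry of the input .docx (a ZIP archive) to the output,
    // applying literal text replacements to word/document.xml only.
    public static void replaceInDocx(InputStream in, OutputStream out,
                                     Map<String, String> replacements) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current ZIP entry.
                byte[] data = zin.readAllBytes();
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> r : replacements.entrySet()) {
                        xml = xml.replace(r.getKey(), escapeXml(r.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Escapes characters that are not allowed verbatim in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Usage would be along the lines of replaceInDocx(new FileInputStream("template.docx"), new FileOutputStream("filled.docx"), Map.of("NAME_HERE", userInput)), where the file names and the marker are placeholders.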
Docmosis and JODReports might help you - they are Java libraries for producing documents / populating templates in several formats. Docmosis can work with DocX, and since they are based on the same technologies, JODReports probably can too. I don't know whether the particular {FORMTEXT} field is going to work, but Docmosis can work with plain-text files or with Word's merge fields, which look like {MERGEFIELD} when you press Alt+F9.
Related
I have tried the PDFTextStripperByArea and PDPageContentStream classes to extract the number values from my PDF file. They work fine!
However, my requirement is to use a PDFTable or PDFTableExtractor class to read the PDF contents. Which Maven dependency and JAR file do I need to access these classes?
Please also mention the methods required to get the values from a particular position.
I have another question. Can we extract table-formatted data from a PDF file as it is, i.e. the data with its rows, columns and table lines? If a page contains some text and a table, can we read only the table headers and rows? I have uploaded an image of my page to GitHub. From that image, I only need the values of Gross premium, GST and Total Payable. Please let me know whether that is possible.
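If plain text extraction (for example with PDFTextStripperByArea, as mentioned above) already returns lines such as "Gross premium: 1,234.50", the three values can be pulled out with a regular expression, without any table-extraction class. This is a minimal stdlib-only sketch, assuming each label and its number appear on the same line of the extracted text; the class name LabeledValues and the sample layout are hypothetical, and it does not read table lines or column positions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabeledValues {
    // For each wanted label, returns the first number that follows it on the
    // same line of the extracted text (digits, optional thousands commas,
    // optional decimal part). Labels that are never found are left out.
    public static Map<String, String> extract(String text, List<String> labels) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String label : labels) {
            // Label, then lazily skip non-digits (": ", "Rs. ", ...), then capture the number.
            Pattern p = Pattern.compile(Pattern.quote(label) + "\\D*?(\\d[\\d,]*(?:\\.\\d+)?)",
                    Pattern.CASE_INSENSITIVE);
            for (String line : text.split("\\R")) {
                Matcher m = p.matcher(line);
                if (m.find()) {
                    found.put(label, m.group(1));
                    break; // keep the first match for this label
                }
            }
        }
        return found;
    }
}
```

Note that a label followed by something like a percentage ("GST (18%)") would capture that number instead, so the pattern would need adjusting to the real page layout.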
First, don't use classes from the com.lowagie packages.
That code is old, obsolete and no longer supported. Furthermore, it belongs to a very early version of iText.
Afterwards, a thorough investigation was done into the intellectual property rights of all the code (since iText has had a lot of contributors). When you use the old code, you may (unknowingly) be using code for which you do not have the copyright.
Second, if you just want to solve the problem of extracting numbers and tables from a PDF document, have a look at pdf2Data. It's an iText add-on that makes things a lot easier.
It gives you a nice UI where you can build templates for data extraction. Then you can call a single method to match an existing (XML) template against an input PDF document, and you get a data structure that contains all the information about the match.
http://pdf2data.online/
PDFTable
I have found two PdfPTable classes:
com.lowagie.text.pdf.PdfPTable
com.itextpdf.text.pdf.PdfPTable
Documentation for both classes (this may help you find the methods you need):
https://www.coderanch.com/how-to/javadoc/itext-2.1.7/com/lowagie/text/pdf/PdfPTable.html
http://itextsupport.com/apidocs/itext5/5.5.9/com/itextpdf/text/pdf/PdfPTable.html
If you want to use these classes, you can copy the dependency into your pom.xml from:
https://mvnrepository.com/artifact/com.itextpdf/itextpdf
https://mvnrepository.com/artifact/com.lowagie/itext - as noted at this link, this artifact was moved to com.itextpdf
Examples of how to use these classes can be found here:
https://developers.itextpdf.com/examples/itext-action-second-edition/chapter-4
https://www.programcreek.com/java-api-examples/index.php?api=com.lowagie.text.pdf.PdfPTable
I am trying to turn some existing PDFs into templates.
Because these documents hold real data, I am replacing data such as names and addresses with dummy placeholders.
Examples
[[Name]]
[[Address1]]
When I replace the text programmatically using the iText 5 library, I can use the resulting template.
To speed things up, I tried to use Adobe DC instead.
When I edit the text this way, the template stops working.
Any ideas?
From what I understand of your question:
you have (or want to have) a template document
fill in the template with data from a program
turn this back into a pdf
You can easily achieve some of your goals with iText.
I suggest you look into http://developers.itextpdf.com/examples/form-examples/clone-filling-out-forms
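For the "fill in the template with data from a program" step, the substitution itself is ordinary string processing once the text is available as a string. A minimal sketch, assuming placeholders of the form [[Key]] as in the question; the class name PlaceholderFiller is hypothetical, and it does not read or write PDF content streams. If an editor such as Adobe DC stores [[Name]] as several separately positioned or re-encoded text fragments, a whole-string match like this one would no longer find it, which may be one reason the hand-edited template behaves differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderFiller {
    // Matches placeholders of the form [[Key]], capturing Key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[\\[([A-Za-z0-9_]+)\\]\\]");

    // Replaces every [[Key]] with values.get(Key); unknown keys are left untouched.
    public static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' and '\' in user data being treated as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```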
I need to update an existing FDF file programmatically on the server side. For this I'm looking for a Java library with which we can manipulate an existing FDF file. So far I've tried libraries from iText and Adobe. It seems that iText's FDFWriter only lets you create a new FDF file and will not help you update an existing one.
With Adobe's FDFDoc class I somehow managed to update an FDF file, but this API seems very old and looks ugly (method and field names are not very elegant and do not follow camelCase). My question is whether there is a known better library.
P.S.: FDF is a data format for collecting input data from editable PDF forms.
FDF is a simple structured text file, which means you should be able to modify it with simple text manipulation. You may even be able to create an XSLT (even though FDF is not XML).
Adobe has the FDFToolkit, which is quite old, but the FDF format has not changed for a long time. Although the FDFToolkit both reads and writes FDF, it brings advantages only for reading; writing is so basic that you don't really need a specific library…
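Because FDF is plain text, setting a field value can be done with ordinary string handling. A minimal sketch, assuming the common "/T (name) /V (value)" layout for text fields inside the /Fields array; the class name FdfEdit is hypothetical. Values stored as names (for example checkbox states such as /Off), hex strings, or literal strings containing unescaped nested parentheses are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FdfEdit {
    // Escapes a Java string as the body of a PDF literal string:
    // backslash and parentheses must be preceded by a backslash.
    static String escapePdfString(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Sets the /V value of the first field whose /T name equals fieldName.
    // Assumes "/T (name)" is directly followed by "/V (value)".
    public static String setValue(String fdf, String fieldName, String newValue) {
        Pattern p = Pattern.compile("(/T\\s*\\(" + Pattern.quote(escapePdfString(fieldName))
                + "\\)\\s*/V\\s*\\()((?:\\\\.|[^\\\\)])*)(\\))");
        Matcher m = p.matcher(fdf);
        if (!m.find()) {
            throw new IllegalArgumentException("Field not found: " + fieldName);
        }
        // Replace only group 2, the existing value between the parentheses.
        return fdf.substring(0, m.start(2)) + escapePdfString(newValue) + fdf.substring(m.end(2));
    }
}
```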
Here is my issue.
I need to read an RTF document and render it in a web page (something like Google Docs). These documents are templates, and the idea is that users can edit only certain text, not the text that is marked as "template logic".
So far I've seen a bunch of RTF libraries that only render and won't give you an object model you can iterate over to walk the structure of the RTF document.
My idea is to determine what can be editable and what can not, put all that info (images, text, tables, headers, footers) into a json and send it to my JS client.
Maybe this is a crazy idea, any suggestions?
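The JSON step of that idea could look like the sketch below, assuming the RTF has already been parsed into blocks by some other means (parsing RTF is not shown). The Block record, its fields and the editable flag are a hypothetical model, not part of any RTF library; in production a JSON library such as Jackson would normally replace the hand-written escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TemplateBlocks {
    // Hypothetical block model: one entry per paragraph, image, table, header, etc.
    // The editable flag would be decided by your own "template logic" rules.
    public record Block(String type, String text, boolean editable) {}

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes the blocks as a JSON array for the JS client.
    public static String toJson(List<Block> blocks) {
        return blocks.stream()
                .map(b -> "{\"type\":" + quote(b.type()) + ",\"text\":" + quote(b.text())
                        + ",\"editable\":" + b.editable() + "}")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

The client can then render every block but enable editing only where editable is true.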
When I read "template", I think "Velocity". I wonder whether you can solve this by separating the template from the dynamic data: let users modify only the dynamic data, and marry it with the static, unedited template at the last minute.
It's possible that Docmosis can help, because it lets you use documents and templates, and you can extract from Docmosis an "analysis" of the template (e.g. a list of fields). It's hard to be sure from your description whether it will fit your purpose, though. Please note I work for the company that produces Docmosis.
I use Jericho HTML Parser 3.1.
I need to extract text from HTML, process it, and, based on the result, insert tags into the original HTML.
For this I need a mapping between the extracted text and the source HTML.
net.htmlparser.jericho.TextExtractor extracts text pretty good, but I was not able to find how to find the location in original file.
Is it possible to do so with Jericho-html?
You can't do this with the TextExtractor as is, but I've needed to do similar things in the past, and the simplest solution is to copy Jericho's TextExtractor implementation and edit it to add your own custom behaviour. It's a pretty simple class, so you'll easily see where to add your own hooks.
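The bookkeeping such a modified extractor needs is one source offset per emitted character. The sketch below is not Jericho code: it is a hypothetical, deliberately tiny tag stripper that records, for every character of the extracted text, its index in the original HTML, so a match found in the text can be mapped back to a position where a tag can be inserted. It ignores entities, comments, scripts and whitespace normalization, all of which a real extractor has to account for when tracking offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTextExtractor {
    // The extracted text plus, for every character in it,
    // the index of that character in the original HTML.
    public record Extracted(String text, int[] sourceOffsets) {}

    // Drops everything between '<' and '>' and keeps the rest verbatim,
    // remembering where each kept character came from.
    public static Extracted extract(String html) {
        StringBuilder text = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                offsets.add(i);
            }
        }
        int[] arr = offsets.stream().mapToInt(Integer::intValue).toArray();
        return new Extracted(text.toString(), arr);
    }
}
```

For example, for <p>Hello <b>world</b></p> the extracted text is "Hello world", and sourceOffsets()[6] (the "w") is 12, the index where "world" starts in the HTML; the end of the match maps to sourceOffsets()[10] + 1.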