Make existing PDF's in to templets - iText

Make existing PDF's in to templets - iText - java

I am trying to make some existing PDF's into templets.
Because these documents hold real data I am replaceing this data such as names and addrsss and making them into dummy place holders.
Examples
[[Name]]
[[Address1]]
When I alter the text via the iText version 5 library replace via a program I can use the template.
To speed things up I tried to use Adobe DC.
When using this method the template stops working.
Any ideas?

From what I understand of your question;
you have (or want to have) a template document
fill in the template with data from a program
turn this back into a pdf
You can easily achieve some of your goals with iText.
I suggest you look into http://developers.itextpdf.com/examples/form-examples/clone-filling-out-forms

Related

How to read values from a PDF file in java using PDFTable or PDFTableExtractor class?

I have tried with PDFTextStripperByArea and PDPageContentStream classes to extract the number values from my pdf file. They work fine!
But my requirement is to use PDFTable or PDFTableExtractor class to read the pdf contents. Can you tell me what is the maven dependency and jar file I need to use to access the above said classes?
Also mention the required methods to get the values from a particular position.
I have another doubt. Can we extract the table formatted data from PDF file as it is? I meant the data with rows and columns with table lines. If a page contains some text and a table, can we just read only the table headers and the rows? I have uploaded my page in GitHub. Click here! From that image, I only need the values of Gross premium, GST and Total Payable. Please let me know whether it's possible

First, don't use classes from packages com.lowagie
That code is old, obsolete and no longer supported. Furthermore, this code belonged to the very early version of iText.
Afterwards a thorough investigation was done into the intellectual property rights of all the code (since iText has had a lot of contributors). When you use the old code, you may (unknowingly) be using code for which you do not have the copyright.
Second, if you just want to solve the problem of extracting numbers and tables from a PDF document, have a look at pdf2Data. It's an iText add-on that makes things a lot easier.
It gives you a nice UI, where you can build templates for data extraction. Then you can call a single method to match an existing (XML) template against an input PDF document, and you'd get a datastructure that contains all the information about the match.
http://pdf2data.online/

PDFTable
I have found two PDFTable classes:
com.lowagie.text.pdf.PdfPTable
com.itextpdf.text.pdf.PdfPTable
Documentation of both of this class (this may help you to learn the methods you need):
https://www.coderanch.com/how-to/javadoc/itext-2.1.7/com/lowagie/text/pdf/PdfPTable.html
http://itextsupport.com/apidocs/itext5/5.5.9/com/itextpdf/text/pdf/PdfPTable.html
If you want to use this classes, you can copy the dependency to your pom.file from:
https://mvnrepository.com/artifact/com.itextpdf/itextpdf
https://mvnrepository.com/artifact/com.lowagie/itext - As mentioned in this link, This artifact was moved to com.itextpdf
Examples of how to use this classes you may found here:
https://developers.itextpdf.com/examples/itext-action-second-edition/chapter-4
https://www.programcreek.com/java-api-examples/index.php?api=com.lowagie.text.pdf.PdfPTable

How to manipulate a Check box in Word with Java and save as PDF?

I need to edit some Check-Boxes in a big Wordfile (docx) and save this then as PDF. This file contains many images and is about 19MB big.
Maybe there will be the need of adding some Checkbox and text.
My idea was to use docx4j, but before to learn the ropes I want to ask if this is possible and which is the best way.
May it be better to save the document as a PDF and then use this as base for processing?

Yes, you can manipulate checkboxes using docx4j.
Be aware that there are several different kinds of checkboxes:
legacy checkbox
content control checkbox
checkbox character
and the details depend on which type are present.
For more, you should post a snippet of the relevant OpenXML (and as they say here on SO, code showing what you've tried).

Is it necessary to use only docx4j?
Recently i tried a solution that helps me manage a Word document with checkboxes and save it as a PDF file. I used Plumsail Documents. The case is about how to populate a Word template using a form with checkboxes. You can connect your app via Zapier or Power Automate to activate checkboxes depending on value from your app. You can set the resulting file as a PDF and deliver it by email and across any system using Zapier and Power Automate.
The great is that Plumsail Documents has a templating engine that allows it to operate pictures.
Your case may be like this:
Create a form in Plumsail Form. It will allow you to activate checkboxes depending on your needs, or your users' needs.
Create a process in Plumsail Documents, upload your Word document and set it as a template. Just put placeholders where you want to change or fill a document with some values or data. Set the resulting document in PDF format.
Set the delivery method. Save across apps or deliver by email.
I recommend you to read the article. That solution is not free, but there is a free 30-day trial, so you will have enough time to try it.

RTF Java Parser

here is my issue.
I need to read an RTF document and render to a webpage (some sort of google docs) but these documents are templates, the idea, is that user can only edit certain text and not the text that is marked to be "template logic".
So far I've seen a bunch of RTF libraries that performs only rendering but wont let you access an object that can be iterated dynamically to go over the structure of the RTF document.
My idea is to determine what can be editable and what can not, put all that info (images, text, tables, headers, footers) into a json and send it to my JS client.
Maybe this is a crazy idea, any suggestions?

When I read "template", I think "Velocity". I wonder if you can solve this by separating template from dynamic data. I wonder if you can solve this by letting users modify dynamic data and only marry it with the static, unedited template at the last minute.

It's possible that Docmosis can help because it lets you use documents and templates and you can extract from Docmosis an "analysis" of the template (eg a list of fields). It's hard to be sure if it will fit your purpose though from your description. Please note I work for the company that produces Docmosis.

What technologies are there for formatted, structured data input and output?

I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.
The 2nd portion of this, is what if we want to get out a "reduced" resume from the database. In other words, I want to search the uploaded content I now have, and only print out new resumes for people who have Java programming experience lets say. So I can go into my database, find the people who originally had java as a skill, and output a new set of resumes that are also still in a nice templated format, and only have the relevant info in them, instead of ALL the content.
I have been writing some software to do this in Java that will basically use a docx template, overwriting the items in customXML which are bound to the content controls in the doc, so the new data shows up and can eb saved as a new docx with that custom data.
This seems really cumbersome to me, and has some limitations. For one, lets say my template has a place for 3 Skills, and the particular person has 8 skills. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I dont want to have to go back into my software and edit source code to change that additional data input XML tag to bold instead of italic.
I was doing some reading up on using Infopath to create a form that I could use to get the input, connecting to some sharepoint data source or something to store the stripped out data. However, I can't seem to find out if it is possible using sharepoint to get the data back out, in a nice formatted way. What would the general steps for this be? It seems like I couldnt find very much about this topic with any quick googling.
Thanks

You could set up the skills:
<skills>
<skill>..</skill>
<skill>..</skill>
and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.

Format Java Code into Word / RTF

I need to format java code to put into a Word document. Are there any programs that will do this with keyword highlighting, etc. ?

When I copy/paste from my IDE (Eclipse), the formatting comes along for the ride.
You'll probably want to turn off "Mark Occurrences" first.

This is a late reply but since it's quite a specific requirement I'll post my comment anyway.
You can do this programmatically with Docmosis assuming you want the program to be running in Java (not just showing java in documents) and can install OpenOffice where the program runs. The process would be:
Create a doc or odt file that will
act as a template (setting fonts,
position, tables etc) and will have
a placeholder for where you want to
insert the code sample
Add docmosis to your java project
and write the code to initialise
Docmosis, register the template,
then render document with your
selected Java code.
Currently, Docmosis FieldRenderers
can underline or italicize your data
as it goes, but the rendering is
currently applied to the entire
field. So this wouldn't let you
have a single field for all your
java text and individually highlight
words, but there are a few other
tricks that you could employ to get
useful/interesting results (such as
splitting your data into separate
fields and letting Docmosis render
the fields differently).
The "java code" text that you specify as data will be inserted into your template using the font and layout properties in the template. The renderer will have a chance to override specific formatting.

You can just copy and then paste it to the word document. I am using OS X as well. I just works fine. I am uploading the screenshot of how it looks in word.

I'm using Easy Code Formatter as called out here: How do you display code snippets in MS Word preserving format and syntax highlighting?
It's an Office add-in. You can select multiple themes, enable / disable line numbering / highlight lines in rectangles. It allows you to select the coding style / and has a quick formatting button. Pretty neat.
Requires you to have Office 2013 or beyond.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.