How to fill XFA (PDF) form automatically?

I'm looking for a free option to fill an XFA PDF form. I know that iText is an option, but their commercial prices are too expensive for me and I'd prefer something fully open source. There is PDFBox, but it doesn't seem to allow inserting data into XFA forms, or at least there is very little documentation explaining how.
I simply need to fill certain fields in my form with data from a text file. Can you recommend or direct me to a solution? Thank you very much.

XFA is an enterprise product, with enterprise pricing…
XFA consists of an XML part and a PDF part; the PDF part is roughly a rendering of what is in the XML part.
You may try to fill data into the XML part, and then see whether Acrobat/Reader can render it properly. No guarantee that it works, but worth a try. If that does work, you'd have shifted the problem to something maybe a bit more manageable.
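A minimal sketch of the "fill data into the XML part" step, using only the JDK's XML APIs. It assumes you have already extracted the XFA datasets packet from the PDF as a string, and that each field is an element that can be matched by its local name; extracting the packet and writing it back into the PDF need a PDF library and are not shown. The class name XfaDataFiller and the element names form1, firstName and lastName are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: sets field values inside an XFA datasets packet.
class XfaDataFiller {

    // Sets the text of every element whose local name equals a key in values.
    static String fill(String datasetsXml, Map<String, String> values) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(datasetsXml)));

            for (Map.Entry<String, String> entry : values.entrySet()) {
                // "*" matches elements in any namespace (or none).
                NodeList nodes = doc.getElementsByTagNameNS("*", entry.getKey());
                for (int i = 0; i < nodes.getLength(); i++) {
                    nodes.item(i).setTextContent(entry.getValue());
                }
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not fill XFA data", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<xfa:data><form1><firstName/><lastName/></form1></xfa:data>"
                + "</xfa:datasets>";
        System.out.println(fill(xml, Map.of("firstName", "Ada", "lastName", "Lovelace")));
    }
}
```

The map of values could be loaded from a text file, for example with java.util.Properties. Whether Acrobat/Reader then renders the result correctly still has to be checked by hand, as said above.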

Related

Insert xml into a pdf

I need to insert an .xml in a .pdf (and then later digitally sign all).
The implementation guide that I'm following tells me to insert my XML data by populating a structure called "XFAResources", as you can see in this example (my data is contained within the ClinicalDocument block):
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<xfa:data>
<ClinicalDocument xmlns="urn:hl7-org:v3" ...>
...
</ClinicalDocument>
</xfa:data>
</xfa:datasets>
Could someone give me some information about how I can do it?
Both the XML and the PDF were generated by me, the latter using iText 5.
--- Updating ---
What I need is to insert (inject) an XML document into a PDF. This is for a thesis project, and the only restriction I was given is the programming language, Java; otherwise I have total freedom.
Using Eclipse, I created and saved my XML, and using iText I created and saved a PDF, which is not an XFA form, with the content and formatting I chose. Now I have to perform the injection.
My teacher also informed me this morning that the final signature of the file is not necessary.
Can you recommend a way to simply perform the injection? I believe it is possible with iText, but unfortunately I cannot work out how.
Thank you in advance for your help.

You are referring to XFAResources. There is no such thing as XFAResources in ISO-32000-1 (aka the current PDF specification). XFAResources was introduced in one of the early drafts of ISO-32000-2 (aka PDF 2.0, to be released later this year), but it was removed during the development of the new specification. It's not in the FDIS (Final Draft International Standard), which means it will never be part of an official specification.
In short: the implementation guide that you are following is wrong. It would help if (1.) you shared that implementation guide, and (2.) asked its author to adapt it so that it reflects official standards.
As for your problem:
I assume that you have an XFA form that was created using Adobe LiveCycle Designer, and that you want to inject XML into this form. This question has been asked and answered before: How to fill out a pdf file programmatically? (Dynamic XFA)
You want to sign the form, and signing XFA forms is described in PAdES-5: PAdES for XML Content – Profiles for XAdES signatures of XML content in PDF files. This was implemented in iText, but unfortunately, I don't know of any other implementations. As far as I remember, Adobe has a completely different implementation which makes it impossible to verify a PAdES-5 created with iText in Adobe Reader (please check with Adobe if this is still true). No other viewer supports PAdES-5, and XFA will be deprecated in PDF 2.0 (which significantly reduces the chance that any vendor will invest in further development).
You don't want to sign the form, but you want to sign the flattened form. In that case, you first need to convert the XFA into real PDF. An XFA form is nothing but XML wrapped inside a PDF structure, and you need software such as Adobe LiveCycle ES or iText's XFA Worker to convert that XML to regular PDF. This is very specialized software that is not available as open source software. Given that XFA will be deprecated, there is very little chance that anyone will ever make the effort to create such software as open source. However, if you buy either Adobe LiveCycle ES or iText's XFA Worker, you will be able to create a document that can be signed using a PAdES-2 (ISO-32000-1 style) or PAdES-3 (new in ISO-32000-2) signature.
You asked your fellow subscribers on StackOverflow "Could someone give me some information about how can I do it?" I hope the above information already points you towards the relevant specifications (ISO-32000-1 and -2, PAdES-2, -3, and -5), but unfortunately, it is impossible to give you a code sample, as your question isn't accurate enough for the following reasons:
Your assumption that you need to populate a structure called XFAResources is based on an implementation guide that refers to a structure that never made it into an official standard.
You aren't entirely clear about the nature of the template: is it really an XFA form? If so, why? XFA forms are about to become obsolete.
You aren't entirely clear about the nature of the signed result: does the document need to remain an XFA form (PAdES-5), or is it OK for the document to be flattened?
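If the target does turn out to be a real XFA form, the data packet from the question (the payload wrapped in xfa:datasets and xfa:data) can at least be built with the JDK alone. The following is only a sketch of that wrapping step: the class name XfaDatasetsWrapper is hypothetical, the payload is assumed to be a well-formed XML string, and getting the packet into the PDF still requires a PDF library, which is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps a payload document in xfa:datasets / xfa:data.
class XfaDatasetsWrapper {

    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    static String wrap(String payloadXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document payload = builder.parse(new InputSource(new StringReader(payloadXml)));

            // Build <xfa:datasets><xfa:data>...</xfa:data></xfa:datasets>.
            Document result = builder.newDocument();
            Element datasets = result.createElementNS(XFA_DATA_NS, "xfa:datasets");
            Element data = result.createElementNS(XFA_DATA_NS, "xfa:data");
            result.appendChild(datasets);
            datasets.appendChild(data);

            // Copy the payload root (with its own namespace) into the new document.
            data.appendChild(result.importNode(payload.getDocumentElement(), true));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(result), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not wrap payload", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrap("<ClinicalDocument xmlns=\"urn:hl7-org:v3\"><id/></ClinicalDocument>"));
    }
}
```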

Replacing placeholders using iText in Java

I have a PDF that contains placeholders like <%DATE_OF_BIRTH%>. I want to be able to read in the PDF and change the placeholder values to text using iText.
So: read in the PDF, use maybe a replaceString() method to change the placeholders, then generate the new PDF.
Is this possible?
Thanks.

The use of placeholders in PDF is very, very limited. Theoretically it can be done and there are some instances where it would be feasible to do what you say, but because PDF carries very little structural information, it's hard:
Simply extracting words is difficult, so recognising your placeholders in the PDF would already be difficult in many cases.
Replacing text in PDF is a nightmare because PDF files generally don't have a concept of words, lines and paragraphs. Hence no nice reflow of text for example.
Like I said, it could theoretically work under special conditions, but it's not a very good solution.
What would be a better approach depends on your use case:
1) For some forms it may be acceptable to have the complete form as a background image or PDF file and then generate your text as an overlay to that background (filling in the blanks, so to speak). As pointed out by Bruno and mlk in comments, in this case you can also look into using form fields which can be dynamically filled.
2) For other forms it may be better to have your template in a structured format such as XML or HTML, do the text replacement in that format and then convert it into PDF.
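For option 2, the replacement step itself is simple once the template is plain XML or HTML text. A rough sketch using only the JDK, assuming placeholders follow the <%NAME%> convention from the question; the class name PlaceholderFiller is made up, and the conversion of the filled template to PDF is a separate step that is not shown.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills <%NAME%> placeholders in an HTML/XML template.
class PlaceholderFiller {

    // Matches placeholders such as <%DATE_OF_BIRTH%>.
    private static final Pattern PLACEHOLDER = Pattern.compile("<%([A-Z0-9_]+)%>");

    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder " + name);
            }
            // Escape markup characters, then quote so '$' and '\' are taken literally.
            matcher.appendReplacement(out, Matcher.quoteReplacement(escape(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Minimal escaping so inserted values cannot break the surrounding markup.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String template = "<p>Date of birth: <%DATE_OF_BIRTH%></p>";
        System.out.println(fill(template, Map.of("DATE_OF_BIRTH", "1990-01-01")));
    }
}
```

Failing on an unknown placeholder is a design choice here; silently leaving it in place would let an unfilled template slip through into the generated PDF.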

RTF Java Parser

Here is my issue.
I need to read an RTF document and render it to a webpage (something like Google Docs), but these documents are templates. The idea is that the user can only edit certain text and not the text that is marked as "template logic".
So far I've seen a bunch of RTF libraries that only perform rendering but won't let you access an object that can be iterated dynamically to walk the structure of the RTF document.
My idea is to determine what is editable and what is not, put all that info (images, text, tables, headers, footers) into a JSON object and send it to my JS client.
Maybe this is a crazy idea, any suggestions?

When I read "template", I think "Velocity". I wonder if you can solve this by separating template from dynamic data. I wonder if you can solve this by letting users modify dynamic data and only marry it with the static, unedited template at the last minute.

It's possible that Docmosis can help because it lets you use documents and templates and you can extract from Docmosis an "analysis" of the template (eg a list of fields). It's hard to be sure if it will fit your purpose though from your description. Please note I work for the company that produces Docmosis.

How to create templates from html page automatically?

I have a use case in which I need to render an unformatted text in the format of a given web page programmatically in Java. i.e. The text should automatically be formatted like the web page with styles, paragraphs, bullet points etc.
As I see first I will have to analyze the piece of unformatted text to find out the candidates for paragraphs, bullet points, headings etc. I intend to use Lucene analyzers/tokenizers for this task. Are there any alternatives?
The second problem is to convert the formatted web page into some kind of template (e.g. a Velocity template) with placeholders for various entities like titles, bullet points etc.
Is there any text analysis/templating library in Java that can help me do this? Preferably open source.
Are there any other suggestions for doing this sort of task in a better way in Java?
Thanks for your help.

There are a lot of hard parts to what you're doing.
The user input
If you don't ask your user to provide any context, you're never going to guess the structure of the text. At the very least, you should ask them to provide a title and a series of paragraphs in your GUI.
Ideally, you could ask them to follow a well-known markup language (Markdown, Textile, etc.) and use an open source parser to extract the structure.
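To give an idea of what "extracting the structure" means, here is a toy sketch that classifies lines of a tiny Markdown-like subset into headings, bullets and paragraphs. It is hand-rolled for illustration only and is not a real Markdown parser; for actual use you would pick one of the existing open source parsers. The names SimpleMarkupParser, Block and Kind are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser for a tiny Markdown-like subset ("# heading", "- bullet", paragraphs).
class SimpleMarkupParser {

    enum Kind { HEADING, BULLET, PARAGRAPH }

    static final class Block {
        final Kind kind;
        final String text;

        Block(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + ": " + text;
        }
    }

    static List<Block> parse(String input) {
        List<Block> blocks = new ArrayList<>();
        StringBuilder paragraph = new StringBuilder();
        for (String raw : input.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("# ")) {
                flush(paragraph, blocks);
                blocks.add(new Block(Kind.HEADING, line.substring(2).trim()));
            } else if (line.startsWith("- ") || line.startsWith("* ")) {
                flush(paragraph, blocks);
                blocks.add(new Block(Kind.BULLET, line.substring(2).trim()));
            } else if (line.isEmpty()) {
                // A blank line ends the current paragraph.
                flush(paragraph, blocks);
            } else {
                // Consecutive text lines are joined into one paragraph.
                if (paragraph.length() > 0) {
                    paragraph.append(' ');
                }
                paragraph.append(line);
            }
        }
        flush(paragraph, blocks);
        return blocks;
    }

    private static void flush(StringBuilder paragraph, List<Block> blocks) {
        if (paragraph.length() > 0) {
            blocks.add(new Block(Kind.PARAGRAPH, paragraph.toString()));
            paragraph.setLength(0);
        }
    }

    public static void main(String[] args) {
        String input = "# Hello World\n\nFirst line\nsecond line\n\n- one\n- two";
        for (Block block : parse(input)) {
            System.out.println(block);
        }
    }
}
```

Once the input is a list of typed blocks like this, mapping each block to a slot in a template becomes a lookup rather than a guess.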
The external page
If any page is used, the only things you can rely on are the "structural markup". So assuming you know the title of the page should be "Hello World", and there is a "h1" element somewhere in the page, you can maybe assume that this is where the header could go.
But if the page is a div tag soup, and only CSS is used to differentiate the rendering of the header from the bulk of the text, you're going to have to guess how the styling is done: that's plain impossible if you don't know how the page is made.
I don't think Lucene would help for this (as far as I know, Lucene is made to create an index of the words used in a body of text; I don't think it can help you guess which part of the text is meant to be a title, a subtitle, etc.).
Generating templates from external page
Assuming you have "guessed" right, you could generate the content by
copy pasting the page
replacing the parts to change with tags of your template language of choice
storing the template somewhere the templating system can access it
configuring your template / view system (viewResolver for Velocity) to use the right template for the right person
That would of course pose terrible legal questions, since your templates would incorporate works by the original website author (most probably copyrighted material).
A more realistic solution
I would suggest you constrain your problem to :
using input that has some structure information available (use a GUI to enter it, use a markup language, whatever)
using templates that you provide, know the structure of (and can reuse very easily)
Note that none of those points are related to the template system.
Otherwise, I'm afraid you're heading for an unreasonable amount of work...

How to optimize the HTML text copied from MS Word with GWT?

I'm having a problem with RichTextArea:
When I paste text copied from MS Word or OpenOffice into a RichTextArea, it keeps all the text styles, which is perfect. The downside is that the resulting HTML is huge :(
The database keeps growing because of the unnecessary HTML tags.
My question is: how can I optimize that HTML easily?
Thanks!

RichTextArea is based on the browser's contentEditable support. This means that the HTML "tag soup" that you'll wind up with is going to be platform-, source-, and browser-specific. When you say "optimize" what's your end goal? How much of the original formatting do you want to preserve? Beyond just trivial minification of the HTML that's being pasted in, any significant reduction in the complexity of the HTML will likely result in a loss of visual fidelity.
Utilities such as HTML Tidy or any of its derivatives can probably help you with the minification aspect. If your goal is to reduce the complexity of the HTML, you might consider using HTMLUnit as a captive, server-side browser to render the pasted content in memory and then extract the attributes that you consider useful from HTMLUnit's DOM. FWIW, this is one way to make AJAX apps crawlable by search engines.
While reducing visual fidelity can be a little disconcerting to the original user, it does afford you the opportunity to unify the visual style of all pasted content. If you're building a site based on contributions from many users, this homogeneity decreases the amount of mental effort required to orient (i.e. see what you're seeing) the content.
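To make the trade-off concrete, here is a crude server-side sketch that strips the kind of markup Word typically adds (comments, Office-namespaced tags, span/font wrappers, class/style/lang attributes). It is regex-based and only an illustration under those assumptions, not a replacement for a real HTML cleaner such as HTML Tidy; the class name PastedHtmlCleaner and the exact list of tags and attributes removed are my own choices. Inline styling is lost, while structural tags such as p, b, ul and table are kept.

```java
// Hypothetical helper: reduces Word-pasted HTML to its structural tags.
class PastedHtmlCleaner {

    static String clean(String html) {
        String s = html;
        // Drop HTML comments, including Word's conditional comments.
        s = s.replaceAll("(?s)<!--.*?-->", "");
        // Drop Office-namespaced tags such as <o:p> or <w:WordDocument>.
        s = s.replaceAll("(?is)</?(o|w|v|m|st1):[^>]*>", "");
        // Drop span/font wrappers (opening and closing), keeping their content.
        s = s.replaceAll("(?is)</?(span|font)\\b[^>]*>", "");
        // Drop presentational attributes, quoted or unquoted.
        s = s.replaceAll(
                "(?i)\\s+(class|style|lang)\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)", "");
        // Collapse runs of whitespace (note: this would also affect <pre> content).
        s = s.replaceAll("\\s{2,}", " ");
        return s.trim();
    }

    public static void main(String[] args) {
        String pasted = "<p class=\"MsoNormal\" style='margin:0'>"
                + "<span lang=\"EN-US\" style=\"font-size:11pt\">Hello <b>world</b></span>"
                + "<o:p></o:p></p>";
        System.out.println(clean(pasted));
    }
}
```

Running the cleaner before storing the content keeps the database free of the paste residue, at the cost of the visual fidelity discussed above.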

Finally, I figured out the answer to my own question:
I found TinyMCE for GWT good enough for me. It has a "paste from MS Word" option and its HTML optimization is excellent.

Related question
Html Tidy has an API you can use in Java programs.
