Insert xml into a pdf

Insert xml into a pdf - java

I need to insert an .xml file into a .pdf file (and later digitally sign the result).
The implementation guide that I'm following tells me to insert my XML data by populating a structure called "XFAResources", as you can see in this example (my data is contained within the ClinicalDocument block):
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<xfa:data>
<ClinicalDocument xmlns="urn:hl7-org:v3" ...>
...
</ClinicalDocument>
</xfa:data>
</xfa:datasets>
Could someone give me some information about how I can do it?
Both the XML and the PDF were generated by me, the latter using iText 5.
--- Update ---
I need to insert (inject) an XML file into a PDF. I have to do this for a thesis project, and the only restriction I was given is the programming language, namely Java; for the rest I have total freedom.
So, using Eclipse, I created and saved my XML, and using iText I created and saved a PDF (which is not an XFA form) with the content and formatting I chose. Now I have to perform the injection.
My teacher also informed me this morning that the final signature of the file is not necessary.
Can you recommend a way to simply perform the injection? I believe it is possible with iText, but unfortunately I cannot figure out how.
Thank you in advance for your help.

You are referring to XFAResources. There is no such thing as XFAResources in ISO-32000-1 (aka the current PDF specification). XFAResources was introduced in one of the early drafts of ISO-32000-2 (aka PDF 2.0, to be released later this year), but it was removed during the development of the new specification. It's not in the FDIS (Final Draft International Standard), which means it will never be part of an official specification.
In short: the implementation guide that you are following is wrong. It would help if (1.) you shared that implementation guide, and (2.) asked its author to adapt it so that it reflects official standards.
As for your problem:
I assume that you have an XFA form that was created using Adobe LiveCycle Designer, and that you want to inject XML into this form. This question has been asked and answered before: How to fill out a pdf file programmatically? (Dynamic XFA)
You want to sign the form, and signing XFA forms is described in PAdES-5: PAdES for XML Content – Profiles for XAdES signatures of XML content in PDF files. This was implemented in iText, but unfortunately, I don't know of any other implementations. As far as I remember, Adobe has a completely different implementation which makes it impossible to verify a PAdES-5 created with iText in Adobe Reader (please check with Adobe if this is still true). No other viewer supports PAdES-5, and XFA will be deprecated in PDF 2.0 (which significantly reduces the chance that any vendor will invest in further development).
You don't want to sign the form, but you want to sign the flattened form. In that case, you first need to convert the XFA into real PDF. An XFA form is nothing but XML wrapped inside a PDF structure, and you need software such as Adobe LiveCycle ES or iText's XFA Worker to convert that XML to regular PDF. This is very specialized software that is not available as open source software. Given that XFA will be deprecated, there is very little chance that anyone will ever make the effort to create such software as open source. However, if you buy either Adobe LiveCycle ES or iText's XFA Worker, you will be able to create a document that can be signed using a PAdES-2 (ISO-32000-1 style) or PAdES-3 (new in ISO-32000-2) signature.
You asked your fellow subscribers on StackOverflow "Could someone give me some information about how can I do it?" I hope the above information already points you towards the relevant specifications (ISO-32000-1 and -2, PAdES-2, -3, and -5), but unfortunately, it is impossible to give you a code sample, as your question isn't accurate enough for the following reasons:
Your assumption that you need to populate a structure called XFAResources is based on an implementation guide that refers to a structure that never made it into an official standard.
You aren't entirely clear about the nature of the template: is it really an XFA form? If so, why? XFA forms are about to become obsolete.
You aren't entirely clear about the nature of the signed result: does the document need to remain an XFA form (PAdES-5), or is it OK for the document to be flattened?
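For reference, the xfa:datasets / xfa:data wrapper shown in the question can be assembled with the JDK's XML APIs alone. The following is only a minimal sketch under stated assumptions: it produces the wrapped XML as a string and does not touch the PDF at all (getting that XML into a PDF still depends on the open points listed above). The class name XfaDatasetsWrapper and the method wrap are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps an XML payload in <xfa:datasets><xfa:data>...</xfa:data></xfa:datasets>.
class XfaDatasetsWrapper {
    // Namespace URI taken from the example in the question.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    static String wrap(String payloadXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so the payload keeps its own namespace
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the payload (e.g. the ClinicalDocument) from a string.
        Document payload = builder.parse(new InputSource(new StringReader(payloadXml)));

        // Build the wrapper elements in a fresh document.
        Document wrapper = builder.newDocument();
        Element datasets = wrapper.createElementNS(XFA_DATA_NS, "xfa:datasets");
        Element data = wrapper.createElementNS(XFA_DATA_NS, "xfa:data");
        wrapper.appendChild(datasets);
        datasets.appendChild(data);

        // Copy the payload's root element (deep) into the wrapper document.
        Node imported = wrapper.importNode(payload.getDocumentElement(), true);
        data.appendChild(imported);

        // Serialize the result without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(wrapper), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String payload = "<ClinicalDocument xmlns=\"urn:hl7-org:v3\"><title>Example</title></ClinicalDocument>";
        System.out.println(wrap(payload));
    }
}
```

The payload is imported as a DOM node rather than concatenated as text, so its default namespace (urn:hl7-org:v3 in the question) stays attached to the ClinicalDocument element.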

Related

Is there any way to convert html file to IN-MEMORY File as PDF in Java?

I have been given an HTML file and wanted to convert it into an in-memory PDF file. During the conversion, I don't want to use any external location for this. All I want is to keep it in-memory.
So far, I have already tried some Java libraries for the conversion but all of them always create a temporary file in a location and then read/write from it. I don't want any I/O operation during the conversion.

The HTMLWorker class was deprecated many years ago. The goal of HTMLWorker was to convert small, simple HTML snippets to iText objects. It was never meant to convert complete HTML pages to PDF, yet that was how many developers tried to use it. This caused plenty of frustration because HTMLWorker didn't support every HTML tag, didn't parse CSS files, and so on. To avoid this frustration, HTMLWorker was removed from recent versions of iText.
In 2011, iText Group released XML Worker as a generic XML to PDF tool, built on top of iText 5. A default implementation converted XHTML (data) and CSS (styles) to PDF, mapping HTML tags such as <p>, <img>, and <li> to iText 5 objects such as Paragraph, Image, and ListItem. We don't know of any implementations that used XML Worker for any other XML formats, but many developers used XML Worker in combination with jsoup as an HTML2PDF converter.
XML Worker wasn't a URL2PDF tool though. XML Worker expected predictable HTML created for the sole purpose of converting that HTML to PDF. A common use case was the creation of invoices. Rather than programming the design of an invoice in Java or C#, developers chose to create a simple HTML template defining the structure of the document, and some CSS defining the styles. They then populated the HTML with data, and used XML Worker to create the invoices as PDF documents, throwing away the original HTML. We'll take a closer look at this use case in chapter 4, converting XML to HTML in memory using XSLT, then converting that HTML to PDF using the pdfHTML add-on.
When iText 5 was originally created, it was designed as a tool to produce PDF as fast as possible, flushing pages to the OutputStream as soon as they were finished. Several design choices that made perfect sense when iText was first released in the year 2000, were still present in iText 5 sixteen years later. Unfortunately, some of these choices made it very difficult –if not impossible– to extend the functionality of XML Worker to the level of quality many developers expected. If we really wanted to create a great HTML to PDF converter, we would have to rewrite iText from scratch. Which we did.
In 2016, we released iText 7, a brand new version of iText that was no longer compatible with previous versions, but that was created with pdfHTML in mind. A lot of work was spent on the new Renderer framework. When a document is created with iText 7, a tree of renderers and their child-renderers is built. The layout is created by traversing that tree, an approach that is much better suited when dealing with HTML to PDF conversion. The iText objects were completely redesigned to better match HTML tags and to allow setting styles "the CSS way."
For instance: in iText 5, you had a PdfPTable and a PdfPCell object to create a table and its cells. If you wanted every cell to contain text in a font different from the default font, you needed to set that font for the content of every separate cell. In iText 7, you have a Table and Cell object, and when you set a different font for the complete table, this font is inherited as the default font for every cell. That was a major step forward in terms of architectural design, especially if the goal is to convert HTML to PDF.
But let's not dwell on the past, let's see what pdfHTML can do for us. In the first chapter, we'll take a look at different variations of the convertToPdf()/ConvertToPdf() method, and we'll discover how the converter is configured.

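This is the solution for converting HTML to PDF that works for me:
// Package names below assume Flying Saucer plus JavaMail 1.x (javax.*);
// newer Jakarta Mail releases use jakarta.* instead.
import java.io.ByteArrayOutputStream;
import javax.activation.DataHandler;
import javax.mail.internet.MimeBodyPart;
import javax.mail.util.ByteArrayDataSource;
import org.xhtmlrenderer.pdf.ITextRenderer;

// Render the HTML string (variable "html") to a PDF held entirely in memory.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
renderer.setDocumentFromString(html);
renderer.layout();
renderer.createPDF(outputStream);
outputStream.close();
// Attach the in-memory PDF bytes to a mail body part; no file is written.
MimeBodyPart att = new MimeBodyPart();
ByteArrayDataSource bds = new ByteArrayDataSource(outputStream.toByteArray(), "application/pdf");
att.setDataHandler(new DataHandler(bds));
att.setFileName("example.pdf");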

How to read values from a PDF file in java using PDFTable or PDFTableExtractor class?

I have tried the PDFTextStripperByArea and PDPageContentStream classes to extract the number values from my PDF file. They work fine!
But my requirement is to use the PDFTable or PDFTableExtractor class to read the PDF contents. Can you tell me which Maven dependency and JAR file I need in order to access these classes?
Please also mention the methods required to get the values from a particular position.
I have another question. Can we extract table-formatted data from a PDF file as it is? I mean the data in rows and columns, with table lines. If a page contains some text and a table, can we read only the table headers and rows? I have uploaded my page to GitHub. Click here! From that image, I only need the values of Gross premium, GST and Total Payable. Please let me know whether it's possible.

First, don't use classes from the com.lowagie packages.
That code is old, obsolete and no longer supported. Furthermore, this code belonged to the very early version of iText.
Afterwards a thorough investigation was done into the intellectual property rights of all the code (since iText has had a lot of contributors). When you use the old code, you may (unknowingly) be using code for which you do not have the copyright.
Second, if you just want to solve the problem of extracting numbers and tables from a PDF document, have a look at pdf2Data. It's an iText add-on that makes things a lot easier.
It gives you a nice UI, where you can build templates for data extraction. Then you can call a single method to match an existing (XML) template against an input PDF document, and you'd get a datastructure that contains all the information about the match.
http://pdf2data.online/

PDFTable
I have found two PDFTable classes:
com.lowagie.text.pdf.PdfPTable
com.itextpdf.text.pdf.PdfPTable
Documentation for both of these classes (this may help you learn the methods you need):
https://www.coderanch.com/how-to/javadoc/itext-2.1.7/com/lowagie/text/pdf/PdfPTable.html
http://itextsupport.com/apidocs/itext5/5.5.9/com/itextpdf/text/pdf/PdfPTable.html
If you want to use these classes, you can copy the dependency into your pom.xml file from:
https://mvnrepository.com/artifact/com.itextpdf/itextpdf
https://mvnrepository.com/artifact/com.lowagie/itext - As mentioned in this link, This artifact was moved to com.itextpdf
Examples of how to use these classes can be found here:
https://developers.itextpdf.com/examples/itext-action-second-edition/chapter-4
https://www.programcreek.com/java-api-examples/index.php?api=com.lowagie.text.pdf.PdfPTable

How to fill XFA (PDF) form automatically?

I'm looking for a free option to fill an XFA PDF form. I know that iText is an option, but their commercial prices are too expensive for me; I'd prefer something fully open-source. There is PDFBox, but it doesn't seem to allow inserting data into XFA forms, or at least there's very little explaining how to do it.
I simply need to fill in certain fields in my form with certain data in a text file. Can you recommend or direct me to a solution? Thank you greatly

XFA is an enterprise product, with enterprise pricing…
XFA consists of an XML part and a PDF part which represents kind of what is in the XML part.
You may try to fill data into the XML part, and then see whether Acrobat/Reader can render it properly. No guarantee that it works, but worth a try. If that does work, you'd have shifted the problem to something maybe a bit more manageable.
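To make "fill data into the XML part" a little more concrete, here is a minimal sketch that sets one field value in a datasets XML string using only the JDK. It assumes you have already extracted that XML from the PDF and will write it back with a PDF library of your choice; neither step is shown. The class name XfaDataFiller, the method setField, and the sample element names (form1, Name) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: updates the text of the first element with the given name.
class XfaDataFiller {
    static String setField(String datasetsXml, String fieldName, String value) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(datasetsXml)));

        // Assumption: the field is an element whose qualified name equals fieldName.
        NodeList matches = doc.getElementsByTagName(fieldName);
        if (matches.getLength() == 0) {
            throw new IllegalArgumentException("No element named " + fieldName);
        }
        matches.item(0).setTextContent(value);

        // Serialize the modified document back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<xfa:data><form1><Name/></form1></xfa:data></xfa:datasets>";
        System.out.println(setField(xml, "Name", "Jane Doe"));
    }
}
```

Whether a viewer then renders the form correctly is exactly the open question raised above; this sketch only covers the XML edit.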

Best way to update an existing FDF (Form Data Format) file

I need to update an existing FDF file programmatically on the server side. For this I'm looking for a Java library with which we can manipulate an existing FDF file. I've tried libraries from iText and Adobe so far. It seems that iText's FDFWriter only lets you create a new FDF file and will not help you update an existing one.
With Adobe's FDFDoc class I somehow managed to update an FDF file, but this API seems to be very old and looks ugly (method and field names are not very elegant and do not follow camel-case notation). My question is whether there is a known better library.
P.S. : FDF is a data format to collect input data from editable PDF forms

FDF is a simple structured text file, which means that you should be able to modify it with simple text manipulation. You may even be able to create an XSLT (even though FDF is not XML).
Adobe has the FDFToolkit, which is quite old, but there have been no changes in the FDF format for a long time. Although the FDFToolkit reads and writes FDF, it brings advantages only for reading; writing is so basic that you don't really need a specific library…
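As an illustration of the "simple text manipulation" route, here is a minimal sketch using only java.util.regex. It rests on several assumptions that will not hold for every FDF file: each field is written as /T (name) immediately followed by /V (value), values are literal strings in parentheses (not hex strings), and any parentheses inside a value are backslash-escaped. The class name FdfFieldUpdater and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces the /V value of one field in FDF text.
class FdfFieldUpdater {
    // Escapes the characters that are special inside a PDF literal string.
    static String escapePdfString(String value) {
        return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    static String setFieldValue(String fdf, String fieldName, String newValue) {
        // Group 1 captures the old value between "/V (" and the closing ")".
        // Inside the value, a backslash escapes the next character.
        Pattern pattern = Pattern.compile(
                "/T\\s*\\(" + Pattern.quote(escapePdfString(fieldName)) + "\\)"
                + "\\s*/V\\s*\\(((?:\\\\.|[^\\\\)])*)\\)");
        Matcher matcher = pattern.matcher(fdf);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Field not found: " + fieldName);
        }
        // Splice the escaped new value in place of the old one.
        return fdf.substring(0, matcher.start(1))
                + escapePdfString(newValue)
                + fdf.substring(matcher.end(1));
    }

    public static void main(String[] args) {
        String fdf = "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [ << /T (name) /V (Old) >> ] >> >>\n"
                + "endobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
        System.out.println(setFieldValue(fdf, "name", "New"));
    }
}
```

Files that put /V before /T, use hierarchical field names, or encode values as hex or UTF-16 strings would need a real parser instead of this pattern.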

RTF Java Parser

Here is my issue.
I need to read an RTF document and render it to a web page (something like Google Docs), but these documents are templates; the idea is that the user can edit only certain text and not the text that is marked as "template logic".
So far I've seen a bunch of RTF libraries that perform only rendering but won't let you access an object that can be iterated dynamically to walk the structure of the RTF document.
My idea is to determine what is editable and what is not, put all that info (images, text, tables, headers, footers) into JSON, and send it to my JS client.
Maybe this is a crazy idea, any suggestions?

When I read "template", I think "Velocity". I wonder if you can solve this by separating template from dynamic data. I wonder if you can solve this by letting users modify dynamic data and only marry it with the static, unedited template at the last minute.
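The idea of keeping the template untouched and merging the data in at the last minute can be sketched without any library. This is not Velocity itself, just a plain-JDK stand-in: the ${name} placeholder syntax, the class name TemplateMerger, and the method merge are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills ${placeholder} slots in a static template with user data.
class TemplateMerger {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String merge(String template, Map<String, String> data) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = data.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in user data from being treated as regex syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("customer", "Alice");
        data.put("amount", "$100");
        System.out.println(merge("Dear ${customer}, you owe ${amount}.", data));
    }
}
```

With this split, the client only ever sees and edits the map of values; the template text never leaves the server, so the "template logic" cannot be changed by the user.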

It's possible that Docmosis can help, because it lets you use documents as templates and you can extract from Docmosis an "analysis" of the template (e.g. a list of fields). From your description, though, it's hard to be sure whether it will fit your purpose. Please note I work for the company that produces Docmosis.
