I am currently developing a Java app for a company to help keep track of client records. At some point, some of the client's information is written down in an order form and then transferred to several PDF documents. I noticed in Adobe Acrobat Reader that the input fields of a PDF have values that can be stored or rewritten. Is it possible to write information to these input fields with Java? Does anyone have an idea of how to build such a feature?
I've seen this done with PHP so I assume it's a similar process in which I would just have to write and translate information from the order form to the input fields. I don't want to spend time recreating the entire PDF file in Java so I was wondering if there were perhaps a way I can access the info through the use of an object or something. Or some kind of hack I don't know about yet....
You can use iText or Apache PDFBox to open your existing PDFs and write values into their form fields, so you don't have to recreate the documents.
I am currently working on a project which generates contracts. The idea is that I put the data in a form and save it in a simple database.
So far, this has been my favorite place to search for good ideas and simple solutions.
Now I am facing another problem and I don't know how I can solve that. I want to create a PDF and replace some placeholders with some data from my form.
One idea was to use an existing Word template with some bookmarks and replace them with the data from my form. Maybe there is a way to do that, and I am just too stupid to find it.
Another idea was to use XML. I thought I was clever and converted the Word template to a PDF, so that I could convert that PDF to XML. Attached, you will find the XML file. But now I need the XSL file - is there an easy way to create the XSL file?
Or maybe there is another simple solution to solve my problem.
In these attachments you will find the PDF file, the Word template and the XML:
Thank you a lot :)
Using a template is a good idea - it makes some changes much quicker to make and then deploy. The comments above are focused on conversion, but don't forget you need to merge your data in (population) first.
If you can use Adobe tools, you can have a PDF template and use the Adobe tools to populate. This saves a "conversion" stage.
You mentioned using Word for templates. This means you have to run through two stages of processing:
population - docx is a zipped set of XML files - so you can process them with your own code or using a library.
conversion - you need PDF, so you have to convert the docx to PDF. You also have to watch out for fonts at this stage (i.e. make sure they are available on your host).
The population stage you could do yourself since you are familiar with XML. But it is definitely complicated. The conversion needs to use a tool that is ideal for it. There are a few mentioned in the comments already.
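To give a feel for the do-it-yourself population stage, here is a minimal sketch using only the JDK (Java 9+). It assumes the template contains placeholders written as ${name} (a convention made up for this example, as is the class name DocxPopulator) and that word/document.xml is UTF-8 encoded. Be aware that Word often splits text across several runs, so a placeholder typed in one go may still end up broken apart in the XML; that is one reason this stage gets complicated.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a .docx and fills ${key} placeholders in the main document part.
final class DocxPopulator {

    // The main document part inside a .docx package.
    private static final String DOCUMENT_PART = "word/document.xml";

    /** Copies every zip entry from in to out, rewriting only word/document.xml. */
    static void populate(InputStream in, OutputStream out, Map<String, String> values)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads up to the end of the current entry
                if (DOCUMENT_PART.equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = fill(xml, values).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    /** Replaces each ${key} with its XML-escaped value; unknown placeholders are left alone. */
    static String fill(String xml, Map<String, String> values) {
        String result = xml;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
        }
        return result;
    }

    /** Escapes the characters that would otherwise break the XML. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

You would call DocxPopulator.populate with a stream for the template, a stream for the output file and a map of form values. The result is still a docx, so the conversion stage comes afterwards.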
There are some free/open-source and commercial tools that can do both parts:
docx4j
JOD Reports
LibreOffice (using the Java UNO API) (I blogged this once - Java Convert Word to PDF with UNO)
Docmosis (please note I work for Docmosis)
I suggest starting with the simple example you have attached and proving that you can both populate and convert it. Then switch to a more complicated example to see if you can do the other things that might be required (e.g. repeating sections, conditions or other logic) during the population stage.
I have been banging my head against the wall with this one. I have researched and tried pretty much every library suggested to me. I am currently trying to write a program in Java that will extract text AND images from a PDF file and allow me to write the extracted content to a Word file. I have managed to extract the content using the ICEpdf library; the problem is that I need to be able to write the content in exactly the same order as it was read. So, to clarify, I need a library that will help me keep track of where exactly on the page the text and images are situated, so I can put them in the same place in my Word file.
A PDF to Word converter is a horribly complex proposition.
Your best bet will probably be to use OpenOffice to do it for you and not even try to handle the intermediate steps.
http://www.openoffice.org/api/
Look at this: Advanced PDF parser for Java
Off-topic:
Also, to my knowledge there is a Python parser that more or less converts the PDF to HTML (that way you can keep track of the ordering of the objects within the PDF). I know it's not Java, but you might be able to use the output.
http://www.unixuser.org/~euske/python/pdfminer/index.html
I need to show graphs (a pie chart and an XY graph) in an HTML file.
I used a free tool to create an image, and I am trying to show it in the HTML.
But we need to place this image in a shared folder or on a server so that the HTML can access it. Our client is not satisfied with either approach.
Can someone please let me know whether there is any way I can pass the data directly to the HTML file? The data will be in a CSV file and it may contain several thousand rows.
Thanks,
There is a JavaScript framework that renders really pretty charts: http://www.highcharts.com/
You can use one of many JavaScript CSV parsers: Javascript code to parse CSV data
If you then write some JavaScript code to extract your CSV data and pass it to Highcharts, you've got a very nice interactive chart.
The alternative, if you want to use your existing images, is to encode the images as base64 directly in the html file: http://webcodertools.com/imagetobase64converter/
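If the page is generated from Java anyway, the base64 route could look roughly like the sketch below. It only uses java.util.Base64 from the JDK; the class and method names (InlineImage, imgTag) are made up for this example, and it assumes the alt text contains no double quotes.

```java
import java.util.Base64;

// Hypothetical helper: embeds image bytes directly in the HTML as a data URI,
// so no shared folder or server is needed to serve the picture.
final class InlineImage {

    /** Builds an img tag whose src holds the base64-encoded image itself. */
    static String imgTag(byte[] imageBytes, String mimeType, String altText) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        // Assumption: altText contains no double quotes, so it is not escaped here.
        return "<img alt=\"" + altText + "\" src=\"data:" + mimeType + ";base64," + encoded + "\">";
    }
}
```

For a PNG chart you would pass the file's bytes (for example from Files.readAllBytes) with the MIME type image/png and write the returned tag into the page. Keep in mind that base64 makes the data about a third larger, so big images bloat the HTML file.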
As an alternative you can also look at the Dojo Toolkit (http://dojotoolkit.org/); it's a JavaScript toolkit with some really nice features, including charting.
I wrote a web app that generates a PDF by filling data into a pre-saved PDF template. The template was edited in Acrobat and has some text fields. But the content of those text fields seems to be in a different layer and cannot affect the other existing words in the template.
... But I want it to affect the existing words and make them flow based on how much data is filled into the text fields.
The solution may be to generate the whole PDF in code instead of using a template. But the template changes really often in my case, and I don't want to waste a lot of time adjusting positions and formatting in code...
Does anyone know how to use text fields with auto flow in a PDF template, just like in a Word document?
PDF doesn't work like that. You need to generate the whole PDF.
Ah... but from what?
There are quite a few HTML->PDF converters out there. You could fill in your template HTML, and convert it that way.
You could develop your own input format (for your template), and write an app that reads it and builds a PDF.
The latter is similar enough to HTML->PDF that I'd just go the HTML route, unless you can't find a converter that handles some PDF feature you need. There are LOTS of HTML->PDF apps out there. You can search SO, Google, whatever. Lots.
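The "fill in your template HTML" step can be very small. Here is a rough sketch with plain JDK classes (Java 9+), assuming placeholders written as {{name}}; that syntax and the HtmlTemplate class are just made up for illustration. The filled HTML would then go to whichever HTML->PDF converter you pick, which is not shown here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills {{key}} placeholders in an HTML template string.
final class HtmlTemplate {

    // Matches {{key}} where key consists of letters, digits or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Replaces every placeholder with its HTML-escaped value; missing keys become empty. */
    static String render(String template, Map<String, String> data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = data.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(escapeHtml(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Escapes characters that would otherwise be interpreted as markup. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because the text sits in normal HTML, it reflows when the values get longer, which is the behaviour form fields in a PDF template cannot give you.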
I have a huge PDF file (20 MB/800 pages) which contains some information.
It has an index with hyperlinks. Most of the remaining information is in tabular format (in the PDF). I need to retrieve this information using Java and store it in SQL Server.
Which is the best API available to read this kind of file from Java?
It is unlikely to be in tabular format inside the PDF, as PDF does not contain structure information unless it is explicitly added at creation time. I wrote an article explaining some of the issues with text extraction from a PDF at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
Have you tried iText:
iText
Download iText
iText in Action — 2nd Edition
List of the Examples