I have an Excel sheet and want to fill its fields automatically with data. I would like to tag the boxes where information can be entered, so that I can then use some API (iText, PDFBox?) to fill these fields without having to measure the PDF with a ruler and then paint into the right spaces. The layout should also remain changeable in the future.
What are the best APIs and tools to use?
Which format should I use: XLS, PDF, PDF with FDF, or something better?
The project can invest 0 euros in buying tools.
For modifying/creating:
Excel: Apache POI
PDF: iText
Related
I am currently working on a small Java application to transform some PDF-bound data, and I am using PDFBox for this. The PDF itself is very simple and just contains some headers and a table whose rows are separated by a line. I am trying to find the coordinates of these lines so that I can dynamically extract by area, as the rows vary in height. I have not found any information on this during my search, as almost all results deal with "text lines" and not actual drawn lines. Is this even possible with PDFBox, or will I have to look for another PDF library?
Any information would be greatly appreciated.
Is there any way to get the number of paragraphs, or the content of a given paragraph, in a PDF file using the iText library? I saw classes like Paragraph and Chunk in code that creates a new PDF file, but I cannot find any way to obtain these objects when reading a file. Every idea is appreciated.
Is the PDF you're talking about a Tagged PDF? If not, you are making the wrong assumptions about PDF. In a PDF, content is drawn on a page. For instance: an iText PdfPTable is converted into text state operators that draw snippets of text to a canvas, as well as graphics state operators that draw paths and shapes. If the PDF isn't tagged, the lines don't know that they are borders of a table; a word doesn't know to which cell it belongs.
The same goes for paragraphs: a snippet of text doesn't know whether it belongs to a sentence, to a paragraph, to a title line,...
Due to the very nature of PDF, what you're looking for may be impossible (using iText or any other software product), or may require heuristics (artificial intelligence) to examine all text state operators and the semantics of the content to get a result that mimics how humans would interpret text.
It's very easy to achieve if your PDF is tagged correctly. See the ParseTaggedPdf example.
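For untagged PDFs, the kind of heuristic mentioned above can be illustrated with a toy example. This is only a sketch under strong assumptions: it supposes that some extraction step has already produced text lines with their vertical positions (the TextLine class and the maxGap threshold are hypothetical and not part of iText), and it guesses paragraph boundaries purely from the vertical gap between consecutive lines. Real documents usually need more signals, such as indentation or font changes.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphHeuristic {

    /** Hypothetical positioned text line, e.g. produced by a PDF text extractor. */
    static final class TextLine {
        final double y;      // baseline position; PDF y coordinates grow upwards
        final String text;

        TextLine(double y, String text) {
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Groups lines (ordered top to bottom) into paragraphs. A new paragraph
     * starts whenever the vertical gap to the previous line exceeds maxGap.
     */
    static List<String> group(List<TextLine> lines, double maxGap) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        TextLine previous = null;
        for (TextLine line : lines) {
            if (previous != null && previous.y - line.y > maxGap) {
                paragraphs.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(line.text);
            previous = line;
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<TextLine> lines = List.of(
                new TextLine(700, "First line of the"),
                new TextLine(688, "first paragraph."),
                new TextLine(660, "Second paragraph."));
        // Normal line spacing here is 12 units, so a gap above 15 is treated as a break.
        group(lines, 15).forEach(System.out::println);
    }
}
```

The threshold has to be tuned per document, which is exactly why this remains a guess and not a reliable reconstruction of the structure.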
I am using the iText library to write a PDF file.
I want to put a page number and a page header on every page of the file.
How can I do that?
If you don't break lines manually, it's really hard, almost impossible, to get the line count precisely. This number depends on font measurements and on the text layout on the page. iText is a great tool for PDF generation, not for parsing.
I suggest you buy the iText book "iText in Action", which is very helpful. If you don't generate your page breaks yourself, you could check out the PdfPageEvent methods such as onStartPage(), together with PdfWriter's getPageNumber() method.
I am developing a new program in which I need to let the user highlight words in a PDF file. I then want to process the file to get a list of the highlighted words together with their positions.
How can I do that in Java?
Thanks in advance.
PDF is based on the PostScript imaging model, which makes its content very difficult to process. I doubt there's an easy way.
Take a look at http://java-source.net/open-source/pdf-libraries , but be aware you might have some difficulty.
Also, read http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf for the specs of the highlight format. Depending on what "place" information you need, that might be enough.
How are you displaying the PDF? If you are displaying it as an image, you just need the word coordinates. Something like PDFBox, JPedal or maybe iText can provide these.
I am working on a billing program - right now when you click the appropriate button it generates a frame that shows the various charges etc, basically an invoice. Is there a way to give the user an option of saving that frame as a document, either Microsoft Word, Microsoft Works or PDF?
One approach would be to save the frame as an image. You can paint the component into an image with the following code:
Dimension size = myComponent.getSize();
BufferedImage myImage = new BufferedImage(size.width, size.height,
        BufferedImage.TYPE_INT_RGB);
Graphics2D g2 = myImage.createGraphics();
myComponent.paint(g2); // render the component into the image
g2.dispose();
You can then save this image and pass it into a Jasper report. From the JasperPrint object you can then save in a few different formats, including PDF. A better but similar approach would be to pass the Graphics context into JasperReports (there is a renderer for this in Jasper, and the quality is much better).
Paint the JFrame into a BufferedImage using the JFrame's paint() method
Save the image as jpg or png or whatever image format
Take some pdf library and create a blank pdf (e.g. iText)
Insert the image into the PDF document
Save it - done
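The first two steps above can be sketched with the JDK alone. This is a minimal illustration, not a complete solution: the ComponentSnapshot class, the render helper and the invoice.png file name are made up for this example, a plain JPanel stands in for the real invoice frame, and embedding the image into a PDF (steps three to five) is left to a PDF library and is not shown.

```java
import java.awt.Color;
import java.awt.Component;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.swing.JPanel;

class ComponentSnapshot {

    /** Paints the given component into a new RGB image of the same size. */
    static BufferedImage render(Component component) {
        Dimension size = component.getSize();
        if (size.width == 0 || size.height == 0) {
            // Component was never shown: fall back to its preferred size.
            size = component.getPreferredSize();
            component.setSize(size);
            component.doLayout();
        }
        BufferedImage image = new BufferedImage(
                size.width, size.height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = image.createGraphics();
        try {
            component.paint(g2);
        } finally {
            g2.dispose(); // release the graphics context
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the invoice; with a real JFrame one could pass
        // frame.getContentPane() instead.
        JPanel invoice = new JPanel();
        invoice.setBackground(Color.WHITE);
        invoice.setPreferredSize(new Dimension(200, 100));

        BufferedImage image = render(invoice);
        ImageIO.write(image, "png", new File("invoice.png"));
    }
}
```

PNG is usually the safer choice here, because JPEG compression tends to blur text.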
Instead of generating a Word document, I'd rather use a Java library like iText to produce a PDF document (more portable) or, even better, the JasperReports library, which can output reports in a wide range of formats (PDF, XML, HTML, CSV, XLS, RTF, TXT), as suggested by bigbrother82 in a comment. This looks cleaner to me than using an image, especially for printing (not to mention that your invoice may be a multi-page document).
I'd likely look at this from a slightly different direction: instead of asking how to splat the GUI form as-is into a PDF or Word document, I'd ask how to get that content into a Word/PDF document.
The answer to that question is Apache FOP. Generate an XSL-FO file and ask FOP to convert it into an RTF document (with a .DOC extension) or a PDF.
Normally one does this by generating an XML file containing the data you need printed, and then using an XSLT stylesheet to convert that XML to XSL-FO. I, however, found it easier to generate the XSL-FO file directly using a templating language (such as FreeMarker).
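The XML-to-XSL-FO step can be sketched with the XSLT support that ships with the JDK. This is only an illustration under assumptions: the invoice/charge element names, the stylesheet and the InvoiceToFo class are invented for this example, and handing the resulting XSL-FO to FOP for rendering is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InvoiceToFo {

    // Hypothetical stylesheet: one fo:block per <charge> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/invoice'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4'"
      + " page-height='29.7cm' page-width='21cm'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<xsl:for-each select='charge'>"
      + "<fo:block><xsl:value-of select='@label'/>: "
      + "<xsl:value-of select='@amount'/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given invoice XML and returns XSL-FO text. */
    static String toXslFo(String invoiceXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(invoiceXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<invoice>"
                + "<charge label='Consulting' amount='100.00'/>"
                + "<charge label='Travel' amount='25.50'/>"
                + "</invoice>";
        System.out.println(toXslFo(xml));
    }
}
```

Keeping the data (XML) separate from the layout (stylesheet) means the invoice design can change without touching the Java code, which is the main attraction of this route over a screenshot.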
You might want to look at the online demo for Docmosis as an example that lets the user choose the document format up front. That demo does a download, but it could direct the document into a frame instead and leave it to the browser to display. This style of working (as mentioned by others) looks at the problem from a different angle: you decide on the format up front, instead of deciding after the fact and then trying to save the frame contents.