how to append data in an existing pdf? [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I'm trying to add some data to a PDF with iText 7 in a Java application.
I can't manage to open the PDF in append mode. I looked for solutions online, but they all concern iText 5 and use classes that no longer exist.
What can I do?

It depends on what you want to do specifically:
1. Merge two documents:
https://developers.itextpdf.com/content/itext-7-examples/itext-7-merging-pdf-documents
2. Add content at the end of a document:
As with merging, you could write the new content to a separate document (for example into a byte output stream) and then merge the two together.
3. Add content to an existing page:
Hard to do, since that typically requires re-layout of the document, which no PDF engine can currently do.
4. Fill in forms in the document:
https://developers.itextpdf.com/content/itext-7-examples/itext-7-form-examples
5. Add an attachment to the document:
https://developers.itextpdf.com/examples/miscellaneous/clone-embedded-files
More on (3):
Adding content to a PDF in the middle of existing content is extremely hard.
To understand why, here is some information on how PDF documents are built internally:
PDF documents contain instructions for a viewer to render, rather than plain text.
Instructions and their arguments are grouped in 'objects'.
Objects can be compressed to reduce file size.
A PDF document keeps an internal index of all of these objects, called the XREF table.
The index inside a PDF document uses byte offsets to tell a renderer where (in the file) an object can be found.
Suppose you want to change (or add) something in the middle of the file.
Every object stored after that point shifts, so the byte offsets in the XREF no longer match, and a viewer cannot find those objects unless the table is rewritten.
Then there is the fact that a PDF does not contain layout information. If you add something new and existing content needs to move, you need layout information (which objects make up a sentence? which sentences make up a paragraph?). Only with layout information can you sensibly re-lay out the document.
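To make the byte-offset problem concrete, here is a toy sketch in plain Java. It does not parse real PDF: it builds a small string with PDF-like "N 0 obj" markers, records where each object starts (roughly what an XREF table does), then inserts text in the middle and checks which recorded offsets still point at an object. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an XREF table: NOT a real PDF parser, just offsets into a string. */
final class XrefOffsetDemo {

    /** Returns the index of each "N 0 obj" header, for N = 1..objectCount. */
    static List<Integer> indexObjects(String file, int objectCount) {
        List<Integer> offsets = new ArrayList<>();
        for (int n = 1; n <= objectCount; n++) {
            offsets.add(file.indexOf(n + " 0 obj"));
        }
        return offsets;
    }

    /** True if the recorded offset still points at the header of object n. */
    static boolean offsetIsValid(String file, int n, int offset) {
        return file.startsWith(n + " 0 obj", offset);
    }

    public static void main(String[] args) {
        String original = "1 0 obj\n(Hello)\nendobj\n"
                        + "2 0 obj\n(World)\nendobj\n"
                        + "3 0 obj\n(!)\nendobj\n";
        List<Integer> xref = indexObjects(original, 3);

        // Insert extra characters inside object 2, as a naive in-place edit would.
        String edited = original.replace("(World)", "(Brave new World)");

        // Objects that start before the edit keep their offset; later ones do not.
        for (int n = 1; n <= 3; n++) {
            System.out.println("object " + n + " offset still valid: "
                    + offsetIsValid(edited, n, xref.get(n - 1)));
        }
    }
}
```

In this toy file only the object after the edit loses its offset, but in a real document that can be thousands of objects. This is why libraries write out a fresh file with a fresh XREF table instead of patching bytes in place.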

Related

Combining pdfs into one pdf [closed]

Closed 6 years ago.
I have a web app that I am changing to use xPressions from EMC2. At one point, xPressions returns a PDF document inside a Java servlet. Before we added xPressions, we would combine several of these PDFs into one large PDF and send it back to the user/screen. But xPressions can only process one PDF at a time, and it returns each PDF as a byte[].
So I am trying to find a way to take the byte[] arrays and combine them into one large PDF to send back to the user/screen. Before xPressions, we used an old version of Big Faceless (bfo.com) to combine the individual PDFs into one PDF in the servlet, but I have not been able to turn a byte[] into a valid PDF with the old bfo.com software.
I have searched Google and Stack Overflow for another technique. The answers I found are close, but most use Linux or C#. Also, these PDFs are created inside the Java servlet and do not exist on a hard drive where I could read them in and convert them; I have to work with the byte[] directly. Does anyone have any ideas for me? Thanks in advance!
You can use PDFBox to merge your PDF files. The PDFMergerUtility class has an addSource method that takes an InputStream, so you can wrap each byte array in a ByteArrayInputStream and add it as a source.
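// pdfA and pdfB stand for the byte[] results from xPressions (names assumed here)
ByteArrayOutputStream merged = new ByteArrayOutputStream();
PDFMergerUtility merger = new PDFMergerUtility();
merger.addSource(new ByteArrayInputStream(pdfA));
merger.addSource(new ByteArrayInputStream(pdfB));
merger.setDestinationStream(merged); // keep the result in memory; setDestinationFileName(...) writes a file instead
merger.mergeDocuments(); // no-arg form as in PDFBox 1.8; 2.x also offers mergeDocuments(MemoryUsageSetting)
byte[] result = merged.toByteArray(); // ready to write to the servlet response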

java copy Excel cell content to a word document [closed]

Closed 6 years ago.
I would like to be able to copy the content of an Excel cell, let's say B3 in tab (or sheet) "my sheet" into a word document.
I know exactly where to put the content of B3 into the word document, but I don't know how I can do it using Java.
I just finished a project that creates an excel workbook from scratch using java. First time doing it, but I would recommend Apache POI. To share some sources I found along the way here is an overview of core classes the library offers with some useful method descriptions.
And if you find this is something you may want to use here are a bunch of examples that I found quite useful.
My answer is as non-technical as they come, but hopefully this is helpful in some way.
Edit: Just to give an example, you could do something like:
// Sheet is an interface, so create it through a Workbook (XSSFWorkbook is the .xlsx implementation)
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet("my sheet");
sheet.createRow(rowNumber).createCell(cellNumber).setCellValue(someValue);
And you can get fancy and iterate through every row, cell, column, etc. I found it to be pretty flexible. It even offers styling options.
Edit2: Just realized you don't need to set the Excel cell value. Oh well, there it is anyway. Still a useful library one way or the other though.

How to extract page number from PDF file [closed]

Closed 8 years ago.
We explored many APIs, such as Tika, PDFBox and iText, to extract page numbers from a PDF file, but we were not able to do this. In iText we found PdfPageLabels.getPageLabels(reader), but the behaviour of this method is not uniform.
The reason why you don't find any software that is able to extract page numbers from a PDF is simple: the concept of a page number doesn't exist in PDF.
Allow me to predict your response.
"Wait a minute!" you say, "When I open a PDF in Adobe Reader, I can clearly see a page number in the document!"
Well yes, you can see that page number with your eyes and your human intelligence, but to a machine that number is just some text drawn on a canvas. A machine consuming the document has no idea what all the glyphs and lines and shapes on a page are about. Hence, software cannot give you the page number you see as a human. A machine doesn't know where to look!
If you know something about PDF, I can predict your next reply.
"Wait a minute!" you say, "What about Tagged PDF? Doesn't Tagged PDF mean that the semantics of a document are stored along with the representation?"
Well yes, when a PDF is tagged, a snippet of text knows that it is part of a title, or a paragraph, or a list,... But Tagged PDF is there to define the structure of the real content. Page numbers, however, are not part of the real content. They are marked as artifacts, along with headers, footers and other items on a page that are not considered to be real content. There is no way to distinguish page numbers from the other artifacts.
"Then what are these page labels about?" you ask.
Well, page labels are optional. They are present in some PDFs that are well conceived, but they will be absent in a large majority of the PDFs you'll find in the wild.
This is the long answer. The short answer is simple: you are asking for something that is impossible in general, not only with iText, Tika, PDFBox, or any other tool you might try.

How to compare two PDFs based on visual differences programmatically? [closed]

Closed 5 years ago.
I need to compare two PDF files and get all the visual differences between them. I know there are some related questions on Stack Overflow, but they do not cover my need.
I'm currently using PDFBox to generate images for the pages of each PDF and comparing the bytes of the images.
With this approach I can tell that a particular page differs.
But I need finer details, for example that the font size of some text, say "The text", differs on page 6 of the PDFs.
This applies not only to text: I need to catch all visual differences, such as images, text in charts, etc.
Please suggest a way to achieve this.
PS: I tried Apache Tika, but I get the sense that it can only be used to get structured text in XHTML and metadata. Fine details such as font size and font weight do not appear in the structured text. Please correct me if I'm getting it wrong.
PDF to image using Java
Convert PDF to thumbnail image in Java (there's an example of pdf-renderer use here)
https://www.google.com.br/search?q=PixelGraber&ie=utf-8&oe=utf-8&rls=org.mozilla:pt-BR:official&client=firefox-a&gws_rd=cr&ei=K1PhUqD2Jei0sQTQs4DoAw
A good library for converting PDF to TIFF?
Convert jpeg/png to an array of pixels in java
int pixels array to bmp in java
Finding pixel position
Get Pixel Color around an image
For extraction of text using PDFBox: Extracting text from PDF file using pdfbox
There are classes in PDFBox for detecting font position, type, size and maybe (didn't search deeper) other settings. (Links below) You could, then, extract text from both PDFs, compare them to check if texts are equal, then - if they are equal - compare their format. If there's something different, mark for display into another text, image or PDF.
http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/util/TextPosition.html
http://pdfbox.apache.org/docs/1.8.2/javadocs/org/apache/pdfbox/pdmodel/graphics/PDFontSetting.html
Check out this Java package: https://java.net/projects/pdf-renderer
You can convert the pdf to an image and then traverse the image as a 2D array and compare differences like that.
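As a rough sketch of the "traverse the image as a 2D array" idea, the plain-Java snippet below compares two page images pixel by pixel and returns the bounding box of everything that differs. It assumes both pages have already been rendered to BufferedImage objects of the same size (the rendering step, for example with PDFBox or pdf-renderer, is not shown), and the class and method names are invented for this example.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Pixel-by-pixel comparison of two rendered pages (rendering itself is not shown). */
final class PageImageDiff {

    /**
     * Returns the smallest rectangle containing every differing pixel,
     * or null if the two images are identical.
     */
    static Rectangle diffBounds(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no differing pixel found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) {
        // Stand-ins for two rendered pages: same size, one with a changed region.
        BufferedImage pageA = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage pageB = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        pageB.setRGB(10, 20, 0xFF0000);
        pageB.setRGB(30, 40, 0x00FF00);
        System.out.println("differences inside: " + diffBounds(pageA, pageB));
    }
}
```

This only tells you where two pages differ, not why. To learn that the difference is a font size, you would still combine it with the text-position extraction described in the other answer.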

Replacing XML Nodes in Java [closed]

Closed 8 years ago.
I have an XML document in a foreign language and another XML document in English. I am trying to replace some nodes in the foreign document with nodes from the English document and export the document.
I have been working on this for days now and have tried countless things, from reading both documents in as text with a Scanner, BufferedReader, etc., with no good results.
I'm at a loss on what else I can try. I have searched for days and have nothing. Maybe what I'm trying to do cannot be done although it seems simple enough. Any help/direction would be appreciated.
Put them into DOM objects, then use XPath to locate and select nodes and copy values between them.
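A minimal sketch of that DOM and XPath approach, using only the JDK's built-in XML APIs, might look like this. It assumes the same XPath expression selects the element to replace in both documents; the class name, method name and sample XML are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlNodeReplacer {

    /**
     * Replaces the first element matching xpath in targetXml with the first
     * element matching the same xpath in sourceXml; returns the result as text.
     */
    static String replaceNode(String targetXml, String sourceXml, String xpath) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document target = builder.parse(new InputSource(new StringReader(targetXml)));
            Document source = builder.parse(new InputSource(new StringReader(sourceXml)));

            XPath xp = XPathFactory.newInstance().newXPath();
            Node oldNode = (Node) xp.evaluate(xpath, target, XPathConstants.NODE);
            Node newNode = (Node) xp.evaluate(xpath, source, XPathConstants.NODE);
            if (oldNode != null && newNode != null) {
                // importNode makes a deep copy that belongs to the target document.
                Node imported = target.importNode(newNode, true);
                oldNode.getParentNode().replaceChild(imported, oldNode);
            }

            // "Export": serialize the modified DOM back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(target), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML replacement failed", e);
        }
    }

    public static void main(String[] args) {
        String foreign = "<book><title>Le Petit Prince</title><year>1943</year></book>";
        String english = "<book><title>The Little Prince</title><year>1943</year></book>";
        System.out.println(replaceNode(foreign, english, "/book/title"));
    }
}
```

To replace several nodes, the same pattern can be repeated with XPathConstants.NODESET and a loop. For real files, builder.parse also accepts a File or an InputStream.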
Depending on what you need to replace and what you mean by "export", I would use an XML parser like SAX with the following algorithm:
For each node that you read:
Replace attributes or text as necessary.
Write it out to a new XML file.
There are many tutorials out there on how to use SAX, such as this one: How to parse XML using the SAX parser
If the "replacements" you need to do are very straightforward like "all <tag> objects under <parent-tag>" then maybe building the DOM and using XPath would work, but if your replacements are very arbitrary and unstructured then I'd go with parsers.
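The read, replace, write loop from this answer can be sketched with a SAX filter feeding an identity Transformer, again using only JDK classes. This is one possible way to wire it up, not the only one. It assumes the elements to change contain plain text only, and the class name, method name and element names are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class SaxTextReplacer {

    /** Streams xml through a SAX filter, replacing the text of every elementName element. */
    static String replaceText(String xml, String elementName, String replacement) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            XMLFilterImpl filter = new XMLFilterImpl(parser) {
                private boolean insideTarget = false;

                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) throws SAXException {
                    super.startElement(uri, localName, qName, atts);
                    if (qName.equals(elementName)) {
                        insideTarget = true;
                        // Emit the replacement text right after the start tag.
                        super.characters(replacement.toCharArray(), 0, replacement.length());
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    if (!insideTarget) {
                        super.characters(ch, start, length); // pass other text through
                    }
                    // Text inside the target element is dropped.
                }

                @Override
                public void endElement(String uri, String localName, String qName)
                        throws SAXException {
                    if (qName.equals(elementName)) {
                        insideTarget = false;
                    }
                    super.endElement(uri, localName, qName);
                }
            };

            // The identity transformer writes out whatever events the filter forwards.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML rewrite failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Bonjour</title><body>Texte</body></doc>";
        System.out.println(replaceText(xml, "title", "Hello"));
    }
}
```

Because nothing is held in memory beyond the current event, this scales to large files, but each replacement has to be decided from the events seen so far. That is the trade-off against the DOM and XPath route mentioned above.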
