PDF reader for Java as PDF.js - java

We have a project that uses pdf.js to render a PDF in a web page. pdf.js creates HTML container elements for the PDF pages and splits the text content of each page into HTML span elements.
The attached image shows how the PDF text is rendered in the view. It also shows that each span has a data-key, and that a span does not necessarily correspond to a line in the PDF.
Now I need a PDF reader for Java that reads the content and breaks it into spans with a data-key, or just into the same spans in the same order.
There are a lot of Java libraries that read PDF content line by line, but that does not solve my issue. I need a Java library that breaks the content into chunks equivalent to the spans in the view.

Related

Print to a PDF file JavaFX

Is there a way to print/export a collection of Panes containing Nodes to a PDF file in JavaFX? Each Pane represents an A4 page and contains Text and Webview elements as in the following image:
The only workaround I found (with reference to this blog post) was to print the Pane nodes using the JavaFX API and select a PDF printer such as CutePDF. The goal is to produce a PDF file directly, without relying on an external PDF printer.

How to save and open JTextPane content to/from an external file which contains bold, italic, underlined text, images and other styled content?

I am developing a WordPad-like application using Java Swing.
I am using the JTextPane component. I have added the code for bold, italics and underline, and I am adding the code to insert images in all formats.
My expected scenario: I want to save the document with all this styled content using the extension '.doc' or '.docx'. I also want to read and open a document that contains styled content such as bold, italic and underlined text, images, bullet points, etc.
I don't think it can be done with HTMLEditorKit().
Can anyone help with sample code for saving these styled contents to an external file and reading them back?
Thanks in advance.

How to hide text in a PDF file?

How can I add text that is not visible to a PDF document?
The document manipulation should be done in Java. The use case is to add further metadata to a document (in a proprietary format, about 40 KB) before the document is signed and archived.
I tried:
annotation field with size 0,0
.txt file attachment
but this annoys readers of the PDF, because they see a difference (comment / attachment bar).
Is there a comment object or a syntax to comment out lines in a PDF document?
EDIT:
I've tried adding text between PDF objects. This works, but the problem is that Acrobat Reader asks to resave the file when closing the window.
Adding the text after %%EOF is not a solution, because the signature would not cover the metadata, and that is a required feature.
The proper way to add metadata to a PDF would be through XMP. It allows you to add arbitrary metadata and allows defining the metadata types inside of the same PDF file (which you really should do if you're archiving and which is a requirement in archival standards such as PDF/A).
XMP data can be extracted by readers that don't understand the PDF format, using a simple text scanning algorithm. At the same time it sits inside the document, so it is protected by the digital signature you apply.
You can read more about it here: http://www.adobe.com/products/xmp/
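The "simple text scanning" idea can be sketched roughly as follows. This is a minimal illustration, not a full XMP reader: it assumes the packet is UTF-8 encoded and that the metadata stream is stored unfiltered and unencrypted, and the class name XmpPacketScanner is made up for this example. It looks for the standard `<?xpacket begin=` and `<?xpacket end=` wrapper markers in the raw bytes and returns everything between them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical helper: finds an XMP packet in raw file bytes without parsing the PDF. */
final class XmpPacketScanner {

    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    /**
     * Returns the first XMP packet, including its wrapper processing
     * instructions, or an empty Optional if no complete packet is found.
     * Assumption: the packet is UTF-8 and not compressed or encrypted.
     */
    static Optional<String> findPacket(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so string
        // indices are also byte offsets into the original data.
        String raw = new String(data, StandardCharsets.ISO_8859_1);

        int start = raw.indexOf(BEGIN);
        if (start < 0) {
            return Optional.empty();
        }
        int endMarker = raw.indexOf(END, start);
        if (endMarker < 0) {
            return Optional.empty();
        }
        // The trailing processing instruction closes with "?>".
        int close = raw.indexOf("?>", endMarker);
        if (close < 0) {
            return Optional.empty();
        }
        byte[] packet = Arrays.copyOfRange(data, start, close + 2);
        return Optional.of(new String(packet, StandardCharsets.UTF_8));
    }
}
```

Calling `XmpPacketScanner.findPacket(Files.readAllBytes(path))` on a file would then yield the XML that a normal XML parser can process. Writing the XMP into the PDF in the first place still needs a PDF library, which this sketch does not cover.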
I have seen PDFs that had a bunch of metadata in the footer, written in white on a white background, so you normally wouldn't notice it when looking at the PDF. But that's quite nasty.

Extract formatted text from PDF to HTML

I need to convert PDF documents into HTML, where I can achieve the following:
1- Extract the text from the PDF.
2- Extract the images.
3- Retain the formatting in the converted HTML page, the same as in the PDF page.
4- Embed the images in the converted HTML page at the same places as in the PDF.
5- Apply a color scheme to the HTML page.
Any help would be appreciated.
To extract images from PDF
Image Extraction
To extract text from PDF
Text Extraction
All the other things you are seeking answers for are possible using any web-application setup.

Issue with PDF creation using iText

I have to edit an existing PDF file using iText in Java. The existing PDF contains a lot of pages. Given a page number of that PDF, I have to replace the footer of that page with new text and output only that page, with the edited footer and the rest of the page's contents. There is no need to output the remaining pages. Also, the existing PDF is in A6 format and I have to change the output PDF to A4 format. How is this possible?
You can split and merge PDF files using iText. That means you need to split your original document into three parts and keep only the middle (required) part. You can also delete and add objects, so you can find the footer object, delete it and add a new object in its place. I do not think you would be able to change the format, unless you create a brand new document in the target format and copy the objects from the source into the new document. Worth trying.
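If the "new document in the target format" route is tried, the copied page content has to be scaled from A6 to A4. The arithmetic can be sketched as below. This is only the geometry, under the assumption that both pages are portrait and use the ISO 216 sizes (A6 = 105 x 148 mm, A4 = 210 x 297 mm); the class PageFit and its method are invented for this example and are not part of iText.

```java
/** Hypothetical helper: uniform scale and centering offsets for fitting one page size into another. */
final class PageFit {

    /** PDF user space uses 72 points per inch; one inch is 25.4 mm. */
    static final double POINTS_PER_MM = 72.0 / 25.4;

    // ISO 216 page sizes in millimetres, portrait orientation.
    static final double A6_W = 105, A6_H = 148;
    static final double A4_W = 210, A4_H = 297;

    /**
     * Returns {scale, offsetX, offsetY}. The scale keeps the aspect ratio
     * and the offsets (in points) center the scaled source on the target.
     */
    static double[] fit(double srcWmm, double srcHmm, double dstWmm, double dstHmm) {
        // Use the smaller ratio so the scaled page fits in both directions.
        double scale = Math.min(dstWmm / srcWmm, dstHmm / srcHmm);
        double offsetX = (dstWmm - srcWmm * scale) / 2 * POINTS_PER_MM;
        double offsetY = (dstHmm - srcHmm * scale) / 2 * POINTS_PER_MM;
        return new double[] {scale, offsetX, offsetY};
    }

    public static void main(String[] args) {
        double[] r = fit(A6_W, A6_H, A4_W, A4_H);
        System.out.printf("scale=%.4f offsetX=%.2fpt offsetY=%.2fpt%n", r[0], r[1], r[2]);
    }
}
```

For A6 to A4 the scale comes out as exactly 2, with a small vertical offset because 2 x 148 mm is 1 mm short of 297 mm. In iText 5 these values would typically be used as the transformation matrix (scale, 0, 0, scale, offsetX, offsetY) when placing an imported page on the new A4 page; treat that usage as an assumption to check against the iText version in use.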
