Search inside a pdf without opening the contents

Search inside a pdf without opening the contents - java

I would like to build a search view in Android that searches inside PDF files without opening their content. If a PDF contains the searched word, only the title of that PDF (or the titles of all matching PDFs) should be shown.

It is not possible to search for text in a PDF file without reading its content. What you can find without decoding anything are strings and names (field names, document info, metadata, etc.), and only if the document is not encrypted.
Streams in a PDF document are usually compressed (mostly with the FlateDecode filter), so the page text is not visible in the raw bytes.
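
To illustrate why a raw byte search misses text, here is a minimal sketch using only the Java standard library. It assumes a made-up content stream (the sample string is hypothetical, not taken from a real PDF) and uses java.util.zip, which implements the zlib/deflate format that FlateDecode is based on. It is not a PDF parser; a real search would use a PDF library to locate and decode the streams.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateSearchDemo {

    // Compresses bytes with zlib/deflate, the scheme behind PDF's FlateDecode filter.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses zlib/deflate data; rejects truncated or malformed input.
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive search: maps each byte to one char (ISO-8859-1) and looks for the word.
    public static boolean contains(byte[] haystack, String needle) {
        return new String(haystack, StandardCharsets.ISO_8859_1).contains(needle);
    }

    public static void main(String[] args) {
        // Hypothetical page content, written the way PDF text operators look.
        String content = "BT /F1 12 Tf (invoice for March) Tj ET\n"
                + "BT /F1 12 Tf (invoice for April) Tj ET\n"
                + "BT /F1 12 Tf (invoice for May) Tj ET\n";
        byte[] stored = deflate(content.getBytes(StandardCharsets.ISO_8859_1));

        System.out.println("match in compressed bytes: " + contains(stored, "invoice"));
        System.out.println("match after inflating:     " + contains(inflate(stored), "invoice"));
    }
}
```

The word only becomes searchable after the inflate step, which is exactly the "reading the content" that the answer says cannot be avoided.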

Related

PDF is getting change after loading using PDFBOX jar

I have a PDF whose first page is set up as a different page (like the MS Word feature under the "Design" tab). I pass this PDF to PDFBox using the code below:
File originalPdfFile = new File("D:\\AsposeOutput_temp.pdf");
// Load and re-save without modification; try-with-resources closes the document.
try (PDDocument originalDocument = PDDocument.load(originalPdfFile)) {
    originalDocument.save("D:\\pdfBoxGen.pdf");
}
But when I open the PDF generated by PDFBox, it has been modified. I have attached the input PDF (AsposeOutput_temp.pdf) and the output PDF (pdfBoxGen.pdf). I want the output PDF to be the same as the input.
File links : https://gofile.io/?c=lLPpQz
Any help would be greatly appreciated!!

I found the solution to the above problem. The issue was not with the PDFBox library but with Aspose.Words. The input file passed to PDFBox contained a section break internally, and that was causing the misaligned footer.

iText PDFXFA library: Issue with some of the fields while flattening using flattenXDP( )

I'm trying to flatten an XFA PDF using the iText pdfxfa library. When I flatten the PDF with the demo application provided by iText, all the data is correctly embedded in the resulting PDF. When I do it with my own code, it is not: the data for the text fields and checkboxes is embedded correctly, but the attachment names are not. By "attachments" I mean the following: the dynamic form can contain another PDF (an attachment) inside it, which can be added using buttons provided in the XFA PDF. Below is the code I'm using to flatten the PDF. I copied the XFA of the PDF into a separate file using iText RUPS and passed it as the InputStream to flattenXDP().
private void flattenXFA(String flattenedPDFDest) throws FileNotFoundException, IOException, InterruptedException {
    // The XFA for the PDF is copied from iText RUPS into the phshuman10.xfa.xml file.
    // try-with-resources closes both streams even if flattening fails.
    try (FileInputStream xdp = new FileInputStream("/home/NetBeansProjects/kitext/resources/phshuman10.xfa.xml");
         FileOutputStream fos = new FileOutputStream(flattenedPDFDest)) {
        XFAFlattener xfaf = new XFAFlattener();
        xfaf.flattenXDP(xdp, fos);
    }
}
Link to the zip of all required PDF's:
https://drive.google.com/file/d/0B6w278NcMSCrT2p6cWQxZG0yYVU/view?usp=sharing
The name of PDF in the zip:
Flattened PDF using itext demo: checkResult.pdf
Sample filled copy of form: PHSHumanSubjectsAndClinicalTrialsInfo-V1.0 (10).pdf
Flattened PDF using my code: tt_flattened3.pdf
The XFA file for PHSHumanSubjectsAndClinicalTrialsInfo-V1.0 (10).pdf: phshuman10.xfa.xml
If required, my scenario can be adequately reproduced using the uploaded resources! Thanks in advance.

This is expected behavior. If you open your original XFA form in a PDF viewer, you will see that the PDF file contains 4 attachments. XFA itself is an XML-based format that can be embedded into a PDF, and it can interact with the enclosing PDF file through JavaScript APIs.
In your form, the JavaScript code in the XFA communicates with the PDF file (most likely through Acrobat's proprietary API) and retrieves the attachments.
When you flatten a pure XDP package, you are working only with the XML extracted from the PDF, which defines the XFA form, some datasets, etc. Nothing that belongs to the PDF file itself (fonts, images, attachments) is included.
If the XFA form uses PDF resources, flattening the XDP alone cannot reproduce them exactly as in the original form contained in the PDF.
Thus, if the XFA form uses PDF resources, you have to flatten the PDF form directly with the flatten(InputStream, OutputStream) method, which accepts an input stream for a PDF containing an XFA form and an output stream for the resulting flattened PDF file.

Skip the content while HTML to Excel conversion using aspose.cell in Java

I want to convert HTML to Excel using Aspose.Cells in Java, but the generated Excel file skips part of the content.
HTML content :
Hi Fanny,
Urgent !!Â 
SPR'17 - S/545175 -- ADSO# 16843754;
SPR'17 - S/545175 -- ADSO# 16843754;
fdzjchxk;shdgasz;ASDO;fhsjdzx
dyzhbsxz;sdhbdugvfd;36457q;sfdnzcx;
Best regards
Tel: 0123-1234 8765
Generated Excel file content :
SPR'17 - S/545175 -- ADSO# 16843754;
I am using aspose cells-16.12.0.jar, and the code executes without any error. The content doesn't have any images or tables, but it does contain special symbols. I suspect a special symbol is causing the problem.

Aspose.Cells supports reading and writing MS Excel-oriented HTML files, which means you should use only those HTML tags that MS Excel itself uses when it creates such a file. I suspect your template HTML does not fit that format. Try opening the HTML file in MS Excel first to check whether it opens correctly. If MS Excel opens the file fine, the Aspose.Cells APIs should be able to read it as well.
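
On the "special symbol" suspicion in the question: one plausible origin of the stray "Â" after "Urgent !!" is a non-breaking space stored as UTF-8 and then decoded as ISO-8859-1. That is an assumption about the input, not something this thread confirms, and it does not by itself explain the skipped content. A minimal standard-library sketch of the effect (the class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

class MojibakeCheck {

    // Simulates reading UTF-8 bytes with the wrong charset (ISO-8859-1).
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00A0" is a non-breaking space; in UTF-8 it is the two bytes C2 A0.
        String original = "Urgent !!\u00A0";
        String garbled = misdecode(original);

        // Byte C2 decodes to "Â" (U+00C2) in ISO-8859-1, so one extra character appears.
        System.out.println("length before: " + original.length());
        System.out.println("length after:  " + garbled.length());
        System.out.println("contains Â:    " + garbled.contains("\u00C2"));
    }
}
```

If the HTML file shows this pattern, checking which charset it is saved in and which charset it is read with is a cheap first step before looking at the HTML tags themselves.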

Creating a dynamic PDF in Java

This is not a duplicate question. I searched and tried many options before posting it.
We have a web page in which the user can enter data through text boxes, text areas, images, and rich text editors. This data has to be filled into an existing report, like filling in the blanks.
I was able to achieve this with Apache FOP when the user input is simple text. But Apache FOP doesn't work if the user input is rich text (HTML format): FOP does not render HTML and just pushes the HTML code (e.g. <strong> XYZ </strong>) into the PDF.
I tried iText, but the setback is that although iText supports rendering HTML to PDF, it does not place the images included in <img> tags into the PDF file.
I could build the PDF block by block with the iText API, but then the rich text entered by the user cannot be embedded between the blocks, since block-by-block construction and HTML-to-PDF conversion cannot be combined in iText. Or at least that is what I think from my experience.
Is there any other way to create a PDF file from Java with images, rich text rendered as it is, and headers and footers?

iText provides the capability to convert HTML data to PDF. Below is a snippet that does it:
Let's assume the HTML data is available as an InputStream (if it is a String, it can be converted to an InputStream using Apache Commons IOUtils).
InputStream htmlData; // Html Data that needs to converted to Pdf
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter pdfWriter = PdfWriter.getInstance(document, outputStream);
document.open();
// convert the HTML with the built-in convenience method
XMLWorkerHelper.getInstance().parseXHtml(pdfWriter, document, htmlData);
document.close();
// outputStream now has the required pdf data
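
If adding Apache Commons just for the String-to-InputStream step is not desirable, the same conversion can be sketched with the standard library alone. The class name and the sample HTML below are hypothetical; the choice of UTF-8 is an assumption and should match whatever charset the HTML parser expects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HtmlInput {

    // Wraps an in-memory HTML string as a stream of UTF-8 bytes.
    public static InputStream toStream(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical rich text as it might arrive from an editor field.
        String html = "<p><strong>XYZ</strong></p>";
        try (InputStream htmlData = toStream(html)) {
            // htmlData can now be handed to the HTML-to-PDF step shown above.
            System.out.println(htmlData.available() + " bytes ready for the parser");
        }
    }
}
```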

I work as a Social Media Developer for Aspose. To add rich text to a form field in a PDF file, you can try our Aspose.Pdf for Java API. See the following sample code:
// Open a PDF document
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("c:\\data\\input.pdf");
//Find Rich TextBox field using Field Name
RichTextBoxField textBoxField1 = (RichTextBoxField)pdfDocument.getForm().get("textbox1");
//Set the field value
textBoxField1.setValue("<strong> XYZ </strong>");
// Save the modified PDF
pdfDocument.save("c:\\data\\output2.pdf");

I am not trying to market or promote this product. This API actually solved our problem, so I thought I would mention it in case it helps fellow developers. Please let me know if this is against your policy.
I finally realized that my requirement cannot be met with any of FOP, iText, Aspose, Flying Saucer, or JODConverter.
I found a paid API, Sferyx. It renders very complex HTML to PDF while almost preserving the original style, and it also renders the images included in the HTML. We are still exploring this API and will post what other features it provides.

How to read bookmark links of pdf file?

I am reading a PDF file in Java code using the PdfReader class.
I want to read the bookmark (index/chapter) links shown in the red box.

Use SimpleBookmark.getBookmark(PdfReader).
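
A rough sketch of what to do with the result, assuming the iText 5 behaviour where the call returns a List<HashMap<String, Object>> in which each entry has a "Title" and nested entries sit under "Kids". Please check this against the iText version you use. To stay runnable without the library, the sketch walks hand-built sample data of that shape; the class name and sample titles are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class BookmarkTitles {

    // Returns all bookmark titles depth-first, indented two spaces per nesting level.
    public static List<String> titles(List<HashMap<String, Object>> bookmarks) {
        List<String> out = new ArrayList<>();
        collect(bookmarks, 0, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(List<HashMap<String, Object>> bookmarks, int depth, List<String> out) {
        if (bookmarks == null) {
            return; // tolerate a document without an outline
        }
        for (HashMap<String, Object> entry : bookmarks) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(entry.get("Title"));
            out.add(line.toString());

            // Assumed layout: child bookmarks are stored as a nested list under "Kids".
            Object kids = entry.get("Kids");
            if (kids instanceof List) {
                collect((List<HashMap<String, Object>>) kids, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        // Sample data standing in for SimpleBookmark.getBookmark(reader).
        HashMap<String, Object> section = new HashMap<>();
        section.put("Title", "Section 1.1");
        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(section);

        HashMap<String, Object> chapter = new HashMap<>();
        chapter.put("Title", "Chapter 1");
        chapter.put("Kids", kids);
        List<HashMap<String, Object>> outline = new ArrayList<>();
        outline.add(chapter);

        for (String title : titles(outline)) {
            System.out.println(title);
        }
    }
}
```

In real code the outline variable would come from SimpleBookmark.getBookmark(reader) instead of the sample data.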
