How can I prevent a PDF file from being downloaded when it is displayed on a web page?
I tried setting Content-Disposition: inline, but it didn't work.
How can I do this?
One option is to render the PDF to JPEG or some other image format and serve only the rendered images to the user. Some PDF libraries can render PDFs to other file formats.
Another option may be to send or redirect the PDF to an online PDF viewing app, in the same way Google does with attachments in Gmail. That way the user sees a JPEG of each page and cannot download the original PDF.
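As a rough illustration of the first option, here is a minimal sketch of the serving side only. It assumes the pages were already rendered to JPEG by some PDF library and stored as <root>/<doc_id>/page-<n>.jpg; that layout and the function name page_image_response are made up for this example, not part of any framework.

```python
import re
from pathlib import Path


def page_image_response(root: Path, doc_id: str, page: int):
    """Return (status, headers, body) for one pre-rendered page image.

    Assumed (hypothetical) layout: <root>/<doc_id>/page-<n>.jpg.
    The PDF itself is never read or returned by this function.
    """
    # Reject anything that could escape the image directory.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", doc_id) or page < 1:
        return 400, {}, b""

    image = root / doc_id / f"page-{page}.jpg"
    if not image.is_file():
        return 404, {}, b""

    headers = {
        "Content-Type": "image/jpeg",
        "Content-Disposition": "inline",
    }
    return 200, headers, image.read_bytes()
```

Keep in mind that the user can still save the page images; this only keeps the original PDF file off the wire.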
PDF Example
I have a PDF file in which images and text are mixed, as in the picture above.
I want to create a parser that converts such PDF files into data with a fixed format.
My question is simply: how can I replace each image with text? PDFTextStripper skips all images.
Extracting the text and turning it into data works, but the images between the text passages are not recognized.
I'm using PDFBox. I extract the text with PDFTextStripper().getText();
I have also succeeded in extracting the images individually.
I have to make an Android app in Java that reads a PDF file and displays it on screen without using other programs (such as a PDF reader). How can I distinguish between text and images in that file? In other words, there is text with an image in between; how do I determine where the text is and where the image is?
PDF files don't work like that.
It is a complex format, and there is a lot more data in the files than just text and images, such as metadata and formatting.
If you want to handle PDF files in your app, you should use a PDF library, such as the ones listed here:
https://camposha.info/android-examples/android-pdf-libraries/#gsc.tab=0
How exactly to load text will depend on the specific library you choose, and you should check the relevant documentation.
I need to create JSON from a PDF so that I can render the PDF content as HTML with all of its images and text. I have tried the modules below. So far I can extract only plain images, not graphical images or background shadow images. Is there a module that can extract these?
Modules tried:
- PDFMiner (Python)
- Mammoth (Node)
- pdf2json (Node)
- PDFBox (Java)
Have a look at PyMuPDF (http://pythonhosted.org/PyMuPDF/). Apparently it renders pages in various formats, including JSON. Although I have limited experience with it, the recipe at http://code.activestate.com/recipes/580703-extract-images-of-a-pdf-optionally-by-page-using-p/history/1/ shows how to use PyMuPDF to extract images from a PDF.
I know it's possible to convert an HTML file to PDF using Google Drive (HTML2PDF using Google Drive API), but I'd like to know whether this also works when the HTML uses images and CSS files, and if so, how to do it.
You need to convert the HTML to a Docs file and export it as PDF. During the Docs conversion most non-trivial styles are stripped. Basic coloring, sizing, and positioning are all you'll get. The exported PDF is the PDF version of the Docs file. Images are preserved, though.
You can experiment by uploading your HTML files to Google Drive at drive.google.com with the conversion setting turned on and checking the results.
For images you could try this: Embedding Base64 Images
This worked for me when uploading through the web interface. It should also work with my solution https://stackoverflow.com/a/21711109/592042
CSS can be written directly into the HTML file.
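To make the HTML self-contained before uploading, something along these lines might help. It is only a sketch using the Python standard library: the names to_data_uri and inline_assets are invented for this example, it assumes images are referenced as src="name" and that the document has a closing head tag, and it says nothing about how Drive treats the result.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_assets(html: str, css: str, images: dict[str, tuple[bytes, str]]) -> str:
    """Embed images and CSS into the HTML text.

    images maps a file name as it appears in src="..." to (bytes, MIME type).
    Assumption: plain string replacement is enough for the markup at hand.
    """
    for name, (data, mime_type) in images.items():
        html = html.replace(f'src="{name}"', f'src="{to_data_uri(data, mime_type)}"')
    # Put the stylesheet into a <style> element just before </head>.
    return html.replace("</head>", f"<style>{css}</style></head>", 1)
```

For example, calling inline_assets(page, css_text, {"logo.png": (png_bytes, "image/png")}) returns a single HTML string with no external files, which can then be uploaded with conversion turned on.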
I'm currently developing an application that lets users create a template and generate a DOCX file from it. The application needs to show users the changes to the template as they create it.
The approach I tried uses the docx4j library (which allows manipulation of DOCX files) together with ICEpdf, which displays the document in a Swing component after it has first been converted to a PDF file. The problem with this approach is that it loads quite slowly, and some of the changes made in the DOCX file are not reflected in the PDF conversion (for example, dashed underlines and font changes). When I open the DOCX output in MS Word, the file is displayed correctly, so I know the changes do occur; it seems that ICEpdf just can't show them properly.
So I was wondering if anyone knows a Java library that allows DOCX files to be viewed directly in a Swing component, without first converting them to PDF.
You can try docx4all or DocxEditorKit. Both of these are built around docx4j.