I'm using PDFBox (1.8) to handle PDFs on Windows (7 and above). I need to take an input PDF and convert it to a PDF made of the same pages, but with each page rendered as an image (no selectable text, etc.). With small files I have no problem, but when I have to convert bigger files I'm stuck because of the massive memory use.
I will post some code if it helps, but the approach I'm using is simple: create a new document from all the pages of the source PDF, each saved as an image.
I'm looking for a more memory- and time-efficient way to do this (I have to handle PDFs with 1,000 to 2,000 pages).
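To make the memory problem concrete, here is a rough back-of-the-envelope sketch of how much heap an uncompressed page raster needs. The page size (A4, 595 x 842 pt), the 300 DPI resolution, and the 4 bytes per pixel are assumptions chosen for illustration, not values from the question, and the class and method names are hypothetical.

```java
/** Rough estimate of the heap needed to hold rasterized PDF pages (illustrative only). */
final class RasterMemoryEstimate {

    /** PDF user space uses 72 points per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in PDF points to pixels at the given resolution. */
    static long toPixels(double points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Bytes needed for one uncompressed page image. */
    static long bytesPerPage(double widthPt, double heightPt, int dpi, int bytesPerPixel) {
        return toPixels(widthPt, dpi) * toPixels(heightPt, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed values: A4 page, 300 DPI, 4 bytes per pixel (int-packed RGB).
        long onePage = bytesPerPage(595, 842, 300, 4);
        long allPages = onePage * 2000; // assumed worst case: every page image kept alive

        System.out.printf("one page: %.1f MiB%n", onePage / (1024.0 * 1024.0));
        System.out.printf("2000 pages at once: %.1f GiB%n",
                allPages / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Under these assumptions a single page is about 33 MiB uncompressed, so any approach that keeps all page images reachable at the same time cannot scale to thousands of pages. That suggests rendering one page, writing it out, and releasing the image before moving to the next, and lowering the resolution if the output quality allows it.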
Related
Currently I'm developing an application that allows users to create a template and generate it into a DOCX file. The application needs to be able to display to users the changes in the template as the user is creating it.
The approach I tried was to use the DOCX4J library (which allows manipulation of DOCX files) together with ICEPDF, which displays the document in a Swing component after it has first been converted to a PDF file. The problem with this approach is that it loads quite slowly, and some of the changes made to the DOCX file are not reflected in the PDF conversion (for example, dashed underlines and font changes). When I open the DOCX output in MS Word, the file displays correctly, so I know the changes are there; it seems that ICEPDF just can't show them properly.
So I was wondering if anyone knows a Java library that allows DOCX files to be viewed directly in a Swing component instead of converting them to PDF first.
You can try docx4all or DocxEditorKit. Both of these are built around docx4j.
I am using iText to generate a PDF, which works fine, and I can also download it via the browser as a PDF. However, is it possible for Java or iText to convert it to a JPEG or another image format and let users download the image file?
response.setContentType("application/pdf; charset=utf-8");
Merely changing the contentType to image/jpg does not work. I have been looking for an answer but am struggling to find one.
Any ideas would be a great help.
I don't know much about iText, but with PDFBox you can convert a PDF document into images.
After splitting the document into page images, you can write them to the response.
Here are some reference links:
http://pdfbox.apache.org/commandlineutilities/PDFToImage.html
Converting a PDF into multiple JPGs with iText or other
http://www.javatpoint.com/example-to-display-image-using-servlet
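As a minimal sketch of the "push images to response" step, the following encodes an already-rendered page image as JPEG into an output stream using only the JDK. The class and method names are hypothetical, the blank image stands in for a page rendered by PDFBox, and the `ByteArrayOutputStream` stands in for the servlet's `response.getOutputStream()`. In a real servlet the content type would also need to be set to `image/jpeg` before writing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

final class JpegResponseSketch {

    /**
     * Encodes the image as JPEG into the given stream.
     * Returns false if no suitable JPEG writer was found.
     */
    static boolean writeJpeg(BufferedImage pageImage, OutputStream out) throws IOException {
        return ImageIO.write(pageImage, "jpg", out);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page image rendered from the PDF (assumption for this sketch).
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        g.dispose();

        // Stand-in for response.getOutputStream() in a servlet.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean written = writeJpeg(page, out);
        System.out.println("written: " + written + ", bytes: " + out.size());
    }
}
```

A multi-page PDF yields several images, so a single response can only carry one of them directly; bundling them into an archive or serving one page per request are two possible ways around that.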
You can use iText only for generating a PDF, nothing else. See the link http://itextpdf.com/itext.php. See this to convert a PDF to an image. See this link as well for a clearer understanding with an example.
Looking for a Java based PDF creation library. We're currently using Apache Velocity with HTML to render PDFs on the fly.
We'd like to be able to find a way to render large images (sometimes as big as 3000 x 1700) in a creative manner within the PDF container. For instance, a scrollable image pane within a PDF. This might not be possible within a PDF, I might be wrong.
Open source would be ideal.
For a good PDF library you should take a look at iText: http://itextpdf.com/
I have used images of around 5000x4000 with iText without any problems.
I don't know if it is possible to create a working scrollpane inside a PDF, unless of course you were doing it through a custom PDF creator/viewer.
iText is open source, but make sure to check the AGPL license before you use it commercially: http://itextpdf.com/terms-of-use/agpl.php
For just creating PDF files from images, iText is somewhat oversized. Give xsPDF a chance; it has no limits on image sizes and seems appropriate for your problem.
Just an FYI for anyone who may run into this in the future:
I used a library called PDFBox (http://pdfbox.apache.org/) to open a pre-existing PDF and modify it using a custom-sized PDRectangle with the dimensions of the image. I then inserted the image and rectangle into that new page and got the desired results.
I didn't realize you could have multiple page sizes in a single PDF.
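Sizing a page to an image comes down to converting pixels to PDF points (72 per inch). The sketch below shows that arithmetic only; the 150 DPI value and the class and method names are assumptions for illustration, and the 3000 x 1700 size is the example from the question.

```java
/** Converts image pixel dimensions to a PDF page size in points (illustrative only). */
final class ImagePageSize {

    /** PDF user space uses 72 points per inch by default. */
    static final float POINTS_PER_INCH = 72f;

    /** Returns {widthPt, heightPt} for an image printed at the given DPI. */
    static float[] pageSizeInPoints(int widthPx, int heightPx, float dpi) {
        return new float[] {
            widthPx * POINTS_PER_INCH / dpi,
            heightPx * POINTS_PER_INCH / dpi
        };
    }

    public static void main(String[] args) {
        // 3000 x 1700 px image, assumed to be placed at 150 DPI.
        float[] size = pageSizeInPoints(3000, 1700, 150f);
        System.out.println(size[0] + " x " + size[1] + " pt");
    }
}
```

With these inputs the page comes out at 1440 x 816 pt. The resulting width and height are what would be passed to the page rectangle of the new page, so that the image fills the page exactly.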
I have a huge PDF file (20 MB, 800 pages) which contains some information.
It has an index with hyperlinks. Most of the remaining information is in tabular format (in the PDF). I need to retrieve this information using Java and store it in SQL Server.
Which is the best API available to read this kind of file from Java?
It is unlikely to be in tabular format inside the PDF, as PDF does not contain structure information unless it is explicitly added at creation time. I wrote an article explaining some of the issues with text extraction from PDF at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
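Because the table structure is not stored in the file, any column reconstruction after text extraction is a heuristic. As a minimal sketch, assuming the extractor preserves runs of spaces between columns (which is not guaranteed), one line of extracted text could be split into cells like this. The sample row and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnSplitSketch {

    /**
     * Splits one line of extracted text into cells, treating any run of two or
     * more whitespace characters as a column boundary. Single spaces stay
     * inside a cell, so "Widget A" remains one value.
     */
    static List<String> splitCells(String line) {
        return Arrays.asList(line.trim().split("\\s{2,}"));
    }

    public static void main(String[] args) {
        // Invented sample row; real extracted text may look quite different.
        String line = "Widget A    12    4.50";
        System.out.println(splitCells(line));
    }
}
```

This breaks down when cells are empty, when columns are separated by a single space, or when the extractor emits text in a different order than it appears on the page, so the result should be validated before it is written to the database.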
Have you tried iText:
iText
Download iText
iText in Action — 2nd Edition
List of the Examples
How can I load a Word document in a Jasper report in Java?
You have the following options:
Put the Word document somewhere on a shared network drive and add a link to it in the report.
You can convert the Word document to PDF and then link to that in your report. That way, almost anyone on the planet (not only Windows users) can read it.
You can convert the Word document to PDF and then use a tool to convert that to PNG or JPEG images (one page = one image) and then include the images in your report. This will make the report very large.
You can hope that Microsoft will implement an MS Word reader in Java and tell your boss in the meantime that it is not possible without some drawbacks.