PDF with forms to simple image PDF

How can I convert a PDF with forms created in Adobe LiveCycle to a simple image-only PDF using Java?
I tried Apache PDFBox, but it cannot render a PDF with these forms to an image.
This is what I tried (from this question: Convert PDF files to images with PDFBox):
String pdfFilename = "PDForm_1601661791_587488.pdf";
try (PDDocument document = PDDocument.load(new File(pdfFilename))) {
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    for (int page = 0; page < document.getNumberOfPages(); ++page) {
        BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
        ImageIOUtil.writeImage(bim, pdfFilename + "-" + (page+1) + ".png", 300);
    }
} catch (IOException ex) {
    Logger.getLogger(StartClass.class.getName()).log(Level.SEVERE, null, ex);
}
But it does not work: the result is an image that says "The document you are trying to load requires Adobe Reader 8 or higher."

I guess it is not possible; I tried many libraries and none worked.
This is how I solved the problem:
I used an external tool, PDFCreator.
In PDFCreator I created a special printer that prints and saves the PDF without asking any questions (PDFCreator has options for this).
This is simple to reproduce because PDFCreator's Debug section has an option to load a config file, so I keep this file prepared: I just install PDFCreator and load the config file.
If you use my INI file from the link above, note that the resulting PDF is automatically saved in the folder "current user folder/Desktop/temporary".
The rest of the job is done from Java using Adobe Reader. In my case the code is:
ProcessBuilder pb = new ProcessBuilder(adobePath, "/t", path+"/"+filename, printerName);
Process p = pb.start();
This code opens my PDF in Adobe Reader, prints it to the specified virtual printer, and exits automatically.
"adobePath" is the path to the Adobe Reader executable.
path+"/"+filename is the path to my PDF.
"printerName" is the name of the virtual printer created in PDFCreator.
So this is not a pure Java solution, and in the future I intend to use Apache PDFBox to generate my PDFs in a format that is compatible with browsers and all readers, but this works too.
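The two-line snippet starts Adobe Reader but does not wait for it, so the Java side cannot tell when printing has finished. Below is a minimal sketch that waits with a timeout, assuming the same argument order as above (executable, "/t", PDF path, printer name). The class name PrintToVirtualPrinter, the helper names, and the timeout parameter are hypothetical and only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class PrintToVirtualPrinter {

    /** Builds the command line used in the answer: <reader> /t <pdf> <printer>. */
    static List<String> buildCommand(String adobePath, File pdf, String printerName) {
        return List.of(adobePath, "/t", pdf.getAbsolutePath(), printerName);
    }

    /**
     * Starts the reader and waits up to timeoutSeconds for it to exit.
     * Returns true if the process ended in time; otherwise it is killed
     * and false is returned.
     */
    static boolean print(String adobePath, File pdf, String printerName, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(adobePath, pdf, printerName));
        pb.inheritIO(); // avoid blocking on unread process output
        Process p = pb.start();
        boolean finished = p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            p.destroyForcibly(); // assumption: a hung reader should not be left running
        }
        return finished;
    }
}
```

Passing the arguments as a list also avoids quoting problems when the PDF path contains spaces.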

Related

PDFBox: Loading an Image Into PDF From a JAR Resource

Good afternoon. I have a JAR file to which I have attached some images as resources in a folder called logos. I am being told to do this due to security restrictions (we don't want the image files to be exposed in the same folder as the JAR). I first tried to load these images in as if they were a File object, but that obviously doesn't work. I am now trying to use an InputStream to load the image into the required PDImageXObject, but the images are not rendering into the PDF. Here is a snippet of the code which I am using:
String logoName = "aLogoName.png";
PDDocument document = new PDDocument();
// the variable "generator" is an object used for operations in generating the PDF
InputStream logoFileAsStream = generator.getClass().getResourceAsStream("/" + logoName);
PDStream logoStream = new PDStream(document, logoFileAsStream);
PDImageXObject logoImage = new PDImageXObject(logoStream, new PDResources());
PDPage page = new PDPage(new PDRectangle(PDRectangle.A4.getHeight(), PDRectangle.A4.getWidth()));
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.drawImage(logoImage, 500, 100);
Note that I have verified that the resource is getting loaded in correctly, as using logoFileAsStream.available() returns a different value for various logos. After running this code, the image does not actually get drawn on the PDF, and upon opening it, the error message "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem." appears. Could someone please help me figure what's wrong with that code snippet/a different solution to load my images in as a resource from the JAR? Thanks so much. Let me know if more details are needed/clarification.

This PDImageXObject constructor is for internal PDFBox use only. You can use
PDImageXObject.createFromByteArray(document, IOUtils.toByteArray(logoFileAsStream), logoName /* optional, can be null */)
for maximum flexibility, or, if you know it is always a PNG file,
LosslessFactory.createFromImage(document, ImageIO.read(logoFileAsStream))
Don't forget to close logoFileAsStream and contentStream.
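For the resource-loading half, here is a minimal sketch using only the JDK (Java 9+ for readAllBytes). It reads the resource fully, closes the stream, and fails loudly if the resource is missing instead of passing a null stream along. ResourceBytes and read are hypothetical names, not part of PDFBox.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

final class ResourceBytes {

    /** Reads a classpath resource fully and closes the stream; throws if it is missing. */
    static byte[] read(Class<?> anchor, String resourcePath) throws IOException {
        // try-with-resources tolerates a null resource, so the null check can live inside
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new FileNotFoundException("Resource not found on classpath: " + resourcePath);
            }
            return in.readAllBytes(); // Java 9+
        }
    }
}
```

The resulting byte array could then be handed to PDImageXObject.createFromByteArray(document, bytes, logoName) as described above.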

Multi-page document scanning

I'm making a Java program that scans a document and saves it to PDF. It works like a charm for a single-page document. I run a shell command from Java, create a BufferedImage from the InputStream, and then build the PDF document using iText.
Process p = Runtime.getRuntime().exec("scanimage --resolution=300 --format png --device-name " + device.getName());
BufferedImage bI = ImageIO.read(p.getInputStream());
The trouble begins when I try to scan a multi-page document (batch scanning). Namely, I don't know what to do with the resulting InputStream.
Process p = Runtime.getRuntime().exec("scanimage --batch --resolution=300 --format png --device-name " + device.getName());
I see a possible workaround: saving the images to temporary files and then building the PDF document from these files. However, I would like to avoid that. Is there a way to get an array of BufferedImage from the InputStream returned by scanimage?
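As far as I know, scanimage in --batch mode writes each page to its own file (named by the --batch=FORMAT pattern) rather than to standard output, so the temporary-file workaround may be hard to avoid. A minimal sketch of the loading side follows, assuming scanimage was run with something like --batch=<dir>/page%04d.png and has already exited. BatchScanPages and loadPages are hypothetical names, and the zero-padded pattern is an assumption so that sorting by file name matches page order.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.imageio.ImageIO;

final class BatchScanPages {

    /** Loads every *.png in dir, ordered by file name, as BufferedImages. */
    static List<BufferedImage> loadPages(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries
                    .filter(p -> p.getFileName().toString().endsWith(".png"))
                    .sorted() // zero-padded names sort in page order
                    .collect(Collectors.toList());
        }
        List<BufferedImage> pages = new ArrayList<>();
        for (Path file : files) {
            BufferedImage image = ImageIO.read(file.toFile());
            if (image == null) {
                throw new IOException("Not a readable image: " + file);
            }
            pages.add(image);
        }
        return pages;
    }
}
```

Using a directory from Files.createTempDirectory and deleting it afterwards keeps the temporary files out of the way.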

Java: Download .txt File from URL

I want to download a .txt file from a website. My code runs without an error and saves a document, but the document is full of HTML code instead of my content.
public static void main(String[] args) {
    try {
        URL website = new URL("http://www.file-upload.net/download-11700212/document.txt.html");
        String filepath = "C://Users//" + System.getProperty("user.name") + "//Desktop//document.txt";
        ReadableByteChannel channel = Channels.newChannel(website.openStream());
        FileOutputStream stream = new FileOutputStream(filepath);
        stream.getChannel().transferFrom(channel, 0, Long.MAX_VALUE);
        System.out.println("Download successful.");
    } catch (Exception e) {
        System.out.println("Download was not successful.");
    }
}
The download itself works: I get the txt file on my desktop, but its content is wrong and full of HTML code.
Please help.
Thanks.

The URL you are trying to download from is an HTML page, rather than the document itself. The link on that page you should be trying to download from is...
http://www.file-upload.net/download5.php?valid=451.69031370715&id=11700212&name=document.txt
However, if you wish to guarantee that you're downloading a text file, then you should choose a text file to download directly e.g.
http://humanstxt.org/humans.txt
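One way to catch this kind of mistake early is to look at the Content-Type the server reports before saving anything. The sketch below assumes the server reports the type honestly. TextDownload, isPlainText, and download are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

final class TextDownload {

    /** True when a Content-Type value such as "text/plain; charset=UTF-8" denotes plain text. */
    static boolean isPlainText(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters like "; charset=..." and compare the media type only
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/plain");
    }

    /** Saves the URL's body to target, but only if the server says it is plain text. */
    static void download(URL url, Path target) throws IOException {
        URLConnection connection = url.openConnection();
        String contentType = connection.getContentType();
        if (!isPlainText(contentType)) {
            throw new IOException("Expected text/plain but server sent: " + contentType);
        }
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the URL from the question, a check like this would presumably fail with a text/html content type instead of silently writing the HTML page to disk.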

I have a Python project called Python Webscraper which can read a URL and copy its textual contents to a text file without the HTML.
You'll need to install a package called Beautiful Soup then run the code from the GitHub repo.

org.apache.poi.xwpf.converter.xhtml.XHTMLConverter not generating images

I am using the org.apache.poi.xwpf.converter.xhtml.XHTMLConverter class to convert docx to HTML. Below is my Groovy code:
public Map convert(String wordDocPath, String htmlPath,
        Map conversionParams)
{
    log.info("Converting word file "+wordDocPath)
    try
    {
        ...
        String notificationWorkingFolder = "C:\\tomcats\\Notification\\store\\Notification1234"
        FileInputStream fis = new FileInputStream(wordDocPath);
        XWPFDocument document = new XWPFDocument(fis);
        XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File(notificationWorkingFolder)));
        File htmlFile = new File(htmlPath);
        OutputStream out = new FileOutputStream(htmlFile)
        XHTMLConverter.getInstance().convert(document, out, options);
        log.info("Converted to HTML file "+htmlPath)
        return [success:true,htmlFileName:getFileName(htmlPath)]
    }
    catch(Exception e)
    {
        log.error("Exception :"+e.getMessage(),e)
        return [success:false]
    }
}
The above code converts the docx to HTML successfully, but if the docx contains any images it emits <img src="C:\tomcats\Notification\store\Notification1234\word\media\image1.png"> without copying the image to that folder. As a result, when I open the HTML file, all images appear empty. Am I missing something in the code? Also, is there a way to generate an image source link instead of an absolute path, like <img src="http://localhost:8080/webapp/image1.png">?

I found the answer to the first question at lychaox.com/java/poi/Word07toHtml.html. I had to add one line of code, options.setExtractor(new FileImageExtractor(imageFolderFile));, to generate the images.
I resolved the second question with a pattern search and replacement.
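The exact pattern used is not given, so the following is only a sketch of what such a search and replacement might look like, using java.util.regex. It keeps the file name of each img src and prefixes it with a base URL. ImgSrcRewriter, rewrite, and the baseUrl parameter are hypothetical names, and the regex assumes double-quoted src attributes as in the output shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgSrcRewriter {

    // Captures: (1) everything up to the opening quote of src, (2) the path, (3) the closing quote
    private static final Pattern IMG_SRC =
            Pattern.compile("(<img\\b[^>]*?\\bsrc=\")([^\"]*)(\")");

    /** Replaces each img src path with baseUrl + the path's file name. */
    static String rewrite(String html, String baseUrl) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String path = m.group(2);
            // File name starts after the last '/' or '\' (or at 0 if there is none)
            int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
            String fileName = path.substring(cut + 1);
            String replacement = m.group(1) + baseUrl + fileName + m.group(3);
            // quoteReplacement keeps '\' and '$' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, rewrite(html, "http://localhost:8080/webapp/") would turn the absolute Windows path from the question into http://localhost:8080/webapp/image1.png.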

Even with proper usage, it's worth noting that XHTMLConverter uses XHTMLMapper, which does not process headers, footers, or VML Images. Any images falling into those categories will be lost.
The PDFConverter is more fully featured, but also uses the GPL licensed library, iText.

Unable to load image from the same package in Java?

In my java package, I have a file called 'prog.ico'. I'm trying to load this file, via the following code:
java.net.URL url = this.getClass().getResource("prog.ico");
java.awt.Image image = ImageIO.read( url );
System.out.println("image: " + image);
This gives the output:
image: null
What am I doing wrong? The .ico file exists in the same package as the class from which I'm running this code.

It seems that the .ico image format is not supported. See this question and its answer to get around this.
To prevent link rot: This solution recommends using Image4J to process .ico files.
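ImageIO.read returns null, rather than throwing, when no installed reader recognizes the data, which matches the "image: null" output above. Here is a minimal sketch for checking this up front with the standard javax.imageio API. ImageFormatCheck, hasReaderFor, and readOrFail are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

final class ImageFormatCheck {

    /** True if some installed ImageIO reader handles files with this suffix, e.g. "png". */
    static boolean hasReaderFor(String suffix) {
        return ImageIO.getImageReadersBySuffix(suffix).hasNext();
    }

    /** Like ImageIO.read, but throws instead of returning null for unsupported data. */
    static BufferedImage readOrFail(InputStream in, String name) throws IOException {
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            throw new IOException("No ImageIO reader could decode " + name);
        }
        return image;
    }
}
```

Calling hasReaderFor("ico") before and after adding an ICO plugin to the classpath is a quick way to confirm that the plugin was picked up.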

I've written a plugin for ImageIO that adds support for .ICO (MS Windows Icon) and .CUR (MS Windows Cursor) formats.
You can get it from GitHub here: https://github.com/haraldk/TwelveMonkeys/tree/master/imageio/imageio-ico
After you have installed the plugin, you should be able to read your icon using the code in your original post.

I think you need to wrap the file in a FileInputStream:
File file = new File("prog.ico");
FileInputStream fis = new FileInputStream(file);
BufferedImage image = ImageIO.read(fis); //reading the image file
