Java2DRendererBuilder BufferedImage From HTML Size Incorrect - java

I am using com.openhtmltopdf.java2d.Java2DRendererBuilder to create BufferedImage objects from html. The html may contain image objects as base64 text.
Java2DRendererBuilder builder = new Java2DRendererBuilder();
builder.withHtmlContent(html, null);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.useFastMode();
builder.toSinglePage(bufferedImagePageProcessor);
builder.runFirstPage();
int i = 0;
for (BufferedImage imagePage : imagePages) {
    log.debug("image width: " + imagePage.getWidth());
    log.debug("image height: " + imagePage.getHeight());
    saveImageFile(imagePage, "normal" + System.currentTimeMillis());
}
The problem is that when the html contains an image with a large width, the resulting BufferedImage does not fit within the page. Instead, part of the image is not shown.
I realize I can change the default page size to a larger value to accommodate the larger image. However, I don't know what value to set for the width since I won't know beforehand if the html has any images or what their width would be. I am hoping to find a way to accommodate any (reasonable) width on an A4 size page automatically.

Related

PDFRenderer creates images of greater size than the PDF itself

I have a sample of scanned PDFs that I need to edit and re-export. I use PDFBox to render the PDF into a series of images (one image per page), I perform some OpenCV calculations on the rasterized jpegs and then I intend to insert them back into a new pdf file.
Example: PDF is 423kb, Page 1 is 313kb, Page 2 is 287kb, Page 3 is 319kb, Page 4 is 485kb, and Page 5 is 470kb.
The problem is that the output images are larger than the PDF itself. As a result, my OCR step takes much longer than is acceptable (5 minutes vs. 30 seconds per document). The only way to keep the JPEGs from inflating in size is to leave them at the default DPI of 72, which produces poor-quality images that cannot be used.
Why is this happening? I should be able to get back images that have a size less than or equal to the PDF in question (without sacrificing quality). I'm not doing anything weird to the images, just removing watermarks.
Here's some code illustrating how I'm extracting the jpegs from the PDF.
File file = new File(fileName);
PDDocument document = PDDocument.load(file);
PDFRenderer renderer = new PDFRenderer(document);
BufferedImage[] pageArray = new BufferedImage[document.getNumberOfPages()];
int pageCounter = 0;
for (PDPage page : document.getPages()) {
    pageArray[pageCounter] = renderer.renderImageWithDPI(pageCounter, 160);
    pageCounter++;
}
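To make the size relationship concrete, here is a rough sketch of how the raster size follows from the DPI. It assumes only that PDF page dimensions are expressed in points (1/72 inch); the class and method names are hypothetical, the US Letter page size is an assumed example, and the rounding is an approximation rather than PDFBox's exact behaviour.

```java
class RasterSizeEstimate {
    /** Pixels along one side: a point is 1/72 inch, so px = points / 72 * dpi. */
    static int pixels(double points, double dpi) {
        return (int) Math.round(points / 72.0 * dpi);
    }

    /** Uncompressed size in bytes of an RGB raster (3 bytes per pixel). */
    static long rgbBytes(double widthPt, double heightPt, double dpi) {
        return 3L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 612 x 792 points.
        double widthPt = 612, heightPt = 792;
        for (double dpi : new double[] {72, 160, 300}) {
            System.out.printf("%.0f DPI: %d x %d px, %d bytes uncompressed%n",
                    dpi, pixels(widthPt, dpi), pixels(heightPt, dpi),
                    rgbBytes(widthPt, heightPt, dpi));
        }
    }
}
```

The rendered image is a fresh raster whose pixel count depends on the requested DPI, not on how compactly the scan was stored inside the PDF, so the size of the saved JPEG is then governed by that pixel count and the JPEG compression settings.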

Getting TIFF Tags from an Image

I am trying to render a TIFF image in order to extract tags from it. Here is what I have done so far:
ByteArraySeekableStream sStream= new ByteArraySeekableStream(imageByteArray);
ImageDecoder imgDecoder = ImageCodec.createImageDecoder("TIFF",sStream,null);
RenderedImage renderedImage = imgDecoder.decodeAsRenderedImage();
This seems to work fine for all positive scenarios, but it fails for a TIFF image that has a missing tag.
For example, if Image Length is missing, I get the following exception:
java.io.IOException: "Image Length", a required field, is not present in the TIFF file
I do not want it to throw an exception; instead, it should render the image with whatever TIFF tags are present.
Is there another way of rendering a TIFF image, or is there a way of modifying the above code to achieve this?
To get the TIFF tags, you don't need to render the image first. If you render the image, you will definitely get that exception whenever a required tag is missing or invalid.
Instead, try this:
ByteArraySeekableStream sStream = new ByteArraySeekableStream(imageByteArray);
Then get the TIFF directory from the stream instead of from the rendered image:
TIFFDirectory tiffDr = new TIFFDirectory(sStream, 0);
From this you can get the TIFF tags and then validate whichever tags are available in the image. Hope this helps.
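To illustrate why tags can be read without decoding the image, here is a minimal sketch that walks the first image file directory (IFD) by hand using plain java.nio. It does not use the JAI TIFFDirectory class from the answer; the class name TiffTagLister and the method listTags are hypothetical, and the sketch assumes a classic (non-BigTIFF) file with the standard 8-byte header and 12-byte directory entries.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class TiffTagLister {
    /** Returns the tag numbers stored in the first IFD of a TIFF byte array. */
    static List<Integer> listTags(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte-order mark");
        }
        // Bytes 2-3 hold the magic number 42; bytes 4-7 the offset of the first IFD.
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Missing TIFF magic number");
        }
        int ifdOffset = buf.getInt(4);
        if (ifdOffset < 8 || ifdOffset + 2 > tiff.length) {
            throw new IllegalArgumentException("IFD offset out of range");
        }
        int entryCount = buf.getShort(ifdOffset) & 0xFFFF;
        if (ifdOffset + 2 + entryCount * 12L > tiff.length) {
            throw new IllegalArgumentException("IFD entries run past end of data");
        }
        List<Integer> tags = new ArrayList<>();
        for (int i = 0; i < entryCount; i++) {
            // Each entry is 12 bytes: tag (2), type (2), count (4), value/offset (4).
            int entryOffset = ifdOffset + 2 + i * 12;
            tags.add(buf.getShort(entryOffset) & 0xFFFF);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hand-built example: one entry, ImageWidth (256); ImageLength (257) is absent.
        byte[] tiff = {
            'I', 'I', 42, 0, 8, 0, 0, 0,
            1, 0,
            0, 1, 3, 0, 1, 0, 0, 0, 100, 0, 0, 0,
            0, 0, 0, 0
        };
        List<Integer> tags = listTags(tiff);
        System.out.println("Tags present: " + tags);
        System.out.println("ImageLength present: " + tags.contains(257));
    }
}
```

Because only the directory is inspected, a missing required tag such as Image Length (257) shows up as an absent entry rather than as an exception from the decoder.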

Why is PDFBox PDFRenderer slow?

I want to convert a PDF to a TIFF using PDFBox 2.x and the PDFRenderer Class.
But it runs very slowly compared to ghostscript.
Here's my sample code
public class SpeedTest
{
    static long startTime = System.currentTimeMillis ();
    public static void logTime (String msg)
    {
        long now = System.currentTimeMillis ();
        System.out.println (String.format ("%.3f: %s", (now - startTime) / 1000.0, msg));
        startTime = now;
    }
    public static void main (String[] args) throws Exception
    {
        //System.setProperty ("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
        String pdfFileName = args[0];
        String tiffFileName = args[1];
        PDDocument document = PDDocument.load (new File (pdfFileName));
        logTime (pdfFileName + " loaded.");
        PDFRenderer pdfRenderer = new PDFRenderer (document);
        logTime ("intitalized renderer.");
        BufferedImage img = pdfRenderer.renderImageWithDPI (0, 600, ImageType.RGB);
        logTime ("page rendered as image.");
        ImageIO.write (img, "TIFF", new File (tiffFileName));
        logTime ("image saved as TIFF.");
    }
}
The output is as follows
0.521: sample.pdf loaded.
0.013: intitalized renderer.
2.910: page rendered as image.
2.005: image saved as TIFF.
As you can see, the call to pdfRenderer.renderImageWithDPI takes almost 3 seconds, and the ImageIO.write call takes another 2 seconds.
When I do the same with ghostscript, the complete task finishes in 0.4 seconds.
time gs -dQUIET -dBATCH -dNOPAUSE -sstdout=/dev/null -sDEVICE=tifflzw -r600 -dFirstPage=1 -dLastPage=1 -sOutputFile=sample.tif sample.pdf
real 0m0.389s
user 0m0.340s
sys 0m0.048s
I've also already tried
System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
as I'm running Java 8 (1.8.0_161 to be precise) but that makes no difference.
Thanks for every idea,
regards
Thomas
Upgrade to JDK 1.8.0_191, which was released in October 2018, or to JDK 9.0.4.
From the PDFBox docs:
PDFBox and Java 8
Important notice when using PDFBox with Java 8 before 1.8.0_191 or Java 9 before 9.0.4
Due to the change of the java color management module towards "LittleCMS", users can experience slow performance in color operations. A solution is to disable LittleCMS in favor of the old KCMS (Kodak Color Management System) by:
Starting with -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
or calling
System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")
Sources:
https://bugs.openjdk.java.net/browse/JDK-8041125
According to my experiments, this slowness only occurs for the first rendered page of a document. If you render all pages of a multi-page document, every page after the first renders faster. The absolute rendering speed also depends very much on the DPI value used.
Render 6 document pages at 600 DPI
4.903s: page 0 rendered as image.
4.205s: page 1 rendered as image.
3.946s: page 2 rendered as image.
3.866s: page 3 rendered as image.
3.761s: page 4 rendered as image.
3.633s: page 5 rendered as image.
Render 6 document pages at 300 DPI
3.241s: page 0 rendered as image.
1.308s: page 1 rendered as image.
1.155s: page 2 rendered as image.
1.156s: page 3 rendered as image.
1.109s: page 4 rendered as image.
1.083s: page 5 rendered as image.
Render 6 document pages at 150 DPI
2.507s: page 0 rendered as image.
0.555s: page 1 rendered as image.
0.386s: page 2 rendered as image.
0.373s: page 3 rendered as image.
0.410s: page 4 rendered as image.
0.361s: page 5 rendered as image.
Render 6 document pages at 72 DPI
2.455s: page 0 rendered as image.
0.333s: page 1 rendered as image.
0.213s: page 2 rendered as image.
0.190s: page 3 rendered as image.
0.175s: page 4 rendered as image.
0.171s: page 5 rendered as image.
I think the problem here is that AWT graphics does all rendering in software and that, with a constant pixel fill rate, the rendering time scales quadratically with the DPI value. The slowness of the first image is probably some initialization overhead. (But that's all a wild guess at the moment.)
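As a quick sanity check of that guess, the sketch below compares how the pixel count grows with DPI against the page 5 timings quoted above. The class and method names are hypothetical, and the comparison is only as good as those few measurements.

```java
class DpiScalingCheck {
    /** Factor by which the pixel count grows when going from baseDpi to dpi. */
    static double pixelRatio(double dpi, double baseDpi) {
        double linear = dpi / baseDpi;
        return linear * linear; // both width and height scale linearly with DPI
    }

    public static void main(String[] args) {
        double[] dpi = {72, 150, 300, 600};
        // Page 5 render times in seconds, copied from the measurements above.
        double[] seconds = {0.171, 0.361, 1.083, 3.633};
        for (int i = 0; i < dpi.length; i++) {
            System.out.printf("%4.0f DPI: pixels x%.1f, time x%.1f%n",
                    dpi[i], pixelRatio(dpi[i], dpi[0]), seconds[i] / seconds[0]);
        }
    }
}
```

With these particular numbers, going from 72 to 600 DPI multiplies the pixel count by roughly 69 while the measured time grows by roughly 21, so the timings rise steeply with DPI but more slowly than a pure per-pixel cost would predict. That would be consistent with some fixed per-page work on top of the fill cost, though this remains a guess.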

Thumbnail Program In Java everytime on page refresh it gives random images for a image

I have a Java servlet, Thumbnail.do, which generates a thumbnail image every time you send a request to it. The user has to pass the file name and the width they want for the image.
The code I am using is below:
public String Image = "", ImgWidth = "";
Image = "d:\\app\\project\\media\\" + req.getParameter("image");
ImgWidth = req.getParameter("width");
BufferedImage bufferedimage = ImageIO.read(new File(Image));
float scale = 1;
int targetWidth = 0;
int targetHeight = 0;
targetWidth = (int) (bufferedimage.getWidth(null) * scale);
targetHeight = (int) (bufferedimage.getHeight(null) * scale);
// Treat a missing width parameter as "no limit".
if (ImgWidth == null || ImgWidth.equals("")) {
    ImgWidth = "0";
}
// Cap the width and scale the height to keep the aspect ratio.
if (targetWidth > Integer.parseInt(ImgWidth) && !ImgWidth.equals("0")) {
    targetHeight = Integer.parseInt(ImgWidth) * targetHeight / targetWidth;
    targetWidth = Integer.parseInt(ImgWidth);
}
// imageOutput is not defined in this snippet; ImageIO.write expects a format name here.
ImageIO.write(createResizedCopy(bufferedimage, targetWidth, targetHeight),
        imageOutput, res.getOutputStream());
BufferedImage createResizedCopy(Image originalImage, int scaledWidth, int scaledHeight)
{
    BufferedImage scaledBI = new BufferedImage(scaledWidth, scaledHeight,
            BufferedImage.TYPE_INT_RGB);
    Graphics2D g = scaledBI.createGraphics();
    g.setComposite(AlphaComposite.Src);
    g.drawImage(originalImage, 0, 0, scaledWidth, scaledHeight, null);
    g.dispose();
    return scaledBI;
}
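The width-capping arithmetic in the snippet above can be isolated into a small, self-contained sketch. The class name ThumbnailScaler and the method targetSize are hypothetical, and this is only one way to arrange the same calculation, not the original servlet code.

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class ThumbnailScaler {
    /** Returns {width, height}, capped at maxWidth while keeping the aspect ratio. */
    static int[] targetSize(int srcWidth, int srcHeight, int maxWidth) {
        // maxWidth <= 0 means "no limit"; images already narrow enough are not enlarged.
        if (maxWidth <= 0 || srcWidth <= maxWidth) {
            return new int[] {srcWidth, srcHeight};
        }
        int height = Math.max(1, maxWidth * srcHeight / srcWidth);
        return new int[] {maxWidth, height};
    }

    /** Draws the original into a new RGB image of the capped size. */
    static BufferedImage createResizedCopy(BufferedImage original, int maxWidth) {
        int[] size = targetSize(original.getWidth(), original.getHeight(), maxWidth);
        BufferedImage scaled = new BufferedImage(size[0], size[1], BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setComposite(AlphaComposite.Src);
            g.drawImage(original, 0, 0, size[0], size[1], null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    public static void main(String[] args) {
        BufferedImage source = new BufferedImage(300, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = createResizedCopy(source, 150);
        System.out.println(thumb.getWidth() + " x " + thumb.getHeight());
    }
}
```

In this arrangement the requested width and the source image are passed as parameters rather than stored in instance fields, so values from concurrent requests cannot overwrite one another.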
On whichever page I have to display the image, I call the servlet like this:
<img src="../Thumbnail.do?image="the_image_name"&width=150&target="+Math.random()+"/>
Up to this point everything works fine: the images are converted to the requested size and displayed on the page.
The problem appears when I call Thumbnail.do multiple times on the same page to display different images at various locations,
like this:
<div>
<img src="../Thumbnail.do?image="emp.png"&width=150&target="+Math.random()+"/>
</div>
<div>
<img src="../Thumbnail.do?image="logo.png"&width=50&target="+Math.random()+"/>
</div>
Then, every time I refresh the page, random images are displayed in the div tags.
Can anyone suggest why this happens and, if you know a solution, how to fix it?
If I understand your question correctly, the problem is that the browser caches the image from your servlet. You can disable caching in your servlet code using the approaches described in "How to prevent the result of Servlets from being cached".
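As a rough illustration of what disabling caching usually involves, the sketch below collects the standard HTTP response headers commonly used for this purpose. The class and method names are hypothetical; in a servlet, each pair would typically be applied with res.setHeader(name, value) before writing the image to the response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NoCacheHeaders {
    /** Response headers that ask browsers and proxies not to cache the response. */
    static Map<String, String> noCacheHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP/1.1
        headers.put("Pragma", "no-cache");                                   // HTTP/1.0
        headers.put("Expires", "0");                                         // treat as already expired
        return headers;
    }

    public static void main(String[] args) {
        // Print the headers in the form they would appear in the response.
        noCacheHeaders().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```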

ITextRenderer: Adjust page height to content

I'm using ITextRenderer to generate a PDF from HTML, and what I need to produce is a cash register receipt.
The receipt has a dynamic width and, of course, dynamic content. As a result, the content height is always different, and right now I'm struggling to find a way to adjust the height of the PDF page to the content.
If the page is too tall, the receipt ends with a long white section; if it's too short, the PDF gets paginated, and I need it to be on one page only.
I'm using @page {size: Wpx Hpx;} to set the page size, but calculating the content height from the width and the data is almost impossible (or at least very painful).
This is the code that generates the PDF:
ITextRenderer renderer = new ITextRenderer();
byte[] bytes = htmlDocumentString.toString().getBytes("UTF-8");
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource(bais);
Document doc = builder.parse(is);
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(outputStream);
outputStream.flush();
outputStream.close();
I've also tried renderer.getSharedContext().setPrint(false);, but this throws an NPE.
I also tried @page {-fs-page-sequence: "none";}, without any luck.
The solution I found is not even close to perfect, but works!
@page {
    size: Wpx 1px;
}
* {
    page-break-inside: always;
}
This generates 1px-high pages for the entire content. Then I just have to tell the printer to print all the pages with a 0px margin between them.
Why is this solution not perfect? The file size goes from 1 or 2 KB to 200 KB, which is not very good when streaming over 3G.
