Java printing existing pdf with right scaling

Java printing existing pdf with right scaling - java

I'm trying to print an existing PDF.
I already tried this: java pdfbox printerjob wrong scaling / page format
PDFBox is not working for me, because I can't get the right scaling of my PDF.
Scaling.SHRINK_TO_FIT is almost working but it's missing about 5mm and I can't add them because paper.setImageableArea is resetting it.
Also I tried to use new Paper(); with own margins and without setSize, because I can't get the size of an DIN A4 paper. The problem with the margins is, if I'm getting the correct top & left size and then try to set the height and width then it's destroying the top & left position.
Now I'm looking for a free and simple solution without PDFBox to print a pdf file without losing the quality and the scaling.

It's kinda silly but here is already the solution: https://stackoverflow.com/a/35483083/4888475
My printed page was equal to the printed page from the Acrobat software and not to Firefox.

Related

Lines on printout of iText7 PDF have different length on paper

Only some of our testers report a strange look of all lines on the printout of an iText7 generated PDF (see image). The generated PDF looks perfect and the flaw only appears on paper. Most users have no issue, whatsoever.
Any suggestions regarding known issues of printers, drivers or hints on how to reproduce or localize the problem would be appreciated. Anything I could do on iText7 side?
PdfCanvas canvas = new PdfCanvas(pdfPage)
canvas.moveTo(x1,y1);
canvas.lineTo(x2,y2);
canvas.closePathStroke();
The PDF can be found here: PDF

Making prior comments an actual answer...
I could reproduce the issue with your example document:
using Chrome on Windows with a Brother printer I also get those extra line segments;
using Adobe Reader instead of Chrome I don't.
Looking into the PDF, though, I didn't see anything that should cause that issue.
Thus, while there is nothing wrong with your kind of line drawing to start with, I proposed drawing the rectangles as rectangles (instead of multiple lines), or using filled thin rectangles instead of lines. I hoped one of those options is supported by Chrome printing.
And indeed you commented:
I'm now printing thin rectangles instead of lines and it works flawlessly.
So, while the bug was not yours from the start, changing your code to draw the borders differently solved the issue for you.

Determining text areas using opencv in crowded images

I am attempting (and failing at) locating area's containing text from a larger image. Specifically I am looking to recognize titles of Magic cards. At the moment I have managed to cut the images down to blocks containing the title, such as
input image.
Despite this and even with training the ocr library to work with only with this font accuracy is still low. As far as I can tell the best thing I can do is crop the image to only the text. After research I still have been unable to do so. I attempted to implement the solutions presented in Extracting text OpenCV however the text is too close to the border for this to work. attempt image. If possible help in the form of java would be greatly appreciated. (sorry for the image links, I don't have the reputation to embed images)

Posting answer as suggested.
This answer relies on the text always being close to the same distance/offset away from the border.
Find the boundings of the border using Canny/Hough etc, and with whatever filtering techniques works best with your images (erosion, dilution, sharpen, grayscale, binary thresholding, etc).
Then take a smaller interior submat() of this border bounding Rect to get an approximation of where the text should be and run the ocr on this submat.

How do I crop each page in pdf using PDFBOX in Java?

I want to remove the bottom part of each page in the PDF, but not change page size, what is the recommended way to do this in java in PDFBOX? How to remove the footer from each page in PDF?
Is there possibly a way to use PDRectangle to just delete all text/images within it?
snippet of what I tried, using rectangle with setCropBox seems to lose page size, maybe cropBox is not intended for this?
PDRectangle rectangle = new PDRectangle();
rectangle.setUpperRightY(mypage.findCropBox().getUpperRightY());
rectangle.setLowerLeftY(50);
rectangle.setUpperRightX(mypage.findCropBox().getUpperRightX());
rectangle.setLowerLeftX(mypage.findCropBox().getLowerLeftX());
mypage.setCropBox(rectangle);
croppedDoc.addPage(mypage);
croppedDoc.save(filename);
croppedDoc.close();
Closest example in pdfbox cookbook examples I could find is on how to remove entire page, however this is not what I'm looking for, I'd like to just delete few elements from the page:
http://pdfbox.apache.org/userguide/cookbook.html

I'm also a newbie, but take a look at this page, in particular, the description of TrimBox. If there's no TrimBox on the page, it defaults to CropBox, which would cause what you're seeing.
In general, don't expect the PDFBox docs to tell you much of anything about PDF itself - to use PDFBox well I think you need to go elsewhere - AFAIK, mostly just to the PDF specification. I haven't even skimmed it yet, though!

The CropBox is the way to go if you want to remove a portion of a page while keeping a rectangular region visible. If you want the page size to remain the same, you need the MediaBox to remain the same.
From the PDF Spec:
CropBox - rectangle (Optional; inheritable) A rectangle, expressed in default user space units, defining the visible region of default
user space. When the page is displayed or printed, its contents are to
be clipped (cropped) to this rectangle and then imposed on the output
medium in some implementation-defined manner (see Section 10.10.1,
“Page Boundaries”). Default value: the value of MediaBox.
MediaBox - rectangle (Required; inheritable) A rectangle (see Section 3.8.4, “Rectangles”), expressed in default user space units,
defining the boundaries of the physical medium on which the page is
intended to be displayed or printed (see Section 10.10.1, “Page
Boundaries”).
A have seen (faulty) applications and libraries that force the CropBox and the MediaBox to be the same, double check that this is not what is happening on your case.
Also take into account that the coordinates origin (0,0) in PDF is the bottom-left corner, some libraries do the translation to top-left for you, some others not, you may also want to double check this on the library you are using.

How to remove black border around transparent image in PDF created by iText

I have searched over bunch of sites, and I was unable to find solution for my problem.
This is the problem:
I am making PDF's in Java using iText library.
Everything works fine except one thing.
Transparent PNG images have black/gray border around non-transparent area.
I didn't set any borders in code, and actually I have tried to remove them (with no luck).
Can someone help me how to solve this problem?
The closest answer what I have found is: Resizing an image in asp.net without losing the image quality
But I cannot (don't know) interpret this code in Java.
My code is pretty big to copy/paste, but these are steps:
create document
load image from given path
manipulate image (resize, rotate, positioning)
add image to current page
save pdf file
This is what I have tried also:
http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2157267
http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2330200
I have tried more than those 2, but I didn't bookmarked them (none of them worked)
Thanks in advance
UPDATE: I forgot to mention that my original pictures don't have border. Border is created somehow by iText. I initially thought that it was bug, but since iText 5.0.2 this problem remained so now I doubt that is bug (I am currently using 5.1.3).
UPDATE 2 I forgot to add this link: http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2157261
Here is presented VB script that works, but I cannot convert to Java code (it still draws black border), so can someone help me at least with this to convert good?

You could use the java BufferedImage method, getSubImage(x, y, w, h) which allows you to crop a sub image out of an existing image. That way you could cut out the edges.
See here: Class BufferedImage

iText PDF colors are inconsistent in Acrobat

I'm generating a multipage PDF from Java using iText. Problem: the lines on my charts shift color between certain pages.
Here's a screenshot of the transition between pages:
This was taken from Adobe Reader. The lines are the correct color in OS X Preview.app.
In Reader the top is #73C352, the bottom is #35FF69. In Preview.app the line is #00FE7E.
Any thoughts on what could be causing this discrepancy? I saved the PDF from Preview.app and opened it in Adobe Reader, still has the colors off.
Here is the PDF that is having trouble. Open it in Adobe Reader and look at the transition between pages 11 & 12.
On checking this out further, it appears that the java.awt.print.PrinterJob is calling print() for each pageIndex twice. This might be a clue.

The problem with the pages with darker colors is that they include a pattern object with a transparent image. When transparency is involved, Adobe Acrobat switches automatically to a custom CMYK profile and this causes the darker colors. Only Acrobat does this, other viewers behave just fine. The solution is either to remove the pattern object with the transparent image (it seems to be a drawing artifact of the PDF generator engine, it is not used anywhere on the page) or you can make the page part of a transparency group and specify the transparency group to use RGB colorspace.

Several different possibilities, yes.
Different color matching. If you're using a "calibrated" color space on one page and a "device" color space on another, the same RGB/CMYK values can produce visually different values.
If the graph is inside a Form XObject, the same graph can appear differently depending on the current graphic state when the form is drawn.
If you could post a link to your PDF, I could probably give you a specific answer.
Ouch. That PDF is painful to shclep through. I'd like to have some words with whoever wrote their PDF converter. Harsh ones. Lots of unnecessary clipping ("text" is being clipped hither and yon, page 7 for example), poor use of patters for images, but not using patters when it would actually help, drawing text as paths, and on and on...
EDIT: Which is precisely the sort of stuff you see when rendering Java UI via a PdfGraphics2D object. You CAN keep the text as text though. It's just a matter of how you create the PdfGraphics2D instance.
Okay, so the color of the line itself is identical. 0 1 0.4 RG. HOWEVER, there is some "transparency stuff" going on.
On pages that have images with soft masks or extended graphic states that change the transparency, the green line appears darker. On pages without, it appears brighter.
I suspect that all those other PDF viewers that draw the lines consistently don't support transparency at all, or only poorly.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.