Apache POI: Word get picture dimensions


I am trying to determine the actual size of embedded images in MS Word documents using
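import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.Picture;

HWPFDocument docB = new HWPFDocument(fileInputStream);
PicturesTable picB = docB.getPicturesTable();
List<Picture> picturesB = picB.getAllPictures();
for (Picture pic : picturesB) {
    int height = pic.getHeight();
    int width = pic.getWidth(); // may return -1 for some pictures
}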
This works fine for some images, but for others getHeight() and getWidth() only return -1, as stated in the documentation.
So is there any other way to get the actual size of these pictures in the document?

Instead of getHeight() and getWidth(), you can resort to getDxaGoal() and getDyaGoal(), respectively. These represent the original width and height of the image prior to scaling/cropping, in twips (1 twip = 1/20 pt = 1/1440 inch).
You can then multiply them by getHorizontalScalingFactor() / 1000.0 and getVerticalScalingFactor() / 1000.0, respectively, to get the final (rendered) size. Division by 1000.0 is necessary because those scaling factors are given per mille.
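A minimal sketch of that arithmetic, kept free of POI so it runs on its own. The class and method names are hypothetical, the 96 dpi screen resolution is an assumption, and any cropping is ignored. With POI you would pass pic.getDxaGoal() together with pic.getHorizontalScalingFactor() for the width, and pic.getDyaGoal() together with pic.getVerticalScalingFactor() for the height.

```java
/**
 * Hypothetical helper: turns an HWPF "goal" size plus per-mille scaling factor
 * into the rendered size. Cropping is not taken into account.
 */
public final class PictureSize {
    private PictureSize() {}

    /** Rendered size in twips: goal size times the per-mille scaling factor. */
    public static double renderedTwips(int goalTwips, int scalingPerMille) {
        return goalTwips * (scalingPerMille / 1000.0);
    }

    /** 1 point = 20 twips. */
    public static double twipsToPoints(double twips) {
        return twips / 20.0;
    }

    /** 1 inch = 1440 twips; the pixel count depends on the assumed dpi. */
    public static double twipsToPixels(double twips, int dpi) {
        return twips / 1440.0 * dpi;
    }

    public static void main(String[] args) {
        // Hypothetical picture: 2 inches wide (2880 twips), scaled to 50 % (500 per mille).
        double widthTwips = renderedTwips(2880, 500);
        System.out.println(twipsToPoints(widthTwips) + " pt, "
                + twipsToPixels(widthTwips, 96) + " px at 96 dpi"); // prints "72.0 pt, 96.0 px at 96 dpi"
    }
}
```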

Related

How to center a scaled page in PDFBox

I am trying to center the content of my page after scaling it by a factor X. I have tried using the Matrix.translate function, but I always end up with the wrong position, except when scaling by a factor of 0.5 (which makes total sense to me).
My current code:
for (int i = 0; i < doc.getNumberOfPages(); i++) {
    pdfBuilder.addPage(doc.getPage(i));
    PDPage p = pdfBuilder.getDocument().getPage(i);
    Matrix matrix = new Matrix();
    float scaleFactor = 0.7f;
    float pageHeight = p.getMediaBox().getHeight();
    float pageWidth = p.getMediaBox().getWidth();
    float translateX = pageWidth * (1 - scaleFactor);
    float translateY = pageHeight * (1 - scaleFactor);
    matrix.scale(scaleFactor, scaleFactor);
    matrix.translate(translateX, translateY);
    PDPageContentStream str = new PDPageContentStream(pdfBuilder.getDocument(), p, AppendMode.PREPEND, false);
    str.beginText();
    str.transform(matrix);
    str.endText();
    str.close();
}
I have also tried other boxes like the cropBox and bBox but I think I am totally wrong in what I do right now. Please help me! :)
Update
I finally found a solution. The new translation values I am using now look like the following.
float translateX = (pageWidth * (1- scaleFactor)) / scaleFactor / 2;
float translateY = (pageHeight * (1- scaleFactor)) / scaleFactor / 2;

First of all, it is important to note what @mkl said:
The crop box may be the box you should use instead of the media box.
The code implicitly assumes that the lower left corner of the (media/crop) box is the origin of the coordinate system. This often is the case but not always.
The code only scales the static content, not annotations.
Now the explanation of the translation (e.g. the translation for the page height). Please note that I am not a mathematician; I just tried different approaches, and this is the one that worked for me.
First, we multiply the page height by the complement of the scale factor: pageHeight * (1 - scaleFactor). We need the complement because the smaller we scale something, the further it has to move from its original position. If we used the scale factor itself here, the smaller we scaled an image, the less it would move toward the centre.
The problem is that the translation is still off: it moves the scaled content in the right direction, but not into the centre. Therefore I tried dividing the value from the previous step by the scale factor and then by 2 (i.e. by twice the scale factor). The division by 2 is there because we want the content to end up in the centre, with equal space on both sides. I don't know exactly why the scale factor belongs in the divisor as well, but as I said, it just worked for me!
If you know why this works, feel free to edit this answer :)
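One way to see why the formula works, sketched under an assumption that is consistent with the formula that worked: after matrix.scale(s, s), the values passed to matrix.translate(tx, ty) are interpreted in the scaled coordinate system, so the content actually moves by s * tx on the page. Centering content of width s * W on a page of width W needs an offset of (W - s * W) / 2 = W * (1 - s) / 2, hence tx = W * (1 - s) / (2 * s). This also explains the 0.5 case from the question: the original tx = W * (1 - s) gives an offset of 0.5 * W * 0.5 = W / 4, which is exactly the centering offset when s = 0.5. The class below is a hypothetical, library-free check of that arithmetic.

```java
/**
 * Hypothetical check of the centering formula. Assumes the translation applied
 * after scale(s, s) moves the content by s * translate in page units.
 */
public final class CenterScaled {
    private CenterScaled() {}

    /** Translation to pass to translate() after scale(s, s). */
    public static double translation(double pageSize, double s) {
        return pageSize * (1 - s) / s / 2;
    }

    /** Resulting offset of the scaled content's lower-left corner, in page units. */
    public static double pageOffset(double s, double translate) {
        return s * translate;
    }

    public static void main(String[] args) {
        double w = 595;   // hypothetical media box width
        double s = 0.7;
        double tx = translation(w, s);
        double left = pageOffset(s, tx);          // gap between page edge and content
        double right = w - (left + s * w);        // gap on the other side
        System.out.printf("tx=%.2f left=%.2f right=%.2f%n", tx, left, right);
    }
}
```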

Algorithm to resize an image below a size in bytes depending on original image resolution

Is there a good algorithm to resize images below a file size (let's say 250 KB) depending on the original image dimensions?
Currently, in Java, I have:
public static byte[] resize(byte[] data, int desiredSize, String type) throws IOException {
    BufferedImage img = arrToBufferedImage(data);
    int factor = 10;
    double aspectRatio = (double) img.getWidth() / (double) img.getHeight();
    while (data.length / 1024 > desiredSize) {
        data = imgToBytes(resize(img, img.getWidth() - factor,
                (int) (img.getHeight() - (factor / aspectRatio))), type);
        img = arrToBufferedImage(data);
    }
    return data;
}
This just removes 10 px from the width, and a proportional amount from the height to maintain the aspect ratio.
The problem I'm having is that when an image is already quite small (let's say 260 KB), shrinking it by 10 px decreases the file size by a lot; I've seen it drop to 17 KB just from removing 10 px.
So I'm looking for an algorithm that uses a smaller step for small image dimensions and a larger step for large image dimensions.
I have tried myself, but I'm not good enough at maths to come up with such a formula.
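One simple way to get a step that grows with the image is to shrink by a fixed percentage per iteration instead of a fixed 10 px: a 4000 px wide image then loses hundreds of pixels per step, a 300 px wide image only a few. The sketch below does that with the standard ImageIO/Java2D APIs; the class name, the 10 % step in the usage, and the choice to always rescale from the original (to avoid accumulating resampling blur) are assumptions, not the asker's code. It also compares bytes directly, because data.length / 1024 rounds down and lets files up to 1023 bytes over the limit pass.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: shrink by a percentage per step until the encoded size fits. */
public final class ShrinkToSize {
    private ShrinkToSize() {}

    public static byte[] shrinkBelow(byte[] data, int maxKb, String format, double stepFraction)
            throws IOException {
        if (stepFraction <= 0 || stepFraction >= 1) {
            throw new IllegalArgumentException("stepFraction must be in (0, 1)");
        }
        BufferedImage original = ImageIO.read(new ByteArrayInputStream(data));
        if (original == null) {
            throw new IOException("Unsupported image data");
        }
        long maxBytes = maxKb * 1024L;
        double scale = 1.0;
        while (data.length > maxBytes) {
            scale *= 1.0 - stepFraction;               // e.g. 0.1 = 10 % smaller per step
            int w = (int) Math.round(original.getWidth() * scale);
            int h = (int) Math.round(original.getHeight() * scale);
            if (w < 1 || h < 1) {
                break;                                  // cannot shrink any further
            }
            data = encode(resize(original, w, h), format); // always resample from the original
        }
        return data;
    }

    private static BufferedImage resize(BufferedImage src, int w, int h) {
        // TYPE_INT_RGB drops transparency: fine for JPEG, an assumption for PNG.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    private static byte[] encode(BufferedImage img, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, format, out)) {
            throw new IOException("No ImageIO writer for " + format);
        }
        return out.toByteArray();
    }
}
```

For example, ShrinkToSize.shrinkBelow(bytes, 250, "jpg", 0.1) keeps taking 10 % off both dimensions until the encoded image is at most 250 KB. A smaller step gets closer to the limit at the cost of more encode passes.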

unable to calculate itext PdfPTable/PdfPCell height properly

I'm facing a problem while trying to generate a PdfPTable and calculate its height before adding it to a document. The calculateHeights method of PdfPTable returned a height much greater than the height of a page (while the table is about 1/4 of the page's height), so I wrote a method to calculate the height:
protected Float getVerticalSize() throws DocumentException, ParseException, IOException {
    float overallHeight = 0.0f;
    for (PdfPRow curRow : this.getPdfObject().getRows()) {
        float maxHeight = 0.0f;
        for (PdfPCell curCell : curRow.getCells()) {
            if (curCell.getHeight() > maxHeight) maxHeight = curCell.getHeight();
        }
        overallHeight += maxHeight;
    }
    return overallHeight;
}
where getPdfObject method returns a PdfPTable object.
Using the debugger, I've discovered that the difference between the lly and ury coordinates (and thus the height) of a cell's rectangle is much bigger than it looks after adding the table to a document (for example, one cell has a height of 20 and another 38, while they look the same on the page). There is nothing in the cell except a paragraph with a chunk in it:
Font f = getFont();
if (f != null) {
    int[] color = getTextColor();
    if (color != null) f.setColor(color[0], color[1], color[2]);
    ch = new Chunk(celltext, f);
    par = new Paragraph(ch);
}
cell = new PdfPCell(par);
cell.setHorizontalAlignment(getHorizontalTextAlignment());
cell.setVerticalAlignment(getVerticalTextAlignment());
The cell is then added to a table, and the table's width percentage is set to some float with setWidthPercentage.
What am I doing wrong? Why are the cell's proportions different from what I see in the generated PDF? Maybe I'm calculating the height wrong? Shouldn't the height of a cell on a PDF page be exactly the difference between its lly and ury coordinates?
Sorry I haven't shown the exact code, but the PDF is generated from XML through lots of intermediate steps and objects, so I guess it wouldn't be very useful "as is"...
Thanks in advance!

The height of a table added to a page where the available width is 400 is different from the height of a table added to a page where the available width is 1000. There is no way you can measure the height correctly until the width is defined.
Defining the width can be done by adding the table to the document. Once the table is rendered, the total height is known.
If you want to know the height in advance, you need to define the width in advance. For instance by using:
table.setTotalWidth(400);
table.setLockedWidth(true);
This is explained in the TableHeight example. In table_height.pdf, you see that iText returns a height of 0 before adding a table and a height of 48 after adding the table. iText initially returns 0 because there is no way to determine the actual height.
We then take the same table and we define a total width of 50 (which is much smaller than the original 80% of the available width on the page). Now when we calculate the height of the table with the same contents, iText returns 192 instead of 48. When you look at the table on the page, the cause of the difference in height is obvious.

In order to get the table height dynamically, we should set and lock the width of the table.
Here, 595 is the width of an A4 page in points.
table.setTotalWidth(595);
table.setLockedWidth(true);

resizing images in Android

I want to resize my images (their original size is 1080p), but they don't resize properly and I don't know why. Sometimes the images just don't have the right size. On my emulator and my old 800x480 smartphone it works fine, but on my Nexus 4 with 1280x768 things don't look right. There is no problem reading the correct screen resolution; there is just a bug in my resize procedure. Please help me.
private float smaller;
smaller = height/1080; // height is the screen height; in my case it's 768 because of landscape
object.bitmap = Bitmap.createScaledBitmap(bitmap,(int)(smaller*bitmap.getWidth()) ,(int)(smaller*bitmap.getHeight()), true);
In the end, the height is not resized to 768/1080 * bitmapHeight, and I don't know why.
Edit:
These are screenshots of my program showing that the images do not have the same height:
First image:
imgur.com/STSgAOd,Wh3fVdX
Second:
imgur.com/STSgAOd,Wh3fVdX#1
As you can see, the images are not equal in height. On my emulator and my old smartphone they look right. The images should not touch the bottom, but on my Nexus 4 they do.
I also tried double:
private double factor;
factor = ((double)screenheight/(double)1080);
objekte.bitmap1 = Bitmap.createScaledBitmap(bitmap,(int)(factor*bitmap.getWidth()) ,(int)(factor*bitmap.getHeight()), true);
Same bad result.

You assume the height needs to be resized the most (look at your height/1080). It might be that the width has to be resized the most. I use this to scale them:
// Calculate what scale is needed
double xFactor = (double)image.Width / (double)ScreenWidth;
double yFactor = (double)image.Height / (double)ScreenHeight;
double factor = xFactor;
if (yFactor > xFactor) {
    factor = yFactor;
}
int imageWidth = Convert.ToInt32(image.Width / factor);
int imageHeight = Convert.ToInt32(image.Height / factor);
Note: this is written in C#. It needs some changes to work in Java.
Note 2: this makes the image as large as possible on screen while keeping its aspect ratio, because it is scaled proportionally.
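A Java sketch of the same fit-to-screen calculation, since the rest of the thread is Java. The class name is hypothetical, and the 1920x1080 image on a 1280x768 landscape screen is an assumed example based on the question. It shows the point of the answer: scaling by height alone (768/1080) would make a 1920 px wide image about 1365 px wide, wider than the screen, whereas taking the larger of the two shrink factors lets the width decide. On Android you would pass the resulting sizes to Bitmap.createScaledBitmap(bitmap, w, h, true).

```java
/** Hypothetical Java version of the C# fit-to-screen calculation above. */
public final class FitToScreen {
    private FitToScreen() {}

    /** Returns {width, height} scaled so the image fits entirely inside the screen. */
    public static int[] fit(int imageW, int imageH, int screenW, int screenH) {
        double xFactor = (double) imageW / screenW;   // the cast avoids integer division
        double yFactor = (double) imageH / screenH;
        double factor = Math.max(xFactor, yFactor);   // the tighter dimension decides
        return new int[] {
            (int) Math.round(imageW / factor),
            (int) Math.round(imageH / factor)
        };
    }

    public static void main(String[] args) {
        // Assumed example: 1920x1080 image on a 1280x768 landscape screen.
        int[] size = fit(1920, 1080, 1280, 768);
        System.out.println(size[0] + "x" + size[1]); // prints 1280x720
    }
}
```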

It's caused by integer division; you can see that with:
public class IntegerDivisionDemo {
    public static void main(String[] args) {
        int height = 768;
        float smaller = (float) height / 1080; // <-- force float division
        float test = height / 1080;            // <-- integer division (0),
                                               //     then the int result is assigned to the float
        System.out.println("test: " + test);
        System.out.println("smaller: " + smaller);
    }
}
The output is:
test: 0.0
smaller: 0.7111111

Get the font height of a character in PDFBox

There is a method in PDFBox's font class, PDFont, named getFontHeight which sounds simple enough. However I don't quite understand the documentation and what the parameters stand for.
getFontHeight
This will get the font width for a character.
Parameters:
c - The character code to get the width for.
offset - The offset into the array.
length - The length of the data.
Returns: The width is in 1000 unit of text space, ie 333 or 777
Is this method the right one to use to get the height of a character in PDFBox, and if so, how? Or is there some relationship between font height and font size that I can use instead?

I believe the answer marked as correct needs some additional clarification. There is no per-font "error" in getHeight(), so I don't think it is good practice to guess a correction coefficient by hand for each new font.
For your purposes, it may be better to simply use the cap height instead of the height:
float height = (font.getFontDescriptor().getCapHeight()) / 1000 * fontSize;
This returns a value similar to what you are trying to get by correcting the height by 0.865 for Helvetica, but it works for any font.
The PDFBox docs don't explain much about what it is, but the image in the Wikipedia Cap_height article shows how it works and can help you choose the parameter that fits your particular task.
https://en.wikipedia.org/wiki/Cap_height

EDIT: Cap height was what I was looking for. See the accepted answer.
After digging through the PDFBox source, I found that this should calculate the font height:
int fontSize = 14;
PDFont font = PDType1Font.HELVETICA;
float height = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
The method isn't perfect, though. If you draw a rectangle with a height of 200 and a "Y" with font size 200, the above method gives a font height of 231.2, even though the "Y" is actually printed smaller than the rectangle.
Every font has a different error, but for Helvetica it is close to 13.5 percent too much, independent of the font size. Therefore, to get the right font height for Helvetica, this works:
float height = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize * 0.865;
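To put numbers on the bounding box versus cap height comparison, the sketch below converts glyph-space values (1/1000 of text space) to points. The Helvetica values used in main (FontBBox -166 -225 1000 931, i.e. a height of 1156, and CapHeight 718) are taken from the standard Adobe Helvetica AFM metrics, which is an assumption here; in PDFBox they would come from font.getFontDescriptor().getFontBoundingBox().getHeight() and getCapHeight(). At font size 200 the bounding-box height gives the 231.2 quoted above, while the cap height gives about 143.6.

```java
/** Hypothetical helper: glyph-space (1/1000 text space) units to points. */
public final class FontHeights {
    private FontHeights() {}

    public static float toPoints(float glyphUnits, float fontSize) {
        return glyphUnits / 1000 * fontSize;
    }

    public static void main(String[] args) {
        // Assumed Helvetica AFM metrics: FontBBox -166 -225 1000 931, CapHeight 718.
        float bboxHeight = 931 - (-225);   // 1156 units
        float capHeight = 718;
        float fontSize = 200;
        System.out.println("bbox height: " + toPoints(bboxHeight, fontSize));
        System.out.println("cap height:  " + toPoints(capHeight, fontSize));
    }
}
```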

Maybe use this?
http://pdfbox.apache.org/apidocs/org/apache/pdfbox/util/TextPosition.html
It seems to be a wrapper utility for text. I haven't looked at the source to check whether it accounts for the font error, though.

This is a working method for splitting the text into lines and finding its height:
public float heightForWidth(float width) throws IOException {
    float height = 0;
    // Split after every non-word character; each piece ends at a possible wrap point.
    String[] split = getTxt().split("(?<=\\W)");
    int[] possibleWrapPoints = new int[split.length];
    possibleWrapPoints[0] = split[0].length();
    for (int i = 1; i < split.length; i++) {
        possibleWrapPoints[i] = possibleWrapPoints[i - 1] + split[i].length();
    }
    // Line height from the font bounding box, converted from 1/1000 text space units.
    float leading = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
    int start = 0;
    int end = 0;
    for (int i : possibleWrapPoints) {
        float w = font.getStringWidth(getTxt().substring(start, i)) / 1000 * fontSize;
        if (start < end && w > width) {
            // Text up to i no longer fits: break the line at the previous wrap point.
            height += leading;
            start = end;
        }
        end = i;
    }
    height += leading; // the last line
    return height + 3; // 3 units of extra padding
}
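The same greedy wrapping can be separated from PDFBox by passing the width function in, which makes the logic easy to try without a PDF. This is a sketch, not the answer's exact code: the class and method names are hypothetical, the extra padding of 3 is left out, and in PDFBox the width function would be s -> font.getStringWidth(s) / 1000 * fontSize (which throws IOException, so it would need wrapping in a lambda).

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical, library-independent version of the greedy wrapping in heightForWidth(). */
public final class WrapHeight {
    private WrapHeight() {}

    public static int lineCount(String text, double maxWidth, ToDoubleFunction<String> width) {
        String[] parts = text.split("(?<=\\W)");   // break after every non-word character
        int lines = 0;
        int start = 0;
        int end = 0;
        int wrapPoint = 0;
        for (String part : parts) {
            wrapPoint += part.length();
            double w = width.applyAsDouble(text.substring(start, wrapPoint));
            if (start < end && w > maxWidth) {
                lines++;            // close the current line at the previous wrap point
                start = end;
            }
            end = wrapPoint;
        }
        return lines + 1;           // the last, partially filled line
    }

    public static double height(String text, double maxWidth, double leading,
                                ToDoubleFunction<String> width) {
        return lineCount(text, maxWidth, width) * leading;
    }
}
```

With a monospace width function such as s -> s.length(), "aaaa bbbb cccc" and a maximum width of 10 wraps into "aaaa bbbb " and "cccc", i.e. two lines.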

For imported TrueType fonts, the total height of the font is
(fd.getAscent() - fd.getDescent() + fd.getLeading()) * pointSize * font.getFontMatrix().getValue(0, 0)
where font is the org.apache.pdfbox.pdmodel.font.PDFont and fd = font.getFontDescriptor(). The descent is subtracted because the PDF specification defines Descent as a negative number.
You will find that font.getFontDescriptor().getFontBoundingBox().getHeight() is about 20% larger than the above value, because it includes a 20% leading on top of it; if you take the bounding-box value and remove that 20%, consecutive lines of text sit right next to each other.
