Incorrect / missing font metrics in Java?

Using a certain font, I use Java's TextLayout to determine its ascent, descent, and leading (see Java's TextLayout tutorial here).
In my specific case I'm using Arial Unicode MS, font size 8. Using the following code:
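import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextLayout;

// Plain 8pt font; the render context uses antialiasing and fractional metrics.
Font font = new Font("Arial Unicode MS", Font.PLAIN, 8);
TextLayout layout = new TextLayout("Pp", font,
        new FontRenderContext(null, true, true));
System.out.println("Ascent: " + layout.getAscent());
System.out.println("Descent: " + layout.getDescent());
System.out.println("Leading: " + layout.getLeading());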
Java gives me the following values:
Ascent: 8.550781
Descent: 2.1679688
Leading: 0.0
So far so good. However if I use the sum of these values as my line spacing for various lines of text, this differs by quite a bit from the line spacing used in OpenOffice, Microsoft Word, etc.: it is smaller. When using default single line spacing Word and OO seem to have a line spacing of around 13.7pt (instead of 10.7pt like I computed using Java's font metrics above).
Any idea
why this is?
whether I can somehow access the font information Word and OpenOffice seem to be accessing which leads to this different line spacing?
Things I've tried so far:
adding all glyphs to a glyph vector with font.getNumGlyphs() etc. - still get the same font metrics values
using multiple lines as described here - each line I get has the same font metrics as outlined above.
using FontMetrics' methods such as getLeading()

Zarkonnen doesn't deserve his downvotes as he's on the right lines. Many Java fonts appear to return zero for their leading when perhaps they shouldn't. Maybe it is down to this bug: I don't know. It would appear to be down to you to put this whitespace back in.
Typographical line height is usually defined as ascent + descent + leading. Ascent and descent are measured upwards and downwards from the baseline that characters sit on, and the leading is the space between the descent of one line and the ascent of the line underneath.
But leading is not fixed. You can set the leading in most word-processing and typographical software; Word calls this the line spacing. The original question is probably asking how Microsoft Word calculates its single line spacing. Microsoft's recommendations for OpenType fonts seem to suggest that software on different platforms calculates it differently. (Maybe this is why Java now returns zero?)
A quick bit of Googling around seems to indicate that a rule of thumb for leading is to make single-line spacing 120% of ascent + descent, or to use a fixed amount, say 2pt of leading between all lines. In the absence of any hard-and-fast rule that I can find, I would say it boils down to the legibility of the text you're presenting, and you should just go with what you think looks best.
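To make the rule of thumb concrete, here is a minimal sketch. The class name LineSpacing, the helper lineHeight, and the choice to apply the 120% factor only when the font reports no leading are all assumptions made for illustration; nothing here reproduces how Word or OpenOffice actually compute their spacing.

```java
public class LineSpacing {
    /**
     * Hypothetical helper: returns ascent + descent + leading when the font
     * reports a positive leading, otherwise scales ascent + descent by
     * fallbackFactor (for example 1.2f for the 120% rule of thumb).
     */
    static float lineHeight(float ascent, float descent, float leading, float fallbackFactor) {
        float glyphHeight = ascent + descent;
        if (leading > 0f) {
            return glyphHeight + leading;
        }
        return glyphHeight * fallbackFactor;
    }

    public static void main(String[] args) {
        // Metrics reported in the question for Arial Unicode MS at size 8.
        float height = lineHeight(8.550781f, 2.1679688f, 0.0f, 1.2f);
        System.out.println("Line height: " + height);
    }
}
```

With the metrics from the question this gives roughly 12.9pt, still short of the 13.7pt seen in Word, so the factor is something to tune by eye rather than a reconstruction of Word's behaviour.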

Are Word and OO including the white space between lines, while Java isn't?
So in Word / OO, your number is Ascent + Descent + Whitespace, while in Java you just have Ascent + Descent?

Related

Absolute positions in PDFBox

I'm writing a program that converts TeX-generated PDFs back to a TeX-like string of text. In order to achieve that I use Apache PDFBox.
I would like to detect subscripts and superscripts and then use a TeX-like notation to denote them. I have read the question "Superscript and subscript differentiation using pdf box", but it isn't really helpful: it seems impossible to detect subscripts and superscripts using Y and EndY, probably because they are relative. Is there any way to get the absolute position of text? The height of a glyph is easy to obtain as long as people use the old TeX fonts, so I can easily detect font size changes.

Unexpected padding to the left of first letter/unexpected typographic behaviour in JTextPane

I've implemented a multiline label by extending a JTextPane. The constructor sets various properties to make it look like a label, including disabling any border/setting margins to 0 which works well.
Environment:
using jgoodies-looks-2.6.0
setting the com.jgoodies.looks.windows.WindowsLookAndFeel L&F (also tested with javax.swing.plaf.metal.MetalLookAndFeel, same problem)
Windows 8 x64
Java SE 1.7
When I increase the font size, the first letter sometimes has "blank space"/a margin of ~1px at 19pt (probably increasing with font size) to its left. This happens at least for letters B, F and L, but certainly not for A. Here's an example:
On the left you can clearly see that the layout looks broken with the title having this weird margin on the left. Please note that the first line with the number (1861) is a regular JLabel.
Zooming in confirms this (the pink line is for illustration):
So from what I can see the typesetting is improper.
Can this be considered a bug in Swing? Is there a way to solve this? E.g., is there an easy and clean (i.e. not paint()-ing) way to have fine-grained control over typographic features in Swing in this context?
EDIT:
This is similar to what I would expect:
vs before:

If you look at your screenshot here:
And in particular look at the 1861...you can see that there is a larger space on both sides of the 1. In particular the gap between 1 and 8 and between 6 and 1 is larger than that between 8 and 6.
That is just how the layout has been arranged on that particular font. They clearly thought that a 1 was getting pushed too close to the characters around it and so they added more space on both sides.
Your options to "fix" this are limited.
Use a different font.
Render the line to an image, scan for empty columns, and shuffle it left.
Build in a few manual hacks for common characters (i.e. if the string starts with 1, then shuffle the line left 1 pixel).
Indent or outdent the title deliberately so it's not lined up and then the offset is no longer visible.
i.e.
1861
  Baked Beans
dkjfdf skdfjsdlf
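The "scan for empty columns" option can be sketched as below. The class InkScanner and the method firstInkColumn are hypothetical names, and the sketch assumes the line has already been rendered into a BufferedImage over a known solid background colour.

```java
import java.awt.image.BufferedImage;

public class InkScanner {
    /**
     * Returns the index of the first column that contains a pixel differing
     * from backgroundArgb, or -1 if every pixel matches the background.
     */
    static int firstInkColumn(BufferedImage img, int backgroundArgb) {
        for (int x = 0; x < img.getWidth(); x++) {
            for (int y = 0; y < img.getHeight(); y++) {
                if (img.getRGB(x, y) != backgroundArgb) {
                    return x;
                }
            }
        }
        return -1;
    }
}
```

The returned column is the number of pixels the line could be shifted left so that its first inked column sits at the edge. With antialiased text, faint edge pixels also count as ink, which may or may not be what you want.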

Line Break Height in android

I have code that measures the height and width of a paragraph and sizes it accordingly. However, I run into a strange problem whenever a line break (\n) occurs in the paragraph. I use the code below as the basis for calculating the height; I also calculate the width to make sure each line fits.
float textSize = t.getTextSize();
Paint paint = new Paint();
paint.setTextSize(textSize);
For some reason, the height of a line break is not included in the calculation, so I end up missing a few lines or showing only half a line.
My question is: how do I calculate the height of the space that a line break occupies?

I wasn't able to solve this issue completely. However, trimming three extra characters from the end of each line, at the edge of the available width, worked. The real problem lies in the character widths: for characters that Android has no metrics for, such as letters outside the regular alphabet, the calculated width can be very different from what is actually rendered.
Using this code you can determine where the line ends:
totalCurrentWidth = t.getPaint().measureText(s.substring(start, end));
However, characters not registered in the system may have a different end or no end at all (Chinese or Taiwanese text, for example).
Measuring each character individually in Verdana also produces different spacing from the text as actually rendered.
If anyone finds something wrong with my logic, feel free to comment. I only strive to improve, after all.

reversed Arabic when printing PDF

I'm trying to print Arabic in some PDF documents using the Java code found here :
http://www.java2s.com/Code/Java/PDF-RTF/ArabicTextinPDF.htm
The example works great, except that the text comes out backwards. For example, changing the example slightly :
String txt = "\u0623\u0628\u062c\u062f\u064a\u0629 \u0639\u0631\u0628\u064a\u0629";
System.out.println(txt);
g2.drawString(txt, 100, 30);
What is printed on the screen are the same characters but in the opposite direction, compared to the PDF. The console output is correct, the PDF is not.
I don't want to simply reverse the characters because otherwise I would lose bi-directional support ...
Thanks much

IIRC, iText supports Arabic shaping at a higher level than drawString. Let's see here...
Ah! ColumnText.showTextAligned(PdfContentByte canvas, int alignment, Phrase phrase, float x, float y, float rotation, int runDirection, int arabicOptions)
Alignment is one of Element.ALIGN_*. Run direction is one of PdfWriter.RUN_DIRECTION_*. Arabic options are bit flags, ColumnText.AR_*
That should do the trick, with one caveat: I'm not sure that it'll handle multiple directions in the same phrase. Your test string has CJKV, Arabic, and Latin characters, so there should be two direction changes.
Good luck.

Figured it out, here is the complete process :
document.open();
java.awt.Font font = new java.awt.Font("times", 0, 30);
PdfContentByte cb = writer.getDirectContent();
java.awt.Graphics2D g2 = cb.createGraphicsShapes(PageSize.A4.width(), PageSize.A4.height());
g2.setFont(font);
String txt = "日本人 أبجدية عربية Dès Noël où";
System.out.println(txt);
java.awt.font.FontRenderContext frc = g2.getFontRenderContext();
java.awt.font.TextLayout layout = new java.awt.font.TextLayout(txt, font, frc);
layout.draw(g2, 15, 55);
g2.dispose();
document.close();
You'll notice it handles multiple languages with bi-directional support. The only drawback is that it's impossible to copy/paste the resulting PDF text, since createGraphicsShapes draws the glyphs as shapes rather than as text. I can live with that.

Unicode Arabic (or anything else) is always in logical order in a Java program. Some PDFs are made in visual order, though this is quite rare in the modern world. The program you cite might be a hack that ends up with PDFs that work, sort of, for some purposes.
If I were you, I'd start by examining some PDFs produced in Arabic by some modern tool.
This sort of 'graphics' approach to PDF construction seems risky to me at best.
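To see what "logical order" means in practice, the standard java.text.Bidi class can report the directional runs in a string without reordering it. This is only an illustrative sketch: the class name BidiRuns, its helper methods, and the sample string are assumptions, not part of the code in the question.

```java
import java.text.Bidi;

public class BidiRuns {
    /** Number of directional runs in logically ordered text. */
    static int countRuns(String logicalText) {
        return new Bidi(logicalText, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).getRunCount();
    }

    /** True when the text contains both left-to-right and right-to-left runs. */
    static boolean isMixed(String logicalText) {
        return new Bidi(logicalText, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    public static void main(String[] args) {
        // Latin, Arabic, Latin: stored in logical (typing) order.
        String txt = "abc \u0623\u0628\u062c\u062f\u064a\u0629 def";
        Bidi bidi = new Bidi(txt, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        for (int i = 0; i < bidi.getRunCount(); i++) {
            System.out.println("run " + i + ": chars " + bidi.getRunStart(i)
                    + ".." + bidi.getRunLimit(i) + ", level " + bidi.getRunLevel(i));
        }
    }
}
```

Odd run levels mark right-to-left runs. A renderer that draws the characters in array order without this analysis will show those runs backwards, which matches the symptom in the question.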

How to detect and remove guide lines from a scanned image/document efficiently?

For my project I am writing an image pre-processing library for scanned documents. As of now I am stuck on the line removal feature.
Problem Description:
A sample scanned form:
Name* : ______________________________
Age* : ______________________________
Email-ID: |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
Note:
Following are the further conditions:
The scanned document may contain many more vertical and horizontal guiding lines.
Thickness of the lines may exceed 1px
The document itself is not printed properly and might have noise in the form of ink bloating or uneven thickness
The document might have colored background or lines
Now what I am trying to do is to detect these lines and remove them. And while doing so the hand written content should not be lost.
Solution so far:
The current solution is implemented in Java.
I detect these lines using a combination of Canny/Sobel edge detectors and a threshold filter (to make the image bitonal). This gives a black-and-white array of pixels. I then traverse the array and check whether the luminosity of each pixel falls below a specified bin value. If I find 30 such pixels in a row (the minimum line length in pixels), I remove them. I repeat the same for vertical lines, taking into account that there will be cuts left by the horizontal line removal.
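The run-scanning step described above might look like the following sketch. It is not the asker's actual code: the class HorizontalRunRemover, the method removeHorizontalRuns, and the representation of the image as an int[][] of grey values (rows first) are assumptions made for illustration.

```java
import java.util.Arrays;

public class HorizontalRunRemover {
    /**
     * Overwrites every horizontal run of at least minRunLength consecutive
     * pixels darker than darkBelow with the paper value.
     * Returns the number of runs removed.
     */
    static int removeHorizontalRuns(int[][] gray, int darkBelow, int minRunLength, int paper) {
        int removed = 0;
        for (int[] row : gray) {
            int x = 0;
            while (x < row.length) {
                if (row[x] >= darkBelow) {
                    x++;            // light pixel, keep scanning
                    continue;
                }
                int start = x;      // first dark pixel of a run
                while (x < row.length && row[x] < darkBelow) {
                    x++;
                }
                if (x - start >= minRunLength) {
                    Arrays.fill(row, start, x, paper);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Each pixel is visited once, so the scan itself is linear in the image size. It also shows why overlapping handwriting disappears: any stroke pixel that lies inside a long dark run is erased along with the line.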
Although the solution seems to work, there are problems, such as:
Removal of overlapping characters
If characters in the image are not properly spaced, they are also considered a line.
The output image from edge detection is in black and white.
A bit slow. It normally takes around 40 seconds for an image of 2480*3508 pixels.
Kindly guide me on how to do this properly and efficiently. And if there is an open-source library, please point me to it.
Thanks

First, I want to mention that I know nothing about image processing in general, and about OCR in particular.
Still, a very simple heuristic comes to my mind:
Separate the pixels in the image to connected components.
For each connected component decide if it is a line or not using one or more of the following heuristics:
Is it longer than the average letter length?
Does it appear near other letters? (To remove ink bloats or artifacts.)
Are its X gradient and Y gradient large enough? This could make sure that the connected component contains more than just a horizontal line.
The only problem I can see is, if somebody writes letters on a horizontal line, like so:
/\ ___
/ \ / \
|__| |___/
-|--|---|---|------------------
| | \__/
In that case the line would remain, but you have to handle this case anyhow.
As I mentioned, I'm by no means an image processing expert, but sometimes very simple tricks work.
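The connected-component idea can be sketched roughly as follows. The class ComponentLineFilter and its methods are hypothetical, the image is assumed to be a boolean[][] ink mask (rows first), and the bounding-box test is a simplified stand-in for the heuristics listed above rather than a tested recipe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class ComponentLineFilter {
    /** Returns one bounding box {minX, minY, maxX, maxY} per 4-connected ink component. */
    static List<int[]> componentBounds(boolean[][] ink) {
        List<int[]> boxes = new ArrayList<>();
        if (ink.length == 0) {
            return boxes;
        }
        int h = ink.length, w = ink[0].length;
        boolean[][] seen = new boolean[h][w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || seen[y][x]) {
                    continue;
                }
                // Breadth-first flood fill from this unvisited ink pixel.
                int[] box = {x, y, x, y};
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {x, y});
                seen[y][x] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    box[0] = Math.min(box[0], p[0]);
                    box[1] = Math.min(box[1], p[1]);
                    box[2] = Math.max(box[2], p[0]);
                    box[3] = Math.max(box[3], p[1]);
                    for (int d = 0; d < 4; d++) {
                        int nx = p[0] + dx[d], ny = p[1] + dy[d];
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h
                                && ink[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            queue.add(new int[] {nx, ny});
                        }
                    }
                }
                boxes.add(box);
            }
        }
        return boxes;
    }

    /** Assumed heuristic: a component that is long and thin is treated as a guide line. */
    static boolean looksLikeHorizontalLine(int[] box, int minLength, int maxThickness) {
        int width = box[2] - box[0] + 1;
        int height = box[3] - box[1] + 1;
        return width >= minLength && height <= maxThickness;
    }
}
```

As the answer notes, handwriting that touches a guide line merges with it into one component. Its bounding box then becomes tall, so this simple test keeps the line instead of removing it.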
