String.format for double-width characters

Java's String.format does not appear to be aware of double-width characters, such as Japanese or Chinese:
System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "surface", "駆け", "駆け"));
The output is not aligned correctly:
    field:             expected :               actual
  surface:                   駆け :                   駆け
Is there a correct way to format double-width characters with String.format? If not, is there an alternative method or library which is capable of doing this correctly?

There is no issue with Java's String.format() since it can't "know" how you want to render the text, or the font that will be used. Its role is purely to assemble a formatted string of text to be subsequently displayed. The visual appearance of that formatted text is controlled (primarily) by the display font, and the developer must explicitly set the formatting accordingly.
A simple solution would be to use a font that renders both Latin and CJK characters with glyphs of constant width, but I couldn't find one. See a Unicode Technical Report titled "East Asian Width" for more details:
For a traditional East Asian fixed pitch font, this width translates
to a display width of either one half or a whole unit width. A common
name for this unit width is “Em”. While an Em is customarily the
height of the letter “M”, it is the same as the unit width in East
Asian fonts, because in these fonts the standard character cell is
square. In contrast, the character width for a fixed-pitch Latin font
like Courier is generally 3/5 of an Em.
I'm guessing that there might not be any monospace font displaying CJK characters and Latin characters with the same width simply because it would look very strange. For example, imagine the two Latin characters "li" occupying the same width as the two Japanese characters "駆け". So even if you use a monospaced font to render both Latin and CJK characters, although the characters for each language are monospaced, the widths for each language are probably still different.
Google has a very helpful site for evaluating their fonts, which allows you to:
Filter the fonts by language: Japanese, Chinese, etc.
View a large number of characters being rendered. For example this page for Noto Sans JP shows:
The Japanese glyphs are wider than the Latin glyphs.
The Japanese glyphs are fixed width, whereas the Latin glyphs are not.
Enter any text you wish, and apply it to all selected fonts for comparison. For example, this screen shot shows how the Latin glyphs for AEIOUY look alongside some Japanese glyphs using different fonts. Note that the width of the Latin glyphs is always smaller, though by varying amounts, depending on the font being used and the specific glyph to be rendered:
Here's a possible solution to your alignment problem:
With the Kosugi Maru font (middle of top row in the screen shot above), Japanese characters seem to be exactly twice as wide as Latin characters, so use that font to render the output.
When rendering the formatted text, the field width must be reduced by one for each Japanese character to be displayed to ensure column alignment. String.format counts each Japanese character as one character when padding, but its glyph occupies two columns in this font.
So in the code, reduce the field width by the number of Japanese glyphs to be rendered:
System.out.println("* The display font is named MotoyaLMaru, created by installing Google font KosugiMaru-Regular.ttf.");
System.out.println("* With this font Japanese glyphs seem to be twice the width of Latin glyphs.");
System.out.println("* Downloaded from https://fonts.google.com/specimen/Kosugi+Maru?selection.family=Kosugi+Maru");
System.out.println(" ");
System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
System.out.println(String.format("%1$9s: %2$18s : %3$18s\n", "surface", "駆け", "駆け")); // 18, not 20!
System.out.println(String.format("%1$9s: %2$12s : %3$12s\n", "1234567", "川土空田天生花草", "川土空田天生花草")); // 12, not 20!
This is the output from running that code in NetBeans on Windows 10, showing the columns properly aligned:
Notes:
The format strings were hard-coded in this example to ensure column alignment, but it would be simple to dynamically build the format string based on the number of Japanese characters to be rendered.
Also see Monospace font that supports both English and Japanese.
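The dynamic approach mentioned in the notes could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class WidePad and its methods are hypothetical names, and the code-point ranges are only a rough approximation of the wide and fullwidth classes described in the East Asian Width report, not the full table. It also assumes, as above, a display font in which wide glyphs occupy exactly two Latin columns.

```java
final class WidePad {
    // Assumption: a hand-picked set of ranges that are usually rendered
    // double-width (Hangul Jamo, CJK, kana, Hangul syllables, fullwidth forms).
    // This is an approximation, not the complete East Asian Width data.
    static boolean isWide(int cp) {
        return (cp >= 0x1100 && cp <= 0x115F)
            || (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F)
            || (cp >= 0xAC00 && cp <= 0xD7A3)
            || (cp >= 0xF900 && cp <= 0xFAFF)
            || (cp >= 0xFE30 && cp <= 0xFE6F)
            || (cp >= 0xFF00 && cp <= 0xFF60)
            || (cp >= 0xFFE0 && cp <= 0xFFE6)
            || (cp >= 0x20000 && cp <= 0x3FFFD);
    }

    // Number of display columns: 2 per wide code point, 1 otherwise.
    static int displayWidth(String s) {
        return s.codePoints().map(cp -> isWide(cp) ? 2 : 1).sum();
    }

    // Right-aligns s in a field of the given number of display columns.
    static String padLeft(String s, int columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = displayWidth(s); i < columns; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }

    // Builds one row with the same column layout as the hard-coded example.
    static String row(String field, String expected, String actual) {
        return padLeft(field, 9) + ": " + padLeft(expected, 20) + " : " + padLeft(actual, 20);
    }

    public static void main(String[] args) {
        System.out.println(row("field", "expected", "actual"));
        System.out.println(row("surface", "駆け", "駆け"));
        System.out.println(row("1234567", "川土空田天生花草", "川土空田天生花草"));
    }
}
```

With this helper, "駆け" is padded to 18 characters and "川土空田天生花草" to 12, matching the hard-coded widths used above.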

Related

Can Java initialize Tahoma font for other than size 13 for Japanese unicode characters

As described in this wiki
Not all fonts have glyphs for Japanese Unicode codepoints
(We're running Java 8)
The below test display shows text in Latin, Japanese, Arabic, Korean and Russian characters.
For Japanese and Korean, Java seems to import glyphs from another font into the Tahoma font for font-size 13 only.
Is there a way to let Java initialize Tahoma (or another font for that matter) by importing Japanese glyphs for multiple font-sizes?

Why "Courier New" font doesn't have all unicode signs with equal width?

I have hexdump view in my application:
I use Courier New font in java:
private final Text contentText;
contentText.setFont(Font.font("Courier New"));
But as you can see, some Unicode characters are wider than others.
Is there some way, or another font, to make all characters the same width?

You're seeing the results of font fallback, which substitutes something other than the original font you specified when you try to render characters that aren't in that font. In this case, the kana and several other characters (Won sign, others) are not present in Courier New, so you get some other font whose metrics do not match those of Courier New.
There's no simple solution to this, particularly if you expect to be displaying a wide range of characters. What you could possibly do is set up a filter, as many hex editors do, and just show a '.' or similar for anything non-ASCII (or in this case you might be able to do a little detective work and set it up to show '.' for anything that is not present in Courier New font).
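The simpler of the two filters described above (a '.' for anything outside printable ASCII) might look like the sketch below. The class and method names are hypothetical, and the printable range 0x20 to 0x7E is an assumption about what the hex view should show; the Courier New detective work is not attempted here.

```java
final class HexDumpFilter {
    // Replaces every char outside printable ASCII (0x20..0x7E) with '.',
    // so each cell in the text column holds a glyph from the chosen font.
    static String asciiOnly(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c >= 0x20 && c <= 0x7E ? c : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Won sign, a katakana character and a newline are all replaced.
        System.out.println(asciiOnly("A\u20A9\u30AB~\n")); // prints "A..~."
    }
}
```

The filtered string can then be passed to the text node instead of the raw content, so font fallback is never triggered.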

Automatic font selection for Chinese character sets

I see iText supports automatic font selection based on individual glyphs. This works by searching font libs included in the FontSelector for the particular glyph and selects a character from a different font lib if it does not exist in font libs defined higher up the list.
I have been able to configure font extensions in Jasper to support Asian and Latin character sets by choosing a large Unicode font library such as "Arial Unicode MS", which has a super comprehensive character set. However, good-looking libraries like that are subject to pricey licensing!! Not surprising, considering the amount of work involved... Any single free font library supporting both Chinese and Latin is generally created to cater for effective Chinese character rendering, and the beauty of the Latin characters suffers as a result ;)
So, a final question: is there a mechanism for utilising iText's automatic font selection feature in Jasper?

From what I understand, you want to use the user's own fonts instead of supplying "Arial Unicode MS" with your application.
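import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;

// Collects the names of all installed font families that can display a CJK ideograph.
public class FontList extends ArrayList<String>
{
    public FontList()
    {
        for (final String fontName : GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames())
        {
            final Font f = new Font(fontName, Font.PLAIN, 10);
            // U+4E00 is the first character in the CJK Unified Ideographs block
            if (f.canDisplay('\u4E00'))
            {
                add(fontName);
            }
        }
    }
}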
A new FontList will give you a list of all available fonts that can display U+4E00, which is a rough indicator of Chinese, Japanese and Korean support.
You will need more logic to 'automatically' select the 'best' font; you could look for "Arial" in its name to give it a higher priority, then "Ume" and "WenQuanYi", and finally "Dialog".
It may be best to let the user choose which one to use. If no fonts are available, give advice on how to install them.
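The priority idea could be sketched like this. FontRanker and its methods are hypothetical names, and the substring match on the family name is an assumption about how the priority would be applied; the list passed in would normally be the FontList from above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class FontRanker {
    // Preferred name fragments, highest priority first (taken from the text above).
    private static final List<String> PREFERRED =
            Arrays.asList("Arial", "Ume", "WenQuanYi", "Dialog");

    // Lower rank means higher priority; unknown fonts rank last.
    static int rank(String fontName) {
        for (int i = 0; i < PREFERRED.size(); i++) {
            if (fontName.contains(PREFERRED.get(i))) {
                return i;
            }
        }
        return PREFERRED.size();
    }

    // Returns the best-ranked font name, or empty if no candidates exist.
    static Optional<String> best(List<String> candidates) {
        return candidates.stream().min(Comparator.comparingInt(FontRanker::rank));
    }

    public static void main(String[] args) {
        List<String> fonts = Arrays.asList("SimSun", "WenQuanYi Micro Hei", "Ume Gothic");
        System.out.println(best(fonts).orElse("no CJK font installed")); // prints "Ume Gothic"
    }
}
```

An empty result is the cue to show the installation advice instead.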

Setting the width of the characters of a font to be uniform in java

Good day! I am currently working on a program that sends a string to the printer, and I am having problems with how the printed string is laid out. Ideally I want every character to have the same width. The problem occurs when characters from other languages are included. I've noticed that Japanese characters are wider than the normal characters. Can I make these Japanese characters follow the width of the normal characters?
Example:
NNNNN
ＮＰＤ事本
Notice that the string with Japanese characters is wider. How can I make this string follow the character width of my designated font? Is there a way, or is my case hopeless? Thanks in advance.

There are such things as monospace fonts, where each character takes the same width. I do not know if Japanese has such a font, or if there are English fonts and Japanese fonts that are both monospace and have the same width in pixels.
https://stackoverflow.com/questions/586503/complete-monospaced-unicode-font
The above question was closed as "off-topic" but it has some good links. The general consensus seemed to be you are either hosed, or in for a nightmare of working with different fonts in the same document.

Incorrect / missing font metrics in Java?

Using a certain font, I use Java's TextLayout to determine its ascent, descent, and leading. (see Java's TextLayout tutorial here)
In my specific case I'm using Arial Unicode MS, font size 8. Using the following code:
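import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextLayout;

// Measure the vertical metrics of a short sample string
Font font = new Font("Arial Unicode MS", Font.PLAIN, 8);
TextLayout layout = new TextLayout("Pp", font,
        new FontRenderContext(null, true, true));
System.out.println("Ascent: " + layout.getAscent());
System.out.println("Descent: " + layout.getDescent());
System.out.println("Leading: " + layout.getLeading());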
Java gives me the following values:
Ascent: 8.550781
Descent: 2.1679688
Leading: 0.0
So far so good. However if I use the sum of these values as my line spacing for various lines of text, this differs by quite a bit from the line spacing used in OpenOffice, Microsoft Word, etc.: it is smaller. When using default single line spacing Word and OO seem to have a line spacing of around 13.7pt (instead of 10.7pt like I computed using Java's font metrics above).
Any idea
why this is?
whether I can somehow access the font information Word and OpenOffice seem to be accessing which leads to this different line spacing?
Things I've tried so far:
adding all glyphs to a glyph vector with font.getNumGlyphs() etc. - still get the same font metrics values
using multiple lines as described here - each line I get has the same font metrics as outlined above.
using FontMetrics' methods such as getLeading()

Zarkonnen doesn't deserve his downvotes as he's on the right lines. Many Java fonts appear to return zero for their leading when perhaps they shouldn't. Maybe it is down to this bug: I don't know. It would appear to be down to you to put this whitespace back in.
Typographical line height is usually defined as ascent + descent + leading. Ascent and descent are measured upwards and downwards from the baseline that characters sit on, and the leading is the space between the descent of one line and the ascent of the line underneath.
But leading is not fixed. You can set the leading in most word-processing and typographical software. Word calls this the line spacing. The original question is probably asking how Microsoft Word calculates its single line spacing. Microsoft's recommendations for OpenType fonts seem to suggest that software on different platforms calculates it differently. (Maybe this is why Java now returns zero?)
A quick bit of Googling around seems to indicate that a rule of thumb for single-line spacing is a line height of 120% of ascent+descent, or a fixed point spacing; say 2pts of leading between all lines. In the absence of any hard-and-fast rule I can find, I would say it boils down to the legibility of the text you're presenting, and you should just go with what you think looks best.
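As a worked example, here are both rules of thumb applied to the metrics reported in the question. This is only an arithmetic sketch; LineSpacing and its methods are hypothetical names, and the 120% factor and 2pt leading are the rule-of-thumb values mentioned above, not anything Java or Word prescribes.

```java
final class LineSpacing {
    // Line height as a multiple of ascent + descent (e.g. factor 1.2 for 120%).
    static double proportional(double ascent, double descent, double factor) {
        return (ascent + descent) * factor;
    }

    // Line height with a fixed amount of leading added, in points.
    static double fixedLeading(double ascent, double descent, double leadingPts) {
        return ascent + descent + leadingPts;
    }

    public static void main(String[] args) {
        double ascent = 8.550781;   // values printed by the code in the question
        double descent = 2.1679688;
        System.out.printf("120%% rule: %.2f%n", proportional(ascent, descent, 1.2)); // 12.86
        System.out.printf("2pt rule:  %.2f%n", fixedLeading(ascent, descent, 2.0));  // 12.72
    }
}
```

Both results are larger than the 10.7pt computed from ascent and descent alone, though neither exactly reproduces the roughly 13.7pt observed in Word and OpenOffice.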

Are Word and OO including the white space between lines, while Java isn't?
So in Word / OO, your number is Ascent + Descent + Whitespace, while in Java you just have Ascent + Descent?
