Japanese in JTextArea - java

I have a database of Japanese words, and an algorithm that reads those words and puts them into a JTextArea.
The problem is that I see rectangles instead of Japanese characters.
However, when I copy such a set of rectangles (Ctrl+C) from the JTextArea and paste it into, for example, the command line of Total Commander or a Word document, the characters are displayed properly, but only under Win7.
Because I run Eclipse in a virtual machine under WinXP, I can also paste the rectangles into the Total Commander command line under WinXP; there they stay rectangles, just as in my Java app.
So the JTextArea holds the information about the characters, but it cannot render them.
Of course I have a suitable font installed.
I've tried many things with fonts, such as
textArea.setFont(new Font(blablabla));
and similar, but without effect.
What should I do?

The problem with your JTextArea is most probably that the font you're using isn't suitable for Unicode Japanese text: it doesn't provide a proper mapping table from character codes to glyphs. For example, 0x41 is the letter 'A' in ASCII, in UTF-8 and even in Shift-JIS, but the font you linked resolves 0x41 to a Kanji character. The font also contains no Hiragana or Katakana characters at all; see the comments section on the site where you got the font, here:
"After using CharMap it has a WSIfonts TAG and does NOT! support ALL the Chinese characters it only has 90 characters and assigns 1 character per char except caps."
It's a Chinese font, not a Japanese one, and it doesn't even provide all Chinese characters or a useful mapping table, so it's pretty useless.
Try another font; that should work just fine, provided it really contains Japanese characters and an applicable Unicode mapping table.
You can find fonts that would work, for example, here.
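If you're not sure which installed fonts actually cover Japanese, you can let AWT tell you instead of guessing. A minimal sketch, not tied to any particular setup (the probe string and the 14 pt size are arbitrary choices):

    import java.awt.Font;
    import java.awt.GraphicsEnvironment;
    import javax.swing.JTextArea;

    public class JapaneseFontPicker {
        public static void main(String[] args) {
            JTextArea textArea = new JTextArea();
            String probe = "\u65E5\u672C\u8A9E"; // "nihongo" in Kanji, used only to test coverage

            // Ask AWT for all installed fonts and take the first one that can render the probe.
            for (Font f : GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts()) {
                if (f.canDisplayUpTo(probe) == -1) { // -1 means every character is covered
                    textArea.setFont(f.deriveFont(Font.PLAIN, 14f));
                    System.out.println("Using font: " + f.getFontName());
                    break;
                }
            }
            textArea.setText(probe);
        }
    }

On many systems the logical "Dialog" font already maps to a Japanese-capable physical font, so it is also worth simply trying new Font(Font.DIALOG, Font.PLAIN, 14) first.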

Related

PDFBox Embedded fonts not working when filling form

I fill forms with field.setValue(). However, even though the PDF document has embedded fonts in it, I get the error "is not available in this font's encoding: WinAnsiEncoding" no matter which type of font it is. Note that this happens for Chinese or Russian characters.
Your PDF documents may have embedded fonts, but they apparently have been embedded with an Encoding value of WinAnsiEncoding.
WinAnsiEncoding contains essentially the Latin-1 characters, so it is intended for "Western European" languages (see the Wikipedia article on this) and in particular neither for Cyrillic nor for CJK languages.
If you want to fill Chinese or Russian characters into form fields using PDFBox, therefore, you have to
either embed a suitable font into your PDF with an appropriate encoding beforehand,
or replace the embedded font with PDFBox right before setting the form field value; see for example this answer.
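For the second option, a hedged PDFBox 2.x sketch follows; it loads a Unicode-capable TrueType font, registers it in the AcroForm's default resources, and points the field's default appearance at it before calling setValue(). The file names, the field name "name", and the font file are placeholders, not details from the question:

    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.font.PDType0Font;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

    public class FillWithUnicodeFont {
        public static void main(String[] args) throws Exception {
            try (PDDocument doc = PDDocument.load(new File("form.pdf"))) {
                PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();

                // Embed a font that actually contains the needed glyphs (CJK/Cyrillic).
                PDType0Font font = PDType0Font.load(doc, new File("ArialUnicodeMS.ttf"));
                PDResources resources = acroForm.getDefaultResources();
                String fontName = resources.add(font).getName();

                // Point the field's default appearance at the embedded font instead of the
                // WinAnsi-encoded one, then set the value as usual.
                PDTextField field = (PDTextField) acroForm.getField("name");
                field.setDefaultAppearance("/" + fontName + " 0 Tf 0 g");
                field.setValue("\u4F60\u597D, \u043F\u0440\u0438\u0432\u0435\u0442");

                doc.save(new File("form-filled.pdf"));
            }
        }
    }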

Rendering a string in Java which is in different languages: English, Chinese and Indic

I need to render the string below in a JTable cell. How do I do this?
testä漢字1ગુજરાતી2
Update:
It looks like my question wasn't clear. The string above is a file name. I need to display it, and many other file names, in a JTable, and the data arrives dynamically. I may need a custom renderer to display the string exactly as it is. Currently it shows junk characters. By simply changing the table font from Calibri to MS Gothic I can see the Chinese characters, but not the Indic letters. Since the data arrives dynamically, we won't know in advance which font to use.
So I want to know whether there is a way to inspect the string programmatically and render it with different fonts as appropriate.
The simplest solution would be to use a font which is more complete. I believe the DejaVu family is decent; others may be able to suggest better. The Oracle manual suggests that the logical fonts Dialog, DialogInput, Monospaced, Serif and SansSerif are also likely to be more complete; potentially they map to multiple underlying physical fonts depending on the specific characters that need to be rendered. Oracle also mentions the Lucida family, distributed with Oracle's JRE, as another fairly complete possibility, though it doesn't cover Chinese, Japanese or Korean characters.
A more convoluted solution would be to run Character.UnicodeBlock.of(c) on the characters in each string, assemble a Set<Character.UnicodeBlock> and guess which font is most appropriate based on the blocks present in the string, or even write a custom renderer to render each character (or sequence of characters) with a font appropriate to the Unicode block they belong to. Unicode blocks tend to be categorized according to the script they contain.
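As a rough illustration of that second approach, the sketch below collects the Character.UnicodeBlock values of each cell value and then falls back to the first candidate font that can display the whole string; the candidate font names are assumptions, not a recommendation:

    import java.awt.Component;
    import java.awt.Font;
    import java.util.HashSet;
    import java.util.Set;
    import javax.swing.JTable;
    import javax.swing.table.DefaultTableCellRenderer;

    public class ScriptAwareRenderer extends DefaultTableCellRenderer {
        // Candidate fonts, tried in order; adjust to whatever is installed on the target machines.
        private static final String[] CANDIDATES = {"Calibri", "MS Gothic", "Shruti", "Dialog"};

        @Override
        public Component getTableCellRendererComponent(JTable table, Object value,
                boolean isSelected, boolean hasFocus, int row, int column) {
            Component c = super.getTableCellRendererComponent(
                    table, value, isSelected, hasFocus, row, column);
            String text = String.valueOf(value);

            // Collect the Unicode blocks present in the string; a smarter picker could map
            // specific blocks (CJK, Gujarati, ...) to specific fonts.
            Set<Character.UnicodeBlock> blocks = new HashSet<>();
            for (int i = 0; i < text.length(); i += Character.charCount(text.codePointAt(i))) {
                blocks.add(Character.UnicodeBlock.of(text.codePointAt(i)));
            }

            // Simple fallback: the first candidate font that can display every character wins.
            for (String name : CANDIDATES) {
                Font f = new Font(name, Font.PLAIN, table.getFont().getSize());
                if (f.canDisplayUpTo(text) == -1) {
                    c.setFont(f);
                    break;
                }
            }
            return c;
        }
    }

Installed with table.setDefaultRenderer(Object.class, new ScriptAwareRenderer()), it keeps the table's default font when no candidate covers the string, so a broad logical font such as Dialog is a sensible last entry.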

ANSI value of RTL character

I want to know the ANSI value of the character "\u202B", which makes the text right-to-left in a text file. I've used it in a UTF-8 file and it makes the text RTL, but when the text file is ANSI it shows "???", which means the character is not recognized. Does anyone know the equivalent code for this character in ANSI?
Windows-1256 is the "ANSI code page" if the system locale is set to Arabic.
That name is a misnomer, but it is what all MS documentation calls it; in the Windows world, "ANSI code page" should be read as "system code page".
Anyway, U+202B has no equivalent in windows-1256.
You can probably achieve what you need with:
U+200E LEFT-TO-RIGHT MARK, which is 0xFD in windows-1256
U+200F RIGHT-TO-LEFT MARK, which is 0xFE in windows-1256
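A small sketch that demonstrates this with java.nio.charset, assuming the JRE ships the windows-1256 charset (most full JREs do):

    import java.nio.charset.Charset;
    import java.util.Arrays;

    public class Rtl1256Check {
        public static void main(String[] args) {
            Charset cp1256 = Charset.forName("windows-1256");

            // U+202B RIGHT-TO-LEFT EMBEDDING has no windows-1256 mapping at all.
            System.out.println(cp1256.newEncoder().canEncode('\u202B')); // expected: false

            // U+200E / U+200F do have windows-1256 byte values: 0xFD and 0xFE.
            System.out.println(Arrays.toString("\u200E\u200F".getBytes(cp1256))); // expected: [-3, -2]
        }
    }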
There isn't one. ANSI is a pretty old standard by the American National Standards Institute. It doesn't support RTL languages like Arabic or Hebrew.
The Wikipedia Article "ANSI escape code" lists all the codes that it supports.
The workaround is to use a font which renders the glyphs (characters) you need, print them in the opposite order and use cursor movement commands to right align the text.
[EDIT] You're confusing a couple of things. First of all, ANSI in the terminal sense is a set of escape sequences to control your terminal.
ASCII, Windows-1256 and UTF-8 are character encodings (i.e. ways to represent text as sequences of octets or bytes).
Unicode is a catalogue of characters. It tries to contain every character you need to display text in any language. You can encode Unicode data using UTF-8, UTF-16, etc. to serialize it.
The special Unicode character RIGHT-TO-LEFT EMBEDDING (U+202B) has no representation in legacy single-byte encodings such as the Windows "ANSI" code pages.
You will have to write a program to parse the input and then you will have to output the text to the printer, sorting the characters in the correct order. There is no shortcut to do this.

WebView Malayalam Unicode complex/combined letters

I am having a problem with my Android application: it is not displaying some special letters, i.e. complex/combined letters (KOOTTAKSHARAM) from the Malayalam language.
In my application I use a WebView to load HTML prepared from the Unicode characters received from the server. The font 'Thoolika.ttf' is loaded from the assets folder.
I have also used ASCII text from the server with a .ttf font file, and that worked without problems. I tried UTF-8 conversion as well, but it didn't help.
So I would like to know: is it possible to display complex/combined letters (KOOTTAKSHARAM) from Malayalam using Unicode characters and a Unicode font file (.ttf)?
The split rendering of Koottaksharam and Chillu in Malayalam is not the real issue. The real issue is that only a few manufacturers support Malayalam Unicode fonts, and few of them render Malayalam correctly.
You can read Malayalam on Samsung devices, but NOT on HTC, LG, Sony, etc. Google added native support for Malayalam in Jelly Bean (4.1).
The only workaround is to convert the Unicode text into ASCII codes, use that ASCII text in your components, and load the font dynamically. You can see this at Manoramaonline.com; look at the HTML source: they are not using Unicode, they are using symbols and display those symbols with their own font, which ends up looking like Malayalam text.
Mathrubhumi.com has a mobile version of its website which uses the same technique, so you can read Malayalam perfectly even when there's no system support for it. I think they first type out the ASCII version (to publish for print and Android) and convert it into Unicode later (to publish on websites).
There are many ASCII-to-Unicode converters, like http://aksharangal.com/, and one well-known Unicode-to-ASCII converter is http://smc.org.in/silpa/ASCII2Unicode
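For the font-loading half of that workaround, the usual WebView trick is to reference the asset font from CSS with @font-face and load the page via loadDataWithBaseURL; whether conjuncts then render correctly still depends on the device's complex-script support, as described above. A rough sketch (Thoolika.ttf is the font name from the question; everything else is illustrative):

    import android.app.Activity;
    import android.os.Bundle;
    import android.webkit.WebView;

    public class MalayalamActivity extends Activity {
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            WebView webView = new WebView(this);
            setContentView(webView);

            // Declare the asset font via @font-face and use it for the whole page.
            String html = "<html><head><style>"
                    + "@font-face { font-family: 'Thoolika'; "
                    + "  src: url('file:///android_asset/Thoolika.ttf'); }"
                    + "body { font-family: 'Thoolika'; font-size: 20px; }"
                    + "</style></head><body>"
                    + "&#3374;&#3378;&#3375;&#3390;&#3379;&#3330;" // "Malayalam" as numeric character references
                    + "</body></html>";

            // The base URL lets relative asset references resolve as well.
            webView.loadDataWithBaseURL("file:///android_asset/", html,
                    "text/html", "utf-8", null);
        }
    }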

How can I detect unknown/unassigned Unicode characters in my Java program?

I want to write a Java program to print Unicode characters, and I want to detect, and not print, unknown/unassigned characters (which are shown as a rectangle). I have tried isDefined and isISOControl from the Character class, but that does not work.
Does anybody know the solution? It would be a big help for me.
Thanks.
The characters that are shown as a rectangle (on Windows) are ones that aren't available in the font you're using. While you could filter out a lot of them by filtering out undefined and control characters, it's entirely possible that the problem you're running into is that your font doesn't support certain ranges of valid characters (which is typical -- very few fonts define glyphs for all defined Unicode characters).
If your goal is really to remove characters that render as a rectangle, you can use the canDisplay method in Font.
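A minimal sketch of that filtering, assuming you already know which Font the output component will use:

    import java.awt.Font;

    public class DisplayableFilter {
        // Removes code points that are unassigned or that the given font has no glyph for.
        public static String keepDisplayable(Font font, String s) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);
                if (Character.isDefined(cp) && font.canDisplay(cp)) {
                    out.appendCodePoint(cp);
                }
                i += Character.charCount(cp);
            }
            return out.toString();
        }

        public static void main(String[] args) {
            Font font = new Font("Dialog", Font.PLAIN, 12);
            // U+FFFF is unassigned and gets dropped; the rest stays if the font can show it.
            System.out.println(keepDisplayable(font, "A\uFFFF\u4E2D"));
        }
    }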
