I have about 30 languages that my application needs to support. I have some fairly simple text that was provided for each of them, but within that text I do need to make one choice using {0, choice, 0# ...|0<...}
At present I have not even got as far as testing whether this works, because I am having a lot of trouble convincing my text editor to let me combine left-to-right and right-to-left text. But what I really want to know is whether this is even possible...
Question: Is it possible to use the embedded choice format in Java message properties with languages that flow from right to left?
If anyone can think of any additional tags to use for this question, I would be grateful.
The short answer is yes, it is possible. It is a thorny issue, but BIDI (bidirectional) support is an issue for the text editor, not for your code. So if your text editor supports it (and most editors do), then it is possible. First, make sure that you use an encoding (character set) that supports multiple languages: UTF-8 is recommended (UTF-16 and some others also work), as opposed to ISO-8859-X (where X is a single digit), each of which covers only a small fixed set of languages. You can also write your Strings, in a properties file or anywhere in the code, as Unicode escape sequences.
There is an open-source Java library, MgntUtils, that has a utility that converts Strings in any language (including special characters and emoji) to Unicode escape sequences and vice versa:
String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);
The output of this code is:
\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World
The library can be found on Maven Central or on GitHub. It comes as a Maven artifact with sources and Javadoc.
Here is the Javadoc for the class StringUnicodeEncoderDecoder.
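To come back to the original question: the choice logic in a MessageFormat pattern operates on the numeric argument, not on the script, so it works the same with RTL text. A minimal sketch (the Arabic words here are arbitrary sample vocabulary, not a vetted translation):

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceRtlDemo {
    // Hypothetical pattern: "no items" vs. "{count} item(s)", with Arabic text.
    // \u0644\u0627 \u0639\u0646\u0627\u0635\u0631 = "no items", \u0639\u0646\u0635\u0631 = "item"
    static final String PATTERN =
            "{0,choice,0#\u0644\u0627 \u0639\u0646\u0627\u0635\u0631|0<{0} \u0639\u0646\u0635\u0631}";

    static String format(int count) {
        MessageFormat mf = new MessageFormat(PATTERN, new Locale("ar"));
        return mf.format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(format(0)); // 0# branch: the "no items" text
        System.out.println(format(5)); // 0< branch: nested {0} is replaced with 5
    }
}
```

The nested {0} inside the chosen branch is re-formatted recursively by MessageFormat, which is what makes the count appear inside the RTL sentence.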
Related
I have the following task: some text in mixed Latin/Arabic, written in UTF-8, needs to be converted for printing on a POS printer, which uses the ancient one-byte code page 864.
text.getBytes("ibm-864") suddenly shows many question marks instead of the Arabic characters, and after digging through the code I understood that the conversion table maps different versions of the Arabic characters to ibm-864 (presentation forms somewhere in the FExx range, rather than the 06xx range that my text uses).
I'm looking for some code or library that can convert Arabic Unicode to cp864, preferably mapping to the corresponding forms of the Arabic characters (in cp864 there are isolated, initial, medial and final forms for some of them), and maybe even handling the reversal for RTL, because I doubt the hardware supports it automatically.
I understand that this is a very specific task, but why not give it a try? I also know how to implement this myself, but I'm trying to find a ready-made wheel rather than reinvent it :)
Anyone?
Another possible solution: a library that can translate Unicode Arabic in the range U+0600..U+06FF (Arabic) to the range U+FE70..U+FEFF (Arabic Presentation Forms-B). Then I can safely get my bytes in cp864. Has anyone seen anything like that?
To output Arabic text to a relatively dumb output device, you'll need to do several things:
Divide the text into blocks of different directionality using the Unicode Bidirectional Algorithm (UBA), better known as Bidi.
Mirror characters that need to be mirrored (e.g. opening parentheses point in different directions inside LTR and RTL blocks).
Since the output device is dumb, you'll need to change characters into their positional forms, and apply ligatures where needed (there is a ligature for LAM + ALEF). This is done by a piece of software called an Arabic Shaper.
You'll need to reorder the text according to its directionality.
Since CP864 doesn't have all the positional forms for all characters, you'll need to convert to fallback forms, converting some final forms to isolated forms, some medial forms to initial forms, and some initial forms to isolated forms. The text will not ligate as nicely as if there were proper forms, but it will come relatively close.
In Java, the ICU library allows you to do all of that:
ICU's Bidi can take care of dividing into blocks, mirroring, and reordering. Reordering can be done before shaping, since ICU's ArabicShaping supports working with text in both logical (pre-reordering) and visual (post-reordering) order.
ICU's ArabicShaping can take care of shaping the text, mapping it into the appropriate presentational forms (the FExx range you talked about, which is not meant to be used normally, it is only meant to be used to interface with legacy software/hardware, in this case the printer that understands CP864 but not Unicode).
ICU's CharsetProvider and CharsetEncoder can be used to convert to CP864 using a fallback (non-roundtrip) conversion for characters that are not on the output charset, in this case the final→isolated, medial→initial,... forms.
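The ICU classes above are the full pipeline. As a minimal illustration of step 1 alone, the JDK's built-in java.text.Bidi can already split mixed text into directional runs; shaping and the fallback charset conversion still require ICU. The Arabic word below is an arbitrary sample:

```java
import java.text.Bidi;

public class BidiRunsDemo {
    // Run the Unicode Bidirectional Algorithm; odd embedding level = RTL run.
    static Bidi analyze(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        // Latin, an Arabic sample word, Latin: three directional runs
        String mixed = "abc \u0645\u0631\u062D\u0628\u0627 def";
        Bidi bidi = analyze(mixed);
        System.out.println("mixed directionality: " + bidi.isMixed());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            int level = bidi.getRunLevel(i);
            System.out.printf("run %d [%d..%d) %s%n",
                    i, bidi.getRunStart(i), bidi.getRunLimit(i),
                    (level % 2 == 1) ? "RTL" : "LTR");
        }
    }
}
```

Each run can then be handed to ICU's ArabicShaping and the CP864 encoder as described above.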
I have a registration process in Java. I want to make sure the names used are all within Unicode 3.2. This Unicode requirement comes from another part of my system, which is not in Java.
Does Java have an easy way to validate a string against a Unicode version? I can't seem to find anything from some cursory checks.
Thanks
I would read this UCD file and build a BitSet from the first column. That would be a fast way to test each code point in a String.
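A sketch of that approach, assuming the UCD file in question is DerivedAge.txt (whose lines look like `0600..0603 ; 1.1` or `0220 ; 3.2`). The parsing is shown against a few inline sample lines rather than the real file; note the simple double comparison is fine up to Unicode 3.2 but would misorder a hypothetical version like "3.10":

```java
import java.util.BitSet;

public class UnicodeAgeFilter {
    /** Parse DerivedAge-style lines, setting bits for code points whose
     *  version is <= maxVersion (e.g. 3.2). */
    static BitSet allowedUpTo(String[] lines, double maxVersion) {
        BitSet allowed = new BitSet(0x110000);
        for (String line : lines) {
            int hash = line.indexOf('#');               // strip trailing comments
            if (hash >= 0) line = line.substring(0, hash);
            String[] parts = line.split(";");
            if (parts.length < 2) continue;             // skip blank/malformed lines
            String range = parts[0].trim();
            double version = Double.parseDouble(parts[1].trim());
            if (version > maxVersion) continue;
            int dots = range.indexOf("..");
            int lo = Integer.parseInt(dots >= 0 ? range.substring(0, dots) : range, 16);
            int hi = dots >= 0 ? Integer.parseInt(range.substring(dots + 2), 16) : lo;
            allowed.set(lo, hi + 1);                    // toIndex is exclusive
        }
        return allowed;
    }

    /** True if every code point in s was assigned by maxVersion. */
    static boolean isValid(String s, BitSet allowed) {
        return s.codePoints().allMatch(allowed::get);
    }

    public static void main(String[] args) {
        String[] sample = {
            "0041..005A    ; 1.1 #  [26] LATIN CAPITAL LETTER A..Z",
            "0220          ; 3.2 #       LATIN CAPITAL LETTER N WITH LONG RIGHT LEG",
            "0221          ; 4.0 #       LATIN SMALL LETTER D WITH CURL",
        };
        BitSet allowed = allowedUpTo(sample, 3.2);
        System.out.println(isValid("AZ\u0220", allowed)); // all assigned by 3.2
        System.out.println(isValid("\u0221", allowed));   // introduced in 4.0
    }
}
```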
I am new to Java and I'm not quite sure how to output an integer raised to a power as a string. I know that
Math.pow(double, double)
will compute the value of raising a double to a power. But if I wanted to output "2^6" (with the 6 as a superscript rather than with the caret), how do I do that?
EDIT: This is for an Android app. I'm passing in the integer raised to the power as a string and I would like to know how to convert this to superscript in the UI for the phone.
Unicode does have superscript versions of the digits 0 to 9: http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
This should print 2⁶:
System.out.println("2⁶");
System.out.println("2\u2076");
If you're outputting the text to the GUI then you can use HTML formatting and the <sup> tag to get a superscript. Otherwise, you'll have to use Unicode characters to get the other superscripts. Wikipedia has a nice article on superscripts and subscripts in Unicode:
http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
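For arbitrary exponents as plain text (when HTML formatting is not available), a small helper can map each digit to its Unicode superscript form. Note that the superscripts 1, 2 and 3 live in the Latin-1 block while the other digits are in U+2070..U+2079:

```java
public class SuperscriptDemo {
    // Superscript forms of the digits 0..9
    private static final char[] SUP = {
        '\u2070', '\u00B9', '\u00B2', '\u00B3', '\u2074',
        '\u2075', '\u2076', '\u2077', '\u2078', '\u2079'
    };

    /** Convert an integer to its superscript-digit representation. */
    static String superscript(int n) {
        StringBuilder sb = new StringBuilder();
        for (char c : Integer.toString(n).toCharArray()) {
            sb.append(c == '-' ? '\u207B' : SUP[c - '0']); // U+207B = superscript minus
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("2" + superscript(6));   // 2⁶
        System.out.println("10" + superscript(-3)); // 10⁻³
    }
}
```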
This answer applies only if you are using Eclipse (a Java IDE). Out of the box, Eclipse only supports certain Unicode symbols: superscript 1, 2 and 3 all work, but for anything else you have to adjust Eclipse's settings, which isn't too hard.
There's an encoding called Windows-1252 which seems to be the default for a lot of things, including the default encoding for files in Eclipse. So whenever you're printing to the console, that's what Eclipse tries to interpret the output as, and it doesn't support the full Unicode character set since it's a one-byte encoding. This isn't actually a problem with Java, then; it's a problem with Eclipse, which means you need to configure Eclipse to use UTF-8 instead.
You can do this in multiple places. If you just want the Unicode characters displayed when running this one file, right-click the file, then Properties -> Resource -> Text file encoding and change it from the default to Other: UTF-8. To do this for your whole workspace, go to Window -> Preferences -> General -> Workspace -> Text file encoding. You could also do this project-wide or even package-wide, following similar steps, depending on what you're going for.
I'm trying to display Arabic text in Java, but it shows junk characters (for example: ¤[ï߯[î) or sometimes only question marks when I print. How do I make it print Arabic? I've heard this is related to Unicode and UTF-8. This is the first time I'm working with non-Latin languages, so I have no idea. I'm using the Eclipse Indigo IDE.
EDIT:
If I use UTF-8 encoding, then the "¤[ï߯[î" characters become "????????".
For starters you could take a look here. This should let you make Eclipse print Unicode in its console (I do not know whether Eclipse supports that out of the box without extra tweaks).
If that does not solve your problem, you most likely have an issue with the encoding your program is using, so you might want to create strings in some manner similar to this:
String str = new String("تعطي يونيكود رقما فريدا لكل حرف".getBytes(), "UTF-8");
This at least works for me.
If you embed the text literally in the code, make sure you set the encoding for your project correctly.
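The underlying rule in all of these fixes is: never let the platform default charset pick the encoding for you. A small sketch of decoding bytes with an explicit UTF-8 charset (the Arabic string is an arbitrary sample, and the Latin-1 decode is shown only to demonstrate the mojibake):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    public static void main(String[] args) throws IOException {
        String arabic = "\u062A\u0639\u0637\u064A \u064A\u0648\u0646\u064A\u0643\u0648\u062F";
        byte[] utf8 = arabic.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips cleanly...
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine().equals(arabic)); // true
        }

        // ...while decoding the same bytes as Latin-1 yields junk like the
        // "¤[ï߯[î" in the question
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(arabic)); // false
    }
}
```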
Is this for Java SE, Java EE, or Java ME?
If it is Java ME, you have to write a custom GlyphUtils if you use LWUIT.
Download this file:
http://dl.dropbox.com/u/55295133/U0600.pdf
and look at the list of Unicode code points for Arabic in it.
And look at this thread:
https://stackoverflow.com/a/9172732/1061371
In the answer by Mohamed Nazar, as edited by Alex Kliuchnikau:
"The below code can be used for displaying Arabic text in J2ME: String s=new String("\u0628\u06A9".getBytes(), "UTF-8"); where \u0628\u06A9 is the Unicode for two Arabic letters"
Looking at the U0600.pdf file, we can see that Mohamed Nazar and Alex Kliuchnikau give an example that creates the Arabic letters "ba" and "kaf".
The last point you must consider is: make sure your UI supports Unicode (that is, Arabic) characters.
LWUIT, for example, does not yet support Arabic characters, so if your app uses LWUIT you will need to write custom code for it.
I am automating test cases for a web application using Selenium 2.0 and Java. My application supports multiple languages. Some of the test cases require me to validate the text that appears in the UI, like success/error messages etc.
I am using a properties file to store whatever text I am referencing in my tests from the UI, currently only English. For example, there is locale_english.properties (see below) that contains all references in English. I am going to have multiple properties files like this for different locales, like locale_chinese.properties, locale_french.properties and so on. For locales other than English, the corresponding properties file would contain Unicode escape sequences (e.g. \u30ed) representing the native characters (see below). So if I want to test, say, the Chinese UI, I would load "locale_chinese.properties" instead of "locale_english.properties". I am going to convert the native characters for non-English locales using perhaps native2ascii from the JDK or some other way. I have verified that the Selenium API works well with UTF-8 characters for non-English locales.
---locale_english.properties------
user.login.error= Please verify username/password
---locale_chinese.properties------
user.login.error= \u30ed\u30ef\u30ea\u30eb\u30ed
and so on.
The problem is that my locale_english.properties is growing and going out of control. It is becoming hard to manage a single properties file for one locale let alone for multiple locales. Is there a better way of handling localization in Java, particularly in situations like I am in?
Thanks!
You're right that there is a problem managing the files, but you're also right that this is the best approach. Some things are just hard :-(
Selenium (at least the Selenium RC API) does indeed support Unicode input and output; we have lots of tests that enter and confirm Cyrillic and Simplified Chinese characters from C#. Since Java strings are Unicode at the core (just like C#), I expect you could simply create the files in a UTF-8-friendly editor like Notepad++, read the values straight into strings and use them directly in the Selenium API.
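One standard way to organize exactly these per-locale files is java.util.ResourceBundle, which selects the right properties file by locale. A self-contained sketch using PropertyResourceBundle over in-memory strings (the key and messages mirror the ones above; the Chinese text is a rough stand-in, not a vetted translation):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleDemo {
    // Stand-ins for locale_english.properties / locale_chinese.properties
    static ResourceBundle load(String locale) {
        String props = locale.equals("en")
                ? "user.login.error=Please verify username/password\n"
                : "user.login.error=\u8BF7\u6838\u5B9E\u7528\u6237\u540D/\u5BC6\u7801\n";
        try {
            return new PropertyResourceBundle(new StringReader(props));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("en").getString("user.login.error"));
        System.out.println(load("zh").getString("user.login.error"));
    }
}
```

In a real project you would instead put locale_en.properties, locale_zh.properties etc. on the classpath and call ResourceBundle.getBundle("locale", desiredLocale), which also gives you fallback to a default locale for free.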
This is how I solved the issue for those who are interested.
A database would work better for many reasons: it handles growth, provides a central location, and is kept outside the app where it can be edited and maintained independently. We used a table with these columns:
id (int, auto increment)
id_text (this and the remaining columns are varchar, except the last two, which are datetime)
lang
translation
created_by
updated_by
created_date
updated_date
The id_text is a short English description of the text, like "hello" or "error1msg": the key in your map.
In Java we had a function to get the text for a particular id, plus an app-level property for the default language (usually "en", but it is good to keep it configurable).
The function would scan the already-loaded HashMap for the requested language, say "ch".
If no translation was found for that language, we would return the default-language translation, and if that was not found either, we would return "[" + id + "]" so the tester knows something is missing in the database and can go to a web screen to edit the translation table and add it.
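The lookup-with-fallback logic above can be sketched as a plain in-memory map (the language codes and ids are the hypothetical ones from this answer; a real version would populate the map from the translation table):

```java
import java.util.HashMap;
import java.util.Map;

public class TranslationLookup {
    private final Map<String, Map<String, String>> byLang = new HashMap<>();
    private final String defaultLang;

    TranslationLookup(String defaultLang) {
        this.defaultLang = defaultLang;
    }

    void put(String lang, String id, String text) {
        byLang.computeIfAbsent(lang, k -> new HashMap<>()).put(id, text);
    }

    /** Requested language first, then the default language, then "[id]"
     *  as a visible marker that the translation is missing. */
    String get(String lang, String id) {
        String text = byLang.getOrDefault(lang, Map.of()).get(id);
        if (text == null) text = byLang.getOrDefault(defaultLang, Map.of()).get(id);
        return text != null ? text : "[" + id + "]";
    }

    public static void main(String[] args) {
        TranslationLookup t = new TranslationLookup("en");
        t.put("en", "hello", "Hello");
        t.put("ch", "hello", "\u4F60\u597D");
        System.out.println(t.get("ch", "hello"));     // found in "ch"
        System.out.println(t.get("ch", "error1msg")); // missing everywhere -> "[error1msg]"
    }
}
```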