jUnit testing Double.toString in multiple cultures

I have an open source library which has plenty of unit tests that compare string forms of numbers.
These tests pass fine in en-GB, en-US and other cultures where numbers are generally written in the form 1,234.00.
However in cultures such as Germany and France, these values are formatted differently, and the tests fail.
How can the jUnit tests be forced to run as en-GB?
EDIT: This kind of thing is available in NUnit.

I'm not sure it's standard for all JVMs, but using Oracle's JVM on Windows, you can use the user.language and user.country System properties to set the locale when starting the JVM:
java -Duser.language=en -Duser.country=GB ...
You can also, of course, set the default locale in Java, using
Locale.setDefault(new Locale("en", "GB"));
Note that Double.toString is locale-independent, though.
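To illustrate that last point, here is a minimal sketch (the class name LocaleDemo and the sample value are made up for illustration). It contrasts Double.toString, which ignores the default locale, with String.format, which applies the separators of the locale it is given:

```java
import java.util.Locale;

class LocaleDemo {
    // Double.toString always uses '.' as the decimal separator and no grouping.
    static String plain(double value) {
        return Double.toString(value);
    }

    // String.format with an explicit locale applies that locale's separators.
    static String formatted(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.GERMANY); // does not affect Double.toString
        System.out.println(plain(1234.5));                     // 1234.5
        System.out.println(formatted(Locale.UK, 1234.5));      // 1,234.50
        System.out.println(formatted(Locale.GERMANY, 1234.5)); // 1.234,50
    }
}
```

So if the failing tests really go through Double.toString, the locale is probably not the culprit; it is more likely that a locale-sensitive formatter is involved somewhere.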

How do you launch jUnit?
Passing the appropriate language property depends more on your environment than on jUnit itself.
Alternatively (and I think it's a better solution), you could compare values rather than strings:
assertEquals(12.3, Double.parseDouble(aDoubleString), 0.0);
or
assertEquals(Double.toString(12.3), aDoubleString);
rather than
assertEquals("12.3", aDoubleString);
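If the string under test was produced by a locale-sensitive formatter, comparing values means parsing it with the same locale first. A minimal sketch, assuming the string came from NumberFormat (the class name ParseDemo and the sample inputs are hypothetical):

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class ParseDemo {
    // Parses a formatted number using the separators of the given locale.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("12,3", Locale.GERMANY)); // 12.3
        System.out.println(parse("12.3", Locale.UK));      // 12.3
    }
}
```

A test can then assert on the parsed double (with a delta) and stay independent of how the number happens to be written.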

There are two gists with JUnit 4 rules that modify the default locale for a couple of tests:
LocaleRule, a very simple implementation.
DefaultLocaleRule, which has some static helper methods and allows switching the default locale for an individual test.
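The gists themselves are not reproduced here, but the idea behind such a rule can be sketched without any JUnit dependency: remember the current defaults, switch, run the test body, and always restore. The class and method names below are hypothetical and are not taken from either gist:

```java
import java.util.Locale;

class DefaultLocaleSwitcher {
    // Runs the action with a temporary default locale, then restores the old defaults.
    static void runWith(Locale temporary, Runnable action) {
        Locale previous = Locale.getDefault();
        Locale previousDisplay = Locale.getDefault(Locale.Category.DISPLAY);
        Locale previousFormat = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(temporary); // also sets both categories
        try {
            action.run();
        } finally {
            // Restore even if the action throws.
            Locale.setDefault(previous);
            Locale.setDefault(Locale.Category.DISPLAY, previousDisplay);
            Locale.setDefault(Locale.Category.FORMAT, previousFormat);
        }
    }

    public static void main(String[] args) {
        runWith(Locale.FRANCE, () ->
                System.out.println(String.format("%.2f", 12.3))); // 12,30
        System.out.println(Locale.getDefault()); // back to the original default
    }
}
```

In JUnit 4 the same save/restore pair could live in the before() and after() methods of an ExternalResource subclass, which is roughly what a locale rule boils down to.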

Related

Changing DecimalFormatSymbols symbols for language in java sdk

I'm using 3rd-party software built in Java that displays numbers. This software is multilingual, and one of the languages we use is Euskera (eu, eu_ES). The number format is wrong in this language (123,456.89 instead of 123.456,89).
Searching more in depth and decompiling some classes, I've seen that number formatting is done with DecimalFormatSymbols and DecimalFormat, so I wrote a JUnit test to see whether the issue comes from this 3rd-party software or from Java.
Locale locale = new Locale("eu");
DecimalFormatSymbols decimalFormatSymbols = new DecimalFormatSymbols(locale);
String pattern = "#,###.##";
DecimalFormat decimalFormat = new DecimalFormat(pattern, decimalFormatSymbols);
String formatted = decimalFormat.format(1234567.89765);
assertEquals("1.234.567,9", formatted);
After running this test, I can see that it is Java itself that formats the number this way.
On the one hand, I've downloaded the latest version of this 3rd-party software, because it is open source, and I could make a small workaround that worked. On the other hand, we use a version from 6 years ago that can't be upgraded because of OS/system requirements, and that version is in SourceForge's CVS, which I was unable to download.
Is there any way I can change the grouping separator and decimal separator for Euskera at the Java level?

Yes, you can but it's a bit of a palaver. Essentially, you can create a custom NumberFormatProvider that does something different for eu_ES and delegates to the original provider for all other locales. You'll have to put it in a JAR with a META-INF/services/xxxx file and include it on the classpath.
See this question: Java override locale setting for specific locale
And more instructions here:
LocaleServiceProvider JavaDoc
Tutorial on the Java Extension Mechanism
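As one possible shape for such a provider, and assuming it is the DecimalFormatSymbols lookup that needs overriding, the sketch below extends java.text.spi.DecimalFormatSymbolsProvider rather than the NumberFormatProvider mentioned above. The class name BasqueSymbolsProvider is hypothetical, and whether and how the JVM picks up a custom provider depends on the Java version, so treat this as a starting point rather than a verified fix:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.spi.DecimalFormatSymbolsProvider;
import java.util.Locale;

// Hypothetical provider; a deployed provider would need to be a public class.
class BasqueSymbolsProvider extends DecimalFormatSymbolsProvider {
    @Override
    public Locale[] getAvailableLocales() {
        // Only these locales are handled here; others stay with the built-in providers.
        return new Locale[] { Locale.forLanguageTag("eu"), Locale.forLanguageTag("eu-ES") };
    }

    @Override
    public DecimalFormatSymbols getInstance(Locale locale) {
        // Start from the root symbols so this lookup never comes back to this provider.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator('.');
        symbols.setDecimalSeparator(',');
        return symbols;
    }

    public static void main(String[] args) {
        DecimalFormatSymbols symbols =
                new BasqueSymbolsProvider().getInstance(Locale.forLanguageTag("eu-ES"));
        DecimalFormat format = new DecimalFormat("#,###.##", symbols);
        System.out.println(format.format(1234567.89765)); // 1.234.567,9
    }
}
```

The main method calls the provider directly, which only checks the symbols it returns; it does not prove that the JVM will consult it once installed.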

How do I automatically format all numbers based on the current locale in Android?

I am aware of
NumberFormat nf = NumberFormat.getInstance(Locale.getDefault());
But I want all the numbers shown in my app to be formatted according to the locale, thus I don't think it will be a good way to format them one by one using the above method.
So is there some global setting/variable/configuration that I have to change in order to do that?

Locale-aware formatting requires more than just translating e.g. month names from one language to another. In Java that's handled by separate classes apart from the ones that actually hold the values, e.g. NumberFormat, DateFormat. So there's no way around using them like you already do.
What you could try is to create some wrappers or convenience methods (like formatDate(Date)) to simplify things for you. Also put format strings into Android Resources (res/values).
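Such a convenience wrapper might look like the following minimal sketch. The class name Formats and its method names are made up for illustration; only NumberFormat and Locale come from the platform:

```java
import java.text.NumberFormat;
import java.util.Locale;

final class Formats {
    private Formats() {
        // static helpers only
    }

    // Formats with the current default locale, looked up on every call.
    static String number(double value) {
        return number(value, Locale.getDefault());
    }

    // Overload with an explicit locale, handy for tests.
    static String number(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(number(1234.5, Locale.US));      // 1,234.5
        System.out.println(number(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```

Every place in the app then calls Formats.number(value) instead of building a NumberFormat by hand, so the locale handling lives in one spot.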

Which "default Locale" is which?

With UNIX locales, the breakdown of which means what is relatively well documented.
LC_COLLATE (string collation)
LC_CTYPE (character conversion)
LC_MESSAGES (messages shown in UI)
LC_MONETARY (formatting of monetary values)
LC_NUMERIC (formatting of non-monetary numeric values)
LC_TIME (formatting of date and time values)
LANG (fallback if any of the above are not set)
Java has a different categorisation which doesn't quite match the real world (as usual):
Locale.getDefault()
Locale.getDefault(Locale.Category.DISPLAY)
Locale.getDefault(Locale.Category.FORMAT)
If you read the documentation on these, Locale.getDefault(Locale.Category.DISPLAY) appears to correspond to LC_MESSAGES while Locale.getDefault(Locale.Category.FORMAT) appears to correspond to some combination of LC_MONETARY+LC_NUMERIC+LC_TIME.
There are problems, though.
If you read the JDK source, you start to find many worrying things. For instance, ResourceBundle.getBundle(String) - which is entirely about string messages - uses Locale.getDefault(), not Locale.getDefault(Locale.Category.DISPLAY).
So I guess what I want to know is:
Which of these methods is supposed to be used for which purpose?
Related, but I made a little test program to see which Java locales corresponded to which UNIX locales and got even more surprising results.
import java.util.Locale;
public class Test {
public static void main(String[] args) {
System.out.println(" Unqualified: " + Locale.getDefault());
System.out.println(" Display: " + Locale.getDefault(Locale.Category.DISPLAY));
System.out.println(" Format: " + Locale.getDefault(Locale.Category.FORMAT));
}
}
Locales according to my shell:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Output of the program:
$ java Test
Unqualified: en_AU
Display: en_AU
Format: en_AU
So it turns out Java doesn't even get it from the UNIX locale. It must be using some other back door to get the settings without using those.

It's hard to understand what you are asking here. Instead, you make a statement that reveals that you're not necessarily a Java programmer. It's OK, it does not matter really.
Few things to clarify:
The Locale class has been in the JDK since Java 1.1
Things like Locale.Builder, Locale.Category and many others have been available since Java 7 (JDK 1.7)
Locale-aware classes and methods like DateFormat, NumberFormat, Collator, ResourceBundle, String.toLowerCase(Locale), String.toUpperCase(Locale) and many, many more have each been around for quite a long time (long before JDK 1.7)
Prior to Java 7/JDK 1.7 there was only one way of acquiring the current OS locale: calling Locale.getDefault() (that is, without parameters)
In other words, prior to Java 7, Java's Locale Model was as simple as one system property composed of a language, a country and an optional locale variant. That changed with Java 7 (and was further extended with Java 8...), and now you have two system properties, one for formatting and one for displaying user interface messages.
The problem is that there is a substantial amount of legacy code written in Java, and this code shouldn't break when you upgrade the platform. And that is exactly why you still have the parameterless Locale.getDefault() around. Moreover (you may test it yourself), Locale.getDefault() is basically interchangeable with Locale.getDefault(Locale.Category.DISPLAY).
Now, I said formatting and user interface messages. Basically, formatting is not only formatting, but things like character case conversion (LC_CTYPE), collation (LC_COLLATE) as well. Sort of anything but user interface messages. Sort of, because default character encoding (which depends on an OS, BTW) is not part of Locale. Instead you need to call Charset.defaultCharset().
And the fallback rules (built in Java, not read from OS) could be worked out with ResourceBundle.Control class. And as we know, it is rather related to UI category...
The reason why Java's Locale Model is different from POSIX (not UNIX, it's more universal) is the simple fact that there are quite a few platforms out there. And these platforms don't necessarily use POSIX... I mean not only Operating Systems, but things like the web... Java is striving to be universal and versatile. As a result, Java's Locale Model is convoluted, tough luck.
I have to add that nowadays, it's not only the language and the country, but there are also things like preferred script, calendar system, numbering system, specific collation settings and possibly more. It even works sometimes.
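The split between the two categories can be observed directly. A minimal sketch (the class name CategoryDemo is hypothetical) that overrides only the FORMAT default and leaves the unqualified and DISPLAY defaults alone:

```java
import java.text.NumberFormat;
import java.util.Locale;

class CategoryDemo {
    // NumberFormat.getInstance() consults the FORMAT-category default.
    static String formatWithDefault(double value) {
        return NumberFormat.getInstance().format(value);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.UK); // sets the unqualified default and both categories
        Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY); // override FORMAT only

        System.out.println(Locale.getDefault());                        // en_GB
        System.out.println(Locale.getDefault(Locale.Category.DISPLAY)); // en_GB
        System.out.println(Locale.getDefault(Locale.Category.FORMAT));  // de_DE
        System.out.println(formatWithDefault(1234.5));                  // 1.234,5
    }
}
```

Note that Locale.setDefault(Locale) resets both categories, while Locale.setDefault(Category, Locale) touches only the one named.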

How do I add the new currency code to Java?

The Chinese currency has the ISO 4217 code CNY. Since free global trading in that currency is restricted, though, there's a second 'offshore' currency equivalent, called CNH. Wikipedia has a bit of a summary of all this.
CNH isn't in ISO 4217, but I'd like to be able to use it in my app without having to write my own Currency class. Presumably there's some kind of list somewhere inside the JVM install. How do I go about adding additional currency codes?
EDIT: See this question for dealing with this in Java 7

Looks like support for this was added with Java 7.
For earlier versions, you could use an equivalent Currency class of your own devising, or less happily, replace the default java.util.Currency class (or java.util.CurrencyData, which contains the raw data) in your classpath (whitepaper).

en_US or en-US, which one should you use? [duplicate]

This question already has answers here:
What is the difference between creating a locale for en-US and en_US?
(4 answers)
Closed 9 years ago.
Assume you want to store the user's preferred locale in a database. Which value would you use?
en_US or en-US
They are two standards, but which one do you prefer to use in your own application?
Updated: It seems many web sites use a dash instead of an underscore, e.g.
http://zh.wikipedia.org/zh-tw
http://www.google.com.hk/search?hl=zh-TW

I'm pretty sure "-" is the standard. If you see "_" somewhere it's probably something some people came up with to make it a valid identifier.
Personally I'd go with "-", just to be correct.
http://en.wikipedia.org/wiki/IETF_language_tag
https://datatracker.ietf.org/doc/html/rfc5646

If you're working with Java, you might as well use the Java locale format (en_US).
The BCP 47 documents actually do specify the en-US format, and it's just as common if not more common than Java-style locale names. But in practice you'll see the form with the underbar quite a bit. For example, both Java and most POSIX-type platforms use the underbar for their language/region separator.
So you can't go far wrong with either choice. But given that you're writing in Java and probably targeting a Unix platform, en_US is probably the way to go.

In Java 7, there is a new method Locale.forLanguageTag(String), which assumes the hyphen as a separator. I'd consider that as normative.
Check the documentation of Locale for more information.
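For storage purposes the two spellings are easy to convert between in Java 7 and later. A minimal sketch (the class name LocaleTags is made up for illustration) that reads a hyphenated tag and writes one back:

```java
import java.util.Locale;

class LocaleTags {
    // Builds a Locale from a BCP 47 tag such as "en-US".
    static Locale fromTag(String tag) {
        return Locale.forLanguageTag(tag);
    }

    // Produces the hyphenated BCP 47 form of a Locale.
    static String toTag(Locale locale) {
        return locale.toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(fromTag("en-US"));              // en_US (Locale.toString uses '_')
        System.out.println(toTag(Locale.US));              // en-US
        System.out.println(fromTag("zh-TW").getCountry()); // TW
    }
}
```

Storing the hyphenated tag and converting at the boundary keeps the database value usable outside Java as well.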

en_US. This is a very useful read.

I don't think en-US is a standard at all for Java. (If you see it somewhere, could you add a link?)
So just use en_US.
