Which "default Locale" is which? - java

With UNIX locales, the breakdown of which means what is relatively well documented.
LC_COLLATE (string collation)
LC_CTYPE (character conversion)
LC_MESSAGES (messages shown in UI)
LC_MONETARY (formatting of monetary values)
LC_NUMERIC (formatting of non-monetary numeric values)
LC_TIME (formatting of date and time values)
LANG (fallback if any of the above are not set)
Java has a different categorisation which doesn't quite match the real world (as usual):
Locale.getDefault()
Locale.getDefault(Locale.Category.DISPLAY)
Locale.getDefault(Locale.Category.FORMAT)
If you read the documentation on these, Locale.getDefault(Locale.Category.DISPLAY) appears to correspond to LC_MESSAGES while Locale.getDefault(Locale.Category.FORMAT) appears to correspond to some combination of LC_MONETARY+LC_NUMERIC+LC_TIME.
There are problems, though.
If you read the JDK source, you start to find many worrying things. For instance, ResourceBundle.getBundle(String) - which is entirely about string messages - uses Locale.getDefault(), not Locale.getDefault(Locale.Category.DISPLAY).
So I guess what I want to know is:
Which of these methods is supposed to be used for which purpose?
Related, but I made a little test program to see which Java locales corresponded to which UNIX locales and got even more surprising results.
import java.util.Locale;
public class Test {
    public static void main(String[] args) {
        System.out.println(" Unqualified: " + Locale.getDefault());
        System.out.println(" Display: " + Locale.getDefault(Locale.Category.DISPLAY));
        System.out.println(" Format: " + Locale.getDefault(Locale.Category.FORMAT));
    }
}
Locales according to my shell:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Output of the program:
$ java Test
Unqualified: en_AU
Display: en_AU
Format: en_AU
So it turns out Java doesn't even get it from the UNIX locale. It must be using some other back door to get the settings without using those.

It's hard to understand what you are asking here. Instead, you make a statement that suggests you're not necessarily a Java programmer. That's OK, it doesn't really matter.
Few things to clarify:
The Locale class has been in the JDK since Java 1.1
Things like Locale.Builder, Locale.Category and many others have been available since Java 7 (JDK 1.7)
Locale-aware classes and methods like DateFormat, NumberFormat, Collator, ResourceBundle, String.toLowerCase(Locale), String.toUpperCase(Locale) and many, many more have each been around for a long time (long before JDK 1.7)
Prior to Java 7/JDK 1.7 there was only one way to obtain the current OS Locale: call Locale.getDefault() (that is, without parameters)
In other words, prior to Java 7, Java's Locale Model was as simple as one system property composed of a language, a country and an optional locale variant. That changed with Java 7 (and was further extended with Java 8...), and now you have two system properties, one for formatting and one for displaying user interface messages.
The problem is, there is a substantial amount of legacy code written in Java, and this code shouldn't break when you upgrade the platform. That is exactly why you still have the parameterless Locale.getDefault() around. Moreover (you may test it yourself), in a default setup Locale.getDefault() returns the same value as Locale.getDefault(Locale.Category.DISPLAY), so the two are basically interchangeable.
Now, I said formatting and user interface messages. Basically, formatting is not only formatting, but things like character case conversion (LC_CTYPE), collation (LC_COLLATE) as well. Sort of anything but user interface messages. Sort of, because default character encoding (which depends on an OS, BTW) is not part of Locale. Instead you need to call Charset.defaultCharset().
And the fallback rules (built in Java, not read from OS) could be worked out with ResourceBundle.Control class. And as we know, it is rather related to UI category...
The reason why Java's Locale Model is different from POSIX (not UNIX, it's more universal) is the simple fact that there are quite a few platforms out there, and these platforms don't necessarily use POSIX. I mean not only operating systems, but things like the web. Java is striving to be universal and versatile. As a result, Java's Locale Model is convoluted. Tough luck.
I have to add that nowadays, it's not only the language and the country, but there are also things like preferred script, calendar system, numbering system, specific collation settings and possibly more. It even works sometimes.
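To make the split between the two categories concrete, here is a minimal sketch. It assumes a JDK with German locale data available; the class and method names are made up for illustration. It changes only the FORMAT default and shows that number formatting follows it, while display names and the parameterless Locale.getDefault() do not.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleCategoryDemo {

    // NumberFormat.getInstance() picks up the FORMAT-category default.
    static String formatNumber(double value) {
        return NumberFormat.getInstance().format(value);
    }

    // Locale.getDisplayCountry() picks up the DISPLAY-category default.
    static String displayCountry(Locale locale) {
        return locale.getDisplayCountry();
    }

    public static void main(String[] args) {
        // Sets the general default and both categories to en_US.
        Locale.setDefault(Locale.US);
        // Overrides only the FORMAT category.
        Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);

        System.out.println(formatNumber(1234.5));          // German separators: 1.234,5
        System.out.println(displayCountry(Locale.FRANCE)); // still English: France
        System.out.println(Locale.getDefault());           // unchanged: en_US
    }
}
```

Note that Locale.setDefault(Locale) resets both categories, whereas Locale.setDefault(Locale.Category, Locale) leaves the parameterless default alone.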

Automatic date format recognition in java

I want to parse some dates in Java, but the format is not defined and could be a lot of them (any ISO-8601 format which is already a lot, Unix timestamp in any unit, and more)
Here are some samples :
1970-01-01T00:00:00.00Z
1234567890
1234567890000
1234567890000000
2021-09-20T17:27:00.000Z+02:00
Perfect parsing might be impossible because of ambiguous cases, but a solution that parses most of the common dates with some logic might be achievable (for example, timestamps are considered in seconds / milli / micro / nano in order to give a date close to the 2000 era, and dates like '08/07/2021' could have a default for month and day distinction).
I didn't find any easy way to do it in Java, while in Python it is kind of possible (not working on all my samples, but at least some of them) using infer_datetime_format of the pandas function to_datetime (https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).
Is there an easy approach in Java?
Well, first of all, I agree with rzwitserloot here that date parsing in free format is extremely difficult and full of ambiguities. So you are skating on thin ice and will eventually run into trouble if you just assume that a user input will be correctly parsed the way you think it will.
Nevertheless, we could make it work if I assume either of the following:
You simply don't care if it will be parsed incorrectly; or
You are doing this for fun or for learning purposes; or
You have a banner, saying:
If the parsing goes wrong, it's your fault. Don't blame us.
Anyway, the DateTimeFormatterBuilder is able to build a DateTimeFormatter which could be able to parse a lot of different patterns. Since a formatter supports optional parsing, it could be instructed to try to parse a certain value, or skip that part if no valid value could be found.
For instance, this builder is able to parse a fairly wide range of ISO-like dates, with many optional parts:
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-M-d")
    .optionalStart() // everything after the date is optional
        .optionalStart().appendLiteral(' ').optionalEnd()
        .optionalStart().appendLiteral('T').optionalEnd()
        .appendValue(ChronoField.HOUR_OF_DAY)
        .optionalStart() // optional minutes
            .appendLiteral(':')
            .appendValue(ChronoField.MINUTE_OF_HOUR)
            .optionalStart() // optional seconds
                .appendLiteral(':')
                .appendValue(ChronoField.SECOND_OF_MINUTE)
                .optionalStart() // optional fraction of a second
                    .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
                .optionalEnd()
            .optionalEnd()
        .optionalEnd()
        .appendPattern("[XXXXX][XXXX][XXX][XX][X]") // optional offset in any ISO form
    .optionalEnd();
DateTimeFormatter formatter = builder.toFormatter(Locale.ROOT);
All of the strings below can be successfully parsed by this formatter.
Stream.of(
    "2021-09-28",
    "2021-07-04T14",
    "2021-07-04T14:06",
    "2001-09-11 00:00:15",
    "1970-01-01T00:00:15.446-08:00",
    "2021-07-04T14:06:15.2017323Z",
    "2021-09-20T17:27:00.000+02:00"
).forEach(testcase -> System.out.println(formatter.parse(testcase)));
As you can see, with optionalStart() and optionalEnd() you can define optional portions of the format.
There are many more patterns you probably want to parse. You could add those patterns to the builder above. Alternatively, the appendOptional(DateTimeFormatter) method can be used to combine multiple formatters.
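As a rough sketch of the appendOptional route, the formatter below accepts either an ISO date or a day-first slashed date. The choice of the second pattern (dd/MM/uuuu) and the class name are assumptions made for illustration only; day-first is exactly the kind of decision you have to make yourself.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class OptionalFormats {

    // Each appendOptional section is tried in order; a section that does not
    // match is skipped and parsing continues with the next one.
    static final DateTimeFormatter EITHER = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ISO_LOCAL_DATE)
            .appendOptional(DateTimeFormatter.ofPattern("dd/MM/uuuu")) // assumed day-first
            .toFormatter(Locale.ROOT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, EITHER);
    }

    public static void main(String[] args) {
        System.out.println(parse("2021-09-28")); // 2021-09-28
        System.out.println(parse("28/09/2021")); // 2021-09-28
    }
}
```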
Perfect parsing might be impossible because of ambiguous cases, but a solution that parses most of the common dates with some logic might be achievable
Sure, and such wide-ranging guesswork should most definitely not be part of a standard java.* API. I think you're also wildly underestimating the ambiguity. 1234567890? It's just flat out incorrect to say that this can reasonably be parsed.
You are running into many, many problems here:
Java in general prefers throwing an error instead of guessing. This is inherent in the language (Java has few optional syntax constructs: semicolons aren't optional, () for method invocations is not optional, and Java intentionally does not have 'truthy/falsy' values, i.e. if (foo) is only valid if foo is an expression of the boolean type, unlike e.g. Python, where you can stick anything in there and there's a big list of what counts as falsy, with the rest being considered truthy). When in Rome, do as the Romans do: if this tenet annoys you, well, either learn to love it, begrudgingly accept it, or program in another language. This idea is endemic in the entire ecosystem. For what it is worth, given that debugging tends to take far longer than typing the optional constructs, Java is objectively correct, or at least making rational decisions, in being like this.
Either you can't bring in the notion that 'hey, this number is larger than 12, therefore it cannot possibly be the month', or you have to accept that whether a certain date format parses properly depends on whether the day-of-month value is above or below 12. I would strongly advocate that you avoid a library that fails this rule like the plague. What possible point is there, in the end? "My app will parse your date correctly, but only for about 3/5ths of all dates?" So, given that you can't/should not take that into account, 1234567890: is that seconds-since-1970? Milliseconds-since-1970? Is that the 12th of the 34th month of the year 5678, the 90th hour, and assumed zeroes for minutes, seconds, and millis? If a library guesses, that library is wrong, because you should not guess unless you're 95%+ sure.
The obvious and perennial "do not guess" example is, of course, 101112. Is that November 10th, 2012 (European style)? Is that October 11th, 2012 (American style)? Or is that November 12th, 2010 (ISO style)? These are all reasonable guesses and therefore guessing is just wrong here. Do. Not. Guess. Unless you're really sure. Given that this is a somewhat common way to enter dates, guessing at all costs is objectively silly (see above), and guessing only when it's pretty clear and erroring out otherwise is mostly useless, given that ambiguity is so easy to introduce.
The concept of guessing may be defensible but only with a lot more information. For example, if you give me the input '101112100000', there's no way it's correct to guess here. But if you also tell me that a human entered this input, and that human is clearly clued into, say, german locale, then I can see the need to be able to turn that into '10th of november 2012, 10 o'clock in the morning': Interpreting as seconds or millis since some epoch is precluded by the human factor, and the day-month-year order by locale.
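To illustrate the "more information" point, here is a small sketch in which the caller states the field order explicitly instead of the parser guessing it. The FieldOrder enum and the class name are hypothetical; the fixed-width two-digit patterns are an assumption about what the input looks like.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class HintedDateParser {

    // The caller supplies the field order; nothing is inferred from the digits.
    enum FieldOrder {
        DAY_MONTH_YEAR("ddMMyyHHmmss"),
        MONTH_DAY_YEAR("MMddyyHHmmss"),
        YEAR_MONTH_DAY("yyMMddHHmmss");

        final DateTimeFormatter formatter;

        FieldOrder(String pattern) {
            // "yy" is read as a two-digit year based at 2000.
            this.formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        }
    }

    static LocalDateTime parse(String digits, FieldOrder order) {
        return LocalDateTime.parse(digits, order.formatter);
    }

    public static void main(String[] args) {
        String input = "101112100000";
        System.out.println(parse(input, FieldOrder.DAY_MONTH_YEAR)); // 2012-11-10T10:00
        System.out.println(parse(input, FieldOrder.MONTH_DAY_YEAR)); // 2012-10-11T10:00
        System.out.println(parse(input, FieldOrder.YEAR_MONTH_DAY)); // 2010-11-12T10:00
    }
}
```

The same twelve digits yield three different instants, which is why the order has to come from outside the string.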
You asked:
Is there an easy approach in Java?
This entire question is incorrect. The in Java part needs to be stripped from this question, and then the answer is a simple: No. There is no simple way to parse strings into date/times without a lot more information than just the input string. If another library says they can do that, they are lying, or at least, operating under a list of cultural and source assumptions as long as my leg, and you should not be using that library.
I don't know of any standard library with this functionality, but you can always use the DateTimeFormatter class and guess the format by looping over a list of predefined formats, or by using the ones provided by this class.
This is a typical approximation of what you want to achieve.
Here you can see an old implementation: https://balusc.omnifaces.org/2007/09/dateutil.html
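A minimal sketch of that looping idea follows. The candidate list and its order are assumptions chosen for illustration (in particular, the slashed pattern is treated as day-first); the first formatter that parses the whole string wins.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

public class CandidateFormats {

    // Tried in order; put the formats you trust most first.
    static final List<DateTimeFormatter> CANDIDATES = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,          // 2021-09-28
            DateTimeFormatter.BASIC_ISO_DATE,          // 20210928
            DateTimeFormatter.ofPattern("dd/MM/uuuu")  // 28/09/2021, assumed day-first
    );

    static Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter candidate : CANDIDATES) {
            try {
                return Optional.of(LocalDate.parse(text, candidate));
            } catch (DateTimeParseException ignored) {
                // This candidate did not match; try the next one.
            }
        }
        return Optional.empty(); // no candidate matched: report it, do not guess
    }

    public static void main(String[] args) {
        System.out.println(parse("20210928"));   // Optional[2021-09-28]
        System.out.println(parse("08/07/2021")); // Optional[2021-07-08]
        System.out.println(parse("not a date")); // Optional.empty
    }
}
```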
FTA (https://github.com/tsegall/fta) is designed to solve exactly this problem (among others). It currently parses thousands of formats and does not do it via a predefined set, so typically runs extremely quickly. In this example we explicitly set the DateResolutionMode, however, it will default to something intelligent based on the Locale. Here is an example:
import java.util.Locale;
import com.cobber.fta.dates.DateTimeParser;
import com.cobber.fta.dates.DateTimeParser.DateResolutionMode;
public abstract class Simple {
    public static void main(final String[] args) {
        final String[] samples = { "1970-01-01T00:00:00.00Z", "2021-09-20T17:27:00.000Z+02:00", "08/07/2021" };
        final DateTimeParser dtp = new DateTimeParser().withDateResolutionMode(DateResolutionMode.MonthFirst).withLocale(Locale.ENGLISH);
        for (final String sample : samples)
            System.err.printf("Format is: '%s'%n", dtp.determineFormatString(sample));
    }
}
Which will give the following output:
Format is: 'yyyy-MM-dd'T'HH:mm:ss.SSX'
Format is: 'yyyy-MM-dd'T'HH:mm:ss.SSSX'
Format is: 'MM/dd/yyyy'

How do I add the new currency code to Java?

The Chinese currency has the ISO 4217 code CNY. Since free global trading in that currency is restricted, though, there's a second 'offshore' currency equivalent, called CNH. Wikipedia has a brief summary of all this.
CNH isn't in ISO 4217, but I'd like to be able to use it in my app without having to write my own Currency class. Presumably there's some kind of list somewhere inside the JVM install. How do I go about adding additional currency codes?
EDIT: See this question for dealing with this in Java 7
Looks like support for this was added with Java 7.
For earlier versions, you could use an equivalent Currency class of your own devising, or less happily, replace the default java.util.Currency class (or java.util.CurrencyData, which contains the raw data) in your classpath (whitepaper).
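Whichever route you take, it helps to check at startup whether the running JVM actually knows the code. The sketch below assumes nothing about how the override is configured (see the java.util.Currency Javadoc for the mechanism your Java version supports); the class and method names are made up for illustration.

```java
import java.util.Currency;

public class CurrencyCheck {

    // Currency.getInstance(String) throws IllegalArgumentException for codes
    // that are not in the JVM's currency data.
    static boolean isKnownCurrency(String code) {
        try {
            Currency.getInstance(code);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("CNY known: " + isKnownCurrency("CNY"));
        // Depends on the JVM's data and on any override you have installed.
        System.out.println("CNH known: " + isKnownCurrency("CNH"));
    }
}
```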

What is the best way to display the currency symbol locale-dependent?

I would like to display the currency symbol ($, €) dependent on the current browser locale.
What is the best approach to do so?
I tried:
locale = FacesContext.getCurrentInstance().getViewRoot().getLocale();
System.out.println(locale); //gives: "en"
Currency.getInstance(locale).getSymbol(); //java.lang.IllegalArgumentException
Currency.getInstance(locale.GERMANY).getSymbol(); //gives €
How can I get the symbol based on locale dependent browser setting (which is "en" here)?
Update
locale.getLanguage() > "de"
locale.getDefault() > "de_DE"
Nevertheless, Currency.getInstance(locale).getSymbol(); fails.
Currency depends on the Country-Part of Locale. Since en does not contain a country part it is an illegal argument for creating a Currency instance.
In other words: Would you expect $, US$, AU$ or £ for Locale "en"? Or something else? There is no currency for "English". There are currencies for the US, GB, Australia and so on but not for English.
Edit
If the user has configured their browser properly, then you will indeed get a Locale with both a country and a language part (e.g. en-US). You can use these locales the way you did in your question.
BUT you should consider using geotargeting based on IP address. There are databases like GEO-IP and MaxMind. Be aware that the results can differ: take a US student on a semester abroad in Germany, surfing with his laptop. His browser may return en-US, but a GEO-IP database will most probably resolve to Germany. But maybe this is exactly what you want?!
Finally you can use one of these approaches as primary targeting factor and the second as backup. When both fail then switch to a default (e.g. US$)
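The fallback chain can be sketched as follows. This is only an illustration under the assumption that the caller decides on the default currency; the class and method names are hypothetical, and the geotargeting step is left out because it depends on an external database.

```java
import java.util.Currency;
import java.util.Locale;

public class LocaleCurrency {

    // Returns the currency of the locale's country, or the fallback when the
    // locale has no usable country part (e.g. plain "en").
    static Currency currencyFor(Locale locale, Currency fallback) {
        if (locale.getCountry().isEmpty()) {
            return fallback; // language-only locale: there is no currency for "English"
        }
        try {
            Currency currency = Currency.getInstance(locale);
            // getInstance returns null for territories without a currency.
            return currency != null ? currency : fallback;
        } catch (IllegalArgumentException e) {
            return fallback; // country is not a valid ISO 3166 code
        }
    }

    public static void main(String[] args) {
        Currency usd = Currency.getInstance("USD");
        System.out.println(currencyFor(Locale.ENGLISH, usd).getCurrencyCode()); // USD
        System.out.println(currencyFor(Locale.GERMANY, usd).getCurrencyCode()); // EUR
        // The symbol itself is locale-dependent, so pass the display locale explicitly.
        System.out.println(currencyFor(Locale.GERMANY, usd).getSymbol(Locale.GERMANY));
    }
}
```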

jUnit testing Double.toString in multiple cultures

I have an open source library which has plenty of unit tests that compare string forms of numbers.
These tests pass fine in en-GB, en-US and other cultures where numbers are generally written in the form 1,234.00.
However in cultures such as Germany and France, these values are formatted differently, and the tests fail.
How can the jUnit tests be forced to run as en-GB?
EDIT this kind of thing is available in NUnit.
I'm not sure it's standard for all JVMs, but using Oracle's JVM on Windows, you can use the user.language and user.country System properties to set the locale when starting the JVM:
java -Duser.language=en -Duser.country=GB ...
You can also, of course, set the default locale in Java, using
Locale.setDefault(new Locale("en", "GB"));
Note that Double.toString is locale-independent, though.
How do you launch jUnit?
Passing the appropriate language property will depend more on your environment than on jUnit itself.
Alternatively (and I think it's a better solution), you could compare values rather than strings:
assertEquals(12.3, Double.parseDouble(aDoubleString), 0.0);
assertEquals(Double.toString(12.3), aDoubleString);
rather than
assertEquals("12.3", aDoubleString);
There are two gists with JUnit 4 rules that modify the default locale for a couple of tests:
LocaleRule, a very simple implementation.
DefaultLocaleRule has some static helper methods and allows switching the default locale for an individual test.
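The core of such a rule is a save/set/restore around the test body. The helper below is a framework-free sketch of that idea, not the code from either gist; its name and shape are assumptions. It also shows that Double.toString stays the same while locale-sensitive formatting changes.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class DefaultLocaleScope {

    // Runs the body with a temporary default locale and restores the previous
    // one afterwards. Locale.setDefault(Locale) also resets the DISPLAY and
    // FORMAT categories, so category-specific overrides are not preserved.
    static <T> T withDefaultLocale(Locale temporary, Supplier<T> body) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(temporary);
        try {
            return body.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        // Locale-sensitive: follows the default locale.
        System.out.println(withDefaultLocale(Locale.GERMANY, () -> String.format("%,.2f", 1234.5))); // 1.234,50
        System.out.println(withDefaultLocale(Locale.UK, () -> String.format("%,.2f", 1234.5)));      // 1,234.50
        // Locale-independent: same text under any default locale.
        System.out.println(withDefaultLocale(Locale.GERMANY, () -> Double.toString(1234.5)));        // 1234.5
    }
}
```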

en_US or en-US, which one should you use? [duplicate]

Assume you want to store the user's preferred locale in a database. Which value would you use?
en_US or en-US
Both forms are in use, but which one do you prefer to use in your own application?
Updated: It seems many web sites use a dash instead of an underscore, e.g.
http://zh.wikipedia.org/zh-tw
http://www.google.com.hk/search?hl=zh-TW
I'm pretty sure "-" is the standard. If you see "_" somewhere it's probably something some people came up with to make it a valid identifier.
Personally I'd go with "-", just to be correct.
http://en.wikipedia.org/wiki/IETF_language_tag
https://datatracker.ietf.org/doc/html/rfc5646
If you're working with Java, you might as well use the Java locale format (en_US).
The BCP 47 documents actually do specify the en-US format, and it's just as common if not more common than Java-style locale names. But in practice you'll see the form with the underbar quite a bit. For example, both Java and most POSIX-type platforms use the underbar for their language/region separator.
So you can't go far wrong with either choice. But given that you're writing in Java and probably targeting a Unix platform, en_US is probably the way to go.
In Java 7, there is a new method Locale.forLanguageTag(String), which assumes the hyphen as a separator. I'd consider that as normative.
Check the documentation of Locale for more information.
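As a small sketch of what that means for storage: you can persist the hyphenated tag and still accept underscore-style values already in the database. The helper names are hypothetical, and the underscore-to-hyphen replacement is an assumption that only covers simple language_COUNTRY values.

```java
import java.util.Locale;

public class LocaleTags {

    // Accepts both "en-US" and legacy "en_US" by normalising the separator.
    static Locale fromStored(String stored) {
        return Locale.forLanguageTag(stored.replace('_', '-'));
    }

    // Writes the BCP 47 form, e.g. "en-US".
    static String toStored(Locale locale) {
        return locale.toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(fromStored("en_US"));  // en_US (Locale.toString uses the underscore)
        System.out.println(fromStored("zh-TW"));  // zh_TW
        System.out.println(toStored(Locale.US));  // en-US
    }
}
```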
en_US. This is a very useful read.
I don't think en-US is a standard at all for Java. (If you have seen it somewhere, could you add a link?)
So just use en_US.
