Suppose you want to store a user's preferred locale in a database. Which value would you use?
en_US or en-US
Both are standards, but which one would you prefer to use in your own application?
Updated: It seems many websites use a dash instead of an underscore, e.g.
http://zh.wikipedia.org/zh-tw
http://www.google.com.hk/search?hl=zh-TW
I'm pretty sure "-" is the standard. If you see "_" somewhere it's probably something some people came up with to make it a valid identifier.
Personally I'd go with "-", just to be correct.
http://en.wikipedia.org/wiki/IETF_language_tag
https://datatracker.ietf.org/doc/html/rfc5646
If you're working with Java, you might as well use the Java locale format (en_US).
The BCP 47 documents actually do specify the en-US format, and it's just as common if not more common than Java-style locale names. But in practice you'll see the form with the underbar quite a bit. For example, both Java and most POSIX-type platforms use the underbar for their language/region separator.
So you can't go far wrong with either choice. But given that you're writing in Java and probably targeting a Unix platform, en_US is probably the way to go.
In Java 7, there is a new method Locale.forLanguageTag(String), which assumes the hyphen as a separator. I'd consider that as normative.
Check the documentation of Locale for more information.
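As a rough sketch of how the two spellings relate in Java 7 and later: Locale.forLanguageTag(String) reads the hyphenated form, Locale.toLanguageTag() writes it, and Locale.toString() produces the underscore form. The class name LocaleTagDemo and the helper parseStored are made up for illustration, and the underscore-to-hyphen replacement is only an assumption that works for simple language_REGION values (not for POSIX names such as en_US.UTF-8).

```java
import java.util.Locale;

final class LocaleTagDemo {
    // Hypothetical helper: accept either "en_US" or "en-US" from storage
    // by normalising the separator before parsing as a BCP 47 tag.
    static Locale parseStored(String stored) {
        return Locale.forLanguageTag(stored.replace('_', '-'));
    }

    public static void main(String[] args) {
        Locale locale = parseStored("en_US");
        System.out.println(locale.toLanguageTag()); // en-US (BCP 47 form)
        System.out.println(locale);                 // en_US (Java toString() form)
    }
}
```

Storing the hyphenated tag and converting at the edges keeps the database value independent of Java's own string form.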
en_US. This is a very useful read.
I don't think en-US is a standard at all for Java. (If you have seen it somewhere, could you add a link?)
So just use en_US.
I am aware of
NumberFormat nf = NumberFormat.getInstance(Locale.getDefault());
But I want every number shown in my app to be formatted according to the locale, so formatting them one by one with the above method does not seem like a good approach.
So is there some global setting/variable/configuration that I have to change in order to do that?
Locale-aware formatting requires more than just translating e.g. month names from one language to another. In Java that's handled by separate classes apart from the ones that actually hold the values, e.g. NumberFormat, DateFormat. So there's no way around using them like you already do.
What you could try is to create some wrappers or convenience methods (like formatDate(Date)) to simplify things for you. Also put format strings into Android Resources (res/values).
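One possible shape for such convenience methods is sketched below. The class name Formats and its method names are hypothetical; the only library calls are NumberFormat.getInstance(Locale) and DateFormat.getDateInstance(int, Locale). The MEDIUM date style is an arbitrary choice for the example.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper class: one place that decides how values are formatted.
final class Formats {
    private Formats() {}

    // Format a number for an explicit locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Format a number for the current default locale.
    static String number(double value) {
        return number(value, Locale.getDefault());
    }

    // Format a date (medium style) for an explicit locale.
    static String date(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(date);
    }

    // Format a date for the current default locale.
    static String date(Date date) {
        return date(date, Locale.getDefault());
    }
}
```

Call sites then shrink to Formats.number(total) or Formats.date(new Date()), and a later change of style only touches this class.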
With UNIX locales, the breakdown of which means what is relatively well documented.
LC_COLLATE (string collation)
LC_CTYPE (character conversion)
LC_MESSAGES (messages shown in UI)
LC_MONETARY (formatting of monetary values)
LC_NUMERIC (formatting of non-monetary numeric values)
LC_TIME (formatting of date and time values)
LANG (fallback if any of the above are not set)
Java has a different categorisation which doesn't quite match the real world (as usual):
Locale.getDefault()
Locale.getDefault(Locale.Category.DISPLAY)
Locale.getDefault(Locale.Category.FORMAT)
If you read the documentation on these, Locale.getDefault(Locale.Category.DISPLAY) appears to correspond to LC_MESSAGES while Locale.getDefault(Locale.Category.FORMAT) appears to correspond to some combination of LC_MONETARY+LC_NUMERIC+LC_TIME.
There are problems, though.
If you read the JDK source, you start to find many worrying things. For instance, ResourceBundle.getBundle(String) - which is entirely about string messages - uses Locale.getDefault(), not Locale.getDefault(Locale.Category.DISPLAY).
So I guess what I want to know is:
Which of these methods is supposed to be used for which purpose?
Related, but I made a little test program to see which Java locales corresponded to which UNIX locales and got even more surprising results.
import java.util.Locale;

public class Test {
    public static void main(String[] args) {
        System.out.println(" Unqualified: " + Locale.getDefault());
        System.out.println(" Display: " + Locale.getDefault(Locale.Category.DISPLAY));
        System.out.println(" Format: " + Locale.getDefault(Locale.Category.FORMAT));
    }
}
Locales according to my shell:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Output of the program:
$ java Test
Unqualified: en_AU
Display: en_AU
Format: en_AU
So it turns out Java doesn't even get it from the UNIX locale. It must be using some other back door to get the settings without using those.
It's hard to understand what you are asking here. Instead, you make a statement that reveals that you're not necessarily a Java programmer. That's OK, it doesn't really matter.
Few things to clarify:
The Locale class is in JDK since Java 1.1
Things like Locale.Builder, Locale.Category and many others are here from Java 7 (JDK 1.7)
Locale-aware classes and methods like DateFormat, NumberFormat, Collator, ResourceBundle, String.toLowerCase(Locale), String.toUpperCase(Locale) and many, many more are here for quite a long time each (long before JDK 1.7)
Prior to Java 7/JDK 1.7 there was only one method of acquiring current OS Locale - call Locale.getDefault() (that is without parameters)
In other words, prior to Java 7, Java's Locale Model was as simple as one system property composed of a language, a country and an optional locale variant. That changed with Java 7 (and was further extended with Java 8...), and now you have two system properties, one for formatting and one for displaying user interface messages.
The problem is that there is a substantial amount of legacy code written in Java, and this code shouldn't break when you upgrade the platform. That is exactly why you still have the parameterless Locale.getDefault() around. Moreover (you may test it yourself), Locale.getDefault() is basically interchangeable with Locale.getDefault(Locale.Category.DISPLAY).
Now, I said formatting and user interface messages. Basically, formatting is not only formatting, but things like character case conversion (LC_CTYPE), collation (LC_COLLATE) as well. Sort of anything but user interface messages. Sort of, because default character encoding (which depends on an OS, BTW) is not part of Locale. Instead you need to call Charset.defaultCharset().
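A small sketch of the two categories in action, assuming Java 7 or later. The class name CategoryDemo and the helper formatWithDefault are made up; the example sets the DISPLAY default to US English and the FORMAT default to German, then shows that NumberFormat.getInstance() follows FORMAT while Locale.getDisplayCountry() follows DISPLAY.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class CategoryDemo {
    // Hypothetical helper: formats with whatever the FORMAT default currently is.
    static String formatWithDefault(double value) {
        return NumberFormat.getInstance().format(value);
    }

    public static void main(String[] args) {
        // UI text in English, number formatting as in Germany.
        Locale.setDefault(Locale.Category.DISPLAY, Locale.US);
        Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);

        System.out.println(Locale.getDefault(Locale.Category.DISPLAY)); // en_US
        System.out.println(Locale.getDefault(Locale.Category.FORMAT));  // de_DE
        System.out.println(Locale.GERMANY.getDisplayCountry());         // Germany
        System.out.println(formatWithDefault(1234.5));                  // 1.234,5
    }
}
```

Note that Locale.setDefault(Locale) without a category resets both categories at once, so it should be called before, not after, the category-specific setters.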
And the fallback rules (built in Java, not read from OS) could be worked out with ResourceBundle.Control class. And as we know, it is rather related to UI category...
The reason why Java's Locale Model is different from POSIX (not UNIX, it's more universal) is the simple fact that there are quite a few platforms out there. And these platforms don't necessarily use POSIX... I mean not only operating systems, but things like the web... Java strives to be universal and versatile. As a result, Java's Locale Model is convoluted, tough luck.
I have to add that nowadays, it's not only the language and the country, but there are also things like preferred script, calendar system, numbering system, specific collation settings and possibly more. It even works sometimes.
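For illustration, here is roughly how a script and a numbering system can be attached to a Locale in Java 7 and later. The class and method names are invented for this sketch; Locale.Builder, Locale.forLanguageTag, getScript and getUnicodeLocaleType are the standard API.

```java
import java.util.Locale;

final class ExtendedLocaleDemo {
    // Chinese written in the Traditional (Hant) script, as used in Taiwan.
    static Locale traditionalChineseTaiwan() {
        return new Locale.Builder()
                .setLanguage("zh")
                .setScript("Hant")
                .setRegion("TW")
                .build();
    }

    // Thai with the Unicode extension "nu" (numbering system) set to Thai digits.
    static Locale thaiWithThaiDigits() {
        return Locale.forLanguageTag("th-TH-u-nu-thai");
    }

    public static void main(String[] args) {
        System.out.println(traditionalChineseTaiwan().toLanguageTag());        // zh-Hant-TW
        System.out.println(thaiWithThaiDigits().getUnicodeLocaleType("nu"));   // thai
    }
}
```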
The Chinese currency has the ISO 4217 code CNY. Since free global trading in that currency is restricted, though, there's a second 'offshore' currency equivalent, called CNH. Wikipedia has a bit of a summary of all this.
CNH isn't in ISO 4217, but I'd like to be able to use it in my app without having to write my own Currency class. Presumably there's some kind of list somewhere inside the JVM install. How do I go about adding additional currency codes?
EDIT: See this question for dealing with this in Java 7
Looks like support for this was added with Java 7.
For earlier versions, you could use an equivalent Currency class of your own devising, or less happily, replace the default java.util.Currency class (or java.util.CurrencyData, which contains the raw data) in your classpath (whitepaper).
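Whether a given code such as CNH is known depends on the JVM version and its currency configuration, so a defensive lookup can help at the call site. The sketch below is only an illustration: CurrencyLookup and find are hypothetical names, Optional assumes Java 8 or later, and the example relies solely on the documented behaviour that Currency.getInstance(String) throws IllegalArgumentException for an unsupported code.

```java
import java.util.Currency;
import java.util.Optional;

final class CurrencyLookup {
    // Hypothetical helper: returns the Currency if this JVM knows the code,
    // or an empty Optional instead of letting the exception escape.
    static Optional<Currency> find(String code) {
        try {
            return Optional.of(Currency.getInstance(code));
        } catch (IllegalArgumentException unsupported) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(find("CNY").isPresent()); // true
        // The result for CNH depends on the JVM's currency data.
        System.out.println(find("CNH").isPresent());
    }
}
```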
Following situation:
I need to detect if a String contains a DateTime/Timestamp. The problem is that those DateTimes come in various formats and granularity such as:
2011-09-12
12-09-2011
12.09.2011
2011-09-01-14:15
... and many many more variations
I don't need to understand the semantics (e.g. distinguish between days and months); I just need to detect, let's say, 80% of the most common DateTime variations.
My first thought was using RegExp, which I'm far from familiar with, and I would also need to familiarize myself with all the variations in which DateTimes can come.
So my questions:
Does anybody know a canned RegExps to achieve this?
Is there maybe some Java library that could do this task?
Thanks!!
There is another question of same context, hope that link will help you: Dynamic regex for date time formats
You're going to struggle to find a generic match. For the day-month-year section you could possibly use a pattern like (\d{1,2}.){2}\d{4}, which would match dates in the format dd*mm*yyyy (the unescaped dot accepts any separator character).
DateFormat would be a better choice, I think. As John B suggested above, create a list of valid formats and try to match against each one.
Use Java's DateFormat.
You can set up as many formats as you want and iterate through them looking for a match. You will have to catch exceptions for the formats that don't parse and so this solution is not efficient but will work.
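A minimal sketch of that loop, using the formats from the question. DateSniffer and looksLikeDate are hypothetical names, and the pattern list is just an assumption about which formats to support. Instead of catching ParseException, this version uses the DateFormat.parse(String, ParsePosition) overload, which returns null on failure; it also requires the whole string to be consumed and turns off lenient parsing so that 12-09-2011 is not accepted as year 12.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class DateSniffer {
    // Assumed list of supported formats; most specific first.
    private static final List<String> PATTERNS = Arrays.asList(
            "yyyy-MM-dd-HH:mm", "yyyy-MM-dd", "dd-MM-yyyy", "dd.MM.yyyy");

    // True if the whole text parses with at least one of the patterns.
    static boolean looksLikeDate(String text) {
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so create one per call.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            format.setLenient(false); // reject out-of-range fields such as day 2011
            ParsePosition position = new ParsePosition(0);
            if (format.parse(text, position) != null
                    && position.getIndex() == text.length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2011-09-01-14:15")); // true
        System.out.println(looksLikeDate("hello"));            // false
    }
}
```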
Edit per comment:
If you don't want exceptions because of their performance cost, then you would need to set up a list of regular expressions (one for each format you will support). Find the regex (if any) that matches your input and convert it to a date based on the matching format. What I would suggest is to pair a DateFormat with each regex and let that DateFormat do the work of parsing once you have identified it. This would reduce the chance of errors in using the groups from the regex to produce the date. Personally, I don't know if this would actually be more efficient than try/catch, so I would opt for the more straightforward mechanism (using DateFormat directly).
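The regex-to-format pairing could look something like the following. This is only a sketch: DateShapes and formatFor are invented names, and the four regular expressions are assumptions covering the sample formats from the question. The returned pattern string can then be handed to a SimpleDateFormat for the actual parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class DateShapes {
    // Each regex describes the shape of one supported format
    // and maps to the SimpleDateFormat pattern that can parse it.
    private static final Map<Pattern, String> SHAPES = new LinkedHashMap<>();
    static {
        SHAPES.put(Pattern.compile("\\d{4}-\\d{2}-\\d{2}"), "yyyy-MM-dd");
        SHAPES.put(Pattern.compile("\\d{2}-\\d{2}-\\d{4}"), "dd-MM-yyyy");
        SHAPES.put(Pattern.compile("\\d{2}\\.\\d{2}\\.\\d{4}"), "dd.MM.yyyy");
        SHAPES.put(Pattern.compile("\\d{4}-\\d{2}-\\d{2}-\\d{2}:\\d{2}"), "yyyy-MM-dd-HH:mm");
    }

    // Returns the date pattern whose regex matches the whole input, if any.
    static Optional<String> formatFor(String text) {
        for (Map.Entry<Pattern, String> shape : SHAPES.entrySet()) {
            if (shape.getKey().matcher(text).matches()) {
                return Optional.of(shape.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(formatFor("12.09.2011").orElse("no match")); // dd.MM.yyyy
        System.out.println(formatFor("hello").orElse("no match"));      // no match
    }
}
```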
In .NET, when handling numbers with an unspecified locale, I use:
return double.Parse(myStr, CultureInfo.InvariantCulture.NumberFormat);
What's the equivalent in Java? java.util.Locale doesn't seem to include such a thing.
Well, you can use NumberFormat.getNumberInstance().parse();
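Note that the no-argument NumberFormat.getNumberInstance() uses the JVM's default locale, so its result varies between machines. A sketch that pins the locale instead is shown below; treating Locale.ROOT (available since Java 6) as the nearest counterpart to CultureInfo.InvariantCulture is an assumption on my part, and InvariantNumbers is a made-up class name. For plain values like "1234.5", Double.parseDouble is also locale-independent.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class InvariantNumbers {
    // Parse with the root locale: '.' as decimal separator, ',' as grouping separator.
    static double parse(String text) {
        try {
            return NumberFormat.getNumberInstance(Locale.ROOT).parse(text).doubleValue();
        } catch (ParseException e) {
            // Mirror Double.parseDouble by reporting bad input as NumberFormatException.
            throw new NumberFormatException("Not a number: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1234.5"));   // 1234.5
        System.out.println(parse("1,234.5"));  // 1234.5
    }
}
```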