NL locale returns ambiguous number

I want to convert numbers written in different European formats to a single format (double), but it doesn't seem to work.
Locale locale = new Locale("nl", "NL");
NumberFormat nf = NumberFormat.getNumberInstance(locale);
return nf.parse("4,000.00").doubleValue();
It returns 4.000 instead of 4000.0, but when I enter nf.parse("900,00") it works (returns 900.0).
Another time I enter 4000 and it converts to 4000.0 (as expected).
So now I am left with inconsistent types.
I want to convert each number to the same double format. Can you guide me?

now I am left with inconsistent types
This is incorrect. The behaviour is entirely consistent and according to spec.
In Dutch, the comma is the decimal separator: there can be only one, everything to its left is the whole part, and everything to its right is the fractional part. The dot is the thousands separator.
900,00
This is parsed as nine hundred, whole. 900 is to the left of the comma, so that is the whole part. 00 is the fractional part, which is nothing, so you end up with 900. As expected: a Dutch person reading 900,00 would read that as 'nine hundred'.
4000
Obviously, that's four thousand. No problems there.
4,000.00
That's 4,000 - i.e. four, with 000 as fractional part, and that is how this is parsed. The .00 isn't parsed at all.
Wait, what?
NumberFormat is designed to parse multiple numbers from a stream of text. Even the .parse(string) version of it. Here, try it:
Locale locale = new Locale("nl", "NL");
NumberFormat nf= NumberFormat.getNumberInstance(locale);
System.out.println(nf.parse("4,000hey now this is strange").doubleValue());
This runs fine and prints '4.0'.
Fixing it
If you really want to fix it, you have a few strategies. One of them is to first verify that the entire input is valid (e.g. with a regular expression) and only then parse it.
Another option is to explicitly check that the whole input is consumed. You can do that:
String input = "4,000.00";
ParsePosition ps = new ParsePosition(0);
Locale locale = new Locale("nl", "NL");
NumberFormat nf = NumberFormat.getNumberInstance(locale);
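// parse(String, ParsePosition) returns null when nothing could be parsed at all
Number n = nf.parse(input, ps);
if (n == null || ps.getIndex() != input.length()) throw new IllegalArgumentException("Not a number: " + input);
double v = n.doubleValue();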
The above code parses 900,00 as nine hundred, parses 4000 as four thousand, same for 4.000, and throws an exception if you attempt to toss 4,000.00 at it. Which is, indeed, not a valid anything in the Dutch locale.
I want something that parses both 4,000.00 as 4000, but also 900,00 as 900.
That is highly inconsistent and implies you want 4,000 to be parsed as 4 and yet 4,000.00 as 4000. If you want this, you're on your own and have to write it from scratch; no built-in library (or, as far as I know, any external one) would do such utterly befuddled, inconsistent craziness.
NB: The snippet parses 4.000.00 as 400000 without complaint; inconsistent application of thousands separators is leniently parsed by NumberFormat and you can't tell it to be strict. In fact, 4.1.23.4567 is parsed as 41234567. The only reason 4,000.00 is not parsed in full is that dots are not allowed in the fractional part at all. If you don't want that leniency, you're again stuck: you can't use NumberFormat then. Regexes maybe, but you're now on the hook for writing one for each locale you care to support.
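To see where the parser stops on each of the inputs discussed above, here is a small sketch that prints the parsed value next to the number of characters actually read. The class name DutchParseDemo and the helper stoppedAt are made up for illustration, and the sketch assumes the JDK's default locale data, where the Dutch decimal separator is a comma and the grouping separator is a dot.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class DutchParseDemo {
    // Hypothetical helper: index at which the Dutch number parser stopped reading.
    static int stoppedAt(String input) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.forLanguageTag("nl-NL"));
        ParsePosition pos = new ParsePosition(0);
        nf.parse(input, pos); // result ignored here; only the position matters
        return pos.getIndex();
    }

    public static void main(String[] args) throws Exception {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.forLanguageTag("nl-NL"));
        for (String input : new String[] {"900,00", "4000", "4.000", "4,000.00"}) {
            double value = nf.parse(input).doubleValue();
            System.out.println(input + " -> " + value
                    + " (read " + stoppedAt(input) + " of " + input.length() + " chars)");
        }
    }
}
```

Only for 4,000.00 do the two counts differ: reading stops at the dot, because a grouping separator cannot follow the decimal comma.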

Related

NumberFormat doesn't crash with 2 decimal separators

I have a question regarding the behavior of the NumberFormat:
When I want to translate/parse a formatted String into a Number, then I would like to use NumberFormat, since it provides me with nice presets for thousand and decimal separators. Additionally I would like it to crash, if the provided String is not a valid Number.
An example:
// works as expected
String testInput1 = "3,1415";
NumberFormat germanNumberFormat = NumberFormat.getInstance(Locale.GERMANY);
Number number1 = germanNumberFormat.parse(testInput1);
System.out.println(number1); // prints 3.1415
// does not work as expected, cuts off the number after the 2nd decimal
// separator, expected it to crash with java.lang.NumberFormatException:
// multiple points
String testInput2 = "3,14,15";
Number number2 = germanNumberFormat.parse(testInput2);
System.out.println(number2); // prints 3.14
I currently use Double.parseDouble(String s), to have this additional behavior:
// crashes with java.lang.NumberFormatException: multiple points
double number2WithError = Double.parseDouble(testInput2.replace(",", "."));
Is there a way I can use NumberFormat to have my required/expected behavior besides writing my own wrapper class that does some additional checks on e.g. multiple decimal separators?
Also I'm aware that the JavaDoc of the used parse(String source) method of NumberFormat says:
Parses text from the beginning of the given string to produce a number. The method may not use the entire text of the given string.
See the {@link #parse(String, ParsePosition)} method for more information on number parsing.
and parse(String source, ParsePosition parsePosition):
Returns a Long if possible (e.g., within the range [Long.MIN_VALUE, Long.MAX_VALUE] and with no decimals), otherwise a Double. If IntegerOnly is set, will stop at a decimal point (or equivalent; e.g., for rational numbers "1 2/3", will stop after the 1). Does not throw an exception; if no object can be parsed, index is unchanged!
This doesn't tell me though why the method behaves this way. What I get from these is that they can parse only parts of the String (what they obviously do here) and probably just start parsing at the beginning (start position) until they find something they can't deal with.
I didn't find an existing question covering this, so if there is already one, please feel free to close this post and please link to it.

NumberFormat.parse(String) is behaving exactly as documented:
Parses text from the beginning of the given string to produce a number. The method may not use the entire text of the given string.
(Emphasis added)
You ask:
Is there a way I can use NumberFormat to have my required/expected behavior besides writing my own wrapper class that does some additional checks on e.g. multiple decimal separators?
You cannot provide a format that will make NumberFormat.parse() throw an exception for input with only an initial substring that can be parsed according to the format. You can, however, use NumberFormat.parse(String, ParsePosition) to determine whether the whole input was parsed, because the parse position argument is used not only to indicate to the method where to start, but also for the method to say where it stopped. That would be a lot better than implementing format-specific extra checks. Example:
ParsePosition position = new ParsePosition(0);
Number result = format.parse(input, position);
if (position.getIndex() != input.length()) {
throw new MyException();
}
Additionally, you write:
This doesn't tell me though why the method behaves this way.
It behaves that way because sometimes parsing the initial portion of the input is exactly what you want to do. You can build stricter parsing on top of more relaxed parsing, as shown, but it's much more difficult to do it the other way around.
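As a rough sketch of building the stricter behaviour on top of the relaxed one, the position check can be wrapped in a small helper. The names StrictNumberParser and parseFully are hypothetical, and throwing NumberFormatException is just one possible choice, picked here because it is what the question expected to see.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class StrictNumberParser {
    // Hypothetical helper: accepts the input only if the format consumed all of it.
    static Number parseFully(String input, NumberFormat format) {
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(input, position);
        if (result == null) {
            // Nothing at all could be parsed; parse() signals this with null.
            throw new NumberFormatException("Unparseable number: " + input);
        }
        if (position.getIndex() != input.length()) {
            // A valid prefix was parsed, but some text was left over.
            throw new NumberFormatException(
                    "Unexpected text at index " + position.getIndex() + " in: " + input);
        }
        return result;
    }

    public static void main(String[] args) {
        NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
        System.out.println(parseFully("3,1415", german));
        try {
            parseFully("3,14,15", german);
        } catch (NumberFormatException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Note that this only rejects leftover text; it does not make grouping separators strict, so it is a complement to the format rather than a full validator.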

Understanding the strange output of java.util.Locale

I had a perception that Locale is just about adding comma at proper positions at least in case of numbers. But I see a different output for what I have tried.
I tried the following,
public static void main(String[] args) {
    DecimalFormat df = null;
    df = (DecimalFormat) DecimalFormat.getInstance(Locale.CHINESE);
    System.out.println("Locale.CHINESE "+df.format(12345.45));
    df = (DecimalFormat) DecimalFormat.getInstance(Locale.GERMAN);
    System.out.println("Locale.GERMAN "+df.format(12345.45));
}
Output:
Locale.CHINESE 12,345.45
Locale.GERMAN 12.345,45
If you look carefully at the commas, you'll see a major difference.
Now, the javadoc for java.util.Locale says
... An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to
tailor information for the user. For example, displaying a number is a locale-sensitive operation--the number
should be formatted according to
the customs/conventions of the user's native country, region, or culture ...
I see a comma being interpreted as decimal point in another Locale, which is really a curious thing, as the value is being changed.
So, help me understand this. What exactly is Locale? Won't the drastic change in output cause major issue in code/data?

I had a perception that Locale is just about adding comma at proper positions at least in case of numbers.
No, it affects the symbols used as well, as you've seen.
So, help me understand this. What exactly is Locale? Won't the drastic change in output cause major issue in code/data?
Only if you don't use them correctly :) Machine-to-machine communication should usually not be localized; typically if you really need to use text, it's best to use US as a reasonably invariant locale.
See DecimalFormatSymbols for more details of what is locale-specific.
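A minimal sketch of that split, assuming a hypothetical class LocaleSymbolsDemo: it localizes output meant for people, uses a fixed US format for output meant for other programs, and prints the separators that DecimalFormatSymbols reports for each locale. The two-decimal "%.2f" pattern is an arbitrary choice for the example.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class LocaleSymbolsDemo {
    // For human readers: separators follow the reader's locale.
    static String forDisplay(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // For machine-to-machine text: always the same (US) conventions, no grouping.
    static String forMachines(double value) {
        return String.format(Locale.US, "%.2f", value);
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMAN}) {
            DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
            System.out.println(locale + ": decimal '" + symbols.getDecimalSeparator()
                    + "', grouping '" + symbols.getGroupingSeparator()
                    + "', display " + forDisplay(12345.45, locale));
        }
        System.out.println("machine: " + forMachines(12345.45));
    }
}
```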

I see nothing wrong with the above. The German way of representing 12345.45 is 12.345,45
and the Chinese way of representing the same number is 12,345.45 .
So, help me understand this. What exactly is Locale? Won't the drastic
change in output cause major issue in code/data?
No, it won't. You just need to keep track of the locale of the input and how you want it formatted.

Java Decimal Format - as much precision as given

I'm working with DecimalFormat, I want to be able to read and write decimals with as much precision as given (I'm converting to BigDecimal).
Essentially, I want a DecimalFormat which enforces the following pattern "\d+(\.\d+)?" i.e. "at least one digit then, optionally, a decimal separator followed by at least one digit".
I'm struggling to implement this using DecimalFormat. I've tried several patterns, but they all seem to enforce a fixed number of digits.
I'm open to alternative ways of achieving this too.
Edit:
For a little more background, I'm parsing user-supplied data in which decimals could be formatted in any way, and possibly not in the locale format. I'm hoping to let them supply a decimal format string which I can use to parse the data.

Since you noted in a comment that you need Locale support:
Locale locale = //get this from somewhere else
DecimalFormat df = new DecimalFormat();
df.setDecimalFormatSymbols(new DecimalFormatSymbols(locale));
df.setMaximumFractionDigits(Integer.MAX_VALUE);
df.setMinimumFractionDigits(1);
df.setParseBigDecimal(true);
And then parse.

This seems to work fine:
public static void main(String[] args) throws Exception {
    DecimalFormat f = new DecimalFormat("0.#");
    f.setParseBigDecimal(true);
    f.setDecimalFormatSymbols(new DecimalFormatSymbols(Locale.US)); // if required
    System.out.println(f.parse("1.0")); // 1.0
    System.out.println(f.parse("1")); // 1
    System.out.println(f.parse("1.1")); // 1.1
    System.out.println(f.parse("1.123")); // 1.123
    System.out.println(f.parse("1.")); // 1
    System.out.println(f.parse(".01")); // 0.01
}
Except for the last two that violate your "at least one digit" requirement. You may have to check that separately using a regex if it's really important.
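One way to add that regex check is sketched below. It swaps DecimalFormat for a plain BigDecimal constructor once the input has passed the pattern from the question, so this is an alternative technique rather than the approach shown above. The names ExactDecimalParser and parseExact are hypothetical, and the sketch assumes the only locale-specific part is the decimal separator (no grouping, no sign).

```java
import java.math.BigDecimal;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

class ExactDecimalParser {
    // Hypothetical helper: enforces "digits, optionally separator plus digits",
    // then keeps exactly the precision that was written.
    static BigDecimal parseExact(String input, Locale locale) {
        char separator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        String quoted = Pattern.quote(String.valueOf(separator));
        if (!input.matches("[0-9]+(" + quoted + "[0-9]+)?")) {
            throw new IllegalArgumentException("Not a plain decimal: " + input);
        }
        // BigDecimal's constructor expects '.', so normalize the separator first.
        return new BigDecimal(input.replace(separator, '.'));
    }

    public static void main(String[] args) {
        System.out.println(parseExact("1.123", Locale.US));
        System.out.println(parseExact("1,50", Locale.GERMANY));
        for (String bad : new String[] {"1.", ".01"}) {
            try {
                parseExact(bad, Locale.US);
            } catch (IllegalArgumentException e) {
                System.out.println("Rejected: " + bad);
            }
        }
    }
}
```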

Get currency symbol aka localeconv() in ColdFusion?

I'm doing some javascript work inside a ColdFusion shopping cart, and I need to be able to format some numbers in js which will mimic LScurrencyFormat() in CF.
Currently we are taking the first (left,1) character of a formatted string but that doesn't work for currencies like Yen or Euro which come after the number, not to mention any multiple character currency symbols.
What I need to find, based on the current CF locale, is
currency symbol
decimal delimiter (, or .)
leading or trailing (before or after the number)
From there I can run my own js formatting to make the formatted numbers come out as expected on the page. In PHP we can use localeconv() to get these values... how can I find them in CF?

I am not aware of any built-in functions. However, you can obtain the first two items from Java. As for the third, the closest suggestion I have seen is to parse the localized number pattern and detect the position of the currency sign, i.e. \u00A4. Note: It is just a mask placeholder. It is not the same as the actual currency symbols like "$" or "£".
Edit:
As discussed in the comments, getLocale() returns a user-friendly name which unfortunately does not quite line up with Java's. The easiest way to get the Java locale object for the current request is using getPageContext().getResponse().getLocale().
<cfscript>
// Get the current locale as a java object
javaLocale = getPageContext().getResponse().getLocale();
// get numeric settings for that locale
currency = createObject("java", "java.text.DecimalFormat").getCurrencyInstance(javaLocale);
symbols = currency.getDecimalFormatSymbols();
// 164 => decimal code point for currency sign
currencyPattern = currency.toLocalizedPattern();
result.hasTrailingCurrencySymbol = currencyPattern.indexOf(javacast("int", 164)) > 0;
result.currencySymbol = symbols.getCurrencySymbol();
result.decimalSeparator= symbols.getDecimalSeparator();
WriteDump(result);
</cfscript>

getLocale() returns the old CF5-style locale "names", but only for those locales supported by CF5. If you dump out the supported locales (Server.Coldfusion.SupportedLocales) you'll see the goofy old CF5-style locale names as well as the core Java locale IDs (i.e. both "Chinese(China)" and "zh_CN"). If your locale wasn't one of the CF5 supported locales, you should see the core Java locale ID (i.e. th_TH for Thai, Thailand). See
http://cfbugs.adobe.com/cfbugreport/flexbugui/cfbugtracker/main.html#bugId=82474
As a small tweak to Leigh's answer, you should also be concerned with the currency/locale's fraction digits. For instance, in normal practice you can't have part of a yen (i.e. 1.1 isn't quite kosher). You can get that info from the Currency class's getDefaultFractionDigits() method; the formatter's Currency object is available via getCurrency():
result.fractionDigits = currency.getCurrency().getDefaultFractionDigits();
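For reference, the same lookups can be sketched in plain Java, which may help when checking what the CFML above should return for a given locale. The class CurrencyInfo and its method names are hypothetical. The cast to DecimalFormat is an assumption that holds for the JDK's standard locale providers, and the "index greater than zero" test for a trailing symbol mirrors the heuristic used above rather than any official API.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

class CurrencyInfo {
    // Assumes the currency formatter is a DecimalFormat (true for standard JDK providers).
    static DecimalFormat currencyFormat(Locale locale) {
        return (DecimalFormat) NumberFormat.getCurrencyInstance(locale);
    }

    // Heuristic from the answer above: the placeholder \u00A4 not being first means "trailing".
    static boolean hasTrailingSymbol(Locale locale) {
        return currencyFormat(locale).toLocalizedPattern().indexOf('\u00A4') > 0;
    }

    // Number of minor-unit digits for the locale's currency (0 for yen).
    static int fractionDigits(Locale locale) {
        return Currency.getInstance(locale).getDefaultFractionDigits();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMANY, Locale.JAPAN}) {
            DecimalFormatSymbols symbols = currencyFormat(locale).getDecimalFormatSymbols();
            System.out.println(locale
                    + ": symbol=" + symbols.getCurrencySymbol()
                    + ", decimalSeparator=" + symbols.getDecimalSeparator()
                    + ", trailing=" + hasTrailingSymbol(locale)
                    + ", fractionDigits=" + fractionDigits(locale));
        }
    }
}
```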

Formatting currency in String from different locale currency formats

I have a java string variable in my groovy app. The variable contains a user input of price in possibly different currency formats:
val = "1,250.50"
val = "1.250,50"
val = "1250,50"
val = "1250.50"
(etc. I don't know if there are any more funny ways other countries write this)
Is there a way to parse this to the appropriate double value regardless of the format? Looking at this at the moment but not sure if it'll help. My current method only works for the US format:
total = Double.parseDouble(val.replace('$','').replaceAll(",","").trim())

You cannot parse it without knowing what the user will use as decimal separator, grouping separator, and so on. For example, if I type 1,250 you do not know whether I mean one thousand two hundred fifty (1,250.00) or one point two five (1.250).
That's why the NumberFormat/DecimalFormat class of Java allows you to specify the grouping and decimal separator.
What you could do is hope that the user inputs his values using the conventions corresponding to his Locale settings, and use the
NumberFormat.getInstance( Locale )
with the current Locale of the JVM.
Note: with the NumberFormat you can also parse a currency. See NumberFormat#getCurrencyInstance
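A short sketch of why the locale has to be known up front: the same text yields different values depending on which locale's NumberFormat reads it. LocalizedPriceParser and parsePrice are hypothetical names, and the locales are picked by hand here, whereas a real application would take them from the user's settings.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocalizedPriceParser {
    // Hypothetical helper: interprets the input using the given locale's conventions.
    static double parsePrice(String input, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(input.trim()).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("1,250.50", Locale.US));      // US conventions
        System.out.println(parsePrice("1.250,50", Locale.GERMANY)); // German conventions
        // The ambiguous case: identical text, different meaning per locale.
        System.out.println(parsePrice("1,250", Locale.US));
        System.out.println(parsePrice("1,250", Locale.GERMANY));
    }
}
```

With Locale.US the ambiguous input 1,250 comes out as 1250.0, with Locale.GERMANY as 1.25. Because NumberFormat parses leniently, this sketch converts input written in a known convention; it does not detect input written in the wrong one.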
