NumberFormat doesn't crash with 2 decimal separators


I have a question regarding the behavior of the NumberFormat:
When I want to translate/parse a formatted String into a Number, then I would like to use NumberFormat, since it provides me with nice presets for thousand and decimal separators. Additionally I would like it to crash, if the provided String is not a valid Number.
An example:
// works as expected
String testInput1 = "3,1415";
NumberFormat germanNumberFormat = NumberFormat.getInstance(Locale.GERMANY);
Number number1 = germanNumberFormat.parse(testInput1);
System.out.println(number1); // prints 3.1415
// does not work as expected, cuts off the number after the 2nd decimal
// separator, expected it to crash with java.lang.NumberFormatException:
// multiple points
String testInput2 = "3,14,15";
Number number2 = germanNumberFormat.parse(testInput2);
System.out.println(number2); // prints 3.14
I currently use Double.parseDouble(String s), to have this additional behavior:
// crashes with java.lang.NumberFormatException: multiple points
double number2WithError = Double.parseDouble(testInput2.replace(",", "."));
Is there a way I can use NumberFormat to have my required/expected behavior besides writing my own wrapper class that does some additional checks on e.g. multiple decimal separators?
Also I'm aware that the JavaDoc of the used parse(String source) method of NumberFormat says:
Parses text from the beginning of the given string to produce a number. The method may not use the entire text of the given string.
See the {@link #parse(String, ParsePosition)} method for more information on number parsing.
and parse(String source, ParsePosition parsePosition):
Returns a Long if possible (e.g., within the range [Long.MIN_VALUE, Long.MAX_VALUE] and with no decimals), otherwise a Double. If IntegerOnly is set, will stop at a decimal point (or equivalent; e.g., for rational numbers "1 2/3", will stop after the 1). Does not throw an exception; if no object can be parsed, index is unchanged!
This doesn't tell me though why the method behaves this way. What I get from these is that they can parse only parts of the String (what they obviously do here) and probably just start parsing at the beginning (start position) until they find something they can't deal with.
I didn't find an existing question covering this, so if there is already one, please feel free to close this post and please link to it.

NumberFormat.parse(String) is behaving exactly as documented:
Parses text from the beginning of the given string to produce a number. The method may not use the entire text of the given string.
(Emphasis added)
You ask:
Is there a way I can use NumberFormat to have my required/expected behavior besides writing my own wrapper class that does some additional checks on e.g. multiple decimal separators?
You cannot provide a format that will make NumberFormat.parse() throw an exception for input with only an initial substring that can be parsed according to the format. You can, however, use NumberFormat.parse(String, ParsePosition) to determine whether the whole input was parsed, because the parse position argument is used not only to indicate to the method where to start, but also for the method to say where it stopped. That would be a lot better than implementing format-specific extra checks. Example:
ParsePosition position = new ParsePosition(0);
Number result = format.parse(input, position);
if (position.getIndex() != input.length()) {
    throw new MyException();
}
Additionally, you write:
This doesn't tell me though why the method behaves this way.
It behaves that way because sometimes parsing the initial portion of the input is exactly what you want to do. You can build stricter parsing on top of more relaxed parsing, as shown, but it's much more difficult to do it the other way around.
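The ParsePosition check can be wrapped once and reused with any NumberFormat. The following is a minimal sketch, not a definitive implementation: the class name StrictNumbers and the method parseStrict are hypothetical, and throwing ParseException (rather than a custom exception) is an assumption about what the caller wants.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper: strict parsing layered on top of the lenient NumberFormat.parse.
final class StrictNumbers {

    /** Parses input and rejects it unless the whole string was consumed. */
    static Number parseStrict(NumberFormat format, String input) throws ParseException {
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(input, position);
        if (result == null) {
            // Nothing could be parsed at all; the error index says where parsing failed.
            throw new ParseException("Unparseable number: " + input, position.getErrorIndex());
        }
        if (position.getIndex() != input.length()) {
            // A valid prefix was parsed, but unparsed characters remain after it.
            throw new ParseException("Unexpected trailing text: " + input, position.getIndex());
        }
        return result;
    }

    public static void main(String[] args) throws ParseException {
        NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
        System.out.println(parseStrict(german, "3,1415"));
        try {
            parseStrict(german, "3,14,15");
        } catch (ParseException e) {
            System.out.println("Rejected at index " + e.getErrorOffset());
        }
    }
}
```

Because the check only compares the final parse position with the input length, it does not depend on the locale or on which separator caused parsing to stop.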

Related

NL locale returns ambiguous number

I want to convert numbers written in different European number formats to one format (double), but it doesn't seem to work.
Locale locale = new Locale("nl", "NL");
NumberFormat nf= NumberFormat.getNumberInstance(locale);
return nf.parse("4,000.00").doubleValue();
It returns 4.000 instead of 4000.0, but when I enter nf.parse("900,00") it works (returns 900.0).
Another time I enter 4000 and it converts to 4000.0 (expected).
So now I am left with inconsistent types.
I want to convert each number to the same double format. Can you guide me?

now I am left with inconsistent types
This is incorrect. The behaviour is entirely consistent and according to spec.
In Dutch, the comma is the wholes/fractions separator: there can be only one, and everything to the left of it is the whole part and everything to the right of it is the fractional part. The dot is the thousands separator.
900,00
This is parsed as nine hundred, whole. 900 is to the left of the comma, so those are the wholes. 00 is the fractional part, which is nothing, so you end up with 900. As expected: a Dutch person reading 900,00 would assume it said 'nine hundred'.
4000
Obviously, that's four thousand. No problems there.
4,000.00
That's 4,000 - i.e. four, with 000 as fractional part, and that is how this is parsed. The .00 isn't parsed at all.
Wait, what?
NumberFormat is designed to parse multiple numbers from a stream of text. Even the .parse(string) version of it. Here, try it:
Locale locale = new Locale("nl", "NL");
NumberFormat nf= NumberFormat.getNumberInstance(locale);
System.out.println(nf.parse("4,000hey now this is strange").doubleValue());
works and runs fine, and prints '4.0'.
Fixing it
If you really want to fix it, you have a few strategies. One of them is to first verify that the entire input is valid (e.g. with a regular expression) and only then parse it.
Another option is to explicitly check that the whole input is consumed. You can do that:
String input = "4,000.00";
ParsePosition ps = new ParsePosition(0);
Locale locale = new Locale("nl", "NL");
NumberFormat nf = NumberFormat.getNumberInstance(locale);
double v = nf.parse(input, ps).doubleValue();
if (ps.getIndex() != input.length()) throw new IllegalArgumentException("Not a number: " + input);
The above code parses 900,00 as nine hundred, parses 4000 as four thousand, same for 4.000, and throws an exception if you attempt to toss 4,000.00 at it. Which is, indeed, not a valid anything in the Dutch locale.
I want something that parses both 4,000.00 as 4000, but also 900,00 as 900.
That is highly inconsistent and implies you want 4,000 to be parsed as 4 and yet 4,000.00 as 4000. If you want this, you're on your own and have to write it from scratch, no built in library (or, as far as I know, any external one) would do such utter befuddled inconsistent craziness.
NB: Note that the snippet would parse 4.000.00 as 400000 and works fine; inconsistent application of thousands separators is leniently parsed by NumberFormat and you can't tell it to be strict. In fact, 4.1.23.4567 is parsed as 41234567 - the only reason 4,000.00 is not parsed in the first place is because dots are not allowed in the fractional part at all. If you don't want that, you're again stuck, you can't use NumberFormat then. Regexes maybe, but you're now on the hook for writing one for each locale you care to support.
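As a rough sketch of the regex route, the separators can be read from DecimalFormatSymbols so that one pattern template serves several locales. This is an assumption-laden illustration, not a complete validator: the names StrictGrouping, patternFor and isWellFormed are hypothetical, only ASCII digits and a leading '-' are handled, and locales with unusual grouping rules (or whitespace grouping separators) would need more care.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical validator: accepts either no grouping at all or strict groups of three digits.
final class StrictGrouping {

    /** Builds the pattern from the locale's own grouping and decimal separators. */
    static Pattern patternFor(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        String group = Pattern.quote(String.valueOf(symbols.getGroupingSeparator()));
        String decimal = Pattern.quote(String.valueOf(symbols.getDecimalSeparator()));
        // Either 1-3 digits followed by one or more full groups, or plain digits;
        // then an optional fractional part that contains digits only.
        return Pattern.compile(
                "-?(\\d{1,3}(" + group + "\\d{3})+|\\d+)(" + decimal + "\\d+)?");
    }

    static boolean isWellFormed(Locale locale, String input) {
        return patternFor(locale).matcher(input).matches();
    }

    public static void main(String[] args) {
        Locale dutch = Locale.forLanguageTag("nl-NL");
        for (String s : new String[] {"900,00", "4000", "4.000", "4,000.00", "4.1.23.4567"}) {
            System.out.println(s + " -> " + isWellFormed(dutch, s));
        }
    }
}
```

Input that passes this check can then be handed to NumberFormat.parse, which takes care of the actual conversion.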

NumberFormat.format() returns inconsistent values

I am parsing multiple files in parallel, and from time to time, the format() method does not return the right value.
Number parse = numberFormat.parse(val);
String format = numberFormat.format(parse);
format.equals(val); // sometimes returns false
At first I thought it was due to the fact that the format method was not thread safe, but it was using a numberformat.clone() for each thread.
I also tried creating a new NumberFormat() for each thread, and also a ThreadLocal<NumberFormat>, with an initial value, and then calling the get() method, all with the same problem.
In the debugger, an evaluation of the expression always returns the right value at the breakpoint.
I tried repeating the line String format = numberFormat.format(parse); several times, and it turns out that, randomly, one or several of the lines return a completely wrong value while the others return the right one.
I'm 99% sure it's a thread issue, and a concurrent access is made to something, probably the numberFormat itself.
I might not have used the right way to make it thread safe, but in my understanding, using either clone() or new should get rid of that concern.
Any clues as to what is causing the issue, and how to fix it?
EDIT :
Here are two screen shots made with IntelliJ IDEA to showcase the issue :

Extend the NumberFormat class and synchronize the format method:
class SynchronizedNumberFormat extends NumberFormat {
    public synchronized String format(Number number) {
        return super.format(number);
    }
    //unimplemented methods...
}

There has never been any guarantee that a NumberFormat's format method will return exactly the same String as what you parsed the number from. In fact, many Strings can yield the same Number value.
First, consider trailing zeroes:
NumberFormat numberFormat = NumberFormat.getInstance();
Number parsed = numberFormat.parse("1.500000000000");
String formatted = numberFormat.format(parsed);
System.out.println(formatted); // prints 1.5
Second, NumberFormat doesn't parse a complete String the way Double.valueOf, Integer.valueOf, etc. do. It parses as much as it can from the start of the String and ignores trailing characters. The following are all valid operations that will parse successfully, without throwing a ParseException:
NumberFormat numberFormat = NumberFormat.getInstance();
numberFormat.parse("1.500000000000");
numberFormat.parse("1.5 ");
numberFormat.parse("1.5-----------");
numberFormat.parse("1.5helloworld");
All of the above calls return 1.5.
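If the goal is to check that a value survives a parse/format cycle, comparing numbers rather than Strings avoids both effects described above. A minimal sketch, assuming Locale.US and a hypothetical helper named sameValueAfterRoundTrip:

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical check: compares numeric values instead of the original and formatted text.
final class RoundTrip {

    /** Returns true if formatting the parsed value and parsing it again yields the same number. */
    static boolean sameValueAfterRoundTrip(NumberFormat format, String text) throws ParseException {
        Number parsed = format.parse(text);        // lenient: trailing characters are ignored
        String formatted = format.format(parsed);  // may differ textually from the input
        Number reparsed = format.parse(formatted);
        return parsed.doubleValue() == reparsed.doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        NumberFormat us = NumberFormat.getInstance(Locale.US);
        System.out.println(sameValueAfterRoundTrip(us, "1.500000000000"));
        System.out.println(sameValueAfterRoundTrip(us, "1.5helloworld"));
        System.out.println(sameValueAfterRoundTrip(us, "1.23456"));
    }
}
```

The last call is expected to report a mismatch, because the default instance formats with at most three fraction digits; raising that limit with setMaximumFractionDigits would be needed for longer fractions.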

How to divide the value of a field in a tmap in TALEND

I'm just getting started with Talend and I would like to know how to divide a value from a CSV file and round it if possible?
Here's my job layout:
And here's how my tMap is configured:

I assume the "/r" is to add a new line? That won't actually work and will instead add a string literal "/r" to whatever other string you're adding it to. You also don't need to do that because Talend will automatically start a new line at the end of the row of data for your tFileOutputDelimited.
But more importantly, you're attempting to call the divide method on a string which obviously doesn't exist (how would it be defined?).
You need to first parse the string as a numeric type (such as float/double/BigDecimal) and then divide by another numeric type (your Var1 is defined as a string in your example, so it will actually fail there too, because a string must be contained in quotes).
So typically you would either define the schema column that you are dividing as a numeric type (as mentioned) or you'd attempt to parse the string into a float in the tMap/tJavaRow component.
If you have your prices defined as something like a double before your tMap/tJavaRow operation that divides then you can use:
row1.prix2 / Var.var1
Or to round it to two decimal places:
(double)Math.round((row1.prix2 / Var.var1) * 100) / 100
You can also use a tConvertType component to explicitly convert between types where available. Alternatively you could parse the string as a double using:
Double.parseDouble(row1.prix2)
And then proceed to use that as previously described.
In your case though (according to your comment on Gabriele's answer), there is a further issue in that Java (and most programming languages) expect numbers to be formatted with a . for the decimal point. You need to add a pre-processing step to be able to parse your string as a double.
As this question's answers show, there are a couple of options. You can use a regex processing step to change all of your commas in that field to periods or you can use a tJavaRow to set your locale to French as you parse the double like so:
NumberFormat format = NumberFormat.getInstance(Locale.FRENCH);
Number number = format.parse(input_row.prix2);
double d = number.doubleValue();
output_row.nom = input_row.nom;
output_row.code = input_row.code;
output_row.date = input_row.date;
output_row.ref = input_row.ref;
output_row.ean = input_row.ean;
output_row.quantitie = input_row.quantitie;
output_row.prix1 = input_row.prix1;
output_row.prix2 = d;
And make sure to import the relevant libraries in the Advanced Settings tab of the tJavaRow component:
import java.text.NumberFormat;
import java.util.Locale;
Your output schema for the tJavaRow should be the same as the input but with prix2 being a double rather than a string.
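Outside Talend, the parse, divide and round steps can be tried in plain Java. This is only a sketch under assumptions: the class PrixDivision, the method divideAndRound and the sample divisor are hypothetical, and the input is assumed to contain no grouping separators.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical stand-alone version of the tJavaRow logic described above.
final class PrixDivision {

    /** Parses a French-formatted price, divides it, and rounds to two decimal places. */
    static double divideAndRound(String rawPrice, double divisor) throws ParseException {
        NumberFormat french = NumberFormat.getInstance(Locale.FRENCH);
        double price = french.parse(rawPrice).doubleValue(); // "10,5" becomes 10.5
        // Same rounding approach as in the tMap expression: scale, round, scale back.
        return (double) Math.round((price / divisor) * 100) / 100;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(divideAndRound("10,5", 4));
    }
}
```

For monetary values where exact decimal rounding matters, BigDecimal with an explicit RoundingMode may be a better fit than double arithmetic.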

Since var1 is defined as String you cannot apply the divide method. Try something like this for your output prix2 calculus:
(Float.parseFloat(row1.prix2)/2200f) + "Vr"
or something like that (I cannot read the text in the screenshot very well, actually)

Why is Double.parseDouble(scan.next()) so much faster than scan.nextDouble()?

So I have a file to read in and I know how the data will be set out. For example I know that the first token of each new line is going to be a double.
I had been using a Scanner and was simply using scan.nextDouble() to read in the double however I was told of Double.parseDouble(scan.next()) instead which sped up the process of reading in the data from the file from 30 seconds down to ~5 seconds.
The same happened with scan.nextInt() vs. Integer.parseInt(scan.next()).
In the file I was reading it went int double int int for each line for about 40,000 lines.
So what makes it so much faster?

It's because scan.nextDouble() cannot assume that the next token is a valid double, so it has to check the token first. For example, with
s = "abcde1234.5"
as the input, scan.nextDouble() throws an InputMismatchException, while Double.parseDouble(scan.next()) throws a NumberFormatException.
You will find more details in the source code.

The Scanner next<Type> methods are doing additional work besides simply reading in the next token and calling the appropriate parser. First they check against a regular expression that the token is valid for that type, then they massage it to deal with locale-specific bits (such as group separator, decimal separator, etc.), then finally pass that to the parser.
If you are sure that your input is in the exact format you describe and you don't need to account for any potential differences caused by the input coming from a different locale, etc., then by all means use the optimization you were informed of.
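The locale handling is what the extra work buys. A small sketch of the difference, assuming German-formatted input; the class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo contrasting the locale-aware Scanner path with plain Double.parseDouble.
final class ScannerLocaleDemo {

    /** Reads the first token with nextDouble(), which honours the scanner's locale. */
    static double readWithScanner(String text, Locale locale) {
        try (Scanner scanner = new Scanner(text).useLocale(locale)) {
            return scanner.nextDouble();
        }
    }

    /** Reads the first token as a raw String and parses it without any locale handling. */
    static double readWithParseDouble(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return Double.parseDouble(scanner.next());
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithScanner("3,14", Locale.GERMANY));
        System.out.println(readWithParseDouble("3.14"));
        try {
            readWithParseDouble("3,14");
        } catch (NumberFormatException e) {
            System.out.println("parseDouble rejects the comma: " + e.getMessage());
        }
    }
}
```

If the file is known to use '.' as the decimal separator and no grouping, the faster path loses nothing.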

Java Decimal Format - as much precision as given

I'm working with DecimalFormat, I want to be able to read and write decimals with as much precision as given (I'm converting to BigDecimal).
Essentially, I want a DecimalFormat which enforces the following pattern "\d+(\.\d+)?" i.e. "at least one digit then, optionally, a decimal separator followed by at least one digit".
I'm struggling to implement this using DecimalFormat. I've tried several patterns, but they all seem to enforce a fixed number of digits.
I'm open to alternative ways of achieving this too.
Edit:
For a little more background, I'm parsing user-supplied data in which decimals could be formatted in any way, and possibly not in the locale format. I'm hoping to let them supply a decimal format string which I can use to parse the data.

Since you noted in a comment that you need Locale support:
Locale locale = //get this from somewhere else
DecimalFormat df = new DecimalFormat();
df.setDecimalFormatSymbols(new DecimalFormatSymbols(locale));
df.setMaximumFractionDigits(Integer.MAX_VALUE);
df.setMinimumFractionDigits(1);
df.setParseBigDecimal(true);
And then parse.
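The parse step itself might look like the following sketch. The class name LocaleBigDecimalParser is hypothetical, and rejecting partially parsed input with an IllegalArgumentException is an assumption about the desired behaviour, not something the snippet above requires.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical wrapper around the DecimalFormat configuration shown above.
final class LocaleBigDecimalParser {

    /** Parses the whole input as a BigDecimal using the separators of the given locale. */
    static BigDecimal parse(String input, Locale locale) {
        DecimalFormat df = new DecimalFormat();
        df.setDecimalFormatSymbols(new DecimalFormatSymbols(locale));
        df.setParseBigDecimal(true); // keep all supplied digits instead of going through double

        ParsePosition position = new ParsePosition(0);
        Number parsed = df.parse(input, position);
        // parse returns null on failure; special values such as NaN are not returned as BigDecimal.
        if (!(parsed instanceof BigDecimal) || position.getIndex() != input.length()) {
            throw new IllegalArgumentException("Not a number: " + input);
        }
        return (BigDecimal) parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,2300", Locale.GERMANY));
        System.out.println(parse("0,1", Locale.GERMANY));
    }
}
```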

This seems to work fine:
public static void main(String[] args) throws Exception {
    DecimalFormat f = new DecimalFormat("0.#");
    f.setParseBigDecimal(true);
    f.setDecimalFormatSymbols(new DecimalFormatSymbols(Locale.US)); // if required
    System.out.println(f.parse("1.0")); // 1.0
    System.out.println(f.parse("1")); // 1
    System.out.println(f.parse("1.1")); // 1.1
    System.out.println(f.parse("1.123")); // 1.123
    System.out.println(f.parse("1.")); // 1
    System.out.println(f.parse(".01")); // 0.01
}
Except for the last two that violate your "at least one digit" requirement. You may have to check that separately using a regex if it's really important.
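One way to do that separate check is to validate against the pattern from the question before constructing the BigDecimal. This is a sketch that assumes '.' is the decimal separator and that signs are not allowed; the class name PlainDecimal is hypothetical.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Hypothetical strict parser for the "\d+(\.\d+)?" shape described in the question.
final class PlainDecimal {

    // At least one digit, optionally followed by a '.' and at least one more digit.
    private static final Pattern DECIMAL = Pattern.compile("\\d+(\\.\\d+)?");

    /** Returns the value with exactly the precision given, or throws if the shape is wrong. */
    static BigDecimal parse(String input) {
        if (!DECIMAL.matcher(input).matches()) {
            throw new NumberFormatException("Not a plain decimal: " + input);
        }
        return new BigDecimal(input); // BigDecimal keeps trailing zeros, e.g. "1.0" stays 1.0
    }

    public static void main(String[] args) {
        System.out.println(parse("1.0"));
        System.out.println(parse("1.123"));
        for (String bad : new String[] {"1.", ".01"}) {
            try {
                parse(bad);
            } catch (NumberFormatException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```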
