There are some related questions on SO but I think this one is actually unique. I'm working on a system where we expect to get decimal input from users in their own locale.
We have a user in Canada who is used to entering numbers in French-style (e.g. 12,5 is a number between 12 and 13) and when they enter that number into our system while in the "French" locale, it gets parsed as a number between 12 and 13. When switching into English and entering "12,5", the number gets parsed as 125, ten times as large as expected.
The problem is that the set of number format symbols being used is different and has bearing on how the parsing is performed. In French, the , character causes the parser to switch from parsing the whole-part to parsing the decimal-part. In English, the , is an ignorable "grouping separator".
I'd like to throw an error when an English user enters "12,5" because we only expect a comma when it is followed by exactly 3 digits and then another separator or end-of-input.
I want to be very strict, here, because I want to remove the ambiguity from a number like "12,5" in English. Was that a fat-finger and it should have been "12.5", or is that a French-language user thinking that they are entering a number between 12 and 13? I'd like to throw an error and make sure that the user is entering exactly the type of input that the software is expecting to parse.
Is there a way to get DecimalFormat to do this for me, or do I have to roll my own additional validation?
Related
I want to convert the different number formats used across Europe to one format (double), but it doesn't seem to work.
Locale locale = new Locale("nl", "NL");
NumberFormat nf= NumberFormat.getNumberInstance(locale);
return nf.parse("4,000.00").doubleValue();
It returns 4.000 instead of 4000.0, but when I enter nf.parse("900,00") it works (returns 900.0).
Another time I enter 4000 and it converts to 4000.0 (expected).
So now I am left with inconsistent types.
I want to convert each number to the same double format. can you guide me?
now I am left with inconsistent types
This is incorrect. The behaviour is entirely consistent and according to spec.
In Dutch, the comma is the wholes/fractions separator: There can be only one, and everything to the left is the wholes, and to the right of it, the fractional part. The dot is the thousands separator.
900,00
This is parsed as nine hundred, whole. 900 is to the left of the comma - so those are the wholes. 00 is the fractional part, which is nothing, so, you end up with 900. As expected - a Dutch person reading 900,00 would assume it says 'nine hundred'.
4000
Obviously, that's four thousand. No problems there.
4,000.00
That's 4,000 - i.e. four, with 000 as fractional part, and that is how this is parsed. The .00 isn't parsed at all.
Wait, what?
NumberFormat is designed to parse multiple numbers from a stream of text. Even the .parse(string) version of it. Here, try it:
Locale locale = new Locale("nl", "NL");
NumberFormat nf= NumberFormat.getNumberInstance(locale);
System.out.println(nf.parse("4,000hey now this is strange").doubleValue());
works and runs fine, and prints '4.0'.
Fixing it
If you really want to fix it, you have a few strategies. One of them is to first verify that the entire input is valid (e.g. with a regular expression) and only then parse it.
Another option is to explicitly check that the whole input is consumed. You can do that:
String input = "4,000.00";
ParsePosition ps = new ParsePosition(0);
Locale locale = new Locale("nl", "NL");
NumberFormat nf = NumberFormat.getNumberInstance(locale);
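// parse(String, ParsePosition) returns null instead of throwing when nothing can be parsed
Number n = nf.parse(input, ps);
if (n == null || ps.getIndex() != input.length()) throw new IllegalArgumentException("Not a number: " + input);
double v = n.doubleValue();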
The above code parses 900,00 as nine hundred, parses 4000 as four thousand, same for 4.000, and throws an exception if you attempt to toss 4,000.00 at it. Which is, indeed, not a valid anything in the Dutch locale.
I want something that parses both 4,000.00 as 4000, but also 900,00 as 900.
That is highly inconsistent and implies you want 4,000 to be parsed as 4 and yet 4,000.00 as 4000. If you want this, you're on your own and have to write it from scratch, no built in library (or, as far as I know, any external one) would do such utter befuddled inconsistent craziness.
NB: Note that the snippet would parse 4.000.00 as 400000 and works fine; inconsistent application of thousands separators is leniently parsed by NumberFormat and you can't tell it to be strict. In fact, 4.1.23.4567 is parsed as 41234567 - the only reason 4,000.00 is not parsed in the first place is because dots are not allowed in the fractional part at all. If you don't want that, you're again stuck, you can't use NumberFormat then. Regexes maybe, but you're now on the hook for writing one for each locale you care to support.
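If you do go the regex route, you may not have to hand-write one per locale: the separators can be read from DecimalFormatSymbols and the pattern built from them. The following is only a minimal sketch under a few assumptions: the class name StrictDecimalParser is made up for illustration, only ASCII digits and an optional leading '-' are accepted, and grouping is assumed to always be in blocks of three (which is not true for every locale).

```java
import java.math.BigDecimal;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
class StrictDecimalParser {

    static BigDecimal parse(String input, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        char grouping = symbols.getGroupingSeparator();
        char decimal = symbols.getDecimalSeparator();

        String g = Pattern.quote(String.valueOf(grouping));
        String d = Pattern.quote(String.valueOf(decimal));
        // Either properly grouped digits (1-3 digits, then groups of exactly 3)
        // or plain digits, optionally followed by one fractional part.
        String regex = "-?(\\d{1,3}(" + g + "\\d{3})+|\\d+)(" + d + "\\d+)?";

        if (!input.matches(regex)) {
            throw new IllegalArgumentException("Not a number: " + input);
        }
        // Drop the grouping separators first, then normalise the decimal separator.
        String normalized = input.replace(String.valueOf(grouping), "").replace(decimal, '.');
        return new BigDecimal(normalized);
    }

    public static void main(String[] args) {
        Locale dutch = Locale.forLanguageTag("nl-NL");
        System.out.println(parse("4.000", dutch));
        System.out.println(parse("900,00", dutch));
        System.out.println(parse("1,234.5", Locale.ENGLISH));
    }
}
```

With this, 4,000.00 and 4.1.23.4567 are both rejected in the Dutch locale, and 12,5 is rejected in English, because the whole input has to match before anything is converted.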
I am currently developing a Spring Boot application that is expected to store/show pricing and inventory data. For numeric values, we have been instructed to use the following number formats:
Cost Prices up to 4 decimal places in International Format (e.g. 233,445.6700)
Selling Prices up to 2 decimal places in International Format (e.g. 23,500.50)
Remaining numbers that are not amounts or prices in International Format (e.g. 123,456)
Now, there are quite a number of ways I could go about this. In lists shown in non-editable tables, I have used DecimalFormat with the following patterns:
FOUR_DECIMAL_FORMAT = "###,##0.0000"
TWO_DECIMAL_FORMAT = "###,##0.00"
INTERNATIONAL_NUMBER_FORMAT = "###,###"
I am faced with a dilemma when handling these numbers inside a jQuery-edit-Table or a regular form. I can show these numbers with the proper formatting using the above-mentioned DecimalFormat patterns. But when I receive those numbers, I don't know an approach other than removing the separators from each numeric value and then storing them. I don't want to do that since it seems like a pain to do for every numeric value.
What should be the ideal approach for handling the numbers here?
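One possible direction, sketched here in plain Java: the same DecimalFormat pattern that formats a value can also parse it back, so the separators never need to be stripped by hand. The class name PriceFormats and its two methods are made up for illustration, Locale.US is assumed to be what "International Format" means here, and values are assumed to be stored as BigDecimal. Spring's @NumberFormat annotation may cover the binding side, but that is not shown.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper that keeps formatting and parsing in one place.
class PriceFormats {
    static final String FOUR_DECIMAL_FORMAT = "###,##0.0000";
    static final String TWO_DECIMAL_FORMAT = "###,##0.00";

    // DecimalFormat is not thread-safe, so a fresh instance is created per call.
    private static DecimalFormat newFormat(String pattern) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        df.setParseBigDecimal(true); // parse to BigDecimal rather than Long/Double
        return df;
    }

    static String format(BigDecimal value, String pattern) {
        return newFormat(pattern).format(value);
    }

    static BigDecimal parse(String text, String pattern) {
        String trimmed = text.trim();
        ParsePosition pos = new ParsePosition(0);
        Number parsed = newFormat(pattern).parse(trimmed, pos);
        // Reject input that is not a number or has trailing garbage.
        if (!(parsed instanceof BigDecimal) || pos.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a number: " + text);
        }
        return (BigDecimal) parsed;
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("23500.5"), TWO_DECIMAL_FORMAT)); // 23,500.50
        System.out.println(parse("233,445.6700", FOUR_DECIMAL_FORMAT));
    }
}
```

Note that DecimalFormat parsing is lenient about where grouping separators sit and how many decimals are given, so this only removes the manual stripping; it does not validate that the user typed exactly the display format.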
I'm developing a desktop application with Java, and right now I'm at the point of registering person data. One of the fields of the person form is "DocumentTextField", which holds the identification document type and number. That's why I tried to use a JFormattedTextField mask, to help the user with the format of this field.
Basically, I just used the AbstractFormatterFactory to create the mask:
Mask = UU - ########## to get something like (PP-0123456789)
It works perfectly on the fly: the user just types "pp0123456789" and the mask turns it into "PP-0123456789". The problem is the length of the number. As you can see in my mask, I declare 10 digits (##########), but in fact there could be fewer than 10 digits, or even more. It only works with exactly 10 digits: if the user types fewer than 10 digits, the JFormattedTextField resets to empty, and the same thing happens if the user types more than 10 digits.
Is there any way to declare a range for the number length? Some documents have just 5 digits (PP-01234).
Thank you so much in advance for reading this and trying to help.
I assume you're using Java 8 for your development. Are you seeing any kind of ParseException?
As per the documentation of the component: https://docs.oracle.com/javase/8/docs/api/javax/swing/text/MaskFormatter.html
When initially formatting a value if the length of the string is less than the length of the mask, two things can happen. Either the placeholder string will be used, or the placeholder character will be used. Precedence is given to the placeholder string.
According to the example:
MaskFormatter formatter = new MaskFormatter("###-####");
formatter.setPlaceholderCharacter('_');
The setPlaceholderCharacter method can help you with your problem.
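If the placeholder is not enough because the number of digits really varies, one alternative is to drop MaskFormatter (whose mask has a fixed length) and plug a small regex-based formatter into the JFormattedTextField instead. This is only a sketch: DocumentIdFormatter is a made-up class, and the 5 to 12 digit range is an assumption that should be adjusted to the real document rules.

```java
import java.text.ParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.DefaultFormatter;

// Hypothetical formatter: two letters, an optional dash, then 5 to 12 digits (assumed range).
class DocumentIdFormatter extends DefaultFormatter {
    private static final Pattern DOCUMENT = Pattern.compile("([A-Za-z]{2})-?(\\d{5,12})");

    @Override
    public Object stringToValue(String text) throws ParseException {
        Matcher m = DOCUMENT.matcher(text == null ? "" : text.trim());
        if (!m.matches()) {
            throw new ParseException("Expected two letters followed by 5 to 12 digits", 0);
        }
        // Normalise to upper-case letters and a single dash, e.g. "pp01234" becomes "PP-01234".
        return m.group(1).toUpperCase(Locale.ROOT) + "-" + m.group(2);
    }

    @Override
    public String valueToString(Object value) {
        return value == null ? "" : value.toString();
    }

    // Usage inside a form (assumption: wired where the field is created):
    //   JFormattedTextField field = new JFormattedTextField(new DocumentIdFormatter());
}
```

Unlike the mask, this does not insert the dash while the user is typing; the text is normalised when the field commits its value, for example on focus loss.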
I want to take input in the name form like two-third or one-fifth and I want my system to convert it into numerical form and give the answer.
Que: two-third of thirty is?
The system should output 20
How can I program it?
As a general problem natural language processing (NLP) - which is what you're talking about - is a difficult open-ended problem.
There are lots of libraries for this stuff. If you want background look here:
Is there a good natural language processing library
Or look up Natural Language Processing in Wikipedia.
However you said you want to do this and you're new to programming.
The first thing you need to do is break the problem down. That's how we solve programming problems.
So first try writing a program that can read a string containing a single word and map it to a number.
For example "One" outputs 1, "Two" outputs 2, "Thirty" outputs 30.
Next try and write a program that cuts a string into its constituent words.
You probably want to use an array here.
That's a process called tokenizing and Java has a built in StringTokenizer to do that.
You might want to code that yourself, but you're learning and it might be the moment to start learning using library code.
When you've got those try combining them so your program can convert "Thirty Seven" into 37 (i.e. numbers under 100).
That new program should combine the ideas of your program that can convert "Thirty" and "Seven" and the one that can split words up.
This is the other thing we do in programming - combining things.
We break it down to smaller problems solve them and then build them back up to solve the bigger problems.
(I apologize if I'm patronizing you but I have no idea of your experience).
After that you might add logic that handles "Five Hundred And Thirty Seven".
Again, notice how spotting Five followed by Hundred is like converting Five and then finding a token that tells you to multiply what you just saw by 100.
You could go on to handle Thousands, Hundred Thousand etc.
Or you could branch off into the fractions.
That's similar but you just have a different vocabulary.
Seven Forty-Seconds = 7/42.
As a learning challenge I would suggest you'll have come a long way if your program handles things like "forty two ninety-thirds of eight hundred and eighty nine".
The easy solution outputs roughly 401.48 - the floating point answer to (42/93)*889.
The extra credit solution outputs 12446/31 - (42/93)*889 can be simplified as a rational number to 12446/31.
To be honest, you'll be doing well if you can handle "nine-ninths of ninety nine".
Notice that the first word is the numerator (n). The second is the denominator (d). The third is always 'of'. The fourth word is either the tens (t) or the units (u). If the fourth was the units you're done, otherwise if there is a fifth word it's the units.
The answer in that case is n/d*(t*10+u). If the tens or units are missing they're zero - obviously.
PS: You might need special handling for zero if you object to someone typing in ninety zero. It obviously means ninety but we don't say it in English!
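To make the last case concrete, here is a rough sketch of the "numerator, denominator, of, number under 100" shape. Everything in it is an assumption for illustration: the class name FractionWords, the small vocabulary, and the shortcut of simply adding up the number words after 'of' (so it does not reject odd input like "nine ninety").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: handles phrases such as "two-third of thirty".
class FractionWords {
    private static final Map<String, Integer> CARDINALS = new HashMap<>();
    private static final Map<String, Integer> DENOMINATORS = new HashMap<>();

    static {
        String[] small = {"zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < small.length; i++) CARDINALS.put(small[i], i);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) CARDINALS.put(tens[i], (i + 2) * 10);
        String[] parts = {"half", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth"};
        for (int i = 0; i < parts.length; i++) DENOMINATORS.put(parts[i], i + 2);
        DENOMINATORS.put("quarter", 4);
    }

    static double evaluate(String phrase) {
        // Tokenize: hyphens count as spaces, so "two-third" becomes two words.
        String[] words = phrase.toLowerCase(Locale.ROOT).replace('-', ' ').trim().split("\\s+");
        if (words.length < 4 || !words[2].equals("of")) {
            throw new IllegalArgumentException("Expected '<n> <d> of <number>': " + phrase);
        }
        int n = lookup(CARDINALS, words[0]);
        int d = lookup(DENOMINATORS, singular(words[1]));
        int whole = 0; // tens plus units, e.g. "ninety nine" gives 90 + 9
        for (int i = 3; i < words.length; i++) whole += lookup(CARDINALS, words[i]);
        return (double) n * whole / d;
    }

    private static String singular(String word) {
        if (word.equals("halves")) return "half";
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    private static int lookup(Map<String, Integer> map, String word) {
        Integer value = map.get(word);
        if (value == null) throw new IllegalArgumentException("Unknown word: " + word);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("two-third of thirty"));        // 20.0
        System.out.println(evaluate("nine-ninths of ninety nine")); // 99.0
    }
}
```

Hundreds, "and", and simplifying the result to a rational number are left out on purpose; they are the next steps described above.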
You could try a mapping from
one ->1
two ->2
three ->3
four ->4
and so on
and on the other hand:
half ->2
third ->3
fourth ->4
Then create a double by dividing the first value by the second.
Finally, multiply this value by the third (you can use the first mapping for this value) and you have the result.
It is not entirely easy, though, because you have to build the mapping between string and int manually.
What is the best way for converting phone numbers into international format (E.164) using Java?
Given a 'phone number' and a country id (let's say an ISO country code), I would like to convert it into a standard E.164 international format phone number.
I am sure I can do it by hand quite easily - but I would not be sure it would work correctly in all situations.
Which Java framework/library/utility would you recommend to accomplish this?
P.S. The 'phone number' could be anything identifiable by the general public - such as
* (510) 786-0404
* 1-800-GOT-MILK
* +44-(0)800-7310658
that last one is my favourite - it is how some people write their number in the UK and means that you should either use the +44 or you should use the 0.
The E.164 format number should be all numeric, and use the full international country code (e.g.+44)
Google provides a library for working with phone numbers. The same one they use for Android
http://code.google.com/p/libphonenumber/
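String swissNumberStr = "044 668 18 00";
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
// Declared outside the try block so it is still in scope for the format calls below
PhoneNumber swissNumberProto;
try {
    // "CH" is the region assumed when the number carries no country code
    swissNumberProto = phoneUtil.parse(swissNumberStr, "CH");
} catch (NumberParseException e) {
    System.err.println("NumberParseException was thrown: " + e.toString());
    return; // nothing to format if parsing failed
}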
// Produces "+41 44 668 18 00"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.INTERNATIONAL));
// Produces "044 668 18 00"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.NATIONAL));
// Produces "+41446681800"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.E164));
Speaking from experience at writing this kind of thing, it's really difficult to do with 100% reliability. I've written some Java code to do this that is reasonably good at processing the data we have but won't be applicable in every country. Questions you need to ask are:
Are the character to number mappings consistent between countries? The US uses a lot of this (eg 1800-GOT-MILK) but in Australia, as one example, it's pretty rare. What you'd need to do is ensure that you were doing the correct mapping for the country in question if it varies (it might not). I don't know what countries that use different alphabets (eg Cyrillic in Russia and the former Eastern bloc countries) do;
You have to accept that your solution will not be 100% and you should not expect it to be. You need to take a "best guess" approach. For example, there's no real way of knowing that 132345 is a valid phone number in Australia, as is 1300 123 456 but that these are the only two patterns that are for 13xx numbers and they're not callable from overseas;
You also have to ask if you want to validate regions (area codes). I believe the US uses a system where the second digit of the area code is a 1 or a 0. This may have once been the case but I'm not sure if it still applies. Whatever the case, many other countries will have other rules. In Australia, the valid area codes for landlines and mobile (cell) phones are two digits (the first is 0). 08, 03 and 04 are all valid. 01 isn't. How do you cater for that? Do you want to?
Countries use different conventions no matter how many digits they're writing. You have to decide if you want to accept something other than the "norm". These are all common in Australia:
(02) 1234 5678
02 1234 5678
0411 123 123 (but I've never seen 04 1112 3456)
131 123
13 1123
1 300 123 123
1300 123 123
02-1234-5678
1300-234-234
+44 78 1234 1234
+44 (0)78 1234 1234
+44-78-1234-1234
+44-(0)78-1234-1234
0011 44 78 1234 1234 (0011 is the standard international dialling code)
(44) 078 1234 1234 (not common)
And that's just off the top of my head. For one country. In France, for example, it's common to write the phone number in number pairs (12 34 56 78) and they pronounce it that way too. Instead of:
un (one), deux (two), trois (three), ...
it's
douze (twelve), trente-quatre (thirty four), ...
Do you want to cater for that level of cultural difference? I would assume not but the question is worth considering just in case you make your rules too strict.
Also some people may append extension numbers on phone numbers, possibly with "ext" or similar abbreviation. Do you want to cater for that?
Sorry, no code here. Just a list of questions to ask yourself and issues to consider. As others have said, a series of regular expressions can do much of the above but ultimately phone number fields are (mostly) free form text at the end of the day.
This was my solution:
public static String FixPhoneNumber(Context ctx, String rawNumber)
{
    String fixedNumber = "";
    // get current location iso code
    TelephonyManager telMgr = (TelephonyManager) ctx.getSystemService(Context.TELEPHONY_SERVICE);
    String curLocale = telMgr.getNetworkCountryIso().toUpperCase();
    PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    Phonenumber.PhoneNumber phoneNumberProto;
    // gets the international dialling code for our current location
    String curDCode = String.format("%d", phoneUtil.getCountryCodeForRegion(curLocale));
    String ourDCode = "";
    if(rawNumber.indexOf("+") == 0)
    {
        int bIndex = rawNumber.indexOf("(");
        int hIndex = rawNumber.indexOf("-");
        int eIndex = rawNumber.indexOf(" ");
        if(bIndex != -1)
        {
            ourDCode = rawNumber.substring(1, bIndex);
        }
        else if(hIndex != -1)
        {
            ourDCode = rawNumber.substring(1, hIndex);
        }
        else if(eIndex != -1)
        {
            ourDCode = rawNumber.substring(1, eIndex);
        }
        else
        {
            ourDCode = curDCode;
        }
    }
    else
    {
        ourDCode = curDCode;
    }
    try
    {
        phoneNumberProto = phoneUtil.parse(rawNumber, curLocale);
    }
    catch (NumberParseException e)
    {
        return rawNumber;
    }
    if(curDCode.compareTo(ourDCode) == 0)
        fixedNumber = phoneUtil.format(phoneNumberProto, PhoneNumberFormat.NATIONAL);
    else
        fixedNumber = phoneUtil.format(phoneNumberProto, PhoneNumberFormat.INTERNATIONAL);
    return fixedNumber.replace(" ", "");
}
I hope this helps someone with the same problem.
Enjoy and use freely.
Thanks for the answers. As stated in the original question, I am much more interested in the formatting of the number into the standard format than I am in determining if it is a valid (as in genuine) phone number.
I have some hand crafted code currently that takes a phone number String (as entered by the user) and a source country context and target country context (the country from where the number is being dialed, and the country to where the number is being dialed - this is known to the system) and then does the following conversion in steps
Strip all whitespace from the number
Translate all alpha into digits - using a lookup table of letter to digit (e.g. A-->2, B-->2, C-->2, D-->3) etc. for the keypad (I was not aware that some keypads distribute these differently)
Strip all punctuation - keeping a preceding '+' intact if it exists (in case the number is already in some sort of international format).
Determine if the number has an international dialling prefix for the country context - e.g. if source context is the UK, I would see if it starts with a '00' - and replace it with a '+'. I do not currently check whether the digits following the '00' are followed by the international dialing code for the target country. I look up the international dialing prefix for the source country in a lookup table (e.g. GB-->'00', US-->'011' etc.)
Determine if the number has a local dialing prefix for the country context - e.g. if the source context is the UK, I would look to see if it starts with a '0' - and replace it with a '+' followed by the international dialing code for the target country. I look up the local dialing prefix for the source country in a lookup table (e.g. GB-->'0', US-->'1' etc.), and the international dialing code for the target country in another lookup table (e.g.'GB'='44', US='1')
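For anyone wanting to see the steps above in code form, here is a rough sketch rather than my actual code. The class name E164Sketch is made up, the lookup tables only hold the GB and US values mentioned above, and the final fallback (prepend the target country code when no prefix matches) is an assumption that the steps do not spell out.

```java
import java.util.Map;

// Hypothetical sketch of the stepwise conversion; the tables are deliberately tiny.
class E164Sketch {
    static final Map<String, String> INTL_PREFIX = Map.of("GB", "00", "US", "011");
    static final Map<String, String> LOCAL_PREFIX = Map.of("GB", "0", "US", "1");
    static final Map<String, String> COUNTRY_CODE = Map.of("GB", "44", "US", "1");
    // Keypad digit for each letter A..Z (ABC=2, DEF=3, ..., PQRS=7, TUV=8, WXYZ=9).
    static final String KEYPAD = "22233344455566677778889999";

    static String toE164(String raw, String sourceCountry, String targetCountry) {
        String intl = INTL_PREFIX.get(sourceCountry);
        String local = LOCAL_PREFIX.get(sourceCountry);
        String code = COUNTRY_CODE.get(targetCountry);
        if (intl == null || local == null || code == null) {
            throw new IllegalArgumentException("Unknown country context");
        }

        StringBuilder digits = new StringBuilder();
        for (char c : raw.toCharArray()) {
            char upper = Character.toUpperCase(c);
            if (upper >= 'A' && upper <= 'Z') {
                digits.append(KEYPAD.charAt(upper - 'A'));   // letters become keypad digits
            } else if (c >= '0' && c <= '9') {
                digits.append(c);
            } else if (c == '+' && digits.length() == 0) {
                digits.append(c);                            // keep only a leading '+'
            }
            // whitespace and all other punctuation are dropped
        }

        String number = digits.toString();
        if (number.startsWith("+")) return number;           // already international
        if (number.startsWith(intl)) return "+" + number.substring(intl.length());
        if (number.startsWith(local)) return "+" + code + number.substring(local.length());
        return "+" + code + number;                          // assumption: treat as a national number
    }

    public static void main(String[] args) {
        System.out.println(toE164("1-800-GOT-MILK", "US", "US"));    // +18004686455
        System.out.println(toE164("0044 20 7946 0000", "GB", "GB")); // +442079460000
    }
}
```

The sketch does not handle the bracketed (0) in numbers like +44-(0)800-7310658; the zero is simply kept as a digit.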
It seems to work for everything I have thrown at it so far - except for the +44(0)1234-567-890 situation - I will add a special case check for that one.
Writing it was not hard - and I can add special cases for each strange exception I come across. But I would really like to know if there is a standard solution.
The phone companies seem to deal with this thing every day. I never get inconsistent results when dialing numbers using the PSTN. For example, in the US (where mobile phones have the same area codes as landlines), I could dial +1-123-456-7890, or 011-1-123-456-7890 (where 011 is the international dialing prefix in the US and 1 is the international dialing code for the US), 1-123-456-7890 (where 1 is the local dialing prefix in the US) or even 456-7890 (assuming I was in the 123 area code at the time) and get the same results each time. I assume that internally these dialed numbers get converted to the same E.164 standard format, and that the conversion is all done in software.
To be honest, it sounds like you've got most of the bases covered already.
The +44(0)800 format sometimes (incorrectly) used in the UK is annoying and isn't strictly valid according to E.123, which is the ITU-T recommendation for how numbers should be displayed. If you haven't got a copy of E.123 it's worth a look.
For what it's worth, the telephone network itself doesn't always use E.164. Often there'll be a flag in the ISDN signalling generated by the PBX (or in the network if you're on a steam phone) which tells the network whether the number being dialled is local, national or international.
In some countries you can validate 112 as a valid phone number, but if you stick a country code in front of it it won't be valid any more. In other countries you can't validate 112 but you can validate 911 as a valid phone number.
I've seen some phones that put Q on the 7 key and Z on the 9 key. I've seen some phones that put Q and Z on the 0 key, and some that put Q and Z on the 1 key.
An area code that existed yesterday might not exist today, and vice-versa.
In half of North America (country code 1), the second digit rule used to be 0 or 1 for area codes, but that rule went away 10 years ago.
I'm not aware of a standard library or framework available for formatting telephone numbers into E.164.
The solution used for our product, which requires formatting PBX provided caller-id into E.164, is to deploy a file (database table) containing the E.164 format information for all countries applicable.
This has the advantage that the application can be updated (to handle all the strange corner cases in various PSTN networks) w/out requiring changes to the production code base.
The table contains a row for each country code and information regarding area code length and subscriber length. There may be multiple entries for a country depending on what variations are possible with area code and subscriber number lengths.
Using the New Zealand PSTN (partial) dial plan as an example of the table:
CC   AREA_CODE   AREA_CODE_LENGTH   SUBSCRIBER   SUBSCRIBER_LENGTH
64               1                               7
64   21          2                               7
64   275         3                               6
We do something similar to what you have described, i.e. strip the provided telephone number of any non-digit characters and then format based on various rules regarding overall number plan length, outside access code, and long distance/international access codes.