Regex for matching different float formats - java

I'm looking for a regex in Scala to match several float formats:
9,487,346 -> should match
9.487.356,453 -> should match
38,4 -> should match
-38,4 -> should match
-38.5 -> should match
-9,487,346.76 -> should match
-38 -> should match
So basically it should match a number that:
possibly has thousands separators (either comma or dot)
possibly has a decimal part, again with either comma or dot as separator
Currently I'm stuck with
val pattern="\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)"
Edit: I'm mostly concerned with European notation.
Example where the current pattern does not match: 1,052,161
I guess it would be close enough to match that the string only contains numbers, sign, comma and dot.

If, as your edit suggests, you are willing to accept a string that simply "contains numbers, sign, comma and dot" then the task is trivial.
[+-]?\d[\d.,]*
Update:
After thinking it over, and considering some options, I realize that your original request is possible if you'll allow for 2 different RE patterns, one for US-style numbers (commas before dot) and one for Euro-style numbers (dots before comma).
def isValidNum(num: String): Boolean =
  num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?") ||
  num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?")
Note that the thousand separators are not optional, so a number like "1234" is not evaluated as valid. That can be changed by adding more RE patterns: || num.matches("[+-]?\\d+")
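For readers working in Java rather than Scala, the same idea can be sketched as follows. This is only a port of the two patterns above plus the optional plain-digits pattern; the class and constant names are made up for illustration.

```java
import java.util.regex.Pattern;

final class TwoStyleNumberCheck {
    // US style: commas group thousands, a dot starts the decimal part.
    private static final Pattern US_STYLE =
            Pattern.compile("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?");
    // European style: dots group thousands, a comma starts the decimal part.
    private static final Pattern EURO_STYLE =
            Pattern.compile("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?");
    // Plain digits without any separator, such as "1234".
    private static final Pattern PLAIN_DIGITS = Pattern.compile("[+-]?\\d+");

    // matches() requires the whole input to match, so no ^ or $ is needed.
    static boolean isValidNum(String num) {
        return US_STYLE.matcher(num).matches()
                || EURO_STYLE.matcher(num).matches()
                || PLAIN_DIGITS.matcher(num).matches();
    }
}
```

Precompiling the patterns avoids recompiling them on every call, which is what String.matches does internally.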

Based on your rules, it should match a number that:
possibly has thousands separators (either comma or dot)
possibly has a decimal part, again with either comma or dot as separator
Regex:
^[+-]?\d{1,3}(?:[,.]\d{3})*(?:[,.]\d+)?$
[+-]? Allows + or - or nothing at the start
\d{1,3} allows one to 3 digits
(?:[,.]\d{3})* allows . or , as thousands separator followed by 3 digits (* allows unlimited such matches)
(?:[,.]\d+)? allows . or , as decimal separator followed by at least one digit.
This matches all of the OP's example cases. Take a look at the demo below for more:
Regex101 Demo
However, one limitation is that it allows . or , as both thousands separator and decimal separator, and doesn't check that if , is the thousands separator then . must be the decimal separator. As a result, the cases below incorrectly show up as matches:
201,350,780,88
211.950.266.4
To fix this as well, the previous regex can have 2 alternatives - one to check for a notation that has , as thousands separator and . as decimal, and another one to check vice-versa. Regex:
^[+-]?\d{1,3}(?:(?:(?:\.\d{3})*(?:\,\d+)?)|(?:(?:\,\d{3})*(?:\.\d+)?))$
Regex101 Demo
Hope this helps!
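The difference between the two regexes can be checked with a small Java sketch. The class and method names below are hypothetical; the patterns are the two from this answer, with backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

final class SeparatorConsistencyDemo {
    // First regex: either separator may appear in either role.
    private static final Pattern LOOSE =
            Pattern.compile("^[+-]?\\d{1,3}(?:[,.]\\d{3})*(?:[,.]\\d+)?$");
    // Second regex: dots then a comma, or commas then a dot.
    private static final Pattern STRICT = Pattern.compile(
            "^[+-]?\\d{1,3}(?:(?:(?:\\.\\d{3})*(?:,\\d+)?)|(?:(?:,\\d{3})*(?:\\.\\d+)?))$");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }
}
```

With these definitions, loose("201,350,780,88") is true while strict("201,350,780,88") is false, and both accept "9.487.356,453".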

Related

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespace around the comma
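As a rough illustration of how these patterns behave, here is a Java sketch. Java's regex engine does not accept the bracketed POSIX form [[:alnum:]]; it spells the same ASCII class as \p{Alnum}, which is what the second pattern uses. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class AlnumListCheck {
    // Items separated by a bare comma, no whitespace allowed.
    private static final Pattern NO_SPACES =
            Pattern.compile("[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*");
    // Same list, but whitespace is tolerated around each comma.
    private static final Pattern WITH_SPACES =
            Pattern.compile("\\p{Alnum}+(\\s*,\\s*\\p{Alnum}+)*");

    static boolean isListNoSpaces(String s) {
        return NO_SPACES.matcher(s).matches();
    }

    static boolean isListWithSpaces(String s) {
        return WITH_SPACES.matcher(s).matches();
    }
}
```

The question's first example contains a space after each comma, so only the whitespace-tolerant variant accepts it as written.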
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles zero or more whitespace characters after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.
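The two caveats above are easy to confirm with a short Java check. This is only a sketch; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class LenientListCheck {
    // The pattern from this answer: the comma after each item is optional.
    private static final Pattern LENIENT =
            Pattern.compile("^([a-zA-Z0-9]+,?\\s*)+$");

    static boolean accepts(String s) {
        return LENIENT.matcher(s).matches();
    }
}
```

Here accepts("123 123 abc") and accepts("123,") both return true, while the question's invalid example "12333, 78787&*, GH778" is still rejected.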
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespace at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778"]
>>> r = r'^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Matching zero times allows for input with no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+$
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and itself contains comma-repeated items etc., it might be a good idea not to build the regexp by hand and instead rely on a computer to do that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a Xen configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
import re

xend_fudge_item_re = r"""
    e[a-d]x=            #register of the call return value to fudge
    (
        0x[0-9A-F]+ |   #either hardcode the reply
        [10xks]{32}     #or edit the bitfield directly
    )
"""
xend_string_item_re = r"""
    (0x)?[0-9A-F]+:     #leafnum (the contents of EAX before the call)
    %s                  #one fudge
    (,%s)*              #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
    \[                  #a list of
    '%s'                #string elements
    (,'%s')*            #repeated multiple times
    \]
    $                   #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
Matching a word boundary makes sure that a comma is required when more items follow in the string.
Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set the maximum word size to 45, as the longest word in English is 45 characters; this can be changed as required.

Extract exactly n digits in a sentence using REGEX

Example
The no.s 1234 65
Input: n
For n=4, the output should be 1234
For n=2, the output should be : 65 (not 12)
I tried \d{n}, which gives 12, and \d{n,}, which gives 1234, but I want only the exact match.
Pattern p = Pattern.compile("\\d{" + n + ",}");
You need negative lookaround assertions: (?<!..) is a negative lookbehind and (?!..) is a negative lookahead (regex101):
(?<!\d)\d{4}(?!\d)
However, not all regex engines support them. A workaround is to also match the preceding and following characters (unlike lookarounds, which are zero-width matches); \D matches anything except a digit:
(?:^|\D)(\d{4})(?:\D|$)
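Since the question builds the pattern in Java with a variable n, here is a minimal sketch of the lookaround version with n spliced in by string concatenation. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExactDigitRuns {
    // Returns every run of exactly n digits, i.e. n digits that are
    // neither preceded nor followed by another digit.
    static List<String> find(String text, int n) {
        Pattern p = Pattern.compile("(?<!\\d)\\d{" + n + "}(?!\\d)");
        Matcher m = p.matcher(text);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }
}
```

For the example sentence "The no.s 1234 65", n=4 yields only 1234 and n=2 yields only 65.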
I think what you are looking for is the \b word-boundary anchor.
Hence, the regex you're looking for would be (for n=2):
\b\d{2}\b
From what I understand, you're looking for a regex that will match a number in a string which has n digits, taking into account the spacing between the numbers. If that's the case, you're looking for something like this:
\b\d{4}\b
The \b ensures the match is constrained to the start/end of a 'word', where a word boundary is the position between anything matched by \w (which includes digits) and anything matched by the opposite, \W (which includes spaces).
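One consequence of \b being defined in terms of \w is that letters and underscores next to the digits suppress the boundary. The Java sketch below (hypothetical class and method names) contrasts \b with digit-only lookarounds on such input.

```java
import java.util.regex.Pattern;

final class BoundaryVsLookaround {
    // True if some run of n digits is delimited by word boundaries.
    static boolean wordBoundaryFinds(String text, int n) {
        return Pattern.compile("\\b\\d{" + n + "}\\b").matcher(text).find();
    }

    // True if some run of n digits has no digit on either side.
    static boolean lookaroundFinds(String text, int n) {
        return Pattern.compile("(?<!\\d)\\d{" + n + "}(?!\\d)").matcher(text).find();
    }
}
```

Both methods find 1234 in "The no.s 1234 65", but for "id1234x" only lookaroundFinds returns true, because there is no word boundary between a letter and a digit. Which behaviour is wanted depends on the input.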
I don't code in java but I can try to answer this using regex in general.
If your number is in the format d1d2d3d4 d5d6 and you want to extract digits d5d6, create 3 groups as ([0-9]+)(\s)([0-9]+); each set of parentheses () represents one group. Then extract only the third group, which is your required output.

Java Regular expression for replacing specific strings

I want to replace numbers in a string if they have more than 3 digits (phone numbers should be replaced), but a number should not be replaced if it is followed by $ or if it has a decimal point. I used the expression below.
"\d{3,}+(?!\$/\.)"
The issues I face: it is not replacing numbers with more than ten digits, and I do want to replace those, since some of them are IDs with more than 10 digits. Also, if a number has more than 3 digits after the decimal point, those digits are getting replaced. I don't want a number to be replaced if it has a decimal point. Can somebody help?
For example, take the number string "3452678916381914". It should be replaced, but the above regex does not replace it. Numbers like $1234,45.567 shouldn't be replaced, but the above regex replaces 45.567.
Use lookbehind and lookahead: first assert that the starting word boundary is not preceded by a $ or ., then assert that the ending word boundary is not followed by a $ or .
It works for both examples you provided; you might need to tweak it a little to handle some corner cases.
(?<![\$\.])\b\d{3,}\b(?![\$\.])
See the demo: it matches the first 2 lines but not the rest.
3452678916381914 # match
1234 56789 # match
$1234,45.567
$1234
12.345
12345.6678
123$
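A minimal Java sketch of using this pattern for the replacement itself is shown below. The class name and the "#" placeholder are arbitrary choices for the example; inside a character class, $ and . need no escaping.

```java
import java.util.regex.Pattern;

final class LongNumberRedactor {
    // Runs of 3+ digits that form a whole word and are neither preceded
    // nor followed by '$' or '.'.
    private static final Pattern LONG_NUMBER =
            Pattern.compile("(?<![$.])\\b\\d{3,}\\b(?![$.])");

    // Replaces every matching number with a "#" placeholder.
    static String redact(String text) {
        return LONG_NUMBER.matcher(text).replaceAll("#");
    }
}
```

With this, redact("1234 56789") gives "# #", while "$1234,45.567", "12345.6678" and "123$" come back unchanged.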

Regular expression to identify all numerics, across all localization formats

I'm scanning a text with a Scanner object, let's say lineScanner. Here are the declarations:
String myText= "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2/659845/4524/456,65/45/23.495.254,3";
Scanner lineScanner = new Scanner(myText);
With that Scanner, I would like to find the first BigDecimal, then the second one, and so on. I declared a BIG_DECIMAL_PATTERN to match any case.
Here are the rules I defined:
The thousands separator is always followed by exactly 3 digits
There are always exactly 1 or 2 digits after the decimal point
If the thousands separator is the comma symbol, then the decimal point is the dot symbol, and conversely
The thousands separator is optional, as is the decimal part of the number
String nextBigDecimal = lineScanner.findInLine(BIG_DECIMAL_PATTERN);
Now, here is the BIG_DECIMAL_PATTERN I declared:
private final String BIG_DECIMAL_PATTERN=
"\\d+(\\054\\d{3}+)?(\\056\\d{1,2}+)?|\\d+(\\056\\d{3}+)?(\\054\\d{1,2}+)?";
\\054 is the ASCII octal representation of ","
\\056 is the ASCII octal representation of "."
My problem is that it doesn't work well: when the first alternative matches, the second one (after the |) is not tried, so in my example the first match will be 200 and not 200,00. So I can try this:
private final String BIG_DECIMAL_PATTERN="\\d+([.,]\\d{3}+)?([,.]\\d{1,2}+)?";
But there is a new problem: comma and dot are not mutually exclusive. I mean that if one is the thousands separator, the decimal point should be the other one.
Thanks for helping.
I believe a variant of your 2nd RegEx will work for you. Consider this regex:
^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$
Live Demo: http://www.rubular.com/r/vHlEdBMhO9
Explanation: it first captures the comma or dot in capture group #1. Later, a negative lookahead makes sure that the same character captured in group #1 doesn't appear as the decimal point. In other words, it ensures that if a comma comes first then a dot must come later, and vice versa.
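Because this regex is anchored with ^ and $, it validates one whole token rather than finding a number inside a longer line. A hedged Java sketch, assuming the line is first split on "/" as in the question's sample text (the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SeparatorBackrefCheck {
    // Group 1 remembers the thousands separator; (?!\1) then forbids
    // the same character as the decimal separator.
    private static final Pattern AMOUNT =
            Pattern.compile("^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$");

    static boolean isAmount(String token) {
        return AMOUNT.matcher(token).matches();
    }

    // Splits the line on '/' and keeps the tokens that look like amounts.
    static List<String> validTokens(String line) {
        List<String> valid = new ArrayList<>();
        for (String token : line.split("/")) {
            if (isAmount(token)) {
                valid.add(token);
            }
        }
        return valid;
    }
}
```

All 15 tokens of the sample myText pass this check, while a token such as "1,234,56" is rejected because the comma would have to serve as both separators.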
Could you do an either-or regular expression? E.g. something like:
private final String BIG_DECIMAL_PATTERN
= "\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)";
Note - I haven't checked whether your regex actually works - and suspect this may not be the best way of achieving what you are trying to do. All I'm doing here to get you up and running is suggesting you could try using (regex1|regex2) where regex1 is dots followed by commas and regex2 is commas followed by dots.

How can I express such requirement using Java regular expression?

I need to check that a file contains some amounts that match a specific format:
between 1 and 15 characters (numbers or ",")
may contain at most one "," separator for decimals
must at least have one number before the separator
this amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude the malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not match with the requirement as I might catch up to 30 characters here.
Unfortunately, for some reasons that are too long to explain here, I cannot simply pick a substring or use any other java call. The match has to be in a single, java-compatible, regular expression.
^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
$ end of the string (not really required as we already checked for it in the lookahead).
You might have to escape backslashes for Java: ^(?=.{1,15}$)\\d+(,\\d+)?$
update: If you're looking for this in the middle of another string, use word boundaries \b instead of string boundaries (^ and $).
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"

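One caveat with the word-boundary version: \b also occurs between a digit and a comma, so the lookahead can be satisfied before the whole amount has been measured, and \b does not occur between a letter and a digit. The Java sketch below compares the answer's pattern with a variation that uses (?<![0-9,]) and (?![0-9,]) instead of \b. The second pattern is an assumption of this example, not part of the answer above, and the class and method names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmountFinder {
    // Pattern from the answer, with word boundaries.
    private static final Pattern WORD_BOUNDARY = Pattern.compile(
            "\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b");
    // Variation: the amount must not touch another digit or comma.
    private static final Pattern DIGIT_COMMA_GUARD = Pattern.compile(
            "(?<![0-9,])(?=[0-9,]{1,15}(?![0-9,]))[0-9]+(,[0-9]+)?(?![0-9,])");

    private static String first(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    // First match of the answer's pattern, or null if there is none.
    static String findWithWordBoundary(String text) {
        return first(WORD_BOUNDARY, text);
    }

    // First match of the variation, or null if there is none.
    static String findWithGuard(String text) {
        return first(DIGIT_COMMA_GUARD, text);
    }
}
```

Both methods return 123,45 for "abc 123,45 def". For "abc123,45def" only findWithGuard finds the amount, and for the 17-character "12345678,12345678" only findWithGuard rejects it.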