validate decimal with decimal separator and thousand separator - java

Hi, I used this regex to validate a number with a decimal separator and a thousands separator:
ets = "\\,";
eds = "\\.";
"^([+\\-]?[0-9" + ets + "]*(" + eds + "[0-9]*)?)$"
But this fails (it accepts input that it should reject) for two of my unit test cases,
12. and 1,,2. Can anyone help, please?
Note: it works for 1..2.

Let's look at the actual regex that is used:
^([+\-]?[0-9\,]*(\.[0-9]*)?)$
This matches 12. because your second part is (\.[0-9]*). Note that * means zero or more, so digits are optional.
This also matches 1,,2 because you included the comma in the first character class [0-9\,]. So actually your regex would match ,,,,,,,, as well.
This can be solved without regexes, but if you need a regex, you'd probably want something like this:
^[+-]?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
Broken down:
^ # match start of string
[+-]? # matches optional + or - sign
[0-9]{1,3} # match one to three digits
(,[0-9]{3})* # match zero or more groups of comma plus three digits
(\. # match literal dot
[0-9]+ # match one or more digits
)? # makes the decimal portion optional
$ # match end of string
To use this in Java you'd want something like:
ets = ","; // commas don't need to be escaped
eds = "\\."; // matches literal dot
regex = "^[+-]?[0-9]{1,3}(" + ets + "[0-9]{3})*(" + eds + "[0-9]+)?$"
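As a quick check, here is a minimal sketch that runs the pattern above against the inputs from the question. The class name SeparatorCheck and the helpers build and isValidNumber are made up for illustration; passing the separators through Pattern.quote is an assumption about how ets and eds might be configured, so that any separator is treated literally.

```java
import java.util.regex.Pattern;

class SeparatorCheck {
    // Hypothetical helper: builds the pattern from the answer for the given separators.
    static Pattern build(String thousandsSep, String decimalSep) {
        String ets = Pattern.quote(thousandsSep); // treat the separator literally
        String eds = Pattern.quote(decimalSep);
        return Pattern.compile(
                "^[+-]?[0-9]{1,3}(" + ets + "[0-9]{3})*(" + eds + "[0-9]+)?$");
    }

    // Validates with "," as thousands separator and "." as decimal separator.
    static boolean isValidNumber(String input) {
        return build(",", ".").matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1,234.56", "-12.5", "12.", "1,,2", "1..2", "1234"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidNumber(s));
        }
    }
}
```

Note that this pattern makes the grouping mandatory: 1234 is rejected because more than three digits must be split by the thousands separator. If ungrouped input should also pass, the pattern would need an extra alternative.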

If I understand correctly, 12. matches because you are matching 0 or 1 occurrences of (a period, then 0 or more occurrences of any digit between 0 and 9). So you may have a period with nothing after it.
1,,2 matches because you are matching 0 or more occurrences of any character that is a digit between 0 and 9 or a comma. Therefore you could have 0,,,,,,,,,,,,,,0.
If you want the last one not to match, make sure you can only have up to 3 digits before a comma (the thousands separator), using curly braces to indicate the number of occurrences allowed (i.e. {0,3}) after a set of digits.
[0-9]{0,3},
@NullUserException just gave a complete regexp that works for your intentions.

Related

Allow only left aligned zeros using regex

I am fairly new to using regex. I have a serial number which can take the following forms: VV-XXXXXX-P or VVXXXXXXP
If the hyphen variant is used, then the number of 'X' can be variable. For example 01-162-8 is equivalent to 010001628.
In order to identify the 2 formats, I have created the following regexes:
String HYPHENS = "([0]{1}[13]{1}-[1-9]{1,6}-[0-9]{1})";
String NO_HYPHENS = "([0]{1}[13]{1}[0-9]{6}[0-9]{1})";
However, the issue with the NO_HYPHENS variant is that it allows 0 anywhere in the sequence.
For example: 010010628 should not be a valid sequence because there's a non-leading 0.
Additionally, how would I create a regex that I can use to remove every 0 from the sequence except the first one? I have tried the following, but it also removes the first 0.
String code = "010001234";
code = code.replaceAll("0+", "");
How could I modify the regexes to achieve this?
You can use
String NO_HYPHENS = "0[13](?!0*[1-9]+0[0-9])[0-9]{6}[0-9]";
code = code.replaceAll("(?!^)0(?!$)", "");
The 0[13](?!0*[1-9]+0[0-9])[0-9]{6}[0-9] regex matches:
0 - zero
[13] - one or three
(?!0*[1-9]+0[0-9]) - a negative lookahead: leading zeros, then one or more non-zero digits, then a zero that is followed by at least one more digit (so it is not the final digit) are not allowed at this location
[0-9]{6} - six digits
[0-9] - a digit.
I understand you use it in Java with .matches; otherwise, add ^ at the start and $ at the end.
Also, the (?!^)0(?!$) regex will match any zero that is not at the string start/end position.
^0[13]0*[1-9]*[0-9]$
^ - beginning of string
0 - first character must be zero
[13] - second character must be one or three
0* - sequence of zeros of variable length
[1-9]* - sequence of non-zero digits of variable length
[0-9] - finally one digit (it can also be written as \d)
$ - end of string
This regex has one problem: it doesn't check how many digits are in the XXXXXX section of the serial number. But you can check that with the length method:
String code = "010000230";
if (code.matches("^0[13]0*[1-9]*[0-9]$") && code.length() == 9) {
// code is valid
}
// replacement
code = code.replaceAll("^(0[13])0*([1-9]*[0-9])$", "$1$2");
Explanation of the replacement:
(0[13]) group number 1 (groups are enclosed in parentheses)
0* some zeros
([1-9]*[0-9]) group number 2
This will be replaced with:
$1$2 group number 1 and group number 2 ($1 means group number 1)
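The validation and the replacement described above can be put together in a small sketch. The class name SerialCodes and the method names isValid and stripPadding are hypothetical; the anchors are dropped in isValid because String.matches already requires the whole string to match.

```java
class SerialCodes {
    // Shape check from the answer above: 0, then 1 or 3, padding zeros,
    // non-zero digits, and one final digit. The length is checked separately.
    static boolean isValid(String code) {
        return code.length() == 9 && code.matches("0[13]0*[1-9]*[0-9]");
    }

    // Removes the padding zeros but keeps the two-character prefix and the rest.
    // If the code does not have the expected shape, it is returned unchanged.
    static String stripPadding(String code) {
        return code.replaceAll("^(0[13])0*([1-9]*[0-9])$", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(isValid("010001628"));      // true
        System.out.println(isValid("010010628"));      // false: zero after a non-zero digit
        System.out.println(stripPadding("010001628")); // 011628
    }
}
```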

Regex to allow only 10 or 16 digit comma separated number

I want to validate a text field in a Java-based app where I want to allow only comma-separated numbers, and each number should have either 10 or 16 digits. I have the regex ^[0-9,;]+$ to allow only numbers, but it doesn't restrict them to 10 or 16 digits.
You can use {n} (or {n,m} for a range) to specify length.
So matching one number with either 10 or 16 digits would be
^(\d{10}|\d{16})$
Meaning: match exactly 10 or exactly 16 digits, with start-of-line before them and end-of-line after them.
Now add the separator:
^((\d{10}|\d{16})[,;])*(\d{10}|\d{16})$
That is: any number of 10-or-16-digit sequences, each followed by either , or ; and then one final 10-or-16-digit sequence before end-of-line.
You need to escape those \ in Java.
public static void main(String[] args) {
String regex = "^((\\d{10}|\\d{16})[,;])*(\\d{10}|\\d{16})$";
String y = "0123456789,0123456789123456,0123456789";
System.out.println(y.matches(regex)); //Should be true
String n = "0123456789,01234567891234567,0123456789";
System.out.println(n.matches(regex)); //should be false
}
I would probably use this regex:
(\d{10}(?:\d{6})?,?)+
Explanation:
( - Begin capture group
\d{10} - Match exactly 10 digits
(?: - Begin non-capturing group
\d{6} - Match 6 more digits
)? - End group, mark as optional using ?
,? - optionally match a comma
)+ - End outer capture group, require 1 or more repetitions (maybe change to * for 0 or more)
The following inputs match this regex
1234567890123456,1234567890
1234567890123456
1234567890
these inputs do not match
123,1234567890
12355
123456789012
You need to have both anchors and word boundaries:
/^(?:\b(?:\d{10}|\d{16})\b,?)*$/
The anchors are necessary so you don't get false positives for partial matches and the word boundaries are necessary so you don't get false positives for 20, 26, 30, 32 digit numbers.
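To illustrate the point about false positives, here is a small sketch that compares the pattern without boundaries against the anchored pattern with word boundaries. The class name DigitListCheck and the method names loose and bounded are invented for this example; both patterns are the ones quoted above, written as Java string literals.

```java
import java.util.regex.Pattern;

class DigitListCheck {
    // Pattern without word boundaries: the comma between numbers is optional.
    static final Pattern LOOSE = Pattern.compile("(\\d{10}(?:\\d{6})?,?)+");
    // Pattern with anchors and word boundaries.
    static final Pattern BOUNDED =
            Pattern.compile("^(?:\\b(?:\\d{10}|\\d{16})\\b,?)*$");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean bounded(String s) {
        return BOUNDED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String twentyDigits = "12345678901234567890";
        System.out.println(loose(twentyDigits));   // true: read as 10 + 10 digits
        System.out.println(bounded(twentyDigits)); // false: no boundary inside a digit run
        System.out.println(bounded("1234567890,1234567890123456")); // true
    }
}
```

One thing to keep in mind: because of the * and the optional comma, the bounded pattern also accepts an empty string and a trailing comma. If those should be rejected, the separator-based form ^((\d{10}|\d{16})[,;])*(\d{10}|\d{16})$ is stricter.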
Here is my version
(?:\d+,){9}\d+|(?:\d+,){15}\d+
Let's review it. First of all, there is no single quantifier that says "10 or 16", so I actually have to write two expressions with | between them.
Second, the expression itself. Your version just says that you allow digits, commas and semicolons. However, this is not what you really want, because a string like ,,, will also match your regex.
So the regex should look like (?:\d+,){n}\d+, which means: n groups of one or more digits, each terminated by a comma, followed by one final group of digits, e.g. 123,45,678 (where 123,45, matches the first part and 678 matches the second part). Note that this counts the comma-separated numbers (10 or 16 of them), not the digits within each number.
Finally we get regex that I have written in the beginning of my answer:
(?:\d+,){9}\d+|(?:\d+,){15}\d+
And do not forget that when you write the regex in your Java code you have to double the backslashes, like this:
Pattern.compile("(?:\\d+,){9}\\d+|(?:\\d+,){15}\\d+")
EDIT: I have just added non-capturing group (?: ...... )

Semi colon separated alphanumeric

I need to validate the below string using regular expression in Java:
String alphanumericList ="[\"State\"; \"districtOne\";\"districtTwo\"]";
I have tried the following:
String pattern="^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$";
String alphanumericList ="[\"State1\"; \"district1\";\"district2\"]";
But the validation fails.
Any help is appreciated.
I'll try and mark the possible issues with your expression (issue numbers above the chars):
1 4 2 3 1 4 5 1
"^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$"
As you can see, there are at least 5 issues:
1. The spaces in your expression are interpreted literally, i.e. if the input doesn't contain them, it will not match. Most probably you want to remove those spaces.
2. You expect at least one whitespace character after the first group (\\s+), which the input doesn't seem to contain. You probably want to remove that or change the quantifier from + to *.
3. You expect at least one whitespace character before each semicolon. Together with no. 2 this would require at least two after the first group. The solution would be the same as for no. 2.
4. Your expression for the strings between double quotes seems wrong. (\"[\\w]\")+ means "a double quote, a single word character, a double quote", and all of that at least once. Besides that, \w is already a character class, so the brackets around it are not needed here (unless you want to add more classes or characters inside). You probably want (\"\\w+\") instead.
5. In addition to no. 4, your non-capturing group that contains the semicolon ((?:\\s+;\\s+ (\"[\\w]\")+)) doesn't have a quantifier, i.e. it would be expected exactly once. You probably want to put the quantifier + or * after that group.
Another point that's not a direct issue is the capturing group around \"[\\w]\". Since you seem to want to match multiple strings after semicolons, you'd only be able to capture one of the matching groups. Hence you'd most probably not be able to do what you intended anyway, and thus the group is not necessary.
That said, the fixed original expression would look like this (whitespace after the semicolon is optional too, since the sample input has none before the last item):
pattern = "^\\[(\"\\w+\")(?:\\s*;\\s*\"\\w+\")+\\]$";
You are looking for this pattern:
String pattern = "\\[\\s*\"[^\"]*\"\\s*(?:;\\s*\"[^\"]*\"\\s*)*+\\]";
No need to add anchors, since they are implicit if you use the matches() method, which is the more appropriate method for validation tasks.
pattern details:
\\[ # a literal opening square bracket
\\s* # optional whitespaces
\" # literal quote
[^\"]* # content between quotes: chars that are not a quote (zero or more)
\"
\\s*
(?: # non-capturing group:
; # a literal semi-colon
\\s*
\" # quoted content
[^\"]*
\"
\\s*
)*+ # repeat this group zero or more times (with a possessive quantifier)
\\] # a literal closing square bracket
The possessive quantifier prevents the regex engine from backtracking into the repeated non-capturing group if the closing square bracket is not present. It is a safeguard against unneeded backtracking that makes the pattern fail faster. Note that you can make the other quantifiers before the non-capturing group possessive too, for the same reason.
I decided to describe the content between quotes in this way: \"[^\"]*\", but you can be more restrictive, allowing for example only words characters: \"\\w*\" or more general, allowing escaped quotes: \"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*+\"
Try this
static final String HEAD = "^\\[\\s*";
static final String TAIL = "\\s*\\]$";
static final String SEP = "\\s*;\\s*";
static final String ITEM = "\"[^\"]*\"";
static final String PAT = HEAD + ITEM + "(" + SEP + ITEM + ")*" + TAIL;
Try:
pattern = "^\\[(\"\\w+\";\\s*)*(\"\\w+\")\\]$";
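The patterns suggested above can be checked against the input from the question with a short sketch. The class name QuotedListCheck and the two method names are hypothetical; the constants are copied from the answers above, and String.matches is used so the whole input has to match.

```java
class QuotedListCheck {
    // Composed pattern from the "Try this" answer.
    static final String HEAD = "^\\[\\s*";
    static final String TAIL = "\\s*\\]$";
    static final String SEP = "\\s*;\\s*";
    static final String ITEM = "\"[^\"]*\"";
    static final String PAT = HEAD + ITEM + "(" + SEP + ITEM + ")*" + TAIL;

    // Possessive variant from the earlier answer (no anchors needed with matches()).
    static final String POSSESSIVE =
            "\\[\\s*\"[^\"]*\"\\s*(?:;\\s*\"[^\"]*\"\\s*)*+\\]";

    static boolean matchesComposed(String s) {
        return s.matches(PAT);
    }

    static boolean matchesPossessive(String s) {
        return s.matches(POSSESSIVE);
    }

    public static void main(String[] args) {
        String ok = "[\"State1\"; \"district1\";\"district2\"]";
        String missingSeparator = "[\"State1\" \"district1\"]";
        System.out.println(matchesComposed(ok));                 // true
        System.out.println(matchesPossessive(ok));               // true
        System.out.println(matchesComposed(missingSeparator));   // false
        System.out.println(matchesPossessive(missingSeparator)); // false
    }
}
```

Both patterns require at least one quoted item, so an empty list such as [] is rejected as well.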

Java Pattern regex subtraction with greedy quantifiers

I am currently trying to use a regex to validate that my input has a certain format.
In the possible input there is only one combination of characters that I don't want to match.
Therefore I would like to use the subtraction as described in the JavaDoc for the Pattern class.
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
So it is possible to exclude a certain pattern of characters from my expression.
Unfortunately, I was not able to get the regex right.
The pattern should match a String of exactly 8 digits ([0-9]{8}). Furthermore, it should not match the String of exactly 8 zero characters.
12345678 -> match yes
00000001 -> match yes
00000000 -> match no
So this is how I tried it: regex = "[[0-9]{8}]&&[^[0{8}]]"
My question now is: how can I group multiple characters together for a match?
Or what would the regex have to look like to meet my requirements?
It would be nice if somebody could help me with that.
In that case, you're better off with a negative lookahead assertion:
regex = "^(?!0{8})[0-9]{8}$"
A character class always matches a single character from a certain set, and that set is defined in the expression between square brackets, which is why your approach doesn't work.
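In a flavor without nested character classes, [[0-9]{8}]&&[^[0{8}]] actually means the following (Java reads a nested [...] as a union instead, but the expression likewise ends up matching single characters around a literal &&, not eight digits):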
[[0-9]{8} # Match 8 characters between 0 and 9 or a "["
]&& # Match literal "]&&"
[^[0{8}] # Match a character except "[", "0", "{", "8", or "}"
] # Match a literal "]"
A solution without lookaround would have to make sure that at least one nonzero digit is present, while still making sure that the overall number of digits is exactly 8. That makes it complicated:
^(?:[1-9][0-9]{7}|[0-9][1-9][0-9]{6}|[0-9]{2}[1-9][0-9]{5}|[0-9]{3}[1-9][0-9]{4}|[0-9]{4}[1-9][0-9]{3}|[0-9]{5}[1-9][0-9]{2}|[0-9]{6}[1-9][0-9]|[0-9]{7}[1-9])$
Explanation:
^ # Start of string
(?: # Start of group
[1-9][0-9]{7} # Match digit > 0, followed by 7 digits
| # or
[0-9][1-9][0-9]{6} # Match any digit, a digit 1-9, 6 other digits
| # or
[0-9]{2}[1-9][0-9]{5} # Match 2 digits, a digit 1-9, 5 other digits
| # or
... # etc. etc. etc.
) # End of group
$ # End of string
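As a sanity check, the sketch below runs the lookahead version against the examples from the question and compares it with the lookaround-free alternation, which is generated in a loop instead of being typed out. The class name EightDigits and its method names are made up for this illustration.

```java
class EightDigits {
    static final String LOOKAHEAD = "^(?!0{8})[0-9]{8}$";

    // Builds the lookaround-free pattern: one branch per possible position
    // of a non-zero digit, each branch matching exactly 8 digits.
    static String alternation() {
        StringBuilder sb = new StringBuilder("^(?:");
        for (int i = 0; i < 8; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("[0-9]{").append(i).append("}[1-9][0-9]{")
              .append(7 - i).append('}');
        }
        return sb.append(")$").toString();
    }

    static boolean viaLookahead(String s) {
        return s.matches(LOOKAHEAD);
    }

    static boolean viaAlternation(String s) {
        return s.matches(alternation());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345678", "00000001", "00000000", "1234567"}) {
            System.out.println(s + " -> " + viaLookahead(s) + " / " + viaAlternation(s));
        }
    }
}
```

The generated branches use {0} and {7} where the hand-written version omits or shortens the quantifier, which matches the same strings.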

Pattern/Regular expression to grab a number *only* if it's the only field in the record

This has been driving me crazy the past couple of days. I'm trying to kill two birds with one stone by validating a record and extracting a field at the same time. My strategy has been to do this with a regular expression:
private Pattern firstNumber = Pattern.compile("\\d{1}");
Which I understand to mean "the first number in the line (record)." So far this has been effective at grabbing the first field (and ensuring that it's a number), but I want to take this a step further:
How can I tweak the regexp to specify that I want the number only if it's the sole field?
That is, if the record is simply 10, I want to grab 10. But if the record is 10 4, I don't want to grab anything (as this is an invalid record for the project).
I tried:
private Pattern oneNumberOnly = Pattern.compile("\\d{1}\n");
But -- to my chagrin -- this (and any other permutation of it) does not pick up any numbers. Is there something I'm missing here?
You can denote beginning of line/string with ^ and end of line/string with $, so the pattern would be
^\d+$
The {1} won't work because it excludes anything with more than one digit, such as 10. Using \d+ indicates one or more digits. In Java, \d matches only the ASCII digits 0-9 by default (other Unicode digits are included only with the UNICODE_CHARACTER_CLASS flag), and it never matches a decimal point or a minus sign, so it is equivalent to [0-9] here.
Specifying {1} is always redundant, by the way, because by default an atom is matched once.
You can use the start line character and end line character. If you are trying to grab a number that is on its own line you can use:
Pattern.compile("^(\\d++)$");
By adding the {1} you will only get 1 digit of a number. You should also trim the string you are comparing against to get rid of any extra whitespace.
^ - Start of line character
\\d - digit character [0-9]
+ - 1 or more characters that match \d
+ - possessive (this will grab all the digits without backtracking, which can be quicker than a greedy quantifier)
$ - End of line character
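Putting the advice from these answers together, validating the record and extracting the number can be done in one step. This is only a sketch: the class name SoleNumber and the method extract are hypothetical, and the number is returned as text so that very long digit strings do not overflow an int.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SoleNumber {
    // The whole (trimmed) record must consist of one run of digits.
    private static final Pattern ONE_NUMBER_ONLY = Pattern.compile("^(\\d+)$");

    // Returns the number as text if it is the only field, otherwise empty.
    static Optional<String> extract(String record) {
        Matcher m = ONE_NUMBER_ONLY.matcher(record.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("10"));    // Optional[10]
        System.out.println(extract("10 4"));  // Optional.empty
        System.out.println(extract("10\n"));  // Optional[10]
    }
}
```

Trimming before matching is what makes a trailing newline harmless, so the pattern itself does not need to mention \n.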
