Java Pattern regex subtraction with greedy quantifiers


I am currently trying to use a regex to validate that my input has a certain format.
Among the possible inputs there is only one combination of characters that I don't want to match.
Therefore I would like to use the subtraction as described in the JavaDoc for the Pattern class.
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
So it is possible to exclude a certain pattern of characters from my expression.
Unfortunately I was not able to get the regex right.
The pattern should match a String of exactly 8 digits ([0-9]{8}). Furthermore, it should not match the String consisting of exactly 8 zero characters.
12345678 -> match yes
00000001 -> match yes
00000000 -> match no
So this is how I tried it: regex = "[[0-9]{8}]&&[^[0{8}]]"
My question now is: how can I group multiple characters together for a match?
Or what would the regex have to look like to meet my requirements?
It would be nice if somebody could help me with that.

In that case, you're better off with a negative lookahead assertion:
regex = "^(?!0{8})[0-9]{8}$"
A character class always matches a single character from a certain set, and that set is defined in the expression between square brackets, which is why your approach doesn't work.
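[[0-9]{8}]&&[^[0{8}]] actually means (as most regex flavors read it; Java also treats a nested [...] inside a character class as a union, so its exact parse differs, but either way it is not a test on the whole 8-digit string):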
[[0-9]{8} # Match 8 characters between 0 and 9 or a "["
]&& # Match literal "]&&"
[^[0{8}] # Match a character except "[", "0", "{", "8", or "}"
] # Match a literal "]"
A solution without lookaround would have to make sure that at least one nonzero digit is present, while still making sure that the overall number of digits is exactly 8. That makes it complicated:
^(?:[1-9][0-9]{7}|[0-9][1-9][0-9]{6}|[0-9]{2}[1-9][0-9]{5}|[0-9]{3}[1-9][0-9]{4}|[0-9]{4}[1-9][0-9]{3}|[0-9]{5}[1-9][0-9]{2}|[0-9]{6}[1-9][0-9]|[0-9]{7}[1-9])$
Explanation:
^ # Start of string
(?: # Start of group
[1-9][0-9]{7} # Match digit > 0, followed by 7 digits
| # or
[0-9][1-9][0-9]{6} # Match any digit, a digit 1-9, 6 other digits
| # or
[0-9]{2}[1-9][0-9]{5} # Match 2 digits, a digit 1-9, 5 other digits
| # or
... # etc. etc. etc.
) # End of group
$ # End of string
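As a rough check, the two patterns above can be compared side by side in Java. This is only a sketch: the class name EightDigitCheck and its method names are made up for illustration, and the sample inputs are the ones from the question plus two strings of the wrong length.

```java
import java.util.regex.Pattern;

class EightDigitCheck {
    // Lookahead version: exactly 8 digits, but not eight zeros.
    static final Pattern LOOKAHEAD = Pattern.compile("^(?!0{8})[0-9]{8}$");

    // Lookaround-free version: one alternative per position of a nonzero digit.
    static final Pattern NO_LOOKAROUND = Pattern.compile(
            "^(?:[1-9][0-9]{7}|[0-9][1-9][0-9]{6}|[0-9]{2}[1-9][0-9]{5}"
            + "|[0-9]{3}[1-9][0-9]{4}|[0-9]{4}[1-9][0-9]{3}|[0-9]{5}[1-9][0-9]{2}"
            + "|[0-9]{6}[1-9][0-9]|[0-9]{7}[1-9])$");

    static boolean withLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean withoutLookaround(String s) {
        return NO_LOOKAROUND.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12345678", "00000001", "00000000", "1234567", "123456789"};
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println(s + " -> " + withLookahead(s) + " / " + withoutLookaround(s));
        }
    }
}
```

Both patterns are expected to agree on every input; the lookahead version is simply much easier to read and maintain.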

Related

Make regular expression fail for invalid input [duplicate]

This question already has answers here:
Java RegEx meta character (.) and ordinary dot?
(9 answers)
Closed 2 years ago.
When validating with a regular expression in Java, I need to return true for heights with values such as:
80cm
80.2cm
80.25cm
My regular expression is as follows:
(\d)(\d?)(.?)(\d?)(\d?)(c)(m)
However, if I pass in the height as 71-80cm, the regular expression returns true too.
What change should I make to the regular expression so that it returns false when the height is 71-80cm?

. matches any character, so you need to have \\. or just \. depending on the source. Check out: Java RegEx meta character (.) and ordinary dot?
Furthermore, additional changes need to be made such that e.g. 8025cm is not accepted if that is what you want.
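To make the effect of the unescaped dot concrete, here is a small Java sketch. The class and method names (DotEscapeDemo, unescapedMatches, escapedMatches) are invented for this example; the first pattern is the one from the question, the second only differs by the escaped dot.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Pattern from the question: the bare "." matches any character, including "-".
    static final Pattern UNESCAPED =
            Pattern.compile("(\\d)(\\d?)(.?)(\\d?)(\\d?)(c)(m)");

    // Same pattern with the dot escaped, so it only matches a literal period.
    static final Pattern ESCAPED =
            Pattern.compile("(\\d)(\\d?)(\\.?)(\\d?)(\\d?)(c)(m)");

    static boolean unescapedMatches(String s) {
        return UNESCAPED.matcher(s).matches();
    }

    static boolean escapedMatches(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"80cm", "80.2cm", "80.25cm", "71-80cm", "8025cm"};
        for (String s : samples) {
            System.out.println(s + " -> unescaped: " + unescapedMatches(s)
                    + ", escaped: " + escapedMatches(s));
        }
    }
}
```

Escaping the dot is enough to reject 71-80cm, but 8025cm is still accepted because the period and the digits after it are independently optional.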

I assume that the OP wishes to match substrings of the form
abcm
where:
"cm" is a literal;
"cm" is not followed by a letter;
"b" is the string representation of a non-negative float or integer (e.g., "80" or "80.25", but not "08" or ".25"); and
"a" is a character other than "-", "+" and ".", unless "b" is at the beginning of the string, in which case "a" is an empty string.
If my assumptions are correct you could use the following regex to match bcm in abcm:
(?<![-+.\d])[1-9]\d*(?:\.\d+)?cm(?![a-zA-Z])
The regex engine performs the following operations:
(?<! # begin negative lookbehind
[-+.\d] # match '-', '+', '.' or a digit
) # end negative lookbehind
[1-9] # match digit other than zero
\d* # match 0+ digits
(?:\.\d+) # match '.' followed by 1+ digits in a non-cap grp
? # optionally match non-cap grp
cm # match 'cm'
(?![a-zA-Z]) # negative lookahead: next character must not be a letter
If my assumptions about what is required are not correct it may be evident how my answer could be adjusted appropriately.
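Because this pattern is meant to find heights inside a larger string, it would be used with Matcher#find() rather than String#matches(). The following is a minimal sketch under that assumption; HeightFinder and firstHeight are hypothetical names chosen for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeightFinder {
    // The lookbehind/lookahead pattern from the answer, as a Java string literal.
    static final Pattern HEIGHT =
            Pattern.compile("(?<![-+.\\d])[1-9]\\d*(?:\\.\\d+)?cm(?![a-zA-Z])");

    // Returns the first height found anywhere in the text, if there is one.
    static Optional<String> firstHeight(String text) {
        Matcher m = HEIGHT.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {"80cm", "height: 80.25cm", "71-80cm", "08cm", ".25cm", "80cmx"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstHeight(s).orElse("(no match)"));
        }
    }
}
```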

Ok, let's take your expression and clean it up a little. You don't need all the capturing groups (..), since all you're interested in is validating the complete string. For that reason you should also enclose the expression in line beginning ^ and line end $ anchors, so your expression can't match inside a larger string. Lastly, you can group the period and trailing digits together (?:), since you won't get one without the other as per your example data. Which gets us:
^\d\d?(?:\.\d\d?)?cm$
Then in Java, that check could look like this:
boolean foundMatch = subjectString.matches("^\\d\\d?(?:\\.\\d\\d?)?cm$");
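For a quick sanity check of that expression against the sample heights, something like the following could be used. It is a sketch only: HeightValidator and isValidHeight are names made up here, and the extra negative samples are assumptions about what should be rejected.

```java
class HeightValidator {
    // One or two digits, optionally a period with one or two more digits, then "cm".
    static boolean isValidHeight(String s) {
        return s.matches("^\\d\\d?(?:\\.\\d\\d?)?cm$");
    }

    public static void main(String[] args) {
        String[] samples = {"80cm", "80.2cm", "80.25cm", "71-80cm", "8025cm", "80.cm"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidHeight(s));
        }
    }
}
```

Grouping the period with its trailing digits is what rejects both 8025cm and 80.cm here.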

Optimization of a regex for a Java identifier. Separating the number in the ending and the other part

I need to read a string as a valid Java identifier and get the number at the end (if there is any) and the leading part separately.
a1 -> a,1
a -> a,
a123b -> a123b,
ab123 -> ab, 123
a123b456 -> a123b, 456
a123b456c789 -> a123b456c, 789
_a123b456c789 -> _a123b456c, 789
I have written a pair of regexes, tested them on http://www.regexplanet.com/advanced/java/index.html, and they seem to work OK:
([a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_]|[a-zA-Z_])(\d+)$
([a-zA-Z_](?:[a-zA-Z0-9_]*[a-zA-Z_])?)(\d+)$
How can I shorten them? Or can you advise another regex?
I can't replace [a-zA-Z_] with \w, because the latter matches digits, too.
(We are talking about the regex strings BEFORE replacing \ with \\ for Java/Groovy.)

The Incremental Java says:
Each identifier must have at least one character.
The first character must be picked from: alpha, underscore, or dollar sign. The first character can not be a digit.
The rest of the characters (besides the first) can be from: alpha, digit, underscore, or dollar sign. In other words, it can be any valid identifier character.
Put simply, an identifier is one or more characters selected from alpha, digit, underscore, or dollar sign. The only restriction is the first character can't be a digit.
And the Java docs also add:
The convention, however, is to always begin your variable names with a letter, not "$" or "_". Additionally, the dollar sign character, by convention, is never used at all.
You may use this one, which matches any valid identifier and puts the starting chunk of characters into one group and all the trailing digits into another group:
^(?!\d)([$\w]+?)(\d*)$
Or this one that will only match the identifiers that follow the convention:
^(?![\d_])(\w+?)(\d*)$
Details:
^ - start of string
(?!\d) - the first char cannot be a digit ((?![\d_]) will fail the match if the first char is digit or _)
([$\w]+?) - Group 1: one or more word or $ chars (the (\w+?) will just match letters/digit/_ chars), as few as possible (as the +? is a lazy quantifier) up to the first occurrence of...
(\d*)$ - Group 2: zero or more digits at the end of string ($).
Groovy demo:
// Non-convention Java identifier
def res = 'a123b$456_c789' =~ /^(?!\d)([$\w]+?)(\d*)$/
print("${res[0][1]} : ${res[0][2]}") // => a123b$456_c : 789
// Convention Java identifier
def res2 = 'a123b456_c' =~ /^(?![\d_])(\w+?)(\d*)$/
print("${res2[0][1]} : ${res2[0][2]}") // => a123b456_c :
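The same split can be sketched in plain Java. The class IdentifierSplitter and its split method are hypothetical helpers written for this example; the pattern is the first one from the answer (the variant that also allows $), and returning null for non-matching input is just one possible design.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierSplitter {
    // Group 1: leading part (lazy), group 2: trailing digits (possibly empty).
    static final Pattern SPLIT = Pattern.compile("^(?!\\d)([$\\w]+?)(\\d*)$");

    // Returns {head, trailingDigits}, or null when the input does not match.
    static String[] split(String identifier) {
        Matcher m = SPLIT.matcher(identifier);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] samples = {"a1", "a", "a123b", "ab123", "a123b456", "_a123b456c789", "1abc"};
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(s + " -> "
                    + (parts == null ? "(no match)" : parts[0] + ", " + parts[1]));
        }
    }
}
```

The lazy +? is what keeps group 1 from swallowing the trailing digits: it grows one character at a time until the rest of the input consists only of digits.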

EDIT: I tried to make my solution as simple as I could but I didn't think about it long enough so it is incorrect. Just look at the accepted answer
I believe you can shorten it to ^([a-zA-Z_][a-zA-Z_\d]*[^\d])(\d*)$ - match all possible characters with not a number at the end, and a number.

Regular expression that accepts only two digit integer or a floating number

I am trying to validate a text field that accepts numbers like 10.99, 1.99, 1, 10, 21.
\d{0,2}\.\d{1,2}
The above expression only passes values such as 10.99, 11.99, 1.99, but I want something that also accepts the plain integers.

Try this:
^\d{1,2}(\.\d{1,2})?$
^ - Match the start of string
\d{1,2} - Must contain at least 1 and at most 2 digits
(\.\d{1,2}) - When a decimal point occurs, it must be a . followed by at least 1 and at most 2 digits
? - The preceding group may occur zero or one time
$ - Match the end of string
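A short Java sketch of this pattern against the values from the question follows. TwoDigitNumber and isValid are illustrative names, and the rejected samples are assumptions about inputs that should fail.

```java
import java.util.regex.Pattern;

class TwoDigitNumber {
    // One or two digits, optionally followed by a point and one or two digits.
    static final Pattern NUMBER = Pattern.compile("^\\d{1,2}(\\.\\d{1,2})?$");

    static boolean isValid(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"10.99", "1.99", "1", "10", "21", "100", ".99", "1.", "1.999"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```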

Assuming you don't want to allow edge cases like 00, and want at least 1 and at most 2 decimal places after the point mark:
^(?!00)\d\d?(\.\d\d?)?$
This requires a digit before the decimal point, i.e. ".12" would not match (you would have to enter "0.12", which is best practice).
If you're using String#matches(), you can drop the leading/trailing ^ and $, because that method must match the entire string to return true.
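As an illustration of the point about String#matches(), here is the same pattern without the anchors. The class name NoDoubleZero is invented for this sketch; the behavior shown relies only on String#matches() requiring a match of the entire input.

```java
class NoDoubleZero {
    // No ^ or $ needed: String#matches only succeeds if the whole input matches.
    static boolean isValid(String s) {
        return s.matches("(?!00)\\d\\d?(\\.\\d\\d?)?");
    }

    public static void main(String[] args) {
        String[] samples = {"0.12", ".12", "00", "00.5", "10.99", "x10", "10x"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```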

First, \d{0,2} does not seem to fit your requirement, because it also accepts no digit at all before the point. It gives the correct output for your samples, but logically you do not mean to accept a missing integer part, so you can change it to \d{1,2}.
Now, in regex, ? makes things optional. You can use it on individual expressions like below:
\d{1,2}\.?\d{0,2}
or you can use it on the combined expression like below:
\d{1,2}(\.\d{1,2})?
You can also refer to the list below for further queries:
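The two forms are not equivalent, which a small Java comparison makes visible. This is a sketch with made-up names (OptionalPartDemo, individuallyOptional, groupedOptional); it assumes the whole input must match, as with String#matches().

```java
class OptionalPartDemo {
    // "?" applied to the point and a {0,2} count on the digits, independently.
    static boolean individuallyOptional(String s) {
        return s.matches("\\d{1,2}\\.?\\d{0,2}");
    }

    // "?" applied to the point and its digits as one group.
    static boolean groupedOptional(String s) {
        return s.matches("\\d{1,2}(\\.\\d{1,2})?");
    }

    public static void main(String[] args) {
        String[] samples = {"10.99", "1", "1234", "12."};
        for (String s : samples) {
            System.out.println(s + " -> individual: " + individuallyOptional(s)
                    + ", grouped: " + groupedOptional(s));
        }
    }
}
```

With the individually optional parts, inputs such as 1234 or 12. are accepted, so the grouped form is probably closer to what the question asks for.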
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(abc|def) Matches abc or def
Useful link : https://regexone.com/

Can you try using this :
(\d{1,2}\.\d{1,2})|(\d{1,2})
There are two parts, or two groups: one to match the float numbers #.#, #.##, ##.##, ##.#, and a second to match the integers #, ##. So we can combine them with the or operator |, as float|integer.

I think patterns of this type are best handled with alternation:
/^\s*([-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?)$ #float
| # or
^(\d{1,2})$ # 2 digit int/mx

Regex to allow only 10 or 16 digit comma separated number

I want to validate a text field in a Java-based app so that it only allows comma-separated numbers, each of which must have either 10 or 16 digits. I have the regex ^[0-9,;]+$, which allows only numbers, but it does not restrict them to 10 or 16 digits.

You can use {n} to specify an exact length ({n,m} gives a range).
So matching one number with either 10 or 16 digits would be
^(\d{10}|\d{16})$
Meaning: match exactly 10 or 16 digits, with start-of-line before and end-of-line after.
Now add the separator:
^((\d{10}|\d{16})[,;])*(\d{10}|\d{16})$
That is: any number of 10-or-16-digit sequences, each followed by either , or ;, and then one final 10-or-16-digit sequence before end-of-line.
You need to escape those \ in Java:
public class Main {
    public static void main(String[] args) {
        // Backslashes are doubled because this is a Java string literal.
        String regex = "^((\\d{10}|\\d{16})[,;])*(\\d{10}|\\d{16})$";
        String y = "0123456789,0123456789123456,0123456789";
        System.out.println(y.matches(regex)); // should be true: 10, 16 and 10 digits
        String n = "0123456789,01234567891234567,0123456789";
        System.out.println(n.matches(regex)); // should be false: the middle number has 17 digits
    }
}

I would probably use this regex:
(\d{10}(?:\d{6})?,?)+
Explanation:
( - Begin capture group
\d{10} - Match exactly 10 digits
(?: - Begin non capture group
\d{6} - Match 6 more digits
)? - End group, mark as optional using ?
,? - optionally capture a comma
)+ - End outer capture group, require 1 or more to exist (maybe change to * for 0 or more)
The following inputs match this regex
1234567890123456,1234567890
1234567890123456
1234567890
these inputs do not match
123,1234567890
12355
123456789012

You need to have both anchors and word boundaries:
/^(?:\b(?:\d{10}|\d{16})\b,?)*$/
The anchors are necessary so you don't get false positives for partial matches and the word boundaries are necessary so you don't get false positives for 20, 26, 30, 32 digit numbers.
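To see what the word boundaries buy, here is a sketch comparing this pattern with the (\d{10}(?:\d{6})?,?)+ variant from the earlier answer on a 20-digit input. DigitListCheck and its method names are made up for the example, and full-string matching via Matcher#matches() is assumed for both.

```java
import java.util.regex.Pattern;

class DigitListCheck {
    // Variant without boundaries: two adjacent 10-digit runs can pass as one number.
    static final Pattern WITHOUT_BOUNDARIES = Pattern.compile("(\\d{10}(?:\\d{6})?,?)+");

    // Variant with anchors and word boundaries around each number.
    static final Pattern WITH_BOUNDARIES =
            Pattern.compile("^(?:\\b(?:\\d{10}|\\d{16})\\b,?)*$");

    static boolean withoutBoundaries(String s) {
        return WITHOUT_BOUNDARIES.matcher(s).matches();
    }

    static boolean withBoundaries(String s) {
        return WITH_BOUNDARIES.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1234567890123456,1234567890", // 16 and 10 digits
            "12345678901234567890",        // 20 digits, no comma
            "123,1234567890"               // first number too short
        };
        for (String s : samples) {
            System.out.println(s + " -> without: " + withoutBoundaries(s)
                    + ", with: " + withBoundaries(s));
        }
    }
}
```

The 20-digit input is accepted by the variant without boundaries (as 10 + 10 digits with the optional comma omitted) and rejected by the bounded one.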

Here is my version
(?:\d+,){9}\d+|(?:\d+,){15}\d+
Let's review it. First, the requirement says "10 or 16", so I actually have to write two expressions with | between them.
Second, the expression itself. Your version only says that digits and commas are allowed. That is not what you really want, because a string like ,,, will match your regex.
So the regex should look like (?:\d+,){n}\d+, which means: a sequence of digit groups each terminated by a comma, followed by a final group of digits, e.g. 123,45,678 (where 123,45, matches the first part and 678 matches the second part).
Finally we get the regex that I have written in the beginning of my answer:
(?:\d+,){9}\d+|(?:\d+,){15}\d+
And do not forget that when you write the regex in your Java code you have to double the backslashes, like this:
Pattern.compile("(?:\\d+,){9}\\d+|(?:\\d+,){15}\\d+")
EDIT: I have just added non-capturing group (?: ...... )

validate decimal with decimal separator and thousand separator

Hi, I used this regex to validate a number with a decimal separator and a thousand separator:
ets = "\\,";
eds = "\\.";
"^([+\\-]?[0-9" + ets + "]*(" + eds + "[0-9]*)?)$"
But this fails (it accepts input when it should not) for two of my unit test cases,
12. and 1,,2. Can anyone help, please?
Note: This works for 1..2.

Let's look at the actual regex that is used:
^([+\-]?[0-9\,]*(\.[0-9]*)?)$
This matches 12. because your second part is (\.[0-9]*). Note that * means zero or more, so digits are optional.
This also matches 1,,2 because you included the comma in the first character class [0-9\,]. So actually your regex would match ,,,,,,,, as well.
This can be solved without regexes, but if you need a regex, you'd probably want something like this:
^[+-]?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
Broken down:
^ # match start of string
[+-]? # matches optional + or - sign
[0-9]{1,3} # match one to three digits
(,[0-9]{3})* # match zero or more groups of comma plus three digits
(\. # match literal dot
[0-9]+ # match one or more digits
)? # makes the decimal portion optional
$ # match end of string
To use this in Java you'd want something like:
String ets = ","; // commas don't need to be escaped
String eds = "\\."; // matches a literal dot
String regex = "^[+-]?[0-9]{1,3}(" + ets + "[0-9]{3})*(" + eds + "[0-9]+)?$";
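The difference between the original and the revised expression can be checked against the unit-test cases from the question. The following is a sketch only; ThousandsCheck and its method names are hypothetical, and the extra sample inputs beyond 12., 1,,2 and 1..2 are assumptions.

```java
import java.util.regex.Pattern;

class ThousandsCheck {
    // Original expression from the question, after string concatenation.
    static final Pattern ORIGINAL = Pattern.compile("^([+\\-]?[0-9\\,]*(\\.[0-9]*)?)$");

    // Revised expression: groups of three digits, decimals need at least one digit.
    static final Pattern REVISED =
            Pattern.compile("^[+-]?[0-9]{1,3}(,[0-9]{3})*(\\.[0-9]+)?$");

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean revised(String s) {
        return REVISED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12.", "1,,2", "1..2", "1,234.56", "-1,000", "123", "1234"};
        for (String s : samples) {
            System.out.println(s + " -> original: " + original(s) + ", revised: " + revised(s));
        }
    }
}
```

Note that the revised expression makes the thousand separators mandatory once there are more than three integer digits, so an input like 1234 is rejected; whether that is desired depends on the requirements.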

If I understand correctly, 12. matches because you are matching 0 or 1 occurrences of (a period, then 0 or more occurrences of any digit between 0 and 9). So you may have a period with nothing after it.
1,,2 matches because you are matching 0 or more occurrences of any character between 0 and 9, or a comma. Therefore you could have 0,,,,,,,,,,,,,,0.
If you want the last one not to match, make sure you can only have up to 3 digits before a comma (the thousand separator), using curly braces after the digit set to indicate the number of occurrences allowed (i.e. {0,3}).
[0-9]{0,3},
@NullUserException just gave a complete regexp that works for your intentions.
