Make regular expression fail for invalid input [duplicate] - java

Upon validation using regular expression in Java, I need to return true for height having values :
80cm
80.2cm
80.25cm
My regular expression is as follows :
(\d)(\d?)(.?)(\d?)(\d?)(c)(m)
However if I pass in height as 71-80cm , the regular expression returns true too.
What change should I make to the regular expression to return false when height is 71-80cm ?

. matches any character, so you need to have \\. or just \. depending on the source. Check out: Java RegEx meta character (.) and ordinary dot?
Furthermore, additional changes need to be made such that e.g. 8025cm is not accepted if that is what you want.

I assume that the OP wishes to match substrings of the form
abcm
where:
"cm" is a literal;
"cm" is not followed by a letter;
"b" is the string representation of a non-negative float or integer (e.g., "80" or "80.25", but not "08" or ".25"); and
"a" is a character other than "-", "+" and ".", unless "b" is at the beginning of the string, in which case "a" is an empty string.
If my assumptions are correct you could use the following regex to match bcm in abcm:
(?<![-+.\d])[1-9]\d*(?:\.\d+)?cm(?![a-zA-Z])
The regex engine performs the following operations:
(?<! # begin negative lookbehind
[-+.\d] # match '-', '+', '.' or a digit
) # end negative lookbehind
[1-9] # match digit other than zero
\d* # match 0+ digits
(?:\.\d+) # match '.' followed by 1+ digits in a non-cap grp
? # optionally match non-cap grp
cm # match 'cm'
(?![a-zA-Z]) # negative lookahead: the next character must not be a letter
If my assumptions about what is required are not correct it may be evident how my answer could be adjusted appropriately.
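To see how this pattern behaves when searching inside a larger string, here is a minimal Java sketch. The class name HeightFinder and the helper firstHeight are hypothetical names chosen for illustration; only the regex itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeightFinder {
    // Regex from the answer above, with backslashes doubled for a Java string literal.
    static final Pattern HEIGHT =
            Pattern.compile("(?<![-+.\\d])[1-9]\\d*(?:\\.\\d+)?cm(?![a-zA-Z])");

    // Hypothetical helper: returns the first height found in the text, or null if none.
    static String firstHeight(String text) {
        Matcher m = HEIGHT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstHeight("height: 80.25cm")); // a plain number followed by cm
        System.out.println(firstHeight("71-80cm"));         // "80" is preceded by '-'
        System.out.println(firstHeight("08cm"));            // leading zero
    }
}
```

Note that find() is used here rather than matches(), since the lookarounds are meant for locating a height inside surrounding text.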

Ok, let's take your expression and clean it up a little. You don't need all the capturing groups (..), since all you're interested in is validating the complete string. For that reason you should also enclose the expression in line beginning ^ and line end $ anchors, so your expression can't match inside a larger string. Lastly, you can group the period and trailing digits together (?:), since you won't get one without the other as per your example data. Which gets us:
^\d\d?(?:\.\d\d?)?cm$
Then in Java, that check could look like this:
boolean foundMatch = subjectString.matches("^\\d\\d?(?:\\.\\d\\d?)?cm$");
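As a quick check, the following sketch compares the original pattern from the question with the cleaned-up one. The class and method names are made up for this example. Since String.matches() and Matcher.matches() already require the whole input to match, the ^ and $ anchors are left out here; keeping them does no harm.

```java
import java.util.regex.Pattern;

class HeightValidator {
    // Pattern from the question: the unescaped dot matches any character, including '-'.
    static final Pattern ORIGINAL = Pattern.compile("(\\d)(\\d?)(.?)(\\d?)(\\d?)(c)(m)");

    // Cleaned-up pattern: escaped dot, decimal part grouped and optional as a whole.
    static final Pattern FIXED = Pattern.compile("\\d\\d?(?:\\.\\d\\d?)?cm");

    static boolean isValidHeight(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"80cm", "80.2cm", "80.25cm", "71-80cm", "8025cm"};
        for (String s : samples) {
            System.out.println(s + " original=" + ORIGINAL.matcher(s).matches()
                    + " fixed=" + isValidHeight(s));
        }
    }
}
```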

Related

Java Regex: Allow only one blank space between two words [duplicate]

I am using the following regular expression without restricting any character length:
var test = /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*$/ // Works fine
In the above when I am trying to restrict the characters length to 15 as below, it throws an error.
var test = /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*${1,15}/ //**Uncaught SyntaxError: Invalid regular expression**
How can I make the above regular expression work with the characters limit to 15?
You cannot apply quantifiers to anchors. Instead, to restrict the length of the input string, use a lookahead anchored at the beginning:
// ECMAScript (JavaScript, C++)
^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
^^^^^^^^^^^
// Or, in flavors other than ECMAScript and Python
\A(?=.{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
^^^^^^^^^^^^^^^
// Or, in Python
\A(?=.{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
^^^^^^^^^^^^^^^
Also, I assume you wanted to match 0 or more letters or digits with (a-z|A-Z|0-9)*. It should look like [a-zA-Z0-9]* (i.e. use a character class here).
Why not use a limiting quantifier, like {1,15}, at the end?
Quantifiers are only applied to the subpattern to the left, be it a group or a character class, or a literal symbol. Thus, ^[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']{1,15}$ will effectively restrict the length of the second character class [^$%^&*;:,<>?()\"'] to 1 to 15 characters. The ^(?:[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*){1,15}$ will "restrict" the sequence of 2 subpatterns of unlimited length (as the * (and +, too) can match unlimited number of characters) to 1 to 15 times, and we still do not restrict the length of the whole input string.
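A small Java sketch of that point (the class and method names are only illustrative): with {1,15} attached to the second character class, a 30-character string still matches, because the first class is free to consume the other 29 characters.

```java
import java.util.regex.Pattern;

class TrailingQuantifierDemo {
    // {1,15} applies only to the second character class, not to the whole string.
    static final Pattern TAIL_QUANTIFIED =
            Pattern.compile("^[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']{1,15}$");

    static boolean matchesTailQuantified(String s) {
        return TAIL_QUANTIFIED.matcher(s).find();
    }

    public static void main(String[] args) {
        String thirtyChars = "abcdefghijklmnopqrstuvwxyz0123"; // 30 characters
        // The first class takes 29 characters, the second class takes the last one.
        System.out.println(thirtyChars.length() + " chars: "
                + matchesTailQuantified(thirtyChars));
    }
}
```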
How does the lookahead restriction work?
The (?=.{1,15}$) / (?=.{1,15}\z) / (?=.{1,15}\Z) positive lookahead appears right after the ^/\A start-of-string anchor (note that in Ruby, \A is the only anchor that matches only at the start of the whole string). It is a zero-width assertion that only returns true or false after checking whether its subpattern matches the subsequent characters. So, this lookahead tries to match any 1 to 15 characters other than a newline (due to the limiting quantifier {1,15}), followed immediately by the end of the string (due to the $/\z/\Z anchor). If we remove the $ / \z / \Z anchor from the lookahead, the lookahead will only require the string to contain at least one character, and the total string length is no longer restricted.
If the input string can contain a newline sequence, you should use [\s\S] portable any-character regex construct (it will work in JS and other common regex flavors):
// ECMAScript (JavaScript, C++)
^(?=[\s\S]{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
^^^^^^^^^^^^^^^^^
// Or, in flavors other than ECMAScript and Python
\A(?=[\s\S]{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
^^^^^^^^^^^^^^^^^^
// Or, in Python
\A(?=[\s\S]{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
^^^^^^^^^^^^^^^^^^
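Java belongs to the flavors that support \A and \z, so the [\s\S] variant could be used there roughly as follows. This is a sketch only; the class name LengthLimitedInput and the method isValid are assumptions made for the example.

```java
import java.util.regex.Pattern;

class LengthLimitedInput {
    // The lookahead right after \A checks the total length (1 to 15 characters,
    // newlines included) before the rest of the pattern consumes anything.
    static final Pattern LIMITED = Pattern.compile(
            "\\A(?=[\\s\\S]{1,15}\\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\\z");

    static boolean isValid(String s) {
        return LIMITED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"abc123", "", "abcdefghijklmno", "abcdefghijklmnop", "abc$"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" (" + s.length() + " chars): " + isValid(s));
        }
    }
}
```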

Regular expression for both length with whitespaces

I am trying to write a regular expression with following conditions.
Allow empty at any position in string.
First three are characters-range (1-3)
Next six are numeric (must) -range (6)
Next optional to have characters - range (1-3)
After that optional to have numeric - range(0-2)
For this I tried a lot of things, but nothing works.
^[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}
This expression works fine for matching all criteria but it is not allowing empty strings. Thanks in advance.
I just want to validate the string like "AB 123456 ADF 12".
As I mentioned in the first point, the string may contain whitespace at any position, like "AB 123 456 ADF 12".
You have to wrap your pattern in parentheses and make it optional using ?:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2})?$
^ Assert beginning of string
(?: Start of non-capturing group
[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2} Your pattern
)? End of NCG, optional
$ Assert end of string
If you also want to accept strings that consist only of whitespace, add \\s* as an alternative (or \s* if the regex is not inside a string literal) and remove the ? quantifier:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}|\s*)$
^^^^
Update
Based on comment:
^(?:[a-zA-Z](?:\s*[a-zA-Z]){0,2}\s*\d(?:\s*\d){5}(?:\s*[a-zA-Z](?:\s*[a-zA-Z]){0,2})?\s*(?:\d\s*\d?)?)$
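In Java, the updated pattern could be checked against the sample strings from the question like this. It is a sketch under the assumption that the whole input must match; SpacedCodeValidator and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

class SpacedCodeValidator {
    // Updated pattern from above: optional whitespace (\s*) is allowed between
    // the individual letters and digits.
    static final Pattern CODE = Pattern.compile(
            "^(?:[a-zA-Z](?:\\s*[a-zA-Z]){0,2}\\s*\\d(?:\\s*\\d){5}"
            + "(?:\\s*[a-zA-Z](?:\\s*[a-zA-Z]){0,2})?\\s*(?:\\d\\s*\\d?)?)$");

    static boolean isValid(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AB 123456 ADF 12", "AB 123 456 ADF 12", "AB123456", "AB 12345"};
        for (String s : samples) {
            System.out.println("\"" + s + "\": " + isValid(s));
        }
    }
}
```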

Java regular expressions for specific name\value format

I'm not familiar yet with java regular expressions. I want to validate a string that has the following format:
String INPUT = "[name1 value1];[name2 value2];[name3 value3];";
namei and valuei are Strings that may contain any characters except white-space.
I tried with this expression:
String REGEX = "([\\S*\\s\\S*];)*";
But if I call matches() I get always false even for a good String.
what's the best regular expression for it?
This does the trick:
(?:\[\w.*?\s\w.*?\];)*
If you want to only match three of these, replace the * at the end with {3}.
Explanation:
(?:: Start of non-capturing group
\[: Escapes the [ sign, which is a meta-character in regex. This allows it to be used for matching.
\w.*?: Matches one word character ([a-zA-Z0-9_]) followed lazily by any characters. Lazy matching means it matches as few characters as possible, in this case meaning that it will stop matching as soon as the following \s can match.
\s: Matches one whitespace
\]: See \[
;: Matches one semicolon
): End of non-capturing group
*: Matches any number of what is contained in the preceding non-capturing group.
You should escape square brackets. Also, if your aim is to match only three, replace * with {3}
(\\[\\S*\\s\\S*\\];){3}
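A minimal sketch of that last suggestion in Java, with every backslash doubled for the string literal (including the ones escaping the square brackets). The class name NameValueValidator and the method isValid are assumptions for the example. Note that \S* also accepts ] and ;, so this is a loose check rather than a strict parser.

```java
import java.util.regex.Pattern;

class NameValueValidator {
    // Exactly three "[name value];" groups; the brackets are escaped so they
    // are matched literally instead of opening a character class.
    static final Pattern PAIRS = Pattern.compile("(\\[\\S*\\s\\S*\\];){3}");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "[name1 value1];[name2 value2];[name3 value3];";
        System.out.println(isValid(input));
        System.out.println(isValid("[name1 value1];"));
    }
}
```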

Validating name string with dashes and singlequotes

I am trying to validate a string with the following specification:
"Non-empty string that contains only letters, dashes, or single quotes"
I'm using String.matches("[a-zA-Z|-|']*") but it's not catching the - characters correctly. For example:
Test          Result   Should Be
================================
shouldpass    true     true
fail3         false    false
&fail         false    false
pass-pass     false    true
pass'again    true     true
-'-'-pass     false    true
So "pass-pass" and "-'-'-pass" are failing. What am I doing wrong with my regex?
You should use the following regex:
[a-zA-Z'-]+
Your regex is allowing a literal |, and you have a range specified, from | to |. The hyphen must be placed at the end or beginning of the character class, or escaped in the middle if you want to match a literal hyphen. The + quantifier at the end will ensure the string is non-empty.
Another alternative is to include all Unicode letters:
[\p{L}'-]+
Java string: "[\\p{L}'-]+".
Possible solution:
[a-zA-Z-']+
Problems with your regex:
If you don't want to accept empty strings, change * to + to accept one or more characters instead of zero or more.
Characters in character class are implicitly separated by OR operator. For instance:
regex [abc] is equivalent of this regex a|b|c.
So as you see regex engine doesn't need OR operator there, which means that | will be treated as simple pipe literal:
[a|b] represents a OR | OR b characters
You seem to know that - has special meaning in character class, which is to create range of characters like a-z. This means that |-| will be treated by regex engine as range of characters between | and | (which effectively is only one character: |) which looks like main problem of your regex.
To create a - literal we either need to
  escape it: \-
  place it where - can't be interpreted as a range. To be more precise, we need to place it somewhere where it will not have access to characters which could be used as left and right range indicators l-r, like:
    at start of character class [- ...] (no left range character)
    at end of character class [... -] (no right range character)
    right after another range like A-Z-x - Z was already used as the character representing the end of range A-Z, so it can't be reused in a Z-x range.
This will work:
[a-zA-Z'-]+
Writing |-| inside the character class is treated as a range from | to |; you just want the specific characters ' and -.
try {
    if (subjectString.matches("(?i)([a-z'-]+)")) {
        // String matched entirely
    } else {
        // Match attempt failed
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
EXPLANATION:
(?i)([a-z'-]+)
----------
Options: Case insensitive; Exact spacing; Dot doesn't match line breaks; ^$ don't match at line breaks; Default line breaks
Match the regex below and capture its match into backreference number 1 «([a-z'-]+)»
Match a single character present in the list below «[a-z'-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “a” and “z” (case insensitive) «a-z»
The literal character “'” «'»
The literal character “-” «-»
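To tie the answers back to the test table in the question, here is a small Java sketch that runs both the original character class and the corrected one over the same inputs. NameValidator, isValidName and originalCheck are illustrative names, not part of any answer above.

```java
class NameValidator {
    // Corrected class: the hyphen sits at the end, so it is a literal;
    // + requires at least one character.
    static boolean isValidName(String s) {
        return s.matches("[a-zA-Z'-]+");
    }

    // Pattern from the question: |-| is a range from '|' to '|', so the class
    // contains letters, '|' and the single quote, but no hyphen.
    static boolean originalCheck(String s) {
        return s.matches("[a-zA-Z|-|']*");
    }

    public static void main(String[] args) {
        String[] tests = {"shouldpass", "fail3", "&fail", "pass-pass", "pass'again", "-'-'-pass"};
        for (String t : tests) {
            System.out.println(t + " original=" + originalCheck(t) + " fixed=" + isValidName(t));
        }
    }
}
```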

Java Pattern regex subtraction with greedy quantifiers

I am currently trying to use a regex to validate that my input has a certain format.
In the possible input there is only one combination of characters that I don't want to match.
Therefore I would like to use the subtraction as described in the JavaDoc for the Pattern class.
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
So it is possible to exclude a certain pattern of characters from my expression.
Unfortunately I was not able to get the regex right.
The pattern should match a String of exactly 8 digits ([0-9]{8}). Furthermore it should not match for the String of exactly 8 zero characters.
12345678 -> match yes
00000001 -> match yes
00000000 -> match no
So this is how I tried it: regex = "[[0-9]{8}]&&[^[0{8}]]"
My Question now is, how can I group multiple characters together for a match.
Or how would the regex have to look like to meet my requirements.
Would be nice if somebody could help me with that.
In that case, you're better off with a negative lookahead assertion:
regex = "^(?!0{8})[0-9]{8}$"
A character class always matches a single character from a certain set, and that set is defined in the expression between square brackets, which is why your approach doesn't work.
[[0-9]{8}]&&[^[0{8}]] actually means
[[0-9]{8} # Match 8 characters between 0 and 9 or a "["
]&& # Match literal "]&&"
[^[0{8}] # Match a character except "[", "0", "{", "8", or "}"
] # Match a literal "]"
A solution without lookaround would have to make sure that at least one nonzero digit is present, while still making sure that the overall number of digits is exactly 8. That makes it complicated:
^(?:[1-9][0-9]{7}|[0-9][1-9][0-9]{6}|[0-9]{2}[1-9][0-9]{5}|[0-9]{3}[1-9][0-9]{4}|[0-9]{4}[1-9][0-9]{3}|[0-9]{5}[1-9][0-9]{2}|[0-9]{6}[1-9][0-9]|[0-9]{7}[1-9])$
Explanation:
^ # Start of string
(?: # Start of group
[1-9][0-9]{7} # Match digit > 0, followed by 7 digits
| # or
[0-9][1-9][0-9]{6} # Match any digit, a digit 1-9, 6 other digits
| # or
[0-9]{2}[1-9][0-9]{5} # Match 2 digits, a digit 1-9, 5 other digits
| # or
... # etc. etc. etc.
) # End of group
$ # End of string
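Both variants can be tried side by side in Java. The sketch below uses the lookahead regex as given above and builds the lookaround-free alternation in a loop instead of typing all eight branches by hand. The class name EightDigitValidator and the helper names are assumptions for the example.

```java
import java.util.regex.Pattern;

class EightDigitValidator {
    // Lookahead variant: reject eight zeros up front, then require eight digits.
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("^(?!0{8})[0-9]{8}$");

    static boolean isValid(String s) {
        return WITH_LOOKAHEAD.matcher(s).matches();
    }

    // Builds the alternation: i digits, one nonzero digit, then 7 - i digits,
    // for every position i from 0 to 7.
    static Pattern withoutLookaround() {
        StringBuilder sb = new StringBuilder("^(?:");
        for (int i = 0; i < 8; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("[0-9]{").append(i).append("}[1-9][0-9]{").append(7 - i).append('}');
        }
        return Pattern.compile(sb.append(")$").toString());
    }

    public static void main(String[] args) {
        Pattern alternation = withoutLookaround();
        String[] samples = {"12345678", "00000001", "00000000", "1234567"};
        for (String s : samples) {
            System.out.println(s + " lookahead=" + isValid(s)
                    + " alternation=" + alternation.matcher(s).matches());
        }
    }
}
```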
