Email ID validation using Java regex

Can anyone please help me find a solution to this problem using Java regex?
Question: The EmailId should be in the following format <<1st part>>.<<2nd part>>#<<3rd part>><<4th part>>
1st part should contain alphanumeric characters, and it must contain at least 1 uppercase letter, 1 lowercase letter, and 1 digit.
2nd part should contain alphanumeric characters.
3rd part should be an alphabetical value of length 3 to 8.
4th part can be “.com” or “.co.in”
My solution is:
if (EmailId.matches("^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).{3,}\\.[\\w&&[^_]]+#[\\w&&[^_]]{3,8}\\.(com|co\\.in)")) {
    return true;
}
But this solution accepts the email ID "RAKESH1.Roshan#infy.co.in", which should not be accepted.
I don't know where I am going wrong.
Please help!

Your regex is not working because of the . used in your pattern: it matches any character, so both the lookaheads and .{3,} can run past the first part. If you do not allow all characters, stick to the specific character classes you do allow.
I suggest:
.matches("(?=\\p{Alnum}*\\p{Upper})(?=\\p{Alnum}*[0-9])(?=\\p{Alnum}*\\p{Lower})\\p{Alnum}*[.]\\p{Alnum}+#\\p{Alpha}{3,8}[.]co(m|[.]in)"))
See the Java demo. Since the pattern is used with .matches(), which requires the entire string to match, the ^ (start) and $ (end) anchors are not necessary.
Details:
(?=\\p{Alnum}*\\p{Upper}) - Right from the start of the string, there must be an uppercase letter after 0+ alphanumeric chars
(?=\\p{Alnum}*[0-9]) - Right from the start of the string, there must be a digit after 0+ alphanumeric chars
(?=\\p{Alnum}*\\p{Lower}) - Right from the start of the string, there must be a lowercase letter after 0+ alphanumeric chars
\\p{Alnum}* - 0 or more alphanumeric chars (replace * with + if you need to require at least 1)
[.] - a literal . char
\\p{Alnum}+ - 1 or more alphanumeric chars
# - a literal # char
\\p{Alpha}{3,8} - 3 to 8 alphabetic chars
[.]co(m|[.]in) - .com or .co.in at the end of the string.
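A minimal sketch of the suggested pattern wrapped in a helper. The class and method names (EmailIdCheck, isValid) are hypothetical, and "Rakesh1.Roshan#infy.co.in" is an assumed valid input built from the question's rules.

```java
class EmailIdCheck {
    // Pattern from the answer above; String.matches() anchors it to the whole input
    static final String EMAIL_RX =
        "(?=\\p{Alnum}*\\p{Upper})"      // an uppercase letter inside the 1st part
        + "(?=\\p{Alnum}*[0-9])"          // a digit inside the 1st part
        + "(?=\\p{Alnum}*\\p{Lower})"     // a lowercase letter inside the 1st part
        + "\\p{Alnum}*[.]\\p{Alnum}+"     // 1st part, a literal dot, 2nd part
        + "#\\p{Alpha}{3,8}"              // '#', then 3 to 8 letters
        + "[.]co(m|[.]in)";               // .com or .co.in

    static boolean isValid(String emailId) {
        return emailId.matches(EMAIL_RX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("RAKESH1.Roshan#infy.co.in")); // no lowercase in the 1st part
        System.out.println(isValid("Rakesh1.Roshan#infy.co.in"));
        System.out.println(isValid("Rakesh1.Roshan#in.com"));     // 3rd part too short
    }
}
```

Because every lookahead only walks over \p{Alnum}, none of them can reach past the first dot into "Roshan".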

The problem is that your lookaheads aren't limited to the first part.
For example, with the input RAKESH1.Roshan#infy.co.in, the lookahead (?=.*[a-z]) will skip RAKESH1.R and find the following lowercase o.
You can fix this by changing .* to [^.]* in all lookaheads.
Another problem is the .{3,}. This will match any character, not just alphanumeric ones. Change this to [\\w&&[^_]]{3,} (or just [\\w&&[^_]]+).
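A sketch of the question's own regex with both of these fixes applied. The class name is hypothetical. As in the question, the 3rd part [\\w&&[^_]]{3,8} still accepts digits, so it is looser than the "alphabetical" rule.

```java
class FixedOriginal {
    // Lookaheads use [^.]* so they cannot look past the first dot;
    // .{3,} is replaced by [\w&&[^_]]{3,} (alphanumerics only, via class intersection).
    // Assumption: the 3rd part is left as in the question and still allows digits.
    static final String RX =
        "^(?=[^.]*\\d)(?=[^.]*[a-z])(?=[^.]*[A-Z])"
        + "[\\w&&[^_]]{3,}\\.[\\w&&[^_]]+"
        + "#[\\w&&[^_]]{3,8}\\.(com|co\\.in)";

    static boolean isValid(String emailId) {
        return emailId.matches(RX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("RAKESH1.Roshan#infy.co.in"));
        System.out.println(isValid("Rakesh1.Roshan#infy.co.in"));
    }
}
```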

Related

Having problems with java regex

I have the following regex:
/[-A-Z]{4}\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}.png
Basically I want to check for strings of the basic type
ABCD12/<here_is_a_random_uuid_as_a_string>.png
The UUID (which is in UPPER CASE) checking works fine, but now let's take a look at a special case. I want to accept strings like this
--CD12/...
AB--12/...
but NOT like this:
A--D12/...
But I cannot get the first part of the regex right. Basically, I need to check for either two letters or two hyphens in a row, twice.
For my understanding [-A-Z]{4} means "either - or something between A - Z with a length of 4". So why doesn't my pattern work?
EDIT:
This answer was posted within the comments and it works:
(?mi)^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
Can somebody explain to me what (?mi) and (?:...) mean? The normal ? means 0 or 1 times, but what is the : for?
EDIT 2:
Just for those who might have a similar problem and do not want to read all of those regexes ;)
I slightly modified an answer to also accept patterns like ----12. The end result:
"^/(?:--[A-Z]{2}|-{4}|[A-Z]{2}(?:--|[A-Z]{2}))\\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\\.png$"
It works like a charm.
You may use this regex for your cases:
^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
RegEx Demo
Details about first part:
^: Start
(?:: Start non-capture group
--[A-Z]{2}: Match -- followed by 2 letters
|: OR
[A-Z]{2}: Match 2 letters
(?:--|[A-Z]{2}): Match -- OR 2 letters
): End non-capture group
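By the way, (?:...) is a non-capturing group: it groups the alternatives without storing a capture. (?mi) turns on multiline mode (^ and $ match at the start and end of every line) and case-insensitive matching for the rest of the pattern.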
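A small Java sketch of what (?mi) and (?:...) do. It assumes the inline flags behave like passing Pattern.MULTILINE | Pattern.CASE_INSENSITIVE to Pattern.compile; the class name and the sample UUID are made up for illustration.

```java
import java.util.regex.Pattern;

class InlineFlagsDemo {
    // Prefix alternatives grouped with (?:...), which groups without capturing
    static final String BODY =
        "^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\\d{2}/"
        + "[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\\.png$";

    static final Pattern INLINE = Pattern.compile("(?mi)" + BODY);   // inline flags
    static final Pattern FLAGS =
        Pattern.compile(BODY, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE); // same flags as constants
    static final Pattern PLAIN = Pattern.compile(BODY);              // no flags

    // Hypothetical input: the match is on the second line and in lowercase
    static final String SAMPLE =
        "not a match\nab--12/123e4567-e89b-12d3-a456-426614174000.png";

    public static void main(String[] args) {
        System.out.println(INLINE.matcher(SAMPLE).find());
        System.out.println(FLAGS.matcher(SAMPLE).find());
        System.out.println(PLAIN.matcher(SAMPLE).find());  // ^ only at input start, case-sensitive
        System.out.println(INLINE.matcher(SAMPLE).groupCount()); // no capturing groups at all
    }
}
```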
Your [-A-Z]{4} matches any four occurrences of an uppercase ASCII letter or -, so it can also match ----, A---, ---B, -B--, etc.
You want to make sure that if there are hyphens, they come after or before two letters:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})
It means:
(?: - start of a non-capturing group:
[A-Z]{2}-- - two uppercase ASCII letters and then --
| - or
--[A-Z]{2} - -- and then any two uppercase ASCII letters
| - or
[A-Z]{4} - any four uppercase ASCII letters
) - end of the non-capturing group.
The full pattern:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png
To force the entire string match, add ^ (start of string) and $ (end of string) anchors:
^(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png$
See the regex demo
Note that . matches any character; to match a literal dot, escape it (\.).
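A sketch comparing the question's pattern with the fixed one via String.matches(), which behaves like adding ^ and $. Assumptions: the question's leading / is dropped, and the uppercase UUID is made-up test data; the class name is hypothetical.

```java
class PrefixCompare {
    // Question's pattern: [-A-Z]{4} accepts any mix of letters and '-',
    // and the unescaped '.' before "png" matches any character
    static final String ORIGINAL =
        "[-A-Z]{4}\\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}.png";
    // Answer's pattern: hyphens only as a leading or trailing pair, literal dot
    static final String FIXED =
        "(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\\d{2}/"
        + "[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\\.png";
    static final String UUID = "123E4567-E89B-12D3-A456-426614174000"; // sample data

    public static void main(String[] args) {
        String[] inputs = {
            "--CD12/" + UUID + ".png",
            "AB--12/" + UUID + ".png",
            "A--D12/" + UUID + ".png",  // hyphens in the middle
            "ABCD12/" + UUID + "xpng"   // no literal dot
        };
        for (String s : inputs) {
            System.out.println(s.substring(0, 6) + " original=" + s.matches(ORIGINAL)
                + " fixed=" + s.matches(FIXED));
        }
    }
}
```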

Regex to validate custom format

I have this format: xx:xx:xx or xx:xx:xx-y, where x can be 0-9 a-f A-F and y can be only 0 or 1.
I come up with this regex: ([0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}|[-][0-1]{1})
(See regexr).
But this matches 0a:0b:0c-3 too, which is not expected.
Is there any way to remove these cases from result?
[:] means a character from the list that contains only :, so it is the same as :. The same goes for [-], which has the same result as -.
Also, {1} means "the previous piece exactly one time". It does not have any effect, you can remove it altogether.
To match xx:xx:xx or xx:xx:xx-y, the part that matches -y must be optional. The quantifier ? after that part marks it as optional.
All in all, your regex should be like this:
[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}(-[01])?
If the regex engine you use can be told to ignore the character case then you can get rid of A-F (or a-f) from all character classes and the regex becomes:
[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?
How it works, piece by piece:
[0-9a-f] # any digit or letter from (and including) 'a' to 'f'
{2} # the previous piece exactly 2 times
: # the character ':'
[0-9a-f]
{2}
:
[0-9a-f]
{2}
( # start a group; it does not match anything
- # the character '-'
[01] # any character from the class (i.e. '0' or '1')
) # end of group; the group is needed for the next quantifier
? # the previous piece (i.e. the group) is optional
# it can appear zero or one times
See it in action: https://regexr.com/4rfvr
Update
As @the-fourth-bird mentions in a comment, if the regex must match the entire string, then you need to anchor both ends:
^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$
^ as the first character of a regex matches the beginning of the string, $ as the last character matches the end of the string. This way the regex matches the entire string only (when there aren't other characters before or after the xx:xx:xx or xx:xx:xx-y part).
If you use the regex to find xx:xx:xx or xx:xx:xx-y in a larger string then you don't need to add ^ and $. Of course, you can add only ^ or $ to let the regex match only at the beginning or at the end of the string.
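A Java sketch of the anchored version. It uses Pattern.CASE_INSENSITIVE as one way to tell the engine to ignore case (an inline (?i) would do the same); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class HexTripletCheck {
    // Anchored pattern from the answer; CASE_INSENSITIVE lets [0-9a-f] also accept A-F
    static final Pattern FORMAT = Pattern.compile(
        "^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$", Pattern.CASE_INSENSITIVE);

    static boolean isValid(String s) {
        return FORMAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0a:0b:0c", "0A:0B:0C-1", "0a:0b:0c-3", "0a:0b:0c-0"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With Matcher.matches() the ^ and $ are redundant but harmless; they matter if the same pattern is used with find().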
You want
xx:xx:xx, or, if it is followed by a -, then the - must be followed by a 0 or 1 and then the end (a word boundary).
So you don't want any of these
0a:0b:0c-123
0a:0b:0cd
10a:0b:0c
either.
Then you want a "negative lookahead": if you match the first part, it must not be followed by a - (the first alternative) and it should end there (word boundary); if it is followed by a -, then the - must be followed by a 0 or 1 and then a word boundary:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
To prevent any digit in front, a word boundary is added to the front as well.
Example: https://regexr.com/4rg42
The following almost worked:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
but if 3a:2b:11 is at the very end of the input, [^-] still needs to match one non-hyphen character, so there is no match.
Example: https://regexr.com/4rg4q
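A Java sketch of the word-boundary pattern used to search a larger string. The /i modifier is expressed as Pattern.CASE_INSENSITIVE; the class name, method name and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexTripletFinder {
    // Word-boundary pattern from this answer, case-insensitive
    static final Pattern TRIPLET = Pattern.compile(
        "\\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\\b"
        + "|\\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\\b)",
        Pattern.CASE_INSENSITIVE);

    // Collects every occurrence found in the text
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TRIPLET.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The last token sits at the very end of the input, the case where [^-] failed
        System.out.println(findAll("0a:0b:0c-3 0A:0B:0C-1 3a:2b:11"));
    }
}
```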

Java regex - First needs to be the letter X (case-insensitive), the rest digits

I need to check whether a string is in the format X[d].... It has to have the letter X (case-insensitive) at the start and AT LEAST 1 digit after it. I tried the following regex, but it doesn't match anything:
^(?i)[x](?=.*[0-9])*$
// ^(?i)[x] - first character needs to be x (case-insensitive)
// (?=.*[0-9]) - should have at least one digit after it, and everything after it must be digits
Use the following.
^(?i)x\d+$
This translates to case insensitive x followed by one or more digits 0-9. There's no need for brackets around the x because it's not a set. It's only one character.
Alternatively, you can create a set that consists of upper and lower case x.
^[xX]\d+$
In Java, you may use
s.matches("(?i)x[0-9]+")
It will match a string starting with x or X and then having 1 or more digits.
You should not quantify a lookahead. It is a zero-width assertion, so every repetition matches at the same empty position and the regex index never advances.
In practice, Java regex just ignores the quantified lookahead here, since * also allows zero repetitions. Your current regex, ^(?i)[x](?=.*[0-9])*$, matches x but not x5, because the only part that consumes characters is [x]. See the Java demo.
Even if you remove the * quantifier, ^(?i)[x](?=.*[0-9])$ does not match any string since $, end of string, is required right after x while (?=.*[0-9]) positive lookahead requires a digit after any 0+ chars other than line break chars.
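A Java sketch contrasting the working patterns with the non-quantified broken one. The class and method names are hypothetical.

```java
class XDigitsCheck {
    // (?i) makes the x case-insensitive; [0-9]+ requires at least one digit after it
    static boolean matchesInline(String s) {
        return s.matches("(?i)x[0-9]+");
    }

    // Same rule with an explicit [xX] set instead of the flag
    static boolean matchesSet(String s) {
        return s.matches("[xX]\\d+");
    }

    // The question's pattern without the *: $ right after x conflicts with the required digit
    static boolean matchesBroken(String s) {
        return s.matches("^(?i)[x](?=.*[0-9])$");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"X123", "x7", "x", "xa1", "1x"}) {
            System.out.println(s + ": " + matchesInline(s) + " " + matchesSet(s) + " " + matchesBroken(s));
        }
    }
}
```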

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to disallow consecutive hyphens.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to restrict consecutive hyphens only.
You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
^^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
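A Java sketch of both \w-based variants from this answer, using String.matches(). The class name and the "-usa" test input are hypothetical.

```java
class NoDoubleHyphen {
    // Grouping version: runs of allowed chars separated by single hyphens
    static final String GROUPED = "^[\\w/.?]+(?:-[\\w/.?]+)*$";
    // Lookahead version: fail if "--" appears anywhere
    static final String LOOKAHEAD = "^(?!.*--)[\\w/.?-]+$";

    public static void main(String[] args) {
        String[] inputs = {"www.google.com/flights-usa", "www.google.com/flights--usa", "-usa"};
        for (String s : inputs) {
            System.out.println(s + " grouped=" + s.matches(GROUPED)
                + " lookahead=" + s.matches(LOOKAHEAD));
        }
    }
}
```

The two are not identical: the grouped form also rejects a leading or trailing hyphen, while the lookahead form only rejects --.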
One approach is to restrict multiple dashes with a negative lookbehind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
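A Java sketch of the lookbehind approach. The backslashes inside the character class are dropped, on the assumption that /, ., ? and _ need no escaping there in Java; the class name is hypothetical.

```java
class LookbehindHyphen {
    // Each repetition consumes an allowed non-hyphen char, or a '-' not preceded by '-'
    static final String RX = "^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$";

    static boolean isValid(String s) {
        return s.matches(RX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("www.google.com/flights-usa"));
        System.out.println(isValid("www.google.com/flights--usa"));
    }
}
```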
I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([a-zA-Z0-9\/\.\?\_]|$))+$
For each character, this regex checks whether it matches [a-zA-Z0-9\/\.\?\_], which is everything in your character class except the hyphen. If that does not match, it instead tries \-([a-zA-Z0-9\/\.\?\_]|$), which matches a hyphen followed by an allowed non-hyphen character, or a hyphen at the end of the string.
Here's a demo.

Regex to match exactly n occurrences of letters and m occurrences of digits

I have to match an 8 character string, which can contain exactly 2 letters (1 uppercase and 1 lowercase), and exactly 6 digits, but they can be permutated arbitrarily.
So, basically:
K82v6686 would pass
3w28E020 would pass
1276eQ900 would fail (too long)
98Y78k9k would fail (three letters)
A09B2197 would fail (two capital letters)
I've tried using the positive lookahead to make sure that the string contains digits, uppercase and lowercase letters, but I have trouble with limiting it to a certain number of occurrences. I suppose I could go about it by including all possible combinations of where the letters and digits can occur:
(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z]) ([A-Z][a-z][0-9]{6})|([A-Z][0-9][a-z][0-9]{5})| ... | ([0-9]{6}[a-z][A-Z])
But that's a very roundabout way of doing it, and I'm wondering if there's a better solution.
You can use
^(?=[^A-Z]*[A-Z][^A-Z]*$)(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
See the regex demo (a bit modified due to the multiline input). In Java, do not forget to use double backslashes (e.g. \\d to match a digit).
Here is a breakdown:
^ - start of string (assuming no multiline flag is to be used)
(?=[^A-Z]*[A-Z][^A-Z]*$) - check that there is exactly 1 uppercase letter (use \p{Lu} to match any Unicode uppercase letter and \P{Lu} to match any character other than that)
(?=[^a-z]*[a-z][^a-z]*$) - a similar check that there is exactly 1 lowercase letter (alternatively, use \p{Ll} and \P{Ll} to match Unicode letters)
(?=(?:\D*\d){6}\D*$) - check that there are exactly six digits in the string: from the start of the string, six repetitions of 0 or more non-digit characters (\D matches any character but a digit; you may also use [^0-9]) followed by a digit (\d), then 0 or more non-digit characters (\D*) up to the end of the string ($)
[a-zA-Z0-9]{8} - match exactly 8 alphanumeric characters.
$ - end of string.
Following the logic, we can even reduce this to just
^(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
One condition can be removed: since [a-zA-Z0-9]{8} only allows letters and digits, exactly 1 lowercase letter and exactly 6 digits leave exactly one character, which must be an uppercase letter.
When using it with Java's matches() method, there is no need for the ^ and $ anchors at the start and end of the pattern, but you still need $ inside the lookaheads:
String s = "K82v6686";
String rx = "(?=[^a-z]*[a-z][^a-z]*$)" + // 1 lowercase letter check
"(?=(?:\\D*\\d){6}\\D*$)" + // 6 digits check
"[a-zA-Z0-9]{8}"; // matching 8 alphanum chars exactly
if (s.matches(rx)) {
System.out.println("Valid");
}
Pattern.matches(".*[A-Z].*", s) &&
Pattern.matches(".*[a-z].*", s) &&
Pattern.matches(".*(\\D*\\d){6}.*", s) &&
Pattern.matches(".{8}", s)
As we need an alternating automaton to be created for this task, it's much simpler to use a conjunction of regexps for constituent types of character.
We require it to have at least one lowercase letter, one uppercase letter and six digits, and these three classes are mutually exclusive. The last condition requires the string length to be exactly the sum of these counts, which leaves no room for characters of any other type. Of course, we could use s.length() == 8 as the last condition, but that would break the style :).
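The conjunction above as a self-contained Java method; the class and method names are hypothetical, and the inputs are the examples from the question.

```java
import java.util.regex.Pattern;

class LetterDigitMix {
    // Conjunction of simple whole-string checks from the answer above
    static boolean isValid(String s) {
        return Pattern.matches(".*[A-Z].*", s)          // at least one uppercase letter
            && Pattern.matches(".*[a-z].*", s)          // at least one lowercase letter
            && Pattern.matches(".*(\\D*\\d){6}.*", s)   // at least six digits
            && Pattern.matches(".{8}", s);              // exactly 8 chars, so no room for extras
    }

    public static void main(String[] args) {
        for (String s : new String[] {"K82v6686", "3w28E020", "1276eQ900", "98Y78k9k", "A09B2197"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```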
Sort the string's characters by character code (digits sort before uppercase letters, which sort before lowercase letters) and then match the result against ^[0-9]{6}[A-Z][a-z]$.
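A Java sketch of the sorting idea. Arrays.sort on a char[] orders by character code, so digits come first, then uppercase, then lowercase; the sorted string is therefore matched against [0-9]{6}[A-Z][a-z]. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SortedCheck {
    static boolean isValid(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars); // ascending char code: '0'-'9' < 'A'-'Z' < 'a'-'z'
        return new String(chars).matches("[0-9]{6}[A-Z][a-z]");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"K82v6686", "3w28E020", "1276eQ900", "98Y78k9k", "A09B2197"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```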
