How to form the RegEx of user name string in Java?
Rules in Exercise :
Only 3 - 10 characters.
Only 'a'-'z', 'A'-'Z', '1'-'9', '_' and '.' are allowed.
'_' and '.' can appear only 0 to 2 times in total.
"abc_._" = false
"abc..." = false
"abc__" = true
"abc.." = true
"abc_." = true
If I do not use Regex, it will be easier.
Without considering '1'-'9', I have tried the following RegEx but they do not work.
String username_regex = "[a-zA-Z||[_||.]{0,2}]{3,10}";
String username_regex = "[a-zA-Z]{3,10}||[_||.]{0,2}";
My function :
public static boolean isUserNameCorrect(String user_name) {
String username_regex = "[a-zA-Z||[_]{0,2}]{3,10}";
boolean isMatch = user_name.matches(username_regex);
return isMatch;
}
What RegEx should I use?
If I remember well from CS classes, it is awkward to create one single regex that satisfies all three requirements. So, I would make separate checks for each condition. For example, this regex checks conditions 1 and 2, and condition 3 is checked separately.
private static final Pattern usernameRegex = Pattern.compile("[a-zA-Z1-9._]{3,10}");
public static boolean isUserNameCorrect(String userName) {
boolean isMatch = usernameRegex.matcher(userName).matches();
return isMatch && countChar(userName, '.') + countChar(userName, '_') <= 2;
}
public static int countChar(String s, char c) {
int count = 0;
int index = s.indexOf(c, 0);
while ( index >= 0 ) {
count++;
index = s.indexOf(c, index+1);
}
return count;
}
BTW, notice the precompiled Pattern, which lets you reuse a regex in Java (a performance gain, because compiling a regex is expensive).
The reason a single regex is awkward here (again if I remember well) is that regular expressions are poor at counting occurrences scattered through a string. A fixed bound such as "at most 2" can still be expressed, but the pattern quickly becomes hard to read.
First off, || isn't necessary for this problem, and in fact doesn't do what you think it does. In regex, alternation is written with a single |, usually inside a group: if you want to match Hello or World, you'd use (Hello|World) or (?:Hello|World).
Next, let me explain why each of the regex you have tried won't work.
String username_regex = "[a-zA-Z||[_||.]{0,2}]{3,10}";
Quantifiers inside a character class aren't interpreted as quantifiers; they just contribute the literal characters they are written with. In addition, nested character classes are simply combined. So this is effectively equal to:
String username_regex = "[a-zA-Z_|.{0,2}]{3,10}";
So it'll match some combination of 3-10 of the following: a-z, A-Z, 0, 2, {, }, a comma, ., |, and _.
And that's not what you wanted.
String username_regex = "[a-zA-Z]{3,10}||[_||.]{0,2}";
This will match either 3 to 10 of a-z or A-Z, or the empty string (the empty alternative between the two pipes), or 0 to 2 of _, |, or .. Also not what you wanted.
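A quick way to probe what the doubled pipe does is to run the second attempt against a few inputs. The sketch below is only an illustration; the class and method names are made up for this example.

```java
class DoublePipeDemo {
    // The asker's second attempt, unchanged.
    static final String ATTEMPT = "[a-zA-Z]{3,10}||[_||.]{0,2}";

    static boolean matches(String s) {
        // String.matches requires the whole input to match one alternative.
        return s.matches(ATTEMPT);
    }

    public static void main(String[] args) {
        System.out.println(matches(""));     // covered by the empty alternative
        System.out.println(matches("abc"));  // covered by [a-zA-Z]{3,10}
        System.out.println(matches("._"));   // covered by [_||.]{0,2}
        System.out.println(matches("abc_")); // no single alternative covers letters plus '_'
    }
}
```

Because each input has to be matched in full by one alternative, letters and the special characters can never be mixed in one user name with this pattern.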
The easy way to do this is by splitting the requirements into two sections and creating two regex strings based off of those:
Only 3 - 10 characters, where only 'a'-'z', 'A'-'Z', '1'-'9', '_' and '.' are allowed.
'_' and '.' can only appear 0 to 2 times.
The first requirement is quite simple: we just need to create a character class including all valid characters and place limits on how many of those can appear:
"[a-zA-Z1-9_.]{3,10}"
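Then I would validate that '_' and '.' appear 0 to 2 times in total. One way is to reject any string that matches a pattern requiring three of them:
".*[._].*[._].*[._].*"
or, as a positive check that can be used directly with matches():
"[^._]*(?:[._][^._]*){0,2}" // Preferable to the above regex if the limit needs to be easy to change.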
I'm unfortunately not experienced enough to figure out what a single regex would look like... But these are at least quite readable.
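The two-step idea can be sketched in Java as follows. This is a minimal sketch, not a definitive implementation: the class name and constant names are hypothetical, and the second pattern is a positive "at most two separators" check, which is one possible way to express the count rule.

```java
import java.util.regex.Pattern;

class UsernameTwoChecks {
    // Rules 1 and 2: 3 to 10 characters drawn from the allowed set.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z1-9_.]{3,10}");
    // Rule 3: at most two of '_' / '.' in total, separated by runs of other characters.
    private static final Pattern AT_MOST_TWO = Pattern.compile("[^._]*(?:[._][^._]*){0,2}");

    static boolean isUserNameCorrect(String userName) {
        return ALLOWED.matcher(userName).matches()
            && AT_MOST_TWO.matcher(userName).matches();
    }

    public static void main(String[] args) {
        // The five examples from the exercise.
        for (String s : new String[] {"abc_._", "abc...", "abc__", "abc..", "abc_."}) {
            System.out.println(s + " = " + isUserNameCorrect(s));
        }
    }
}
```

Keeping the two patterns separate means the length limit and the separator limit can be changed independently.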
May not be elegant but you may try this:
^(([A-Za-z0-9\._])(?!.*[\._].*[\._].*[\._])){3,10}$
Here is the explanation:
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1 (between 3 and 10
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[A-Za-z0-9\._] any character of: 'A' to 'Z', 'a' to
'z', '0' to '9', '\.', '_'
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[\._] any character of: '\.', '_'
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[\._] any character of: '\.', '_'
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[\._] any character of: '\.', '_'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
){3,10} end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
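This satisfies the requirements above for names that start with a letter or digit. Note that the lookahead is only checked after each character has been consumed, so a leading '.' or '_' is never counted (for example, "_._abc" is accepted), and the class allows '0' although the exercise only lists '1'-'9'. Hope it helps :)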
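A variant of the same lookahead idea places the negative lookahead once, at the start of the string, so that every character including the first is counted. The sketch below assumes the exercise's digit range of '1'-'9'; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UsernameSingleRegex {
    // (?!(?:[^._]*[._]){3}) rejects the input if three separators can be found
    // anywhere in it; the character class then enforces the alphabet and length.
    private static final Pattern USERNAME =
        Pattern.compile("^(?!(?:[^._]*[._]){3})[A-Za-z1-9._]{3,10}$");

    static boolean isUserNameCorrect(String userName) {
        return USERNAME.matcher(userName).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc_._", "abc...", "abc__", "abc..", "abc_.", "_._abc"}) {
            System.out.println(s + " = " + isUserNameCorrect(s));
        }
    }
}
```

Hoisting the lookahead also means it is evaluated once per input instead of once per character.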
Please try this:
[a-zA-Z0-9]*[._]?[a-zA-Z0-9]*[._]?[a-zA-Z0-9]*
Niko
EDIT :
You're right. Then several Regexp :
Regex1: ^[\w.]{3,10}$
Regex2: ^[a-zA-Z0-9]*[_.]?[a-zA-Z0-9]*[_.]?[a-zA-Z0-9]*$
I hope I forgot nothing!
I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For example, 7108898787654351 should become **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it leaves the first three unmasked as well.
Can you throw some pointers on how to unmask 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we are not exactly 2 characters past the start of the string (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
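As a runnable check of the lookbehind/lookahead pair described above, here is a small sketch. The class and method names are invented for illustration; the regex is the one from the answer.

```java
class MaskThirdAndLastThree {
    static String mask(String cc) {
        // Replace every character except the 3rd one and the last three.
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));
    }
}
```

The pattern works on characters, not digits, so separators such as dashes would be masked as well.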
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
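To try the digit-counting lookahead on both plain and delimited numbers, a sketch along these lines could be used. It assumes 16-digit numbers, as the answer does; the class name and the sample inputs with separators are illustrative only.

```java
import java.util.regex.Pattern;

class MaskFormattedCard {
    // A digit is left alone when exactly 14 digits (itself included) remain up to
    // the end of the string (the 3rd of 16) or when 1 to 3 digits remain (the last three).
    private static final Pattern MASKABLE_DIGIT =
        Pattern.compile("(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d");

    static String mask(String cc) {
        return MASKABLE_DIGIT.matcher(cc).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));
        System.out.println(mask("7108-8987-8765-4351"));
        System.out.println(mask("7108 8987 8765 4351"));
    }
}
```

Since only digits are matched, whatever separator the input uses is preserved in the masked result.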
Regardless of where the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
Don't think you should use a regex to do what you want. You could use StringBuilder to create the required string
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if exactly two chars other than line break chars lie between the start of the string and the current location (that is, at the third char)
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
I am trying to write a regex that should fail if a phone number consists of the same digit throughout. When I supply the input below, it passes the validation.
999.999.9999 or 999-999-9999 or 999 999 9999. Any suggestions for a regex pattern that fails validation when all the digits are the same?
private static boolean validatePhoneNumber(String phoneNo) {
//validate phone numbers of format "1234567890"
if (phoneNo.matches("\\d{10}")) return true;
//validating phone number with -, . or spaces
else if(phoneNo.matches("\\d{3}[-\\.\\s]\\d{3}[-\\.\\s]\\d{4}")) return true;
//Invalid phone number where 999.999.9999 or 999-999-9999 or 999 999 9999
else if(phoneNo.matches("(\\D?[0-9]{3}\\D?)[\\s][0-9]{3}-[0-9]{4}")) return false;
//return false if nothing matches the input
else return false;
}
You can do it with a single regex:
(?!(\d)\1{2}\D?\1{3}\D?\1{4})\d{3}([-. ]?)\d{3}\2\d{4}
As Java code, your method would be:
private static boolean validatePhoneNumber(String phoneNo) {
// Check if phone number is valid format (optional -, . or space)
// e.g. "1234567890", "123-456-7890", "123.456.7890", or "123 456 7890"
// and that the digits are not all the same, e.g. "999-999-9999"
return phoneNo.matches("(?!(\\d)\\1{2}\\D?\\1{3}\\D?\\1{4})\\d{3}([-. ]?)\\d{3}\\2\\d{4}");
}
Explanation
The regex is in 2 parts:
(?!xxx)yyy
The yyy part is:
\d{3}([-. ]?)\d{3}\2\d{4}
Which means:
\d{3} Match 3 digits
([-. ]?) Match a dash, dot, space, or nothing, and capture it (capture group #2)
\d{3} Match 3 digits
\2 Match the previously captured separator
\d{4} Match 4 digits
This means that it will match e.g. 123-456-7890 or 123.456.7890, but not 123.456-7890
The (?!xxx) part is a zero-width negative lookahead, i.e. it matches if the xxx expression doesn't match, and the xxx part is:
(\d)\1{2}\D?\1{3}\D?\1{4}
Which means:
(\d) Match a digit and capture it (capture group #1)
\1{2} Match 2 more of the captured digit
\D? Optionally match a non-digit
\1{3} Match 3 more of the captured digit
\D? Optionally match a non-digit
\1{4} Match 4 more of the captured digit
Since the second part has already verified the separators, the negative look-ahead is just using a more relaxed \D to skip any separator character.
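The behaviour described in the explanation can be checked with a small harness like the one below. The class name and the sample numbers are assumptions for illustration; the pattern is the one from the answer, precompiled once.

```java
import java.util.regex.Pattern;

class PhoneValidator {
    private static final Pattern PHONE = Pattern.compile(
        "(?!(\\d)\\1{2}\\D?\\1{3}\\D?\\1{4})\\d{3}([-. ]?)\\d{3}\\2\\d{4}");

    static boolean isValid(String phoneNo) {
        return PHONE.matcher(phoneNo).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1234567890", "123-456-7890", "123.456-7890", "999-999-9999", "999 999 9998"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The mixed-separator sample fails because of the \2 backreference, and the all-nines sample fails because of the negative lookahead.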
Although you can write a regex to do this it feels more readable with iteration.
boolean uniqueDigits = phoneNo.chars()
.filter(Character::isDigit)
.distinct()
.count() >= 2;
You can use the following regexes to match telephone numbers whose digits are not all the same:
for the 0123456789 format :
(?!(.)\\1{9})\\d{10}
You can try it here.
for the 012-345-6789 format :
(?!(.)\\1{2}[-.\\s]\\1{3}[-.\\s]\\1{4})\\d{3}[-.\\s]\\d{3}[-.\\s]\\d{4}
You can try it here.
It relies on negative lookahead to check that the numbers we're going to match aren't all the same digit.
Better to use Stream API instead of complex regex
if(phoneNo.chars().filter(c -> c != '.' && c != '-' && c != ' ').distinct().count() > 1)
or
phoneNo.chars().filter(c -> ".- ".indexOf(c) == -1).distinct().count() > 1
or
phoneNo.chars().filter(Character::isDigit).distinct().count() > 1
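One way to combine the stream check with the format check from the question is sketched below. The class and method names are hypothetical; the format pattern simply merges the two patterns the question already uses.

```java
import java.util.regex.Pattern;

class PhoneStreamCheck {
    // Either ten digits, or 3-3-4 digits separated by '-', '.' or whitespace.
    private static final Pattern FORMAT =
        Pattern.compile("\\d{10}|\\d{3}[-.\\s]\\d{3}[-.\\s]\\d{4}");

    static boolean hasDifferentDigits(String phoneNo) {
        // Count distinct digit characters, ignoring any separators.
        return phoneNo.chars().filter(Character::isDigit).distinct().count() > 1;
    }

    static boolean isValid(String phoneNo) {
        return FORMAT.matcher(phoneNo).matches() && hasDifferentDigits(phoneNo);
    }

    public static void main(String[] args) {
        System.out.println(isValid("999.999.9999"));
        System.out.println(isValid("123-456-7890"));
    }
}
```

The regex stays simple because it only describes the layout, while the "not all the same digit" rule lives in plain code.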
I have a string
string 1(excluding the quotes) -> "my car number is #8746253 which is actually cool"
conditions - The number 8746253, could be of any length and
- the number can also be immediately followed by an end-of-line.
I want to group-out 8746253 which should not be followed by a dot "."
I have tried,
.*#(\d+)[^.].*
This will get me the number for sure, but it will match even if there is a dot, because [^.] will match the last digit of the number (for example, 3 in the case below).
string 2(excluding the quotes) -> "earth is #8746253.Kms away, which is very far"
I want to match only the string 1 type and not the string 2 types.
To match any number of digits after # that are not followed with a dot, use
(?<=#)\d++(?!\.)
The ++ is a possessive quantifier that makes the regex engine check the lookahead (?!\.) only after the last matched digit, without backtracking if there is a dot after that. So the whole match fails if there is a dot after the last digit in a digit chunk.
See the regex demo
To match the whole line and put the digits into capture group #1:
.*#(\d++)(?!\.).*
See this regex demo. Or a version without a lookahead:
^.*#(\d++)(?:[^.\r\n].*)?$
See another demo. In this last version, the digit chunk can only be followed with an optional sequence of a char that is not a ., CR and LF followed with any 0+ chars other than line break chars ((?:[^.\r\n].*)?) and then the end of string ($).
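The first pattern can be exercised from Java roughly as follows. This is a sketch: the class and method names are invented, and the third sample ("id #42") is an assumed input added to cover the end-of-line case from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashNumber {
    // Digits right after '#', taken possessively, and not followed by a dot.
    private static final Pattern NUMBER = Pattern.compile("(?<=#)\\d++(?!\\.)");

    static Optional<String> find(String line) {
        Matcher m = NUMBER.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("my car number is #8746253 which is actually cool"));
        System.out.println(find("earth is #8746253.Kms away, which is very far"));
        System.out.println(find("id #42"));
    }
}
```

Returning an Optional makes the "no match" case explicit instead of relying on a null or an exception from group().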
This works like you have described
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyRegex {
    public static void main(String[] args) {
        // Possessive \d++ with a negative lookahead, so a number at the end of the line also matches
        Pattern pattern = Pattern.compile("#(\\d++)(?!\\.)");
        Matcher matcher1 = pattern.matcher("my car number is #8746253 which is actually cool");
        if (matcher1.find()) {
            System.out.println(matcher1.group(1));
        }
        Matcher matcher2 = pattern.matcher("earth is #8746253.Kms away, which is very far");
        if (matcher2.find()) {
            System.out.println(matcher2.group(1));
        } else {
            System.out.println("No match found");
        }
    }
}
Outputs:
> 8746253
> No match found
I have Strings that represent rows in a table like this:
{failures=4, successes=6, name=this_is_a_name, p=40.00}
I made an expression that can be used with Pattern.split() to get me back all of the values in a String[]:
[\{\,](.*?)\=
In the online regex tester it works well with the exception of the ending }.
But when I actually run the pattern against my first row I get a String[] where the first element is an empty string. I only want the 4 values (not keys) from each row not the extra empty value.
Pattern getRowValues = Pattern.compile("[\\{\\,](.*?)\\=");
String[] row = getRowValues.split("{failures=4, successes=6, name=this_is_a_name, p=40.00}");
//CURRENT
//row[0]=> ""
//row[1]=>"4"
//row[2]=>"6"
//row[3]=>"this_is_a_name"
//row[4]=>"40.00}"
//WANT
//row[0]=>"4"
//row[1]=>"6"
//row[2]=>"this_is_a_name"
//row[3]=>"40.00"
// line holds one row, e.g. "{failures=4, successes=6, name=this_is_a_name, p=40.00}"
String[] parts = line
// Strip off the leading '{' and trailing '}'
.replaceAll("^\\{|\\}$", "")
// then just split on comma-space
.split(", ");
If you want just the values:
// line holds one row, e.g. "{failures=4, successes=6, name=this_is_a_name, p=40.00}"
String[] parts = line
// Strip off the leading '{' and everything up to and including the first =,
// and the trailing '}'
.replaceAll("^\\{[^=]*=|\\}$", "")
// then just split on comma-space and everything up to and including the next =
.split(", [^=]*=");
Option 1
Modify your regular expression to [{,](.*?)=|[}] where I removed all the unnecessarily escaped characters in each of the [...] constructs and added the |[}]
See also Live Demo
Option 2
=([^,]*)[,}]
This regular expression will do the following:
capture all the substrings after the = and before the , or close }
Example
Live Demo
https://regex101.com/r/yF2gG7/1
Sample text
{failures=4, successes=6, name=this_is_a_name, p=40.00}
Capture groups
Each match gets the following capture groups:
Capture group 0 gets the entire substring from = to , or }
Capture group 1 gets just the value not including the =, ,, or } characters
Sample Matches
[0][0] = =4,
[0][1] = 4
[1][0] = =6,
[1][1] = 6
[2][0] = =this_is_a_name,
[2][1] = this_is_a_name
[3][0] = =40.00}
[3][1] = 40.00
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^,]* any character except: ',' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[,}] any character of: ',', '}'
----------------------------------------------------------------------
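In Java, the capture-group approach from Option 2 would use Matcher.find() in a loop instead of split(). The following is a minimal sketch with a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowValues {
    // '=' followed by the value (captured), terminated by ',' or '}'.
    private static final Pattern VALUE = Pattern.compile("=([^,]*)[,}]");

    static List<String> values(String row) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(row);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the value without '=' or the terminator
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("{failures=4, successes=6, name=this_is_a_name, p=40.00}"));
    }
}
```

Because the values are collected from matches instead of from the gaps between matches, there is no leading empty element and no trailing brace to clean up.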
Ok... I have an unsatisfactory solution to a problem.
The problem is I have input like so:
{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F
and need output like so:
¹⁹F(³He,t)¹⁹Ne(p)¹⁸F
I use a series of replacements first to split each of the {sup xx} sections into {sup x}{sup x} and then use a regex to match each of those and replace the characters with their UTF-8 single equivalents. The "problem" is that the {sup} sections can have numbers 1, 2 or 3 digits long (maybe more, I don't know), and I want to "expand" them into separate {sup} sections with one digit each. ( I also have the same problem with {sub} for subscripts... )
My current solution looks like this (in java):
retval = retval.replaceAll("\\{sup ([1-9])([0-9])\\}", "{sup $1}{sup $2}");
retval = retval.replaceAll("\\{sup ([1-9])([0-9])([0-9])\\}", "{sup $1}{sup $2}{sup $3}");
My question: is there a way to do this in a single pass no matter how many digits ( or at least some reasonable number ) there are?
Yes, but it may be a bit of a hack, and you'll have to be careful it doesn't overmatch!
Regex:
(?:\{sup\s)?(\d)(?=\d*})}?
Replacement String:
{sup $1}
A short explanation:
(?: | start non-capturing group 1
\{ | match the character '{'
sup | match the substring: "sup"
\s | match any white space character
) | end non-capturing group 1
? | ...and repeat it once or not at all
( | start group 1
\d | match any character in the range 0..9
) | end group 1
(?= | start positive look ahead
\d | match any character in the range 0..9
* | ...and repeat it zero or more times
} | match the substring: "}"
) | end positive look ahead
} | match the substring: "}"
? | ...and repeat it once or not at all
In plain English: it matches a single digit, only when looking ahead there's a } with optional digits in between. If possible, the substrings {sup and } are also replaced.
EDIT:
A better one is this:
(?:\{sup\s|\G)(\d)(?=\d*})}?
That way, digits like in the string "set={123}" won't be replaced. The \G in my second regex matches the spot where the previous match ended.
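Applied from Java, the \G variant might look like the sketch below. The class and method names are made up, and the closing braces are escaped here for readability, which is an optional choice in Java.

```java
class SplitSupDigits {
    static String split(String s) {
        // Each match is one digit, reached either via "{sup " or via \G
        // (the position where the previous match ended).
        return s.replaceAll("(?:\\{sup\\s|\\G)(\\d)(?=\\d*\\})\\}?", "{sup $1}");
    }

    public static void main(String[] args) {
        System.out.println(split("{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F"));
        System.out.println(split("set={123}"));
    }
}
```

One caveat of this approach: with no previous match, \G also matches at the very start of the input, so a string that begins with digits followed by } would still be rewritten.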
The easiest way to do this kind of thing is with something like PHP's preg_replace_callback or .NET's MatchEvaluator delegates. Java did not have anything like that built in before Java 9 (which added a Matcher.replaceAll overload that takes a function), but it does expose the lower-level API that lets you implement it yourself. Here's one way to do it:
import java.util.regex.*;
public class Test
{
static String sepsup(String orig)
{
Pattern p = Pattern.compile("(\\{su[bp] )(\\d+)\\}");
Matcher m = p.matcher(orig);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement(sb, "");
for (char ch : m.group(2).toCharArray())
{
sb.append(m.group(1)).append(ch).append("}");
}
}
m.appendTail(sb);
return sb.toString();
}
public static void main (String[] args)
{
String s = "{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F";
System.out.println(s);
System.out.println(sepsup(s));
}
}
result:
{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F
{sup 1}{sup 9}F({sup 3}He,t){sub 1}{sub 9}Ne(p){sup 1}{sup 8}F
If you wanted, you could go ahead and generate the superscript and subscript characters and insert those instead.
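Generating the Unicode characters directly could look like the sketch below. It assumes Java 9 or later, where Matcher.replaceAll accepts a Function<MatchResult, String>; the class name and the two lookup strings are illustrative choices, not part of the original answer.

```java
import java.util.regex.Pattern;

class SupSubToUnicode {
    // Group 1 is 'p' (sup) or 'b' (sub); group 2 is the run of digits.
    private static final Pattern TAG = Pattern.compile("\\{su([bp]) (\\d+)\\}");
    // Index i holds the superscript / subscript form of digit i.
    private static final String SUPERSCRIPTS =
        "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";
    private static final String SUBSCRIPTS =
        "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";

    static String convert(String input) {
        return TAG.matcher(input).replaceAll(match -> {
            String table = match.group(1).equals("p") ? SUPERSCRIPTS : SUBSCRIPTS;
            StringBuilder out = new StringBuilder();
            for (char digit : match.group(2).toCharArray()) {
                out.append(table.charAt(digit - '0'));
            }
            return out.toString();
        });
    }

    public static void main(String[] args) {
        System.out.println(convert("{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F"));
    }
}
```

This handles any number of digits in a single pass, so the intermediate step of splitting into one-digit tags is not needed.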
Sure, this is a standard Regular Expression construct. You can find out about all the metacharacters in the Pattern Javadoc, but for your purposes, you probably want the "+" metacharacter, or the {1,3} greedy quantifier. Details in the link.