Java regular expression for UserName with length range - java

I am writing a regular expression to validate UserName.
Here is the rule:
Length: 6 - 20 characters
Must start with letter a-zA-Z
Can contains a-zA-Z0-9 and dot(.)
Can't have 2 consecutive dots
Here is what I tried:
public class TestUserName {
private static String USERNAME_PATTERN = "[a-z](\\.?[a-z\\d]+)+";
private static Pattern pattern = Pattern.compile(USERNAME_PATTERN, CASE_INSENSITIVE);
public static void main(String[] args) {
System.out.println(pattern.matcher("user.name").matches()); // true
System.out.println(pattern.matcher("user.name2").matches()); // true
System.out.println(pattern.matcher("user2.name").matches()); // true
System.out.println(pattern.matcher("user..name").matches()); // false
System.out.println(pattern.matcher("1user.name").matches()); // false
}
}
The pattern I used is good but no length constraint.
I tried to append {6,20} constraint to the pattern but It failed.
"[a-z](\\.?[a-z\\d]+)+{6,20}" // failed pattern to validate length
Anyone has any ideas?
Thanks!

You can use a lookahead regex for all the checks:
^[a-zA-Z](?!.*\.\.)[a-zA-Z.\d]{5,19}$
Using [a-zA-Z.\d]{5,19} because we have already matched one char [a-zA-Z] at start this making total length in the range {6,20}
Negative lookahead (?!.*\.\.) will assert failure if there are 2 consecutive dots
Equivalent Java pattern will be:
Pattern p = Pattern.compile("^[a-zA-Z](?!.*\\.\\.)[a-zA-Z.\\d]{5,19}$");

Use a negative look ahead to prevent double dots:
"^(?!.*\\.\\.)(?i)[a-z][a-z\\d.]{5,19}$"
(?i) means case insensitve (so [a-z] means [a-zA-Z])
(?!.*\\.\\.) means there isn't two consecutive dots anywhere in it
The rest is obvious.
See live demo.

I would use the following regex :
^(?=.{6,20}$)(?!.*\.\.)[a-zA-Z][a-zA-Z0-9.]+$
The (?=.{6,20}$) positive lookahead makes sure the text will contain 6 to 20 characters, while the (?!.*\.\.) negative lookahead makes sure the text will not contain .. at any point.

This will also suffice (for only matching)
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)
For capturing, the matched pattern, you can use
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)(^.*$)

Related

Use a regex to find a pattern somewhere between two words

Given the following string
{"type":"PrimaryParty","name":"Karen","id":"456789-9996"},
{"type":"SecondaryParty","name":"Juliane","id":"345678-9996"},
{"type":"SecondaryParty","name":"Ellen","id":"001234-9996"}
I am looking for strings matching the pattern \d{6}-\d{4}, but only if they are following the string "SecondaryParty". The processor is Java-based
Using https://regex101.com/ I have come up with this, which works fine using the ECMAScript(JavaScript) Flavor.
(?<=SecondaryParty.*?)\d{6}-\d{4}(?=\"})
But as soon as I switch to Java, it says
* A quantifier inside a lookbehind makes it non-fixed width
? The preceding token is not quantifiable
When using it in java.util.regex, the error says
Look-behind group does not have an obvious maximum length near index 20 (?<=SecondaryParty.*?)\d{6}-\d{4}(?="}) ^
How do I overcome the "does not have an obvious maximum length" problem in Java?
You can get the value without using lookarounds by matching instead, and use a single capture group for the value that you want to get:
\"SecondaryParty\"[^{}]*\"(\d{6}-\d{4})\"
Explanation
\"SecondaryParty\" Match "SecondaryParty"
[^{}]*\" Match optional chars other than { and }
(\d{6}-\d{4}) Capture group 1, match 6 digits - 4 digits
\" Match "
See a regex101 demo and a Java demo.
You might use a curly braces quantifier as a workaround:
(?<=SecondaryParty.{0,255})\d{6}-\d{4}(?=\"})
The minimum and maximum inside curly braces quantifier are depend on your actual data.
You could use (?<=SecondaryParty)(.*?)(\d{6}-\d{4})(?=\"}) regex expression and take the value of the second group which will match the pattern \d{6}-\d{4}, but only if they are following the string "SecondaryParty".
Sample Java code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class IdRegexMatcher {
public static void main(String[] args) {
String input ="{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n" +
"{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n" +
"{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
Pattern pattern = Pattern.compile("(?<=SecondaryParty)(.*?)(\\d{6}-\\d{4})(?=\\\"})");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String idStr = matcher.group(2);
System.out.println(idStr);
}
}
}
which gives the output
345678-9996
001234-9996
One possible optimization in the above regex could be to use [^0-9]*? instead of .*? under the assumption that the name wouldn't contain numbers.

Regex: Is it possible to skip repeating negative lookbehinds?

I've been trying to fix a simple regex that:
Matches all characters from beginning of line (^) to the first & character or to the end of line ($).
The match cannot start with a &.
Examples:
test should match test.
one&two should match one.
&test shouldn't match anything.
My current regex is the following:
^(?<!\&)(.+?)(?=\&|$)
(Regex101)
Currently, this regex fails example 3, where if I gave this regex &test it matches &test, but it shouldn't match anything.
I think it may be a problem with the negative lookbehind (?<!\&) and that &test matches because the character before it is not a &, but it doesn't account for any following & characters.
Is modifying the negative lookbehind to account for repeating & characters possible, and if so, how could I fix this regex?
(I know that Regex101 is using Python's Regex, but this question's Regex is intended to work with Java.)
You need to use a look-ahead instead of a look-behind, and instead of lazy dot matching with a lookahead, use a negated character class:
^[^&]+
See demo (note that \n is added just for a demo, if you test strings without newline characters, it won't be necessary).
Here, ^ asserts the position at the start of the string, and [^&]+ class matches 1 or more characters other than & (thus, no need to use (?=\&|$) look-ahead, if needed, the whole line will be matched).
See IDEONE demo
public static void main (String[] args) throws java.lang.Exception
{
System.out.println(fetchMatch("test", 0));
System.out.println(fetchMatch("one&test", 0));
System.out.println(fetchMatch("&test", 0));
}
public static String fetchMatch(String s, int groupId)
{
Pattern pattern = Pattern.compile("^[^&]+");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
return matcher.group(groupId);
}
return "ERROR: NOT MATCHED";
}
Output:
test
one
ERROR: NOT MATCHED

Java Regex of String start with number and fixed length

I made a regular expression for checking the length of String , all characters are numbers and start with number e.g 123
Following is my expression
REGEX =^123\\d+{9}$";
But it was unable to check the length of String. It validates those strings only their length is 9 and start with 123.
But if I pass the String 1234567891 it also validates it. But how should I do it which thing is wrong on my side.
Like already answered here, the simplest way is just removing the +:
^123\\d{9}$
or
^123\\d{6}$
Depending on what you need exactly.
You can also use another, a bit more complicated and generic approach, a negative lookahead:
(?!.{10,})^123\\d+$
Explanation:
This: (?!.{10,}) is a negative look-ahead (?= would be a positive look-ahead), it means that if the expression after the look-ahead matches this pattern, then the overall string doesn't match. Roughly it means: The criteria for this regular expression is only met if the pattern in the negative look-ahead doesn't match.
In this case, the string matches only if .{10} doesn't match, which means 10 or more characters, so it only matches if the pattern in front matches up to 9 characters.
A positive look-ahead does the opposite, only matching if the criteria in the look-ahead also matches.
Just putting this here for curiosity sake, it's more complex than what you need for this.
Try using this one:
^123\\d{6}$
I changed it to 6 because 1, 2, and 3 should probably still count as digits.
Also, I removed the +. With it, it would match 1 or more \ds (therefore an infinite amount of digits).
Based on your comment below Doorknobs's answer you can do this:
int length = 9;
String prefix = "123"; // or whatever
String regex = "^" + prefix + "\\d{ " + (length - prefix.length()) + "}$";
if (input.matches(regex)) {
// good
} else {
// bad
}

Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,
Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alnum character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.

Regex for mismatched substrings

I am trying to match some Java code that has mismatched strings in it. For example, I have the following block of code that I want to match:
protected String methodName(String args[]) {
final String METHOD = "wrongMethodName";
...
}
And the following block of code I don't want to match
protected String methodName(String args[]) {
final String METHOD = "methodName";
...
}
Right now, I have the following (not working) regular expression, which requires DOTALL enabled:
(\w+?)\(.*?\) ?{.*?METHOD *= *".*?";
If I try negative look behind with the capture group, the regex doesn't compile because the size of the look behind isn't known before hand.
java.util.regex.PatternSyntaxException:
Look-behind group does not have an obvious maximum length near index 39
Is there a way I can use the capture group in this regex to say I want to match strings that don't match the capture group?
I think you can use a negative lookahead instead (if I understood your problem correctly), try:
(\b\w+?\b)\(.*?\) ?{.*?METHOD *= *"(?!\1).*?"
See it here on Regexr
I used also word boundaries in the first group, otherwise it just starts matching at the second letter.

Categories

Resources