Regex to match user and user#domain - java

A user can login as "user" or as "user#domain". I only want to extract "user" in both cases. I am looking for a matcher expression to fit it, but im struggling.
final Pattern userIdPattern = Pattern.compile("(.*)[#]{0,1}.*");
final Matcher fieldMatcher = userIdPattern.matcher("user#test");
final String userId = fieldMatcher.group(1)
userId returns "user#test". I tried various expressions but it seems that nothing fits my requirement :-(
Any ideas?

If you use "(.*)[#]{0,1}.*" pattern with .matches(), the (.*) grabs the whole line first, then, when the regex index is still at the end of the line, the [#]{0,1} pattern triggers and matches at the end of the line because it can match 0 # chars, and then .* again matches at that very location as it matches any 0+ chars. Thus, the whole line lands in your Group 1.
You may use
String userId = s.replaceFirst("^([^#]+).*", "$1");
See the regex demo.
Details
^ - start of string
([^#]+) - Group 1 (referred to with $1 from the replacement pattern): any 1+ chars other than #
.* - the rest of the string.

A little bit of googling came up with this:
(.*?)(?=#|$)
Will match everthing before an optional #

I would suggest keeping it simple and not relying on regex in this case if you are using java and have a simple case like you provided.
You could simply do something like this:
String userId = "user#test";
if (userId.indexOf("#") != -1)
userId = userId.substring(0, userId.indexOf("#"));
// from here on userId will be "user".
This will always either strip out the "#test" or just skip stripping it out when it is not there.
Using regex in most cases makes the code less maintainable by another dev in the future because most devs are not very good with regular expressions, at least in my experience.

You included the # as optional, so the match tries to get the longest user name. As you didn't put the restriction of a username is not allowed to have #s in it, it matched the longest string.
Just use:
[^#]*
as the matching subexpr for usernames (and use $0 to get the matched string)
Or you can use this one that can be used to find several matches (and to get both the user part and the domain part):
\b([^#\s]*)(#[^#\s]*)?\b
The \b force your string to be tied to word boundaries, then the first group matches non-space and non-# chars (any number, better to use + instead of * there, as usernames must have at least one char) followed (optionally) by a # and another string of non-space and non-# chars). In this case, $0 matches the whole email addres, $1 matches the username part, and $2 the #domain part (you can refine to only the domain part, adding a new pair of parenthesis, as in
b([^#\s]*)(#([^#\s]*))?\b
See demo.

Related

Regex pattern matching with multiple strings

Forgive me. I am not familiarized much with Regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern is failing with 2 different set of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
Not able to figure out why? As far I can understand it may be because of the multiple spaces involved when more than 2 strings are present with separated by comma.
Not sure what should be correct pattern I should write for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | matching a mandatory comma as well on the left as on the right side.
Using matches(), should match the whole string and that is the case for 207e/160, 149/80 so that is a match.
Only for this string 207e/160, 149/80, 11 there are 2 comma's, so you do get a partial match for the first part of the string, but you don't match the whole string so matches() returns false.
See the matches in this regex demo.
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-90+/-]+)*$
^ Start of string
[NnoeOf0-9\\+/-]+
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-90-9\\+/-]+ Match 1+ any of the listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-90+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end not having to escape it, and the + and / also does not have to be escaped.
You can use regex101 to test your RegEx. it has a description of everything that's going on to help with debugging. They have a quick reference section bottom right that you can use to figure out what you can do with examples and stuff.
A few things, you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test

regex to filter out string

I'm filtering out string using below regex
^(?!.*(P1 | P2)).*groupName.*$
Here group name is specific string which I replace at run time. This regex is already running fine.
I've two input strings which needs to pass through from this regex. Can't change ^(?!.*(P1 | P2)) part of regex, so would like to change regex after this part only. Its a very generic regex which is being used at so many places, so I have only place to have changes is groupName part of regex. Is there any way where only 2 string could pass through this regex ?
1) ADMIN-P3-UI-READ-ONLY
2) ADMIN-P3-READ-ONLY
In regex groupName is a just a variable which will be replaced at run time with required string. In this case I want 2 string to be passed, so groupName part can be replaced with READ-ONLY but it will pass 1 string too.
Can anyone suggest on this how to make this work ?
You could use negative lookBehind:
(?<!UI-)READ-ONLY
so there must be no UI- before READ-ONLY
You can add another lookahead at the very start of your pattern to further restrict what it matches because your pattern is of the "match-everything-but" type.
So, it may look like
String extraCondition = "^(?!.*UI)";
String regex = "^(?!.*(P1|P2)).*READ-ONLY.*$";
String finalRegex = extraCondition + regex;
The pattern will look like
^(?!.*UI)^(?!.*(P1|P2)).*READ-ONLY.*$
matching
^(?!.*UI) - no UI after any zero or more chars other than line break chars as many as possible from the start of string
^(?!.*(P1|P2)) - no P1 nor P2 after any zero or more chars other than line break chars as many as possible from the start of string
.*READ-ONLY - any zero or more chars other than line break chars as many as possible and then READ-ONLY
.*$ - the rest of the string. Note you may safely remove $ here unless you want to make sure there are no extra lines in the input string.

Regex: Match group if present otherwise ignore and proceed with other matches

I have been trying to match a regex pattern within the following data:
String:
TestData to 1colon delimiter list has 1 rows.Set...value is: 1 Save Error: 267467374736437-TestInfo send Error
Words to match:
TestData
267467374736437-TestInfo
Regex pattern i m using:
(.+?\s)?.*(\s\d+-.*?\s)?
Scenario here is that 2nd match (267467374736437-TestInfo) can be absent in the string to be matched. So, i want it to be a match if it exists otherwise proceed with other matches. Due to this i added zero or one match quantifier ? to the group pattern above. But then it ignores the 2nd group all together.
If i use the below pattern:
`(.+?\s)?.*(\s\d+-.*?\s)`
It matches just fine but fails if string "267467374736437-TestInfo" from the matching string as it's not having the "?" quantifier.
Please help me understand where is it going wrong.
I would rather not use a complex regex, which will be ugly and a maintenance nightmare. Instead, one simple way would be to just split the string and grab the first term, and then use a smart regex to pinpoint the second term.
String input = "TestData to 1colon delimiter list has 1 rows.Set...value is: 1 Save Error: 267467374736437-TestInfo send Error";
String first = input.split(" ")[0];
String second = input.replaceAll(".*Save Error:\\s(.*)?\\s", "$1");
Explore the regex:
Regex101
The optional pattern at the end will almost never not be matched if a more generic pattern occurs. In your case, the greedy dot .* grabs the whole rest of the line up to the end, and since the last pattern is optional, the regex engine calls it a day and does not try to accommodate any text for it.
If you had a lazy dot .*?, the only position where it would work is right after the preceding subpattern, which is rarely the case.
Thus, you can only rely on a tempered greedy token:
^(\S+)(?:(?!\d+-\S).)*(\d+-\S+)?
See the regex demo.
Or an unrolled version:
^(\S+)\D*(?:\d(?!\d*-\S)\D*)*(\d+-\S+)?

Extract specific data from string with regex

I want to capture multiple string which match some specific patterns,
For example my string is like
String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
I want to get string between two # which match with string like for UK
Output should like this
if match string is UK than
output should be 1_Label for UK
if match string is label than
output should be 1_Label for UK, 2_Label for US and 4_Label for FR
if match string is 1_ than
output should be 1_Label for UK
I don't want to extract data via array list and extraction should be case insensitive.
Can you please help me out from this problem?
Regards,
Ashish Mishra
You can use this regex for search:
#([^#]*?Label[^#]*)(?=#)
Replace Label with your search keyword.
RegEx Demo
Java Pattern:
Pattern p = Pattern.compile( "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)" );
If the data always is between two hashes, try a regex like this: (?i)#.*your_match.*# where your_match would be UK, label, 1_ etc.
Then use this expression in conjunction with the Pattern and Matcher classes.
If you want to match multiple strings, you'd need to exclude the hashes from the match by using look-around methods as well as reluctant modifiers, e.g. (?i)(?<=#).*?label.*?(?=#).
Short breakdown:
(?i) will make the expression case insensitive
(?<=#) is a positive look-behind, i.e. the match must be preceeded by a hash (but doesn't include the hash)
.*? matches any sequence of characters but is reluctant, i.e. it tries to match as few characters as possible
(?=#) is a positive look-ahead, which means the match must be followed by a hash (also not included in the match)
Without the look-around methods the hashes would be included in the match and thus using Matcher.find() you'd skip every other label in your test string, i.e. you'd get the matches #1_Label for UK# and #4_Label for FR# but not #2_Label for US#.
Without the relucatant modifiers the expression would match everything between the first and the last hash.
As an alternative and better, replace .*? with [^#]*, which would mean that the match cannot contain any hash, thus removing the need for reluctant modifiers as well as removing the problem that looking for US would match 1_Label for UK#2_Label for US.
So most probably the final regex you're after looks like this: (?i)(?<=#)[^#]*your_match[^#]*(?=#).
([^#]*UK[^#]*) for UK
([^#]*Label[^#]*) for Label
([^#]*1_[^#]*) for 1_
Try this.Grab the captures.See demo.
http://regex101.com/r/kQ0zR5/3
http://regex101.com/r/kQ0zR5/4
http://regex101.com/r/kQ0zR5/5
I have solved this problem with below pattern,
(?i)([^#]*?us[^#]*)(?=#)
Thank you so much Anubhava, VKS and Thomas for you reply.
Regards,
Ashish Mishra

Regular expression to extract sub string in groups

I want to extract names from the following input using regular expression.
Student Names:
Name1
Name2
Name3
Parent Names:
Name1
Name2
Name3
I am using the following method to match the data and I am not supposed to modify the method. I have to come up with regular expression that works with this method.
public void parseName(String patternRegX){
Pattern patternDomainStatus = Pattern.compile(patternRegX);
Matcher matcherName = patternName.matcher(inputString);
List<String> tmp=new ArrayList<String>();
while (matcherName.find()){
if (!matcherName.group(2).isEmpty())
tmp.add(matcherName.group(2));
}
}
I came up with a regular expression that could get me the desired result, but the problem I found was that grouping doesn't work inside square brackets([]).
private String studentRegX="(Student Names:\n[ +(\S+)\n]+\n)";
I am using the following regular expression now, but that is getting me only the last name in each set.
private String studentRegX="Student Names:\\n( +(\\S+)\\n)+\\n";
private String parentRegX="Parent Names:\\n( +(\\S+)\\n)+\\n";
Thank you in advance for the help.
First of all, I hope you can change the parseName method a little bit, because it doesn't compile. patternDomainStatus and patternName are probably supposed to refer to the same object:
Pattern pattern = Pattern.compile(patternRegX);
Matcher matcherName = pattern.matcher(inputString);
Secondly, you need to think about your regex a little differently.
Right now, your regexes are trying to match entire chunks with multiple names in them. But matcherName.find() finds "the next subsequence of the input sequence that matches the pattern" (per the javadoc).
So what you want is a regex that matches a single name. matcherName.find() will loop through each part of your string that matches that regex.
Because regex has little to do with algorithmic prowess, here an answer:
On Windows the line break is unfortunately "\r\n".
I check that a newline preceded and that there is at least some white space before the name.
The name may have a space.
With look-behind I check that "Parent Names" follows.
Then
Pattern.compile("(?s)(?<=\n)[ \t]+([^\r\n]*)\r?\n(?=.*Parent Names)");
// ~~~~ '.' also matches newline
// ~~~~~~~ look-behind must be newline
// ~~~~~~ whitespace (spaces/tabs)
// ~~~~~~~~~~ group 1, name
// ~~~~~~~~~~~~~~~~~~~~ look-ahead
Without say, a bit different algorithm would be more solid and understandable.
To make it group(2) instead of the above group(1), you could introduce extra braces before: ([ \t]+)
It can be done using the \G anchor all in a single regex.
This opens it up for a little regex algorithmic prowess.
Each match will be either:
Group 1 is not NULL/empty - New student group, group 3 will contain first student name.
Group 2 is not NULL/empty - New parent group, group 3 will contain first parent name.
Group 3 is never NULL/empty - The first/next either student or parent name depending on which
group 1 or 2 last matched.
In all cases, group 3 will contain a name that has been trimmed and ready to put into an array.
# "~(?mi-)(?:(?!\\A)\\G|^(?:(Student)|(Parent))[ ]Names:)\\s*^(?!(?:Student|Parent)[ ]Names:)[^\\S\\r\\n]*(.+?)[^\\S\\r\\n]*$~"
(?xmi-) # Inline 'Expanded, multiline, case insensitive' modifiers
(?:
(?! \A ) # Here, matched before, give Name a first chance
\G # to match again.
|
^ # BOL
(?:
( Student ) # (1), New 'Student' group
| ( Parent ) # (2), New 'Parent' group
)
[ ] Names:
)
# Name section
\s* # Consume all whitespace up until the start of a Name line
^ # BOL
(?!
(?: Student | Parent ) # Names only, Not the start of Student/Parent group here
[ ] Names:
)
[^\S\r\n]* # Trim leading whitespace ( can use \h if supported )
( .+? ) # (3), the Name
[^\S\r\n]* # Trim trailing whitespace ( can use \h if supported )
$ # EOL
If you're not already familiar with the difference between repeating a capturing group and capturing a repeating group, that's worth reading up on. One resource for that is http://www.regular-expressions.info/captureall.html, but others would be fine too.
If you already knew about that difference and were trying to capture a repeating group already with what you've written above, then please edit your post to explain what you're trying to do (a letter-by-letter explanation would be ideal, so we see what you understand and what you don't, so we can help you with whatever you're stuck on).
I see what I believe is the solution, but since this is clearly homework, I'm not willing to simply give it to you. But I'd be happy to help you figure it out.
--- Edit: ---
You're only getting one match because the regex requires "Student Names:" or "Parent Names:" to be in each match, so you can only match once. For your regex to match multiple times in a row (as required by the while (matcherName.find())), you need to get the "Student Names:" and "Parent Names:" out of the regex, so the regex can match repeatedly.
It's easy to get all of the names (both students and parents), with just a regex that looks for newlines followed by one or more spaces and then text. The challenge is to differentiate the student names (which come before the "Parent Names:" line) from the parent names (which come after the "Parent Names:" line). The key concept for differentiating between them is lookaheads, which can be positive or negative. Take a look at them and see if you can figure out how to implement this using lookaheads.
Also, you may find that group #2 isn't the group you really want to use. It's unfortunate that the group number is hard-coded, but since it is, you can tweak your regex to make groups non-capturing with (?:stuff) syntax. That will let you reduce the number of groups and ensure that the group you actually want is #2.

Categories

Resources