I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it:
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,
Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alnum character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.
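To check the suggestion above against the strings from the question, here is a minimal sketch. The class name and constants are made up for illustration; the patterns are the ones quoted in this thread. The original /foo/.+(?!bar) fails because .+ greedily consumes the rest of the string, so the lookahead is evaluated at the very end, where nothing (and therefore no "bar") follows.

```java
public class UriLookaheadDemo {
    // Matches /foo/<data> only when no standalone "bar" appears later in the string.
    static final String FIRST_ONLY = "/foo/(?!.*\\bbar\\b).+";
    // Matches /foo/<data>/bar/<other data>.
    static final String SECOND_ONLY = "/foo/.+/bar/.+";

    public static void main(String[] args) {
        String shortUri = "/foo/abc123doremi";
        String longUri = "/foo/abc123doremi/bar/def456fasola";
        String crowbars = "/foo/baz/crowbars";

        System.out.println(shortUri.matches(FIRST_ONLY));   // true
        System.out.println(longUri.matches(FIRST_ONLY));    // false
        System.out.println(crowbars.matches(FIRST_ONLY));   // true, "bar" inside "crowbars" is not a whole word
        System.out.println(shortUri.matches(SECOND_ONLY));  // false
        System.out.println(longUri.matches(SECOND_ONLY));   // true
    }
}
```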
Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for two sets of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings are present, separated by commas.
I am not sure what the correct pattern is for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | where both the left and the right side require exactly one comma.
matches() must match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 has 2 commas, so you do get a partial match for the first part of the string, but you don't match the whole string, so matches() returns false.
See the matches in this regex demo.
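The difference between a partial and a whole-string match can be made visible with a small sketch. It rebuilds the pattern from the question unchanged; the class and method names are assumptions made for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatchDemo {
    // Rebuilds the pattern from the question for a given value.
    static String buildRegex(String value) {
        String item = "[NnoneOoff0-9\\-\\+\\/]+";
        return Pattern.quote(value) + ", " + item + "|" + item + ", " + Pattern.quote(value);
    }

    // Returns the first partial match found by find(), or null if there is none.
    static String firstMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = buildRegex("207e/160");
        String channelStr = "207e/160, 149/80, 11";
        System.out.println(channelStr.matches(regex));     // false, the trailing ", 11" is left over
        System.out.println(firstMatch(channelStr, regex)); // 207e/160, 149/80
    }
}
```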
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped, and the + and / do not have to be escaped either.
You can use regex101 to test your RegEx. it has a description of everything that's going on to help with debugging. They have a quick reference section bottom right that you can use to figure out what you can do with examples and stuff.
A few things, you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test
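As a quick check of the pattern above, the following sketch runs it against the two inputs from the question plus a trailing-comma case. The class and method names are hypothetical; the regex is copied from this answer.

```java
public class ChannelListDemo {
    // One or more items, each followed either by a comma (but not at the very end) or by the end of the string.
    static final String LIST = "^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$";

    static boolean isChannelList(String s) {
        return s.matches(LIST);
    }

    public static void main(String[] args) {
        System.out.println(isChannelList("207e/160, 149/80"));     // true
        System.out.println(isChannelList("207e/160, 149/80, 11")); // true
        System.out.println(isChannelList("207e/160,"));            // false, a comma must be followed by another item
    }
}
```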
I want to replace one string in a big string, but my regular expression is not proper I guess. So it's not working.
Main string is
Some sql part which is to be replaced
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
String to find and replace is
Based on some condition sql part to be replaced
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this with the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and it has nothing to do with regexps itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters - this matches any one of them, exactly one though). That only works where your string has exactly one space; any spot with no space or with several spaces fails. Your intent, surely, was \\s* which matches 0 to many of them, i.e.: \\s* is: "Whitespace is allowed here". \\s is: There must be exactly one whitespace character here. Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
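The difference between the character class and the group from the list above is easy to verify. A minimal sketch (class and constant names are made up for illustration):

```java
public class ClassVsGroupDemo {
    // Character class: exactly one of the characters a, n, d, |, o, r.
    static final String CHAR_CLASS = "[and|or]";
    // Group with alternation: the sequence "and" or the sequence "or".
    static final String GROUP = "(and|or)";

    public static void main(String[] args) {
        System.out.println("d".matches(CHAR_CLASS));   // true, a single listed character
        System.out.println("|".matches(CHAR_CLASS));   // true, the pipe is just another listed character
        System.out.println("and".matches(CHAR_CLASS)); // false, three characters instead of one
        System.out.println("and".matches(GROUP));      // true
        System.out.println("or".matches(GROUP));       // true
        System.out.println("d".matches(GROUP));        // false
    }
}
```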
The code itself:
replaceFirst, itself, also does regexps. You don't want to double apply this stuff. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
// Strings are immutable: replaceFirst returns the new string, so keep the result
cond = mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
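To illustrate group references in the replacement, here is a sketch under some assumptions: the pattern is a cleaned-up guess at what the question intended, and the UPPER(...) rewrite is a purely hypothetical replacement chosen to show $1 and $2 at work.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceWithGroupsDemo {
    // Group 1 captures the quoted value, group 2 the trailing and/or keyword.
    static final Pattern NAME_CONDITION = Pattern.compile(
            "hemp\\.EMPLOYEE_NAME\\s*=\\s*'(\\w*)'\\s*(and|or)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical rewrite: wraps the column in UPPER(...) and reuses the captured parts.
    static String rewrite(String cond) {
        Matcher mat = NAME_CONDITION.matcher(cond);
        return mat.replaceFirst("UPPER(hemp.EMPLOYEE_NAME) = '$1' $2");
    }

    public static void main(String[] args) {
        String cond = "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(rewrite(cond));
        // prints: AND UPPER(hemp.EMPLOYEE_NAME) = 'xxx' and is_active='Y'
    }
}
```

If the pattern does not occur in the input, replaceFirst returns the input unchanged, so no loop or prior find() call is needed.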
More generally did you read any regexp tutorial, did any testing, or did any reading of the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class that matches exactly one of the listed chars. Also escape the dot to match it literally, or else . matches any char except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
See a regex demo or a Java demo
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'
I tried searching but could not find anything that made any sense to me! I am noob at regex :)
Trying to see if a particular word "some_text" exists in another string.
String s = "This is a test() function";
String s2 = "This is a test () function";
Assuming the above two strings I can search this using the following pattern at RegEx Tool
[^\w]test[ ]*[(]
But unable to get a positive match in Java using
System.out.println(s.matches("[^\\w]test[ ]*[(]"));
I have tried with double \ and even four \\ as escape characters but nothing really works.
The requirement is to see the word starts with space or is the first word of a line and has an open bracket "(" after that particular word, so that all these "test (), test() or test ()" should get a positive match.
Using Java 1.8
Cheers,
Faisal.
The point you are missing is that Java matches() puts a ^ at the start and a $ at the end of the Regex for you. So your expression actually is seen as:
^[^\w]test[ ]*[(]$
which is never going to match your input.
Going from your requirement description, I suggest reworking your regex expression to something like this (assuming by "particular word" you meant test):
(?:.*)(?<=\s)(test(?:\s+)?\()(?:.*)
See the regex at work here.
Explanation:
^ Start of line - added by matches()
(?:.*) Non-capturing group - match anything before the word, but don't capture into a group
(?<=\s) Positive lookbehind - match if word preceded by space, but don't match the space
( Capturing group $1
test(?:\s+)? Match word test and any following spaces, if they exist
\( Match opening bracket
)
(?:.*) Non-capturing group - match rest of string, but don't capture in group
$ End of line - added by matches()
Code sample:
public class Main {
public static void main(String[] args) {
String s = "This is a test() function";
String s2 = "This is a test () function";
System.out.println(s.matches("(?:.*)(?<=\\s)(test(?:\\s+)?\\()(?:.*)"));
//true
}
}
I believe this should be enough:
Pattern.compile("\\btest\\s*\\(").matcher(s).find()
Try this "\btest\b(?= *\()".
And don't use "matches", use "find". matches() tries to match the whole string.
https://regex101.com/r/xaPCyp/1
The matches() method tells whether or not the whole string matches the given regular expression. Since that's not the case here, it returns false.
If you are just interested in whether your lookup-value exists within the string, I found the following useful:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
String s = "This is a test () function";
Pattern p = Pattern.compile("\\btest *\\(");
Matcher m = p.matcher(s);
if (m.find())
System.out.println("Found a match");
else
System.out.println("Did not find a match");
}
}
I went with the following pattern: \\btest *\\(
\\b - Match word-boundary (will also catch if first word).
test - Literally match your lookup-value.
* - Zero or more literal spaces.
\\( - Escaped open parenthesis to match literally.
Debuggex Demo
The .matches method will match the whole string where your pattern would only get a partial match.
In the pattern that you tried, the negated character class [^\\w] could also match more than a whitespace boundary as it matches any char except a word character. It could for example also match a ( or a newline.
As per the comments, test() function should also match, but [^\\w] and (?<=\s) both expect a character to be there on the left, so they fail when test is the first word.
Instead you could make use of (?<!\\S) to assert a whitespace boundary on the left.
.*(?<!\S)test\h*\(.*
Explanation
.* Match 0+ times any char except a newline
(?<!\S) Assert a whitespace boundary on the left
test\h* Match test and 0+ horizontal whitespace chars
\( Match a ( char
.* Match 0+ times any char except a newline
Regex demo | Java demo
In Java
System.out.println(s.matches(".*(?<!\\S)test\\h*\\(.*"));
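The following sketch compares the positive lookbehind (?<=\s) with the whitespace boundary (?<!\S) on a string that starts with the word. The class name, constants, and the extra test strings are assumptions added for illustration.

```java
public class WhitespaceBoundaryDemo {
    // Requires a whitespace character directly before "test".
    static final String LOOKBEHIND = ".*(?<=\\s)test\\h*\\(.*";
    // Requires only that no non-whitespace character precedes "test".
    static final String BOUNDARY = ".*(?<!\\S)test\\h*\\(.*";

    public static void main(String[] args) {
        String first = "test() function";
        String middle = "This is a test () function";
        String embedded = "contest() function";

        System.out.println(first.matches(LOOKBEHIND));    // false, nothing precedes "test"
        System.out.println(first.matches(BOUNDARY));      // true
        System.out.println(middle.matches(LOOKBEHIND));   // true
        System.out.println(middle.matches(BOUNDARY));     // true
        System.out.println(embedded.matches(BOUNDARY));   // false, "test" is preceded by "n"
    }
}
```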
I am struggling with the following issue: say there's a regex 1 and there's regex 2 which should match everything the regex 1 does not.
Let's have the regex 1:
/\$\d+/ (i.e. the dollar sign followed by one or more digits).
Having a string like foo$12___bar___$34wilma buzz it detects $12 and $34.
What should regex 2 look like in order to match the remaining parts of the aforementioned string, i.e. foo, ___bar___ and wilma buzz? In other words, it should pick up all the "remaining" chunks of the source string.
You may use String#split to split on given regex and get remaining substrings in an array:
String[] arr = str.split( "\\$\\d+" );
//=> ["foo", "___bar___", "wilma buzz"]
RegEx Demo
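One thing to be aware of with split: when the input starts with a match, the resulting array begins with an empty string. As an alternative, here is a sketch that walks the matches with a Matcher and collects only the non-empty text between them. The class and the chunksOutside helper are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemainingChunks {
    // Hypothetical helper: returns the non-empty pieces of input that lie outside matches of regex.
    static List<String> chunksOutside(String input, String regex) {
        List<String> chunks = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                chunks.add(input.substring(last, m.start()));
            }
            last = m.end();
        }
        if (last < input.length()) {
            chunks.add(input.substring(last)); // text after the final match
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunksOutside("foo$12___bar___$34wilma buzz", "\\$\\d+"));
        // prints [foo, ___bar___, wilma buzz]
        System.out.println(chunksOutside("$1abc$2", "\\$\\d+"));
        // prints [abc]
    }
}
```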
It was tricky to get this working, but this regex will match everything besides \$\d+ for you. EDIT: no longer erroneously matches $44$444 or similar.
(?!\$\d+)(.+?)\$\d+|\$\d+|(?!\$\d+)(.+)
Breakdown
(?!\$\d+)(.+?)\$\d+
(?! ) negative lookahead: assert the following string does not match
\$\d+ your pattern - can be replaced with another pattern
(.+?) match at least one symbol, as few as possible
\$\d+ non-capturing match your pattern
OR
\$\d+ non-capturing group: matches one instance of your pattern
OR
(?!\$\d+)(.+)
(?!\$\d+) negative lookahead to not match your pattern
(.+) match at least one symbol, as many as possible
GENERIC FORM
(?!<pattern>)(.+?)<pattern>|<pattern>|(?!<pattern>)(.+)
By replacing <pattern>, you can match anything that doesn't match your pattern. Here's one that matches your pattern, and here's an example of arbitrary pattern (un)matching.
Good luck!
Try this one
[a-zA-Z_]+
Or even better
[^\$\d]+ -> With the ^ symbol you can negate the character class, like ! in Java -> "not"
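A caveat worth checking with a sketch: a negated character class works character by character, so it excludes every $ and every digit, not just the sequence \$\d+. The class and helper names below are made up, and the second input is an invented example to show the difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String input, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Works for the string from the question.
        System.out.println(findAll("foo$12___bar___$34wilma buzz", "[^\\$\\d]+"));
        // prints [foo, ___bar___, wilma buzz]

        // Contains no match for \$\d+ at all, yet the class still cuts at "1" and "$".
        System.out.println(findAll("a1b$c", "[^\\$\\d]+"));
        // prints [a, b, c]
    }
}
```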
In an odd number length string, how could you match (or capture) the middle character?
Is this possible with PCRE, plain Perl or Java regex flavors?
With .NET regex you could use balancing groups to solve it easily (that could be a good example). By plain Perl regex I mean not using any code constructs like (??{ ... }), with which you could run any code and of course do anything.
The string could be of any odd number length.
For example in the string 12345 you would want to get the 3, the character at the center of the string.
This is a question about the possibilities of modern regex flavors and not about the best algorithm to do that in some other way.
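With PCRE and Perl you could use the following (Java's java.util.regex does not support the (?(1)...) conditional construct, so it will not compile there as written):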
^(?:.(?=.*?(?(1)(?=.\1$))(.\1?$)))*(.)
which would capture the middle character of odd length strings in the 2nd capturing group.
Explained:
^ # beginning of the string
(?: # loop
. # match a single character
(?=
# non-greedy lookahead towards the end of string
.*?
# if we already have captured the end of the string (skip the first iteration)
(?(1)
# make sure we do not go past the correct position
(?= .\1$ )
)
# capture the end of the string +1 character, adding to \1 every iteration
( .\1?$ )
)
)* # repeat
# the middle character follows, capture it
(.)
Hmm, maybe someone can come up with a pure regex solution, but if not you could always dynamically build the regex like this:
public static void main(String[] args) throws Exception {
String s = "12345";
String regex = String.format(".{%d}3.{%d}", s.length() / 2, s.length() / 2);
Pattern p = Pattern.compile(regex);
System.out.println(p.matcher(s).matches());
}
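A variant of the dynamically built regex that captures the middle character instead of hard-coding 3 might look like the sketch below. It is not a pure regex solution, since the quantifier counts come from the string length; the class and method names are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleCharDemo {
    // Returns the middle character of an odd-length string, or null for even-length input.
    static String middle(String s) {
        if (s.length() % 2 == 0) {
            return null; // no single middle character
        }
        int half = s.length() / 2;
        // (?s) lets . match line terminators too; group 1 is the character between the two halves.
        String regex = String.format("(?s).{%d}(.).{%d}", half, half);
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("12345")); // 3
        System.out.println(middle("abc"));   // b
        System.out.println(middle("1234"));  // null
    }
}
```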