Regex: Is it possible to skip repeating negative lookbehinds?


I've been trying to fix a simple regex that:
Matches all characters from beginning of line (^) to the first & character or to the end of line ($).
The match cannot start with a &.
Examples:
test should match test.
one&two should match one.
&test shouldn't match anything.
My current regex is the following:
^(?<!\&)(.+?)(?=\&|$)
(Regex101)
Currently, this regex fails example 3: given &test, it matches &test, but it shouldn't match anything.
I think it may be a problem with the negative lookbehind (?<!\&) and that &test matches because the character before it is not a &, but it doesn't account for any following & characters.
Is modifying the negative lookbehind to account for repeating & characters possible, and if so, how could I fix this regex?
(I know that Regex101 is using Python's Regex, but this question's Regex is intended to work with Java.)

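The lookbehind is the problem: placed right after ^, (?<!\&) inspects the character before the start of the line, and since there is none, it always succeeds. A lookahead, ^(?!&), would check the first character instead. Simpler still, drop the lookarounds and the lazy dot matching, and use a negated character class: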
^[^&]+
See demo (note that \n is added just for a demo, if you test strings without newline characters, it won't be necessary).
Here, ^ asserts the position at the start of the string, and the negated character class [^&]+ matches one or more characters other than &. The (?=\&|$) lookahead is no longer needed: if the string contains no &, the whole line is matched.
See IDEONE demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        System.out.println(fetchMatch("test", 0));
        System.out.println(fetchMatch("one&test", 0));
        System.out.println(fetchMatch("&test", 0));
    }

    // Returns the requested group of the first match, or an error marker when nothing matches.
    public static String fetchMatch(String s, int groupId) {
        Pattern pattern = Pattern.compile("^[^&]+");
        Matcher matcher = pattern.matcher(s);
        if (matcher.find()) {
            return matcher.group(groupId);
        }
        return "ERROR: NOT MATCHED";
    }
}
Output:
test
one
ERROR: NOT MATCHED
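
To see why the lookbehind never fires, here is a small sketch that runs the question's pattern next to a lookahead variant and the negated character class. The class and helper names (LookaroundDemo, firstMatch) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: returns the first match of regex in s, or null when nothing matches.
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lookbehind at ^ looks at the character before position 0; none exists, so it always passes.
        String lookbehind = "^(?<!&)(.+?)(?=&|$)";
        // Lookahead at ^ looks at the first character itself.
        String lookahead = "^(?!&)(.+?)(?=&|$)";
        // Negated character class: no lookarounds needed.
        String negatedClass = "^[^&]+";

        System.out.println(firstMatch(lookbehind, "&test"));     // &test
        System.out.println(firstMatch(lookahead, "&test"));      // null
        System.out.println(firstMatch(lookahead, "one&two"));    // one
        System.out.println(firstMatch(negatedClass, "one&two")); // one
        System.out.println(firstMatch(negatedClass, "&test"));   // null
    }
}
```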

Related

Use a regex to find a pattern somewhere between two words

Given the following string
{"type":"PrimaryParty","name":"Karen","id":"456789-9996"},
{"type":"SecondaryParty","name":"Juliane","id":"345678-9996"},
{"type":"SecondaryParty","name":"Ellen","id":"001234-9996"}
I am looking for strings matching the pattern \d{6}-\d{4}, but only if they are following the string "SecondaryParty". The processor is Java-based
Using https://regex101.com/ I have come up with this, which works fine using the ECMAScript(JavaScript) Flavor.
(?<=SecondaryParty.*?)\d{6}-\d{4}(?=\"})
But as soon as I switch to Java, it says
* A quantifier inside a lookbehind makes it non-fixed width
? The preceding token is not quantifiable
When using it in java.util.regex, the error says
Look-behind group does not have an obvious maximum length near index 20
(?<=SecondaryParty.*?)\d{6}-\d{4}(?="})
                    ^
How do I overcome the "does not have an obvious maximum length" problem in Java?

You can get the value without using lookarounds by matching instead, and use a single capture group for the value that you want to get:
\"SecondaryParty\"[^{}]*\"(\d{6}-\d{4})\"
Explanation
\"SecondaryParty\" Match "SecondaryParty"
[^{}]*\" Match optional chars other than { and }, followed by a "
(\d{6}-\d{4}) Capture group 1, match 6 digits - 4 digits
\" Match "
See a regex101 demo and a Java demo.
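
A minimal Java sketch of the capture-group approach, using the sample data from the question. The class and method names (SecondaryPartyIds, extract) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondaryPartyIds {
    // "SecondaryParty", then anything that stays inside the same {...} object, then the quoted id.
    private static final Pattern ID =
            Pattern.compile("\"SecondaryParty\"[^{}]*\"(\\d{6}-\\d{4})\"");

    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> extract(String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String input =
                "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n"
              + "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n"
              + "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
        System.out.println(extract(input)); // [345678-9996, 001234-9996]
    }
}
```

Because [^{}]* cannot cross a closing brace, the id is always taken from the same object as the "SecondaryParty" marker.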

You might use a curly braces quantifier as a workaround:
(?<=SecondaryParty.{0,255})\d{6}-\d{4}(?=\"})
The minimum and maximum inside the curly-brace quantifier depend on your actual data.
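
As a rough sketch of this workaround in Java: the bounded quantifier gives the lookbehind a finite maximum length, which java.util.regex accepts. The bound of 255 and the names BoundedLookbehindDemo and extract are assumptions for illustration. Note that . does not cross line breaks by default, so this relies on each object sitting on its own line; if several objects shared one line, the window could reach back to an earlier "SecondaryParty".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // {0,255} is an assumed upper bound on the distance between the marker and the id.
    private static final Pattern ID =
            Pattern.compile("(?<=SecondaryParty.{0,255})\\d{6}-\\d{4}(?=\"\\})");

    // Hypothetical helper: collects every whole match.
    static List<String> extract(String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        String input =
                "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n"
              + "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n"
              + "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
        for (String id : extract(input)) {
            System.out.println(id);
        }
    }
}
```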

You could use the regex (?<=SecondaryParty)(.*?)(\d{6}-\d{4})(?=\"}) and read the second capture group, which holds the \d{6}-\d{4} value that follows the string "SecondaryParty" on the same line.
Sample Java code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdRegexMatcher {
    public static void main(String[] args) {
        String input = "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
        Pattern pattern = Pattern.compile("(?<=SecondaryParty)(.*?)(\\d{6}-\\d{4})(?=\\\"})");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            // Group 2 holds the id; group 1 is the text between the marker and the id.
            String idStr = matcher.group(2);
            System.out.println(idStr);
        }
    }
}
which gives the output
345678-9996
001234-9996
One possible optimization in the above regex could be to use [^0-9]*? instead of .*? under the assumption that the name wouldn't contain numbers.

To check if a pattern exists in a String

I tried searching but could not find anything that made any sense to me! I am noob at regex :)
Trying to see if a particular word "some_text" exists in another string.
String s = "This is a test() function";
String s2 = "This is a test () function";
Assuming the above two strings I can search this using the following pattern at RegEx Tool
[^\w]test[ ]*[(]
But unable to get a positive match in Java using
System.out.println(s.matches("[^\\w]test[ ]*[(]"));
I have tried with double \ and even four \\ as escape characters but nothing really works.
The requirement is to see the word starts with space or is the first word of a line and has an open bracket "(" after that particular word, so that all these "test (), test() or test ()" should get a positive match.
Using Java 1.8
Cheers,
Faisal.

The point you are missing is that Java's matches() has to match the entire string, as if a ^ were put at the start and a $ at the end of the regex for you. So your expression is effectively seen as:
^[^\w]test[ ]*[(]$
which is never going to match your input.
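
A minimal sketch of that difference, using the question's original pattern. The class and helper names (MatchesVsFind, wholeString, anywhere) are invented for illustration.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // String.matches succeeds only when the regex covers the entire string.
    static boolean wholeString(String regex, String s) {
        return s.matches(regex);
    }

    // Matcher.find succeeds when the regex matches any substring.
    static boolean anywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String regex = "[^\\w]test[ ]*[(]";
        String s = "This is a test() function";
        String s2 = "This is a test () function";

        System.out.println(wholeString(regex, s));  // false
        System.out.println(anywhere(regex, s));     // true
        System.out.println(anywhere(regex, s2));    // true
    }
}
```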
Going from your requirement description, I suggest reworking your regex expression to something like this (assuming by "particular word" you meant test):
(?:.*)(?<=\s)(test(?:\s+)?\()(?:.*)
See the regex at work here.
Explanation:
^ Start of line - added by matches()
(?:.*) Non-capturing group - match anything before the word, but don't capture into a group
(?<=\s) Positive lookbehind - match if word preceded by space, but don't match the space
( Capturing group $1
test(?:\s+)? Match word test and any following spaces, if they exist
\( Match opening bracket
)
(?:.*) Non-capturing group - match rest of string, but don't capture in group
$ End of line - added by matches()
Code sample:
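public class Main {
    public static void main(String[] args) {
        String s = "This is a test() function";
        String s2 = "This is a test () function";
        // Same pattern as above, with backslashes doubled for the Java string literal.
        String regex = "(?:.*)(?<=\\s)(test(?:\\s+)?\\()(?:.*)";
        System.out.println(s.matches(regex));  // true
        System.out.println(s2.matches(regex)); // true
    }
}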

I believe this should be enough (String has no find() method, so go through Pattern and Matcher):
Pattern.compile("\\btest\\s*\\(").matcher(s).find()

Try this: "\btest\b(?= *\()".
And don't use matches(); use Matcher.find(). matches() tries to match the whole string.
https://regex101.com/r/xaPCyp/1

The matches() method tells whether or not the whole string matches the given regular expression. Since that's not the case here, it returns false.
If you are just interested in whether your lookup value exists within the string, I found the following useful:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        String s = "This is a test () function";
        Pattern p = Pattern.compile("\\btest *\\(");
        Matcher m = p.matcher(s);
        if (m.find())
            System.out.println("Found a match");
        else
            System.out.println("Did not find a match");
    }
}
I went with the following pattern: \\btest *\\(
\\b - Match word-boundary (will also catch if first word).
test - Literally match your lookup-value.
[space]* - Zero or more literal spaces (the pattern has a space before the *).
\\( - Escaped open parenthesis to match literally.
Debuggex Demo

The .matches method has to match the whole string, whereas your pattern would only get a partial match.
In the pattern that you tried, the negated character class [^\\w] can match more than a whitespace boundary, as it matches any char except a word character. It could, for example, also match a ( or a newline.
As per the comments, "test() function" should also match, but [^\\w] and (?<=\s) both expect a character to be there on the left, so they fail when test is the first word.
Instead, you could use (?<!\S), written (?<!\\S) in a Java string, to assert a whitespace boundary on the left:
.*(?<!\S)test\h*\(.*
Explanation
.* Match 0+ times any char except a newline
(?<!\S) Assert a whitespace boundary on the left
test\h* Match test and 0+ horizontal whitespace chars
\( Match a ( char
.* Match 0+ times any char except a newline
Regex demo | Java demo
In Java
System.out.println(s.matches(".*(?<!\\S)test\\h*\\(.*"));
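
A small sketch of how the whitespace-boundary version behaves on a few inputs, including the case where test is the first word. BoundaryDemo and callsTest are illustrative names, and \h assumes Java 8 or later.

```java
public class BoundaryDemo {
    // (?<!\S) passes at the start of the string or right after whitespace.
    private static final String REGEX = ".*(?<!\\S)test\\h*\\(.*";

    // Hypothetical helper: true when "test" followed by "(" starts at a whitespace boundary.
    static boolean callsTest(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(callsTest("This is a test() function"));  // true
        System.out.println(callsTest("This is a test () function")); // true
        System.out.println(callsTest("test() function"));            // true
        System.out.println(callsTest("contest() function"));         // false
    }
}
```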

Java regular expression for UserName with length range

I am writing a regular expression to validate UserName.
Here is the rule:
Length: 6 - 20 characters
Must start with letter a-zA-Z
Can contain a-zA-Z0-9 and dot (.)
Can't have 2 consecutive dots
Here is what I tried:
import java.util.regex.Pattern;

public class TestUserName {
    private static final String USERNAME_PATTERN = "[a-z](\\.?[a-z\\d]+)+";
    private static final Pattern pattern = Pattern.compile(USERNAME_PATTERN, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(pattern.matcher("user.name").matches()); // true
        System.out.println(pattern.matcher("user.name2").matches()); // true
        System.out.println(pattern.matcher("user2.name").matches()); // true
        System.out.println(pattern.matcher("user..name").matches()); // false
        System.out.println(pattern.matcher("1user.name").matches()); // false
    }
}
The pattern I used works, but it has no length constraint.
I tried to append a {6,20} constraint to the pattern, but it failed.
"[a-z](\\.?[a-z\\d]+)+{6,20}" // failed pattern to validate length
Anyone has any ideas?
Thanks!

You can use a lookahead regex for all the checks:
^[a-zA-Z](?!.*\.\.)[a-zA-Z.\d]{5,19}$
[a-zA-Z.\d]{5,19} is used because one char, [a-zA-Z], has already been matched at the start, which makes the total length fall in the range {6,20}.
The negative lookahead (?!.*\.\.) fails the match if there are 2 consecutive dots.
Equivalent Java pattern will be:
Pattern p = Pattern.compile("^[a-zA-Z](?!.*\\.\\.)[a-zA-Z.\\d]{5,19}$");
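
A short sketch that exercises this pattern against the question's rules. The class and method names (UserNameValidator, isValid) are made up for illustration. Note that the pattern accepts a trailing dot, which the stated rules do not forbid.

```java
import java.util.regex.Pattern;

public class UserNameValidator {
    private static final Pattern USERNAME =
            Pattern.compile("^[a-zA-Z](?!.*\\.\\.)[a-zA-Z.\\d]{5,19}$");

    // Hypothetical helper: true when the whole name satisfies the pattern.
    static boolean isValid(String name) {
        return USERNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user.name"));   // true
        System.out.println(isValid("user..name"));  // false: consecutive dots
        System.out.println(isValid("1user.name"));  // false: starts with a digit
        System.out.println(isValid("abcde"));       // false: only 5 characters
        System.out.println(isValid("abcdef"));      // true: exactly 6 characters
        System.out.println(isValid("user_name"));   // false: underscore not allowed
    }
}
```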

Use a negative look ahead to prevent double dots:
"^(?!.*\\.\\.)(?i)[a-z][a-z\\d.]{5,19}$"
(?i) means case insensitive (so [a-z] means [a-zA-Z])
(?!.*\\.\\.) means there aren't two consecutive dots anywhere in it
The rest is obvious.
See live demo.

I would use the following regex:
^(?=.{6,20}$)(?!.*\.\.)[a-zA-Z][a-zA-Z0-9.]+$
The (?=.{6,20}$) positive lookahead makes sure the text will contain 6 to 20 characters, while the (?!.*\.\.) negative lookahead makes sure the text will not contain .. at any point.

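This will also suffice if you only need a yes/no answer (use find(), since the pattern is made of lookarounds and consumes no characters):
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)
To capture the matched text, you can use:
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)(^.*$)
Note that neither pattern restricts the characters to letters, digits, and dots. In the second pattern, replacing the final .* with [A-Za-z0-9.]* adds that check.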

Java Regex-matches anything except three specific string

Given such Java Regex codes:
Pattern pattern = Pattern.compile("[^(bob)(alice)(kitty)]");
String s = "a";
Matcher matcher = pattern.matcher(s);
boolean bl = matcher.find();
System.out.println(bl);
The output is false. Why? I expected the regex [^(bob)(alice)(kitty)] to match anything except bob, alice or kitty, so the result should be true, right?

Because your regex is not doing what you think it should be doing.
Use this regex with Negative lookahead:
Pattern pattern = Pattern.compile("^(?!bob|alice|kitty).*$");
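Your regex [^(bob)(alice)(kitty)] is a character class, and inside a character class there are no groups: parentheses and letters are just individual characters. It matches any single character that is not one of ( ) b o a l i c e k t y, and since a is in that set, find() returns false.
(?!bob|alice|kitty) is a negative lookahead that fails the match if any of these 3 words appears at the start of the input.
It is important to use the anchors ^ and $ to make sure we're not matching from the middle of the string.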
If you want to avoid matching these 3 words anywhere in input then use this regex:
^(?!.*?(?:bob|alice|kitty)).*$
RegEx Demo
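
A brief sketch contrasting the character class with the two lookahead patterns. The class and helper names (NegationDemo, found, whole) are illustrative only.

```java
import java.util.regex.Pattern;

public class NegationDemo {
    static final String CHAR_CLASS = "[^(bob)(alice)(kitty)]";
    static final String NOT_AT_START = "^(?!bob|alice|kitty).*$";
    static final String NOT_ANYWHERE = "^(?!.*?(?:bob|alice|kitty)).*$";

    // Hypothetical helper: true when regex matches some part of s.
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // Hypothetical helper: true when regex matches all of s.
    static boolean whole(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        // The class excludes single characters, not words.
        System.out.println(found(CHAR_CLASS, "a"));   // false: 'a' is in the excluded set
        System.out.println(found(CHAR_CLASS, "z"));   // true: 'z' is not excluded

        System.out.println(whole(NOT_AT_START, "bobby"));     // false: starts with bob
        System.out.println(whole(NOT_AT_START, "hello bob")); // true: bob is not at the start
        System.out.println(whole(NOT_ANYWHERE, "hello bob")); // false: bob appears somewhere
        System.out.println(whole(NOT_ANYWHERE, "carol"));     // true
    }
}
```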

Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,

Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars, which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alnum character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.
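
For comparison, a small sketch running the question's pattern and both suggested patterns against the sample URIs. In the original /foo/.+(?!bar), the greedy .+ consumes the rest of the string, so the lookahead is checked at the very end, where no "bar" can follow, and it always succeeds. The class and helper names (UriLookaheadDemo, matches) are made up for illustration.

```java
public class UriLookaheadDemo {
    static final String ORIGINAL = "/foo/.+(?!bar)";
    static final String NO_BAR = "/foo/(?!.*bar).+";
    static final String NO_BAR_WORD = "/foo/(?!.*\\bbar\\b).+";

    // Hypothetical helper: whole-string match, as in the question's test code.
    static boolean matches(String regex, String uri) {
        return uri.matches(regex);
    }

    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
        String crowbars = "/foo/baz/crowbars";

        System.out.println(matches(ORIGINAL, shouldntWork));    // true: lookahead checked too late
        System.out.println(matches(NO_BAR_WORD, shouldWork));   // true
        System.out.println(matches(NO_BAR_WORD, shouldntWork)); // false
        System.out.println(matches(NO_BAR, crowbars));          // false: "bar" inside "crowbars"
        System.out.println(matches(NO_BAR_WORD, crowbars));     // true: no standalone "bar"
    }
}
```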
