java.util.regex.Matcher.replaceAll replacing without a match? - java

According to the javadoc:
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
This seems to indicate that this call will not replace, unless a match is made.
And yet:
import java.util.regex.Pattern;

public class MisMatch {
    public static void main(String[] args) {
        Pattern doubleSlash = Pattern.compile("\\\\");
        String stringWithSingleSlash = "maybe\\no";
        System.out.println("Matches:" + doubleSlash.matcher(stringWithSingleSlash).matches());
        String replace = doubleSlash.matcher(stringWithSingleSlash).replaceAll("ABC");
        System.out.println(replace);
        System.out.println("Equal:" + (stringWithSingleSlash.equals(replace)));
    }
}
This prints:
Matches:false
maybeABCno
Equal:false
so it is not matching, but still replacing. What am I missing here?

matches only returns true if the whole string matches; it doesn't match substrings.
So if stringWithSingleSlash were just "\\" instead of "maybe\\no", matches would return true.
If the fact that doubleSlash matches a single backslash confuses you, the explanation is that "\\\\" is a string containing two backslashes, and the regex engine interprets two backslashes as one escaped backslash (because the backslash is an escape character in regexes as well as in string literals). replaceAll, by contrast, replaces every matching subsequence, which is why the single backslash inside the string is replaced even though matches returns false.
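A minimal sketch of the difference, reusing the pattern and input from the question. The class name MatchVsFind and its helper methods are made up for illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchVsFind {
    // The regex \\ (written "\\\\" in Java source) matches one literal backslash.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // True only if the ENTIRE input is a single backslash.
    static boolean wholeInputMatches(String input) {
        return BACKSLASH.matcher(input).matches();
    }

    // True if a backslash occurs anywhere in the input.
    static boolean containsMatch(String input) {
        return BACKSLASH.matcher(input).find();
    }

    // Replaces every backslash; quoteReplacement keeps the replacement literal.
    static String replaceBackslashes(String input, String replacement) {
        return BACKSLASH.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "maybe\\no"; // the characters: maybe\no
        System.out.println(wholeInputMatches(input));          // false
        System.out.println(containsMatch(input));              // true
        System.out.println(replaceBackslashes(input, "ABC"));  // maybeABCno
        System.out.println(wholeInputMatches("\\"));           // true
    }
}
```

In other words, replaceAll behaves like repeated find calls, not like matches.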

The matches() method attempts to match the pattern against the entire string.
To test whether the pattern occurs anywhere in the string, check whether find() returns true instead.
Take a look at the Matcher javadoc, here's an excerpt:
Once created, a matcher can be used to perform three different kinds of match operations:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.
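The three operations quoted above can be compared side by side. This is only a sketch: the pattern "foo", the sample inputs, and the class name ThreeMatchKinds are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class ThreeMatchKinds {
    static final Pattern FOO = Pattern.compile("foo");

    // Runs each operation on a fresh matcher so no state leaks between calls.
    static String describe(String input) {
        boolean whole = FOO.matcher(input).matches();     // entire input must match
        boolean prefix = FOO.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = FOO.matcher(input).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("foo"));    // matches=true lookingAt=true find=true
        System.out.println(describe("foobar")); // matches=false lookingAt=true find=true
        System.out.println(describe("barfoo")); // matches=false lookingAt=false find=true
    }
}
```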

Related

Java String.Matches(); [duplicate]

if("test%$#*)$(%".matches("[^a-zA-Z\\.]"))
System.exit(0);
if("te/st.txt".matches("[^a-zA-Z\\.]"))
System.exit(0);
The program isn't exiting even though the regexes should be returning true. What's wrong with the code?
matches returns true only if the regex matches the entire string.
In your case the regex represents only a single character that is not a-z, A-Z or a dot.
I suspect that you want to check whether the string contains one of the special characters described by your regex. In that case, surround the regex with .* so that it can match the entire string. Also, you don't have to escape . inside a character class: [.] is enough.
if ("test%$#*)$(%".matches(".*[^a-zA-Z.].*")) {
    // string contains a character that is not in the range a-z, A-Z, or '.'
}
However, if you care about performance, you can use the Matcher#find() method instead. It can return true the moment it finds a substring that matches the regex, so the rest of the text does not need to be checked; the longer the remaining text, the more time this saves. It also avoids building a new Pattern object on every String#matches(regex) call, because the Pattern can be created once and reused with different data.
Demo:
Pattern p = Pattern.compile("[^a-zA-Z\\.]");
Matcher m = p.matcher("test%$#*)$(%");
if(m.find())
System.exit(0);
//OR with Matcher inlined since we don't really need that variable
if (p.matcher("test%$#*)$(%").find())
System.exit(0);
x.matches(y) is equivalent to
Pattern.compile(y).matcher(x).matches()
and requires the whole string x to match the regex y. If you just want to know if there is some substring of x that matches y then you need to use find() instead of matches():
if(Pattern.compile("[^a-zA-Z.]").matcher("test%$#*)$(%").find())
System.exit(0);
Alternatively you could reverse the sense of the test:
if(!"test%$#*)$(%".matches("[a-zA-Z.]*"))
by providing a pattern that matches the strings that are allowed rather than the characters that aren't, and then seeing whether the test string fails to match this pattern.
You always get false because the matches() method returns true only when the pattern matches the full string.
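The three approaches suggested in these answers should agree on single-line input. A small sketch that puts them next to each other; the class name DisallowedChars and the method names are hypothetical, and the sample strings other than those from the question are made up.

```java
import java.util.regex.Pattern;

class DisallowedChars {
    // Compiled once and reused: one character that is not a letter or a dot.
    static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z.]");

    // Scans for the first disallowed character and stops there.
    static boolean viaFind(String s) {
        return DISALLOWED.matcher(s).find();
    }

    // Whole-string match, with .* consuming the text around the character.
    static boolean viaWrappedMatches(String s) {
        return s.matches(".*[^a-zA-Z.].*");
    }

    // Reversed test: the string is NOT made up of allowed characters only.
    static boolean viaAllowList(String s) {
        return !s.matches("[a-zA-Z.]*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"test%$#*)$(%", "te/st.txt", "file.txt"}) {
            System.out.println(s + " -> " + viaFind(s) + " "
                    + viaWrappedMatches(s) + " " + viaAllowList(s));
        }
    }
}
```

Note that the .* variant relies on . matching every character, which is not the case for line terminators unless Pattern.DOTALL is set, so the find() and allow-list variants are the safer choice for multi-line input.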

regular expression to match string of characters only if it is not followed by a specific character

I need a regular expression that matches a string only if it is not followed by a forward slash (/) character, and I want to match the whole string.
For example below string should match
/Raj/details/002-542545-1145457
but not this one
/Raj/details/002-542545-1145457/show
I tried to use Negative Lookahead to achieve this as specified by this answer.
My code is like this.
pattern = Pattern.compile("/.*/details/(?!.*/)");
matcher = pattern.matcher(text);
if(matcher.matches()) {
System.out.println("success");
} else {
System.out.println("failure");
}
It prints failure. But if I use matcher.find(), it prints success.
Please help me understand why it is not matching, and suggest a regex that achieves this.
This
^/[^/]+/details/[^/]+$
will match
/Raj/details/002-542545-1145457
but not
/Raj/details/002-542545-1145457/show
I think your regex is not doing what you are expecting it to do. You are missing a part of the expression that captures the numerical data after /details/.
Your regex is positive for .find() because there is a match inside the string for your current expression, but the string does not match the expression entirely which is why .matches() doesn't work.
The problem is not greediness but the fact that the negative lookahead (?!.*/) is zero-width: it only checks that no / follows, without consuming any characters. Your expression therefore stops matching right after /details/ and never consumes the characters that come after it (in your examples, the numerical data). That is why .matches() fails even though .find() still reports a match.
If you want it to match the whole string up to and including the numbers but nothing afterwards, the following regex should work: /.*/details/[0-9\-]*(?!.*/) (in a Java string literal the backslash must be doubled, i.e. [0-9\\-]). With that, both .find() and .matches() succeed on the first example, as the expression now matches everything up to the potential /.
Your regex looks OK. matches() only matches the whole String, while find() matches the next substring inside your String.
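To make the zero-width behaviour of the lookahead concrete, here is a small sketch using the two paths from the question. The class name LookaheadDemo and its helpers are hypothetical; the second pattern is the [^/]+ regex from the first answer, written without ^ and $ because matches() already requires the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // The question's pattern: the lookahead checks but does not consume.
    static final Pattern LOOKAHEAD_ONLY = Pattern.compile("/.*/details/(?!.*/)");
    // Consumes the final path segment, so it can cover the whole input.
    static final Pattern WHOLE_PATH = Pattern.compile("/[^/]+/details/[^/]+");

    static boolean lookaheadMatches(String text) {
        return LOOKAHEAD_ONLY.matcher(text).matches();
    }

    // Returns the text actually consumed by find(), or null if nothing is found.
    static String lookaheadFoundPart(String text) {
        Matcher m = LOOKAHEAD_ONLY.matcher(text);
        return m.find() ? m.group() : null;
    }

    static boolean wholePathMatches(String text) {
        return WHOLE_PATH.matcher(text).matches();
    }

    public static void main(String[] args) {
        String good = "/Raj/details/002-542545-1145457";
        String bad = "/Raj/details/002-542545-1145457/show";
        System.out.println(lookaheadMatches(good));   // false: the id is never consumed
        System.out.println(lookaheadFoundPart(good)); // /Raj/details/
        System.out.println(lookaheadFoundPart(bad));  // null: a / follows /details/
        System.out.println(wholePathMatches(good));   // true
        System.out.println(wholePathMatches(bad));    // false
    }
}
```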

About regex in Java (Very simple, but I don't know why it doesn't work)

public static void main(String[] args) {
// TODO Auto-generated method stub
System.out.println(Pattern.matches("[^A-Za-z0-9]","##%abc"));
}
This is very simple code about regex in Java.
As far as I know, [^A-Za-z0-9] should return true when it matches any special character, because [^ means negation and A-Za-z0-9 means all letters and digits. I don't know why the above code keeps returning false instead of true.
Add a + and a trailing .*:
System.out.println(Pattern.matches("[^A-Za-z0-9]+.*","##%abc"));
// the + alone would only match the leading special characters,
// so a trailing .* is added to consume the rest
Pattern.matches() implies a full match, i.e. the entire pattern must match the text from beginning to end. What you are expecting is the behaviour of find(): there are several matches of the pattern inside the text, but no single full match, because your pattern matches only one character.
Your match is attempting to match one character alone.
You should instead reconfigure the pattern to match the first character and then anything after it, e.g.
Pattern.matches("[^A-Za-z0-9].*","##%abc")
Note the .* after your match on the first character.
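One caveat with the .* suffix is that it only tests the first character, whereas find() tests every position. A minimal sketch of that difference; the class name FirstCharOnly, the helper names, and the extra input "abc#" are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class FirstCharOnly {
    static final Pattern SPECIAL = Pattern.compile("[^A-Za-z0-9]");

    // The question's call: true only for an input of exactly one special character.
    static boolean isSingleSpecialChar(String s) {
        return Pattern.matches("[^A-Za-z0-9]", s);
    }

    // The suggested fix: true when the FIRST character is special.
    static boolean startsWithSpecial(String s) {
        return Pattern.matches("[^A-Za-z0-9].*", s);
    }

    // find(): true when a special character occurs anywhere.
    static boolean containsSpecial(String s) {
        return SPECIAL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isSingleSpecialChar("##%abc")); // false
        System.out.println(startsWithSpecial("##%abc"));   // true
        System.out.println(startsWithSpecial("abc#"));     // false
        System.out.println(containsSpecial("abc#"));       // true
    }
}
```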

Java regex basic usage problem

The following code works:
String str= "test with foo hoo";
Pattern pattern = Pattern.compile("foo");
Matcher matcher = pattern.matcher(str);
if(matcher.find()) { ... }
But this example does not:
if(Pattern.matches("foo", str)) { ... }
And neither this version:
if(str.matches("foo")) { ... }
In the real code, str is a chunk of text with multiple lines (in case the matcher treats that differently), and replace will be used to replace a string of text.
Anyway, it is strange that it works in the first version but not the other two versions.
Edit
Ok, I realise that the behaviour is the same in the first example if if(matcher.matches()) { ... } is used instead of matcher.find. I still cannot make it work for multiline input but I stick to the Pattern.compile/Pattern.matcher solution anyway.
Your last couple of examples fail because matches adds an implicit start and end anchor to your regular expression. In other words, it must be an exact match of the entire string, not a partial match.
You can work around this by using .*foo.* instead (for multi-line input, note that . does not match line terminators unless Pattern.DOTALL is set). Using Matcher.find is a more flexible solution though, so I'd recommend sticking with that.
In Java, String.matches delegates to Pattern.matches which in turn delegates to Matcher.matches, which checks if a regex matches the entire string.
From the java.util.regex.Matcher API:
Once created, a matcher can be used to perform three different kinds of match operations:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
To find if a substring matches pattern, you can:
Matcher.find() the pattern within the string
Check if the entire string matches .*pattern.*
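Both options can be sketched as follows for multi-line input, which is where the .*pattern.* form usually trips people up. The class name MultilineContains, the helper names, and the three-line sample text are assumptions for illustration; Pattern.DOTALL is the standard flag that lets . match line terminators.

```java
import java.util.regex.Pattern;

class MultilineContains {
    static final Pattern FOO = Pattern.compile("foo");
    // DOTALL lets . match line terminators, so .* can span several lines.
    static final Pattern FOO_ANYWHERE = Pattern.compile(".*foo.*", Pattern.DOTALL);

    // Option 1: find() works regardless of line breaks.
    static boolean viaFind(String text) {
        return FOO.matcher(text).find();
    }

    // Option 2 without DOTALL: fails as soon as the text contains a line break.
    static boolean viaPlainMatches(String text) {
        return text.matches(".*foo.*");
    }

    // Option 2 with DOTALL: the whole multi-line text can match.
    static boolean viaDotallMatches(String text) {
        return FOO_ANYWHERE.matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "line one\ntest with foo hoo\nline three";
        System.out.println(viaFind(text));          // true
        System.out.println(viaPlainMatches(text));  // false
        System.out.println(viaDotallMatches(text)); // true
    }
}
```

The same effect can be had with the inline flag, i.e. text.matches("(?s).*foo.*").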
Related questions
On matches() matching whole string:
Why is my regex is not matching?
Java Regex Match Error
Java RegEx Pattern not matching (works in .NET)
On hitEnd() for partial matching:
How can I perform a partial match with java.util.regex.*?
Can java.util.regex.Pattern do partial matches?
On multiline vs singleline/Pattern.DOTALL mode:
string.matches(".*") returns false
