I'm trying to understand Pattern.quote using the following code:
String pattern = Pattern.quote("1252343% 8 567 hdfg gf^$545");
System.out.println("Pattern is : "+pattern);
produces the output:
Pattern is : \Q1252343% 8 567 hdfg gf^$545\E
What are \Q and \E here? The documentation description says :
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
But Pattern.quote's return type is String and not a compiled Pattern object.
Why is this method required and what are some usage examples?
\Q means "start of literal text" (i.e. regex "open quote")
\E means "end of literal text" (i.e. regex "close quote")
Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal. For example, Pattern.quote(".*") matches a dot followed by an asterisk:
System.out.println("foo".matches(".*")); // true
System.out.println("foo".matches(Pattern.quote(".*"))); // false
System.out.println(".*".matches(Pattern.quote(".*"))); // true
The method's purpose is to spare the programmer from having to remember the special sequences \Q and \E, and to make the code a bit more readable; regex is hard enough to read already. Compare:
someString.matches(Pattern.quote(someLiteral));
someString.matches("\\Q" + someLiteral + "\\E");
Referring to the javadoc:
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
The Pattern.quote method quotes part of a regex pattern to make regex interpret it as string literals.
Say you have some user input in your search program and you want to search for it with a regex. The input may contain characters that are special in a regex, so you can use:
Pattern pattern = Pattern.compile(Pattern.quote(userInput));
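As a rough illustration of the user-input case, here is a minimal sketch. The class name LiteralSearch and the helper containsLiteral are made up for this example; only Pattern.compile, Pattern.quote and Matcher.find come from the JDK.

```java
import java.util.regex.Pattern;

final class LiteralSearch {
    // Hypothetical helper: true if text contains userInput verbatim,
    // with no regex interpretation of userInput.
    static boolean containsLiteral(String text, String userInput) {
        Pattern pattern = Pattern.compile(Pattern.quote(userInput));
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("total: 5 (USD)", "(USD)")); // true
        System.out.println(containsLiteral("total: 5 USD", "(USD)"));   // false
        // Unquoted, "(USD)" is a capturing group that matches plain "USD".
        System.out.println(Pattern.compile("(USD)").matcher("total: 5 USD").find()); // true
    }
}
```

The last line shows what goes wrong without quoting: the parentheses are treated as a group rather than as characters to find.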
This method does not quote a Pattern but, as you point out, wraps a String in regex quotes.
\Q and \E, like all other constructs, are thoroughly documented on the java.util.regex.Pattern Javadoc page. They mean "begin quote" and "end quote", and demarcate a region in which every character has its literal meaning. The way to use the return value of Pattern.quote is to feed it to Pattern.compile, or to any other method that accepts a pattern string, such as String.split.
If you compile the String returned by Pattern.quote, you'll get a Pattern which matches the literal string that you quoted.
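Since String.split was mentioned, here is a small sketch of that use. LiteralSplit and splitOn are hypothetical names for this example; the delimiter is assumed to be arbitrary text that should never be read as a regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class LiteralSplit {
    // Hypothetical helper: splits text on delimiter taken literally.
    static String[] splitOn(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("a.b.c", "."))); // [a, b, c]
        // Unquoted, "." matches every character, so only empty strings remain
        // and split drops them all.
        System.out.println("a.b.c".split(".").length); // 0
    }
}
```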
\Q and \E mark the beginning and end of the quoted part of the string.
Regex syntax frequently collides with normal strings. Say I want a regex to search for a certain string that is only known at runtime. How can we be sure that the string doesn't have a regex meaning (e.g. ".*.*.*")? We quote it.
This method is used to make the pattern be treated as a sequence of literal characters.
This has the same effect as the Pattern.LITERAL flag.
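To make the comparison with the flag concrete, here is a minimal sketch that runs both routes side by side. The class and method names are invented for this example; Pattern.LITERAL is the real flag on java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

final class LiteralFlag {
    // Route 1: quote the text, then compile it as an ordinary regex.
    static boolean matchesViaQuote(String literal, String candidate) {
        return Pattern.compile(Pattern.quote(literal)).matcher(candidate).matches();
    }

    // Route 2: compile the raw text with the LITERAL flag.
    static boolean matchesViaFlag(String literal, String candidate) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesViaQuote(".*", ".*"));  // true
        System.out.println(matchesViaFlag(".*", ".*"));   // true
        System.out.println(matchesViaQuote(".*", "foo")); // false
        System.out.println(matchesViaFlag(".*", "foo"));  // false
    }
}
```

One practical difference: the flag applies to the whole pattern, while Pattern.quote returns a String that can be concatenated with other regex fragments.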
I have a String input as well as a String pattern, and either of them could contain any of the characters that have a special meaning in a regex. I would like an exact replacement to take place, with any special meaning ignored. I won't know at compile time how many such special characters might be present in either the input string or the input pattern.
So here is the formal problem statement:
Assume the object input_string is the input, of type String.
Then we have another string, input_pattern, which is also of type String.
Now I want to perform the following:
String result=input_string.replaceFirst(input_pattern,"replacewithsomethingdoesntmatter");
The replacement should be an exact match, without considering any special regex meaning of the characters in the strings. How can I make that happen?
You can use the Pattern.quote() method to escape characters that have a special meaning in regular expressions:
String pattern = "^(.*)$";
String quotedPattern = Pattern.quote(pattern);
System.out.println(quotedPattern);
This will wrap the pattern in quotation markers (\Q and \E), indicating that the wrapped sequence needs to be matched literally.
Alternatively, you can wrap the pattern in quotation markers manually:
String pattern = "^(.*)$";
String quotedPattern = "\\Q" + pattern + "\\E";
System.out.println(quotedPattern);
The first approach is probably safer, because it will also make accommodations for expressions that already contain quotation markers.
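Applied to the replaceFirst call from the question, a minimal sketch could look like the following. The names LiteralReplace and replaceFirstLiteral are hypothetical. Matcher.quoteReplacement is included on the assumption that the replacement text should also be taken literally, since $ and \ are special there too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    // Hypothetical helper: replaces the first exact occurrence of target,
    // treating both target and replacement as plain text.
    static String replaceFirstLiteral(String input, String target, String replacement) {
        return input.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstLiteral("x ^(.*)$ y ^(.*)$", "^(.*)$", "$1")); // x $1 y ^(.*)$
        // A target that itself contains \E is still matched literally by Pattern.quote.
        System.out.println(replaceFirstLiteral("a\\Eb c", "a\\Eb", "Z")); // Z c
    }
}
```

The second call is the case where manual "\\Q" + ... + "\\E" wrapping is likely to go wrong, because the embedded \E would end the quoted region early.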
I'm trying to get true in the following test. I have a string containing a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only one, (.*)news(.*), works. That actually means any characters after news, though, and I need to match only a \ there.
How can I do that?
Group the elements at the end: (.*)news\\(.*)
You can use this instead :
Boolean test = s.matches("(.*)news\\\\(.*)");
Try something like:
Boolean test = s.matches(".*news\\\\.*");
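Here .* means any number of characters, followed by news, followed by a single literal backslash (written as four backslashes in the Java source: the regex needs \\, and each of those is doubled again in the string literal), and then any number of characters after that (which can be zero as well).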
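With your regex, (.*)news\\. means:
(.*) any number of characters
news the literal text "news"
\\. which the regex engine receives as \. and reads as an escaped, literal dot (not a backslash followed by any character)
This doesn't match the String in your program, "Good news\ everyone!", because "news" is followed by a backslash and more text, not by a single dot at the end.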
You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The backslash is used to escape characters, and the backslash itself, in Java Strings.
In Java Pattern representations, you need to double-escape your backslashes to represent a literal backslash ("\\\\"), as double backslashes are already used to represent special constructs (e.g. \\p{Punct}) or to escape characters (e.g. the literal dot \\.).
String.matches will attempt to match the whole String against your pattern, so you need the terminal part of the pattern I've added.
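The escaping layers described above can be sketched as follows. BackslashLayers and its two methods are made-up names. The second variant uses Pattern.quote("\\") as an alternative way to write the literal backslash; that is an addition for comparison, not something the answer itself used.

```java
import java.util.regex.Pattern;

final class BackslashLayers {
    // Java source "\\\\" -> regex \\ -> one literal backslash in the input.
    static boolean hasBackslashAfterNews(String s) {
        return s.matches(".*news\\\\.*");
    }

    // Same check, with the backslash quoted instead of hand-escaped.
    static boolean hasBackslashAfterNewsQuoted(String s) {
        return s.matches(".*news" + Pattern.quote("\\") + ".*");
    }

    public static void main(String[] args) {
        String s = "Good news\\ everyone!"; // the text: Good news\ everyone!
        System.out.println(hasBackslashAfterNews(s));       // true
        System.out.println(hasBackslashAfterNewsQuoted(s)); // true
        // The original pattern asks for "news" plus a literal dot at the end.
        System.out.println(s.matches("(.*)news\\."));       // false
    }
}
```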
you can try this :
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);
I want to match \Q and \E in a Java regex.
I am writing a program that computes the length of the string matched by a pattern (the program assumes that the regex contains no quantifier other than {some number}, which is why the length of the string is uniquely defined), and I first want to delete all expressions like \Qsome text\E.
But regex like this:
"\\Q\\Q\\E\\Q\\E\\E"
obviously doesn't work.
Use Pattern.quote(...):
String s = "\\Q\\Q\\E\\Q\\E\\E";
String escaped = Pattern.quote(s);
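For the stated goal of deleting \Qsome text\E spans from a regex held in a String, one possible sketch is below. It matches the markers themselves by escaping their backslashes. The names QuotedSpanStripper and stripQuotedSpans are hypothetical, and the sketch assumes the regex source never contains an escaped backslash directly before a Q or E (such as \\Q meaning "backslash, then Q"); handling that would need a real tokenizer.

```java
import java.util.regex.Pattern;

final class QuotedSpanStripper {
    // Regex source is \\Q.*?\\E : a literal \Q, the shortest possible run of
    // any characters, then a literal \E. DOTALL lets the run span newlines.
    private static final Pattern QUOTED_SPAN =
            Pattern.compile("\\\\Q.*?\\\\E", Pattern.DOTALL);

    // Hypothetical helper: removes every \Q...\E span from regexSource.
    static String stripQuotedSpans(String regexSource) {
        return QUOTED_SPAN.matcher(regexSource).replaceAll("");
    }

    public static void main(String[] args) {
        // The regex source here is: a\Qxyz\Eb\Q.*\Ec
        System.out.println(stripQuotedSpans("a\\Qxyz\\Eb\\Q.*\\Ec")); // abc
    }
}
```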
Just escape the backslashes. The sequence \\\\ matches a literal backslash, so to match a literal \Q:
"\\\\Q"
and to match a literal \E:
"\\\\E"
You can make it more readable for a maintainer by making it obvious that each sequence matches a single character using [...] as in:
"[\\\\][Q]"
I don't understand why the "$" is special.
String str = "bla aa";
String tag = "$";
str = str.replaceFirst("aa", tag);
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
If I change the tag = "\\$", then it works fine. But why does it need to be escaped? thanks in advance.
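Because it is a special regex symbol (in the replacement string it refers to capturing groups), and replaceFirst treats both of its arguments specially: the first as a regex, the second as a replacement string that may contain group references. The documentation explicitly warns you: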
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
Now a bit more about $. In the regex pattern it means "end of line".
In the replacement string, $g means "the g-th group". So for a regex a([a-z]+)([0-9]+), you have two groups, $1 and $2, and you can refer to them when replacing.
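A short sketch of both behaviours, using the Matcher.quoteReplacement method that the documentation quote points to. The class name DollarReplacement and its helper are made up for this example.

```java
import java.util.regex.Matcher;

final class DollarReplacement {
    // Hypothetical helper: the regex keeps its meaning, the replacement is plain text.
    static String replaceFirstWithLiteral(String input, String regex, String replacement) {
        return input.replaceFirst(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // A bare "$" replacement throws; quoted, it is just a dollar sign.
        System.out.println(replaceFirstWithLiteral("bla aa", "aa", "$")); // bla $
        // Unquoted, $1 and $2 refer to the captured groups.
        System.out.println("ab12".replaceFirst("([a-z]+)([0-9]+)", "$2$1")); // 12ab
    }
}
```

Escaping by hand as "\\$" works as well; quoteReplacement is simply the safer choice when the replacement text is only known at runtime.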
replaceFirst takes a regular expression. According to the Pattern javadoc, $ matches the end of a line.
$ matches the end of the line in a regex. So if you need it as a simple character, you need to escape it. You can find more in the java.util.regex.Pattern Javadoc.