Pattern.matches("123$45","123$45") returns false, I presume because of the special $ char.
My suspicion was that escaping the $ would make it pass
e.g. Pattern.matches("123\$45","123\$45")
But this also fails.
What is the proper way to make sure they match?
The "canonical" regex for a literal dollar sign is \$, but here it has to be written inside a Java string literal, and in a Java string literal a \ is written "\\". Therefore:
"123\\$45"
As to your target string, it just needs to be "123$45".
If the pattern you are looking for is a fixed pattern, then manually escape the '$' character so that it isn't treated as a regex metacharacter; i.e.
boolean itMatches = Pattern.matches("123\\$45", "123$45");
The '$' is escaped at the level of the String object using a single backslash. However, since we are expressing this using a String literal, and backslash is the escape character for string literals, we need to (string) escape the (regex) escape character. Hence, we need two backslashes ... here.
If you don't escape the escape, the Java compiler says in effect "I don't recognize "\$" as a valid String literal escape sequence. ERROR!".
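As a quick check, here is a minimal sketch comparing the unescaped and the escaped pattern. The class name DollarEscapeDemo and its helper method are made up for illustration; only Pattern.matches comes from the JDK.

```java
import java.util.regex.Pattern;

final class DollarEscapeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped: '$' is an end-of-input anchor, so "45" can never follow it.
        System.out.println(matches("123$45", "123$45"));   // false
        // Escaped: the literal "123\\$45" holds the seven-character regex 123\$45.
        System.out.println(matches("123\\$45", "123$45")); // true
    }
}
```

Only the pattern needs the extra backslashes; the target string stays "123$45".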
On the other hand, if the pattern is input or generated, then you can use Pattern.quote() to quote it; i.e.
String literal = "123$45"; // ... or any literal string you want to match.
boolean itMatches = Pattern.matches(Pattern.quote(literal), "123$45");
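A small sketch of the Pattern.quote() route, assuming the text to match arrives at run time. QuoteDemo and matchesLiterally are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Hypothetical helper: treats every character of 'literal' as plain text.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesLiterally("123$45", "123$45")); // true
        // Without quoting, '.' would match any character; quoted, it only matches a dot.
        System.out.println(matchesLiterally("a.b", "axb"));       // false
        System.out.println(matchesLiterally("a.b", "a.b"));       // true
    }
}
```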
Why do I need four backslashes (\) to add one backslash into a String?
String replacedValue = neName.replaceAll(",", "\\\\,");
In the code above I want to replace every comma (,) with \, but to do that I have to add three more backslashes (\). Why?
Can anybody explain this concept?
Escape once for Java, and a second time for regexp.
\ -> \\ -> \\\\
Or since you're not actually using regular expressions, take khelwood's advice and use replace(String,String) so you need to only escape once.
The documentation of String.replaceAll(regex, replacement) states:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.
The documentation of Matcher.replaceAll(replacement) then states:
backslashes are used to escape literal characters in the replacement string
So to put this more clearly, when you replace with \,, it is as if you were escaping the comma. But what you really want is the \ character itself, so you should escape it, giving \\,. Since \ also needs to be escaped in a Java string literal, the replacement String becomes "\\\\,".
If you are having a hard time remembering all this, you can use the method Matcher.quoteReplacement(s), whose goal is to correctly escape the replacement part. Your code would become:
String replacedValue = neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
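To compare the variants side by side, here is a minimal sketch. CommaEscapeDemo and its method names are invented for this example; the String and Matcher calls are the standard JDK ones.

```java
import java.util.regex.Matcher;

final class CommaEscapeDemo {
    // Four backslashes in source -> two in the String -> one in the result.
    static String viaReplaceAll(String s) {
        return s.replaceAll(",", "\\\\,");
    }

    // quoteReplacement adds the replacement-level escaping for us.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll(",", Matcher.quoteReplacement("\\,"));
    }

    // replace() takes plain text, so only the string-literal escape is needed.
    static String viaReplace(String s) {
        return s.replace(",", "\\,");
    }

    public static void main(String[] args) {
        System.out.println(viaReplaceAll("a,b"));           // a\,b
        System.out.println(viaQuoteReplacement("a,b"));     // a\,b
        System.out.println(viaReplace("a,b"));              // a\,b
        // With only two backslashes the comma is merely "escaped" and comes back unchanged.
        System.out.println("a,b".replaceAll(",", "\\,"));   // a,b
    }
}
```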
\ is used to start an escape sequence.
For example:
to go to the next line, use \n or \r;
for a tab, use \t.
Likewise, to get a \ itself, which is special in a string literal, you have to escape it with another \, which gives us \\.
Now, replaceAll should be used with a regex; since you're not using a regex, use replace as suggested in the comments.
String s = neName.replace(",", "\\,");
You first have to escape the backslash because it appears in a string literal (giving \\), and then escape it again because replaceAll treats a backslash in the replacement string as an escape character (giving \\\\).
Therefore this -
String replacedValue = neName.replaceAll(",", "\\\\,"); // you need \\\\
You can use replace instead of replaceAll-
String replacedValue = neName.replace(",", "\\,");
I have a large String with many occurrences like this:
List<String>
I need to convert that String so that it matches
List\<String\>
I was assuming that I would use the Java replaceAll("", "") method but I can't get it to work as I am not all that familiar with Regular expressions.
Any help would be appreciated
You need four backslash characters, e.g.:
String input = "List<String>";
input = input.replaceAll("<", "\\\\<").replaceAll(">", "\\\\>");
"\\\\<" is the string literal for specifying \\<.
But why are two \ characters necessary in the replacement string? Because the replacement string has its own escape syntax (needed to escape $, which is used to refer to the content of a capturing group). \< (or, as a string literal, "\\<") is interpreted as a plain < by replaceAll. So we need to escape the \ character at the replacement-string level as well.
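Here is a hedged sketch of three ways to get List\<String\>. The class and method names are hypothetical, and the single-pass version using $0 is my own variation rather than something from the answer above.

```java
final class AngleBracketDemo {
    // The approach from the answer: two regex replacements, four backslashes each.
    static String withReplaceAll(String input) {
        return input.replaceAll("<", "\\\\<").replaceAll(">", "\\\\>");
    }

    // Variation (assumption): one pass, where $0 refers to the matched bracket.
    static String withGroupReference(String input) {
        return input.replaceAll("[<>]", "\\\\$0");
    }

    // No regex at all: replace() takes the replacement text literally.
    static String withReplace(String input) {
        return input.replace("<", "\\<").replace(">", "\\>");
    }

    public static void main(String[] args) {
        System.out.println(withReplaceAll("List<String>"));     // List\<String\>
        System.out.println(withGroupReference("List<String>")); // List\<String\>
        System.out.println(withReplace("List<String>"));        // List\<String\>
    }
}
```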
I'm learning Regex, and running into trouble in the implementation.
I found the RegexTestHarness on the Java Tutorials, and running it, the following string correctly identifies my pattern:
[\d|\s][\d]\.
(My pattern is any double digit, or any single digit preceded by a space, followed by a period.)
That string is obtained by this line in the code:
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
When I try to write a simple class in Eclipse, it tells me the escape sequences are invalid, and won't compile unless I change the string to:
[\\d|\\s][\\d]\\.
In my class I'm using Pattern pattern = Pattern.compile();
When I put this string back into the TestHarness it doesn't find the correct matches.
Can someone tell me which one is correct? Is the difference in some formatting from console.readLine()?
\ is a special character in string literals "...". It is used to escape other special characters, or to create characters like \n, \r and \t.
To create a \ character in a string literal that can then be used by the regex engine, you need to escape it by adding another \ before it (just like you do in a regex when you need to escape metacharacters such as the dot: \.). So a string literal representing \ will look like "\\".
This problem doesn't exist when you are reading data from the user, because you are reading raw characters rather than a string literal, so even if the user writes \n in the console it will be interpreted as two characters, \ and n.
Also, there is no point in adding | inside a character class [...] unless your intention is to make that class also match the | character. Remember that [abc] is the same as (a|b|c), so there is no need for | in "[\\d|\\s]".
If you want to represent a backslash in a Java string literal you need to escape it with another backslash, so the string literal "\\s" is two characters, \ and s. This means that to represent the regular expression [\d\s][\d]\. in a Java string literal you would use "[\\d\\s][\\d]\\.".
Note that I also made a slight modification to your regular expression: [\d|\s] will match a digit, whitespace, or the literal | character. You just want [\d\s]. A character class already means "match one of these", so the | is not needed for alternation there and it loses its special meaning inside a character class.
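The difference between the two character classes can be checked with a short sketch like this one (CharClassDemo is an illustrative name; the inputs are made up):

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    static final String WITH_PIPE = "[\\d|\\s][\\d]\\.";  // regex: [\d|\s][\d]\.
    static final String WITHOUT_PIPE = "[\\d\\s][\\d]\\."; // regex: [\d\s][\d]\.

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [...] the '|' is an ordinary character, so "|5." slips through.
        System.out.println(matches(WITH_PIPE, "|5."));    // true
        System.out.println(matches(WITHOUT_PIPE, "|5.")); // false
        // Both versions accept the intended inputs.
        System.out.println(matches(WITHOUT_PIPE, " 5.")); // true
        System.out.println(matches(WITHOUT_PIPE, "12.")); // true
    }
}
```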
"My pattern is any double digit, or any single digit preceded by a space, followed by a period."
The correct regex will be:
Pattern pattern = Pattern.compile("(\\s\\d|\\d{2})\\.");
Also, if you take a string from user input and want it to be matched literally, you should call:
Pattern.quote(useInputRegex);
to escape all the regex special characters.
Also, you are double escaping because the first escape is consumed by the Java string literal and the second one is passed on to the regex engine.
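To see what the alternation pattern from this answer actually finds, here is a minimal sketch. DigitPeriodDemo, findAll and the sample input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitPeriodDemo {
    // Regex: (\s\d|\d{2})\.  written as a Java string literal.
    private static final Pattern PATTERN = Pattern.compile("(\\s\\d|\\d{2})\\.");

    // Hypothetical helper: collects every match found in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "c3." is skipped: the 3 is neither preceded by a space nor part of two digits.
        System.out.println(findAll("a 5. b 12. c3.")); // [ 5., 12.]
    }
}
```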
What is happening is that escape sequences are being evaluated twice: once for Java, and then once for your regex.
The result is that you need to escape the escape character when you use a regex escape sequence.
For instance, if you needed a digit, you'd use
"\\d"
This simple regex program
import java.util.regex.*;
class Regex {
    public static void main(String [] args) {
        System.out.println(args[0]); // #1
        Pattern p = Pattern.compile(args[0]); // #2
        Matcher m = p.matcher(args[1]);
        boolean b = false;
        while(b = m.find()) {
            System.out.println(m.start()+" "+m.group());
        }
    }
}
invoked by java Regex "\d" "sfdd1" compiles and runs fine.
But if #2 is replaced by Pattern p = Pattern.compile("\d");, it gives a compiler error saying illegal escape character. In #1 I also tried printing the pattern specified in the command line arguments. It prints \d, which means that the same \d is what gets passed in #2.
So why doesn't that throw any exception? In the end it is a String argument that Pattern.compile() is taking; doesn't it detect the illegal escape character then? Can someone please explain this behaviour?
A backslash character in a string literal needs to be escaped (preceded by a backslash). When passed in from the command line the string is not a string literal. The compiler complains because "\d" is not a valid escape sequence (see Escape Sequences for Character and String Literals ).
The \ character is used as an escape character for both Java string literals and regular expressions. This confuses many programmers. When you want to create a String in Java to represent a regular expression that has an escape character then you need to escape the Java escape character.
When the string is passed in on the command line, no string-literal processing takes place: the JVM simply creates the String from the characters it receives.
What you want is this
Pattern p = Pattern.compile("\\d");
The backslash \ in Java starts an escape sequence in string literals. For example, the string literal "\t" results in a tab character in Java. This is also why "\n" produces a newline.
In regular expressions, \d is an escape with respect to the regular expression, not Java. This means that in order to get \d in a string literal, you have to type "\\d". Basically, you have to escape the \ to get the literal value \d, and then when Pattern compiles the regex, it in turn interprets \d as "a digit".
This can be confusing, but long story short, you should never have a single \ in a string literal for a regular expression since even the string literal "\\n" gets parsed properly.
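A short sketch of what the literal really contains, and of the "\\n" remark. LiteralVsRegexDemo and finds are illustrative names only.

```java
import java.util.regex.Pattern;

final class LiteralVsRegexDemo {
    // Hypothetical helper: true when the regex matches somewhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "\\d";                          // source has two backslashes
        System.out.println(regex);                     // \d
        System.out.println(regex.length());            // 2
        System.out.println(finds(regex, "sfdd1"));     // true
        // "\\n" is the two-character regex \n, which the regex engine reads as a newline.
        System.out.println(Pattern.matches("\\n", "\n")); // true
    }
}
```

The String built from "\\d" holds the same two characters that arrive via args[0] when \d is typed on the command line.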
I'm not entirely sure I understand the question, but it seems like your problem is that you're treating "\d" as a Java escape sequence, which doesn't exist. To treat it as a regex escape sequence, use "\\d" to escape the Java escape.
I want to match \Q and \E in a Java regex.
I am writing a program that computes the length of the string matched by a pattern (the program assumes that the regex contains no quantifier other than {some number}, which is why the length of the string is uniquely defined), and I first want to delete all expressions like \Qsome text\E.
But regex like this:
"\\Q\\Q\\E\\Q\\E\\E"
obviously doesn't work.
Use Pattern.quote(...):
String s = "\\Q\\Q\\E\\Q\\E\\E";
String escaped = Pattern.quote(s);
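As a hedged illustration, the quoted pattern matches the original text character for character, even though that text itself contains \Q and \E. QuoteBlockDemo and its method are names invented for this sketch.

```java
import java.util.regex.Pattern;

final class QuoteBlockDemo {
    // The text from the question: \Q\Q\E\Q\E\E (twelve characters).
    static final String TEXT = "\\Q\\Q\\E\\Q\\E\\E";

    // Hypothetical helper: true when 'haystack' contains 'needle' verbatim.
    static boolean containsLiterally(String haystack, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());                               // 12
        System.out.println(Pattern.matches(Pattern.quote(TEXT), TEXT));  // true
        System.out.println(containsLiterally("x" + TEXT + "y", TEXT));   // true
    }
}
```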
Just escape the backslashes. The sequence \\\\ matches a literal backslash, so to match a literal \Q:
"\\\\Q"
and to match a literal \E:
"\\\\E"
You can make it more readable for a maintainer by making it obvious that each sequence matches a single character using [...] as in:
"[\\\\][Q]"
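Building on those escapes, one possible way to delete \Q...\E spans is sketched below. It is an assumption on my part, not something from the answers above: StripQuotedDemo is a made-up name, the lazy .*? ends each span at the first \E, and . does not cross line breaks unless DOTALL is enabled.

```java
import java.util.regex.Pattern;

final class StripQuotedDemo {
    // Regex: \\Q.*?\\E  (a literal \Q, as little as possible, then a literal \E).
    private static final Pattern QUOTED = Pattern.compile("\\\\Q.*?\\\\E");

    // Hypothetical helper: removes every \Q...\E span from a regex source string.
    static String stripQuoted(String regexSource) {
        return QUOTED.matcher(regexSource).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripQuoted("ab\\Qsome text\\Ecd")); // abcd
        // The bracketed form matches the same two characters, \ and Q.
        System.out.println(Pattern.matches("[\\\\][Q]", "\\Q")); // true
    }
}
```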