Why must Regex escapes be escaped? [duplicate] - java

I want to replace all whitespace characters in a string with a "+" and all "ß" with "ss"... it works well for "ß", but somehow eclipse won't let me use \s for a whitespace.. I tried "\t" instead, but it doesn't work either.. I get the following error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
this is my code:
try {
String temp1 = from.getText().toString();
start_from = temp1.replaceAll("ß", "ss");
start_from = start_from.replaceAll("\s", "+");
}
why doesn't it work? is it a problem with android, eclipse or what?
thanks in advance!

You need to escape the backslash:
start_from = start_from.replaceAll("\\s", "+");

The problem is that \ is an escape character in Java string literals as well as in regex patterns. Say you want the regex pattern \n, and you go ahead and write
replaceAll("\n", "+");
The regex pattern would not end up being \n: it would end up being an actual newline character, since that's what "\n" means in Java. If you want the pattern to contain a backslash, you need to escape that backslash so that it is not treated as a special character within the string literal.
replaceAll("\\s", "+");
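To make the two escaping layers concrete, here is a minimal sketch. The class name EscapeLayers and its helper methods are made up for illustration; only String.replaceAll and String.length are standard API.

```java
public class EscapeLayers {
    // The Java literal "\\s" is the two-character string \s,
    // which the regex engine reads as "any whitespace character".
    static String plusForWhitespace(String input) {
        return input.replaceAll("\\s", "+");
    }

    // The compiler turns "\n" into a single newline character before the
    // regex engine sees it, so this pattern matches newlines only.
    static String plusForNewline(String input) {
        return input.replaceAll("\n", "+");
    }

    public static void main(String[] args) {
        System.out.println("\\s".length());               // 2: a backslash and an 's'
        System.out.println("\n".length());                // 1: one newline character
        System.out.println(plusForWhitespace("a b\tc"));  // a+b+c
        System.out.println(plusForNewline("a b\nc"));     // a b+c
    }
}
```

The length checks show what the regex engine actually receives: two characters for "\\s", one for "\n".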

You can also use the java.util.regex.Pattern class with something like p = Pattern.compile("\\s"); in combination with p.matcher(start_from).replaceAll("+"). Either way, the \s metacharacter has to be written as "\\s" in the Java source.

Related

How Java replaceAll operation works with backslashes?

Why do I need four backslashes (\) to add one backslash into a String?
String replacedValue = neName.replaceAll(",", "\\\\,");
In the code above I want to replace every comma (,) with \, but I have to write three extra backslashes (\) to get one in the result.
Can anybody explain this concept?
Escape once for Java, and a second time for regexp.
\ -> \\ -> \\\\
Or since you're not actually using regular expressions, take khelwood's advice and use replace(String,String) so you need to only escape once.
The documentation of String.replaceAll(regex, replacement) states:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.
The documentation of Matcher.replaceAll(replacement) then states:
backslashes are used to escape literal characters in the replacement string
To put this more clearly: when you replace with \, it is as if you were escaping the comma. What you really want is the \ character itself, so you have to escape it, giving \\, as the replacement. Since \ also needs to be escaped in a Java string literal, the replacement String is written "\\\\,".
If you are having a hard time remembering all this, you can use the method Matcher.quoteReplacement(s), whose goal is to correctly escape the replacement part. Your code would become:
String replacedValue = neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
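The three variants discussed here can be compared side by side. This is only a sketch: the class name CommaEscaper and its method names are invented for the example, while replaceAll, replace and Matcher.quoteReplacement are the standard methods named above.

```java
import java.util.regex.Matcher;

public class CommaEscaper {
    // Four backslashes in source -> two in the string -> one literal backslash in the output.
    static String withReplaceAll(String s) {
        return s.replaceAll(",", "\\\\,");
    }

    // quoteReplacement adds the replacement-level escaping,
    // so only the Java-level escape is written by hand.
    static String withQuoteReplacement(String s) {
        return s.replaceAll(",", Matcher.quoteReplacement("\\,"));
    }

    // replace() treats both arguments as literal text, no regex involved.
    static String withReplace(String s) {
        return s.replace(",", "\\,");
    }

    public static void main(String[] args) {
        String input = "a,b";
        System.out.println(withReplaceAll(input));        // a\,b
        System.out.println(withQuoteReplacement(input));  // a\,b
        System.out.println(withReplace(input));           // a\,b
    }
}
```

All three produce the same text; they differ only in how many escaping layers have to be written by hand.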
\ introduces an escape sequence in a string literal.
For example:
for a line break use \n or \r
for a tab use \t
Likewise, to get a literal \, which is special in a string literal, you have to escape it with another \, which gives \\
replaceAll is meant to be used with a regex. Since you're not using a regex here, use replace instead, as suggested in the comments.
String s = neName.replace(",", "\\,");
You have to first escape the backslash because it's a literal (giving \\), and then escape it again because of the regular expression (giving \\\\).
Therefore this -
String replacedValue = neName.replaceAll(",", "\\\\,"); // you need \\\\
You can use replace instead of replaceAll-
String replacedValue = neName.replace(",", "\\,");

How to split a string with double quotes " as the delimiter?

I tried splitting like this-
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that
| is metacharacter in regex engine (it represents OR operator)
" is metacharacter in Java language in string literal (it represents start/end of the string)
To escape any String metacharacter (like ") you need to place before it the other String metacharacter, the one responsible for escaping, which is \ (see note 1 below). So to create a String which contains ", like this is "quote", you would need to write it as
String s = "this is \"quote\"";
//                  ^^     ^^ these represent " literal, not end of string
Same idea is applied if we would like to create \ literal (we would need to escape it by placing another \ before it). For instance if we would want to create string representing c:\foo\bar we would need to write it as
String s = "c:\\foo\\bar";
//            ^^   ^^ these will represent \ literal
So as you see \ is used to escape metacharacters (make them simple literals).
This character is used in Java language for Strings, but it also is used in regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create regex which will match [ character you will need to use regex \[ but String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
//                         ^^ - Remember what was said earlier?
//                              To create \ literal in String we need to escape it
So to split on [ we would need to invoke split("\\[") because regex representing [ is \[ which needs to be written as "\\[" in Java.
Since " is not special character in regex but it is special in String we need to escape it only in string literal by writing it as
split("\"");
1) \ is also used to create other characters such as the line separator \n or the tab \t. It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
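The distinction between the two kinds of metacharacter can be sketched as follows. SplitDelimiters and its method names are hypothetical; String.split is the only library call involved.

```java
import java.util.Arrays;

public class SplitDelimiters {
    // " is special only in a Java string literal: one backslash is enough.
    static String[] splitOnQuote(String s) {
        return s.split("\"");
    }

    // [ is special in regex: the regex \[ is written "\\[" in Java source.
    static String[] splitOnLeftBracket(String s) {
        return s.split("\\[");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnQuote("a\"b\"c")));      // [a, b, c]
        System.out.println(Arrays.toString(splitOnLeftBracket("x[y[z")));  // [x, y, z]
    }
}
```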
You have escaped the \ by putting in \ twice, try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash will be escaped, thus the doublequote won't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind, that String.split() interprets its pattern parameter as a regular expression, which has several special characters, which have to be escaped in the resulting string.
So if you want to split on a . (which is a special regex character), you need to write it as String.split("\\."). The first backslash escapes the second one in the string literal, so the regex engine receives \., which matches a literal dot.
In the case of regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope of the original question.
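As a rough illustration of the Pattern.quote() route, the sketch below splits on a dot both ways. The class QuoteDelimiter and its methods are invented names; Pattern.quote and String.split are standard.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDelimiter {
    // Hand-escaped: the regex engine receives \. and matches a literal dot.
    static String[] splitOnDotEscaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote turns any delimiter into a literal pattern,
    // so the caller does not need to know which characters are special.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDotEscaped("a.b.c")));  // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));  // [a, b, c]
        // Unescaped, "." matches every character, every piece is empty,
        // and split drops trailing empty strings.
        System.out.println("a.b.c".split(".").length);                    // 0
    }
}
```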
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly. The snippet code will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
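A char literal lets you write the double quote without a backslash ('"'), but String.split only accepts a String, so split('"') does not compile. With a string literal the single backslash is still needed:
tableData.split("\"");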

java regex expression escape characters

Hi i'm trying to split a string separated by vertical bars. for example:
String str = "a=1|b=2";
In java, we should do like this:
str.split("\\|");
If I use a single slash:
str.split("\|");
the compiler gives an error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
Can anyone explain why this happens? Thanks!
Backslash \ is a special character. In the Java world it is used to escape a character.
The pipe | is a special character in the Regex world, which means "OR".
To use the pipe as a separator you need to escape it (so that it is treated as a literal during regex parsing), so your regex has to be: \|.
But since the backslash is a special character in Java and you are writing the regex as a string literal, you have to escape the backslash so that the string ends up containing the expected result: \|
To do so, you simply escape backslash with another backslash: \\|
The first backslash escapes the second backslash (java requirement) which escapes the pipe (regex requirement).
In Java strings, a backslash needs to be escaped with another backslash. So, while the "canonical" form of the regex is indeed \|, as a Java string, this must be written "\\|".
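A short sketch of the escaped pipe in use, applied to the a=1|b=2 string from the question. PipeSplit, splitOnPipe and parsePairs are names made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PipeSplit {
    // "\\|" in source is the regex \|, i.e. a literal pipe rather than "OR".
    static String[] splitOnPipe(String s) {
        return s.split("\\|");
    }

    // Turns "a=1|b=2" into an ordered map {a=1, b=2}.
    static Map<String, String> parsePairs(String s) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : splitOnPipe(s)) {
            String[] kv = part.split("=", 2);  // '=' is not a regex metacharacter
            pairs.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parsePairs("a=1|b=2"));  // {a=1, b=2}
    }
}
```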

"Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )" syntax error

I wrote code for matching file paths that have the extension .ncx:
pattern = Pattern.compile("$(\\|\/)[a-zA-Z0-9_]/.ncx");
Matcher matcher = pattern.matcher("\sample.ncx");
This shows an "invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )" syntax error on the pattern. How can I fix it?
Pattern p = Pattern.compile("[/\\\\]([a-zA-Z0-9_]+\\.ncx)$");
Matcher m = p.matcher("\\sample.ncx");
if (m.find())
{
System.out.printf("The filename is '%s'%n", m.group(1));
}
output:
The filename is 'sample.ncx'
$ anchors the match to the end of the string (or to the end of a line in multiline mode). It belongs at the end of your regex, not the beginning.
[/\\\\] is a character class that matches a forward-slash or a backslash. The backslash has to be double-escaped because it has special meaning both in a regex and in a string literal. The forward-slash does not require escaping.
[a-zA-Z0-9_]+ matches one or more of the listed characters; without the plus sign, you were only matching one.
The second forward-slash in your regex makes no sense, but you do need a backslash there to escape the dot--and of course, the backslash has to be escaped for the Java string literal.
Because I switched from the alternation (|) to a character class for the leading slash, the parentheses in your regex were no longer needed. Instead, I used them to capture the actual filename, just to demonstrate how that's done.
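The same pattern can be wrapped in a small helper to check that both separators are accepted. This is only a sketch; the class NcxFileName and the method fileName are hypothetical names, and returning null for a non-match is a choice made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcxFileName {
    // Regex as the engine sees it: [/\\]([a-zA-Z0-9_]+\.ncx)$
    private static final Pattern NCX = Pattern.compile("[/\\\\]([a-zA-Z0-9_]+\\.ncx)$");

    // Returns the captured file name, or null when the path does not end in a .ncx file.
    static String fileName(String path) {
        Matcher m = NCX.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fileName("\\sample.ncx"));     // sample.ncx
        System.out.println(fileName("dir/sample.ncx"));   // sample.ncx
        System.out.println(fileName("dir/sample.txt"));   // null
    }
}
```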
In Java, \ is a reserved character used for escaping, so you need to escape the \ itself.
pattern=Pattern.compile("$(\\\\|\\/)[a-zA-Z0-9_]/.ncx");
try this
$(\\|\\/)[a-zA-Z0-9_]/.ncx
