Escape ( in regular expression - java

I'm searching for the regular expression ".*(conflicted copy.*". I wrote the following code for this:
String str = "12B - (conflicted copy 2013-11-16-11-07-12)";
boolean matches = str.matches(".*(conflicted.*");
System.out.println(matches);
But I get the exception
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 15
.*(conflicted.*
I understand that the compiler thinks that ( is the beginning of a pattern group. I tried to escape ( by adding \( but that doesn't work.
Can someone tell me how to escape ( here?

Escaping is done with \. In a Java string literal, \ itself is written as \\ (see footnote 1), so the escaped ( becomes \\(.
Side note: it's worth having a look at Pattern#quote, which returns a literal pattern String for a given String. In your case it's not that helpful, since you want .* to keep its special meaning and only ( to be literal.
1) A character preceded by a backslash (\) forms an escape sequence, which has a special meaning to the compiler.
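A minimal, self-contained sketch of the problem and the fix, assuming the input string from the question (the class and method names are illustrative only): the unescaped pattern fails to compile with PatternSyntaxException, while the escaped \\( version matches.

```java
import java.util.regex.PatternSyntaxException;

class ParenEscapeDemo {
    static final String INPUT = "12B - (conflicted copy 2013-11-16-11-07-12)";

    // Returns true if the unescaped pattern fails to compile.
    static boolean unescapedThrows() {
        try {
            INPUT.matches(".*(conflicted.*"); // "(" opens a group that is never closed
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Java source "\\(" is the two characters \( which the regex engine reads as a literal "(".
    static boolean escapedMatches() {
        return INPUT.matches(".*\\(conflicted.*");
    }

    public static void main(String[] args) {
        System.out.println(unescapedThrows()); // true
        System.out.println(escapedMatches());  // true
    }
}
```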

( in a regex is a metacharacter meaning "start of group", and it needs to be closed with ). If you want the regex engine to treat it as a simple literal, you need to escape it. You can do that by adding \ before it, but since \ is also a metacharacter in Java string literals (used, for example, to create characters like "\n" and "\t"), you need to escape it as well, which looks like "\\". So try
str.matches(".*\\(conflicted.*");
Another option is to use a character class to escape (, like
str.matches(".*[(]conflicted.*");
You can also use Pattern.quote() on the part that needs to be escaped, like
str.matches(".*"+Pattern.quote("(")+"conflicted.*");
Or simply surround the part in which all characters should be treated as literals with "\\Q" and "\\E", which mark the start and end of a quotation.
str.matches(".*\\Q(\\Econflicted.*");
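To check that the four variants in this answer are interchangeable, here is a small sketch that runs each of them against the string from the question. The class name is hypothetical, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

class LiteralParenVariants {
    // The four ways from the answer to make "(" literal; each is a full-match regex for String.matches.
    static final List<String> PATTERNS = List.of(
        ".*\\(conflicted.*",                        // backslash escape
        ".*[(]conflicted.*",                        // "(" inside a character class is literal
        ".*" + Pattern.quote("(") + "conflicted.*", // Pattern.quote wraps "(" in \Q...\E
        ".*\\Q(\\Econflicted.*"                     // explicit \Q...\E quotation
    );

    static boolean allMatch(String input) {
        return PATTERNS.stream().allMatch(input::matches);
    }

    public static void main(String[] args) {
        System.out.println(allMatch("12B - (conflicted copy 2013-11-16-11-07-12)")); // true
        System.out.println(allMatch("12B - conflicted copy"));                        // false: no "("
    }
}
```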

In regular expressions, any non-alphabetic character can be safely escaped by adding a backslash in front of it. (Java rejects a backslash before a letter that doesn't form a defined escape, such as \y.)
Keep in mind that in many languages, including C#, PHP and Java, the backslash is also an escape character of the language itself, so it must be escaped again in regular string literals, meaning you have to enter "myText \\(".
To match a literal backslash, you may need to escape it at both the language level and the regex level ("\\\\"): this passes \\ to the regex engine, which parses it as a single \.
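A sketch of the two escaping levels when the target is a backslash itself, plus one Java-specific caveat: Java's regex flavor rejects a backslash in front of a letter that is not a defined escape (the arbitrary \y below) instead of treating it as a literal. The names are illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BackslashLevels {
    // Java source "\\\\" -> string \\ -> regex engine sees an escaped backslash -> matches one "\".
    static boolean matchesSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    // Escaping a non-alphabetic character is always allowed: \: just means ":".
    static boolean escapedColonMatches() {
        return ":".matches("\\:");
    }

    // Escaping a letter with no defined meaning (\y) is an error in Java's regex flavor.
    static boolean unknownLetterEscapeThrows() {
        try {
            Pattern.compile("\\y");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesSingleBackslash("\\")); // true: the Java literal "\\" is one backslash
        System.out.println(escapedColonMatches());         // true
        System.out.println(unknownLetterEscapeThrows());   // true
    }
}
```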

Related

How Java replaceAll operation works with backslashes?

Why do I need four backslashes (\) to add one backslash into a String?
String replacedValue = neName.replaceAll(",", "\\\\,");
In the code above I want to replace every comma (,) with \, but I have to add three more backslashes (\). Why?
Can anybody explain this concept?
Escape once for Java, and a second time for regexp.
\ -> \\ -> \\\\
Or, since you're not actually using regular expressions, take khelwood's advice and use replace(String, String), so you only need to escape once.
The documentation of String.replaceAll(regex, replacement) states:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.
The documentation of Matcher.replaceAll(replacement) then states:
backslashes are used to escape literal characters in the replacement string
To put this more clearly: when you replace with \,, it is as if you were escaping the comma. But what you really want is the \ character, so you should escape it as \\,. Since \ also needs to be escaped in Java, the replacement String becomes \\\\,.
If you are having a hard time remembering all this, you can use the method Matcher.quoteReplacement(s), whose goal is to correctly escape the replacement part. Your code would become:
String replacedValue = neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
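The three approaches discussed in this thread can be compared side by side. A minimal sketch, assuming a sample input of "a,b,c" (the class and method names are made up for illustration); all three produce a\,b\,c.

```java
import java.util.regex.Matcher;

class CommaEscaper {
    // Regex replacement: Java "\\\\," -> replacement text \\, -> Matcher turns \\ into one "\".
    static String viaReplaceAll(String s) {
        return s.replaceAll(",", "\\\\,");
    }

    // quoteReplacement adds the replacement-level escaping, so only Java-level escaping remains.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll(",", Matcher.quoteReplacement("\\,"));
    }

    // Literal replacement: no regex involved, so "\\," (one backslash + comma) is enough.
    static String viaReplace(String s) {
        return s.replace(",", "\\,");
    }

    public static void main(String[] args) {
        String in = "a,b,c";
        System.out.println(viaReplaceAll(in));       // a\,b\,c
        System.out.println(viaQuoteReplacement(in)); // a\,b\,c
        System.out.println(viaReplace(in));          // a\,b\,c
    }
}
```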
\ is used for escape sequences. For example:
to go to the next line, use \n or \r
for a tab, use \t
Likewise, to print \, which is special in a string literal, you have to escape it with another \, which gives \\.
Now, replaceAll is meant to be used with a regex; since you're not using a regex, use replace as suggested in the comments.
String s = neName.replace(",", "\\,");
You have to first escape the backslash because it's a literal (giving \\), and then escape it again because of the regular expression (giving \\\\).
Therefore:
String replacedValue = neName.replaceAll(",", "\\\\,"); // you need \\\\ (four backslashes)
You can also use replace instead of replaceAll:
String replacedValue = neName.replace(",", "\\,");

How to split a string with double quotes " as the delimiter?

I tried splitting like this:
tableData.split("\\"")
but it does not work.
It seems you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that
| is a metacharacter in the regex engine (it represents the OR operator),
" is a metacharacter of the Java language in string literals (it marks the start/end of the string).
To escape any String metacharacter (like ") you need to place before it the String metacharacter responsible for escaping, which is \ (see footnote 1). So to create a String containing this is "quote", you would need to write it as
String s = "this is \"quote\"";
// each \" above represents a literal ", not the end of the string
The same idea applies if we want to create a \ literal (we need to escape it by placing another \ before it). For instance, to create a string representing c:\foo\bar we would need to write it as
String s = "c:\\foo\\bar";
// each \\ above represents a single \ literal
So, as you can see, \ is used to escape metacharacters (turning them into simple literals).
This character is used by the Java language for Strings, but it is also used by the regex engine to escape its own metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you want to create a regex that matches the [ character, you need the regex \[, but the String representing this regex in Java must be written as
String leftBracketRegex = "\\[";
// ^^ - Remember what was said earlier?
// To create \ literal in String we need to escape it
So to split on [ we would need to invoke split("\\["), because the regex representing [ is \[, which must be written as "\\[" in Java.
Since " is not a special character in regex but is special in String literals, we need to escape it only at the string-literal level, writing it as
split("\"");
1) \ is also used to create other characters, such as the line separator \n and the tab \t. It can also create Unicode characters as \uXXXX, where XXXX is the character's Unicode value in hexadecimal.
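A short sketch contrasting the two cases from this answer: " needs only Java-level escaping, while [ needs regex-level escaping that is then doubled for Java. The names and sample inputs are illustrative.

```java
import java.util.Arrays;

class QuoteAndBracketSplit {
    // " only needs Java-level escaping; the regex engine treats it as an ordinary character.
    static String[] splitOnQuote(String s) {
        return s.split("\"");
    }

    // [ needs regex-level escaping (\[), and that backslash needs Java-level escaping ("\\[").
    static String[] splitOnBracket(String s) {
        return s.split("\\[");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnQuote("a\"b\"c")));  // [a, b, c]
        System.out.println(Arrays.toString(splitOnBracket("x[y[z"))); // [x, y, z]
    }
}
```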
You have escaped the \ by putting in \ twice, try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash is escaped, and thus the double quote isn't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind that String.split() interprets its parameter as a regular expression, which has several special characters that have to be escaped.
So if you want to split on a . (which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the second one for Java, so the regex engine receives \. (an escaped dot).
For regex special characters you could also just use Pattern.quote() to escape your desired delimiter, but this is well outside the scope of the original question.
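The . case is easy to get wrong silently rather than with an exception. A sketch, using "a.b.c" as a made-up input: split(".") returns an empty array (every character matches, so all pieces are empty and trailing empty strings are discarded), while the escaped and quoted forms split as intended. The class name is illustrative.

```java
import java.util.regex.Pattern;

class DotSplit {
    // Unescaped: "." matches every character, so every piece is empty and trailing empties are dropped.
    static String[] unescaped(String s) { return s.split("."); }

    // Escaped at the regex level (\.) and at the Java level ("\\.").
    static String[] escaped(String s) { return s.split("\\."); }

    // Pattern.quote produces \Q.\E, which also matches a literal dot.
    static String[] quoted(String s) { return s.split(Pattern.quote(".")); }

    public static void main(String[] args) {
        System.out.println(unescaped("a.b.c").length); // 0
        System.out.println(escaped("a.b.c").length);   // 3
        System.out.println(quoted("a.b.c").length);    // 3
    }
}
```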
Try with a single backslash \:
tableData.split("\"")
Try it like this, escaping " with a single backslash \:
tableData.split("\"")
You are not escaping properly; the code snippet will not even compile because of it. The correct way to do it is:
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
You can avoid the backslash by writing the quote as a char literal in single quotes, but String.split only accepts a String, so convert it first:
tableData.split(String.valueOf('"'));

Underlined backslash IntelliJ

I am using a backslash as an escape character for a serialization format I am working on. I have it as a constant but IntelliJ is underlining it and highlighting it red. On hover it gives no error messages or any information as to why it does not like it.
What is the reason for this and how do I fix it?
IntelliJ is smarter than I am and realised that I was using this character in a regular expression, where two backslashes would be needed. However, IntelliJ also assumed that my puny mind could find the problem without giving me any information about it.
If it's being used as a regular expression, then the backslash must be escaped.
If you're escaping a \ as \\ like traditional regular expressions require, then you also need to add two more \\, for a total of \\\\.
This is because of the way Java interprets "\":
In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string becomes "\\\\". That's right: 4 backslashes to match a single one.
The regex \w matches a word character. As a Java string, this is written as "\\w".
The same backslash-mess occurs when providing replacement strings for methods like String.replaceAll() as literal Java strings in your Java code. In the replacement text, a dollar sign must be encoded as \$ and a backslash as \\ when you want to replace the regex match with an actual dollar sign or backslash. However, backslashes must also be escaped in literal Java strings. So a single dollar sign in the replacement text becomes "\\$" when written as a literal Java string. The single backslash becomes "\\\\". Right again: 4 backslashes to insert a single one.
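The rules in the quote can be exercised directly. A minimal sketch (method names and sample inputs are illustrative, not from the original answer), including splitting on a single backslash, which is one plausible way a backslash constant like the one in the question ends up inside a regex:

```java
import java.util.Arrays;

class QuotedRules {
    // Regex \w (one word character), written as the Java string "\\w".
    static boolean isWordChar(String s) {
        return s.matches("\\w");
    }

    // Regex \\ (one literal backslash), written as the Java string "\\\\".
    static String[] splitOnBackslash(String s) {
        return s.split("\\\\");
    }

    // Replacement text \$ inserts a literal dollar sign; as a Java literal that is "\\$".
    static String dollarSign(String s) {
        return s.replaceAll("USD ", "\\$");
    }

    // Replacement text \\ inserts one backslash; as a Java literal that is "\\\\".
    static String slashesToBackslashes(String path) {
        return path.replaceAll("/", "\\\\");
    }

    public static void main(String[] args) {
        System.out.println(isWordChar("a"));                            // true
        System.out.println(Arrays.toString(splitOnBackslash("a\\b")));  // [a, b]
        System.out.println(dollarSign("USD 5"));                        // $5
        System.out.println(slashesToBackslashes("c:/foo/bar"));         // c:\foo\bar
    }
}
```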

Why do I need two slashes in Java Regex to find a "+" symbol?

Just something I don't understand the full meaning behind. I understand that I need to escape any special meaning characters if I want to find them using regex. And I also read somewhere that you need to escape the backslash in Java if it's inside a String literal. My question though is if I "escape" the backslash, doesn't it lose its meaning? So then it wouldn't be able to escape the following plus symbol?
Throws an error (but shouldn't it work since that's how you escape those special characters?):
replaceAll("\+\s", "")
Works:
replaceAll("\\+\\s", "")
Hopefully that makes sense. I'm just trying to understand the functionality behind why I need those extra slashes when the regex tutorials I've read don't mention them. And things like "\+" should find the plus symbol.
There are two "escapings" going on here. The first backslash escapes the second backslash for the Java language, to create an actual backslash character. That backslash character is what escapes the + or the s for interpretation by the regular expression engine. That's why you need two backslashes: one for Java, one for the regular expression engine. With only one backslash, Java reports \+ as an illegal escape character, not for regular expressions but for an actual character in the Java language. (Before Java 15, \s is reported too; since Java 15, \s is a legal string escape meaning a single space, so a lone "\s" silently becomes a space instead of the regex \s.)
The idea behind the extra backslashes is that the first backslash is the escape for the string and the second backslash is the escape for the regex.
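A small sketch of the working version, with an alternative that lets Pattern.quote handle the + so that only \s needs manual escaping. The sample input "1 + 2" and the names are made up for illustration.

```java
import java.util.regex.Pattern;

class PlusSpace {
    // Regex \+\s (a literal plus followed by one whitespace char); each backslash doubled for Java.
    static String removePlusSpace(String s) {
        return s.replaceAll("\\+\\s", "");
    }

    // Same idea, but Pattern.quote("+") yields \Q+\E, so only \s is escaped by hand.
    static String removePlusSpaceQuoted(String s) {
        return s.replaceAll(Pattern.quote("+") + "\\s", "");
    }

    public static void main(String[] args) {
        System.out.println(removePlusSpace("1 + 2"));       // 1 2
        System.out.println(removePlusSpaceQuoted("1 + 2")); // 1 2
    }
}
```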

How to undo replace performed by regex?

In Java, I have the following regex, ([\\(\\)\\/\\=\\:\\|,\\,\\\\]), which is compiled and then used to escape each of the special characters ()/=:|,\ with a backslash as follows: escaper.matcher(value).replaceAll("\\\\$1")
So the string "A/C:D/C" would end up as "A\/C\:D\/C".
Later on in the process, I need to undo that replace. That means I need to match on the combination of \(, \), \/ etc. and replace it with the character immediately following the backslash character. A backslash followed by any other character should not be matched, and there could be cases where a special character exists without the preceding backslash, in which case it shouldn't match either.
Since I know all of the cases I could do something like
myString.replaceAll("\\(", "(").replaceAll("\\)", ")").replaceAll("\\/", "/")...
but I wonder if there is a simpler regex that would allow me to perform the replace for all the special characters in a single step.
That seems pretty straightforward. If this were your original code (excess escapes removed):
Pattern escaper = Pattern.compile("([()/=:|,\\\\])");
String escaped = escaper.matcher(original).replaceAll("\\\\$1");
...the opposite would be:
Pattern unescaper = Pattern.compile("\\\\([()/=:|,\\\\])");
String unescaped = unescaper.matcher(escaped).replaceAll("$1");
If you weren't escaping and unescaping backslashes themselves (as you're doing), you would have problems, but this should work fine.
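A runnable sketch of the round trip, using the two patterns from this answer; the class name and the sample inputs are illustrative. Because backslash is itself in the escaped set, a string that already contains backslashes survives escape followed by unescape unchanged.

```java
import java.util.regex.Pattern;

class EscapeRoundTrip {
    // Patterns from the answer: one character class of the special characters ( ) / = : | , \
    private static final Pattern ESCAPER = Pattern.compile("([()/=:|,\\\\])");
    private static final Pattern UNESCAPER = Pattern.compile("\\\\([()/=:|,\\\\])");

    static String escape(String s) {
        // "\\\\$1" is the replacement text \\$1: a literal backslash followed by group 1.
        return ESCAPER.matcher(s).replaceAll("\\\\$1");
    }

    static String unescape(String s) {
        // Drop the backslash and keep the escaped character captured in group 1.
        return UNESCAPER.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String escaped = escape("A/C:D/C");
        System.out.println(escaped);           // A\/C\:D\/C
        System.out.println(unescape(escaped)); // A/C:D/C
    }
}
```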
I don't know the Java regex flavor, but this works with PCRE:
replace \\ followed by ([()/=:|,\\]) with $1
In Perl you can do:
$str =~ s#\\([()/=:|,\\])#$1#g;
