Underlined backslash IntelliJ - java

I am using a backslash as an escape character for a serialization format I am working on. I have it as a constant but IntelliJ is underlining it and highlighting it red. On hover it gives no error messages or any information as to why it does not like it.
What is the reason for this and how do I fix it?

IntelliJ is smarter than I am and realised that I was using this character in a regular expression where 2 backslashes would be needed. However, IntelliJ also assumed that my puny mind could find the problem without giving me any information about it.

If it's being used in a regular expression, then the backslash must be escaped twice.
Traditional regular expressions require a literal backslash to be written as \\, and each of those two backslashes must itself be escaped in the Java string literal, for a total of four: "\\\\".
This is because of the way Java interprets "\":
In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string becomes "\\\\". That's right: 4 backslashes to match a single one.
The regex \w matches a word character. As a Java string, this is written as "\\w".
The same backslash-mess occurs when providing replacement strings for methods like String.replaceAll() as literal Java strings in your Java code. In the replacement text, a dollar sign must be encoded as \$ and a backslash as \\ when you want to replace the regex match with an actual dollar sign or backslash. However, backslashes must also be escaped in literal Java strings. So a single dollar sign in the replacement text becomes "\\$" when written as a literal Java string. The single backslash becomes "\\\\". Right again: 4 backslashes to insert a single one.
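A small sketch to check the counting above against java.util.regex. The class name BackslashDemo, its helper methods and the sample strings are made up for illustration; only Pattern and String.replaceAll() are real APIs.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // The Java literal "\\\\" is the two-character text \\ , which the
    // regex engine reads as "match one literal backslash".
    static final String ONE_BACKSLASH_REGEX = "\\\\";

    static boolean containsBackslash(String s) {
        return Pattern.compile(ONE_BACKSLASH_REGEX).matcher(s).find();
    }

    // Replacement literal "\\\\" is the text \\ , which replaceAll()
    // turns into a single literal backslash.
    static String slashToBackslash(String s) {
        return s.replaceAll("/", "\\\\");
    }

    // Replacement literal "\\$" is the text \$ , i.e. a literal dollar sign
    // rather than the start of a group reference such as $1.
    static String hashToDollar(String s) {
        return s.replaceAll("#", "\\$");
    }

    public static void main(String[] args) {
        String path = "c:\\temp";                          // the text c:\temp
        System.out.println(path.length());                 // 7
        System.out.println(containsBackslash(path));       // true
        System.out.println(slashToBackslash("a/b"));       // a\b
        System.out.println(hashToDollar("#5"));            // $5
    }
}
```

Both directions need the same four backslashes: one doubling for the Java compiler, one for the regex (or replacement) parser.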

Related

Java Regular Expression - how to use backslash [duplicate]

I am really confused about how to escape. Sometimes I just need to prepend a backslash, but sometimes I need to prepend a double backslash, like "\\.".
Could anyone tell me why?
Also, could anyone explain the difference between
String.split("\t"),
String.split("\\t"),
String.split("\\\t"),
String.split("\\\\t")?
A backslash is a special character in string literals: we can use it to create \n or to escape " as \".
But the backslash is also special in the regular expression engine: for instance, we use it for predefined character classes like \w, \d and \s.
So if you want to create a string that represents the regex text \w, you need to write it as "\\w".
If you want a regex that matches a literal \, the regex text needs to be \\, which means the String representing that text must be written as "\\\\".
In other words, we need to escape the backslash twice:
- once in the regex: \\
- and once in the string literal: "\\\\".
If you want to pass a tab to the regex engine, you don't need to escape a backslash at all. Java understands the string "\t" as a string containing the tab character, and you can pass such a string to the regex engine without problems.
For our convenience, the Java regex engine interprets the text \t (also \r and \n) the same way string literals interpret "\t". In other words, we can pass the regex engine text consisting of the \ character and the t character and be sure it will be interpreted as a tab character.
So code like split("\t") or split("\\t") will split on tab. So will split("\\\t"): that string is a backslash followed by an actual tab character, and the regex engine treats an escaped tab as a literal tab.
Code like split("\\\\t") will not split on a tab character, but on a \ character followed by t. This happens because "\\\\", as explained, represents the text \\, which the regex engine sees as an escaped \ (so it is treated as a literal).
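To see all four split() calls from the question side by side, here is a minimal sketch. The class name and the six-character sample text (a, TAB, b, backslash, t, c) are made up for illustration.

```java
import java.util.Arrays;

public class SplitTabDemo {
    // Sample text: 'a', TAB, 'b', '\', 't', 'c' (six characters).
    static final String INPUT = "a\tb\\tc";

    static String[] splitWith(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        // Regex text is a raw TAB character: splits on the TAB.
        System.out.println(Arrays.toString(splitWith("\t")));
        // Regex text is \t (backslash, t): the engine reads it as TAB.
        System.out.println(Arrays.toString(splitWith("\\t")));
        // Regex text is backslash + raw TAB: an escaped TAB, still a TAB.
        System.out.println(Arrays.toString(splitWith("\\\t")));
        // Regex text is \\t: a literal backslash followed by the letter t.
        System.out.println(Arrays.toString(splitWith("\\\\t")));
    }
}
```

The first three calls produce the parts "a" and "b\tc" (with a real backslash and t in the second part). The last call produces "a<TAB>b" and "c".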

How to split a string with double quotes " as the delimiter?

I tried splitting like this:
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that:
- | is a metacharacter in the regex engine (it represents the OR operator)
- " is a metacharacter in the Java language in string literals (it represents the start/end of the string)
To escape any String metacharacter (like ") you need to place before it another String metacharacter responsible for escaping, which is \ (1). So to create a String which contains " like this is "quote" you would need to write it as
String s = "this is \"quote\"";
//                  ^^     ^^ these represent " literal, not end of string
The same idea applies if we want to create a \ literal: we need to escape it by placing another \ before it. For instance, to create a string representing c:\foo\bar we need to write it as
String s = "c:\\foo\\bar";
//            ^^   ^^ these will represent \ literal
So as you see, \ is used to escape metacharacters (make them simple literals).
This character is used in the Java language for Strings, but it is also used in the regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create a regex which matches the [ character, you need to use the regex \[, but the String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
//                         ^^ - Remember what was said earlier?
// To create a \ literal in a String, we need to escape it
So to split on [ we would need to invoke split("\\[") because the regex representing [ is \[, which needs to be written as "\\[" in Java.
Since " is not a special character in regex but it is special in a String, we need to escape it only in the string literal, by writing it as
split("\"");
(1) \ is also used to create other characters, like the line separator \n or the tab \t. It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
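A minimal sketch contrasting the two cases from this answer: " needs escaping only for the Java compiler, and [ needs escaping only for the regex engine. The class name and sample strings are illustrative, and tableData from the question is replaced by a literal.

```java
import java.util.Arrays;

public class QuoteSplitDemo {
    // " is special only to the Java compiler, so one backslash in the
    // literal is enough; the regex engine just sees a plain " character.
    static String[] splitOnQuote(String s) {
        return s.split("\"");
    }

    // [ is special only to the regex engine: the regex text must be \[ ,
    // and that backslash must itself be doubled inside the Java literal.
    static String[] splitOnBracket(String s) {
        return s.split("\\[");
    }

    public static void main(String[] args) {
        String row = "a\"b\"c";                                         // the text a"b"c
        System.out.println(Arrays.toString(splitOnQuote(row)));         // [a, b, c]
        System.out.println(Arrays.toString(splitOnBracket("x[y[z")));   // [x, y, z]
    }
}
```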
You have escaped the \ itself by putting in \ twice. Try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash is escaped, so the double quote is not.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind that String.split() interprets its pattern parameter as a regular expression, which has several special characters that have to be escaped in the resulting string.
So if you want to split by a . (which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the escaping function of the second backslash, which results in "\.".
For regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope of the original question.
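Here is a small sketch of the dot case, including what happens when the dot is left unescaped. The class name and the sample "1.2.3" are made up; Pattern.quote() and String.split() are the real java.util APIs.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DotSplitDemo {
    // "\\." is the regex text \. : a literal dot.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // Pattern.quote(".") returns "\Q.\E", which also matches a literal dot.
    static String[] splitOnQuotedDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String version = "1.2.3";
        // An unescaped "." means "any character": every character is a
        // delimiter and trailing empty strings are dropped, leaving nothing.
        System.out.println(version.split(".").length);                    // 0
        System.out.println(Arrays.toString(splitOnDot(version)));         // [1, 2, 3]
        System.out.println(Arrays.toString(splitOnQuotedDot(version)));   // [1, 2, 3]
    }
}
```

Pattern.quote() is convenient when the delimiter comes from a variable and you can't know in advance which characters need escaping.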
Try with a single backslash (\):
tableData.split("\"")
Try escaping " with a single backslash (\):
tableData.split("\"")
You are not escaping properly; the snippet will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
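You can actually avoid the backslash by writing the quote as a char literal '"', which needs no escaping. However, String.split() only accepts a String, so the char has to be converted first:
tableData.split(String.valueOf('"'));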

Why do I need two slashes in Java Regex to find a "+" symbol?

Just something I don't understand the full meaning behind. I understand that I need to escape any special meaning characters if I want to find them using regex. And I also read somewhere that you need to escape the backslash in Java if it's inside a String literal. My question though is if I "escape" the backslash, doesn't it lose its meaning? So then it wouldn't be able to escape the following plus symbol?
Throws an error (but shouldn't it work since that's how you escape those special characters?):
replaceAll("\+\s", ""));
Works:
replaceAll("\\+\\s", ""));
Hopefully that makes sense. I'm just trying to understand the functionality behind why I need those extra slashes when the regex tutorials I've read don't mention them. And things like "\+" should find the plus symbol.
There are two "escapings" going on here. The first backslash escapes the second backslash for the Java language, to create an actual backslash character. That backslash character is what escapes the + or the s for interpretation by the regular expression engine. That's why you need two backslashes: one for Java, one for the regular expression engine. With only one backslash, Java reports \+ (and, before Java 15, \s) as an illegal escape character -- not for regular expressions, but for an actual character in the Java language. On Java 15 and later, \s is itself a valid string escape meaning a single space, so "\s" compiles but hands the regex engine a plain space rather than the whitespace class \s.
The basic idea behind the extra backslashes is that the first backslash is the escape for the Java string, and the second backslash is the escape for the regex.
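A quick sketch showing what the working literal actually hands to the regex engine. The class name, helper and sample input are hypothetical.

```java
public class PlusSpaceDemo {
    // The Java literal "\\+\\s" is the 4-character regex \+\s :
    // a literal plus sign followed by one whitespace character.
    static final String PLUS_THEN_SPACE = "\\+\\s";

    static String stripPlusSpace(String s) {
        return s.replaceAll(PLUS_THEN_SPACE, "");
    }

    public static void main(String[] args) {
        System.out.println(PLUS_THEN_SPACE.length());   // 4
        System.out.println(stripPlusSpace("1+ 2+3"));   // 12+3
    }
}
```

Only the "+ " pair is removed. The final "+3" stays because its plus is not followed by whitespace.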

Escape ( in regular expression

I'm searching for the regular expression ".*(conflicted copy.*". I wrote the following code for this:
String str = "12B - (conflicted copy 2013-11-16-11-07-12)";
boolean matches = str.matches(".*(conflicted.*");
System.out.println(matches);
But I get the exception
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 15
.*(conflicted.*
I understand that ( is being treated as the beginning of a pattern group. I tried to escape ( by adding \( but that doesn't work.
Can someone tell me how to escape ( here ?
Escaping is done by \. In Java, \ is written as \\ (1), so escaping the ( would be \\(.
Side note: It's good to have a look at Pattern#quote, which returns a literal pattern String. In your case, it's not that helpful since you don't want to escape all special characters.
(1) Because a character preceded by a backslash (\) is an escape sequence and has special meaning to the compiler.
( in a regex is a metacharacter which means "start of group", and it needs to be closed with ). If you want the regex engine to treat it as a simple literal, you need to escape it. You can do that by adding \ before it, but since \ is also a metacharacter in a String (used for example to create characters like "\n", "\t"), you need to escape it as well, which will look like "\\". So try
str.matches(".*\\(conflicted.*");
Another option is to use a character class to escape (, like
str.matches(".*[(]conflicted.*");
You can also use Pattern.quote() on part that needs to be escaped like
str.matches(".*"+Pattern.quote("(")+"conflicted.*");
Or simply surround the part in which all characters should be treated as literals with "\\Q" and "\\E", which represent the start and end of a quotation.
str.matches(".*\\Q(\\Econflicted.*");
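All four options from this answer can be checked together against the string from the question. A minimal sketch follows; the class name and the compiles() helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ParenEscapeDemo {
    static final String FILE_NAME = "12B - (conflicted copy 2013-11-16-11-07-12)";

    // Four ways to make "(" a literal character in the regex.
    static final String[] PATTERNS = {
        ".*\\(conflicted.*",                          // backslash escape: \(
        ".*[(]conflicted.*",                          // character class: [(]
        ".*" + Pattern.quote("(") + "conflicted.*",   // Pattern.quote: \Q(\E
        ".*\\Q(\\Econflicted.*"                       // manual \Q ... \E quoting
    };

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The unescaped "(" opens a group that is never closed.
        System.out.println(compiles(".*(conflicted.*"));   // false
        for (String p : PATTERNS) {
            System.out.println(p + " -> " + FILE_NAME.matches(p));
        }
    }
}
```

Each pattern in the loop matches the file name. Which one to use is mostly taste: \\( is the most common, and Pattern.quote() scales best when the literal part is long or comes from user input.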
In regular expressions, any non-alphanumeric character can be safely escaped by adding a backslash in front of it.
Keep in mind that in most languages, including C#, PHP and Java, the backslash itself is also a native escape character, and thus needs to be escaped itself in ordinary (non-verbatim) string literals, so you have to enter "myText \\(".
Using a backslash inside a regular expression may require you to escape it both at the language level and at the regex level ("\\\\"): this passes \\ to the regex engine, which parses it as a single literal \.
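A sketch of where that rule holds in java.util.regex specifically (other regex flavors may differ). The class name and the compiles() helper are hypothetical. The \q case relies on the java.util.regex.Pattern documentation, which says a backslash before an alphabetic character that has no defined escape meaning is an error.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeRulesDemo {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Escaping punctuation is allowed even where it has no special meaning.
        System.out.println(Pattern.matches("\\#", "#"));    // true
        // "\\\\" is the regex text \\ , which matches one literal backslash.
        System.out.println(Pattern.matches("\\\\", "\\"));  // true
        // java.util.regex rejects an escaped letter with no defined meaning.
        System.out.println(compiles("\\q"));                // false
    }
}
```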

How to undo replace performed by regex?

In java, I have the following regex ([\\(\\)\\/\\=\\:\\|,\\,\\\\]) which is compiled and then used to escape each of the special characters ()/=:|,\ with a backslash as follows escaper.matcher(value).replaceAll("\\\\$1")
So the string "A/C:D/C" would end up as "A\/C\:D\/C"
Later on in the process, I need to undo that replace. That means I need to match on the combination of \(, \), \/ etc. and replace it with the character immediately following the backslash character. A backslash followed by any other character should not be matched, and there could be cases where a special character will exist without the preceding backslash, in which case it shouldn't match either.
Since I know all of the cases I could do something like
myString.replace("\\(", "(").replace("\\)", ")").replace("\\/", "/")...
but I wonder if there is a simpler regex that would allow me to perform the replace for all the special characters in a single step.
That seems pretty straightforward. If this were your original code (excess escapes removed):
Pattern escaper = Pattern.compile("([()/=:|,\\\\])");
String escaped = escaper.matcher(original).replaceAll("\\\\$1");
...the opposite would be:
Pattern unescaper = Pattern.compile("\\\\([()/=:|,\\\\])");
String unescaped = unescaper.matcher(escaped).replaceAll("$1");
If you weren't escaping and unescaping backslashes themselves (as you're doing), you would have problems, but this should work fine.
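Here is the answer's two patterns wrapped in a runnable round trip, including an input that already contains backslashes. The class and method names are made up; the regexes and replacement strings are the ones from the answer.

```java
import java.util.regex.Pattern;

public class EscapeRoundTrip {
    // Matches one special character: ( ) / = : | , or a backslash.
    static final Pattern ESCAPER = Pattern.compile("([()/=:|,\\\\])");
    // Matches a backslash followed by one of the same special characters.
    static final Pattern UNESCAPER = Pattern.compile("\\\\([()/=:|,\\\\])");

    static String escape(String original) {
        // Replacement text \\$1 : a literal backslash, then the captured character.
        return ESCAPER.matcher(original).replaceAll("\\\\$1");
    }

    static String unescape(String escaped) {
        // Keep only the captured character, dropping the backslash before it.
        return UNESCAPER.matcher(escaped).replaceAll("$1");
    }

    public static void main(String[] args) {
        String escaped = escape("A/C:D/C");
        System.out.println(escaped);                        // A\/C\:D\/C
        System.out.println(unescape(escaped));              // A/C:D/C

        // Backslashes in the input survive the round trip too.
        String tricky = "x\\(y";                            // the text x\(y
        System.out.println(unescape(escape(tricky)).equals(tricky)); // true
    }
}
```

The round trip works because the matcher scans left to right and never reuses a consumed backslash. An escaped backslash (\\) is therefore always unescaped as a pair before the character after it is examined.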
I don't know the Java regex flavor, but this works with PCRE:
replace \\ followed by ([()/=:|,\\]) with $1
In Perl you can do
$str =~ s#\\([()/=:|,\\])#$1#g;
