This question already has answers here:
java, regular expression, need to escape backslash in regex
(4 answers)
Closed 6 years ago.
I am really confused with how to escape. Sometimes I just need to prepend a backslash but sometimes I need to prepend double backslash like "\\.".
Could any one tell me why?
Also, could anyone give me an explanation of difference in
String.split("\t"),
String.split("\\t"),
String.split("\\\t"),
String.split("\\\\t")?
Backslash is special character in string literals - we can use it to create \n or escape " like \".
But backslash is also special in regular expression engine - for instance we can use it to use default character classes like \w \d \s.
So if you want to create string which will represent regex/text like \w you need to write it as "\\w".
If you want to write regex which will represent \ literal then text representing such regex needs to look like \\ which means String representing such text needs to be written as "\\\\".
In other words we need to escape backslash twice:
- once in regex \\
- and once in string "\\\\".
If you want to pass to regex engine literal which will represent tab then you don't need to escape backslash at all. Java will understand "\t" string as string representing tab character and you can pass such string to your regex engine without problems.
For our comfort regex engine in Java interprets text representing \t (also \r and \n) same way as string literals interpret "\t". In other words we can pass to regex engine text which will represent \ character and t character and be sure that it will be interpreted as representation of tab character.
So code like split("\t") or split("\\t") will try to split on tab.
Code like split("\\\\t") will try to split text not on tab character, but on \ character followed by t. It happens because "\\\\" as explained represents text \\ which regex engine sees as escaped \ (so it is treated as literal).
Related
How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any of regex mechanisms like predefined character classes \d, quantifiers etc. instead of replaceAll which expects regex use replace which expects literals
str = str.replace("\\\"","\"");
Both methods will replace all occurrences of targets, but replace will treat targets literally.
BUT if you really must use regex you are looking for
str = str.replaceAll("\\\\\"", "\"")
\ is special character in regex (used for instance to create \d - character class representing digits). To make regex treat \ as normal character you need to place another \ before it to turn off its special meaning (you need to escape it). So regex which we are trying to create is \\.
But to create string literal representing text \\ so you could pass it to regex engine you need to write it as four \ ("\\\\"), because \ is also special character in String literals (part of code written using "...") since it can be used for instance as \t to represent tabulator.
That is why you also need to escape \ there.
In short you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
try this: str.replaceAll("\\\\\"", "\\\"")
because Java will replace \ twice:
(1) \\\\\" --> \\" (for string)
(2) \\" --> \" (for regex)
I tried splitting like this-
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it same way as you would escape | which is "\\|". But difference between | and " is that
| is metacharacter in regex engine (it represents OR operator)
" is metacharacter in Java language in string literal (it represents start/end of the string)
To escape any String metacharacter (like ") you need to place before it other String metacharacter responsible for escaping which is \1. So to create String which would contain " like this is "quote" you would need to write it as
String s = "this is \"quote\"";
// ^^ ^^ these represent " literal, not end of string
Same idea is applied if we would like to create \ literal (we would need to escape it by placing another \ before it). For instance if we would want to create string representing c:\foo\bar we would need to write it as
String s = "c:\\foo\\bar";
// ^^ ^^ these will represent \ literal
So as you see \ is used to escape metacharacters (make them simple literals).
This character is used in Java language for Strings, but it also is used in regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create regex which will match [ character you will need to use regex \[ but String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
// ^^ - Remember what was said earlier?
// To create \ literal in String we need to escape it
So to split on [ we would need to invoke split("\\[") because regex representing [ is \[ which needs to be written as "\\[" in Java.
Since " is not special character in regex but it is special in String we need to escape it only in string literal by writing it as
split("\"");
1) \ is also used to create other characters line separators \n, tab \t. It can also be used to create Unicode characters like \uXXXX where XXXX is index of character in Unicode table in hexadecimal form.
You have escaped the \ by putting in \ twice, try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash will be escaped, thus the doublequote won't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind, that String.split() interprets its pattern parameter as a regular expression, which has several special characters, which have to be escaped in the resulting string.
So if you want split by a .(which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the escaping function of the second backlash and would result in "\.".
In case of regex characters you could also just use Pattern.quote(); to escape your desired delimiter, but this is far out of the scope the question orignally had.
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly. The snippet code will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
You can actually split without the backward slash. You only have to use single quote
tableData.split('"');
This question already has answers here:
Including a hyphen in a regex character bracket?
(6 answers)
Closed 3 years ago.
I would like to match a line with a regular expression. The line contains two numbers which could be divided by plus, minus or a star (multiplication). However, I am not sure how to escape the star.
line.matches("[0-9]*[+-*][0-9]*");
I tried also line.matches("[0-9]*[+-\\*][0-9]*"); but it does not work either.
Should I put the star into separate group? Why does the escaping \\* not work in this case?
* is not metacharacter in character class ([...]) so you don't need to escape it at all. What you need to escape is - because inside character class it is responsible for creating range of characters like [a-z].
So instead of "[+-*]" which represents all characters placed in Unicode Table between + and * use
"[+\\-*]"
or place - where it can't be used as range indicator
at start of character class [-+*]
at end of character class [+*-]
or right after other range if you have one [a-z-+*]
BTW if you would like to add \ literal to your operators you need to write it in regex as \\ (it is metacharacter used for escaping or to access standard character classes like \w so we also need to escape it). But since \ is also special in String literals (it can be used to represent characters via \n \r \t or \uXXXX), you also need to escape it there as well. So in regex \ needs to be represented as \\ which as string literal is written as "\\\\".
BTW 2: to represent digit instead of [0-9] you can use \d (written in string literal as "\\d" since \ is special there and requires escaping).
BTW 3: if you want to make sure that there will be at least two numbers in string (one on each side of operator) you need to use + instead of * at [0-9]* since + represents one or more occurrence of previous element, while * represents zero or more occurrences.
So your code can look like
line.matches("\\d+[-+*]\\d+");
I am using a backslash as an escape character for a serialization format I am working on. I have it as a constant but IntelliJ is underlining it and highlighting it red. On hover it gives no error messages or any information as to why it does not like it.
What is the reason for this and how do I fix it?
IntelliJ is smarter than I am and realised that I was using this character in a regular expression where 2 backslashes would be needed, however, IntelliJ also assumed that my puny mind could find the problem without giving me any information about it.
If it's being used as a regular expression, then the "\" must be escaped.
If you're escaping a "\" as "\" like traditional regular expressions require, then you also need to add two more \\ for a total of \\\\.
This is because of the way Java interprets "\":
In literal Java strings the backslash is an escape character. The
literal string "\" is a single backslash. In regular expressions, the
backslash is also an escape character. The regular expression \
matches a single backslash. This regular expression as a Java string,
becomes "\\". That's right: 4 backslashes to match a single one.
The regex \w matches a word character. As a Java string, this is
written as "\w".
The same backslash-mess occurs when providing replacement strings for
methods like String.replaceAll() as literal Java strings in your Java
code. In the replacement text, a dollar sign must be encoded as \$ and
a backslash as \ when you want to replace the regex match with an
actual dollar sign or backslash. However, backslashes must also be
escaped in literal Java strings. So a single dollar sign in the
replacement text becomes "\$" when written as a literal Java string.
The single backslash becomes "\\". Right again: 4 backslashes to
insert a single one.
I'm learning Regex, and running into trouble in the implementation.
I found the RegexTestHarness on the Java Tutorials, and running it, the following string correctly identifies my pattern:
[\d|\s][\d]\.
(My pattern is any double digit, or any single digit preceded by a space, followed by a period.)
That string is obtained by this line in the code:
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
When I try to write a simple class in Eclipse, it tells me the escape sequences are invalid, and won't compile unless I change the string to:
[\\d|\\s][\\d]\\.
In my class I'm using`Pattern pattern = Pattern.compile();
When I put this string back into the TestHarness it doesn't find the correct matches.
Can someone tell me which one is correct? Is the difference in some formatting from console.readLine()?
\ is special character in String literals "...". It is used to escape other special characters, or to create characters like \n \r \t.
To create \ character in string literal which can be used in regex engine you need to escape it by adding another \ before it (just like you do in regex when you need to escape its metacharacters like dot \.). So String representing \ will look like "\\".
This problem doesn't exist when you are reading data from user, because you are already reading literals, so even if user will write in console \n it will be interpreted as two characters \ and n.
Also there is no point in adding | inside class character [...] unless your intention is to make that class also match | character, remember that [abc] is the same as (a|b|c) so there is no need for | in "[\\d|\\s]".
If you want to represent a backslash in a Java string literal you need to escape it with another backslash, so the string literal "\\s" is two characters, \ and s. This means that to represent the regular expression [\d\s][\d]\. in a Java string literal you would use "[\\d\\s][\\d]\\.".
Note that I also made a slight modification to your regular expression, [\d|\s] will match a digit, whitespace, or the literal | character. You just want [\d\s]. A character class already means "match one of these", since you don't need the | for alternation within a character class it loses its special meaning.
My pattern is any double digit or single digit preceded by a space, followed by a period.)
Correct regex will be:
Pattern pattern = Pattern.compile("(\\s\\d|\\d{2})\\.");
Also if you're getting regex string from user input then your should call:
Pattern.quote(useInputRegex);
To escape all the regex special characters.
Also you double escaping because 1 escape is handled by String class and 2nd one is passed on to regex engine.
What is happening is that escape sequences are being evaluated twice. Once for java, and then once for your regex.
the result is that you need to escape the escape character, when you use a regex escape sequence.
for instance, if you needed a digit, you'd use
"\\d"