Disable string escaping (backslash hell) - java

I've started using Java regexes and I find I have to write patterns like this (contrived example):
C:\\\\windows\\\\system\\d+
to match
C:\windows\system32
Is there any way to use java regex without insane amounts of backslashes?

Use Pattern.quote(String s).
It treats all metacharacters in the passed String as literal characters (but you still must escape backslashes when constructing a String literal). This lets you type \\ instead of \\\\ to denote a literal \ in the regex pattern. But it also means that any other special characters in the quoted part are interpreted literally as well (such as \d+ in your example), so those have to stay outside the call.
But in your example, you could use:
Pattern.quote("C:\\windows\\system") + "\\d+";
Test it with this:
System.out.println("C:\\windows\\system32".matches(Pattern.quote("C:\\windows\\system") + "\\d+"));
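A minimal, self-contained sketch comparing the fully escaped pattern from the question with the Pattern.quote version from this answer. The class name QuotedPathRegex and its helper methods are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class QuotedPathRegex {
    // Fully escaped: each \\\\ in the literal becomes \\ in the regex, which matches one backslash.
    static final String ESCAPED = "C:\\\\windows\\\\system\\d+";

    // Pattern.quote wraps the prefix in \Q...\E, so its backslashes are literal to the regex engine;
    // only the Java-literal escaping (\\ for one backslash) remains. \d+ stays outside the quote.
    static final String QUOTED = Pattern.quote("C:\\windows\\system") + "\\d+";

    static boolean matchesEscaped(String path) {
        return path.matches(ESCAPED);
    }

    static boolean matchesQuoted(String path) {
        return path.matches(QUOTED);
    }

    public static void main(String[] args) {
        String path = "C:\\windows\\system32"; // the text C:\windows\system32
        System.out.println(matchesEscaped(path)); // true
        System.out.println(matchesQuoted(path));  // true
        System.out.println(QUOTED);               // \QC:\windows\system\E\d+
    }
}
```

Printing QUOTED shows what Pattern.quote actually hands to the regex engine: the \Q ... \E markers are why the backslashes inside no longer need a second level of escaping.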


nifi regex failed to escape backslash "\" [duplicate]

How do I write a regular expression that matches \" (a backslash followed by a quote)? Assume I have a string containing such sequences.
I need to replace every \" with a plain ".
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. I'm not sure how to deal with the backslash. I could remove the backslashes first, but there are other backslashes in my string that must stay.
If you don't need any regex mechanisms, such as predefined character classes like \d or quantifiers, use replace, which expects literals, instead of replaceAll, which expects a regex:
str = str.replace("\\\"","\"");
Both methods replace all occurrences of the target, but replace treats the target literally.
But if you really must use a regex, you are looking for
str = str.replaceAll("\\\\\"", "\"");
\ is a special character in regex (used, for instance, to create \d, the character class representing digits). To make the regex engine treat \ as a normal character, you need to place another \ before it to turn off its special meaning (you need to escape it). So the regex we are trying to create is \\.
But to create a string literal representing the text \\ so you can pass it to the regex engine, you need to write it as four backslashes ("\\\\"), because \ is also a special character in String literals (code written as "..."), where it is used, for instance, in \t to represent a tab.
That is why you also need to escape \ there.
In short, you need to escape \ twice:
in the regex: \\
and then in the String literal: "\\\\"
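A small sketch of the literal and regex approaches side by side, plus the failing attempt from the question, so the difference in escaping is visible. The class and method names (UnescapeQuotes, withReplace, and so on) are hypothetical.

```java
public class UnescapeQuotes {
    // Literal replacement: the target is the two characters \" and the replacement is ".
    static String withReplace(String s) {
        return s.replace("\\\"", "\"");
    }

    // Regex replacement: the literal "\\\\\"" is the regex \\" (an escaped backslash, then a quote).
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\"", "\"");
    }

    // The failing attempt: "\\\"" is the regex \", which is just an escaped quote,
    // so only the quote is matched and the backslash stays in place.
    static String brokenAttempt(String s) {
        return s.replaceAll("\\\"", "\"");
    }

    public static void main(String[] args) {
        String input = "say \\\"hi\\\" to C:\\temp";  // the text: say \"hi\" to C:\temp
        System.out.println(withReplace(input));    // say "hi" to C:\temp
        System.out.println(withReplaceAll(input)); // say "hi" to C:\temp
        System.out.println(brokenAttempt(input));  // say \"hi\" to C:\temp (unchanged)
    }
}
```

Note that the unrelated backslash in C:\temp survives in both working versions, which is the constraint the question mentions.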
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
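A quick sketch of the two behaviors the javadoc describes: left-to-right, non-overlapping replacement, and literal (non-regex) matching. The helper name replaceLiteral is hypothetical; it simply delegates to String.replace.

```java
public class LiteralReplaceDemo {
    // String.replace(CharSequence, CharSequence) scans left to right and
    // replaces non-overlapping literal matches; nothing is treated as a regex.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replace(target, replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("aaa", "aa", "b"));  // ba
        System.out.println(replaceLiteral("a.b.c", ".", "/")); // a/b/c (the dot is literal)
        System.out.println("a.b.c".replaceAll(".", "/"));      // ///// (regex . matches any character)
    }
}
```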
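Try this: str.replaceAll("\\\\\"", "\\\"");
This works because the backslashes in the pattern are unescaped twice:
(1) \\\\\" --> \\" (by the Java compiler, for the String literal)
(2) \\" --> \" (by the regex engine: an escaped backslash followed by a quote)
In the replacement, "\\\"" is the text \", and replaceAll treats that backslash as an escape too, so what gets inserted is a plain ".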
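The replacement argument of replaceAll has its own escaping rules (\ and $ are special), separate from the pattern's. A minimal sketch, with hypothetical names, showing the escaped replacement from this answer and Matcher.quoteReplacement, which escapes a replacement so it is inserted verbatim:

```java
import java.util.regex.Matcher;

public class ReplacementEscaping {
    // Regex \\" : a literal backslash followed by a quote.
    static final String BACKSLASH_QUOTE = "\\\\\"";

    // Replacement "\\\"" is the text \" ; replaceAll consumes the escaping backslash and inserts ".
    static String escapedReplacement(String s) {
        return s.replaceAll(BACKSLASH_QUOTE, "\\\"");
    }

    // Matcher.quoteReplacement escapes \ and $ so the replacement text is used as-is.
    static String quotedReplacement(String s, String literal) {
        return s.replaceAll(BACKSLASH_QUOTE, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(escapedReplacement("say \\\"hi\\\""));  // say "hi"
        System.out.println(quotedReplacement("x\\\"y", "$"));      // x$y
    }
}
```

Passing "$" directly as the replacement (without quoteReplacement) would be rejected as an invalid group reference, which is why the helper quotes it.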

How to split a string with double quotes " as the delimiter?

I tried splitting like this:
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that
| is a metacharacter in the regex engine (it represents the OR operator)
" is a metacharacter in the Java language, in string literals (it represents the start/end of the string)
To escape any String metacharacter (like ") you need to place before it the String metacharacter responsible for escaping, which is \ (see note 1 below). So to create a String that contains " like this is "quote" you would need to write it as
String s = "this is \"quote\"";
//                  ^^     ^^ these represent " literal, not end of string
The same idea applies if we want to create a \ literal (we need to escape it by placing another \ before it). For instance, to create a string representing c:\foo\bar we need to write it as
String s = "c:\\foo\\bar";
//            ^^   ^^ these represent \ literal
So, as you see, \ is used to escape metacharacters (make them simple literals).
This character is used in the Java language for Strings, but it is also used by the regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create a regex that matches the [ character, you need to use the regex \[, but the String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
//                         ^^ - Remember what was said earlier?
// To create a \ literal in a String we need to escape it
So to split on [ we would need to invoke split("\\[") because the regex representing [ is \[, which needs to be written as "\\[" in Java.
Since " is not a special character in regex but is special in String literals, we need to escape it only in the string literal, by writing it as
split("\"");
1) \ is also used to create other characters, like the line separator \n or the tab \t. It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
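A minimal sketch of the two cases discussed in this answer: " needs escaping only for the Java compiler, while [ needs escaping for the regex engine (and that escape then needs escaping for the compiler). The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitOnQuote {
    // \" is a Java-literal escape; the regex engine receives a plain " and treats it literally.
    static String[] splitOnQuote(String s) {
        return s.split("\"");
    }

    // [ is a regex metacharacter, so the regex is \[ and the Java literal is "\\[".
    static String[] splitOnBracket(String s) {
        return s.split("\\[");
    }

    public static void main(String[] args) {
        String data = "a\"b\"c"; // the text a"b"c
        System.out.println(Arrays.toString(splitOnQuote(data)));      // [a, b, c]
        System.out.println(Arrays.toString(splitOnBracket("x[y[z"))); // [x, y, z]
    }
}
```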
You have escaped the \ itself by putting in \ twice; try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash will be escaped, thus the doublequote won't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind that String.split() interprets its parameter as a regular expression, which has several special characters that have to be escaped in the resulting string.
So if you want to split by a . (which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the escaping function of the second backslash, which results in the regex \.
For regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope of the original question.
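A short sketch contrasting the escaped dot, Pattern.quote, and the unescaped dot on a version-like string. The helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnDot {
    // Regex \. (written "\\." in Java) matches a literal dot.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote(".") produces \Q.\E, which also matches a literal dot.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    // Unescaped "." matches every character, so every token is empty, and
    // split() drops trailing empty strings, leaving an empty array.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        String version = "1.2.3";
        System.out.println(Arrays.toString(splitEscaped(version))); // [1, 2, 3]
        System.out.println(Arrays.toString(splitQuoted(version)));  // [1, 2, 3]
        System.out.println(splitUnescaped(version).length);         // 0
    }
}
```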
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly; the snippet will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
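You can actually avoid the backslash by writing the quote as a char literal, where " needs no escaping. String.split has no char overload, though, so tableData.split('"') does not compile; convert the char to a String first:
tableData.split(String.valueOf('"'));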

Java Regex Escape Characters

I'm learning Regex, and running into trouble in the implementation.
I found the RegexTestHarness in the Java Tutorials, and when I run it, the following string correctly identifies my pattern:
[\d|\s][\d]\.
(My pattern is any double digit, or any single digit preceded by a space, followed by a period.)
That string is obtained by this line in the code:
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
When I try to write a simple class in Eclipse, it tells me the escape sequences are invalid, and won't compile unless I change the string to:
[\\d|\\s][\\d]\\.
In my class I'm passing that string to Pattern pattern = Pattern.compile(...).
When I put this string back into the TestHarness it doesn't find the correct matches.
Can someone tell me which one is correct? Is the difference in some formatting from console.readLine()?
\ is a special character in String literals "...". It is used to escape other special characters, or to create characters like \n, \r, \t.
To create a \ character in a string literal that can be passed to the regex engine, you need to escape it by adding another \ before it (just like you do in a regex when you need to escape its metacharacters, like dot \.). So a String representing \ looks like "\\".
This problem doesn't exist when you are reading data from the user, because what the user types is already the final text: even if the user writes \n in the console, it is interpreted as the two characters \ and n.
Also, there is no point in adding | inside a character class [...] unless you intend that class to also match the | character. Remember that [abc] is the same as (a|b|c), so there is no need for | in "[\\d|\\s]".
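A minimal sketch, under the assumption that the console input was exactly [\d|\s][\d]\. , showing that the doubled-backslash Java literal is what produces those characters, and that the stray | inside the class really does match a literal |. The class, constants, and helper are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralVsConsole {
    // Typed at a console this is [\d|\s][\d]\. ; in Java source each backslash is doubled.
    static final String QUESTION_REGEX = "[\\d|\\s][\\d]\\.";
    // Same idea without the stray | inside the character class.
    static final String FIXED_REGEX = "[\\d\\s][\\d]\\.";

    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(FIXED_REGEX);                     // [\d\s][\d]\.
        System.out.println("\\d".length());                  // 2 (a backslash and a d)
        System.out.println(finds(FIXED_REGEX, "Item 5."));   // true: space, digit, period
        System.out.println(finds(FIXED_REGEX, "Total 42.")); // true: two digits, period
        System.out.println(finds(QUESTION_REGEX, "a|5."));   // true: | is literal inside [...]
        System.out.println(finds(FIXED_REGEX, "a|5."));      // false
    }
}
```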
If you want to represent a backslash in a Java string literal you need to escape it with another backslash, so the string literal "\\s" is two characters, \ and s. This means that to represent the regular expression [\d\s][\d]\. in a Java string literal you would use "[\\d\\s][\\d]\\.".
Note that I also made a slight modification to your regular expression, [\d|\s] will match a digit, whitespace, or the literal | character. You just want [\d\s]. A character class already means "match one of these", since you don't need the | for alternation within a character class it loses its special meaning.
"My pattern is any double digit, or any single digit preceded by a space, followed by a period."
The correct regex would be:
Pattern pattern = Pattern.compile("(\\s\\d|\\d{2})\\.");
Also, if you're getting a string from user input that should be matched literally, call:
Pattern.quote(userInputRegex);
to escape all the regex special characters.
You need double escaping because one level of escaping is consumed by the Java compiler when it reads the String literal, and the second is passed on to the regex engine.
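A sketch that exercises the corrected regex on a few inputs, plus a hypothetical containsLiteral helper illustrating the Pattern.quote advice for user input that should be matched as plain text. All names here are illustrative.

```java
import java.util.regex.Pattern;

public class DigitPeriodRegex {
    // (\s\d|\d{2})\. : whitespace plus one digit, or two digits, followed by a period.
    static final Pattern DIGIT_PERIOD = Pattern.compile("(\\s\\d|\\d{2})\\.");

    static boolean containsMatch(String text) {
        return DIGIT_PERIOD.matcher(text).find();
    }

    // Hypothetical helper: treats userInput as literal text, not as a regex.
    static boolean containsLiteral(String text, String userInput) {
        return Pattern.compile(Pattern.quote(userInput)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("Step 7."));                // true
        System.out.println(containsMatch("Step 42."));               // true
        System.out.println(containsMatch("Step x7."));               // false
        System.out.println(containsLiteral("costs $5.00", "$5.00")); // true
        System.out.println(containsLiteral("costs $5100", "$5.00")); // false: the dot is literal
    }
}
```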
What is happening is that escape sequences are being evaluated twice: once by Java, and then once by your regex.
The result is that you need to escape the escape character whenever you use a regex escape sequence.
For instance, if you needed a digit, you'd use
"\\d"

String.replaceAll(...) of Java not working for \\ and \

I want to convert the directory path from:
C:\Users\Host\Desktop\picture.jpg
to
C:\\Users\\Host\\Desktop\\picture.jpg
I am using replaceAll() function and other replace functions but they do not work.
How can I do this?
I have printed the statement, and it gives me the one I wanted, i.e.
C:\\Users\\Host\\Desktop\\picture.jpg
but now when I pass this variable to open the file, I get this exception. Why?
java.io.FileNotFoundException: C:\\Users\\Host\\Desktop\\picture.jpg
EDIT: Changed from replaceAll to replace - you don't need a regex here, so don't use one. (It was a really poor design decision on the part of the Java API team, IMO.)
My guess (as you haven't provided enough information) is that you're doing something like:
text.replace("\\", "\\\\");
Strings are immutable in Java, so you need to use the return value, e.g.
String newText = oldText.replace("\\", "\\\\");
If that doesn't answer your question, please provide more information.
(I'd also suggest that usually you shouldn't be doing this yourself anyway - if this is to include the information in something like a JSON response, I'd expect the wider library to perform escaping for you.)
Note that the doubling is required as \ is an escape character for Java string (and character) literals. Note that as replace doesn't treat the inputs as regular expression patterns, there's no need to perform further doubling, unlike replaceAll.
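A minimal sketch of both points in this answer: the return value of replace must be kept because Strings are immutable, and replace needs only one level of doubling. The class and method names are hypothetical.

```java
public class DoubleBackslashes {
    // Literal replace: "\\" is one backslash and "\\\\" is two; no regex escaping involved.
    static String doubleBackslashes(String path) {
        return path.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String oldText = "C:\\Users\\Host\\Desktop\\picture.jpg"; // C:\Users\Host\Desktop\picture.jpg

        oldText.replace("\\", "\\\\");      // result discarded: oldText itself never changes
        System.out.println(oldText);        // C:\Users\Host\Desktop\picture.jpg

        String newText = doubleBackslashes(oldText);
        System.out.println(newText);        // C:\\Users\\Host\\Desktop\\picture.jpg
    }
}
```

As the EDIT below points out, only the original single-backslash string names the file on disk; the doubled version is for contexts that need escaped text.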
EDIT: You're now getting a FileNotFoundException because there isn't a filename with double backslashes in - what made you think there was? If you want it as a valid filename, why are you doubling the backslashes?
You have to use:
String t2 = t1.replaceAll("\\\\", "\\\\\\\\");
or (without a pattern):
String t2 = t1.replace("\\", "\\\\");
Each \ has to be preceded by another \. But that is also true for the preceding \, so you have to write four backslashes each time you want one in a regex.
In string literals, \ is used as the escape character by default, so to get one \ in a string you have to write "\\", and to get two (i.e. a backslash twice) you write "\\\\". This will solve your problem, and the same applies to other characters such as the double quote, which you write as \".
Three explanations:
1. Replace double backslashes to one (not what you asked)
You have to escape the backslash by backslashes. Like this:
String newPath = oldPath.replaceAll("\\\\\\\\", "\\\\");
The first parameter needs to be escaped twice: once for the Java compiler and once because it is a regular expression. You want to replace two backslashes with one. To write one backslash in a String literal you escape it: \\, which is compiled to \. BUT!! the first parameter of replaceAll is a regular expression, where \ is special too, so it has to be escaped again: add a backslash, which itself needs escaping for the compiler, giving \\\\. These four backslashes represent one literal backslash in the regex. Since you want to match two backslashes, use eight.
The second parameter of replaceAll isn't a regular expression, but \ is still special in the replacement string, so it has to be escaped as well: once for the Java compiler and once for the replacement: \\\\. This is compiled to two backslashes, which replaceAll interprets as one literal backslash. (A replacement of just "\\" would be a lone escape character and is rejected.)
2. Replace single backslash to a pair of backslashes (what you asked)
String newPath = oldPath.replaceAll("\\\\", "\\\\\\\\");
Same logic as above.
3. Use replace() instead of replaceAll().
String newPath = oldPath.replace("\\", "\\\\");
The difference is that the replace() method doesn't use regular expressions, so you don't have to escape every backslash twice for the first parameter.
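A sketch covering the three options above, plus Matcher.quoteReplacement as an alternative way to escape the replacement, and a check that a lone-backslash replacement is rejected once a match occurs. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

public class BackslashConversions {
    // 1. Two backslashes -> one (regex). "\\\\\\\\" is the regex \\\\ (two literal
    //    backslashes); the replacement "\\\\" is the text \\, i.e. one escaped backslash.
    static String collapseDoubled(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    // 2. One backslash -> two (regex).
    static String doubleWithReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // 3. One backslash -> two (literal, no regex escaping needed).
    static String doubleWithReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // Variant of 2: Matcher.quoteReplacement escapes the replacement text for you.
    static String doubleWithQuoteReplacement(String s) {
        return s.replaceAll("\\\\", Matcher.quoteReplacement("\\\\"));
    }

    // A replacement consisting of a single backslash is rejected when a match is found.
    static boolean loneBackslashReplacementFails() {
        try {
            "a\\\\b".replaceAll("\\\\\\\\", "\\");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\Host"; // C:\Users\Host
        System.out.println(doubleWithReplaceAll(path));               // C:\\Users\\Host
        System.out.println(doubleWithReplace(path));                  // C:\\Users\\Host
        System.out.println(doubleWithQuoteReplacement(path));         // C:\\Users\\Host
        System.out.println(collapseDoubled(doubleWithReplace(path))); // C:\Users\Host
        System.out.println(loneBackslashReplacementFails());          // true
    }
}
```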
Hopefully, I explained well...
-- Edit: Fixed error, as pointed out by xehpuk --

How to undo replace performed by regex?

In Java, I have the following regex ([\\(\\)\\/\\=\\:\\|,\\,\\\\]) which is compiled and then used to escape each of the special characters ()/=:|,\ with a backslash, as follows: escaper.matcher(value).replaceAll("\\\\$1")
So the string "A/C:D/C" would end up as "A\/C\:D\/C"
Later on in the process, I need to undo that replace. That means I need to match on the combination of \(, \), \/ etc. and replace it with the character immediately following the backslash character. A backslash followed by any other character should not be matched, and there could be cases where a special character exists without the preceding backslash, in which case it shouldn't match either.
Since I know all of the cases I could do something like
myString.replace("\\(", "(").replace("\\)", ")").replace("\\/", "/")...
but I wonder if there is a simpler regex that would allow me to perform the replace for all the special characters in a single step.
That seems pretty straightforward. If this were your original code (excess escapes removed):
Pattern escaper = Pattern.compile("([()/=:|,\\\\])");
String escaped = escaper.matcher(original).replaceAll("\\\\$1");
...the opposite would be:
Pattern unescaper = Pattern.compile("\\\\([()/=:|,\\\\])");
String unescaped = unescaper.matcher(escaped).replaceAll("$1");
If you weren't escaping and unescaping backslashes themselves (as you're doing), you would have problems, but this should work fine.
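A self-contained round-trip check of the escaper/unescaper pair from this answer. The class name EscapeRoundTrip and the escape/unescape method names are hypothetical wrappers around the answer's two patterns.

```java
import java.util.regex.Pattern;

public class EscapeRoundTrip {
    // Captures any one of ( ) / = : | , \ so it can be prefixed with a backslash.
    private static final Pattern ESCAPER = Pattern.compile("([()/=:|,\\\\])");
    // Matches a backslash followed by one of the same characters; group 1 is that character.
    private static final Pattern UNESCAPER = Pattern.compile("\\\\([()/=:|,\\\\])");

    // Replacement \\$1 (written "\\\\$1"): a literal backslash, then the captured character.
    static String escape(String s) {
        return ESCAPER.matcher(s).replaceAll("\\\\$1");
    }

    // Replacement $1: keep only the character after the backslash.
    static String unescape(String s) {
        return UNESCAPER.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String escaped = escape("A/C:D/C");
        System.out.println(escaped);           // A\/C\:D\/C
        System.out.println(unescape(escaped)); // A/C:D/C
        System.out.println(unescape("a\\nb")); // a\nb (backslash + n is left alone)
    }
}
```

Because the backslash itself is escaped, an original backslash followed by a special character (as in x\(y)=z) also survives the round trip.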
I don't know the Java regex flavor, but this works with PCRE:
replace \\ followed by ([()/=:|,\\]) by $1
in perl you can do
$str =~ s#\\([()/=:|,\\])#$1#g;
