Java - Escaping Meta-characters [ and ] in Regex [duplicate]

This question already has answers here:
How to escape text for regular expression in Java?
(8 answers)
Closed 3 years ago.
I am attempting to replace the first occurrence of the string "[]" in another string:
aString.replaceFirst("[]", "blah");
I get the error:
java.util.regex.PatternSyntaxException: Unclosed character class near index 1 []
[ and ] are obviously metacharacters; however, when I try to escape them with a single \,
Eclipse complains that it is not a valid escape sequence.
I've looked but couldn't find an answer. What am I missing?
Thank You

Regex patterns use \ as the escape character, but so do Java string literals. So to get a single escape (\) in a regex pattern you should write \\ in the Java source. To match a literal backslash, which must itself be escaped inside the regex, double that again: \\\\.
Of course that's extremely tedious, made all the worse because regexes have a ton of escape sequences like that. Which is why Java regexes also support “quoting” literal parts of the pattern, which allows you to write your pattern as: \\Q[]\\E.
EDIT: As the other answer hints at: java.util.regex.Pattern.quote() performs this wrapping between \\Q and \\E.

Try \\[ and \\]. You need to double escape, because \ is also an escape character for strings (as is \" when you want to have double-quotes in your text). Therefore to get a \ in your string you have to use \\.

aString.replaceFirst("\\[\\]", "blah");
or in the more general case
aString.replaceFirst(java.util.regex.Pattern.quote("[]"), "blah");
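To make the two options above concrete, here is a minimal sketch that runs both against the same input. The class and method names (BracketEscapeDemo, replaceEscaped, replaceQuoted) are made up for illustration; only String.replaceFirst and Pattern.quote come from the JDK.

```java
import java.util.regex.Pattern;

public class BracketEscapeDemo {
    // Escapes each bracket by hand: the Java literal "\\[\\]" is the regex \[\]
    static String replaceEscaped(String input, String replacement) {
        return input.replaceFirst("\\[\\]", replacement);
    }

    // Lets Pattern.quote() turn the text "[]" into a literal pattern
    static String replaceQuoted(String input, String replacement) {
        return input.replaceFirst(Pattern.quote("[]"), replacement);
    }

    public static void main(String[] args) {
        String sample = "a[] b[]";
        // Only the first "[]" is replaced in both cases
        System.out.println(replaceEscaped(sample, "blah"));
        System.out.println(replaceQuoted(sample, "blah"));
        // Shows the \Q...\E wrapper that quote() produces
        System.out.println(Pattern.quote("[]"));
    }
}
```

Both helpers should give the same result; quote() is probably the safer choice when the text to match comes from user input.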

Related

Java Regular Expression - how to use backslash [duplicate]

This question already has answers here:
java, regular expression, need to escape backslash in regex
(4 answers)
Closed 6 years ago.
I am really confused with how to escape. Sometimes I just need to prepend a backslash but sometimes I need to prepend double backslash like "\\.".
Could any one tell me why?
Also, could anyone give me an explanation of difference in
String.split("\t"),
String.split("\\t"),
String.split("\\\t"),
String.split("\\\\t")?
Backslash is a special character in string literals - we can use it to create \n or to escape " as \".
But backslash is also special to the regular expression engine - for instance, we use it for the predefined character classes \w, \d and \s.
So if you want to create a string which represents the regex text \w, you need to write it as "\\w".
If you want to write a regex which matches a literal \, the regex text needs to be \\, which means the String representing that text needs to be written as "\\\\".
In other words we need to escape backslash twice:
- once in regex \\
- and once in string "\\\\".
If you want to pass to regex engine literal which will represent tab then you don't need to escape backslash at all. Java will understand "\t" string as string representing tab character and you can pass such string to your regex engine without problems.
For our comfort regex engine in Java interprets text representing \t (also \r and \n) same way as string literals interpret "\t". In other words we can pass to regex engine text which will represent \ character and t character and be sure that it will be interpreted as representation of tab character.
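So code like split("\t") or split("\\t") will split on tab. The same goes for split("\\\t"): that literal is a backslash followed by an actual tab character, and the regex engine reads a backslash in front of a non-alphabetic character as that character taken literally.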
Code like split("\\\\t") will try to split text not on tab character, but on \ character followed by t. It happens because "\\\\" as explained represents text \\ which regex engine sees as escaped \ (so it is treated as literal).
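The four split variants from the question can be compared side by side. This is only a sketch: SplitBackslashDemo and splitOn are hypothetical names, and the two sample strings are chosen for the demonstration.

```java
import java.util.Arrays;

public class SplitBackslashDemo {
    // Thin wrapper so each regex literal can be tried on the same text
    static String[] splitOn(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        String tabbed = "a\tb";    // a, TAB character, b
        String literal = "a\\tb";  // a, backslash, t, b

        // The first three regexes all end up matching a tab character
        System.out.println(Arrays.toString(splitOn(tabbed, "\t")));
        System.out.println(Arrays.toString(splitOn(tabbed, "\\t")));
        System.out.println(Arrays.toString(splitOn(tabbed, "\\\t")));

        // "\\\\t" is the regex \\t: a literal backslash followed by t
        System.out.println(Arrays.toString(splitOn(literal, "\\\\t")));
        // ...so it finds nothing to split on in the tabbed string
        System.out.println(splitOn(tabbed, "\\\\t").length);
    }
}
```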

Why can't I replace ":)" [duplicate]

This question already has answers here:
How to replace brackets in strings
(4 answers)
Closed 8 years ago.
I can't seem to replace a string of ":)" to something else, here is my code:
if(message.contains(":)")) message = message.replaceAll(":)", replacement);
This is the error:
Exception in thread "Listen" java.util.regex.PatternSyntaxException: Unmatched closing ')'
near index 0
:)
^
What should I do?
Don't use replaceAll(); use replace() when you want to replace literal strings:
message.replace(":)", replacement)
replaceAll() deals with regular expressions, in which ) has a special meaning, hence the error.
You must escape ) in regexen:
message = message.replaceAll(":\\)", replacement);
This is because ) has special meaning (capture groups), so you have to "tell" regex that you just want a literal ).
Write:
message.replaceAll(Pattern.quote(":)"), replacement);
String#replaceAll accepts a regex, not a regular String. ) has a special meaning in regex; using quote makes the engine treat :) as the literal String :) and not as a regex.
If you don't want to use Pattern#quote, you should escape the ) by \\. Note that escaping a regex is done by \, but in Java, \ is written as \\.
If you don't like any of the mentioned, use String#replace that doesn't accept a regex, and you're fine.
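The three suggestions in this thread can be put next to each other as follows. A rough sketch only: SmileyReplaceDemo and its method names are invented for the example, and the replacement text "[smile]" is an arbitrary choice.

```java
import java.util.regex.Pattern;

public class SmileyReplaceDemo {
    // replace() takes plain text, so ")" needs no escaping
    static String viaReplace(String message, String replacement) {
        return message.replace(":)", replacement);
    }

    // replaceAll() takes a regex, so ")" is escaped as \) (written "\\)")
    static String viaEscapedRegex(String message, String replacement) {
        return message.replaceAll(":\\)", replacement);
    }

    // Pattern.quote() builds the literal pattern for us
    static String viaQuote(String message, String replacement) {
        return message.replaceAll(Pattern.quote(":)"), replacement);
    }

    public static void main(String[] args) {
        String message = "hi :) there :)";
        System.out.println(viaReplace(message, "[smile]"));
        System.out.println(viaEscapedRegex(message, "[smile]"));
        System.out.println(viaQuote(message, "[smile]"));
    }
}
```

Despite its name, replace() also replaces every occurrence, so for literal text it is likely the simplest of the three.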

Escape ( in regular expression

I'm searching for the regular expression - ".*(conflicted copy.*". I wrote the following code for this
String str = "12B - (conflicted copy 2013-11-16-11-07-12)";
boolean matches = str.matches(".*(conflicted.*");
System.out.println(matches);
But I get the exception
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 15
.*(conflicted.*
I understand that the compiler thinks that ( is the beginning of a pattern group. I tried to escape ( by adding \( but that doesnt work.
Can someone tell me how to escape ( here ?
Escaping is done by \. In a Java string literal, \ is written as \\ [1], so the escaped ( becomes \\(.
Side note: It's good to have a look at Pattern#quote, which returns a literal pattern String. In your case, it's not that helpful since you don't want to escape all special characters.
[1] Because a character preceded by a backslash (\) is an escape sequence and has special meaning to the compiler.
( in regex is a metacharacter which means "start of group", and it needs to be closed with ). If you want the regex engine to treat it as a simple literal you need to escape it. You can do that by adding \ before it, but since \ is also a metacharacter in String literals (used for example to create characters like "\n" and "\t") you need to escape it as well, which looks like "\\". So try
str.matches(".*\\(conflicted.*");
Another option is to use a character class to escape (, like
str.matches(".*[(]conflicted.*");
You can also use Pattern.quote() on the part that needs to be escaped, like
str.matches(".*"+Pattern.quote("(")+"conflicted.*");
Or simply surround the part in which all characters should be treated as literals with "\\Q" and "\\E", which mark the start and end of a quotation.
str.matches(".*\\Q(\\Econflicted.*");
In regular expressions, any metacharacter can be safely escaped by adding a backslash in front of it. That does not extend to letters and digits: there a backslash starts a construct of its own (\d, \w, \1), and Java rejects a backslash before a letter that has no such meaning.
Keep in mind that in most languages, including C#, PHP and Java, the backslash is also the escape character of ordinary string literals, and thus needs to be escaped itself there, requiring you to write "myText \\(".
Using a backslash inside a regular expression may require you to escape it both on the language level and the regex level ("\\\\"): this passes \\ to the regex engine, which parses it as a literal \ itself.
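The four ways of escaping ( mentioned in the answers above can be checked against the string from the question. This is a sketch under the assumption that all that matters is whether matches() succeeds; ParenEscapeDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ParenEscapeDemo {
    static final String SAMPLE = "12B - (conflicted copy 2013-11-16-11-07-12)";

    // Backslash escape: "\\(" in source is the regex \(
    static boolean withBackslash(String s) {
        return s.matches(".*\\(conflicted.*");
    }

    // Character class: inside [...] the ( is an ordinary character
    static boolean withCharClass(String s) {
        return s.matches(".*[(]conflicted.*");
    }

    // Pattern.quote() applied only to the part that must be literal
    static boolean withQuote(String s) {
        return s.matches(".*" + Pattern.quote("(") + "conflicted.*");
    }

    // Inline quotation with \Q ... \E
    static boolean withInlineQuote(String s) {
        return s.matches(".*\\Q(\\Econflicted.*");
    }

    public static void main(String[] args) {
        System.out.println(withBackslash(SAMPLE));
        System.out.println(withCharClass(SAMPLE));
        System.out.println(withQuote(SAMPLE));
        System.out.println(withInlineQuote(SAMPLE));
    }
}
```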

How do you match military time? [duplicate]

This question already has answers here:
Invalid escape sequence \d
(2 answers)
Closed 10 years ago.
I'm trying to create a valid Java regex for matching strings representing standard "military time":
String militaryTimeRegex = "^([01]\d|2[0-3]):?([0-5]\d)$";
This gives me a compiler error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
Where am I going wrong?!?
Make sure you use double backslashes for escaping characters:
String militaryTimeRegex = "^([01]\\d|2[0-3]):?([0-5]\\d)$";
Single backslashes indicate the beginning of an escape sequence. You need to use \\ to get the character as it appears in the String.
To answer your comment, you are currently only matching 19:00. You need to account for the additional :00 at the end of the String in your pattern:
String militaryTimeRegex = "^([01]\\d|2[0-3]):?([0-5]\\d):?([0-5]\\d)$";
In Java, you need to double-escape all the \ characters:
String militaryTimeRegex = "^([01]\\d|2[0-3]):([0-5]\\d):([0-5]\\d)$";
Why? because \ is the escape character for strings, and if you need a literal \ to appear somewhere inside a string, then you have to escape it, too: \\.
According to the error message, \d is not a valid escape sequence in a string literal. Escape the backslash: \\d
Although \d is valid regex syntax, you need to escape the backslash in the Java string:
String militaryTimeRegex = "^([01]\\d|2[0-3]):?([0-5]\\d)$";
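For reference, a small sketch that compiles the corrected four-digit pattern and tries it on a few inputs. MilitaryTimeDemo and isMilitaryTime are made-up names, and the test strings are arbitrary examples rather than anything from the question.

```java
import java.util.regex.Pattern;

public class MilitaryTimeDemo {
    // In Java source, \\d is the regex \d (any digit)
    static final Pattern MILITARY_TIME =
            Pattern.compile("^([01]\\d|2[0-3]):?([0-5]\\d)$");

    static boolean isMilitaryTime(String s) {
        return MILITARY_TIME.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hours 00-23, optional colon, minutes 00-59
        for (String s : new String[] {"19:00", "1900", "00:59", "24:00", "19:60"}) {
            System.out.println(s + " -> " + isMilitaryTime(s));
        }
    }
}
```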

In regular expression, how can we match the character "." itself? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
regular expression for DOT
Say I have a String:
String domain = "www.example.com";
To extract the word "example" I am using the split function in java
String[] keys = domain.split(".");
String result = keys[1];
Clearly this is wrong because the "." is a wrong regular expression since it matches any character.
What is the escape sequence which matches specifically the character "."?
Though this question seems trivial, I can't seem to find any quick reference or previous answers. Thanks.
By escaping it as follows:
\\.
Use \\.. You need to escape it.
You can get the regular expression for any literal string by using Pattern.quote().
Pattern.quote(".") evaluates to the string \Q.\E (written "\\Q.\\E" as a Java literal), which matches a literal dot.
In this case it would probably be clearer just to use \\.
You can escape . by prefixing it with \\, giving \\. in Java source. The reason is that the string literal "\\" is a single backslash, and in regular expressions the backslash is also the escape character, so the regex engine receives \. and matches a literal dot. The regular expression \\ matches a single backslash.
You can escape the . character by using \\. or using the brackets [.].
Hence your code becomes:
String[] keys = domain.split("\\."); // or domain.split("[.]");
String result = keys[1];
Or you could create a class containing the dot, without escaping:
[.]
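A short sketch comparing the suggestions from this thread on the domain string in the question, along with the unescaped version for contrast. DotSplitDemo and its method names are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DotSplitDemo {
    // "\\." in source is the regex \. (a literal dot)
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Inside a character class the dot has no special meaning
    static String[] splitCharClass(String s) {
        return s.split("[.]");
    }

    // Pattern.quote() wraps the dot in \Q...\E
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    // Unescaped dot: every character is a delimiter, so nothing is left
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        String domain = "www.example.com";
        System.out.println(Arrays.toString(splitEscaped(domain)));
        System.out.println(Arrays.toString(splitCharClass(domain)));
        System.out.println(Arrays.toString(splitQuoted(domain)));
        System.out.println(splitUnescaped(domain).length);
    }
}
```

Note that with the unescaped dot the result array is empty, so keys[1] in the question's code would throw an ArrayIndexOutOfBoundsException rather than return a wrong word.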
