Escaping a String from getting regex parsed in Java - java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".

String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.

As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by #diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using matches() to test whether a string is inside a larger string, you should put .* at the start and end of the expression, because matches() only succeeds when the whole string matches. But this will not work if the .* ends up inside the quoted part (as it always does with Pattern.LITERAL), since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
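If the substring test really does have to go through the regex API, one way around the .* problem is to use Matcher.find(), which looks for a match anywhere in the input. The following is only a minimal sketch; the class name LiteralSearch and the helper containsLiteral are made up for illustration and are not from the posts above.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // Hypothetical helper: true if s occurs anywhere in t, with s taken literally.
    static boolean containsLiteral(String t, String s) {
        // Pattern.quote turns every character of s into a literal.
        // find() searches for a match anywhere in t, so no .* is needed.
        return Pattern.compile(Pattern.quote(s)).matcher(t).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("say [hi there", "[hi")); // prints true
        System.out.println(containsLiteral("say hi there", "[hi"));  // prints false
        System.out.println(containsLiteral("abc", "a.c"));           // prints false, the dot is literal
    }
}
```

For a plain substring test, T.contains(S) gives the same answer without any regex machinery.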

Try Pattern.quote(String). It will fix up anything that has special meaning in the string.

Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.

Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character in string literals, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) {

T.contains() (according to javadoc : http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method ?

Related

Java: How to escape all regex metacharacters in a given String?

I'm looking for a utility method in Java that will escape all regex metacharacters in a given String.
I want to convert this:
foo.bar(baz)
Into this:
foo\.bar\(baz\)
So that I can take any sample string and convert it into a regex-friendly search pattern. Surely one must exist, but I cannot seem to find anything.
(Pattern.quote(String s) offers something similar to what I need, but not the exact same functionality.)
Pattern.quote(String s) does exactly what you want.
Calling Pattern.quote("foo.bar(baz)") returns "\Qfoo.bar(baz)\E", which matches exactly the same as the pattern "foo\.bar\(baz\)".
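To see the equivalence in practice, here is a small sketch. The class name QuoteDemo and the helper matchesLiterally are hypothetical names chosen for this example only.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true if input is exactly the literal text,
    // with no character of literal treated as a regex metacharacter.
    static boolean matchesLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("foo.bar(baz)")); // prints \Qfoo.bar(baz)\E

        // Unquoted, '.' matches any character and '(baz)' is a group.
        System.out.println("fooXbarbaz".matches("foo.bar(baz)"));             // prints true
        // Quoted, only the exact text matches.
        System.out.println(matchesLiterally("fooXbarbaz", "foo.bar(baz)"));   // prints false
        System.out.println(matchesLiterally("foo.bar(baz)", "foo.bar(baz)")); // prints true
    }
}
```

The quoted form is not character-for-character the same as the hand-escaped foo\.bar\(baz\), but it can be passed to Pattern.compile, String.matches, or String.split in the same way.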

What all characters can be used as String Delimiters in Java?

I am trying to break a String into various pieces using a delimiter (":").
String sepIds[]=ids.split(":");
It is working fine. But when I replace ":" with " * " and use " * " as delimiter, it doesn't work.
String sepIds[]=ids.split("*"); //doesn't work
It just hangs up there, and doesn't execute further.
What mistake am I making here?
String#split takes a regular expression as parameter. In regex some chars have special meanings so they need to be escaped, for example:
"foo*bar".split("\\*")
the result will be as you expect:
[foo, bar]
You could also use the method Pattern#quote to simplify the task.
"foo*bar".split(Pattern.quote("*"))
String.split expects a regular expression argument, and * has a special meaning in regex. So if you want to use it as a literal delimiter, you need to escape it like this:
String sepIds[]=ids.split("\\*");
The argument of .split() is a regular expression, not a string literal. Therefore you need to escape * since it is a special regex character. Write:
ids.split("\\*");
This is how you would split against one or more whitespace characters:
ids.split("\\s+");
Note that Guava has Splitter which is very, very fast and can split against literals:
Splitter.on('*').split(ids);
'*' and '.' are special characters, so you have to escape them with a backslash.
String sepIds[]=ids.split("\\*");
To read more about java patterns please visit that page.
That is expected behaviour. The documentation for the String split function says that the input string is treated as a regular expression (with a link explaining how that works). As Germann points out, '*' is a special character in regular expressions.
Java's String.split() uses regular expressions to split up the string (unlike similar functions in C# or python). * is a special character in regular expressions and you need to escape it with a \ (backslash). So you should use instead:
String sepIds[]=ids.split("\\*");
You can find more information on regular expressions anywhere on the internet. A fairly complete list of the special characters supported by Java should be here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
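Pulling the answers above together, here is a minimal sketch of splitting on a literal delimiter. SplitDemo, splitOnLiteral and isValidRegex are hypothetical names for this example. As far as I know, an unescaped "*" is rejected by Pattern.compile with a PatternSyntaxException, which is probably what the "hang" in the question really was, but that part is a guess about the asker's setup.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitDemo {
    // Hypothetical helper: split on the delimiter taken literally,
    // whatever regex metacharacters it contains.
    static String[] splitOnLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    // Hypothetical helper: reports whether a string compiles as a regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLiteral("1*2*3", "*"))); // prints [1, 2, 3]
        System.out.println(Arrays.toString("1*2*3".split("\\*")));         // prints [1, 2, 3]
        System.out.println(isValidRegex("*"));                             // prints false
    }
}
```

Pattern.quote is the safer choice when the delimiter comes from user input or configuration, since you do not need to know in advance which characters are special.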

Replace a String containing "$" with "\$" in Java

How can I do it? I did some research but could not find a clear answer. I tried to use
pass = pass.replaceAll("$", "\\$");
but it does not work.
Use
pass = pass.replace("$", "\\$");
It will also replace all occurrences. See JavaDoc.
If you prefer the hard way and want to use a regex, you need:
pass = pass.replaceAll("\\$", "\\\\\\$");
This can be simplified with Matcher.quoteReplacement() but still, only use replaceAll() when you need to replace something that matches a regular expression, and use replace() when you have to replace a literal sequence.
The problem is that String.replaceAll uses regular expressions, where both \ and $ have special meanings. You don't want that as far as I can tell - you just want to replace the strings verbatim. As such, you should use String.replace:
pass = pass.replace("$", "\\$");
(Personally I think the fact that replaceAll uses regular expressions is a design mistake, but that's another matter.)
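For comparison, here is a small sketch of both routes side by side. The class name DollarEscape and its two methods are hypothetical. The original attempt fails because, as a regex, "$" is an end-of-input anchor rather than a dollar sign, and in the replacement string both \ and $ are special.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarEscape {
    // Literal replacement: no regex involved on either side.
    static String viaReplace(String pass) {
        return pass.replace("$", "\\$");
    }

    // Regex replacement: quote the pattern AND the replacement,
    // since each has its own set of special characters.
    static String viaReplaceAll(String pass) {
        return pass.replaceAll(Pattern.quote("$"), Matcher.quoteReplacement("\\$"));
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("a$b$"));    // prints a\$b\$
        System.out.println(viaReplaceAll("a$b$")); // prints a\$b\$
    }
}
```

Both methods give the same result here; replace() is simply the shorter way to say it when nothing needs to be matched as a pattern.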

Refactor Regex Pattern - Java

I have the string aaaa_bb_cc to match and have written a regex pattern like
\\w{4}+\\_\\w{2}\\_\\w{2} and it works. Is there a simpler regex which can do the same?
You don't need to escape the underscores:
\w{4}+_\w{2}_\w{2}
And you can collapse the last two parts, if you don't capture them anyway:
\w{4}+(?:_\w{2}){2}
Doesn't get shorter, though.
(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
I sometimes do what I call "meta-regexing" as follows:
String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"
Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".
If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
Other examples
Regex for metamap in Java
How do I convert CamelCase into human-readable names in Java?
Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.
Looks like your \w does not need to match underscore, so you can use [a-zA-Z0-9] instead
[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
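The difference between \w and an explicit character class is easy to check. This is just a sketch; WordCharDemo and its constants are hypothetical names, and the explicit pattern is built with the "meta-regexing" trick from the answer above.

```java
class WordCharDemo {
    // \w also matches '_', so separators and "letters" can blur together.
    static final String WITH_W = "\\w{4}_\\w{2}_\\w{2}";

    // Build the explicit version by substituting the class in one place.
    static final String EXPLICIT = "x{4}_x{2}_x{2}".replace("x", "[a-zA-Z0-9]");

    public static void main(String[] args) {
        // Ten underscores: 4 + 1 + 2 + 1 + 2.
        String underscores = "____" + "_" + "__" + "_" + "__";

        System.out.println("aaaa_bb_cc".matches(WITH_W));   // prints true
        System.out.println(underscores.matches(WITH_W));    // prints true
        System.out.println("aaaa_bb_cc".matches(EXPLICIT)); // prints true
        System.out.println(underscores.matches(EXPLICIT));  // prints false
    }
}
```

Whether the all-underscore input should be accepted depends on the data, so the shorter \w form may still be the right choice for you.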

Is using "\\\\" to match '\' with Regex in Java the most Readable Way?

I know that the following works but it is not that readable, is there any way to make it more readable in the code itself without the addition of a comment?
//Start her off
String sampleregex = "\\\\";
if (input.matches(sampleregex))
//do something
//do some more
Why not
if (input.contains("\\")) {
}
since you simply appear to be looking for a backward slash ?
Assuming you mean "\\\\" instead of "////":
You could escape it with \Q and \E, which removes one layer of backslashes: "\\Q\\\\E", but that's not that much better. You could also use Pattern.quote("\\") to have it escaped at runtime. But personally, I'd just stick with "\\\\".
(As an aside, you need four of them because \ is used to escape things in both the regex engine and in Java Strings, so you need to escape once so the regex engine knows you're not trying to escape anything else (so that's \\); then you need to escape both of those so Java knows you're not escaping something in the string (so that's \\\\)).
/ is not a regex metacharacter, so the regex string "/" matches a single slash, and "////" matches four in a row.
I imagine you meant to ask about matching a single backslash, rather than a forward slash, in which case, no, you need to put "\\\\" in your regex string literal to match a single backslash. (And I needed to enter eight to make four show up on SO--damn!)
My solution is similar to Soldier.moth's but with a twist. Create a constants file which contains common regular expressions and keep adding to it. The expressions as constants can even be combined, providing a layer of abstraction for building regular expressions, but in the end they still often end up messy.
public static final String SINGLE_BACKSLASH = "\\\\";
The one solution I've thought of is to do
String singleSlash = "\\\\";
if(input.matches(singleSlash))
//...
Using better names for your variables and constants, and composing them step by step is a good way to do without comments, for example:
final String backslash = "\\";
final String regexEscapedBackslash = backslash + backslash;
if (input.matches(regexEscapedBackslash)) {
...
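To put the options from this thread next to each other, here is a minimal sketch. BackslashDemo and its constant names are hypothetical; the point is only that the named constants carry the explanation a comment would otherwise have to.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // A Java literal "\\" is a single backslash character.
    static final String BACKSLASH = "\\";

    // Hand-escaped regex for one backslash: two characters, written "\\\\".
    static final String BACKSLASH_REGEX = BACKSLASH + BACKSLASH;

    // Same meaning, escaped at runtime instead of by hand.
    static final String QUOTED_BACKSLASH_REGEX = Pattern.quote(BACKSLASH);

    public static void main(String[] args) {
        System.out.println(BACKSLASH.length());                       // prints 1
        System.out.println(BACKSLASH_REGEX.length());                 // prints 2
        System.out.println(BACKSLASH.matches(BACKSLASH_REGEX));       // prints true
        System.out.println(BACKSLASH.matches(QUOTED_BACKSLASH_REGEX)); // prints true
        // No regex needed if the goal is just "does it contain a backslash".
        System.out.println("C:\\temp".contains(BACKSLASH));           // prints true
    }
}
```

Keep in mind that matches() tests the whole input, so input.matches(BACKSLASH_REGEX) is true only when the input is exactly one backslash.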
