How to escape a square bracket for Pattern compilation? - java

I have a comma-separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have split the list on the commas. Now I'm trying to match each regex against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?

For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket; however, you must use two if the pattern is written as a Java string literal. The first backslash escapes the second in the literal, so what the regex engine sees is \]. Since the regex engine sees just one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, on the other hand, a single backslash is enough. There you are using the string escape sequence "\n" to insert a newline character into the string. The regex engine doesn't see \n; it sees the newline character and matches that. The bracket needs two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
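The difference between the two kinds of escape can be checked directly. The following is a minimal sketch; the class and method names are made up for illustration. It prints the length of each literal and confirms that both compile to patterns matching the intended single character.

```java
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // The Java literal "\\]" holds two characters: a backslash and ']'.
    static final String CLOSING_BRACKET = "\\]";
    // The Java literal "\n" holds one character: the newline itself.
    static final String NEWLINE = "\n";

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(CLOSING_BRACKET.length());           // 2
        System.out.println(NEWLINE.length());                   // 1
        System.out.println(matchesWhole(CLOSING_BRACKET, "]")); // true
        System.out.println(matchesWhole(NEWLINE, "\n"));        // true
    }
}
```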

You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
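As a rough illustration of what that means in practice, here is a small sketch (the class and helper names are invented for this example). It compares a quoted pattern with the same text compiled as an ordinary regex.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: does input equal text when text is treated literally?
    static boolean literalMatch(String text, String input) {
        return Pattern.compile(Pattern.quote(text)).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Quoted: "[0-9]" only matches the five characters [0-9].
        System.out.println(literalMatch("[0-9]", "[0-9]")); // true
        System.out.println(literalMatch("[0-9]", "5"));     // false
        // Unquoted: "[0-9]" is a character class that matches one digit.
        System.out.println(Pattern.matches("[0-9]", "5"));  // true
    }
}
```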

You can use the \Q and \E escape sequences: anything between \Q and \E is treated as literal text.
\Q[0-9]\E

Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
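To make that concrete, here is a minimal sketch of checking a password against each of the five rules. The class name, method name, and sample passwords are assumptions for illustration. It uses find() rather than matches(), on the assumption that each rule only needs to be satisfied somewhere in the password.

```java
import java.util.regex.Pattern;

class PasswordRuleDemo {
    // The rule list from the question, unescaped.
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // Returns true only if every rule finds a match somewhere in the password.
    static boolean satisfiesAll(String password) {
        for (String rule : RULES.split(",")) {
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll("Passw0rd!")); // true
        System.out.println(satisfiesAll("password"));  // false: no digit, symbol, or upper case
        System.out.println(satisfiesAll("Ab1!"));      // false: shorter than 8 characters
    }
}
```

Note that splitting on commas only works while no rule contains a comma itself; a quantifier such as {8,} would be cut in two.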

Related

Java Regex Escape Characters

I'm learning Regex, and running into trouble in the implementation.
I found the RegexTestHarness on the Java Tutorials, and running it, the following string correctly identifies my pattern:
[\d|\s][\d]\.
(My pattern is any double digit, or any single digit preceded by a space, followed by a period.)
That string is obtained by this line in the code:
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
When I try to write a simple class in Eclipse, it tells me the escape sequences are invalid, and won't compile unless I change the string to:
[\\d|\\s][\\d]\\.
In my class I'm using Pattern pattern = Pattern.compile();
When I put this string back into the TestHarness it doesn't find the correct matches.
Can someone tell me which one is correct? Is the difference in some formatting from console.readLine()?
\ is a special character in string literals "...". It is used to escape other special characters, or to create characters like \n \r \t.
To create a \ character in a string literal that the regex engine can use, you need to escape it by adding another \ before it (just as you do in a regex when you need to escape a metacharacter like the dot: \.). So the string literal representing \ looks like "\\".
This problem doesn't exist when you read data from the user, because you are reading the characters as typed rather than a string literal, so even if the user types \n in the console it will be interpreted as two characters, \ and n.
Also, there is no point in adding | inside a character class [...] unless your intention is to make that class also match the | character. Remember that [abc] is the same as (a|b|c), so there is no need for | in "[\\d|\\s]".
If you want to represent a backslash in a Java string literal you need to escape it with another backslash, so the string literal "\\s" is two characters, \ and s. This means that to represent the regular expression [\d\s][\d]\. in a Java string literal you would use "[\\d\\s][\\d]\\.".
Note that I also made a slight modification to your regular expression: [\d|\s] will match a digit, whitespace, or the literal | character. You just want [\d\s]. A character class already means "match one of these"; since you don't need | for alternation within a character class, it loses its special meaning there.
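The effect of the stray | is easy to verify. A small sketch follows (the class and method names are made up for illustration), comparing the two character classes against a literal pipe, a digit, and a space.

```java
import java.util.regex.Pattern;

class CharClassPipeDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // With the pipe inside the class, a literal '|' is accepted.
        System.out.println(matches("[\\d|\\s]", "|")); // true
        // Without it, only digits and whitespace are accepted.
        System.out.println(matches("[\\d\\s]", "|"));  // false
        System.out.println(matches("[\\d\\s]", "7"));  // true
        System.out.println(matches("[\\d\\s]", " "));  // true
    }
}
```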
"My pattern is any double digit or single digit preceded by a space, followed by a period."
The correct regex would be:
Pattern pattern = Pattern.compile("(\\s\\d|\\d{2})\\.");
Also, if you are taking a string from user input and want it to be matched literally, you should call:
Pattern.quote(useInputRegex);
This escapes all the regex special characters.
Also, you need the double escaping because one level of escaping is handled by the Java string literal and the second is passed on to the regex engine.
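For reference, here is a minimal sketch that runs the pattern (\\s\\d|\\d{2})\\. from this answer over a sample sentence. The class name, helper, and sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitPeriodDemo {
    // A space plus one digit, or two digits, followed by a literal period.
    static final Pattern PATTERN = Pattern.compile("(\\s\\d|\\d{2})\\.");

    // Collects every match found in the input, in order.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Finds "12." and " 7." but not "9." (preceded by a letter).
        System.out.println(findAll("Items 12. and 7. but not x9."));
    }
}
```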
What is happening is that escape sequences are being evaluated twice: once by Java, and then once by the regex engine.
The result is that you need to escape the escape character when you use a regex escape sequence.
For instance, if you needed a digit, you'd use
"\\d"

Check string contains whitespace along with some other char sequence using regex in java

I am using a regular expression to check whether a string contains whitespace.
My regex is: ^\\s+$
For example, if my string is "my name", the regex match should return true.
But it returns true only if my string contains nothing but spaces.
How can I check whether a string contains a space, tab, or carriage return anywhere in it (start, middle, or end)?
^(.*\s+.*)+$ seems to work for me. Accepts anything as long as there is at least one space in the string. This will match the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
The difference is that without the anchors "^" and "$" there may be other characters around the whitespace character. The whitespace character(s) may be anywhere in the string.
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"
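The anchored and unanchored variants can be compared side by side. This is a minimal sketch with invented class and method names. It uses find() for the unanchored check, since matches() always requires the whole string to match.

```java
import java.util.regex.Pattern;

class WhitespaceCheckDemo {
    static final Pattern ANY_WHITESPACE = Pattern.compile("\\s");

    // True if at least one whitespace character occurs anywhere in s.
    static boolean containsWhitespace(String s) {
        return ANY_WHITESPACE.matcher(s).find();
    }

    // True only if s consists entirely of one or more whitespace characters.
    static boolean onlyWhitespace(String s) {
        return Pattern.compile("^\\s+$").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("my name")); // true
        System.out.println(onlyWhitespace("my name"));     // false
        System.out.println(containsWhitespace("a\tb"));    // true
        System.out.println(containsWhitespace("myname"));  // false
    }
}
```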
Use org.apache.commons.lang3.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.

How to match \Q and \E in Java regex?

I want to match \Q and \E in a Java regex.
I am writing a program that computes the length of any string matching a pattern (the program assumes the regex contains no quantifier other than {some number}, which is why the length of the string is uniquely defined), and I first want to delete all expressions like \Qsome text\E.
But regex like this:
"\\Q\\Q\\E\\Q\\E\\E"
obviously doesn't work.
Use Pattern.quote(...):
String s = "\\Q\\Q\\E\\Q\\E\\E";
String escaped = Pattern.quote(s);
Just escape the backslashes. In a Java string literal, the sequence \\\\ becomes the regex \\, which matches a literal backslash, so to match a literal \Q:
"\\\\Q"
and to match a literal \E:
"\\\\E"
You can make it more readable for a maintainer by making it obvious that each sequence matches a single character using [...] as in:
"[\\\\][Q]"
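Building on those two patterns, one possible way to delete \Q...\E sections from regex source text is sketched below. The class name, method name, and the lazy .*? between the two markers are assumptions for illustration, not part of the answer above. The sketch treats each \Q as closed by the nearest following \E.

```java
class LiteralQEDemo {
    // Removes every section that starts with a literal \Q and ends at the
    // nearest following literal \E. Hypothetical helper for illustration.
    static String stripQuotedSections(String regexSource) {
        return regexSource.replaceAll("\\\\Q.*?\\\\E", "");
    }

    public static void main(String[] args) {
        // The Java literal below is the text: a\Qsome text\Eb
        String source = "a\\Qsome text\\Eb";
        System.out.println(stripQuotedSections(source)); // ab
    }
}
```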

Splitting a string that has escape sequence using regular expression in Java

String to be split
abc:def:ghi\:klm:nop
The string should be split on ":".
"\" is the escape character, so "\:" should not be treated as a delimiter.
split(":") gives
[abc]
[def]
[ghi\]
[klm]
[nop]
Required output is array of string
[abc]
[def]
[ghi\:klm]
[nop]
How can the \: be ignored?
Use a look-behind assertion:
split("(?<!\\\\):")
This will only match a : that has no preceding \. The four backslashes \\\\ are required because one level of escaping is consumed by the string literal and one by the regular expression.
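Applied to the string from the question, the look-behind split can be sketched as follows (the class and method names are made up for illustration).

```java
import java.util.Arrays;

class EscapedColonSplitDemo {
    // Splits on ':' unless it is immediately preceded by a backslash.
    static String[] splitOnUnescapedColon(String s) {
        return s.split("(?<!\\\\):");
    }

    public static void main(String[] args) {
        // The Java literal below is the text: abc:def:ghi\:klm:nop
        String input = "abc:def:ghi\\:klm:nop";
        // Prints [abc, def, ghi\:klm, nop]
        System.out.println(Arrays.toString(splitOnUnescapedColon(input)));
    }
}
```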
Note however that this will not allow you to escape backslashes, in the case that you want to allow a token to end with a backslash. To do that you will have to first replace all double backslashes with
string.replaceAll("\\\\\\\\", ESCAPE_BACKSLASH)
(where ESCAPE_BACKSLASH is a string which will not occur in your input) and then, after splitting using the look-behind assertion, replace the ESCAPE_BACKSLASH string with an unescaped backslash with
token.replaceAll(ESCAPE_BACKSLASH, "\\\\")
Gumbo was right using a look-behind assertion, but in case your string contains the escaped escape character (e.g. \\) right in front of a comma, the split might break. See this example:
test1\,test1,test2\\,test3\\\,test3\\\\,test4
If you do a simple look-behind split on (?<!\\), as Gumbo suggested, the string gets split into only two parts: test1\,test1 and test2\\,test3\\\,test3\\\\,test4. This is because the look-behind checks just one character back for the escape character. The correct behavior would be to split on commas that are preceded by no escape characters or by an even number of them.
To achieve this a slightly more complex (double) look-behind expression is needed:
(?<!(?<![^\\]\\(?:\\{2}){0,10})\\),
Using this more complex regular expression in Java again requires escaping every \ as \\. So this should be a more sophisticated answer to your question:
"any comma separated string".split("(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),");
Note: Java does not support infinite repetitions inside of lookbehinds. Therefore only up to 10 repeating double escape characters are checked by using the expression {0,10}. If needed, you can increase this value by adjusting the latter number.
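If the repetition limit is a concern, an alternative that avoids regular expressions altogether is to scan the string character by character and track whether the previous character was an unconsumed escape. The sketch below is a different technique from the look-behind expression above, and all names in it are invented for illustration. It keeps the escape characters in the tokens and, unlike String.split, retains trailing empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Splits input on delimiter, treating escape + any character as a
    // unit that never ends a token. Escape characters stay in the output.
    static List<String> split(String input, char delimiter, char escape) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false; // true if the previous char was an active escape
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                current.append(c);      // escaped char is always literal
                escaped = false;
            } else if (c == escape) {
                current.append(c);      // keep the escape char in the token
                escaped = true;
            } else if (c == delimiter) {
                tokens.add(current.toString());
                current.setLength(0);   // start the next token
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last token, possibly empty
        return tokens;
    }

    public static void main(String[] args) {
        // Text: test1\,test1,test2\\,test3\\\,test3\\\\,test4
        String s = "test1\\,test1,test2\\\\,test3\\\\\\,test3\\\\\\\\,test4";
        for (String token : split(s, ',', '\\')) {
            System.out.println(token);
        }
    }
}
```

For the sample string this yields four tokens: test1\,test1, then test2\\, then test3\\\,test3\\\\, and finally test4.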

Java regular expression value.split("\\."), "the back slash dot" divides by character?

From what I understand, the backslash dot (\.) means any one character? And because the backslash is an escape character, it has to be written as backslash backslash dot ("\\.").
What does this do to a string? I just saw it in existing code I am working on. From what I understand, it splits the string into individual characters. Why do this instead of String.toCharArray()? So does this split the string into an array of strings, each containing a single character?
My guess is that you are missing that backslash ('\') characters are escape characters in Java String literals. So when you want to use a '\' escape in a regex written as a Java String you need to escape it; e.g.
// Java syntax error: \. is not a valid string escape
Pattern.compile("\.");
// A regex that matches a (any) character
Pattern.compile(".");
// A regex that matches a literal '.' character
Pattern.compile("\\.");
// A regex that matches a literal '\' followed by one character
Pattern.compile("\\\\.");
The String.split(String separatorRegex) method splits a String into substrings separated by substrings matching the regex. So str.split("\\.") will split str into substrings separated by a single literal '.' character.
The regex "." would match any character as you state. However an escaped dot "\." would match literal dot characters. Thus 192.168.1.1 split on "\." would result in {"192", "168", "1", "1"}.
Your wording isn't completely clear, but I think this is what you're asking.
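The two behaviours can be compared in a short sketch (the class and method names are made up for illustration). Splitting on an unescaped dot treats every character as a separator, so only empty strings remain and String.split drops them all.

```java
import java.util.Arrays;

class DotSplitDemo {
    // Splits on literal '.' characters.
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    // Splits on the regex '.', which matches any character.
    static String[] splitOnAnyChar(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        // Prints [192, 168, 1, 1]
        System.out.println(Arrays.toString(splitOnLiteralDot("192.168.1.1")));
        // Prints 0: every token is empty and trailing empties are removed
        System.out.println(splitOnAnyChar("192.168.1.1").length);
    }
}
```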
