I am having difficulty using \b as a word delimiter in Java regex.
For
String text = "/* sql statement */ INSERT INTO someTable";
Pattern.compile("(?i)\binsert\b"); // no match found
Pattern insPtrn = Pattern.compile("\bINSERT\b"); // no match found
but
Pattern insPtrn = Pattern.compile("INSERT"); // finds a match
Any idea what I am doing wrong?
When writing regular expressions in Java, you need to be sure to escape all of the backslashes, so the regex \bINSERT\b becomes "\\bINSERT\\b" as a Java string.
If you do not escape the backslash, then the \b in the string literal is interpreted as a backspace character.
Use this instead:
Pattern insPtrn = Pattern.compile("\\bINSERT\\b");
You need to escape \b with an extra backslash.
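A minimal sketch of the difference (the class and method names are made up for illustration): the single-backslash literal compiles to a pattern that contains backspace characters, while the doubled form hands the regex engine a real \b word boundary.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // "\b" in a Java string literal is one backspace character (U+0008), so the regex
    // engine sees BACKSPACE + "INSERT" + BACKSPACE and cannot match the SQL text.
    static final Pattern BROKEN = Pattern.compile("\bINSERT\b");
    // "\\b" puts a backslash followed by 'b' into the string: the regex word boundary.
    static final Pattern FIXED = Pattern.compile("\\bINSERT\\b");
    // Same fix applied to the case-insensitive variant from the question.
    static final Pattern FIXED_CI = Pattern.compile("(?i)\\binsert\\b");

    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "/* sql statement */ INSERT INTO someTable";
        System.out.println(found(BROKEN, text));   // false
        System.out.println(found(FIXED, text));    // true
        System.out.println(found(FIXED_CI, text)); // true
        // The escape really is a single character, the doubled form is two:
        System.out.println("\b".length() + " vs " + "\\b".length()); // 1 vs 2
    }
}
```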
Is there any difference between using the regular expression \b in Java and in JavaScript?
I tried the following test.
In JavaScript:
console.log(/\w+\b/.test("test中文"));//true
In Java:
String regEx = "\\w+\\b";
String text = "test中文";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println("matched");//never executed
}
Why are the results of the two examples above not the same?
That is because, by default, Java supports Unicode for \b but not for \w, while JavaScript supports Unicode for neither.
So \w can only match [a-zA-Z0-9_] characters (in our case, test), but \b does not accept the position marked with |
test|中文
as a boundary between alphabetic and non-alphabetic characters, because by the Unicode standard both t and 中 are alphabetic.
If you want a \b that ignores Unicode, you can use the look-around mechanism and rewrite it as (?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)); in the case of this example, a simple (?!\\w) instead of \\b will also work.
If you want \w to support Unicode as well, compile your pattern with the Pattern.UNICODE_CHARACTER_CLASS flag (which can also be written as the embedded flag expression (?U)).
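A small sketch comparing the suggestions above on the string from the question (the class and helper names are hypothetical). It deliberately avoids plain \b without (?U): as far as I know, Java 19 changed \b to be defined consistently with \w, so the original \w+\b example can behave differently depending on the JDK version, whereas the look-around boundary and the (?U) variant should give the same result on any version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Boundary built from look-arounds: a \w character on exactly one side.
    // Without (?U), \w is ASCII-only, so this behaves like JavaScript's \b.
    static final String ASCII_BOUNDARY = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "test\u4e2d\u6587"; // "test中文", written with Unicode escapes
        System.out.println(firstMatch("\\w+" + ASCII_BOUNDARY, text)); // test
        System.out.println(firstMatch("\\w+(?!\\w)", text));           // test
        // (?U) makes \w (and \b) Unicode-aware, so the CJK letters count as word chars.
        System.out.println(firstMatch("(?U)\\w+\\b", text));           // test中文
    }
}
```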
The Java regex looks for a sequence of word characters, i.e. [a-zA-Z_0-9]+, preceding a word boundary. But 中文 doesn't fit \w. If you use \\b alone, you'll find two matches: at the beginning and at the end of the string.
As georg pointed out, JavaScript doesn't interpret characters the same way as Java's regex engine.
The following regular expression matches the character a:
"a"
The following regular expression matches all characters except a:
"[^a]"
The following regular expression matches a ton of characters:
"."
How do I match everything that is not matched by "."? I can't use the same technique as above:
"[^.]"
because inside the brackets, the . changes meaning and only stands for the character . itself :(
The following negative lookahead will work:
(?:(?!.)[\S\s])
As a Java string literal, the regex would be:
"(?:(?!.)[\\S\\s])"
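The idea behind this regex is that it matches only the characters that a dot does not match: with DOTALL mode off, those are the line terminators \n, \r, \u0085, \u2028 and \u2029. Characters such as \t and \f are matched by ., so this regex does not match them.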
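To check which characters the regex actually reports, here is a short sketch (the names are illustrative) that scans a string containing a tab, a form feed and each Java line terminator, assuming default flags (no DOTALL, no UNIX_LINES):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotDotDemo {
    // Any single character ([\S\s]) that '.' would NOT match, checked with a
    // negative lookahead. Without DOTALL, '.' excludes only line terminators.
    static final Pattern NOT_DOT = Pattern.compile("(?:(?!.)[\\S\\s])");

    // Collects the code point of every character in text that '.' refuses to match.
    static List<Integer> nonDotChars(String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = NOT_DOT.matcher(text);
        while (m.find()) {
            result.add(m.group().codePointAt(0));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a\tb\fc\nd\re\u0085f\u2028g\u2029h";
        // Tab and form feed are matched by '.', so they are not reported.
        for (int cp : nonDotChars(text)) {
            System.out.printf("U+%04X%n", cp); // U+000A, U+000D, U+0085, U+2028, U+2029
        }
    }
}
```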
"[^\\.]"
Use a double backslash for characters that have a special meaning in regex, for example:
\\.\\]\\[\\-\\)\\(\\?
I want to match \Q and \E in a Java regex.
I am writing a program that computes the length of a string matching a pattern (the program assumes the regex contains no quantifiers other than {some number}, so the length of the string is uniquely defined), and as a first step I want to delete all expressions like \Qsome text\E.
But a regex like this:
"\\Q\\Q\\E\\Q\\E\\E"
obviously doesn't work.
Use Pattern.quote(...):
String s = "\\Q\\Q\\E\\Q\\E\\E";
String escaped = Pattern.quote(s);
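A sketch of using the quoted form, assuming the goal is to match the text \Q\Q\E\Q\E\E literally (the class and method names are hypothetical). Pattern.quote copes with the \E sequences embedded in the input, which a hand-written "\\Q" + s + "\\E" wrapper would not:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Builds a pattern that matches s literally, even though s itself contains
    // \Q and \E sequences; Pattern.quote handles any embedded \E.
    static boolean matchesLiterally(String s, String input) {
        return Pattern.compile(Pattern.quote(s)).matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "\\Q\\Q\\E\\Q\\E\\E";  // the text \Q\Q\E\Q\E\E
        System.out.println(Pattern.quote(s)); // the escaped form, for inspection
        System.out.println(matchesLiterally(s, s));        // true
        System.out.println(matchesLiterally(s, "\\Q\\E")); // false
    }
}
```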
Just escape the backslashes. The sequence \\\\ matches a literal backslash, so to match a literal \Q:
"\\\\Q"
and to match a literal \E:
"\\\\E"
You can make it more readable for a maintainer by making it obvious that each sequence matches a single character using [...] as in:
"[\\\\][Q]"
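For the original goal of deleting \Q...\E sections from a regex, the escaped sequences above can be combined into a search pattern. This is a hypothetical helper and only a sketch: it mirrors Java's rule that a quote ends at the first \E (or at the end of the input if there is none), but it does not notice a \Q whose backslash is itself escaped, as in the regex text \\Q.

```java
import java.util.regex.Pattern;

public class StripQuotedDemo {
    // A literal backslash + 'Q', then as little as possible up to the first literal
    // backslash + 'E', or the end of input (an unterminated \Q runs to the end).
    // DOTALL lets the quoted text span line terminators.
    private static final Pattern QUOTED =
            Pattern.compile("[\\\\][Q].*?(?:[\\\\][E]|\\z)", Pattern.DOTALL);

    // Hypothetical helper: removes every \Q...\E section from the regex text.
    static String stripQuoted(String regex) {
        return QUOTED.matcher(regex).replaceAll("");
    }

    public static void main(String[] args) {
        // The Java string below is the regex text a\Qb.c\Ed{3}
        System.out.println(stripQuoted("a\\Qb.c\\Ed{3}")); // ad{3}
        System.out.println(stripQuoted("x\\Q[y"));          // x
    }
}
```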
I want to replace all whitespace characters in a string with "+" and all "ß" with "ss". It works well for "ß", but somehow Eclipse won't let me use \s for whitespace. I tried "\t" instead, but it doesn't work either. I get the following error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
This is my code:
try {
String temp1 = from.getText().toString();
start_from = temp1.replaceAll("ß", "ss");
start_from = start_from.replaceAll("\s", "+");
}
Why doesn't it work? Is it a problem with Android, Eclipse, or something else?
Thanks in advance!
You need to escape the backslash:
start_from = start_from.replaceAll("\\s", "+");
The problem is that \ is an escape character in Java string literals as well as in regex patterns. Say you want to match the regex pattern \n, and you go ahead and write
replaceAll("\n", "+");
The regex pattern would not end up being \n: it would end up being an actual newline, since that's what "\n" means in Java. If you want the pattern to contain a backslash, you need to escape that backslash so that it is not treated as a special character within the string.
replaceAll("\\s", "+");
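A sketch of the two escaping levels (the class name is hypothetical). One more detail worth knowing: since Java 15, \s is itself a valid string escape meaning a single space, so on newer compilers "\s" compiles without error but only matches the space character, not tabs or newlines.

```java
import java.util.regex.Pattern;

public class EscapeLevelsDemo {
    // True if the given pattern string matches a single line feed character.
    static boolean matchesLineFeed(String pattern) {
        return Pattern.matches(pattern, "\n");
    }

    public static void main(String[] args) {
        String newlineChar = "\n";  // 1 char: an actual line feed
        String regexEscape = "\\n"; // 2 chars: a backslash and 'n'
        System.out.println(newlineChar.length() + " " + regexEscape.length()); // 1 2
        // Both match a line feed, for different reasons: the first is a literal
        // newline character, the second is a regex escape the engine interprets.
        System.out.println(matchesLineFeed(newlineChar)); // true
        System.out.println(matchesLineFeed(regexEscape)); // true
        // "\\s" reaches the engine as \s, which matches any whitespace character.
        System.out.println("a b\tc".replaceAll("\\s", "+")); // a+b+c
    }
}
```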
You can use the java.util.regex.Pattern class with something like p = Pattern.compile("\\s"); in combination with p.matcher(start_from).replaceAll("+"). Either way, the \s metacharacter has to be written as "\\s" in the Java string.
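Putting the answers together for the code in the question, a sketch with a precompiled pattern (the class and method names are hypothetical, and "ß" is written as the escape \u00df only to keep the source file ASCII):

```java
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Compiled once and reused; the Java literal "\\s" reaches the engine as \s.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Hypothetical helper mirroring the question: eszett -> "ss", whitespace -> "+".
    static String convert(String from) {
        // String.replace is a literal (non-regex) replacement, which is all the eszett needs.
        String withoutEszett = from.replace("\u00df", "ss");
        return WHITESPACE.matcher(withoutEszett).replaceAll("+");
    }

    public static void main(String[] args) {
        System.out.println(convert("Gro\u00dfe Stra\u00dfe\tNr 5")); // Grosse+Strasse+Nr+5
    }
}
```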
I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have split it on the commas. Now I'm trying to match these regexes against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket; however, you must use two if the pattern is stored in a string. The first backslash escapes the second one in the string literal, so what the regex engine sees is \]. Since it sees only one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, though, a single backslash is enough: the string escape inserts an actual newline character into the string, so the regex engine doesn't see \n, it sees the newline character itself and matches that. For \] you need two backslashes because it's not a string escape sequence; it's a regex escape sequence.
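If you really do want the function the question asks for, a hypothetical helper could look like this. It is a sketch that only escapes [ and ]; other metacharacters such as . or { would still be active, which is why Pattern.quote is usually the safer choice:

```java
public class EscapeBracketsDemo {
    // Hypothetical helper: puts a backslash in front of every '[' and ']'.
    // The regex ([\[\]]) captures one bracket; the replacement \\$1 is a
    // literal backslash followed by the captured bracket.
    static String escapeBrackets(String s) {
        return s.replaceAll("([\\[\\]])", "\\\\$1");
    }

    public static void main(String[] args) {
        String escaped = escapeBrackets("[0-9]");
        System.out.println(escaped);                  // \[0-9\]
        System.out.println("[0-9]".matches(escaped)); // true: the brackets are literal now
        System.out.println("5".matches(escaped));     // false
    }
}
```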
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special sequences: anything between \Q and \E is automatically escaped.
\Q[0-9]\E
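In Java source the backslashes have to be doubled again. A sketch (hypothetical names) showing that the quoted pattern matches the literal text [0-9] rather than a digit, and that this is the same string Pattern.quote returns for this input:

```java
import java.util.regex.Pattern;

public class QuoteSectionDemo {
    // The regex \Q[0-9]\E, with each backslash doubled for the Java string literal.
    // Everything between \Q and \E is literal, so the brackets lose their meaning.
    static final Pattern LITERAL_CLASS = Pattern.compile("\\Q[0-9]\\E");

    static boolean isLiteralBracketText(String s) {
        return LITERAL_CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLiteralBracketText("[0-9]")); // true
        System.out.println(isLiteralBracketText("7"));     // false
        // For input without an embedded \E, Pattern.quote builds the same string.
        System.out.println(Pattern.quote("[0-9]").equals("\\Q[0-9]\\E")); // true
    }
}
```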
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
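A sketch of the use case described in the question, assuming each rule means "the password contains at least one match" (so Matcher.find() rather than matches()); the class and method names are made up. Splitting on "," is only safe here because none of the individual regexes contains a comma.

```java
import java.util.regex.Pattern;

public class PasswordRulesDemo {
    // The comma-separated rule list from the question, used unescaped.
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // True if every rule matches somewhere in the password.
    static boolean satisfiesAll(String password) {
        for (String rule : RULES.split(",")) {
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll("Secret#2024")); // true
        System.out.println(satisfiesAll("secret2024"));  // false: no uppercase, no symbol
        System.out.println(satisfiesAll("Ab1#"));        // false: shorter than 8
    }
}
```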