So, I have this semi-complex regex that is searching for all text in between two strings, then replacing it.
My search regex for this is:
(jump *[A-Z].*)(?:[^])*?([A-Z].*:)
This gives an "Unclosed character class" error at the final closing bracket, which I have been struggling to solve. The regex seems to work as intended on RegExr (http://regexr.com/?38k63)
Could anyone provide some help or insight?
Thanks in advance.
The error is here:
(jump *[A-Z].*)(?:[^])*?([A-Z].*:)
                   ^
Inside a character class, ^ is still a special character: placed right after the opening bracket, it negates the class. So escape it with \\ in Java.
Different regex engines treat [^] differently. Some, Java among them, take it as the beginning of a negated character class that excludes ] and any characters up to the next ] in the pattern (e.g. in PCRE, [^][] matches anything except ] and [). Others, such as JavaScript, treat it as an empty negated character class, which matches any character. This is why the pattern works in some engines and is reported as an error in others.
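A quick way to see which kind of engine Java is: hand the patterns to java.util.regex.Pattern and see whether they compile. The sketch below is illustrative (the class and method names are made up). As far as I can tell, Java reads the ] right after [^ as a literal, so the class in the original pattern never closes and the parser hits the end of the pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EmptyNegatedClassCheck {
    // Returns true if java.util.regex accepts the pattern; prints the error description otherwise.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println(regex + " -> " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Original pattern: the ] after [^ is taken literally, so the class runs on to the end.
        System.out.println(compiles("(jump *[A-Z].*)(?:[^])*?([A-Z].*:)"));
        // [\s\S] means "any character" without relying on how [^] is parsed.
        System.out.println(compiles("(jump *[A-Z].*)(?:[\\s\\S])*?([A-Z].*:)"));
    }
}
```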
If you meant for it to match a literal ^ character, you'll have to escape it like this:
(jump *[A-Z].*)(?:[\^])*?([A-Z].*:)
Or better yet, just remove it from the character class (you'll still have to escape it because ^ has special meaning outside of a character class, too):
(jump *[A-Z].*)(?:\^)*?([A-Z].*:)
Or if you meant for it to match everything up to the next [A-Z].*:, try a character class like this:
(jump *[A-Z].*)(?:[\s\S])*?([A-Z].*:)
And of course, because this is Java, don't forget that you'll need to escape all the \ characters in any string literals.
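Putting the [\s\S] version into a Java string literal means doubling every backslash. A minimal sketch, assuming the goal is to keep the two marker lines and drop everything between them; the input text and the replacement "$1\n$2" are made up for illustration.

```java
import java.util.regex.Pattern;

public class BetweenMarkers {
    // [\s\S] matches any character, line breaks included; each \ is doubled for the string literal.
    private static final Pattern BETWEEN =
            Pattern.compile("(jump *[A-Z].*)(?:[\\s\\S])*?([A-Z].*:)");

    // Hypothetical replacement: keep both captured lines, drop the text between them.
    static String dropBetween(String input) {
        return BETWEEN.matcher(input).replaceAll("$1\n$2");
    }

    public static void main(String[] args) {
        String input = "jump Start here\nsome text\nmore text\nEnd:\nafter";
        System.out.println(dropBetween(input));
    }
}
```

The lazy *? makes the skipped part stop at the first line that looks like [A-Z].*: rather than the last one.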
The problem seems to be the use of [^] here:
(jump *[A-Z].*)(?:[^])*?([A-Z].*:)
                   ^
Try this regex instead:
(jump *[A-Z].*)[\\s\\S]*?([A-Z].*:)
OR this:
(?s)(jump *[A-Z].*).*?([A-Z].*:)
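The two suggestions are not quite interchangeable. (?s) turns on DOTALL for the whole pattern, so the greedy .* inside the first group can also cross line breaks, and group 1 then swallows everything up to the last place the second group can match. A small comparison on a made-up input (the class and helper names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllComparison {
    // Returns {group 1, group 2} of the first match, or null if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "jump Start here\nsome text\nEnd:";
        // Without (?s): .* stops at the line break and [\s\S]*? bridges the lines.
        String[] plain = groups("(jump *[A-Z].*)[\\s\\S]*?([A-Z].*:)", input);
        // With (?s): the first .* may span lines too, so group 1 grows.
        String[] dotAll = groups("(?s)(jump *[A-Z].*).*?([A-Z].*:)", input);
        System.out.println("plain:  [" + plain[0] + "] [" + plain[1] + "]");
        System.out.println("dotAll: [" + dotAll[0] + "] [" + dotAll[1] + "]");
    }
}
```

If the first group should stay on one line while (?s) is on, [^\n]* instead of .* inside it would keep it there.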
I want to match any character except whitespace and hyphens. I tried this, but it doesn't work:
[^\s-]
Any ideas?
[^\s-]
should work and so will
[^-\s]
[] : the character class
^ : inside the class, ^ is the negator when it appears at the beginning
\s : shorthand for any whitespace character
- : a literal hyphen. A hyphen is a metacharacter inside a character class, but not when it appears at the beginning or at the end.
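If you only need to exclude whitespace, it can be done more simply:
\S, which is shorthand for [^\s], i.e. roughly [^ \t\r\n\v\f]. Note, though, that \S still matches -, so on its own it does not exclude the hyphen.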
Which programming language are you using? Maybe you just need to escape the backslash, like "[^\\s-]"
In Java:
String regex = "[^-\\s]";
System.out.println("-".matches(regex)); // prints "false"
System.out.println(" ".matches(regex)); // prints "false"
System.out.println("+".matches(regex)); // prints "true"
The regex [^-\s] works as expected. [^\s-] also works.
See also
Regular expressions and escaping special characters
regular-expressions.info/Character class
Metacharacters Inside Character Classes
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret.
Note that regex is not one standard, and each language implements its own based on what the library designers felt like. Take for instance the regex standard used by bash, documented here: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05.
If you are having problems with regular expressions not working, it might be good to simplify it, for instance using "[^ -]" if this covers all forms of whitespace in your case.
Try [^- ]; in Java, \s matches 5 other characters besides the space (tab, newline, vertical tab, form feed and carriage return).
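To see the practical difference between [^\s-] and [^ -], a small sketch that runs a few single-character inputs through both (the class and field names are illustrative):

```java
import java.util.regex.Pattern;

public class HyphenWhitespaceClasses {
    // Excludes every \s character and the hyphen; the hyphen is last, so it is literal.
    static final Pattern NOT_WS_OR_HYPHEN = Pattern.compile("[^\\s-]");
    // Excludes only the space character and the hyphen.
    static final Pattern NOT_SPACE_OR_HYPHEN = Pattern.compile("[^ -]");

    public static void main(String[] args) {
        for (String s : new String[] {"a", "-", " ", "\t"}) {
            String shown = s.equals("\t") ? "\\t" : "'" + s + "'";
            System.out.println(shown
                    + "  [^\\s-]: " + NOT_WS_OR_HYPHEN.matcher(s).matches()
                    + "  [^ -]: " + NOT_SPACE_OR_HYPHEN.matcher(s).matches());
        }
    }
}
```

A tab is rejected by [^\s-] but accepted by [^ -], so the simpler class is only a substitute when plain spaces are the only whitespace you expect.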
I'm struggling with REGEX and require it for a program.
The input must contain only alphanumeric characters, plus these special characters: comma, colon, space, / and -.
I have tried = (^[a-zA-Z0-9,:\S/-]*$)
As far as I understand (please correct me if I'm wrong):
a-zA-Z0-9 - The alphanumeric characters.
,: - Comma and colon
\S - Space
/ - I'm not sure how to represent a forward slash, so I escaped it
- - Dash; I'm also not sure whether it needs to be escaped.
I would appreciate a corrected version and an explanation of each part.
Thanks in advance.
You can replace a-zA-Z0-9 with just \\w, which is short for [a-zA-Z_0-9] (note that this also allows the underscore). Furthermore, \\S matches any character except whitespace; you should use \\s instead. You don't need to escape /, and you don't need to escape - either if it's the first or last character in the class. Placed between two characters it would be interpreted as a range, and there you'd have to escape it. So your regex can look like ^([\w,:\s/-]*)$
The \S shorthand matches any character except whitespace, just the opposite of what you want. Lowercase \s matches whitespace [\t\v\n\r\f ]. But if you only want spaces, just put a space in the character class.
A hyphen - needs to be escaped inside a character class, unless it's the first or last character in the class, but you could always escape it just to be sure.
Slashes / don't need to be escaped. They're escaped in other languages where you use them as pattern delimiters, e.g. /regex/i.
Besides hyphens and shorthands, only backslashes \\ and closing brackets \] need to be escaped. (In Java, an unescaped [ inside a class starts a nested class, so escape that one too.)
Remember in java, you always need to use double backslashes (one is interpreted by java, the other by the regex engine).
Regex
pattern = "^[a-zA-Z0-9 ,:/\\-]*$"
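A minimal sketch of using that pattern from Java (the class and method names are illustrative). Matcher.matches() already requires the whole input to match, so ^ and $ are redundant here but harmless; because of *, an empty string is accepted, so use + instead if empty input should be rejected.

```java
import java.util.regex.Pattern;

public class AllowedCharsValidator {
    // Letters, digits, space, comma, colon, slash and an escaped (always literal) hyphen.
    private static final Pattern ALLOWED = Pattern.compile("^[a-zA-Z0-9 ,:/\\-]*$");

    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12:30, A/B-C")); // every character is allowed
        System.out.println(isValid("a_b"));          // underscore is not in the class
        System.out.println(isValid("a\tb"));         // a tab is not a space
    }
}
```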
Move the Start of Line ^ and End of Line $ outside the group - like
^([a-zA-Z0-9,:\S/-]*)$
That should do it.
There's something here whose full meaning I don't understand. I know that I need to escape any character with special meaning if I want to find it using regex. I also read somewhere that you need to escape the backslash in Java if it's inside a String literal. My question, though, is: if I "escape" the backslash, doesn't it lose its meaning? Then wouldn't it be unable to escape the following plus symbol?
Throws an error (but shouldn't it work since that's how you escape those special characters?):
replaceAll("\+\s", ""));
Works:
replaceAll("\\+\\s", ""));
Hopefully that makes sense. I'm just trying to understand why I need those extra backslashes when the regex tutorials I've read don't mention them, and say that things like "\+" should find the plus symbol.
There are two "escapings" going on here. The first backslash escapes the second backslash for the Java language, creating an actual backslash character. That backslash character is what escapes the + or the s for interpretation by the regular expression engine. That's why you need two backslashes -- one for Java, one for the regular expression engine. With only one backslash, Java reports \+ (and, before Java 15, \s) as an illegal escape character -- not for regular expressions, but for an actual character in the Java language. Since Java 15, \s is a legal string escape standing for a single space, so a lone "\s" compiles but no longer means the regex whitespace class.
The idea behind the extra backslashes: the first backslash is the escape for the Java string literal, and the second one, the backslash that survives into the string, is the escape for the regex.
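A tiny sketch that makes the two levels visible by printing the string the regex engine actually receives (the class, constant and method names are illustrative):

```java
public class DoubleEscape {
    // The literal "\\+\\s" holds four characters: \ + \ s, i.e. the regex \+\s.
    static final String PLUS_THEN_WHITESPACE = "\\+\\s";

    static String stripPlusSpace(String s) {
        return s.replaceAll(PLUS_THEN_WHITESPACE, "");
    }

    public static void main(String[] args) {
        System.out.println(PLUS_THEN_WHITESPACE);          // prints \+\s
        System.out.println(PLUS_THEN_WHITESPACE.length()); // prints 4
        System.out.println(stripPlusSpace("1+ 2+ 3"));     // prints 123
    }
}
```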
I'm trying to determine whether or not an expression passed into my Expressions class has an operator: one of +, -, *, /, ^ for add, subtract, multiply, divide and exponent respectively.
What is wrong with this code?
private static boolean hasOperator(String expression)
{
return expression.matches("[\+-\*/\^]+");
}
I thought that I had the special characters escaped properly but I keep getting the error: "illegal escape character" when trying to compile.
Thanks for your help.
Don't escape what needs not to be escaped:
return expression.matches("[-+*/^]+");
should work just fine. Most regex metacharacters (., (, ), +, *, etc.) lose their special meaning when used in a character class. The ones you need to pay attention to are [, -, ^, and ]. For the last three, you can strategically place them in the char class so they don't take on their special meaning:
^ can be placed anywhere except right after the opening bracket: [a^]
- can be placed right after the opening bracket or right before the closing bracket: [-a] or [a-]
] can be placed right after the opening bracket: []a]
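A quick check of those three placements with java.util.regex (the class name is illustrative; the leading-] form is engine-dependent, and JavaScript, for one, reads [] as an empty class):

```java
import java.util.regex.Pattern;

public class ClassPlacement {
    public static void main(String[] args) {
        // ^ not right after [: a literal caret
        System.out.println(Pattern.matches("[a^]", "^"));
        // - first or last: a literal hyphen, not a range
        System.out.println(Pattern.matches("[-a]", "-"));
        System.out.println(Pattern.matches("[a-]", "-"));
        // ] right after [: a literal closing bracket in Java
        System.out.println(Pattern.matches("[]a]", "]"));
    }
}
```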
But for future reference, if you need to include a backslash as an escape character in a regex string, you'll need to escape it twice, eg:
"\\(.*?\\)" // match something inside parentheses
So to match a literal backslash, you'd need four of them:
"hello\\\\world" // this regex matches hello\world
Another note: String.matches() will try to match the entire string against the pattern, so unless your string consists of nothing but operators, you'll need to use something like .matches(".*[-+*/^].*"); instead (or use Matcher.find()).
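Putting both suggestions together, a minimal sketch of hasOperator using Matcher.find(), so the operator may appear anywhere in the expression (compiling the Pattern once in a static field is a design choice, not something the question requires):

```java
import java.util.regex.Pattern;

public class OperatorCheck {
    // - first and ^ not first, so no escaping is needed inside the class.
    private static final Pattern OPERATOR = Pattern.compile("[-+*/^]");

    static boolean hasOperator(String expression) {
        // find() looks for a match anywhere; String.matches() would need the whole string to match.
        return OPERATOR.matcher(expression).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOperator("2+3"));
        System.out.println(hasOperator("42"));
    }
}
```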
I'm using a system where I need to enter hundreds of RegEx expressions. I've recently changed a few things and am getting the following error:
java.lang.RuntimeException: ?+* follows nothing in expression
I've no idea what this means and would really appreciate any pointers for what I should be looking for to fix it.
Many thanks :)
Katie
The obvious interpretation is that you have a regex that starts with a '?', '+' or '*' metacharacter. Maybe it should have been escaped. Maybe you've accidentally deleted the preceding thing that is "quantified" by the metacharacter.
I do have a '*' at the beginning of some expressions - is that bad?
Yup. If that is supposed to match a literal asterisk character, it must be preceded by a '\' to escape it. (And as Felix Kling pointed out, the '\' will itself need to be escaped if the regex is embedded in a Java string literal.)
Should I be putting '.*' (ie. dot star) instead?
It depends on what you want the regex to match at that point. '.*' means "greedily match zero or more characters" (by default, any characters except line terminators). If that's what you mean, that's what you should use.
It means you have a quantifier (+, ?, or *) that isn't quantifying anything. My guess is you might have forgotten to escape one of those characters (with \) when trying to match it.
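The error text in the question doesn't look like it comes from java.util.regex, which reports the same mistake as a "Dangling meta character" error, but the cause and fix are the same. A small sketch (the patterns and the helper name are made up for illustration) showing a leading * failing to compile, the escaped version matching a literal asterisk, and .* used instead:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DanglingQuantifier {
    // Returns the PatternSyntaxException description, or null if the pattern compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A leading * has nothing to repeat, so compilation fails.
        System.out.println(compileError("*.txt"));
        // Escaped (and doubled for the Java literal): a literal asterisk.
        System.out.println(compileError("\\*\\.txt"));
        System.out.println(Pattern.matches("\\*\\.txt", "*.txt"));
        // .* instead: any run of characters before ".txt".
        System.out.println(Pattern.matches(".*\\.txt", "notes.txt"));
    }
}
```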