Filtering using Regular Expressions - java

I have a filter that uses the following regular expression:
[^#()[]\;:,<>]+\#([A-Za-z0-9_\-\.])+\.([A-Za-z]{2,4})$/
I need to reject the following special characters before the #domain.com part:
#()[]\;:",<
Any suggestions?

Try escaping the ] in the character class.
[^#()[\]\;:,<>]+\#([A-Za-z0-9_\-\.])+\.([A-Za-z]{2,4})$/
      ^^
If it is not escaped, the ] is treated as the end of the character class.
Since this has been tagged as Java, remember that inside a Java string literal each backslash must itself be written as \\ and not just \.
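For what it's worth, here is a minimal Java sketch of the corrected filter. It rests on a few assumptions that are not in the question: the pattern is used through java.util.regex (where an unescaped [ inside a character class opens a nested class, so both brackets are escaped here), the \; in the original is read as "exclude both backslash and semicolon", and the trailing / is dropped as a leftover delimiter. The names AddressFilter and isAllowed and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class AddressFilter {
    // Regex as the engine sees it:
    //   [^#()\[\]\\;:,<>]+#([A-Za-z0-9_\-.])+\.([A-Za-z]{2,4})$
    // Every backslash is doubled because this is a Java string literal.
    private static final Pattern FILTER = Pattern.compile(
            "[^#()\\[\\]\\\\;:,<>]+#([A-Za-z0-9_\\-.])+\\.([A-Za-z]{2,4})$");

    // Hypothetical helper: true when the whole input passes the filter.
    static boolean isAllowed(String input) {
        return FILTER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("john.doe#example.com"));  // true
        System.out.println(isAllowed("john[doe]#example.com")); // false
    }
}
```

Note that matches() anchors the pattern at both ends, so no leading ^ is needed in this sketch.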

Related

Why do I need two slashes in Java Regex to find a "+" symbol?

Just something I don't understand the full meaning behind. I understand that I need to escape any special meaning characters if I want to find them using regex. And I also read somewhere that you need to escape the backslash in Java if it's inside a String literal. My question though is if I "escape" the backslash, doesn't it lose its meaning? So then it wouldn't be able to escape the following plus symbol?
Throws an error (but shouldn't it work since that's how you escape those special characters?):
replaceAll("\+\s", ""));
Works:
replaceAll("\\+\\s", ""));
Hopefully that makes sense. I'm just trying to understand the functionality behind why I need those extra slashes when the regex tutorials I've read don't mention them. And things like "\+" should find the plus symbol.
There are two "escapings" going on here. The first backslash is to escape the second backslash for the Java language, to create an actual backslash character. The backslash character is what escapes the + or the s for interpretation by the regular expression engine. That's why you need two backslashes -- one for Java, one for the regular expression engine. With only one backslash, Java reports \s and \+ as illegal escape characters -- not for regular expressions, but for an actual character in the Java language.
The idea behind the extra slashes is that the first backslash '\' is the escape for the Java string literal, and the second backslash '\' is the escape for the regex.
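A small sketch may make the two levels visible. The class name DoubleEscapeDemo, the helper stripPlusAndSpace, and the sample input are invented for illustration; only replaceAll and the "\\+\\s" pattern come from the question.

```java
class DoubleEscapeDemo {
    static String stripPlusAndSpace(String input) {
        // The source text "\\+\\s" compiles to the four-character string \+\s.
        // The regex engine then reads \+ as a literal plus and \s as whitespace.
        return input.replaceAll("\\+\\s", "");
    }

    public static void main(String[] args) {
        System.out.println("\\+\\s");                       // prints \+\s
        System.out.println("\\+\\s".length());              // prints 4
        System.out.println(stripPlusAndSpace("1 + 2 + 3")); // prints 1 2 3
    }
}
```

Printing the string shows what the regex engine actually receives: one backslash per escape, not two.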

Unclosed Character Class Error?

Here is the error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 3
], [
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.clazz(Pattern.java:2493)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.split(String.java:2313)
at java.lang.String.split(String.java:2355)
at testJunior2013.J2.main(J2.java:31)
This is the area of the code that is causing the issues.
String[][] split = new String[1][rows];
split[0] = (Arrays.deepToString(array2d)).split("], ["); //split at the end of an array row
What does this error mean and what needs to be done to fix the code above?
TL;DR
You want:
.split("\\], \\[")
Escape each square bracket twice — once for each context in which you need to strip them from their special meaning: within a Regular Expression first, and within a Java String secondly.
Consider using Pattern#quote when you need your entire pattern to be interpreted literally.
Explanation
String#split works with a Regular Expression but [ and ] are not standard characters, regex-wise: they have a special meaning in that context.
In order to strip them from their special meaning and simply match actual square brackets, they need to be escaped, which is done by preceding each with a backslash — that is, using \[ and \].
However, in a Java String, \ is not a standard character either, and needs to be escaped as well.
Thus, just to split on [, the String used is "\\[" and you are trying to obtain:
.split("\\], \\[")
A sensible alternative
However, in this case, you're not just semantically escaping a few specific characters in a Regular Expression, but actually wishing that your entire pattern be interpreted literally: there's a method to do just that 🙂
Pattern#quote is used to signify that the:
Metacharacters [...] in your pattern will be given no special meaning.
(from the Javadoc linked above)
I recommend, in this case, that you use the following, more sensible and readable:
.split(Pattern.quote("], ["))
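As a rough, self-contained sketch of both variants side by side (the class SplitRowsDemo, its two helper methods, and the sample grid are assumptions made up for this example, not the asker's code):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitRowsDemo {
    // Variant 1: escape each bracket by hand (regex level plus string level).
    static String[] splitEscaped(int[][] grid) {
        return Arrays.deepToString(grid).split("\\], \\[");
    }

    // Variant 2: let Pattern.quote turn the whole delimiter into a literal.
    static String[] splitQuoted(int[][] grid) {
        return Arrays.deepToString(grid).split(Pattern.quote("], ["));
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}, {5, 6}};
        // Print one piece per line so the leftover brackets are easy to see.
        for (String piece : splitEscaped(grid)) {
            System.out.println(piece);
        }
        System.out.println(Arrays.equals(splitEscaped(grid), splitQuoted(grid)));
    }
}
```

Both variants return the same three pieces. The outer brackets produced by deepToString stay attached to the first and last piece ("[[1, 2" and "5, 6]]"), so they still have to be trimmed separately if they are not wanted.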
Split receives a regex and [, ] characters have meaning in regex, so you should escape them with \\[ and \\].
The way you are currently doing it, the parser finds a [ that opens a character class with no closing ] after it, so it throws that error.
String.split() takes a regular expression, not a normal string as an argument. In a regular expression, ] and [ are special characters, which need to be preceded by backslashes to be taken literally. Use .split("\\], \\["). (the double backslashes tell Java to interpret the string as "\], \[").
.split("], [")
           ^--- start of char class
                end ---?
Change it to
.split("], \\[")
           ^--- escape the [
Try this:
String stringToSplit = "8579.0,753.34,796.94,\"[784.2389999999999,784.34]\",\"[-4.335912230999999, -4.3603307895,4.0407909059, 4.08669583455]\",[],[],[],0.1744,14.4,3.5527136788e-15,0.330667850653,0.225286999939,Near_Crash";
String [] arraySplitted = stringToSplit.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").trim().split(",");

Escape ( in regular expression

I'm searching with the regular expression ".*(conflicted copy.*". I wrote the following code for this:
String str = "12B - (conflicted copy 2013-11-16-11-07-12)";
boolean matches = str.matches(".*(conflicted.*");
System.out.println(matches);
But I get the exception
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 15
.*(conflicted.*
I understand that the regex compiler thinks that ( is the beginning of a group. I tried to escape ( by writing \( but that doesn't work.
Can someone tell me how to escape ( here?
Escaping is done with \. In a Java string literal, \ is itself written as \\ [1], so the escaped ( becomes \\(.
Side note: It's good to have a look at Pattern#quote, which returns a literal pattern String. In your case it's not that helpful, since you don't want to escape all special characters.
[1] Because a character preceded by a backslash (\) is an escape sequence and has special meaning to the compiler.
( in regex is a metacharacter that means "start of group", and the group needs to be closed with ). If you want the regex engine to treat it as a simple literal, you need to escape it. You can do that by adding \ before it, but since \ is also a metacharacter in String literals (used, for example, to create characters like "\n" and "\t"), you need to escape it as well, which looks like "\\". So try
str.matches(".*\\(conflicted.*");
Another option is to use a character class to escape (, like
str.matches(".*[(]conflicted.*");
You can also use Pattern.quote() on the part that needs to be escaped, like
str.matches(".*"+Pattern.quote("(")+"conflicted.*");
Or simply surround the part in which all characters should be treated as literals with "\\Q" and "\\E", which represent the start and end of quotation.
str.matches(".*\\Q(\\Econflicted.*");
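To check that the four spellings behave the same, something like the following sketch could be used. The class ParenEscapeDemo and the helper allMatch are hypothetical; the patterns and the sample string are the ones from this thread.

```java
import java.util.regex.Pattern;

class ParenEscapeDemo {
    static final String[] PATTERNS = {
        ".*\\(conflicted.*",                          // backslash escape
        ".*[(]conflicted.*",                          // character class
        ".*" + Pattern.quote("(") + "conflicted.*",   // Pattern.quote on one part
        ".*\\Q(\\Econflicted.*"                       // \Q...\E quotation
    };

    // True only if the input matches every one of the four patterns.
    static boolean allMatch(String input) {
        for (String pattern : PATTERNS) {
            if (!input.matches(pattern)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String str = "12B - (conflicted copy 2013-11-16-11-07-12)";
        System.out.println(allMatch(str)); // true
    }
}
```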
In regular expressions, any non-alphabetic character can be safely escaped by adding a backslash in front.
Keep in mind that in most languages, including C#, PHP and Java, the backslash is also the string literal's own escape character, so it has to be escaped itself in ordinary (non-verbatim) strings, which requires you to enter "myText \\(".
Using a backslash inside a regular expression may require you to escape it both on the language level and the regex level ("\\\\"): this passes "\\" to the regex engine, which parses it as "\" itself.
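A short sketch of that last case, matching a literal backslash. The class BackslashDemo, the helper toForwardSlashes, and the sample path are made up for illustration:

```java
class BackslashDemo {
    static String toForwardSlashes(String path) {
        // "\\\\" in source is the two-character regex \\,
        // which the regex engine reads as one literal backslash.
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        // The literal below is the text C:\temp\file.txt
        System.out.println(toForwardSlashes("C:\\temp\\file.txt")); // prints C:/temp/file.txt
    }
}
```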

How can I escape a group of special characters in Java in one method?

I use Lucene search, but Lucene has a bunch of special characters to escape, like:
- && || ! ( ) { } [ ] ^ " ~ * ? : \
I am having a problem escaping these characters because there are so many of them; if I use String.replaceAll() for each one, I'll end up with a really long line of code just for escaping. What is the best way to do this? Thanks!
There is also a method called QueryParser#escape, which may be useful:
Returns a String where those characters that QueryParser expects to be escaped are escaped by a preceding \.
Use a regular expression to replace those characters in one go.
Example:
String s="some text && || []!{} ()^*?~ and ";
Pattern p= Pattern.compile("([-&\\|!\\(\\){}\\[\\]\\^\"\\~\\*\\?:\\\\])");
s=p.matcher(s).replaceAll("\\\\$1");
System.out.println(s); // prints some text \&\& \|\| \[\]\!\{\} \(\)\^\*\?\~ and
Use a regular expression. String.replaceAll() supports regular expressions, so you can solve this problem with a single call. Just be careful: some of these characters are special in regular expressions too, so they must be escaped "twice":
str.replaceAll("([-\\&\\|!\\(\\)\\{\\}\\[\\]\\^\"~\\*\\?:\\\\])", "\\\\$1");
(I have not tried this, probably this line needs some fixes, but this is the idea)
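For reference, a compact sketch of the same idea as a reusable helper. The class QueryEscaper and its escape method are hypothetical names, and the character set is simply the list from the question as it appears above; if Lucene is on the classpath, its own escape method is probably the safer choice. Inside a character class only [, ] and \ need escaping here, which keeps the pattern shorter.

```java
import java.util.regex.Pattern;

class QueryEscaper {
    // Regex as the engine sees it: [-&|!(){}\[\]^"~*?:\\]
    // The hyphen comes first and ^ is not first, so both are plain literals.
    private static final Pattern SPECIAL =
            Pattern.compile("[-&|!(){}\\[\\]^\"~*?:\\\\]");

    // Prefixes every listed special character with a backslash.
    static String escape(String text) {
        // "\\\\$0" is the replacement \\$0: a literal backslash, then the match.
        return SPECIAL.matcher(text).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escape("some text && || []!{} ()^*?~ and "));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll would do.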
There is an Apache Commons library for that:
http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html

Regular expressions and matching URLs with metacharacters

I'm having trouble finding a regular expression that matches the following String.
Korben;http://feeds.feedburner.com/KorbensBlog-UpgradeYourMind?format=xml;1
One problem is escaping the question mark. Java's pattern matcher doesn't seem to accept \? as a valid escape sequence but it also fails to work with the tester at myregexp.com.
Here's what I have so far:
([a-zA-Z0-9])+;http://([a-zA-Z0-9./-]+);[0-9]+
Any suggestions?
Edit: The original intent was to match all URLs that could be found after the first semi colon.
If you are putting the expression in a string, you need to escape the "\" as well. That is:
String expr = "([a-zA-Z0-9])+;http://([a-zA-Z0-9./\\-\\?]+);[0-9]+";
You also need to escape the "-" if it's not the first or last character in a character class ([...]) construct.
[?] matches "?"
Maybe you need to escape your backslash, if your expression is in a string. Something like "\\?"
([a-zA-Z0-9]+);http://([a-zA-Z0-9./-]+)(\?[^;]+);([0-9]+)
Works for me on that RegExp Editor website.
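In Java, that last expression could be used roughly as follows. This is only a sketch: the class FeedLineDemo and the helper parse are invented names, and the only change to the pattern is doubling the backslash before ? for the string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeedLineDemo {
    // Regex as the engine sees it:
    //   ([a-zA-Z0-9]+);http://([a-zA-Z0-9./-]+)(\?[^;]+);([0-9]+)
    private static final Pattern LINE = Pattern.compile(
            "([a-zA-Z0-9]+);http://([a-zA-Z0-9./-]+)(\\?[^;]+);([0-9]+)");

    // Returns name, host and path, query string, trailing number;
    // an empty array means the line did not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "Korben;http://feeds.feedburner.com/KorbensBlog-UpgradeYourMind?format=xml;1";
        for (String part : parse(line)) {
            System.out.println(part);
        }
    }
}
```

With this pattern the query part is mandatory, so a line whose URL has no ? does not match at all.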
