Regex with -, ::, ( and ) - java

I need to split the string
(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)
into
string[0] = (age-is-25::OR::last_name-is-qa6)
string[1] = AND
string[2] = (age-is-20::OR::first_name-contains-test)
I tried writing many regular expressions, but nothing works as expected.
With the following regex, Matcher.groupCount() returns 2, but when I add the groups to an ArrayList, the elements are null.
Pattern pattern = Pattern.compile("(\\)::)?|(::\\()?");
I tried to split it using ):: or ::(.
I know the regex looks too stupid, but being a beginner this is the best I could write.

You can use a positive lookbehind and a positive lookahead so that the split happens only at a :: that is preceded by ) or followed by (, without consuming the parentheses themselves.
String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";
for (String s : str.split("(?<=\\))::|::(?=\\()"))
    System.out.println(s);
Outputs:
(age-is-25::OR::last_name-is-qa6)
AND
(age-is-20::OR::first_name-contains-test)
Just a note, however: it seems like you are parsing some kind of recursive language. Regular expressions are not good at this, because they cannot keep track of arbitrarily nested parentheses. If you are doing advanced parsing, I would recommend looking at other parsing methods.
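As one possible "other parsing method", here is a minimal sketch that splits on :: only where the parenthesis depth is zero, so nested groups stay intact. The class and method names (TopLevelSplitter, splitTopLevel) are made up for illustration, and the sketch assumes the parentheses in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Hypothetical helper: splits on "::" only when we are outside all parentheses.
    // Assumes balanced parentheses; no validation is done.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0;  // current nesting level of parentheses
        int start = 0;  // index where the current token begins
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0 && input.startsWith("::", i)) {
                parts.add(input.substring(start, i));
                i++;            // skip the second ':' of the delimiter
                start = i + 1;  // the next token begins right after "::"
            }
        }
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";
        for (String s : splitTopLevel(str)) {
            System.out.println(s);
        }
    }
}
```

Unlike the lookaround regex, this keeps working when a group itself contains parenthesized sub-groups, for example ((a::b)::OR::c)::AND::d.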

To me it looks like a big part of your stress comes from having to escape special characters in your search term. I highly recommend not escaping special characters by hand, but using Pattern.quote(...) for the escaping instead.
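A rough sketch of how Pattern.quote could be applied to this particular split: the literal pieces are quoted individually and then combined with the lookaround syntax, which itself has to stay unquoted. The class name QuotedDelimiterSplit and its split method are assumptions for the example only.

```java
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Hypothetical helper: builds the split regex from quoted literals
    // instead of hand-written backslash escapes.
    static String[] split(String input) {
        String open = Pattern.quote("(");   // literal "("
        String close = Pattern.quote(")");  // literal ")"
        String sep = Pattern.quote("::");   // literal "::"
        // "::" preceded by ")" or "::" followed by "("
        String regex = "(?<=" + close + ")" + sep + "|" + sep + "(?=" + open + ")";
        return input.split(regex);
    }

    public static void main(String[] args) {
        String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";
        for (String s : split(str)) {
            System.out.println(s);
        }
    }
}
```

The regex gets longer this way, but none of the literal characters need a manual backslash.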

This should work:
"(?<=\\))::|::(?=\\()"

This should work for you, as long as you do not need to keep the parentheses next to each delimiter:
\)::|::\(

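textString.split("\\)::|::\\(")
should work, although it consumes the ) and ( adjacent to each delimiter, so those parentheses are dropped from the results.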

Related

Pattern for Guava Splitter

I need to split a String by comma, dot, or backslash:
Pattern stringPattern = Pattern.compile("\\s+|,|\\\\|");
Splitter.on(stringPattern).omitEmptyStrings().split(description);
but this pattern doesn't work. What is wrong?
Why not use a CharMatcher?
Splitter.on(CharMatcher.anyOf(",.\\")).omitEmptyStrings().split(description);
Given your simple problem, I don't think you need the regular expressions.
The correct regex for comma or dot or backslash is [.,\\], so in Java that's
Pattern.compile("[.,\\\\]")
I do like Olivier's suggestion of CharMatcher though.
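For readers without Guava on the classpath, the character class above can be tried with the JDK alone. This is only a sketch: SeparatorSplit and splitNonEmpty are illustrative names, and the empty-string filter stands in for what omitEmptyStrings() does in the Guava version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SeparatorSplit {
    // Character class matching a dot, a comma, or a single backslash.
    private static final Pattern SEPARATORS = Pattern.compile("[.,\\\\]");

    // Hypothetical helper: splits on the separators and drops empty pieces.
    static List<String> splitNonEmpty(String description) {
        List<String> parts = new ArrayList<>();
        for (String part : SEPARATORS.split(description)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // The Java literal "a,b.c\\d,,e" is the text a,b.c\d,,e
        System.out.println(splitNonEmpty("a,b.c\\d,,e"));
    }
}
```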
I'd use String.split with a regular expression. The following should work (I have not tried it):
description.split("[,.\\\\]")
Then filter out the empty strings yourself (Splitter has an extra API for that).
Patterns are useful for identifying "groups". Any regular-expression-based splitting can equally be done with String.split instead of a Pattern. That is not meant to discourage anyone from using Guava!

Replace a String containing "$" with "\$" in Java

How can I do it? I did some research but could not find a clear answer. I tried to use
pass = pass.replaceAll("$", "\\$");
but it does not work.
Use
pass = pass.replace("$", "\\$");
It will also replace all occurrences. See JavaDoc.
If you prefer the hard way and want to use a regex, you need:
pass = pass.replaceAll("\\$", "\\\\\\$");
This can be simplified with Matcher.quoteReplacement() but still, only use replaceAll() when you need to replace something that matches a regular expression, and use replace() when you have to replace a literal sequence.
The problem is that String.replaceAll uses regular expressions, where both \ and $ have special meanings. You don't want that as far as I can tell - you just want to replace the strings verbatim. As such, you should use String.replace:
pass = pass.replace("$", "\\$");
(Personally I think the fact that replaceAll uses regular expressions is a design mistake, but that's another matter.)
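To make the difference concrete, here is a small sketch comparing the literal replace() with a replaceAll() call that quotes both the pattern and the replacement. The class and method names are invented for the example; Pattern.quote and Matcher.quoteReplacement are the standard JDK methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarEscape {
    // Literal replacement: no regex involved, every "$" becomes "\$".
    static String withReplace(String s) {
        return s.replace("$", "\\$");
    }

    // Regex replacement with both sides quoted, so neither "$" nor "\"
    // is interpreted specially.
    static String withReplaceAll(String s) {
        return s.replaceAll(Pattern.quote("$"), Matcher.quoteReplacement("\\$"));
    }

    // The attempt from the question: the regex "$" matches the end of the
    // input, and the replacement "\$" is a literal "$", so a "$" is appended.
    static String originalAttempt(String s) {
        return s.replaceAll("$", "\\$");
    }

    public static void main(String[] args) {
        System.out.println(withReplace("a$b"));
        System.out.println(withReplaceAll("a$b"));
        System.out.println(originalAttempt("a$b"));
    }
}
```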

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
Your regex looks fine to me - I don't have access to java but in JavaScript the code..
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a Regex at all (it'd be like using a hammer on a thumbtack), I'd just create a substring from (Pseudocode)
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. Thus ^(\d+?)(1\+1) matches as few digits as it can until it finds "1+1", so the captured group does not include the "1+1". (With a correctly escaped pattern, the greedy \d+ reaches the same result here by backtracking.)
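Both approaches discussed in this thread can be sketched side by side in Java. The names SuffixStripper, withRegex and withIndexOf are made up for the example; the indexOf variant additionally returns the input unchanged when "1+1" is absent, which is an assumption about the desired behavior.

```java
class SuffixStripper {
    // Regex approach: keep the digits before "1+1" and drop everything after.
    // In a Java string literal the regex \d and \+ are written as \\d and \\+.
    static String withRegex(String s) {
        return s.replaceAll("^(\\d+)1\\+1.*", "$1");
    }

    // Plain string approach: cut at the first occurrence of "1+1".
    // Returns the input unchanged if "1+1" is not present (assumed behavior).
    static String withIndexOf(String s) {
        int idx = s.indexOf("1+1");
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        String[] samples = {"12341+1", "12241+1R1", "100001+1R2"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + withRegex(sample) + " / " + withIndexOf(sample));
        }
    }
}
```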

Refactor Regex Pattern - Java

I have the string aaaa_bb_cc to match and have written a regex pattern like
\\w{4}+\\_\\w{2}\\_\\w{2} and it works. Is there a simpler regex that does the same?
You don't need to escape the underscores:
\w{4}+_\w{2}_\w{2}
And you can collapse the last two parts, if you don't capture them anyway:
\w{4}+(?:_\w{2}){2}
Doesn't get shorter, though.
(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
I sometimes do what I call "meta-regexing" as follows:
String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"
Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".
If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
Other examples
Regex for metamap in Java
How do I convert CamelCase into human-readable names in Java?
Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.
Looks like your \w does not need to match underscore, so you can use [a-zA-Z0-9] instead
[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
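The point about \w also matching the underscore can be checked with a short sketch. The class and method names are illustrative; the second pattern is built with the "meta-regexing" trick from the earlier answer, using [a-zA-Z0-9] as the replacement.

```java
import java.util.regex.Pattern;

class CodePatternCheck {
    // \w matches letters, digits, and the underscore.
    private static final Pattern WORD = Pattern.compile("\\w{4}_\\w{2}_\\w{2}");

    // Same shape, but each "x" placeholder is expanded to an explicit character class.
    private static final Pattern ALNUM =
            Pattern.compile("x{4}_x{2}_x{2}".replace("x", "[a-zA-Z0-9]"));

    static boolean matchesWord(String s) {
        return WORD.matcher(s).matches();
    }

    static boolean matchesAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String underscores = "__________"; // ten underscores
        System.out.println(matchesWord("aaaa_bb_cc"));
        System.out.println(matchesWord(underscores));
        System.out.println(matchesAlnum("aaaa_bb_cc"));
        System.out.println(matchesAlnum(underscores));
    }
}
```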

Escaping a String from getting regex parsed in Java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".
String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.
As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by #diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
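A minimal sketch of a literal substring search through the regex API, assuming the goal is "does T contain S anywhere". It uses Matcher.find() rather than matches(), so no surrounding .* is needed. The class and method names are hypothetical; Pattern.quote and Pattern.LITERAL are the JDK features mentioned in the answers.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // Quotes the needle so every character is treated literally,
    // then looks for it anywhere in the text.
    static boolean containsQuoted(String text, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(text).find();
    }

    // Same idea using the LITERAL flag instead of quoting the string.
    static boolean containsLiteralFlag(String text, String needle) {
        return Pattern.compile(needle, Pattern.LITERAL).matcher(text).find();
    }

    public static void main(String[] args) {
        String t = "say [hi there";
        String s = "[hi"; // would be an invalid regex if left unquoted
        System.out.println(containsQuoted(t, s));
        System.out.println(containsLiteralFlag(t, s));
        System.out.println(t.contains(s)); // no regex at all
    }
}
```

If the compiled Pattern is reused many times, it can be stored in a field instead of being rebuilt on every call.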
Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.
Regex uses the backslash character '\' to escape a literal. Since Java string literals also use the backslash as an escape character, you need a double backslash, like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* do something */ }
T.contains() (according to javadoc : http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method ?
