Java: Regular Expression not matching?

I am trying to extract a special sequence out of a String using the following Regular Expression:
[(].*[)]
My Pattern should only match if the String contains () with text between them.
Somehow, if I create a new Pattern using Pattern#compile(myString) and then match the String using Matcher matcher = myPattern.matcher(...), it doesn't find anything, even though I tried it on regexr.com and it worked there.
My Pattern is a static final Pattern object in another class (I directly used Pattern#compile(myString)).
Example String to match:
save (xxx,yyy)

The likely problem here is your quantifier.
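Since you're using the greedy * together with . (any character), the match won't stop at the first ): . also matches ), so .* runs on to the last ) in the string. With more than one pair of parentheses, you get everything from the first ( to the last ).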
Try using reluctant [(].*?[)].
See the section on quantifiers in the Pattern javadoc.
You can also escape the parentheses instead of using custom character classes, like so: \\( and \\), but that has nothing to do with your issue.
Also note (thanks, esprittn): the * quantifier matches zero or more characters, so if you want to restrict your matches to non-empty parentheses, use .+? instead; that guarantees at least one character inside the parentheses.
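A small sketch to see the greedy/reluctant difference for yourself. The class and method names are hypothetical; the input adds a second pair of parentheses, because with a single pair, as in save (xxx,yyy), both forms find the same match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Collects every match that find() reports, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "save (xxx,yyy) and (zzz)";
        System.out.println(findAll("[(].*[)]", input));      // [(xxx,yyy) and (zzz)]
        System.out.println(findAll("[(].*?[)]", input));     // [(xxx,yyy), (zzz)]
        System.out.println(findAll("[(].+?[)]", "save ()")); // [] because .+? needs a character inside
    }
}
```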

Hope the code below helps: it extracts the data between '(' and ')', including the parentheses.
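import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "\\(.*\\)";
String line = "save(xx,yy)";
Pattern tokenPattern = Pattern.compile(pattern);
Matcher m = tokenPattern.matcher(line);
while (m.find()) {
    // start(0) and end(0) are the bounds of the whole match, parentheses included
    int start = m.start(0);
    int end = m.end(0);
    System.out.println(line.substring(start, end));
}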
To drop the parentheses from the result, change start to start + 1 and end to end - 1, which moves the bounds of the substring one character inwards on each side.
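Instead of adjusting the indices by hand, you can put a capture group around the contents so that group(1) returns the text without the parentheses. This is a minimal sketch. The class and method names are hypothetical, and it uses a reluctant .*? so that it stops at the first ).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenContents {
    // Group 1 captures what is between the parentheses; .*? stops at the first ')'.
    private static final Pattern PARENS = Pattern.compile("\\((.*?)\\)");

    // Returns the contents of the first (...) pair, or null if there is none.
    static String insideFirstParens(String line) {
        Matcher m = PARENS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(insideFirstParens("save(xx,yy)")); // xx,yy
    }
}
```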

Related

java regular expression and replace all occurrences

I want to replace one substring in a big string, but I guess my regular expression is not right, because it's not working.
The main string (part of some SQL in which something is to be replaced) is:
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
The part to find and replace (based on some condition) is:
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this using the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages (JavaScript, Perl), regex literals are written between slashes, and those slashes belong to the language syntax, not to the regex. Java has no regex literals; a regex is just a String. So here the / looks for a literal slash in that SQL, which isn't there, thus failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters; this matches any one of them, but exactly one). That happens to fit your sample, which has a single space on each side of =, but it breaks as soon as the spacing differs. Your intent, surely, was \\s*, which matches zero or more of them: \\s* means "whitespace is allowed here", while \\s means "there must be exactly one whitespace character here". Make every \\s in your regexp a \\s*.
\\w is exactly one 'word' character (a letter, digit, or underscore); you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
The trailing *: this allows zero or more repetitions, i.e. any number of 'and's or 'or's, which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
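replaceFirst also treats its replacement argument specially ($ and \ have a meaning there), and it resets the matcher and searches from the start of the input on its own. So you don't need a find() loop around it, and splicing mat.group() into the replacement is not how you reuse a found result.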
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here"); // replaceFirst returns a new String
where the replacement can refer to parts of the match: $0 is the whole match and $1, $2, ... are the capture groups. So don't use mat.group(); use those references.
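Here is a minimal sketch with those fixes applied: no slashes or anchors, \\s*, \\w*, and a real (and|or) group. It also escapes the dot so that it matches a literal '.', and it writes the replacement with $0 so the found text is reused without mat.group(). The class and method names are hypothetical, and the "xx ... x" wrapping just mirrors the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstSketch {
    // Optional whitespace around '=', any word characters inside the quotes,
    // then "and" or "or" as a real alternation; the dot is escaped.
    private static final Pattern PAT1 = Pattern.compile(
            "hemp\\.EMPLOYEE_NAME\\s*=\\s*'\\w*'\\s*(and|or)", Pattern.CASE_INSENSITIVE);

    static String wrapFirst(String cond) {
        Matcher mat = PAT1.matcher(cond);
        // $0 is the whole match; replaceFirst returns a new String, so keep the result.
        return mat.replaceFirst("xx $0x");
    }

    public static void main(String[] args) {
        String cond = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
                + "emp.PERMANENT_ADDR LIKE('%98n%')\n"
                + "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(wrapFirst(cond));
    }
}
```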
More generally: did you read any regexp tutorial, do any testing, or read the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class that matches a single one of the listed characters; use (?:and|or) instead. Also escape the dot to match it literally, or else . matches any character except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'
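The question's loop builds each replacement from mat.group() in Java code. If you really need that, for example because the replacement depends on the match in ways $0 can't express, Matcher's appendReplacement/appendTail pair does it in one pass over the input. This is a minimal sketch. The class and method names are hypothetical, and it uses the StringBuffer overloads because the StringBuilder ones only exist from Java 9.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComputedReplace {
    // Replaces every match of p with a replacement built in Java code from that match.
    static String wrapAll(String input, Pattern p) {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = "xx" + m.group() + "x";
            // quoteReplacement stops '$' or '\' in the matched text being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b");
        System.out.println(wrapAll("AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'", p));
    }
}
```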

IllegalStateException with Pattern/Matcher

I'm using Matcher to capture groups using a regular expression in Java and it keeps throwing an IllegalStateException even though I know that the expression matches.
This is my code:
String safeName = Pattern.compile("(\\.\\w+)$").matcher("google.ca").group();
I'm expecting safeName to be .ca as captured with the capturing group in the regular expression but instead I get:
IllegalStateException: No match found
I also tried with .group(0) and .group(1) but the same error occurs.
According to the documentation for group() and group(int group):
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
What am I doing wrong?
Matcher is a helper class that walks over the input looking for substrings that match the regex. The input may contain many matching substrings, so calling group() on its own can't tell which match you are interested in. Instead, Matcher lets you step through the matches one by one and use the parts you need.
So before you can call group(), you need a successful match operation: let the Matcher search your string with find() (or matches(), or lookingAt()). Until one of them has returned true, group() throws IllegalStateException: No match found. To check whether the regex matches the entire String, use matches() instead of find().
To find all matching substrings, the usual idiom is:
Pattern p = Pattern.compile("yourPattern");
Matcher m = p.matcher("yourData");
while (m.find()) {
    String match = m.group();
    // here we can do something with match...
}
Since you assume that the text you want to find exists only once in your string (at its end), you don't need a loop; a simple if (or the conditional operator) is enough:
Matcher m = Pattern.compile("(\\.\\w+)$").matcher("google.ca");
String safeName = m.find() ? m.group() : null;
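The small sketch below reproduces the exception and shows the two fixes side by side. The class and method names are hypothetical. Note that matches() would not help for this particular pattern: it has to cover the whole input, and "google.ca" does not start with a dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNeedsMatch {
    private static final Pattern TLD = Pattern.compile("(\\.\\w+)$");

    // find() searches anywhere in the input; group(1) is only valid after it returns true.
    static String suffix(String host) {
        Matcher m = TLD.matcher(host);
        return m.find() ? m.group(1) : null;
    }

    // Calling group() before any match attempt reproduces the error from the question.
    static boolean groupWithoutFindThrows(String host) {
        Matcher m = TLD.matcher(host);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // matches() must cover the whole input, so "google.ca" does not match this pattern.
    static boolean wholeStringMatches(String host) {
        return TLD.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(suffix("google.ca"));                 // .ca
        System.out.println(groupWithoutFindThrows("google.ca")); // true
        System.out.println(wholeStringMatches("google.ca"));     // false
    }
}
```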

Pattern Matching for java using regex

I have a long string that I have to parse for different keywords. For example, I have the String:
"==References== This is a reference ==Further reading== *{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * ==External links=="
And my keywords are
'==References==' '==External links==' '==Further reading=='
I have tried a lot of regex combinations, but I am not able to recover all the strings.
The code I have tried:
Pattern pattern = Pattern.compile("\\=+[A-Za-z]\\=+");
Matcher matcher = pattern.matcher(textBuffer.toString());
while (matcher.find()) {
System.out.println(matcher.group(0));
}
You don't need to escape the = sign, and you should also include a space inside your character class.
Apart from that, you also need a quantifier on your character class to match multiple occurrences. Try with this regex:
Pattern pattern = Pattern.compile("=+[A-Za-z ]+=+");
You can also increase the flexibility to accept any characters in between two =='s, by using .+? (You need reluctant quantifier with . to stop it from matching everything till the last ==) or [^=]+:
Pattern pattern = Pattern.compile("=+[^=]+=+");
If the number of = signs must be the same on both sides, use a capture group and a backreference:
"(=+)[^=]+\\1"
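You can check the first suggestion by running it over the sample string from the question. This is a minimal sketch, and WikiHeadings and headings are hypothetical names. It sticks to [A-Za-z ]+ because, on this particular input, the [^=]+ variants also pick up fragments of the cite template such as =Lukes|editor1-first=.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiHeadings {
    // A run of '=', letters and spaces only, then another run of '='.
    private static final Pattern HEADING = Pattern.compile("=+[A-Za-z ]+=+");

    static List<String> headings(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HEADING.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "==References== This is a reference ==Further reading== "
                + "*{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} "
                + "* ==External links==";
        System.out.println(headings(text)); // [==References==, ==Further reading==, ==External links==]
    }
}
```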

How can I make a Java regex all or nothing?

I'm trying to make a regex all or nothing in the sense that the given word must EXACTLY match the regular expression - if not, a match is not found.
For instance, if my regex is:
^[a-zA-Z][a-zA-Z|0-9|_]*
Then I would want to match:
cat9
cat9_
bob_____
But I would NOT want to match:
cat7-
cat******
rango78&&
I want my regex to be as strict as possible, going for an all or nothing approach. How can I go about doing that?
EDIT: To make my regex absolutely clear, a pattern must start with a letter, followed by any number of numbers, letters, or underscores. Other characters are not permitted. Below is the program in question I am using to test out my regex.
Pattern p = Pattern.compile("^[a-zA-Z][a-zA-Z|0-9|_]*");
Scanner in = new Scanner(System.in);
String result = "";
while(!result.equals("-1")){
result = in.nextLine();
Matcher m = p.matcher(result);
if(m.find())
{
System.out.println(result);
}
}
I think that if you use String.matches(regex), then you will get the effect you are looking for. The documentation says that matches() will return true only if the entire string matches the pattern.
On its own, the regex is already strict: * and & are not in the allowed set of characters, so it can't match all of cat****** or rango78&&.
With find(), however, it can still match a prefix (cat, rango78), and that is enough for find() to return true. You can avoid this by adding $ to the end of the regex, which explicitly matches the end of input. So try:
^[a-zA-Z][a-zA-Z|0-9|_]*$
This will ensure the match is against the entire input string, and not just a prefix.
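Note that \w is the same as [A-Za-z0-9_] (by default). Also, | inside a character class is a literal pipe, not alternation, so [a-zA-Z|0-9|_] would accept a | as well. And you need to anchor to the end of the string, like so:
Pattern p = Pattern.compile("^[a-zA-Z]\\w*$");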
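The two answers can be combined. Matcher.matches(), like String.matches, succeeds only if the whole input matches, so no anchors are needed, and \\w replaces the hand-written class. Unlike the original class, this pattern also rejects a |. This is a minimal sketch with hypothetical class and method names. It runs the examples from the question.

```java
import java.util.regex.Pattern;

public class AllOrNothing {
    // A letter, followed by any number of letters, digits or underscores.
    private static final Pattern IDENT = Pattern.compile("[a-zA-Z]\\w*");

    // matches() requires the entire input to match, so prefixes like "cat" in "cat7-" don't count.
    static boolean isIdentifier(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"cat9", "cat9_", "bob_____", "cat7-", "cat******", "rango78&&"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isIdentifier(s));
        }
    }
}
```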

Problem matching regex pattern in Android

I am trying to search this string:
,"tt" : "ABC","r" : "+725.00","a" : "55.30",
For:
"r" : "725.00"
And here is my current code:
Pattern p = Pattern.compile("([r]\".:.\"[+|-][0-9]+.[0-9][0-9]\")");
Matcher m = p.matcher(raw_string);
I've been trying multiple variations of the pattern, and a match is never found. A second set of eyes would be great!
Your regexp actually works; it's almost correct:
Pattern p = Pattern.compile("\"[r]\".:.\"[+|-][0-9]+.[0-9][0-9]\"");
Matcher m = p.matcher(raw_string);
if (m.find()){
String res = m.toMatchResult().group(0);
}
The next line should read:
if ( m.find() ) {
Are you doing that?
A few other issues: you're using . to match the spaces around the colon; if that's always supposed to be whitespace, use a space followed by + (one or more spaces) or \s+ (one or more whitespace characters). On the other hand, the dot between the digits is supposed to match a literal ., so you should escape it: \. Of course, since this is a Java String literal, you need to escape the backslashes: \\s+ and \\.
You don't need the square brackets around the r, and if you don't want to match a | in front of the number you should change [+|-] to [+-].
While some of these issues I've mentioned could result in false positives, none of them would prevent it from matching valid input. That's why I suspect you aren't actually applying the regex by calling find(). It's a common mistake.
First, try escaping your dot: ...[0-9]+\.[0-9][0-9]... (written \\. inside a Java string literal), because an unescaped dot matches any character.
Second, [+|-] defines a character class (+, | or -), and as written exactly one of those characters is mandatory. Try making it optional:
[+|-]?
Alban.
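The fixes suggested in these answers can be combined into one pattern. It escapes the dot, makes the sign optional with [+-]?, and allows optional whitespace around the colon. The code calls find() before reading the group. This is a minimal sketch under those assumptions: the class and method names are hypothetical, and it assumes exactly two decimals, as in the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RValue {
    // "r", optional whitespace, colon, optional whitespace, then a quoted signed decimal.
    private static final Pattern R_FIELD =
            Pattern.compile("\"r\"\\s*:\\s*\"([+-]?[0-9]+\\.[0-9]{2})\"");

    static String rValue(String raw) {
        Matcher m = R_FIELD.matcher(raw);
        return m.find() ? m.group(1) : null; // find() must succeed before group() is called
    }

    public static void main(String[] args) {
        String raw = ",\"tt\" : \"ABC\",\"r\" : \"+725.00\",\"a\" : \"55.30\",";
        System.out.println(rValue(raw)); // +725.00
    }
}
```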

