Java matcher unable to find last group - java

I'm using regex again after a long time, and I'm not sure whether the issue is with the regex or with my logic.
String test = "project/components/content;contentLabel|contentDec";
String regex = "(([A-Za-z0-9-/]*);([A-Za-z0-9]*))";
Map<Integer, String> matchingGroups = new HashMap<>();
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(test);
//System.out.println("Input: " + test + "\n");
//System.out.println("Regex: " + regex + "\n");
//System.out.println("Matcher Count: " + matcher.groupCount() + "\n");
if (matcher != null && matcher.find()) {
for (int i = 0; i < matcher.groupCount(); i++) {
System.out.println(i + " -> " + matcher.group(i) + "\n");
}
}
I was expecting the above to give me the output as below:
0 -> project/components/content;contentLabel|contentDec
1 -> project/components/content
2 -> contentLabel|contentDec
But when running the code the group extractions are off.
Any help would be really appreciated.
Thanks!

You have a few issues:
You're missing | in your second character class.
You have an unnecessary capture group around the whole regex.
When outputting the groups, you need to use <= matcher.groupCount() because matcher.group(0) is reserved for the whole match, so your capture groups are in group(1) and group(2).
This will work:
String test = "project/components/content;contentLabel|contentDec";
String regex = "([A-Za-z0-9-/]*);([A-Za-z0-9|]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(test);
if (matcher != null && matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(i + " -> " + matcher.group(i) + "\n");
}
}
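The off-by-one in the loop bound can be checked in isolation. The following is a minimal sketch, not the asker's original program: the class name GroupListing and the helper groups are made up for illustration, while the regex and input are the ones from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupListing {
    // Returns group 0 (the whole match) followed by every capture group
    // of the first match, or an empty list when nothing matches.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // groupCount() does not count group 0, hence the <= bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "project/components/content;contentLabel|contentDec";
        List<String> g = groups("([A-Za-z0-9-/]*);([A-Za-z0-9|]*)", test);
        for (int i = 0; i < g.size(); i++) {
            System.out.println(i + " -> " + g.get(i));
        }
    }
}
```

With the < bound from the question, the last entry of this list would never be printed.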

Related

Java Pattern to match String beginning and end?

I have an input that looks like this : 0; expires=2016-12-27T16:52:39
I am trying extract from this only the date, using Pattern and Matcher.
private String extractDateFromOutput(String result) {
Pattern p = Pattern.compile("(expires=)(.+?)(?=(::)|$)");
Matcher m = p.matcher(result);
while (m.find()) {
System.out.println("group 1: " + m.group(1));
System.out.println("group 2: " + m.group(2));
}
return result;
}
Why does this matcher find more than one group? The output is as follows:
group 1: expires=
group 2: 2016-12-27T17:04:39
How can I get only group 2 out of this?
Thank you !
Because you have used more than one capturing group in your regex.
Pattern p = Pattern.compile("expires=(.+?)(?=::|$)");
Just remove the capturing groups around expires= and ::
private String extractDateFromOutput(String result) {
Pattern p = Pattern.compile("expires=(.+?)(?=::|$)");
Matcher m = p.matcher(result);
while (m.find()) {
System.out.println("group 1: " + m.group(1));
// there is no group 2 now; accessing it throws an IndexOutOfBoundsException
//System.out.println("group 2: " + m.group(2));
}
return result;
}
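Note that the method above still returns the whole input rather than the date. A possible variant that returns the captured value is sketched below; the class name ExpiryExtractor, the method name extractDate and the choice to return null when nothing matches are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpiryExtractor {
    // Single capture group: everything after "expires=" up to "::" or end of input.
    private static final Pattern EXPIRES = Pattern.compile("expires=(.+?)(?=::|$)");

    // Returns the captured date text, or null when "expires=" is not present.
    static String extractDate(String result) {
        Matcher m = EXPIRES.matcher(result);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractDate("0; expires=2016-12-27T16:52:39"));
    }
}
```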

Java regex to read a property: what difference do double parentheses make?

form: column1 = emp_no
extract:
key: column1
value: emp_no
first code:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
for(int i = 0; i < matcher.groupCount(); i++) {
System.out.println(i + ": " + matcher.group(i));
}
}
first result:
groupCount: 2
0: column1 = emp_no
1: column1
The second group is never printed.
Then I changed the second pair of parentheses to double parentheses.
second code:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*((\\w+))\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
for(int i = 0; i < matcher.groupCount(); i++) {
System.out.println(i + ": " + matcher.group(i));
}
}
second result:
groupCount: 3
0: column1 = emp_no
1: column1
2: emp_no
This prints the results I want.
What is the difference between the regex in the first and the second code?
Change your code to:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
for(int i = 1; i <= matcher.groupCount(); i++) { //see the changes
System.out.println(i + ": " + matcher.group(i));
}
}
The 0th group always contains the entire matched string.
The actual capture groups start from index 1.
Groups in regex are indexed from 0, but group 0 is added by the regex engine automatically to represent the entire match. Your own groups are indexed as 1 and 2.
So your first attempt was almost correct, you should simply change loop from
for(int i = 0; i < matcher.groupCount(); i++) {
to
for(int i = 1; i <= matcher.groupCount(); i++) {
// ^ ^
You can read more about groups in the official Java tutorial about regex https://docs.oracle.com/javase/tutorial/essential/regex/groups.html
where we can find example showing how groups are numbered:
...capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
((A)(B(C)))
(A)
(B(C))
(C)
...
There is also a special group, group 0, which always represents the entire expression.
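The numbering rule quoted above can be observed directly. This is a small sketch using the tutorial's expression ((A)(B(C))) against the input "ABC"; the class name GroupNumbering and the helper numbered are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Lists group 0 and all capture groups for a full-string match,
    // in index order; returns an empty list if the input does not match.
    static List<String> numbered(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = numbered("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.size(); i++) {
            System.out.println(i + ": " + g.get(i));
        }
    }
}
```

Group 0 and group 1 hold the same text here because the outermost parentheses wrap the whole expression, which is also why the doubled parentheses in the question only added one more group.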

Filter out the price or cost in the Java

I have got ((?:[0-9]{1,3}[\.,]?)*[\.,]?[0-9]+) to filter out the prices in a string in Java, so I use it like this:
public static final String new_price = "((?:[0-9]{1,3}[\\.,]?)*[\\.,]?[0-9]+)";
final Pattern p = Pattern.compile(new_price, 0);
final Matcher m = p.matcher(label);
if (m.matches()) {
Log.d(TAG, "found! good start");
if (m.groupCount() == 1) {
Log.d(TAG, "start match price" + " : " + m.group(0));
}
if (m.groupCount() == 2) {
Log.d(TAG, "start match price" + " : " + m.group(1));
}
}
I got the sample working on http://www.regexr.com/ but it never finds any matches at run time. Any idea?
Instead of using matches() you should run m.find() which searches for the next match (this should be done in a while loop!):
String new_price = "((?:[0-9]{1,3}[\\.,]?)*[\\.,]?[0-9]+)";
String label = "$500.00 - $522.30";
final Pattern p = Pattern.compile(new_price, 0);
final Matcher m = p.matcher(label);
while (m.find()) {
System.out.println("found! good start");
if (m.groupCount() == 1) {
System.out.println("start match price" + " : " + m.group(0));
}
if (m.groupCount() == 2) {
System.out.println("start match price" + " : " + m.group(1));
}
}
OUTPUT
found! good start
start match price : 500.00
found! good start
start match price : 522.30
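The difference between the two calls is that matches() requires the entire input to fit the pattern, while find() scans for the next matching substring. A minimal sketch contrasting them on the same label follows; the class name PriceScan and both helper methods are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceScan {
    private static final Pattern PRICE =
            Pattern.compile("((?:[0-9]{1,3}[\\.,]?)*[\\.,]?[0-9]+)");

    // True only when the whole label is a single price.
    static boolean matchesWhole(String label) {
        return PRICE.matcher(label).matches();
    }

    // Collects every price-like substring found in the label.
    static List<String> findAll(String label) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(label);
        while (m.find()) {
            prices.add(m.group(1));
        }
        return prices;
    }

    public static void main(String[] args) {
        String label = "$500.00 - $522.30";
        System.out.println("matches(): " + matchesWhole(label));
        System.out.println("find():    " + findAll(label));
    }
}
```

Since the pattern has exactly one capture group, groupCount() is always 1 for it, so a branch that tests for a count of 2 never runs.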

how to filter out some substrings with regex effectively?

I want to filter out srcport and dstport from the input string. Here is the code I tried:
String input = "2014<>10.100.2.3<><189>date=2014-01-16,time=11:26:14,devname=B3909601569,devid=B3909601569,logid=000013,type=traffic,srcip=192.168.192.123,srcport=2072,srcintf=port2,dstip=10.180.1.105,dstport=3206,dstintf=port1,sessionid=121543,status=close,policyid=196,service=MYSQL,proto=6,duration=10,sentbyte=3910,rcvdbyte=175085,sentpkt=74,rcvdpkt=132";
Pattern p = Pattern.compile("(srcport=)(\\d+).[\\s]?(dstport=)(\\d+)");
Matcher m = p.matcher(input);
StringBuffer result=new StringBuffer();
while (m.find()) {
System.out.println("Srcport: " + m.group(2) + " & ");
System.out.println("Dstport: " + m.group(4));
}
System.out.println(result);
but it's not showing any output. Is there a mistake in the regex
Pattern p = Pattern.compile("(srcport=)(\\d+).[\\s]?(dstport=)(\\d+)");
or in the println lines
System.out.println("Srcport: " + m.group(2) + " & ");
System.out.println("Dstport: " + m.group(4));
Any suggestions will be highly appreciated.
See following changes to both the regex and the captured groups:
String input = "2014<>10.100.2.3<><189>date=2014-01-16,time=11:26:14,devname=B3909601569,devid=B3909601569,logid=000013,type=traffic,srcip=192.168.192.123,srcport=2072,srcintf=port2,dstip=10.180.1.105,dstport=3206,dstintf=port1,sessionid=121543,status=close,policyid=196,service=MYSQL,proto=6,duration=10,sentbyte=3910,rcvdbyte=175085,sentpkt=74,rcvdpkt=132";
Pattern p = Pattern.compile("srcport=(\\d+).*?dstport=(\\d+)"); // update regex
Matcher m = p.matcher(input);
StringBuffer result=new StringBuffer();
while (m.find()) {
System.out.println("Srcport: " + m.group(1)); //print groups 1 + 2
System.out.println("Dstport: " + m.group(2));
}
System.out.println(result);
You forgot to use or(|) in your regex
srcport=(\\d+)|dstport=(\\d+)
Your code would be
while (m.find())
{
if(m.group().startsWith("srcport"))
System.out.println("Srcport: " + m.group(1) + " & ");
else
System.out.println("Dstport: " + m.group(2)); // dstport is captured by the second group
}
Try this :
Pattern p = Pattern.compile("srcport=(\\d+)|dstport=(\\d+)");
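With an alternation like this, each call to find() fills only one of the two groups and leaves the other null. The sketch below shows one way to handle that; the class name PortExtractor, the helper ports, the map keys and the shortened input are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PortExtractor {
    private static final Pattern PORTS =
            Pattern.compile("srcport=(\\d+)|dstport=(\\d+)");

    // Maps "srcport" / "dstport" to the digits captured for each key.
    static Map<String, String> ports(String input) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = PORTS.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.put("srcport", m.group(1)); // left branch matched
            } else {
                out.put("dstport", m.group(2)); // right branch matched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened version of the log line from the question.
        String input = "srcip=192.168.192.123,srcport=2072,srcintf=port2,"
                + "dstip=10.180.1.105,dstport=3206,dstintf=port1";
        System.out.println(ports(input));
    }
}
```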
Try the code below. I have run it on my system and it works fine.
String input = "2014<>10.100.2.3<><189>date=2014-01-16,time=11:26:14,devname=B3909601569,devid=B3909601569,logid=000013,type=traffic,srcip=192.168.192.123,srcport=2072,srcintf=port2,dstip=10.180.1.105,dstport=3206,dstintf=port1,sessionid=121543,status=close,policyid=196,service=MYSQL,proto=6,duration=10,sentbyte=3910,rcvdbyte=175085,sentpkt=74,rcvdpkt=132";
Pattern p = Pattern.compile("(srcport=)(\\d+)((.*)?)(dstport=)(\\d+)(\\.)*");
Matcher m = p.matcher(input);
StringBuffer result=new StringBuffer();
while (m.find()) {
System.out.println(m.group());
System.out.println("Srcport: " + m.group(2) );
System.out.println("Dstport: " + m.group(6));
}

Pattern lookahead

Pattern p = Pattern.compile("(ma)|([a-zA-Z_]+)");
Matcher m = p.matcher("ma");
m.find();
System.out.println("1 " + m.group(1) + ""); //ma
System.out.println("2 " + m.group(2)); // null
m = p.matcher("mad"); // reuse the variable; declaring m twice does not compile
m.find();
System.out.println("1 " + m.group(1) + ""); //ma
System.out.println("2 " + m.group(2)); // null
But I need the string "mad" to end up in the 2nd group.
I think what you are looking for is something like:
(ma(?!d))|([a-zA-Z_]+)
from "perldoc perlre":
"(?!pattern)"
A zero-width negative look-ahead assertion. For example "/foo(?!bar)/" matches any occurrence of "foo" that isn't followed by "bar".
Java's regex engine supports this lookahead syntax as well.
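As a rough check of the lookahead suggestion in Java, the sketch below reports which group captured the first match. The class name LookaheadDemo and the helper matchedGroup are made up for this example; the pattern is the one suggested above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // "ma" only counts for group 1 when it is not followed by 'd'.
    private static final Pattern P = Pattern.compile("(ma(?!d))|([a-zA-Z_]+)");

    // Returns 1 or 2 for the group that captured the first match, 0 if none.
    static int matchedGroup(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return 0;
        }
        return m.group(1) != null ? 1 : 2;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ma", "mad", "mat"}) {
            System.out.println(s + " -> group " + matchedGroup(s));
        }
    }
}
```

The lookahead only rules out a following d, so an input such as "mat" still lands in group 1 with "ma" as the match.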
If you use matches instead of find, it will try to match the entire string against that pattern, which it can only do by putting mad in the second group:
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
Pattern p = Pattern.compile("(ma)|([a-zA-Z_]+)");
Matcher m = p.matcher("ma");
m.matches();
System.out.println("1 " + m.group(1)); // ma
System.out.println("2 " + m.group(2)); // null
m = p.matcher("mad");
m.matches();
System.out.println("1 " + m.group(1)); // null
System.out.println("2 " + m.group(2)); // mad
}
}
