I'm trying to use a Java regex to extract data. It's matching my data, but I can't get the group data. I'm trying to get the values 1, XmlAggregator, 268803451 and 3. Looking at the docs, I assumed that if I put () around \d+ and \w+, I would get the numbers and strings inside the groups. Any suggestions on how to change the regex?
String:
Span(trace_id:1, name:XmlAggregator, id:268803451, parent_id:3)
Java code:
String pattern = "Span\\(trace_id:(\\d+), name:(\\w+), id:(\\d+), parent_id:(\\d+), (duration:(\\d+))*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
int count = 0;
while (m.find()) {
    System.out.println("Match number " + count);
    System.out.println("start(): " + m.start());
    System.out.println("end(): " + m.end());
    System.out.println("Found value: " + m.group(count));
    count++;
}
Output:
Match number 0
start(): 0
end(): 64
Found value: Span(trace_id:1, name:XmlAggregator, id:268803451, parent_id:3,
Hoping to get:
Found value: 1
Found value: XmlAggregator
Found value: 268803451
Found value: 3
You can access the capture groups (the parts of the match inside your unescaped parentheses) using the group method on your match result:
System.out.println("Trace ID = " + m.group(1));
System.out.println("Name = " + m.group(2));
// etc...
Note that you start counting the capture groups from 1, not 0. This is because group 0 corresponds to the entire matched string.
Each value is inside a group. Therefore you can loop over the number of groups matched and for each one print the group number, value, start index, etc.:
if (m.find()) {
    // groups are numbered from 1; group 0 would be the whole match
    for (int count = 1; count <= m.groupCount(); count++) {
        System.out.println("Group number " + count);
        System.out.println("start(): " + m.start(count));
        System.out.println("end(): " + m.end(count));
        System.out.println("Found value: " + m.group(count));
    }
}
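For reference, here is a self-contained sketch of the same idea using named groups ((?<name>...), supported since Java 7), so the values can be read by name instead of by number. The class name SpanGroups and the method extract are illustrative only. Making the duration part optional with a non-capturing (?:...)? group is an assumption based on the pattern in the question, since the sample string does not contain a duration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanGroups {
    // Named groups; the optional duration (an assumption) uses (?:...)? so it is
    // allowed to be missing without breaking the match.
    private static final Pattern SPAN = Pattern.compile(
        "Span\\(trace_id:(?<traceId>\\d+), name:(?<name>\\w+), id:(?<id>\\d+), "
        + "parent_id:(?<parentId>\\d+)(?:, duration:(?<duration>\\d+))?\\)");

    // Returns trace_id, name, id and parent_id, or an empty list if the line does not match
    static List<String> extract(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = SPAN.matcher(line);
        if (m.find()) {
            values.add(m.group("traceId"));
            values.add(m.group("name"));
            values.add(m.group("id"));
            values.add(m.group("parentId"));
        }
        return values;
    }

    public static void main(String[] args) {
        for (String v : extract("Span(trace_id:1, name:XmlAggregator, id:268803451, parent_id:3)")) {
            System.out.println("Found value: " + v);
        }
    }
}
```

This prints the four values the question asks for, one per line.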
Related
I have a string "Spec Files: 15 passed, 5 failed, 20 total (100% completed) in 00:08:53".
I need to use regex and print the following:
Passed: 15
Failed: 5
Total: 20
I also need to compute and print the pass percentage. Please help.
I'm using the code below:
String line = "Spec Files: 15 passed, 5 failed, 20 total (100% completed) in 00:08:53";
Pattern p = Pattern.compile("(\\d+)\\s+");
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println(m.group());
}
You need a regex that captures both elements you need, the text and the value, and then print them in the right order:
String line = "Spec Files: 15 passed, 5 failed, 20 total (100% completed) in 00:08:53";
Pattern p = Pattern.compile("(\\d+)\\s+(\\w+)");
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println(m.group(2) + ": " + m.group(1));
}
/*
passed: 15
failed: 5
total: 20
*/
Java has no built-in capitalize function, so to print "Passed" instead of "passed", take a look at How to capitalize the first character of each word in a string.
Pass %:
Map<String, Integer> map = new HashMap<>(); // java.util.Map, java.util.HashMap
while (m.find()) {
    System.out.println(m.group(2) + ": " + m.group(1));
    map.put(m.group(2), Integer.parseInt(m.group(1)));
}
// multiply by 100.0 so the division happens in floating point and yields a percentage (75.0 here)
double passPercentage = 100.0 * map.get("passed") / map.get("total");
System.out.println(passPercentage);
OR
int passed = 0, total = 0;
while (m.find()) {
    System.out.println(m.group(2) + ": " + m.group(1));
    if (m.group(2).equals("passed")) {
        passed += Integer.parseInt(m.group(1));
    } else if (m.group(2).equals("total")) {
        total += Integer.parseInt(m.group(1));
    }
}
double passPercentage = 100.0 * passed / total; // 75.0 for this input
System.out.println(passPercentage);
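Putting the pieces together, here is a self-contained sketch that prints the capitalized labels and the pass percentage in one go. SpecSummary, counts, passPercentage and capitalize are hypothetical names; the LinkedHashMap (to keep the order in which the counts appear) and the simple first-letter capitalize helper are my own choices, not part of the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecSummary {
    // "<number> <word>" pairs such as "15 passed"
    private static final Pattern COUNT = Pattern.compile("(\\d+)\\s+(\\w+)");

    // Maps each word to its number, keeping the order of appearance
    static Map<String, Integer> counts(String line) {
        Map<String, Integer> map = new LinkedHashMap<>();
        Matcher m = COUNT.matcher(line);
        while (m.find()) {
            map.put(m.group(2), Integer.parseInt(m.group(1)));
        }
        return map;
    }

    // Multiplying by 100.0 first forces floating-point division
    static double passPercentage(Map<String, Integer> counts) {
        return 100.0 * counts.get("passed") / counts.get("total");
    }

    // Upper-cases only the first character
    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        String line = "Spec Files: 15 passed, 5 failed, 20 total (100% completed) in 00:08:53";
        Map<String, Integer> counts = counts(line);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(capitalize(e.getKey()) + ": " + e.getValue());
        }
        System.out.println("Pass %: " + passPercentage(counts));
    }
}
```

For the sample line this prints Passed: 15, Failed: 5, Total: 20 and Pass %: 75.0. The "100%" and "00:08:53" parts are not picked up because no whitespace follows their digits.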
I have an input that looks like this: 0; expires=2016-12-27T16:52:39
I am trying to extract only the date from it, using Pattern and Matcher.
private String extractDateFromOutput(String result) {
    Pattern p = Pattern.compile("(expires=)(.+?)(?=(::)|$)");
    Matcher m = p.matcher(result);
    while (m.find()) {
        System.out.println("group 1: " + m.group(1));
        System.out.println("group 2: " + m.group(2));
    }
    return result;
}
Why does this matcher find more than one group? The output is as follows:
group 1: expires=
group 2: 2016-12-27T17:04:39
How can I get only group 2 out of this?
Thank you!
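Because you have used more than one capturing group in your regex: (expires=) is group 1, (.+?) is group 2, and even the (::) inside the lookahead is a capturing group (group 3).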
Pattern p = Pattern.compile("expires=(.+?)(?=::|$)");
Just remove the capturing groups around expires= and ::, so that only the date is captured, as group 1:
private String extractDateFromOutput(String result) {
    Pattern p = Pattern.compile("expires=(.+?)(?=::|$)");
    Matcher m = p.matcher(result);
    if (m.find()) {
        System.out.println("group 1: " + m.group(1));
        // there is no group 2: m.group(2) would throw an IndexOutOfBoundsException
        //System.out.println("group 2: " + m.group(2));
        return m.group(1); // the extracted date
    }
    return null; // no "expires=" value in the input
}
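A standalone sketch of the single-group version, which also counts the groups each pattern defines so the difference is visible. ExpiresDate, extractDate and groupCount are hypothetical names; returning null when there is no "expires=" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpiresDate {
    // Only (.+?) captures; the lookahead stops at "::" or at the end of the input
    private static final Pattern EXPIRES = Pattern.compile("expires=(.+?)(?=::|$)");

    // Returns the value after "expires=", or null if there is none
    static String extractDate(String input) {
        Matcher m = EXPIRES.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Number of capturing groups a regex defines (group 0 is not counted)
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(extractDate("0; expires=2016-12-27T16:52:39")); // 2016-12-27T16:52:39
        System.out.println(groupCount("(expires=)(.+?)(?=(::)|$)"));       // 3
        System.out.println(groupCount("expires=(.+?)(?=::|$)"));           // 1
    }
}
```

The lookahead (?=...) itself does not capture, but any plain (...) inside it does, which is why the original pattern reports three groups.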
I've been working on matching a value sent by a client. It reads as follows:
0s
12s
1m15s
15m0s
1h0m5s
1h15m17s
I would like to capture all three groups of digits within a single find.
(\d+)(?=h(\d+)m(\d+))*?(?=m(\d+))*?
However, the regex above only grabs one group of digits per find.
For example:
input 12s: group 1 is 12, which works.
input 1m12s: group 1 is 1, but to get the 12 I have to call find again to reach the next group.
Just a note, since it didn't occur to me right away: make sure to check whether a group is null when the capturing group is optional.
Try this:
((\d+)h)?((\d+)m)?((\d+)s)
Then group 2 captures the hours, group 4 the minutes and group 6 the seconds.
See it working here: https://regex101.com/r/bZ4zW4/2
EDIT
To get the results in Java (since your last edit), do as follows:
Pattern p = Pattern.compile("((\\d+)h)?((\\d+)m)?((\\d+)s)");
Matcher m = p.matcher("1h15m17s");
if (m.find()) {
    Integer hour = Integer.valueOf(m.group(2));
    Integer minute = Integer.valueOf(m.group(4));
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(hour + " - " + minute + " - " + second);
}
m = p.matcher("1h0m5s");
if (m.find()) {
    Integer hour = Integer.valueOf(m.group(2));
    Integer minute = Integer.valueOf(m.group(4));
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(hour + " - " + minute + " - " + second);
}
m = p.matcher("15m0s");
if (m.find()) {
    Integer minute = Integer.valueOf(m.group(4));
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(minute + " - " + second);
}
m = p.matcher("12s");
if (m.find()) {
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(second);
}
m = p.matcher("0s");
if (m.find()) {
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(second);
}
The output will be respectively:
1 - 15 - 17
1 - 0 - 5
15 - 0
12
0
Note that in each case I only read the groups that are actually present. If you try to read the minutes from a match that has none, m.group(4) returns null, and Integer.valueOf throws a java.lang.NumberFormatException for a null string. So you must check for null first. The block below ends in that exception:
m = p.matcher("0s");
if (m.find()) {
    Integer minute = Integer.valueOf(m.group(4)); // exception here: m.group(4) is null
    Integer second = Integer.valueOf(m.group(6));
    System.out.println(second);
}
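A minimal sketch that combines the answer's pattern with the null check the question mentions. It swaps the outer groups for non-capturing (?:...) groups so the digits land in groups 1, 2 and 3, and it uses matches() so the whole string must be a duration. Treating a missing hours or minutes part as 0 is an assumption, and DurationParser, parse and part are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationParser {
    // (?:...) does not capture, so: group 1 = hours, group 2 = minutes, group 3 = seconds
    private static final Pattern DURATION = Pattern.compile("(?:(\\d+)h)?(?:(\\d+)m)?(\\d+)s");

    // Returns {hours, minutes, seconds}; a missing part defaults to 0 (assumption)
    static int[] parse(String input) {
        Matcher m = DURATION.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a duration: " + input);
        }
        return new int[] { part(m, 1), part(m, 2), part(m, 3) };
    }

    // An optional group that did not take part in the match returns null
    private static int part(Matcher m, int group) {
        String value = m.group(group);
        return value == null ? 0 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "0s", "12s", "1m15s", "15m0s", "1h0m5s", "1h15m17s" }) {
            int[] hms = parse(s);
            System.out.println(s + " -> " + hms[0] + " - " + hms[1] + " - " + hms[2]);
        }
    }
}
```

Because every part comes back as an int, the caller never has to remember which groups may be null.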
Input of the form: column1 = emp_no
I want to extract:
key: column1
value: emp_no
first code:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
    for(int i = 0; i < matcher.groupCount(); i++) {
        System.out.println(i + ": " + matcher.group(i));
    }
}
first result:
groupCount: 2
0: column1 = emp_no
1: column1
The second group (emp_no) is never printed.
In the second version, I changed the second pair of parentheses to double parentheses.
second code:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*((\\w+))\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
    for(int i = 0; i < matcher.groupCount(); i++) {
        System.out.println(i + ": " + matcher.group(i));
    }
}
second result:
groupCount: 3
0: column1 = emp_no
1: column1
2: emp_no
I want both results to be printed.
What is the difference between the regexes in the first and second code?
Change your code to:
String p1 = "column1 = emp_no";
String propertyRegexp = "^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$";
Pattern pattern = Pattern.compile(propertyRegexp);
Matcher matcher = pattern.matcher(p1);
System.out.println("groupCount: " + matcher.groupCount());
if(matcher.matches()) {
    for(int i = 1; i <= matcher.groupCount(); i++) { //see the changes
        System.out.println(i + ": " + matcher.group(i));
    }
}
Group 0 always contains the entire matched string.
The actual capturing groups start at index 1.
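Wrapped in a small helper, the same pattern can return the key and the value directly, without looping over group numbers at all. This is a sketch: PropertyLine and parse are hypothetical names, and returning null for lines that are not of the form key = value is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyLine {
    private static final Pattern PROPERTY = Pattern.compile("^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$");

    // Key is group 1, value is group 2; null if the line is not "key = value"
    static Map.Entry<String, String> parse(String line) {
        Matcher m = PROPERTY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new SimpleEntry<>(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = parse("column1 = emp_no");
        System.out.println("key: " + e.getKey());     // key: column1
        System.out.println("value: " + e.getValue()); // value: emp_no
    }
}
```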
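Groups in a regex are indexed from 0, but group 0 is added automatically by the regex engine to represent the entire match. Your groups are indexed as 1 and 2.
So your first attempt was almost correct. (Your second regex only seemed to work because the extra pair of parentheses added a third group: groupCount() became 3, so the loop reached index 2.) You should simply change the loop from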
for(int i = 0; i < matcher.groupCount(); i++) {
to
for(int i = 1; i <= matcher.groupCount(); i++) {
// changed: start at 1, and use <= so the last group is included
You can read more about groups in the official Java tutorial on regular expressions, https://docs.oracle.com/javase/tutorial/essential/regex/groups.html, which includes an example showing how groups are numbered:
...capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
((A)(B(C)))
(A)
(B(C))
(C)
...
There is also a special group, group 0, which always represents the entire expression.
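To see that numbering in practice, the sketch below matches ((A)(B(C))) against "ABC" and collects group 0 through groupCount(). GroupNumbering and groups are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns group 0 (whole match) followed by every capturing group, or an empty array on no match
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```

It prints 0: ABC, 1: ABC, 2: A, 3: BC and 4: C, following the order of the opening parentheses.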
If I am given an integer value like 100, then I have to update all occurrences of the text "Writing No 1" to "Writing No 101".
Actual text:
Collection reference 4 -> My Main Text 4 -> It's writing 3 Writing No 1
Writing No 2 Writing NO 3
I have given three previous references.
Since I am given 100, the output should look like this:
Collection reference 4 -> My Main Text 104 -> It's writing 3 Writing No 101
Writing No 102 Writing NO 103
I have given three previous references.
How can I update "Writing No 1" to "Writing No 101", and the others in the same way, using Java?
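As I understand your question, you need to replace the string based on an integer value.
It's better to write a function for this, like this:
private String text = "Writing No 1";
public String demo(int num) {
    // String has no slice() method and strings are immutable, so build and return a new string:
    // keep everything up to the last space and add num to the trailing number
    int lastSpace = text.lastIndexOf(' ');
    int current = Integer.parseInt(text.substring(lastSpace + 1));
    return text.substring(0, lastSpace + 1) + (current + num);
}
You can pass any integer value, and even call it in a loop for multiple string values.
Hope this helps.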
Not 100% sure this is what you need, but I'll give it a try:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StackOverflow {
    public static void main(String[] args) {
        int inc = 100;
        String text = "Collection reference 4 -> My Main Text 4 -> It's writing 3 Writing No 1 \nWriting No 2 Writing NO 3 \nI have given three previous references.";
        // 13 groups: runs of non-digits (\D also matches newlines) alternating with single digits
        String regex = "(\\D+) (\\d) (\\D+) (\\d) (\\D+) (\\d) (\\D+) (\\d) (\\D+) (\\d) (\\D+) (\\d) (\\D+)";
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            return; // the text does not have the expected shape
        }
        System.out.println(matcher.group(1) + " " + matcher.group(2) + " " + matcher.group(3) + " "
                + inc(matcher.group(4), inc) + " " + matcher.group(5) + " " + matcher.group(6) + " " + matcher.group(7)
                + " " + inc(matcher.group(8), inc) + " " + matcher.group(9) + " " + inc(matcher.group(10), inc) + " "
                + matcher.group(11) + " " + inc(matcher.group(12), inc) + matcher.group(13));
    }

    private static int inc(String base, int inc) {
        return Integer.parseInt(base) + inc;
    }
}
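A more flexible alternative is to let the regex find each "Writing No <number>" and rewrite only that number with Matcher.appendReplacement and appendTail, so the text does not need an exact shape. This sketch assumes that only numbers directly after "Writing No" (in any letter case) should change. It leaves "My Main Text 4" alone, so if that number really should become 104 as in the expected output, it would need a pattern of its own. IncrementReferences and shift are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncrementReferences {
    // Assumption: only the number right after "Writing No" (any case) is shifted
    private static final Pattern WRITING_NO =
        Pattern.compile("(Writing No )(\\d+)", Pattern.CASE_INSENSITIVE);

    static String shift(String text, int inc) {
        Matcher m = WRITING_NO.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int shifted = Integer.parseInt(m.group(2)) + inc;
            // keep the original label (group 1, e.g. "Writing NO ") and append the new number;
            // quoteReplacement stops '$' or '\' from being interpreted in the replacement
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + shifted));
        }
        m.appendTail(out); // copy the rest of the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Collection reference 4 -> My Main Text 4 -> It's writing 3 Writing No 1\n"
                + "Writing No 2 Writing NO 3\n"
                + "I have given three previous references.";
        System.out.println(shift(text, 100));
    }
}
```

StringBuffer is used because appendReplacement(StringBuffer, String) is available in every Java version; Java 9 added an overload taking a StringBuilder.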