I can't get the first group of regex pattern in java - java

I'm trying to get the first group of a regex pattern.
I got this string from a lyric text:
[01:34][01:36]Blablablahh nanana
I'm this regex pattern to extract [01:34],[03:36] and the text.
Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");
But when I try to extract the first group [01:34] using group(1) it returns [03:36]
is there something wrong in the regex pattern?

Your problem is here
Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");
^
This part of your pattern (\\[\\d\\d:\\d\\d\\])+ will match [01:34][01:36] because of + (which is greedy), but your group 1 can contain only one of [dd:dd] so it will store the last match found.
If you want to find only [01:34] you can correct your pattern by removing +. But you can also create simpler pattern
Pattern.compile("^\\[\\d\\d:\\d\\d\\]");
and use it with group(0) which is also called by group().
Pattern timeLine = Pattern.compile("^\\[\\d\\d:\\d\\d\\]");
Matcher m = timeLine.matcher("[01:34][01:36]Blablablahh nanana");
while (m.find()) {
System.out.println(m.group()); // prints [01:34]
}
In case you want to extract both [01:34][01:36] you can just add another parenthesis to your current regex like
Pattern.compile("((\\[\\d\\d:\\d\\d\\])+)(.*)");
This way entire match of (\\[\\d\\d:\\d\\d\\])+ will be in group 1.
You can also achieve it by removing (.*) from your original pattern and reading group 0.

I thin you are confused by the repeating match (\\[\\d\\d:\\d\\d\\])+ which returns just the last match as the group value. Try the following and see if it makes more sense to you:
String s = "[01:34][01:36]Blablablahh nanana";
Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])(\\[\\d\\d:\\d\\d\\])(.+)");
Matcher m = timeLine.matcher(s);
if (m.matches()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.printf(" Group %d -> %s\n", i, m.group(i)); // prints [01:36]
}
}
which for me returns:
Group 1 -> [01:34]
Group 2 -> [01:36]
Group 3 -> Blablablahh nanana

I would simply grab the first part using a character class:
String timings = str.replaceAll("([\\[\\]\\d:]+).*", "$1");
And similarly the text:
String text = str.replaceAll("[\\[\\]\\d:]+", "");

Related

How to parse string using regex

I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString()
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?
Use a regex' lookbehind:
(?<=\bid=)[^],]*
See Regex101.
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine
Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();
Using the replace and split functions don't take the structure of the data into account.
If you want to use a regex, you can just use a capturing group without any lookarounds, where enum can be any value except a ] and comma, and id can be any value except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\]
Regex demo | Java demo
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
In Java
String regex = "\\bEnumOption\\[[^][]*\\bid=([^,\\]]+)[^][]*\\]";
Regex demo
A regex can of course be used, but sometimes is less performant, less readable and more bug-prone.
I would advise you not use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

Regex capturing groups within logical OR

I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23224a2f4
/auto/445478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is:
1212, d3fe
23224, a2f4
445478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
UPDATED QUESTION:
I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23e24a2f4
/auto/df5478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is the everything after the 2nd '/' split as follows: 4 characters if 'apple', 5 characters if 'cat' and 6 characters if 'auto' like:
1212, d3fe
23e24, a2f4
df5478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
I can do it without the regex OR(|) but it breaks when I include it. Any help with the right regex?
Updated Answer:
As per your updated question you can use this regex based on lookbehind assertions:
/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$
RegEx Demo
This regex uses 2 capture groups after matching /
In 1st group we have 3 lookbehind conditions with alternations.
(?<=apple/).{4} makes sure that we match 4 characters that have apple/ on left hand side. Likewise we match 5 and 6 character strings that have cat/ and /auto/.
In 2nd capture group we match remaining characters before end of line.
You could use the regex \/[apple|auto|cat]+\/(\d*)(.*), See here
If you want the last group to have exactly 4 digits you can use this regex:
/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})
Here is a working example:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd");
Pattern pattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
}
If you want for digits after apple, 5 after cat and 6 after auto you can split your algorithm in 2 parts:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd", "/some/445478eefd");
Pattern firstPattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)");
for (String string : strings) {
Matcher firstMatcher = firstPattern.matcher(string);
if (firstMatcher.find()) {
String first = firstMatcher.group(1);
System.out.println(first);
int length = getLength(first);
Pattern secondPattern = Pattern.compile("([0-9a-f]{" + length + "})([0-9a-f]{4})");
Matcher secondMatcher = secondPattern.matcher(string);
if (secondMatcher.find()) {
System.out.println(secondMatcher.group(1));
System.out.println(secondMatcher.group(2));
}
}
}
private static int getLength(String key) {
switch (key) {
case "apple":
return 4;
case "cat":
return 5;
case "auto":
return 6;
}
throw new IllegalArgumentException("key not allowed");
}

add +n to the name of some Strings id ends with "v + number"

I have an array of Strings
Value[0] = "Documento v1.docx";
Value[1] = "Some_things.pdf";
Value[2] = "Cosasv12.doc";
Value[3] = "Document16.docx";
Value[4] = "Nodoc";
I want to change the name of the document and add +1 to the version of every document. But only the Strings of documents that ends with v{number} (v1, v12, etc).
I used the regex [v]+a*^ but only i obtain the "v" and not the number after the "v"
If all your strings ending with v + digits + extension are to be processed, use a pattern like v(\\d+)(?=\\.[^.]+$) and then manipulate the value of Group 1 inside the Matcher#appendReplacement method:
String[] strs = { "Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc"};
Pattern pat = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");
for (String s: strs) {
StringBuffer result = new StringBuffer();
Matcher m = pat.matcher(s);
while (m.find()) {
int n = 1 + Integer.parseInt(m.group(1));
m.appendReplacement(result, "v" + n);
}
m.appendTail(result);
System.out.println(result.toString());
}
See the Java demo
Output:
Documento v2.docx
Some_things.pdf
Cosasv13.doc
Document16.docx
Nodoc
Pattern details
v - a v
(\d+) - Group 1 value: one or more digits
(?=\.[^.]+$) - that are followed with a literal . and then 1+ chars other than . up to the end of the string.
The Regex v\d+ should match on the letter v, followed by a number (please note that you may need to write it as v\\d+ when assigning it to a String). Further enhancement of the Regex depends in what your code looks like. You may want to to wrap in a Capturing Group like (v\d+), or even (v(\d+)).
The first reference a quick search turns up is
https://docs.oracle.com/javase/tutorial/essential/regex/ ,
which should be a good starting point.
Try a regex like this:
([v])([1-9]{1,3})(\.)
notice that I've already included the point in order to have less "collisions" and a maximum of 999 versions({1,3}).
Further more I've used 3 different groups so that you can easily retrieve the version number increase it and replace the string.
Example:
String regex = ;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(time);
if(matcher.matches()){
int version = matcher.group(2); // don't remember if is 0 or 1 based
}

Java regex returns full string instead of capture

Java Code:
String imagesArrayResponse = xmlNode.getChildText("files");
Matcher m = Pattern.compile("path\":\"([^\"]*)").matcher(imagesArrayResponse);
while (m.find()) {
String path = m.group(0);
}
String:
[{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}]
m.group returns
path":"upload\/files\/56727570aaa08922_0.png"
instead of captured value of path. Where I am wrong?
See the documentation of group( int index ) method
When called with 0, it returns the entire string. Group 1 is the first.
To avoid such a trap, you should use named group with syntax :
"path\":\"(?<mynamegroup>[^\"]*)"
javadoc:
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
m.group(1) will give you the Match. If there are more than one matchset (), it will be m.group(2), m.group(3),...
By convention, AFAIK in regex engines the 0th group is always the whole matched string. Nested groups start at 1.
Check out the grouping options in Matcher.
Matcher m =
Pattern.compile(
//<- (0) -> that's group(0)
// <-(1)-> that's group(1)
"path\":\"([^\"]*)").matcher(imagesArrayResponse);
Change your code to
while (m.find()) {
String path = m.group(1);
}
And you should be okay. This is also worth checking out: What is a non-capturing group? What does a question mark followed by a colon (?:) mean?

Regex - Match numbers & special cases

I'm trying to make a regex that would produce the following results :
for 7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3 : 7.0, 5, :asc, 8.256, :b, 2, :d, 3
for -+*-/^^ )ç# : nothing
It's should first match numbers which can be float, so in my regex I have : [0-9]+(\\.[0-9])? but it should also mach special cases like :a or :Abc.
To be more precise, it should (if possible) match anything but mathematical operators /*+^- and parentheses.
So here is my final regex : ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+) but it's not working because matcher.groupCount() returns 3 for both of the examples I gave.
Groups are what you specifically group in the regex. Anything surrounded in parentheses is a group. (Hello) World has 1 group, Hello. What you need to be doing is finding all the matches.
In your code ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+), 3 sets of parentheses can be seen. This is why you will always be given 3 groups in every match.
Your code works fine as it is, here is an example:
String text = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";
Pattern p = Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");
Matcher m = p.matcher(text);
List<String> matches = new ArrayList<String>();
while (m.find()) matches.add(m.group());
for (String match : matches) System.out.println(match);
The ArrayList matches will contain all of the matches that your regex finds.
The only change I made was add a + after the second [0-9].
Here is the output:
7.0
5
:asc
8.256
:b
2
:d
3
Here is some more information about groups in java.
Does that help?
Your regex is correct, run the following code:
String input = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3"; // your input
String regex = "(\\d+(\\.\\d+)?)|(:[a-z-A-Z]+)"; // exactly yours.
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
Your problem is the understanding of the method matcher.groupCount(). JavaDoc clearly says
Returns the number of capturing groups in this matcher's pattern.
([^\()+\-*\s])+ //put any mathematical operator inside square bracket

Categories

Resources