Splitting string with parentheses - java

I have a list of strings that follow this pattern:
'Name with space (field1_field2) CONST'
Example :
'flow gavage(ZAB_B2_COCUM) BS'
'flowWithoutSpace (WitoutUnderscore) BS'
I would like to extract :
Name with space
The values inside the brackets
The CONST value after the brackets
For the string inside the parentheses () I am using :
\(.*\)
Not sure about the other fields

You may use
String[] results = s.split("\\s*[()]\\s*");
See the regex demo
Pattern details
\\s* - 0+ whitespaces
[()] - a ) or (
\\s* - 0+ whitespaces
If your strings are always in the format specified (no parentheses, (...), no parentheses), you will have:
Name with space = results[0]
The values inside the brackets = results[1]
The CONST value after the brackets = results[2]
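As a minimal sketch of the split approach above, the snippet below runs it on both sample strings from the question. The class and method names (SplitOnParensDemo, splitFields) are made up for illustration. One caveat: split() drops trailing empty strings, so an input that ends right after the closing parenthesis yields only two elements.

```java
import java.util.Arrays;

class SplitOnParensDemo {
    // Splits around "(" or ")" and swallows the whitespace next to them.
    static String[] splitFields(String s) {
        return s.split("\\s*[()]\\s*");
    }

    public static void main(String[] args) {
        // prints [flow gavage, ZAB_B2_COCUM, BS]
        System.out.println(Arrays.toString(splitFields("flow gavage(ZAB_B2_COCUM) BS")));
        // prints [flowWithoutSpace, WitoutUnderscore, BS]
        System.out.println(Arrays.toString(splitFields("flowWithoutSpace (WitoutUnderscore) BS")));
        // No text after ")": the trailing empty string is dropped, prints 2
        System.out.println(splitFields("name (inside)").length);
    }
}
```

Check results.length before indexing if some inputs may lack the part after the parentheses.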
If you want a more controlled approach use a matching regex:
Pattern.compile("^([^()]*)\\(([^()]*)\\)(.*)$")
See the regex demo
If you use it with Matcher#matches(), you may omit ^ and $ since that method requires a full string match.
Java demo:
String regex = "^([^()]*)\\(([^()]*)\\)(.*)$";
String s = "flow gavage(ZAB_B2_COCUM) BS";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1).trim());
System.out.println(matcher.group(2).trim());
System.out.println(matcher.group(3).trim());
}
Here, the pattern means:
^ - start of the string (implicit in .matches())
([^()]*) - Capturing group 1: any 0+ chars other than ( and )
\\( - a (
([^()]*) - Capturing group 2: any 0+ chars other than ( and )
\\) - a )
(.*) - Capturing group 3: any 0+ chars, as many as possible, up to the end of the line (use ([^()]*) if you need to restrict ( and ) in this part, too).
$ - end of string (implicit in .matches())
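The matching approach can be wrapped in a small helper that also reports when the input does not fit the expected shape. This is only a sketch: ParenFieldsParser and parse are hypothetical names, and returning an Optional of a three-element array is one possible design, not the only one.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenFieldsParser {
    // Compiled once and reused; ^ and $ are omitted because matches()
    // already requires the whole string to match.
    private static final Pattern FIELDS =
            Pattern.compile("([^()]*)\\(([^()]*)\\)(.*)");

    // Returns {name, inside, constant}, or an empty Optional when the input
    // does not have the "name (inside) constant" shape.
    static Optional<String[]> parse(String s) {
        Matcher m = FIELDS.matcher(s);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
                m.group(1).trim(), m.group(2).trim(), m.group(3).trim()
        });
    }

    public static void main(String[] args) {
        // prints flow gavage | ZAB_B2_COCUM | BS
        parse("flow gavage(ZAB_B2_COCUM) BS")
                .ifPresent(f -> System.out.println(String.join(" | ", f)));
        // prints false
        System.out.println(parse("no parentheses here").isPresent());
    }
}
```

Unlike the split() version, a malformed line is rejected explicitly instead of silently producing an array of the wrong length.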

Use the following:
String line = "'Name with space (field1_field2) CONST'";
Pattern pattern = Pattern.compile("([A-Za-z\\s]+)\\((.*)\\)(.*)\\'");
Matcher matcher = pattern.matcher(line);
String nameWithSpace = "";
String fieldsValuesInBrackets = "";
String constantValue = "";
if (matcher.find()) {
nameWithSpace = matcher.group(1);
fieldsValuesInBrackets = matcher.group(2);
constantValue = matcher.group(3);
}

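This expression will generate 3 groups:
(.*?)(\(.*?\))\s*?(.*)
The first group matches the name, the second matches the bracketed values (including the parentheses themselves), and the third matches the constant. Because \s*? is lazy, any whitespace after the closing parenthesis ends up at the start of the third group, so trim the results if needed.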
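To see what the three groups of this lazy pattern actually contain, here is a small sketch. The second pattern, TRIMMED, is not from the answer: it is a hypothetical variant that moves the parentheses and the surrounding whitespace outside the groups. The class and method names are illustrative too.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGroupsDemo {
    // The pattern exactly as given in the answer.
    static final Pattern AS_GIVEN = Pattern.compile("(.*?)(\\(.*?\\))\\s*?(.*)");
    // Hypothetical variant: parentheses and whitespace kept out of the groups.
    static final Pattern TRIMMED = Pattern.compile("(.*?)\\s*\\((.*?)\\)\\s*(.*)");

    // Returns the three captured groups, or an empty array if there is no match.
    static String[] groups(Pattern p, String s) {
        Matcher m = p.matcher(s);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String s = "flow gavage(ZAB_B2_COCUM) BS";
        // prints [flow gavage, (ZAB_B2_COCUM),  BS]  (note the space before BS)
        System.out.println(Arrays.toString(groups(AS_GIVEN, s)));
        // prints [flow gavage, ZAB_B2_COCUM, BS]
        System.out.println(Arrays.toString(groups(TRIMMED, s)));
    }
}
```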

Related

How do I make a regex part optional and finish regex at the same point if that optional part contains a string end ($)?

String s1 = "NetworkElement=Test,testWork=1:[456]";
String s2 = "NetworkElement=Test,testWork=1";
String regex = "(.*):\\[(.*)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s1);
if(matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Matcher matcher2 = pattern.matcher(s2);
if(matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
/*
Expected output:
for s1 : NetworkElement=Test,testWork=1
456
for s2 : NetworkElement=Test,testWork=1
0
*/
Problem: this regex works fine for string s1 but not for s2. For s2, matcher2.find() returns false.
You can use
^(.*?)(?::\[(.*?)])?$
^(.*?)(?::\[([^\]\[]*)])?$
See the regex demo.
In Java:
String regex = "^(.*?)(?::\\[(.*?)])?$";
// Or
String regex = "^(.*?)(?::\\[([^\\]\\[]*)])?$";
I added the ^ and $ anchors since you are using matcher.find(). If you switch to matcher.matches(), you can remove the anchors.
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?::\[(.*?)])? - an optional sequence of :[, then Group 2 capturing any zero or more chars other than line break chars as few as possible and then a ] char (if you use [^\]\[]* it will match zero or more chars other than square brackets)
$ - end of string.
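A short sketch of how the optional group behaves at runtime: when the :[...] part is absent, Group 2 does not participate in the match and group(2) returns null. The fallback to "0" below mirrors the expected output in the question and is an assumption; OptionalSuffixDemo and parse are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalSuffixDemo {
    private static final Pattern P =
            Pattern.compile("^(.*?)(?::\\[([^\\]\\[]*)])?$");

    // Returns {prefix, bracketValue}. When there is no ":[...]" suffix,
    // group(2) is null and "0" is used instead (assumed default).
    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.find()) {
            // Can happen, for example, when the input contains line breaks.
            throw new IllegalArgumentException("Unexpected input: " + s);
        }
        String value = m.group(2) != null ? m.group(2) : "0";
        return new String[] { m.group(1), value };
    }

    public static void main(String[] args) {
        String[] a = parse("NetworkElement=Test,testWork=1:[456]");
        System.out.println(a[0] + " -> " + a[1]); // NetworkElement=Test,testWork=1 -> 456
        String[] b = parse("NetworkElement=Test,testWork=1");
        System.out.println(b[0] + " -> " + b[1]); // NetworkElement=Test,testWork=1 -> 0
    }
}
```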

Parse string using Java Regex Pattern?

I have a Java string in the format below.
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Matcher and Pattern classes from the java.util.regex package, I have to get the output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use this regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this :
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append("[");
result.append(matcher.group(1));
result.append(":");
result.append(matcher.group(2));
result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
regex demo
The pattern \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first holds the uppercase letters after name: and the second holds the digits after distance:.
Another solution, suggested by @tradeJmark, is to use named groups:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
So you can easily get the results of each group by the name of group instead of the index like this :
while (matcher.find()) {
result.append("[");
result.append(matcher.group("name"));
//----------------------------^^
result.append(":");
result.append(matcher.group("distance"));
//------------------------------^^
result.append("]");
}
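For completeness, the named-group loop above can be assembled into a self-contained sketch like the following. NamedGroupsDemo and summarize are hypothetical names; the pattern and the input come from the question and answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    private static final Pattern ENTRY = Pattern.compile(
            "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]");

    // Rewrites every [name:X][distance:N] pair as [X:N] and concatenates them.
    static String summarize(String s) {
        Matcher matcher = ENTRY.matcher(s);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            result.append('[')
                  .append(matcher.group("name"))
                  .append(':')
                  .append(matcher.group("distance"))
                  .append(']');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] "
                + "[name:KTY][distance:3540] Price:";
        System.out.println(summarize(s)); // [NYK:1100][CLT:2300][KTY:3540]
    }
}
```

Because the loop uses find(), it works for any number of pairs, not just three.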
If the format of the string is fixed and you always have exactly 3 [name:...][distance:...] pairs to deal with, you can define a block pattern that matches one pair and captures its 2 parts into separate groups, and then use fairly simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
See the Java demo and a regex demo.
The block pattern matches:
\\s* - 0+ whitespaces
\\[name: - a literal [name: substring
([A-Z]+) - Group n capturing 1 or more uppercase ASCII chars (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m capturing 1 or more digits
] - a ] symbol.
In the .*%1$s%1$s%1$s.* pattern, the groups will have 1 to 6 IDs (referred to with $1 - $6 backreferences from the replacement pattern) and the leading and final .* will remove start and end of the string (add (?s) at the start of the pattern if the string can contain line breaks).
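The sketch below illustrates a consequence of hard-coding three blocks: if the input holds a different number of pairs, the pattern does not match at all and replaceAll returns the string unchanged. The class name, constants, and the two-pair sample input are made up for the illustration.

```java
class FixedBlockReplaceDemo {
    static final String BLOCK = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
    // The block repeated three times, so the groups are numbered 1 to 6.
    static final String FULL = String.format(".*%1$s%1$s%1$s.*", BLOCK);

    static String rewrite(String s) {
        return s.replaceAll(FULL, "[$1:$2][$3:$4][$5:$6]");
    }

    public static void main(String[] args) {
        String three = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] "
                + "[name:KTY][distance:3540] Price:";
        System.out.println(rewrite(three)); // [NYK:1100][CLT:2300][KTY:3540]

        // Only two pairs: no match, so the input comes back as it was.
        String two = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] Price:";
        System.out.println(rewrite(two).equals(two)); // true
    }
}
```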

RegEx: Matching n-char long sequence of repeating character

I want to split off the leading part of a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning I have n-times the same special character, but I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
This one would return for the first example
( (only 1 time) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.
I suggest a solution with 2 regexps: use (?s)(\\W)\\1+\\w.* to check whether the string starts with the same non-word character repeated, and if it does, split with the simple (?<=\\W)(?=\\w) pattern (the position between a non-word and a word character). Otherwise, just return a list containing the whole string (as if it was not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
if (str.matches("(?s)(\\W)\\1+\\w.*")) {
System.out.println(Arrays.toString(str.split(ptrn)));
} else {
System.out.println(Arrays.asList(str));
}
}
See IDEONE demo
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Pattern p = Pattern.compile(ptrn);
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
See another IDEONE demo
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Capture group 1 matching and capturing into Group 2 a non-word character followed by the same character (since a backreference \2 is used) 1 or more times.
(\\w.*) - matches and captures into Group 3 a word character and then zero or more characters.
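The single-regex variant can be packaged as a reusable method, sketched below with hypothetical names (RepeatedPrefixSplitter, split). Note that \\2+ demands at least one repetition, so a lone leading symbol such as "(Hello!" is returned unsplit; changing it to \\2* would accept a single character as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedPrefixSplitter {
    // Group 1: the whole run of the repeated symbol, Group 2: one symbol,
    // Group 3: the remainder starting with a word character.
    private static final Pattern P = Pattern.compile("(?s)((\\W)\\2+)(\\w.*)");

    static List<String> split(String s) {
        Matcher m = P.matcher(s);
        if (m.matches()) {
            return Arrays.asList(m.group(1), m.group(3));
        }
        return Collections.singletonList(s);
    }

    public static void main(String[] args) {
        System.out.println(split("(((Hello!"));   // [(((, Hello!]
        System.out.println(split("########No?")); // [########, No?]
        System.out.println(split("$%^&^Hello!")); // [$%^&^Hello!]
        System.out.println(split("(Hello!"));     // [(Hello!]
    }
}
```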

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it ?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to spell out the whole pattern, including both pairs of parentheses:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
See IDEONE demo
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
//---------------------------------------------^^^^^^
See another demo
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
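Putting the breakdown above to work, here is a minimal sketch that wraps the pattern in a method. SetVarExtractor and secondVar are hypothetical names, and returning an Optional instead of null is a design choice made for this illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetVarExtractor {
    private static final Pattern P =
            Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");

    // Returns the name quoted inside SETVAR((...)), if the input contains one.
    static Optional<String> secondVar(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints text2
        System.out.println(secondVar("Text SETVAR((&&text1 '&&text2'))").orElse("<none>"));
        // prints <none>
        System.out.println(secondVar("Text without the call").orElse("<none>"));
    }
}
```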
Your regex doesn't match your string because it doesn't account for the opening parentheses. Also, \\w+ matches only word characters, so it cannot match the space or the & symbols.
Instead, you can use a negated character class, [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
Debuggex Demo

Java regex to match tuples

I need to extract tuples out of a string
e.g. (1,1,A)(2,1,B)(1,1,C)(1,1,D)
and thought some regex like:
String tupleRegex = "(\\(\\d,\\d,\\w\\))*";
would work, but it just gives me the first tuple. What would be the proper regex to match all the tuples in the string?
Remove the * from the regex and iterate over the matches using a java.util.regex.Matcher:
String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
String tupleRegex = "(\\(\\d,\\d,\\w\\))";
Pattern pattern = Pattern.compile(tupleRegex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group());
}
The * character is a quantifier that matches zero or more tuples. Hence your original regex would match the entire input string in one go, and a repeated capturing group keeps only the text of its last iteration.
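If the individual fields of each tuple are needed too, the same find() loop can capture them with one group per field. This is a sketch under the assumption that every field is a single character, as in the question's pattern; TupleExtractor and extract are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TupleExtractor {
    // One capturing group per field instead of one group around the tuple.
    private static final Pattern TUPLE =
            Pattern.compile("\\((\\d),(\\d),(\\w)\\)");

    static List<String[]> extract(String input) {
        List<String[]> tuples = new ArrayList<>();
        Matcher m = TUPLE.matcher(input);
        while (m.find()) {
            tuples.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return tuples;
    }

    public static void main(String[] args) {
        for (String[] t : extract("(1,1,A)(2,1,B)(1,1,C)(1,1,D)")) {
            System.out.println(Arrays.toString(t)); // first line: [1, 1, A]
        }
    }
}
```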
A one-line solution uses String.split() with the pattern (?!^\\()(?=\\():
Arrays.toString("(1,1,A)(2,1,B)(1,1,C)(1,1,D)".split("(?!^\\()(?=\\()"))
output:
[(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
Here is DEMO as well.
Pattern explanation:
(?! look ahead to see if there is not:
^ the beginning of the string
\( '('
) end of look-ahead
(?= look ahead to see if there is:
\( '('
) end of look-ahead
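A small sketch comparing the pattern above with a shorter one that has only the positive lookahead. On Java 8 and later, String.split() never produces an empty leading substring for a zero-width match at the start of the input, so both variants should return the same array there; the negative lookahead matters on older versions. TupleSplitDemo and its method names are made up for this example.

```java
import java.util.Arrays;

class TupleSplitDemo {
    // Pattern from the answer: split before every "(" except the very first one.
    static String[] splitTuples(String s) {
        return s.split("(?!^\\()(?=\\()");
    }

    // Shorter variant (assumes Java 8+): split before every "(".
    static String[] splitTuplesSimple(String s) {
        return s.split("(?=\\()");
    }

    public static void main(String[] args) {
        String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
        // prints [(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
        System.out.println(Arrays.toString(splitTuples(input)));
        System.out.println(Arrays.toString(splitTuplesSimple(input)));
    }
}
```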
