I am looking for a regular expression to split a string on commas. That sounds very simple, but there is an additional restriction: the parameters in the string may contain commas enclosed in parentheses, and those commas should not split the string.
Example:
1, 2, 3, add(4, 5, 6), 7, 8
 ^  ^  ^      !  !   ^  ^
The string should only be split at the commas marked with ^, not at those marked with !.
I found a solution for it here: A regex to match a comma that isn't surrounded by quotes
Regex:
,(?=([^\(]*\([^\)]*\))*[^\)]*$)
But my string could be more complex:
1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11
 ^  ^  ^      !  !      !  !   !   ^   ^
For this string the result is wrong, and I have no clue how to fix it or whether it is even possible with regular expressions.
Does anyone have an idea how to solve this problem?
Thanks for your help!
OK, I think a regular expression is not very useful for this. A small block of Java might be easier.
So this is my Java code for solving the problem:
import java.util.ArrayList;
import java.util.List;

public static void splitWithJava() {
    String EXAMPLE = "1, 2, 3, add(4, 5, add(7, 8), 6), 7, 8";
    List<String> list = new ArrayList<>();
    int start = 0;   // start index of the current element
    int pCount = 0;  // number of currently open parentheses
    for (int i = 0; i < EXAMPLE.length(); i++) {
        char c = EXAMPLE.charAt(i);
        switch (c) {
            case ',':
                // split only at commas that are outside all parentheses
                if (pCount == 0) {
                    list.add(EXAMPLE.substring(start, i).trim());
                    start = i + 1;
                }
                break;
            case '(':
                pCount++;
                break;
            case ')':
                pCount--;
                break;
        }
    }
    // add the last element, which has no trailing comma
    list.add(EXAMPLE.substring(start).trim());
    for (String str : list) {
        System.out.println(str);
    }
}
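The same depth-counting idea can be packaged as a method that takes the input as a parameter and returns the parts. This is only a sketch: the class and method names (TopLevelSplitter, splitTopLevel) are made up for illustration, and it assumes the parentheses in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {

    // Splits on commas that are not inside parentheses.
    // Assumption: parentheses in the input are balanced.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int start = 0;  // index where the current part begins
        int depth = 0;  // number of currently open parentheses
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(input.substring(start, i).trim());
                start = i + 1;
            }
        }
        // the last part has no trailing comma
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        // print one part per line
        for (String part : splitTopLevel("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11")) {
            System.out.println(part);
        }
    }
}
```

Because it only tracks a counter, this handles any nesting depth, which a single regular expression without recursion cannot do in general.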
You can also achieve this using this regex: ([^,(]+(?=,|$)|[\w]+\(.*\)(?=,|$))
For the text 1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11 it produces one match per item, breaking only at commas that are not enclosed in ().
So, the output would be:
Match 1
Group 1. 0-1 `1`
Match 2
Group 1. 2-4 ` 2`
Match 3
Group 1. 5-7 ` 3`
Match 4
Group 1. 9-35 `add(4, 5, add(6, 7, 8), 9)`
Match 5
Group 1. 36-39 ` 10`
Match 6
Group 1. 40-43 ` 11`
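For anyone who wants to try this pattern from Java, here is a minimal sketch using Pattern and Matcher. The class and method names (RegexSplitDemo, items) are made up for illustration. One caveat worth checking against your own data: the .* inside the second alternative is greedy, so if the input contains two separate top-level calls, everything from the first call to the last closing parenthesis ends up in a single match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSplitDemo {

    // The pattern from the answer above, escaped as a Java string literal.
    static final Pattern ITEM =
            Pattern.compile("([^,(]+(?=,|$)|[\\w]+\\(.*\\)(?=,|$))");

    // Collects group 1 of every match, trimmed of surrounding whitespace.
    static List<String> items(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // One nested call: six items, as in the match list above.
        for (String item : items("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11")) {
            System.out.println(item);
        }
        // Two top-level calls: the greedy .* merges them into one item.
        System.out.println(items("add(1, 2), 3, add(4, 5)").size()); // prints 1
    }
}
```

If inputs with several top-level calls are possible, counting parentheses in a loop is the safer route.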
Related
Let's say I have a string.
String str = "Hello6 9World 2, Nic8e D7ay!";
Matcher match = Pattern.compile("\\d+").matcher(str);
The matcher above would give me 6, 9, 2, 8 and 7, which is perfect!
But if my string changes to:
String str = "Hello69World 2, Nic8e D7ay!";
(note that the space between 6 and 9 has been removed in this string)
and I run:
Matcher match = Pattern.compile("\\d+").matcher(str);
it would give me 69, 2, 8 and 7.
My requirement is to extract the single-digit numbers only. Here, what I need is 2, 8 and 7, omitting 69.
Could you please help me improve my regex? Thank you!
For each digit, you have to check that it is neither preceded nor followed by another digit.
You can try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String str = "Hello69World 2, Nic8e D7ay!";
    // (?<!\d) = not preceded by a digit, (?!\d) = not followed by a digit
    Pattern p = Pattern.compile("(?<!\\d)\\d(?!\\d)");
    Matcher m = p.matcher(str);
    while (m.find()) {
        System.out.println(m.group());
    }
    System.out.println("***********");
    str = "Hello6 9World 2, Nic8e D7ay!";
    m = p.matcher(str);
    while (m.find()) {
        System.out.println(m.group());
    }
}
Output:
2
8
7
***********
6
9
2
8
7
public static void main(String[] args) {
    String title = "Today, and tomorrow,2,1,2,5,0";
    String[] titleSep = title.split(",");
    System.out.println(Arrays.toString(titleSep));
    System.out.println(titleSep[0]);
    System.out.println(titleSep[1]);
}
output:
[Today, and tomorrow, 2, 1, 2, 5, 0]
Today
(space) and tomorrow
I want to treat "Today, and tomorrow" as a phrase representing the first index value of titleSep (do not want to separate at comma it contains).
What is the split method argument that would split the string only at commas NOT followed by a space?
(Java 8)
Use a negative look ahead:
String[] titleSep = title.split(",(?! )");
The regex (?! ) means "the input following the current position is not a space".
FYI, a negative look ahead has the form (?!<some regex>) and a positive look ahead has the form (?=<some regex>). Neither consumes any input; both only test what follows the current position.
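To make the difference between the two forms concrete, here is a small sketch that splits the same title once with each. The class and method names are made up for illustration.

```java
class LookaheadSplitDemo {

    // Negative look ahead: split at commas NOT followed by a space.
    static String[] splitAtCommaNotBeforeSpace(String s) {
        return s.split(",(?! )");
    }

    // Positive look ahead: split at commas that ARE followed by a space.
    static String[] splitAtCommaBeforeSpace(String s) {
        return s.split(",(?= )");
    }

    public static void main(String[] args) {
        String title = "Today, and tomorrow,2,1,2,5,0";
        // Brackets make leading spaces in each piece visible.
        for (String piece : splitAtCommaNotBeforeSpace(title)) {
            System.out.println("[" + piece + "]");
        }
        System.out.println("---");
        for (String piece : splitAtCommaBeforeSpace(title)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

In both cases the space itself stays in the result, because a look ahead only inspects the following character and does not consume it.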
The argument to the split function is a regex, so we can use a negative lookahead to split by comma-not-followed-by-space:
String title = "Today, and tomorrow,2,1,2,5,0";
String[] titleSep = title.split(",(?! )"); // comma not followed by space
System.out.println(Arrays.toString(titleSep));
System.out.println(titleSep[0]);
System.out.println(titleSep[1]);
The output is:
[Today, and tomorrow, 2, 1, 2, 5, 0]
Today, and tomorrow
2
So I've been looking through a lot of search results about regular expressions, but I'm still pretty confused about how to set them up. The issue I'm having is that I'm trying to convert this text, read from an input file:
(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)
so that I can put it into a String array that looks like this:
[42, 10, d, 23, 1, 123, 4, 32, 10, d, 12, 9]
Any tips?
I tried using a delimiter at first to get rid of the parentheses and commas; however, the delimiter puts each value on a separate line, which sadly isn't what I'm aiming for. I'm essentially trying to ignore those special characters so I can assign, for example, 42 to an int a and 10 to an int b.
What language are you using? EDIT: never mind, I see you're using Java. The approach is the same either way; I'll get back to you in a bit with the Java version.
In Perl this would be pretty simple.
use Data::Dumper;
my $var = "(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)";
$var =~ s/\)/,/g;
$var =~ s/\(//g;
$var =~ s/d/d,/g;
$var =~ s/\s*//g;
my @arr = split /,/, $var;
print Dumper \@arr;
Java Version:
String content = "(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)";
String[] split = null;
split = content.replace(")", ",").replace("(", "").replace("d", "d,").replace(" ", "").split(",");
for (String a : split) {
    System.out.println(a);
}
Although I guess this doesn't strictly answer your question, since it doesn't use regex. It just uses replace and split.
If you really want regex, then this works better when there is more complexity in your data.
Note: I just match alphanumeric tokens, since the d didn't seem to mean anything special.
Pattern p = Pattern.compile("[A-Za-z0-9]+");
Matcher m = p.matcher("(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)");
String delim = ",";
StringBuffer sb = new StringBuffer("[");
while (m.find()) {
    sb.append(m.group()).append(delim);
}
sb.setLength(sb.length() - delim.length());
System.out.println(sb.append("]").toString());
Output
[42,10,d,23,1,123,4,32,10,d,12,9]
Use a List<String> if you do want to keep that data around.
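A minimal sketch of that List<String> variant, using the same pattern as above. The class and method names (TokenCollector, tokens) are made up for illustration, and the int conversion assumes the first two tokens are numeric, as they are in the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCollector {

    // Collects every run of letters and digits, skipping parentheses,
    // commas and whitespace.
    static List<String> tokens(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z0-9]+").matcher(content);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> t = tokens("(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)");
        // Convert individual tokens to int where needed.
        int a = Integer.parseInt(t.get(0));
        int b = Integer.parseInt(t.get(1));
        System.out.println(a + " " + b); // prints "42 10"
        // If an array is required, as in the question:
        String[] asArray = t.toArray(new String[0]);
        System.out.println(asArray.length); // prints 12
    }
}
```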
Let's say I have a string: "(2 * 32) + 5 ^ 2"
I'd like to turn this into a String array: [(2, *, 32, ), +, 5, ^, 2]
That is, I don't want to capture the spaces in the original string, and I want to split on whitespace characters.
So I tried string.split("\\s+"), but the result looks like [(2, *, 32), +, 5, ^, 2].
Can someone explain why it doesn't split "(2" into (,2? Thank you!
This works, and has the added benefit of not splitting when there are numbers longer than 1 digit, and not requiring spaces between tokens.
String str = "(2*32) + 5 ^ 2";
String[] tokens = str.replace(" ", "").split("\\b|(?=\\D)");
Output:
[ (, 2, *, 32, ), +, 5, ^, 2 ]
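As for why split("\\s+") leaves "(2" in one piece: that pattern only matches runs of whitespace, and there is no whitespace between ( and 2, so nothing separates them. A different way to get the tokens, not the split-based approach above, is to describe what a token looks like and collect the matches. The sketch below assumes a token is either a run of digits or any single non-whitespace character; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokenizer {

    // A token is a run of digits, or otherwise one non-whitespace character.
    static final Pattern TOKEN = Pattern.compile("\\d+|\\S");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // print one token per line
        for (String token : tokenize("(2 * 32) + 5 ^ 2")) {
            System.out.println(token);
        }
    }
}
```

Whitespace never matches either alternative, so it is skipped without a separate replace step, and the result is the same with or without spaces between tokens.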
I have this line of code: temp5.replaceAll("\\W", "");
The contents of temp5 at this point are: [1, 2, 3, 4] but the regex doesn't remove anything. And when I do a toCharArray() I end up with this: [[, 1, ,, , 2, ,, , 3, ,, , 4, ]]
Am I not using the regex correctly? I was under the impression that \W should remove all punctuation and white space.
Note: temp5 is a String
I also just tested with \w, \W, and various others, and got the same output for all of them.
Strings are immutable. replaceAll() returns the string with the changes made, it does not modify temp5. So you might do something like this instead:
temp5 = temp5.replaceAll("\\W", "");
After that, temp5 will be "1234".
Assign the result of replaceAll() back to the variable:
String temp5 = "1, 2, 3, 4";
temp5 = temp5.replaceAll("\\W", "");
System.out.println(temp5.toCharArray());