How to split a string based on parenthesis and avoiding whitespace

I'm trying to reformat a string using str.split() on a string such as
"(ABD) (DEFG) (HIJKLMN)" (with one or more spaces between the groups).
I've tried this regex (Java):
[the example string].split("\\(|\\)")
My output keeps including "" or " " elements from the split, which I don't want. I want my array to be:
array[0] = "ABD"
array[1] = "DEFG"
etc.

I would perform two steps: use String.replaceAll(String, String) to remove the () characters, then split on white-space. Like,
String str = "(ABD) (DEFG) (HIJKLMN)";
System.out.println(Arrays.toString(str.replaceAll("[()]", "").split("\\s+")));
which outputs (as requested)
[ABD, DEFG, HIJKLMN]
Alternatively, you could use an ArrayList and compile a reusable Pattern to perform a grouping operation on the contents of () literals. Like,
String str = "(ABD) (DEFG) (HIJKLMN)";
Pattern p = Pattern.compile("\\((\\w+)\\)");
Matcher m = p.matcher(str);
List<String> matches = new ArrayList<>();
while (m.find()) {
matches.add(m.group(1));
}
System.out.println(matches);
which will continue to work even when the input has no white-space between the () groups, like String str = "(ABD)(DEFG)(HIJKLMN)";
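The find() loop above can be wrapped in a small helper, sketched here under a few assumptions: the class name ParenTokens and the method name extract are made up for illustration, and the pattern uses [^()]* instead of \w+ so that groups containing non-word characters are also captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class ParenTokens {
    // Matches one parenthesised group and captures whatever is inside it.
    private static final Pattern GROUP = Pattern.compile("\\(([^()]*)\\)");

    // Returns the contents of every "(...)" group, in order of appearance.
    static List<String> extract(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = GROUP.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(extract("(ABD)   (DEFG) (HIJKLMN)")); // [ABD, DEFG, HIJKLMN]
        System.out.println(extract("(ABD)(DEFG)(HIJKLMN)"));     // [ABD, DEFG, HIJKLMN]
    }
}
```

Because the whitespace is never part of a match, it does not matter how many spaces separate the groups.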

Related

How can I divide a string with split() into separate strings with 15 items at each index?

Normally with split() it will divide a whole string by one regex.
So if I take String objects= "door,cat,house,trash";
and do objects.split(",") it will split it into an array of String[] objects= {"door","cat","house","trash"}; as you probably know.
But I don't want to separate it with every comma. I want to separate it with every 15th item. So my list would have String[] objects= {"door,cat,house,trash...obj15","obj1,obj2,obj3...obj15"};
Does that make sense? How would I go about doing that?

split() is not good for that. Use a find() loop instead.
Demo (Java 7+)
String input = "door,cat,house,trash,e5,f6,g7,h8,i9,j10,k11,l12,m13,n14,o15,p16,q17,r18,s19,t20";
Pattern p = Pattern.compile("(?=.)((?:[^,]*,){0,4}[^,]*),?");
List<String> result = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
result.add(m.group(1));
for (String s : result)
System.out.println('"' + s + '"');
Demo (Java 9+)
String input = "door,cat,house,trash,e5,f6,g7,h8,i9,j10,k11,l12,m13,n14,o15,p16,q17,r18,s19,t20";
String[] result = Pattern.compile("(?=.)((?:[^,]*,){0,4}[^,]*),?").matcher(input)
.results().map(m -> m.group(1)).toArray(String[]::new);
Arrays.stream(result).forEach(s -> System.out.println('"' + s + '"'));
Output
"door,cat,house,trash,e5"
"f6,g7,h8,i9,j10"
"k11,l12,m13,n14,o15"
"p16,q17,r18,s19,t20"
You should of course change {0,4} to {0,14} if you want 15 values per block.
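To avoid editing the quantifier by hand, the block size can be passed in and the pattern built from it. This is only a sketch: the class name CommaChunks and the method chunk are hypothetical, and the pattern is the one from the demo above with {0,4} replaced by {0,size-1}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the find() loop shown above.
final class CommaChunks {
    // Groups the comma-separated items of input into blocks of at most `size` items.
    static List<String> chunk(String input, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        // Up to size-1 "item," pairs followed by one more item, then an optional comma.
        Pattern p = Pattern.compile("(?=.)((?:[^,]*,){0," + (size - 1) + "}[^,]*),?");
        List<String> blocks = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("a,b,c,d,e", 2)); // [a,b, c,d, e]
    }
}
```

The last block may contain fewer than `size` items when the item count is not a multiple of the block size.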

The other answers suggest using complicated regular expressions. I would avoid that. Use Guava to partition the split string into groups of fifteen, and join each group back up with commas:
String objects = "door,cat,trash,house,...";
List<String> list = Lists.newArrayList(objects.split(","));
String[] result = Lists.partition(list, 15).stream()
.map(each -> String.join(",", each))
.toArray(String[]::new);
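If adding Guava is not an option, the same split, partition and join idea can be sketched with the standard library only. The class name PlainPartition and the method partition are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; standard library only.
final class PlainPartition {
    // Splits on commas, then joins every `size` consecutive items back together.
    static List<String> partition(String input, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        String[] items = input.split(",");
        List<String> blocks = new ArrayList<>();
        for (int start = 0; start < items.length; start += size) {
            int end = Math.min(start + size, items.length);
            blocks.add(String.join(",", Arrays.copyOfRange(items, start, end)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(partition("a,b,c,d,e", 2)); // [a,b, c,d, e]
    }
}
```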

The first thing that came to mind is to split them with split(","), then loop through them and put some other symbol (e.g. $) instead of the comma after every 15th item.
Then you split by that symbol. Note that $ is a regex metacharacter, so it has to be escaped: split("\\$").
I don't know how efficient it is, but here you go.

Try the following regex pattern: [a-z]+(,[a-z]+){14}. This will match a list of 15 comma-separated words. If I were you, I would switch to using the provided Matcher and Pattern classes to find this regex pattern (instead of using split). Note that a final group with fewer than 15 words is not matched by this pattern.
Pattern pattern = Pattern.compile("[a-z]+(,[a-z]+){14}");
Matcher matcher = pattern.matcher("dog,door,cat,other,etc...");
while (matcher.find()) {
System.out.println(matcher.group());
}

How would I replace this function with a regex replace

I have a file name with this format yy_MM_someRandomString_originalFileName.
example:
02_01_fEa3129E_my Pic.png
I want replace the first 2 underscores with / so that the example becomes:
02/01/fEa3129E_my Pic.png
That can be done with replaceAll, but the problem is that files may contain underscores as well.
@Test
void test() {
final var input = "02_01_fEa3129E_my Pic.png";
final var formatted = replaceNMatches(input, "_", "/", 2);
assertEquals("02/01/fEa3129E_my Pic.png", formatted);
}
private String replaceNMatches(String input, String regex,
String replacement, int numberOfTimes) {
for (int i = 0; i < numberOfTimes; i++) {
input = input.replaceFirst(regex, replacement);
}
return input;
}
I solved this using a loop, but is there a pure regex way to do this?
EDIT: this way should be able to let me change a parameter and increase the amount of underscores from 2 to n.

You could use 2 capturing groups and use those in the replacement where the match of the _ will be replaced by /
^([^_]+)_([^_]+)_
Replace with:
$1/$2/
For example:
String regex = "^([^_]+)_([^_]+)_";
String string = "02_01_fEa3129E_my Pic.png";
String subst = "$1/$2/";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceFirst(subst);
System.out.println(result);
Result
02/01/fEa3129E_my Pic.png
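For a variable number of underscores, one loop-free alternative (not a pure regex replace) is split with a limit followed by a join. This is a sketch under assumptions: the class name FirstNUnderscores and the method replaceFirstN are invented here, and the delimiter is fixed to "_".

```java
// Hypothetical helper class illustrating split-with-limit plus join.
final class FirstNUnderscores {
    // Replaces the first n underscores with '/', leaving later underscores untouched.
    static String replaceFirstN(String input, int n) {
        // A limit of n + 1 applies the pattern at most n times, so the last
        // array element keeps any remaining underscores.
        return String.join("/", input.split("_", n + 1));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstN("02_01_fEa3129E_my Pic.png", 2)); // 02/01/fEa3129E_my Pic.png
    }
}
```

Changing n from 2 to 3 would also replace the underscore after the random string.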

Your current solution has a few problems:
It is inefficient, because each replaceFirst call starts from the beginning of the string, so it iterates over the same leading characters many times.
It has a bug, which follows from the first point: because each pass starts from the beginning instead of from the last modified position, it can replace text that was inserted by a previous pass.
For instance, suppose we want to replace one character at a time, twice, each with X, so that abc becomes XXc. After code like
String input = "abc";
input = input.replaceFirst(".", "X"); // replaces a with X -> Xbc
input = input.replaceFirst(".", "X"); // replaces X with X -> Xbc
we end up with Xbc instead of XXc, because the second replaceFirst replaces X with X instead of b with X.
To avoid that kind of problem, you can rewrite your code to use the Matcher#appendReplacement and Matcher#appendTail methods, which iterate over the input once and let you replace each matched part with the value you want (the StringBuilder overloads used here require Java 9+; on older versions use StringBuffer):
private static String replaceNMatches(String input, String regex,
String replacement, int numberOfTimes) {
Matcher m = Pattern.compile(regex).matcher(input);
StringBuilder sb = new StringBuilder();
int i = 0;
while(i++ < numberOfTimes && m.find() ){
m.appendReplacement(sb, replacement); // replaces currently matched part with replacement,
// and writes replaced version to StringBuilder
// along with text before the match
}
m.appendTail(sb); //lets add to builder text after last match
return sb.toString();
}
Usage example:
System.out.println(replaceNMatches("abcdefgh", "[efgh]", "X", 2)); //abcdXXgh

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?

You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut: a lookbehind (preceded by a digit, [0-9]) and a lookahead (followed by a letter, [a-z]).
These lookarounds are only checks and consume nothing, so the delimiter of the split is an empty string and no characters are removed from the result.

You could split at each boundary between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<=\\d)(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532

There's nothing in your string that matches the regular expression. Java regex strings are not wrapped in / delimiters, so the two slashes are treated as literal characters, and your string contains no slash. Even without them, the expression starts with ^ (beginning of string) and ends with $ (end of string), so it would either match the whole string or nothing at all. Because the pattern never matches, split finds no delimiter, and that's why you get just one big token.
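The explanation can be checked by counting matches. In this sketch the class name AnchoredSplitDemo and the helper countMatches are hypothetical; the first pattern is the one from the question, the others are variations added for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for counting how often a pattern matches.
final class AnchoredSplitDemo {
    // Counts how many times regex matches somewhere in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";
        System.out.println(countMatches("/^[a-zA-z]+[0-9]+$/", s)); // 0: the slashes are literal
        System.out.println(countMatches("^[a-zA-Z]+[0-9]+$", s));   // 0: anchors demand the whole string
        System.out.println(countMatches("[a-zA-Z]+[0-9]+", s));     // 3: one match per token
        System.out.println(s.split("/^[a-zA-z]+[0-9]+$/").length);  // 1: no delimiter was found
    }
}
```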

You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)

How to convert a String to String array in Java ( Ignore whitespace and parentheses )

The String will look like this:
String temp = "IF (COND_ITION) (ACT_ION)";
// Only has one whitespace in either side of the parentheses
or
String temp = " IF (COND_ITION) (ACT_ION) ";
// Have more irrelevant whitespace in the String
// But no whitespace in condition or action
I hope to get a new String array which contains three elements, ignoring the parentheses:
String[] tempArray;
tempArray[0] = "IF";
tempArray[1] = "COND_ITION";
tempArray[2] = "ACT_ION";
I tried to use the String.split(regex) method, but I don't know how to write the regex.

If your input string will always be in the format you described, it is better to parse it based on the whole pattern instead of just the delimiter, as this code does:
Pattern pattern = Pattern.compile("\\s*(.*?)\\s+\\((.*?)\\)\\s+\\((.*?)\\)");
Matcher matcher = pattern.matcher(inputString);
String[] tempArray = new String[3];
if (matcher.find()) {
    tempArray[0] = matcher.group(1);
    tempArray[1] = matcher.group(2);
    tempArray[2] = matcher.group(3);
}
Pattern breakdown:
\\s* optional leading white space
(.*?) IF
\\s+ white space
\\((.*?)\\) (COND_ITION)
\\s+ white space
\\((.*?)\\) (ACT_ION)

You can use StringTokenizer to split into strings delimited by whitespace. From Java documentation:
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
this
is
a
test
Then write a loop to process the strings to replace the parentheses.
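StringTokenizer also has a two-argument constructor that takes a set of delimiter characters, so the parentheses can be dropped in the same pass. The sketch below assumes only spaces, tabs and parentheses separate the tokens; the class name TokenizerSplit and the method tokens are made up for illustration.

```java
import java.util.StringTokenizer;

// Hypothetical helper class using StringTokenizer with custom delimiters.
final class TokenizerSplit {
    // Treats spaces, tabs and both parentheses as delimiters.
    static String[] tokens(String input) {
        StringTokenizer st = new StringTokenizer(input, " \t()");
        String[] result = new String[st.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = st.nextToken();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String t : tokens("  IF   (COND_ITION)  (ACT_ION) ")) {
            System.out.println(t);
        }
    }
}
```

For the sample input this yields the three tokens IF, COND_ITION and ACT_ION, because StringTokenizer never returns empty tokens.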

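I think you want a regular expression like "\\)? +\\(?", assuming any whitespace inside the parentheses is not to be removed. At least one space has to be required: with " *" the pattern can match the empty string, and split would then cut between every character. You still need to trim the input first and strip the final ")" from the last element. Note that this doesn't validate that the parentheses match properly. Hope this helps.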
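As one possible way to get the three elements with split alone, the sketch below trims the input and then splits on runs of whitespace and parentheses. The class name TrimAndSplit and the method toArray are hypothetical, and the approach assumes the string does not begin with a parenthesis (otherwise the first element would be an empty string).

```java
import java.util.Arrays;

// Hypothetical helper class: trim first, then split on whitespace and parentheses.
final class TrimAndSplit {
    static String[] toArray(String input) {
        // trim() removes leading and trailing white space so that no empty
        // first element is produced; trailing empty strings are dropped by split.
        return input.trim().split("[\\s()]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toArray("  IF   (COND_ITION)  (ACT_ION) ")));
        // [IF, COND_ITION, ACT_ION]
    }
}
```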

Java Split not working as expected

I am trying to use a simple split to break up the following string: 00-00000
My expression is: ^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])
And my usage is:
String s = "00-00000";
String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";
String[] parts = s.split(pattern);
If I play around with the Pattern and Matcher classes I can see that my pattern does match and the matcher tells me my groupCount is 7 which is correct. But when I try and split them I have no luck.

String.split does not use capturing groups as its result. It finds whatever matches and uses that as the delimiter, so the resulting String[] holds the substrings in between what the regex matches. As it is, the regex matches the whole string, and with the whole string as a delimiter there is nothing else left, so it returns an empty array.
If you want to use regex capturing groups, you will have to use Matcher.group(); String.split() will not do.
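The empty result can be observed directly. In this sketch the class name WholeMatchSplit and the helper parts are invented for the example; a limit of 0 behaves like the one-argument split, while a negative limit keeps trailing empty strings.

```java
// Hypothetical demo class showing how many parts split returns.
final class WholeMatchSplit {
    // Returns the number of elements split produces for the given limit.
    static int parts(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String s = "00-00000";
        String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";
        System.out.println(parts(s, pattern, 0));  // 0: both sides of the delimiter are empty and get dropped
        System.out.println(parts(s, pattern, -1)); // 2: the two empty strings are kept
        System.out.println(parts(s, "-", 0));      // 2: "00" and "00000"
    }
}
```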

For your example, you could simply do this:
String s = "00-00000";
String pattern = "-";
String[] parts = s.split(pattern);

I cannot be sure, but I think what you are trying to do is to get each matched group into an array.
Matcher matcher = Pattern.compile(pattern).matcher(s);
if (matcher.matches()) {
    String[] groups = new String[matcher.groupCount()];
    for (int i = 0; i < matcher.groupCount(); i++) {
        groups[i] = matcher.group(i + 1); // group 0 is the whole match, so start at 1
    }
}

From the documentation:
String[] split(String regex) -- Returns: the array of strings computed by splitting this string around matches of the given regular expression
Essentially the regular expression is used to define delimiters in the input string. You can use capturing groups and backreferences in your pattern (e.g. for lookarounds), but ultimately what matters is what and where the pattern matches, because that defines what goes into the returned array.
If you want to split your original string into 7 parts using regular expression, then you can do something like this:
String s = "12-3456";
String[] parts = s.split("(?!^)");
System.out.println(parts.length); // prints "7"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[1] [2] [-] [3] [4] [5] [6] "
This splits on the zero-length assertion (?!^), which matches anywhere except before the first character of the string. This prevents the empty string from being the first element in the array, and the trailing empty string is already discarded because we use the default limit parameter of split.
Using a regular expression to get the individual characters of a string like this is overkill, though. If you have only a few characters, then the most concise option is to use foreach on the toCharArray():
for (char ch : "12-3456".toCharArray()) {
System.out.print("[" + ch + "] ");
}
This is not the most efficient option if you have a longer string.
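For longer strings, an index loop with charAt avoids the array copy that toCharArray() makes. A minimal sketch, with the class name CharLoop and the method bracketEach made up for illustration:

```java
// Hypothetical helper class iterating over characters without copying them.
final class CharLoop {
    // Wraps every character of s in brackets, reading it in place with charAt.
    static String bracketEach(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append('[').append(s.charAt(i)).append("] ");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketEach("12-3456")); // [1] [2] [-] [3] [4] [5] [6]
    }
}
```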
Splitting on -
This may also be what you're looking for:
String s = "12-3456";
String[] parts = s.split("-");
System.out.println(parts.length); // prints "2"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[12] [3456] "
