How to split the string with slash correctly - java

Code:
String line = "/abc/1/";
String[] tokens = line.split("/");
I want to get {"", "abc", "1", ""}.
However, the actual output is {"", "abc", "1"}.
What confuses me is why there is only one "". Is there something wrong with line.split("/")?

Use the not-often-used second parameter of String#split:
String line = "/abc/1/";
String[] tokens = line.split("/", -1);
This returns {"", "abc", "1", ""}.
From the documentation for String#split(String pattern, int n):
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length

Just a follow-up to Tim's answer: as the documentation points out, the second parameter, limit, controls how many times the regex is applied to the string. There are three cases for the limit:
public String[] split(String regex, int limit)
If the limit n is positive, the pattern is applied at most n - 1 times, the returned array's length is no greater than n, and the array's last entry contains all of the remaining input.
If the limit n is negative, the pattern is applied as many times as possible and every resulting substring is returned, including trailing empty strings.
If the limit n is zero, the pattern is applied as many times as possible, as in the negative case, but all trailing empty strings are discarded.
So for your problem, you should try:
line.split("/", -1); // include all results.
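To make the three cases concrete, here is a minimal sketch that runs the string from the question through each kind of limit. The class name SplitLimitDemo and the helper splitWithLimit are made up for illustration; only String#split(String, int) comes from the JDK.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical thin wrapper so the three limit modes can be compared side by side.
    static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String line = "/abc/1/";
        // limit = 0 (same as split("/")): trailing empty strings are dropped
        System.out.println(Arrays.toString(splitWithLimit(line, "/", 0)));  // [, abc, 1]
        // limit < 0: every piece is kept, including the trailing empty string
        System.out.println(Arrays.toString(splitWithLimit(line, "/", -1))); // [, abc, 1, ]
        // limit = 2: at most 2 elements; the last one holds the unsplit remainder
        System.out.println(Arrays.toString(splitWithLimit(line, "/", 2)));  // [, abc/1/]
    }
}
```

Note that the leading empty string survives in all three cases; only trailing empty strings are affected by a limit of zero.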

Related

Java, splitting string into array

I am trying to split a string into a string array, and I have stumbled upon something that seems strange to me. I don't understand why it works like this.
String one, two;
one = "";
two = ":";
String[] devided1 = one.trim().split(":");
String[] devided2 = two.trim().split(":");
System.out.println("size: "+ devided1.length);
System.out.println("size: "+ devided2.length);
I get output:
size: 1
size: 0
Why does the empty string give me a size of 1, while a string that contains only the delimiter gives me an array of size 0?
I saw more confusing things, for example: the size for "::" is 0, but the size for ": :" is 2, not 3.
Can someone please explain it to me?
See the doc comment in source code or documentation for public String[] split(String regex, int limit) method.
Case 1:
String one = "";
String[] devided1 = one.trim().split(":");
The resulting array has one element, the original string (String[1] [""]), because the expression ":" does not match any part of the input string.
According to documentation:
If the
* expression does not match any part of the input then the resulting array
* has just one element, namely this string.
Case 2:
String two = ":";
String[] devided2 = two.trim().split(":");
split(":") uses the default limit of 0, which means trailing empty strings are removed from the resulting array. The method first splits ":" into an array of two empty strings and then removes them, so the result is an empty array.
According to documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
Case 3:
String two = ":";
String[] devided2 = two.trim().split(":", -1);
We will get an array with two empty strings.
According to documentation:
If limit is non-positive then the pattern will be applied as many
times as possible and the array can have any length
Case 4:
String two = "::";
String[] devided2 = two.trim().split(":");
We get an empty array. This is the same as Case 2: the split produces three empty strings, all of them trailing, so all of them are removed.
Case 5:
String one = ": :";
String[] devided1 = one.trim().split(":");
The method splits the string into three array elements ["", " ", ""] and then removes the empty strings from the end of the array, because limit = 0. We get String[2] ["", " "].
According to documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
This link is helpful:
https://konigsberg.blogspot.com/2009/11/final-thoughts-java-puzzler-splitting.html
Basically, it is for Perl compatibility.
You can use split(":", -1) here if you don't want that behavior.
Otherwise, split(":") defaults to split(":", 0), and the difference is:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#split(java.lang.String,int)
If the limit is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
If the limit is negative then the pattern will be applied as many times as possible and the array can have any length.
When ":" is split, the result would be {"", ""}, but trailing empty strings are discarded, so an empty array is returned.
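The cases discussed in this thread can be checked side by side with a short sketch. The class name TrailingEmptyDemo and the helper pieces are hypothetical; the sketch only counts how many elements split(":", limit) returns for each input.

```java
class TrailingEmptyDemo {
    // Hypothetical helper: number of array elements produced by splitting on ":".
    static int pieces(String input, int limit) {
        return input.split(":", limit).length;
    }

    public static void main(String[] args) {
        String[] inputs = {"", ":", "::", ": :"};
        for (String input : inputs) {
            // limit 0 drops trailing empty strings, limit -1 keeps them
            System.out.println("\"" + input + "\" -> limit 0: " + pieces(input, 0)
                    + ", limit -1: " + pieces(input, -1));
        }
        // "" -> limit 0: 1, limit -1: 1   (no match, so the input itself is returned)
        // ":" -> limit 0: 0, limit -1: 2
        // "::" -> limit 0: 0, limit -1: 3
        // ": :" -> limit 0: 2, limit -1: 3
    }
}
```

The empty string is the odd one out: the delimiter never matches, so the array contains the original string and no trailing-empty removal takes place.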

How to divide a String into identical repeating parts

Given a String, I want to divide it up into substrings that are all identical. For example:
"abcabcabcabc" -> ["abc", "abc", "abc", "abc"]
"aaaaaa" -> ["a", "a", "a", "a", "a", "a"]
"abc" -> ["abc"]
My problem is figuring out the logic for finding where to break the string. My initial attempt is:
public static void FindPattern(String s) {
int no_of_characters = 256;
int[] count = new int[no_of_characters];
Arrays.fill(count, 0);
for (int i= 0; i < s.length();i++){
count[s.charAt(i)]++;
}
}
public static void main(String[] args) {
String s = "abcabcabd";
FindPattern(s);
}
but I have no idea of where to go from there.
You can use regex to find the smallest substring that when repeated is the same as the whole string:
String part = str.replaceAll("^(.+?)\\1*$", "$1");
Breaking down the regex:
^ means "start of input"
(.+?) means "capture (as group 1) the smallest amount of input that will result in a match"
\1 is a back reference to group 1, meaning "another copy of what was captured in group 1"
* means "zero or more of the back reference"
$1 the replacement is what was captured in group 1
Because zero further copies are allowed to complete the match, when there is no repeating group, the whole string is returned, which is correct behaviour.
Once you have this string, you don't actually need to "divide" the string up; you just need n copies of it. However, as a convenience, you can split the string into equal parts by splitting on the length of the result of the above:
String[] parts = str.split("(?<=\\G.{" + str.replaceAll("^(.+?)\\1*$", "$1").length() + "})");
More simply, the split regex is (?<=\G.{n}), which means "there are n characters between the end of the previous match and the current position".
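As a rough sketch of the same idea, the repeating unit can be found with the regex above and the pieces cut with plain substring calls. Using substring instead of the \G lookbehind split is a substitution made here for readability, and the names RepeatingUnitDemo, smallestUnit and divide are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatingUnitDemo {
    // Smallest prefix that, when repeated, rebuilds the whole string.
    // If nothing repeats, the regex captures the whole string.
    static String smallestUnit(String str) {
        return str.replaceAll("^(.+?)\\1*$", "$1");
    }

    // Cuts the string into consecutive chunks of the unit's length.
    static List<String> divide(String str) {
        List<String> parts = new ArrayList<>();
        if (str.isEmpty()) {
            return parts; // nothing to divide
        }
        int n = smallestUnit(str).length();
        for (int i = 0; i < str.length(); i += n) {
            parts.add(str.substring(i, i + n));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(divide("abcabcabcabc")); // [abc, abc, abc, abc]
        System.out.println(divide("aaaaaa"));       // [a, a, a, a, a, a]
        System.out.println(divide("abcabcabd"));    // [abcabcabd]
    }
}
```

Because the whole string is the unit repeated some number of times, the unit's length always divides the string's length, so the loop never runs past the end.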

Java difference between "split(regEx)" and "split(regEx, 0)"?

Is there any difference between using split(regEx) and split(regEx, 0)?
The output is the same for the cases I tested. Example:
String myS = "this is stack overflow";
String[] mySA = myS.split(" ");
results in mySA = {"this", "is", "stack", "overflow"}
And
String myS = "this is stack overflow";
String[] mySA = myS.split(" ", 0);
also results in mySA = {"this", "is", "stack", "overflow"}
Is there something "hidden" going on here? Or something else which needs to be known about the .split(regEx, 0)?
They are essentially the same.
Quoted from String.split(String regex) documentation:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
To answer the question: yes, they are the same.
Here is the one-argument split method of the String class, which internally calls split(regex, 0):
public String[] split(String regex) {
return split(regex, 0);
}
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded
For example the following code can give you some insight.
String myS = "this is stack overflow";
String[] mySA = myS.split(" ", 2);
String[] withOutLimit = myS.split(" ");
System.out.println(mySA.length); // prints 2
System.out.println(withOutLimit.length); // prints 4

Splitting a String up to the nth delimiter in Java

String s = "10.226.18.158:10.226.17.183:ABCD :AAAA";
My requirement is to split the string only up to the 3rd : or up to the 2nd :, i.e.
something like String sa[] = s.split(), but with the regex splitting stopping at the 3rd or 2nd delimiter.
sa[0] = "10.226.18.158"
sa[1] = "10.226.17.183"
sa[2] = "ABCD :AAAA"
According to the String#split() javadoc you can add a number to limit the number of splits.
s.split(":", 3);
Edit: as melwil mentions, this returns an array whose length is at most the number passed in.
So in your example of splitting up to 2nd : you would need to pass in 3.
s.split(":",3) returns the output
sa[0] = "10.226.18.158"
sa[1] = "10.226.17.183"
sa[2] = "ABCD :AAAA"
Relevant section quoted from the javadoc about how the second argument (limit) works:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
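Since a positive limit n applies the pattern at most n - 1 times, splitting at only the first k delimiters means passing k + 1. A minimal sketch of that off-by-one, with a hypothetical helper splitAtFirst (not part of the JDK) that assumes k is zero or greater:

```java
import java.util.Arrays;

class SplitUpToNthDemo {
    // Hypothetical helper: split at the first k matches only (k >= 0 assumed),
    // which yields at most k + 1 pieces.
    static String[] splitAtFirst(String s, String regex, int k) {
        return s.split(regex, k + 1);
    }

    public static void main(String[] args) {
        String s = "10.226.18.158:10.226.17.183:ABCD :AAAA";
        // Stop after the 2nd colon: the third piece keeps its own colon.
        System.out.println(Arrays.toString(splitAtFirst(s, ":", 2)));
        // [10.226.18.158, 10.226.17.183, ABCD :AAAA]
        // Stop after the 1st colon.
        System.out.println(Arrays.toString(splitAtFirst(s, ":", 1)));
        // [10.226.18.158, 10.226.17.183:ABCD :AAAA]
    }
}
```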
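You can also split on a colon that directly follows a non-whitespace character. Use a lookbehind, (?<=\S), so that the preceding character is not consumed as part of the delimiter (splitting on \S: alone would drop the last digit of each address):
String sa[] = s.split("(?<=\\S):");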

Replace a series of substrings by one copy

For a given word, I want to search for all the substrings that appear next to each other at least 3 times, and replace all of them by only one. I know how to do this when the substring is only one character. For instance, the code below returns "Bah" for the input string "Bahhhhhhh":
String term = "Bahhhhhhh";
term = term.replaceAll("(.)\\1{2,}", "$1");
However, I need a more generic pattern that converts "Bahahahaha" into "Baha".
String[] terms = { "Bahhhhhhh", "Bahahahaha" };
for (String term : terms) {
System.out.println(term.replaceAll("(.+?)\\1{2,}", "$1"));
}
Output:
Bah
Baha
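This works for repeated units that are 1, 2, or 3 characters long. The quantifier is reluctant ({1,3}?) so that the shortest unit is tried first; with a greedy {1,3}, "Bahhhhhhh" would match "hh" as the unit and become "Bahhh".
String term = "Bahhhhhhh";
term = term.replaceAll("(.{1,3}?)\\1{2,}", "$1");
You'll want to be careful to avoid huge backtracking performance hits. That's why I limited it to 1-3 characters.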
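For comparison, here is a small sketch that runs both the unbounded pattern and a bounded one over the two sample words. The class and method names are invented for this example, and the bounded pattern here is written with a reluctant quantifier, (.{1,3}?), an assumption of this sketch so that the shortest repeating unit is preferred.

```java
class CollapseRepeatsDemo {
    // Unbounded unit length: any substring repeated 3+ times in a row collapses to one copy.
    static String collapse(String term) {
        return term.replaceAll("(.+?)\\1{2,}", "$1");
    }

    // Unit length capped at 3 characters to limit backtracking on long inputs.
    static String collapseBounded(String term) {
        return term.replaceAll("(.{1,3}?)\\1{2,}", "$1");
    }

    public static void main(String[] args) {
        String[] terms = {"Bahhhhhhh", "Bahahahaha"};
        for (String term : terms) {
            System.out.println(collapse(term) + " " + collapseBounded(term));
        }
        // Bah Bah
        // Baha Baha
    }
}
```

A unit that repeats fewer than three times, as in "Baha", is left untouched by both patterns.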
