Java: difference between "split(regEx)" and "split(regEx, 0)"?

Is there any difference between using split(regEx) and split(regEx, 0)?
The output was the same in every case I tested. For example:
String myS = "this is stack overflow";
String[] mySA = myS.split(" ");
results in mySA being {"this", "is", "stack", "overflow"}.
And
String myS = "this is stack overflow";
String[] mySA = myS.split(" ", 0);
also results in mySA being {"this", "is", "stack", "overflow"}.
Is there something "hidden" going on here? Or is there anything else I should know about .split(regEx, 0)?

They are essentially the same.
Quoted from String.split(String regex) documentation:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

To answer the question: yes, they are the same.
Here is the split(String regex) method of the String class, which in turn calls split(regex, 0):
public String[] split(String regex) {
    return split(regex, 0);
}
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
For example, the following code gives some insight:
String myS = "this is stack overflow";
String[] mySA = myS.split(" ", 2);
String[] withOutLimit = myS.split(" ");
System.out.println(mySA.length); // prints 2
System.out.println(withOutLimit.length); // prints 4
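To see where a zero limit and a negative limit actually differ, it helps to use an input that ends in delimiters. The following is a minimal sketch; the class name SplitLimitDemo, the helper splitSample, and the sample input "a,b,," are made up for illustration and do not come from the question.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits the sample input "a,b,," on "," with the given limit.
    static String[] splitSample(int limit) {
        String input = "a,b,,"; // ends with two delimiters
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        // One-argument form: behaves like limit 0, trailing empty strings are dropped.
        System.out.println(Arrays.toString("a,b,,".split(",")));  // [a, b]
        // Limit 0: same result as the one-argument form.
        System.out.println(Arrays.toString(splitSample(0)));      // [a, b]
        // Negative limit: trailing empty strings are kept.
        System.out.println(splitSample(-1).length);               // 4
        // Positive limit 2: at most one split, the rest stays in the last element.
        System.out.println(Arrays.toString(splitSample(2)));      // [a, b,,]
    }
}
```

For the input in the question ("this is stack overflow") there are no trailing empty strings, which is why split(" ") and split(" ", 0) and even split(" ", -1) all produce the same four elements.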

Related

Java, splitting string into array

I am trying to split a string into a string array, and I have stumbled on something that seems strange to me. I don't understand why it works like this.
String one, two;
one = "";
two = ":";
String[] devided1 = one.trim().split(":");
String[] devided2 = two.trim().split(":");
System.out.println("size: "+ devided1.length);
System.out.println("size: "+ devided2.length);
I get output:
size: 1
size: 0
Why does the empty string give an array of size 1, while a string that contains only the delimiter gives an array of size 0?
I have seen more confusing cases too: "::" gives size 0, but ": :" gives size 2, not 3.
Can someone please explain this to me?
See the doc comment in the source code, or the documentation, for the public String[] split(String regex, int limit) method.
Case 1:
String one = "";
String[] devided1 = one.trim().split(":");
The resulting array has one element, the original string (String[1] [""]), because the expression ":" does not match any part of the input string.
According to the documentation:
If the expression does not match any part of the input then the resulting array
has just one element, namely this string.
Case 2:
String two = ":";
String[] devided2 = two.trim().split(":");
split(":") uses the default limit of 0, which means trailing empty strings are removed from the resulting array. The method first splits ":" into two empty strings and then removes both of them, so the result is an empty array.
According to the documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
Case 3:
String two = ":";
String[] devided2 = two.trim().split(":", -1);
We get an array with two empty strings, because a negative limit keeps trailing empty strings.
According to the documentation:
If limit is non-positive then the pattern will be applied as many
times as possible and the array can have any length
Case 4:
String two = "::";
String[] devided2 = two.trim().split(":");
We get an empty array, for the same reason as in Case 2.
Case 5:
String one = ": :";
String[] devided1 = one.trim().split(":");
The method splits the string into three elements, ["", " ", ""], and then removes the empty strings from the end of the array, because limit = 0. The result is String[2] ["", " "].
According to the documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
This link is helpful:
https://konigsberg.blogspot.com/2009/11/final-thoughts-java-puzzler-splitting.html
Basically, it is for Perl compatibility.
You can use split(":", -1) here if you don't want that behavior.
Otherwise, split(":") is equivalent to split(":", 0). The documentation describes the difference:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#split(java.lang.String,int)
If the limit is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
If the limit is negative then the pattern will be applied as many times as possible and the array can have any length.
When ":" is split, the intermediate result is {"", ""}, but trailing empty strings are discarded, so an empty array is returned.
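The cases discussed above can be checked side by side with a short program. This is only a sketch: the class name SplitEdgeCases and the helper pieces are invented for illustration, and the inputs are the four strings from the question.

```java
class SplitEdgeCases {
    // Hypothetical helper: number of array elements when splitting on ":" with the given limit.
    static int pieces(String input, int limit) {
        return input.split(":", limit).length;
    }

    public static void main(String[] args) {
        String[] inputs = {"", ":", "::", ": :"};
        for (String input : inputs) {
            // Limit 0 is what split(":") uses; limit -1 keeps trailing empty strings.
            System.out.println("\"" + input + "\" -> limit 0: " + pieces(input, 0)
                    + ", limit -1: " + pieces(input, -1));
        }
    }
}
```

With limit 0 the counts are 1, 0, 0 and 2; with limit -1 they are 1, 2, 3 and 3. The empty input gives 1 in both cases because the pattern never matches, so the original string is returned as the only element.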

Java String split inconsistency

If I split "hello|" and "|hello" with "|" character, then I get one value for the first and two values for the second version.
String[] arr1 = new String("hello|").split("\\|");
String[] arr2 = new String("|hello").split("\\|");
System.out.println("arr1 length: " + arr1.length + "\narr2 length: " + arr2.length);
This prints out:
arr1 length: 1
arr2 length: 2
Why is this?
According to the Java docs, split creates an empty String if the first character is the separator, but doesn't create an empty String (or empty Strings) if the last character (or the last several consecutive characters) is the separator. You will get the same behavior regardless of the separator you use.
Trailing empty strings are not included in the array. See the following statement from the documentation of String#split:
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
String#split always returns the array of strings computed by splitting this string around matches of the given regular expression.
Check the source code for the answer: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/regex/Pattern.java#Pattern.compile%28java.lang.String%29
The last lines of the split method contain the answer:
int resultSize = matchList.size();
if (limit == 0)
    while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
        resultSize--;
String[] result = new String[resultSize];
So the end will not be included if it is empty.
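The same trimming step can be reproduced by hand to see what it does to the two inputs from the question. This is a sketch, not the JDK implementation: dropTrailingEmpty is a hypothetical helper that mimics the loop quoted above, applied to the result of a split with limit -1.

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: removes empty strings from the end of the array,
    // mirroring the "limit == 0" loop quoted from the JDK source.
    static String[] dropTrailingEmpty(String[] parts) {
        int size = parts.length;
        while (size > 0 && parts[size - 1].isEmpty()) {
            size--;
        }
        return Arrays.copyOf(parts, size);
    }

    public static void main(String[] args) {
        // Limit -1 keeps every piece, including the empty one after the last "|".
        String[] trailing = "hello|".split("\\|", -1);   // ["hello", ""]
        String[] leading = "|hello".split("\\|", -1);    // ["", "hello"]

        // Only the trailing empty string is removed; the leading one survives.
        System.out.println(dropTrailingEmpty(trailing).length); // 1
        System.out.println(dropTrailingEmpty(leading).length);  // 2
    }
}
```

For these two inputs the trimmed arrays match what split("\\|") returns, which is the asymmetry the question observed.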

Why is String.split behaving like this?

My code is
public class Main
{
    public static void main(String[] args)
    {
        String inputString = "#..#...##";
        String[] abc = inputString.trim().split("#+");
        for (int i = 0; i < abc.length; i++)
        {
            System.out.println(abc[i]);
        }
        System.out.println(abc.length);
    }
}
The output abc is an array of length 3, with abc[0] being an empty string. The other two elements in abc are .. and ...
If my inputString is "..##...", I don't get an empty string in the array returned by the split function. The input String has no leading or trailing whitespace in either case.
Can someone explain why I get an extra space in the code shown above?
You don't get an extra space, you get the empty string (with length 0). It says so in the Javadoc:
When there is a positive-width match at the beginning of this
string then an empty leading substring is included at the beginning
of the resulting array. A zero-width match at the beginning however
never produces such empty leading substring.
When you split by #+ and the first character of the input string is #, the input is split at the very beginning, so the first element of the array is an empty string. Nothing precedes the first # except the start of the string, so the substring before that first match is empty.
From the Javadoc:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
And Javadoc:
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Whenever you call .split on a String, it splits the String at every place where the pattern matches.
So when you have
String inputString = "#..#...##";
and you split on #, the text before the first occurrence of # is empty, so abc[0] holds an empty string. Therefore abc.length is 3: abc[0] is the empty string, abc[1] is .. and abc[2] is ... (the empty string after the final ## is discarded because it is trailing).
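If the leading empty string is unwanted, one option is to remove the leading delimiters before splitting. The sketch below assumes that approach; the class name LeadingEmptyDemo and both helper methods are hypothetical names, and the inputs are the two strings from the question.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Plain split: a match at index 0 produces an empty first element.
    static String[] splitOnHashes(String input) {
        return input.split("#+");
    }

    // Hypothetical alternative: strip leading '#' characters first,
    // so there is no match at the very beginning of the string.
    static String[] splitIgnoringLeadingHashes(String input) {
        return input.replaceFirst("^#+", "").split("#+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnHashes("#..#...##")));              // [, .., ...]
        System.out.println(Arrays.toString(splitOnHashes("..##...")));                // [.., ...]
        System.out.println(Arrays.toString(splitIgnoringLeadingHashes("#..#...##"))); // [.., ...]
    }
}
```

Note that an input consisting only of # characters becomes the empty string after stripping, and splitting an empty string still yields a one-element array containing "".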

Unexpected behavior of Java String split( )

I am trying to split a string using String split function, here's an example:
String[] list = " Hello ".split("\\s+");
System.out.println("String length: " + list.length);
for (String s : list) {
    System.out.println("----");
    System.out.println(s);
}
Here's the output:
String length: 2
----
----
Hello
As you can see, the leading whitespace becomes an empty element in the String array, but the trailing whitespace does not.
Does anyone know why?
You need to use the other split method, which takes a limit, and pass a limit of -1:
String[] list = " Hello ".split("\\s+", -1);
This preserves the trailing empty string. The default behavior is to omit trailing empty strings, as per the Javadoc.
Edit (answer for comment):
To avoid the leading empty string, strip the leading whitespace before splitting the String:
String str = " Hello ".replaceAll("^\\s+", "");
String[] list = str.split("\\s+", -1);
From split documentation
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
so in reality split(regex) is the same as using
split(regex, 0);
and its documentation says
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
So if you want to include trailing empty strings, you have to use a non-zero limit, such as
split("\\s+", 10);
but a positive limit also caps the result array at 10 elements. To avoid that cap, use a negative number:
split("\\s+", -1);
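The three variants discussed in this thread can be compared directly. This is a minimal sketch; the class name WhitespaceSplitDemo and its helper methods are invented for illustration, and trimming the input first is one possible way to avoid both empty elements rather than the only one.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Default limit (0): leading empty string kept, trailing one dropped.
    static String[] defaultSplit(String input) {
        return input.split("\\s+");
    }

    // Negative limit: both the leading and the trailing empty string are kept.
    static String[] keepAll(String input) {
        return input.split("\\s+", -1);
    }

    // Hypothetical alternative: trim first, so no match touches either end.
    static String[] trimmedSplit(String input) {
        return input.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = " Hello ";
        System.out.println(defaultSplit(input).length);               // 2
        System.out.println(keepAll(input).length);                    // 3
        System.out.println(Arrays.toString(trimmedSplit(input)));     // [Hello]
    }
}
```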

Splitting a String up to the nth delimiter in Java

String s = "10.226.18.158:10.226.17.183:ABCD :AAAA";
My requirement is to split the string only up to the 2nd or 3rd colon, i.e.
something like String sa[] = s.split(...), but with the regex applied only up to the 2nd or 3rd delimiter:
sa[0] = "10.226.18.158"
sa[1] = "10.226.17.183"
sa[2] = "ABCD :AAAA"
According to the String#split() Javadoc, you can pass a second argument to limit the number of splits:
s.split(":", 3);
Edit: as melwil mentions, the returned array has at most as many elements as the number passed in.
So in your example of splitting up to the 2nd :, you need to pass in 3.
s.split(":",3) returns the output
sa[0] = "10.226.18.158"
sa[1] = "10.226.17.183"
sa[2] = "ABCD :AAAA"
Relevant section quoted from the Javadoc about how the second argument (limit) works:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
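Applied to the string from the question, the limit works out as follows. This is a small sketch; the class name NthDelimiterDemo and the helper splitAtMost are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

class NthDelimiterDemo {
    // Hypothetical helper: splits on ":" into at most maxPieces elements,
    // i.e. the pattern is applied at most maxPieces - 1 times.
    static String[] splitAtMost(String s, int maxPieces) {
        return s.split(":", maxPieces);
    }

    public static void main(String[] args) {
        String s = "10.226.18.158:10.226.17.183:ABCD :AAAA";

        // Limit 3: split at the 1st and 2nd colon only.
        System.out.println(Arrays.toString(splitAtMost(s, 3)));
        // [10.226.18.158, 10.226.17.183, ABCD :AAAA]

        // Limit 2: split at the 1st colon only.
        System.out.println(Arrays.toString(splitAtMost(s, 2)));
        // [10.226.18.158, 10.226.17.183:ABCD :AAAA]
    }
}
```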
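You can also split only on colons that directly follow a non-whitespace character. Note that split("\\S{1}:") would consume that character along with the colon (yielding "10.226.18.15" instead of "10.226.18.158"), so use a lookbehind, (?<=\S), which checks the preceding character without consuming it:
String sa[] = s.split("(?<=\\S):");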
