The JDK provides the following overload of split:
public String[] split(String regex, int limit)
I always assumed that limit is the maximum number of tokens the string will be split into.
For example:
First snippet:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA".split(":")));
out:
[Andrew Carnegie, 35, USA]
Second snippet:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA".split(":",2)));
out:
[Andrew Carnegie, 35:USA]
But I noticed one more effect:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA:".split(":")));
out:
[Andrew Carnegie, 35, USA]
and
System.out.println(Arrays.toString("Andrew Carnegie:35:USA:".split(":",-1)));
out:
[Andrew Carnegie, 35, USA, ]
So an empty element is added if the string ends with the delimiter.
Where can I find specific information about this effect?
Citing Johannes Weiß:
"When calling String.split(String), it calls String.split(String, 0) and that discards trailing empty strings (as the docs say it), when calling String.split(String, n) with n < 0 it won't discard anything."
From the JavaDoc for split:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Emphasis mine.
So,
if n is strictly positive then the resulting array will have at most n elements, and the last element holds the rest of the input
if n is zero then the array can have any length and trailing empty strings will be discarded
if n is strictly negative then the array can have any length and trailing empty strings will not be discarded
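To see the three cases side by side, here is a minimal sketch. The class name SplitLimitDemo, the helper show and the sample input "a:b:c::" are made up for illustration and do not come from the question.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Returns the printable form of input.split(regex, limit).
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String input = "a:b:c::"; // illustrative input with two trailing empty fields
        System.out.println(show(input, ":", 2));  // [a, b:c::]     at most 2 elements
        System.out.println(show(input, ":", 0));  // [a, b, c]      trailing empties dropped
        System.out.println(show(input, ":", -1)); // [a, b, c, , ]  trailing empties kept
    }
}
```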
Related
I am trying to split a value using a separator, but I am getting surprising results.
String data = "5|6|7||8|9||";
String[] split = data.split("\\|");
System.out.println(split.length);
I expect to get 8 values: [5,6,7,EMPTY,8,9,EMPTY,EMPTY]
But I am getting only 6 values.
Any idea how to fix this? An EMPTY value should be kept in the array no matter where it occurs.
By default, split(delimiter) removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, like
String[] split = data.split("\\|", -1);
A few more details:
split(regex) internally returns the result of split(regex, 0), and in the documentation of that method you can find (emphasis mine)
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Exception:
It is worth mentioning that removing trailing empty strings only makes sense if those empty strings were created by the split mechanism. So for "".split(anything), since "" can't be split any further, the result is the array [""].
This happens because no split took place here, so "", despite being empty and trailing, represents the original string and not an empty string created by the splitting process.
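A small sketch of this exception, counting elements instead of printing them (an array holding one empty string and an empty array both print as []). The class name NoMatchSplit and the helper count are invented for this example.

```java
class NoMatchSplit {
    // Number of elements returned by input.split(regex), i.e. with limit 0.
    static int count(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(count("", "#"));    // 1: no match, the original "" is returned
        System.out.println(count("abc", "#")); // 1: no match, the original "abc" is returned
        System.out.println(count("#", "#"));   // 0: both pieces are empty and get discarded
    }
}
```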
From the documentation of String.split(String regex):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So you will have to use the two-argument version String.split(String regex, int limit) with a negative value:
String[] split = data.split("\\|",-1);
Doc:
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will not leave out any empty elements, including the trailing ones.
String[] split = data.split("\\|",-1);
This is not always the actual requirement. The drawback of the above is shown below:
Scenario 1: all data present
String data = "5|6|7||8|9|10|";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 7
System.out.println(splt.length); //output: 8
Scenario 2: data missing
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output: 8
The real requirement is that the length should be 7 in both scenarios, even when data is missing, for example when the values have to be inserted into a database. We can achieve this with the approach below.
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.replaceAll("\\|$","").split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output: 7
What I've done here is remove the "|" at the end and then split the string. Note that "\\|$" removes only a single trailing pipe. If you have "," as a separator, use ",$" inside replaceAll instead.
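The same idea can be wrapped in a small helper that works for any literal separator. This is only a sketch: the class and method names are hypothetical, and it swaps the replaceAll regex for endsWith/substring plus Pattern.quote so that the separator does not have to be escaped by hand.

```java
import java.util.regex.Pattern;

class TrailingDelimiterSplit {
    // Hypothetical helper: drops ONE trailing delimiter (if present) and then
    // splits with a negative limit so that inner and trailing empty fields are kept.
    static String[] splitIgnoringFinalDelimiter(String data, String delimiter) {
        String body = data.endsWith(delimiter)
                ? data.substring(0, data.length() - delimiter.length())
                : data;
        return body.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(splitIgnoringFinalDelimiter("5|6|7||8|9|10|", "|").length); // 7
        System.out.println(splitIgnoringFinalDelimiter("5|6|7||8|||", "|").length);    // 7
    }
}
```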
From String.split() API Doc:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
The overloaded String.split(regex, int) is more appropriate for your case.
You may have multiple separators, including whitespace characters, commas, semicolons, etc. Put them in a repeatable character class with []+, like:
String[] tokens = "a , b, ,c; ;d, ".split( "[,; \t\n\r]+" );
You'll get 4 tokens: a, b, c, d
Leading separators in the source string need to be removed before applying this split; otherwise the result starts with an empty string.
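One way to deal with leading separators is to strip them with the same character class before splitting. The sketch below assumes that approach; the class name, the constant SEPARATORS and the method tokens are made up for the example.

```java
import java.util.Arrays;

class LeadingSeparatorSplit {
    // Same separator class as above: commas, semicolons and whitespace, repeated.
    static final String SEPARATORS = "[,; \t\n\r]+";

    // Removes a leading run of separators, then splits on runs of separators.
    static String[] tokens(String input) {
        String trimmed = input.replaceFirst("^" + SEPARATORS, "");
        return trimmed.split(SEPARATORS);
    }

    public static void main(String[] args) {
        // Without trimming, the leading separators produce an empty first token.
        System.out.println(Arrays.toString(" ,a , b".split(SEPARATORS))); // [, a, b]
        System.out.println(Arrays.toString(tokens(" ,a , b")));           // [a, b]
    }
}
```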
As an answer to the question asked:
String data = "5|6|7||8|9||";
String[] split = data.split("[\\| \t\n\r]+");
Whitespace characters are added in case they appear as separators along with |. Note that this pattern collapses consecutive separators, so empty fields are not preserved.
http://www.tutorialspoint.com/java/java_string_split.htm
Here is the syntax of this method: public String[] split(String regex, int limit) or public String[] split(String regex)
From the above link, I can understand the Str.split("-", 2) and Str.split("-", 3) examples.
However, I don't quite get Str.split("-", 0). What is the role of a limit of zero? In some examples I have also encountered negative limits. What do they mean?
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. We have 3 possible values for this limit:
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Str.split("-",0) is the same as Str.split("-")
Str.split("-", 0) is equivalent to Str.split("-"), i.e. there is no limit on the number of elements, and trailing empty strings are discarded.
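A quick way to convince yourself of the equivalence is to compare both calls directly. This is just a sketch; the class name, the method sameAsZeroLimit and the sample string are invented here.

```java
import java.util.Arrays;

class ZeroLimitEquivalence {
    // True when split(regex) and split(regex, 0) return the same elements.
    static boolean sameAsZeroLimit(String input, String regex) {
        return Arrays.equals(input.split(regex), input.split(regex, 0));
    }

    public static void main(String[] args) {
        String sample = "a-b-c--"; // illustrative input, not from the tutorial
        System.out.println(sameAsZeroLimit(sample, "-")); // true
        System.out.println(sample.split("-").length);     // 3
    }
}
```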
I've gone through the documentation of String's split method, but the results are not what I expected. When we split a string with the limit argument set to a negative value, it always appends an empty value. Why does it do that? Consider some cases:
// Case 1
String str = "1#2#3#";
System.out.println(str.split("#").length); // Prints 3
System.out.println(str.split("#", -1).length); // Prints 4
What I would expect here is that both print 3.
// Case 2
str = "";
System.out.println(str.split("#").length); // Prints 1
System.out.println(str.split("#", -1).length); // Prints 1
Since no match is found, I expected the usual split method without a limit to print 0, but it creates an array containing an empty string.
// Case 3
str = "#";
System.out.println(str.split("#").length); // Prints 0
System.out.println(str.split("#", -1).length); // Prints 2
Now I have a match and the split method without the limit argument works fine. It is my expected output, but why does it produce an empty array here and not in case 2?
// Case 4
str = "###";
System.out.println(str.split("#").length); // Prints 0
System.out.println(str.split("#", -1).length); // Prints 4
Here the first split call behaves as expected, but why does the second one give 4 instead of 3?
// Case 5
str = "1#2#3#";
System.out.println(str.split("#", 0).length); // Prints 3
System.out.println(str.split("#", 3).length); // Prints 3
System.out.println(str.split("#", 4).length); // Prints 4
Now the last case, with a positive limit. If the limit is <= the number of matches, the result is as expected. But if we give a higher positive limit, it again appends an empty string to the resulting array.
From the JavaDoc for String
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Emphasis mine.
In the negative limit case empty strings are not discarded, so if I represent an empty string with E:
1#2#3# -> 1 # 2 # 3 # E
E -> E
# -> E # E
### -> E # E # E # E
In your last example (with a positive limit), trailing empty strings are not discarded either: they are only discarded if n == 0.
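The E notation above can be reproduced with a few lines of code. This is a rough sketch: the class name EmptyFieldRenderer and the method render are hypothetical, and the delimiter is treated as a literal via Pattern.quote.

```java
import java.util.regex.Pattern;

class EmptyFieldRenderer {
    // Splits with a negative limit and shows every empty string as "E".
    static String render(String input, String delimiter) {
        String[] parts = input.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                parts[i] = "E";
            }
        }
        return String.join(" " + delimiter + " ", parts);
    }

    public static void main(String[] args) {
        System.out.println(render("1#2#3#", "#")); // 1 # 2 # 3 # E
        System.out.println(render("", "#"));       // E
        System.out.println(render("#", "#"));      // E # E
        System.out.println(render("###", "#"));    // E # E # E # E
    }
}
```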
The main source of confusion is this often-missed section of the doc:
... If n is zero then ..., and trailing empty strings will be discarded.
Once you get that, everything makes sense.
From the documentation
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
It appears that with a negative limit the pattern is applied as many times as possible, and whatever remains, including trailing empty strings, is kept at the end of the array.
I want to split ",,," into an array of 4 "" using String.split().
Here is my code:
String str = ",,,";
String[] tokens = str.split(",");
However, the resulting tokens were an empty array: [], rather than an array of 4 "" (["","","",""]) as I wanted.
I tested changing str a little:
String str = ",,,1";
String[] tokens = str.split(",");
This time the resulting tokens were ["","","","1"]. This is close to what I want, but I really do not want to add this "1" before doing the split.
The problem is basically that String.split() returns an empty array if the result contains only empty elements "".
Can you help me solve the problem?
You need to use the overloaded String#split(regex, limit) method which takes in the limit parameter.
String[] tokens = str.split(",", -1);
From the docs (emphasis mine):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Explanation: When you do not provide the limit argument or provide zero as the limit, split() discards trailing empty fields. When you provide a positive limit, it restricts the number of fields to that limit. When you provide a negative limit, split() allows any number of fields and does not discard the trailing empty fields. To make this clearer, have a look at the source code of Pattern#split(CharSequence input, int limit), which has this snippet at the end (the comments were added by me and are not present in the actual source code).
if (limit == 0) // when zero or no limit argument is given
    while (resultSize > 0 && matchList.get(resultSize-1).equals("")) // while the trailing entry is empty
        resultSize--; // drop it
Note: If you do not provide a limit argument, the single-argument split() method calls the overloaded split() method like this.
public String[] split(String regex) {
    return split(regex, 0);
}
Also note that String#split(regex, limit) in the general case delegates to Pattern#split(input, limit).
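Since both routes end up with the same splitting rules, calling Pattern#split directly gives the same arrays. A minimal sketch to check that, with a precompiled pattern that can be reused across calls (the class name and the two helper methods are made up for the example):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternSplitDemo {
    // Compiled once and reused for every call.
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] viaString(String s, int limit) {
        return s.split(",", limit);
    }

    static String[] viaPattern(String s, int limit) {
        return COMMA.split(s, limit);
    }

    public static void main(String[] args) {
        System.out.println(viaPattern(",,,", 0).length);  // 0: trailing empties discarded
        System.out.println(viaPattern(",,,", -1).length); // 4: everything kept
        System.out.println(Arrays.equals(viaString(",,,1", 0), viaPattern(",,,1", 0))); // true
    }
}
```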
I have a string like ||81|||01|| and I want to split it on the | symbol.
I did it this way:
String str = "||81|||01||";
System.out.println(str.split("\\|").length); // prints 6, but I am expecting 8
What is wrong with this code? How can I split this string on that character so that I get the expected length (8)?
Using split("\\|") is the same as split("\\|", 0), where the limit parameter 0 tells the function "omit trailing empty strings". So you are missing the last two empty strings. Use the two-argument version and supply a negative number to obtain all parts (even trailing empty ones):
str.split("\\|", -1)
Print:
System.out.println(Arrays.toString(str.split("\\|")));
And you'll understand why it's printing 6.
You can try doing what you want using public String[] split(String regex, int limit):
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array.
So you should do:
System.out.println(str.split("\\|", -1).length);
Now, printing the array will print:
[, , 81, , , 01, , ] as you expected.
You can also use string.split(Pattern.quote("|"),-1) for splitting a string on a special character.
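Pattern.quote is handy whenever the delimiter happens to be a regex metacharacter. The sketch below uses "." as a second example, because an unquoted "." matches every character; the class name and the method splitLiteral are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Treats the delimiter as literal text and keeps all empty fields.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Unquoted ".": every character is a delimiter, so only empty strings
        // remain and all of them are discarded.
        System.out.println("1.2.3".split(".").length);                   // 0
        System.out.println(Arrays.toString(splitLiteral("1.2.3", "."))); // [1, 2, 3]
        System.out.println(splitLiteral("||81|||01||", "|").length);     // 8
    }
}
```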
You need to use:
str.split("\\|", -1)
The second parameter is limit. From the javadoc:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
str.split("\\|", -1) will do what you need.
String str = "||81|||01||";
System.out.println(str.split("\\|", 8).length);
The second argument to split specifies the maximum number of elements in the result (the pattern is applied at most limit - 1 times). The single-argument split is like invoking split(regex, 0), which leaves out trailing empty strings. See the javadoc of both for more explanation.
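A positive limit such as 8 only helps when the number of fields is known in advance. If that is the case, another option is to split with a negative limit and verify the count afterwards. The following is a rough sketch under that assumption; RecordParser and parseRecord are hypothetical names, and the exception type is just one possible choice.

```java
class RecordParser {
    // Splits a pipe-separated line, keeps every empty field, and rejects
    // lines that do not have exactly the expected number of fields.
    static String[] parseRecord(String line, int expectedFields) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != expectedFields) {
            throw new IllegalArgumentException(
                    "expected " + expectedFields + " fields but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("||81|||01||", 8).length); // 8
    }
}
```

With a limit of 8, a line containing more than 8 fields would silently put the surplus into the last element, whereas this variant reports the mismatch.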