Java: split a string containing special characters - java

I have a string like ||81|||01|| and I want to split it on the | symbol.
I did it this way:
String str = "||81|||01||";
System.out.println(str.split("\\|").length); // prints 6, but I am expecting 8
What is wrong with this code? How can I split this string on that character so that I get the expected length (8)?

Using split("\\|") is the same as split("\\|", 0), where the limit parameter 0 tells the function "omit trailing empty strings". So you are missing the last two empty strings. Use the two-argument version and supply a negative number to obtain all parts (even trailing empty ones):
str.split("\\|", -1)
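As a quick check of the difference, the sketch below runs both limits on the string from the question. The class name SplitLimitDemo and the helper countParts are made up for illustration and are not part of the JDK.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitLimitDemo {
    // Returns how many parts split produces on "|" for the given limit.
    static int countParts(String input, int limit) {
        return input.split("\\|", limit).length;
    }

    public static void main(String[] args) {
        String str = "||81|||01||";
        // Limit 0 (same as the one-argument split): trailing empty strings are dropped.
        System.out.println(countParts(str, 0));   // 6
        // Negative limit: every part is kept, including trailing empty strings.
        System.out.println(countParts(str, -1));  // 8
        System.out.println(Arrays.toString(str.split("\\|", -1)));
    }
}
```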

Print the array:
System.out.println(Arrays.toString(str.split("\\|")));
and you'll see why the length is 6.
You can get what you want with public String[] split(String regex, int limit):
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array.
So you should do:
System.out.println(str.split("\\|", -1).length);
Now printing the array returned by str.split("\\|", -1) gives:
[, , 81, , , 01, , ] as you expected.

You can also use string.split(Pattern.quote("|"), -1) for splitting a string on a special character.
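A minimal sketch of that approach, assuming the delimiter should always be treated literally. The class QuoteSplitDemo, the helper splitLiteral, and the second input string are hypothetical and only serve as illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; splitLiteral is an illustrative helper, not a JDK method.
class QuoteSplitDemo {
    // Splits on the delimiter taken literally, keeping trailing empty strings.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // "|" is a regex metacharacter (alternation); quoting avoids manual escaping.
        System.out.println(Arrays.toString(splitLiteral("||81|||01||", "|")));
        // The same helper works for other metacharacters such as ".".
        System.out.println(Arrays.toString(splitLiteral("192.168.0.1", ".")));
    }
}
```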

You need to use:
str.split("\\|", -1)
The second parameter is limit. From the javadoc:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
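To make the positive-limit case from the javadoc concrete, here is a small sketch. The class name PositiveLimitDemo, the helper splitWithLimit, and the input strings are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; the inputs are made up for illustration.
class PositiveLimitDemo {
    static String[] splitWithLimit(String input, int limit) {
        return input.split("\\|", limit);
    }

    public static void main(String[] args) {
        // Limit 2: the pattern is applied once, the rest stays in the last entry.
        System.out.println(Arrays.toString(splitWithLimit("a|b|c|d", 2))); // [a, b|c|d]
        // Only limit 0 discards trailing empty strings; a positive limit keeps them.
        System.out.println(splitWithLimit("a|b||", 10).length); // 4
    }
}
```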

str.split("\\|", -1) will do the necessary.

String str = "||81|||01||";
System.out.println(str.split("\\|", 8).length);
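The second argument to split is a limit on the length of the resulting array: with a positive limit n, the pattern is applied at most n - 1 times and trailing empty strings are kept. The single-argument split behaves like split(regex, 0), which leaves out trailing empty strings. Note that a limit of 8 works here only because this string has exactly 8 parts; with more delimiters, the last entry would contain the unsplit remainder. See the javadoc of both methods for more explanation.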

Related

Weird behavior of Java's String.split() [duplicate]

I am trying to split a value using a separator,
but I am getting surprising results.
String data = "5|6|7||8|9||";
String[] split = data.split("\\|");
System.out.println(split.length);
I am expecting to get 8 values: [5,6,7,EMPTY,8,9,EMPTY,EMPTY]
But I am getting only 6 values.
Any idea how to fix this? No matter where an EMPTY value occurs, it should be in the array.
split(delimiter) by default removes trailing empty strings from the result array. To turn this mechanism off, we need to use the overloaded version split(delimiter, limit) with limit set to a negative value, like
String[] split = data.split("\\|", -1);
A few more details:
split(regex) internally returns the result of split(regex, 0), and in the documentation of this method you can find (emphasis mine):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Exception:
It is worth mentioning that removing trailing empty strings makes sense only if those empty strings were created by the split mechanism. For "".split(anything), since "" cannot be split any further, the result is the array [""].
This happens because no split took place, so "", despite being empty and trailing, represents the original string rather than an empty string created by the splitting process.
From the documentation of String.split(String regex):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So you will have to use the two argument version String.split(String regex, int limit) with a negative value:
String[] split = data.split("\\|",-1);
Doc:
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will not leave out any empty elements, including the trailing ones.
String[] split = data.split("\\|",-1);
This is not always the actual requirement. The drawback of the above is shown below:
Scenario 1: All data present
String data = "5|6|7||8|9|10|";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 7
System.out.println(splt.length); //output: 8
Scenario 2: Data missing
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output: 8
The real requirement is that the length should be 7 even when data is missing, for example when the fields have to be inserted into a database. We can achieve this with the approach below.
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.replaceAll("\\|$","").split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output:7
What I've done here is remove the "|" pipe at the end and then split the string. If you have "," as a separator, then you need to use ",$" inside replaceAll.
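The same idea can be wrapped in a small helper. This is only a sketch: the class TerminatedRecordSplitter and the method splitTerminated are hypothetical names, it assumes a non-empty delimiter, and it swaps the replaceAll call for endsWith plus substring so the delimiter does not have to be escaped by hand.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class TerminatedRecordSplitter {
    // Drops a single trailing delimiter (treated as a record terminator),
    // then splits with a negative limit so empty fields are kept.
    static String[] splitTerminated(String record, String delimiter) {
        String body = record.endsWith(delimiter)
                ? record.substring(0, record.length() - delimiter.length())
                : record;
        return body.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(splitTerminated("5|6|7||8|9|10|", "|").length); // 7
        System.out.println(splitTerminated("5|6|7||8|||", "|").length);    // 7
    }
}
```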
From String.split() API Doc:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
Overloaded String.split(regex, int) is more appropriate for your case.
You may have multiple separators, including whitespace characters, commas, semicolons, etc. Put those in a repeatable group with []+, like:
String[] tokens = "a , b, ,c; ;d, ".split( "[,; \t\n\r]+" );
You'll have 4 tokens: a, b, c, d.
Leading separators in the source string need to be removed before applying this split, otherwise the result starts with an empty string.
As an answer to the question asked:
String data = "5|6|7||8|9||";
String[] split = data.split("[\\| \t\n\r]+");
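Whitespace characters are added in case you have those as separators along with |. Note that because of the + quantifier, consecutive separators are treated as one, so empty fields are dropped rather than preserved.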
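A short sketch of how this separator class behaves on the inputs above. The class SeparatorClassDemo and the helper splitOnRuns are hypothetical names, and the input "|5|6" is made up to show the leading-separator case.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SeparatorClassDemo {
    // Splits on runs of one or more of: pipe, space, tab, newline, carriage return.
    static String[] splitOnRuns(String input) {
        return input.split("[\\| \t\n\r]+");
    }

    public static void main(String[] args) {
        // A run of separators counts as one, so empty fields disappear.
        System.out.println(Arrays.toString(splitOnRuns("5|6|7||8|9||"))); // [5, 6, 7, 8, 9]
        // A leading separator still yields a leading empty string.
        System.out.println(Arrays.toString(splitOnRuns("|5|6")));         // [, 5, 6]
    }
}
```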

Java String split operation

When I split a string in Java as shown in the code below, it gives a length of 3.
String data = "DK4..||VSDK4|";
String[] tokens = data.split("\\|");
System.out.println(tokens.length);
However, it should be 4. What could be the reason behind this?
From the split method documentation:
Trailing empty strings are therefore not included in the resulting array.
This is because split with a single argument calls split with 0 as the second argument, and according to that method's documentation:
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
See the doc of split(String regex):
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
You can use split(String regex, int limit) with a negative limit to get all:
String[] tokens = data.split("\\|", -1);
If n is non-positive then the pattern will be applied as many times as
possible and the array can have any length. If n is zero then the
pattern will be applied as many times as possible, the array can have
any length, and trailing empty strings will be discarded.

Splitting an empty string in Java seems to violate documentation by not discarding trailing empty strings

System.out.println(",".split(",", 0).length);
System.out.println("".split(",", 0).length);
prints:
0
1
This seems odd. According to the documentation for String.split(pattern, n),
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
In the second case, when splitting an empty string, this rule seems to be ignored. Is this expected behavior?
From the docs:
If the expression does not match any part of the input then the
resulting array has just one element, namely this string
So "".split(",", 0).length is equivalent to this:
System.out.println(new String[]{""}.length);
There is no , in the string "", so the array contains a single element, the empty string "", and the array length is 1.
Another example:
System.out.println("aaa".split(",", 0).length); // 1
System.out.println("aaa".split("," , 0)[0]); // aaa

String#split. Side effect of limit parameter

The JDK has the following split overload:
public String[] split(String regex, int limit)
I always assumed that limit is the maximum number of tokens the string will be split into.
For example, first snippet:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA".split(":")));
out:
[Andrew Carnegie, 35, USA]
second snippet:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA".split(":",2)));
out:
[Andrew Carnegie, 35:USA]
But I noticed one more effect:
System.out.println(Arrays.toString("Andrew Carnegie:35:USA:".split(":")));
out:
[Andrew Carnegie, 35, USA]
and
System.out.println(Arrays.toString("Andrew Carnegie:35:USA:".split(":",-1)));
out:
[Andrew Carnegie, 35, USA, ]
So an empty element is added if the string ends with the delimiter.
Where can I find specific information about this effect?
Citing Johannes Weiß:
"When calling String.split(String), it calls String.split(String, 0) and that discards trailing empty strings (as the docs say it), when calling String.split(String, n) with n < 0 it won't discard anything."
From the JavaDoc for split:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Emphasis mine.
So:
if n is strictly positive, then the number of elements in the resulting array is limited by n
if n is zero, then the array can have any length and trailing empty strings are discarded
if n is strictly negative, then the array can have any length and trailing empty strings are not discarded

How to split a comma-delimited string into an array of empty strings [duplicate]

I want to split ",,," into an array of 4 "" using String.split().
Here is my code:
String str = ",,,";
String[] tokens = str.split(",");
However, the resulting tokens were an empty array: [], rather than an array of 4 "" (["","","",""]) as I wanted.
I have tested to change the str a little bit:
String str = ",,,1";
String[] tokens = str.split(",");
This time the result tokens were ["","","","1"]. This is close to what I want, but I really do not want to add this "1" before doing the split.
The problem is basically, the String.split() will return an empty array if it contains only empty elements "".
Can you help solve the problem?
You need to use the overloaded String#split(regex, limit) method which takes in the limit parameter.
String[] tokens = str.split(",", -1);
From the docs(emphasis mine):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Explanation: When you do not provide the limit argument, or provide zero as the limit, split() discards trailing empty fields. When you provide a positive limit argument, it limits the number of fields to that limit. But when you provide a negative limit, split() allows any number of fields and does not discard the trailing empty fields. To make this clearer, have a look at the source code of Pattern#split(input, limit), which has this snippet at the end (the comments were added by me and are not present in the actual source code).
if (limit == 0) // When zero or no arg is given
    while (resultSize > 0 && matchList.get(resultSize-1).equals("")) // if trailing entries are blank
        resultSize--; // drop them
Note: If you do not provide any limit argument, the split() method without limit argument calls the overloaded split() method like this.
public String[] split(String regex) {
return split(regex, 0);
}
Also note that, in the general case, String#split(regex, limit) internally delegates to Pattern#split(input, limit).
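Since the work ends up in Pattern anyway, one option is to compile the pattern once and call Pattern.split directly. The sketch below assumes a comma delimiter; the class PrecompiledSplitDemo and the helper splitKeepingEmpty are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PrecompiledSplitDemo {
    // Compiled once and reused across calls.
    static final Pattern COMMA = Pattern.compile(",");

    // Splits with a negative limit so trailing empty strings are kept.
    static String[] splitKeepingEmpty(String input) {
        return COMMA.split(input, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingEmpty(",,,"))); // [, , , ]
        // The one-argument Pattern.split uses limit 0, like String.split(regex).
        System.out.println(COMMA.split(",,,").length);                 // 0
    }
}
```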
