I don't see why the following output makes sense.
The String split method on an empty String returns an array of String with length 1:
String[] split = "".split(",");
System.out.println(split.length);
Returns array of String with length 1
String[] split = "Java".split(",");
System.out.println(split.length);
Returns array of String with length 1
How can I differentiate between the two cases?
From the documentation:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.
To answer your question, it does what it is expected to do: the returned substring is terminated by the end of the input string (as there was no , to be found). The documentation also states:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
Note that this is a consequence of the first statement. It is not an additional circumstance that the Java developers added in case the search string could not be found.
I hit this, too. split returns the text up to, but not including, the delimiter, so an input with no delimiter comes back whole, even when it is empty. If you want an empty input to yield no tokens, use StringTokenizer:
StringTokenizer st = new StringTokenizer(someString, ",");
int numberOfSubstrings = st.countTokens();
It's returning the original string (which in this case is the empty string) since there was no , to split on.
It returns one because you are measuring the size of the split array, which contains one element: an empty string.
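To tell the two cases apart, one option is to check for the empty string before splitting. The following is only a minimal sketch: the class `SplitFields` and the helper `fields` are hypothetical names, not part of any library.

```java
import java.util.Arrays;

public class SplitFields {
    // Hypothetical helper: returns no fields for an empty input,
    // otherwise splits on commas and keeps every field (limit -1).
    static String[] fields(String input) {
        if (input.isEmpty()) {
            return new String[0];
        }
        return input.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println("".split(",").length);     // 1, the array is [""]
        System.out.println("Java".split(",").length); // 1, the array is ["Java"]
        System.out.println(fields("").length);        // 0
        System.out.println(fields("Java").length);    // 1
        System.out.println(Arrays.toString(fields("a,,b"))); // [a, , b]
    }
}
```

Whether an empty input should mean "no fields" or "one empty field" depends on the data, so the check is kept outside split itself.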
I am trying to split a value using a separator,
but I am getting surprising results:
String data = "5|6|7||8|9||";
String[] split = data.split("\\|");
System.out.println(split.length);
I am expecting to get 8 values: [5,6,7,EMPTY,8,9,EMPTY,EMPTY]
But I am getting only 6 values.
Any idea why, and how to fix it? No matter where an EMPTY value occurs, it should be in the array.
split(delimiter) by default removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, like
String[] split = data.split("\\|", -1);
Little more details:
split(regex) internally returns result of split(regex, 0) and in documentation of this method you can find (emphasis mine)
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Exception:
It is worth mentioning that removing trailing empty strings only applies when those empty strings were created by the split mechanism. So for "".split(anything), since "" cannot be split any further, the result is the array [""].
This happens because no split took place, so the "", despite being empty and trailing, represents the original string, not an empty string created by the splitting process.
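The effect of the three kinds of limit can be compared side by side. This is an illustrative sketch only; `SplitLimitDemo` and `splitWithLimit` are made-up names wrapping the standard String.split(String, int).

```java
public class SplitLimitDemo {
    // Thin wrapper so the limit can be varied easily.
    static String[] splitWithLimit(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "5|6|7||8|9||";

        // limit 0: trailing empty strings are discarded
        System.out.println(splitWithLimit(data, 0).length);  // 6

        // negative limit: every field is kept, including trailing empties
        System.out.println(splitWithLimit(data, -1).length); // 8

        // positive limit: at most 3 entries, the last holds the rest
        String[] limited = splitWithLimit(data, 3);
        System.out.println(limited.length);                  // 3
        System.out.println(limited[2]);                      // 7||8|9||

        // empty input: nothing to split, so the original string is returned
        System.out.println(splitWithLimit("", 0).length);    // 1
    }
}
```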
From the documentation of String.split(String regex):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So you will have to use the two argument version String.split(String regex, int limit) with a negative value:
String[] split = data.split("\\|",-1);
Doc:
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will not leave out any empty elements, including the trailing ones.
String[] split = data.split("\\|",-1);
This is not always the actual requirement. The drawback of the above is shown below.
Scenario 1: All data present
String data = "5|6|7||8|9|10|";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 7
System.out.println(splt.length); //output: 8
Scenario 2: Data missing
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output: 8
The real requirement is that the length should be 7 even though data is missing, for example when the fields have to be inserted into a database. We can achieve this with the approach below.
String data = "5|6|7||8|||";
String[] split = data.split("\\|");
String[] splt = data.replaceAll("\\|$","").split("\\|",-1);
System.out.println(split.length); //output: 5
System.out.println(splt.length); //output: 7
What I've done here is remove the "|" pipe at the end and then split the String. If you have "," as the separator, use ",$" inside replaceAll instead.
From String.split() API Doc:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
Overloaded String.split(regex, int) is more appropriate for your case.
You may have multiple separators, including whitespace characters, commas, semicolons, etc. Put them in a character class and repeat it with []+, like:
String[] tokens = "a , b, ,c; ;d, ".split( "[,; \t\n\r]+" );
You'll have 4 tokens: a, b, c, d.
Leading separators in the source string need to be removed before applying this split.
As an answer to the question asked:
String data = "5|6|7||8|9||";
String[] split = data.split("[\\| \t\n\r]+");
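Whitespace characters are included in case they appear as separators along with |. Note that the + treats a run of consecutive separators as one, so empty fields are dropped rather than kept.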
First things first, here is my code:
String line = "Events|1005435529|7021370073||PAGELOAD|2017-06-19T12:04:40||JI||ServerHostName|ServerIPAddress|9P2_D2jB9Toct7PDTJ7zwLUmWfEYz6Y4akyOKn2g4CepveMH4wr3!46548593!1497854077121|||||||||||";
int offset = line.indexOf("Events");
String zeroIn = line.substring(offset);
String[] jsonElements = zeroIn.split("\\|");
System.out.println(Arrays.asList(jsonElements));
Output:
[Events, 1005435529, 7021370073, , PAGELOAD, 2017-06-19T12:04:40, , JI, , ServerHostName, ServerIPAddress, 9P2_D2jB9Toct7PDTJ7zwLUmWfEYz6Y4akyOKn2g4CepveMH4wr3!46548593!1497854077121]
I also notice a space added at the beginning of each array element.
My question: I have almost 10 empty pipe symbols at the end of the String line. The first, second and third occurrences of empty pipe symbols are respected, but the last ones are missing and don't show up in the array. What am I missing here?
split(java.lang.String regex) calls split(java.lang.String regex ,int limit) with an argument of 0.
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
You may call this method yourself with a positive value (large enough to be sure to include all tokens) to prevent empty tokens from being discarded:
String[] jsonElements = zeroIn.split("\\|", zeroIn.length());
Note: as pointed out in the comments below, using a negative value is indeed a better way to do this:
String[] jsonElements = zeroIn.split("\\|", -1);
If n is non-positive then the pattern will be applied as many times as
possible and the array can have any length.
From String class and split method doc:
Trailing empty strings are therefore not included in the resulting array.
So everything after the last non-empty string is left out of the array.
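A shortened version of the input makes both effects easy to see. This is a sketch with a made-up line and hypothetical names (`TrailingPipesDemo`, `splitKeepingAll`). It also suggests that the spaces noticed in the question come from how the list is printed (elements are joined with ", "), not from split.

```java
import java.util.Arrays;

public class TrailingPipesDemo {
    // Hypothetical helper: keeps every field, including trailing empty ones.
    static String[] splitKeepingAll(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String line = "Events|1005||PAGELOAD|||"; // shortened sample input

        String[] dropped = line.split("\\|");     // same as limit 0
        String[] kept = splitKeepingAll(line);

        System.out.println(dropped.length); // 4
        System.out.println(kept.length);    // 7

        // The list's toString() joins elements with ", ".
        System.out.println(Arrays.asList(dropped)); // [Events, 1005, , PAGELOAD]
        // The element itself has no leading space.
        System.out.println("[" + dropped[1] + "]"); // [1005]
    }
}
```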
The accepted answer explains the limitations you observed when splitting on a single-character delimiter. I thought I would offer this answer in case you need to retain empty tokens in your output. If you split using a lookaround, e.g. a lookbehind, then you end up with distinct entries even when two pipes have nothing in between them:
String line = "Events|1005435529|7021370073||PAGELOAD|2017-06-19T12:04:40||JI||ServerHostName|ServerIPAddress|9P2_D2jB9Toct7PDTJ7zwLUmWfEYz6Y4akyOKn2g4CepveMH4wr3!46548593!1497854077121|||||||||||";
String[] parts = line.split("(?<=\\|)");
for (String part : parts) {
System.out.println(part);
}
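With a lookbehind split, each part keeps the pipe that ends it, so the pipe has to be stripped to recover the field value. A small sketch on a short input; `LookbehindSplitDemo`, `splitAfterPipes` and `stripPipe` are illustrative names only.

```java
import java.util.Arrays;

public class LookbehindSplitDemo {
    // Splits at every position that follows a pipe; the pipe stays in the part.
    static String[] splitAfterPipes(String line) {
        return line.split("(?<=\\|)");
    }

    // Hypothetical helper: removes the trailing pipe to get the field value.
    static String stripPipe(String part) {
        return part.endsWith("|") ? part.substring(0, part.length() - 1) : part;
    }

    public static void main(String[] args) {
        String[] parts = splitAfterPipes("a||b|");
        System.out.println(Arrays.toString(parts)); // [a|, |, b|]
        for (String part : parts) {
            // prints [a], then [], then [b]
            System.out.println("[" + stripPipe(part) + "]");
        }
    }
}
```

Note that the field after the final pipe is not produced this way, so for "a||b|" there are three parts, whereas split("\\|", -1) would give four.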
System.out.println(",".split(",", 0).length);
System.out.println("".split(",", 0).length);
prints:
0
1
This seems odd. According to the documentation for String.split(pattern, n),
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
In the second case, when splitting an empty string, this rule seems to be ignored. Is this expected behavior?
From the docs:
If the expression does not match any part of the input then the
resulting array has just one element, namely this string
So "".split(",", 0).length is equivalent to this:
System.out.println(new String[]{""}.length);
There is no , in the string "", so the array contains a single element, the empty string "", and the array length is 1.
Another example:
System.out.println("aaa".split(",", 0).length); // 1
System.out.println("aaa".split("," , 0)[0]); // aaa
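Comparing limit 0 with limit -1 shows where the difference comes from. In the sketch below, `EmptySplitDemo` and `count` are made-up names. For "," the delimiter matches and produces two empty strings, which limit 0 then discards. For "" nothing matches, so the original string is returned untouched.

```java
public class EmptySplitDemo {
    // Number of elements produced when splitting s on commas with the given limit.
    static int count(String s, int limit) {
        return s.split(",", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(count(",", -1)); // 2: ["", ""], both created by the split
        System.out.println(count(",", 0));  // 0: both trailing empties discarded
        System.out.println(count("", -1));  // 1: no match, original string returned
        System.out.println(count("", 0));   // 1: same, nothing was split off
    }
}
```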
I am trying to split a string that has two numbers and possibly a letter that will look similar to:
(2,3) (2,6) p (8,5) p (5,6)
I am trying:
String[] inputTokens = input.split("[(),\\s]");
but that leaves me with a bunch of empty strings in the tokens array. How do I stop them from appearing in the first place?
For clarification: By empty string I mean a string containing nothing, not even a space
Add the "one or more times" greediness quantifier to your character class:
String[] inputTokens = input.split("[(),\\s]+");
This will result in one leading empty String, which is unavoidable when using the split() method and the delimiter matches at the very start of the String, and otherwise no empty Strings.
String inputTokens[] = input.split("[(),\\s]+");
The + makes a run of consecutive delimiter characters, whitespace included, count as a single separator, so there will be no empty entries between tokens in your array.
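If the leading empty String is unwanted, one possible approach is to strip leading delimiters before splitting. This is a sketch under that assumption; `TokenizeDemo`, `DELIMS` and `tokens` are hypothetical names.

```java
import java.util.Arrays;

public class TokenizeDemo {
    // One or more of: parentheses, comma, whitespace.
    static final String DELIMS = "[(),\\s]+";

    // Hypothetical helper: removes leading delimiters, then splits.
    static String[] tokens(String input) {
        String stripped = input.replaceFirst("^" + DELIMS, "");
        if (stripped.isEmpty()) {
            return new String[0]; // input was empty or delimiters only
        }
        return stripped.split(DELIMS);
    }

    public static void main(String[] args) {
        String input = "(2,3) (2,6) p (8,5) p (5,6)";
        System.out.println(input.split(DELIMS).length);      // 11, first entry is ""
        System.out.println(tokens(input).length);            // 10
        System.out.println(Arrays.toString(tokens(input)));  // [2, 3, 2, 6, p, 8, 5, p, 5, 6]
    }
}
```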
I've written some code which utilizes the split() method to return the first item delimited by periods. After a little testing I found that the array I split the string into has a length of 0 so I assume it's not splitting at all. It may be relevant that in some cases there is no period and I want the entire string returned. To compensate for this, I added a period onto the end of each String. See below:
longText=longText+".";
String tempName[]=longText.split(".");
String realName=tempName[0];
System.out.println(realName);
return realName;
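The method String#split takes a regular expression as its argument (see the Javadoc). In a regular expression an unescaped . matches any character, so every character is treated as a delimiter, all the resulting substrings are empty, and after trailing empty strings are discarded the array has length 0. To split on a literal period, escape it, as in longText.split("\\.").
The related question "Split String on dot . as delimiter" covers this in more detail.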
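A minimal sketch of getting the first period-delimited item, using Pattern.quote so the dot is treated literally. The class `FirstSegment` and method `firstSegment` are illustrative names. A limit of 2 is used here because only the first piece is needed, and with a positive limit the array always has at least one element, so no period has to be appended to the input.

```java
import java.util.regex.Pattern;

public class FirstSegment {
    // Returns the text before the first literal period,
    // or the whole string if it contains no period.
    static String firstSegment(String text) {
        return text.split(Pattern.quote("."), 2)[0];
    }

    public static void main(String[] args) {
        System.out.println("a.b".split(".").length);    // 0: "." matches any character
        System.out.println("a.b".split("\\.").length);  // 2: split on the literal dot
        System.out.println(firstSegment("John.Smith.Jr")); // John
        System.out.println(firstSegment("NoPeriodHere"));  // NoPeriodHere
    }
}
```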