How to split Strings in Java including blanks (like in Python) - java

I'm reading a comma-delimited list into Java where the elements may include blanks and single spaces. Here's a few sample lines:
 ,achieve,achievement,achievable,,, (note the space before the first comma)
agree,agreement,, ,agreeable,agreeably (note the space between commas)
,apartment,, (no spaces)
In Java, the resulting String[] from using line.split(",") changes all blank elements to spaces except trailing ones, which it omits, like this:
" ", "achieve", "achievement", "achievable"
"agree", "agreement", " ", " ", "agreeable", "agreeably"
" ", "apartment"
I need all blank elements to be rendered as empty strings and single space elements to be rendered as single spaces, like this:
" ", "achieve", "achievement", "achievable", "", "", ""
"agree", "agreement", "", " ", "agreeable", "agreeably"
"", "apartment", "", ""
How to do this in Java?

To keep trailing empty elements, use split(delimiter, limit) with a negative limit value, for example
split(",", -1).
DEMO:
String[] tests = {
" ,achieve,achievement,achievable,,,",
"agree,agreement,, ,agreeable,agreeably",
",apartment,,"
};
for (String line : tests){
String[] elements = line.split(",", -1);
StringJoiner sj = new StringJoiner( "\", \"", "\"", "\"");
//delimiter, prefix, suffix
for (String element : elements){
sj.add(element);
}
System.out.println(sj);
}
Output:
" ", "achieve", "achievement", "achievable", "", "", ""
"agree", "agreement", "", " ", "agreeable", "agreeably"
"", "apartment", "", ""

If you want to split on commas AND any surrounding whitespace, you can use this (note \\s*, not \\s+, so that commas without adjacent whitespace also match):
str.trim().split("\\s*,\\s*")

Here's a simple test program which I think illustrates what you are looking for:
public class s1 {
public static void main( String[] args ) {
// String si = " ,achieve,achievement,achievable,,,";
// String si = "agree,agreement,, ,agreeable,agreeably";
String si = ",apartment,,";
String[] so = si.split(" *, *", -1); /* split on comma and any space(s) next to it */
for (String s : so) {
System.out.println('"' + s + '"');
}
}
}

If you want to replicate the behaviour of Python's str.split() called without arguments, trim the string and then use the overload that accepts a regular expression to split on runs of whitespace, like this:
str.trim().split("\\s+")
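One difference is worth a small sketch: for an empty or all-whitespace input, Python's str.split() returns an empty list, while "".split("\\s+") in Java returns an array containing one empty string. The class and method names below are made up for illustration.

```java
import java.util.Arrays;

class PythonLikeSplit {
    // Hypothetical helper: mimics Python's argument-less str.split().
    static String[] splitOnWhitespace(String text) {
        String trimmed = text.trim();
        // "".split("\\s+") would return [""], whereas Python returns [].
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("  a  b\tc "))); // [a, b, c]
        System.out.println(splitOnWhitespace("   ").length);                  // 0
    }
}
```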

line.split(",") works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
If you instead use public String[] split(String regex, int limit) and call it with line.split(",", <any negative int>), the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings are kept.
So you can call it like line.split(",", -9).
The following is what happens with different limit values:
limit = 0 : the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
limit > 0 : the pattern will be applied at most limit - 1 times, the array's length will be no greater than limit, and the array's last entry will contain all input beyond the last matched delimiter.
limit < 0 : the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings are kept.
Check the doc for more clarification.
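As a quick illustration of the three cases, here is a minimal sketch; the class name, the describe helper and the sample line are assumptions chosen for the demo.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits on commas with the given limit and
    // renders the resulting array as text.
    static String describe(String line, int limit) {
        return Arrays.toString(line.split(",", limit));
    }

    public static void main(String[] args) {
        String line = "a,b,,c,,";
        System.out.println(describe(line, 0));  // [a, b, , c]
        System.out.println(describe(line, 3));  // [a, b, ,c,,]
        System.out.println(describe(line, -1)); // [a, b, , c, , ]
    }
}
```

With limit 3 the last element keeps the unsplit remainder ",c,,", and only the negative limit preserves the two trailing empty strings.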

Related

Java, splitting string into array

I am trying to split a string into a string array, and I have stumbled on something that seems strange to me. I don't understand why it works like this.
String one, two;
one = "";
two = ":";
String[] devided1 = one.trim().split(":");
String[] devided2 = two.trim().split(":");
System.out.println("size: "+ devided1.length);
System.out.println("size: "+ devided2.length);
I get output:
size: 1
size: 0
Why does the empty string give me a size of one, while a string that contains only the delimiter gives me an array of size 0?
I saw more confusing things, like the size for "::" being 0 but the size for ": :" being 2, not 3.
Can someone please explain it to me?
See the doc comment in source code or documentation for public String[] split(String regex, int limit) method.
Case 1:
String one = "";
String[] devided1 = one.trim().split(":");
The resulting array has one element, the original string: String[1] [""], because the expression ":" did not match any part of the input string.
According to documentation:
If the
* expression does not match any part of the input then the resulting array
* has just one element, namely this string.
Case 2:
String two = ":";
String[] devided2 = two.trim().split(":");
split(":") uses the default limit of 0, which means that trailing empty strings are removed from the resulting array. The method first splits ":" into two empty strings and then removes both of them, so we get an empty array.
According to documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
Case 3:
String two = ":";
String[] devided2 = two.trim().split(":", -1);
We will get an array with two empty strings.
According to documentation:
If limit is non-positive then the pattern will be applied as many
times as possible and the array can have any length
Case 4:
String two = "::";
String[] devided2 = two.trim().split(":");
We will get an empty array, for the same reason as in Case 2.
Case 5:
String one = ": :";
String[] devided1 = one.trim().split(":");
The method splits the string into three elements ["", " ", ""] and then removes the trailing empty string, because limit = 0. We get String[2] ["", " "].
According to documentation:
If limit is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
This link is helpful:
https://konigsberg.blogspot.com/2009/11/final-thoughts-java-puzzler-splitting.html
Basically, it is for Perl compatibility.
You can use split(":", -1) here if you don't want that behavior.
Otherwise, split(":") defaults to split(":", 0), and the difference is:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#split(java.lang.String,int)
If the limit is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
If the limit is negative then the pattern will be applied as many times as possible and the array can have any length.
Splitting ":" would produce {"", ""}, but the trailing empty strings are discarded, so an empty array is returned.
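The cases discussed above can be checked side by side with a small sketch; the class and helper names are made up for illustration.

```java
class SplitEdgeCases {
    // Element count with the default limit of 0 (trailing empty strings dropped).
    static int countDefault(String input) {
        return input.split(":").length;
    }

    // Element count with a negative limit (trailing empty strings kept).
    static int countKeepAll(String input) {
        return input.split(":", -1).length;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"", ":", "::", ": :"}) {
            System.out.printf("\"%s\" -> limit 0: %d, limit -1: %d%n",
                    input, countDefault(input), countKeepAll(input));
        }
    }
}
```

The empty string yields one element under both limits because the pattern never matches; ":" yields 0 versus 2, "::" yields 0 versus 3, and ": :" yields 2 versus 3.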

How do I escape parentheses in java 7?

I'm trying to split some input from BufferedReader.readLine()
String delimiters = " ,()";
String[] s = in.readLine().split(delimiters);
This gives me a runtime error.
Things I have tried that don't work:
String delimiters = " ,\\(\\)";
String delimiters = " ,[()]";
String[] s = in.readLine().split(Pattern.quote("() ,"));
I tried replacing the () using .replaceAll, didn't work
I tried this:
input = input.replaceAll(Pattern.quote("("), " ");
input = input.replaceAll(Pattern.quote(")"), " ");
input = input.replaceAll(Pattern.quote(","), " ");
String[] s = input.split(" ");
but s[] ends up with blank slots that look like this -> "" and I have no clue why it's doing that
Mine works, with
String delimiters = "[ ,\\(\\)]";
Edit:
You forgot the square brackets, which mean "any of the characters inside the brackets is used as a delimiter"; it's a regex character class.
Edit:
To remove the empty elements: the idea is to replace every run of consecutive delimiters with just one delimiter.
Like this:
// collapse each run of delimiter characters (space, comma, parentheses) into a single "("
String r = "[ ,()]+";
input = input.replaceAll(r, "(");
Then the input contains only single delimiters. Splitting it on "\\(" produces no blank words between tokens; only a delimiter at the very start of the input still yields one empty first element.
From your comment:
but I am only input 1 line: (1,3), (6,5), (2,3), (9,1) and I need 13652391 so s[0] = 1, s[1]=3, ... but I get s[0] = "" s[1] = "" s[2] = 1
You get that because your delimiters are " ", ",", "(" and ")", so the string is split at every single delimiter, even if there are no other characters between two of them, in which case the piece between them is an empty string.
There is an easy fix to this problem, just remove the empty elements!
List<String> list = Arrays.stream(
"(1,3), (6,5), (2,3), (9,1)".split("[(), ]")).filter(x -> !x.isEmpty())
.collect(Collectors.toList());
But then you get a List as the result instead of an array.
Another way to do this, is to replace "[(), ]" with "":
String result = "(1,3), (6,5), (2,3), (9,1)".replaceAll("[(), ]", "");
This will give you a string as a result. But from the comment I'm not sure whether you wanted a string or not. If you want an array, just call .split("") and it will be split into individual characters (on Java 8 and later; Java 7 additionally returns an empty first element).
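If an array is wanted directly, one possible sketch is to split on runs of delimiters and filter out the single empty element that a leading delimiter leaves behind. It needs Java 8 or later for streams; the class and method names are hypothetical.

```java
import java.util.Arrays;

class DelimiterRunSplit {
    // Hypothetical helper: splits on runs of spaces, commas and parentheses,
    // then drops the empty element caused by a delimiter at the start.
    static String[] tokens(String input) {
        return Arrays.stream(input.split("[ ,()]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [1, 3, 6, 5, 2, 3, 9, 1]
        System.out.println(Arrays.toString(tokens("(1,3), (6,5), (2,3), (9,1)")));
    }
}
```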

Regex does not store the element in the first index

I have a function which takes a String containing a math expression such as 6+9*8 or 4+9 and it evaluates them from left to right (without normal order of operation rules).
I've been stuck with this problem for the past couple of hours and have finally found the culprit, BUT I have no idea why it is doing what it does. When I split the string through regex (.split("\\d") and .split("\\D")), I put the results into 2 arrays: an int[] that contains the numbers involved in the expression and a String[] that contains the operations.
What I've realized is that when I do the following:
String question = "5+9*8";
String[] mathOperations = question.split("\\d");
for(int i = 0; i < mathOperations.length; i++) {
System.out.println("Math Operation at " + i + " is " + mathOperations[i]);
}
it does not put the first operation sign in index 0, rather it puts it in index 1. Why is this?
This is the system.out on the console:
Math Operation at 0 is
Math Operation at 1 is +
Math Operation at 2 is *
Because on position 0 of mathOperations there's an empty String. In other words
mathOperations = {"", "+", "*"};
According to split documentation
The array returned by this method contains each substring of this
string that is terminated by another substring that matches the given
expression or is terminated by the end of the string. ...
Why isn't there an empty string at the end of the array too?
Trailing empty strings are therefore not included in the resulting
array.
More detailed explanation - your regex matched the String like this:
"(5)+(9)*(8)" -> "" + (5) + "+" + (9) + "*" + (8) + ""
but the trailing empty string is discarded as specified by the documentation.
(hope this silly illustration helps)
Also worth noting: the regex you used, "\\d", would split the string "55+5" into
["", "", "+"]
That's because it matches only a single digit at a time; you should probably use "\\d+".
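A short sketch of the difference, plus one way to drop the empty first element; the class name and the operators helper are assumptions for illustration, and the stream calls need Java 8 or later.

```java
import java.util.Arrays;

class DigitSplitDemo {
    // Hypothetical helper: splits on whole numbers, then drops the empty
    // leading element produced because the expression starts with a number.
    static String[] operators(String expression) {
        return Arrays.stream(expression.split("\\d+"))
                .filter(piece -> !piece.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("55+5".split("\\d")));  // [, , +]
        System.out.println(Arrays.toString("55+5".split("\\d+"))); // [, +]
        System.out.println(Arrays.toString(operators("55+9*8")));  // [+, *]
    }
}
```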
You may find the following variation on your program helpful, as one split does the jobs of both of yours...
public class zw {
public static void main(String[] args) {
String question = "85+9*8-900+77";
String[] bits = question.split("\\b");
for (int i = 0; i < bits.length; ++i) System.out.println("[" + bits[i] + "]");
}
}
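and its output (as produced on Java 7; from Java 8 on, a zero-width match at the start of the input no longer yields the empty first element):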
[]
[85]
[+]
[9]
[*]
[8]
[-]
[900]
[+]
[77]
In this program, I used \b as a "zero-width boundary" to do the splitting. No characters were harmed during the split, they all went into the array.
More info here: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
and here: http://www.regular-expressions.info/wordboundaries.html
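An alternative that sidesteps empty elements altogether is to collect the matches instead of splitting around them. The sketch below is not from the answers above: the class, the method names and the supported operator set (+, -, *, /) are assumptions, and it expects well-formed input with non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokens {
    // A token is either a run of digits or one non-digit, non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[^\\d\\s]");

    // Collects the matches themselves, so no empty strings can appear.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(expression);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Evaluates strictly left to right, ignoring operator precedence.
    // Assumes well-formed input: number (operator number)*.
    static int evaluateLeftToRight(String expression) {
        List<String> tokens = tokenize(expression);
        int result = Integer.parseInt(tokens.get(0));
        for (int i = 1; i + 1 < tokens.size(); i += 2) {
            int operand = Integer.parseInt(tokens.get(i + 1));
            switch (tokens.get(i)) {
                case "+": result += operand; break;
                case "-": result -= operand; break;
                case "*": result *= operand; break;
                case "/": result /= operand; break;
                default: throw new IllegalArgumentException("Unknown operator: " + tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("85+9*8-900+77"));    // [85, +, 9, *, 8, -, 900, +, 77]
        System.out.println(evaluateLeftToRight("6+9*8")); // 120
    }
}
```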

Java. How to remove white space on array

For example
I split a string "+name" by +. I got a white space " " and the "name" in the array (this doesn't happen if my string is "name+").
t="+name";
String[] temp=t.split("\\+");
the above code produces
temp[0]=" "
temp[1]=name
I only want to get "name" without whitespace.
Also if t="name+" then temp[0]=name. I'm wondering what the difference is between name+ and +name. Why do I get different output?
Simply loop through the items in the array as shown below and trim each one (this removes surrounding whitespace but does not drop empty elements):
for (int i = 0; i < temp.length; i++) {
if (temp[i] != null) {
temp[i] = temp[i].trim();
}
}
The value of the first array item is not a space (" ") but an empty string (""). The following snippet demonstrates the behaviour and provides a workaround: I simply strip leading delimiters from the input. Note that this should never be used for processing CSV files, because there a leading delimiter creates an empty column value, which is usually wanted.
for (String s : "+name".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "name+".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "+name".replaceAll("^\\+", "").split("\\+")) {
System.out.printf("'%s'%n", s);
}
You get the extra element in the "+name" case because there is a non-empty value, "name", after the delimiter.
The split() function only "trims" trailing delimiters that result in empty elements at the end of the array. See the Java SE documentation.
Examples of .split("\\+") output:
"+++++" = { } // zero length array because all are trailing delimiters
"+name+" = { "", "name" } // trailing delimiter removed
"name+++++" = { "name" } // trailing delimiter removed
"name+" = { "name" } // trailing delimiter removed
"++name+" = { "", "", "name" } // trailing delimiter removed
I would suggest preventing those extra delimiters on both ends in the first place rather than cleaning up afterwards.
To remove whitespace, use str.replaceAll("\\s", ""). Note that "\\W" would not do here, because it also removes the + delimiter itself.
String yourString = "name +";
yourString = yourString.replaceAll("\\s", "");
String[] yourArray = yourString.split("\\+");
For a one liner :
String temp[] = t.replaceAll("(^\\++)?(\\+)?(\\+*)?", "$2").split("\\+");
This will replace all multiple plus signs by one, or a plus sign at the start by empty String, and then split on plus signs.
Which will basically eliminate empty Strings in the result.
split(String regex) is equivalent to split(String regex, int limit) with limit = 0. And the documentation of the latter states :
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Which is why a '+' at the start works differently than a '+' at the end
You might want to give Guava's Splitter a try. It has a nice fluent API for dealing with empty strings, trimming, etc.
@Test
public void test() {
final String t1 = "+name";
final String t2 = "name+";
assertThat(split(t1), hasSize(1));
assertThat(split(t1).get(0), is("name"));
assertThat(split(t2), hasSize(1));
assertThat(split(t2).get(0), is("name"));
}
private List<String> split(final String sequence) {
final Splitter splitter = Splitter.on("+").omitEmptyStrings().trimResults();
return Lists.newArrayList(splitter.split(sequence));
}
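If adding Guava is not an option, roughly the same result can be sketched with the plain JDK (Java 8 or later). The class and method names are hypothetical, and this is only an approximation of the Splitter configuration above, not a drop-in replacement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PlainJdkSplitter {
    // Hypothetical helper: splits on "+", trims every piece and
    // omits the pieces that are empty after trimming.
    static List<String> splitOnPlus(String sequence) {
        return Arrays.stream(sequence.split("\\+", -1))
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnPlus("+name"));     // [name]
        System.out.println(splitOnPlus("name+"));     // [name]
        System.out.println(splitOnPlus(" a + + b ")); // [a, b]
    }
}
```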

Java Split not working as expected

I am trying to use a simple split to break up the following string: 00-00000
My expression is: ^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])
And my usage is:
String s = "00-00000";
String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";
String[] parts = s.split(pattern);
If I play around with the Pattern and Matcher classes I can see that my pattern does match and the matcher tells me my groupCount is 7 which is correct. But when I try and split them I have no luck.
String.split does not use capturing groups as its result. It finds whatever matches and uses that as the delimiter. So the resulting String[] are substrings in between what the regex matches. As it is the regex matches the whole string, and with the whole string as a delimiter there is nothing else left so it returns an empty array.
If you want to use regex capturing groups you will have to use Matcher.group(), String.split() will not do.
For your example, you could simply do this:
String s = "00-00000";
String pattern = "-";
String[] parts = s.split(pattern);
I cannot be sure, but I think what you are trying to do is to get each matched group into an array.
Matcher matcher = Pattern.compile(pattern).matcher(s);
if (matcher.matches()) {
String[] groups = new String[matcher.groupCount()];
for (int i = 0; i < matcher.groupCount(); i++) {
// group(0) is the whole match, so the capturing groups start at index 1
groups[i] = matcher.group(i + 1);
}
}
From the documentation:
String[] split(String regex) -- Returns: the array of strings computed by splitting this string around matches of the given regular expression
Essentially the regular expression is used to define delimiters in the input string. You can use capturing groups and backreferences in your pattern (e.g. for lookarounds), but ultimately what matters is what and where the pattern matches, because that defines what goes into the returned array.
If you want to split your original string into 7 parts using regular expression, then you can do something like this:
String s = "12-3456";
String[] parts = s.split("(?!^)");
System.out.println(parts.length); // prints "7"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[1] [2] [-] [3] [4] [5] [6] "
This splits on the zero-length assertion (?!^), which matches everywhere except before the first character of the string. This keeps an empty string from becoming the first element of the array, and the trailing empty string is already discarded because we use the default limit parameter of split.
Using regular expression to get individual character of a string like this is an overkill, though. If you have only a few characters, then the most concise option is to use foreach on the toCharArray():
for (char ch : "12-3456".toCharArray()) {
System.out.print("[" + ch + "] ");
}
This is not the most efficient option if you have a longer string.
Splitting on -
This may also be what you're looking for:
String s = "12-3456";
String[] parts = s.split("-");
System.out.println(parts.length); // prints "2"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[12] [3456] "
