Remove empty Strings after splitting a StringBuilder into Array Java


Sorry if this question has already been asked, but I could only find results for C#.
So I have this StringBuilder:
StringBuilder sb = new StringBuilder(" 111 11 ");
and I want to split it into an array using this method:
String[] ar = sb.toString().split(" ");
As expected, the resulting array has some empty entries. My question is whether I can remove these empty strings directly when I split the StringBuilder, or whether I have to do it afterwards.

split takes a regex. So:
String[] ar = sb.toString().split("\\s+");
The string \\s is regexp-ese for 'any whitespace', and the + is: 1 or more of it. If you want to split on spaces only (and not on newlines, tabs, etc), try: String[] ar = sb.toString().split(" +"); which is literally: "split on one or more spaces".
This trick works for just about any separator. For example, split on commas? Try: .split("\\s*,\\s*"), which is: 0 or more whitespace, a comma, followed by 0 or more whitespace (and regexes take as much as they can).
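Note that this trick does NOT get rid of the empty string caused by leading whitespace: split drops trailing empty strings, but a separator at the very start still yields an empty first element. To avoid that, use trim. Putting it all together: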
String[] ar = sb.toString().trim().split("\\s+");
and for commas:
String[] ar = sb.toString().trim().split("\\s*,\\s*");
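As a minimal sketch of the points above, the following compares a plain split with the trimmed versions. The class and method names are only illustrative and are not part of the original answer.

```java
import java.util.Arrays;

class SplitWhitespaceDemo {
    // Hypothetical helper: trim first, then split on runs of whitespace.
    static String[] splitOnWhitespace(CharSequence text) {
        String trimmed = text.toString().trim();
        // "".split(...) returns one empty element, so return an empty array instead.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Hypothetical helper: trim first, then split on commas with optional surrounding whitespace.
    static String[] splitOnCommas(CharSequence text) {
        return text.toString().trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(" 111 11 ");
        // Without trim(), the leading space produces an empty first element.
        System.out.println(Arrays.toString(sb.toString().split("\\s+")));
        // With trim(), only the two tokens remain.
        System.out.println(Arrays.toString(splitOnWhitespace(sb)));
        System.out.println(Arrays.toString(splitOnCommas(" a , b,c ")));
    }
}
```

Guarding against the empty input is a design choice of this sketch; drop it if a single empty element is acceptable for blank input.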

I would use Guava for this:
String t = " 111 11 ";
Splitter.on(Pattern.compile("\\s+"))
.omitEmptyStrings()
.split(t)
.forEach(System.out::println);

If you do not want to depend on any third-party libraries and would rather filter the result than write a more elaborate regex,
you can do it in one line with the Java 8 Streams API:
Arrays.stream(sb.toString().trim().split(" ")).filter(s -> !s.equals("")).map(s -> s.trim()).toArray(String[]::new);
A detailed multi-line version of the same:
Arrays.stream(sb.toString()
        .trim() // Trim the leading and trailing whitespace from the string
        .split(" ")) // Split on single spaces
    .filter(s -> !s.equals("")) // Keep only the non-empty elements
    .map(s -> s.trim()) // Trim leading and trailing whitespace from each element
    .toArray(String[]::new); // Collect the elements into a String array
Here is the working code for demonstration:
StringBuilder sb = new StringBuilder(" 111 11 ");
String[] array = Arrays.stream(sb.toString().trim().split(" ")).filter(s -> !s.equals("")).map(s -> s.trim()).toArray(String[]::new);
System.out.println("(" + array[0] + ")");
System.out.println("(" + array[1] + ")");

There are a couple of regexes that deal with this. I would also prefer @rzwitserloot's method,
but if you would like to see more,
check this question: How do I split a string with any whitespace chars as delimiters?
glenatron explained it there:
In most regex dialects there are a set of convenient character summaries you can use for this kind of thing - these are good ones to remember:
\w - Matches any word character.
\W - Matches any nonword character.
\s - Matches any white-space character.
\S - Matches anything but white-space characters.
\d - Matches any digit.
\D - Matches anything except digits.
A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.
Thanks to glenatron

You can use a turnkey solution from Apache Commons Lang.
Here is an example:
StringBuilder sb = new StringBuilder(" 111 11 ");
String trimmedString = StringUtils.normalizeSpace(sb.toString());
String[] trimmedAr = trimmedString.split(" ");
System.out.println(Arrays.toString(trimmedAr));
Output: [111, 11].

Related

Java split using regex lookahead - character not followed by character

I need to split a string into substrings in order to sort them into quoted and unquoted ones. The single quote character is used as a separator, and two consecutive single quotes represent an escape sequence, which means they must not be used for splitting.
For example:
"111 '222''22' 3333"
should be split as
"111", "222''22", "3333"
with or without the surrounding whitespace.
So I wrote the following code, but it does not work. I also tried a lookbehind with "\\'(?<!\\')", with no success. Please help.
String rgxSplit = "\\'(?!\\')";
String text = "";
Scanner s = new Scanner(System.in);
System.out.println("\"" + rgxSplit + "\"");
text = s.nextLine();
while (!text.equals(""))
{
    String[] splitted = text.split(rgxSplit);
    for (int i = 0; i < splitted.length; i++)
    {
        if (i % 2 == 0)
        {
            System.out.println("+" + splitted[i]);
        }
        else
        {
            System.out.println("-" + splitted[i]);
        }
    }
    text = s.nextLine();
}
Output:
$ java ParseTest
"\'(?!\')"
111 '222''22' 3333
+111
-222'
+22
- 3333

This should split on a single quote (when it is not doubled), and in the case of three consecutive, it will group the first two and will split on the third.
String [] splitted=text.split("(?<!') *' *(?!')|(?<='') *' *");

To split on single apostrophes use look arounds both sides of the apostrophe:
String[] parts = str.split(" *(?<!')'(?!') *");
See live demo on ideone.
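A small sketch of the look-around split above, applied to the input from the question. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class QuoteSplitDemo {
    // Hypothetical helper: split on an apostrophe that is neither preceded
    // nor followed by another apostrophe, swallowing surrounding spaces.
    static String[] splitOnSingleQuotes(String text) {
        return text.split(" *(?<!')'(?!') *");
    }

    public static void main(String[] args) {
        String input = "111 '222''22' 3333";
        // The doubled apostrophe inside 222''22 is left untouched.
        System.out.println(Arrays.toString(splitOnSingleQuotes(input)));
    }
}
```

Note that an input beginning with a single apostrophe would still produce an empty first element, because the separator then matches at position 0.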

Regex - Replace strings containing only one but repeating char

I want to replaceAll strings like:
"aaaa"
"zzzzzzz"
"----------"
"TTTTTT"
"...."
Each string contains only one distinct character, repeated more than 3 times.
I use Java. I can replace a specific char (like "a") that occurs more than 3 times, but I don't know how to do this for any char:
str = str.replaceAll("^[a]{4,}$", "");
Any idea? If this can't be done in regex, how would you do it?

Any char can be matched with . and the Pattern.DOTALL modifier (written inline as (?s)).
To check that every character is the same, capture the first character and use a backreference to match the same text again, with the limiting quantifier {3,} requiring at least 3 further occurrences (so 4 or more in total).
Demo:
List<String> strs = Arrays.asList("aaaa", "zzzzzzz", "----------", "TTTTTT", "....");
for (String str : strs)
    System.out.println("\"" + str.replaceAll("(?s)^(.)\\1{3,}$", "") + "\"");
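To make the counting explicit, here is a hedged sketch that precompiles the same capture-and-backreference pattern. It uses Matcher.matches() instead of the ^ and $ anchors, which is a substitution of mine rather than the answer's exact code, and the names are illustrative.

```java
import java.util.regex.Pattern;

class RepeatedCharDemo {
    // (.) captures the first character; \1{3,} requires at least three more
    // copies of it, so only runs of four or more identical characters match.
    private static final Pattern SAME_CHAR_RUN = Pattern.compile("(.)\\1{3,}", Pattern.DOTALL);

    // Hypothetical helper: returns "" when the whole string is one character
    // repeated four or more times, otherwise returns the input unchanged.
    static String blankIfRepeated(String s) {
        return SAME_CHAR_RUN.matcher(s).matches() ? "" : s;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaa", "aaaa", "aaab", "...."}) {
            System.out.println("\"" + s + "\" -> \"" + blankIfRepeated(s) + "\"");
        }
    }
}
```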

why split() produces extra , after sets limit -1

I want to split the area code and the rest of the number out of a telephone number, without the brackets, so I did this:
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::-> Length::->0
c[1]::->079 Length::->3
c[2]::->25894029 Length::->8
Expected Output:
c[0]::->079 Length::->3
c[1]::->25894029 Length::->8
So my question is: why does split() produce an extra blank at the start, e.g.
[, 079, 25894029]? Is this its normal behavior, or did I do something wrong here?
How can I get my expected outcome?

First you have unnecessary escaping inside your character class. Your regex is same as:
String pattern = "[(?=)]";
Now, you are getting an empty result because ( is the very first character in the string and split at 0th position will indeed cause an empty string.
To avoid that result, skip the leading bracket before splitting:
String str = "(079)25894029";
String[] toks = (Character.isDigit(str.charAt(0)) ? str : str.substring(1)).split("[(?=)]");
for (String tok : toks)
    System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>

From the Java 8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
You can check whether the first element of the result is an empty string and, if so, drop that element.
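The check described above could look roughly like this. It is only a sketch: the helper name is invented, and the simplified character class [()] is my assumption about what the question intended.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Hypothetical helper: split, then drop the empty first element that a
    // positive-width match at the start of the string produces.
    static String[] splitDroppingLeadingEmpty(String input, String regex) {
        String[] parts = input.split(regex);
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String phone = "(079)25894029";
        // Plain split keeps the empty string before the opening bracket.
        System.out.println(Arrays.toString(phone.split("[()]")));
        // The helper removes it.
        System.out.println(Arrays.toString(splitDroppingLeadingEmpty(phone, "[()]")));
    }
}
```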

Your regex has problems, as does your approach - you can't solve it using your approach with any regex. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
See live demo of this:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]

Extracting numbers into a string array

I have a string which is of the form
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that running the above said method on every character is inefficient. Is there a way I can get a string array of all the numbers?

Use a regex (see Pattern and matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(<your string here>);
while (m.find()) {
//m.group() contains the digits you want
}
You can easily build an ArrayList that contains each matched group you find.
Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in Java, hence the need of \\.
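A minimal sketch of the Pattern/Matcher approach that collects every match into a list. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns every run of digits in the text, in order.
    static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            numbers.add(m.group()); // the digits matched by this find()
        }
        return numbers;
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(extractNumbers(str));
    }
}
```

Unlike split, this never yields empty strings, even when the text starts or ends with non-digits.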

You can use String.split():
String[] nbs = str.split("[^0-9]+");
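This will split the String on any run of non-digit characters. Note that if the string starts with a non-digit, the first element of the array will be an empty string.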

And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ Matches one or more non-digit characters. Splitting the input according to one or more non-digit characters will give you the desired output.

Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
If you only need a String array, you can just use String.split as the other answers suggest.
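One caveat with the stream version: if the text begins with a non-digit, the first token can be an empty string, and Long.parseLong would then throw a NumberFormatException. A hedged sketch that filters such tokens out first (names are illustrative):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitAsStreamDemo {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Hypothetical helper: parse every run of digits in the text as a long.
    static long[] parseNumbers(String text) {
        return NON_DIGITS.splitAsStream(text)
                .filter(token -> !token.isEmpty()) // skip the empty token a leading separator can produce
                .mapToLong(Long::parseLong)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseNumbers("id 42 and 7")));
    }
}
```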

Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non digits
Output
[124333, 9912111242, 12323335465645]

The first thing that came to my mind was filter and split, then I realized that it can be done with:
String[] result = str.split("\\D+");
\D matches any non-digit character, + says that one or more of these are needed, and the leading \ escapes the other \, since "\D" on its own would be parsed as the escape sequence 'backslash D', which is invalid in a Java string literal.

Java parsing a string with lots of whitespace

I have a string with multiple spaces, but when I use the tokenizer it breaks it apart at all of those spaces. I need the tokens to contain those spaces. How can I utilize the StringTokenizer to return the values with the tokens I am splitting on?

You'll note in the docs for the StringTokenizer that it is recommended it shouldn't be used for any new code, and that String.split(regex) is what you want
String foo = "this is some data in a string";
String[] bar = foo.split("\\s+");
Edit to add: Or, if you have greater needs than a simple split, then use the Pattern and Matcher classes for more complex regular expression matching and extracting.
Edit again: If you want to preserve your spaces, knowing a bit about regular expressions really helps:
String[] bar = foo.split("\\b");
This will split on word boundaries, preserving the space between each word as a String:
public static void main( String[] args )
{
    String foo = "this is      some  data      in   a string";
    String[] bar = foo.split("\\b");
    for (String s : bar)
    {
        System.out.print(s);
        if (s.matches("^\\s+$"))
        {
            System.out.println("\t<< " + s.length() + " spaces");
        }
        else
        {
            System.out.println();
        }
    }
}
Output:
this
<< 1 spaces
is
<< 6 spaces
some
<< 2 spaces
data
<< 6 spaces
in
<< 3 spaces
a
<< 1 spaces
string

Sounds like you may need to use regular expressions (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html) instead of StringTokenizer.

Use String.split("\\s+") instead of StringTokenizer.
Note that this will only extract the non-whitespace tokens, which are separated by at least one whitespace character. If you want the leading/trailing whitespace included with those tokens, that requires a completely different solution!
This requirement isn't clear from your original question, and there is an edit pending that tries to clarify it.
StringTokenizer in almost every non-contrived case is the wrong tool for the job.

I think it would be good to first use the replaceAll function to replace every run of multiple spaces with a single space, and then tokenize using the split function.
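That suggestion might be sketched as follows. The names are illustrative, and note that this approach discards the extra spaces rather than preserving them.

```java
import java.util.Arrays;

class CollapseSpacesDemo {
    // Hypothetical helper: collapse whitespace runs to one space, then split.
    static String[] tokenize(String text) {
        String collapsed = text.trim().replaceAll("\\s+", " ");
        // "".split(" ") would return one empty element, so handle blank input explicitly.
        return collapsed.isEmpty() ? new String[0] : collapsed.split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("this is      some  data")));
    }
}
```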
