Java. How to remove white space on array

For example, I split the string "+name" by +. I get a white space " " and "name" in the array (this doesn't happen if my string is "name+").
String t = "+name";
String[] temp = t.split("\\+");
The above code produces
temp[0]=" "
temp[1]=name
I only want to get "name", without the whitespace.
Also, if t="name+" then temp[0]=name. What is the difference between name+ and +name? Why do I get different output?

Simply loop through the items in the array, as below, and trim the white space from each one:
for (int i = 0; i < temp.length; i++) {
    if (temp[i] != null) {
        temp[i] = temp[i].trim();
    }
}

The value of the first array item is not a space (" ") but an empty string (""). The following snippet demonstrates the behaviour and provides a workaround: it simply strips a leading delimiter from the input. Note that this should never be used for processing CSV files, because there a leading delimiter creates an empty column value, which is usually wanted.
for (String s : "+name".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "name+".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "+name".replaceAll("^\\+", "").split("\\+")) {
System.out.printf("'%s'%n", s);
}

You get the extra element in the "+name" case because there is a non-empty value, "name", after the delimiter.
The split() function only "trims" trailing delimiters, that is, those that result in empty elements at the end of the array. See the Java SE documentation.
Examples of .split("\\+") output:
"+++++" = { } // zero length array because all are trailing delimiters
"+name+" = { "", "name" } // trailing delimiter removed
"name+++++" = { "name" } // trailing delimiter removed
"name+" = { "name" } // trailing delimiter removed
"++name+" = { "", "", "name" } // trailing delimiter removed
I would suggest preventing those extra delimiters on both ends in the first place, rather than cleaning up afterwards.
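If the extra delimiters can't be prevented, the empty elements can also be filtered out after splitting. This is only a minimal sketch, assuming Java 8+ streams; the class and method names (SplitDemo, nonEmptyParts) are made up for illustration:

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits on '+' and drops every empty element,
    // whether it comes from a leading, trailing, or doubled delimiter.
    static String[] nonEmptyParts(String input) {
        return Arrays.stream(input.split("\\+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nonEmptyParts("+name")));   // [name]
        System.out.println(Arrays.toString(nonEmptyParts("name+")));   // [name]
        System.out.println(Arrays.toString(nonEmptyParts("++a++b+"))); // [a, b]
    }
}
```

Note that this also removes empty elements in the middle of the input, which may or may not be what you want.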

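To remove white space, use
str.replaceAll("\\s", "");
(\\s matches whitespace; \\W would match every non-word character, including the + delimiter itself.)

String yourString = "name +";
yourString = yourString.replaceAll("\\s", "");
String[] yourArray = yourString.split("\\+");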

For a one-liner:
String temp[] = t.replaceAll("(^\\++)?(\\+)?(\\+*)?", "$2").split("\\+");
This replaces every run of plus signs with a single one, or a run of plus signs at the start with an empty String, and then splits on plus signs, which basically eliminates empty Strings from the result.
split(String regex) is equivalent to split(String regex, int limit) with limit = 0, and the documentation of the latter states:
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Which is why a '+' at the start works differently than a '+' at the end.

You might want to give Guava's Splitter a try. It has a nice fluent API to deal with empty strings, trim(), etc.
@Test
public void test() {
final String t1 = "+name";
final String t2 = "name+";
assertThat(split(t1), hasSize(1));
assertThat(split(t1).get(0), is("name"));
assertThat(split(t2), hasSize(1));
assertThat(split(t2).get(0), is("name"));
}
private List<String> split(final String sequence) {
final Splitter splitter = Splitter.on("+").omitEmptyStrings().trimResults();
return Lists.newArrayList(splitter.split(sequence));
}

Related

Java regex: Replace all characters with `+` except instances of a given string

I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but it's not letting me use the method without putting it in quotation marks. If I just want to use the variable string str, how can I do that? I ran it with the string typed in manually and it worked, but can I just pass in a variable?
As of right now I believe it's looking for the string "str" and not the variable string.
Here is the output; it's right for many cases, except for two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"

Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if it doesn't match, and skip the length of the word if it is found.
public String plusOut(String str, String word) {
StringBuilder out = new StringBuilder(str);
for (int i = 0; i < out.length(); ) {
if (!str.startsWith(word, i))
out.setCharAt(i++, '+');
else
i += word.length();
}
return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
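If astral plane characters do matter, the same loop can advance by code point instead of by char. The following is only a sketch under that assumption, not part of the original exercise; the class name AstralPlusOut is hypothetical, and it emits one + per code point rather than one per char:

```java
class AstralPlusOut {
    // Variant of solution 1 that treats a surrogate pair as a single character.
    static String plusOut(String str, String word) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            // The isEmpty() guard avoids an endless loop for an empty word.
            if (!word.isEmpty() && str.startsWith(word, i)) {
                out.append(word);
                i += word.length();
            } else {
                out.append('+');
                // Skip 1 char for BMP characters, 2 chars for a surrogate pair.
                i += Character.charCount(str.codePointAt(i));
            }
        }
        return out.toString();
    }
}
```

For example, plusOut("a\uD83D\uDE00b", "b") yields "++b", where the original version would produce three + signs for the two leading characters.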
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
return str.replaceAll(java.util.regex.Pattern.quote(word), "#").replaceAll("[^#]", "+").replaceAll("#", word);
}
Not a proper solution since it assumes that a certain character or sequence of character doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
Solution 3: Regex with \G
public String plusOut(String str, String word) {
word = java.util.regex.Pattern.quote(word);
return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts from where the previous match leaves off
((?:word)*+) picks out 0 or more instances of word, if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep every instance of the word it finds. Otherwise, the regex will not work correctly when the word appears at the end of the string, as the regex backtracks to match .
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallows backtracking. We will replace this with +
Solution 4: Streaming
public String plusOut(String str, String word) {
return String.join(word,
Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
.map((String s) -> s.replaceAll("(?s:.)", "+"))
.collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace the rest with strings of +. In Java 11, we can use s -> "+".repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (List in this case) and joining them for the result.
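As a rough sketch of the Java 11 variant mentioned above (the class name PlusOutJava11 is made up, and word is assumed to be non-empty), using Collectors.joining instead of collecting to a List first:

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PlusOutJava11 {
    static String plusOut(String str, String word) {
        // limit -1 keeps trailing empty strings, so a word at the very end survives
        return Arrays.stream(str.split(Pattern.quote(word), -1))
                // String.repeat is an instance method, available since Java 11
                .map(s -> "+".repeat(s.length()))
                .collect(Collectors.joining(word));
    }

    public static void main(String[] args) {
        System.out.println(plusOut("12xy34", "xy"));   // ++xy++
        System.out.println(plusOut("12xy34", "1"));    // 1+++++
        System.out.println(plusOut("aaxxxxbb", "xx")); // ++xxxx++
    }
}
```

Unlike the (?s:.) replacement, s.length() counts UTF-16 chars, so an astral plane character would turn into two + signs here.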

This is a bit trickier than you might initially think, because you don't just need to match characters but the absence of a specific phrase, so a negated character set is not enough. If the string is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
String replacement = m.group(0).replaceAll(".", "+");
m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain characters with a special meaning in a regular expression, you'll have to escape them (for example with Pattern.quote) before concatenating them into the pattern.

You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" on either side of each character and puts them back (a blank if there's no "123").

So instead of coming up with a regular expression that matches the absence of a string, we might as well just match the selected phrase and append one + for each skipped character.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
while (m.find()) {
    // pad with '+' up to the start of this match
    while (sb.length() < m.start()) sb.append('+');
    sb.append(str);
}
// pad the tail after the last match
while (sb.length() < input.length()) sb.append('+');

Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly, it took a lot more than I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
int size = match.length();
CharBuffer cb = CharBuffer.wrap(input.toCharArray());
CharBuffer word = CharBuffer.wrap(match);
int x = 0;
for (; cb.remaining() > 0;) {
if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
cb.put(x, '+');
cb.clear().position(++x);
} else {
cb.clear().position(x = x + size);
}
}
return cb.clear().toString();
}

To make this work you need a beast of a pattern. Let's say you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
public static void plusOut(String text, String exclude) {
StringBuilder pattern = new StringBuilder("");
for (int i=0; i<exclude.length(); i++) {
Character target = exclude.charAt(i);
String prefix = (i > 0) ? exclude.substring(0, i) : "";
String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
// add the look-behind (?<!X)Y
if (!prefix.isEmpty()) {
pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
.append(Pattern.quote(target.toString())).append("|");
}
// add the look-ahead X(?!YZ)
if (!postfix.isEmpty()) {
pattern.append(Pattern.quote(target.toString()))
.append("(?!").append(Pattern.quote(postfix)).append(")|");
}
}
// add in the other character exclusion
pattern.append("[^" + Pattern.quote(exclude) + "]");
System.out.println(text.replaceAll(pattern.toString(), "+"));
}
public static void main(String [] args) {
plusOut("12xy34", "xy");
plusOut("12xy34", "1");
plusOut("12xy34xyabcxy", "xy");
plusOut("abXYabcXYZ", "ab");
plusOut("abXYabcXYZ", "abc");
plusOut("abXYabcXYZ", "XY");
plusOut("abXYxyzXYZ", "XYZ");
plusOut("--++ab", "++");
plusOut("aaxxxxbb", "xx");
plusOut("123123", "3");
}
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.

The problem with your solution is that str.replaceAll("[^str]","+") uses a character class: it excludes every individual character listed inside the brackets, not the string held in the variable str, so it will not solve your problem.
Example: str.replaceAll("[^XYZ]","+") leaves every X, Y and Z untouched, in any combination, so you get "++XY+++XYZ".
What you actually need is to exclude a sequence of characters in str.replaceAll.
You can do that by grouping the characters, like (XYZ), and then using a negative lookahead to match a string that does not contain the sequence: ^((?!XYZ).)*$
Check this solution for more info about this problem, but be aware that it may be complicated to find a regular expression that does this directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for(int i = 0; i < str.length(); i++){
// exclude any instance string of exWord from replacing process in str
if(str.substring(i, str.length()).indexOf(exWord) + i == i){
i = i + exWord.length()-1;
}
else{
str = str.substring(0,i) + "+" + str.substring(i+1);//replace each character with '+' symbol
}
}
Note: the if condition str.substring(i, str.length()).indexOf(exWord) + i == i is true exactly when an instance of exWord starts at position i, so that instance is skipped instead of being replaced.
Output:
+++++++XYZ
Solution 2:
You can try this approach, which uses the replaceAll method and doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note: This solution works only if your input string str doesn't contain any * symbol.
Also, you should escape any character with a special meaning in a regular expression that appears in exWord, for example when exWord = "++".
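One way to sidestep the escaping issue for exWord is String.replace, which treats both of its arguments literally. A small sketch (the class name MarkerPlusOut is hypothetical, and the assumption that str contains no * still applies):

```java
class MarkerPlusOut {
    static String plusOut(String str, String exWord) {
        // Assumption carried over from the note above: str contains no '*'.
        return str.replace(exWord, "*")     // literal replacement, no regex involved
                  .replaceAll("[^*]", "+")  // everything except the marker becomes '+'
                  .replace("*", exWord);    // restore the word, again literally
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYxyzXYZ", "XYZ")); // +++++++XYZ
        System.out.println(plusOut("--++ab", "++"));      // ++++++
    }
}
```

Only the middle step still uses a regular expression, and its pattern does not depend on exWord.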

How to split Strings in Java including blanks (like in Python)

I'm reading a comma-delimited list into Java where the elements may include blanks and single spaces. Here's a few sample lines:
,achieve,achievement,achievable,,, (note the space before the first comma)
agree,agreement,, ,agreeable,agreeably (note the space between commas)
,apartment,, (no spaces)
In Java, the resulting String[] from using line.split(",") changes all blank elements to spaces except trailing ones, which it omits, like this:
" ", "achieve", "achievement", "achievable"
"agree", "agreement", " ", " ", "agreeable", "agreeably"
" ", "apartment"
I need all blank elements to be rendered as empty strings and single space elements to be rendered as single spaces, like this:
" ", "achieve", "achievement", "achievable", "", "", ""
"agree", "agreement", "", " ", "agreeable", "agreeably"
"", "apartment", "", ""
How to do this in Java?

To avoid removing trailing empty elements, use split(delimiter, limit) with a negative limit value, like
split(",", -1).
DEMO:
String[] tests = {
" ,achieve,achievement,achievable,,,",
"agree,agreement,, ,agreeable,agreeably",
",apartment,,"
};
for (String line : tests){
String[] elements = line.split(",", -1);
StringJoiner sj = new StringJoiner( "\", \"", "\"", "\"");
//delimiter, prefix, suffix
for (String element : elements){
sj.add(element);
}
System.out.println(sj);
}
Output:
" ", "achieve", "achievement", "achievable", "", "", ""
"agree", "agreement", "", " ", "agreeable", "agreeably"
"", "apartment", "", ""

If you want to split on commas AND any surrounding whitespace, you can use this:
str.trim().split("\\s*,\\s*")

Here's a simple test program which I think illustrates what you are looking for:
public class s1 {
public static void main( String[] args ) {
// String si = " ,achieve,achievement,achievable,,,";
// String si = "agree,agreement,, ,agreeable,agreeably";
String si = ",apartment,,";
String[] so = si.split(" *, *", -1); /* split on comma and any space(s) next to it */
for (String s : so) {
System.out.println('"' + s + '"');
}
}
}

If you want to replicate the exact behaviour of Python's str.split(), you need to trim for spaces and then use the overload that accepts a regular expression to match on white spaces like this:
str.trim().split("\\s+")

line.split(",") works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Instead if you use public String[] split(String regex, int limit) and call it with line.split(",", <any negative int>) then the pattern will be applied as many times as possible and the array can have any length.
So you can call it like line.split(",", -9).
The following is what happens with different limit values:
limit = 0 : the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
limit > 0 : the pattern will be applied at most limit - 1 times, the array's length will be no greater than limit, and the array's last entry will contain all input beyond the last matched delimiter
limit < 0 : the pattern will be applied as many times as possible and the array can have any length
Check the doc for more clarification.
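To see the three cases side by side, here is a small demo; the class and helper names (SplitLimitDemo, parts) are invented for this sketch:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper so the three limit values can be compared on the same input.
    static String[] parts(String line, int limit) {
        return line.split(",", limit);
    }

    public static void main(String[] args) {
        String line = "a,b,,";
        System.out.println(Arrays.toString(parts(line, 0)));  // [a, b]
        System.out.println(Arrays.toString(parts(line, 2)));  // [a, b,,]
        System.out.println(Arrays.toString(parts(line, -1))); // [a, b, , ]
    }
}
```

With limit = -1 the array has four elements, the last two being empty strings.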

How to use split function when input is new line?

The task is to split the string and report how many words it contains.
Scanner in = new Scanner(System.in);
String st = in.nextLine();
String[] tokens = st.split("[\\W]+");
When I give an empty line as input and print the number of tokens, I get one, but I want zero. What should I do? Here the delimiters are all the symbols.

Short answer: To get the tokens in str (determined by whitespace separators), you can do the following:
String str = ... //some string
str = str.trim() + " "; //modify the string for the reasons described below
String[] tokens = str.split("\\s+");
Longer answer:
First of all, the argument to split() is the delimiter - in this case one or more whitespace characters, which is "\\s+".
If you look carefully at the Javadoc of String#split(String, int) (which is what String#split(String) calls), you will see why it behaves like this.
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
This is why "".split("\\s+") would return an array with one empty string [""], so you need to append the space to avoid this. " ".split("\\s+") returns an empty array with 0 elements, as you want.
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
This is why " a".split("\\s+") would return ["", "a"], so you need to trim() the string first to remove whitespace from the beginning.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Since String#split(String) calls String#split(String, int) with the limit argument of zero, you can add whitespace to the end of the string without changing the number of words (because trailing empty strings will be discarded).
UPDATE:
If the delimiter is "\\W+", it's slightly different because you can't use trim() for that:
String str = ...
str = str.replaceAll("^\\W+", "") + " ";
String[] tokens = str.split("\\W+");
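Put together as a method, the \\W+ variant might look like the sketch below; WordCount and countWords are hypothetical names, and a "word" is assumed to be a run of \\w characters:

```java
class WordCount {
    static int countWords(String line) {
        // Strip leading delimiters (no empty first token), then append a space so
        // that an empty input still contains a delimiter and splits into nothing.
        String prepared = line.replaceAll("^\\W+", "") + " ";
        return prepared.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords(""));              // 0
        System.out.println(countWords("hello, world!")); // 2
        System.out.println(countWords("  hi"));          // 1
    }
}
```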

public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String line = null;
while (!(line = in.nextLine()).isEmpty()) {
//logic
}
System.out.print("Empty Line");
}
output
Empty Line

Remove trailing substring from String in Java

I am looking to remove parts of a string if it ends in a certain string.
An example would be to take this string: "am.sunrise.ios#2x.png"
And remove the #2x.png so it looks like: "am.sunrise.ios"
How would I go about checking to see if the end of a string contains "#2x.png" and remove it?

You could check the lastIndexOf, and if it exists in the string, use substring to remove it:
String str = "am.sunrise.ios#2x.png";
String search = "#2x.png";
int index = str.lastIndexOf(search);
if (index >= 0) {
str = str.substring(0, index);
}

Assuming you have a string initialized as String file = "am.sunrise.ios#2x.png";.
if(file.endsWith("#2x.png"))
file = file.substring(0, file.lastIndexOf("#2x.png"));
The endsWith(String) method returns a boolean determining if the string has a certain suffix. Depending on that you can replace the string with a substring of itself starting with the first character and ending before the index of the character that you are trying to remove.

private static String removeSuffixIfExists(String key, String suffix) {
    return key.endsWith(suffix)
            ? key.substring(0, key.length() - suffix.length())
            : key;
}
// usage
String suffix = "#2x.png";
String key = "am.sunrise.ios#2x.png";
String output = removeSuffixIfExists(key, suffix);

public static void main(String [] args){
String word = "am.sunrise.ios#2x.png";
word = word.replace("#2x.png", "");
System.out.println(word);
}

If you want to generally remove entire content of string from # till end you can use
yourString = yourString.replaceAll("#.*","");
where #.* is regex (regular expression) representing substring starting with # and having any character after it (represented by .) zero or more times (represented by *).
In case there will be no #xxx part your string will be unchanged.
If you want to remove only this particular substring #2x.png (and not a substring like #3x.png) while making sure that it is placed at the end of your string, you can use
yourString = yourString.replaceAll("#2x\\.png$","");
where
$ represents end of string
\\. represents . literal (we need to escape it since like shown earlier . is metacharacter representing any character)
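If the suffix is not known in advance, the escaping can be left to Pattern.quote rather than done by hand. A minimal sketch, with SuffixRegex and removeSuffix as made-up names:

```java
import java.util.regex.Pattern;

class SuffixRegex {
    static String removeSuffix(String value, String suffix) {
        // Pattern.quote makes every character of the suffix literal;
        // the trailing $ anchors the match to the end of the string.
        return value.replaceAll(Pattern.quote(suffix) + "$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeSuffix("am.sunrise.ios#2x.png", "#2x.png")); // am.sunrise.ios
        System.out.println(removeSuffix("am.sunrise.ios#3x.png", "#2x.png")); // am.sunrise.ios#3x.png
    }
}
```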

Since I was trying to do this on an ArrayList of items similarly styled I ended up using the following code:
for (int image = 0; image < IBImages.size(); image++) {
IBImages.set(image, IBImages.get(image).split("~")[0].split("#")[0].split("\\.png")[0]);
}
If I have a list of images with the names
[am.sunrise.ios.png, am.sunrise.ios#2x.png, am.sunrise.ios#3x.png, am.sunrise.ios~ipad.png, am.sunrise.ios~ipad#2x.png]
This allows me to split the string into 2 parts.
For example, "am.sunrise.ios~ipad.png" will be split into "am.sunrise.ios" and "~ipad.png" if I split on "~". I can just get the first part back by referencing [0]. Therefore I get what I'm looking for in one line of code.
Note that image is "am.sunrise.ios~ipad.png"

You could use String.split():
public static void main(String [] args){
String word = "am.sunrise.ios#2x.png";
String[] parts = word.split("#");
if (parts.length == 2) {
System.out.println("looks like user#host...");
System.out.println("User: " + parts[0]);
System.out.println("Host: " + parts[1]);
}
}
Then you have an array of Strings, where the first element contains the part before "#" and the second element the part after the "#".

Combining the answers 1 and 2:
String str = "am.sunrise.ios#2x.png";
String search = "#2x.png";
if (str.endsWith(search)) {
str = str.substring(0, str.lastIndexOf(search));
}

Java String.contains() to take care of natural numbers

I'm a computer science student learning Java, and as an exercise, we're writing a permutation algorithm.
Now, I'm stuck at a point where I need to search for a natural number within a String full of numbers, separated by commas:
String myString = "0,1,2,10,14,";
The problem is I'm using...
myString.contains(String.valueOf(anInteger));
...to check for the presence of a specific number. This works for numbers from 0 to 9, but when looking for a number with more than one digit, the program does not recognize it as a natural number.
In other words, and as an example: "14" is not the integer 14, it's just a string with a "1" and a "4"; so, if I run...
String myString = "0,1,2,10,14,";
if (myString.contains(String.valueOf(4))) { doSomething(); }
...the "if" condition will be true, since the digit "4" is present in the string, as part of the natural number "14".
At this point, I've been searching through StackOverflow and other pages for a solution, and learnt I should use Pattern and Matcher.
My question is: what's the best way to use them?
Relevant part of my code:
for (int i = 0; i<r; i++)
{
if (!act.contains(String.valueOf(i)))
{
...
}
...
}
I use this method several times in my code, so an exact substitution would be nice.
Thank you all in advance!

You only need a method call to matches():
if (myString.matches(".*\\b" + anInteger + "\\b.*"))
// string contains the number
This works by creating a regex that has a word boundary (\b) at either end of the target number. The leading and trailing .* are required because matches() must match the whole string to return true.
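Wrapped in a helper, so that it can stand in for the contains() calls, it could look like this sketch (NumberSearch and containsNumber are invented names; non-negative numbers are assumed):

```java
class NumberSearch {
    static boolean containsNumber(String csv, int number) {
        // \b on both sides prevents 4 from matching inside 14 or 40.
        return csv.matches(".*\\b" + number + "\\b.*");
    }

    public static void main(String[] args) {
        String myString = "0,1,2,10,14,";
        System.out.println(containsNumber(myString, 4));  // false
        System.out.println(containsNumber(myString, 14)); // true
    }
}
```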

Look into how to split a String into an array of String. So:
String[] splitStrings = myString.split(",");
ArrayList<Integer> parsedInts = new ArrayList<Integer>();
for (String str : splitStrings) {
parsedInts.add(Integer.parseInt(str));
}
then in your for loop:
if (parsedInts.contains(i)) {
// body
}
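Since the check runs many times, parsing once into a Set may be worth it. A possible sketch, assuming Java 8+ and well-formed integers between the commas; NumberSet and parse are hypothetical names:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class NumberSet {
    // Parses "0,1,2,10,14," into a set of integers, skipping empty tokens.
    static Set<Integer> parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Integer::valueOf)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> numbers = parse("0,1,2,10,14,");
        System.out.println(numbers.contains(4));  // false
        System.out.println(numbers.contains(14)); // true
    }
}
```

The set has to be rebuilt, or updated alongside the string, whenever the string changes.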

Something like this:
String myString = "0,1,2,10,14,";
String[] split = myString.split(",");
for (String string : split) {
int num = Integer.parseInt(string);
if (num == 4) {
System.out.println(num);
// ...
}
}

String myString = "0,1,2,10,14,2323232";
String[] allList = myString.split(",");
for (String string : allList) {
if (string.matches("[0-9]+"))
{
System.out.println("It's a number with value " + string);
}
}

I think you need to pick all the numbers in the given string and find the permutation.
I think you need to tokenize the given string with the comma separator.
When I write such a program, I separate the logic that parses the String from the rest and put it in another method. Below is the snippet:
String myString = "0,1,2,10,14,";
StringTokenizer st2 = new StringTokenizer(myString , ",");
while (st2.hasMoreElements()) {
doSomething(st2.nextElement());
}
