Java substring: 'string index out of range' - java

I'm guessing I'm getting this error because the string is trying to substring a null value. But wouldn't the ".length() > 0" part eliminate that issue?
Here is the Java snippet:
if (itemdescription.length() > 0) {
pstmt2.setString(3, itemdescription.substring(0,38));
}
else {
pstmt2.setString(3, "_");
}
I got this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: 38
at java.lang.String.substring(Unknown Source)
at MASInsert2.itemimport(MASInsert2.java:192)
at MASInsert2.processRequest(MASInsert2.java:125)
at MASInsert2.doGet(MASInsert2.java:219)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:627)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:835)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
at java.lang.Thread.run(Unknown Source)

It is a pity that substring is not implemented in a way that handles short strings – like in other languages e.g. Python.
OK, we cannot change that and have to consider this edge case every time we use substring. Instead of if-else clauses, I would go for this shorter variant:
myText.substring(0, Math.min(6, myText.length()))

I'm guessing I'm getting this error
because the string is trying to
substring a null value. But wouldn't
the ".length() > 0" part eliminate
that issue?
No, calling itemdescription.length() when itemdescription is null would not generate a StringIndexOutOfBoundsException, but rather a NullPointerException since you would essentially be trying to call a method on null.
As others have indicated, StringIndexOutOfBoundsException indicates that itemdescription is not at least 38 characters long. You probably want to handle both conditions (I'm assuming you want to truncate):
final String value;
if (itemdescription == null || itemdescription.length() <= 0) {
value = "_";
} else if (itemdescription.length() <= 38) {
value = itemdescription;
} else {
value = itemdescription.substring(0, 38);
}
pstmt2.setString(3, value);
Might be a good place for a utility function if you do that a lot...
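A minimal sketch of such a utility function, assuming you want the "_" placeholder for null or empty input and truncation otherwise. The class name Descriptions and the method name truncateOrDefault are hypothetical, not part of the original code, and maxLen is assumed to be non-negative.

```java
// Hypothetical helper class; the names are illustrative only.
class Descriptions {

    /**
     * Returns fallback when s is null or empty; otherwise returns s
     * cut down to at most maxLen characters (maxLen assumed >= 0).
     */
    static String truncateOrDefault(String s, int maxLen, String fallback) {
        if (s == null || s.isEmpty()) {
            return fallback;
        }
        // Math.min keeps the end index inside the string, so substring cannot throw.
        return s.substring(0, Math.min(maxLen, s.length()));
    }

    public static void main(String[] args) {
        System.out.println(truncateOrDefault("short", 38, "_")); // prints "short"
        System.out.println(truncateOrDefault(null, 38, "_"));    // prints "_"
    }
}
```

With something like this in place, the call site would shrink to pstmt2.setString(3, Descriptions.truncateOrDefault(itemdescription, 38, "_")).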

I would recommend apache commons lang. A one-liner takes care of the problem.
pstmt2.setString(3, StringUtils.defaultIfEmpty(
StringUtils.substring(itemdescription, 0, 38), "_"));

You really need to check whether the string's length is greater than or equal to 38.

Java's substring method throws an exception when you ask for a substring whose start or end index lies beyond the end of the string.
An easy alternative is to use Apache Commons StringUtils.substring:
public static String substring(String str, int start)
Gets a substring from the specified String avoiding exceptions.
A negative start position can be used to start n characters from the end of the String.
A null String will return null. An empty ("") String will return "".
StringUtils.substring(null, *) = null
StringUtils.substring("", *) = ""
StringUtils.substring("abc", 0) = "abc"
StringUtils.substring("abc", 2) = "c"
StringUtils.substring("abc", 4) = ""
StringUtils.substring("abc", -2) = "bc"
StringUtils.substring("abc", -4) = "abc"
Parameters:
str - the String to get the substring from, may be null
start - the position to start from, negative means count back from the end of the String by this many characters
Returns:
substring from start position, null if null String input
Note, if you can't use Apache Commons lib for some reason, you could just grab the parts you need from the source
// Substring
//-----------------------------------------------------------------------
/**
* <p>Gets a substring from the specified String avoiding exceptions.</p>
*
* <p>A negative start position can be used to start {@code n}
* characters from the end of the String.</p>
*
* <p>A {@code null} String will return {@code null}.
* An empty ("") String will return "".</p>
*
* <pre>
* StringUtils.substring(null, *) = null
* StringUtils.substring("", *) = ""
* StringUtils.substring("abc", 0) = "abc"
* StringUtils.substring("abc", 2) = "c"
* StringUtils.substring("abc", 4) = ""
* StringUtils.substring("abc", -2) = "bc"
* StringUtils.substring("abc", -4) = "abc"
* </pre>
*
* @param str the String to get the substring from, may be null
* @param start the position to start from, negative means
* count back from the end of the String by this many characters
* @return substring from start position, {@code null} if null String input
*/
public static String substring(final String str, int start) {
if (str == null) {
return null;
}
// handle negatives, which means last n characters
if (start < 0) {
start = str.length() + start; // remember start is negative
}
if (start < 0) {
start = 0;
}
if (start > str.length()) {
return EMPTY;
}
return str.substring(start);
}

substring(0,38) means the String has to be 38 characters or longer. If not, the "String index is out of range".

if (itemdescription != null && itemdescription.length() > 0) {
pstmt2.setString(3, itemdescription.substring(0, Math.min(itemdescription.length(), 38)));
} else {
pstmt2.setString(3, "_");
}

I'm assuming your column is 38 characters in length, so you want to truncate itemdescription to fit within the database. A utility function like the following should do what you want:
/**
* Truncates s to fit within len. If s is null, null is returned.
**/
public String truncate(String s, int len) {
if (s == null) return null;
return s.substring(0, Math.min(len, s.length()));
}
then you just call it like so:
String value = "_";
if (itemdescription != null && itemdescription.length() > 0) {
value = truncate(itemdescription, 38);
}
pstmt2.setString(3, value);

itemdescription is shorter than 38 chars, which is why the StringIndexOutOfBoundsException is being thrown.
Checking .length() > 0 only makes sure the String is not empty (it does not guard against null either); what you need to check is that the string is long enough. You could try:
if(itemdescription.length() > 38)
...

You must check the String length. You assume that you can do substring(0,38) as long as String is not null, but you actually need the String to be of at least 38 characters length.

When this is appropriate, I use matches instead of substring.
With substring:
if( myString.substring(0,17).equals("Someting I expect") ) {
// Do stuff
}
// Does NOT work if myString is too short
With matches (must use Regex notation):
if( myString.matches("Someting I expect.*") ) {
// Do stuff
}
// This works with all strings
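When the goal is only to test a fixed prefix, String.startsWith is another option that, like matches, does not throw on short strings. Note that this is a different technique from the one in the answer above, offered as a sketch; the class name PrefixCheck and the method hasPrefix are hypothetical.

```java
// Hypothetical helper; the names are illustrative only.
class PrefixCheck {

    // True when text is non-null and begins with the expected prefix.
    // startsWith simply returns false for strings shorter than the prefix.
    static boolean hasPrefix(String text, String prefix) {
        return text != null && text.startsWith(prefix);
    }

    public static void main(String[] args) {
        System.out.println(hasPrefix("Someting I expect, and more", "Someting I expect")); // true
        System.out.println(hasPrefix("Some", "Someting I expect"));                         // false
    }
}
```

Unlike matches("Someting I expect.*"), this treats the prefix literally, so regex metacharacters in the prefix need no escaping, and line breaks later in the text do not affect the result.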

Should anyone face the same problem.
Do this:
str.substring (...(trim()) ;
Hope it helps somebodies 😎

You get this if itemdescription is shorter than 38 characters.
You can look up which exceptions are thrown, and when, in the Java API;
in your case for String#substring(int,int): https://docs.oracle.com/javase/9/docs/api/java/lang/String.html#substring-int-int-
substring
public String substring(int beginIndex, int endIndex)
. . .
Throws:
IndexOutOfBoundsException
if the beginIndex is negative,
or endIndex is larger than the length of this String object,
or beginIndex is larger than endIndex.
(same applies to previous java versions as well)
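The three documented conditions can be checked with a short sketch. The class name SubstringBounds and the helper throwsOutOfBounds are made up for this illustration.

```java
// Hypothetical demo class; the names are illustrative only.
class SubstringBounds {

    // Returns true when s.substring(begin, end) throws an IndexOutOfBoundsException.
    static boolean throwsOutOfBounds(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "short text";                          // length 10
        System.out.println(throwsOutOfBounds(s, 0, 38));  // true: endIndex is larger than the length
        System.out.println(throwsOutOfBounds(s, -1, 5));  // true: beginIndex is negative
        System.out.println(throwsOutOfBounds(s, 6, 5));   // true: beginIndex is larger than endIndex
        System.out.println(throwsOutOfBounds(s, 0, 10));  // false: endIndex equal to the length is allowed
    }
}
```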

Related

String concatenation and + operator [duplicate]

I was trying out string concatenation and the '+' operator on a string and encountered the following-
String xyz = "Hello" + null;
System.out.println("xyz= " +xyz);
String abc= "Hello".concat(null);
System.out.println("abc= " +abc);
The output for the first one was: Hellonull
The second one threw a NullPointerException.
I don't understand why the two behave differently.
When you concatenate null with the + operator, it is always converted to the string "null". This explains the first output, Hellonull.
The concat function looks internally like this:
public String concat(String s) {
int i = s.length();
if (i == 0) {
return this;
} else {
char ac[] = new char[count + i];
getChars(0, count, ac, 0);
s.getChars(0, i, ac, count);
return new String(0, count + i, ac);
}
}
Source: String concatenation: concat() vs "+" operator
As you can see, it calls s.length(), which in your case means null.length(); which causes the NullPointerException for your String abc= "Hello".concat(null); statement.
Edit: I just decompiled my own String.concat(String s) function and its implementation looks a little bit different, but the reason for the NullPointerException stays the same.
From Docs
If the reference is null, it is converted to the string "null" (four ASCII characters n, u, l, l).
Otherwise, the conversion is performed as if by an invocation of the toString method of the referenced object with no arguments; but if the result of invoking the toString method is null, then the string "null" is used instead.
"Hello" + null returns the same result as "Hello".concat(String.valueOf((Object) null)).
String.valueOf((Object) null) returns the string "null". Without the cast, String.valueOf(null) resolves to the char[] overload and throws a NullPointerException.
/**
* Concatenates this string and the specified string.
*
* @param string
* the string to concatenate
* @return a new string which is the concatenation of this string and the
* specified string.
*/
public String concat(String string) {
if (string.count > 0 && count > 0) {
char[] buffer = new char[count + string.count];
System.arraycopy(value, offset, buffer, 0, count);
System.arraycopy(string.value, string.offset, buffer, count, string.count);
return new String(0, buffer.length, buffer);
}
return count == 0 ? string : this;
}
The first line of the concat function reads string.count on the null argument, so it throws a NullPointerException.
Passing a null reference to concat() gives an NPE, whereas the "+" operator treats a null reference as "null"; hence the different results.
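The behaviour described in these answers can be checked with a short sketch. The class name NullConcat and its helper methods are hypothetical; the cast to Object is there so that the valueOf(Object) overload is selected.

```java
// Hypothetical demo class; the names are illustrative only.
class NullConcat {

    // "+" converts a null reference to the text "null".
    static String plus(String left, String right) {
        return left + right;
    }

    // concat dereferences its argument, so a null argument throws.
    static boolean concatThrowsNpe(String left, String right) {
        try {
            left.concat(right);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(plus("Hello", null));            // Hellonull
        System.out.println(concatThrowsNpe("Hello", null)); // true
        // The cast selects valueOf(Object); a bare String.valueOf(null)
        // would pick the char[] overload and throw instead.
        System.out.println("Hello".concat(String.valueOf((Object) null))); // Hellonull
    }
}
```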

Confusing output from String.split

I do not understand the output of this code:
public class StringDemo{
public static void main(String args[]) {
String blank = "";
String comma = ",";
System.out.println("Output1: "+blank.split(",").length);
System.out.println("Output2: "+comma.split(",").length);
}
}
And got the following output:
Output1: 1
Output2: 0
Documentation:
For: System.out.println("Output1: "+blank.split(",").length);
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.
It will simply return the entire string that's why it returns 1.
For the second case, String.split will discard the , so the result will be empty.
String.split silently discards trailing separators
see guava StringsExplained too
Everything happens according to plan, but let's do it step by step (I hope you have some time).
According to documentation (and source code) of split(String regex) method:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.
So when you invoke
split(String regex)
you are actually getting result from the split(String regex, int limit) method which is invoked in a way:
split(regex, 0)
So here limit is set to 0.
You need to know a few things about this parameter:
If limit is positive you are limiting length of result array to a positive number you specified, so "axaxaxaxa".split("x",2) will return an array, ["a", "axaxaxa"], not ["a","a","a","a","a"].
If limit is 0 then you are not limiting the length of the result array. But it also means that any trailing empty strings will be removed. For example:
"fooXbarX".split("X")
will at start generate an array which will look like:
["foo", "bar", ""]
("barX" split on "X" generates "bar" and ""), but since split removes all trailing empty string, it will return
["foo", "bar"]
Behaviour of negative value of limit is similar to behaviour where limit is set to 0 (it will not limit length of result array). The only difference is that it will not remove empty strings from the end of the result array. In other words
"fooXbarX".split("X",-1)
will return ["foo", "bar", ""]
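The three limit cases above can be reproduced with a small sketch. The class name SplitLimits and the helper show are hypothetical; Arrays.toString is used only so that empty strings stay visible in the output.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitLimits {

    // Formats the split result so that empty strings remain visible.
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        System.out.println(show("axaxaxaxa", "x", 2));  // [a, axaxaxa]
        System.out.println(show("fooXbarX", "X", 0));   // [foo, bar]
        System.out.println(show("fooXbarX", "X", -1));  // [foo, bar, ]
    }
}
```

The trailing ", " inside the last result is the empty string that the negative limit preserves.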
Let's take a look at the case,
",".split(",").length
which (as explained earlier) is same as
",".split(",", 0).length
This means that we are using a version of split which will not limit the length of the result array, but will remove all trailing empty strings, "". You need to understand that when we split one thing we are always getting two things.
In other words, if we split "abc" at b, we will get "a" and "c".
The tricky part is to understand that if we split "abc" at c we will get "ab" and "" (empty string).
Using this logic, if we split "," at , we will get "" and "" (two empty strings).
You can check it using split with negative limit:
for (String s: ",".split(",", -1)){
System.out.println("\""+s+"\"");
}
which will print
""
""
So as we see result array here is at first ["", ""].
But since by default we are using limit set to 0, all trailing empty strings will be removed. In this case, the result array contains only trailing empty strings, so all of them will be removed, leaving you with empty array [] which has length 0.
To answer the case with
"".split(",").length
you need to understand that removing trailing empty strings makes sense only if such trailing empty strings were the result of splitting (and most probably are not needed).
So if there were no places at which we could split, there is no chance that empty strings were created, so there is no point in running this "cleaning" process.
This information is mentioned in documentation of split(String regex, int limit) method where you can read:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
You can also see this behaviour in source code of this method (from Java 8):
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
where you can find
if (off == 0)
return new String[]{this};
fragment which means
if (off == 0) - if off (position from which method should start searching for next possible match for regex passed as split argument) is still 0 after iterating over entire string, we didn't find any match, so the string was not split
return new String[]{this}; - in that case let's just return an array with original string (represented by this).
Since "," couldn't be found in "" even once, "".split(",") must return an array with one element (empty string on which you invoked split). This means that the length of this array is 1.
BTW, Java 8 introduced another mechanism. It removes leading empty strings (if they were created during the splitting process) if we split using a zero-length regex (like "" or a look-around such as (?<!x)). More info at: Why in Java 8 split sometimes removes empty strings at start of result array?
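A minimal sketch of that Java 8 behaviour, assuming a Java 8 or later runtime (older versions would print a leading empty string for the first call). The class name ZeroWidthSplit and the helper show are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; assumes Java 8 or later.
class ZeroWidthSplit {

    // Formats the split result so that empty strings remain visible.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        // A zero-length match at index 0 does not produce a leading "".
        System.out.println(show("abc", ""));  // [a, b, c]
        // A non-empty match at index 0 still does.
        System.out.println(show(",a", ","));  // [, a]
    }
}
```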
From the Java 1.7 Documentation
Splits the string around matches of the given regular expression.
split() method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
In case 1, blank.split(",") does not match any part of the input, so the resulting array has just one element, namely this String.
It returns the entire String, so the length is 1.
In case 2, comma.split(",") returns an empty array.
split() takes a regex as its argument and splits the string around its matches; here that yields only trailing empty strings, which are dropped.
So the length is 0.
For Example(Documentation)
The string "boo:and:foo", yields the following results with these expressions:
Regex Result
: { "boo", "and", "foo" }
o { "b", "", ":and:f" }
Parameters:
regex - the delimiting regular expression
Returns:
the array of strings computed by splitting this string around matches of the given regular expression
Throws:
PatternSyntaxException - if the regular expression's syntax is invalid
From String class javadoc for the public String[] split(String regex) method:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
In the first case, the expression does not match any part of the input so we got an array with only one element - the input.
In the second case, the expression matches input and split should return two empty strings; but, according to javadoc, they are discarded (because they are trailing and empty).
We can take a look into the source code of java.util.regex.Pattern which is behind String.split. Way down the rabbit hole the method
public String[] split(CharSequence input, int limit)
is invoked.
Input ""
For input "" this method is called as
String[] parts = split("", 0);
The interesting part of this method is:
int index = 0;
boolean matchLimited = limit > 0;
ArrayList<String> matchList = new ArrayList<>();
Matcher m = matcher(input);
while(m.find()) {
// Tichodroma: this will not happen for our input
}
// If no match was found, return this
if (index == 0)
return new String[] {input.toString()};
And that is what happens: new String[] {input.toString()} is returned.
Input ","
For input "," the interesting part is
// Construct result
int resultSize = matchList.size();
if (limit == 0)
while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
resultSize--;
String[] result = new String[resultSize];
return matchList.subList(0, resultSize).toArray(result);
Here matchList holds only empty strings and limit == 0, so the loop reduces resultSize to 0 and new String[0] is returned.
From JDK 1.7
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.count == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, count));
off = count;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[] { this };
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, count));
// Construct result
int resultSize = list.size();
if (limit == 0)
while (resultSize > 0 && list.get(resultSize-1).length() == 0)
resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
So for this case, the regex will be handled by the first if.
For the first case blank.split(",")
// If no match was found, return this
if (off == 0)
return new String[] { this };
So, this function will return an array which contains one element if there is no match.
For the second case comma.split(",")
List<String> list = new ArrayList<>();
//...
int resultSize = list.size();
if (limit == 0)
while (resultSize > 0 && list.get(resultSize-1).length() == 0)
resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
As you can see, the last while loop removes all empty elements at the end of the list, so resultSize is 0.
String blank = "";
String comma = ",";
System.out.println("Output1: "+blank.split(",").length); // case 1
System.out.println("Output2: "+comma.split(",").length); // case 2
case 1 - Here blank.split(",") returns an array containing "" itself, since there is no , in blank, so the length is 1.
case 2 - Here comma.split(",") returns an empty array: the split produces only empty strings, and trailing empty strings are removed, so the length is 0. Pass a negative limit, as in comma.split(",", -1), if you want them kept.
Again, split() expects a regex as its argument and returns the array of substrings found around the matches of that regex.
The array returned by this method contains each substring of this
string that is terminated by another substring that matches the given
expression or is terminated by the end of the string.
Else
If the expression does not match any part of the input then the
resulting array has just one element, namely this string.
The API for the split method states that "If the expression does not match any part of the input then the resulting array has just one element, namely this string."
So, as the String blank doesn't contain a ",", a String[] with one element (i.e. blank itself) is returned.
For the String comma, "nothing" is left of the original string thus an empty array is returned.
This seems to be the best solution if you want to process the returned result, e. g.
String[] splits = aString.split(",");
for(String split: splits) {
// do something
}
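If the empty fields do matter, for example when counting the columns of a comma-separated line, the negative limit described earlier keeps them. A minimal sketch; the class name FieldCount and the method countFields are hypothetical.

```java
// Hypothetical helper; the names are illustrative only.
class FieldCount {

    // Counts comma-separated fields, including empty ones at the end.
    static int countFields(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        System.out.println(countFields("a,b,,"));       // 4
        System.out.println(countFields(","));           // 2
        System.out.println(countFields(""));            // 1
        System.out.println("a,b,,".split(",").length);  // 2, trailing empty strings dropped
    }
}
```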

How to remove leading and trailing whitespace from the string in Java?

I want to remove the leading and trailing whitespace from string:
String s = " Hello World ";
I want the result to be like:
s == "Hello World";
s.trim()
see String#trim()
Without any internal method, use regex like
s.replaceAll("^\\s+", "").replaceAll("\\s+$", "")
or
s.replaceAll("^\\s+|\\s+$", "")
or just use pattern in pure form
String s=" Hello World ";
Pattern trimmer = Pattern.compile("^\\s+|\\s+$");
Matcher m = trimmer.matcher(s);
StringBuffer out = new StringBuffer();
while(m.find())
m.appendReplacement(out, "");
m.appendTail(out);
System.out.println(out+"!");
String s="Test ";
s= s.trim();
I prefer not to use regular expressions for trivial problems. This would be a simple option:
public static String trim(final String s) {
final StringBuilder sb = new StringBuilder(s);
while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0)))
sb.deleteCharAt(0); // delete from the beginning
while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1)))
sb.deleteCharAt(sb.length() - 1); // delete from the end
return sb.toString();
}
Use the String class trim method. It will remove all leading and trailing whitespace.
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
String s=" Hello World ";
s = s.trim();
Simply use trim(). It only removes the excess white space at the start and end of a string.
String fav = " I like apple ";
fav = fav.trim();
System.out.println(fav);
Output:
I like apple //no extra space at start and end of the string
String.trim() answers the question but was not an option for me.
As stated here :
it simply regards anything up to and including U+0020 (the usual space character) as whitespace, and anything above that as non-whitespace.
This results in it trimming the U+0020 space character and all “control code” characters below U+0020 (including the U+0009 tab character), but not the control codes or Unicode space characters that are above that.
I am working with Japanese, where we have full-width characters Ｌｉｋｅ　ｔｈｉｓ; the full-width space would not be trimmed by String.trim().
I therefore made a function which, like xehpuk's snippet, uses Character.isWhitespace().
However, this version does not use a StringBuilder; instead of deleting characters, it finds the two indexes it needs to take a trimmed substring out of the original String.
public static String trimWhitespace(final String stringToTrim) {
int endIndex = stringToTrim.length();
// Return the string if it's empty
if (endIndex == 0) return stringToTrim;
int firstIndex = -1;
// Find first character which is not a whitespace, if any
// (increment from beginning until either first non whitespace character or end of string)
while (++firstIndex < endIndex && Character.isWhitespace(stringToTrim.charAt(firstIndex))) { }
// If firstIndex did not reach end of string, Find last character which is not a whitespace,
// (decrement from end until last non whitespace character)
while (--endIndex > firstIndex && Character.isWhitespace(stringToTrim.charAt(endIndex))) { }
// Return substring using indexes
return stringToTrim.substring(firstIndex, endIndex + 1);
}
s = s.trim();
More info:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()
Why do you not want to use predefined methods? They are usually most efficient.
See String#trim() method
Since Java 11, the String class has a strip() method, which returns a string whose value is this string, with all leading and trailing white space removed. It was introduced to overcome the limitations of the trim method.
Docs: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#strip()
Example:
String str = " abc ";
// public String strip()
str = str.strip(); // Returns abc
There are two more useful methods in Java 11+ String class:
stripLeading() : Returns a string whose value is this string,
with all leading white space removed.
stripTrailing() : Returns a string whose value is this string,
with all trailing white space removed.
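The difference between trim() and the strip methods shows up with a character such as the full-width space U+3000. The following is a sketch that assumes Java 11 or later; the class name StripVsTrim and the SAMPLE constant are made up for the illustration.

```java
// Hypothetical demo class; requires Java 11 or later for the strip methods.
class StripVsTrim {

    // \u3000 is the full-width (ideographic) space.
    static final String SAMPLE = "\u3000 abc \u3000";

    public static void main(String[] args) {
        System.out.println("[" + SAMPLE.trim() + "]");          // full-width spaces remain
        System.out.println("[" + SAMPLE.strip() + "]");         // [abc]
        System.out.println("[" + SAMPLE.stripLeading() + "]");  // only the front is cleaned
        System.out.println("[" + SAMPLE.stripTrailing() + "]"); // only the end is cleaned
    }
}
```

trim() leaves SAMPLE untouched because its first and last characters are above U+0020, while strip() relies on Character.isWhitespace and removes them.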
@xehpuk's method is good if you want to avoid using regex, but it has O(n^2) time complexity. The following solution also avoids regex, but is O(n):
if(s.length() == 0)
return "";
char left = s.charAt(0);
char right = s.charAt(s.length() - 1);
int leftWhitespace = 0;
int rightWhitespace = 0;
boolean leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
while ((left == ' ' || right == ' ') && leftBeforeRight) {
if(left == ' ') {
leftWhitespace++;
left = s.charAt(leftWhitespace);
}
if(right == ' ') {
rightWhitespace++;
right = s.charAt(s.length() - 1 - rightWhitespace);
}
leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
}
String result = s.substring(leftWhitespace, s.length() - rightWhitespace);
return result.equals(" ") ? "" : result;
This counts the space characters at the beginning and end of the string until either the "left" and "right" indices obtained from the counts meet, or both indices have reached a non-space character. Afterwards, we either return the substring obtained using the counts, or the empty string if the result is a single space (needed to account for all-space strings with an odd number of characters).

How to trim the whitespace from a string? [duplicate]

I am writing this function for a J2ME application, so I don't have some of the more advanced / modern Java classes available to me. I am getting java.lang.ArrayIndexOutOfBoundsException on this. So, apparently either it doesn't like the way I've initialized the newChars array, or I'm not doing something correctly when calling System.arraycopy.
/*
* remove any leading and trailing spaces
*/
public static String trim(String str) {
char[] chars = str.toCharArray();
int len = chars.length;
// leading
while ( (len > 0 ) && ( chars[0] == ' ' ) ) {
char[] newChars = new char[] {}; // initialize empty array
System.arraycopy(chars, 1, newChars, 0, len - 1);
chars = newChars;
len = chars.length;
}
// TODO: trailing
return chars.toString();
}
The simple way to trim leading and trailing whitespace is to call String.trim(). With Java 11 and later, you can also use String.strip(), which uses a different interpretation of what "white space" means [1].
If you want to trim just leading and trailing spaces (rather than all leading and trailing whitespace), there is an Apache Commons method called StringUtils.strip(String, String) that can do this; call it with " " as the 2nd argument.
Your attempted code has a number of bugs, and is fundamentally inefficient. If you really want to implement this yourself, then you should:
count the leading space characters
count the trailing space characters
if either count is non-zero, call String.substring(from, end) to create a new string containing the characters you want to keep.
This approach avoids copying any characters [2].
[1] - The different meanings are explained in the respective javadocs. Alternatively, read the answers to Difference between String trim() and strip() methods in Java 11.
[2] - Actually, that depends on the implementation of String. For some implementations there will be no copying, for others a single copy is made. But either is an improvement on your approach, which entails a minimum of 2 copies, and more if there are any characters to trim.
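The three steps listed above can be sketched as follows. This is only one possible reading of them; the class name SpaceTrim and the method trimSpaces are hypothetical, and the code restricts itself to length(), charAt() and substring() on the assumption that these are available on the target platform.

```java
// Hypothetical helper; the names are illustrative only.
class SpaceTrim {

    // Removes leading and trailing ' ' characters only (not tabs or newlines).
    static String trimSpaces(String str) {
        int len = str.length();
        int start = 0;
        // Step 1: count the leading spaces.
        while (start < len && str.charAt(start) == ' ') {
            start++;
        }
        int end = len;
        // Step 2: count the trailing spaces, stopping at the leading run.
        while (end > start && str.charAt(end - 1) == ' ') {
            end--;
        }
        // Step 3: take a substring only when something was trimmed.
        return (start > 0 || end < len) ? str.substring(start, end) : str;
    }

    public static void main(String[] args) {
        System.out.println("[" + trimSpaces("  Hello World ") + "]"); // [Hello World]
        System.out.println("[" + trimSpaces("    ") + "]");           // []
    }
}
```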
String.trim() is very old; it goes back at least to Java 1.3. Don't you have it?
Apache StringUtils.strip is the best answer here that works with all expected white space characters (not just space), and can be downloaded here:
Here's the relevant code ripped from this source file to implement it in your own class if you wanted, but really, just download and use StringUtils to get more bang for your buck! Note that you can use StringUtils.stripStart to trim any leading character from a java string as well.
public static final int INDEX_NOT_FOUND = -1;
public static String strip(final String str) {
return strip(str, null);
}
public static String stripStart(final String str, final String stripChars) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return str;
}
int start = 0;
if (stripChars == null) {
while (start != strLen && Character.isWhitespace(str.charAt(start))) {
start++;
}
} else if (stripChars.isEmpty()) {
return str;
} else {
while (start != strLen && stripChars.indexOf(str.charAt(start)) != INDEX_NOT_FOUND) {
start++;
}
}
return str.substring(start);
}
public static String stripEnd(final String str, final String stripChars) {
int end;
if (str == null || (end = str.length()) == 0) {
return str;
}
if (stripChars == null) {
while (end != 0 && Character.isWhitespace(str.charAt(end - 1))) {
end--;
}
} else if (stripChars.isEmpty()) {
return str;
} else {
while (end != 0 && stripChars.indexOf(str.charAt(end - 1)) != INDEX_NOT_FOUND) {
end--;
}
}
return str.substring(0, end);
}
public static String strip(String str, final String stripChars) {
if (str == null || str.length() == 0) {
return str;
}
str = stripStart(str, stripChars);
return stripEnd(str, stripChars);
}
First of all, what others said about String.trim(). Really, don't reinvent the wheel.
But for the record, what's going wrong with your code is that Java arrays aren't resizeable. When you initially set up your target array, you create it as a size 0 array. You then tell System.arraycopy to stuff len - 1 characters in there. That's not going to work. If you wanted it to work, you'd need to set up the array as:
char[] newChars = new char[len - 1];
But that's amazingly inefficient, reallocating and copying a new array each time through the loop. Use the three steps that Stephen C mentioned, ending with a substring.
As of JDK 11, you can use the String.strip API to return a string whose value is this string, with all leading and trailing whitespace removed. Its javadoc reads:
/**
* Returns a string whose value is this string, with all leading
* and trailing {@link Character#isWhitespace(int) white space}
* removed.
* <p>
* If this {@code String} object represents an empty string,
* or if all code points in this string are
* {@link Character#isWhitespace(int) white space}, then an empty string
* is returned.
* <p>
* Otherwise, returns a substring of this string beginning with the first
* code point that is not a {@link Character#isWhitespace(int) white space}
* up to and including the last code point that is not a
* {@link Character#isWhitespace(int) white space}.
* <p>
* This method may be used to strip
* {@link Character#isWhitespace(int) white space} from
* the beginning and end of a string.
*
* @return a string whose value is this string, with all leading
* and trailing white space removed
*
* @see Character#isWhitespace(int)
*
* @since 11
*/
public String strip()
Some sample cases:
System.out.println("".strip());
System.out.println(" both ".strip());
System.out.println(" leading".strip());
System.out.println("trailing ".strip());
If you don't want to use String.trim() method, then it can be implemented like below. The logic will handle different scenarios like space, tab and other special characters.
public static String trim(String str){
int i=0;
int j = str.length();
char[] charArray = str.toCharArray();
while((i<j) && charArray[i] <=' '){
i++;
}
while((i<j) && charArray[j-1]<= ' '){
j--;
}
// j is already the exclusive end index, so it must not be incremented
return str.substring(i, j);
}
public static void main(String[] args) {
System.out.println(trim(" abcd ght trip "));
}
The destination array newChars is not large enough to hold the values copied. You need to initialize it to the length of the data you intend to copy (so, length - 1).
You can use Guava CharMatcher.
String outputString = CharMatcher.whitespace().trimFrom(inputString);
Note: This works because whitespace is all in the BMP.

How can I check if a single character appears in a string?

In Java is there a way to check the condition:
"Does this single character appear at all in string x"
without using a loop?
You can use string.indexOf('a').
It returns the index of the first occurrence of the character in
the character sequence represented by this object, or -1 if the
character does not occur.
String.contains(), which checks if the string contains a specified sequence of char values
String.indexOf(), which returns the index within the string of the first occurrence of the specified character or substring (there are 4 variations of this method)
I'm not sure what the original poster is asking exactly. Since indexOf(...) and contains(...) both probably use loops internally, perhaps he's looking to see if this is possible at all without a loop? I can think of two ways offhand; one would of course be recursion:
public boolean containsChar(String s, char search) {
if (s.length() == 0)
return false;
else
return s.charAt(0) == search || containsChar(s.substring(1), search);
}
The other is far less elegant, but for completeness:
/**
* Works for strings of up to 5 characters
*/
public boolean containsChar(String s, char search) {
if (s.length() > 5) throw new IllegalArgumentException();
try {
if (s.charAt(0) == search) return true;
if (s.charAt(1) == search) return true;
if (s.charAt(2) == search) return true;
if (s.charAt(3) == search) return true;
if (s.charAt(4) == search) return true;
} catch (IndexOutOfBoundsException e) {
// this should never happen...
return false;
}
return false;
}
The number of lines grows as you need to support longer and longer strings, of course. But there are no loops or recursion at all. You can even remove the length check if you're concerned that length() uses a loop.
You can use 2 methods from the String class.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurrence of the specified character or substring, or -1 if the character is not found (there are 4 variations of this method)
Method 1:
String myString = "foobar";
if (myString.contains("x")) {
// Do something.
}
Method 2:
String myString = "foobar";
if (myString.indexOf("x") >= 0) {
// Do something.
}
Links by: Zach Scrivena
String temp = "abcdefghi";
if(temp.indexOf("b")!=-1)
{
System.out.println("there is 'b' in temp string");
}
else
{
System.out.println("there is no 'b' in temp string");
}
If you need to check the same string often, you can calculate the character occurrences up front. This is an implementation that uses a bit array stored in a long array:
public class FastCharacterInStringChecker implements Serializable {
private static final long serialVersionUID = 1L;
private final long[] l = new long[1024]; // 65536 / 64 = 1024
public FastCharacterInStringChecker(final String string) {
for (final char c: string.toCharArray()) {
final int index = c >> 6;
final int value = c - (index << 6);
l[index] |= 1L << value;
}
}
public boolean contains(final char c) {
final int index = c >> 6; // c / 64
final int value = c - (index << 6); // c - (index * 64)
return (l[index] & (1L << value)) != 0;
}}
To check if something does not exist in a string, you at least need to look at each character in a string. So even if you don't explicitly use a loop, it'll have the same efficiency. That being said, you can try using str.contains(""+char).
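As a small sketch of the two built-in options mentioned in this thread, the following wraps indexOf(char) and contains(String) in helper methods. The class and method names are hypothetical; String.valueOf(c) is used here as an alternative to the ""+c concatenation, and both helpers still scan the string internally.

```java
final class CharLookup {

    // Hypothetical helper: true if c occurs anywhere in s (uses String.indexOf(int)).
    static boolean containsChar(String s, char c) {
        return s.indexOf(c) >= 0;
    }

    // Same check via String.contains, which needs a CharSequence argument,
    // so the char is first converted to a one-character String.
    static boolean containsCharViaContains(String s, char c) {
        return s.contains(String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(containsChar("foobar", 'b'));            // prints true
        System.out.println(containsCharViaContains("foobar", 'x')); // prints false
    }
}
```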
Is the below what you were looking for?
int index = string.indexOf(character);
return index != -1;
Yes, using the indexOf() method on the string class. See the API documentation for this method
String.contains(String) or String.indexOf(String) - suggested
"abc".contains("Z"); // false - correct
"zzzz".contains("Z"); // false - correct
"Z".contains("Z"); // true - correct
"😀and😀".contains("😀"); // true - correct
"😀and😀".contains("😂"); // false - correct
"😀and😀".indexOf("😀"); // 0 - correct
"😀and😀".indexOf("😂"); // -1 - correct
String.indexOf(int) and carefully considered String.indexOf(char) with char to int widening
"😀and😀".indexOf("😀".charAt(0)); // 0 though incorrect usage has correct output due to portion of correct data
"😀and😀".indexOf("😂".charAt(0)); // 0 -- incorrect usage and ambiguous result
"😀and😀".indexOf("😂".codePointAt(0)); // -1 -- correct usage and correct output
The notion of a "character" is ambiguous in the Java world
Can a char or Character value be considered a single character?
No. In the context of Unicode characters, a char or Character can sometimes be only part of a single character and should not be treated as a complete single character logically.
If not, what should be considered a single character (logically)?
Any system supporting character encodings for Unicode characters should consider a Unicode code point as a single character.
So Java should make that loud and clear rather than exposing so many internal implementation details to users.
The String class is a poor abstraction (understanding it requires a confusing amount of knowledge about what it encapsulates 😒😒😒, which makes it an anti-pattern).
How is it different from general char usage?
char can only be mapped to a character in the Basic Multilingual Plane.
Only a code point (an int) can cover the complete range of Unicode characters.
Why this difference?
char is internally treated as a 16-bit unsigned value, and the UTF-16 internal representation cannot represent every Unicode character in only 2 bytes. Sometimes a 16-bit value has to be combined with another 16-bit value (a surrogate pair) to define a character correctly.
Without getting too verbose, the usage of indexOf, charAt, length and such methods should be more explicit. Sincerely hoping Java will add new UnicodeString and UnicodeCharacter classes with clearly defined abstractions.
Reason to prefer contains and not indexOf(int)
Practically there are many code flows that treat a logical character as char in java.
In Unicode context, char is not sufficient
Though indexOf takes an int, the implicit char to int conversion hides this from the user, who might write something like str.indexOf(someotherstr.charAt(0)) (unless the user is aware of the exact context)
So, treating everything as CharSequence (aka String) is better
public static void main(String[] args) {
System.out.println("😀and😀".indexOf("😀".charAt(0))); // 0 though incorrect usage has correct output due to portion of correct data
System.out.println("😀and😀".indexOf("😂".charAt(0))); // 0 -- incorrect usage and ambiguous result
System.out.println("😀and😀".indexOf("😂".codePointAt(0))); // -1 -- correct usage and correct output
System.out.println("😀and😀".contains("😀")); // true - correct
System.out.println("😀and😀".contains("😂")); // false - correct
}
Semantics
char can handle most practical use cases. Still, it's better to use code points within a programming environment for future extensibility.
Code points should handle nearly all of the technical use cases around encodings.
Still, grapheme clusters fall outside the scope of the code point level of abstraction.
Storage layers can choose the char interface if ints are too costly (doubled). Unless storage cost is the only metric, it's still better to use code points. Also, it's better to treat storage as bytes and delegate semantics to the business logic built around storage.
Semantics can be abstracted at multiple levels. codepoint should become lowest level of interface and other semantics can be built around codepoint in runtime environment.
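To make the code point argument concrete, here is a minimal sketch of a lookup that takes an int code point instead of a char. The class and method names are hypothetical; the emoji are written as \u escapes (U+1F600 and U+1F602) so the example does not depend on the source file encoding. Both emoji share the same high surrogate, which is why the charAt(0) lookup gives a misleading match.

```java
final class CodePointLookup {

    // Hypothetical helper: true if the given Unicode code point occurs in s.
    // String.indexOf(int) handles supplementary code points (above U+FFFF).
    static boolean containsCodePoint(String s, int codePoint) {
        return s.indexOf(codePoint) >= 0;
    }

    // Equivalent check using the code point stream (Java 8 and later).
    static boolean containsCodePointViaStream(String s, int codePoint) {
        return s.codePoints().anyMatch(cp -> cp == codePoint);
    }

    public static void main(String[] args) {
        String grinning = "\uD83D\uDE00"; // U+1F600
        String joy = "\uD83D\uDE02";      // U+1F602
        String text = grinning + "and" + grinning;

        System.out.println(containsCodePoint(text, grinning.codePointAt(0))); // prints true
        System.out.println(containsCodePoint(text, joy.codePointAt(0)));      // prints false
        // Pitfall: charAt(0) yields only the shared high surrogate, so this "finds" index 0.
        System.out.println(text.indexOf(joy.charAt(0)));                      // prints 0
    }
}
```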
package com;
public class _index {
public static void main(String[] args) {
String s1="be proud to be an indian";
char ch=s1.charAt(s1.indexOf('e'));
int count = 0;
for(int i=0;i<s1.length();i++) {
if(s1.charAt(i)=='e'){
System.out.println("number of E:=="+ch);
count++;
}
}
System.out.println("Total count of E:=="+count);
}
}
static String removeOccurences(String a, String b)
{
StringBuilder s2 = new StringBuilder(a);
for(int i=0;i<b.length();i++){
char ch = b.charAt(i);
System.out.println(ch+" first index"+a.indexOf(ch));
// StringBuilder.indexOf only accepts a String, so wrap the char
String target = String.valueOf(ch);
// k >= 0 so that a match at index 0 is removed as well
for(int k=s2.indexOf(target);k >= 0;k=s2.indexOf(target)){
s2.deleteCharAt(k);
System.out.println("val of s2 : "+s2.toString());
}
}
System.out.println(s2.toString());
return (s2.toString());
}
You can use this code. It checks whether the char is present: if it is, the return value is >= 0, otherwise it is -1. Here I am printing the letters that are not present in the input.
import java.util.Scanner;
public class Test {
public static void letters()
{
System.out.println("Enter input char");
Scanner sc = new Scanner(System.in);
String input = sc.next();
System.out.println("Output : ");
for (char alphabet = 'A'; alphabet <= 'Z'; alphabet++) {
if(input.toUpperCase().indexOf(alphabet) < 0)
System.out.print(alphabet + " ");
}
}
public static void main(String[] args) {
letters();
}
}
//Output Example
Enter input char
nandu
Output :
B C E F G H I J K L M O P Q R S T V W X Y Z
If you look at the source code of indexOf in Java:
public int indexOf(int ch, int fromIndex) {
final int max = value.length;
if (fromIndex < 0) {
fromIndex = 0;
} else if (fromIndex >= max) {
// Note: fromIndex might be near -1>>>1.
return -1;
}
if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
// handle most cases here (ch is a BMP code point or a
// negative value (invalid code point))
final char[] value = this.value;
for (int i = fromIndex; i < max; i++) {
if (value[i] == ch) {
return i;
}
}
return -1;
} else {
return indexOfSupplementary(ch, fromIndex);
}
}
you can see it uses a for loop for finding a character. Note that each indexOf you may use in your code, is equal to one loop.
So, it is unavoidable to use loop for a single character.
However, if you want to find a particular string that can take several forms, use a library such as java.util.regex, which applies a more powerful algorithm to match a character or string pattern with regular expressions. For example, to find an email in a string:
String regex = "^(.+)@(.+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(email);
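A self-contained version of this regex idea might look like the sketch below. It assumes the pattern separates the two parts with '@'; the class EmailShapeCheck and its methods are hypothetical names, and the pattern only checks the rough "something@something" shape rather than validating real email addresses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailShapeCheck {

    // Compiled once and reused; matches "<anything>@<anything>".
    private static final Pattern EMAIL_SHAPE = Pattern.compile("^(.+)@(.+)$");

    // Hypothetical helper: true if the whole candidate has the shape local@domain.
    static boolean looksLikeEmail(String candidate) {
        return EMAIL_SHAPE.matcher(candidate).matches();
    }

    // Hypothetical helper: returns the part after the last '@', or null if there is no match.
    static String domainOf(String candidate) {
        Matcher matcher = EMAIL_SHAPE.matcher(candidate);
        return matcher.matches() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // prints true
        System.out.println(domainOf("user@example.com"));       // prints example.com
    }
}
```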
If you don't like to use regex, just use a loop and charAt and try to cover all cases in one loop.
Be careful: recursive methods have more overhead than loops, so they are not recommended.
How about using this (note that it is JavaScript, not Java):
let text = "Hello world, welcome to the universe.";
let result = text.includes("world");
console.log(result); // true
The result will be true or false.
This always works for me.
You won't be able to check whether a char appears in a string without going over the string at least once using a loop or recursion (the built-in methods like indexOf also use a loop).
If the number of lookups for a char in string x is much larger than the length of the string, then I would recommend using a Set data structure, as that would be more efficient than simply using indexOf.
String s = "abc";
// Build a set so we can check if character exists in constant time O(1)
Set<Character> set = new HashSet<>();
int len = s.length();
for(int i = 0; i < len; i++) set.add(s.charAt(i));
// Now we can check without the need of a loop
// contains method of set doesn't use a loop unlike string's contains method
set.contains('a'); // true
set.contains('z'); // false
Using set you will be able to check if character exists in a string in constant time O(1) but you will also use additional memory ( Space complexity will be O(n) ).
