String.trim() removes not only spaces in Java

String.trim() in Java removes all characters whose ASCII value is less than or equal to 20 (space).
Any idea why Java does that instead of removing only the space character (ASCII 20)?
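// java.lang.String.trim() as implemented in older JDKs, where String still had offset/count fields
public String trim() {
    int len = count;
    int st = 0;
    int off = offset;      /* avoid getfield opcode */
    char[] val = value;    /* avoid getfield opcode */

    // skip leading chars whose code is <= ' ' (U+0020)
    while ((st < len) && (val[off + st] <= ' ')) {
        st++;
    }
    // skip trailing chars whose code is <= ' '
    while ((st < len) && (val[off + len - 1] <= ' ')) {
        len--;
    }
    // return the original instance if nothing was trimmed
    return ((st > 0) || (len < count)) ? substring(st, len) : this;
}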
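The same loop can be written against a plain String, without the internal offset/count fields. The sketch below makes the "<= ' '" rule easy to try outside the JDK. The class and method names are illustrative and not part of any JDK API.

```java
class TrimSketch {
    // Mirrors the old JDK loop: a char is trimmable if its code is <= ' ' (U+0020).
    static String trimLikeJdk(String s) {
        int start = 0;
        int end = s.length();
        // Skip leading chars with a code <= 0x20
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        // Skip trailing chars with a code <= 0x20
        while (start < end && s.charAt(end - 1) <= ' ') {
            end--;
        }
        // Like the JDK, return the same instance when nothing was removed
        return (start > 0 || end < s.length()) ? s.substring(start, end) : s;
    }

    public static void main(String[] args) {
        String input = "\t\n  Hello World!\r\n";
        System.out.println("[" + trimLikeJdk(input) + "]");          // [Hello World!]
        System.out.println(trimLikeJdk(input).equals(input.trim())); // true
    }
}
```

Tabs, newlines and carriage returns all have codes below 32, so a single comparison against ' ' handles them all.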

Because there are many different ways to represent blank space besides the plain " " space character. Quoting the Javadoc:
Returns a copy of the string, with leading and trailing whitespace omitted.
The Javadoc is clear here: it is about whitespace, not just the space character. That covers characters that show up as "empty" but are in fact different from a plain " ".
In other words: this is a convenience method. Such methods are designed to provide the functionality that users actually need and expect.
It would be completely counter-intuitive to provide a trim() method that only handles spaces.
A very typical scenario: you receive some string. It could have been entered by a user, or read from a file where it represents a whole line. You are not interested in any leading or trailing tabs, spaces, or newline characters. So the designers of Java give you one method to get rid of all these different characters easily, instead of making you call trimSpaces(), trimTabs(), trimNewLines(), and so on.

The ASCII character code for space is actually 32 (0x20 in hexadecimal), not 20. If you look at the characters that come before 32, you will find the ASCII control characters, including whitespace such as tab, line feed and carriage return. The assumption is that the average user would want to strip all such whitespace surrounding a string.
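Here are the numbers worked out: ' ' is 32 in decimal and 0x20 in hex, and trim() strips every char from code 0 through 32. The small check below pads a word with each ASCII char and counts how many of them trim() removes. The class name is illustrative.

```java
class TrimRange {
    // Counts how many chars in 0..127 trim() removes when they surround a word.
    static int countTrimmedAscii() {
        int count = 0;
        for (char c = 0; c < 128; c++) {
            String padded = c + "x" + c;
            if (padded.trim().equals("x")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println((int) ' ');                                         // 32
        System.out.println(0x20 == ' ');                                       // true
        System.out.println((int) '\t' + " " + (int) '\n' + " " + (int) '\r'); // 9 10 13
        System.out.println(countTrimmedAscii());                               // 33 (codes 0 through 32)
    }
}
```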
To round out the answer given by @GhostCat, here is a one-liner you can use to selectively trim only spaces:
String input = " Hello World! ";
input = input.replaceAll("[ ]*(.*)[ ]*", "$1"); // note: the greedy (.*) also captures trailing spaces, so only leading spaces are removed

The one-liner below works. The one given by @Tim Biegeleisen doesn't remove a trailing space. Note that \s matches any whitespace (tabs and newlines too), not only spaces.
String input = " Hello World! ";
input = input.replaceFirst("^\\s++", "").replaceFirst("\\s++$","");
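If you really want to trim only the U+0020 space character, as the question asks, there are two options. You can anchor a literal-space pattern at both ends, or you can move two indices inward. A sketch with illustrative names follows. It uses \z (the very end of input) rather than $, because $ can also match just before a final line terminator.

```java
class SpaceOnlyTrim {
    // Regex version: remove runs of ' ' (and nothing else) at either end.
    // \z means the very end of input; unlike $, it ignores a final line terminator.
    static String trimSpacesRegex(String s) {
        return s.replaceAll("^ +| +\\z", "");
    }

    // Loop version: move two indices inward past ' ' characters only.
    static String trimSpaces(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "  \tHello World!\t  ";
        System.out.println("[" + trimSpaces(input) + "]");                    // tabs are kept
        System.out.println(trimSpaces(input).equals(trimSpacesRegex(input))); // true
    }
}
```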

Related

"number of segments in a string" not working for a particular input

I have to count the "contiguous sequences of non-space characters" in a string.
My output is wrong for the input
Input=", , , , a, eaefa"
My answer comes out as 13 instead of 6, even though I only count words, not spaces.
class Solution {
    public int countSegments(String s)
    {
        if(s.isEmpty()){
            return 0;
        }
        else
        {
            int count=0;
            String s1[]=s.split(" ");
            for(int i=0;i<s1.length;i++)
            {
                if(s1[i]!=" ")
                    count++;
            }
            return count;
        }
    }
}

Others have suggested using:
s.split("\\s+").length
However, there are complications with split. Specifically, the above gives incorrect answers for strings with leading whitespace, and for the empty string. Even if these issues are fixed, it is still relatively expensive, because it creates count new strings plus an array to hold them.
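The leading-whitespace and empty-string problems are easy to reproduce. The sketch below prints what split("\\s+") returns for a few inputs. The class and method names are illustrative.

```java
import java.util.Arrays;

class SplitPitfalls {
    // Naive segment count using split on runs of whitespace.
    static int naiveCount(String s) {
        return s.split("\\s+").length;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a b", " a b", ""}) {
            // " a b" -> [, a, b]: a match at index 0 produces an empty first element
            // ""     -> one empty element, so the count is 1 instead of 0
            System.out.println("<" + s + "> " + Arrays.toString(s.split("\\s+")) + " -> " + naiveCount(s));
        }
    }
}
```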
We can implement countSegments directly by iterating through the string and counting the number of times we go from a non-space character to a space character, or the end of the string:
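public static int countSegments(String s)
{
    int count = 0;
    for(int i=1; i<=s.length(); i++)
    {
        // count each place where a segment ends: a non-space char followed by a space or the end of the string
        if((s.charAt(i-1) != ' ') && (i == s.length() || s.charAt(i) == ' ')) count++;
    }
    return count;
}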
Test:
for(String s : new String[] {"", " ", "a", " a", "a ", " a ", ", , , , a, eaefa"})
    System.out.format("<%s> : %d%n", s, countSegments(s));
Output:
<> : 0
< > : 0
<a> : 1
< a> : 1
<a > : 1
< a > : 1
<, , , , a, eaefa> : 6

You should use split on multiple spaces, and then you have the segments already divided up for you, so you don't need to make a for-loop or anything.
//The trim is because split gets messed up with leading spaces, as SirRaffleBuffle said
s = s.trim();
if (s.isEmpty()) return 0;
return s.split("\\s+").length;
If you want only sequences of word characters (letters, digits and underscore), you can try the regex "\\W+" instead.
If you want only sequences of English letters, you can do the same thing with the regex "[^A-Za-z]+".
Here, \\s+ splits on runs of whitespace instead of on a single space.
The way you're currently doing it, split(" ") produces an empty string for each additional space in a run of consecutive spaces. Because s1[i] != " " compares references, nothing is ever filtered out, so every element is counted, empty strings included, instead of only the "contiguous sequences of non-space characters". That's why you're getting 13 instead of 6.
If you do want to do this with a for-loop, use a boolean flag that tells you whether you are inside a sequence. Increment count only when you find a non-space character while the flag is false (you were outside a sequence), and reset the flag when you find a space.
Also, using != for String comparison is wrong, you should use the equals method.
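Here is one way the boolean-flag idea could look. This is a sketch. Like the problem statement, it treats only ' ' as a separator. The class name is illustrative.

```java
class SegmentCounter {
    static int countSegments(String s) {
        int count = 0;
        boolean inSegment = false; // true while we are inside a run of non-space chars
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                inSegment = false;     // a space ends the current segment
            } else if (!inSegment) {
                inSegment = true;      // first non-space char after a space (or at the start)
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSegments(", , , ,        a, eaefa")); // 6
    }
}
```

No substrings or arrays are created; each character is inspected once.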

You can do it easily by using the regex, \\s+ as follows:
public class Main {
    public static void main(String[] args) {
        String str = ", , , , a, eaefa";
        str = str.trim(); // Remove leading and trailing whitespace
        System.out.println(str.isEmpty() ? 0 : str.split("\\s+").length);
    }
}
Output:
6
The regex \\s+ matches one or more consecutive whitespace characters.
On a side note, you are using != to compare strings, which is not correct. Note that == and != are used to compare the references, not the values.

Input:- I Love India Output:- I1 Love4 India5 [duplicate]

This question already has answers here:
How do I properly compare strings in C?
What am I doing wrong in this code?
Take the input as one string, then print the length of each word.
For example, the length of "I" is 1 and the length of "Love" is 4, so print each word followed by its length.
#include <stdio.h>
#include <string.h>
int main()
{
    int i,n,count=0;
    char str[20];
    gets(str);
    n=strlen(str);
    for(i=0;i<n;i++){
        if(str[i]==" "){
            printf("%d",count);
            count=0;
        }else{
            printf("%c",str[i]);
            count++;
        }
    }
    return 0;
}

The line if(str[i]==" "){ is wrong.
" " is a string and it consists of two bytes: the space character and a terminating NUL character.
You should use if(str[i]==' '){ instead.
' ' is a character and you should compare it with str[i], which is also a character.
Also, it seems you forgot to print a space character after the numbers.
One more point is that you should print the length of the last word
even if there isn't a space character after the last word.
By the way, you shouldn't use gets(),
which has an unavoidable risk of buffer overrun; it was deprecated in C99 and removed in C11.
Use fgets(), which takes the buffer size, instead.
fgets() stores the newline character it reads in the buffer while gets() doesn't,
so you should remove the newline character if you don't want it.
An example of corrected code:
#include <stdio.h>
#include <string.h>
int main()
{
    int i,n,count=0;
    char str[20 + 1]; // allocate one more character for a newline character
    char* lf; // for searching for a newline character
    if (fgets(str, sizeof(str), stdin) == NULL) return 1; // use fgets() instead of gets(); stop if nothing was read
    if ((lf = strchr(str, '\n')) != NULL) *lf = '\0'; // remove a newline character if one exists
    n=strlen(str);
    for(i=0;i<=n;i++){ // change < to <= for processing the terminating NUL character
        if(str[i]==' ' || str[i]=='\0'){ // compare with a character, not a string
            if (count>0) printf("%d",count); // avoid duplicate printing for duplicate space characters
            if(i+1<n) printf(" "); // print a space if the string continues
            count=0;
        }else{
            printf("%c",str[i]);
            count++;
        }
    }
    return 0;
}

A few issues in the code:
Comparing a character with a string literal.
No space after printing a particular word and its length.
For the last word, the comparison has to be with a null-terminator.
Not taking the null-terminator into account in the condition check of the for loop.
Making these changes, you will get the required output.
Note: gets was deprecated in C99, removed in C11, and is dangerous to use. Use fgets instead.

Counting the occurrences of string in Java using string.split()

I'm new to Java. I thought I would write a program to count the occurrences of a character or a sequence of characters in a sentence. I wrote the following code. But I then saw there are some ready-made options available in Apache Commons.
Anyway, can you look at my code and tell me whether I've made any rookie mistakes? I tested it on a couple of cases and it worked fine. I can think of one case where the input is a big text file instead of a small sentence or paragraph: split() may become a problem there, since it has to handle a very large string. However, this is just my guess, and I would love to hear your opinions.
private static void countCharInString() {
    //Get the sentence and the search keyword
    System.out.println("Enter a sentence\n");
    Scanner in = new Scanner(System.in);
    String inputSentence = in.nextLine();
    System.out.println("\nEnter the character to search for\n");
    String checkChar = in.nextLine();
    in.close();
    //Count the number of occurrences
    String[] splitSentence = inputSentence.split(checkChar);
    int countChar = splitSentence.length - 1;
    System.out.println("\nThe character/sequence of characters '" + checkChar + "' appear(s) '" + countChar + "' time(s).");
}
Thank you :)

Because of edge cases, split() is the wrong approach.
Instead, use replaceAll() to remove all other characters then use the length() of what's left to calculate the count:
int count = input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
FYI, the regex created (for example when check = "xyz") looks like ".*?(xyz|$)", which means "everything up to and including 'xyz', or up to the end of input". Each match is replaced by the captured text (either "xyz", or nothing at the end of input). This leaves just a string of zero or more copies of the check string, and dividing its length by the length of check gives the total.
To protect against the check being null or zero-length (causing a divide-by-zero error), code defensively like this:
int count = check == null || check.isEmpty() ? 0 : input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
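The sketch below shows the intermediate string described above. Unlike the one-liner, it wraps check in Pattern.quote(...), so characters such as . or * in check are matched literally instead of being treated as regex syntax. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class ReplaceAllCount {
    // Replaces everything except the copies of check, leaving "checkcheck...".
    static String keepOnlyMatches(String input, String check) {
        return input.replaceAll(".*?(" + Pattern.quote(check) + "|$)", "$1");
    }

    static int count(String input, String check) {
        if (check == null || check.isEmpty()) {
            return 0; // avoid dividing by zero
        }
        return keepOnlyMatches(input, check).length() / check.length();
    }

    public static void main(String[] args) {
        String input = "abxyzcdxyzef";
        System.out.println(keepOnlyMatches(input, "xyz")); // xyzxyz
        System.out.println(count(input, "xyz"));           // 2
    }
}
```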

One flaw I can immediately think of: if your inputSentence consists of nothing but a single occurrence of checkChar, split() returns an empty array and your count will be -1 instead of 1.
An example interaction:
Enter a sentence
onlyme
Enter the character to search for
onlyme
The character/sequence of characters 'onlyme' appear(s) '-1' time(s).
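A better way would be to use the indexOf() method of String to count the occurrences, like this:
int count = 0;
int i = 0;
// assumes checkChar is not empty; an empty string would match at every index and the loop would never end
while ((i = inputSentence.indexOf(checkChar, i)) != -1) {
    count++;
    i = i + checkChar.length();
}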

split is the wrong approach for a number of reasons:
String.split takes a regular expression.
Regular expressions have characters with special meanings, so you cannot use split for all characters without escaping them. This requires an escaping function such as Pattern.quote.
Performance: String.split has a fast path for single characters; otherwise it compiles a regular expression on every call. Even so, String.split creates one object for the String[] and one for each String in it every time you call it, and you have no use for these objects: all you want is the count. A future, all-knowing HotSpot compiler might be able to optimize that away, but the current one does not; split is roughly 10 times as slow as simply counting characters as below.
It will not count correctly when checkChar occurs at the end of the string (or repeats there), because split discards trailing empty strings.
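Two of these failure modes are easy to reproduce with the question's split(...).length - 1 formula. The indexOf loop below is included only as a reference count. The helper names are illustrative.

```java
class SplitCountPitfalls {
    // The question's approach: number of pieces minus one.
    static int countWithSplit(String text, String check) {
        return text.split(check).length - 1;
    }

    // Reference count using indexOf (non-overlapping matches).
    static int countWithIndexOf(String text, String check) {
        if (check.isEmpty()) {
            return 0; // an empty check would match everywhere and never advance
        }
        int count = 0;
        for (int i = text.indexOf(check); i != -1; i = text.indexOf(check, i + check.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Trailing occurrence: split drops the trailing empty string, so one match is lost
        System.out.println(countWithSplit("abca", "a") + " vs " + countWithIndexOf("abca", "a"));   // 1 vs 2
        // "." is a regex metacharacter matching every char, so all pieces are empty
        System.out.println(countWithSplit("a.b.c", ".") + " vs " + countWithIndexOf("a.b.c", ".")); // -1 vs 2
    }
}
```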
A better approach is much simpler: go through the string and count the characters that match your checkChar. If you think about the steps needed to count characters, that's what you'd end up with yourself:
public static int occurrences(String str, char checkChar) {
    int count = 0;
    for (int i = 0, l = str.length(); i < l; i++) {
        if (str.charAt(i) == checkChar)
            count++;
    }
    return count;
}
If you want to count occurrences of a multi-character string, it becomes slightly trickier to write efficiently, because you don't want to create a new substring every time.
public static int occurrences(String str, String checkChars) {
    if (checkChars.isEmpty()) return 0; // an empty pattern would match at every index and never advance
    int count = 0;
    int offset = 0;
    while ((offset = str.indexOf(checkChars, offset)) != -1) {
        offset += checkChars.length();
        count++;
    }
    return count;
}
Matching a two-character string this way is still 10 to 12 times as fast as String.split().
Warning: performance timings are ballpark figures that depend on many circumstances. Since the difference is an order of magnitude, it's safe to say that String.split is generally slower. (Tests were performed on JDK 1.8.0-b28 64-bit using 10 million iterations. The tests were run 10 times in the same JVM instance, and the results were verified to be stable and the same with and without -Xcomp.)

JAVA: Space delimiting all non-numerical characters in a String

I am having some trouble making Strings space-delimited in the special case of adding spaces around all non-numerical characters.
My code must take a string representing a math equation and split it into its individual parts. It does so using spaces as delimiters between values. This part works great if the string is already space-delimited.
The problem is that I do not always get a space delimited input. To deal with this, I want to first insert these spaces so that the array is created properly.
What my code must do is take any character that is NOT a number, and add a space before and after it.
Something like this:
3*24+321 becomes 3 * 24 + 321
or
((3.0)*(2.5)) becomes ( ( 3.0 ) * ( 2.5 ) )
Obviously I need to avoid inserting spaces inside the numbers, or 2.5 becomes 2 . 5 and then gets entered into the array as 3 elements, which it is not.
So far, I have tried using
String InputLineDelmit = InputLine.replaceAll("\\B", " ");
which successfully changes a string of all letters "abcd" to "a b c d"
But it makes mistakes when it runs into numbers. Using this method, I have gotten that:
(((1)*(2))) becomes ( ( (1) * (2) ) ) ---- * The numbers must be separate from parens
12.7+3.1 becomes 1 2.7+3.1 ----- * 12.7 is split
51/3 becomes 5 1/3 ----- * same issue
and 5*4-2 does not change at all.
So, I know that \D can be used as a regular expression for all non-numbers in Java. However, my attempts to implement this (by replacing, or by combining it with \B above) have led either to compiler errors or to REPLACING the char with a space instead of adding one.
EDIT:
==== Answered! ====
It won't let me add my own answer because I'm new, but an edit to neo108's code below (which itself does not work) did the job. What I did was change it to check isDigit instead of isLetter, and then do nothing in that case (or in the special case of a decimal point, for doubles). Otherwise, the character gets a space on either side.
public static void main(String[] args){
    String formula = "12+((13.0)*(2.5)-17*2)+(100/3)-7";
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < formula.length(); i++){
        char c = formula.charAt(i);
        char cdot = '.';
        if(Character.isDigit(c) || c == cdot) {
            builder.append(c);
        }
        else {
            builder.append(" "+c+" ");
        }
    }
    System.out.println("OUTPUT:" + builder);
}
OUTPUT: 12 + ( ( 13.0 ) * ( 2.5 ) - 17 * 2 ) + ( 100 / 3 ) - 7
However, any ideas on how to do this more succinctly, and also a decent explanation of StringBuilders, would be appreciated. Namely, what is this limit of 16 chars that I read about in the Javadocs, given that the example above shows you CAN have more output?
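For the "more succinctly" part, one option is a single replaceAll that pads every char that is not a digit or '.' with spaces. That is the same rule the loop applies, as long as the input is ASCII (Character.isDigit also accepts non-ASCII digits, while [0-9] does not). In the replacement, $0 stands for the whole match. On the 16-char question: 16 is only StringBuilder's default initial capacity, and the builder grows its internal buffer automatically as you append. A sketch with illustrative names:

```java
class SpaceOperators {
    // Same rule as the loop: any char other than an ASCII digit or '.' gets a space on each side.
    static String spaceNonNumeric(String formula) {
        return formula.replaceAll("[^0-9.]", " $0 ");
    }

    public static void main(String[] args) {
        System.out.println(spaceNonNumeric("3*24+321")); // 3 * 24 + 321

        StringBuilder sb = new StringBuilder();
        System.out.println(sb.capacity()); // 16: the default initial capacity, not a limit
        sb.append("this text is longer than sixteen chars");
        System.out.println(sb.length());   // the buffer grew automatically
    }
}
```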

Something like this should work...
String formula = "Ab((3.0)*(2.5))";
StringBuilder builder = new StringBuilder();
for (int i = 0; i < formula.length(); i++){
    char c = formula.charAt(i);
    if(Character.isLetter(c)) {
        builder.append(" "+c+" ");
    } else {
        builder.append(c);
    }
}

Define the operations in your math equation: + - * / ( ) etc.
Convert your equation string to a char[].
Traverse the char[] one char at a time and append each char to a StringBuilder object.
If you encounter a character matching one of the defined operations, add a space before and after that character, then append it to the StringBuilder object.
This is one algorithm you can implement; there may be other ways of doing it as well.
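The steps above could be sketched like this. The operator set "+-*/()" is an assumption; extend it to whatever your equations use. Unlike the digit check, this version leaves unknown characters (such as letters) untouched. The class and constant names are illustrative.

```java
class OperatorSpacer {
    // Hypothetical operator set; adjust it to the operations your equations allow.
    private static final String OPERATORS = "+-*/()";

    static String spaceOperators(String equation) {
        StringBuilder sb = new StringBuilder();
        for (char c : equation.toCharArray()) {
            if (OPERATORS.indexOf(c) >= 0) {
                sb.append(' ').append(c).append(' '); // pad operators on both sides
            } else {
                sb.append(c);                         // digits, '.', etc. are copied as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String spaced = spaceOperators("((3.0)*(2.5))");
        // Trim and split on runs of whitespace to get the individual tokens
        System.out.println(String.join("|", spaced.trim().split("\\s+"))); // (|(|3.0|)|*|(|2.5|)|)
    }
}
```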

How to remove leading and trailing whitespace from the string in Java?

I want to remove the leading and trailing whitespace from a string:
String s = " Hello World ";
I want the result to be:
"Hello World"

s.trim()
see String#trim()
Without the built-in method, you can use a regex like
s.replaceAll("^\\s+", "").replaceAll("\\s+$", "")
or
s.replaceAll("^\\s+|\\s+$", "")
or use Pattern and Matcher directly:
String s=" Hello World ";
Pattern trimmer = Pattern.compile("^\\s+|\\s+$");
Matcher m = trimmer.matcher(s);
StringBuffer out = new StringBuffer();
while(m.find())
    m.appendReplacement(out, "");
m.appendTail(out);
System.out.println(out+"!");
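One difference to keep in mind: without extra flags, \s in a Java regex means [ \t\n\x0B\f\r], while trim() removes every char up to U+0020. The two are therefore not interchangeable when other control characters are involved. A quick comparison (the class name is illustrative):

```java
class RegexVsTrim {
    // The regex-based trim from the answer above.
    static String regexTrim(String s) {
        return s.replaceAll("^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        // U+0001 is a control char below U+0020 but is not matched by \s
        String s = (char) 1 + " Hello World " + (char) 1;
        System.out.println(s.trim().equals("Hello World"));     // true
        System.out.println(regexTrim(s).equals("Hello World")); // false
    }
}
```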

String s="Test ";
s= s.trim();

I prefer not to use regular expressions for trivial problems. This would be a simple option:
public static String trim(final String s) {
    final StringBuilder sb = new StringBuilder(s);
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0)))
        sb.deleteCharAt(0); // delete from the beginning
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1)))
        sb.deleteCharAt(sb.length() - 1); // delete from the end
    return sb.toString();
}

Use the String class's trim method. It removes all leading and trailing characters up to and including U+0020 (space), which covers the common ASCII whitespace.
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html

String s=" Hello World ";
s = s.trim();

Simply use trim(). It only removes the excess whitespace at the start and end of a string.
String fav = " I like apple ";
fav = fav.trim();
System.out.println(fav);
Output:
I like apple //no extra space at start and end of the string

String.trim() answers the question but was not an option for me.
As one explanation of its behavior puts it:
it simply regards anything up to and including U+0020 (the usual space character) as whitespace, and anything above that as non-whitespace.
This results in it trimming the U+0020 space character and all “control code” characters below U+0020 (including the U+0009 tab character), but not the control codes or Unicode space characters that are above that.
I am working with Japanese, where we have full-width characters such as Ｌｉｋｅ　ｔｈｉｓ, and the full-width space is not trimmed by String.trim().
I therefore wrote a function which, like xehpuk's snippet, uses Character.isWhitespace().
However, this version does not use a StringBuilder. Instead of deleting characters, it finds the two indexes it needs and takes a trimmed substring of the original String.
public static String trimWhitespace(final String stringToTrim) {
    int endIndex = stringToTrim.length();
    // Return the string if it's empty
    if (endIndex == 0) return stringToTrim;
    int firstIndex = -1;
    // Find first character which is not a whitespace, if any
    // (increment from beginning until either first non whitespace character or end of string)
    while (++firstIndex < endIndex && Character.isWhitespace(stringToTrim.charAt(firstIndex))) { }
    // If firstIndex did not reach end of string, Find last character which is not a whitespace,
    // (decrement from end until last non whitespace character)
    while (--endIndex > firstIndex && Character.isWhitespace(stringToTrim.charAt(endIndex))) { }
    // Return substring using indexes
    return stringToTrim.substring(firstIndex, endIndex + 1);
}

s = s.trim();
More info:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()
Why don't you want to use predefined methods? They are usually the most efficient.

See String#trim() method

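Since Java 11, the String class has a strip() method, which returns a string whose value is this string, with all leading and trailing white space removed. It was introduced to address a limitation of trim(). strip() treats a character as white space if Character.isWhitespace(int) returns true for it, so it also removes Unicode spaces above U+0020.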
Docs: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#strip()
Example:
String str = " abc ";
// public String strip()
str = str.strip(); // Returns abc
There are two more useful methods in Java 11+ String class:
stripLeading() : Returns a string whose value is this string,
with all leading white space removed.
stripTrailing() : Returns a string whose value is this string,
with all trailing white space removed.
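The practical difference shows up with Unicode whitespace such as U+3000, the full-width (ideographic) space mentioned earlier in this thread: strip() removes it and trim() does not. Because strip() follows Character.isWhitespace, it still leaves non-breaking spaces such as U+00A0 in place. This sketch requires Java 11 or later, and the class name is illustrative.

```java
class StripVsTrim {
    public static void main(String[] args) {
        String fullWidth = "\u3000abc\u3000";                // ideographic (full-width) space around "abc"
        System.out.println("[" + fullWidth.trim() + "]");  // unchanged: U+3000 is above U+0020
        System.out.println("[" + fullWidth.strip() + "]"); // [abc]

        String nbsp = "\u00A0abc\u00A0";                     // non-breaking space
        System.out.println(nbsp.strip().equals(nbsp));      // true: isWhitespace is false for U+00A0

        System.out.println("[" + "  abc  ".stripLeading() + "]");  // [abc  ]
        System.out.println("[" + "  abc  ".stripTrailing() + "]"); // [  abc]
    }
}
```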

@xehpuk's method is good if you want to avoid regex, but it has O(n^2) worst-case time complexity, because each deleteCharAt(0) shifts all the remaining characters. The following solution also avoids regex but is O(n). Note that it only treats the ' ' character as whitespace:
static String trimSpaces(String s) { // method name is illustrative; the original answer gave only the body
    if(s.length() == 0)
        return "";
    char left = s.charAt(0);
    char right = s.charAt(s.length() - 1);
    int leftWhitespace = 0;
    int rightWhitespace = 0;
    boolean leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
    while ((left == ' ' || right == ' ') && leftBeforeRight) {
        if(left == ' ') {
            leftWhitespace++;
            left = s.charAt(leftWhitespace);
        }
        if(right == ' ') {
            rightWhitespace++;
            right = s.charAt(s.length() - 1 - rightWhitespace);
        }
        leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
    }
    String result = s.substring(leftWhitespace, s.length() - rightWhitespace);
    return result.equals(" ") ? "" : result;
}
This counts the leading and trailing spaces until either the "left" and "right" indices obtained from those counts meet, or both indices have reached a non-space character. Afterwards, it returns the substring obtained using the counts. If that substring is a single space, it returns the empty string instead; this is needed to handle all-space strings with an odd number of characters.
