Finding the number of words in a string [duplicate]

This question already has answers here:
how to count the exact number of words in a string that has empty spaces between words?
(9 answers)
Closed 9 years ago.
I can't figure out why this doesn't work; I may have just missed some simple logic. The method doesn't seem to find the last word when there isn't a space after it, so I'm guessing something is wrong with i == itself.length() - 1, but it seems to me that it should return true: you're on the last character and it isn't whitespace.
public void numWords()
{
int numWords = 0;
for (int i = 1; i <= itself.length()-1; i ++)
{
if (( i == (itself.length() - 1) || itself.charAt (i) <= ' ') && itself.charAt(i-1) > ' ')
numWords ++;
}
System.out.println(numWords);
}
itself is the string. I am comparing the characters the way I am because that's how it is shown in the book, but please let me know if there are better ways.
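Tracing the loop by hand suggests one cause of the miscount: on the last index the condition tests itself.charAt(i-1) rather than the last character itself, so a one-character final word (the "b" in "a b") is never counted, and a one-character string never enters the loop at all. The following is a minimal sketch of an alternative, not the book's method; the class and method names are made up for illustration, and it keeps the book's convention that any character greater than ' ' belongs to a word.

```java
class WordCountSketch {
    // Counts maximal runs of characters greater than ' '.
    // A word is counted at its first character, so the last word
    // needs no special case.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean isWordChar = text.charAt(i) > ' ';
            if (isWordChar && !inWord) {
                count++; // just stepped from whitespace (or the start) into a word
            }
            inWord = isWordChar;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello world")); // prints 2
        System.out.println(countWords("a b"));         // prints 2
    }
}
```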

Naïve approach: treat everything that has a space following it as a word. With that, simply count the number of elements as the result of a String#split operation.
public int numWords(String sentence) {
if(null != sentence) {
return sentence.split("\\s").length;
} else {
return 0;
}
}

Try,
int numWords = (itself==null) ? 0 : itself.split("\\s+").length;

So basically it seems that what you're trying to do is count all chunks of whitespace in a string. I'll fix up your code, using my head compiler, to help with the problems you're experiencing.
public void numWords()
{
int numWords = 0;
// Don't check the last character as it doesn't matter if it's ' '
for (int i = 1; i < itself.length() - 1; i++)
{
// If this char is a space and the previous one isn't, count a new word
if (itself.charAt(i) == ' ' && itself.charAt(i - 1) != ' ') {
numWords++;
}
}
System.out.println(numWords);
}
This is a very naive algorithm and fails in a few cases; for example, if the string ends in multiple spaces, as in 'hello world ', it would count 3 words.
Note that if I were going to implement such a method, I would go with a regex approach similar to Makoto's answer in order to simplify the code.

The following code fragment does the job better:
if(sentence == null) {
return 0;
}
sentence = sentence.trim();
if ("".equals(sentence)) {
return 0;
}
return sentence.split("\\s+").length;
The regex \\s+ handles runs of several spaces correctly. trim() removes leading and trailing spaces. The additional empty-string check prevents a result of 1 for an empty string.
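To see why the trim() and the empty-string check matter, here is a small sketch that puts the three variants from the answers above side by side. The class and method names are hypothetical; the behavior shown follows from how String#split treats empty leading and trailing pieces.

```java
class SplitCountSketch {
    // Splits on every single whitespace character: consecutive spaces yield empty pieces.
    static int naive(String s) {
        return s.split("\\s").length;
    }

    // Splits on runs of whitespace, but a leading run still yields an empty first piece,
    // and the empty string yields one empty piece.
    static int collapsed(String s) {
        return s.split("\\s+").length;
    }

    // Trim first and special-case the empty string, as in the fragment above.
    static int trimmed(String s) {
        if (s == null) {
            return 0;
        }
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(naive("hello  world"));     // prints 3 (two spaces give an empty piece)
        System.out.println(collapsed("hello  world")); // prints 2
        System.out.println(collapsed(" hello"));       // prints 2 (empty leading piece)
        System.out.println(collapsed(""));             // prints 1
        System.out.println(trimmed(" hello "));        // prints 1
    }
}
```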

Related

split regular expression string by main "or" operators

I just wanted to ask if there is an easy way to do this before I start building a fully fledged regex interpreter, or at least a fairly big state machine, just to figure out what level the or operators are on and where to split. To make things clearer, let's put a random example here:
String regex = "test (1|2|3)|testing\\||tester\\nNextLine[ab|]|(test)";
The result I want is the following, splitting the regex by its main or operators:
String[] result = { "test (1|2|3)", "testing\\|", "tester\\nNextLine[ab|]", "(test)" };
As mentioned, I already have some ideas for complex solutions that involve going through the string char by char, skipping escaped characters, figuring out where all the brackets open and close and what bracket level each character is on, adding the indices of the level-0 '|' characters to a list, and splitting the string at those indices. But I am searching for a simple one- or two-liner, i.e. a more beautiful solution. Is there one?
To clarify this even further: I want all alternatives like this in one string array.
UPDATE: Not the most beautiful version, but I actually implemented something like a state machine for this now:
private ArrayList<String> parseFilters(String regex) {
ArrayList<Integer> indices = new ArrayList<>();
Stack<Integer> brackets = new Stack<>();
int level = 0;
int bracketType = -1;
char lastChar = ' ';
char currentChar = ' ';
for (int i = 0; i < regex.length(); i++) {
currentChar = regex.charAt(i);
if (lastChar == '\\' || "^$?*+".indexOf(currentChar) >= 0)
;
else if (level == 0 && "|".indexOf(currentChar) >= 0)
indices.add(i + 1);
else if ((bracketType = "([{".indexOf(currentChar)) >= 0) {
brackets.push(bracketType);
level++;
} else if ((bracketType = ")]}".indexOf(currentChar)) >= 0) {
if (bracketType == brackets.peek()) {
brackets.pop();
level--;
}
}
lastChar = currentChar;
}
ArrayList<String> results = new ArrayList<>();
int lastIndex = 0;
for (int i : indices)
results.add(regex.substring(lastIndex, (lastIndex = i) - 1));
results.add(regex.substring(lastIndex));
return results;
}
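For comparison, here is a shorter sketch of the same idea that tracks only three things: backslash escapes, whether the scan is inside a character class, and parenthesis depth. It is an illustration under simplifying assumptions, not a full regex parser: the class and method names are made up, and it ignores \Q...\E quoting and nested character classes.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelAlternation {
    // Splits a regex at every '|' that is not escaped, not inside [...],
    // and not inside any (...) group.
    static List<String> split(String regex) {
        List<String> parts = new ArrayList<>();
        int depth = 0;           // current parenthesis nesting level
        boolean inClass = false; // true while inside a [...] character class
        int start = 0;           // start index of the current alternative
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++;             // skip the escaped character entirely
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == '|' && depth == 0) {
                parts.add(regex.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(regex.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String regex = "test (1|2|3)|testing\\||tester\\nNextLine[ab|]|(test)";
        for (String part : split(regex)) {
            System.out.println(part);
        }
    }
}
```

With the example regex from the question, this yields the four alternatives listed in the desired result.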

Here is a proof of concept Java program splitting on double || and leaving single | untouched.
This would be more complicated to achieve with regex.
We have to double-escape each pipe because the pattern is parsed twice: once by the Java compiler, as a string literal, and again by the regex engine when it is used as a pattern. \\|\\| is thus reduced to a literal ||.
class split{
public static void main(String[] args){
String lineTest = "test (1|2|3)|testing\\||tester\\nNextLine[ab||]|(test)";
String separated[] =
lineTest.split("\\|\\|");
for ( int i = 0; i < separated.length;i++){
System.out.println( separated[i]);
}
}
}
The output is:
test (1|2|3)|testing\
tester\nNextLine[ab
]|(test)
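The two parsing layers can be made visible with a short sketch. The class and method names below are made up; Pattern.quote is the standard-library way to get a literal pattern without hand-escaping.

```java
import java.util.regex.Pattern;

class EscapeLayers {
    // The Java literal "\\|\\|" holds four characters: \ | \ |
    // The regex engine then reads those four characters as two literal pipes.
    static final String DOUBLE_PIPE = "\\|\\|";

    static String[] splitEscaped(String s) {
        return s.split(DOUBLE_PIPE);
    }

    // Same split, letting Pattern.quote do the escaping.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("||"));
    }

    public static void main(String[] args) {
        System.out.println(DOUBLE_PIPE.length());          // prints 4
        System.out.println(splitEscaped("a||b|c").length); // prints 2
        System.out.println(splitQuoted("a||b|c")[1]);      // prints b|c
    }
}
```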

Remove consecutive duplicate characters from a String

I'm trying to remove consecutive duplicate characters from a string recursively.
I don't know how to fix this code so that it keeps the first character when the duplicates have different cases.
/**
* Remove consecutive duplicate characters from a String. <br>
* Case should not matter, if two or more consecutive duplicate <br>
* characters have different cases, then the first letter should be kept.
* @param word A word with possible consecutive duplicate characters.
* @return A word without any consecutive duplicate characters.
*/
public static String dedupeChars(String word){
if ( word.length() <= 1 )
return word;
if( word.substring(0,1).equalsIgnoreCase(word.substring(1,2)) )
return dedupeChars(word.substring(1));
else
return word.substring(0,1) + dedupeChars(word.substring(1));
}

You were on the right track, but your logic was a bit off. Consider this version, with explanation below the code:
public static String dedupeChars(String word) {
if (word.length() <= 1) {
return word;
}
if (word.substring(0,1).equalsIgnoreCase(word.substring(1,2))) {
return dedupeChars(word.substring(0, 1) + word.substring(2));
}
else {
return word.substring(0,1) + dedupeChars(word.substring(1));
}
}
System.out.println(dedupeChars("aaaaBbBBBbCDdefghiIIiJ"));
This prints:
aBCDefghiJ
As for the algorithm: your base case was correct, and for a single-character word we just return that character. For the case where the first character is identical to the second one (ignoring case), we splice out the second character and then recursively call dedupeChars() again. For example, here is what happens with the input string shown above:
aaaaBbBBBbCDdefghiIIiJ
aaaBbBBBbCDdefghiIIiJ
aaBbBBBbCDdefghiIIiJ
aBbBBBbCDdefghiIIiJ
That is, we splice out duplicates, always retaining the first occurrence, until there are no more duplicates.
By the way, in practice you might instead want to use regex here, for a much more concise solution:
String input = "aaaaBbBBBbCDdefghiIIiJ";
input = input.replaceAll("(?i)(.)\\1+", "$1");
System.out.println(input);
This prints:
aBCDefghiJ
Here we just tell the regex engine to collapse every run of a repeated character (compared case-insensitively) down to the first character of the run.

I have a different way to achieve this. I think your code is too expensive for removing duplicate characters (ignoring case and keeping just the first one).
public static String removeDup(String s) {
char[] chars = s.toCharArray();
StringBuilder sb = new StringBuilder();
for (int i = chars.length - 1; i > 0; i--) {
if (chars[i] == chars[i - 1]) {
continue;
}
if (chars[i] < 97) {
if (chars[i] == (chars[i - 1] - 32)) {
continue;
}
} else {
if (chars[i] == (chars[i - 1] + 32)) {
continue;
}
}
sb.append(chars[i]);
}
sb.append(chars[0]);
return sb.reverse().toString();
}
For the input "aaaaBbBBBbCDdefghiIIiJ" the output will be "aBCDefghiJ".
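The +32/-32 arithmetic above assumes ASCII letters only: for example, 'A' (65) minus 32 equals '!' (33), so the pair "A!" would be treated as a duplicate, and an empty string would fail at chars[0]. A minimal iterative sketch that avoids both by using Character.toLowerCase is shown below; the class and method names are made up for illustration.

```java
class DedupeSketch {
    // Keeps a character only if it differs (ignoring case) from the one before it,
    // so the first character of every run survives.
    static String dedupe(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (i == 0
                    || Character.toLowerCase(c) != Character.toLowerCase(word.charAt(i - 1))) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedupe("aaaaBbBBBbCDdefghiIIiJ")); // prints aBCDefghiJ
    }
}
```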

To find smallest word in a string in java

This is the code that I have written for finding the smallest word in a string, but whenever I try to run it in Eclipse it shows a (String index out of range -2147483648) error in the nested while statement that I have marked. I do not understand the cause, since my program seems to stay within range, i.e. below the length of the input string.
Thanks in advance!!
import java.util.Scanner;
public class Minword {
public static String minLengthWord(String input){
// Write your code here
int count[]=new int[50],i,j=0,len=input.length();
String output = "";
for(i=0;i<len;i++)
{
if(input.charAt(i)!=' ')
{
count[j]++;
}
else
j++;
}
int minidx=0;
for(i=1;i<j;i++)
{
if(count[minidx]>count[i])
minidx=i;
}
int words=0;
i=0;
while(words<=minidx)
{
if(words==minidx)
{
***while(i<len && input.charAt(i)!=' ')***
{
output+=input.charAt(i);
i++;
}
}
else if(i<len && input.charAt(i)==' ')
words++;
i++;
}
return output;
}
public static void main(String[] args) {
Scanner s=new Scanner(System.in);
String input,output;
input=s.nextLine();
output=minLengthWord(input);
}
}

I have problems following your code, but to get the shortest word's length, you can use a Stream and min(). Your minLengthWord method could be like:
String f = "haha hah ha jajaja";
OptionalInt shortest = Arrays.stream(f.split(" ")).mapToInt(String::length).min();
System.out.println(shortest.getAsInt());

You are using the variable i, which is a signed int, so it ranges from -2147483648 to 2147483647.
The following case shows your problem:
i = 2147483647;
i++;
After the increment, i's value will be -2147483648 due to an int overflow.
It seems you are getting a huge input, thus it is causing the problem.
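The wrap-around itself is easy to reproduce. The sketch below (class and method names are hypothetical) shows the silent overflow next to Math.addExact, which throws instead of wrapping and can help surface a runaway counter early.

```java
class OverflowDemo {
    // Plain int arithmetic wraps around silently.
    static int next(int i) {
        return i + 1;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean overflows(int i) {
        try {
            Math.addExact(i, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(next(Integer.MAX_VALUE));      // prints -2147483648
        System.out.println(overflows(Integer.MAX_VALUE)); // prints true
    }
}
```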

Well, -2147483648 is what you get from the maximal integer + 1. You have a wrap-around: the variable i got so big that it started again on the negative side.
You have to use a long if you want to process texts that are larger than 2 GB.

while(words<=minidx)
{
if(words==minidx)
{
***while(i<len && input.charAt(i)!=' ')***
{
output+=input.charAt(i);
i++;
}
}
else if(i<len && input.charAt(i)==' ')
words++;
i++;
}
Your problem is that when words and minidx are both 0, the outer while condition is always true and words always equals minidx, so i keeps increasing until it passes its maximum value.
You need to add a break after the inner while loop and, secondly, change i<j to i<=j.
Below is the corrected code:
int minidx = 0;
for (i = 1; i <= j; i++) { //-------------------------> change i<j to i<=j
if (count[minidx] > count[i])
minidx = i;
}
int words = 0;
i = 0;
System.out.println(minidx);
while (words <= minidx) {
if (words == minidx) {
while (i < len && input.charAt(i) != ' ') {
output += input.charAt(i);
i++;
}
break; //-------------------------> add break statement here.
} else if (i < len && input.charAt(i) == ' ') {
words++;
}
i++;
}

When I tried running your code with an input of "Hello World", minidx was 0 before the while loop. words is also 0, so words<=minidx is true and the loop is entered. words==minidx is true (they're both 0), so the if statement is entered. Because it never enters the else if (which is the only place words is changed), words is always 0. So the loop becomes an infinite loop. In the meantime, i just keeps growing, until it overflows and becomes negative.
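One way to avoid this kind of loop altogether is to track the shortest word in a single pass, so that no second scan over the string is needed. The following is a sketch under the question's assumption that words are separated by single space characters; the class name is made up, and ties are resolved in favor of the earliest word.

```java
class MinWordSketch {
    // Scans once, remembering where the current word started,
    // and keeps the shortest word seen so far.
    static String minLengthWord(String input) {
        String shortest = "";
        int start = -1; // index where the current word began, or -1 if between words
        for (int i = 0; i <= input.length(); i++) {
            // Treat the end of the string like a space so the last word is closed too.
            boolean atSpace = i == input.length() || input.charAt(i) == ' ';
            if (!atSpace) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                String word = input.substring(start, i);
                if (shortest.isEmpty() || word.length() < shortest.length()) {
                    shortest = word;
                }
                start = -1;
            }
        }
        return shortest;
    }

    public static void main(String[] args) {
        System.out.println(minLengthWord("Hello World"));        // prints Hello
        System.out.println(minLengthWord("haha hah ha jajaja")); // prints ha
    }
}
```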

Here's a version that makes use of Java 8's Stream API.
Remove all the code from your minLengthWord method and paste in the code below; it will work and resolve your runtime issue too:
List<String> words = Arrays.asList(input.split(" "));
String shortestWord = words.stream().min(
Comparator.comparing(
word -> word.length()))
.get();
System.out.println(shortestWord);

How to count number of letters in sentence

I'm looking for a simple way to find the number of letters in a sentence.
All I found during my research were ways to count a specific letter, but not letters of all kinds.
How do I do that?
What I currently have is:
sentence = the sentence I get from the main method
count = the number of letters I want to give back to the main method
public static int countletters(String sentence) {
// ....
return(count);
}

You could manually parse the string and count number of characters like:
int count = 0;
// Walk every character of the string, starting at index 0
for (int i = 0; i < value.length(); i++) {
char c = value.charAt(i);
// Count only the ASCII letters A-Z and a-z
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
count++;
}
}
//return count

One way to do this is to strip every unwanted character from the String and then check its length. This could look like this:
public static void main(String[] args) throws Exception {
final String sentence = " Hello, this is the 1st example sentence!";
System.out.println(countletters(sentence));
}
public static int countletters(String sentence) {
final String onlyLetters = sentence.replaceAll("[^\\p{L}]", "");
return onlyLetters.length();
}
The stripped String looks like:
Hellothisisthestexamplesentence
And the length of it is 31.
This code uses String#replaceAll, which accepts a regular expression, together with the category \p{L}, which matches any letter. The construct [^...] inverts that, so every character that is not a letter is replaced with an empty String.
Regular expressions can be expensive in terms of performance; if you need the best performance, you can try other methods, such as iterating over the String, but this solution has much cleaner code. So if clean code counts more for you here, feel free to use this.
Also mind that \\p{L} detects Unicode letters, so this will also correctly treat letters from different alphabets, like Cyrillic. Other solutions currently only support Latin letters.
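If the regex cost is a concern, one option is to compile the pattern once and count matches instead of building a stripped copy of the string. This is a sketch of that variation rather than a measured optimization; the class name is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterCountSketch {
    // Compiled once and reused across calls.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");

    // Counts every match of \p{L}, i.e. every Unicode letter.
    static int countLetters(String sentence) {
        Matcher m = LETTER.matcher(sentence);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLetters(" Hello, this is the 1st example sentence!")); // prints 31
    }
}
```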

SMA's answer does the job, but it can be slightly improved:
public static int countLetters(String sentence) {
int count = 0;
for (int i = 0; i < sentence.length(); i++)
{
char c = Character.toUpperCase(sentence.charAt(i));
if (c >= 'A' && c <= 'Z')
count ++;
}
return count;
}

This is much easier with the Stream API and a method reference:
long count = sentence.chars().filter(Character::isLetter).count();

Use the .length() method to get the length of the string; the length is the number of characters it contains (Java strings have no null terminator to exclude).
If you wish to skip spaces, do something like:
String input = "The quick brown fox";
int count = 0;
for (int i=0; i<input.length(); i++) {
if (input.charAt(i) != ' ') {
++count;
}
}
System.out.println(count);
If you wish to skip other whitespace as well, use a regex; you can refer to this question for more details.

import java.util.Scanner;
public class Main {
public static void main(String[] args){
Scanner sc=new Scanner(System.in);
String str = sc.nextLine();
int count = 0;
for (int i = 0; i < str.length(); i++) {
if (Character.isLetter(str.charAt(i)))
count++;
}
System.out.println(count);
}
}

Given a string find the first embedded occurrence of an integer

This was asked in an interview:
Given any string, get me the first occurrence of an integer.
For example
Str98 then it should return 98
Str87uyuy232 -- it should return 87
I gave the answer as loop through the string and compared it with numeric characters, as in
if ((c >= '0') && (c <= '9'))
Then I got the index of the number, parsed it and returned it. Somehow he was not convinced.
Can anyone share the best possible solution?

With a regex, it's pretty simple:
String s = new String("Str87uyuy232");
Matcher matcher = Pattern.compile("\\d+").matcher(s);
matcher.find();
int i = Integer.valueOf(matcher.group());
(Thanks to Eric Mariacher)

Using java.util.Scanner :
int res = new Scanner("Str87uyuy232").useDelimiter("\\D+").nextInt();
The purpose of a Scanner is to extract tokens from an input (here, a String). Tokens are sequences of characters separated by delimiters. By default, the delimiter of a Scanner is the whitespace, and the tokens are thus whitespace-delimited words.
Here, I use the delimiter \D+, which means "anything that is not a digit". The tokens that our Scanner can read in our string are "87" and "232". The nextInt() method will read the first one.
nextInt() throws java.util.NoSuchElementException if there is no token to read. Call the method hasNextInt() before calling nextInt(), to check that there is something to read.
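The hasNextInt() check described above can be wrapped up as follows. This is a sketch with hypothetical names; returning an OptionalInt is a design choice made here, not something the answer prescribes. Note that with \D+ as the delimiter a minus sign counts as a delimiter, so negative numbers come back without their sign.

```java
import java.util.OptionalInt;
import java.util.Scanner;

class FirstIntScanner {
    // Returns the first run of digits as an int, or empty if there is none
    // (or if the first run of digits does not fit into an int).
    static OptionalInt firstInt(String s) {
        try (Scanner sc = new Scanner(s).useDelimiter("\\D+")) {
            return sc.hasNextInt() ? OptionalInt.of(sc.nextInt()) : OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstInt("Str87uyuy232").getAsInt()); // prints 87
        System.out.println(firstInt("no digits").isPresent());   // prints false
    }
}
```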

There are two issues with this solution.
Consider the test cases - there are 2 characters '8' and '7', and they both form the integer 87 that you should be returning. (This is the main issue)
This is somewhat pedantic, but the integer value of the character '0' isn't necessarily less than the value of '1', '2', etc. It probably almost always is, but I imagine interviewers like to see this sort of care. A better solution would be
if (Character.isDigit(c)) { ... }
There are plenty of different ways to do this. My first thought would be:
int i = 0;
while (i < string.length() && !Character.isDigit(string.charAt(i))) i++;
int j = i;
while (j < string.length() && Character.isDigit(string.charAt(j))) j++;
return Integer.parseInt(string.substring(i, j)); // might be an off-by-1 here
Of course, as mentioned in the comments, using the regex functionality in Java is likely the best way to do this. But of course many interviewers ask you to do things like this without libraries, etc...
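For completeness, the two-loop idea above can be packaged as a method that also covers the case where the string contains no digits at all. This is a sketch with made-up names; since substring's end index is exclusive, i and j delimit the digit run exactly. A digit run too long for an int would still make Integer.parseInt throw a NumberFormatException, which this sketch does not handle.

```java
import java.util.OptionalInt;

class FirstIntManual {
    static OptionalInt firstInt(String s) {
        int i = 0;
        // Skip everything up to the first digit.
        while (i < s.length() && !Character.isDigit(s.charAt(i))) {
            i++;
        }
        int j = i;
        // Advance to the end of the digit run.
        while (j < s.length() && Character.isDigit(s.charAt(j))) {
            j++;
        }
        if (i == j) {
            return OptionalInt.empty(); // no digit found
        }
        return OptionalInt.of(Integer.parseInt(s.substring(i, j)));
    }

    public static void main(String[] args) {
        System.out.println(firstInt("Str87uyuy232").getAsInt()); // prints 87
        System.out.println(firstInt("Str98").getAsInt());        // prints 98
    }
}
```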

String input = "Str87uyuy232";
Matcher m = Pattern.compile("[^0-9]*([0-9]+).*").matcher(input);
if (m.matches()) {
System.out.println(m.group(1));
}

Just in case you want a non-regex solution that doesn't use other utilities, here you go:
public static Integer returnInteger(String s)
{
if(s== null)
return null;
else
{
char[] characters = s.toCharArray();
Integer value = null;
boolean isPrevDigit = false;
for(int i =0;i<characters.length;i++)
{
if(isPrevDigit == false)
{
if(Character.isDigit(characters[i]))
{
isPrevDigit = true;
value = Character.getNumericValue(characters[i]);
}
}
else
{
if(Character.isDigit(characters[i]))
{
value = (value*10)+ Character.getNumericValue(characters[i]);
}
else
{
break;
}
}
}
return value;
}
}

You could go to a lower level too. A quick look at ASCII values reveals that alphabetical characters start at 65 and digits run from 48 to 57. With that being the case, you can simply 'and' each character against 127 and see whether the result falls in the range 48 to 57.
char[] s = "Str87uyuy232".toCharArray();
String fint = "";
Boolean foundNum = false;
for (int i = 0; i < s.length; i++)
{
int test = s[i] & 127;
if (test < 58 && test > 47)
{
fint += s[i];
foundNum = true;
}
else if (foundNum)
break;
}
System.out.println(fint);
Doing this wouldn't be good for the real world (different character sets), but as a puzzle solution it is fun.
