Exception "String must not end with a space" in Java - java

Need to write a method signature for a method called wordCount() that takes a String parameter, and returns the number of words in that String.
For the purposes of this question, a ‘word’ is any sequence of characters; it does not have to be a real English word. Words are separated by spaces.
For example: wordCount(“Java”) should return the value 1.
I have written some code, but the problem is with throwing exceptions. I get errors saying: "a string containing must not end with a space in java" and "a string containing must not start with a space in java"
My attempt:
int wordCount(String s){
if (s==null) throw new NullPointerException ("string must not be null");
int counter=0;
for(int i=0; i<=s.length()-1; i++){
if(Character.isLetter(s.charAt(i))){
counter++;
for(;i<=s.length()-1;i++){
if(s.charAt(i)==' '){
counter++;
}
}
}
}
return counter;
}

You're on the right track with your exception handling, but not quite there (as you've noticed).
Try the code below:
public int wordCount(final String sentence) {
// If sentence is null, throw IllegalArgumentException.
if(sentence == null) {
throw new IllegalArgumentException("Sentence cannot be null.");
}
// If sentence is empty, throw IllegalArgumentException.
if(sentence.equals("")) {
throw new IllegalArgumentException("Sentence cannot be empty.");
}
// If sentence ends with a space, throw IllegalArgumentException. "$" matches the end of a String in regex.
if(sentence.matches(".* $")) {
throw new IllegalArgumentException("Sentence cannot end with a space.");
}
// If sentence starts with a space, throw IllegalArgumentException. "^" matches the start of a String in regex.
if(sentence.matches("^ .*")) {
throw new IllegalArgumentException("Sentence cannot start with a space.");
}
int wordCount = 0;
// Do wordcount operation...
return wordCount;
}
Regular expressions (or "regex" to the cool kids in the know) are fantastic tools for String validation and searching. The method above is a fail-fast implementation: it rejects invalid input before performing expensive processing that would fail anyway.
I'd suggest brushing up on both practices covered here, regex and exception handling. Some excellent resources to help you get started are included below:
You Don’t Know Anything About Regular Expressions: A Complete Guide
Understanding Java Exceptions
Debuggex - A wonderful tool to help understand and debug regex
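The counting step left as a placeholder in the method above could be filled in along these lines. This is only a sketch: the class name WordCountSketch is made up for illustration, and it assumes a word is any run of non-space characters, so repeated spaces inside the sentence do not inflate the count.

```java
class WordCountSketch {

    // Counts words, where a word is assumed to be any run of non-space characters.
    static int wordCount(final String sentence) {
        // Fail fast on invalid input before doing any work.
        if (sentence == null) {
            throw new IllegalArgumentException("Sentence cannot be null.");
        }
        if (sentence.isEmpty()) {
            throw new IllegalArgumentException("Sentence cannot be empty.");
        }
        if (sentence.startsWith(" ") || sentence.endsWith(" ")) {
            throw new IllegalArgumentException("Sentence cannot start or end with a space.");
        }
        int count = 0;
        for (int i = 0; i < sentence.length(); i++) {
            boolean isSpace = sentence.charAt(i) == ' ';
            boolean previousIsSpace = i == 0 || sentence.charAt(i - 1) == ' ';
            // A word starts wherever a non-space follows a space (or the start).
            if (!isSpace && previousIsSpace) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("Java"));           // 1
        System.out.println(wordCount("one two  three")); // 3
    }
}
```

startsWith() and endsWith() are used here instead of regex purely for brevity; the regex checks shown above behave the same way for these two cases.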

I would use the String.split() method. It takes a regular expression and returns a string array containing the substrings between the matches. From there it is easy to get and return the length of the array.
This sounds like homework so I will leave the specific regular expression to you: but it should be very short, perhaps even one character long.

I would use String.split() to handle this scenario. It will be more efficient than the pasted code. Make sure you check for empty strings. This will help with sentences that contain multiple spaces (e.g. "This_sentence_has__two_spaces").
public int wordCount(final String sentence) {
int wordCount = 0;
String trimmedSentence = sentence.trim();
String[] words = trimmedSentence.split(" ");
for (int i = 0; i < words.length; i++) {
if (words[i] != null && !words[i].equals("")) {
wordCount++;
}
}
return wordCount;
}

I would use the Splitter from the Google Guava library. It behaves more predictably, because the standard String.split() gives a surprising result even in this simple case:
// there are only two words, but there are two spaces between 'a' and 'b'
System.out.println("a  b".split(" ").length); // prints '3' because split() sees an
// empty string between the two spaces
With guava you can do just this:
Iterables.size(Splitter.on(" ").trimResults().omitEmptyStrings().split("same two_spaces"));// 2
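If adding Guava is not an option, roughly the same effect can be had with the JDK alone by splitting on one or more spaces. The sketch below assumes words are separated only by the space character; the class and method names are invented for the example.

```java
class SplitWordCount {

    // Counts words separated by one or more spaces, ignoring leading/trailing whitespace.
    static int countWords(String sentence) {
        String trimmed = sentence.trim();
        // "".split(...) returns a one-element array, so handle the empty case explicitly.
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split(" +").length;
    }

    public static void main(String[] args) {
        System.out.println("a  b".split(" ").length); // 3 (empty string between the two spaces)
        System.out.println(countWords("a  b"));       // 2
    }
}
```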

Related

How do you find the alphabetically last letter of a string using recursion (no loops!) and without using arrays in Java?

Got something for you all.
As the title of the problem suggests, I am trying to implement a non-array, non-looping, recursive method to find the alphabetically last letter in a string.
I think that I understand the nature of the problem I'm trying to solve, but I don't know how to start with the base case and then the recursion.
Would anyone be willing to help solve this problem?
In this case, I would like the following code:
//Method Definition
public static String findZenithLetter(String str) {
//Put actual working Java code that finds the alphabetically last letter of the desired string here.
//Use recursion, not loops! :)
//Don't use arrays! ;)
}
//Driver Code
System.out.println(findZenithLetter("I can reach the apex, at the top of the world."));
//Should print the String "x" if implemented properly
I have tried numerous approaches, all unsuccessful so far, including but not limited to:
Sorting the string by alphabetical order then finding the last letter of the new string, excluding punctuation marks.
Using the compareTo() method to compare two letters of the string side by side, but that has yet to work as I am so tempted to use loops, not recursion. I need a recursive method to solve this, though. :)
In the end, the best piece of code that I've written for this problem was just a drawn-out way to compute just the last character of a string and not actually THE alphabetically last character.
This is quite simple. All you need to do is iterate (recursively, of course) and compare each character in the string with the running maximum.
public static char findZenithLetter(String str) {
return findZenithLetter(str, 0, 'a');
}
private static char findZenithLetter(String str, int i, char maxCh) {
if (i >= str.length())
return maxCh;
char ch = Character.toLowerCase(str.charAt(i));
if (Character.isLetter(ch))
maxCh = ch > maxCh ? ch : maxCh;
return findZenithLetter(str, i + 1, maxCh);
}
Nibble off the first character at each recursion, returning the greater of it and the greatest found in the rest of the input:
public static String findZenithLetter(String str) {
if (str.isEmpty()) {
return ""; // what's returned if no letters found
}
String next = str.substring(0, 1);
String rest = findZenithLetter(str.substring(1));
return Character.isLetter(next.charAt(0)) && next.compareToIgnoreCase(rest) > 0 ? next : rest;
}
The check for Character.isLetter() prevents non-letter characters, which may be "greater than" letters, from being returned.
If no letters are found, an empty string is returned.

most efficient way to check if a string contains specific characters

I have a string that should contain only specific characters: {}()[]
I've created a validate method that checks if the string contains forbidden characters (by forbidden characters I mean everything that is not {}()[] )
Here is my code:
private void validate(String string) {
char [] charArray = string.toCharArray();
for (Character c : charArray) {
if (!"{}()[]".contains(c.toString())){
throw new IllegalArgumentException("The string contains forbidden characters");
}
}
}
I'm wondering if there are better ways to do it since my approach doesn't seem right.
If I took the way you implement this, I would personally modify it like below:
private static void validate(String str) {
for (char c : str.toCharArray()) {
if ("{}()[]".indexOf(c) < 0){
throw new IllegalArgumentException("The string contains forbidden characters");
}
}
}
The changes are as follows:
Not declaring a temporary variable for the char array.
Using indexOf to find a character instead of converting c to String to use .contains().
Looping on the primitive char since you no longer need toString().
Not naming the parameter string as this can cause confusion and is not good practice.
Note: contains calls indexOf(), so this does also technically save you a method call each iteration.
I'd suggest using a Stream if you are on Java 8.
This lets you avoid converting each char to a String.
private void validate_stream(String str) {
// Throw as soon as any character is NOT one of the allowed brackets.
if(str.chars().anyMatch(a -> !(a==125||a==123||a==93||a==91||a==41||a==40)))
throw new IllegalArgumentException("The string contains forbidden characters");
}
The numbers are the ASCII codes of the allowed characters; you can replace them with char literals if you want:
(a -> !(a=='{'||a=='}'||a=='['||a==']'||a=='('||a==')'))
I hope this works for you: I have added my code along with your code.
I have used a regex pattern in which \\ escapes the brackets, which have a special meaning in regex, together with the matches method of String, which tries to match the whole string against the given pattern. Because the condition is negated (!), a string like "{}()[]as" satisfies the if and prints "not matched"; a string like "{}()[]" falls through to the else branch. You can change this however you like, for example by throwing an exception.
private static void validate(String string)
{
String pattern = "[{}()\\[\\]]*"; // any number of the allowed brackets, in any order, and nothing else
if(!string.matches(pattern)) {
System.out.println("not matched:"+string);
}
else {
System.out.println("data matched:"+string);
}
char [] charArray = string.toCharArray();
for (Character c : charArray) {
if (!"{}()[]".contains(c.toString())){
throw new IllegalArgumentException("The string contains forbidden characters");
}
}
}
All the brackets are Meta characters, referenced here:
http://tutorials.jenkov.com/java-regex/index.html
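When validate() is called often, the regex variant can be written with a precompiled Pattern so the expression is not recompiled on every call. This is a sketch under the assumption that an empty string counts as valid (the loop-based versions above also accept it); BracketValidator and ALLOWED are names made up for the example.

```java
import java.util.regex.Pattern;

class BracketValidator {

    // A character class listing the six allowed characters; [ and ] must be escaped inside it.
    private static final Pattern ALLOWED = Pattern.compile("[{}()\\[\\]]*");

    static void validate(String str) {
        // matches() requires the whole string to consist of allowed characters.
        if (!ALLOWED.matcher(str).matches()) {
            throw new IllegalArgumentException("The string contains forbidden characters");
        }
    }

    public static void main(String[] args) {
        validate("{[()]}"); // passes silently
        try {
            validate("{}a");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```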

How to properly use java Pattern object to match string patterns

I wrote some code that does several string operations, including checking whether a given string matches a certain regular expression. It ran just fine with 70,000 inputs, but it started to give me an out-of-memory error when I ran it iteratively for five-fold cross validation. It might just be the case that I have to assign more memory, but I have a feeling that my code is inefficient, so I wanted to double check that I didn't make any obvious mistake.
static Pattern numberPattern = Pattern.compile("^[a-zA-Z]*([0-9]+).*");
public static boolean someMethod(String line) {
String[] tokens = line.split(" ");
for(int i=0; i<tokens.length; i++) {
tokens[i] = tokens[i].replace(",", "");
tokens[i] = tokens[i].replace(";", "");
if(numberPattern.matcher(tokens[i]).find()) return true;
}
return false;
}
and I have also many lines like below:
token.matches("[a-z]+[A-Z][a-z]+");
Which way is more memory efficient? Do they look efficient enough? Any advice is appreciated!
Edited:
Sorry, I had a wrong code, which I intended to modify before posting this question but I forgot at the last minute. But the problem was I had many similar looking operations all over, aside from the fact that the example code did not make sense, I wanted to know if regexp comparison part was efficient.
Thanks for all of your comments, I'll look through and modify the code following the advice!
Well, first of all, take a second look at your code... it will always return a "true" value! You are never reading the 'match' variable, only assigning to it.
Second, String is immutable, so each time you split or replace you create new instances. Why don't you try to create a pattern that matches what you want while ignoring the commas and semicolons? I'm not sure, but I think it will take less memory.
Yes, this code is inefficient indeed because you can return immediately once you've found that match = true; (no point to continue looping).
Further, are you sure you need to break the line into tokens ? why not check the regex only once ?
And last, if all comparisons checks failed, you should return false (last line).
Instead of altering the text and splitting it you can put it all in the regex.
// the \\b means it must be the start of the String or a word
static Pattern numberPattern = Pattern.compile("\\b[a-zA-Z,;]*[0-9,;]*[0-9]");
// return true if the string contains
// a number which might have letters in front
public static boolean someMethod(String line) {
return numberPattern.matcher(line).find();
}
Aside from what @alfasin has mentioned in his answer, you should avoid duplicating code; rewrite the following:
{
tokens[i] = tokens[i].replace(",", "");
tokens[i] = tokens[i].replace(";", "");
}
Into:
tokens[i] = tokens[i].replaceAll(",|;", "");
And please do this once, before the .split(), so that the operation doesn't have to be repeated within the loop:
String[] tokens = line.replaceAll(",|;", "").split(" ");
                      ^^^^^^^^^^^^^^^^^^^^^^
Edit: After staring at your code for a bit I think I have a better solution, using regex ;)
public static boolean someMethod(String line) {
return Pattern.compile("\\b[a-zA-Z]*\\d")
.matcher(line.replaceAll(",|;", "")).find();
}
\b is a Word Boundary.
It asserts position at the Boundary of a word (Start of line + after spacing)
Code Demo STDOUT:
foo does not match
bar does not match
bar1 does match
foo baz bar bar1 lolz does match
password_01 does not match
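On the original question of String.matches() versus a Pattern object: String.matches(regex) compiles the regex each time it is called, so for expressions used in a hot loop it is usually cheaper to compile once into a static field and reuse it. A minimal sketch of that idea, using the two expressions quoted in this thread (the class and method names are invented):

```java
import java.util.regex.Pattern;

class PrecompiledPatterns {

    // Compiled once when the class is loaded, then shared by every call.
    private static final Pattern CAMEL = Pattern.compile("[a-z]+[A-Z][a-z]+");
    private static final Pattern NUMBER_TOKEN = Pattern.compile("\\b[a-zA-Z]*\\d");

    // Equivalent to token.matches("[a-z]+[A-Z][a-z]+") without recompiling the regex.
    static boolean isCamel(String token) {
        return CAMEL.matcher(token).matches();
    }

    // find() looks for the pattern anywhere in the line, so no splitting is needed.
    static boolean containsNumberToken(String line) {
        return NUMBER_TOKEN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isCamel("fooBar"));                   // true
        System.out.println(containsNumberToken("foo baz bar1")); // true
        System.out.println(containsNumberToken("password_01"));  // false
    }
}
```

Pattern instances are immutable and safe to share; the Matcher objects created per call are not, which is why matcher() is called inside each method.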

Counting the occurrences of string in Java using string.split()

I'm new to Java. I thought I would write a program to count the occurrences of a character or a sequence of characters in a sentence. I wrote the following code. But I then saw there are some ready-made options available in Apache Commons.
Anyway, can you look at my code and say if there is any rookie mistake? I tested it for a couple of cases and it worked fine. I can think of one case where if the input is a big text file instead of a small sentence/paragraph, the split() function may end up being problematic since it has to handle a large variable. However this is my guess and would love to have your opinions.
private static void countCharInString() {
//Get the sentence and the search keyword
System.out.println("Enter a sentence\n");
Scanner in = new Scanner(System.in);
String inputSentence = in.nextLine();
System.out.println("\nEnter the character to search for\n");
String checkChar = in.nextLine();
in.close();
//Count the number of occurrences
String[] splitSentence = inputSentence.split(checkChar);
int countChar = splitSentence.length - 1;
System.out.println("\nThe character/sequence of characters '" + checkChar + "' appear(s) '" + countChar + "' time(s).");
}
Thank you :)
Because of edge cases, split() is the wrong approach.
Instead, use replaceAll() to remove all other characters then use the length() of what's left to calculate the count:
int count = input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
FYI, the regex created (for example when check = 'xyz') looks like ".*?(xyz|$)", which means "everything up to and including 'xyz' or end of input", and each match is replaced by the captured text (either 'xyz', or nothing if it's the end of input). This leaves just a string of 0-n copies of the check string. Dividing its length by the length of check then gives you the total.
To protect against the check being null or zero-length (causing a divide-by-zero error), code defensively like this:
int count = check == null || check.isEmpty() ? 0 : input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
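One caveat with the line above is that check is spliced into the regex unescaped, so a search string such as "." or "+" would be interpreted as regex syntax. A hedged variant that quotes it first with Pattern.quote() might look like this; QuotedCount is a made-up name, and the sketch assumes single-line input, since "." does not match line terminators by default.

```java
import java.util.regex.Pattern;

class QuotedCount {

    // Counts non-overlapping occurrences of check in input, treating check literally.
    static int count(String input, String check) {
        if (input == null || check == null || check.isEmpty()) {
            return 0;
        }
        // Pattern.quote() turns the search string into a literal, so "." means a dot.
        String regex = ".*?(" + Pattern.quote(check) + "|$)";
        // Everything except the occurrences of check is removed.
        String onlyMatches = input.replaceAll(regex, "$1");
        return onlyMatches.length() / check.length();
    }

    public static void main(String[] args) {
        System.out.println(count("a.b.c", "."));       // 2
        System.out.println(count("onlyme", "onlyme")); // 1
    }
}
```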
A flaw that I can immediately think of appears when your inputSentence consists of only a single occurrence of checkChar. In this case split() will return an empty array and your count will be -1 instead of 1.
An example interaction:
Enter a sentence
onlyme
Enter the character to search for
onlyme
The character/sequence of characters 'onlyme' appear(s) '-1' time(s).
A better way would be to use the .indexOf() method of String to count the occurrences like this:
int count = 0;
int i = 0;
while ((i = inputSentence.indexOf(checkChar, i)) != -1) {
count++;
i = i + checkChar.length();
}
split is the wrong approach for a number of reasons:
String.split takes a regular expression:
Regular expressions have characters with special meanings, so you cannot use it for all characters (without escaping them). This requires an escaping function.
Performance: String.split is optimized for single characters. If this were not the case, you would be creating and compiling a regular expression every time. Still, String.split creates one object for the String[] and one object for each String in it, every time that you call it. And you have no use for these objects; all you want to know is the count. Although a future all-knowing HotSpot compiler might be able to optimize that away, the current one does not - it is roughly 10 times as slow as simply counting characters as below.
Correctness: it will not count correctly if you have repeating instances of your checkChar.
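The points above are easy to reproduce. The sketch below wraps the question's split-and-subtract approach in a hypothetical helper, naiveCount, and runs it on a few inputs where it goes wrong; the expected values follow from split() discarding trailing empty strings and treating its argument as a regex.

```java
class SplitPitfalls {

    // The approach from the question: number of pieces minus one.
    static int naiveCount(String sentence, String check) {
        return sentence.split(check).length - 1;
    }

    public static void main(String[] args) {
        // Whole input equals the search string: every piece is empty and gets dropped.
        System.out.println(naiveCount("onlyme", "onlyme")); // -1, expected 1
        // Trailing occurrence: the empty piece after the last 'a' is dropped.
        System.out.println(naiveCount("banana", "a"));      // 2, expected 3
        // "." is a regex wildcard, so every character is treated as a separator.
        System.out.println(naiveCount("a.b", "."));         // -1, expected 1
    }
}
```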
A better approach is much simpler: just go and count the characters in the string that match your checkChar. If you think about the steps you need to take to count characters, that's what you'd end up with by yourself:
public static int occurrences(String str, char checkChar) {
int count = 0;
for (int i = 0, l = str.length(); i < l; i++) {
if (str.charAt(i) == checkChar)
count++;
}
return count;
}
If you want to count the occurrences of a sequence of characters, it becomes slightly trickier to write with some efficiency, because you don't want to create a new substring every time.
public static int occurrences(String str, String checkChars) {
int count = 0;
int offset = 0;
while ((offset = str.indexOf(checkChars, offset)) != -1) {
offset += checkChars.length();
count++;
}
return count;
}
That's still 10-12 times as fast as String.split() when matching a two-character string.
Warning: Performance timings are ballpark figures that depend on many circumstances. Since the difference is an order of magnitude, it's safe to say that String.split is slower in general. (Tests performed on jdk 1.8.0-b28 64-bit, using 10 million iterations, verified that results were stable and the same with and without -Xcomp, after performing tests 10 times in same JVM instances.)

String splitting

I have a string. What is the best way to put the things in between the $ signs into a list in Java?
String temp = "$abc$and$xyz$";
How can I get all the variables within $ signs as a list in Java?
[abc, xyz]
I can do it using StringTokenizer, but I want to avoid it if possible.
Thanks.
Maybe you could think about calling String.split(String regex) ...
The pattern is simple enough that String.split should work here, but in the more general case, one alternative for StringTokenizer is the much more powerful java.util.Scanner.
String text = "$abc$and$xyz$";
Scanner sc = new Scanner(text);
while (sc.findInLine("\\$([^$]*)\\$") != null) {
System.out.println(sc.match().group(1));
} // abc, xyz
The pattern to find is:
\$([^$]*)\$
  \_____/     i.e. literal $, a sequence of anything but $ (captured in group 1)
     1        and another literal $
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
(…) is used for grouping. (pattern) is a capturing group and creates a backreference.
The backslash preceding the $ (outside of a character class definition) is used to escape the $, which has a special meaning as the end-of-line anchor. That backslash is doubled in a String literal: "\\" is a String of length one containing a backslash.
This is not a typical usage of Scanner (usually the delimiter pattern is set, and tokens are extracted using next), but it does show how you'd use findInLine to find an arbitrary pattern (ignoring delimiters), and then use match() to access the MatchResult, from which you can get individual group captures.
You can also use this Pattern in a Matcher find() loop directly.
Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
while (m.find()) {
System.out.println(m.group(1));
} // abc, xyz
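Since the question asks for a list rather than printed output, the find() loop can collect the captured groups instead. A small sketch, with DollarTokens and extract as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTokens {

    // Literal $, anything but $ (group 1), literal $.
    private static final Pattern TOKEN = Pattern.compile("\\$([^$]*)\\$");

    static List<String> extract(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        // Each match consumes both of its $ signs, so text between two pairs is skipped.
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(extract("$abc$and$xyz$")); // [abc, xyz]
    }
}
```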
Related questions
Validating input using java.util.Scanner
Scanner vs. StringTokenizer vs. String.Split
Just try this one: temp.split("\\$");
I would go for a regex myself, like Riduidel said.
This special case is, however, simple enough that you can just treat the String as a character sequence, and iterate over it char by char, and detect the $ sign. And so grab the strings yourself.
On a side note, I would try to go for different demarcation characters, to make it more readable to humans. Use $ as start-of-sequence and something else as end-of-sequence, for instance. Or something like I think the Bash shell uses: ${some_value}. As said, the computer doesn't care, but you debugging your string just might :)
As for an appropriate regex, something like (\\$.*\\$)* or so should do. Though I'm no expert on regexes (see http://www.regular-expressions.info for nice info on regexes).
Basically I'd ditto Khotyn as the easiest solution. I see you post on his answer that you don't want zero-length tokens at beginning and end.
That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?
If it's an error, then just start with:
if (!text.startsWith("$") || !text.endsWith("$"))
return "Missing $'s"; // or whatever you do on error
If that passes, fall into the split.
If the $'s are optional, I'd just strip them out before splitting. i.e.:
if (text.startsWith("$"))
text=text.substring(1);
if (text.endsWith("$"))
text=text.substring(0,text.length()-1);
Then do the split.
Sure, you could make more sophisticated regex's or use StringTokenizer or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.
PS There's also the question of what result you want to see if there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"], or ["foo","","bar"]? Khotyn's split will give the second result, with zero-length strings. If you want the first result, you should split("\\$+").
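Putting the optional-delimiter variant together with the "\\$+" split gives something like the following. It is only a sketch of the approach described in this answer; the class and method names are invented, and it does not treat a missing leading or trailing $ as an error.

```java
import java.util.Arrays;

class DollarSplit {

    // Strips one optional leading and trailing $, then splits on runs of $.
    static String[] splitTokens(String text) {
        if (text.startsWith("$")) {
            text = text.substring(1);
        }
        if (text.endsWith("$")) {
            text = text.substring(0, text.length() - 1);
        }
        return text.split("\\$+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("$foo$$bar$"))); // [foo, bar]
        System.out.println(Arrays.toString(splitTokens("$abc$and$xyz$"))); // [abc, and, xyz]
    }
}
```

Note that, unlike the paired-delimiter regex, a plain split keeps "and" in the second example, because every $ is treated as a separator.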
If you want a simple split function then use Apache Commons Lang which has StringUtils.split. The java one uses a regex which can be overkill/confusing.
You can do it in simple manner writing your own code.
Just use the following code and it will do the job for you
import java.util.ArrayList;
import java.util.List;
public class MyStringTokenizer {
/**
* @param args
*/
public static void main(String[] args) {
List <String> result = getTokenizedStringsList("$abc$efg$hij$");
for(String token : result)
{
System.out.println(token);
}
}
private static List<String> getTokenizedStringsList(String string) {
List<String> tokenList = new ArrayList<String>();
char[] in = string.toCharArray();
int stringLength = in.length;
for (int i = 0; i < stringLength;) {
StringBuilder myBuilder = new StringBuilder();
// Skip ahead to the next '$', then step past it.
while (i < stringLength && in[i] != '$')
i++;
i++;
// Collect characters up to the following '$'.
while (i < stringLength && in[i] != '$') {
myBuilder.append(in[i]);
i++;
}
// Keep the token only if a closing '$' was found; otherwise an
// empty token would be added after the final '$'.
if (i < stringLength)
tokenList.add(myBuilder.toString());
}
return tokenList;
}
}
You can use
String temp = "$abc$and$xyz$";
String array[]=temp.split(Pattern.quote("$"));
List<String> list=new ArrayList<String>();
for(int i=0;i<array.length;i++){
list.add(array[i]);
}
Now the list holds the pieces between the $ signs. Note that its first element is an empty string, because the input starts with $.
