Count the number of dots[.] , ! and ? in a text


I'm working on a Java project and need to count the number of dots, ! and ? in a string. My current approach uses a regex, but the following code does not give the correct result.
for (int i = 0; i < words.length; i++) {
    String w = words[i];
    if (w.matches("(.)+[.!?]")) {
        count++; // increasing the count
    }
}
For another function I have already converted the string into an array of words, so I'm reusing that array here.
I want to increase the count by one for each occurrence of a dot, ! or ? that marks the end of a sentence. For example:
test. - count increases by 1
test.. - count increases by 1
test?. - count increases by 1
Repeated symbols shouldn't increase the count further.
Can you tell me what is wrong here?

Use a wildcard in the regex.
int count = 0;
for( int i = 0; i < words.length; i++ )
if( words[i].matches(".*[.!?]") )
count++;
.*[.!?] will match all strings that end in a period, exclamation point, or question mark.
The first . is unescaped and stands for any character. The * means zero or more of the preceding element, so .* matches zero or more of any character. The . inside the brackets is a literal period: within a character class it needs no escaping.
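As a minimal sketch of the loop above in runnable form (the class name SentenceEndCounter and the sample words are made up for illustration), each word contributes at most one to the count, however many symbols it ends with:

```java
public class SentenceEndCounter {
    // Counts the words that end in '.', '!' or '?'.
    // A word such as "test.." still counts once, because matches() tests the whole word.
    static int countSentenceEnds(String[] words) {
        int count = 0;
        for (String word : words) {
            if (word.matches(".*[.!?]")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"test.", "test..", "test?.", "plain"};
        System.out.println(countSentenceEnds(words)); // 3
    }
}
```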

The easiest way is this one-liner:
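int count = str.length() - str.replaceAll("[.!?]+", "").length();
Rather than counting matches, delete them and compare lengths. Note that this counts every deleted character, so "test.." adds 2; the + in the pattern does not collapse a run into a single count.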
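If repeated symbols should count once, as the question asks, one option is to count matches of [.!?]+ with a Matcher instead of comparing lengths. The sketch below puts both side by side; the class and method names are illustrative assumptions, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TerminatorRuns {
    private static final Pattern RUN = Pattern.compile("[.!?]+");

    // Counts runs of consecutive terminators: "test?." contributes 1.
    static int countRuns(String text) {
        Matcher m = RUN.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // The length-difference trick counts individual characters: "test?." contributes 2.
    static int countChars(String text) {
        return text.length() - text.replaceAll("[.!?]+", "").length();
    }

    public static void main(String[] args) {
        String text = "test.. ok?! done.";
        System.out.println(countRuns(text));  // 3
        System.out.println(countChars(text)); // 5
    }
}
```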

You can do this:
public static void main(String[] args) {
    // Pattern and Matcher come from java.util.regex
    String str = "Test, test!.\tTEST:\nTeST?;";
    Pattern p = Pattern.compile("[.!?]");
    Matcher matcher = p.matcher(str);
    int count = 0;
    while (matcher.find()) {
        count++;
    }
    System.out.println("Count : " + count);
}
The output is 3, as expected.
Can you tell me why the same regex in str.matches("[.!?]") is not giving the expected result?
Because str.matches("[.!?]") checks whether the whole String matches the regex, not whether the regex is found somewhere in the String. That regex only matches when the string is exactly ".", "?" or "!":
String s = ".";
System.out.println(s.matches("[.!?]"));
prints true.
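The difference between the two calls can be sketched as follows. The class and method names are hypothetical; String.matches and Matcher.find are the standard library calls discussed above.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern TERMINATOR = Pattern.compile("[.!?]");

    // matches() succeeds only if the ENTIRE string is one terminator character.
    static boolean wholeStringIsTerminator(String s) {
        return s.matches("[.!?]");
    }

    // find() succeeds if a terminator occurs anywhere in the string.
    static boolean containsTerminator(String s) {
        return TERMINATOR.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringIsTerminator("."));     // true
        System.out.println(wholeStringIsTerminator("test.")); // false
        System.out.println(containsTerminator("test."));      // true
    }
}
```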

Related

Recursion: Longest Palindrome Substring

This is a very common problem: find the longest substring of the given input string that is also a palindrome.
There are multiple possible approaches, and I am aware of the dynamic programming solution, expanding from the middle, and so on. Those solutions should be used for any practical use case.
I was experimenting with recursion to solve this problem, trying to implement a simple idea.
Assume that s is the given input string and that i and j are valid character indexes into it. If s[i] == s[j], my longest substring would be:
s.charAt(i) + longestSubstring(s, i + 1, j - 1) + s.charAt(j)
And if these two characters are not equal then:
max of longestSubstring(s, i + 1, j) or longestSubstring(s, i, j - 1)
I tried to implement this solution below:
// end is inclusive
private static String longestPalindromeHelper(String s, int start, int end) {
if (start > end) {
return "";
} else if (start == end) {
return s.substring(start, end + 1);
}
// if the character at start is equal to end
if (s.charAt(start) == s.charAt(end)) {
// I can concatenate the start and end characters to my result string
// plus I can concatenate the longest palindrome in start + 1 to end - 1
// now logically this makes sense to me, but this would fail in the case
// for ex: a a c a b d k a c a a (space added for visualization)
// when start = 3 (a character)
// end = 7 (again end character)
// it will go in recursion with start = 4 and end = 6 from now onwards
// there is no palindrome substrings apart from the single character
// substring (which are palindrome by itself) so recursion tree for
// start = 3 and end = 7 would return any single character from b d k
// let's say it returns b so result would be a a c a b a c a a
// this would be correct answer for longest palindrome subsequence but
// not substring because for sub strings I need to have consecutive
// characters
return s.charAt(start)
+ longestPalindromeHelper(s, start + 1, end - 1) + s.charAt(end);
} else {
// characters are not equal, increment start
String s1 = longestPalindromeHelper(s, start + 1, end);
String s2 = longestPalindromeHelper(s, start, end - 1);
return s1.length() > s2.length() ? s1 : s2;
}
}
public static String longestPalindrome(String s) {
return longestPalindromeHelper(s, 0, s.length() - 1);
}
public static void main(String[] args) throws Exception {
String ans = longestPalindrome("aacabdkacaa");
System.out.println("Answer => " + ans);
}
For the moment, let us forget about time complexity and runtime. I am focused on making it work for the simple case above.
As you can see in the comments, I understand why this is failing, but I have not managed to fix the problem while keeping exactly the same approach. I don't want to use loops here.
What would be a possible fix that follows the same approach?
Note: I am interested in the actual string as the answer, not the length. FYI, I had a look at the other questions, and it seems no one is following this approach, so I am trying to get it working correctly.

Once you have a call wherein s[i] == s[j], you could flip a boolean flag or switch to a modified method that communicates to child calls that they can no longer use the "don't match, try i + 1 and j - 1" branch (else condition). This ensures you're looking at substrings, not subsequences, for the remainder of the recursion.
Secondly, for the substring variant, even if s[i] == s[j], you should also try i + 1 and j - 1 as if these characters didn't match, because one or both of these characters might not be part of the final best substring between i and j. In the subsequence version, there's never any reason not to add any matching characters to the current palindromic subsequence for the range i to j, but that's not always the case with substrings.
For example, given input "aabcbda" and we're at a call frame where i = 1 and j = length - 1, we need to maximize over three possibilities:
The best substring includes both 'a' characters. Call the subroutine with the flag that says we have to consume from both ends on down and can no longer try skipping characters.
The best substring might still include s[i] but not s[j], try j - 1.
The best substring might still include s[j] but not s[i], try i + 1.
Another observation: it might make more sense to pass best indices up the helper call chain, then grab the longest palindromic substring based on these indices at the very end in the wrapper function.
On a similar note, if you're struggling, you might simplify the problem and return the longest palindromic substring length using your recursive method, then switch to getting the actual substring itself. This makes it easier to focus on the subsequence logic without the return value complicating things as much.
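One way to sketch these ideas without loops is shown below. It is a simplified variant, not the answer's exact design: the "must consume from both ends" mode is a separate strict helper (isPalindrome), and because a range that passes the strict check is already the longest candidate in that range, the other two branches are tried only when it fails. The names are illustrative, and the running time is exponential without memoization.

```java
public class RecursivePalindrome {
    // Strict mode: every pair of outer characters must match all the way inward.
    static boolean isPalindrome(String s, int start, int end) {
        if (start >= end) {
            return true;
        }
        return s.charAt(start) == s.charAt(end) && isPalindrome(s, start + 1, end - 1);
    }

    // Longest palindromic substring within s[start..end], end inclusive.
    static String longest(String s, int start, int end) {
        if (start > end) {
            return "";
        }
        if (isPalindrome(s, start, end)) {
            return s.substring(start, end + 1); // the whole range qualifies
        }
        String withoutFirst = longest(s, start + 1, end); // drop s[start]
        String withoutLast = longest(s, start, end - 1);  // drop s[end]
        return withoutFirst.length() >= withoutLast.length() ? withoutFirst : withoutLast;
    }

    static String longestPalindrome(String s) {
        return longest(s, 0, s.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(longestPalindrome("aacabdkacaa")); // aca
        System.out.println(longestPalindrome("aabcbda"));     // bcb
    }
}
```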

It is much easier to use loops here rather than recursion, something like this:
public static void main(String[] args) {
    System.out.println(longestPalindrome("abbqa")); // bb
    System.out.println(longestPalindrome("aacabdkacaa")); // aca
    System.out.println(longestPalindrome("aacabdkaccaa")); // acca
}
public static String longestPalindrome(String str) {
    String palindrome = "";
    for (int i = 0; i < str.length(); i++) {
        // substring's end index is exclusive, so j must run up to str.length();
        // otherwise palindromes ending at the last character are never tested
        for (int j = i + 1; j <= str.length(); j++) {
            String substring = str.substring(i, j);
            if (isPalindrome(substring)
                    && substring.length() > palindrome.length()) {
                palindrome = substring;
            }
        }
    }
    return palindrome;
}
public static boolean isPalindrome(String str) {
    // compare characters from both ends, moving toward the middle
    for (int i = 0; i < str.length() / 2; i++) {
        if (str.charAt(i) != str.charAt(str.length() - i - 1)) {
            return false;
        }
    }
    return true;
}

How to increment conditionally?

I'm a complete beginner and was wondering whether it is possible to increment a counter conditionally. I am trying to count the letter "I" in a sentence: every time I pass an "I", I want the counter to increment by 1, but if several of them appear together ("III") it should still increment only once, until another character follows, so "IIIaI" would count as 2 instances.
Is this possible?
Sorry guys, here is my code:
public static int countTheIs(String sentence) {
    int iCounter = 0;
    String iCount = "iI"; // both cases included
    for (int j = 0; j < sentence.length(); j++) {
        char ch = sentence.charAt(j);
        if (iCount.indexOf(ch) != -1) {
            iCounter++;
        }
    }
    return iCounter;
}

You are actually quite far along already; all you need to do is check the previous character. This can be done as follows:
String sentence = "Test i two II three iIi";
int iCounter = 0;
String iCount = "iI";
for (int j = 0; j < sentence.length(); j++){
char current = sentence.charAt(j);
char previous; //1
if (j==0) {
previous = 'Z'; //2
} else {
previous = sentence.charAt(j-1); //3
}
if (iCount.indexOf(current) != -1 && iCount.indexOf(previous) == -1 ){ //4
iCounter++;
}
}
Let me explain what I have done, following the // tags:
//1 We declare a new char variable to hold the previous character.
//2 Because the first index of the String has no previous character, we set it to an arbitrary non-matching character to prevent errors at the start. I picked Z in this example.
//3 If there is a previous character, we get it by subtracting 1 from j.
//4 In the if statement we check that the current character is in iCount and the previous character is not. If so, the counter is incremented.
When the above code is run, iCounter ends up as 3.

OK, I'm going to assume that you have a string input and are counting with a loop, using charAt(x) (where x is the loop index) and comparing.
Simply check whether charAt(x-1) is also I. If it is, don't increment the counter. Also make sure x > 0 before looking at charAt(x-1); otherwise it will throw a StringIndexOutOfBoundsException.
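A minimal sketch of that check, assuming both upper-case and lower-case "I" should count (the class and method names are made up for illustration):

```java
public class RunCounter {
    private static boolean isI(char c) {
        return c == 'i' || c == 'I';
    }

    // Counts runs of consecutive 'i'/'I' characters: "IIIaI" has 2 runs.
    static int countRuns(String sentence) {
        int counter = 0;
        for (int x = 0; x < sentence.length(); x++) {
            boolean currentIsI = isI(sentence.charAt(x));
            // The x > 0 guard keeps charAt(x - 1) from being called with -1.
            boolean previousIsI = x > 0 && isI(sentence.charAt(x - 1));
            if (currentIsI && !previousIsI) {
                counter++; // only the first character of a run increments
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countRuns("IIIaI"));                   // 2
        System.out.println(countRuns("Test i two II three iIi")); // 3
    }
}
```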

Please run the code below:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CountI {
    public static void main(String[] args) {
        String input = "IIiaIii";
        // a letter followed by one or more repeats of itself (case-insensitive)
        String regex = "([A-Za-z])\\1+";
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        // collapse each run to its first letter: "IIiaIii" becomes "IaI"
        String output = matcher.replaceAll("$1");
        int result = 0;
        for (int i = 0; i < output.length(); i++) {
            if (output.charAt(i) == 'I' || output.charAt(i) == 'i') {
                result++;
            }
        }
        System.out.println(result);
    }
}
Output:
2

You want Regular Expressions and the Java Pattern class (https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).
In my sample below I used "w" instead of "l" because it's easier to distinguish. Using regular expressions, define a pattern that will capture one or more consecutive occurrences of the letter: w+, then use a matcher, count the number of times it matches.
String input = "wwowwee w w w";
Pattern p = Pattern.compile("w+");
Matcher matcher = p.matcher(input);
int count = 0;
while(matcher.find()) {
count++;
}
System.out.println("Count: " + count);
Or, simply split the string and count the number of splits:
String input = "wwowwee w w w";
Pattern p = Pattern.compile("w+");
String[] tokens = p.split(input);
System.out.println("token count: " + tokens.length);
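Both give 5 for this input, although split can differ from the match count on other inputs, since it keeps a leading empty token and drops trailing empty ones.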
Edit: This doesn't answer the question about incrementing a counter conditionally, but it solves the problem that question was posed to address.
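The two approaches agree on the sample input, but they need not agree in general, because String and Pattern split keep an empty leading token when the input starts with a match and discard trailing empty tokens. The sketch below (illustrative names, same "w+" pattern) compares them:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsSplit {
    private static final Pattern RUN = Pattern.compile("w+");

    // Counts runs directly: one per successful find().
    static int countWithFind(String input) {
        Matcher m = RUN.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Counts the tokens left after splitting on runs.
    static int countWithSplit(String input) {
        return RUN.split(input).length;
    }

    public static void main(String[] args) {
        String sample = "wwowwee w w w";
        System.out.println(countWithFind(sample) + " " + countWithSplit(sample)); // 5 5
        System.out.println(countWithFind("owo") + " " + countWithSplit("owo"));   // 1 2
    }
}
```

Counting with find() is therefore the safer of the two when the input may start or end with something other than the letter being counted.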

Maximum repeating sequence instead of longest repeating sequence

I am trying to get the most repeated sequence of characters in a string.
For example :
Input:
s = "abccbaabccba"
Output:
2
I have used dynamic programming to figure out the repeating sequence, but this returns the longest repeating character sequence. For example:
Input:
s = "abcabcabcabc"
Output:
2
That is, it reports 2 (abcabc, abcabc) instead of 4 (abc, abc, abc, abc).
Here is the part of the code where I'm filling the DP table and extracting repeating sequence. Can anyone suggest how I can get the most repeating sequence?
//Run through the string and fill the DP table.
char[] chars = s.toCharArray();
for(int i = 1; i <= length; i++){
for(int j = 1; j <= length; j++){
if( chars[i-1] == chars[j-1] && Math.abs(i-j) > table[i-1][j-1]){
table[i][j] = table[i-1][j-1] + 1;
if(table[i][j] > max_length_sub){
max_length_sub = table[i][j];
array_index = Math.min(i, j);
}
}else{
table[i][j] = 0;
}
}
}
//Check if there was a repeating sequence and return the number of times it occurred.
if( max_length_sub > 0 ){
String temp = s;
String subSeq = "";
for(int i = (array_index - max_length_sub); i< max_length_sub; i++){
subSeq = subSeq + s.charAt(i);
}
System.out.println( subSeq );
Pattern pattern = Pattern.compile(subSeq);
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find())
count++;
// To find left overs - doesn't seem to matter
String[] splits = temp.split(subSeq);
if (splits.length == 0){
return count;
}else{
return 0;
}
}

Simple and dumb: the smallest sequence to be considered is a pair of characters (*):
loop over the whole String and get every consecutive pair of characters, for example using a for loop and substring;
count the occurrences of that pair in the String: create a method countOccurrences() using indexOf(String, int) or regular expressions; and
store the greatest count: use one variable maxCount outside the loop and an if to check whether the current count is greater (or use Math.max()).
(*) if "abc" occurs 5 times, then "ab" (and "bc") will occur at least 5 times too, so it is enough to search for "ab" and "bc"; there is no need to check "abc".
Edit: without leftovers (see comments), in summary:
check whether the first character is repeated over the whole string; if not,
check whether the first 2 characters are repeated all over; if not,
check whether the first 3 ...
At least 2 counters/loops are needed: one for the number of characters to test, a second for the position being tested. Some arithmetic can improve performance: the length of the string must be divisible by the length of the repeated sequence without remainder.
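The "without leftovers" summary can be sketched roughly as below. The class and method names are hypothetical, and returning 1 when no shorter unit repeats (the string is a single copy of itself) is an assumption, not something the answer specifies.

```java
public class RepeatCounter {
    // Returns how many times the shortest repeating prefix tiles the whole string.
    static int maxRepeats(String s) {
        int n = s.length();
        if (n == 0) {
            return 0;
        }
        // Outer loop: candidate unit length. Inner loop: position being tested.
        for (int len = 1; len <= n / 2; len++) {
            if (n % len != 0) {
                continue; // the unit must divide the string without remainder
            }
            boolean repeats = true;
            for (int pos = len; pos < n && repeats; pos++) {
                // each character must equal the one a full unit earlier
                repeats = s.charAt(pos) == s.charAt(pos - len);
            }
            if (repeats) {
                return n / len; // shortest unit gives the largest count
            }
        }
        return 1; // assumption: no repetition means a single occurrence
    }

    public static void main(String[] args) {
        System.out.println(maxRepeats("abcabcabcabc")); // 4
        System.out.println(maxRepeats("abccbaabccba")); // 2
    }
}
```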

Why am I getting java.lang.StringIndexOutOfBoundsException?

I want to write a program that prints words incrementally until a complete sentence appears. For example, for the input "I need to write." the output should be:
I
I need
I need to
I need to write.
Here is my code:
public static void main(String[] args) {
String sentence = "I need to write.";
int len = sentence.length();
int numSpace=0;
System.out.println(sentence);
System.out.println(len);
for(int k=0; k<len; k++){
if(sentence.charAt(k)!='\t')
continue;
numSpace++;
}
System.out.println("Found "+numSpace +"\t in the string.");
int n=1;
for (int m = 1; m <=3; m++) {
n=sentence.indexOf('\t',n-1);
System.out.println("ligne"+m+sentence.substring(0, n));
}
}
and this is what I get:
I need to write.
16
Found 0 in the string.
Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
String index out of range: -1 at
java.lang.String.substring(String.java:1937) at
split1.Split1.main(Split1.java:36) Java Result: 1 BUILD SUCCESSFUL
(total time: 0 seconds)
I don't understand why numSpace doesn't count the occurrences of spaces, nor why I don't get the correct output (even if I replace numSpace by 3 for example).

You don't have a \t character, so indexOf(..) returns -1.
You then try a substring from 0 to -1, which fails.
The solution is to check:
if (n > -1) {
    System.out.println(...);
}

Your loop for computing numSpace is incorrect. You are looking for \t, which is a tab character, and there are none in the string.
Further, the loop at the bottom throws an exception because it searches for that same \t, which again finds nothing. In n=sentence.indexOf('\t',n-1); the value of n becomes -1, which means "the character you are looking for was not found". You then call substring with an end index of -1, which is invalid, so you get an exception.

You have misunderstood \t: it is the escape sequence for a horizontal tab, not for a space character. Searching for ' ' would do the trick and find the spaces in your sentence.
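A small sketch of the difference, using the sentence from the question (the class name and the firstChunk helper are made up for illustration):

```java
public class IndexOfDemo {
    // Returns the text before the first occurrence of sep,
    // or the whole string when sep is absent (indexOf returned -1).
    static String firstChunk(String s, char sep) {
        int n = s.indexOf(sep);
        return n == -1 ? s : s.substring(0, n);
    }

    public static void main(String[] args) {
        String sentence = "I need to write.";
        System.out.println(sentence.indexOf('\t'));      // -1: no tab in the sentence
        System.out.println(sentence.indexOf(' '));       // 1: first space
        System.out.println(firstChunk(sentence, ' '));   // I
        System.out.println(firstChunk(sentence, '\t'));  // I need to write.
    }
}
```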

This looks like homework, so my answer is a hint.
Hint: read the javadoc for String.indexOf paying attention to what it says about the value returned when the string / character is not found.
(In fact - even if this is not formal homework, you are clearly a Java beginner. And beginners need to learn that the javadocs are the first place to look when using an unfamiliar method.)

The easiest way to solve this, I guess, would be to split the String first using String.split. Something like this:
static void sentence(String snt) {
    String[] split = snt.split(" ");
    for (int i = 0; i < split.length; i++) {
        for (int j = 0; j <= i; j++) {
            // no leading space before the first word of each line
            if (j == 0) System.out.print(split[j]);
            else System.out.printf(" %s", split[j]);
        }
        System.out.println(); // end the line after the first i + 1 words
    }
}
As other people pointed out, your loop counts tab characters (\t), not spaces. You need to check for spaces with
if (sentence.charAt(k) == ' ')
\t represents a tab. To look for a space, just use ' '.
.indexOf() returns -1 if it can't find a character in the string. So we keep looping until .indexOf() returns -1.
Use of continue wasn't really needed here. We increment numSpace when we encounter a space.
System.out.format is useful when we want to mix literal strings and variables. No ugly +s needed.
public static void main(String[] args) {
    String sentence = "I need to write.";
    int len = sentence.length();
    int numSpace = 0;
    for (int k = 0; k < len; k++) {
        if (sentence.charAt(k) == ' ') {
            numSpace++;
        }
    }
    System.out.format("Found %d spaces in the string.%n", numSpace);
    // print the prefix up to each space, then the full sentence
    int index = sentence.indexOf(' ');
    while (index > -1) {
        System.out.println(sentence.substring(0, index));
        index = sentence.indexOf(' ', index + 1);
    }
    System.out.println(sentence);
}

Try this, it should pretty much do what you want. I figure you have already finished this so I just made the code real fast. Read the comments for the reasons behind the code.
public static void main(String[] args) {
String sentence = "I need to write.";
int len = sentence.length();
String[] broken = sentence.split(" "); //Doing this instead of the counting of characters is just easier...
/*
* The split method makes it where it populates the array based on either side of a " "
* (blank space) so at the array index of 0 would be 'I' at 1 would be "need", etc.
*/
boolean done = false;
int n = 0;
while (!done) { // While done is false do the below
for (int i = 0; i <= n; i++) { //This prints out the below however many times the count of 'n' is.
/*
* The reason behind this is so that it will print just 'I' the first time when
* 'n' is 0 (because it only prints once starting at 0, which is 'I') but when 'n' is
* 1 it goes through twice making it print 2 times ('I' then 'need") and so on and so
* forth.
*/
System.out.print(broken[i] + " ");
}
System.out.println(); // Since the above method is a print this puts an '\n' (enter) moving the next prints on the next line
n++; //Makes 'n' go up so that it is larger for the next go around
if (n == broken.length) { //the '.length' portion says how many indexes there are in the array broken
/* If you don't have this then the 'while' will go on forever. basically when 'n' hits
* the same number as the amount of words in the array it stops printing.
*/
done = true;
}
}
}

Finding pairs in strings

I was wondering if I can get some help with this problem. Suppose I had the string
34342
I would like to find the number of pairs in this string, which would be two. How would I go about doing that?
EDIT: OK, what I really wanted was to match occurrences of characters that are the same in the string.

You can use backreferences to find pairs of things that appear in a row:
(\d+)\1
This will match one or more digit character followed by the same sequence again. \1 is a backreference which refers to the contents of the first capturing group.
If you want to match numbers that appear multiple times in the string, you could use a pattern like
(\d)(?=\d*\1)
Again we're using a backreference, but this time together with a lookahead. A lookahead is a zero-width assertion which specifies something that must be matched (or not matched, if using a negative lookahead) after the current position in the string, but doesn't consume any characters or move the regex engine's position in the string. In this case, we assert that the contents of the first capture group must be found again, though not necessarily directly beside the first one. Because the lookahead uses \d*, it will only be considered a pair if it is within the same number (so if there's a space between numbers, the pair won't be matched; if this is undesired, the \d can be changed to ., which will match any character).
It'll match the first 3 and 4 in 34342 and the first 1, 2, 3, and 4 in 12332144. Note, however, that with an odd number of repetitions you will get an extra match (i.e. 1112 will match the first two 1s), because lookaheads do not consume.
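In Java, both patterns can be tried with Pattern and Matcher. This is only a sketch: the class and method names are illustrative, and the backslashes are doubled because the patterns sit inside Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairMatcher {
    // A digit that occurs again later within the same run of digits.
    private static final Pattern REPEATED_LATER = Pattern.compile("(\\d)(?=\\d*\\1)");
    // A digit sequence immediately followed by the same sequence.
    private static final Pattern ADJACENT = Pattern.compile("(\\d+)\\1");

    private static int count(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int matches = 0;
        while (m.find()) {
            matches++;
        }
        return matches;
    }

    static int countRepeatedLater(String input) {
        return count(REPEATED_LATER, input);
    }

    static int countAdjacent(String input) {
        return count(ADJACENT, input);
    }

    public static void main(String[] args) {
        System.out.println(countRepeatedLater("34342"));    // 2
        System.out.println(countRepeatedLater("12332144")); // 4
        System.out.println(countRepeatedLater("1112"));     // 2
        System.out.println(countAdjacent("34342"));         // 1 (matches "3434")
    }
}
```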

Here's one way, if a regex doesn't seem appropriate. One method here uses a map, the other uses pure arrays. I don't really know what a pair is. Is "555" three pairs, one pair, or what? So these routines print a list of all characters that occur more than once.
import java.util.Map;
import java.util.TreeMap;
public class Pairs {
public static void main(String[] args) {
usingMap("now is the time for all good men");
System.out.println("-----------");
usingArrays("now is the time for all good men");
}
private static void usingMap(String s) {
Map<Character, Integer> m = new TreeMap<Character, Integer>();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (m.containsKey(c)) {
m.put(c, m.get(c) + 1);
} else {
m.put(c, 1);
}
}
for (Character c : m.keySet()) {
if (m.get(c) > 1) {
System.out.println(c + ":" + m.get(c));
}
}
}
private static void usingArrays(String s) {
int count[] = new int[256];
for (int i = 0; i < count.length; i++) count[i] = 0;
for (int i = 0; i < s.length(); i++) {
count[s.charAt(i)]++;
}
for (int i = 0; i < count.length; i++) {
if (count[i] > 1) {
System.out.println((char) i + ":" + count[i]);
}
}
}
}
