Efficient way to find longest streak of characters in string

Efficient way to find longest streak of characters in string - java

This code works fine but I'm looking for a way to optimize it. If you look at the long string, you can see 'l' appears five times consecutively. No other character appears this many times consecutively. So, the output is 5. Now, the problem is this method checks each and every character and even after the max is found, it continues to check the remaining characters. Is there a more efficient way?
public class Main {
public static void main(String[] args) {
System.out.println(longestStreak("KDDiiigllllldddfnnlleeezzeddd"));
}
private static int longestStreak(String str) {
int max = 0;
for (int i = 0; i < str.length(); i++) {
int count = 0;
for (int j = i; j < str.length(); j++) {
if (str.charAt(i) == str.charAt(j)) {
count++;
} else break;
}
if (count > max) max = count;
}
return max;
}
}

We could add variable for previous char count in single iteration. Also as an additional optimisation we stop iteration if i + max - currentLenght < str.length(). It means that max can not be changed:
private static int longestStreak(String str) {
int maxLenght = 0;
int currentLenght = 1;
char prev = str.charAt(0);
for (int index = 1; index < str.length() && isMaxCanBeChanged(str, maxLenght, currentLenght, index); index++) {
char currentChar = str.charAt(index);
if (currentChar == prev) {
currentLenght++;
} else {
maxLenght = Math.max(maxLenght, currentLenght);
currentLenght = 1;
}
prev = currentChar;
}
return Math.max(maxLenght, currentLenght);
}
private static boolean isMaxCanBeChanged(String str, int max, int currentLenght, int index) {
return index + max - currentLenght < str.length();
}

Here is a regex magic solution, which although a bit brute force perhaps gets some brownie points. We can iterate starting with the number of characters in the original input, decreasing by one at a time, trying to do a regex replacement of continuous characters of that length. If the replacement works, then we know we found the longest streak.
String input = "KDDiiigllllldddfnnlleeezzeddd";
for (int i=input.length(); i > 0; --i) {
String replace = input.replaceAll(".*?(.)(\\1{" + (i-1) + "}).*", "$1");
if (replace.length() != input.length()) {
System.out.println("longest streak is: " + replace);
}
}
This prints:
longest streak is: lllll

Yes there is. C++ code:
string str = "KDDiiigllllldddfnnlleeezzeddd";
int longest_streak = 1, current_streak = 1; char longest_letter = str[0];
for (int i = 1; i < str.size(); ++i) {
if (str[i] == str[i - 1])
current_streak++;
else current_streak = 1;
if (current_streak > longest_streak) {
longest_streak = current_streak;
longest_letter = str[i];
}
}
cout << "The longest streak is: " << longest_streak << " and the character is: " << longest_letter << "\n";
LE: If needed, I can provide the Java code for it, but I think you get the idea.
public class Main {
public static void main(String[] args) {
System.out.println(longestStreak("KDDiiigllllldddfnnlleeezzeddd"));
}
private static int longestStreak(String str) {
int longest_streak = 1, current_streak = 1; char longest_letter = str.charAt(0);
for (int i = 1; i < str.length(); ++i) {
if (str.charAt(i) == str.charAt(i - 1))
current_streak++;
else current_streak = 1;
if (current_streak > longest_streak) {
longest_streak = current_streak;
longest_letter = str.charAt(i);
}
}
return longest_streak;
}
}

The loop could be rewritten a bit smaller, but mainly the condition can be optimized:
i < str.length() - max

Using Stream and collector. It should give all highest repeated elements.
Code:
String lineString = "KDDiiiiiiigllllldddfnnlleeezzeddd";
String[] lineSplit = lineString.split("");
Map<String, Integer> collect = Arrays.stream(lineSplit)
.collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(e -> 1)));
int maxValueInMap = (Collections.max(collect.values()));
for (Entry<String, Integer> entry : collect.entrySet()) {
if (entry.getValue() == maxValueInMap) {
System.out.printf("Character: %s, Repetition: %d\n", entry.getKey(), entry.getValue());
}
}
Output:
Character: i, Repetition: 7
Character: l, Repetition: 7
P.S I am not sure how efficient this code it. I just learned Streams.

Related

How to count all the matching sub-string using regex?

I am trying to count all instances of a substring from .PBAAP.B with P A B in that sequence and can have 1-3 symbols in between them (inclusive).
The output should be 2
.P.A...B
.P..A..B
What I've tried so far is
return (int) Pattern
.compile("P.{0,2}A.{0,2}B")
.matcher(C)
.results()
.count();
But I only get output 1. My guess is that in both cases, the group is PBAAP.B. So instead of 2, I get 1.
I could write an elaborate function to achieve what I am trying to do, but I was wondering if there was a way to do it with regex.
int count = 0;
for (int i = 0; i < C.length(); i++) {
String p = Character.toString(C.charAt(i));
if (p.equalsIgnoreCase("P")) {
for (int j = X; j <= Y; j++) {
if (i + j < C.length()) {
String a = Character.toString(C.charAt(i + j));
if (a.equals("A")) {
for (int k = X; k <= Y; k++) {
if (i + j + k < C.length()) {
String b = Character.toString(C.charAt(i + j + k));
if (b.equalsIgnoreCase("B")) {
count++;
}
}
}
}
}
return count;

To my knowledge, you will only get a boolean response out of a regex match - either it is a match or it isn't. Thus, I can't think of a solution solving your problem using regex.
count() is used, if you want to check if there are multiple matches at different indices - which is not what you want. For instance, the following snippet will return 2 as th is found at 11-12 and at 20-21:
Pattern
.compile("(th)")
.matcher("let's test this together")
.results()
.count();
In order to keep the solution without regex readable and extensible, you may want to use regression.
Class to keep track of the latest match of each letter:
public class Letter {
private char letter;
private int latestMatch;
// constructor
// getters + setters
}
Class to detect different matches:
public class MatchFinder {
final static String C = "......PAA.BBBadsfjksPeAkBB";
final static Letter[] LETTERS = new Letter[3];
final static int SYMBOLS_THRESHOLD = 4;
public static void main(String[] args) {
LETTERS[0] = new Letter('P', -1);
LETTERS[1] = new Letter('A', -1);
LETTERS[2] = new Letter('B', -1);
int count = countMatches(0, C.length(), 0, 0, -1);
System.out.println(count);
}
public static int countMatches(int start, int end, int letterIndex, int currentCount, int latestMatch) {
for (int i = start; i < end; i++) {
if (i < C.length()) {
Character c = Character.toLowerCase(C.charAt(i));
if (c.equals(LETTERS[letterIndex].getLowercaseLetter()) && i > latestMatch) {
LETTERS[letterIndex].setLatestMatch(i);
if (letterIndex + 1 < LETTERS.length) {
int childStart = LETTERS[letterIndex].getLatestMatch() + 1;
return countMatches(childStart, childStart + SYMBOLS_THRESHOLD, letterIndex + 1, currentCount, -1);
}
currentCount++;
}
}
}
if (letterIndex > 0) {
int parentLetterIndex = letterIndex - 1;
int latestParentMatch = LETTERS[parentLetterIndex].getLatestMatch();
if (letterIndex > 1) {
int parentStart = LETTERS[letterIndex - 2].getLatestMatch() + 1;
return countMatches(parentStart, parentStart + SYMBOLS_THRESHOLD, parentLetterIndex, currentCount, latestParentMatch);
} else {
return countMatches(LETTERS[parentLetterIndex].getLatestMatch() + 1, C.length(), 0, currentCount, latestParentMatch);
}
}
return currentCount;
}
}

Most efficient way to search for unknown patterns in a string?

I am trying to find patterns that:
occur more than once
are more than 1 character long
are not substrings of any other known pattern
without knowing any of the patterns that might occur.
For example:
The string "the boy fell by the bell" would return 'ell', 'the b', 'y '.
The string "the boy fell by the bell, the boy fell by the bell" would return 'the boy fell by the bell'.
Using double for-loops, it can be brute forced very inefficiently:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
However, this is incredibly slow when searching large strings with tons of patterns.

You can build a suffix tree for your string in linear time:
https://en.wikipedia.org/wiki/Suffix_tree
The patterns you are looking for are the strings corresponding to internal nodes that have only leaf children.

You could use n-grams to find patterns in a string. It would take O(n) time to scan the string for n-grams. When you find a substring by using a n-gram, put it into a hash table with a count of how many times that substring was found in the string. When you're done searching for n-grams in the string, search the hash table for counts greater than 1 to find recurring patterns in the string.
For example, in the string "the boy fell by the bell, the boy fell by the bell" using a 6-gram will find the substring "the boy fell by the bell". A hash table entry with that substring will have a count of 2 because it occurred twice in the string. Varying the number of words in the n-gram will help you discover different patterns in the string.
Dictionary<string, int>dict = new Dictionary<string, int>();
int count = 0;
int ngramcount = 6;
string substring = "";
// Add entries to the hash table
while (count < str.length) {
// copy the words into the substring
int i = 0;
substring = "";
while (ngramcount > 0 && count < str.length) {
substring[i] = str[count];
if (str[i] == ' ')
ngramcount--;
i++;
count++;
}
ngramcount = 6;
substring.Trim(); // get rid of the last blank in the substring
// Update the dictionary (hash table) with the substring
if (dict.Contains(substring)) { // substring is already in hash table so increment the count
int hashCount = dict[substring];
hashCount++;
dict[substring] = hashCount;
}
else
dict[substring] = 1;
}
// Find the most commonly occurrring pattern in the string
// by searching the hash table for the greatest count.
int maxCount = 0;
string mostCommonPattern = "";
foreach (KeyValuePair<string, int> pair in dict) {
if (pair.Value > maxCount) {
maxCount = pair.Value;
mostCommonPattern = pair.Key;
}
}

I've written this just for fun. I hope I have understood the problem correctly, this is valid and fast enough; if not, please be easy on me :) I might optimize it a little more I guess, if someone finds it useful.
private static IEnumerable<string> getPatterns(string txt)
{
char[] arr = txt.ToArray();
BitArray ba = new BitArray(arr.Length);
for (int shingle = getMaxShingleSize(arr); shingle >= 2; shingle--)
{
char[] arr1 = new char[shingle];
int[] indexes = new int[shingle];
HashSet<int> hs = new HashSet<int>();
Dictionary<int, int[]> dic = new Dictionary<int, int[]>();
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
{
int index = i + j;
arr1[j] = arr[index];
indexes[j] = index;
}
int h = getHashCode(arr1);
if (hs.Add(h))
{
int[] indexes1 = new int[indexes.Length];
Buffer.BlockCopy(indexes, 0, indexes1, 0, indexes.Length * sizeof(int));
dic.Add(h, indexes1);
}
else
{
bool exists = false;
foreach (int index in indexes)
if (ba.Get(index))
{
exists = true;
break;
}
if (!exists)
{
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
if (ba.Get(index))
{
exists = true;
break;
}
}
if (!exists)
{
foreach (int index in indexes)
ba.Set(index, true);
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
ba.Set(index, true);
dic[h] = null;
yield return new string(arr1);
}
}
}
}
}
private static int getMaxShingleSize(char[] arr)
{
for (int shingle = 2; shingle <= arr.Length / 2 + 1; shingle++)
{
char[] arr1 = new char[shingle];
HashSet<int> hs = new HashSet<int>();
bool noPattern = true;
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
arr1[j] = arr[i + j];
int h = getHashCode(arr1);
if (!hs.Add(h))
{
noPattern = false;
break;
}
}
if (noPattern)
return shingle - 1;
}
return -1;
}
private static int getHashCode(char[] arr)
{
unchecked
{
int hash = (int)2166136261;
foreach (char c in arr)
hash = (hash * 16777619) ^ c.GetHashCode();
return hash;
}
}
Edit
My previous code has serious problems. This one is better:
private static IEnumerable<string> getPatterns(string txt)
{
Dictionary<int, int> dicIndexSize = new Dictionary<int, int>();
for (int shingle = 2, count0 = txt.Length / 2 + 1; shingle <= count0; shingle++)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
bool patternExists = false;
for (int i = 0, count = txt.Length - shingle; i <= count; i++)
{
string sub = txt.Substring(i, shingle);
if (!dic.ContainsKey(sub))
dic.Add(sub, i);
else
{
patternExists = true;
int index0 = dic[sub];
if (index0 >= 0)
{
dicIndexSize[index0] = shingle;
dic[sub] = -1;
}
}
}
if (!patternExists)
break;
}
List<int> lst = dicIndexSize.Keys.ToList();
lst.Sort((a, b) => dicIndexSize[b].CompareTo(dicIndexSize[a]));
BitArray ba = new BitArray(txt.Length);
foreach (int i in lst)
{
bool ok = true;
int len = dicIndexSize[i];
for (int j = i, max = i + len; j < max; j++)
{
if (ok) ok = !ba.Get(j);
ba.Set(j, true);
}
if (ok)
yield return txt.Substring(i, len);
}
}
Text in this book took 3.4sec in my computer.

Suffix arrays are the right idea, but there's a non-trivial piece missing, namely, identifying what are known in the literature as "supermaximal repeats". Here's a GitHub repo with working code: https://github.com/eisenstatdavid/commonsub . Suffix array construction uses the SAIS library, vendored in as a submodule. The supermaximal repeats are found using a corrected version of the pseudocode from findsmaxr in Efficient repeat finding via suffix arrays
(Becher–Deymonnaz–Heiber).
static void FindRepeatedStrings(void) {
// findsmaxr from https://arxiv.org/pdf/1304.0528.pdf
printf("[");
bool needComma = false;
int up = -1;
for (int i = 1; i < Len; i++) {
if (LongCommPre[i - 1] < LongCommPre[i]) {
up = i;
continue;
}
if (LongCommPre[i - 1] == LongCommPre[i] || up < 0) continue;
for (int k = up - 1; k < i; k++) {
if (SufArr[k] == 0) continue;
unsigned char c = Buf[SufArr[k] - 1];
if (Set[c] == i) goto skip;
Set[c] = i;
}
if (needComma) {
printf("\n,");
}
printf("\"");
for (int j = 0; j < LongCommPre[up]; j++) {
unsigned char c = Buf[SufArr[up] + j];
if (iscntrl(c)) {
printf("\\u%.4x", c);
} else if (c == '\"' || c == '\\') {
printf("\\%c", c);
} else {
printf("%c", c);
}
}
printf("\"");
needComma = true;
skip:
up = -1;
}
printf("\n]\n");
}
Here's a sample output on the text of the first paragraph:
Davids-MBP:commonsub eisen$ ./repsub input
["\u000a"
," S"
," as "
," co"
," ide"
," in "
," li"
," n"
," p"
," the "
," us"
," ve"
," w"
,"\""
,"–"
,"("
,")"
,". "
,"0"
,"He"
,"Suffix array"
,"`"
,"a su"
,"at "
,"code"
,"com"
,"ct"
,"do"
,"e f"
,"ec"
,"ed "
,"ei"
,"ent"
,"ere's a "
,"find"
,"her"
,"https://"
,"ib"
,"ie"
,"ing "
,"ion "
,"is"
,"ith"
,"iv"
,"k"
,"mon"
,"na"
,"no"
,"nst"
,"ons"
,"or"
,"pdf"
,"ri"
,"s are "
,"se"
,"sing"
,"sub"
,"supermaximal repeats"
,"te"
,"ti"
,"tr"
,"ub "
,"uffix arrays"
,"via"
,"y, "
]

I would use Knuth–Morris–Pratt algorithm (linear time complexity O(n)) to find substrings. I would try to find the largest substring pattern, remove it from the input string and try to find the second largest and so on. I would do something like this:
string pattern = input.substring(0,lenght/2);
string toMatchString = input.substring(pattern.length, input.lenght - 1);
List<string> matches = new List<string>();
while(pattern.lenght > 0)
{
int index = KMP(pattern, toMatchString);
if(index > 0)
{
matches.Add(pattern);
// remove the matched pattern occurences from the input string
// I would do something like this:
// 0 to pattern.lenght gets removed
// check for all occurences of pattern in toMatchString and remove them
// get the remaing shrinked input, reassign values for pattern & toMatchString
// keep looking for the next largest substring
}
else
{
pattern = input.substring(0, pattern.lenght - 1);
toMatchString = input.substring(pattern.length, input.lenght - 1);
}
}
Where KMP implements Knuth–Morris–Pratt algorithm. You can find the Java implementations of it at Github or Princeton or write it yourself.
PS: I don't code in Java and it is quick try to my first bounty about to close soon. So please don't give me the stick if I missed something trivial or made a +/-1 error.

Algo: Find anagram of given string at a given index in lexicographically sorted order

Need to write an Algo to find Anagram of given string at a given index in lexicographically sorted order. For example:
Consider a String: ABC then all anagrams are in sorted order: ABC ACB
BAC BCA CAB CBA. So, for index 5 value is: CAB. Also, consider the case of duplicates like for AADFS anagram would be DFASA at index 32
To do this I have written Algo but I think there should be something less complex than this.
import java.util.*;
public class Anagram {
static class Word {
Character c;
int count;
Word(Character c, int count) {
this.c = c;
this.count = count;
}
}
public static void main(String[] args) {
System.out.println(findAnagram("aadfs", 32));
}
private static String findAnagram(String word, int index) {
// starting with 0 that's y.
index--;
char[] array = word.toCharArray();
List<Character> chars = new ArrayList<>();
for (int i = 0; i < array.length; i++) {
chars.add(array[i]);
}
// Sort List
Collections.sort(chars);
// To maintain duplicates
List<Word> words = new ArrayList<>();
Character temp = chars.get(0);
int count = 1;
int total = chars.size();
for (int i = 1; i < chars.size(); i++) {
if (temp == chars.get(i)) {
count++;
} else {
words.add(new Word(temp, count));
count = 1;
temp = chars.get(i);
}
}
words.add(new Word(temp, count));
String anagram = "";
while (index > 0) {
Word selectedWord = null;
// find best index
int value = 0;
for (int i = 0; i < words.size(); i++) {
int com = combination(words, i, total);
if (index < value + com) {
index -= value;
if (words.get(i).count == 1) {
selectedWord = words.remove(i);
} else {
words.get(i).count--;
selectedWord = words.get(i);
}
break;
}
value += com;
}
anagram += selectedWord.c;
total--;
}
// put remaining in series
for (int i = 0; i < words.size(); i++) {
for (int j = 0; j < words.get(i).count; j++) {
anagram += words.get(i).c;
}
}
return anagram;
}
private static int combination(List<Word> words, int index, int total) {
int value = permutation(total - 1);
for (int i = 0; i < words.size(); i++) {
if (i == index) {
int v = words.get(i).count - 1;
if (v > 0) {
value /= permutation(v);
}
} else {
value /= permutation(words.get(i).count);
}
}
return value;
}
private static int permutation(int i) {
if (i == 1) {
return 1;
}
return i * permutation(i - 1);
}
}
Can someone help me with less complex logic.

I write the following code to solve your problem.
I assume that the given String is sorted.
The permutations(String prefix, char[] word, ArrayList permutations_list) function generates all possible permutations of the given string without duplicates and store them in a list named permutations_list. Thus, the word: permutations_list.get(index -1) is the desired output.
For example, assume that someone gives us the word "aab".
We have to solve this problem recursively:
Problem 1: permutations("","aab").
That means that we have to solve the problem:
Problem 2: permutations("a","ab").
String "ab" has only two letters, therefore the possible permutations are "ab" and "ba". Hence, we store in permutations_list the words "aab" and "aba".
Problem 2 has been solved. Now we go back to problem 1.
We swap the first "a" and the second "a" and we realize that these letters are the same. So we skip this case(we avoid duplicates).
Next, we swap the first "a" and "b". Now, the problem 1 has changed and we want to solve the new one:
Problem 3: permutations("","baa").
The next step is to solve the following problem:
Problem 4: permutations("b","aa").
String "aa" has only two same letters, therefore there is one possible permutation "aa". Hence, we store in permutations_list the word "baa"
Problem 4 has been solved. Finally, we go back to problem 3 and problem 3 has been solved. The final permutations_list contains "aab", "aba" and "baa".
Hence, findAnagram("aab", 2) returns the word "aba".
import java.util.ArrayList;
import java.util.Arrays;
public class AnagramProblem {
public static void main(String args[]) {
System.out.println(findAnagram("aadfs",32));
}
public static String findAnagram(String word, int index) {
ArrayList<String> permutations_list = new ArrayList<String>();
permutations("",word.toCharArray(), permutations_list);
return permutations_list.get(index - 1);
}
public static void permutations(String prefix, char[] word, ArrayList<String> permutations_list) {
boolean duplicate = false;
if (word.length==2 && word[0]!=word[1]) {
String permutation1 = prefix + String.valueOf(word[0]) + String.valueOf(word[1]);
permutations_list.add(permutation1);
String permutation2 = prefix + String.valueOf(word[1]) + String.valueOf(word[0]);
permutations_list.add(permutation2);
return;
}
else if (word.length==2 && word[0]==word[1]) {
String permutation = prefix + String.valueOf(word[0]) + String.valueOf(word[1]);
permutations_list.add(permutation);
return;
}
for (int i=0; i < word.length; i++) {
if (!duplicate) {
permutations(prefix + word[0], new String(word).substring(1,word.length).toCharArray(), permutations_list);
}
if (i < word.length - 1) {
char temp = word[0];
word[0] = word[i+1];
word[i+1] = temp;
}
if (i < word.length - 1 && word[0]==word[i+1]) duplicate = true;
else duplicate = false;
}
}
}

I think your problem will become a lot simpler if you considerate generating the anagrams in alphabetical order, so you don't have to sort them afterwards.
The following code (from Generating all permutations of a given string) generates all permutations of a String. The order of these permutations are given by the initial order of the input String. If you sort the String beforehand, the anagrams will thus be added in sorted order.
to prevent duplicates, you can simply maintain a Set of Strings you have already added. If this Set does not contain the anagram you're about to add, then you can safely add it to the list of anagrams.
Here is the code for the solution i described. I hope you find it to be simpler than your solution.
public class Anagrams {
private List<String> sortedAnagrams;
private Set<String> handledStrings;
public static void main(String args[]) {
Anagrams anagrams = new Anagrams();
List<String> list = anagrams.permutations(sort("AASDF"));
System.out.println(list.get(31));
}
public List<String> permutations(String str) {
handledStrings = new HashSet<String>();
sortedAnagrams = new ArrayList<String>();
permutation("", str);
return sortedAnagrams;
}
private void permutation(String prefix, String str) {
int n = str.length();
if (n == 0){
if(! handledStrings.contains(prefix)){
//System.out.println(prefix);
sortedAnagrams.add(prefix);
handledStrings.add(prefix);
}
}
else {
for (int i = 0; i < n; i++)
permutation(prefix + str.charAt(i), str.substring(0, i) + str.substring(i + 1, n));
}
}
public static String sort(String str) {
char[] arr = str.toCharArray();
Arrays.sort(arr);
return new String(arr);
}
}

If you create a "next permutation" method which alters an array to its next lexicographical permutation, then your base logic could be to just invoke that method n-1 times in a loop.
There's a nice description with code that can be found here. Here's both the basic pseudocode and an example in Java adapted from that page.
/*
1. Find largest index i such that array[i − 1] < array[i].
(If no such i exists, then this is already the last permutation.)
2. Find largest index j such that j ≥ i and array[j] > array[i − 1].
3. Swap array[j] and array[i − 1].
4. Reverse the suffix starting at array[i].
*/
boolean nextPermutation(char[] array) {
int i = array.length - 1;
while (i > 0 && array[i - 1] >= array[i]) i--;
if (i <= 0) return false;
int j = array.length - 1;
while (array[j] <= array[i - 1]) j--;
char temp = array[i - 1];
array[i - 1] = array[j];
array[j] = temp;
j = array.length - 1;
while (i < j) {
temp = array[i];
array[i] = array[j];
array[j] = temp;
i++;
j--;
}
return true;
}

How to get the count of unmatched character in two strings?

I need to get the count of Unmatched character in two strings. for example
string 1 "hari", string 2 "malar"
Now i need to remove the duplicates from both string ['a' & 'r'] are common in both strings so remove that, now string 1 contain "hi" string 2 contain "mla".
Remaining count = 5
I tried this code, its working fine if duplicate / repeart is not available in same sting like here 'a' come twice in string 2 so my code is didn't work properly.
for (int i = 0; i < first.length; i++) {
for (int j = 0; j < second.length; j++) {
if(first[i] == second[j])
{
getstrings = new ArrayList<String>();
count=count+1;
Log.d("Matches", "string char that matched "+ first[i] +"==" + second[j]);
}
}
}
int tot=(first.length + second.length) - count;
here first & second refers to
char[] first = nameone.toCharArray();
char[] second = nametwo.toCharArray();
this code is working fine for String 1 "sri" string 2 "hari" here in a string character didn't repeat so this above code is working fine. Help me to solve this ?

Here is my solution,
public static void RemoveMatchedCharsInnStrings(String first,String second)
{
for(int i = 0 ;i < first.length() ; i ++)
{
char c = first.charAt(i);
if(second.indexOf(c)!= -1)
{
first = first.replaceAll(""+c, "");
second = second.replaceAll(""+c, "");
}
}
System.out.println(first);
System.out.println(second);
System.out.println(first.length() + second.length());
}
Hope it is what you need. if not i'll update my answer

I saw the other answers and thought: There must be a more declarative and composable way of doing this!
There is, but it's far longer...
public static void main(String[] args) {
String first = "hari";
String second = "malar";
Map<Character, Integer> differences = absoluteDifference(characterCountOf(first), characterCountOf(second));
System.out.println(sumOfCounts(differences));
}
public static Map<Character, Integer> characterCountOf(String text) {
Map<Character, Integer> result = new HashMap<Character, Integer>();
for (int i=0; i < text.length(); i++) {
Character c = text.charAt(i);
result.put(c, result.containsKey(c) ? result.get(c) + 1 : 1);
}
return result;
}
public static <K> Set<K> commonKeys(Map<K, ?> first, Map<K, ?> second) {
Set<K> result = new HashSet<K>(first.keySet());
result.addAll(second.keySet());
return result;
}
public static <K> Map<K, Integer> absoluteDifference(Map<K, Integer> first, Map<K, Integer> second) {
Map<K, Integer> result = new HashMap<K, Integer>();
for (K key: commonKeys(first, second)) {
Integer firstCount = first.containsKey(key) ? first.get(key) : 0;
Integer secondCount = second.containsKey(key) ? second.get(key) : 0;
Integer resultCount = Math.max(firstCount, secondCount) - Math.min(firstCount, secondCount);
if (resultCount > 0) result.put(key, resultCount);
}
return result;
}
public static Integer sumOfCounts(Map<?, Integer> map) {
Integer sum = 0;
for (Integer count: map.values()) {
sum += count;
}
return sum;
}
This is the solution I prefer - but it's lot longer. You've tagged the question with Android, so I didn't use any Java 8 features, which would reduce it a bit (but not as much as I would have hoped for).
However it produces meaningful intermediate results. But it's still so much longer :-(

Try out this code:
String first = "hari";
String second = malar;
String tempFirst = "";
String tempSecond = "";
int maxSize = ((first.length() > second.length()) ? (first.length()) : (second.length()));
for (int i = 0; i < maxSize; i++) {
if (i >= second.length()) {
tempFirst += first.charAt(i);
} else if (i >= first.length()) {
tempSecond += second.charAt(i);
} else if (first.charAt(i) != second.charAt(i)) {
tempFirst += first.charAt(i);
tempSecond += second.charAt(i);
}
}
first = tempFirst;
second = tempSecond;

you need to break; as soon as the match is found:
public static void main(String[] args) {
String nameone="hari";
String nametwo="malar";
char[] first = nameone.toCharArray();
char[] second = nametwo.toCharArray();
List<String>getstrings=null;
int count=0;
for (int i = 0; i < first.length; i++) {
for (int j = 0; j < second.length; j++) {
if(first[i] == second[j])
{
getstrings = new ArrayList<String>();
count++;
System.out.println("Matches"+ "string char that matched "+ first[i] +"==" + second[j]);
break;
}
}
}
//System.out.println(count);
int tot=(first.length-count )+ (second.length - count);
System.out.println("Remaining after match from both strings:"+tot);
}
prints:
Remaining after match from both strings:5

Two things you are missing here.
In the if condition, when the two characters matches, you need to increment count by 2, not one as you are eliminating from both strings.
You need to put a break in the in condition as you are always matching for the first occurrence of the character.
Made those two changes in your code as below, and now it prints the result as you expected.
for (int i = 0; i < first.length; i++) {
for (int j = 0; j < second.length; j++) {
if(first[i] == second[j])
{
count=count+2;
break;
}
}
}
int tot=(first.length + second.length) - count;
System.out.println("Result = "+tot);

You just need to loop over two strings if characters are matched increment the count and just remove those count from total len of two characters
s = 'hackerhappy'\
t = 'hackerrank'\
count = 0
for i in range(len(s)):
for j in range(len(t)):
if s[i] == t[j]:
count += 2
break
char_unmatched = (len(s)+len(t)) - count
char_unmatched contains the count of number of characters from both the strings that are not equal

count directly repeated substring occurence

I am trying to count the number of directly repeatings of a substring in a string.
String s = "abcabcdabc";
String t = "abc";
int count = 2;
EDIT:
because some people are asking, i try to clarify this: there are 3 times t in s but i need the number of times t is repeated without any other character. that would result in 2, because the d in my example is not the starting character of t. ('d' != 'a').
Another example to clarify this:
String s = "fooabcabcdabc";
String t = "abc";
int count = 0;
I know how to count the number of occurrences in the string, i need it to be repeating from left to right without interruption!
Here is what i have so far, but i think i made a simple mistake in it...
public static int countRepeat(String s, String t){
if(s.length() == 0 || t.length() == 0){
return 0;
}
int count = 0;
if(t.length() == 1){
System.out.println(s+" | " + t);
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) != t.charAt(0)){
return count;
}
count++;
}
}else{
System.out.println(s+" | " + t);
for (int i = 0; i < s.length(); i++) {
int tchar = (i- (count*(t.length()-1)));
System.out.println(i+ " | " + tchar);
if (s.charAt(i) != t.charAt(tchar)){
return count;
}
if(tchar >= t.length()-1){
count++;
}
}
}
return count;
}
what am i doing wrong? And is there a better/faster way to do this?

There exists a str.indexOf(substring,index) method in the String API.
In pseudocode this would mean something like this:
declare integer variable as index
declare integer variable as count
while index <= (length of string - length of substring)
index = indexOf substring from index
if index >= 0
increment count
end if
end while

Using indexOf() makes the code much easier:
public static int startingRepeats(final String haystack, final String needle)
{
String s = haystack;
final int len = needle.length();
// Special case...
if (len == 0)
return 0;
int count = 0;
while (s.startsWith(needle)) {
count++;
s = s.subString(len);
}
return count;
}

This version does not allocate new objects (substrings, etc) and just look for the characters where they are supposed to be.
public static void main(String[] args) {
System.out.println(countRepeat("abcabcabc", "abc")); // 3
System.out.println(countRepeat("abcabcdabc", "abc")); // 2
System.out.println(countRepeat("abcabcabcx", "abc")); // 3
System.out.println(countRepeat("xabcabcabc", "abc")); // 0
}
public static int countRepeat(String s, String t){
int n = 0; // Ocurrences
for (int i = 0; i < s.length(); i ++) { // i is index in s
int j = i % t.length(); // corresponding index in t
boolean last = j == t.length() - 1; // this should be the last char in t
if (s.charAt(i) == t.charAt(j)) { // Matches?
if (last) { // Matches and it is the last
n++;
}
} else { // Do not match. finished!
break;
}
}
return n;
}

Here is the another calculator:
String s = "abcabcdabc";
String t = "abc";
int index = 0;
int count = 0;
while ((index = s.indexOf(t, index)) != -1) {
index += t.length();
count++;
}
System.out.println("count = " + count);

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Efficient way to find longest streak of characters in string - java

The loop could be rewritten a bit smaller, but mainly the condition can be optimized: i < str.length() - max

Related

How to count all the matching sub-string using regex?

Most efficient way to search for unknown patterns in a string?

Algo: Find anagram of given string at a given index in lexicographically sorted order

How to get the count of unmatched character in two strings?

count directly repeated substring occurence

Categories

Resources