I have an app that assists studying for Scrabble. Most searches are much faster than in my desktop version in C#, except the Word Builder. This search shows all the words that can be formed from a given set of letters A-Z, or blanks.
What can I do to get it to run faster?
I've considered using a Trie, but haven't found a way to support the use of blanks.
I am using a SimpleCursorAdapter to populate the ListView, which is why I am returning a cursor.
public Cursor getCursor_subanagrams(String term, String filters, String ordering) {
if (term.trim().isEmpty()) // compare contents; == on strings compares references
return null;
// only difference between this and anagram is changing the length filter
char[] a = term.toCharArray(); // anagram
int[] first = new int[26]; // letter count of anagram
int c; // array position
int blankcount = 0;
// initialize word to anagram
for (c = 0; c < a.length; c++) {
if (a[c] == '?') {
blankcount++;
continue;
}
first[a[c] - 'A']++;
}
// gets pool of words to search through
String lenFilter = String.format("Length(Word) <= %1$s AND Length(Word) <= %2$s", LexData.getMaxLength(), term.length());
Cursor cursor = database.rawQuery("SELECT WordID as _id, Word, WordID, FrontHooks, BackHooks, " +
"InnerFront, InnerBack, Anagrams, ProbFactor, OPlayFactor, Score \n" +
"FROM `" + LexData.getLexName() + "` \n" +
"WHERE (" + lenFilter +
filters +
" ) " + ordering, null);
// creates new cursor to add valid words to
MatrixCursor matrixCursor = new MatrixCursor(new String[]{"_id", "Word", "WordID", "FrontHooks", "BackHooks", "InnerFront", "InnerBack",
"Anagrams", "ProbFactor", "OPlayFactor", "Score"});
// THIS NEEDS TO BE FASTER
while (cursor.moveToNext()) {
String word = cursor.getString(1);
char[] b = word.toCharArray();
if (isAnagram(first, b, blankcount)) {
matrixCursor.addRow(get_CursorRow(cursor));
}
}
cursor.close();
return matrixCursor;
}
private boolean isAnagram(int[] anagram, char[] word, int blankcount) {
int matchcount = blankcount;
int c; // each letter
int[] second = {0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0};
for (c = 0; c < word.length; c++)
second[word[c] - 'A']++;
for (c = 0; c < 26; c++)
{
matchcount += (anagram[c]<second[c]) ? anagram[c]:second[c];
}
if (matchcount == word.length)
return true;
return false;
}
Focus on speeding up the most typical case, which is where the word is not a (sub)anagram, and you return false. If you can identify as quickly as possible when it is not possible to make word out of anagram then you can avoid the expensive test.
One way to do this is using a bitmask of the letters in the words. You don't need to store letter counts, because if the number of unique letters in word that are not in anagram is greater than the number of blanks, then there is no way you can make it and you can quickly return false. If not then you can proceed to the more expensive test taking letter counts into account.
You can precompute the bitmasks like this:
private int letterMask(char[] word)
{
int c, mask = 0;
for (c = 0; c < word.length; c++)
mask |= (1 << (word[c] - 'A'));
return mask;
}
Add an extra column to your database to store the letter bitmask for each word, add it to your cursor, and compute the letter bitmask for the letters in term and store in termMask. Then inside your cursor loop you can do a test like this:
// compute mask of bits in mask that are not in term:
int missingLettersMask = cursor.getInt(8) & ~termMask;
if(missingLettersMask != 0)
{
// check if we could possibly make up for these letters using blanks:
int remainingBlanks = blankcount;
while((remainingBlanks-- > 0) && (missingLettersMask != 0))
missingLettersMask &= (missingLettersMask - 1); // remove one bit
if(missingLettersMask != 0)
continue; // move onto the next word
}
// word can potentially be made from anagram, call isAnagram:
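The prefilter above can also be written without the bit-clearing loop, since Integer.bitCount reports how many letters are missing in one call. The following is only a sketch under stated assumptions: the class name LetterMaskFilter and the method mightBeSubAnagram are made up for illustration, words are assumed to be uppercase A-Z, and blanks ('?') are skipped when building the mask for the search term.

```java
final class LetterMaskFilter {
    // Bit i is set when the text contains the letter 'A' + i.
    // Characters outside A-Z (such as the blank '?') are ignored.
    static int letterMask(String text) {
        int mask = 0;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                mask |= 1 << (ch - 'A');
            }
        }
        return mask;
    }

    // Cheap rejection test: false means the word certainly cannot be formed.
    // True only means the full letter-count check (isAnagram) is still needed.
    static boolean mightBeSubAnagram(int wordMask, int termMask, int blankCount) {
        int missingLetters = wordMask & ~termMask;
        return Integer.bitCount(missingLetters) <= blankCount;
    }
}
```

In the cursor loop, a word whose stored mask fails mightBeSubAnagram can be skipped with continue before any char array is allocated.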
There are ways to speed up your anagram checking function. Samgak has pointed out one. Another obvious optimisation is to return false if the word is longer than the number of available letters plus blanks. In the end, these are all micro-optimisations and you will end up checking your whole dictionary.
You said you have considered using a trie. That's a good solution, in my opinion, because the structure of the trie will only make you check relevant words. Build it like this:
Sort the letters of each word, so that "triangle" and "integral" will both become "aegilnrt".
Insert the sorted word into the trie.
Where you would place the end-marker in a normal trie, you place a list of possible words.
If you were looking for exact anagrams, you would sort the word to check, traverse the trie and print out the list of possible anagrams at the end. But here, you have to deal with partial anagrams and with blanks:
Regular traversal means you take the next letter of the word and then descend the corresponding link in the tree, if it exists.
Partial anagrams can be found by ignoring the next letter without descending in the trie.
Blanks can be dealt with by descending all possible branches of a trie and decreasing the number of blanks.
When you have blanks, you will end up with duplicates. For example, if you have the letters A, B and C and a blank tile, you can make the word CAB, but you can get there in four different ways: CAB, _AB, C_B, CA_.
You could get around this by storing the result list in a data structure that eliminates duplicates, such as a set or ordered set, but you would still go down the same paths several times in order to create the duplicates.
A better solution is to keep track of which trie nodes you have visited with which parameters, i.e. with which remaining unused letters and blanks. You can then cut such paths short. Here's an implementation in pseudocode:
function find_r(t, str, blanks, visited)
{
// don't revisit explored paths
key = make_key(t, str, blanks);
if (key in visited) return [];
visited ~= key;
if (str.length == 0 and blanks == 0) {
// all resources have been used: return list of anagrams
return t.word;
} else {
res = [];
c = 0;
if (str.length > 0) {
c = str[0];
// regular traversal: use current letter and descend
if (c in t.next) {
res ~= find_r(t.next[c], str[1:], blanks, visited);
}
// partial anagrams: skip current letter and don't descend
l = 1;
while (l < str.length and str[l] == c) l++;
res ~= find_r(t, str[l:], blanks, visited);
}
if (blanks > 0) {
// blanks: decrease blanks and descend
for (i in t.next) {
if (i < c) {
res ~= find_r(t.next[i], str, blanks - 1, visited);
}
}
}
return res;
}
}
(Here, ~ denotes list concatenation or set insertion; [beg=0:end=length] denotes string slices; in tests whether a dictionary or set contains a key.)
Once you've built the tree, this solution is fast when there are no blanks, but it gets exponentially worse with each blank and with larger letter pools. Testing with one blank is still reasonably fast, but with two blanks, it is on par with your existing solution.
Now there are at most two blanks in a Scrabble game and the rack can hold only up to seven tiles, so it may not be as bad in practice. Another question is whether the search should consider words obtained with two blanks. The results list will be very long and it will contain all two-letter words. The player may be more interested in high-scoring words that can be played with a single blank.
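For comparison, here is a minimal Java sketch of the sorted-letter trie idea. It is not a translation of the pseudocode above: it collects the words at every node it reaches (so all sub-anagrams are returned, not only those that use every tile), and it removes duplicates with a set instead of a visited table. The class name AnagramTrie and the method subAnagrams are assumptions made for this example, and the rack is assumed to contain uppercase letters with '?' for blanks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AnagramTrie {
    private final TreeMap<Character, AnagramTrie> next = new TreeMap<>();
    private final List<String> words = new ArrayList<>();

    // Store the word at the node reached by its letters in sorted order.
    void insert(String word) {
        char[] key = word.toCharArray();
        Arrays.sort(key);
        AnagramTrie node = this;
        for (char c : key) {
            node = node.next.computeIfAbsent(c, k -> new AnagramTrie());
        }
        node.words.add(word);
    }

    // Returns every stored word that can be built from a subset of the rack.
    Set<String> subAnagrams(String rack) {
        StringBuilder letters = new StringBuilder();
        int blanks = 0;
        for (char c : rack.toCharArray()) {
            if (c == '?') blanks++; else letters.append(c);
        }
        char[] sorted = letters.toString().toCharArray();
        Arrays.sort(sorted);
        Set<String> found = new TreeSet<>();
        collect(this, sorted, 0, blanks, found);
        return found;
    }

    private static void collect(AnagramTrie node, char[] rack, int pos,
                                int blanks, Set<String> found) {
        found.addAll(node.words);
        // Use one of the remaining rack letters; earlier letters are skipped.
        for (int i = pos; i < rack.length; i++) {
            if (i > pos && rack[i] == rack[i - 1]) continue; // same letter, same subtree
            AnagramTrie child = node.next.get(rack[i]);
            if (child != null) collect(child, rack, i + 1, blanks, found);
        }
        // Spend a blank on any outgoing edge without consuming a rack letter.
        if (blanks > 0) {
            for (AnagramTrie child : node.next.values()) {
                collect(child, rack, pos, blanks - 1, found);
            }
        }
    }
}
```

Because the set only hides duplicates after the fact, this version still walks some paths more than once when blanks are present, which is the cost the visited table in the pseudocode is meant to avoid.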
This is a very common problem in which we would have to find the longest substring which is also a palindrome substring for the given input string.
Now there are multiple possible approaches to this, and I am aware of the dynamic programming solution, expand from middle, etc. Any of these solutions should be used for a practical use case.
I was experimenting with using recursion to solve this problem and trying to implement the simple idea.
Let us assume that s is the given input string and i and j represent any valid character indexes of input string. So if s[i] == s[j], my longest substring would be:
s.charAt(i) + longestSubstring(s, i + 1, j - 1) + s.charAt(j)
And if these two characters are not equal then:
max of longestSubstring(s, i + 1, j) or longestSubstring(s, i, j - 1)
I tried to implement this solution below:
// end is inclusive
private static String longestPalindromeHelper(String s, int start, int end) {
if (start > end) {
return "";
} else if (start == end) {
return s.substring(start, end + 1);
}
// if the character at start is equal to end
if (s.charAt(start) == s.charAt(end)) {
// I can concatenate the start and end characters to my result string
// plus I can concatenate the longest palindrome in start + 1 to end - 1
// now logically this makes sense to me, but this would fail in the case
// for ex: a a c a b d k a c a a (space added for visualization)
// when start = 3 (a character)
// end = 7 (again end character)
// it will go in recursion with start = 4 and end = 6 from now onwards
// there is no palindrome substrings apart from the single character
// substring (which are palindrome by itself) so recursion tree for
// start = 3 and end = 7 would return any single character from b d k
// let's say it returns b so result would be a a c a b a c a a
// this would be correct answer for longest palindrome subsequence but
// not substring because for sub strings I need to have consecutive
// characters
return s.charAt(start)
+ longestPalindromeHelper(s, start + 1, end - 1) + s.charAt(end);
} else {
// characters are not equal, increment start
String s1 = longestPalindromeHelper(s, start + 1, end);
String s2 = longestPalindromeHelper(s, start, end - 1);
return s1.length() > s2.length() ? s1 : s2;
}
}
public static String longestPalindrome(String s) {
return longestPalindromeHelper(s, 0, s.length() - 1);
}
public static void main(String[] args) throws Exception {
String ans = longestPalindrome("aacabdkacaa");
System.out.println("Answer => " + ans);
}
For a moment let us forget about time complexity or runtime. I am focused on making it work for the simple case above.
As you can see in the comments I got the idea why this is failing but I tried hard to rectify the problem following the exactly same approach. I don't want to use loops here.
What could be the possible fix for this following same approach?
Note: I am interested in the actual string as answer and not the length. FYI I had a look at all the other questions and it seems no one is following this approach for correctness so I am trying.
Once you have a call wherein s[i] == s[j], you could flip a boolean flag or switch to a modified method that communicates to child calls that they can no longer use the "don't match, try i + 1 and j - 1" branch (else condition). This ensures you're looking at substrings, not subsequences, for the remainder of the recursion.
Secondly, for the substring variant, even if s[i] == s[j], you should also try i + 1 and j - 1 as if these characters didn't match, because one or both of these characters might not be part of the final best substring between i and j. In the subsequence version, there's never any reason not to add any matching characters to the current palindromic subsequence for the range i to j, but that's not always the case with substrings.
For example, given input "aabcbda" and we're at a call frame where i = 1 and j = length - 1, we need to maximize over three possibilities:
The best substring includes both 'a' characters. Call the subroutine with the flag that says we have to consume from both ends on down and can no longer try skipping characters.
The best substring might still include s[i] but not s[j], try j - 1.
The best substring might still include s[j] but not s[i], try i + 1.
Another observation: it might make more sense to pass best indices up the helper call chain, then grab the longest palindromic substring based on these indices at the very end in the wrapper function.
On a similar note, if you're struggling, you might simplify the problem and return the longest palindromic substring length using your recursive method, then switch to getting the actual substring itself. This makes it easier to focus on the substring logic without the return value complicating things as much.
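One way the three possibilities could be put together without loops is sketched below. This is an assumption-laden illustration rather than the only fix: the "must consume from both ends" mode is written as a separate recursive method (isPalindrome here), and the names PalindromeRecursion and longest are made up for the example. Like the original attempt, it ignores running time and is exponential without memoization.

```java
final class PalindromeRecursion {
    // Strict mode: every pair from the outside in has to match,
    // so the range s[i..j] is a contiguous palindrome or nothing.
    static boolean isPalindrome(String s, int i, int j) {
        if (i >= j) {
            return true;
        }
        return s.charAt(i) == s.charAt(j) && isPalindrome(s, i + 1, j - 1);
    }

    // Longest palindromic substring inside s[i..j], both ends inclusive.
    static String longest(String s, int i, int j) {
        if (i > j) {
            return "";
        }
        // Possibility 1: the whole range is the palindrome.
        if (isPalindrome(s, i, j)) {
            return s.substring(i, j + 1);
        }
        // Possibilities 2 and 3: drop one end and try again,
        // even when s[i] == s[j].
        String withoutFirst = longest(s, i + 1, j);
        String withoutLast = longest(s, i, j - 1);
        return withoutFirst.length() >= withoutLast.length() ? withoutFirst : withoutLast;
    }

    static String longestPalindrome(String s) {
        return longest(s, 0, s.length() - 1);
    }
}
```

Checking the whole range first is safe because any palindrome inside s[i..j] is at most as long as the range itself.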
It is much easier to use loops here, rather than recursion, something like this:
public static void main(String[] args) {
System.out.println(longestPalindrome("abbqa")); // bb
System.out.println(longestPalindrome("aacabdkacaa")); // aca
System.out.println(longestPalindrome("aacabdkaccaa")); // acca
}
public static String longestPalindrome(String str) {
String palindrome = "";
for (int i = 0; i < str.length(); i++) {
for (int j = i + 1; j <= str.length(); j++) { // substring's end index is exclusive
String substring = str.substring(i, j);
if (isPalindrome(substring)
&& substring.length() > palindrome.length()) {
palindrome = substring;
}
}
}
return palindrome;
}
public static boolean isPalindrome(String str) {
for (int i = 0; i < str.length() / 2; i++) {
if (str.charAt(i) != str.charAt(str.length() - i - 1)) {
return false;
}
}
return true;
}
I implemented KMP pattern searching algorithm in my program to search through a table of objects. I was wondering how I could evaluate the time efficiency of my function.
I read some sources that said the time complexity of the KMP algorithm is O(n) [excluding the space complexity].
So I will iterate through a list of objects from 1 to N item, searching each item for a pattern match. Once there is a match, I will break out of the loop but that doesn't really affect my evaluation (I think).
So since I iterate over all my items, is my Big O notation O(n^2), given that it takes O(n) to find a match and O(n) to iterate through all my items?
Here is my code for some insight:
while(itr.hasNext()){
//M is the length of our pattern
i = 0; //index of text to be searched
j = 0; //index of our pattern
txt = itr.next().toString(); //Store the key inside txt string
N = txt.length(); //length of our text to be search
//Check if the searchText is equal or less than key in the dictionary
//If our searchText is more than the key length, there is no use of searching
if(M <= N){
while (i < N) {
//Check if the searchText.charAt equals to txt.charAt
//Increase i,j if matches to compare next character(s)
if (searchText.charAt(j) == txt.charAt(i)) {
j++;
i++;
}else{ //If the chars at our pattern and text does not match
if (j != 0) //if it's not the first index of our pattern
j--; //reduce one index
else
i++; //otherwise move onto the next index of our text to be searched
}
//Check whether the length of the searchText equals to the match counter
//It means that the searchKey exists in our dictionary
if (j == M) {
System.out.println((String.format("%-35s", txt)) + get((K)txt));
counter++; //Holds the number of entries found
j--;
break; //No need to look anymore since there's a match
}
}
}
}
If you have n items in your list and each element has an average length of m, then the total search time is O(mn). KMP searches each element in O(m), and you multiply that by n.
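Note that the O(m) bound per element relies on the precomputed failure table that textbook KMP uses; the loop in the question steps j back by one on a mismatch instead. As a hedged reference sketch (the names KmpSearch, buildFailure, indexOf and countMatches are illustrative, not taken from the question), a table-driven search over a list of items could look like this:

```java
import java.util.List;

final class KmpSearch {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it.
    static int[] buildFailure(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Index of the first match in text, or -1. The text index never moves back.
    static int indexOf(String text, String pattern, int[] fail) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    // The table is built once, then each item is scanned in time
    // proportional to its own length.
    static int countMatches(List<String> items, String pattern) {
        int[] fail = buildFailure(pattern);
        int count = 0;
        for (String item : items) {
            if (indexOf(item, pattern, fail) >= 0) {
                count++;
            }
        }
        return count;
    }
}
```

With this structure the total work is proportional to the pattern length plus the combined length of all items, which matches the O(mn) estimate above.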
So I'm working on some Java exercises, and one that has caught my attention recently is trying to produce all permutations of a String using iteration. There are plenty of examples online - however, a lot of them seem very complex and I'm not able to follow.
I have tried using my own method, which when tested with a string of length 3 it works fine. The method is to (for each letter) keep moving a letter along the string, swapping it with whatever letter is in front of it. E.g.
index: 012
string: abc
(iteration 1) swap 'a' (index 0) with letter after it 'b' (index 0+1) : bac
(iteration 2) swap 'a' (index 1) with letter after it 'c' (index 1+1) : bca
(iteration 3) swap 'a' (index 2) with letter after it 'b' (index 0) : acb
current permutations: abc (original), bac, bca, acb
(iteration 3) swap 'b' (index 1) with letter after it 'c' (index 1+1) : acb
(iteration 4) swap 'b' (index 2) with letter after it 'a' (index 0) : bca
(iteration 5) swap 'b' (index 0) with letter after it 'c' (index 1) : cba
current permutations: abc (original), bac, bca, acb, acb, cba
...
This is how I implemented it in Java:
String str = "abc"; // string to permute
char[] letters = str.toCharArray(); // split string into char array
int setLength = factorial(letters.length); // amount of permutations = n!
HashSet<String> permutations = new HashSet<String>(); // store permutations in Set to avoid duplicates
permutations.add(str); // add original string to set
// algorithm as described above
for (int i = 0; i < setLength; i++) {
for (int j = 0; j < letters.length; j++) {
int k;
if (j == letters.length - 1) {
k = 0;
} else {
k = j + 1;
}
letters = swap(letters, j, k);
String perm = new String(letters);
permutations.add(perm);
}
}
The problem is if I input a string of length 4, I only end up with 12 permutations (4x3) - if I input a string of length 5, I only end up with 20 permutations (5x4).
Is there a simple modification I could make to this algorithm to get every possible permutation? Or does this particular method only work for strings of length 3?
Appreciate any feedback!
Suppose the input is "abcd". This is how your algorithm will work
bacd
bacd
bcad
bcda
If you observe carefully, "a" was getting positioned at all indexes and the following consecutive letter was getting replaced with "a". However, after your algorithm has produced "bacd" - it should be followed by "badc" also, which will be missing from your output.
For a string of length 4, when you calculate the number of permutations as a factorial, you understand that the first position can be occupied by any of 4 characters, followed by 3, 2 and 1. However, in your case, when the first two positions are occupied by "ba" there are two possibilities for the 3rd position, i.e. c and d. While your algorithm correctly finds "cd", it fails to find "dc", because the loop does not break the problem into further subproblems, i.e. "cd" has two permutations, "cd" and "dc".
Thus, the difference in count of your permutations and actual answer will increase as the length of string increases.
To break a problem into sub-problems and solve them, many algorithms use recursion.
However, you could look into Generate list of all possible permutations of a string for good iterative answers.
Also, as the length of the string grows, calculating the number of permutations up front is not advisable.
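As one example of an iterative approach, the classic "next lexicographic permutation" step can be repeated until no larger arrangement exists. This is a swapped-in technique, not a modification of the adjacent-swap loop from the question, and the names IterativePermutations and nextPermutation are made up for this sketch. It does not need to compute n! in advance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IterativePermutations {
    // Returns all distinct permutations of s in lexicographic order.
    static List<String> permutations(String s) {
        char[] letters = s.toCharArray();
        Arrays.sort(letters); // start from the smallest arrangement
        List<String> result = new ArrayList<>();
        do {
            result.add(new String(letters));
        } while (nextPermutation(letters));
        return result;
    }

    // Rearranges letters into the next larger arrangement in place.
    // Returns false when letters is already the largest one.
    static boolean nextPermutation(char[] letters) {
        int i = letters.length - 2;
        while (i >= 0 && letters[i] >= letters[i + 1]) {
            i--; // find the rightmost position that can still grow
        }
        if (i < 0) {
            return false;
        }
        int j = letters.length - 1;
        while (letters[j] <= letters[i]) {
            j--; // rightmost letter larger than letters[i]
        }
        swap(letters, i, j);
        // Reverse the tail so it becomes the smallest possible suffix.
        for (int lo = i + 1, hi = letters.length - 1; lo < hi; lo++, hi--) {
            swap(letters, lo, hi);
        }
        return true;
    }

    private static void swap(char[] a, int x, int y) {
        char tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}
```

Because each step moves to a strictly larger arrangement, repeated letters do not produce duplicates, so no HashSet is needed.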
While I do not know of a way to expand upon your current method of switching places (I've attempted this before with no luck), I do know of a fairly straightforward way of going about it:
//simple method to set up permutate
private static void permutations(String s)
{
permutate(s, "");
}
//takes the string of chars to swap around (s) and the base of the string to add to
private static void permutate(String s, String base)
{
//nothing left to swap, just print out
if(s.length() <= 1)
System.out.println(base + s);
else
//loop through the string of chars to flip around
for(int i = 0; i < s.length(); i++)
//call with a smaller string of chars to flip (not including selected char), add selected char to base
permutate(s.substring(0, i) + s.substring(i + 1), base + s.charAt(i));
}
The goal with this recursion is to delegate as much processing as possible to something else, breaking the problem down bit by bit. It's easy to break down this problem by choosing a char to be first, then telling a function to figure out the rest. This can then be done for each char until they've all been chosen once.
My book provides the following code for a function that computes all the permutations of a string of unique characters (see code below), and says that the running time is O(n!), "since there are n! permutations."
I don't understand how they've computed the running time as O(n!). I assume they mean "n" is the length of the original string. I think that the running time should be something like O((n + 1)XY), since the getPerms function will be called (n + 1) times, and X and Y can represent the running times of the outer and inner for loops respectively. Can someone explain to me why this is wrong / the book's answer is right?
Thanks.
public static ArrayList<String> getPerms(String str)
{
if (str == null)
return null;
ArrayList<String> permutations = new ArrayList<String>();
if (str.length() == 0) { // base case: the empty string has one permutation
permutations.add("");
return permutations;
}
char first = str.charAt(0); //first character of string
String remainder = str.substring(1); //remove first character
ArrayList<String> words = getPerms(remainder);
for (String word: words)
{
for (int i = 0; i <= word.length(); i++)
{
String s = insertCharAt(word, first, i);
permutations.add(s);
}
}
return permutations;
}
public static String insertCharAt(String word, char c, int i)
{
String start = word.substring(0, i);
String end = word.substring(i);
return start + c + end;
}
Source: Cracking the Coding Interview
Intuitively, no algorithm that generates all permutations of n items can do better than O(n!), because there are n! permutations to produce.
You can reduce the recursive code to a recurrence equation, because getPerms on a string of length n calls getPerms on a string of length n - 1. Then it takes every value returned by that call and runs an inner loop n times for each. So we have
P(n) = n * P(n-1)
P(1) = 1
It is easy to see that P(n) = n! by telescoping the equation.
If you have a hard time visualizing how we come up with this equation, you can also think of it this way:
ArrayList<String> words = getPerms(remainder);
for (String word: words) // P(n-1)
{
for (int i = 0; i <= word.length(); i++) // n * P(n-1)
{
String s = insertCharAt(word, first, i);
permutations.add(s);
}
}
The count of permutations of N elements is N * (N - 1) * (N - 2) * ... * 2 * 1, i.e. N!.
The first character can be any one of the N characters. The next character can be one of the remaining N - 1 characters. That already gives N * (N - 1) possible cases.
Continuing, we have N * (N - 1) * (N - 2) * ... cases at each step.
Because the count of permutations of N elements is N!, no implementation can generate all permutations of an array of length N in fewer than N! steps.
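To see where the work actually goes, one can count how often the inner loop body runs. The sketch below is a compact rewrite of getPerms with a counter added; the class name PermCounter and the field insertions are assumptions for this illustration, and it counts loop iterations only, ignoring the cost of building each string. At the level handling a string of length k, the loop body runs k! times, so the total is 1! + 2! + ... + n!, which is less than 2 * n!.

```java
import java.util.ArrayList;
import java.util.List;

final class PermCounter {
    // Number of times the inner loop body has run (hypothetical counter).
    static long insertions = 0;

    static List<String> getPerms(String str) {
        List<String> permutations = new ArrayList<>();
        if (str.isEmpty()) {
            permutations.add(""); // base case: one permutation of nothing
            return permutations;
        }
        char first = str.charAt(0);
        List<String> words = getPerms(str.substring(1));
        for (String word : words) {
            // Insert the first character at every position of each shorter permutation.
            for (int i = 0; i <= word.length(); i++) {
                insertions++;
                permutations.add(word.substring(0, i) + first + word.substring(i));
            }
        }
        return permutations;
    }
}
```

So although getPerms itself is only called n + 1 times, the loops at the top level alone already run n! times, and that term dominates the total.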
I am in a beginner Java class and I haven't gotten the chance to learn how to avoid duplicated values when storing values inside arrays.
String[] newAlphabet = new String[26];
for(int I = 0; I < newAlphabet.length; I++){
int random = 65 + (int)(Math.random() * ((90 - 65) + 1));
char ascii = (char)random;
String letters = ascii + "";
if(letters != newAlphabet[0] && letters != newAlphabet[1] ... so on and so on until
newAlphabet[25])
newAlphabet[I] = letters;
}//end
So this is my pseudo code for part of my program and the point of it is to avoid having duplicated letters inside the array.
The problem that I am having is inside the if statement. Instead of typing letters != newAlphabet[] to 25, is there another way of doing it?
I have seen on some of the forums on Stack Overflow that I should use HashSet, but I have not learned that. I can ask my teacher if I am allowed, but is there another way to avoid this problem?
I have been thinking of using for-each loop to search through all the elements in the array but I haven't thought out the plan long enough if it's valid.
As you are talking about a beginner Java class, I am assuming you are fairly new to programming. So, rather than just give you a library function that will do it for you, let's walk through the steps of how to do this with just the basic code so you can get a better idea of what is going on behind the scenes.
Firstly, for any repetitive action, think loops. You want to check, for each letter in your new alphabet, if the one you are about to add matches it. So...
boolean exists = false; //indicates whether we have found a match
for (int j = 0; j < 26; j++) { //for each letter in the new alphabet
//true if this one, or a previous one is a match
exists = exists || letters.equals(newAlphabet[j]); //equals compares contents; == compares references
}
//if we don't have a match, add the new letter
if (!exists) newAlphabet[I] = letters;
Now, as you are building up your new alphabet as we go, we don't have a full 26 letters for most cases of running this code, so only check the parts of the new alphabet we have defined:
boolean exists = false;
for (int j = 0; j < I; j++) { //note in this line we stop before the insertion point
exists = exists || letters.equals(newAlphabet[j]);
}
if (!exists) newAlphabet[I] = letters;
Finally, we don't need to keep checking if we have already found a match, so we can change the loop to stop when we have found a match:
boolean exists = false;
int j = 0;
while (!exists && j < I) { //we now also stop if we have already found a match
exists = letters.equals(newAlphabet[j]);
j++; //move on to the next stored letter
//as we are stopping at the first match,
//we no longer need to allow for previous matches
}
if (!exists) newAlphabet[I] = letters;
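Putting the pieces together, the check has to be combined with a retry, because a rejected letter would otherwise leave a gap in the array. The following is one possible sketch using only loops and arrays; the class name UniqueLetters and the helper contains are invented for this example, and it assumes the goal is 26 distinct uppercase letters in random order.

```java
final class UniqueLetters {
    // True if value is among the first `filled` entries of array.
    static boolean contains(String[] array, int filled, String value) {
        for (int j = 0; j < filled; j++) {
            if (value.equals(array[j])) {
                return true; // stop at the first match
            }
        }
        return false;
    }

    static String[] randomAlphabet() {
        String[] newAlphabet = new String[26];
        int filled = 0; // how many slots hold a letter so far
        while (filled < newAlphabet.length) {
            char ascii = (char) ('A' + (int) (Math.random() * 26));
            String letter = String.valueOf(ascii);
            // Only advance when the letter is new; otherwise draw again.
            if (!contains(newAlphabet, filled, letter)) {
                newAlphabet[filled] = letter;
                filled++;
            }
        }
        return newAlphabet;
    }
}
```

The while loop keeps drawing until every slot is filled, so the later letters may take several attempts each.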
You could use the asList method:
if( !Arrays.asList(newAlphabet).contains(letters) ) {
newAlphabet[I] = letters;
}
It's not the most efficient, but since your array is only 26 elements long, I would favor clarity over efficiency.
Some explanation: asList is a static method on the Arrays class. This just means that we don't have to create an Arrays object to call it. We simply say Arrays.asList() and pass it the arguments. The asList method takes an array (newAlphabet in this case) as a parameter, and builds a java.util.List out of it. This means that we can call List methods on the return value. contains() is a method on List that returns true if the List contains an element that is equal to the parameter (letters in this case).
Based on this line it looks like all you're trying to do is produce the letters A to Z in some other order:
int random = 65 + (int)(Math.random() * ((90 - 65) + 1));
If I'm understanding that right, then really all you're trying to do is shuffle the alphabet:
// Initialize new alphabet array
String originalAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char[] newAlphabet = originalAlphabet.toCharArray();
// Shuffle the new alphabet by swapping each character to a random position
for (int i=0; i<26; i++) {
int j = (int)(Math.random() * 26);
char temp = newAlphabet[i];
newAlphabet[i] = newAlphabet[j];
newAlphabet[j] = temp;
}
// Print the new alphabet
for (int i=0; i<26; i++) {
System.out.print(newAlphabet[i]);
}
System.out.println();
Here's a sample output: VYMTBIPWHKZNGUCDLRAQFSOEJX
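The loop above swaps each position with any of the 26 positions, which is known to make some orderings slightly more likely than others. If an unbiased shuffle matters, the Fisher-Yates variant picks the swap partner only from the part of the array that has not been fixed yet. A minimal sketch, with the class name AlphabetShuffle and the Random parameter chosen for this example:

```java
import java.util.Random;

final class AlphabetShuffle {
    static char[] shuffledAlphabet(Random rng) {
        char[] letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
        // Walk from the end; position i receives a letter chosen
        // uniformly from positions 0..i and is then left alone.
        for (int i = letters.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // 0 <= j <= i
            char temp = letters[i];
            letters[i] = letters[j];
            letters[j] = temp;
        }
        return letters;
    }
}
```

Calling shuffledAlphabet(new Random()) gives a fresh ordering each time; passing a seeded Random makes the result repeatable for testing.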
You have a couple options.
Loop through the array and do basically what you're doing now.
Insert the characters in sorted order so you can perform binary search to determine if a letter is already in the list. As a bonus, if you use option 2, you'll already know the insertion point.
Check out Arrays.binarySearch(): http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html
You could use this :
if(Arrays.binarySearch(newAlphabet, letters) < 0){
newAlphabet[I] = letters;
}
You should either include a while loop to make sure each index of the array is filled before moving to the next or you could make use of the return value of Arrays.binarySearch which is (-(insertion index) - 1) to fill the array and exit when the array is filled up.
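Since binary search only works on sorted data, one way to follow this suggestion while still keeping the letters in the order they were drawn is to maintain a second, sorted array just for the lookups. This is a sketch under that assumption; the class name SortedSeen and the two-array layout are not from the answer above.

```java
import java.util.Arrays;

final class SortedSeen {
    static char[] randomAlphabet() {
        char[] result = new char[26]; // letters in the order they were drawn
        char[] seen = new char[26];   // the same letters, kept sorted for lookups
        int count = 0;
        while (count < 26) {
            char c = (char) ('A' + (int) (Math.random() * 26));
            // Search only the filled part of the sorted array.
            int pos = Arrays.binarySearch(seen, 0, count, c);
            if (pos >= 0) {
                continue; // already present, draw again
            }
            int insertAt = -pos - 1; // decode (-(insertion point) - 1)
            // Shift the larger letters right by one to make room.
            System.arraycopy(seen, insertAt, seen, insertAt + 1, count - insertAt);
            seen[insertAt] = c;
            result[count] = c;
            count++;
        }
        return result;
    }
}
```

For 26 letters the gain over a plain loop is negligible, so this mainly illustrates how the negative return value encodes the insertion point.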