HashMap with a Boolean value or a HashSet - Java

If the values of a HashMap are Booleans, is it worth using a HashSet instead? My question may be confusing, but it's not easy to frame the right words to ask it either. The code below illustrates my problem. It solves the following question: given some cubes, can the cubes be arranged such that an input word can be formed from the top view of the cubes? For example, assume imaginary cubes with only 3 surfaces, where cube1 = {a, b, c} and cube2 = {m, n, o}; then the word "an" can be formed, but the word "ap" cannot. I have two approaches for this question: use a HashMap<Character, Boolean> or use a HashSet. The advantage of the HashMap is that it does not cause a lot of rehashing. The advantage of the HashSet is that the code looks (at least to me) smaller and cleaner. What is the right solution / industry-wide practice in such cases?
OPTION 1: char[][] m is the set of cubes where rows are cubes and columns are surfaces
public static boolean checkWord(char[][] m, String word) {
final Map<Character, Boolean> charAvailable = new HashMap<Character, Boolean>();
char[] chWords = word.toCharArray();
for (char ch : chWords) {
charAvailable.put(ch, true);
}
return findWordExists(m, charAvailable, 0);
}
private static boolean findWordExists (char[][] m, Map<Character, Boolean> charAvailable, int cubeNumber) {
if (cubeNumber == m.length) {
Collection<Boolean> booleanValues = charAvailable.values();
for (boolean available : booleanValues) {
if (available) return false;
}
return true;
}
for (int i = 0; i < m[cubeNumber].length; i++) {
if (charAvailable.get(m[cubeNumber][i]) == Boolean.TRUE) {
charAvailable.put(m[cubeNumber][i], false);
if (findWordExists(m, charAvailable, cubeNumber + 1)) {
return true;
}
charAvailable.put(m[cubeNumber][i], true);
}
}
return false;
}
OPTION 2: char[][] m is the set of cubes where rows are cubes and columns are surfaces
public static boolean checkWord(char[][] m, String word) {
final Set<Character> charAvailable = new HashSet<Character>();
char[] chWords = word.toCharArray();
for (char ch : chWords) {
System.out.println(" adding: " + ch);
charAvailable.add(ch);
}
return findWordExists(m, charAvailable, 0);
}
private static boolean findWordExists (char[][] m, Set<Character> charAvailable, int cubeNumber) {
if (cubeNumber == m.length) {
return charAvailable.isEmpty();
}
for (int i = 0; i < m[cubeNumber].length; i++) {
if (charAvailable.contains(m[cubeNumber][i])) {
charAvailable.remove(m[cubeNumber][i]);
if (findWordExists(m, charAvailable, cubeNumber + 1)) {
return true;
}
charAvailable.add(m[cubeNumber][i]);
}
}
return false;
}

The HashSet will be more memory-efficient and time-efficient, but leaves some ambiguity depending on the application.
Consider the scenario where a program processes many custom objects of some type User, and records their answer to a "yes or no" question. There are 3 possible states a User could be in during this processing:
User says "yes"
User says "no"
User has not been processed yet
Using a HashSet alone (i.e. without an additional data structure), it can be ambiguous whether a User not found in the HashSet has replied "no" or simply has not been processed yet. A HashMap, although less efficient because it must also store Boolean values, would allow you to differentiate between the 3 cases listed above.
Note that in practice there are many cases where the 3rd case can be eliminated (e.g. by iterating through every User instance, so you can assume each processed User is encountered exactly once), and then the HashSet is the appropriate choice.
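To make the three states concrete, here is a minimal sketch (the User type and the field names are just illustrative, not from the question):
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class User {
    final String name;
    User(String name) { this.name = name; }
}

public class AnswerTracking {
    public static void main(String[] args) {
        User alice = new User("alice");
        User bob = new User("bob");

        // Map: absent key = not processed yet, TRUE = said "yes", FALSE = said "no".
        Map<User, Boolean> answers = new HashMap<>();
        answers.put(alice, Boolean.FALSE);
        System.out.println(answers.containsKey(bob)); // false -> bob not processed yet
        System.out.println(answers.get(alice));       // false -> alice answered "no"

        // Set: absence is ambiguous -- "said no" and "not processed" look identical.
        Set<User> saidYes = new HashSet<>();
        System.out.println(saidYes.contains(alice));  // false, but we cannot tell why
    }
}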

The solution that uses a Set will probably be more readable and easier to maintain - for example, you avoid a whole class of problems when you later modify your code, such as "what happens if I put a false value in the map - will it break my code?".
As a side note, Java's HashSet is quite inefficient memory-wise and actually uses a HashMap under the covers. Still, in most cases it is code readability and maintainability that matter most, not implementation details like this.
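A tiny illustration (my own hypothetical snippet, not from the question) of the kind of question the Set version never has to answer: with the map, an explicit false entry and a missing key mean the same thing to the lookup, which is easy to get wrong as the code evolves.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FalseVsAbsent {
    public static void main(String[] args) {
        Map<Character, Boolean> availableMap = new HashMap<>();
        availableMap.put('a', false);  // explicitly marked "used up"
        // An explicit false and a missing key are indistinguishable to this check:
        System.out.println(availableMap.get('a') == Boolean.TRUE); // false
        System.out.println(availableMap.get('z') == Boolean.TRUE); // false

        Set<Character> availableSet = new HashSet<>();
        // With the Set there is only one way to be "not available": absence.
        System.out.println(availableSet.contains('a'));            // false
    }
}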

Related

Find top k frequent words in real time data stream

I am trying to solve an algorithms problem using Java's TreeSet.
The problem is as follows:
Find the top k frequent words in a real-time data stream.
Implement three methods for the TopK class:
TopK(k). The constructor.
add(word). Add a new word.
topk(). Get the current top k frequent words.
My thought was to use a HashMap to remember the frequencies and a TreeSet as a buffer.
My implementation passes most of the cases, except this one:
TopK(10)
add("aw")
add("fb")
add("fb")
topk()
The answer is supposed to be [fb, aw], but mine is [fb, aw, fb].
However, my code passes test cases like:
TopK(10)
add("iiiiii")
add("fb")
add("fb")
topk()
and
TopK(10)
add("fb")
add("fb")
topk()
I have no idea what's wrong, so I printed some values whenever the comparator is called. It gave me this:
aw aw
11111111
fb aw
33333333
fb aw
33333333
fb aw
222222222
fb aw
222222222
Which means the second "fb" was compared to "aw" twice, and then the comparator was done. I have spent hours debugging and found nothing so far.
Here is my implementation:
public class TopK {
int size;
HashMap<String, Integer> map;
TreeSet<String> seen;
public TopK(int k) {
// do intialization if necessary
size = k;
seen = new TreeSet<String>(new Comparator<String>(){
@Override
public int compare(String str1, String str2){
System.out.println(str1 + " "+ str2);
if (str1.equals(str2)){
System.out.println("11111111");
return 0;
}
// important! https://www.jiuzhang.com/qa/7646/
// above 127, Integer objects are no longer cached, so unbox to int before comparing
int number1 = map.get(str1);
int number2 = map.get(str2);
if (number1 != number2){
System.out.println("222222222");
return map.get(str1) - map.get(str2);
} else {
System.out.println("33333333");
return str2.compareTo(str1);
}
}
});
map = new HashMap<String, Integer>();
}
/*
* @param word: A string
* @return: nothing
*/
public void add(String word) {
// write your code here
if (!map.containsKey(word)){
map.put(word, 0);
}
map.put(word, map.get(word) + 1);
if (seen.contains(word)){
seen.remove(word);
seen.add(word);
} else {
seen.add(word);
if (seen.size() > size){
seen.pollFirst();
}
}
}
/*
* @return: the current top k frequent words.
*/
public List<String> topk() {
// Write your code here
List<String> results = new ArrayList<String>();
Iterator it = seen.iterator();
while(it.hasNext()) {
String str = (String)it.next();
results.add(0, str);
}
return results;
}
}
Our first clue is that this case:
aw
fb
fb
fails, but:
iiiii
fb
fb
succeeds.
The difference can only come from the line return str2.compareTo(str1); - when the numbers of appearances are equal, the ordering falls back to (reversed) string comparison (you can check this easily - please do).
The only explanation I can think of is that the contains method of Java's TreeSet has an "optimization": it only searches down to where the element should be according to the comparator. If the tree is ordered and the element is not where it should be, it is treated as not present in the TreeSet (think of a sorted array: you can check for a number in log(n) steps without scanning the whole array, so if the element sits in the wrong position you will miss it).
Notice that you change where the element should be (by updating the map) before calling contains. So look at the second add of "fb": we have fb and aw, both with value 1 in the map, and in the TreeSet they are ordered [fb, aw] (because of the string comparison explained above). Now you change the map so fb has value 2 -> it should now be in the last position, but contains compares it to aw, decides fb should come after it, finds nothing there (aw is the last element), assumes fb does not exist, and just adds it again. That is why you see two comparisons between fb and aw: one for contains and one for add.
Hope that was understandable.
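If that diagnosis is right, a fix is to take the word out of the TreeSet while its frequency (and therefore its position under the comparator) is still the old one, update the map, and only then re-insert it. A minimal sketch of a corrected add, using the same fields as in the question:
public void add(String word) {
    if (!map.containsKey(word)) {
        map.put(word, 0);      // make sure the comparator can always look the word up
    } else {
        // Remove while the comparator still sees the old frequency,
        // so the TreeSet can actually locate the element.
        seen.remove(word);
    }
    map.put(word, map.get(word) + 1);
    seen.add(word);
    if (seen.size() > size) {
        seen.pollFirst();      // drop the current least-frequent entry
    }
}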

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. The app then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought of recursion. However, with 20 letters this can take quite a long time to execute. So I also implemented a binary search to check whether the current recursion path is a prefix of anything in the dictionary. I do get valid hints as output, but it's not returning all possible words. Do I have a mistake in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you check each word in the dictionary and see whether its characters can be made from the available letters. However, I'd like to know how effective my method is vs. that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}
The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree).
Trie data structures offer two relevant methods, isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exists in the dictionary (where n is the number of letters in the argument). This is really fast because it doesn't matter how large your dictionary is.
For comparison, your method for checking if a prefix exists in your dictionary using binary search is O(n*log(m)) where n is the number of letters in the string and m is the number of words in the dictionary.
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With a 20-character input, the Trie took 9 ms. The original code didn't complete in a reasonable time, so I had to kill it.
Edit:
As to why your code doesn't return all hints: you shouldn't break when the prefix is not in your dict - you should continue and check the next prefix instead.
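For reference, here is a minimal Trie sketch with the isWord/isPrefix operations described earlier in this answer (an illustration, not the benchmark code; lowercase a-z only, so adapt the indexing for a larger alphabet):
class Trie {
    private static class Node {
        Node[] children = new Node[26];
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (cur.children[i] == null) cur.children[i] = new Node();
            cur = cur.children[i];
        }
        cur.isWord = true;
    }

    // O(length) checks, independent of dictionary size.
    boolean isWord(String s)   { Node n = find(s); return n != null && n.isWord; }
    boolean isPrefix(String s) { return find(s) != null; }

    private Node find(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.children[c - 'a'];
            if (cur == null) return null;
        }
        return cur;
    }
}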
Is there a recommended, faster algorithm?
See the Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where the "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Find first non-repeating character, cost O(n)

First of all, I know how to find the first non-repeating character if the string contains only ASCII characters, e.g. "abccba..".
The question/problem is: how can I find the first non-repeating character in a string/buffer containing mixed letters? I mean, we don't know what the language is!
Maybe it is English or Arabic, or a mix of the two, and I must do it in O(n).
If we use a HashMap, do get and put cost O(1) [proof]?
What kind of input is it - a String or another container?
Make a Map<Character, Integer> counting the number of occurrences: go through your whole String one character at a time, and add the character to the map if it's not already there (with a count of 1), or increment its count if it is. Then go through your whole String again and return the first character you find that has a count of 1. Counting the number of occurrences is not strictly necessary here - it could be done differently, or stopped at 2 - but it may be useful if you want to extend your code later to do more. Of course you need to take care of potential Unicode problems if you are using Arabic letters, but I think that's not what you're asking about. This solution is O(n): you go through your String twice, and the map operations are cheap.
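A short sketch of that approach (my own illustration, not the answerer's code):
import java.util.HashMap;
import java.util.Map;

public class FirstUnique {
    // Two passes over the string, O(n) overall.
    static Character firstNonRepeating(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        for (int i = 0; i < s.length(); i++) {
            if (counts.get(s.charAt(i)) == 1) {
                return s.charAt(i);
            }
        }
        return null; // every character repeats
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeating("abccbad")); // d
    }
}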
In case the HashMap solution is not good enough, here's what I can think of: make two instances of java.util.BitSet, seenOnce and seenTwice, each with 0x10000 bits. For each char, if it is already in seenOnce, add it to seenTwice; otherwise add it to seenOnce. Then, in a second pass through the string, print out the first character not in seenTwice.
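A sketch of the two-BitSet idea (again my own illustration; it works per char value, so code points outside the BMP would need extra care):
import java.util.BitSet;

public class FirstUniqueBitSet {
    static char firstNonRepeating(String s) {
        BitSet seenOnce = new BitSet(0x10000);
        BitSet seenTwice = new BitSet(0x10000);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (seenOnce.get(c)) {
                seenTwice.set(c);
            } else {
                seenOnce.set(c);
            }
        }
        for (int i = 0; i < s.length(); i++) {
            if (!seenTwice.get(s.charAt(i))) {
                return s.charAt(i);
            }
        }
        throw new IllegalArgumentException("every character repeats");
    }
}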
In C#... O(n) time complexity...
public char GetFirstRecuringChar(string text)
{
if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(text))
{
throw new ArgumentNullException();
}
IDictionary<char, CharIndex> map = new Dictionary<char, CharIndex>();
for (int i = 0; i < text.Length; i++)
{
if (map.ContainsKey(text[i]))
{
map[text[i]].Count++;
}
else
{
CharIndex ci = new CharIndex { Index = i, Ch = text[i], Count = 1 };
map.Add(text[i], ci);
}
}
int lowestIndex = int.MaxValue;
foreach (CharIndex charIndex in map.Values)
{
if (charIndex.Count == 1)
{
if (lowestIndex > charIndex.Index)
{
lowestIndex = charIndex.Index;
}
}
}
char answer = '\n';
if (lowestIndex != int.MaxValue)
{
foreach (CharIndex charIndex in map.Values)
{
if (charIndex.Index == lowestIndex)
{
answer = charIndex.Ch;
break;
}
}
}
return answer;
}
private class CharIndex
{
public char Ch { get; set; }
public int Index { get; set; }
public int Count { get; set; }
}
Unit tests:
[TestCase("ississippitotalm", 'o')]
[TestCase("a", 'a')]
[TestCase("teeter", 'r')]
[TestCase("teeterxyz", 'r')]
[TestCase(".......................................x.................f.........y...xkiuytreeee", 'f')]
public void GetFirstRecuringCharTest(string text, char expectedAnswer)
{
char result = runner.GetFirstRecuringChar(text);
Assert.That(result, Is.EqualTo(expectedAnswer));
}

A more efficient way of finding English words that are one letter off from each other

I wrote a little program that tries to find a connection between two English words of equal length. Word A is transformed into Word B by changing one letter at a time, and each newly created word has to be an English word.
For example:
Word A = BANG
Word B = DUST
Result:
BANG -> BUNG ->BUNT -> DUNT -> DUST
My process:
Load an English word list (consisting of 109582 words) into a Map<Integer, List<String>> _wordMap = new HashMap(); the key is the word length.
The user puts in 2 words.
createGraph creates a graph.
Calculate the shortest path between those 2 nodes.
Print out the result.
Everything works perfectly fine, but I am not satisfied with the time it took in step 3.
See:
Completely loaded 109582 words!
CreateMap took: 30 milsecs
CreateGraph took: 17417 milsecs
(HOISE : HORSE)
(HOISE : POISE)
(POISE : PRISE)
(ARISE : PRISE)
(ANISE : ARISE)
(ANILE : ANISE)
(ANILE : ANKLE)
The wholething took: 17866 milsecs
I am not satisfied with the time it takes to create the graph in step 3. Here's my code for it (I am using JGraphT for the graph):
private List<String> _wordList = new ArrayList(); // list of all 109582 English words
private Map<Integer, List<String>> _wordMap = new HashMap(); // Map grouping all the words by their length()
private UndirectedGraph<String, DefaultEdge> _wordGraph =
new SimpleGraph<String, DefaultEdge>(DefaultEdge.class); // Graph used to calculate the shortest path from one node to the other.
private void createGraph(int wordLength){
long before = System.currentTimeMillis();
List<String> words = _wordMap.get(wordLength);
for(String word:words){
_wordGraph.addVertex(word); // adds a node
for(String wordToTest : _wordList){
if (isSimilar(word, wordToTest)) {
_wordGraph.addVertex(wordToTest); // adds another node
_wordGraph.addEdge(word, wordToTest); // connecting 2 nodes if they are one letter off from each other
}
}
}
System.out.println("CreateGraph took: " + (System.currentTimeMillis() - before)+ " milsecs");
}
private boolean isSimilar(String wordA, String wordB) {
if(wordA.length() != wordB.length()){
return false;
}
int matchingLetters = 0;
if (wordA.equalsIgnoreCase(wordB)) {
return false;
}
for (int i = 0; i < wordA.length(); i++) {
if (wordA.charAt(i) == wordB.charAt(i)) {
matchingLetters++;
}
}
if (matchingLetters == wordA.length() - 1) {
return true;
}
return false;
}
My question:
How can I improve my algorithm in order to speed up the process?
For any redditors that are reading this, yes I created this after seeing the thread from /r/askreddit yesterday.
Here's a starting thought:
Create a Map<String, List<String>> (or a Multimap<String, String> if you're using Guava), and for each word, "blank out" one letter at a time, and add the original word to the list for that blanked-out word. So you'd end up with:
.ORSE => NORSE, HORSE, GORSE (etc)
H.RSE => HORSE
HO.SE => HORSE, HOUSE (etc)
At that point, given a word, you can very easily find all the words it's similar to - just go through the same process again, but instead of adding to the map, just fetch all the values for each "blanked out" version.
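A rough sketch of that index (my own illustration; the '.' placeholder and the method names are arbitrary):
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NeighbourIndex {
    // Build the "blanked out" pattern -> words map once.
    static Map<String, List<String>> buildIndex(List<String> words) {
        Map<String, List<String>> index = new HashMap<>();
        for (String word : words) {
            for (int i = 0; i < word.length(); i++) {
                String pattern = word.substring(0, i) + '.' + word.substring(i + 1);
                index.computeIfAbsent(pattern, k -> new ArrayList<>()).add(word);
            }
        }
        return index;
    }

    // All words one letter away from `word` (the word itself is included; filter it out if needed).
    static List<String> similar(String word, Map<String, List<String>> index) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            String pattern = word.substring(0, i) + '.' + word.substring(i + 1);
            result.addAll(index.getOrDefault(pattern, Collections.emptyList()));
        }
        return result;
    }
}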
You probably need to run it through a profiler to see where most of the time is taken, especially since you are using library classes - otherwise you might put in a lot of effort but see no significant improvement.
You could lowercase all the words before you start, to avoid the equalsIgnoreCase() on every comparison. In fact, this is an inconsistency in your code - you use equalsIgnoreCase() initially, but then compare chars in a case-sensitive way: if (wordA.charAt(i) == wordB.charAt(i)). It might be worth eliminating the equalsIgnoreCase() check entirely, since this is doing essentially the same thing as the following charAt loop.
You could change the comparison loop so it finishes early when it finds more than one different letter, rather than comparing all the letters and only then checking how many are matching or different.
(Update: this answer is about optimizing your current code. I realize, reading your question again, that you may be asking about alternative algorithms!)
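A sketch of the comparison-loop change suggested above (assuming the words have already been lowercased and are the same length):
private boolean isSimilar(String wordA, String wordB) {
    int differences = 0;
    for (int i = 0; i < wordA.length(); i++) {
        if (wordA.charAt(i) != wordB.charAt(i)) {
            if (++differences > 1) {
                return false; // more than one letter differs: stop immediately
            }
        }
    }
    return differences == 1;  // exactly one letter off (also rejects identical words)
}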
You can have the list of words of the same length sorted, and then use a nested loop of the form for (int i = 0; i < n; ++i) for (int j = i + 1; j < n; ++j) { }.
And in isSimilar, count the differences and return false as soon as you reach 2.

Find word in dictionary of unknown size using only a method to get a word by index

A few days ago I had an interview at some big company (the name is not required :)), and the interviewer asked me to find a solution to the following task:
Predefined:
There is a dictionary of words of unspecified size; we just know that all the words in the dictionary are sorted (for example, alphabetically). Also, we have just one method
String getWord(int index) throws IndexOutOfBoundsException
Needs:
We need to develop an algorithm to find a given input word in the dictionary using Java. For this we should implement the method
public boolean isWordInTheDictionary(String word)
Limitations:
We cannot change the internal structure of the dictionary, we have no access to its internal structure, and we do not know the number of elements in the dictionary.
Issues:
I have developed a modified binary search, and I will publish my (working) variant of the algorithm, but are there other variants with logarithmic complexity? My variant has complexity O(log N).
My variant of implementation:
public class Dictionary {
private static final int BIGGEST_TOP_MASK = 0xF00000;
private static final int LESS_TOP_MASK = 0x0F0000;
private static final int FULL_MASK = 0xFFFFFF;
private String[] data;
private static final int STEP = 100; // for real test step should be Integer.MAX_VALUE
private int shiftIndex = -1;
private static final int LESS_MASK = 0x0000FF;
private static final int BIG_MASK = 0x00FF00;
public Dictionary() {
data = getData();
}
String getWord(int index) throws IndexOutOfBoundsException {
return data[index];
}
public String[] getData() {
return new String[]{"a", "aaaa", "asss", "az", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "test", "u", "v", "w", "x", "y", "z"};
}
public boolean isWordInTheDictionary(String word) {
boolean isFound = false;
int constantIndex = STEP; // predefined step
int flag = 0;
int i = 0;
while (true) {
i++;
if (flag == FULL_MASK) {
System.out.println("Word is not found ... Steps " + i);
break;
}
try {
String data = getWord(constantIndex);
if (null != data) {
int compareResult = word.compareTo(data);
if (compareResult > 0) {
if ((flag & LESS_MASK) == LESS_MASK) {
constantIndex = prepareIndex(false, constantIndex);
if (shiftIndex == 1)
flag |= BIGGEST_TOP_MASK;
} else {
constantIndex = constantIndex * 2;
}
flag |= BIG_MASK;
} else if (compareResult < 0) {
if ((flag & BIG_MASK) == BIG_MASK) {
constantIndex = prepareIndex(true, constantIndex);
if (shiftIndex == 1)
flag |= LESS_TOP_MASK;
} else {
constantIndex = constantIndex / 2;
}
flag |= LESS_MASK;
} else {
// YES!!! We found word.
isFound = true;
System.out.println("Steps " + i);
break;
}
}
} catch (IndexOutOfBoundsException e) {
if (flag > 0) {
constantIndex = prepareIndex(true, constantIndex);
flag |= LESS_MASK;
} else constantIndex = constantIndex / 2;
}
}
return isFound;
}
private int prepareIndex(boolean isBiggest, int constantIndex) {
shiftIndex = (int) Math.ceil(getIndex(shiftIndex == -1 ? constantIndex : shiftIndex));
if (isBiggest)
constantIndex = constantIndex - shiftIndex;
else
constantIndex = constantIndex + shiftIndex;
return constantIndex;
}
private double getIndex(double constantIndex) {
if (constantIndex <= 1)
return 1;
return constantIndex / 2;
}
}
It sounds like the part they really want you to think about is how to handle the fact that you don't know the size of the dictionary. I think they assume that you can give them a binary search. So the real question is how do you manipulate the range of the search as it progresses.
Once you have found a value in the dictionary that is greater than your search target (or out of bounds), the rest looks like standard binary search. The hard part is how do you optimally expand the range when the target value is greater than the dictionary value that you've looked up. It looks like you are expanding by a factor of 1.5. This could be really problematic with a huge dictionary and a small fixed initial step like you have (100). Think if there were 50 million words how many times your algorithm would have to expand the range upwards if you're searching for 'zebra'.
Here's an idea: use the ordered nature of the collection to your advantage by assuming the first letter of each word is evenly distributed amongst the letters of the alphabet (this will never be true, but without knowing more about the collection of words it's probably the best you can do). Then weight the amount of your range expansion by how far from the end you would expect the dictionary word to be.
So if you took your initial step of 100 and looked up the dictionary word at that index and it was 'aardvark', you would expand your range a lot more for the next step than if it was 'walrus.' Still O(log n) but probably much better for most collections of words.
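One possible, purely illustrative way to turn that into numbers - the helper name and the uniform-first-letter assumption are mine, not the answerer's:
// Guess the next upper bound, given the word found at currentIndex and the target word.
static int nextUpperBound(int currentIndex, String found, String target) {
    int foundLetter = Character.toLowerCase(found.charAt(0)) - 'a';
    int targetLetter = Character.toLowerCase(target.charAt(0)) - 'a';
    // Assume first letters are evenly spread: estimate how far into the
    // dictionary we are, extrapolate its total size, then aim at the
    // region where the target's first letter should live.
    double fractionCovered = (foundLetter + 1) / 26.0;
    double estimatedSize = currentIndex / fractionCovered;
    int guess = (int) (estimatedSize * (targetLetter + 1) / 26.0);
    return Math.max(guess, currentIndex * 2); // never expand slower than plain doubling
}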
Here is an alternative implementation that uses Collections.binarySearch. It fails if one of the words in the list starts with the character '\uffff' (that is, Unicode 0xFFFF, which is not a valid Unicode character).
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
@Override public String get( int index )
{
try {
return getWord( index );
} catch( IndexOutOfBoundsException ex ) {
return "\uffff";
}
}
@Override public int size()
{
return Integer.MAX_VALUE;
}
}
public static boolean isWordInTheDictionary( String word )
{
return Collections.binarySearch( new ListProxy(), word ) >= 0;
}
Update: I modified it so that it implements RandomAccess, since the binarySearch in Collections would otherwise use an iterator-based search on such a large list, which would be extremely slow. It should now be decently fast, since the binary search needs only 31 iterations even though the List pretends to be as large as possible.
Here is a slightly modified version that remembers the smallest failed index in order to converge its proclaimed size to the actual size of the dictionary en passant, and thus avoids almost all exceptions in subsequent lookups. However, you would need to create a new ListProxy instance whenever the size of the dictionary could have changed.
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
private int size = Integer.MAX_VALUE;
@Override public String get( int index )
{
try {
if( index < size )
return getWord( index );
} catch( IndexOutOfBoundsException ex ) {
size = index;
}
return "\uffff";
}
@Override public int size()
{
return size;
}
}
private static ListProxy listProxy = new ListProxy();
public static boolean isWordInTheDictionary( String word )
{
return Collections.binarySearch( listProxy , word ) >= 0;
}
You have the right idea, but I think your implementation is overly complicated. You want to do a binary search, but you don't know what the upper bound is. So instead of starting at the middle, you start at index 1 (assuming dictionary indexes start at 0).
If the word you're looking for is "less than" the current dictionary word, halve the distance between the current index and your "low" value. ("low" starts at 0, of course).
If the word you're looking for is "greater than" the word at the index you just examined, then either halve the distance between the current index and your "high" value ("high" starts at 2) or, if index and "high" are the same, double the index.
If doubling the index gives you an out of range exception, you halve the distance between the current value and the doubled value. So if going from 16 to 32 throws an exception, try 24. And, of course, keep track of the fact that 32 is more than the max.
So a search sequence might look like 1, 2, 4, 8, 16, 12, 14 - found!
It's the same concept as a binary search, but rather than starting with low = 0, high = n-1, you start with low = 0, high = 2, and double the high value when you need to. It's still O(log N), although the constant is going to be a bit larger than with a "normal" binary search.
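A compact sketch of that doubling-then-binary-search idea, written against the getWord(int) API from the question (this is my own variant of the approach, not the answer's exact index-halving scheme):
public boolean isWordInTheDictionary(String word) {
    // Phase 1: grow `high` exponentially until getWord(high) is at or past the
    // target, or falls off the end of the dictionary.
    int low = 0;
    int high = 1;
    while (true) {
        try {
            if (getWord(high).compareTo(word) >= 0) break;
            low = high;
            high *= 2; // a truly huge dictionary would need overflow handling here
        } catch (IndexOutOfBoundsException e) {
            break; // high is beyond the end
        }
    }
    // Phase 2: ordinary binary search on [low, high], treating out-of-bounds
    // indexes as "greater than any word".
    while (low <= high) {
        int mid = low + (high - low) / 2;
        int cmp;
        try {
            cmp = getWord(mid).compareTo(word);
        } catch (IndexOutOfBoundsException e) {
            cmp = 1;
        }
        if (cmp == 0) return true;
        if (cmp < 0) low = mid + 1;
        else high = mid - 1;
    }
    return false;
}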
You can incur a one-time cost of O(n), if you know that the dictionary will not change. You can add all the words in the dictionary to a hashtable, and then any subsequent calls to isWordInDictionary() will be O(1) (in theory).
Use the getWord() API to copy the entire contents of the dictionary into a more sensible data structure (e.g. hash table, trie, perhaps even augmented by a Bloom filter). ;-)
In a different language:
#!/usr/bin/perl
$t=0;
$cur=1;
$under=0;
$EOL=int(rand(1000000))+1;
$TARGET=int(rand(1000000))+1;
if ($TARGET>$EOL)
{
$x=$EOL;
$EOL=$TARGET;
$TARGET=$x;
}
print "Looking for $TARGET with EOL $EOL\n";
sub testWord($)
{
my($a)=@_;
++$t;
return 0 if ($a eq $TARGET);
return -2 if ($a > $EOL);
return 1 if ($a > $TARGET);
return -1;
}
while ($r = testWord($cur))
{
print "Tested $cur, got $r\n";
if ($r == 1) { $over=$cur; }
if ($r == -1) { $under=$cur; }
if ($r == -2) { $over = $cur; }
if ($over)
{
$cur = int(($over-$under)/2)+$under;
$cur++ if ($cur <= $under);
$cur-- if ($cur >= $over);
}
else
{
$cur *= 2;
}
}
print "Found $TARGET at $r in $t tests\n";
The main benefit of this one is that it is a bit simpler to understand. I think it may be more efficient if your first guesses are below the target, since I don't think you are taking advantage of the space you have already "searched", but that is just from a quick glance at your code. Since it is looking for numbers for simplicity, it doesn't have to deal with not finding the target, but that is an easy extension.
@Sergii Zagriichuk, hope the interview went well. Good luck with that.
I think, just as @alexcoco said, binary search is the answer.
Other options I see are only available if you could extend the dictionary. You could make it slightly better: e.g. you could count the words starting with each letter and keep track of those counts; this way you would effectively have to work only on a subset of the words.
Or, yes, as others are saying, entirely implement your own dictionary structure.
I know this doesn't answer your question properly, but I cannot see other possibilities.
BTW, it would be nice to see your algorithm.
EDIT:
Expanding on my comment under bshields' answer...
@Sergii Zagriichuk, even better would be to remember the last index where we got null (no word), I think. Then on each run you could check whether that is still true. If not, expand the range to a 'previous index' obtained by reversing the binary search behaviour, so that we get null again. This way you would always adjust the size of your search range, adapting to the current state of the dictionary as needed. The changes would have to be significant in order to trigger the range adjustment, so the adjustment wouldn't have any real negative impact on the algorithm. Also, dictionaries tend to be static in nature, so this should work :)
On one hand, yes, you are right about the binary search implementation. But on the other hand, if the dictionary is static and does not change between lookups, we could suggest a different algorithm. Here we have a common problem: sorting/searching strings is different from sorting/searching an int array, so getWord(i).compareTo(word) is O(min(length0, length1)).
Suppose we have a request to find words w0, w1, ... wN; during the lookup we could build up a tree with indices (probably some suffix tree would be good enough for this task).
During the next lookup request, for a set a1, a2, ... aM, we could first narrow the range by searching for the position in that tree, to decrease the average time.
The problem with this implementation is concurrency and memory usage, so the next step is implementing a strategy to make the search tree smaller.
PS: the main aim was to check the ideas and problems you suggested.
Well, I think the fact that the dictionary is sorted can be utilized in a better way.
Say you are looking for the word "zebra", whereas the first guess resulted in "abcg".
We can use this information in choosing the second guess index. In my example, the returned word starts with 'a' whereas I am looking for something starting with 'z', so rather than making a static jump I can make a calculated jump based on the current result and the desired result. If my next jump then takes me to the word "yvu", I know I am very near, so I will make a rather smaller jump than in the previous case.
Here is my solution. It uses O(log n) operations. The first part of the code tries to find an estimate of the length, and the second part takes advantage of the fact that the dictionary is sorted and performs a binary search.
boolean isWordInTheDictionary(String word){
if (word == null){
return false;
}
// estimate the length of the dictionary array
int len = 2;
String temp = getWord(len);
while(true){
len = len * 2;
try{
temp = getWord(len);
}catch(IndexOutOfBoundsException e){
// found upper bound, break out of the loop
break;
}
}
// Do a modified binary search using the estimated length
int beg = 0;
int end = len;
String tempWrd;
while(true){
System.out.println(String.format("beg: %s, end=%s, (beg+end)/2=%s ", beg,end,(beg+end)/2));
if(end - beg <= 1){
return false;
}
int idx = (beg + end) / 2;
tempWrd = getWord(idx);
if(tempWrd == null){
end=idx;
continue;
}
if ( word.compareTo(tempWrd) > 0){
beg = idx;
}
else if(word.compareTo(tempWrd) < 0){
end= idx;
}else{
// found the word..
System.out.println(String.format("getword at index: %s, =%s", idx,getWord(idx)));
return true;
}
}
}
Assuming the dictionary is 0-based, I would decompose the search in two parts.
First, given that the index parameter to getWord() is an integer, and assuming that the index must be a number between 0 and the maximum positive integer, perform a binary search over that range in order to find the maximum valid index (irrespective of the word values). This operation is O(log N), since it is a simple binary search.
Once the size of the dictionary is obtained, a second ordinary binary search (again of complexity O(log N)) will produce the desired answer.
Since O(log N)+O(log N) is O(log N), this algorithm complies with your requirement.
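A sketch of those two phases, again written against the question's getWord(int) (my own illustration; it assumes a non-empty dictionary):
public boolean isWordInTheDictionary(String word) {
    // Phase 1: binary search for the largest valid index, treating
    // IndexOutOfBoundsException as "too far".
    int lo = 0;
    int hi = Integer.MAX_VALUE;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2 + 1; // bias upwards so the loop terminates
        try {
            getWord(mid);
            lo = mid;        // mid is a valid index
        } catch (IndexOutOfBoundsException e) {
            hi = mid - 1;    // mid is past the end
        }
    }
    int maxIndex = lo;       // dictionary size minus one

    // Phase 2: standard binary search over [0, maxIndex].
    int left = 0, right = maxIndex;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        int cmp = getWord(mid).compareTo(word);
        if (cmp == 0) return true;
        if (cmp < 0) left = mid + 1;
        else right = mid - 1;
    }
    return false;
}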
I'm in a hiring process which asked me this same problem...
My approach was a bit different, and considering the dictionary (web service) I have, it's about 30% more efficient (for the words I've tested).
Here is the solution:
https://github.com/gustavompo/wordfinder
I won't post the whole solution here because it's decoupled into classes and methods, but the core algorithm is this:
public WordFindingResult FindWord(string word)
{
var callsCount = 0;
var lowerLimit = new WordFindingLimit(0, null);
var upperLimit = new WordFindingLimit(int.MaxValue, null);
var wordToFind = new Word(word);
var wordIndex = _initialIndex;
while (callsCount <= _maximumCallsCount)
{
if (CouldNotFindWord(lowerLimit, upperLimit))
return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.NOT_FOUND);
var wordFound = RetrieveWordAt(wordIndex);
callsCount++;
if (wordToFind.Equals(wordFound))
return new WordFindingResult(callsCount, wordIndex, wordFound.OriginalWordString);
else if (IsIndexTooHigh(wordToFind, wordFound))
{
upperLimit = new WordFindingLimit(wordIndex, wordFound);
wordIndex = IndexConsideringTooHighPreviousResult(lowerLimit, wordIndex);
}
else
{
lowerLimit = new WordFindingLimit(wordIndex, wordFound);
wordIndex = IndexConsideringTooLowPreviousResult(lowerLimit, upperLimit, wordToFind);
}
}
return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.CALLS_LIMIT_EXCEEDED);
}
private int IndexConsideringTooHighPreviousResult(WordFindingLimit maxLowerLimit, int current)
{
return BinarySearch(maxLowerLimit.Index, current);
}
private int IndexConsideringTooLowPreviousResult(WordFindingLimit maxLowerLimit, WordFindingLimit minUpperLimit, Word target)
{
if (AreLowerAndUpperLimitsDefined(maxLowerLimit, minUpperLimit))
return BinarySearch(maxLowerLimit.Index, minUpperLimit.Index);
var scoreByIndexPosition = maxLowerLimit.Index / maxLowerLimit.Word.Score;
var indexOfTargetBasedInScore = (int)(target.Score * scoreByIndexPosition);
return indexOfTargetBasedInScore;
}
