Binary search in Java - learning it "my way"


So I'm trying to teach myself how to implement a binary search in Java, as the topic might have given away, but am having some trouble.
See, I tend to be a little stubborn, and I'd rather not just copy some implementation off the internet.
In order to teach myself this, I created a very (VERY) rough little class which looks as follows:
public class bSearch{
/**
* @param args
*/
public static void main(String[] args) {
int one = 1;
int two = 2;
int three = 3;
int four = 4;
int five = 5;
int six = 6;
ArrayList tab = new ArrayList();
tab.add(one);
tab.add(two);
tab.add(three);
tab.add(four);
tab.add(five);
tab.add(six);
System.out.println(bSearch(tab, 53));
}
@SuppressWarnings({ "rawtypes", "unchecked" })
public static int bSearch(ArrayList tab, int key) {
if (tab.size() == 0)
return 0;
if ((int) tab.get(tab.size() / 2) == key)
return key;
ArrayList smallerThanKey = new ArrayList();
ArrayList largerThanKey = new ArrayList();
for (int i = 0; i < (tab.size() + 1) / 2; i++) {
smallerThanKey.add(tab.get(i));
}
System.out.println("Smaller array = " + smallerThanKey);
for (int i = (tab.size() + 1) / 2; i < tab.size(); i++) {
largerThanKey.add(tab.get(i));
}
System.out.println("Larger array = " + largerThanKey);
if (key < (int) tab.get(tab.size() / 2)) {
bSearch(smallerThanKey, key);
} else {
bSearch(largerThanKey, key);
}
return key;
}
}
As you can see, it's pretty far from beautiful, but it's clear enough for a noobie like myself to understand, anyway.
Now, here's the problem: when I feed it a number that is in the ArrayList, it feeds the number back to me (hurray!), but when I feed it a number that's not in the ArrayList, it still feeds my number back to me (boo!).
I have a feeling my error is very minor, but I just can't see it.
Or am I all wrong, and there is some larger fundamental error?
Your help is deeply appreciated!
UPDATE
Thanks for all the constructive comments and answers! Many helpful pointers in the right direction from several of you. +1 for everyone who bumped me along the right path.
By following the advice you gave, mostly relating to my recursions not ending properly, I added a few return statements, as follows:
if (key < (int) tab.get(tab.size() / 2)) {
return bSearch(smallerThanKey, key);
} else {
return bSearch(largerThanKey, key);
}
Now, this is one step closer to what I want to achieve.
I now get 0 if the number is nowhere to be found, and the number itself if it is to be found. Thus progress is being made!
However, it does not work if I have it search for a negative number or zero (not that I know why I should, but just throwing that out there).
Is there a fix for this, or am I barking up the wrong tree in questioning?

EDIT
Just as a quick solution to the exact question you're asking: you need to change the last few lines to the following
if (key < (int) tab.get(tab.size() / 2)) {
return bSearch(smallerThanKey, key);
} else {
return bSearch(largerThanKey, key);
}
}
Having said that, let me point out a few more issues that I see here:
(a) You can use generics. That is, use ArrayList<Integer> rather than just ArrayList; this will save you from all those casts.
(b) Instead of returning the value that you found, you'd be better off returning the index in the ArrayList where the value is located, or -1 if it was not found. Here's why: returning the key provides the caller with very little new information. I mean, the caller already knows what the key is. If you return the index of the key, you let the caller know whether the key was found and, if it was, where in the list it resides.
(c) You are essentially copying the entire list each time you go into bSearch(): you copy roughly half of the list into smallerThanKey and (roughly) half into largerThanKey. This means that the complexity of this implementation is not O(log n) but instead O(n).
(EDIT #2)
Summarizing points (a), (b), (c) here's how one could write that method:
public static int bSearch(ArrayList<Integer> tab, int key) {
return bSearch(tab, 0, tab.size(), key);
}
public static int bSearch(ArrayList<Integer> tab, int begin, int end, int key) {
int size = end - begin;
if (size <= 0)
return -1;
int midPoint = (begin + end) / 2;
int midValue = tab.get(midPoint);
if (midValue == key)
return midPoint;
if (key < midValue) {
return bSearch(tab, begin, midPoint, key);
} else {
return bSearch(tab, midPoint + 1, end, key);
}
}
As you can see, I added a second method that takes begin and end parameters. These parameters tell the method which part of the list it should look at. This is much cheaper than creating a new list and copying elements to it. Instead, the recursive function just uses the same list object and simply calls itself with new begin and end values.
The return value is now the index of the key inside the list (or -1 if not found).

Your recursion is not properly ended. At the end of the method you recursively call the bSearch method for the left or right part of the array. At that point you need to return the search result of the recursive call.
The idea of the binary search is: If your current node is not the key, look at the left if the value of the current node is bigger than the key or look at the right if it is smaller. So after looking there you need to return the search result from there.
if (key < (int) tab.get(tab.size() / 2)) {
return bSearch(smallerThanKey, key);
} else {
return bSearch(largerThanKey, key);
}
As a side remark, have a look at System.arraycopy. Also, it is always a good idea not to suppress warnings.

I think the issue is here:
if (key < (int) tab.get(tab.size() / 2)) {
bSearch(smallerThanKey, key);
} else {
bSearch(largerThanKey, key);
}
return key;
You're just throwing away the result of your recursive call to bSearch and returning key. So it isn't really much of a surprise you get back whatever number you feed into the method.
Remember how binary search is supposed to work -- if the value isn't in the middle, return the result of searching in the left/right half of the array. So you need to do something with those recursive calls....
And with binary search, you really should be more concerned about finding the location of whatever you're looking for, not its value -- you know that already! So what you think was the binary search working right was a bit mistaken -- searching for 1 should have returned 0 -- the index/location of 1.
Also, you shouldn't need to deal with copying arrays and such -- that's an operation that is unnecessary for searches. Just use parameters to indicate where to begin/end searching.
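The point about using begin/end positions instead of copying can also be written without recursion. What follows is only a minimal sketch, not the asker's code: the class name IterativeBinarySearch and the method name indexOf are made up for illustration, and the list is assumed to be sorted in ascending order.

```java
import java.util.Arrays;
import java.util.List;

final class IterativeBinarySearch {
    // Returns the index of key in the sorted list, or -1 if it is absent.
    static int indexOf(List<Integer> sorted, int key) {
        int begin = 0;            // inclusive
        int end = sorted.size();  // exclusive
        while (begin < end) {
            int mid = begin + (end - begin) / 2; // avoids overflow of begin + end
            int midValue = sorted.get(mid);
            if (midValue == key) {
                return mid;
            } else if (key < midValue) {
                end = mid;        // keep searching the left part
            } else {
                begin = mid + 1;  // keep searching the right part
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> tab = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(indexOf(tab, 1));  // prints 0
        System.out.println(indexOf(tab, 53)); // prints -1
    }
}
```

Because the result is an index, searching for zero or a negative number is no longer ambiguous: -1 always means "not found".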

Related

How to convert a recursive program into a non-recursive form (using a stack, not CPS)?

There are many questions about how to convert recursive code to non-recursive code, and I can already convert some recursive programs to non-recursive form.
Note: I use a generalized approach (a user-defined stack) because I think it is easy to understand, and I use Java, so I cannot use a goto keyword.
Things don't always go so well, though: when I meet backtracking, I get stuck. Take the subset problem, for example. My code is here: recursive call with loop
When I use a user-defined stack to turn it into non-recursive form, I do not know how to deal with the loop (the recursive call sits inside the loop).
I googled and found that there are many methods, such as CPS, and I know there is an iterative template for the subset problem, but I only want to solve it with a user-defined stack.
Can someone provide some clues on how to turn this kind of recursion (a recursive call inside a loop) into non-recursive form, using a user-defined stack rather than CPS etc.?
Here is my code: recursive to non-recursive (inorder traversal). Because there is no loop around the recursive call, I could do it easily. When the recursive program has a return value, I can use a reference and pass it to the function as a parameter. In the code, I use the stack to simulate the recursive calls and a "state" variable to mark the next call point (because Java does not allow goto).
The following is the information I have collected. None of it seems to answer the question I asked (some use goto, which Java does not allow; some cover only very simple recursion, with no nested recursive call and no recursive call inside a loop).
1 Old Dominion University
2 codeproject
----------------------------------Split Line--------------------------------------
Thank you all. After I posted the question, it took me all night to figure it out. Here is my solution: non-recursive subset problem solution. The comments in the code explain my idea.
To sum up: what I was stuck on before was how to deal with the for-loop. Actually, we can simply ignore it: because we are using a loop plus a stack, we can do a simple check on whether the conditions are met.

On your stack, have you thought about pushing i (the iteration variable)?
By doing this, when you pop this value, you know at which iteration of the loop you were before you pushed on the stack and therefore, you can iterate to the next i and continue your algorithm.
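To make the idea of pushing i concrete, here is a rough sketch for the subset problem. It is not the asker's linked code (which is not shown here); the class name IterativeSubsets and the variable names are invented for illustration. Each stack entry is just the loop variable of one simulated call frame.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IterativeSubsets {
    static List<List<Integer>> subsets(int[] nums) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        // Each entry is the value of i at which a simulated frame resumes its loop.
        Deque<Integer> loopIndex = new ArrayDeque<>();

        out.add(new ArrayList<>(current)); // work done on entry to the first call
        loopIndex.push(0);                 // the first frame starts its loop at i = 0

        while (!loopIndex.isEmpty()) {
            int i = loopIndex.pop();
            if (i < nums.length) {
                loopIndex.push(i + 1);             // resume this frame at i + 1 later
                current.add(nums[i]);              // choose nums[i]
                out.add(new ArrayList<>(current)); // entry work of the nested call
                loopIndex.push(i + 1);             // the nested frame starts at i + 1
            } else if (!current.isEmpty()) {
                // This frame's loop is finished, so it "returns":
                // undo the choice its caller made before calling it.
                current.remove(current.size() - 1);
            }
        }
        return out;
    }
}
```

For the input {1, 2} this produces [], [1], [1, 2], [2], the same order the recursive version with a for-loop would visit them in.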

Non-negative numbers only for simplicity. (Also no IntFunction.)
The power function, as defined here, is a very simple case.
int power(int x, int exponent) {
if (exponent == 0) {
return 1;
} else if (exponent % 2 == 0) {
int y = power(x, exponent /2);
return y * y;
} else {
return x * power(x, exponent - 1);
}
}
Now the stack records what the recursion did with each result on the way back up, so that the same operations can be replayed, in reverse order, on a partial result.
int power(final int x, int exponent) {
Stack<Function<Integer, Integer>> opStack = new Stack<>();
final Function<Integer, Integer> square = n -> n * n;
final Function<Integer, Integer> multiply = n -> x * n;
while (exponent > 0) {
if (exponent % 2 == 0) {
exponent /= 2;
opStack.push(square);
} else {
--exponent;
opStack.push(multiply);
}
}
int result = 1;
while (!opStack.isEmpty()) {
result = opStack.pop().apply(result);
}
return result;
}
An alternative would be to "encode" the two branches of if-else (odd/even exponent) by a boolean:
int power(final int x, int exponent) {
Stack<Boolean> stack = new Stack<>();
while (exponent > 0) {
boolean even = exponent % 2 == 0;
stack.push(even);
if (even) {
exponent /= 2;
} else {
--exponent;
}
}
int result = 1;
while (!stack.isEmpty()) {
result *= stack.pop() ? result : x;
}
return result;
}
So one has to distinguish:
what one does to prepare the recursive arguments
what one does with the partial results of the recursive calls
how one can merge/handle several recursive calls in the function
exploit nice things, like x being a final constant
Not difficult, puzzling maybe, so have fun.

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. It then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought of recursion. However, with 20 letters this can take quite a long time to execute. So I also implemented a binary search to check whether the current recursion path is a prefix of anything in the dictionary. I do get valid hints as output; however, it's not returning all possible words. Do I have a mistake in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you check each word in a dictionary and see if the characters can make that word. However, I'd like to know how effective my method is compared to that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}

The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree).
Trie data structures offer two relevant methods: isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exists in a dictionary (where n is the number of letters in the argument). This is really fast because it doesn't matter how large your dictionary is.
For comparison, your method for checking if a prefix exists in your dictionary using binary search is O(n*log(m)) where n is the number of letters in the string and m is the number of words in the dictionary.
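For reference, a Trie with those two methods can be quite small. This is only a sketch under assumptions, not the code used in the benchmark: the class layout, the insert method and the HashMap-based child lookup are my own choices for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfWord = true;
    }

    // Walks one node per character; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    boolean isWord(String s) {
        Node node = find(s);
        return node != null && node.endOfWord;
    }

    boolean isPrefix(String s) {
        return find(s) != null;
    }
}
```

Both lookups touch at most one node per character of the argument, which is where the O(n) bound comes from.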
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With 20-char input, the Trie took 9ms. The original code didn't complete in reasonable time so I had to kill it.
Edit:
As to why your code doesn't return all hints, you don't want to break if the prefix is not in your dict. You should continue to check the next prefix instead.
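A sketch of what "continue instead of break" looks like in the recursion. The names HintFinder, hasPrefix and collect are hypothetical, and the prefix check here uses the insertion point returned by Collections.binarySearch on a sorted list rather than the asker's hand-written loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class HintFinder {
    private final List<String> dict; // kept sorted for binary search

    HintFinder(List<String> words) {
        dict = new ArrayList<>(words);
        Collections.sort(dict);
    }

    // True if some dictionary word starts with the given prefix.
    private boolean hasPrefix(String prefix) {
        int pos = Collections.binarySearch(dict, prefix);
        if (pos >= 0) {
            return true; // the prefix is itself a word
        }
        int insertion = -pos - 1; // index of the first word greater than prefix
        return insertion < dict.size() && dict.get(insertion).startsWith(prefix);
    }

    private boolean isWord(String s) {
        return Collections.binarySearch(dict, s) >= 0;
    }

    Set<String> allWords(String letterPool) {
        Set<String> found = new TreeSet<>();
        collect(letterPool, "", found);
        return found;
    }

    private void collect(String pool, String current, Set<String> found) {
        for (int i = 0; i < pool.length(); i++) {
            String next = current + pool.charAt(i);
            if (!hasPrefix(next)) {
                continue; // dead end for this letter only; still try the others
            }
            if (isWord(next)) {
                found.add(next);
            }
            collect(pool.substring(0, i) + pool.substring(i + 1), next, found);
        }
    }
}
```

With break in place of continue, a pool such as "zat" would stop at the first letter (nothing starts with "z") and never reach "at".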

Is there a recommended, faster algorithm?
See Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Given a number, find which numbers below it divide it using recursion

I can't seem to figure this one out. I need to count how many numbers below a given number divide it evenly.
Here is what I've tried:
public int testing(int x) {
if (x == 0) {
System.out.println("zero");
return x;
}
else if ((x % (x-1)) == 0) {
System.out.println("does this work?");
x--;
}
return testing(x-1);
}
That doesn't work and I don't know where to go from here. Anyone know what to do?

This is what is wrong:
public int testing(int x) {
If you want to make it recursive, you need to pass both the number to test and the number that you are currently checking. The first one will not change through the recursion, the second one will decrement. You cannot do what you express with only one parameter (unless you use a global variable).

This is not a task that should be solved with recursion.
If you MUST use recursion, the simplest way to do it is to have a second parameter, which is essentially an "I have checked until this number". Then you can increase/decrease this (depending on if you start at 0 or the initial number) and call the recursive on that.
Thing is, Java isn't a functional language, so doing all this is actually kind of dumb, so whoever gave you this exercise probably needs a bop on the head.

Your problem is that your expression x % (x - 1) is using the "current" value of x, which decrements on every call to the recursive function. Your condition will be false all the way down to 2 % (2 - 1).
Using a for loop is a much better way to handle this task (and look at the Sieve of Eratosthenes), but if you really have to use recursion (for homework), you'll need to pass in the original value being factored as well as the current value being tried.

You have a problem with your algorithm. Notice the recursion only ends when x == 0, meaning that your function will always return 0 (if it returns at all).
In addition, your algorithm doesn't seem to make any sense. You are basically trying to find all factors of a number, but there's only one parameter, x.
Try to make meaningful names for your variables and the logic will be easier to read/follow.
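// initial call: countFactors(n, n - 1, 0)
public int countFactors(int number, int factorToTest, int numFactors)
{
    if (factorToTest == 0) // now you are done
        return numFactors;
    // check if factorToTest is a factor of number
    if (number % factorToTest == 0)
        numFactors++;
    // move on to the next candidate and recurse
    return countFactors(number, factorToTest - 1, numFactors);
}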

There is no need to use recursion here. Here's a non-recursive solution:
public int testing(int n) {
int count = 0;
for (int i = 1; i < n; i++)
if (n % i == 0)
count++;
return count;
}
BTW, you should probably call this something other than testing.

Using recursion:
private static int getFactorCount(int num) {
return getFactorCount(num, num - 1);
}
private static int getFactorCount(int num, int factor) {
return factor == 0 ? 0 : (num % factor == 0 ? 1 : 0)
+ getFactorCount(num, factor - 1);
}
public static void main(String[] args) {
System.out.println(getFactorCount(20)); // gives 5
System.out.println(getFactorCount(30)); // gives 7
}

Find word in dictionary of unknown size using only a method to get a word by index

A few days ago I had an interview at a big company (the name is not important :)), and the interviewer asked me to find a solution to the following task:
Predefined:
There is a dictionary of words of unspecified size; we just know that all words in the dictionary are sorted (for example, alphabetically). We also have just one method:
String getWord(int index) throws IndexOutOfBoundsException
Needs:
We need to develop an algorithm, in Java, that finds a given input word in the dictionary. For this we should implement the method
public boolean isWordInTheDictionary(String word)
Limitations:
We cannot change the internal structure of the dictionary, we have no access to its internal structure, and we do not know the number of elements in the dictionary.
Issues:
I have developed a modified binary search and will publish my (working) variant of the algorithm, but are there other variants with logarithmic complexity? My variant has complexity O(log N).
My variant of implementation:
public class Dictionary {
private static final int BIGGEST_TOP_MASK = 0xF00000;
private static final int LESS_TOP_MASK = 0x0F0000;
private static final int FULL_MASK = 0xFFFFFF;
private String[] data;
private static final int STEP = 100; // for real test step should be Integer.MAX_VALUE
private int shiftIndex = -1;
private static final int LESS_MASK = 0x0000FF;
private static final int BIG_MASK = 0x00FF00;
public Dictionary() {
data = getData();
}
String getWord(int index) throws IndexOutOfBoundsException {
return data[index];
}
public String[] getData() {
return new String[]{"a", "aaaa", "asss", "az", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "test", "u", "v", "w", "x", "y", "z"};
}
public boolean isWordInTheDictionary(String word) {
boolean isFound = false;
int constantIndex = STEP; // predefined step
int flag = 0;
int i = 0;
while (true) {
i++;
if (flag == FULL_MASK) {
System.out.println("Word is not found ... Steps " + i);
break;
}
try {
String data = getWord(constantIndex);
if (null != data) {
int compareResult = word.compareTo(data);
if (compareResult > 0) {
if ((flag & LESS_MASK) == LESS_MASK) {
constantIndex = prepareIndex(false, constantIndex);
if (shiftIndex == 1)
flag |= BIGGEST_TOP_MASK;
} else {
constantIndex = constantIndex * 2;
}
flag |= BIG_MASK;
} else if (compareResult < 0) {
if ((flag & BIG_MASK) == BIG_MASK) {
constantIndex = prepareIndex(true, constantIndex);
if (shiftIndex == 1)
flag |= LESS_TOP_MASK;
} else {
constantIndex = constantIndex / 2;
}
flag |= LESS_MASK;
} else {
// YES!!! We found word.
isFound = true;
System.out.println("Steps " + i);
break;
}
}
} catch (IndexOutOfBoundsException e) {
if (flag > 0) {
constantIndex = prepareIndex(true, constantIndex);
flag |= LESS_MASK;
} else constantIndex = constantIndex / 2;
}
}
return isFound;
}
private int prepareIndex(boolean isBiggest, int constantIndex) {
shiftIndex = (int) Math.ceil(getIndex(shiftIndex == -1 ? constantIndex : shiftIndex));
if (isBiggest)
constantIndex = constantIndex - shiftIndex;
else
constantIndex = constantIndex + shiftIndex;
return constantIndex;
}
private double getIndex(double constantIndex) {
if (constantIndex <= 1)
return 1;
return constantIndex / 2;
}
}

It sounds like the part they really want you to think about is how to handle the fact that you don't know the size of the dictionary. I think they assume that you can give them a binary search. So the real question is how do you manipulate the range of the search as it progresses.
Once you have found a value in the dictionary that is greater than your search target (or out of bounds), the rest looks like standard binary search. The hard part is how do you optimally expand the range when the target value is greater than the dictionary value that you've looked up. It looks like you are expanding by a factor of 1.5. This could be really problematic with a huge dictionary and a small fixed initial step like you have (100). Think if there were 50 million words how many times your algorithm would have to expand the range upwards if you're searching for 'zebra'.
Here's an idea: use the ordered nature of the collection to your advantage by assuming the first letter of each word is evenly distributed amongst the letters of the alphabet (this will never be true, but without knowing more about the collection of words it's probably the best you can do). Then weight the amount of your range expansion by how far from the end you would expect the dictionary word to be.
So if you took your initial step of 100 and looked up the dictionary word at that index and it was 'aardvark', you would expand your range a lot more for the next step than if it was 'walrus.' Still O(log n) but probably much better for most collections of words.

Here is an alternative implementation that uses Collections.binarySearch. It fails if one of the words in the list starts with the character '\uffff' (that is U+FFFF, which is not a valid Unicode character).
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
@Override public String get( int index )
{
try {
return getWord( index );
} catch( IndexOutOfBoundsException ex ) {
return "\uffff";
}
}
@Override public int size()
{
return Integer.MAX_VALUE;
}
}
public static boolean isWordInTheDictionary( String word )
{
return Collections.binarySearch( new ListProxy(), word ) >= 0;
}
Update: I modified it so that it implements RandomAccess, since the binarySearch in Collections would otherwise use an iterator-based search on such a large list, which would be extremely slow. This should now be decently fast, since the binary search needs only 31 iterations even though the List pretends to be as large as possible.
Here is a slightly modified version that remembers the smallest failed index, so that its proclaimed size converges to the actual size of the dictionary along the way and almost all exceptions are avoided in successive lookups. You would, however, need to create a new ListProxy instance whenever the size of the dictionary could have changed.
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
private int size = Integer.MAX_VALUE;
@Override public String get( int index )
{
try {
if( index < size )
return getWord( index );
} catch( IndexOutOfBoundsException ex ) {
size = index;
}
return "\uffff";
}
@Override public int size()
{
return size;
}
}
private static ListProxy listProxy = new ListProxy();
public static boolean isWordInTheDictionary( String word )
{
return Collections.binarySearch( listProxy , word ) >= 0;
}

You have the right idea, but I think your implementation is overly complicated. You want to do a binary search, but you don't know what the upper bound is. So instead of starting at the middle, you start at index 1 (assuming dictionary indexes start at 0).
If the word you're looking for is "less than" the current dictionary word, halve the distance between the current index and your "low" value. ("low" starts at 0, of course).
If the word you're looking for is "greater than" the word at the index you just examined, then either halve the distance between the current index and your "high" value ("high" starts at 2) or, if index and "high" are the same, double the index.
If doubling the index gives you an out of range exception, you halve the distance between the current value and the doubled value. So if going from 16 to 32 throws an exception, try 24. And, of course, keep track of the fact that 32 is more than the max.
So a search sequence might look like 1, 2, 4, 8, 16, 12, 14 - found!
It's the same concept as a binary search, but rather than starting with low = 0, high = n-1, you start with low = 0, high = 2, and double the high value when you need to. It's still O(log N), although the constant is going to be a bit larger than with a "normal" binary search.
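A rough Java sketch of this doubling idea, under some assumptions: WordSource is a hypothetical stand-in for the interview's dictionary (only getWord is given in the question), out-of-range indexes are assumed to throw IndexOutOfBoundsException, and the dictionary is assumed to hold fewer than Integer.MAX_VALUE entries. It separates the doubling phase from the narrowing phase, which is a slight simplification of the step-by-step description above.

```java
final class UnboundedSearch {
    // Hypothetical stand-in for the dictionary API from the question.
    interface WordSource {
        String getWord(int index) throws IndexOutOfBoundsException;
    }

    static boolean contains(WordSource dict, String word) {
        // Phase 1: double "high" until it is out of range or at/after the target.
        int low = 0;
        int high = 1;
        while (true) {
            String probe = wordOrNull(dict, high);
            if (probe == null || probe.compareTo(word) >= 0) {
                break;
            }
            low = high + 1; // everything up to and including high is too small
            if (high > Integer.MAX_VALUE / 2) {
                high = Integer.MAX_VALUE; // cannot double without overflowing
                break;
            }
            high *= 2;
        }
        // Phase 2: ordinary binary search on [low, high];
        // an out-of-range index is treated as "too big".
        while (low <= high) {
            int mid = low + (high - low) / 2;
            String probe = wordOrNull(dict, mid);
            int cmp = (probe == null) ? 1 : probe.compareTo(word);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return false;
    }

    private static String wordOrNull(WordSource dict, int index) {
        try {
            return dict.getWord(index);
        } catch (IndexOutOfBoundsException e) {
            return null;
        }
    }
}
```

The doubling phase takes O(log N) probes to overshoot, and the narrowing phase is a binary search over a range no larger than about 2N, so the total stays O(log N).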

You can incur a one-time cost of O(n), if you know that the dictionary will not change. You can add all the words in the dictionary to a hashtable, and then any subsequent calls to isWordInDictionary() will be O(1) (in theory).
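Something along these lines, as a sketch only: CachedDictionary and WordSource are made-up names, and the end of the dictionary is detected by the IndexOutOfBoundsException that getWord is declared to throw.

```java
import java.util.HashSet;
import java.util.Set;

final class CachedDictionary {
    // Hypothetical stand-in for the dictionary API from the question.
    interface WordSource {
        String getWord(int index) throws IndexOutOfBoundsException;
    }

    private final Set<String> words = new HashSet<>();

    // One-time O(n) pass: read words until getWord reports the end.
    CachedDictionary(WordSource dict) {
        try {
            for (int i = 0; ; i++) {
                words.add(dict.getWord(i));
            }
        } catch (IndexOutOfBoundsException endReached) {
            // ran past the last word; the cache is complete
        }
    }

    boolean isWordInTheDictionary(String word) {
        return words.contains(word);
    }
}
```

This trades memory for lookup speed, and the cache has to be rebuilt if the dictionary ever changes.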

Use the getWord() API to copy the entire contents of the dictionary into a more sensible data structure (e.g. hash table, trie, perhaps even augmented by a Bloom filter). ;-)

In a different language:
#!/usr/bin/perl
$t=0;
$cur=1;
$under=0;
$EOL=int(rand(1000000))+1;
$TARGET=int(rand(1000000))+1;
if ($TARGET>$EOL)
{
$x=$EOL;
$EOL=$TARGET;
$TARGET=$x;
}
print "Looking for $TARGET with EOL $EOL\n";
sub testWord($)
{
my($a)=@_;
++$t;
return 0 if ($a eq $TARGET);
return -2 if ($a > $EOL);
return 1 if ($a > $TARGET);
return -1;
}
while ($r = testWord($cur))
{
print "Tested $cur, got $r\n";
if ($r == 1) { $over=$cur; }
if ($r == -1) { $under=$cur; }
if ($r == -2) { $over = $cur; }
if ($over)
{
$cur = int(($over-$under)/2)+$under;
$cur++ if ($cur <= $under);
$cur-- if ($cur >= $over);
}
else
{
$cur *= 2;
}
}
print "Found $TARGET at $cur in $t tests\n";
The main benefit of this one is it is a bit simpler to understand. I think it may be more efficient if your first guesses are below the target since I don't think you are taking advantage of the space you have already "searched", but that is just with a quick glance at your code. Since it is looking for numbers for simplicity, it doesn't have to deal with not finding the target, but that is an easy extension.

@Sergii Zagriichuk hope the interview went well. Good luck with that.
I think, just as @alexcoco said, binary search is the answer.
The other options I see are only available if you could extend the dictionary. You could make it slightly better: e.g., you could count the words for each letter and keep track of them, so that you would effectively only have to work on a subset of the words.
Or, as others are saying, implement your own dictionary structure entirely.
I know this doesn't answer your question properly, but I cannot see other possibilities.
BTW, it would be nice to see your algorithm.
EDIT:
Expanding on my comment under answer of bshields...
@Sergii Zagriichuk even better, I think, would be to remember the last index where we got null (no word). Then at each run you could check whether that is still true. If not, expand the range to a 'previous index' obtained by reversing the binary search behaviour, so that we get null again. This way you would always adjust the size of the range of your search algorithm, adapting to the current state of the dictionary as needed. Plus, the changes would have to be significant to cause a range adjustment, so the adjustment wouldn't have any real negative impact on the algorithm. Also, dictionaries tend to be static in nature, so this should work :)

On the one hand, yes, you are right to implement a binary search. On the other hand, if the dictionary is static and does not change between lookups, we could suggest a different algorithm. There is a common problem here: sorting/searching strings differs from sorting/searching an int array, because getWord(int i).compareTo(string) is O(min(length0, length1)).
Suppose we have requests to find the words w0, w1, ... wN. During these lookups we could build up a tree of indices (some kind of suffix tree would probably be good enough for this task).
On the next lookup request, for the set a1, a2, ... aM, we could decrease the average time by first narrowing the range with a search for the position in the tree.
The problems with this implementation are concurrency and memory usage, so the next step is a strategy to make the search tree smaller.
PS: the main aim was to check the ideas and problems you suggest.

Well, I think the information that the dictionary is sorted can be utilized in a better way.
Say you are looking for the word "Zebra", whereas the first guess returned "abcg".
We can use this information when choosing the index for the second guess. In my case the returned word starts with a, whereas I am looking for something starting with z. So rather than making a static jump, I can make a calculated jump based on the current result and the desired result. If my next jump then takes me to the word "yvu", I know I am very near, so I will make a much smaller jump than in the previous case.

Here is my solution. It uses O(log n) operations. The first part of the code tries to find an estimate of the length, and the second part takes advantage of the fact that the dictionary is sorted and performs a binary search.
boolean isWordInTheDictionary(String word) {
    if (word == null) {
        return false;
    }
    // estimate the length of the dictionary: double len until getWord(len) throws
    int len = 1;
    while (true) {
        try {
            getWord(len);
        } catch (IndexOutOfBoundsException e) {
            // found an upper bound, break from the loop
            break;
        }
        if (len > Integer.MAX_VALUE / 2) {
            // cannot double again without overflowing an int
            len = Integer.MAX_VALUE;
            break;
        }
        len = len * 2;
    }
    // do a modified binary search over [beg, end), using the estimated length
    int beg = 0;
    int end = len;
    while (beg < end) {
        int idx = beg + (end - beg) / 2;
        String tempWrd;
        try {
            tempWrd = getWord(idx);
        } catch (IndexOutOfBoundsException e) {
            // idx is past the real end of the dictionary
            end = idx;
            continue;
        }
        int cmp = word.compareTo(tempWrd);
        if (cmp > 0) {
            beg = idx + 1;
        } else if (cmp < 0) {
            end = idx;
        } else {
            // found the word
            return true;
        }
    }
    return false;
}

Assuming the dictionary is 0-based, I would decompose the search in two parts.
First, given that the index parameter of getWord() is an integer, and assuming that the index must be a number between 0 and the maximum positive integer, perform a binary search over that range in order to find the maximum valid index (irrespective of the word values). This operation is O(log N) in the size of that range (about 31 probes for Java's int), since it is a simple binary search.
Once obtained the size of the dictionary, a second ordinary binary search (again of complexity O(log N)) will bring on the desired answer.
Since O(log N)+O(log N) is O(log N), this algorithm complies with your requirement.
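The two phases could look roughly like this. Again a sketch rather than a definitive implementation: SizeThenSearch and WordSource are hypothetical names, and an invalid index is assumed to throw IndexOutOfBoundsException as stated in the question.

```java
final class SizeThenSearch {
    // Hypothetical stand-in for the dictionary API from the question.
    interface WordSource {
        String getWord(int index) throws IndexOutOfBoundsException;
    }

    // Phase 1: binary search over [0, Integer.MAX_VALUE] for the first invalid index.
    static int size(WordSource dict) {
        int low = 0;
        int high = Integer.MAX_VALUE; // invariant: low <= size <= high
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (isValidIndex(dict, mid)) {
                low = mid + 1; // the size is larger than mid
            } else {
                high = mid;    // the size is at most mid
            }
        }
        return low;
    }

    // Phase 2: ordinary binary search over [0, size).
    static boolean contains(WordSource dict, String word) {
        int low = 0;
        int high = size(dict) - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int cmp = dict.getWord(mid).compareTo(word);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return false;
    }

    private static boolean isValidIndex(WordSource dict, int index) {
        try {
            dict.getWord(index);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }
}
```

If the dictionary does not change, the result of size() could be cached so that only the second phase runs on later lookups.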

I'm in a hiring process that asked me this same problem...
My approach was a bit different, and considering the dictionary (webservice) I have, it's about 30% more efficient (for the words I've tested).
Here is the solution:
https://github.com/gustavompo/wordfinder
I'll not post the whole solution here because it's decoupled through classes and methods, but the core algorithm is this:
public WordFindingResult FindWord(string word)
{
var callsCount = 0;
var lowerLimit = new WordFindingLimit(0, null);
var upperLimit = new WordFindingLimit(int.MaxValue, null);
var wordToFind = new Word(word);
var wordIndex = _initialIndex;
while (callsCount <= _maximumCallsCount)
{
if (CouldNotFindWord(lowerLimit, upperLimit))
return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.NOT_FOUND);
var wordFound = RetrieveWordAt(wordIndex);
callsCount++;
if (wordToFind.Equals(wordFound))
return new WordFindingResult(callsCount, wordIndex, wordFound.OriginalWordString);
else if (IsIndexTooHigh(wordToFind, wordFound))
{
upperLimit = new WordFindingLimit(wordIndex, wordFound);
wordIndex = IndexConsideringTooHighPreviousResult(lowerLimit, wordIndex);
}
else
{
lowerLimit = new WordFindingLimit(wordIndex, wordFound);
wordIndex = IndexConsideringTooLowPreviousResult(lowerLimit, upperLimit, wordToFind);
}
}
return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.CALLS_LIMIT_EXCEEDED);
}
private int IndexConsideringTooHighPreviousResult(WordFindingLimit maxLowerLimit, int current)
{
return BinarySearch(maxLowerLimit.Index, current);
}
private int IndexConsideringTooLowPreviousResult(WordFindingLimit maxLowerLimit, WordFindingLimit minUpperLimit, Word target)
{
if (AreLowerAndUpperLimitsDefined(maxLowerLimit, minUpperLimit))
return BinarySearch(maxLowerLimit.Index, minUpperLimit.Index);
var scoreByIndexPosition = maxLowerLimit.Index / maxLowerLimit.Word.Score;
var indexOfTargetBasedInScore = (int)(target.Score * scoreByIndexPosition);
return indexOfTargetBasedInScore;
}

Huffman Tree Encoding

My Huffman tree which I had asked about earlier has another problem! Here is the code:
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public Frequency buildTree(ArrayList<Frequency> al)
{
Frequency r = al.get(1);
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
/*while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}*/
for(int i = 0; i < al.size() - 1; i++)
{
Frequency p = pq.remove();
Frequency q = pq.remove();
int temp = p.getFreq() + q.getFreq();
r = new Frequency(null, temp);
r.left = p;
r.right = q;
pq.add(r); // put in the correct place in the priority queue
}
pq.remove(); // leave the priority queue empty
return(r); // this is the root of the tree built
}
public void inOrder(Frequency top)
{
if(top == null)
{
return;
}
else
{
inOrder(top.left);
System.out.print(top.getString() +", ");
inOrder(top.right);
return;
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
What I need to do now is create a method that searches through the tree to find the binary code (011001 etc.) for a specific character. What is the best way to do this? I thought maybe I would do a normal search through the tree as if it were an AVL tree, going right if the value is bigger or left if it's smaller.
But the nodes don't hold ints, doubles, etc.; they only hold objects that contain a character as a string, or null to signify that the node is not a leaf but only a root. The other option would be an in-order run through the tree to find the leaf that I'm looking for, but then how would I determine how many times I went right or left to get to the character?
package huffman;
public class Frequency implements Comparable {
private String s;
private int n;
public Frequency left;
public Frequency right;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
@Override
public int compareTo(Object arg0) {
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
}
What I'm trying to do is find the binary code that leads to each character. So if I were trying to encode aabbbcccc, how would I create a string holding the binary code for a, where going left is 0 and going right is 1?
What has me confused is that you can't determine where anything is, because the tree is obviously unbalanced and there is no way to tell whether a character is to the right or left of where you are. So you have to search through the whole tree, but if you get to a node that isn't what you are looking for, you have to backtrack to another root to get to the other leaves.

Traverse through the huffman tree nodes to get a map like {'a': "1001", 'b': "10001"} etc. You can use this map to get the binary code to a specific character.
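A minimal sketch of that traversal, not taken from the question's code: the `Node` class is a hypothetical stand-in for `Frequency` (leaves carry a symbol, internal nodes carry null), and it assumes the question's convention that going left is 0 and going right is 1.

```java
import java.util.Map;
import java.util.TreeMap;

class HuffmanCodeTable {

    /** Hypothetical stand-in for the question's Frequency class. */
    static class Node {
        final String symbol; // null for internal nodes
        final Node left;
        final Node right;

        Node(String symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Walks the whole tree once and records the path to every leaf. */
    static Map<String, String> buildCodes(Node root) {
        Map<String, String> codes = new TreeMap<>();
        if (root != null) {
            collect(root, "", codes);
        }
        return codes;
    }

    private static void collect(Node node, String path, Map<String, String> codes) {
        if (node.isLeaf()) {
            // A tree holding a single symbol would otherwise get an empty code.
            codes.put(node.symbol, path.isEmpty() ? "0" : path);
            return;
        }
        // Assumed convention: left edge appends 0, right edge appends 1.
        if (node.left != null) {
            collect(node.left, path + "0", codes);
        }
        if (node.right != null) {
            collect(node.right, path + "1", codes);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for "aabbbcccc" (a:2, b:3, c:4):
        // c hangs off the root's left, a and b share the right subtree.
        Node a = new Node("a", null, null);
        Node b = new Node("b", null, null);
        Node c = new Node("c", null, null);
        Node root = new Node(null, c, new Node(null, a, b));

        Map<String, String> codes = buildCodes(root);
        System.out.println(codes); // {a=10, b=11, c=0}

        StringBuilder encoded = new StringBuilder();
        for (char ch : "aabbbcccc".toCharArray()) {
            encoded.append(codes.get(String.valueOf(ch)));
        }
        System.out.println(encoded); // 10101111110000
    }
}
```

Because the path is passed down as a parameter, no backtracking bookkeeping is needed: each recursive call already knows how it got to its node.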
If you need to do the reverse (decoding), just handle it as a state machine:
state = huffman_root
for each bit
    state = state.leaves[bit]
    if (state.type == 'leaf')
        output(state.data);
        state = huffman_root
Honestly, I didn't look into your code. It ought to be pretty obvious what to do with the fancy tree.
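The state machine can be written in Java along these lines. This is a sketch under assumptions: `Node` is a hypothetical stand-in for the question's `Frequency` class, the bits arrive as a String of '0' and '1' characters, left is 0 and right is 1, and the tree holds at least two symbols so that the root is not itself a leaf.

```java
class HuffmanDecoder {

    /** Hypothetical stand-in for the question's Frequency class. */
    static class Node {
        final String symbol; // null for internal nodes
        final Node left;
        final Node right;

        Node(String symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Follows one edge per bit and emits a symbol each time a leaf is reached. */
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node state = root;
        for (int i = 0; i < bits.length(); i++) {
            // Assumed convention: '0' goes left, anything else goes right.
            state = (bits.charAt(i) == '0') ? state.left : state.right;
            if (state == null) {
                throw new IllegalArgumentException("bit " + i + " leads outside the tree");
            }
            if (state.isLeaf()) {
                out.append(state.symbol);
                state = root; // start over for the next symbol
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built tree with codes c=0, a=10, b=11.
        Node a = new Node("a", null, null);
        Node b = new Node("b", null, null);
        Node c = new Node("c", null, null);
        Node root = new Node(null, c, new Node(null, a, b));

        System.out.println(decode(root, "10101111110000")); // aabbbcccc
    }
}
```

Note that the step along the edge happens before the leaf check, so the symbol completed by the final bit is still emitted.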

Remember, if you have 1001, you will never have a 10010 or 10011. So your basic method looks like this (in pseudocode):
if(input == thisNode.key) return thisNode.value
if(input.endsWith(1)) return search(thisNode.left)
else return search(thisNode.right)
I didn't read your program to figure out how to integrate it, but that's a key element of Huffman encoding in a nutshell.
Try something like this - you're trying to find token. So if you wanted to find the String for "10010", you'd do search(root,"10010")
String search(Frequency top, String token) {
return search(top,token,0);
}
// depending on your tree, you may have to switch top.left and top.right
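String search(Frequency top, String token, int depth) {
// fell off the tree: the token does not match any code
if(top == null) return "NOT FOUND";
// a leaf ends a code; it matches only if every bit of the token was consumed
if(top.left == null && top.right == null) return token.length() == depth ? top.getString() : "NOT FOUND";
// bits ran out on an internal node: the token is only a prefix of a code
if(token.length() == depth) return "NOT FOUND";
if(token.charAt(depth) == '0') return search(top.left,token,depth+1);
else return search(top.right,token,depth+1);
}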

I considered two options when I was having a go at building a Huffman encoding tree.
option 1: use a pointer-based binary tree. I coded most of this and then felt that, to trace up the tree from the leaf to find an encoding, I needed parent pointers. Otherwise, as mentioned in this post, you do a search of the tree, which does not give you the encoding straight away. The disadvantage of the pointer-based tree is that I have to have 3 pointers for every node in the tree, which I thought was too much. The code to follow the pointers is simple, but more complicated than in option 2.
option 2: use an array-based tree to represent the encoding tree that you will use on the run to encode and decode. So if you want the encoding of a character, you find the character in the array. Pretty straightforward: I use a table, so I land right on the leaf. Now I trace up to the root, which is at index 1 in the array. I do a (current_index / 2) for the parent. If the child index is 2 * parent it is a left child, and otherwise (2 * parent + 1) it is a right child.
option 2 was pretty easy to code up, and although the array can have empty spaces, I thought it was better in performance than a pointer-based tree. Besides, identifying the root and a leaf is now a matter of indices rather than object type. ;) This will also be very useful if you have to send your tree!?
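The index arithmetic in option 2 can be illustrated with a short sketch. The array layout (root at index 1, children of node i at 2i and 2i + 1, index 0 unused), the class name and the linear `indexOf` lookup are assumptions made for the example; the answer mentions a lookup table for finding the leaf, which would replace the scan.

```java
class ArrayTreeCodes {

    /** Finds the slot that holds the symbol; a lookup table could replace this scan. */
    static int indexOf(String[] tree, String symbol) {
        for (int i = 1; i < tree.length; i++) {
            if (symbol.equals(tree[i])) {
                return i;
            }
        }
        return -1;
    }

    /** Traces from a node up to the root at index 1 and returns its code. */
    static String codeForIndex(int index) {
        if (index < 1) {
            throw new IllegalArgumentException("index must be at least 1");
        }
        StringBuilder code = new StringBuilder();
        for (int i = index; i > 1; i /= 2) {
            // Even index: left child (0). Odd index: right child (1).
            code.append(i % 2 == 0 ? '0' : '1');
        }
        // Bits were collected from leaf to root, so flip them.
        return code.reverse().toString();
    }

    public static void main(String[] args) {
        // Tree with codes c=0, a=10, b=11; slots 1 and 3 are internal nodes.
        String[] tree = new String[8];
        tree[2] = "c";
        tree[6] = "a";
        tree[7] = "b";

        System.out.println(codeForIndex(indexOf(tree, "a"))); // 10
        System.out.println(codeForIndex(indexOf(tree, "b"))); // 11
        System.out.println(codeForIndex(indexOf(tree, "c"))); // 0
    }
}
```

The cost of this layout is that a deep, unbalanced Huffman tree needs an array sized for the deepest leaf, which is where the empty slots come from.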
Also, you don't search (root, 10110) while decoding the Huffman code. You just walk the tree following the encoded bitstream, going left or right based on each bit, and when you reach a leaf, you output the character.
Hope this was helpful.
Harisankar Krishna Swamy (example)

I guess your homework is either done or very late by now, but maybe this will help someone else.
It's actually pretty simple. You create a tree where 0 goes right and 1 goes left. Reading the stream will navigate you through the tree. When you hit a leaf, you found a letter and start over from the beginning. Like glowcoder said, you will never have a letter on a non-leaf node. The tree also covers every possible sequence of bits. So navigating in this way always works no matter the encoded input.
I had an assignment to write a Huffman encoder/decoder just like you a while ago, and I wrote a blog post with the code in Java and a longer explanation: http://www.byteauthor.com/2010/09/huffman-coding-in-java/
PS. Most of the explanation is about serializing the Huffman tree with the least possible number of bits, but the encoding/decoding algorithms are pretty straightforward in the sources.

Here's a Scala implementation: http://blog.flotsam.nl/2011/10/huffman-coding-done-in-scala.html
