Using Binary Trees to find Anagrams

Using Binary Trees to find Anagrams - java

I am currently trying to create a method that uses a binary tree that finds anagrams of a word inputted by the user.
If the tree does not contain any other anagram for the word (i.e., if the key was not in the tree or the only element in the associated linked list was the word provided by the user), the message "no anagram found " gets printed
For example, if key "opst" appears in the tree with an associated linked list containing the words "spot", "pots", and "tops", and the user gave the word "spot", the program should print "pots" and "tops" (but not spot).
public boolean find(K thisKey, T thisElement){
return find(root, thisKey, thisElement);
}
public boolean find(Node current, K thisKey, T thisElement){
if (current == null)
return false;
else{
int comp = current.key.compareTo(thisKey);
if (comp>0)
return find(current.left, thisKey, thisElement);
else if(comp<0)
return find(current.right, thisKey, thisElement);
else{
return current.item.find(thisElement);
}
}
}
While I created this method to find if the element provided is in the tree (and the associated key), I was told not to reuse this code for finding anagrams.
K is a generic type that extends Comparable and represents the Key, T is a generic type that represents an item in the list.
If extra methods I've done are required, I can edit this post, but I am absolutely lost. (Just need a pointer in the right direction)

It's a little unclear what exactly is tripping you up (beyond "I've written a nice find method but am not allowed to use it."), so I think the best thing to do is start from the top.
I think you will find that once you get your data structured in just the right way, the actual algorithms will follow relatively easily (many computer science problems share this feature.)
You have three things:
1) Many linked lists, each of which contains the set of anagrams of some set of letters. I am assuming you can generate these lists as you need to.
2) A binary tree, that maps Strings (keys) to lists of anagrams generated from those strings. Again, I'm assuming that you are able to perform basic operations on these treed--adding elements, finding elements by key, etc.
3) A user-inputted String.
Insight: The anagrams of a group of letters form an equivalence class. This means that any member of an anagram list can be used as the key associated with the list. Furthermore, it means that you don't need to store in your tree multiple keys that point to the same list (provided that we are a bit clever about structuring our data; see below).
In concrete terms, there is no need to have both "spot" and "opts" as keys in the tree pointing to the same list, because once you can find the list using any anagram of "spot", you get all the anagrams of "spot".
Structuring your data cleverly: Given our insight, assume that our tree contains exactly one key for each unique set of anagrams. So "opts" maps to {"opts", "pots", "spot", etc.}. What happens if our user gives us a String that we're not using as the key for its set of anagrams? How do we figure out that if the user types "spot", we should find the list that is keyed by "opts"?
The answer is to normalize the data stored in our data structures. This is a computer-science-y way of saying that we enforce arbitrary rules about how we store the data. (Normalizing data is a useful technique that appears repeatedly in many different computer science domains.) The first rule is that we only ever have ONE key in our tree that maps to a given linked list. Second, what if we make sure that each key we actually store is predictable--that is we know that we should search for "opts" even if the user types "spot"?
There are many ways to achieve this predictability--one simple one is to make sure that the letters of every key are in alphabetical order. Then, we know that every set of anagrams will be keyed by the (unique!) member of the set that comes first in alphabetical order. Consistently enforcing this rule makes it easy to search the tree--we know that no matter what string the user gives us, the key we want is the string formed from alphabetizing the user's input.
Putting it together: I'll provide the high-level algorithm here to make this a little more concrete.
1) Get a String from the user (hold on to this String, we'll need it later)
2) Turn this string into a search key that follows our normalization scheme
(You can do this in the constructor of your "K" class, which ensures that you will never have a non-normalized key anywhere in your program.)
3) Search the tree for that key, and get the linked list associated with it. This list contains every anagram of the user's input String.
4) Print every item in the list that isn't the user's original string (see why we kept the string handy?)
Takeaways:
Frequently, your data will have some special features that allow you to be clever. In this case it is the fact that any member of an anagram list can be the sole key we store for that list.
Normalizing your data give you predictability and allows you to reason about it effectively. How much more difficult would the "find" algorithm be if each key could be an arbitrary member of its anagram list?
Corollary: Getting your data structures exactly right (What am I storing? How are the pieces connected? How is it represented?) will make it much easier to write your algorithms.

What about sorting the characters in the words, and then compare that.

Related

Best algorithm and data structure to compare two big lists

Everyday I receive one list of 30-40k lines, each line contains meaningful or meaningless names like fastcar, ultrafastcar, blablablacar etc.
I also have one big list which consists of the all words in any language (about 50k lines).
And i want to compare first list against second in order to filter which includes(or starts with - ends with) the words from the second list. I mean If word "ultrafastcar" then it will not be filtered but "blablacar" will be filtered out.
I have prepared some Java codes but it takes too long to compare lists. I have used ArrayLists and compared them with contains(), startsWith() methods. Are ArrayLists correct choice and what else algorithm can i use to compare them except these methods.

You could try implementing a ternary search tree with the second list and then check if the words in the first exist in the tree.

Fastest way to build an index (list of substrings with lines of occurrence) of a String?

Problem:
Essentially, my goal is to build an ArrayList of IndexEntry objects from a text file. An IndexEntry has the following fields: String word, representing this unique word in the text file, and ArrayList numsList, a list containing the lines of the text file in which word occurs.
The ArrayList I build must keep the IndexEntries sorted so that their word fields are in alphabetical order. However, I want to do this in the fastest way possible. Currently, I visit each word as it appears in the text file and use binary search to determine if an IndexEntry for that word already exists in order to add the current line number to its numsList. In the case of an IndexEntry not existing I create a new one in the appropriate spot in order to maintain alphabetical order.
Example:
_
One
Two
One
Three
_
Would yield an ArrayList of IndexEntries whose output as a String (in the order of word, numsList) is:
One [1, 5], Three [7], Two [3]
Keep in mind that I am working with much larger text files, with many occurrences of the same word.
Question:
Is binary search the fastest way to approach this problem? I am still a novice at programming in Java, and am curious about searching algorithms that might perform better in this scenario or the relative time complexity of using a Hash Table when compared with my current solution.

You could try a TreeMap or a ConcurrentSkipListMap which will keep your index sorted.
However, if you only need a sorted list at the end of your indexing, good old HashMap<String, List> is the way to go (ArrayList as value is probably a safe bet as well)
When you are done, get the values of the map and sort them once by key.
Should be good enough for a couple hundred megabytes of text files.
If you are on Java 8, use the neat computeIfAbsent and computeIfPresent methods.

Comparing sets of randomly assigned codes in Java to assign a name

Good day,
I honestly do not know how to phrase the problem in the title, thus the generic description. Actually I have a set of ~150 codes, which are combined to produce a single string, like this "a_b_c_d". Valid combinations contain 1-4 code combinations plus the '-' character if no value is assigned, and each code is only used once( "a_a..." is not considered valid). These sets of codes are then assigned to a unique name. Not all combinations are logical, but if a combination is valid then the order of the codes does not matter (if "f_g_e_-" is valid, then "e_g_f_-","e_f_-_ g_" is valid, and they all have the same name). I have taken the time and assigned each valid combination to its unique name and tried to create a single parser to read these values and produce the name.
I think the problem is apparent. Since the order does not matter, I have to check for every possible combination. The codes cannot be strictly orderd, since there are some codes who have meaning in any position.So, this is impossible to accomplish with a simple parser. Is there an optimal way to do this, or will I have to force the user to use some kind of order against the standard?

Try using TreehMap to store the code (string) and and its count (int). increment the count for the code every time it is encountered in the string.
After processing the whole string if you find the count for any code > 1 then string has repeated codes and is invalid, else valid.
Traversing TreeMap will be sorted based on key value. Traverse the TreeMap to generate code sequence that will be sorted.

ArrayList Searching Multiple Words

int index = Collections.binarySearch(myList, SearchWord);
System.out.println(myList.get(index));
Actually, i stored 1 million words to Array List, now i need to search the particular word through the key. The result is not a single word, it may contain multiple words.
Example is suppose if i type "A" means output is [Aarhus, Aaron, Ababa,...]. The result depends on searching word. How can i do it and which sorting algorithm is best in Collections.

Options:
If you want to stick to the array list, sort it before you search. Then find first key that matches your search criteria and iterate from it onward until you find a key that does not match. Collect all matching keys into some buffer structure. Bingo, you have your answer.
Change data structure to a tree. Either
a simple binary tree - your get all keys sorted automatically. Navigate the tree in depth first fashion. Until you find a key that does not match.
a fancy trie structure. That way you get your get all keys sorted automatically plus you get a significant performance boost due to efficient storage. Rest is the same, navigate the tree, collect matching keys.

Java - Swapping Values and Keys in one Map?

I am making an encrypt and decrypt program for my programming class, however I am a year ahead of the group so I thought I'd simplify things by using what I learned last year. I decided to use a Tree Map. What the program does is it takes in a file, reads the first line which contains the encrypt data of how the letters will be coded. It is in a format such as "A->B","B->C","C->A" etc. and then a blank line for line 2 and the third line contains the message. I used reg. expressions to remove the char's I do not need from the text file, mapped the Keys to the first letter and then set those values to the arrowed letter. (A is key, B is value) So if the message said ABC, it would become BCA. I am wondering, as for decrypting, if there was a way to easily flip the Keys and Values to where if input was, A key = B val, it would swap to B key = A val. Just looking for an easier method than what I am currently doing with collections and iterators.

Just looking for an easier method than what I am currently doing with collections and iterators.
This is the only way you could possibly do it, the reason being that in a general map, there could be several keys mapping to the same value, in which case there would be no way to automatically determine what to do with duplicate keys in the resulting map.

Sounds like you want a Bi-Directional Map, something like Guava BiMap

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.