Java - Swapping Values and Keys in one Map?

I am making an encrypt and decrypt program for my programming class. I am a year ahead of the group, so I thought I'd simplify things by using what I learned last year and decided to use a TreeMap. The program takes in a file and reads the first line, which contains the encryption data describing how the letters will be coded, in a format such as "A->B","B->C","C->A" etc. Line 2 is blank, and the third line contains the message. I used regular expressions to remove the characters I do not need from the text file, used the first letter of each pair as the key, and set its value to the letter after the arrow (A is the key, B is the value). So if the message said ABC, it would become BCA. For decrypting, I am wondering whether there is a way to easily flip the keys and values, so that if the input was A key = B value, it would swap to B key = A value. Just looking for an easier method than what I am currently doing with collections and iterators.

"Just looking for an easier method than what I am currently doing with collections and iterators."
Iterating over the entries is essentially the only way to do it with a general map. The reason is that in a general map several keys can map to the same value, so there is no way to automatically decide what to do with the duplicate keys that would appear in the resulting map.
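For a one-to-one substitution cipher like the one in the question, the iteration is short. The following is a minimal sketch, not the asker's actual code: the class name CipherMaps and the method invert are made up for illustration, and the choice to throw on a repeated value is an assumption about how duplicates should be handled.

```java
import java.util.Map;
import java.util.TreeMap;

class CipherMaps {
    // Builds the decrypt map by swapping each key with its value.
    // Throws if two keys share a value, because the inverse would be ambiguous.
    static Map<Character, Character> invert(Map<Character, Character> encrypt) {
        Map<Character, Character> decrypt = new TreeMap<>();
        for (Map.Entry<Character, Character> e : encrypt.entrySet()) {
            Character previous = decrypt.put(e.getValue(), e.getKey());
            if (previous != null) {
                throw new IllegalArgumentException("Duplicate value: " + e.getValue());
            }
        }
        return decrypt;
    }

    public static void main(String[] args) {
        Map<Character, Character> encrypt = new TreeMap<>();
        encrypt.put('A', 'B');
        encrypt.put('B', 'C');
        encrypt.put('C', 'A');
        System.out.println(invert(encrypt)); // prints {A=C, B=A, C=B}
    }
}
```

Decrypting then uses the inverted map exactly the way encrypting uses the original one.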

Sounds like you want a bidirectional map, something like Guava's BiMap, whose inverse() view gives you the value-to-key mapping.

Related

Fastest way to build an index (list of substrings with lines of occurrence) of a String?

Problem:
Essentially, my goal is to build an ArrayList of IndexEntry objects from a text file. An IndexEntry has the following fields: String word, representing this unique word in the text file, and ArrayList numsList, a list containing the lines of the text file in which word occurs.
The ArrayList I build must keep the IndexEntries sorted so that their word fields are in alphabetical order. However, I want to do this in the fastest way possible. Currently, I visit each word as it appears in the text file and use binary search to determine if an IndexEntry for that word already exists in order to add the current line number to its numsList. In the case of an IndexEntry not existing I create a new one in the appropriate spot in order to maintain alphabetical order.
Example: a text file with these words on lines 1, 3, 5, and 7 (the lines in between are blank):
One
Two
One
Three
Would yield an ArrayList of IndexEntries whose output as a String (in the order of word, numsList) is:
One [1, 5], Three [7], Two [3]
Keep in mind that I am working with much larger text files, with many occurrences of the same word.
Question:
Is binary search the fastest way to approach this problem? I am still a novice at programming in Java, and am curious about searching algorithms that might perform better in this scenario or the relative time complexity of using a Hash Table when compared with my current solution.
You could try a TreeMap or a ConcurrentSkipListMap which will keep your index sorted.
However, if you only need a sorted list at the end of your indexing, good old HashMap<String, List<Integer>> is the way to go (ArrayList as the value type is probably a safe bet as well).
When you are done, get the entries of the map and sort them once by key.
Should be good enough for a couple hundred megabytes of text files.
If you are on Java 8, use the neat computeIfAbsent and computeIfPresent methods.
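As a rough sketch of the TreeMap variant with computeIfAbsent: the class name WordIndex and the method build are invented for this example, and splitting words on whitespace is an assumption, since the question does not say how words are delimited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordIndex {
    // Maps each word to the 1-based line numbers on which it occurs.
    // A TreeMap keeps the words in alphabetical order at all times.
    // A word that appears twice on one line is recorded twice for that line.
    static Map<String, List<Integer>> build(List<String> lines) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            for (String word : lines.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                // Create the list on first sight of the word, then append.
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(i + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("One", "", "Two", "", "One", "", "Three");
        System.out.println(build(lines)); // prints {One=[1, 5], Three=[7], Two=[3]}
    }
}
```

Each lookup or insert costs O(log n) in a TreeMap. Swapping in a HashMap and sorting the keys once at the end trades that for expected O(1) per word plus a single sort.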

Comparing sets of randomly assigned codes in Java to assign a name

Good day,
I honestly do not know how to phrase the problem in the title, thus the generic description. Actually I have a set of ~150 codes, which are combined to produce a single string, like this "a_b_c_d". Valid combinations contain 1-4 code combinations plus the '-' character if no value is assigned, and each code is only used once( "a_a..." is not considered valid). These sets of codes are then assigned to a unique name. Not all combinations are logical, but if a combination is valid then the order of the codes does not matter (if "f_g_e_-" is valid, then "e_g_f_-","e_f_-_ g_" is valid, and they all have the same name). I have taken the time and assigned each valid combination to its unique name and tried to create a single parser to read these values and produce the name.
I think the problem is apparent. Since the order does not matter, I have to check for every possible combination. The codes cannot be strictly ordered, since there are some codes that have meaning in any position. So, this is impossible to accomplish with a simple parser. Is there an optimal way to do this, or will I have to force the user to use some kind of order against the standard?
Try using a TreeMap to store each code (String) and its count (int). Increment the count for a code every time it is encountered in the string.
After processing the whole string, if the count for any code is greater than 1, the string has repeated codes and is invalid; otherwise it is valid.
A TreeMap is traversed in sorted key order, so traversing it yields the codes as a sorted sequence, which is the same for every ordering of the input.
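The idea can be sketched roughly as follows. The names CodeKey and canonical are hypothetical, and skipping the '-' placeholder is an assumption about what it means in the asker's format.

```java
import java.util.Map;
import java.util.TreeMap;

class CodeKey {
    // Returns the codes in sorted order joined by '_', or null if any code repeats.
    // Assumption: '-' only marks an unassigned slot, so it is skipped.
    static String canonical(String combination) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : combination.split("_")) {
            String code = raw.trim();
            if (code.isEmpty() || code.equals("-")) {
                continue;
            }
            counts.merge(code, 1, Integer::sum); // increment the count for this code
        }
        for (int n : counts.values()) {
            if (n > 1) {
                return null; // a repeated code makes the combination invalid
            }
        }
        // TreeMap iterates its keys in sorted order.
        return String.join("_", counts.keySet());
    }

    public static void main(String[] args) {
        System.out.println(canonical("f_g_e_-")); // prints e_f_g
        System.out.println(canonical("e_g_f_-")); // prints e_f_g
        System.out.println(canonical("a_a"));     // prints null
    }
}
```

With such a key, the names could be stored once per canonical form in an ordinary map, so the user can enter the codes in any order.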

Good collection for keeping track of characters

Just a general question (and I'm sort of new to java) but what would be a good collection that I could add objects to, and keep track of how many of each I've added? For example, if I added the alphabet a character at a time, it would have 26 different characters, and an associated value of 1 for each. Likewise, adding 'z' 10 times would have z with an associated 10. Suggestions? The name "hashtable" had sounded promising, but I don't think I want to use that...
First thing that comes to mind is a Dictionary. The key would be the ASCII value of the character, and the value would be the number of times it is used. Not necessarily the most efficient way to do it, but it is one of the easiest.
You could also do it with a single array, and offset the value 0 to be the first ASCII character.
If you want an extremely fast implementation, a HashMap is actually a very good idea.
For concurrency, you can use a ConcurrentHashMap.
There's no need to use a special data structure as simply using a HashMap should work well. When adding a char, myChar, you call get(myChar), and if null, create a new item for the map for that Character with an Integer value of 1. If the Map returns an Integer, simply add one to it, and then put it back into the Map.
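A minimal sketch of that get-then-put approach, with the class name CharCounter and the method count made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

class CharCounter {
    // Counts how many times each character occurs in the text.
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            Integer current = counts.get(c);
            if (current == null) {
                counts.put(c, 1);           // first occurrence
            } else {
                counts.put(c, current + 1); // add one and put it back
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("zzzzzzzzzz").get('z')); // prints 10
    }
}
```

On Java 8 and later, the body of the loop can be shortened to counts.merge(c, 1, Integer::sum).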
Multiset is the data structure for this purpose. Guava has an implementation of it.
Multiset<Character> charFrequency = HashMultiset.create();
charFrequency.add(char1);
charFrequency.add(char1);
charFrequency.count(char1); // returns 2

Using Binary Trees to find Anagrams

I am currently trying to create a method that uses a binary tree that finds anagrams of a word inputted by the user.
If the tree does not contain any other anagram for the word (i.e., if the key was not in the tree or the only element in the associated linked list was the word provided by the user), the message "no anagram found" gets printed.
For example, if key "opst" appears in the tree with an associated linked list containing the words "spot", "pots", and "tops", and the user gave the word "spot", the program should print "pots" and "tops" (but not spot).
public boolean find(K thisKey, T thisElement){
    return find(root, thisKey, thisElement);
}
public boolean find(Node current, K thisKey, T thisElement){
    if (current == null)
        return false;
    else{
        int comp = current.key.compareTo(thisKey);
        if (comp>0)
            return find(current.left, thisKey, thisElement);
        else if(comp<0)
            return find(current.right, thisKey, thisElement);
        else{
            return current.item.find(thisElement);
        }
    }
}
While I created this method to find if the element provided is in the tree (and the associated key), I was told not to reuse this code for finding anagrams.
K is a generic type that extends Comparable and represents the Key, T is a generic type that represents an item in the list.
If extra methods I've done are required, I can edit this post, but I am absolutely lost. (Just need a pointer in the right direction)
It's a little unclear what exactly is tripping you up (beyond "I've written a nice find method but am not allowed to use it."), so I think the best thing to do is start from the top.
I think you will find that once you get your data structured in just the right way, the actual algorithms will follow relatively easily (many computer science problems share this feature.)
You have three things:
1) Many linked lists, each of which contains the set of anagrams of some set of letters. I am assuming you can generate these lists as you need to.
2) A binary tree, that maps Strings (keys) to lists of anagrams generated from those strings. Again, I'm assuming that you are able to perform basic operations on these trees--adding elements, finding elements by key, etc.
3) A user-inputted String.
Insight: The anagrams of a group of letters form an equivalence class. This means that any member of an anagram list can be used as the key associated with the list. Furthermore, it means that you don't need to store in your tree multiple keys that point to the same list (provided that we are a bit clever about structuring our data; see below).
In concrete terms, there is no need to have both "spot" and "opts" as keys in the tree pointing to the same list, because once you can find the list using any anagram of "spot", you get all the anagrams of "spot".
Structuring your data cleverly: Given our insight, assume that our tree contains exactly one key for each unique set of anagrams. So "opts" maps to {"opts", "pots", "spot", etc.}. What happens if our user gives us a String that we're not using as the key for its set of anagrams? How do we figure out that if the user types "spot", we should find the list that is keyed by "opts"?
The answer is to normalize the data stored in our data structures. This is a computer-science-y way of saying that we enforce arbitrary rules about how we store the data. (Normalizing data is a useful technique that appears repeatedly in many different computer science domains.) The first rule is that we only ever have ONE key in our tree that maps to a given linked list. Second, what if we make sure that each key we actually store is predictable--that is we know that we should search for "opts" even if the user types "spot"?
There are many ways to achieve this predictability--one simple one is to make sure that the letters of every key are in alphabetical order. Then, we know that every set of anagrams will be keyed by the (unique!) member of the set that comes first in alphabetical order. Consistently enforcing this rule makes it easy to search the tree--we know that no matter what string the user gives us, the key we want is the string formed from alphabetizing the user's input.
Putting it together: I'll provide the high-level algorithm here to make this a little more concrete.
1) Get a String from the user (hold on to this String, we'll need it later)
2) Turn this string into a search key that follows our normalization scheme
(You can do this in the constructor of your "K" class, which ensures that you will never have a non-normalized key anywhere in your program.)
3) Search the tree for that key, and get the linked list associated with it. This list contains every anagram of the user's input String.
4) Print every item in the list that isn't the user's original string (see why we kept the string handy?)
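The four steps above can be sketched as follows. This is only an illustration under stated assumptions: java.util.TreeMap stands in for the asker's own binary tree, the names Anagrams, key, and others are invented, and lowercasing the input is an extra assumption about how words are compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Anagrams {
    // Step 2: normalize a word into its search key by sorting its letters.
    static String key(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Steps 3 and 4: look up the list for the key and drop the user's own word.
    static List<String> others(Map<String, List<String>> tree, String word) {
        List<String> result = new ArrayList<>();
        for (String candidate : tree.getOrDefault(key(word), Collections.emptyList())) {
            if (!candidate.equals(word)) {
                result.add(candidate);
            }
        }
        return result; // an empty list means "no anagram found"
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new TreeMap<>();
        tree.put(key("spot"), Arrays.asList("spot", "pots", "tops"));
        System.out.println(key("spot"));          // prints opst
        System.out.println(others(tree, "spot")); // prints [pots, tops]
    }
}
```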
Takeaways:
Frequently, your data will have some special features that allow you to be clever. In this case it is the fact that any member of an anagram list can be the sole key we store for that list.
Normalizing your data gives you predictability and allows you to reason about it effectively. How much more difficult would the "find" algorithm be if each key could be an arbitrary member of its anagram list?
Corollary: Getting your data structures exactly right (What am I storing? How are the pieces connected? How is it represented?) will make it much easier to write your algorithms.
What about sorting the characters in each word and then comparing the results?

Detect changes in random ordered input (hash function?)

I'm reading lines of text that can come in any order. The problem is that the output can actually be identical to the previous output. How can I detect this, without sorting the output first?
Is there some kind of hash function that can take identical input, but in any order, and still produce the same result?
The easiest way would seem to be to hash each line on the way in, storing the hash and the original data, and then compare each new hash with your collection of existing hashes. If you get a hit, compare the actual data to make sure it's not a false positive. Since false positives would be extremely rare and you verify hits anyway, you could go with a quicker hash algorithm, like MD5 or CRC (instead of something like SHA, which is slower but less likely to collide).
So you have input like
A B C D
D E F G
C B A D
and you need to detect that the first and third lines are identical?
If you want to find out if two files contain the same set of lines, but in a different order, you can use a regular hash function on each line individually, then combine them with a function where ordering doesn't matter, like addition.
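A small sketch of combining per-line hashes with addition. The class name UnorderedHash and the method of are made up, and String.hashCode is used only as a convenient stand-in for "a regular hash function".

```java
import java.util.Arrays;
import java.util.List;

class UnorderedHash {
    // Sums the hash of every line. Addition is commutative, so the
    // result does not depend on the order in which the lines arrive.
    static long of(List<String> lines) {
        long sum = 0;
        for (String line : lines) {
            sum += line.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        long first = of(Arrays.asList("A B C D", "D E F G"));
        long second = of(Arrays.asList("D E F G", "A B C D"));
        System.out.println(first == second); // prints true
    }
}
```

Equal sums only suggest that the two outputs match. Different sets of lines can produce the same sum, so a match may still need to be confirmed against the actual data.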
If the lines are fairly long, you could just keep a list of the hashes of each line -- sort those and compare with previous outputs.
If you don't need a 100% fool-proof solution, you could store the hash of each line in a Bloom filter (look it up on Wikipedia) and compare the Bloom filters at the end of processing. This can give you false positives (i.e. you think you have the same output but it isn't really the same) but you can tweak the error rate by adjusting the size of the Bloom filter...
If you add up the ASCII values of each character, you'd get the same result regardless of order.
(This may be a bit too simplified, but perhaps it sparks an idea for you.
See Programming Pearls, section 2.8, for an interesting back story.)
Any of the hash-based methods may produce bad results because more than one string can produce the same hash. (It's not likely, but it's possible.) This is particularly true of the suggestion to add the hashes, since you would essentially be taking a particularly bad hash of the hash values.
A hash method should only be attempted if it's not critical that you miss a change or spot a change where none exists.
The most accurate way would be to keep a Map using the line strings as key and storing the count of each as the value. (If each string can only appear once, you don't need the count.) Compute this for the expected set of lines. Duplicate this collection to examine the incoming lines, reducing the count for each line as you see it.
If you encounter a line with a zero count (or no map entry at all), you've seen a line you didn't expect.
If you end this with non-zero entries remaining in the Map, you didn't see something you expected.
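The counting approach described above might look like the following sketch. The names LineMultiset and sameLines are hypothetical. The counts are built directly into a working map here, which plays the role of the duplicated collection.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LineMultiset {
    // Returns true when both lists contain the same lines the same
    // number of times, regardless of order.
    static boolean sameLines(List<String> expected, List<String> actual) {
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : expected) {
            remaining.merge(line, 1, Integer::sum);
        }
        for (String line : actual) {
            Integer n = remaining.get(line);
            if (n == null) {
                return false; // a line we did not expect (or saw too often)
            }
            if (n == 1) {
                remaining.remove(line);
            } else {
                remaining.put(line, n - 1);
            }
        }
        // Anything left over was expected but never seen.
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        List<String> previous = Arrays.asList("A", "B", "B", "C");
        System.out.println(sameLines(previous, Arrays.asList("B", "C", "A", "B"))); // prints true
        System.out.println(sameLines(previous, Arrays.asList("A", "B", "C")));      // prints false
    }
}
```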
Well the problem specification is a bit limited.
As I understand it you wish to see if several strings contain the same elements regardless of order.
For example:
A B C
C B A
are the same.
The way to do this is to create a set of the values then compare the sets. To create a set do:
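import java.util.HashSet;
import java.util.Set;
Set<String> set = new HashSet<>();
for (String item : line.split("\\s+")) { // line is one input string, e.g. "A B C"
    set.add(item);
}
Then compare the sets, for example with set.equals(otherSet), which checks that both contain the same elements. Building and comparing the sets takes O(N) expected time, instead of O(N log N) for the sorting approach. Note that a set ignores how many times an element is repeated.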
