Which data structure should I use? - java

I want to store some words and their occurrence times in a website, and I don't know which structure I should use.
Every time I add a word in the structure, it first checks if the word already exists, if yes, the occurrence times plus one, if not, add the word into the structure. Thus I can find an element very fast by using this structure. I guess I should use a hashtable or hashmap, right?
And I also want to get a sorted list, thus the structure can be ranked in a short time.
Forgot to mention, I am using Java to write it.
Thanks guys! :)

A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.
For example:
Map<String, Integer> wordOccurenceMap = new HashMap<>();
"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.

TreeMap is the better solution, if you want both Sorting functionality and counting words.
Custom Trie can make more efficient but it's not required unless you are modifying the words.

Define a Hashmap with word as the key and counter as the value
Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();
Then add the logic like this:
When you get a word, check for it in the map using containsKey method
If key(word) is found, fetch the value using get and increment the value
If key(word) is not found, add the value using thw word as key and put with count 1 as value

So, you could use HashMap, but don't forget about multythreading. Is this data structure could be accessed throught few thread? Also, you could use three map in a case that data have some hirarchy (e.g. in a case of rakning and sort it by time). Also, you could look throught google guava collections, probably, they will be more sutabile for you.

Any Map Implementation Will Do. If Localized Changes prefer HashMap otherWise
ConcurrentHashMap for multithreading.
Remember to use any stemming Library.
stemming library in java
for example working and work logically are same word.
Remember Integer is immutable see example below
Example :
Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();
synchronized void addWord(String word) { // may need to synchronize this method
String stemmedWord = stem(word);
Integer count = occurrence.get(stemmedWord)
if(count == null) {
count = new Integer(0);
}
count ++;
occurrence.put(stemmedWord, count);
**// the above is necessary as Integer is immutable**
}

Related

Is there an efficient way of checking if HashMap contains keys that map to the same value?

I basically need to know if my HashMap has different keys that map to the same value. I was wondering if there is a way other than checking each keys value against all other values in the map.
Update:
Just some more information that will hopefully clarify what I'm trying to accomplish. Consider a String "azza". Say that I'm iterating over this String and storing each character as a key, and it's corresponding value is some other String. Let's say I eventually get to the last occurrence of 'a' and the value is already be in the map.This would be fine if the key corresponding with the value that is already in the map is also 'a'. My issue occurs when 'a' and 'z' both map to the same value. Only if different keys map to the same value.
Sure, the fastest to both code and execute is:
boolean hasDupeValues = new HashSet<>(map.values()).size() != map.size();
which executes in O(n) time.
Sets don't allow duplicates, so the set will be smaller than the values list if there are dupes.
Very similar to EJP's and Bohemian's answer above but with streams:
boolean hasDupeValues = map.values().stream().distinct().count() != map.size();
You could create a HashMap that maps values to lists of keys. This would take more space and require (slightly) more complex code, but with the benefit of greatly higher efficiency (amortized O(1) vs. O(n) for the method of just looping all values).
For example, say you currently have HashMap<Key, Value> map1, and you want to know which keys have the same value. You create another map, HashMap<Value, List<Key>> map2.
Then you just modify map1 and map2 together.
map1.put(key, value);
if(!map2.containsKey(value)) {
map2.put(value, new ArrayList<Key>);
}
map2.get(value).add(key);
Then to get all keys that map to value, you just do map2.get(value).
If you need to put/remove in many different places, to make sure that you don't forget to use map2 you could create your own data structure (i.e. a separate class) that contains 2 maps and implement put/remove/get/etc. for that.
Edit: I may have misunderstood the question. If you don't need an actual list of keys, just a simple "yes/no" answer to "does the map already contain this value?", and you want something better than O(n), you could keep a separate HashMap<Value, Integer> that simply counts up how many times the value occurs in the map. This would take considerably less space than a map of lists.
You can check whether a map contains a value already by calling map.values().contains(value). This is not as efficient as looking up a key in the map, but still, it's O(n), and you don't need to create a new set just in order to count its elements.
However, what you seem to need is a BiMap. There is no such thing in the Java standard library, but you can build one relatively easily by using two HashMaps: one which maps keys to values and one which maps values to keys. Every time you map a key to a value, you can then check in amortized O(1) whether the value already is mapped to, and if it isn't, map the key to the value in the one map and the value to the key in the other.
If it is an option to create a new dependency for your project, some third-party libraries contain ready-made bimaps, such as Guava (BiMap) and Apache Commons (BidiMap).
You could iterate over the keys and save the current value in the Set.
But, before inserting that value in a Set, check if the Set already contains that value.
If this is true, it means that a previous key already contains the same value.
Map<Integer, String> map = new HashMap<>();
Set<String> values = new HashSet<>();
Set<Integter> keysWithSameValue = new HashSet<>();
for(Integer key : map.keySet()) {
if(values.contains(map.get(key))) {
keysWithSameValue.add(key);
}
values.add(map.get(key));
}

Data structure for writing a dictionary in java

Which data structure would be more preferable for making a dictionary in Java ? What would be better a tree or a hash-table ?
A Map. Nothing else. If you want sorting, use a TreeMap. Otherwise: HashMap.
A Dictionary maps a (key-)word to it's description or translation.
I would use something like
Map<String,Integer> dictionary = Collections.synchronizedMap(new TreeMap<String,Integer>());
Instead of Integer as the value of a String key, you can use a Class object which probably can hold a list that contains all the positions of that word inside the document
There are methods for easy retrieval of key values from the TreeMap. Following is the way you get an iterator out of TreeMap.
Set<Entry<String,Integer>> set = dictionary.entrySet();
Iterator<Entry<String,Integer>> entryItr = set.iterator();
Entry<String,Integer> entry = null;
while(entryItr.hasnext()){
entry = entryItr.next();
// Do whatever you want.
}
I would use a Trie, especially for memory efficiency and also prefix lookups.
I have an implementation that implements the map interface under APL on github: https://github.com/Blazebit/blaze-utils/tree/c7b1fa586590d121d9f44c1686cb58de0349eb0b/blaze-common-utils
Check it out and maybe it fits your needs better than a simple map.
According to the lecture in Introduction to Algorithms of MIT, I would say it is better to go with hash-tables. Because you can do operations in O(1) instead of O(logn)
https://www.youtube.com/watch?v=0M_kIqhwbFo

How to check if a key in a Map starts with a given String value

I'm looking for a method like:
myMap.containsKeyStartingWith("abc"); // returns true if there's a key starting with "abc" e.g. "abcd"
or
MapUtils.containsKeyStartingWith(myMap, "abc"); // same
I wondered if anyone knew of a simple way to do this
Thanks
This can be done with a standard SortedMap:
Map<String,V> tailMap = myMap.tailMap(prefix);
boolean result = (!tailMap.isEmpty() && tailMap.firstKey().startsWith(prefix));
Unsorted maps (e.g. HashMap) don't intrinsically support prefix lookups, so for those you'll have to iterate over all keys.
From the map, you can get a Set of Keys, and in case they are String, you can iterate over the elements of the Set and check for startsWith("abc")
To build on Adel Boutros answer/comment about the efficiency of iterating keys, you could encapsulate key iteration in a Map subclass or decorator.
Extending HashMap would give you a class to put the method in and keep map-specific code out of your method, so lowering complexity and making the code more natural to read.

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, to find the highest 5 values in a Map, you can do that in O(n) time where any sort is slower than that.
The easiest way is to simply do a for loop through the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
You could use two Maps:
// Map name to value
Map<String, Integer> byName
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
public List<Integer> getMaxFive() {
Map<String, Integer> copy = null;
synchronized(lockObject) {
copy = new HashMap<String, Integer>(originalMap);
}
//sort the copy as usual
return list;
}
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
int[] max = new int[qty];
for (int a=0; a<qty; a++) {
max[a] = Collections.max(map.values());
map.values().removeAll(Collections.singleton(max[a]));
if (map.size() == 0)
break;
}
return max;
}
Taking advantage of Collections.max() and Collections.singleton()
There are two ways of doing that easily:
Put the map into a heap structure and retrive the n elements you want from it.
Iterate through the map and update a list of n highest values using each entry.
If you want to retrive an unknown or a large number of highest values the first method is the way to go. If you have a fixed small amount of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass which its attributes are key (String) and value (int). MyClass, of course, needs to implement Comparable interface. Another approach is to create a class named MyClassComparator which extends Comparator.
The compareTo (no matter where it is) method should be defined like this:
compareTo(parameters){
return value2 - value1; // descending
}
The rest is easy. Using List and invoking Collections.sort(parameters) method will do the sorting part.
I don't know what sorting algorithm Collections.sort(parameters) uses. But if you feel that some data may come over time, you will need an insertion sort. Since it's good for a data that nearly sorted and it's online.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap <K,V>, similar to LinkedHashMap) that keeps the entries ordered by value.

Java queue and multi-dimension array

First of all, this is my code (just started learning java):
Queue<String> qe = new LinkedList<String>();
qe.add("b");
qe.add("a");
qe.add("c");
qe.add("d");
qe.add("e");
My question:
Is it possible to add element to the queue with two values, like:
qe.add("a","1"); // where 1 is integer
So, that I know element "a" have value 1. If I want to add a number let say "2" to element a, I will have like a => 3.
If this cant be done, what else in java classes that can handle this? I tried to use multi-dimention array, but its kinda hard to do the queue, like pop, push etc. (Maybe I am wrong)
How to call specific element in the queue? Like, call element a, to check its value.
[Note]
Please don't give me links that ask me to read java docs. I was reading, and I still dont get it. The reason why I ask here is because, I know I can find the answer faster and easier.
You'd want to combine a Queue<K> with a Map<K,V>:
Put the keys (e.g. "a", "b") into the Queue<K>
Assign the mapping of the keys to values (e.g. "a"=>3) in the Map<K,V>
I think you're asking for a dictionary type in Java.
Map<String, Integer> map = new HashMap<String, Integer>();
map.put("a", 1);
map.put("b", 2);
You can then access them by key - in this case the String you choose as the key.
int value = map.get("a");
Value in this case will return 1.
Is that what you want?
You want to use a HashMap instead of LinkedList. HashMap is a dictionary-like structure that allows you to create associations, for instance a=>1.
Check out JavaDocs for HashMap to get a grasp how to use it:-).
I think what you are asking for is LinkedHashMap which is a combination of a Queue and a HashMap. While you are able to store the key and value pairs, it would also remember the order like Queue does. The only thing is you'd have to use an iterator since there is no poll() method, however you can visit each element in the order that they were added.

Categories

Resources