Data structure for writing a dictionary in Java

Which data structure is preferable for building a dictionary in Java? Would a tree or a hash table be better?

A Map. Nothing else. If you want sorting, use a TreeMap. Otherwise: HashMap.
A dictionary maps a (key) word to its description or translation.

I would use something like
Map<String,Integer> dictionary = Collections.synchronizedMap(new TreeMap<String,Integer>());
Instead of an Integer as the value for each String key, you can use an object of your own class, for example one that holds a list of all the positions of that word inside the document.
TreeMap makes it easy to retrieve its entries. Here is how to walk through them with an iterator:
// Entry here is java.util.Map.Entry
Set<Entry<String,Integer>> set = dictionary.entrySet();
Iterator<Entry<String,Integer>> entryItr = set.iterator();
Entry<String,Integer> entry = null;
while (entryItr.hasNext()) {
    entry = entryItr.next();
    // Do whatever you want.
}
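As a rough sketch of the "list of positions" idea above, the value type can simply be a List<Integer>. The class and method names here (WordPositions, index) are made up for illustration, and the input is assumed to be an already tokenized array of words:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordPositions {
    // Hypothetical helper: maps each word to the positions at which it occurs.
    static Map<String, List<Integer>> index(String[] words) {
        Map<String, List<Integer>> positions = new TreeMap<>(); // keys stay sorted
        for (int i = 0; i < words.length; i++) {
            // Create the list on first sight of the word, then record the position.
            positions.computeIfAbsent(words[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }
}
```

Iterating over the returned map visits the words in alphabetical order, because it is a TreeMap.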

I would use a Trie, especially for memory efficiency and also prefix lookups.
I have an implementation that implements the map interface under APL on github: https://github.com/Blazebit/blaze-utils/tree/c7b1fa586590d121d9f44c1686cb58de0349eb0b/blaze-common-utils
Check it out and maybe it fits your needs better than a simple map.
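To give an idea of what a trie does, here is a minimal sketch. It is not the linked implementation and it is not tuned for memory (each node holds a HashMap); the names SimpleTrie, insert, contains and hasPrefix are chosen purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // True if at least one stored word starts with the given prefix.
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    // Follows the characters of s from the root; null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

Both lookups take time proportional to the length of the word or prefix, independent of how many words are stored.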

According to the MIT Introduction to Algorithms lecture, I would say it is better to go with hash tables, because you can do operations in O(1) on average instead of O(log n).
https://www.youtube.com/watch?v=0M_kIqhwbFo

Related

Search a Map for multiple keys in parallel

Given a Map<String, Collection<String>> with up to 1M items, I want to query that Map for 5K keys, without knowing whether each of them is in the map or not.
Currently, I'm using a TreeMap and searching for each item one by one, which seems sub-optimal. Is there an already implemented way to query a Map for X keys?
The result of the search should be the subset of items found in the Map, for further querying. Ordering is irrelevant.
I was hoping to use a stream, but apparently that's only for Collections.
Note: the numbers are rough impressions from what I've seen in the map, probably not the upper limit...
There is no better way than querying your map for each element:
List<V> vs = keysToSearch.stream()
    .map(k -> map.get(k))
    .filter(Objects::nonNull)
    .collect(Collectors.toList());
You can also try using a parallelStream if your data structures work in a concurrent environment.
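A sketch of the parallel variant, wrapped in a hypothetical helper (BulkLookup.findAll is not a library method). It assumes the map is not modified while the lookup runs; under that assumption, concurrent reads of a HashMap or TreeMap are fine:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class BulkLookup {
    // Returns the values of all requested keys that are present in the map.
    static <K, V> List<V> findAll(Map<K, V> map, Collection<K> keys) {
        return keys.parallelStream()
                .map(map::get)            // null for keys that are absent
                .filter(Objects::nonNull) // drop the misses
                .collect(Collectors.toList());
    }
}
```

For only a few thousand keys the overhead of going parallel may outweigh the gain, so it is worth measuring against the sequential stream.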
Assuming memory is not a problem for you, here is one way of doing it, using retainAll:
Set<String> mapKeys = new HashSet<String>(myMap.keySet());
mapKeys.retainAll(my5kKeys); //<--- all keys that match the my5kKeys...
If you have M items in your map, and K keys you are searching for, then your best-case efficiency is O(min(M, K)). If M is very large, the best you can do is to check each K (perhaps in parallel, but you must do each).
If it were the case that M turned out to be much smaller than K, then you could do better by only checking through all M values to see if they existed in K. In any event, you want to check the smaller set's values against the larger.
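The "check the smaller set against the larger" idea could look roughly like this. KeyIntersection.presentKeys is an illustrative name, and the sketch assumes both the map and the key set offer fast membership tests (for example HashMap and HashSet):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeyIntersection {
    // Returns the wanted keys that are present in the map,
    // always iterating over whichever side is smaller.
    static <K> Set<K> presentKeys(Map<K, ?> map, Set<K> wanted) {
        Set<K> found = new HashSet<>();
        if (wanted.size() <= map.size()) {
            for (K key : wanted) {
                if (map.containsKey(key)) {
                    found.add(key);
                }
            }
        } else {
            for (K key : map.keySet()) {
                if (wanted.contains(key)) {
                    found.add(key);
                }
            }
        }
        return found;
    }
}
```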
There is no better way than to create a loop and search for all the keys individually.
A method like retainAll is just a wrapper around such a loop written by somebody else.
However, the important thing is to use a HashMap instead of a TreeMap. A HashMap's containsKey is O(1) on average, while a TreeMap's takes O(log n).
If you need the sorted collection for something else, you could put the data in both a TreeMap and a HashMap.

Creating Dictionary in java?

Everywhere on the net, this is the suggested way:
Map<String, String> map = new HashMap<String, String>();
map.put("dog", "type of animal");
System.out.println(map.get("dog"));
My point is: should it not be a TreeMap, considering that a dictionary has to be sorted? Agreed, lookup won't be as fast with a TreeMap, but considering the sorting it seems like the best data structure.
UPDATE: one more requirement is to return the lexicographically nearest word if the word searched for is not present. I am not sure how to achieve that.
If you need the map sorted by its keys, then use TreeMap, which "...provides guaranteed log(n) time cost for the containsKey, get, put and remove operations." If not, use the more general HashMap (which "...provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets..."), or one of the other Map implementations, depending on your need.
If you want to get the value for a given key and an exact match is unlikely to be in the map, then a HashMap's direct lookup won't give you any benefit.
With a TreeMap, you can get the list of keys, which is already ordered, and perform a binary search on it, comparing keys lexicographically. Continue the binary search until the lexicographic distance between the two keys is minimal or 0.
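TreeMap already exposes the two neighbouring keys through floorKey and ceilingKey, so the binary search does not have to be written by hand. The sketch below is one possible reading of "lexicographically nearest": it picks the neighbour that shares the longer prefix with the query and prefers the lower neighbour on a tie. That tie-breaking rule and the names NearestWord and nearest are assumptions, not something stated in the question:

```java
import java.util.TreeMap;

class NearestWord {
    // Returns the word itself if present, otherwise one of its two sorted neighbours.
    static String nearest(TreeMap<String, ?> dict, String word) {
        String below = dict.floorKey(word);   // greatest key <= word, or null
        String above = dict.ceilingKey(word); // smallest key >= word, or null
        if (below == null) {
            return above;
        }
        if (above == null) {
            return below;
        }
        // Assumption: "nearest" means the longer shared prefix; ties go to the lower key.
        return commonPrefix(word, above) > commonPrefix(word, below) ? above : below;
    }

    private static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }
}
```

Both floorKey and ceilingKey run in O(log n), so the whole lookup stays logarithmic in the size of the dictionary.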
Dictionary is no longer a term used in the language. You'll get multiple answers.
I know that Objective-C uses a class called Dictionary that is a key/value data structure. The fact that it's named Dictionary leads me to believe that the name refers to the ordering of the objects; I imagine the key has to be a string or char.
So, it depends on the entire question.
When someone says they want to create a Key/Value data structure that is ordered alphabetically, or a "Dictionary", the answer is:
TreeMap<String, Object> map = new TreeMap<>();
If someone is asking how to create a Key/Value object similar to a Dictionary in whatever language, they will likely get any of the java.util classes that implement the Map<K, V> interface, for example HashMap, TreeMap. A good answer would be a TreeMap.
In this case telling someone to use a HashMap is not debatable, because the answer is as vague as the question.

Data structure that has key-value mapping as well as ordering

I need a data structure that provides key-value mappings, like a Map, but that also allows me to fetch the key based on an (int) index (e.g. myKey = myDS.get(index)), without having to iterate over the data structure to get the key at the desired index.
I thought of using LinkedHashMap, but I don't see a way to get the key at a given index. Am I missing something in LinkedHashMap? Or is there another data structure I can use?
EDIT:
This is not a duplicate. The correct answer to the other question is to use some sort of SortedMap; however, that is not the correct answer to this question, since I'd like to be able to retrieve an Entry from the data structure via an Integer index, which is not supported in any Java library.
LinkedHashMap provides a hash table/doubly linked list implementation of the Map interface. Since it extends HashMap, it's still backed by an array, but there is also a doubly-linked list of Entry objects to ensure that the iteration order is predictable.
So, basically what it means is that when you iterate through the map like so:
for (Map.Entry<keyType,valueType> entry : linkedHashMap.entrySet())
{
    System.out.println("Key: " + entry.getKey().toString() +
        " Value: " + entry.getValue().toString());
}
it will print in the order that you added the keys, as opposed to a non-linked Map, which will not print in insertion order. You cannot access the elements of the array like you want to, because the array that backs the hash is not in order. Only the doubly linked list is ordered.
Solution:
What you are looking for is a LinkedMap from Apache Commons.
AFAIK, there is no single data structure that will do this. There is certainly not one in the standard Java collection suite.
Also LinkedHashMap is not the solution because you cannot efficiently index a LinkedHashMap.
If you want to do index-based lookup as well as key-based lookup, the solution needs to be a combination of two data structures.
A Map<Key, Value> and an ArrayList<Value> is the simpler approach, but it has a couple of problems:
- Insertion and deletion of values from the ArrayList is expensive, unless you are inserting / deleting at the tail end of the list.
- Insertion and deletion make the list positions unstable.
If you want stable indexes and scalable insertion and deletion, then you need a Map<Key, Value> and a Map<Integer, Value> ... and a way to manage (i.e. recycle) the index values.
The Apache Commons LinkedMap class is a possible solution, except that it suffers from the problem that index values are not stable in the face of insertions and deletions.
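A minimal sketch of the "two data structures" combination, restricted to the easy case where entries are only ever appended, so index positions never shift. IndexedMap and its method names are invented for illustration; removal is deliberately left out because that is where the stability problems described above begin:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexedMap<K, V> {
    private final Map<K, V> byKey = new HashMap<>();
    private final List<K> keysInOrder = new ArrayList<>();

    // Adds or updates a mapping; a new key gets the next free index.
    void put(K key, V value) {
        if (!byKey.containsKey(key)) {
            keysInOrder.add(key);
        }
        byKey.put(key, value);
    }

    V get(K key) {
        return byKey.get(key);
    }

    // O(1) lookup of the key at an insertion index.
    K keyAt(int index) {
        return keysInOrder.get(index);
    }

    int size() {
        return keysInOrder.size();
    }
}
```

With this, myKey = myDS.keyAt(index) works without iterating, at the cost of storing every key twice.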
How about using:
Map<String, String> map = new LinkedHashMap<String, String>();
List<Entry<String, String>> mapAsList = new ArrayList<Map.Entry<String,String>>(map.entrySet());
mapAsList.get(index);
I do not believe there is a collection for this; collections are either based on the idea that you want to know exactly where an element is (lists) or that you want quick access based on some key or criteria (maps); doing both would be very resource-intensive to maintain.
Of course, you can make something like this, as rocketboy's answer suggests, but I'm guessing it's not really possible to make efficient.
There is no data structure in the standard Java Collections API that directly provides an indexed map. However, the following should let you achieve the result:
// An ordered map
Map<K, V> map = new LinkedHashMap<K, V>();
// To create indexed list, copy the references into an ArrayList (backed by an array)
List<Entry<K, V>> indexedList = new ArrayList<Map.Entry<K, V>>(map.entrySet());
// Get the i'th entry
Map.Entry<K, V> entry = indexedList.get(index);
K key = entry.getKey();
V value = entry.getValue();
You might still want to retain the concerns of data persistence in the map separate from the retrieval.
Update:
Or use LinkedMap from Apache Commons as suggested by Steve.

Which data structure should I use?

I want to store some words and the number of times they occur on a website, and I don't know which structure I should use.
Every time I add a word to the structure, it should first check whether the word already exists: if so, its occurrence count is incremented by one; if not, the word is added to the structure. So I need to be able to find an element very quickly. I guess I should use a Hashtable or HashMap, right?
I also want to get a sorted list, so the structure should be rankable in a short time.
Forgot to mention, I am using Java to write it.
Thanks guys! :)
A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.
For example:
Map<String, Integer> wordOccurenceMap = new HashMap<>();
"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.
TreeMap is the better solution if you want both sorting and word counting.
A custom trie can be more efficient, but it's not required unless you are modifying the words.
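Note that a TreeMap sorts by key, that is, alphabetically by word. If the sorted list in the question is meant as a ranking by occurrence count, one option is to sort the entries of whichever map holds the counts. The sketch below assumes that interpretation; RankByCount and ranked are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class RankByCount {
    // Returns the entries ordered by count (highest first), then alphabetically.
    static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }
}
```

The sort costs O(n log n) each time it runs, so it suits the case where counting happens often and ranking only occasionally.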
Define a HashMap with the word as the key and the counter as the value:
Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();
Then add the logic like this:
When you get a word, check for it in the map using the containsKey method.
If the key (word) is found, fetch the value using get and increment it.
If the key (word) is not found, put it into the map with the word as the key and a count of 1 as the value.
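The steps above can be sketched as follows. WordCounter and count are illustrative names, and the input is assumed to be an already tokenized array of words:

```java
import java.util.HashMap;
import java.util.Map;

class WordCounter {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> wordsCountMap = new HashMap<>();
        for (String word : words) {
            if (wordsCountMap.containsKey(word)) {
                // Word seen before: fetch the count and store it incremented.
                wordsCountMap.put(word, wordsCountMap.get(word) + 1);
            } else {
                // First occurrence: start counting at 1.
                wordsCountMap.put(word, 1);
            }
        }
        return wordsCountMap;
    }
}
```

Since Java 8, the whole if/else can also be written as wordsCountMap.merge(word, 1, Integer::sum).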
So, you could use a HashMap, but don't forget about multithreading. Could this data structure be accessed from several threads? Also, you could use a TreeMap if the data has some hierarchy (e.g. for ranking, or sorting it by time). You could also look through the Google Guava collections; they may be more suitable for you.
Any Map implementation will do. If the changes are local to one thread, prefer HashMap; otherwise use
ConcurrentHashMap for multithreading.
Remember to use a stemming library
(search for "stemming library in java");
for example, "working" and "work" are logically the same word.
Remember that Integer is immutable; see the example below.
Example:
Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();
synchronized void addWord(String word) { // may need to synchronize this method
    String stemmedWord = stem(word);
    Integer count = occurrence.get(stemmedWord);
    if (count == null) {
        count = 0;
    }
    count++;
    // Integer is immutable, so the incremented value must be put back
    occurrence.put(stemmedWord, count);
}
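As an alternative to synchronizing the whole method, ConcurrentHashMap offers merge, which performs the read-increment-write as one atomic step. The sketch below leaves out stemming, since stem above is the answer's own placeholder; ConcurrentWordCounter and countOf are hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentWordCounter {
    private final Map<String, Integer> occurrence = new ConcurrentHashMap<>();

    // Atomically inserts 1 for a new word or adds 1 to the existing count.
    void addWord(String word) {
        occurrence.merge(word, 1, Integer::sum);
    }

    int countOf(String word) {
        return occurrence.getOrDefault(word, 0);
    }
}
```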

How to check if a key in a Map starts with a given String value

I'm looking for a method like:
myMap.containsKeyStartingWith("abc"); // returns true if there's a key starting with "abc" e.g. "abcd"
or
MapUtils.containsKeyStartingWith(myMap, "abc"); // same
I wondered if anyone knew of a simple way to do this
Thanks
This can be done with a standard SortedMap:
SortedMap<String,V> tailMap = myMap.tailMap(prefix);
boolean result = (!tailMap.isEmpty() && tailMap.firstKey().startsWith(prefix));
Unsorted maps (e.g. HashMap) don't intrinsically support prefix lookups, so for those you'll have to iterate over all keys.
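For a NavigableMap such as TreeMap, the same check can be written with ceilingKey, which returns the smallest key greater than or equal to the prefix. This is a sketch that assumes the map uses the natural String ordering; PrefixLookup is an illustrative class name:

```java
import java.util.NavigableMap;

class PrefixLookup {
    // True if some key starts with the prefix. In natural String order,
    // all such keys sort at or directly after the prefix itself.
    static boolean containsKeyStartingWith(NavigableMap<String, ?> map, String prefix) {
        String candidate = map.ceilingKey(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }
}
```

The lookup is O(log n) and creates no intermediate view of the map.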
From the map, you can get a Set of Keys, and in case they are String, you can iterate over the elements of the Set and check for startsWith("abc")
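For an unsorted map, that key iteration might look like the following sketch. It is a linear scan over all keys, and PrefixScan.anyKeyStartsWith is a made-up helper name:

```java
import java.util.Map;

class PrefixScan {
    // O(n) scan: checks every key until one with the prefix is found.
    static boolean anyKeyStartsWith(Map<String, ?> map, String prefix) {
        return map.keySet().stream().anyMatch(key -> key.startsWith(prefix));
    }
}
```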
To build on Adel Boutros's answer/comment about the efficiency of iterating over keys, you could encapsulate the key iteration in a Map subclass or decorator.
Extending HashMap would give you a class to put the method in and keep map-specific code out of your method, so lowering complexity and making the code more natural to read.
