Big database, many maps?


As part of a student project I am building a large database, which could theoretically contain millions of objects.
I'm starting with the firstname (i.e. Person fName = "John").
My plan is to convert "John" to a hashcode, convert the hashcode into an Integer, then store it in the map as the key (as integer comparisons are faster).
Now here is my problem - To make the iteration faster, I want to have separate static maps, accessed depending on the first letter of the name. Like
public class FirstNameList {
private static Map<Integer, String> a = new HashMap<Integer, String>();
private static Map<Integer, String> b = new HashMap<Integer, String>();
private static Map<Integer, String> c = new HashMap<Integer, String>();
// etc
public void addFName(String word) {
if (word.length() == 0)
throw new IllegalArgumentException("No name entered");
word = word.toLowerCase();
char x = word.charAt(0);
Integer i = word.hashCode();
x-correctMap.put(i, word);
}
However it does not feel very efficient to have 26 if statements to pick the correct list. Does anyone know a way to pick the correct map? Or just better ideas in general?

I have three suggestions: the first is a direct answer to your question:
First, you could store your maps in a map with the initial character as a key, then simply look up the correct map in the "master map."
Second, this won't be any faster -- likely slower -- than simply storing everything in a single map. Never assume you're more clever than the authors of a library class without measurements to prove there's a problem.
Third, you said "database", so why not use a database instead of all this? Postgres and MySQL are both free and easy to use, and will serve your needs well.
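The first two suggestions can be compared side by side. The following is only a minimal sketch: the class name FirstNameIndex and its methods are hypothetical, and it keys on the lower-cased name itself rather than on its hash code. It keeps both a "master map" of per-letter maps and a single flat map, so the two lookups can be measured against each other before committing to either.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; names are not taken from the question.
final class FirstNameIndex {
    // "Master map": first letter -> map of names starting with that letter.
    private final Map<Character, Map<String, String>> byLetter = new HashMap<>();
    // Alternative: one flat map holding every name.
    private final Map<String, String> flat = new HashMap<>();

    void add(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("No name entered");
        }
        String key = name.toLowerCase(Locale.ROOT);
        // computeIfAbsent creates the per-letter map on first use, so no if-chain is needed.
        byLetter.computeIfAbsent(key.charAt(0), c -> new HashMap<>()).put(key, name);
        flat.put(key, name);
    }

    // Two hash lookups: one for the letter, one for the name.
    String findNested(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        String key = name.toLowerCase(Locale.ROOT);
        Map<String, String> bucket = byLetter.get(key.charAt(0));
        return bucket == null ? null : bucket.get(key);
    }

    // One hash lookup.
    String findFlat(String name) {
        return name == null ? null : flat.get(name.toLowerCase(Locale.ROOT));
    }
}
```

Both lookups return the same result; the nested version simply performs an extra hash lookup first, which is why it is unlikely to be faster than the flat map.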

You could nest your maps: the key of the outer map is the first letter, and its value is the map for that letter.

You could have a list of maps, but you likely don't want to use a map at all. Something like TreeMap or a trie would work better, and allow you to have a single data structure.
Also, taking the hashcode of a string and then hashing is likely to be no faster than just using a String.
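As a rough illustration of the single sorted structure mentioned above, a TreeMap can answer "all names starting with X" as a range query, which removes the need for one map per letter. This is a sketch under assumptions: the class PrefixLookup and the method withPrefix are hypothetical names, and keys are assumed not to contain the character '\uffff'.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper; not part of the original question or answers.
final class PrefixLookup {
    // Returns a live view of all entries whose key starts with the given prefix.
    static SortedMap<String, String> withPrefix(NavigableMap<String, String> names, String prefix) {
        // Character.MAX_VALUE ('\uffff') sorts after every other char, so the range
        // [prefix, prefix + '\uffff') covers the keys that begin with prefix.
        // Assumption: no key contains '\uffff' directly after the prefix.
        return names.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> names = new TreeMap<>();
        names.put("joan", "Joan");
        names.put("john", "John");
        names.put("mary", "Mary");
        System.out.println(withPrefix(names, "jo").keySet()); // [joan, john]
    }
}
```

Lookups in a TreeMap are O(log n) rather than the expected O(1) of a HashMap, so this only pays off if prefix or ordered queries are actually needed.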

Related

Java - multiple hashmaps pointing to the same key

I have multiple files which contains key=value string pairs. The keys are the same between the files, but the values differs. Each file can have 1000 plus of such pairs.
I want to store each file in a separate hashmap, ie map<KeyString, ValueString>, so if there are five files, then there will be five hashmaps.
To avoid duplicating the keys across each hashmap, is it possible to have each map reference the same key? Note that once the keys are added to the map, it will not be deleted.
I considered making the first file the 'base' as in the flyweight pattern, this base would be the intrinsic set of keys/values. The other remaining files would be the extrinsic set of values, but I don't know how to relate the values back to the base (intrinsic) keys without duplicating the keys?
I am open to a simpler/better approach.

I can think of a simpler approach. Instead of Map<String, String>, consider Map<String, List<String>> or, directly, Multimap<String, String> from Guava.
If each key is in each file and all have values, you could store values from first file at 0th index, from the second at 1st index etc.
If that doesn't work, I recommend a Collection<Map<String, String>>, so you are able to iterate through your maps. Then, when you want to add a value to one of the maps, go through all the key sets and, if one of them contains that key, put the value using the key object taken from that key set.
Another solution would be to keep a HashSet of the keys that have already been put. This would be more efficient.

After reading in the keys, you can use String.intern().
When called, what it does is either:
add the String to the internal pool if it didn't exist already;
return the equivalent String from the pool if it already existed.
String#intern Javadoc
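A small sketch of what interning does to two equal but distinct String objects. The class name InternDemo and the sample key are made up for illustration.

```java
// Hypothetical demo class; the key text is an arbitrary example.
final class InternDemo {
    // True if both strings resolve to the same pooled instance.
    static boolean sameInstanceAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with equal content,
        // as would happen when the same key is read from two files.
        String k1 = new String("user.name");
        String k2 = new String("user.name");
        System.out.println(k1 == k2);                       // false
        System.out.println(sameInstanceAfterIntern(k1, k2)); // true
    }
}
```

Putting key.intern() into each map means all five maps reference one shared String per key.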

First of all, I don't see the problem with storing multiple instances of your String keys. 5 HashMaps * 1000 keys is a very small number, and you shouldn't have memory issues.
That said, if you still want to avoid duplicating the Strings, you can create the first HashMap, and then use the exact same key objects for the other HashMaps.
For example, suppose map1 is the first HashMap and it is already populated with the contents of the first file.
You can write something like this to populate the 2nd HashMap:
for (String key : map1.keySet()) {
map2.put (key, someValue);
}
Of course you will have to find for each key of the first map the corresponding value of the second map. If the keys are not stored in the same order in the input files, this may require some preliminary sorting step.

Perhaps you could hold a static Map<> to map your keys to unique Integers and use those Integers for the keys to your map?
Something like:
class KeySharedMap<K,V> {
// The next key to use. Using Atomics for the auto-increment.
static final AtomicInteger next = new AtomicInteger(0);
// Static mapping of keys to unique Integers.
static final ConcurrentMap<Object,Integer> keys = new ConcurrentHashMap<>();
// The map indexed by Integer from the `keys`.
Map<Integer, V> map = new HashMap<>();
public V get(Object key) {
return map.get(keys.get(key));
}
public V put(Object key, V value) {
// Associate a unique integer for each unique key.
keys.computeIfAbsent(key,x -> next.getAndIncrement());
// Put it in my map.
return map.put(keys.get(key),value);
}
}
Yes, I realise that K is not used here but I suspect it would be necessary if you wish to implement Map<K,V>.

fast static key-value mapping

I have a set of unique key-value pairs, both key and value are strings. The number of pairs is very huge and finding the value of a certain string is extremely time-critical.
The pairs are computed beforehand and are given for a certain program. So I could just write a method containing:
public String getValue(String key)
{
//repeat for every pair
if(key.equals("abc"))
{
return "def";
}
}
but I am talking about more than 250,000 pairs, and perhaps sorting them could be faster...
I am having a class that contains the getValue() method and can use its constructor, but has no connection to any database/file system etc. So every pair has to be defined inside the class.
Do you have any ideas that could be faster than a huge series of if-statements? Perhaps using a sorting map that gets the pairs presorted. Perhaps improve constructor-time by deserializing an already created map?
I would like your answers to contain a basic code example of your approach. I will comment on answers with the time they took on a set of pairs!
Time-frame: for one constructor call and 20 calls of getValue() 1000 milliseconds.
Keys have a size of 256 and values have a size < 16

This is exactly what a hash table is made for. It provides O(1) lookup if implemented correctly, which means that as long as you have enough memory and your hash function and collision strategy are smart, it can get values for keys in constant time. Java has multiple hash-backed data structures; from the sound of things, a HashMap<String, String> would do the trick for you.
You can construct it like this:
Map<String, String> myHashTable = new HashMap<String, String>();
add values like this:
myHashTable.put("abcd","value corresponding to abcd");
and get the value for a key like this:
myHashTable.get("abcd");
You were on the right track when you had the intuition that running through all of the keys and checking was not efficient, that would be an O(n) runtime approach, since you'd have to run through all n elements.

Which data structure should I use?

I want to store some words and their occurrence times in a website, and I don't know which structure I should use.
Every time I add a word in the structure, it first checks if the word already exists, if yes, the occurrence times plus one, if not, add the word into the structure. Thus I can find an element very fast by using this structure. I guess I should use a hashtable or hashmap, right?
And I also want to get a sorted list, thus the structure can be ranked in a short time.
Forgot to mention, I am using Java to write it.
Thanks guys! :)

A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.
For example:
Map<String, Integer> wordOccurenceMap = new HashMap<>();
"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.

TreeMap is the better solution, if you want both Sorting functionality and counting words.
A custom trie could be more efficient, but it's not required unless you are modifying the words.

Define a HashMap with the word as the key and the counter as the value:
Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();
Then add the logic like this:
When you get a word, check for it in the map using the containsKey method.
If the key (word) is found, fetch the value using get and increment it.
If the key (word) is not found, put it into the map using the word as the key and a count of 1 as the value.
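The check-then-increment steps above can be sketched compactly with Map.merge (Java 8+). The class WordCounter and its method names are hypothetical, and the ranking method is just one possible way to get the sorted list the question asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not from the original answers.
final class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String word) {
        // merge() stores 1 for a new word, otherwise adds 1 to the existing count.
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Words ordered from most to least frequent; the order of ties is unspecified.
    List<String> wordsByCountDescending() {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return words;
    }
}
```

Sorting on demand costs O(n log n) per call, which is usually acceptable when the ranked list is requested far less often than words are added.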

So, you could use a HashMap, but don't forget about multithreading. Could this data structure be accessed by several threads? Also, you could use a tree map if the data has some hierarchy (e.g. for ranking, or for sorting it by time). You could also look through the Google Guava collections; they may be more suitable for you.

Any Map implementation will do. If access is confined to a single thread, prefer HashMap; otherwise use ConcurrentHashMap for multithreading.
Remember to use a stemming library:
stemming library in java
For example, "working" and "work" are logically the same word.
Remember that Integer is immutable; see the example below.
Example:
Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();
synchronized void addWord(String word) { // synchronized so the get/put pair below is atomic
String stemmedWord = stem(word); // stem() stands for a call into your stemming library
Integer count = occurrence.get(stemmedWord);
if (count == null) {
count = 0;
}
count++;
occurrence.put(stemmedWord, count);
// the put above is necessary because Integer is immutable: count++ creates a new Integer
}

Comparing TreeMap contents gives incorrect answer

I use a TreeMap as a 'key' inside another TreeMap
ie
TreeMap<TreeMap<String, String>, Object>
In my code 'object' is a personal construct, but for this instance I have used a string.
I have created a pair of TreeMaps to test the TreeMap compareTo() and hashCode() methods. This starts with the following...
public class TreeMapTest {
public void testTreeMap()
{
TreeMap<String, String> first = new TreeMap<String, String>();
TreeMap<String, String> second = new TreeMap<String, String>();
first.put("one", "une");
first.put("two", "deux");
first.put("three", "trois");
second.put("une", "one");
second.put("deux", "two");
second.put("trois", "three");
TreeMap<TreeMap<String, String>, String> english = new TreeMap<TreeMap<String, String>, String>();
TreeMap<TreeMap<String, String>, String> french = new TreeMap<TreeMap<String, String>, String>();
english.put(first, "english");
french.put(second, "french");
From here I now call the english item to see if it contains the key:
if (english.containsKey(second))
{
System.out.println("english contains the key");
//throws error of ClassCastException: Java.util.TreeMap cannot be cast to
//Java.Lang.Comparable, reading the docs suggests this is the feature if the key is
//not of a supported type.
//this error does not occur if I use a HashMap structure for all maps, why is
//this key type supported for one map structure but not another?
}
However I should note that both HashMap and TreeMap point to the same HashCode() method in the AbstractMap parent.
My first thought was to convert my TreeMap to a HashMap, but this seemed a bit soppy! So I decided to apply the hashCode() method to the 2 treemap objects.
int hc1 = first.hashCode();
int hc2 = second.hashCode();
if(hc1 == hc2)
{
System.out.println("values are equal " + hc1 + " & " + hc2);
}
prints the following
values are equal 3877431 & 3877431
For me the hashcode should be different as the key values are different, I can't find details on the implementation difference of the hashCode() method between HashMap and TreeMap.
Please note the following.
Changing only the keys to HashMap doesn't stop the ClassCastException. Changing all the maps to HashMap does. So there is something in the containsKey() method of TreeMap that isn't working properly, or I have misunderstood. Can anyone explain what?
The section where I get the hashCode of the first and second map objects always produces the same output (no matter whether I use a HashMap or a TreeMap here). However, if (english.containsKey(second)) doesn't print any message when HashMaps are used, so there is obviously something in the HashMap implementation that is different for the compareTo() method.
My principle questions are.
Where can I find details of the types of keys for use in TreeMap objects (to prevent future 'ClassCastException' errors).
If I can't use a certain type of object as a key, why am I allowed to insert it as a key into the TreeMap in the first place? (surely if I can insert it I should be able to check if the key exists?)
Can anyone suggest another construct that has ordered insert / retrieval to replace my TreeMap key objects?
Or have I potentially found strange behaviour. From my understanding I should be able to do a drop in replacement of TreeMap for HashMap, or have I stumbled upon a fringe scenario?
Thanks in advance for your comments.
David.
ps. This isn't a problem in my code, as I use a personal utility to create a hash that depends on the key and value pairs (i.e. I calculate key hash values differently from value hash values; sorry if that is a confusing sentence!). I assume that the hashCode method just sums all the values together without considering whether an item is a key or a value.
pps. I'm not sure if this is a good question or not, any pointers on how to improve it?
Edit.
from the responses people seem to think I'm doing some sort of fancy language dictionary stuff, not a surprise from my example, so sorry for that. I used this as an example as it came easily to my brain, was quick to write and demonstrated my question.
The real problem is as follows.
I'm accessing a legacy DB structure, and it doesn't talk nicely to anything (result sets aren't forward and reverse readable etc). So I grab the data and create objects from them.
The smallest object represents a single row in a table (this is the object for which I used the string value 'english' or 'french' in the above example).
I have a collection of these rowObjects, each row has an obvious key (this is the TreeMap that points to the related rowObject).
i don't know if that makes things any clearer!
Edit 2.
I feel I need to elaborate a little further on my choice of originally using
HashMap<HashMap<String, String>, dataObject>
for my data structure, then converting to TreeMap to gain an ordered view.
In edit 1 I said that the legacy DB doesn't play nicely (this is an issue with the JDBC-ODBC bridge, I suspect, and I'm not about to acquire a JDBC driver to communicate with the DB). The truth is I apply some modifications to the data as I create my Java 'dataObject'. This means that although the DB may spit out the results in ascending or descending order, I have no way of knowing in what order they are inserted into my dataObject. Using a LinkedHashMap seems like a nice solution (see duffymo's suggestion), but I later need to extract the data in an ordered fashion, not just consecutively (LinkedHashMap only preserves insertion order), and I'm not inclined to mess around with ordering everything and making copies when I need to insert a new item in between two others. TreeMap would do this for me... but if I create a specific object for the key, it will simply contain a TreeMap as a member, and obviously I will then need to supply compareTo and hashCode methods. So why not just extend TreeMap (although duffymo has a point about throwing that solution out)?

This is not a good idea. Map keys must be immutable to work properly, and yours are not.
What are you really trying to do? When I see people doing things like this with data structures, it makes me think that they really need an object but have forgotten that Java's an object-oriented language.
Looks like you want a crude dictionary to translate between languages. I'd create a LanguageLookup class that embedded those Maps and provide some methods to make it easier for users to interact with it. Better abstraction and encapsulation, more information hiding. Those should be your design objectives. Think about how to add other languages besides English and French so you can use it in other contexts.
public class LanguageLookup {
private Map<String, String> dictionary;
public LanguageLookup(Map<String, String> words) {
this.dictionary = ((words == null) ? new HashMap<String, String>() : new HashMap<String, String>(words));
}
public String lookup(String from) {
return this.dictionary.get(from);
}
public boolean hasWord(String word) {
return this.dictionary.containsKey(word);
}
}
In your case, it looks like you want to translate an English word to French and then see if the French dictionary contains that word:
Map<String, String> englishToFrenchWords = new HashMap<String, String>();
englishToFrenchWords.put("one", "une");
Map<String, String> frenchToEnglishWords = new HashMap<String, String>();
frenchToEnglishWords.put("une", "one");
LanguageLookup englishToFrench = new LanguageLookup(englishToFrenchWords);
LanguageLookup frenchToEnglish = new LanguageLookup(frenchToEnglishWords);
String french = englishToFrench.lookup("one");
boolean hasUne = frenchToEnglish.hasWord(french);

Your TreeMap is not Comparable, so you can't add it to a SortedMap, and it's not immutable, so you shouldn't use it as a key in a HashMap. You could use an IdentityHashMap, but I suspect an EnumMap is a better choice.
enum Language { ENGLISH, FRENCH }
Map<Language, Map<Language, Map<String, String>>> dictionaries =
new EnumMap<>(Language.class);
Map<Language, Map<String, String>> fromEnglishMap = new EnumMap<>(Language.class);
dictionaries.put(Language.ENGLISH, fromEnglishMap);
fromEnglishMap.put(Language.FRENCH, first);
Map<Language, Map<String, String>> fromFrenchMap = new EnumMap<>(Language.class);
dictionaries.put(Language.FRENCH, fromFrenchMap);
fromFrenchMap.put(Language.ENGLISH, second);
Map<String, String> fromEnglishToFrench= dictionaries.get(Language.ENGLISH)
.get(Language.FRENCH);

To the problem why Hashmap works and Treemap does not:
A Treemap is a "sorted map", meaning that the entries are sorted according to the key. This means that the key must be comparable, by implementing the Comparable interface. Maps usually do NOT implement this, and I would highly suggest you do not create a custom type to add this feature. As duffymo mentions, using maps as keys is a BAD idea.
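The equal hash codes in the question also have a simple explanation. Map.hashCode() is specified as the sum of the entry hash codes, and each entry's hash code is key.hashCode() ^ value.hashCode(). XOR is symmetric, so swapping every key with its value leaves the sum unchanged. Equal hash codes do not make the maps equal, which is why the HashMap lookup quietly returns false, while a TreeMap without a Comparator fails as soon as it tries to cast the key to Comparable. The sketch below reproduces this; the class MapKeyDemo is a hypothetical name and the data mirrors the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the entries mirror the two maps in the question.
final class MapKeyDemo {
    static TreeMap<String, String> first() {
        TreeMap<String, String> m = new TreeMap<>();
        m.put("one", "une");
        m.put("two", "deux");
        m.put("three", "trois");
        return m;
    }

    // Same pairs as first(), with keys and values swapped.
    static TreeMap<String, String> second() {
        TreeMap<String, String> m = new TreeMap<>();
        m.put("une", "one");
        m.put("deux", "two");
        m.put("trois", "three");
        return m;
    }

    // True if a TreeMap with natural ordering rejects a map used as a key.
    static boolean treeMapRejectsMapKey() {
        TreeMap<Map<String, String>, String> outer = new TreeMap<>();
        try {
            outer.containsKey(first()); // the key is cast to Comparable here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(first().hashCode() == second().hashCode()); // true
        System.out.println(first().equals(second()));                  // false

        Map<Map<String, String>, String> english = new HashMap<>();
        english.put(first(), "english");
        System.out.println(english.containsKey(second())); // false: same hash, not equal
        System.out.println(treeMapRejectsMapKey());         // true
    }
}
```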

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!

Well, you can find the highest 5 values in a Map in O(n) time, whereas any sort is slower than that.
The easiest way is to simply do a for loop through the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
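One way to fill in the loop above is a bounded min-heap: it holds at most n entries, and its head is the "smallest maximum so far". This is only a sketch; the class TopN and the method top are hypothetical names, and it swaps the answer's list of maximums for a PriorityQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper; uses a PriorityQueue in place of the answer's list of maximums.
final class TopN {
    // Returns up to n entries with the highest values, highest first.
    // The returned entries are the map's own entries, not copies.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> map, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by value: peek() is the smallest of the current top n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(n, (a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (entry.getValue() > heap.peek().getValue()) {
                heap.poll();      // drop the smallest of the current top n
                heap.add(entry);  // and keep the larger entry instead
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Each entry costs at most O(log n) heap work, so a single pass is O(m log n) for a map of m entries, which is effectively linear for n = 5.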

You could use two Maps:
// Map name to value
Map<String, Integer> byName
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
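A minimal sketch of such a wrapper class, assuming coarse-grained synchronization is acceptable. The class name RankedCounts and its methods are hypothetical, and a Set is used for the names sharing a value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical wrapper that keeps byName and byValue in sync.
final class RankedCounts {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Set<String>> byValue = new TreeMap<>();

    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            // Remove the name from its previous value bucket.
            Set<String> names = byValue.get(old);
            names.remove(name);
            if (names.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new TreeSet<>()).add(name);
    }

    synchronized Integer get(String name) {
        return byName.get(name);
    }

    // Up to n names with the highest values, highest first.
    synchronized List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (Set<String> names : byValue.descendingMap().values()) {
            for (String name : names) {
                if (result.size() >= n) {
                    return result;
                }
                result.add(name);
            }
        }
        return result;
    }
}
```

Updates cost O(log k) for k distinct values, and reading the top five never requires sorting the whole map.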

I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
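public List<Integer> getMaxFive() {
Map<String, Integer> copy;
synchronized (lockObject) {
// copy under the lock so writers cannot modify the map mid-copy
copy = new HashMap<String, Integer>(originalMap);
}
// sort the copied values outside the lock, highest first
List<Integer> list = new ArrayList<Integer>(copy.values());
Collections.sort(list, Collections.reverseOrder());
return list.subList(0, Math.min(5, list.size()));
}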
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.

I would create a method like:
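private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
// work on a copy of the values so the caller's map is not modified
List<Integer> values = new ArrayList<Integer>(map.values());
int[] max = new int[qty];
for (int a = 0; a < qty && !values.isEmpty(); a++) {
max[a] = Collections.max(values);
// remove every occurrence of this maximum, so duplicates are reported once
values.removeAll(Collections.singleton(max[a]));
}
return max;
}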
Taking advantage of Collections.max() and Collections.singleton()

There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of the n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed, small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.

Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo (no matter where it is) method should be defined like this:
compareTo(parameters){
return value2 - value1; // descending
}
The rest is easy. Using a List and invoking the Collections.sort(parameters) method will do the sorting part.
I don't know what sorting algorithm Collections.sort(parameters) uses. But if you expect that some data may arrive over time, you will need an insertion sort, since it is good for data that is nearly sorted and it works online.
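A sketch of the class described above. The name WordCount stands in for MyClass and is hypothetical. It uses Integer.compare instead of the subtraction shown, because value2 - value1 can overflow for values far apart (for example 1 - Integer.MIN_VALUE wraps around to a negative number) and then reports the wrong order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for MyClass: a key with an int value, ordered by value descending.
final class WordCount implements Comparable<WordCount> {
    final String key;
    final int value;

    WordCount(String key, int value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public int compareTo(WordCount other) {
        // Arguments are swapped to get descending order; Integer.compare cannot overflow.
        // Note: this ordering ignores key, so it is not consistent with equals.
        return Integer.compare(other.value, this.value);
    }

    public static void main(String[] args) {
        List<WordCount> list = new ArrayList<>();
        list.add(new WordCount("low", Integer.MIN_VALUE));
        list.add(new WordCount("high", 42));
        list.add(new WordCount("mid", 1));
        Collections.sort(list);
        for (WordCount wc : list) {
            System.out.println(wc.key + "=" + wc.value);
        }
    }
}
```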

If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.
