Memory efficient multivaluemap - java

Hi, I have the following problem:
I'm storing strings and a corresponding list of integer values in a MultiValueMap<String, Integer>.
I'm storing about 13 000 000 strings, and one string can have up to 500 or more values.
For every single value I will have random access on the map, so the worst case is 13 000 000 * 500 put calls. The speed of the map is good, but the memory overhead gets quite high. A MultiValueMap<String, Integer> is nothing more than a HashMap/TreeMap<String, ArrayList<Integer>>, and both HashMap and TreeMap have quite a lot of memory overhead. I won't be modifying the map once it is built, but I need it to be fast and as small as possible for random access in a program. (I'm storing it on disk and loading it on start; the serialized map file takes up about 600 MB, but in memory it's about 3 GB.)
The most memory-efficient approach would be to store the strings in a sorted String array and have a corresponding two-dimensional int array for the values. Access would then be a binary search on the String array, followed by fetching the corresponding values.
Now I have three ways to get there:
1. I use a sorted MultiValueMap (TreeMap) for the creation phase to store everything. After I'm finished collecting all values, I get the String array by calling map.keySet().toArray(new String[0]), create a two-dimensional int array, and copy all the values from the MultiValueMap.
Pro: It's easy to implement and still fast during creation.
Con: It takes up even more memory while copying from the map to the arrays.
2. I use arrays or maybe ArrayLists from the start and store everything in there.
Pro: Least memory overhead.
Con: This would be enormously slow, because I would have to sort/copy the array every time I add a new key. I would also need to implement my own (probably even slower) sorting to keep the corresponding int array in the same order as the strings. Hard to implement.
3. I use arrays and a MultiValueMap as a buffer. After the program has finished 10% or 20% of the creation phase, I add the values to the arrays, keep them in order, and then start a new map.
Pro: Probably still fast enough and memory-efficient enough.
Con: Hard to implement.
None of these solutions really feels right to me. Do you know any other solutions to this problem, maybe a memory-efficient (MultiValue)Map implementation?
I know I could use a database, so don't bother posting that as an answer. I want to know how I could do this without a database.

If you switched to Guava's Multimap -- I have no idea if that's possible for your application -- you might be able to use Trove and get
ListMultimap<String, Integer> multimap = Multimaps.newListMultimap(
new HashMap<String, Collection<Integer>>(),
new Supplier<List<Integer>>() {
public List<Integer> get() {
return new TIntListDecorator(new TIntArrayList());
}
});
which will make a ListMultimap that uses a HashMap to map to List values backed by int[] arrays, which should be memory-efficient, though you'll pay a small speed penalty because of boxing. You might be able to do something similar for MultiValueMap, though I have no idea what library that's from.

You can use compressed strings to drastically reduce memory usage.
Parameters to configure your JVM
Comparison of its usage between various java versions
Furthermore, there are other more drastic solutions (it would require some reimplementation):
Memory-disk based list implementation or suggestions about NoSQL database.

Depending on which Integer values you store in your map, a large amount of your heap memory overhead may be caused by having distinct Integer instances, which take up much more RAM than a primitive int value.
Consider using a Map from String to one of the many IntArrayList implementations floating around (e.g. in Colt or in Primitive Collections for Java), which basically implement a List backed by an int array instead of an array of Integer instances.
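To illustrate the idea without depending on a particular library, here is a minimal sketch of a growable list backed by an int[]. The class names SimpleIntList and IntMultiMapDemo are made up for this example; a real project would more likely use the equivalent class from Colt, Trove or a similar library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal growable list of primitive ints (not a library class). */
final class SimpleIntList {
    private int[] data = new int[4];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Grow geometrically; the max() guards against a trimmed, empty array.
            data = Arrays.copyOf(data, Math.max(4, size * 2));
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Drops unused capacity once no more values will be added. */
    void trimToSize() {
        data = Arrays.copyOf(data, size);
    }
}

final class IntMultiMapDemo {
    /** Builds a tiny String -> int list map; no Integer objects are stored. */
    static Map<String, SimpleIntList> build() {
        Map<String, SimpleIntList> map = new HashMap<>();
        map.computeIfAbsent("apple", k -> new SimpleIntList()).add(42);
        map.computeIfAbsent("apple", k -> new SimpleIntList()).add(4000000);
        map.computeIfAbsent("pear", k -> new SimpleIntList()).add(7);
        for (SimpleIntList list : map.values()) {
            list.trimToSize();
        }
        return map;
    }
}
```

Each value then costs 4 bytes in the backing array rather than a reference plus a separate Integer object. The HashMap itself still carries its usual per-entry overhead.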

First, consider the memory taken by the integers. You said that the range will be about 0-4000000. 24 bits is enough to represent 16777216 distinct values. If that is acceptable, you could use byte arrays for the integers, with 3 bytes per integer, and save 25%. You would have to index into the array something like this:
int getPackedInt(byte[] array, int index) {
int i = index*3;
return ((array[i] & 0xFF)<<16) + ((array[i+1] & 0xFF) <<8) + (array[i+2] & 0xFF);
}
void storePackedInt(byte[] array, int index, int value) {
assert value >= 0 && value <= 0xFFFFFF;
int i = index*3;
array[i] = (byte)((value>>16) & 0xFF);
array[i+1] = (byte)((value>>8) & 0xFF);
array[i+2] = (byte)(value & 0xFF);
}
Can you say anything about the distribution of the integers? If many of them will fit in 16 bits, you could use an encoding with a variable number of bytes per number (something like UTF-8 does for representing characters).
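As a rough sketch of such a variable-length encoding, the following uses 7 data bits per byte with the high bit as a "more bytes follow" flag. This is one common scheme (often called a varint), chosen here as an assumption; the class name VarInts is made up for the example.

```java
import java.io.ByteArrayOutputStream;

final class VarInts {
    /** Appends value using 7 data bits per byte; the high bit means "more bytes follow". */
    static void write(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        while (value >= 0x80) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // last byte, continuation flag clear
    }

    /** Decodes the first count values stored in buf. */
    static int[] readAll(byte[] buf, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = buf[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            result[i] = value;
        }
        return result;
    }
}
```

With this scheme, values below 128 take one byte and values below 16384 take two, but values above 2097151 take four, so it only pays off if small values dominate. Random access by position is also lost: a list has to be decoded from its start.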
Next, consider whether you can save memory on the Strings. What are the characteristics of the Strings? How long will they typically be? Will many strings share prefixes? A compression scheme tailored to the characteristics of your application could save a lot of space (as falsarella pointed out). OR, if many strings will share prefixes, storing them in some type of search trie could be more efficient. (There is a type of trie called "patricia" which might be suitable for this application.) As a bonus, note that searching for Strings in a trie can be faster than searching a hash map (though you'd have to benchmark to see if that is true in your application).
Will the Strings all be ASCII? If so, 50% of the memory used for Strings will be wasted, as a Java char is 16 bits. Again, in this case, you could consider using byte arrays.
If you only need to look Strings up, not iterate over the stored Strings, you could also consider something rather unconventional: hash the Strings, and keep only the hash. Since different String can hash to the same value, there is a chance that a String which was never stored, may still be "found" by a search. But if you use enough bits for the hash value (and a good hash function), you can make that chance so infinitesimally small that it will almost certainly never happen in the estimated lifespan of the universe.
Finally, there is the memory for the structure itself, which holds the Strings and integers. I already suggested using a trie, but if you decide not to do that, nothing will use less memory than parallel arrays -- one sorted array of Strings (which you can do binary search on, as you said), and a parallel array of arrays of integers. After you do a binary search to find an index into the String array, you can use the same index to access the array-of-integer array.
While you are building the structure, if you do decide that a search trie is a good choice, I would just use that directly. Otherwise, you could do 2 passes: one to build up a set of strings (then put them into an array and sort them), and a second pass to add the arrays of integers.
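The parallel-array layout described above could look roughly like this. It is only a sketch: the class name FrozenMultiMap is made up, and a TreeMap of int[] is assumed as the build-phase buffer purely to keep the example short.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class FrozenMultiMap {
    private final String[] keys;  // sorted, so binary search works
    private final int[][] values; // values[i] belongs to keys[i]

    /** Copies a sorted map into two parallel arrays. */
    FrozenMultiMap(TreeMap<String, int[]> source) {
        keys = new String[source.size()];
        values = new int[source.size()][];
        int i = 0;
        // TreeMap iterates in key order, so keys[] ends up sorted.
        for (Map.Entry<String, int[]> e : source.entrySet()) {
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
    }

    /** Returns the values stored for key, or null if the key is absent. */
    int[] get(String key) {
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? values[idx] : null;
    }
}
```

Lookup is O(log n) string comparisons instead of a hash probe, which is the price paid for dropping the per-entry map overhead.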

If there are patterns to your key strings, especially common roots, then a Trie could be an effective method of storing significantly less data.
Here's the code for a working TrieMap.
NB: The usual advice on using EntrySet to iterate across Maps does not apply to Tries. They are exceptionally inefficient in a Trie so please avoid requesting one if at all possible.
/**
* Implementation of a Trie structure.
*
* A Trie is a compact form of tree that takes advantage of common prefixes
* to the keys.
*
* A normal HashSet will take the key and compute a hash from it, this hash will
* be used to locate the value through various methods but usually some kind
* of bucket system is used. The memory footprint resulting becomes something
* like O(n).
*
* A Trie structure essentially combines all common prefixes into a single key.
* For example, holding the strings A, AB, ABC and ABCD will only take enough
* space to record the presence of ABCD. The presence of the others will be
* recorded as flags within the record of ABCD structure at zero cost.
*
* This structure is useful for holding similar strings such as product IDs or
* credit card numbers.
*
*/
public class TrieMap<V> extends AbstractMap<String, V> implements Map<String, V> {
/**
* Map each character to a sub-trie.
*
* Could replace this with a 256 entry array of Tries but this will handle
* multibyte character sets and I can discard empty maps.
*
* Maintained at null until needed (for better memory footprint).
*
*/
protected Map<Character, TrieMap<V>> children = null;
/**
* Here we store the map contents.
*/
protected V leaf = null;
/**
* Set the leaf value to a new setting and return the old one.
*
* @param newValue
* @return old value of leaf.
*/
protected V setLeaf(V newValue) {
V old = leaf;
leaf = newValue;
return old;
}
/**
* I've always wanted to name a method something like this.
*/
protected void makeChildren () {
if ( children == null ) {
// Use a TreeMap to ensure sorted iteration.
children = new TreeMap<Character, TrieMap<V>>();
}
}
/**
* Finds the TrieMap that "should" contain the key.
*
* @param key
*
* The key to find.
*
* @param grow
*
* Set to true to grow the Trie to fit the key.
*
* @return
*
* The sub Trie that "should" contain the key or null if key was not found and
* grow was false.
*/
protected TrieMap<V> find(String key, boolean grow) {
if (key.length() == 0) {
// Found it!
return this;
} else {
// Not at end of string.
if (grow) {
// Grow the tree.
makeChildren();
}
if (children != null) {
// Ask the kids.
char ch = key.charAt(0);
TrieMap<V> child = children.get(ch);
if (child == null && grow) {
// Make the child.
child = new TrieMap<V>();
// Store the child.
children.put(ch, child);
}
if (child != null) {
// Find it in the child.
return child.find(tail(key), grow);
}
}
}
return null;
}
/**
* Remove the head (first character) from the string.
*
* @param s
*
* The string.
*
* @return
*
* The same string without the first (head) character.
*
*/
// Suppress warnings over taking a subsequence
private String tail(String s) {
return s.substring(1, s.length());
}
/**
*
* Add a new value to the map.
*
* Time footprint = O(key.length).
*
* @param key
*
* The key defining the place to add.
*
* @param value
*
* The value to add there.
*
* @return
*
* The value that was there, or null if it wasn't.
*
*/
@Override
public V put(String key, V value) {
V old = null;
// If empty string.
if (key.length() == 0) {
old = setLeaf(value);
} else {
// Find it.
old = find(key, true).put("", value);
}
return old;
}
/**
* Gets the value at the specified key position.
*
* @param o
*
* The key to the location.
*
* @return
*
* The value at that location, or null if there is no value at that location.
*/
@Override
public V get(Object o) {
V got = null;
if (o != null) {
String key = (String) o;
TrieMap<V> it = find(key, false);
if (it != null) {
got = it.leaf;
}
} else {
throw new NullPointerException("Nulls not allowed.");
}
return got;
}
/**
* Remove the value at the specified location.
*
* @param o
*
* The key to the location.
*
* @return
*
* The value that was removed, or null if there was no value at that location.
*/
@Override
public V remove(Object o) {
V old = null;
if (o != null) {
String key = (String) o;
if (key.length() == 0) {
// Its me!
old = leaf;
leaf = null;
} else {
TrieMap<V> it = find(key, false);
if (it != null) {
old = it.remove("");
}
}
} else {
throw new NullPointerException("Nulls not allowed.");
}
return old;
}
/**
* Count the number of values in the structure.
*
* @return
*
* The number of values in the structure.
*/
@Override
public int size() {
// If I am a leaf then size increases by 1.
int size = leaf != null ? 1 : 0;
if (children != null) {
// Add sizes of all my children.
for (Character c : children.keySet()) {
size += children.get(c).size();
}
}
return size;
}
/**
* Is the tree empty?
*
* @return
*
* true if the tree is empty.
* false if there is still at least one value in the tree.
*/
@Override
public boolean isEmpty() {
// I am empty if I am not a leaf and I have no children
// (slightly quicker than the AbstractCollection implementation).
return leaf == null && (children == null || children.isEmpty());
}
/**
* Returns all keys as a Set.
*
* @return
*
* A HashSet of all keys.
*
* Note: Although it returns Set<S> it is actually a Set<String> that has been
* home-grown because the original keys are not stored in the structure
* anywhere.
*/
@Override
public Set<String> keySet() {
// Roll them a temporary list and give them a Set from it.
return new HashSet<String>(keyList());
}
/**
* List all my keys.
*
* @return
*
* An ArrayList of all keys in the tree.
*
* Note: Although it returns List<S> it is actually a List<String> that has been
* home-grown because the original keys are not stored in the structure
* anywhere.
*
*/
protected List<String> keyList() {
List<String> contents = new ArrayList<String>();
if (leaf != null) {
// If I am a leaf, a null string is in the set.
contents.add((String) "");
}
// Add all sub-tries.
if (children != null) {
for (Character c : children.keySet()) {
TrieMap<V> child = children.get(c);
List<String> childContents = child.keyList();
for (String subString : childContents) {
// All possible substrings can be prepended with this character.
contents.add((String) (c + subString.toString()));
}
}
}
return contents;
}
/**
* Does the map contain the specified key.
*
* @param key
*
* The key to look for.
*
* @return
*
* true if the key is in the Map.
* false if not.
*/
public boolean containsKey(String key) {
TrieMap<V> it = find(key, false);
if (it != null) {
return it.leaf != null;
}
return false;
}
/**
* Represent me as a list.
*
* @return
*
* A String representation of the tree.
*/
@Override
public String toString() {
List<String> list = keyList();
//Collections.sort((List<String>)list);
StringBuilder sb = new StringBuilder();
// Separator is empty before the first entry and a comma afterwards.
String sep = "";
sb.append("{");
for (String s : list) {
sb.append(sep).append(s).append("=").append(get(s));
sep = ",";
}
sb.append("}");
return sb.toString();
}
/**
* Clear down completely.
*/
@Override
public void clear() {
children = null;
leaf = null;
}
/**
* Return a list of key/value pairs.
*
* @return
*
* The entry set.
*/
public Set<Map.Entry<String, V>> entrySet() {
Set<Map.Entry<String, V>> entries = new HashSet<Map.Entry<String, V>>();
List<String> keys = keyList();
for (String key : keys) {
entries.add(new Entry<String,V>(key, get(key)));
}
return entries;
}
/**
* An entry.
*
* @param <S>
*
* The type of the key.
*
* @param <V>
*
* The type of the value.
*/
private static class Entry<S, V> implements Map.Entry<S, V> {
protected S key;
protected V value;
public Entry(S key, V value) {
this.key = key;
this.value = value;
}
public S getKey() {
return key;
}
public V getValue() {
return value;
}
public V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
@Override
public boolean equals(Object o) {
if (!(o instanceof TrieMap.Entry)) {
return false;
}
Entry e = (Entry) o;
return (key == null ? e.getKey() == null : key.equals(e.getKey()))
&& (value == null ? e.getValue() == null : value.equals(e.getValue()));
}
@Override
public int hashCode() {
int keyHash = (key == null ? 0 : key.hashCode());
int valueHash = (value == null ? 0 : value.hashCode());
return keyHash ^ valueHash;
}
@Override
public String toString() {
return key + "=" + value;
}
}
}

Related

Java - how to get a key object (or entry) stored in HashMap by key?

I'd like to get the "canonical" key object for each key usable to query a map. See here:
Map<UUID, String> map = new HashMap<>();
UUID a = new UUID("ABC...");
map.put(a, "Tu nejde o zamykání.");
UUID b = new UUID("ABC...");
String string = map.get(b); // This gives that string.
// This is what I am looking for:
UUID againA = map.getEntry(b).key();
boolean thisIsTrue = a == againA;
A HashMap uses equals(), which is the same for multiple unique objects. So I want to get the actual key from the map, which will always be the same, no matter what object was used to query the map.
Is there a way to get the actual key object from the map? I don't see anything in the interface, but perhaps some clever trick I overlooked?
(Iterating all entries or keys doesn't count.)
Is there a way to get the actual key object from the map?
OK, so I am going to make some assumptions about what you mean. After all, you said that your question doesn't need clarification, so the obvious meaning that I can see must be the correct one. Right? :-)
The answer is No. There isn't a way.
Example scenario (not compileable!)
UUID uuid = UUID.fromString("xxxx-yyy-zzz");
UUID uuid2 = UUID.fromString("xxxx-yyy-zzz"); // same string
println(uuid == uuid2); // prints false
println(uuid.equals(uuid2)); // prints true
Map<UUID, String> map = new ...
map.put(uuid, "fred");
println(map.get(uuid)); // prints fred
println(map.get(uuid2)); // prints fred (because uuid.equals(uuid2) is true)
... but the Map API does not provide a way to find the actual key (in the example above it is uuid) in the map apart from iterating the key or entry sets. And I'm not aware of any existing Map class (standard or 3rd-party) that does provide this (see note 1).
However, you could implement your own Map class with an additional method for returning the actual key object. There is no technical reason why you couldn't, though you would have more code to write, test, maintain, etcetera.
But I would add that I agree with Jim Garrison. If you have a scenario where you have UUID objects (with equality-by-value semantics) and you also want to implement equality by identity semantics, then there is probably something wrong with your application's design. The correct approach would be to change the UUID.fromString(...) implementation to always return the same UUID object for the same input string.
1 - This is not to say that such a map implementation doesn't exist. But if it does, you should be able to find it if you look hard enough. Note that questions asking us to find or recommend a library are off-topic!
There is a (relatively) simple way of doing this. I've done so in my applications from time to time, when needed ... not for the purpose of == testing, but to reduce the number of identical objects being stored when tens of thousands of objects exist and are cross-referenced with each other. This significantly reduced my memory usage and improved performance ... while still using equals() for equality tests.
Just maintain a parallel map for interning the keys.
Map<UUID, UUID> interned_keys = ...
UUID key = ...
if (interned_keys.containsKey(key))
key = interned_keys.get(key);
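A self-contained version of this interning idea might look like the sketch below. The class name KeyInterner is made up for illustration, and it registers unseen keys on first use, which the snippet above leaves open.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyInterner<K> {
    // Maps every key to the first instance that was seen for it.
    private final Map<K, K> canonical = new HashMap<>();

    /** Returns the first instance seen that equals key, registering key if there is none. */
    K intern(K key) {
        K existing = canonical.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }
}
```

After calling intern(a) once, any later intern(b) with b.equals(a) returns a itself, so an == comparison against the stored key succeeds.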
Of course, it is far better when the object being stored knows what its own identity is. Then you get the interning basically for free.
class Item {
UUID key;
// ...
}
Map<UUID, Item> map = ...
map.put(item.key, item);
UUID key = ...
key = map.get(key).key; // get interned key
I think there are valid reasons for wanting the actual key, for example to save memory. Also keep in mind that the actual key may store other objects. For instance, suppose you have a vertex of a graph. The vertex can store the actual data (say, a String) as well as its incident vertices. A vertex's hash value can depend only on the data. So to look up a vertex with some data D, you construct a vertex with data D and no incident vertices and use it as the lookup key. If the map can return the actual stored vertex, you can then get the vertices incident to it.
It seems to me that many map implementations could easily provide a getEntry method. For example, the HashMap implementation for get is:
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
One could use the getNode method to return an Entry:
public Map.Entry<K,V> getEntry(Object key){
Node<K,V> e = getNode(hash(key),key);
if(e == null) return null;
return new AbstractMap.SimpleImmutableEntry<>(e.key,e.value);
}
The easiest way is to duplicate the reference to the key in the value using a generic Pair type, like this:
HashMap<UUID,Pair<UUID,String>> myMap = new HashMap<>();
When you put them in the map, you provide the reference to the key to the pair. The cost is one reference per entry.
void add(UUID uuid, String str)
{
myMap.put(uuid,Pair.of(uuid,str));
}
Pair<UUID,String> get(UUID uuid)
{
return myMap.get(uuid);
}
Then getFirst() of the Pair is your key. getSecond() is the value.
Whatever you do, it's going to cost you in either time or space.
Your Pair class will be something like:
public class Pair<A,B>
{
private final A a;
private final B b;
public Pair(A a, B b)
{
this.a = a;
this.b = b;
}
/**
* @return the first argument of the Pair
*/
public A getFirst()
{
return this.a;
}
/**
* @return the second argument of the Pair
*/
public B getSecond()
{
return this.b;
}
/**
* Create a Pair.
*
* @param a The first argument (of type A)
* @param b The second argument (of type B)
*
* @return A Pair of A and B
*/
public static <A,B> Pair<A,B> of(A a, B b)
{
return new Pair<>(a,b);
}
// Don't forget to get your IDE to produce a hashcode()
// and equals() method for you, depending
// on if you allow nulls or not, or DIY.
}
This might help: you can use a for-each loop over the entry set, as below.
Map<String,Object> map = new HashMap<>();
map.put("hello1", new String("Hello"));
map.put("hello2", new String("World"));
map.put("hello3", new String("How"));
map.put("hello4", new String("Are u"));
for(Map.Entry<String,Object> e: map.entrySet()){
System.out.println(e.getKey());
}

Trouble understanding the implementation of a hash table with chaining

I'm studying a hash table with chaining in Java through its implementation. My trouble is with the get() method. An index is determined with key.hashCode() % table.length. Assume that the table size is 10 and key.hashCode() is 124, so the index is 4. In the for-each loop, iteration starts from table[4]; AFAIK the index is then incremented one by one: 4, 5, 6, 7 and so on. But what about indices 0, 1, 2, 3? Are they checked? (I think not.) Isn't it possible for the key to be at one of those indices? (I think so.) The other issue is that there are null checks, but initially nothing assigns null to the key and value. So how can the checks work? Is null assigned as soon as private LinkedList<Entry<K, V>>[] table is declared?
// Data Structures: Abstraction and Design Using Java, Koffman, Wolfgang
package KW.CH07;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
/**
* Hash table implementation using chaining.
* @param <K> The key type
* @param <V> The value type
* @author Koffman and Wolfgang
**/
public class HashtableChain<K, V>
// Insert solution to programming project 7, chapter -1 here
implements KWHashMap<K, V> {
/** The table */
private LinkedList<Entry<K, V>>[] table;
/** The number of keys */
private int numKeys;
/** The capacity */
private static final int CAPACITY = 101;
/** The maximum load factor */
private static final double LOAD_THRESHOLD = 3.0;
// Note this is equivalent to java.util.AbstractMap.SimpleEntry
/** Contains key-value pairs for a hash table.
@param <K> the key type
@param <V> the value type
*/
public static class Entry<K, V>
// Insert solution to programming project 6, chapter -1 here
{
/** The key */
private final K key;
/** The value */
private V value;
/**
* Creates a new key-value pair.
* @param key The key
* @param value The value
*/
public Entry(K key, V value) {
this.key = key;
this.value = value;
}
/**
* Retrieves the key.
* @return The key
*/
@Override
public K getKey() {
return key;
}
/**
* Retrieves the value.
* @return The value
*/
@Override
public V getValue() {
return value;
}
/**
* Sets the value.
* @param val The new value
* @return The old value
*/
@Override
public V setValue(V val) {
V oldVal = value;
value = val;
return oldVal;
}
// Insert solution to programming exercise 3, section 4, chapter 7 here
}
// Constructor
public HashtableChain() {
table = new LinkedList[CAPACITY];
}
// Constructor for test purposes
HashtableChain(int capacity) {
table = new LinkedList[capacity];
}
/**
* Method get for class HashtableChain.
* @param key The key being sought
* @return The value associated with this key if found;
* otherwise, null
*/
@Override
public V get(Object key) {
int index = key.hashCode() % table.length;
if (index < 0) {
index += table.length;
}
if (table[index] == null) {
return null; // key is not in the table.
}
// Search the list at table[index] to find the key.
for (Entry<K, V> nextItem : table[index]) {
if (nextItem.getKey().equals(key)) {
return nextItem.getValue();
}
}
// assert: key is not in the table.
return null;
}
/**
* Method put for class HashtableChain.
* @post This key-value pair is inserted in the
* table and numKeys is incremented. If the key is already
* in the table, its value is changed to the argument
* value and numKeys is not changed.
* @param key The key of item being inserted
* @param value The value for this key
* @return The old value associated with this key if
* found; otherwise, null
*/
@Override
public V put(K key, V value) {
int index = key.hashCode() % table.length;
if (index < 0) {
index += table.length;
}
if (table[index] == null) {
// Create a new linked list at table[index].
table[index] = new LinkedList<>();
}
// Search the list at table[index] to find the key.
for (Entry<K, V> nextItem : table[index]) {
// If the search is successful, replace the old value.
if (nextItem.getKey().equals(key)) {
// Replace value for this key.
V oldVal = nextItem.getValue();
nextItem.setValue(value);
return oldVal;
}
}
// assert: key is not in the table, add new item.
table[index].addFirst(new Entry<>(key, value));
numKeys++;
if (numKeys > (LOAD_THRESHOLD * table.length)) {
rehash();
}
return null;
}
/** Returns true if empty
@return true if empty
*/
@Override
public boolean isEmpty() {
return numKeys == 0;
}
}
Assume that the table size is 10 and key.hashCode() is 124 so index is found as 4. In for each loop table[index] is started from table[4]
Correct.
there are null checks but initially there is no any null assignment for key and value. So how can the checking work?
When an array of objects is initialized, all values are set to null.
index is being incremented one by one 4,5,6,7... so on. But what about indices 0,1,2,3? Are they been checked? (I think no) Isn't there any possibility that occurring of key on one of the indices? (I think yes).
Looks like there's some misunderstanding here. First, think of the data structure like this (with data having already been added to it):
table:
[0] -> null
[1] -> LinkedList -> item 1 -> item 2 -> item 3
[2] -> LinkedList -> item 1
[3] -> null
[4] -> LinkedList -> item 1 -> item 2
[5] -> LinkedList -> item 1 -> item 2 -> item 3 -> item 4
[6] -> null
Another important point is that the hash code for a given key should not change, so it will always map to the same index in the table.
So say we call get with a key whose hash code maps it to 3; then we know that it's not in the table:
if (table[index] == null) {
return null; // key is not in the table.
}
If another key comes in that maps to 1, now we need to iterate over the LinkedList:
// LinkedList<Entry<K, V>> list = table[index]
for (Entry<K, V> nextItem : table[index]) {
// iterate over item 1, item 2, item 3 until we find one that is equal.
if (nextItem.getKey().equals(key)) {
return nextItem.getValue();
}
}
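To make the two points above concrete (one fixed bucket per key, and null as the default slot content), here is a small sketch. BucketDemo and its method names are made up for this example; the index computation mirrors the get() and put() methods from the question.

```java
import java.util.LinkedList;

final class BucketDemo {
    /** Same index computation as in the question's get() and put(). */
    static int indexFor(Object key, int tableLength) {
        int index = key.hashCode() % tableLength;
        if (index < 0) {
            index += tableLength; // % can yield a negative result in Java
        }
        return index;
    }

    /** Allocates an empty table; every slot of a new object array starts out null. */
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] newTable(int capacity) {
        return new LinkedList[capacity];
    }
}
```

For an Integer key 124 and a table of length 10, indexFor gives 4 on every call, so get() never needs to look at any other slot. The for-each loop walks the entries of the one list at table[4], not the indices of the table.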
I think you aren't quite visualizing your hash table correctly. There are two equally good simple implementations of a hash table.
Method 1 uses linked lists: An array (well, Vector, actually) of linked lists.
Given a "key", you derive a hash value for that key(*). You take the remainder of that hash value relative to the current size of the vector, let's call that "x". Then you sequentially search the linked list that vector[x] points to for a match to your key.
(*) You hope that the hash values will be reasonably well-distributed. There are complex algorithms for doing this. Let's hope your JVM implementation of HashCode does a good job of this.
Method 2 avoids linked lists: you create a Vector and compute an index into the Vector (as above). Then you look at Vector.get(x). If that's the key you want, you return the corresponding value. Let's assume it's not. Then you look at Vector.get(x+1), Vector.get(x+2), etc. (wrapping around to the start when you reach the end). Eventually, one of the following three things will happen:
a) You find the key you are looking for. Then you return the corresponding value.
b) you find an empty entry (key == null). Return null or whatever value you have chosen to mean "this isn't the droid you're looking for".
c) you have examined every entry in the Vector. Again, return null or whatever.
Checking for (c) is a precaution, so that if the hash table happens to be full you won't loop forever. If the hash table is about to be full (you can keep a count of how many entries have been used) you should reallocate a bigger hash table. Ideally, you want to keep the hash table sparse enough that you never get anywhere near searching the whole table: that vitiates the whole purpose of a hash table -- that you can search it in much less than linear time, ideally in O(1) (that is, the number of comparisons is <= a small constant). I would suggest that you allocate a Vector that is at least 10x the number of entries you expect to put in it.
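Method 2 and its three outcomes can be sketched as follows. This is a simplified illustration, not a production table: it uses plain arrays instead of a Vector, has String keys and values, wraps around at the end of the array, and does not resize or support removal. The class name ProbingTable is made up.

```java
final class ProbingTable {
    private final String[] keys;
    private final String[] vals;

    ProbingTable(int capacity) {
        keys = new String[capacity];
        vals = new String[capacity];
    }

    private int indexFor(String key) {
        // floorMod never returns a negative index, unlike %.
        return Math.floorMod(key.hashCode(), keys.length);
    }

    /** Inserts or overwrites; throws if the table is full (no resizing in this sketch). */
    void put(String key, String value) {
        int i = indexFor(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                keys[i] = key;
                vals[i] = value;
                return;
            }
            if (keys[i].equals(key)) {
                vals[i] = value;
                return;
            }
            i = (i + 1) % keys.length; // try the next slot, wrapping around
        }
        throw new IllegalStateException("table full");
    }

    String get(String key) {
        int i = indexFor(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                return null; // case (b): empty slot, key is absent
            }
            if (keys[i].equals(key)) {
                return vals[i]; // case (a): found
            }
            i = (i + 1) % keys.length;
        }
        return null; // case (c): every slot examined
    }
}
```

The probes counter is what implements the check for (c): without it, a lookup for a missing key in a completely full table would loop forever.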
The use of the word "chaining" in your question suggests to me that you want to implement the second type of hash table.
Btw, you should never use 10 as the size of a hash table. The size should be a prime number.
Hope this helps.

How does HashMap ensure the same index when a duplicate key is added with a different `tab.length`?

The following piece of code is used to add an element to a HashMap (from the Android 5.1.1 source tree). I'm very confused by this statement: int index = hash & (tab.length - 1);. How can the map ensure the same index when a duplicate key is added with a different tab.length?
For example, assume that we have a new empty HashMap hMap. First, we add the pair ("1","1") to it; assume tab.length equals 1 at this time. Then we add many pairs to this map; assume tab.length now equals "x". Now we add a duplicate pair ("1","1") to it. Notice that tab.length has changed, so the value of int index = hash & (tab.length - 1); may also have changed.
/**
* Maps the specified key to the specified value.
*
* @param key
* the key.
* @param value
* the value.
* @return the value of any previous mapping with the specified key or
* {@code null} if there was no such mapping.
*/
@Override public V put(K key, V value) {
if (key == null) {
return putValueForNullKey(value);
}
int hash = Collections.secondaryHash(key);
HashMapEntry<K, V>[] tab = table;
int index = hash & (tab.length - 1);
for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
if (e.hash == hash && key.equals(e.key)) {
preModify(e);
V oldValue = e.value;
e.value = value;
return oldValue;
}
}
// No entry for (non-null) key is present; create one
modCount++;
if (size++ > threshold) {
tab = doubleCapacity();
index = hash & (tab.length - 1);
}
addNewEntry(key, value, hash, index);
return null;
}
When the table needs to be rebuilt, it first recomputes the index of every existing element, so each element's index follows the change in the table's length.
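A small sketch of why this works: the index is always derived from the stored hash and the current table length, and on growth every stored hash is re-bucketed with the new length. The class and method names below are made up for illustration and are not part of the Android source.

```java
final class ResizeIndexDemo {
    /** Index for a power-of-two table length, as in the put() shown above. */
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    /** Recomputes the bucket of every stored hash for a table of twice the length. */
    static int[] rehash(int[] storedHashes, int oldLength) {
        int newLength = oldLength * 2;
        int[] newIndices = new int[storedHashes.length];
        for (int i = 0; i < storedHashes.length; i++) {
            newIndices[i] = indexFor(storedHashes[i], newLength);
        }
        return newIndices;
    }
}
```

A hash of 13 lands in bucket 1 while the length is 4 and in bucket 5 once the length is 8. Because the existing entry was moved to bucket 5 during the resize, a later put with the same key computes 5 as well and finds it there.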

Java 2D HashMap: can't find the right approach to nest one HashMap inside another

I want to make a board of students' names and subjects, where each student has a grade in each subject (or not: a student can skip an exam, in which case the cell stays empty). I want to use just HashMaps. I mean, it will be something like this:
HashMap<String,HashMap<String,String>> bigBoard =
new HashMap<String,HashMap<String,String>>();
But I don't think I have the right idea, because for each subject there will be many grades (values), so that won't work. Do I have to make a map for each student, with their subjects? But then the table in the output won't be arranged. Do you have a suggestion?
I would like a table that looks something like this, for example:
Rowkey ↓ / Column-Key →   Mathematics   Physics   Finance
Daniel Dolter             1.3           3.7
Micky Mouse                                       5
Minnie Mouse                            1.7       n/a
Dagobert Duck             4.0                     1.0
(I would use Strings for all keys and values; that keeps it simpler.)
After implementing our class (say it is called String2D), we should be able to use it like this:
public static void main(String[] args) {
String2D map2D = new String2D();
map2D.put("Daniel Doster", "Practical Mathematics", "1.3");
map2D.put("Daniel Doster", "IT Systeme", "3.7");
map2D.put("Micky Mouse", "Finance", "5");
map2D.put("Minnie Mouse", "IT Systeme", "1.7");
map2D.put("Minnie Mouse", "Finance", "n/a");
map2D.put("Dagobert Duck", "Practical Mathematics", "4.0");
map2D.put("Dagobert Duck", "Finance", "1.0");
System.out.println(map2D);
}
No "HashMap" should be visible to the caller, and arrays aren't allowed.
You can use this class:
public class BiHashMap<K1, K2, V> {
private final Map<K1, Map<K2, V>> mMap;
public BiHashMap() {
mMap = new HashMap<K1, Map<K2, V>>();
}
/**
* Associates the specified value with the specified keys in this map (optional operation). If the map previously
* contained a mapping for the keys, the old value is replaced by the specified value.
*
* @param key1
* the first key
* @param key2
* the second key
* @param value
* the value to be set
* @return the value previously associated with (key1,key2), or <code>null</code> if none
* @see Map#put(Object, Object)
*/
public V put(K1 key1, K2 key2, V value) {
Map<K2, V> map;
if (mMap.containsKey(key1)) {
map = mMap.get(key1);
} else {
map = new HashMap<K2, V>();
mMap.put(key1, map);
}
return map.put(key2, value);
}
/**
* Returns the value to which the specified keys are mapped, or <code>null</code> if this map contains no mapping for
* the keys.
*
* @param key1
* the first key whose associated value is to be returned
* @param key2
* the second key whose associated value is to be returned
* @return the value to which the specified keys are mapped, or <code>null</code> if this map contains no mapping for
* the keys
* @see Map#get(Object)
*/
public V get(K1 key1, K2 key2) {
if (mMap.containsKey(key1)) {
return mMap.get(key1).get(key2);
} else {
return null;
}
}
/**
* Returns <code>true</code> if this map contains a mapping for the specified keys
*
* @param key1
* the first key whose presence in this map is to be tested
* @param key2
* the second key whose presence in this map is to be tested
* @return true if this map contains a mapping for the specified keys
* @see Map#containsKey(Object)
*/
public boolean containsKeys(K1 key1, K2 key2) {
return mMap.containsKey(key1) && mMap.get(key1).containsKey(key2);
}
public void clear() {
mMap.clear();
}
}
And then create and use it like this:
BiHashMap<String,String,String> bigBoard = new BiHashMap<String,String,String>();
However, for performance you may want to store the different grades in an array (assuming that you have a fixed set of courses).
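For the String2D class the question asks for, the same nested-map idea can be wrapped so that no map type is visible to the caller. The following is only a sketch under assumptions: it swaps HashMap for LinkedHashMap and LinkedHashSet so that rows and columns print in insertion order, and the tab-separated toString() layout with a "Row" header is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the String2D class from the question; the output layout is an assumption.
class String2D {
    // row key -> (column key -> value); insertion order is kept for printing
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    // every column key seen so far, in first-seen order
    private final Set<String> columns = new LinkedHashSet<>();

    public String put(String row, String column, String value) {
        columns.add(column);
        return rows.computeIfAbsent(row, k -> new LinkedHashMap<>()).put(column, value);
    }

    public String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("Row");
        for (String column : columns) {
            sb.append('\t').append(column);
        }
        sb.append('\n');
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            sb.append(row.getKey());
            for (String column : columns) {
                String value = row.getValue().get(column);
                // a missing grade is printed as an empty cell
                sb.append('\t').append(value == null ? "" : value);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

With this, the main method from the question compiles unchanged and prints one line per student, with empty cells where no grade was stored.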
I don't think a nested hashmap is the way to go. Create a Student class and Subject class.
import java.util.ArrayList;
import java.util.List;

public class Student {
    private final List<Subject> subjectList = new ArrayList<Subject>();
    private final String name;

    public Student(String name) {
        this.name = name;
    }

    public void addSubject(Subject s) {
        subjectList.add(s);
    }

    public String getName() {
        return this.name;
    }
    //...add methods for other operations
}

public class Subject {
    // Generic type arguments must be reference types, so Double instead of double
    private final List<Double> gradeList = new ArrayList<Double>();
    private final String name;

    public Subject(String name) {
        this.name = name;
    }

    public void addGrade(double s) {
        gradeList.add(s);
    }
    //...add methods for other operations
}
Then you can store the Student instances in a HashMap, keyed by name.
public static void main(String[] args) {
    HashMap<String, Student> hm = new HashMap<String, Student>();
    Student s = new Student("Daniel Dolter");
    Subject sub = new Subject("Mathematics");
    sub.addGrade(1.3);
    s.addSubject(sub);
    hm.put(s.getName(), s);
}
With Java 8 it is possible to use computeIfAbsent to insert a default value if the key is absent.
So you can simply use this as the type of the 2D map:
Map<RowType, Map<ColumnType, ValueType>> map = new WhateverMap<>();
Let's say all types are int:
int get(int x, int y) {
    // inserts an empty row and a default value of 0 if they are missing
    return map.computeIfAbsent(x, key -> new WhateverMap<>()).computeIfAbsent(y, key -> 0);
}
void put(int x, int y, int value) {
    map.computeIfAbsent(x, key -> new WhateverMap<>()).put(y, value);
}
Note that the two chained calls are not atomic as a whole; therefore this is not thread-safe even if WhateverMap is.
You can use Google Guava's Table<R, C, V> collection. It is similar to eabraham's answer. A value V is keyed by a row R and a column C. It is a better alternative to using HashMap<R, HashMap<C, V>>, which quickly becomes unreadable and difficult to work with.
See their GitHub Wiki for more information.

Java: Is there a container which effectively combines HashMap and ArrayList?

I keep finding a need for a container which is both a HashMap (for fast lookup on a key type) and an ArrayList (for fast access by integer index).
LinkedHashMap is almost right, in that it keeps an iterable list, but it is unfortunately a linked list... retrieving the Nth element requires iterating from 1 to N.
Is there a container type which fits this bill and which I've somehow missed? What do other people do when they need to access the same set of data by key and by index?
Take a look at Apache Commons LinkedMap.
If you are removing (in the middle) as well as accessing by index and by key (which means that the indexes change), you are possibly out of luck. I think there simply can't be an implementation which provides O(1) for both remove (by index, key or iterator) and get(index). This is why we have both LinkedList (with iterator.remove() or remove(0) in O(1)) and ArrayList (with get(index) in O(1)) in the standard API.
You could have both removing and index-getting in O(log n) if you use a tree structure instead of an array or linked list (which could be combined with an O(1) key-based read access - getting the index for your key-value pair would still need O(log n), though).
If you don't want to remove anything, or can live with the following indexes not being shifted (i.e. remove(i) being equivalent to set(i, null)), nothing forbids having both O(1) index and key access. In fact, the index is then simply a second key, so you could use a HashMap and an ArrayList (or two HashMaps) with a thin wrapper combining both.
Edit: So, here is an implementation of ArrayHashMap as described in the last paragraph above (using the "expensive remove" variant). It implements the interface IndexedMap below. (If you don't want to copy and paste from here, both are also in my GitHub account, which will be updated in case of later changes.)
package de.fencing_game.paul.examples;
import java.util.*;
/**
* A combination of ArrayList and HashMap which allows O(1) for read and
* modifying access by index and by key.
* <p>
* Removal (either by key or by index) is O(n), though,
* as is indexed addition of a new Entry somewhere else than the end.
* (Adding at the end is in amortized O(1).)
* </p>
* <p>
* (The O(1) complexity for key based operations is under the condition
* "if the hashCode() method of the keys has a suitable distribution and
* takes constant time", as for any hash-based data structure.)
* </p>
* <p>
* This map allows null keys and values, but clients should think about
* avoiding using these, since some methods return null to show
* "no such mapping".
* </p>
* <p>
* This class is not thread-safe (like ArrayList and HashMap themselves).
* </p>
* <p>
* This class is inspired by the question
* Is there a container which effectively combines HashMap and ArrayList? on Stackoverflow.
* </p>
* @author Paŭlo Ebermann
*/
public class ArrayHashMap<K,V>
extends AbstractMap<K,V>
implements IndexedMap<K,V>
{
/**
* Our backing map.
*/
private Map<K, SimpleEntry<K,V>> baseMap;
/**
* our backing list.
*/
private List<SimpleEntry<K,V>> entries;
/**
* creates a new ArrayHashMap with default parameters.
* (TODO: add more constructors which allow tuning.)
*/
public ArrayHashMap() {
this.baseMap = new HashMap<K,SimpleEntry<K,V>>();
this.entries = new ArrayList<SimpleEntry<K,V>>();
}
/**
* puts a new key-value mapping, or changes an existing one.
*
* If new, the mapping gets an index at the end (i.e. {@link #size()}
* before it gets increased).
*
* This method runs in O(1) time for changing an existing value,
* amortized O(1) time for adding a new value.
*
* @return the old value, if any, else null.
*/
public V put(K key, V value) {
SimpleEntry<K,V> entry = baseMap.get(key);
if(entry == null) {
entry = new SimpleEntry<K,V>(key, value);
baseMap.put(key, entry);
entries.add(entry);
return null;
}
return entry.setValue(value);
}
/**
* retrieves the value for a key.
*
* This method runs in O(1) time.
*
* @return null if there is no such mapping,
* else the value for the key.
*/
public V get(Object key) {
SimpleEntry<K,V> entry = baseMap.get(key);
return entry == null ? null : entry.getValue();
}
/**
* returns true if the given key is in the map.
*
* This method runs in O(1) time.
*
*/
public boolean containsKey(Object key) {
return baseMap.containsKey(key);
}
/**
* removes a key from the map.
*
* This method runs in O(n) time, n being the size of this map.
*
* @return the old value, if any.
*/
public V remove(Object key) {
SimpleEntry<K,V> entry = baseMap.remove(key);
if(entry == null) {
return null;
}
entries.remove(entry);
return entry.getValue();
}
/**
* returns a key by index.
*
* This method runs in O(1) time.
*
*/
public K getKey(int index) {
return entries.get(index).getKey();
}
/**
* returns a value by index.
*
* This method runs in O(1) time.
*
*/
public V getValue(int index) {
return entries.get(index).getValue();
}
/**
* Returns a set view of the keys of this map.
*
* This set view is ordered by the indexes.
*
* It supports removal by key or iterator in O(n) time.
* Containment check runs in O(1).
*/
public Set<K> keySet() {
return new AbstractSet<K>() {
public void clear() {
entryList().clear();
}
public int size() {
return entries.size();
}
public Iterator<K> iterator() {
return keyList().iterator();
}
public boolean remove(Object key) {
return keyList().remove(key);
}
public boolean contains(Object key) {
return keyList().contains(key);
}
};
} // keySet()
/**
* Returns a set view of the entries of this map.
*
* This set view is ordered by the indexes.
*
* It supports removal by entry or iterator in O(n) time.
*
* It supports adding new entries at the end, if the key
* is not already used in this map, in amortized O(1) time.
*
* Containment check runs in O(1).
*/
public Set<Map.Entry<K,V>> entrySet() {
return new AbstractSet<Map.Entry<K,V>>() {
public void clear() {
entryList().clear();
}
public int size() {
return entries.size();
}
public Iterator<Map.Entry<K,V>> iterator() {
return entryList().iterator();
}
public boolean add(Map.Entry<K,V> e) {
return entryList().add(e);
}
public boolean contains(Object o) {
return entryList().contains(o);
}
public boolean remove(Object o) {
return entryList().remove(o);
}
};
} // entrySet()
/**
* Returns a list view of the entries of this map.
*
* This list view is ordered by the indexes.
*
* It supports removal by entry, iterator or sublist.clear in O(n) time.
* (n being the length of the total list, not the sublist).
*
* It supports adding new entries at the end, if the key
* is not already used in this map, in amortized O(1) time.
*
* Containment check runs in O(1).
*/
public List<Map.Entry<K,V>> entryList() {
return new AbstractList<Map.Entry<K,V>>() {
public void clear() {
baseMap.clear();
entries.clear();
}
public Map.Entry<K,V> get(int index) {
return entries.get(index);
}
public int size() {
return entries.size();
}
public Map.Entry<K,V> remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e;
}
public void add(int index, Map.Entry<K,V> newEntry) {
K key = newEntry.getKey();
SimpleEntry<K,V> clone = new SimpleEntry<K,V>(newEntry);
if(baseMap.containsKey(key)) {
throw new IllegalArgumentException("duplicate key " +
key);
}
entries.add(index, clone);
baseMap.put(key, clone);
}
public boolean contains(Object o) {
if(o instanceof Map.Entry) {
SimpleEntry<K,V> inMap =
baseMap.get(((Map.Entry<?,?>)o).getKey());
return inMap != null &&
inMap.equals(o);
}
return false;
}
public boolean remove(Object o) {
if (o instanceof Map.Entry) { // only entries can be removed; the cast below is then safe
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
SimpleEntry<K,V> inMap = baseMap.get(e.getKey());
if(inMap != null && inMap.equals(e)) {
entries.remove(inMap);
baseMap.remove(inMap.getKey());
return true;
}
}
return false;
}
protected void removeRange(int fromIndex, int toIndex) {
List<SimpleEntry<K,V>> subList =
entries.subList(fromIndex, toIndex);
for(SimpleEntry<K,V> entry : subList){
baseMap.remove(entry.getKey());
}
subList.clear();
}
};
} // entryList()
/**
* Returns a List view of the keys in this map.
*
* It allows index read access and key containment check in O(1).
* Changing a key is not allowed.
*
* Removal by key, index, iterator or sublist.clear runs in O(n) time
* (this removes the corresponding values, too).
*/
public List<K> keyList() {
return new AbstractList<K>() {
public void clear() {
entryList().clear();
}
public K get(int index) {
return entries.get(index).getKey();
}
public int size() {
return entries.size();
}
public K remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e.getKey();
}
public boolean remove(Object key) {
SimpleEntry<K,V> entry = baseMap.remove(key);
if(entry == null) {
return false;
}
entries.remove(entry);
return true;
}
public boolean contains(Object key) {
return baseMap.containsKey(key);
}
protected void removeRange(int fromIndex, int toIndex) {
entryList().subList(fromIndex, toIndex).clear();
}
};
} // keyList()
/**
* Returns a List view of the values in this map.
*
* It allows get and set by index in O(1) time (set changes the mapping).
*
* Removal by value, index, iterator or sublist.clear is possible
* in O(n) time, this removes the corresponding keys too (only the first
* key with this value for remove(value)).
*
* Containment check needs an iteration, thus O(n) time.
*/
public List<V> values() {
return new AbstractList<V>() {
public int size() {
return entries.size();
}
public void clear() {
entryList().clear();
}
public V get(int index) {
return entries.get(index).getValue();
}
public V set(int index, V newValue) {
Map.Entry<K,V> e = entries.get(index);
return e.setValue(newValue);
}
public V remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e.getValue();
}
protected void removeRange(int fromIndex, int toIndex) {
entryList().subList(fromIndex, toIndex).clear();
}
};
} // values()
/**
* a usage example method.
*/
public static void main(String[] args) {
IndexedMap<String,String> imap = new ArrayHashMap<String, String>();
for(int i = 0; i < args.length-1; i+=2) {
imap.put(args[i], args[i+1]);
}
System.out.println(imap.values());
System.out.println(imap.keyList());
System.out.println(imap.entryList());
System.out.println(imap);
System.out.println(imap.getKey(0));
System.out.println(imap.getValue(0));
}
}
Here the interface:
package de.fencing_game.paul.examples;
import java.util.*;
/**
* A map which additionally to key-based access allows index-based access
* to keys and values.
* <p>
* Inspired by the question Is there a container which effectively combines HashMap and ArrayList? on Stackoverflow.
* </p>
* @author Paŭlo Ebermann
* @see ArrayHashMap
*/
public interface IndexedMap<K,V>
extends Map<K,V>
{
/**
* returns a list view of the {@link #entrySet} of this Map.
*
* This list view supports removal of entries, if the map is mutable.
*
* It may also support indexed addition of new entries per the
* {@link List#add add} method - but this throws an
* {@link IllegalArgumentException} if the key is already used.
*/
public List<Map.Entry<K,V>> entryList();
/**
* returns a list view of the {@link #keySet}.
*
* This list view supports removal of keys (with the corresponding
* values), but does not support addition of new keys.
*/
public List<K> keyList();
/**
* returns a list view of values contained in this map.
*
* This list view supports removal of values (with the corresponding
* keys), but does not support addition of new values.
* It may support the {@link List#set set} operation to change the
* value for a key.
*/
public List<V> values();
/**
* Returns a value of this map by index.
*
* This is equivalent to
* {@link #values() values()}.{@link List#get get}{@code (index)}.
*/
public V getValue(int index);
/**
* Returns a key of this map by index.
*
* This is equivalent to
* {@link #keyList keyList()}.{@link List#get get}{@code (index)}.
*/
public K getKey(int index);
}
Why don't you simply keep the HashMap and then use hashMap.entrySet().toArray(); as suggested here?
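As a rough illustration of this suggestion, the sketch below copies the entry set into an indexable list. The class name SnapshotIndexDemo and the helper snapshot are made up for the example, and it uses a typed ArrayList copy instead of the raw Object[] that toArray() returns. It also uses a LinkedHashMap so that the index order is predictable, which a plain HashMap does not promise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for the "copy the entry set" approach.
class SnapshotIndexDemo {

    // Copies the current entries into a list: O(n) per call,
    // and the copy does not see keys added to the map afterwards.
    static <K, V> List<Map.Entry<K, V>> snapshot(Map<K, V> map) {
        return new ArrayList<>(map.entrySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        List<Map.Entry<String, Integer>> byIndex = snapshot(map);
        System.out.println(byIndex.get(1).getKey()); // second entry in insertion order

        map.put("c", 3);
        System.out.println(byIndex.size()); // still the size at snapshot time
    }
}
```

This is fine when the map is filled once and then only read; if the map keeps changing, the snapshot has to be rebuilt each time, which is the cost the wrapper-based answers avoid.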
You could do it yourself, but here's an example implementation. The corresponding Google search term would be "ArrayMap".
I'm not sure but maybe commons collections or google collections has such a map.
Edit:
You can create a hash map that is implemented using an ArrayList, i.e. it would work like LinkedHashMap in that the insertion order defines the list index. This would provide fast get(Index) (O(1)) and get(Name) (O(1)) access. Insertion would be O(1) as well (unless the array must be extended), but deletion would be O(n), since deleting the first element would require all indices to be updated.
The trick could be done by a map that internally holds a Map for key -> index and an ArrayList for the values.
get(Key) would then be (simple example without error checking):
list.get(keyIndexMap.get(key));
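A slightly fuller sketch of that idea, with the null check added, might look as follows. The class name KeyIndexMap, the extra keys list and the method getByIndex are assumptions made for this example; they are not taken from any library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: key -> index map plus ArrayLists holding keys and values.
class KeyIndexMap<K, V> {
    private final Map<K, Integer> keyIndexMap = new HashMap<>();
    private final List<K> keys = new ArrayList<>(); // needed to re-index after a removal
    private final List<V> list = new ArrayList<>();

    // O(1) for an existing key, amortized O(1) for a new key (appended at the end).
    public void put(K key, V value) {
        Integer index = keyIndexMap.get(key);
        if (index == null) {
            keyIndexMap.put(key, list.size());
            keys.add(key);
            list.add(value);
        } else {
            list.set(index, value);
        }
    }

    // O(1) lookup by key; returns null if the key is unknown.
    public V get(K key) {
        Integer index = keyIndexMap.get(key);
        return index == null ? null : list.get(index);
    }

    // O(1) lookup by insertion index.
    public V getByIndex(int index) {
        return list.get(index);
    }

    // O(n): every entry behind the removed one moves down and gets a new index.
    public V remove(K key) {
        Integer index = keyIndexMap.remove(key);
        if (index == null) {
            return null;
        }
        int removedAt = index; // unbox so that List.remove(int) is called
        keys.remove(removedAt);
        V old = list.remove(removedAt);
        for (int i = removedAt; i < keys.size(); i++) {
            keyIndexMap.put(keys.get(i), i);
        }
        return old;
    }

    public int size() {
        return list.size();
    }
}
```

The remove method shows where the O(n) deletion cost mentioned above comes from: the stored indices of all later entries have to be rewritten.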
